The Roles and Recognition of Haptic-Ostensive Actions in Collaborative Multimodal Human-Human Dialogues

journal contribution
posted on 10.09.2016 by L. Chen, M. Javaid, B. Di Eugenio, M. Zefran
The RoboHelper project aims to develop assistive robots for the elderly. One crucial component of such a robot is a multimodal dialogue architecture, since collaborative task-oriented human-human dialogue is inherently multimodal. In this paper, we focus on a specific type of interaction, Haptic-Ostensive (H-O) actions, which are pervasive in collaborative dialogue. H-O actions manipulate objects, but they also often perform a referring function. We collected 20 collaborative task-oriented human-human dialogues between a helper and an elderly person in a realistic setting. To collect the haptic signals, we developed an unobtrusive sensory glove with pressure sensors. Multiple annotations were then conducted to build the Find corpus. Supervised machine learning was applied to these annotations in order to develop reference resolution and dialogue act classification modules. Both the corpus analysis and these two modules show that H-O actions play a crucial role in interaction: models that include H-O actions and other extra-linguistic information, such as pointing gestures, perform better. For true human-robot interaction, all communicative intentions must of course be recognized in real time, not on the basis of annotated categories. To demonstrate that our corpus analysis is not an end in itself, but can inform actual human-robot interaction, the last part of our paper presents additional experiments on recognizing H-O actions from the haptic signals measured through the sensory glove. We show that even though pressure sensors are relatively imprecise and the data provided by the glove are noisy, the classification algorithms can successfully identify actions of interest within subjects.


The research of Milos Zefran, Ph.D., has been funded by the National Science Foundation (NSF), and he is a recipient of the NSF CAREER award (2001).


Publisher Statement

Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Speech and Language. 2015. 34(1): 201-231. DOI: 10.1016/j.csl.2015.03.010.


Elsevier Inc.



Issue date