A Framework for Using Object Manipulation Actions in Multimodal Human-Robot Interaction
2019-08-01T00:00:00Z (GMT)
There is a growing need for service robots that can support independent living for the elderly and people with disabilities, as well as robots that can assist human workers in a warehouse or on a factory floor. Robots that collaborate with humans should act predictably and ensure that the interaction is safe and effective. Therefore, when humans and robots collaborate, for example during Activities of Daily Living (ADLs), robots should be able to recognize human actions and intentions and produce appropriate responses. To do so, it is crucial to understand how two humans interact during a collaborative task and how they perform it. Humans employ multiple communication modalities when engaging in collaborative activities; similarly, service robots require information from multiple sensors to plan their actions based on the interaction and the task state. In particular, it is necessary to collect and analyze data from different sensor modalities when humans engage in interactions that involve physical manipulation of the environment. Towards this goal, our research focused on three research problems.

1- Human Grasp Analysis: Grasping is an important phase of object manipulation actions, and service robots can be taught how to execute a grasp using programming by demonstration (PbD). To control a grasp, one should consider the position of the fingers on the grasped object, including the general configuration of the fingers along with their applied forces; for learning by demonstration, these same variables provide good candidate features. Given these multimodal features, however, the question arises whether the existing grasp types presented in the literature are appropriate. We address this question empirically by studying human grasp data, using the pressure and bend sensor measurements from our data glove. We use unsupervised learning methods to extract human grasp patterns from the collected data.
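The grasp-pattern extraction described above can be sketched as a clustering problem over glove features. The following is a minimal illustration, assuming a layout of five bend sensors and five pressure sensors per sample and using k-means on synthetic data; the thesis's actual sensor configuration, feature set, and clustering method may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for glove data: rows = grasp samples,
# columns = [bend_1..bend_5 (deg), pressure_1..pressure_5 (N)].
# Two illustrative grasp families: "power" (high flexion, high force)
# and "precision" (low flexion, low force).
power_grasps = rng.normal(loc=[80] * 5 + [3.0] * 5, scale=2.0, size=(50, 10))
precision_grasps = rng.normal(loc=[30] * 5 + [0.5] * 5, scale=2.0, size=(50, 10))
X = np.vstack([power_grasps, precision_grasps])

# Standardize so bend angles and pressures are on a comparable scale.
X_std = StandardScaler().fit_transform(X)

# Unsupervised extraction of grasp patterns; k = 2 is chosen only for
# this toy example, not taken from the thesis.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_std)
print(kmeans.labels_)
```

On well-separated data like this, the two cluster labels recover the two grasp families; on real glove data, the number of recoverable clusters is exactly the empirical question the abstract raises.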
Our findings imply that the grasp types that can be perceived from the measured data are a subset of the grasp types presented in taxonomies in the literature. These results have direct implications for how PbD should be used for robots that assist with ADLs.

2- Human Manipulation Actions: Object manipulation actions represent an important share of ADLs. We therefore investigated how to enable service robots to use human multimodal data to learn how to perform these actions, and how to recognize them during human-robot collaboration. In multimodal human-robot interaction, when humans and robots physically interact, the physical contact itself, be it between the human and the robot or with the objects in the environment, becomes an important channel for communication. In this thesis, we focus on the latter: understanding how manipulation actions enhance the understanding between the human and the robot. Of special interest are so-called haptic-ostensive (H-O) actions, manipulation actions that aid communication. In particular, we identify the motion primitives from which H-O actions are composed; in this way, H-O actions can be recognized and classified. Our multimodal data consist of videos and measurements from a data glove (including hand motion, finger joint angles, and applied forces measured by bend and pressure sensors) collected from human participants performing object manipulation actions. We explore two different methods: the first is a primitive-based method, while the second is purely data-driven, called visual-data-flow. In the former, we show that the multimodal signal generated by a manipulation action can be decomposed into a set of primitives, namely building blocks. The primitives are modeled using physical insights obtained from experimental data. In the latter, the raw measurements are used along with features extracted from videos using deep convolutional neural networks (DCNNs).
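The primitive-based decomposition can be illustrated on a single channel. The sketch below segments a contact-force trace into three primitives by thresholding; the primitive names ("reach", "hold", "retract") and the threshold rule are illustrative assumptions, whereas the thesis models its primitives from physical insights over the full multimodal signal.

```python
import numpy as np

def segment_primitives(force, contact_thresh=0.5):
    """Label each sample of a force trace with a motion primitive:
    'reach' before first contact, 'hold' while force exceeds the
    contact threshold, 'retract' after contact has been released."""
    labels = []
    seen_contact = False
    for f in force:
        if f > contact_thresh:
            seen_contact = True
            labels.append("hold")
        else:
            labels.append("retract" if seen_contact else "reach")
    return labels

# Toy pressure-sensor trace: approach (no force), grasp, release.
force = np.array([0.0, 0.1, 0.2, 2.0, 2.5, 2.4, 0.1, 0.0])
print(segment_primitives(force))
# -> ['reach', 'reach', 'reach', 'hold', 'hold', 'hold', 'retract', 'retract']
```

A sequence of such primitive labels (here over one channel; in general over all glove and motion channels) is what a downstream classifier can match against known H-O action templates.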
Both methods achieve acceptable performance, which suggests that a vision-based method can be adequate for H-O action recognition during multimodal human-robot interaction. This motivates us to use only images for human H-O action recognition in real time.

3- A Manager for Human-Robot Multimodal Interaction: Service robots for the elderly require information from multiple modalities to maintain active interaction with a human while performing interactive tasks. We study in detail a scenario where a human and a service robot collaborate to find an object (the Find Task) in the kitchen so that it can be used in a subsequent task such as cooking. Based on the data collected during human studies, we develop an Interaction Manager which allows the robot to actively participate in the interaction and to plan its next action given human spoken utterances, observed manipulation actions, and gestures. We develop multiple modules for the robot in the Robot Operating System (ROS), including H-O action recognition using vision, gesture recognition using vision, speech recognition using the Google speech recognition API, a dialogue tool which includes a multimodal dialogue act (DA) classifier that determines the intention of the speaker, and the Interaction Manager itself. The proposed system is validated on two different robot platforms: a Baxter robot and a Nao robot. A preliminary user study provides evidence that, using the developed multimodal Interaction Manager, the robot can successfully interact with the human in the Find Task.
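The Interaction Manager's core job, choosing the robot's next action from fused multimodal inputs, can be sketched as a simple priority policy. All dialogue-act, gesture, and action labels below are illustrative assumptions; the actual manager runs as ROS modules and its label set and decision logic come from the human-study data.

```python
def next_action(dialogue_act=None, gesture=None, ho_action=None):
    """Toy Find Task policy: map one step of fused multimodal input
    (spoken dialogue act, observed gesture, observed H-O action)
    to the robot's next action. Rules are checked in priority order."""
    if ho_action == "hand-over":           # human hands the object over
        return "take-object"
    if dialogue_act == "request-find":     # e.g. "find the cup"
        return "search"
    if gesture == "point":                 # human points at a location
        return "look-at-pointed-location"
    if dialogue_act == "confirm":          # e.g. "yes, that's the one"
        return "grasp-object"
    return "ask-clarification"             # fall back to a clarifying question

print(next_action(dialogue_act="request-find"))  # -> search
print(next_action(gesture="point"))              # -> look-at-pointed-location
print(next_action())                             # -> ask-clarification
```

In the real system each input would arrive on its own ROS topic (vision, speech, and the DA classifier), and the manager would also track dialogue and task state rather than deciding from a single observation.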