Posted on 2021-08-01. Authored by Sanket Gaurav.
Several limitations and challenges prevent robots from being widely accepted as human peers. This thesis studies prediction and learning methods for understanding human intentions and addresses challenges that collaborative robots face in translating between human and robotic behavior. The first topic addresses the correspondence learning problem of estimating a mapping from human embodiment to robot joint configuration for robotic teleoperation using virtual reality. The second topic seeks to enable robots to predict human intentions more accurately from partial trajectories so that the robot can plan complementary activities. Finally, the research extends to transferring a daily human activity, mopping the floor, from human demonstration videos to a robotic arm.
By projecting into a 3-D workspace, robotic teleoperation using virtual reality allows for a more intuitive method of control for the operator than a 2-D view from the robot's visual sensors. This chapter investigates a setup that places the teleoperator in a virtual representation of the robot's environment and develops a deep-learning-based architecture that models the correspondence between the operator's movements in the virtual space and the joint angles of a humanoid robot, using data collected from a series of demonstrations. We evaluate the correspondence model's performance in a pick-and-place teleoperation experiment.
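As a rough illustration only, and not the thesis's exact architecture, a correspondence model of this kind can be sketched as a small feed-forward network trained on (VR pose, joint angle) pairs from demonstrations; the input features, layer sizes, and number of joints below are assumed for the example.

```python
# Sketch of a correspondence model: maps tracked VR operator pose features
# (e.g., headset and controller positions/orientations) to humanoid joint
# angles. Dimensions and architecture are illustrative assumptions.
import torch
import torch.nn as nn

class CorrespondenceModel(nn.Module):
    def __init__(self, vr_pose_dim: int = 21, num_joints: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vr_pose_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, num_joints),  # predicted joint angles (radians)
        )

    def forward(self, vr_pose: torch.Tensor) -> torch.Tensor:
        return self.net(vr_pose)

# Supervised training on demonstration pairs (VR pose -> robot joint angles).
model = CorrespondenceModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(vr_pose_batch: torch.Tensor, joint_angle_batch: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(vr_pose_batch), joint_angle_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```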
More accurately inferring human intentions and goals can help robots complete collaborative human-robot tasks more safely and efficiently. Bayesian reasoning has become a popular approach for predicting the intention or goal underlying a partial sequence of actions/controls using a trajectory likelihood model. However, the mismatch between the training objective for these models (maximizing trajectory likelihood) and the application objective (maximizing intention likelihood) can be detrimental. In this chapter, we seek to improve the goal prediction of maximum entropy inverse reinforcement learning (MaxEnt IRL) models by training them to maximize goal likelihood. We demonstrate the benefits of our method on pointing-task goal prediction with multiple possible goals and on predicting goal-based activities in the Cornell Activity Dataset (CAD-120).
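The Bayesian goal inference step can be sketched as below, assuming some learned trajectory likelihood model (here a placeholder function, not the thesis's actual MaxEnt IRL implementation): the posterior over goals given a partial trajectory is proportional to the trajectory likelihood under each goal times a goal prior.

```python
# Sketch of Bayesian goal inference from a partial trajectory:
#   P(goal | partial_traj) ∝ P(partial_traj | goal) * P(goal)
# `log_traj_likelihood` stands in for a learned trajectory model; its exact
# form in the thesis differs.
import numpy as np

def goal_posterior(partial_traj, goals, log_traj_likelihood, log_prior=None):
    if log_prior is None:
        log_prior = np.zeros(len(goals))  # uniform prior over goals
    log_post = np.array(
        [log_traj_likelihood(partial_traj, g) for g in goals]
    ) + log_prior
    log_post -= np.max(log_post)          # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()              # normalized posterior over goals
```

The chapter's key idea, in these terms, is to fit the model's parameters so that the posterior above assigns high probability to the true goal (e.g., via a cross-entropy loss on goals), rather than fitting them only to maximize trajectory likelihood, aligning training with the prediction objective.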
Though mopping the floor is a mundane and tedious daily task, enabling robots to perform it comparably to humans remains a challenge. Hand-coding desired mopping behaviors for variable surfaces and situations is particularly difficult. In this chapter, we develop a robotic system that mops the floor by mimicking the human behavior demonstrated in videos. Our approach builds upon recent successes in imitation learning of other capabilities from human video demonstrations (e.g., pouring tasks). Our first proposed robotic system uses traditional computer vision techniques for tracking and inverse kinematics. Our second system combines advanced computer vision techniques, a Time Contrastive Network (TCN), and reinforcement learning; from these, we devise a reward function for the mopping task. We use a Universal Robots UR10e robotic arm with a mop attached to perform the mopping task, and a first-person camera mounted on top of the robotic arm provides feedback for robot learning.
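A TCN-derived reward of the kind described is commonly built from embedding distances between time-aligned video frames; the sketch below shows one such form under that assumption, with `tcn_embed` as a hypothetical pretrained embedding function. The thesis's exact reward shaping and reinforcement learning algorithm may differ.

```python
# Sketch of a TCN-style imitation reward: embed the robot's first-person
# camera frame and the time-aligned human demonstration frame, then reward
# the robot for keeping the embeddings close.
import numpy as np

def tcn_reward(robot_frame, demo_frame, tcn_embed, alpha: float = 1.0) -> float:
    z_robot = tcn_embed(robot_frame)   # embedding of robot camera frame
    z_demo = tcn_embed(demo_frame)     # embedding of aligned demo frame
    dist = np.linalg.norm(z_robot - z_demo)
    return -alpha * dist               # small embedding distance -> high reward
```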