SARMA-THESIS-2019.pdf (7.77 MB)

Hand-Eye Coordination for Robotic Grasping using Deep Learning for Frame Coordinate Transform

posted on 06.08.2019 by Tejas Seshari Sarma
Coordinate Frame Transformation, or Reference Frame Transformation, is a process used to locate a target detected in a robot's field of view in three-dimensional Cartesian space. Being able to compute the three-dimensional Cartesian coordinates of a target object helps eliminate the need for error-based proximity detection of the robotic arm relative to the target. This greatly reduces grasp time, which in turn reduces the overall time needed to complete a complex task involving multiple grasps. Here, the grasp approach begins by detecting and isolating the target object within the task frame, which is a subset of the field of view of the robot's camera. Once the robot detects the target, the coordinates of the target's centroid are fed into a trained Deep Learning model, which outputs the corresponding three-dimensional Cartesian coordinates of the target in the robot's own frame of reference. An Inverse Kinematics solver is then applied to these Cartesian coordinates to direct the robot arm toward the target and attempt a grasp. Because this mechanism attempts a grasp without dynamically calibrating the robotic arm during traversal, it significantly reduces the latency between planning and the grasp action itself. Dynamic trajectory planning can sometimes lead the robot arm along a very convoluted path from its initial position to the target object. Alternative approaches that use external cameras equipped with depth sensors extend the robot's dependence beyond its own resources. The presented Frame Transform model can reduce this dependence by using only the robot's own resources, with Deep Learning bridging the gap left by the absence of advanced hardware features on the Baxter Robot (in this case, the lack of a depth sensor on the built-in head camera).
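The detect, take-the-centroid, and regress steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis implementation: the target is assumed to be already segmented into a binary mask, and the 640x400 image size, the 2-32-3 tanh network, plain full-batch gradient descent, and the synthetic training map are all hypothetical choices. The Inverse Kinematics step is omitted because it depends on Baxter's own kinematics interface.

```python
import numpy as np


def mask_centroid(mask):
    """Return the (u, v) pixel centroid of a binary target mask (mean of foreground pixels)."""
    rows, cols = np.nonzero(mask)  # rows index v (image y), cols index u (image x)
    if cols.size == 0:
        raise ValueError("empty mask: no target detected")
    return float(cols.mean()), float(rows.mean())


def normalize_pixels(uv, width=640, height=400):
    """Scale pixel coordinates to [-1, 1]; the 640x400 resolution is an assumption."""
    uv = np.asarray(uv, dtype=float)
    return np.stack([2 * uv[..., 0] / width - 1, 2 * uv[..., 1] / height - 1], axis=-1)


class PixelToCartesianMLP:
    """Hypothetical 2 -> hidden -> 3 regressor: normalized (u, v) -> (x, y, z) in the robot frame."""

    def __init__(self, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(2), (2, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 3))
        self.b2 = np.zeros(3)

    def forward(self, X):
        self._H = np.tanh(X @ self.W1 + self.b1)  # hidden activations
        return self._H @ self.W2 + self.b2          # linear output layer: (x, y, z)

    def train_step(self, X, Y, lr=0.1):
        """One full-batch gradient-descent step on mean squared error; returns the pre-step loss."""
        err = self.forward(X) - Y
        loss = float(np.mean(err ** 2))
        d_out = 2.0 * err / err.size                              # dLoss/dOutput
        dW2 = self._H.T @ d_out
        db2 = d_out.sum(axis=0)
        d_pre = (d_out @ self.W2.T) * (1.0 - self._H ** 2)        # back through tanh
        dW1 = X.T @ d_pre
        db1 = d_pre.sum(axis=0)
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        return loss


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    uv = rng.uniform([0, 0], [640, 400], size=(256, 2))
    X = normalize_pixels(uv)
    # Synthetic stand-in for recorded (pixel, arm position) pairs: a made-up map onto a plane at z = -0.05 m.
    Y = np.column_stack([0.65 - 0.15 * X[:, 1], -0.3 * X[:, 0], np.full(len(X), -0.05)])
    model = PixelToCartesianMLP()
    losses = [model.train_step(X, Y) for _ in range(3000)]
    print(f"MSE: first step {losses[0]:.4f}, last step {losses[-1]:.6f}")

    mask = np.zeros((400, 640), dtype=bool)
    mask[180:221, 300:341] = True  # hypothetical detected target
    u, v = mask_centroid(mask)
    print("predicted (x, y, z):", model.forward(normalize_pixels([[u, v]]))[0])
```

The synthetic linear map only stands in for real training pairs; one plausible way to obtain those would be to record pixel centroids alongside the arm's known end-effector positions over the sample plane.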
The goal of the experiment is to determine the extent to which a Deep Learning model can replace a more conventional, semi-analytical, multi-stage pipeline for location estimation. We aim to explore the capabilities of deeper models and their ability to learn complex relations that would otherwise be derived through a multi-stage execution pipeline. While this experiment is constrained to a sample Cartesian plane in Baxter's frame of reference, it is intended as a starting point for using Deep Learning to efficiently locate targets in Baxter's proximity.
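Because the experiment is restricted to a single plane, one conventional analytical baseline (an assumption here, not necessarily the pipeline the thesis compares against) is a planar homography: for an ideal pinhole camera without lens distortion, the pixel coordinates of points on a plane map to that plane's own coordinates through a 3x3 projective transform. The sketch below estimates it with the Direct Linear Transform from at least four pixel/plane correspondences; the function names and the example homography are hypothetical.

```python
import numpy as np


def estimate_homography(pixels, plane_xy):
    """Direct Linear Transform: find H (3x3, H[2, 2] = 1) with [x, y, 1] ~ H @ [u, v, 1].

    Needs at least 4 correspondences, no 3 of them collinear.
    """
    rows = []
    for (u, v), (x, y) in zip(pixels, plane_xy):
        # Each correspondence contributes two linear equations in the 9 entries of H.
        rows.append([-u, -v, -1.0, 0.0, 0.0, 0.0, x * u, x * v, x])
        rows.append([0.0, 0.0, 0.0, -u, -v, -1.0, y * u, y * v, y])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def pixel_to_plane(H, u, v):
    """Map a pixel (u, v) to plane coordinates (x, y) with homogeneous division."""
    x, y, w = H @ np.array([u, v, 1.0])
    return float(x / w), float(y / w)


if __name__ == "__main__":
    # Hypothetical ground-truth homography, used only to generate consistent correspondences.
    H_true = np.array([[0.0, -0.001, 0.9], [-0.001, 0.0, 0.35], [0.0, 1e-5, 1.0]])
    pixels = [(0, 0), (640, 0), (640, 400), (0, 400), (320, 200)]
    plane = [pixel_to_plane(H_true, u, v) for u, v in pixels]
    H = estimate_homography(pixels, plane)
    print("pixel (320, 200) on plane:", pixel_to_plane(H, 320, 200))
```

With noisy real correspondences, normalizing both point sets (Hartley normalization) before the SVD and using more than four points improves conditioning, and lens distortion would have to be removed first; a learned mapping may absorb such effects without an explicit model.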



Ziebart, Brian D


Ziebart, Brian D


Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level


Committee Member

Zefran, Milos; Parde, Natalie

Submitted date

May 2019

Issue date