University of Illinois at Chicago
BASHIRI-DISSERTATION-2022.pdf (3.83 MB)

Imitation Learning Under Suboptimal Demonstrations

Thesis, posted on 2022-05-01, authored by Mohammad Ali Bashiri
The purpose of imitation learning (IL) is to efficiently learn a desired behavior by imitating an expert's behavior. The behaviors of interest are usually complex and goal-oriented, and in practice the goals or rewards for such behaviors are difficult for a human to specify. Experts can nevertheless provide demonstrations of the desired task even when they do not know the underlying mathematical model of the task. A central challenge in imitation learning, however, is the scarcity of high-quality demonstrations: demonstrations are often noisy or suboptimal, especially when humans are involved in data collection or when gathering high-quality data is expensive. In this thesis, we study the problem of imitation learning under noisy demonstrations, or when demonstrations of varying quality are available.

We first study distributionally robust imitation learning (DRoIL), an adversarial approach to imitation learning that is naturally designed to perform robustly against noisy demonstrations. We establish a close connection between DRoIL and maximum entropy inverse reinforcement learning (MaxEnt IRL), a well-studied imitation learning method, and show that DRoIL can be seen as a framework that maximizes a generalized notion of entropy. We develop a novel approach that, for a certain class of loss functions, transforms the objective into a convex optimization problem over a polynomial number of variables.

We then study imitation learning when demonstrations of varying quality are available, assuming additional information on demonstration quality, such as rankings or pairwise preferences. For this setting, we develop Multiple Ranked Distributionally Robust Imitation Learning (MRDRoIL), a novel IL method that directly incorporates ranked demonstrations using inverse reinforcement learning techniques. Our method robustly learns a higher-quality reward function by minimizing a given loss with respect to the worst-case estimated policy that matches the features of the demonstrated data while preserving its rankings. We provide two efficient optimization algorithms for the resulting problem.

In our experiments, we demonstrate the significant benefits of DRoIL's new optimization method on synthetic data and a highway driving environment. We also compare MRDRoIL with other preference- and ranking-based imitation learning methods and show that it performs competitively against them.
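To make the adversarial formulation concrete, the following is a schematic sketch of the kind of minimax objective the abstract describes. The notation (trajectories \xi, feature map \phi, empirical demonstration distribution \tilde{P}) is assumed here for illustration and is not taken from the dissertation:

    \min_{\hat{P}} \; \max_{\check{P}} \; \mathbb{E}_{\xi \sim \check{P}}\big[\mathrm{loss}(\hat{P}, \xi)\big]
    \quad \text{subject to} \quad
    \mathbb{E}_{\xi \sim \check{P}}[\phi(\xi)] = \mathbb{E}_{\xi \sim \tilde{P}}[\phi(\xi)].

The learner's policy distribution \hat{P} minimizes an imitation loss against a worst-case policy \check{P} constrained to match the feature expectations of the demonstrations; under the logarithmic loss, the inner problem recovers the familiar MaxEnt IRL solution P(\xi) \propto \exp(\theta^\top \phi(\xi)). For the ranked setting, one plausible reading of "preserving the rankings" is an additional ordering constraint on the learned reward weights \theta over ranked demonstration sets D_1 \succ \cdots \succ D_K (again, notation assumed rather than the dissertation's exact formulation):

    \theta^\top \mu(D_1) \;\ge\; \theta^\top \mu(D_2) \;\ge\; \cdots \;\ge\; \theta^\top \mu(D_K),
    \qquad
    \mu(D_k) = \frac{1}{|D_k|} \sum_{\xi \in D_k} \phi(\xi).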

History

Advisor

Ziebart, Brian

Chair

Ziebart, Brian

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

  • Zhang, Xinhua
  • Kash, Ian
  • Reyzin, Lev
  • Ratliff, Nathan

Submitted date

May 2022

Format

application/pdf

Language

  • English (en)
