TIRINZONI-THESIS-2017.pdf (1.41 MB)

Adversarial Inverse Reinforcement Learning with Changing Dynamics

Thesis, posted on 01.11.2017 by Andrea Tirinzoni
Most work on inverse reinforcement learning, the problem of recovering the unknown reward function being optimized by a decision-making agent, has focused on cases where optimal demonstrations are provided under a single dynamics. We analyze the more general setting where the learner has access to sub-optimal demonstrations under several different dynamics. We argue that several problems, such as learning under covariate shift or risk aversion, can be modeled in this way. We propose an adversarial formulation in which the learner imitates a constrained, worst-case estimate of the demonstrator's control policy. We adopt the method of Lagrange multipliers to remove the constraints and obtain a convex optimization problem. We prove that the constraints imposed by the multiple dynamics lead to an NP-hard optimization subproblem: computing a deterministic policy that maximizes the total expected reward across several different Markov decision processes. We propose a tractable approximation by reducing the latter to the optimal control of a partially observable Markov decision process. We evaluate the performance of our algorithm on two synthetic problems: in the first, we recover the reward function of a randomly generated Markov decision process; in the second, we rationalize a robot navigating through a grid and demonstrating goal-directed behavior.
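The NP-hard subproblem mentioned in the abstract can be illustrated with a minimal sketch: finding one deterministic policy that maximizes the summed expected discounted return over several MDPs sharing the same state and action spaces. The brute-force enumeration below (with a uniform initial-state distribution and a discount factor, both assumptions not stated in the abstract) is exponential in the number of states, which is why a tractable approximation such as the POMDP reduction described in the thesis is needed; all function names here are hypothetical.

```python
import itertools
import numpy as np

def policy_value(P, R, policy, gamma=0.9):
    """Expected discounted return of a deterministic policy in one MDP.

    P: (A, S, S) transition tensor, R: (S, A) reward matrix,
    policy: length-S sequence of action indices.
    Solves the Bellman linear system (I - gamma * P_pi) V = r_pi.
    """
    S = R.shape[0]
    P_pi = np.array([P[policy[s], s] for s in range(S)])   # (S, S) rows under pi
    r_pi = np.array([R[s, policy[s]] for s in range(S)])   # (S,) rewards under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    # Assumption: uniform initial-state distribution.
    return V.mean()

def best_joint_policy(mdps, gamma=0.9):
    """Brute-force search for a single deterministic policy maximizing the
    total value across all MDPs -- exponential in the number of states."""
    S, A = mdps[0][1].shape
    best, best_val = None, -np.inf
    for policy in itertools.product(range(A), repeat=S):
        val = sum(policy_value(P, R, policy, gamma) for P, R in mdps)
        if val > best_val:
            best, best_val = policy, val
    return best, best_val
```

Even with a handful of states the search space is `A**S` policies, so this sketch only scales to toy instances; it serves to make the combinatorial nature of the subproblem concrete.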

History

Advisor

Ziebart, Brian D.

Chair

Ziebart, Brian D.

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Masters

Committee Member

Gmytrasiewicz, Piotr J.; Santambrogio, Marco D.

Submitted date

August 2017

Issue date

24/07/2017
