Adversarial Inverse Reinforcement Learning with Changing Dynamics

Tirinzoni, Andrea

TIRINZONI-THESIS-2017.pdf (1.41 MB)

thesis

posted on 2017-11-01, 00:00 authored by Andrea Tirinzoni

Most work on inverse reinforcement learning, the problem of recovering the unknown reward function being optimized by a decision-making agent, has focused on cases where optimal demonstrations are provided under a single set of dynamics. We analyze the more general setting where the learner has access to sub-optimal demonstrations under several different dynamics. We argue that several problems, such as learning under covariate shift or risk aversion, can be modeled in this way. We propose an adversarial formulation in which the learner tries to imitate a constrained, worst-case estimate of the demonstrator’s control policy. We adopt the method of Lagrange multipliers to remove the constraints and produce a convex optimization problem. We prove that the constraints imposed by the multiple dynamics lead to an NP-hard optimization subproblem: computing a deterministic policy that maximizes the total expected reward across several different Markov decision processes. We propose a tractable approximation by reducing this subproblem to the optimal control of partially observable Markov decision processes. We evaluate our algorithm on two synthetic problems. In the first, we try to recover the reward function of a randomly generated Markov decision process; in the second, we try to rationalize the goal-directed behavior of a robot navigating through a grid.
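To make the hard subproblem named in the abstract concrete, here is a minimal sketch of it on a toy instance: finding one deterministic policy that maximizes the summed expected reward over several Markov decision processes. It is not the thesis's algorithm. It is a brute-force enumeration that only illustrates what is being optimized. The function names, the finite horizon, the restriction to stationary policies, and the shared reward table are all assumptions made for this illustration.

```python
from itertools import product


def expected_return(P, R, policy, start, horizon):
    """Expected total reward of a stationary deterministic policy in one MDP.

    P[s][a][t] is the probability of moving from state s to state t under
    action a, R[s][a] is the reward for taking a in s, and policy[s] is the
    action chosen in state s. (All names are hypothetical, for illustration.)
    """
    n_states = len(R)
    dist = list(start)  # state distribution at the current time step
    total = 0.0
    for _ in range(horizon):
        nxt = [0.0] * n_states
        for s, p in enumerate(dist):
            a = policy[s]
            total += p * R[s][a]
            for t in range(n_states):
                nxt[t] += p * P[s][a][t]
        dist = nxt
    return total


def best_shared_policy(mdps, R, start, horizon):
    """Enumerate every deterministic policy and keep the one with the highest
    expected return summed over all MDPs. The cost grows as
    |A| ** |S|, so this is only usable on tiny instances."""
    n_states, n_actions = len(R), len(R[0])
    best_policy, best_value = None, float("-inf")
    for policy in product(range(n_actions), repeat=n_states):
        value = sum(expected_return(P, R, policy, start, horizon) for P in mdps)
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value


# Toy instance (assumed): two states, two actions, reward 1 while in state 1.
stay = [[1.0, 0.0], [0.0, 1.0]]    # stay[s] = next-state distribution from s
switch = [[0.0, 1.0], [1.0, 0.0]]  # switch[s] = move to the other state
P_a = [[stay[s], switch[s]] for s in range(2)]  # action 0 stays, 1 switches
P_b = [[switch[s], stay[s]] for s in range(2)]  # action 0 switches, 1 stays
R = [[0.0, 0.0], [1.0, 1.0]]
start = [1.0, 0.0]
mdps = [P_a, P_b]

policy, value = best_shared_policy(mdps, R, start, horizon=3)
print(policy, value)
```

In this toy instance the two dynamics swap the meaning of the actions, so the policy that is best for one process is useless in the other, and a single shared policy has to compromise. The enumeration visits every deterministic policy, which is why it cannot scale and why a tractable approximation is needed.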

History

Advisor

Ziebart, Brian D.

Chair

Ziebart, Brian D.

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Masters

Committee Member

Gmytrasiewicz, Piotr J.; Santambrogio, Marco D.

Submitted date

August 2017

Issue date

2017-07-24

Keywords

Machine Learning; Inverse Reinforcement Learning; Reinforcement Learning; Adversarial Prediction; Markov Decision Process; Imitation Learning

Licence

In Copyright
