Interactive partially observable Markov decision processes (I-POMDPs) provide a principled framework for planning and acting in partially observable, stochastic, multi-agent environments. I-POMDPs augment POMDP beliefs with nested, hierarchical belief structures. To plan optimally with I-POMDPs, we propose symbolic and neural approaches that learn other agents' intentional models, which ascribe to them beliefs, preferences, and rationality in action selection.
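To make the nested belief structure concrete, the sketch below shows one way a level-l interactive state might be represented: a physical state paired with an intentional model of the other agent, whose own belief is itself a distribution over level-(l-1) interactive states. This is an illustrative assumption, not code from the thesis; the names InteractiveState and IntentionalModel are hypothetical.

```python
# Hypothetical sketch of a level-l interactive state. At level 0 the other
# agent is not modeled and the structure reduces to an ordinary POMDP belief.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class IntentionalModel:
    level: int                                       # nesting level of the modeled agent
    belief: List[Tuple["InteractiveState", float]]   # weighted particles over its own states
    frame: dict = field(default_factory=dict)        # its transition/observation/reward parameters


@dataclass
class InteractiveState:
    physical_state: int                              # state of the physical environment
    other_model: Optional[IntentionalModel]          # None at level 0 (plain POMDP)
```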
In the symbolic Bayesian approach, agents maintain beliefs over intentional models of other agents and update them sequentially from observations using Bayes' rule. To cope with the complexity of the hierarchical belief space, we devise a customized interactive particle filter (I-PF) that descends the belief hierarchy, parametrizes the other agents' models, and samples all model parameters at each nesting level. We also devise a neural network approximation of the I-POMDP framework, in which the belief update, value function, and policy function are each implemented by neural networks (NNs). We then combine the same network architecture with the QMDP planner and train it end-to-end in a reinforcement learning fashion.
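For concreteness, the following is a minimal, hedged sketch of one I-PF belief-update step, not the thesis implementation. The helper callables (other_policy, transition, obs_prob, update_nested), the particle layout, and the reuse of the hypothetical InteractiveState container from the previous sketch are all assumptions made for illustration.

```python
# Hedged sketch of one interactive particle filter (I-PF) update step. The
# model components are passed in as callables because the concrete domain
# and parametrization are not shown here; all names are illustrative.
import random


def ipf_step(particles, my_action, my_obs, n_particles,
             other_policy, transition, obs_prob, update_nested):
    """particles: list of (InteractiveState, weight) pairs with weights summing to 1."""
    propagated = []
    for istate, w in particles:
        a_other = other_policy(istate.other_model)                      # predict the other agent's action
        s_next = transition(istate.physical_state, my_action, a_other)  # sample the next physical state
        m_next = update_nested(istate.other_model, a_other, s_next)     # recurse down the belief hierarchy
        w_next = w * obs_prob(my_obs, s_next, my_action, a_other)       # weight by my observation likelihood
        propagated.append((InteractiveState(s_next, m_next), w_next))

    total = sum(w for _, w in propagated) or 1.0                        # normalize the weights
    states = [s for s, _ in propagated]
    weights = [w / total for _, w in propagated]
    resampled = random.choices(states, weights=weights, k=n_particles)  # resample to a fixed particle count
    return [(s, 1.0 / n_particles) for s in resampled]
```

The QMDP planner mentioned above can be summarized, in plain tabular form rather than the thesis's end-to-end trained network, as computing Q(b, a) = sum_s b(s) * Q_MDP(s, a) and acting greedily; the sketch below assumes dictionary-based inputs purely for illustration.

```python
def qmdp_action(belief, q_mdp):
    """belief: dict state -> probability; q_mdp: dict (state, action) -> value."""
    actions = {a for _, a in q_mdp}
    q_b = {a: sum(p * q_mdp[(s, a)] for s, p in belief.items()) for a in actions}
    return max(q_b, key=q_b.get)
```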
Empirical results show that our Bayesian learning approach accurately learns models of the other agent. It serves as a generalized Bayesian learning algorithm that recovers other agents' beliefs, nesting levels, and transition, observation, and reward functions. Moreover, we show that the model-based network, which learns to plan, outperforms the model-free network, which only learns reactive policies. The learned policy generalizes directly to a larger, unseen setting.
Advisor
Gmytrasiewicz, Piotr
Chair
Gmytrasiewicz, Piotr
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Committee Member
Liu, Bing
Ziebart, Brian
Zhang, Xinhua
Koyuncu, Erdem