Posted on 2023-12-01. Authored by Thiruvenkadam Sivaprakasam Radhakrishnan.
In this work, we study reinforcement learning algorithms in the multi-agent setting of two-player zero-sum games. We examine recent advances in adapting mirror descent into practical deep reinforcement learning algorithms, focusing on Mirror Descent Policy Optimization and Magnetic Mirror Descent, two theoretically well-motivated algorithms with strong empirical performance. We propose incorporating three modifications, namely Neural Replicator Dynamics, extragradient updates, and optimistic updates, into these algorithms to improve their convergence behavior in two-player zero-sum games, in particular to induce or accelerate last-iterate convergence. We evaluate these algorithms experimentally in tabular normal-form games and in extensive-form games under function approximation. Based on this study, we offer recommendations for designing multi-agent reinforcement learning algorithms and remarks on promising future research directions.
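To make the setting concrete, the following is a minimal sketch of Magnetic Mirror Descent in its tabular closed form (negative-entropy mirror map over the simplex, as in the literature the abstract refers to), applied to matching pennies with a uniform magnet. The game, the step size `eta`, the regularization strength `alpha`, and the initial strategies are illustrative choices, not taken from the thesis itself; the point is only to show the last-iterate convergence behavior the abstract discusses.

```python
import numpy as np

# Row player's payoff matrix for matching pennies (zero-sum:
# the column player's payoffs are the negation).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def mmd_step(pi, grad, magnet, eta, alpha):
    """One closed-form magnetic mirror descent step on the simplex:
    pi' ∝ (pi * magnet**(alpha*eta) * exp(eta*grad)) ** (1 / (1 + alpha*eta)).
    """
    logits = (np.log(pi) + alpha * eta * np.log(magnet) + eta * grad)
    logits /= 1.0 + alpha * eta
    z = np.exp(logits - logits.max())   # subtract max for numerical stability
    return z / z.sum()

eta, alpha = 0.05, 0.2                  # illustrative step size / magnet strength
magnet = np.full(2, 0.5)                # uniform magnet distribution
x = np.array([0.9, 0.1])                # row player's initial strategy
y = np.array([0.2, 0.8])                # column player's initial strategy

for _ in range(3000):
    gx = A @ y                          # row player's payoff gradient
    gy = -A.T @ x                       # column player's payoff gradient
    x = mmd_step(x, gx, magnet, eta, alpha)
    y = mmd_step(y, gy, magnet, eta, alpha)

# With a uniform magnet, the regularized equilibrium of matching pennies
# is the uniform strategy, so the last iterate should approach (0.5, 0.5)
# for both players rather than cycling around it.
print(x, y)
```

Running plain simultaneous gradient (or unregularized mirror descent) dynamics on the same game cycles around the equilibrium; the magnet term is what damps the rotation and yields last-iterate convergence, which is the property the proposed modifications aim to induce or accelerate in the deep RL versions of these algorithms.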