Posted on 2023-05-01. Authored by Jacob P Love.
Reinforcement learning is a process in which an agent learns a task through trial and error. Policy gradient methods solve this problem by estimating the gradient of an objective with respect to the policy's parameters. In the multi-agent setting, each agent typically performs its own gradient estimate.
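As background, here is a minimal sketch of the score-function (REINFORCE) estimator that many policy gradient methods build on. The Gaussian policy, helper names, and toy data are illustrative assumptions, not part of this work:

```python
import jax
import jax.numpy as jnp

def log_prob(params, state, action):
    # Hypothetical Gaussian policy with mean params @ state and unit
    # variance (up to a constant that does not affect the gradient).
    mean = params @ state
    return -0.5 * jnp.sum((action - mean) ** 2)

def reinforce_grad(params, states, actions, returns):
    # Score-function estimate: grad J ≈ mean over samples of
    # grad log pi(a | s) * R, computed via reverse-mode autodiff.
    def surrogate(p):
        logps = jax.vmap(lambda s, a: log_prob(p, s, a))(states, actions)
        return jnp.mean(logps * returns)
    return jax.grad(surrogate)(params)

# Toy usage with random data.
key = jax.random.PRNGKey(0)
params = jnp.zeros((2, 3))                    # action_dim x state_dim
states = jax.random.normal(key, (16, 3))
actions = jax.random.normal(key, (16, 2))
returns = jnp.ones(16)
g = reinforce_grad(params, states, actions, returns)  # same shape as params
```

Note that this estimator needs only sampled trajectories and returns, which is why it dominates settings where the environment is a black box.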
Traditional approaches to reinforcement learning used modified forms of backpropagation, a recursive algorithm for computing the gradient of a function, to solve single-agent problems. As research moved into the multi-agent setting, attention shifted away from backpropagation toward other gradient estimation methods.
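For illustration, a minimal example of backpropagation as reverse-mode automatic differentiation; the toy two-layer function and its weights are assumptions made purely for demonstration:

```python
import jax
import jax.numpy as jnp

def two_layer(params, x):
    # Forward pass; jax.grad replays it in reverse, recursively applying
    # the chain rule to each operation (i.e., backpropagation).
    W1, W2 = params
    h = jnp.tanh(W1 @ x)
    return jnp.sum(W2 @ h)

x = jnp.ones(4)
params = (0.1 * jnp.ones((8, 4)), 0.1 * jnp.ones((2, 8)))
grads = jax.grad(two_layer)(params, x)  # gradients w.r.t. both weight matrices
```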
I propose a backpropagation approach for multi-agent sequential systems: systems in which multiple actors perform actions one after another to reach a goal. Real-life examples range from assembly lines to golf scrambles, yet there is a gap in the literature addressing these systems. The work that does exist focuses primarily on simulation-based physics scenarios or on actor-critic models, where a critic with full state information estimates gradients for each actor during training. My work shows that backpropagation is uniquely suited to training agents in such systems without a critic, especially when the system is continuous and well-defined. A sketch of this idea follows.
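To make the idea concrete, here is a hedged sketch of critic-free backpropagation through a sequential chain of agents when every step is differentiable. The affine agents, dimensions, and quadratic terminal objective are illustrative assumptions, not the thesis's actual models:

```python
import jax
import jax.numpy as jnp

def agent(params, state):
    # Hypothetical affine agent: each actor transforms the state it receives.
    W, b = params
    return jnp.tanh(W @ state + b)

def pipeline_loss(all_params, initial_state, goal):
    # Agents act one after another; because every step is differentiable,
    # a single backward pass from the terminal objective yields each
    # agent's gradient directly, with no critic required.
    state = initial_state
    for params in all_params:
        state = agent(params, state)
    return jnp.sum((state - goal) ** 2)

dim = 4
all_params = [(0.5 * jnp.eye(dim), jnp.zeros(dim)) for _ in range(3)]
grads = jax.grad(pipeline_loss)(all_params, jnp.ones(dim), jnp.zeros(dim))
```

The design point is that each agent's gradient arrives via the chain rule through the agents downstream of it, rather than via a learned value estimate.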
This work uses custom simulation models to compare the novel backpropagation approach to other popular reinforcement learning algorithms, including multi-agent deep deterministic policy gradients (MADDPG), and to demonstrate the viability and limitations of the method in sequential agent models.