Removing Redundant Partner Policies May Be Unnecessary for Ad Hoc Teamwork
Ad hoc teamwork (AHT) concerns developing an AI agent that learns to collaborate with previously unseen partners. We consider a setting in which the AI agent is provided with a hypothesis set of partner policies. Several online learning algorithms that take this hypothesis set as input can be applied to the AHT problem. One way to speed up these algorithms is to eliminate redundant policies from the hypothesis set. In AHT, partner policies in the hypothesis set are redundant when they share the same collaborating policy, i.e., the same best-response policy for the AI agent. Nevertheless, we show that whether this elimination should be applied depends on the learning algorithm used by the AI agent. Specifically, we identify two relevant properties of a learning algorithm: redundancy awareness and extrapolation. When the learning algorithm is redundancy-aware, redundancy elimination is unnecessary. When the learning algorithm is not extrapolating, redundancy elimination is undesirable because it may prevent collaboration with certain partners. We demonstrate through an example that the extrapolating property does not hold universally among existing online learning algorithms. These results imply that redundancy elimination in AHT should be applied with caution.
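
To make the redundancy notion above concrete, the following is a minimal sketch (not from the paper) of redundancy elimination over a finite hypothesis set. The `best_response` oracle is a hypothetical function, assumed here to map each partner policy to the AI agent's collaborating policy against it; two partner policies are treated as redundant exactly when this mapping coincides.

```python
from typing import Callable, Hashable, Iterable, List

def eliminate_redundant(
    hypothesis_set: Iterable[Hashable],
    best_response: Callable[[Hashable], Hashable],
) -> List[Hashable]:
    """Keep one partner policy per distinct collaborating policy.

    Assumptions (for illustration only): partner policies and
    collaborating policies are hashable, and `best_response` is a
    hypothetical oracle returning the AI agent's best-response
    (collaborating) policy against a given partner policy.
    """
    seen = set()  # collaborating policies already represented
    kept = []
    for partner_policy in hypothesis_set:
        collaborating_policy = best_response(partner_policy)
        if collaborating_policy not in seen:
            seen.add(collaborating_policy)
            kept.append(partner_policy)
    return kept
```

The abstract's caution applies precisely to this step: for a learning algorithm that is not extrapolating, a partner policy discarded here may still carry information the algorithm needs, so the pruned hypothesis set can prevent collaboration with that partner.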