Evader-Agnostic Team-Based Pursuit Strategies in Partially-Observable Environments

Here, the level-1 evader defeats the level-0 pursuer team. The level-0 pursuer team is trained against a naive A* evader and has not learned to respond to strategic evasive maneuvers. About 5 seconds into the top two videos, the evader (red trail) ducks under a thick group of trees as soon as it sees the high-level pursuer (HLP) and exits on the other side. When the videos pause, the top-left video shows the path the naive evader would have taken (denoted in pink); had the evader followed this non-evasive path, the HLP would almost certainly have seen it, increasing its likelihood of capture.

Abstract

In this paper, we consider a scenario in which a team of two unmanned aerial vehicles (UAVs) pursues an evader UAV within an urban environment. Each agent has a limited view of the environment, as buildings can occlude its field of view. Additionally, the pursuer team is agnostic about the evader: it does not know the evader's initial or final location, nor its behavior. Consequently, the team must gather information by searching the environment for the evader and then track it in order to eventually intercept it. To solve this multi-player, partially-observable, pursuit-evasion game, we develop a two-phase neuro-symbolic algorithm centered around the principle of bounded rationality. First, we devise an offline approach using deep reinforcement learning to progressively train adversarial policies for the pursuer team against fictitious evaders. This creates $k$-levels of rationality for each agent in preparation for the online phase. Then, we employ an online classification algorithm to determine a "best guess" of the current opponent from the set of iteratively-trained strategic agents and apply the corresponding best response. Using this scheme, we improve average performance against a random evader in our environment.
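The online phase described above lends itself to a simple sketch: infer the opponent's rationality level from its observed actions, then deploy the pursuer policy that was trained offline as the best response to that level. The Python snippet below is a minimal illustration only; the names evader_models, pursuer_policies, action_probability, and act are hypothetical stand-ins for the $k$-level agents and their interfaces, not the actual code or API of this work.

import numpy as np

class LevelKSelector:
    """Online-phase sketch: classify the opponent's rationality level from
    observed behavior, then act with the matching pre-trained best response.
    The model/policy interfaces here are assumptions for illustration."""

    def __init__(self, evader_models, pursuer_policies):
        # evader_models[k]: predictive model of the level-k fictitious evader
        # pursuer_policies[k]: pursuer policy trained offline against level k
        self.evader_models = evader_models
        self.pursuer_policies = pursuer_policies
        self.log_likelihood = np.zeros(len(evader_models))

    def update(self, observation, evader_action):
        # Accumulate how well each fictitious evader explains the observed action.
        for k, model in enumerate(self.evader_models):
            p = model.action_probability(observation, evader_action)
            self.log_likelihood[k] += np.log(p + 1e-8)

    def act(self, observation):
        # "Best guess" of the current opponent, then its best-response policy.
        k_hat = int(np.argmax(self.log_likelihood))
        return self.pursuer_policies[k_hat].act(observation)

In use, update would be called whenever the pursuers observe an evader action, and act would be queried at each pursuer decision step; before any sighting, the uniform log-likelihoods simply default the team to one of the pre-trained policies.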

Environment

Challenges

Methodology

Results