

Double Oracle Algorithm for
Game-Theoretic Robot Allocation on Graphs

Zijian An and Lifeng Zhou
Zijian An and Lifeng Zhou are with the Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA. Email: {za382,lz457}@drexel.edu. This research was sponsored by the Drexel Seaman Endowed Fellowship.
Abstract

We study the problem of game-theoretic robot allocation where two players strategically allocate robots to compete for multiple sites of interest. Robots possess offensive or defensive capabilities that interfere with and weaken their opponents' ability to take over a competing site. This problem belongs to the class of conventional Colonel Blotto games. Considering the robots' heterogeneous capabilities and environmental factors, we generalize the conventional Blotto game by incorporating heterogeneous robot types and graph constraints that capture robot transitions between sites. We then employ the Double Oracle Algorithm (DOA) to solve for the Nash equilibrium of the generalized Blotto game. In particular, for cyclic-dominance-heterogeneous (CDH) robots that inhibit each other, we define a new transformation rule between any two robot types. Building on this transformation, we design a novel utility function to measure the game's outcome quantitatively. Moreover, we rigorously prove the correctness of the designed utility function. Finally, we conduct extensive simulations to demonstrate the effectiveness of DOA in computing Nash equilibria for homogeneous, linear heterogeneous, and CDH robot allocation on graphs.

Index Terms:
Colonel Blotto Game on Graphs, Double Oracle Algorithm, Heterogeneous Robots, Nash Equilibrium

I Introduction

With the advancement of computing, sensing, and communication, robots are increasingly used in various data collection tasks such as environmental exploration and coverage [1, 2, 3, 4, 5, 6], surveillance and reconnaissance [7, 8], and target tracking [9, 10, 11, 12, 13, 14, 15]. These tasks often involve competition among rival parties. For example, in oil, ocean, and aerospace exploration, multiple competitors tend to deploy their specialized robots to cover and occupy sites of interest [16]. This paper models this type of competition as a game-theoretic robot allocation problem, where two players allocate robots possessing defensive or offensive capabilities to compete for multiple sites of interest.

Game-theoretic robot allocation belongs to the class of resource allocation problems [17, 18, 19, 20, 21]. In the realm of game theory, these problems are typically modeled as the Colonel Blotto game, which was first introduced by Borel [22] and further discussed in [23]. In the Colonel Blotto game, the two players allocate resources across multiple strategic points, with the player allocating more resources at a given strategic point deemed the winner of that point. The seminal study by Borel and Ville computed the equilibrium for a Colonel Blotto game with three strategic points and equal total resources [24]. In 1950, Gross and Wagner extended these findings to scenarios involving more than three strategic points [25]. Later, Roberson addressed the equilibrium problem for continuous Colonel Blotto games under the premise of unequal total resources [26]. Subsequent refinements to this continuous version emerged in [27] and [28].

Figure 1: Game-theoretic robot allocation on a graph. Two players (“red” and “blue”) allocate three types of robots to compete for five sites in an area. The area is abstracted into a graph with five sites as the nodes and their connections as edges.

Nevertheless, in addressing the problem of game-theoretic robot allocation, research on the conventional Blotto game exhibits several limitations. First, these studies have mainly focused on unilateral or homogeneous resources [26, 27, 28]. However, robots can be heterogeneous and have different capabilities, which means the number of robots is not the only criterion for winning or losing. Second, considering environmental constraints and the motion abilities of the robots, there may be sites between which no direct routes are accessible; e.g., aerial robots are not allowed to enter no-fly zones, and wheeled robots may not be able to traverse terrain with sharp hills, tall grass, or mud puddles. Given these real-world constraints and limitations, the environment in which the robots operate is a non-fully-connected graph where robots can transition only between connected sites. However, none of the previous research on resource allocation games involves different types of resources and graph constraints. Therefore, in this paper, we primarily focus on game-theoretic robot allocation on graphs with multiple types of robots (Fig. 1).

First, the game requires both players to allocate robots under graph constraints, i.e., a robot can only transition from one node to another if there exists an edge between the two nodes. In practical scenarios, resource allocation among nodes often demands consideration of the relative positions and connectivity of these points, as highlighted by [29, 30] in discussions of offensive and defensive security games. However, [29] mainly focuses on establishing the winning conditions for one player in the presence of a more powerful opponent, without computing the equilibrium of the game (i.e., the optimal strategies for the two players). In [30], the main focus is a network security game where the defender places resources to catch the attacker, rather than a Colonel Blotto game. Specifically, we allow distinct nodes to share unidirectional connections, so robots cannot be directly transitioned between disconnected points. This setting turns the allocation game into an asymmetric game, detailed further in the subsequent sections.

Second, we consider that both players allocate multiple types of robots that may inhibit each other, termed cyclic-dominance-heterogeneous (CDH) robots [31]. The multiple-resource Colonel Blotto game was introduced in [27], but it considers different resources to be completely independent of each other. In contrast, for CDH resources, any two types of resources hold a restrictive relationship as in the classic game of Rock-Paper-Scissors. This relationship is widely studied in evolutionary dynamics in biology. For instance, in [32], Nowak used the examples of the Uta stansburiana lizard and Escherichia coli to illustrate this mutually restrictive relationship. Similarly, an example of CDH robots could be three types of functional robots: cyber-security robots, network intrusion robots, and combat robots. The cyber-security robot is primarily designed for defensive operations, specializing in counteracting cyber threats. It is equipped with advanced cybersecurity algorithms, enabling it to detect and neutralize attacks from network intrusion robots. The network intrusion robot excels in offensive cyber operations. It is capable of executing sophisticated network attacks, targeting vulnerabilities in other robots, particularly those relying on network-based controls such as combat robots. The combat robot is engineered for physical combat and tactical defense. It is armed with state-of-the-art weaponry and robust armor, making it highly effective in direct physical confrontations, such as with the cyber-security robot. In other words, a cyber-security robot is capable of defending against network attacks from a network intrusion robot, but it can be physically destroyed by a combat robot. Meanwhile, a network intrusion robot can easily paralyze the network systems of a combat robot, causing it to crash, but it can be defeated by a cyber-security robot. In [33], this relationship is represented in matrix form and incorporated into the replicator dynamics. We use the concept of cyclic dominance to distinguish different types of mutually restrictive robots. Based on it, we formulate a utility function to quantitatively evaluate the outcomes of the game.

We utilize the double oracle algorithm (DOA) to compute the equilibrium of the Colonel Blotto game on graphs. This algorithm was initially proposed by McMahan, Gordon, and Blum [34] and has since been extensively adopted across various domains within game theory. In [35] and [36], DOA was employed to solve two-player zero-sum sequential games with perfect information, with its performance verified on adversarial graph search and simplified variants of Poker. A theoretical guarantee on the convergence of the DOA was also provided. Additionally, Adam [37] employed DOA for the classic Colonel’s Game, juxtaposing it with the fictitious play algorithm and identifying a more rapid convergence rate for the former.

Contributions. We make three main contributions as follows:

  • Problem formulation: We formulate a game-theoretic robot allocation problem where two players allocate multiple types of robots on graphs.

  • Approach: We leverage the DOA to calculate Nash equilibrium for the games with homogeneous, linear heterogeneous, and CDH robots. Particularly, for the CDH game, we introduce a new transformation rule between different types of robots. Based on it, we design a novel utility function and prove that the utility function is able to rigorously determine the winning conditions of two players. We also linearize the constructed utility function and formulate a mixed integer linear programming problem to solve the best response optimization problem in DOA.

  • Results: We demonstrate that players’ strategies computed by our approach converge to the Nash equilibrium and are better than other baseline approaches.

Overall, our main contribution lies in the incorporation of graph constraints into the classical Colonel’s Game and its extension to a heterogeneous context, where the multiple types of robots owned by opposing players are subject to mutual constraints. In addition, we design a novel utility function and utilize the DOA to calculate the equilibrium of this newly formulated game. Furthermore, the simulation results validate that our approach achieves the Nash equilibrium.

II Conventions and Problem Formulation

In this section, we first introduce game-theoretic robot allocation on graphs. The problem can be modeled as a Colonel Blotto game. Then we introduce the Colonel Blotto game including pure and mixed strategies and the equilibrium. Finally, we present the main problems to address in this paper.

II-A Robot Allocation between Two Players on Graphs

The environment where two players allocate robots can be abstracted into a graph as shown in Fig. 1. An ordered pair G=(\mathcal{V},\mathcal{E}) defines an undirected graph, comprising a set of nodes \mathcal{V} and a set of edges \mathcal{E}:=\{\{x,y\}\ |\ x,y\in\mathcal{V}\}. In particular, a complete graph is a simple undirected graph in which every pair of distinct nodes is connected by a unique edge [38]. Replacing the unordered pairs \{x,y\} in \mathcal{E} by ordered pairs (x,y) of elements of \mathcal{V} leads to a directed graph, or digraph, with edge set \mathcal{E}:=\{(x,y)\ |\ (x,y)\in\mathcal{V}^{2}\}. For an edge \varepsilon=(x,y) directed from node x to y, x and y are called the endpoints of the edge, with x the tail of \varepsilon and y the head of \varepsilon. A directed graph G is called unilaterally connected, or unilateral, if G contains a directed path from x to y or a directed path from y to x for every pair of nodes \{x,y\} [39].

Previous studies on Colonel Blotto games do not incorporate graphs [27, 28, 37]. Under the same logic, they can be seen as games conducted on a complete graph, since there is no limitation on robot transitions between any two nodes. In this paper, we focus on Colonel Blotto games on unilateral graphs where the robots must be transitioned along the edges. Graph nodes are the strategic points where the two players can allocate robots. We denote the number of nodes as N. We model the robot allocation by two characteristics: type, denoted as R, and number, denoted as A. If there are M robot types, i.e., R_{1},R_{2},\cdots,R_{M}, the number of robots of each type is denoted by A_{1},A_{2},\cdots,A_{M}, respectively. In this paper, M=1 denotes homogeneous robot allocation and M>1 denotes heterogeneous robot allocation. To start the game of robot allocation, there must be an initial robot distribution for both players. Denote \mathbf{d}_{0}^{x}(R_{i}) and \mathbf{d}_{0}^{y}(R_{i}) as the initial distributions of the i-th type robots for Player 1 and Player 2, respectively. Notably, \mathbf{d}_{0}^{x}(R_{i}) (or \mathbf{d}_{0}^{y}(R_{i})) is a vector of N entries, each of which stands for the number of i-th type robots allocated on a particular node. Fig. 2(a) shows an example of homogeneous robot allocation. The initial robot distribution of Player 1 (red) is [1,0,2,2,0]. She then moves one robot from node 4 to node 1, so her distribution becomes [2,0,2,1,0]. Since (4,1)\in\mathcal{E}, this allocation is valid. Meanwhile, the robot distribution of Player 2 (blue) after one step is [1,1,0,2,1]. If the rule stipulates that the player who owns more robots on a node wins that node, then the outcome is that Player 2 wins: she wins nodes 2, 4, 5 while Player 1 only wins nodes 1, 3.
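To make the bookkeeping of this example concrete, the following minimal Python sketch (not part of the paper) replays it; the two listed edges are assumptions taken from the moves just described rather than the full edge set of Fig. 1.

import numpy as np

# Assumed directed edges used by the two moves (the full graph is shown in Fig. 1).
edges = {(4, 1), (3, 2)}
moves = {"Player 1": (4, 1), "Player 2": (3, 2)}
for player, (src, dst) in moves.items():
    assert (src, dst) in edges, f"{player} cannot move robots from node {src} to node {dst}"

d_x = np.array([2, 0, 2, 1, 0])   # Player 1 after moving one robot from node 4 to node 1
d_y = np.array([1, 1, 0, 2, 1])   # Player 2 after moving one robot from node 3 to node 2

# The player with strictly more robots on a node wins that node.
p1_nodes = [i + 1 for i in range(5) if d_x[i] > d_y[i]]
p2_nodes = [i + 1 for i in range(5) if d_x[i] < d_y[i]]
print(p1_nodes, p2_nodes)          # [1, 3] [2, 4, 5] -> Player 2 wins the majority of nodes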

(a) Homogeneous robot allocation at time steps t and t+1
(b) Heterogeneous robot allocation for two players with Player 1's mixed strategy and Player 2's pure strategy
Figure 2: Examples of allocation games with (a) homogeneous and (b) heterogeneous robots. The number of robots allocated on each node is shown as the length of the color bar, with red representing Player 1 and blue representing Player 2. In (a), at time t (left), Player 1 wins node 3 and Player 2 wins node 5. At time t+1 (right), Player 1 moves one robot from node 4 to node 1, and Player 2 moves one robot from node 3 to node 2. Thus Player 1 wins nodes 1 and 3 while Player 2 wins the remaining nodes. In (b), Player 1 adopts mixed strategy \mathbf{\Delta}_{X}=\begin{bmatrix}\mathbf{P}_{x}^{1}\cdot\mathbf{S}_{x}^{1}&\mathbf{P}_{x}^{2}\cdot\mathbf{S}_{x}^{2}\end{bmatrix} and Player 2 adopts pure strategy \mathbf{\Delta}_{Y}=\mathbf{S}_{y}^{1}. The win or loss cannot be claimed solely based on the number of robots because of the CDH robots.

II-B Colonel Blotto Game

A Colonel Blotto game is a two-player constant-sum game in which the players simultaneously allocate limited resources over several strategic nodes [22]. Each player selects strategies from a nonempty strategy set \mathcal{X}=\mathcal{X}(G),\ \mathcal{Y}=\mathcal{Y}(G), where \mathcal{X},\mathcal{Y}\subseteq\{\mathbf{S}_{M\times N}\ |\ 0\leq\mathbf{S}_{ij}\leq A_{i},\ \sum_{j}\mathbf{S}_{ij}=A_{i},\ i\in\{1,2,\cdots,M\},\ j\in\{1,2,\cdots,N\}\}. Here \mathbf{S}_{ij} stands for the amount of the i-th resource allocated on the j-th node. Each player picks a strategy combination \hat{\mathcal{X}}\subseteq\mathcal{X},\ \hat{\mathcal{Y}}\subseteq\mathcal{Y} over probability distributions \mathbf{P}_{x} and \mathbf{P}_{y}, respectively. The probability of strategy \mathbf{S}_{x} adopted by Player 1 is denoted as P_{x}(\mathbf{S}_{x}); analogously for Player 2, it is denoted as P_{y}(\mathbf{S}_{y}). This combination of strategies over a probability distribution is called a mixed strategy, denoted by \mathbf{\Delta}_{X} and \mathbf{\Delta}_{Y}. The number of strategies in a mixed strategy set is denoted by K. For example, if Player 1 decides to apply the mixed strategy of K_{x} strategies from \mathcal{X}, then \mathbf{\Delta}_{X}=\mathbf{P}_{x}\times\mathbf{S}_{x}\in\mathbb{R}^{M\times(N\times K_{x})}. If K=1, the strategy is called a pure strategy. Fig. 2(b) illustrates an example of heterogeneous robot allocation that expands on the case of homogeneous robot allocation (Fig. 2(a)). There are three types of robots, i.e., M=3, and each type has five robots, i.e., A_{1}=A_{2}=A_{3}=5. There are five nodes in the graph, i.e., N=5. For Player 1,

\mathbf{S}_{x}^{1}=\begin{bmatrix}3&1&0&0&1\\ 2&3&0&0&0\\ 1&0&2&2&0\end{bmatrix},\quad \mathbf{S}_{x}^{2}=\begin{bmatrix}2&1&0&1&1\\ 2&1&0&2&0\\ 2&0&1&2&0\end{bmatrix},

and for Player 2,

\mathbf{S}_{y}^{1}=\begin{bmatrix}1&1&0&1&2\\ 2&2&0&1&0\\ 1&0&0&3&1\end{bmatrix}.

If the probabilities of taking the first strategy and the second strategy are 0.4 and 0.6, respectively, i.e., P_{x}(\mathbf{S}_{x}^{1})=0.4,\ P_{x}(\mathbf{S}_{x}^{2})=0.6, then the mixed strategy for Player 1 is

\mathbf{\Delta}_{X}=\begin{bmatrix}0.4\cdot\mathbf{S}_{x}^{1}&0.6\cdot\mathbf{S}_{x}^{2}\end{bmatrix}=\begin{bmatrix}1.2&0.4&0&0&0.4&1.2&0.6&0&0.6&0.6\\ 0.8&1.2&0&0&0&1.2&0.6&0&1.2&0\\ 0.4&0&0.8&0.8&0&1.2&0&0.6&1.2&0\end{bmatrix}.

Player 2 takes the pure strategy with P_{y}(\mathbf{S}_{y}^{1})=1, thus \mathbf{\Delta}_{Y}=\mathbf{S}_{y}^{1}. Notably, the mixed strategy comprises elements from the compact strategy sets \mathcal{X} and \mathcal{Y}. For each pair of strategy elements \mathbf{S}_{x} and \mathbf{S}_{y}, the utility function is a continuous function measuring the outcome of the game, u=u(\mathbf{S}_{x},\mathbf{S}_{y}):\mathcal{X}\times\mathcal{Y}\rightarrow\mathbb{R}, for Player 1, while for Player 2 the utility function is -u if the game is zero-sum. In the classic Colonel Blotto game (i.e., homogeneous robot allocation), the utility function can be modeled as a sign function since the player allocating more robots to a node wins that node. The utility is the total number of strategic points the player wins, as shown in (1).

u(\mathbf{S}_{x},\mathbf{S}_{y})=\sum_{i=1}^{N}\texttt{sgn}(\mathbf{S}_{x,i}-\mathbf{S}_{y,i}), (1)
where \texttt{sgn}(x)=\begin{cases}-1,&x\leq-C;\\ x/C,&x\in[-C,C];\\ 1,&x\geq C.\end{cases}

Here \mathbf{S}_{x} and \mathbf{S}_{y} are both N-dimensional vectors, since the classic Colonel Blotto game is a homogeneous game with M=1 (see Section II-A), and \mathbf{S}_{x,i},\mathbf{S}_{y,i} refer to the i-th entries of \mathbf{S}_{x} and \mathbf{S}_{y}. However, in games with heterogeneous resources, the utility functions can be more complicated.
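For reference, a minimal Python sketch of the homogeneous utility in (1) is given below; the smoothing constant C is a design parameter, and the value used here is only an assumption.

import numpy as np

def sgn(x, C=1.0):
    # Smoothed sign in (1): linear on [-C, C], saturated at -1 / +1 outside.
    return np.clip(np.asarray(x, dtype=float) / C, -1.0, 1.0)

def utility_homogeneous(S_x, S_y, C=1.0):
    # Player 1's utility: sum over the N nodes of sgn(S_{x,i} - S_{y,i}).
    return float(np.sum(sgn(np.asarray(S_x) - np.asarray(S_y), C)))

# Example: the two distributions from Fig. 2(a) after one step.
print(utility_homogeneous([2, 0, 2, 1, 0], [1, 1, 0, 2, 1]))  # 2 - 3 = -1 with C = 1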

The triple \mathbb{C}=(\mathcal{X}(G),\mathcal{Y}(G),u) is called a continuous game. In a continuous game, given the two players' mixed strategies \mathbf{\Delta}_{X} and \mathbf{\Delta}_{Y}, the expected utility of Player 1 is defined as

U(\mathbf{\Delta}_{X},\mathbf{\Delta}_{Y})=\sum_{\mathbf{S}_{x}\in\hat{\mathcal{X}}}\sum_{\mathbf{S}_{y}\in\hat{\mathcal{Y}}}P_{x}(\mathbf{S}_{x})P_{y}(\mathbf{S}_{y})u(\mathbf{S}_{x},\mathbf{S}_{y}).
Lemma 1

A mixed strategy group (\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*}) is an equilibrium of a continuous game \mathbb{C} if, for all (\mathbf{\Delta}_{X},\mathbf{\Delta}_{Y}),

U(\mathbf{\Delta}_{X},\mathbf{\Delta}_{Y}^{*})\leq U(\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*})\leq U(\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}).

According to [40], every continuous game has at least one equilibrium.

II-C Problem Formulation

We consider a two-player game-theoretic robot allocation on a connected directed graph G=(\mathcal{V},\mathcal{E}) with |\mathcal{V}|=N. Suppose each player has M types of robots R_{1},R_{2},\cdots,R_{M} with corresponding quantities A_{1},A_{2},\cdots,A_{M}. The initial robot distributions of the two players are \mathbf{d}_{x}^{0} and \mathbf{d}_{y}^{0}, respectively. We first introduce two basic assumptions.

Assumption 1 (One-step transition)

We assume the transition of robots between two connected nodes can be accomplished within one step.

Assumption 2 (Perfect information)

We assume the allocation strategies of the two players are public.

With these two assumptions, we aim to compute the equilibrium (i.e., optimal mixed strategies for the two players) for three versions of the game-theoretic robot allocation on graphs.

Problem 1 (Allocation game with homogeneous robots)

All robots are homogeneous (i.e., M=1). The win or loss at a given node is determined purely by the number of robots allocated. The utility function for the players is introduced in (1), where C is a constant ensuring the smoothness and continuity of the function.

Problem 2 (Allocation game with linear heterogeneous robots)

Robots are heterogeneous, i.e., M>1, but there exists a linear transformation between different robot types, e.g., R_{i}=aR_{j} with a a positive constant.

Problem 3 (Allocation game with CDH robots)

Robots are heterogeneous, i.e., M>1, and they mutually inhibit one another. This is commonly referred to as cyclic dominance [31]. In CDH robot allocation, the number of robots on a node can no longer determine the winning condition of that node, since the combination of heterogeneous robots also plays an important role. Therefore, a new utility function is required to describe this inhibiting relationship, which is introduced in Section IV-C.

III Preliminaries

III-A Double Oracle Algorithm

Algorithm 1 Double Oracle Algorithm

Input:

  • Graph G=(\mathcal{V},\mathcal{E});

  • Continuous game \mathbb{C}=(\mathcal{X}(G),\mathcal{Y}(G),u);

  • Initial nonempty strategy sets \mathcal{X}_{0}\subseteq\mathcal{X},\ \mathcal{Y}_{0}\subseteq\mathcal{Y};

  • Threshold \epsilon.

1:while U_{u}-U_{l}>\epsilon do
2:     (\mathbf{\Delta}_{X}^{i*},\mathbf{\Delta}_{Y}^{i*})\leftarrow(\mathcal{X}_{i},\mathcal{Y}_{i},u): calculate the equilibrium of the subgame;
3:     (\mathbf{\Delta}_{X}^{i*},\mathbf{\Delta}_{Y}^{i*}): reject strategies with low probability;
4:     \mathbf{S}_{x}^{i+1},\mathbf{S}_{y}^{i+1}\leftarrow(\mathbf{\Delta}_{X}^{i*},\mathbf{\Delta}_{Y}^{i*}): find the best response strategies;
5:     if \mathbf{S}_{x}^{i+1}\notin\mathcal{X}_{i} and \mathbf{S}_{y}^{i+1}\notin\mathcal{Y}_{i} then
6:         \mathcal{X}_{i+1}=\mathcal{X}_{i}\cup\{\mathbf{S}_{x}^{i+1}\},\ \mathcal{Y}_{i+1}=\mathcal{Y}_{i}\cup\{\mathbf{S}_{y}^{i+1}\};
7:         U_{l}:=U(\mathbf{\Delta}_{X}^{i*},\mathbf{S}_{y}^{i+1}),\ U_{u}:=U(\mathbf{S}_{x}^{i+1},\mathbf{\Delta}_{Y}^{i*});
8:     end if
9:end while

Output:

  • \epsilon-equilibrium mixed strategy set (\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*}) of game \mathbb{C}.

The double oracle algorithm was first presented in [34]. Later, it was widely utilized to compute the equilibrium (i.e., optimal mixed strategies) of continuous games [30, 41]. To better understand the algorithm, we first introduce the notion of a best response strategy. The best response strategy of Player 1, given the mixed strategy \mathbf{\Delta}_{Y} of Player 2, is defined as

\delta_{x}(\mathbf{\Delta}_{Y}):=\{\mathbf{S}_{x}\in\mathcal{X}\ |\ U(\mathbf{S}_{x},\mathbf{\Delta}_{Y})=\max_{\mathbf{S}^{\prime}_{x}\in\mathcal{X}}U(\mathbf{S}^{\prime}_{x},\mathbf{\Delta}_{Y})\}. (2)

It is Player 1's best pure strategy against her opponent (i.e., Player 2). Similarly, Player 2's best response strategy is

\delta_{y}(\mathbf{\Delta}_{X}):=\{\mathbf{S}_{y}\in\mathcal{Y}\ |\ U(\mathbf{\Delta}_{X},\mathbf{S}_{y})=\min_{\mathbf{S}^{\prime}_{y}\in\mathcal{Y}}U(\mathbf{\Delta}_{X},\mathbf{S}^{\prime}_{y})\}. (3)

The double oracle algorithm begins from initial strategy sets \mathcal{X}_{0} and \mathcal{Y}_{0} for the two players, which are picked randomly from the strategy spaces \mathcal{X} and \mathcal{Y}. \mathbb{C}_{0}=(\mathcal{X}_{0},\mathcal{Y}_{0},u) is a subgame of \mathbb{C}=(\mathcal{X},\mathcal{Y},u), and the equilibrium of this subgame can be calculated by linear programming, as in Algorithm 1, line 2. From the equilibrium of the subgame, denoted \mathbf{\Delta}_{X}^{i*} and \mathbf{\Delta}_{Y}^{i*} in the i-th iteration, the best response strategies of both players are computed as in Algorithm 1, line 4. If these strategies are not in the subgame strategy sets, they are added to the sets to form a larger subgame, and we check whether the equilibrium has been reached. According to [37], the equilibrium condition (Lemma 1) is equivalent to the following lemma.

Lemma 2

A mixed strategy group (\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*}) is an equilibrium of a continuous game \mathbb{C} if, for all \mathbf{S}_{x}\in\delta_{x}(\mathbf{\Delta}_{Y}^{*}) and \mathbf{S}_{y}\in\delta_{y}(\mathbf{\Delta}_{X}^{*}),

U(\mathbf{S}_{x},\mathbf{\Delta}_{Y}^{*})\leq U(\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*})\leq U(\mathbf{\Delta}_{X}^{*},\mathbf{S}_{y}). (4)

Here U(\mathbf{S}_{x}^{i+1},\mathbf{\Delta}_{Y}^{i*})=\max_{\mathbf{S}_{x}}U(\mathbf{S}_{x},\mathbf{\Delta}_{Y}^{i*}) is called the upper utility of the game, denoted by U_{u}, and U(\mathbf{\Delta}_{X}^{i*},\mathbf{S}_{y}^{i+1})=\min_{\mathbf{S}_{y}}U(\mathbf{\Delta}_{X}^{i*},\mathbf{S}_{y}) is called the lower utility, denoted by U_{l}. Note that, in practice, (4) is typically replaced by the \epsilon-equilibrium condition for simplicity of computation:

U(\mathbf{S}_{x},\mathbf{\Delta}_{Y}^{*})-\epsilon\leq U(\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*})\leq U(\mathbf{\Delta}_{X}^{*},\mathbf{S}_{y})+\epsilon.

Therefore, the while loop in Algorithm 1 continues as long as U_{u}-U_{l}>\epsilon and terminates once U_{u}-U_{l}\leq\epsilon. In addition, since the output is a mixed strategy and too many redundant strategies with low probability would slow down the algorithm, we delete strategies with low probability, as shown in Algorithm 1, line 3.
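The overall loop can be sketched in Python as below (a minimal sketch, not the authors' implementation); solve_subgame, best_response_x, and best_response_y are injected placeholders for the subgame LP described next and for the best-response oracles of Section IV, and strategies are assumed to be hashable tuples.

def double_oracle(X0, Y0, u, solve_subgame, best_response_x, best_response_y,
                  eps=1e-3, p_min=1e-3, max_iter=100):
    X, Y = list(X0), list(Y0)
    for _ in range(max_iter):
        Px, Py = solve_subgame(X, Y, u)                 # line 2: LP equilibrium of the subgame
        X = [s for s, p in zip(X, Px) if p > p_min]     # line 3: drop low-probability strategies
        Y = [s for s, p in zip(Y, Py) if p > p_min]
        Px, Py = solve_subgame(X, Y, u)                 # re-solve so Px, Py match the pruned sets
        sx = best_response_x(Y, Py, u)                  # line 4: best response (2)
        sy = best_response_y(X, Px, u)                  # line 4: best response (3)
        U_u = sum(p * u(sx, s) for s, p in zip(Y, Py))  # upper utility U(S_x^{i+1}, Delta_Y^{i*})
        U_l = sum(p * u(s, sy) for s, p in zip(X, Px))  # lower utility U(Delta_X^{i*}, S_y^{i+1})
        if U_u - U_l <= eps:                            # epsilon-equilibrium reached
            return X, Px, Y, Py
        if sx not in X:                                 # lines 5-6: grow the subgame
            X.append(sx)
        if sy not in Y:
            Y.append(sy)
    return X, Px, Y, Py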

The optimization problem in Algorithm 1, line 2, is a linear programming (LP) problem. Define the utility matrix over \mathbf{S}_{x} and \mathbf{S}_{y} by \mathbf{U}, with each entry \mathbf{U}_{ij} being the utility of Player 1's i-th strategy versus Player 2's j-th strategy, i.e.,

\mathbf{U}_{ij}=u(\mathbf{S}_{x}^{i},\mathbf{S}_{y}^{j}).

Therefore, the expected utility can be calculated by

U(\mathbf{\Delta}_{X},\mathbf{\Delta}_{Y})=\mathbf{P}_{x}\mathbf{U}\mathbf{P}_{y}.

Suppose \mathbf{U} is a K_{x}\times K_{y} matrix (that is, Player 1's mixed strategy set contains K_{x} strategies and Player 2's contains K_{y} strategies). Since Player 1 maximizes the expected utility, the probability distribution of Player 1's strategies \mathbf{P}_{x} can be calculated by

\begin{aligned}&\max_{\mathbf{P}_{x},\ U_{x}\in\mathbb{R}}\ U_{x}\\ s.t.\ &\sum_{i=1}^{K_{x}}P_{x}(\mathbf{S}_{x}^{i})=1,\\ &P_{x}(\mathbf{S}_{x}^{i})\geq 0,\ \text{for}\ i\in\{1,2,\cdots,K_{x}\},\\ &U_{x}\leq\sum_{i=1}^{K_{x}}P_{x}(\mathbf{S}_{x}^{i})\mathbf{U}_{ij},\ \text{for}\ j\in\{1,2,\cdots,K_{y}\}.\end{aligned}

For Player 2, who minimizes the expected utility, the probability distribution \mathbf{P}_{y} can be calculated by

\begin{aligned}&\min_{\mathbf{P}_{y},\ U_{y}\in\mathbb{R}}\ U_{y}\\ s.t.\ &\sum_{j=1}^{K_{y}}P_{y}(\mathbf{S}_{y}^{j})=1,\\ &P_{y}(\mathbf{S}_{y}^{j})\geq 0,\ \text{for}\ j\in\{1,2,\cdots,K_{y}\},\\ &U_{y}\geq\sum_{j=1}^{K_{y}}P_{y}(\mathbf{S}_{y}^{j})\mathbf{U}_{ij},\ \text{for}\ i\in\{1,2,\cdots,K_{x}\}.\end{aligned}

These can be solved by linprog in Python.
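As a concrete illustration, both LPs can be written with scipy.optimize.linprog as in the following sketch (an assumed helper, not the authors' code); it returns both players' distributions together with the subgame value.

import numpy as np
from scipy.optimize import linprog

def subgame_equilibrium(U):
    """Solve the two matrix-game LPs above; returns (Px, Py, value)."""
    U = np.asarray(U, dtype=float)
    Kx, Ky = U.shape
    # Player 1 (maximizer): variables [Px, Ux], maximize Ux (linprog minimizes, so minimize -Ux)
    # subject to Ux <= (Px^T U)_j for every column j and Px on the probability simplex.
    res_x = linprog(
        c=np.r_[np.zeros(Kx), -1.0],
        A_ub=np.hstack([-U.T, np.ones((Ky, 1))]), b_ub=np.zeros(Ky),
        A_eq=np.r_[np.ones(Kx), 0.0].reshape(1, -1), b_eq=[1.0],
        bounds=[(0, None)] * Kx + [(None, None)],
    )
    # Player 2 (minimizer): variables [Py, Uy], minimize Uy
    # subject to Uy >= (U Py)_i for every row i and Py on the probability simplex.
    res_y = linprog(
        c=np.r_[np.zeros(Ky), 1.0],
        A_ub=np.hstack([U, -np.ones((Kx, 1))]), b_ub=np.zeros(Kx),
        A_eq=np.r_[np.ones(Ky), 0.0].reshape(1, -1), b_eq=[1.0],
        bounds=[(0, None)] * Ky + [(None, None)],
    )
    return res_x.x[:Kx], res_y.x[:Ky], -res_x.fun

# Example: with rock-paper-scissors payoffs U = [[0,-1,1],[1,0,-1],[-1,1,0]],
# both players mix uniformly and the game value is 0.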

The most crucial part of the double oracle algorithm is computing the best response strategy (Algorithm 1, line 4). The graph constraints and the various robot types we consider make this computation challenging. First, the strategy sets \mathcal{X} and \mathcal{Y} in (2) and (3) depend on the graph G. Given that G is unilaterally connected, not all nodes can be reached within one step. Therefore, given an initial robot distribution, the reachable strategy sets \mathcal{X} and \mathcal{Y} need to be computed first. This problem was introduced in [29], and we briefly summarize it in Section III-B. Second, an appropriate utility function is essential for computing the best response strategy. However, for CDH robots, no such function exists. To this end, we construct a continuous utility function to handle CDH robots (Section IV-C).

III-B Reachable Strategy Set

We leverage the notion of a reachable set for computing the strategy sets \mathcal{X}(G) and \mathcal{Y}(G) on graph G. We define the adjacency matrix \mathbf{A} of a graph G as

\mathbf{A}_{ij}=\begin{cases}0,&(j,i)\notin\mathcal{E};\\ 1,&(j,i)\in\mathcal{E}.\end{cases}

The adjacency matrix describes the connectivity of the nodes on a graph. We denote the distribution of the i-th type robots R_{i} on the graph at time t by an N-dimensional vector \mathbf{d}_{t}(R_{i}), with N the number of nodes and each element the number of robots of this type on the corresponding node. Then we define the transition matrix \mathbf{T} by

\mathbf{d}_{t+1}(R_{i})=\mathbf{T}\mathbf{d}_{t}(R_{i}).

Note that the transition matrix \mathbf{T} has two characteristics:

  • \sum_{i}\mathbf{T}_{ij}=1 for every node j, so that the total number of robots is conserved;

  • \mathbf{T}_{ij}\geq 0, and \mathbf{T}_{ij}=0 if \mathbf{A}_{ij}=0.

All valid transition matrices form the admissible action space \widetilde{\mathcal{T}}, i.e.,

\widetilde{\mathcal{T}}=\{\mathbf{T}\in\mathbb{R}^{N\times N}\ |\ \sum_{i}\mathbf{T}_{ij}=1,\ \mathbf{T}_{ij}\geq 0,\ \mathbf{T}_{ij}=0\ \text{if}\ \mathbf{A}_{ij}=0\}.

To determine a closed-form expression of \widetilde{\mathcal{T}}, define the extreme action space \hat{\mathcal{T}}, corresponding to the binomial (all-or-nothing) allocation, as

\hat{\mathcal{T}}=\{\mathbf{T}\in\mathbb{R}^{N\times N}\ |\ \mathbf{T}\in\widetilde{\mathcal{T}},\ \mathbf{T}_{ij}\in\{0,1\}\}. (5)

The extreme action space \hat{\mathcal{T}} is a finite set and its elements are the extreme points of \widetilde{\mathcal{T}}. Therefore, every admissible action can be calculated as a convex combination of the elements of the extreme action space:

\widetilde{\mathcal{T}}=\{\mathbf{T}\in\mathbb{R}^{N\times N}\ |\ \mathbf{T}=\sum_{\mathbf{T}^{\prime}\in\hat{\mathcal{T}}}\lambda_{\mathbf{T}^{\prime}}\mathbf{T}^{\prime},\ \lambda_{\mathbf{T}^{\prime}}\geq 0,\ \sum_{\mathbf{T}^{\prime}\in\hat{\mathcal{T}}}\lambda_{\mathbf{T}^{\prime}}=1\}.

Then, given an initial distribution of the i-th robot type, \mathbf{d}_{0}(R_{i}), the reachable set for the i-th robot type can be calculated as

\mathcal{R}(R_{i})=\{\mathbf{d}\ |\ \mathbf{T}\in\widetilde{\mathcal{T}},\ \mathbf{d}=\mathbf{T}\mathbf{d}_{0}(R_{i})\}. (6)

Finally, the strategy set \mathcal{X} for all types of robots can be expressed as a collection of their individual reachable sets:

\mathcal{X}=\{\mathbf{S}\in\mathbb{R}^{M\times N}\ |\ \mathbf{S}=[\mathbf{R}_{1},\cdots,\mathbf{R}_{M}]^{T},\ \mathbf{R}_{i}\in\mathcal{R}(R_{i})\}. (7)

The strategy set \mathcal{Y} can be calculated analogously.
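The construction can be illustrated with the following short Python sketch (an assumed helper, not the paper's implementation). It enumerates the extreme action space (5) of a small assumed graph whose adjacency matrix includes self-loops (so robots may stay in place) and lists the one-step distributions obtained when each node sends its robots as a single block along one edge; allowing the partial splits admitted by (6) would further enlarge this set.

import itertools
import numpy as np

def extreme_transitions(A):
    """All 0/1 column-stochastic T with T[i, j] = 0 whenever A[i, j] = 0 (cf. (5))."""
    N = A.shape[0]
    # For each source node j, its robots move as one block to a single node i with A[i, j] = 1.
    targets_per_node = [[i for i in range(N) if A[i, j] == 1] for j in range(N)]
    mats = []
    for combo in itertools.product(*targets_per_node):
        T = np.zeros((N, N), dtype=int)
        for j, i in enumerate(combo):
            T[i, j] = 1
        mats.append(T)
    return mats

def block_reachable_set(A, d0):
    """One-step distributions T d0 over the extreme action space (a subset of (6))."""
    return sorted({tuple(T @ np.asarray(d0)) for T in extreme_transitions(A)})

# Tiny example: a 3-node path 1-2-3 with self-loops (an assumed graph, not Fig. 1).
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
print(block_reachable_set(A, [2, 0, 1]))   # [(0, 2, 1), (0, 3, 0), (2, 0, 1), (2, 1, 0)]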

IV Approach

In this section, we present solutions to compute the equilibrium (i.e., optimal mixed strategies for the two players) for Problem 1, Problem 2, and Problem 3.

IV-A Approach for Problem 1

In Problem 1, the two players allocate homogeneous robots (i.e., M=1) on graph G. We denote the initial robot distributions as \mathbf{d}_{0}(x) and \mathbf{d}_{0}(y), or \mathbf{d}_{x} and \mathbf{d}_{y} for short. We first calculate the reachable strategy sets \mathcal{X}(\mathbf{d}_{x}) and \mathcal{Y}(\mathbf{d}_{y}) by (6) and (7):

\begin{split}\mathcal{X}(\mathbf{d}_{x})&=\{\mathbf{S}\in\mathbb{R}^{N}\ |\ \mathbf{T}\in\widetilde{\mathcal{T}},\ \mathbf{S}=\mathbf{T}\mathbf{d}_{x}\},\\ \mathcal{Y}(\mathbf{d}_{y})&=\{\mathbf{S}\in\mathbb{R}^{N}\ |\ \mathbf{T}\in\widetilde{\mathcal{T}},\ \mathbf{S}=\mathbf{T}\mathbf{d}_{y}\}.\end{split} (8)

Then we utilize the double oracle algorithm (Algorithm 1) to compute the optimal mixed strategies for the two players with the utility function introduced in (1). In particular, assuming that in the j-th step \mathbf{\Delta}_{Y}^{j*}=[P_{y}(\mathbf{S}_{y}^{1})\cdot\mathbf{S}_{y}^{1},\ P_{y}(\mathbf{S}_{y}^{2})\cdot\mathbf{S}_{y}^{2},\ \cdots,\ P_{y}(\mathbf{S}_{y}^{K_{y}})\cdot\mathbf{S}_{y}^{K_{y}}] is known, the best response strategy for Player 1 is determined by

\max_{\mathbf{S}_{x}\in\mathcal{X}(\mathbf{d}_{x})}\sum_{i=1}^{K_{y}}P_{y}(\mathbf{S}_{y}^{i})u(\mathbf{S}_{x},\mathbf{S}_{y}^{i}). (9)

The approach to solve this problem will be discussed in Section IV-D.

IV-B Approach for Problem 2

In Problem 2, the two players allocate linear heterogeneous robots (i.e., M>1) on graph G. The strategy space then consists of M\times N matrices \mathbf{S}\in\mathbb{R}^{M\times N}. Denote the initial robot distributions as \mathbf{d}_{x} and \mathbf{d}_{y}, which are also M\times N matrices. Different robot types can be linearly transformed, i.e., R_{i}=\mathbf{I}_{ij}R_{j}, by an intrinsic matrix \mathbf{I} satisfying

\mathbf{I}_{ij}\mathbf{I}_{ji}=1\ \text{for}\ i\neq j,\quad\mathbf{I}_{ii}=1, (10a)
\mathbf{I}_{ik}\mathbf{I}_{kj}=\mathbf{I}_{ij}\ \text{for any}\ i,j,k\in\{1,2,\cdots,M\}. (10b)

These equalities ensure that the transformation between any two types of robots is reversible and multiplicative. Therefore, all types can be transformed into a specified type f, i.e., R_{f}=\mathbf{I}_{fi}R_{i} for all i\neq f,\ i\in\{1,2,\cdots,M\}. This means that linear heterogeneous robots can be transformed into homogeneous robots; in other words, Problem 2 reduces to Problem 1. Specifically, after the transformation, the strategy space \mathbf{S}^{\prime}\in\mathbb{R}^{N} and the initial robot distributions \mathbf{d}^{\prime}_{x},\mathbf{d}^{\prime}_{y} become N-dimensional vectors. Then the reachable strategy sets \mathcal{X}(\mathbf{d}^{\prime}_{x}) and \mathcal{Y}(\mathbf{d}^{\prime}_{y}) can be calculated by (8), and Problem 2 can be solved using the same approach as Problem 1.
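A small sketch of this reduction (an assumed helper, not the paper's code): every type is converted into units of a reference type f using the intrinsic matrix, after which the homogeneous machinery of Problem 1 applies.

import numpy as np

def to_reference_type(S, I, f=0):
    """Convert an M-by-N allocation S into an N-vector of type-f equivalents.

    I is the intrinsic matrix of (10) with R_i = I[i, j] R_j, so one type-i robot
    is worth I[i, f] robots of the reference type f.
    """
    S = np.asarray(S, dtype=float)
    I = np.asarray(I, dtype=float)
    return I[:, f] @ S            # weight I[i, f] applied to each row (type) of S

# Example with M = 2 and R_1 = 2 R_2 (so I = [[1, 2], [0.5, 1]]) on N = 3 nodes.
S = [[1, 0, 2],
     [2, 2, 0]]
print(to_reference_type(S, [[1, 2], [0.5, 1]], f=0))   # [2. 1. 2.] in type-1 units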

IV-C Approach for Problem 3

In Problem 3, the two players allocate cyclic-dominance-heterogeneous (CDH) robots on graph G. In this paper, we specify that if robot type R_{i} dominates robot type R_{j}, then R_{i} is capable of neutralizing or overcoming a greater number of R_{j} than its own quantity. With CDH robots, the linear intrinsic transformation between robot types is broken and is no longer guaranteed to be closed. For example, if M=3 and R_{1}=2R_{2},\ R_{2}=2R_{3},\ R_{3}=2R_{1}, then by circularly converting R_{1} to R_{2}, R_{2} to R_{3}, and R_{3} to R_{1}, the number of R_{1} increases out of thin air. Therefore, we design a new transformation rule, named the elimination transformation, between different types of CDH robots.

Elimination transformation. In CDH robot allocation, a transformation is valid only when robots can be eliminated, but not created, between the two players. For example, consider \mathbf{I}_{12}=\mathbf{I}_{23}=\mathbf{I}_{31}=2, and suppose Player 1 allocates \mathbf{S}_{x,i}=(R_{1},2R_{2},4R_{3}) and Player 2 allocates \mathbf{S}_{y,i}=(3R_{1},R_{2},3R_{3}) on node i. After canceling out the same types of robots, the remaining robot distribution on node i is \mathbf{S}_{x,i}-\mathbf{S}_{y,i}=(-2R_{1},R_{2},R_{3}). This means Player 1 wins, since Player 1 can consume Player 2's 2R_{1} at the cost of R_{3} and still has one R_{2} left to win node i. However, if we transform Player 2's R_{1} into R_{2}, i.e., -2R_{1}=-4R_{2}, the remaining robots are -4R_{2}+R_{2}+0.5R_{2}=-2.5R_{2}, with 0.5R_{2} transformed from R_{3}, leading to Player 2's win. This process is invalid because we cannot transform R_{1} into R_{2}: we cannot create new type-2 robots out of type-1 robots. Therefore, robots can only be eliminated but not created. Following this rule, we can use -0.5R_{1} to cancel out R_{2}, and the remaining -1.5R_{1} can be eliminated by 0.75R_{3}. In the end, the remaining robots are 0.25R_{3}, and thus Player 1 wins.
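A minimal sketch of this procedure (not part of the paper, assuming \mathbf{I}_{12}=\mathbf{I}_{23}=\mathbf{I}_{31}=2 as above) cancels opposite-signed entries pairwise along the dominance relations and reproduces the remaining 0.25R_{3}:

# Dominance rates: one R1 removes 2 R2, one R2 removes 2 R3, one R3 removes 2 R1.
RATE = {(0, 1): 2.0, (1, 2): 2.0, (2, 0): 2.0}

def eliminate(x):
    # x holds Player 1's surplus (positive) and Player 2's surplus (negative) per type.
    x = list(map(float, x))
    changed = True
    while changed:
        changed = False
        for (i, j), r in RATE.items():            # type i eliminates r units of type j
            if x[i] > 0 and x[j] < 0:             # Player 1's type i vs Player 2's type j
                used = min(x[i], -x[j] / r)
                x[i] -= used; x[j] += used * r; changed = True
            elif x[i] < 0 and x[j] > 0:           # Player 2's type i vs Player 1's type j
                used = min(-x[i], x[j] / r)
                x[i] += used; x[j] -= used * r; changed = True
    return x

print(eliminate([-2, 1, 1]))   # [0.0, 0.0, 0.25]: Player 1 keeps 0.25 R3 and wins the node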

Based on the elimination transformation rule, we construct a novel utility function to decide the outcome of the game. In particular, we consider three types of CDH robots (i.e., M=3), as in the case of Rock-Paper-Scissors. (For M\geq 4, the operation becomes even more complex and differs from the case of M=3, which we explain by an example in the Appendix; we leave the case of M\geq 4 for future research.) Notably, the inhibiting property of CDH robot allocation implies that the intrinsic matrix satisfies \mathbf{I}_{ij}>1 for (i,j)\in\{(1,2),(2,3),(3,1)\}.

IV-C1 Construction of outcome interface

Based on the elimination transformation, we first construct an outcome surface that distinguishes between the win and loss of the players, and then design a new utility function of the game in Section IV-C2. The outcome interface \pi_{\texttt{oi}}(\mathbf{x})=\pi_{\texttt{oi}}((u,v,w))=0 defined on \mathbb{R}^{3} is the continuous surface between any two of the three concurrent lines:

l_{1}:\ u+v/\mathbf{I}_{12}=0,\ w=0, (11a)
l_{2}:\ v+w/\mathbf{I}_{23}=0,\ u=0, (11b)
l_{3}:\ w+u/\mathbf{I}_{31}=0,\ v=0. (11c)

Then we formally define the piecewise linear function \pi_{\texttt{oi}}(\mathbf{x}) according to [42, Definition 2.1].

Definition 1 ([42])

Let \Gamma be a closed convex domain in \mathbb{R}^{d}. A function \pi:\Gamma\rightarrow\mathbb{R} is said to be piecewise linear if there is a finite family \mathcal{Q} of closed domains such that \Gamma=\cup\mathcal{Q} and \pi is linear on every domain in \mathcal{Q}. A unique linear function g on \mathbb{R}^{d} which coincides with \pi on a given Q\in\mathcal{Q} is said to be a component of \pi. Let \mathcal{H} denote the set of hyperplanes defined by g_{i}(\mathbf{x})=g_{j}(\mathbf{x}) for i<j that have a nonempty intersection with the interior of \Gamma.

With the definition of the piecewise linear function, we utilize [42, Theorem 4.1] to design our outcome interface, which defines the demarcation between win and loss.

Theorem 1 ([42])

Let \pi be a piecewise linear function on \Gamma=\mathbb{R}^{3} and \{g_{1},\cdots,g_{n}\} be the set of its distinct components. There exists a family \{F_{j}\}_{j\in J} of subsets of \{1,2,\cdots,n\} such that

\pi(\mathbf{x})=\max_{j\in J}\min_{i\in F_{j}}g_{i}(\mathbf{x}),\quad\forall\mathbf{x}\in\Gamma. (12)

Conversely, for any family of distinct linear functions \{g_{1},\cdots,g_{n}\}, the above formula defines a piecewise linear function.

The expression on the right-hand side of (12) is a Max-Min (lattice) polynomial in the variables g_{i} [42].

We write \pi_{\texttt{oi}}(\mathbf{x}) in Max-Min form by Theorem 1. Let \{g_{1},g_{2},g_{3}\} be the set of components of \pi_{\texttt{oi}}(\mathbf{x}) on \mathbb{R}^{3} with

g_{1}(\mathbf{x})=[1,\ \mathbf{I}_{23}\mathbf{I}_{31},\ \mathbf{I}_{31}]\cdot\mathbf{x}, (13a)
g_{2}(\mathbf{x})=[\mathbf{I}_{12},\ 1,\ \mathbf{I}_{12}\mathbf{I}_{31}]\cdot\mathbf{x}, (13b)
g_{3}(\mathbf{x})=[\mathbf{I}_{12}\mathbf{I}_{23},\ \mathbf{I}_{23},\ 1]\cdot\mathbf{x}, (13c)

where g_{1}(\mathbf{x})=0, g_{2}(\mathbf{x})=0, and g_{3}(\mathbf{x})=0 represent the planes spanned by the pair of lines l_{2} in (11b) and l_{3} in (11c), the pair l_{1} in (11a) and l_{3} in (11c), and the pair l_{1} in (11a) and l_{2} in (11b), respectively. Let J=\{1,2,3\} and F_{1}=\{1,2\},\ F_{2}=\{1,3\},\ F_{3}=\{2,3\}; then

\begin{aligned}\pi_{\texttt{oi}}(\mathbf{x})&=\max_{j\in J}\min_{i\in F_{j}}g_{i}(\mathbf{x})\\ &=\max\{\min\{g_{1}(\mathbf{x}),g_{2}(\mathbf{x})\},\ \min\{g_{1}(\mathbf{x}),g_{3}(\mathbf{x})\},\ \min\{g_{2}(\mathbf{x}),g_{3}(\mathbf{x})\}\}.\end{aligned} (14)

Therefore, the closed-form expression of the outcome surface is

\max\{\min\{g_{1}(\mathbf{x}),g_{2}(\mathbf{x})\},\ \min\{g_{1}(\mathbf{x}),g_{3}(\mathbf{x})\},\ \min\{g_{2}(\mathbf{x}),g_{3}(\mathbf{x})\}\}=0.

Fig. 3 illustrates g_{i}(\mathbf{x}), \mathcal{H}, Q_{i}, and \pi_{\texttt{oi}}(\mathbf{x}).

Figure 3: Panel (a) illustrates the composition of the function's definition space \Gamma, which is partitioned into three subspaces Q_{1}, Q_{2}, Q_{3}. In each subspace, the function \pi_{\texttt{oi}}(\mathbf{x}) is linear, that is, \pi_{\texttt{oi}}(\mathbf{x})=g_{i}(\mathbf{x}) for \mathbf{x}\in Q_{i} (with Q_{i} extending indefinitely outward into space). The plot on the right of (a) shows the positions where g_{i}(\mathbf{x})=0. The planes where the three subspaces (in different colors) intersect constitute \mathcal{H} in (b). In (b), the three solid black lines correspond to the lines l_{1}, l_{2}, and l_{3}, respectively. The black dashed arrows represent the direction in which the plane shifts as a function of \pi_{\texttt{oi}}(\mathbf{x}). The gray planes are \mathcal{H} in Definition 1. The subplanes in green, yellow, and red (each consisting of two components, with some regions obscured by \mathcal{H}) are the graphical representations of g_{1}(\mathbf{x})=0, g_{2}(\mathbf{x})=0, and g_{3}(\mathbf{x})=0, respectively. They concur at the origin (0,0,0) and intersect each other to form a continuous surface \pi_{\texttt{oi}}(\mathbf{x}). Panel (c) displays the relative positioning of the surface \pi_{\texttt{oi}}(\mathbf{x})=0 in the Cartesian coordinate system. The red half-plane is g_{1} in (13a), the green half-plane is g_{2} in (13b), and the yellow half-plane is g_{3} in (13c). These three half-planes together form the outcome interface, i.e., \pi_{\texttt{oi}}(\mathbf{x})=0.

IV-C2 Construction of Utility Function

In this section, we first prove that the outcome surface is the demarcation surface between win and loss. Then we construct the utility function based on the outcome interface. Suppose Player 1 allocates (u_{1}R_{1},v_{1}R_{2},w_{1}R_{3}) and Player 2 allocates (u_{2}R_{1},v_{2}R_{2},w_{2}R_{3}) on a node; the robot distribution after subtraction is \delta u=u_{1}-u_{2}, \delta v=v_{1}-v_{2}, and \delta w=w_{1}-w_{2}.

Winning condition. Player 1 wins on a given node if and only if the remaining robot distribution after subtraction on the node, (\delta u,\delta v,\delta w), can be elimination transformed into (\delta u^{*},\delta v^{*},\delta w^{*}) with \delta u^{*},\delta v^{*},\delta w^{*}\geq 0 and not all of them equal to 0. Player 2 wins on a given node if and only if (\delta u,\delta v,\delta w) can be elimination transformed into (\delta u^{*},\delta v^{*},\delta w^{*}) with \delta u^{*},\delta v^{*},\delta w^{*}\leq 0 and not all of them equal to 0. The game reaches a draw on a given node if and only if (\delta u,\delta v,\delta w) can be elimination transformed into (\delta u^{*},\delta v^{*},\delta w^{*}) with \delta u^{*}=\delta v^{*}=\delta w^{*}=0.

Next, we show the completeness of the winning condition, i.e.,

Theorem 2

For every \mathbf{x}\in\mathbb{R}^{3}, \mathbf{x} belongs to one and only one of the three conditions. (In the Appendix, we give an example demonstrating that when the dimension of \mathbf{x} is larger than 3, \mathbf{x} no longer belongs to only one of these three conditions.)

Theorem 2 states that if \mathbf{x} can be elimination transformed into \mathbf{x}^{\prime} with all entries of \mathbf{x}^{\prime}\geq 0 and \mathbf{x}^{\prime}\neq 0, then there is no other elimination transformation that can transform it into \mathbf{x}^{\prime\prime} with all entries of \mathbf{x}^{\prime\prime}\leq 0. This holds because the elimination transformation is a deterministic process in 3D. That is, for a vector \mathbf{v}\in\mathbb{R}^{3},

  • if all entries of \mathbf{v} are concordant (have the same sign), then by definition \mathbf{v} cannot be elimination transformed, or equivalently \mathbf{v} can only be elimination transformed into itself;

  • if not all entries of \mathbf{v} are concordant, we sequentially search through pairs of entries of \mathbf{v}, specifically the combinations (1,2), (1,3), and (2,3). We select the first pair with different signs, operate an elimination transformation on it, and continue until the value of one element in the pair reaches zero. This is a deterministic process. If the newly obtained vector \mathbf{v}^{\prime} can still undergo an elimination transformation, meaning not all its entries have the same sign, we continue this process until we obtain a vector whose entries are all concordant. In 3D, this process needs at most two iterations to ensure that the resulting vector has entries of the same sign. Again, each step of this process is deterministic.

Theorem 2 ensures that the defined winning condition divides the space \mathbb{R}^{3} into three mutually exclusive subspaces. Indeed, the surface \pi_{\texttt{oi}}(\mathbf{x})=0 is one of the three subspaces, more precisely, the tie-game subspace; \pi_{\texttt{oi}}(\mathbf{x})>0 corresponds to Player 1's winning subspace and \pi_{\texttt{oi}}(\mathbf{x})<0 corresponds to Player 2's winning subspace. That is,

Theorem 3

At a node, Player 1 wins if and only if \pi_{\texttt{oi}}(\delta u,\delta v,\delta w)>0, Player 2 wins if and only if \pi_{\texttt{oi}}(\delta u,\delta v,\delta w)<0, and the game is tied if and only if \pi_{\texttt{oi}}(\delta u,\delta v,\delta w)=0.

Without loss of generality, we prove the winning condition for Player 1. The winning condition for player 2 and the condition for a draw can be proved similarly.

Proof 1

First, we prove the necessity, i.e., if \pi_{\texttt{oi}}(\delta u,\delta v,\delta w)>0, then Player 1 wins. According to the winning condition, Player 1 wins if (\delta u,\delta v,\delta w) can be elimination transformed into (\delta u^{*},\delta v^{*},\delta w^{*}), where \delta u^{*},\delta v^{*},\delta w^{*}\geq 0 and not all of them are 0.

If \delta u,\delta v,\delta w\geq 0 and not all of them are 0, then (\delta u^{*},\delta v^{*},\delta w^{*}) is exactly (\delta u,\delta v,\delta w). If not all of \delta u,\delta v,\delta w\geq 0, without loss of generality, we assume \delta w<0 and \delta u,\delta v\geq 0 and consider the three cases below.

  • Case 1: \mathbf{x}=(\delta u,\delta v,\delta w)\in\mathbf{Q}_{1}, i.e., \pi_{\texttt{oi}}(\mathbf{x})=g_{1}(\mathbf{x})>0. In this case, we first transform \delta uR_{1} into R_{3} until one of them becomes 0. If R_{3} is completely eliminated by R_{1}, i.e.,

    \frac{\delta u}{\mathbf{I}_{31}}+\delta w\geq 0, (15)

    then through the elimination transformation (\delta u,\delta v,\delta w) is transformed into (\delta u+\mathbf{I}_{31}\delta w,\ \delta v,\ 0) with each entry no less than 0. If \frac{\delta u}{\mathbf{I}_{31}}+\delta w<0, then \delta uR_{1} does not suffice to eliminate R_{3}, and thus we further transform \delta vR_{2} into R_{3}. If

    \frac{\delta u}{\mathbf{I}_{31}}+\delta w+\mathbf{I}_{23}\delta v>0, (16)

    then \delta vR_{2} suffices to eliminate the remaining R_{3}, and through the elimination transformation (\delta u,\delta v,\delta w) is transformed into (0,\ \delta v+\frac{\delta u}{\mathbf{I}_{31}\mathbf{I}_{23}}+\frac{\delta w}{\mathbf{I}_{23}},\ 0) with each entry no less than 0. Indeed, (16) holds since its left-hand side equals

    \frac{1}{\mathbf{I}_{31}}(\delta u+\mathbf{I}_{31}\mathbf{I}_{23}\delta v+\mathbf{I}_{31}\delta w)=\frac{1}{\mathbf{I}_{31}}g_{1}(\mathbf{x})>0.

    Thus, in this case (\delta u,\delta v,\delta w) can always be elimination transformed into (\delta u^{*},\delta v^{*},\delta w^{*}) with entries no less than 0 and not all of them 0.

  • Case 2: \mathbf{x}=(\delta u,\delta v,\delta w)\in\mathbf{Q}_{3}, i.e., \pi_{\texttt{oi}}(\mathbf{x})=g_{3}(\mathbf{x})>0. We can still run the same transformation process as in Case 1, but extra steps are needed to prove (16). Note that (14) reveals an implication of \pi_{\texttt{oi}}(\mathbf{x}): in fact, \pi_{\texttt{oi}}(\mathbf{x}) is the median of g_{1}(\mathbf{x}), g_{2}(\mathbf{x}), and g_{3}(\mathbf{x}) for a given \mathbf{x}. Therefore, with \mathbf{x}\in\mathbf{Q}_{3}, we obtain

    \min\{g_{1}(\mathbf{x}),g_{2}(\mathbf{x})\}\leq g_{3}(\mathbf{x})\leq\max\{g_{1}(\mathbf{x}),g_{2}(\mathbf{x})\}. (17)

    Given that \delta w<0 and \delta u,\delta v\geq 0, it is clear that g_{3}(\mathbf{x})\geq g_{2}(\mathbf{x}). Thus (17) becomes g_{2}(\mathbf{x})\leq g_{3}(\mathbf{x})\leq g_{1}(\mathbf{x}). Therefore, in this case g_{1}(\mathbf{x})\geq g_{3}(\mathbf{x})=\pi_{\texttt{oi}}(\mathbf{x})>0, which shows that (16) is valid.

  • Case 3: \mathbf{x}=(\delta u,\delta v,\delta w)\in\mathbf{Q}_{2}, i.e., \pi_{\texttt{oi}}(\mathbf{x})=g_{2}(\mathbf{x})>0. Based on the proofs of Case 1 and Case 2, we know that in this case g_{2}(\mathbf{x}) is the median of g_{1}(\mathbf{x}),g_{2}(\mathbf{x}),g_{3}(\mathbf{x}), and g_{3}(\mathbf{x})\geq g_{2}(\mathbf{x}) holds as long as \delta w<0 and \delta u,\delta v\geq 0. Thus we have g_{1}(\mathbf{x})\leq g_{2}(\mathbf{x})\leq g_{3}(\mathbf{x}). With g_{2}(\mathbf{x})\geq g_{1}(\mathbf{x}), (15) always holds, which means \delta uR_{1} always suffices to eliminate \delta wR_{3}. Indeed, g_{2}(\mathbf{x})\geq g_{1}(\mathbf{x}) can be rewritten as

    \mathbf{I}_{31}(\mathbf{I}_{12}-1)\left(\frac{\delta u}{\mathbf{I}_{31}}+\delta w\right)\geq(\mathbf{I}_{23}\mathbf{I}_{31}-1)\delta v. (18)

    Given (\mathbf{I}_{23}\mathbf{I}_{31}-1)\delta v\geq 0, (18) leads to (15). This means (\delta u,\delta v,\delta w) can always be eliminated into (\delta u+\mathbf{I}_{31}\delta w,\ \delta v,\ 0).

Therefore the necessity is proven.

Then we prove the sufficiency, i.e., if Player 1 wins (or equivalently, (\delta u,\delta v,\delta w) can be elimination transformed into (\delta u^{*},\delta v^{*},\delta w^{*}) with \delta u^{*},\delta v^{*},\delta w^{*}\geq 0 and not all of them 0), then \pi_{\texttt{oi}}(\delta u,\delta v,\delta w)>0. We consider two cases.

  • Case 1: \delta u,\delta v,\delta w\geq 0 and not all of them are 0. Then (\delta u^{*},\delta v^{*},\delta w^{*}) is exactly (\delta u,\delta v,\delta w). Obviously, \pi_{\texttt{oi}}(\delta u,\delta v,\delta w)>0, since the coefficients of the piecewise linear function \pi_{\texttt{oi}}(\cdot) are positive.

  • Case 2: Not all of \delta u,\delta v,\delta w\geq 0; assume \delta w<0. Upon performing an elimination transformation on (\delta u,\delta v,\delta w), either (15) or (16) must hold, since the result invariably ensures that all entries are non-negative. If (15) holds, then g_{1}(\mathbf{x}),g_{2}(\mathbf{x}),g_{3}(\mathbf{x})>0, leading to \pi_{\texttt{oi}}(\mathbf{x})>0. If (15) does not hold, then (16) must hold and \pi_{\texttt{oi}}(\mathbf{x})\neq g_{2}(\mathbf{x}), since we have shown in the proof of necessity that if \pi_{\texttt{oi}}(\mathbf{x})=g_{2}(\mathbf{x}), then (15) must hold. Therefore, either \pi_{\texttt{oi}}(\mathbf{x})=g_{1}(\mathbf{x}) or \pi_{\texttt{oi}}(\mathbf{x})=g_{3}(\mathbf{x}). If \pi_{\texttt{oi}}(\mathbf{x})=g_{1}(\mathbf{x}), then (16) directly leads to \pi_{\texttt{oi}}(\mathbf{x})>0. If \pi_{\texttt{oi}}(\mathbf{x})=g_{3}(\mathbf{x}), then from \frac{\delta u}{\mathbf{I}_{31}}+\delta w<0 and (16), we have

    \delta v\geq-\left(\frac{\delta u}{\mathbf{I}_{31}\mathbf{I}_{23}}+\frac{\delta w}{\mathbf{I}_{23}}\right). (19)

    Substituting (19) into g_{3}(\mathbf{x}), we have

    \begin{aligned}g_{3}(\mathbf{x})&=\mathbf{I}_{12}\mathbf{I}_{23}\delta u+\mathbf{I}_{23}\delta v+\delta w\\ &\geq(\mathbf{I}_{12}\mathbf{I}_{23}-1)\delta u\\ &>0.\end{aligned}

    Then we have \pi_{\texttt{oi}}(\mathbf{x})>0.

Therefore the sufficiency is also proved.

We use an example to further clarify the property of the outcome interface. Suppose the remaining robot distribution after subtraction on a node is (\delta u,\delta v,\delta w)=(4,2,-7), with \mathbf{I}_{12}=\mathbf{I}_{23}=\mathbf{I}_{31}=2. We evaluate this outcome from two views. First, intuitively, Player 2 eliminates all of Player 1's 2R_{2} with -4R_{3} since R_{2}=2R_{3}, and eliminates all of Player 1's 4R_{1} with -2R_{3} since R_{3}=2R_{1}; in the end, Player 2 still has 1R_{3} while Player 1 has nothing left. Thus Player 2 wins on this node. In this process, we first convert R_{2} into R_{3}, then convert R_{1} into R_{3}, and finally find that after the transformation there is still R_{3} remaining on Player 2's side. The second view is through the outcome surface \pi_{\texttt{oi}}(\mathbf{x}). Plugging \mathbf{x}=(\delta u,\delta v,\delta w)=(4,2,-7) into (14), we have \pi_{\texttt{oi}}((4,2,-7))=-2<0, indicating that Player 2 wins. Indeed, the calculation of \pi_{\texttt{oi}}((4,2,-7)) includes three steps: we evaluate the three components in (13), namely g_{1}((4,2,-7))=-2, g_{2}((4,2,-7))=-18, and g_{3}((4,2,-7))=13, and then take the median of the three values according to the definition of \pi_{\texttt{oi}}, which gives \pi_{\texttt{oi}}((4,2,-7))=g_{1}((4,2,-7))=-2. This is consistent with the intuition. The reason that the value from the outcome surface, -2, differs from the intuitive value -1 is that the component g_{1}(\cdot) of \pi_{\texttt{oi}}(\cdot) equals \mathbf{I}_{31} multiplying the converted term (\delta u/\mathbf{I}_{31}+\mathbf{I}_{23}\delta v+\delta w).
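This two-view calculation can be checked numerically. The short sketch below (not the paper's code, assuming \mathbf{I}_{12}=\mathbf{I}_{23}=\mathbf{I}_{31}=2) evaluates \pi_{\texttt{oi}} in (14) as the median of the three components in (13):

import numpy as np

I12, I23, I31 = 2.0, 2.0, 2.0   # assumed intrinsic values, as in the examples above

def pi_oi(x):
    u, v, w = x
    g1 = u + I23 * I31 * v + I31 * w        # component (13a)
    g2 = I12 * u + v + I12 * I31 * w        # component (13b)
    g3 = I12 * I23 * u + I23 * v + w        # component (13c)
    return float(np.median([g1, g2, g3]))   # (14): the Max-Min form picks the median

print(pi_oi((4, 2, -7)))   # -2.0 < 0: Player 2 wins this node
print(pi_oi((-2, 1, 1)))   #  1.0 > 0: Player 1 wins (the elimination example above)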

The utility function uCDH()u_{\texttt{CDH}}(\cdot) is then computed based on the outcome interface. To ensure continuity of the utility function, Player 1 is considered to completely win only when πoi(𝐱)>C\pi_{\texttt{oi}}(\mathbf{x})>C, in which case Player 1's utility is 1. Otherwise, if 0<πoi(𝐱)<C0<\pi_{\texttt{oi}}(\mathbf{x})<C, Player 1 achieves an incomplete victory, and its utility is a number between 0 and 1. The same applies to Player 2. Note that the direction of the surface's movement from πoi(𝐱)=0\pi_{\texttt{oi}}(\mathbf{x})=0 to πoi(𝐱)=ϵ\pi_{\texttt{oi}}(\mathbf{x})=\epsilon is indicated by the black dashed lines in Fig. 3. The constructed utility function is illustrated in Fig. 4.

Refer to caption
Figure 4: Illustration of utility function uCDHu_{\texttt{CDH}}. The surface πoi(𝐒x𝐒y)\pi_{\texttt{oi}}(\mathbf{S}_{x}-\mathbf{S}_{y}) in the middle is the outcome interface in Fig. 3(c).
uCDH(𝐒x,𝐒y)=i=1Nπ(𝐒x,i𝐒y,i).\displaystyle u_{\texttt{CDH}}(\mathbf{S}_{x},\mathbf{S}_{y})=\sum_{i=1}^{N}{\pi(\mathbf{S}_{x,i}-\mathbf{S}_{y,i})}. (20)
whereπ(𝐱)={1,πoi(𝐱)C;πoi(𝐱)/C,πoi(𝐱)[C,C];1,πoi(𝐱)C.\displaystyle\text{where}\quad\pi(\mathbf{x})=\left\{\begin{aligned} -1&,&\pi_{\texttt{oi}}(\mathbf{x})&\leq-C;\\ \pi_{\texttt{oi}}(\mathbf{x})/C&,&\pi_{\texttt{oi}}(\mathbf{x})&\in[-C,C];\\ 1&,&\pi_{\texttt{oi}}(\mathbf{x})&\geq C.\end{aligned}\right.

CC is the threshold for winning or losing. The remaining robot distribution after subtraction on the ii-th node, 𝐒x,i𝐒y,i=(δu,δv,δw)\mathbf{S}_{x,i}-\mathbf{S}_{y,i}=(\delta u,\delta v,\delta w), is evaluated by the outcome interface πoi\pi_{\texttt{oi}}. If πoi>C\pi_{\texttt{oi}}>C, then π(𝐒x,i𝐒y,i)=1\pi(\mathbf{S}_{x,i}-\mathbf{S}_{y,i})=1, indicating an absolute win for Player 1 on that node. Conversely, if πoi<C\pi_{\texttt{oi}}<-C, then π(𝐒x,i𝐒y,i)=1\pi(\mathbf{S}_{x,i}-\mathbf{S}_{y,i})=-1, indicating an absolute win for Player 2. If C<πoi<C-C<\pi_{\texttt{oi}}<C, then π(𝐒x,i𝐒y,i)=πoi/C\pi(\mathbf{S}_{x,i}-\mathbf{S}_{y,i})=\pi_{\texttt{oi}}/C, meaning that one side achieves a partial win. This way of constructing the utility function ensures its continuity [37], which in turn facilitates the calculation of the best response strategy (Algorithm 1, line 4).
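The per-node outcome in (20) is the interface value scaled by CC and clipped to [1,1][-1,1], and uCDHu_{\texttt{CDH}} sums it over the nodes. The following minimal Python sketch illustrates this; the per-node interface values are placeholders chosen only for illustration, and C=1.5C=1.5 is the CDH threshold reported in Section V.

def per_node_outcome(pi_oi_value, C):
    # pi(x) = -1 if pi_oi <= -C, pi_oi / C if -C <= pi_oi <= C, and 1 if pi_oi >= C.
    return max(-1.0, min(1.0, pi_oi_value / C))

def u_cdh(pi_oi_per_node, C):
    # Sum of the clipped per-node outcomes over all N nodes, as in (20).
    return sum(per_node_outcome(v, C) for v in pi_oi_per_node)

# Example with hypothetical interface values on three nodes.
print(u_cdh([-2.0, 0.6, 3.0], C=1.5))  # -1 + 0.4 + 1 = 0.4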

IV-D DOA with CDH robots

We next introduce the steps of solving the optimization problem (9) in Algorithm 1, line 4, with the constructed utility function uCDH()u_{\texttt{CDH}}(\cdot). The main idea is to rewrite the piecewise linear function as linear inequalities and transform the problem into a mixed integer linear program.

In (14), πoi(𝐱)\pi_{\texttt{oi}}(\mathbf{x}) is given in a max-min representation. We further rewrite it using linear inequalities. First, we rewrite it as

πoi(𝐱)=g1(𝐱)+g2(𝐱)+g3(𝐱)min{g1(𝐱),g2(𝐱),g3(𝐱)}max{g1(𝐱),g2(𝐱),g3(𝐱)}=g1(𝐱)+max{g2(𝐱)g1(𝐱),g2(𝐱)g3(𝐱),0}max{g1(𝐱)g3(𝐱),g2(𝐱)g3(𝐱),0}.\displaystyle\begin{aligned} \pi_{\texttt{oi}}(\mathbf{x})=&\ g_{1}(\mathbf{x})+g_{2}(\mathbf{x})+g_{3}(\mathbf{x})-\min\{g_{1}(\mathbf{x}),g_{2}(\mathbf{x}),g_{3}(\mathbf{x})\}\\ &-\max\{g_{1}(\mathbf{x}),g_{2}(\mathbf{x}),g_{3}(\mathbf{x})\}\\ =&\ g_{1}(\mathbf{x})+\max\{g_{2}(\mathbf{x})-g_{1}(\mathbf{x}),g_{2}(\mathbf{x})-g_{3}(\mathbf{x}),0\}\\ &-\max\{g_{1}(\mathbf{x})-g_{3}(\mathbf{x}),g_{2}(\mathbf{x})-g_{3}(\mathbf{x}),0\}.\end{aligned} (21)
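The second equality in (21) can be checked numerically: for any three values, the sum minus the minimum and the maximum equals the middle value, and so does the expression built from the two max{}\max\{\cdot\} terms. A short Python sanity check under randomly drawn values (for illustration only):

import random

random.seed(0)
for _ in range(1000):
    g1, g2, g3 = (random.uniform(-10, 10) for _ in range(3))
    middle = g1 + g2 + g3 - min(g1, g2, g3) - max(g1, g2, g3)
    rewritten = g1 + max(g2 - g1, g2 - g3, 0) - max(g1 - g3, g2 - g3, 0)
    assert abs(middle - rewritten) < 1e-9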

Note that (21) expresses πoi(𝐱)\pi_{\texttt{oi}}(\mathbf{x}) as a linear combination of terms of the form max{f1(𝐱),f2(𝐱),0}\max\{f_{1}(\mathbf{x}),f_{2}(\mathbf{x}),0\}. To rewrite it as linear inequalities, Lemma 4 is applied. To begin with, we introduce [37, Lemma C.1] to express functions of the form F(𝐱)=max{f(𝐱),0}F(\mathbf{x})=\max\{f(\mathbf{x}),0\} as linear inequalities.

Lemma 3 ([37])

Let f(𝐱)f(\mathbf{x}) be a function from d\mathbb{R}^{d} to \mathbb{R}, U>0U>0, and F(𝐱)=max{f(𝐱),0}F(\mathbf{x})=\max\{f(\mathbf{x}),0\}. For every 𝐱\mathbf{x} such that f(𝐱)[U,U]f(\mathbf{x})\in[-U,U], there exist a unique ss\in\mathbb{R} and a possibly non-unique z{0,1}z\in\{0,1\} solving the system

s0,sUz,sf(𝐱),sf(𝐱)+U(1z).\displaystyle\begin{aligned} s&\geq 0,&s&\leq Uz,\\ s&\geq f(\mathbf{x}),&s&\leq f(\mathbf{x})+U(1-z).\end{aligned}

and it holds F(𝐱)=sF(\mathbf{x})=s.
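The system in Lemma 3 can be sanity-checked by enumerating the binary variable zz and intersecting the resulting bounds on ss: for every value of f(𝐱)f(\mathbf{x}), the feasible interval collapses to the single point max{f(𝐱),0}\max\{f(\mathbf{x}),0\}. A small Python check follows; the bound UU and the sample values of ff are arbitrary choices for illustration.

def lemma3_solutions(f, U):
    # Enumerate z in {0, 1} and return the feasible interval for s in each case.
    feasible = []
    for z in (0, 1):
        lo = max(0.0, f)                    # s >= 0 and s >= f(x)
        hi = min(U * z, f + U * (1 - z))    # s <= U z and s <= f(x) + U (1 - z)
        if lo <= hi:
            feasible.append((z, lo, hi))
    return feasible

for f in (-3.0, 0.0, 2.5):
    print(f, lemma3_solutions(f, U=10.0))
# f = -3.0: only z = 0 is feasible and s is pinned to 0 = max(f, 0)
# f =  0.0: both z work, but s = 0 in either case
# f =  2.5: only z = 1 is feasible and s is pinned to 2.5 = max(f, 0)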

Based on Lemma 3, one has

Lemma 4

Let f1(𝐱),f2(𝐱)f_{1}(\mathbf{x}),f_{2}(\mathbf{x}) be functions from d\mathbb{R}^{d} to \mathbb{R}, U>0U>0, and F(𝐱)=max{f1(𝐱),f2(𝐱),0}F(\mathbf{x})=\max\{f_{1}(\mathbf{x}),f_{2}(\mathbf{x}),0\}. For every 𝐱\mathbf{x} such that f1(𝐱),f2(𝐱),f1(𝐱)f2(𝐱)[U,U]f_{1}(\mathbf{x}),f_{2}(\mathbf{x}),f_{1}(\mathbf{x})-f_{2}(\mathbf{x})\in[-U,U], there exist unique s,ts,t\in\mathbb{R} and possibly non-unique z1,z2{0,1}z_{1},z_{2}\in\{0,1\} solving the system

s0,sUz1,sf1(𝐱),sf1(𝐱)+U(1z1),tf2(𝐱),tf2(𝐱)+Uz2,ts,ts+U(1z2).\displaystyle\begin{aligned} s&\geq 0,&s&\leq Uz_{1},\\ s&\geq f_{1}(\mathbf{x}),&s&\leq f_{1}(\mathbf{x})+U(1-z_{1}),\\ t&\geq f_{2}(\mathbf{x}),&t&\leq f_{2}(\mathbf{x})+Uz_{2},\\ t&\geq s,&t&\leq s+U(1-z_{2}).\\ \end{aligned} (22)

and it holds F(𝐱)=tF(\mathbf{x})=t.

Proof 2

𝐱\operatorname{\forall}\mathbf{x}, s.t. f1(𝐱),f2(𝐱),f1(𝐱)f2(𝐱)[U,U]f_{1}(\mathbf{x}),f_{2}(\mathbf{x}),f_{1}(\mathbf{x})-f_{2}(\mathbf{x})\in[-U,U], there is

max{f1(𝐱),f2(𝐱),0}\displaystyle\max\{f_{1}(\mathbf{x}),f_{2}(\mathbf{x}),0\} =max{max{f1(𝐱),0},f2(𝐱)}\displaystyle=\max\{\max\{f_{1}(\mathbf{x}),0\},f_{2}(\mathbf{x})\}
=max{s,f2(𝐱)}\displaystyle=\max\{s,f_{2}(\mathbf{x})\}
=max{sf2(𝐱),0}+f2(𝐱)\displaystyle=\max\{s-f_{2}(\mathbf{x}),0\}+f_{2}(\mathbf{x})
=t+f2(𝐱).\displaystyle=t^{\prime}+f_{2}(\mathbf{x}).

where ss is the solution of the inequality system

s\displaystyle s 0,\displaystyle\geq 0, s\displaystyle s Uz1,\displaystyle\leq Uz_{1},
s\displaystyle s f1(𝐱),\displaystyle\geq f_{1}(\mathbf{x}), s\displaystyle s f1(𝐱)+U(1z1).\displaystyle\leq f_{1}(\mathbf{x})+U(1-z_{1}).

and s=max{f1(𝐱),0}s=\max\{f_{1}(\mathbf{x}),0\}, and tt^{\prime} is the solution of the inequality system

t\displaystyle t^{\prime} 0,\displaystyle\geq 0, t\displaystyle t^{\prime} Uz2,\displaystyle\leq Uz_{2},
t\displaystyle t^{\prime} sf2(𝐱),\displaystyle\geq s-f_{2}(\mathbf{x}), t\displaystyle t^{\prime} sf2(𝐱)+U(1z2).\displaystyle\leq s-f_{2}(\mathbf{x})+U(1-z_{2}).

and t=max{sf2(𝐱),0}t^{\prime}=\max\{s-f_{2}(\mathbf{x}),0\}, z1,z2{0,1}z_{1},z_{2}\in\{0,1\}, according to Lemma 3.

Denote max{f1(𝐱),f2(𝐱),0}\max\{f_{1}(\mathbf{x}),f_{2}(\mathbf{x}),0\} as tt, so that t=t+f2(𝐱)t=t^{\prime}+f_{2}(\mathbf{x}). Substituting tt into the linear inequalities above, we have

t\displaystyle t f2(𝐱),\displaystyle\geq f_{2}(\mathbf{x}), t\displaystyle t f2(𝐱)+Uz2,\displaystyle\leq f_{2}(\mathbf{x})+Uz_{2},
t\displaystyle t s,\displaystyle\geq s, t\displaystyle t s+U(1z2).\displaystyle\leq s+U(1-z_{2}).

Therefore, we can derive the linear inequalities equivalent to (21) by replacing each max{}\max\{\cdot\} operation with the system (22).
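To make the constructions in Lemma 3 and Lemma 4 concrete, the following Python sketch shows how the two inequality systems could be added to an optimization model through Gurobi's Python interface gurobipy [43]. This is only an illustrative sketch: the helper names add_max_f_zero and add_max_f1_f2_zero are our own, f_expr, f1_expr, and f2_expr are assumed to be affine expressions in the model's decision variables, and U is assumed to be a bound large enough to cover their ranges.

import gurobipy as gp
from gurobipy import GRB

def add_max_f_zero(m, f_expr, U):
    # Lemma 3: returns a variable s constrained to equal max{f_expr, 0}.
    s = m.addVar(lb=0.0, ub=U)
    z = m.addVar(vtype=GRB.BINARY)
    m.addConstr(s <= U * z)
    m.addConstr(s >= f_expr)
    m.addConstr(s <= f_expr + U * (1 - z))
    return s

def add_max_f1_f2_zero(m, f1_expr, f2_expr, U):
    # Lemma 4 / system (22): returns a variable t constrained to equal max{f1_expr, f2_expr, 0}.
    s = add_max_f_zero(m, f1_expr, U)   # s = max{f1, 0}
    t = m.addVar(lb=0.0, ub=U)
    z2 = m.addVar(vtype=GRB.BINARY)
    m.addConstr(t >= f2_expr)
    m.addConstr(t <= f2_expr + U * z2)
    m.addConstr(t >= s)
    m.addConstr(t <= s + U * (1 - z2))
    return t

For example, given a model m with a decision variable x, add_max_f_zero(m, 2 * x - 1, U=10.0) introduces a variable that equals max{2x-1, 0} for every feasible x, which is exactly the role the lemmas play in the reformulation below.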

With Lemma 3 and Lemma 4, the optimization problem (9) can be transformed into a mixed integer linear program (MILP), which can be solved by the optimization solver Gurobi [43]. With the utility function uCDH()u_{\texttt{CDH}}(\cdot), the optimization problem (9) can be reformulated as follows:

Given Player 2’s jj-th step mixed strategy 𝚫yj=[Py(𝐒y1)𝐒y1,Py(𝐒y2)𝐒y2,,Py(𝐒yK)𝐒yK]\mathbf{\Delta}_{y}^{j*}=[P_{y}(\mathbf{S}_{y}^{1})\cdot\mathbf{S}_{y}^{1},\ P_{y}(\mathbf{S}_{y}^{2})\cdot\mathbf{S}_{y}^{2},\ \cdots,\ P_{y}(\mathbf{S}_{y}^{K})\cdot\mathbf{S}_{y}^{K}],

max𝐒x𝒳(𝐝x)i=1KPy(𝐒yi)j=1Nπ(𝐒x,j,𝐒y,ji).\displaystyle\begin{aligned} \max_{\mathbf{S}_{x}\in\mathcal{X}(\mathbf{d}_{x})}\sum_{i=1}^{K}P_{y}(\mathbf{S}_{y}^{i})\sum_{j=1}^{N}\pi(\mathbf{S}_{x,j},\mathbf{S}_{y,j}^{i}).\end{aligned} (23)

According to [37, Appendix C], for any i{1,2,,K}i\in\{1,2,\cdots,K\}, j{1,2,,N}j\in\{1,2,\cdots,N\} and a constant CC, there is

π(𝐒x,j,𝐒y,ji)=max{1C(πoi(𝐒x,j𝐒y,ji)+C),0}max{1C(πoi(𝐒x,j𝐒y,ji)C),0}1.\displaystyle\begin{aligned} &\pi(\mathbf{S}_{x,j},\mathbf{S}_{y,j}^{i})\\ =&\max\{\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+C),0\}\\ &-\max\{\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})-C),0\}-1.\end{aligned}

or, for simplicity, we can denote π(𝐒x,j,𝐒y,ji)\pi(\mathbf{S}_{x,j},\mathbf{S}_{y,j}^{i}) as sijtij1s_{ij}-t_{ij}-1, where sij=max{1C(πoi(𝐒x,j𝐒y,ji)+C),0}s_{ij}=\max\{\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+C),0\} and tij=max{1C(πoi(𝐒x,j𝐒y,ji)C),0}t_{ij}=\max\{\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})-C),0\}. With Lemma 3, we obtain sijs_{ij} and tijt_{ij} in the form of a linear inequality system as follows:

sij\displaystyle s_{ij} 0,\displaystyle\geq 0,
sij\displaystyle s_{ij} Uijszij,\displaystyle\leq U_{ij}^{s}z_{ij},
sij\displaystyle s_{ij} 1C(πoi(𝐒x,j𝐒y,ji)+C)+Uijs(1zij),\displaystyle\leq\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+C)+U_{ij}^{s}(1-z_{ij}),
sij\displaystyle s_{ij} 1C(πoi(𝐒x,j𝐒y,ji)+C),\displaystyle\geq\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+C),
tij\displaystyle t_{ij} 0,\displaystyle\geq 0,
tij\displaystyle t_{ij} Uijtwij,\displaystyle\leq U_{ij}^{t}w_{ij},
tij\displaystyle t_{ij} 1C(πoi(𝐒x,j𝐒y,ji)C)+Uijt(1wij),\displaystyle\leq\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})-C)+U_{ij}^{t}(1-w_{ij}),
tij\displaystyle t_{ij} 1C(πoi(𝐒x,j𝐒y,ji)C).\displaystyle\geq\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})-C).

where zij,wij{0,1}z_{ij},w_{ij}\in\{0,1\} and 1C(πoi(𝐒x,j𝐒y,ji)+C)[Uijs,Uijs]\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+C)\in[-U_{ij}^{s},U_{ij}^{s}], 1C(πoi(𝐒x,j𝐒y,ji)C)[Uijt,Uijt]\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})-C)\in[-U_{ij}^{t},U_{ij}^{t}], 𝐒x𝒳(𝐝x)\operatorname{\forall}\mathbf{S}_{x}\in\mathcal{X}(\mathbf{d}_{x}). We can replace the πoi()\pi_{\texttt{oi}}(\cdot) in the linear inequalities as discussed above. Specifically, according to (21), one has

πoi(𝐒x,j𝐒y,ji)=g1(𝐒x,j𝐒y,ji)+pijqij.\displaystyle\begin{aligned} \pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})=g_{1}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+p_{ij}-q_{ij}.\end{aligned}

where pij=max{g21(𝐒x,j𝐒y,ji),g23(𝐒x,j𝐒y,ji), 0}p_{ij}=\max\{g_{21}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),\;g_{23}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),\;0\} and qij=max{g13(𝐒x,j𝐒y,ji),g23(𝐒x,j𝐒y,ji), 0}q_{ij}=\max\{g_{13}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),\;g_{23}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),\;0\}, with the shorthand g21()=g2()g1()g_{21}(\cdot)=g_{2}(\cdot)-g_{1}(\cdot), g13()=g1()g3()g_{13}(\cdot)=g_{1}(\cdot)-g_{3}(\cdot), and g23()=g2()g3()g_{23}(\cdot)=g_{2}(\cdot)-g_{3}(\cdot). Moreover, according to Lemma 4, we can write pijp_{ij} and qijq_{ij} in the form of linear inequalities. Finally, the optimization problem in (23) is reformulated into a mixed integer linear program as follows.

Given Player 2’s jj-th step mixed strategy 𝚫yj=[Py(𝐒y1)𝐒y1,Py(𝐒y2)𝐒y2,,Py(𝐒yK)𝐒yK]\mathbf{\Delta}_{y}^{j*}=[P_{y}(\mathbf{S}_{y}^{1})\cdot\mathbf{S}_{y}^{1},\ P_{y}(\mathbf{S}_{y}^{2})\cdot\mathbf{S}_{y}^{2},\ \cdots,\ P_{y}(\mathbf{S}_{y}^{K})\cdot\mathbf{S}_{y}^{K}],

max𝐒x𝒳(𝐝x)\displaystyle\max_{\mathbf{S}_{x}\in\mathcal{X}(\mathbf{d}_{x})} i=1KPy(𝐒yi)j=1N(sijtij1)\displaystyle\sum_{i=1}^{K}P_{y}(\mathbf{S}_{y}^{i})\sum_{j=1}^{N}(s_{ij}-t_{ij}-1)
s.t.\displaystyle s.t.\quad sij0,sijUijszij,\displaystyle s_{ij}\geq 0,\quad s_{ij}\leq U_{ij}^{s}z_{ij},
sij1C(g1(𝐒x,j𝐒y,ji)+pijqij+C)+\displaystyle s_{ij}\leq\frac{1}{C}(g_{1}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+p_{ij}-q_{ij}+C)+
Uijs(1zij),\displaystyle\qquad\ \;U_{ij}^{s}(1-z_{ij}),
sij1C(g1(𝐒x,j𝐒y,ji)+pijqij+C),\displaystyle s_{ij}\geq\frac{1}{C}(g_{1}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+p_{ij}-q_{ij}+C),
tij0,tijUijtwij,\displaystyle t_{ij}\geq 0,\quad t_{ij}\leq U_{ij}^{t}w_{ij},
tij1C(g1(𝐒x,j𝐒y,ji)+pijqijC)+\displaystyle t_{ij}\leq\frac{1}{C}(g_{1}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+p_{ij}-q_{ij}-C)+
Uijt(1wij),\displaystyle\qquad\ \;U_{ij}^{t}(1-w_{ij}),
tij1C(g1(𝐒x,j𝐒y,ji)+pijqijC),\displaystyle t_{ij}\geq\frac{1}{C}(g_{1}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+p_{ij}-q_{ij}-C),
δijp0,δijpUijδpzijδp,\displaystyle\delta^{p}_{ij}\geq 0,\quad\delta^{p}_{ij}\leq U^{\delta^{p}}_{ij}z^{\delta^{p}}_{ij},
δijpg21(𝐒x,j𝐒y,ji),\displaystyle\delta^{p}_{ij}\geq g_{21}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),
δijpg21(𝐒x,j𝐒y,ji)+Uijδp(1zijδp),\displaystyle\delta^{p}_{ij}\leq g_{21}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+U^{\delta^{p}}_{ij}(1-z^{\delta^{p}}_{ij}),
pijδijp,pijδijp+Uijpq(1zijp),\displaystyle p_{ij}\geq\delta^{p}_{ij},\quad p_{ij}\leq\delta^{p}_{ij}+U^{pq}_{ij}(1-z^{p}_{ij}),
pijg23(𝐒x,j𝐒y,ji),\displaystyle p_{ij}\geq g_{23}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),
pijg23(𝐒x,j𝐒y,ji)+Uijpqzijp,\displaystyle p_{ij}\leq g_{23}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+U^{pq}_{ij}z^{p}_{ij},
δijq0,δijqUijδqzijδq,\displaystyle\delta^{q}_{ij}\geq 0,\quad\delta^{q}_{ij}\leq U^{\delta^{q}}_{ij}z^{\delta^{q}}_{ij},
δijqg13(𝐒x,j𝐒y,ji),\displaystyle\delta^{q}_{ij}\geq g_{13}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),
δijqg13(𝐒x,j𝐒y,ji)+Uijδq(1zijδq),\displaystyle\delta^{q}_{ij}\leq g_{13}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+U^{\delta^{q}}_{ij}(1-z^{\delta^{q}}_{ij}),
qijδijq,qijδijq+Uijpq(1zijq),\displaystyle q_{ij}\geq\delta^{q}_{ij},\quad q_{ij}\leq\delta^{q}_{ij}+U^{pq}_{ij}(1-z^{q}_{ij}),
qijg23(𝐒x,j𝐒y,ji),\displaystyle q_{ij}\geq g_{23}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),
qijg23(𝐒x,j𝐒y,ji)+Uijpqzijq.\displaystyle q_{ij}\leq g_{23}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+U^{pq}_{ij}z^{q}_{ij}.

with

sij,tij,pij,qij\displaystyle s_{ij},t_{ij},p_{ij},q_{ij} ,\displaystyle\in\mathbb{R},
zij,wij,zijδp,zijδq,zijp,zijq\displaystyle z_{ij},w_{ij},z^{\delta^{p}}_{ij},z^{\delta^{q}}_{ij},z^{p}_{ij},z^{q}_{ij} {0,1},\displaystyle\in\{0,1\},
1C(πoi(𝐒x,j𝐒y,ji)+C)\displaystyle\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+C) [Uijs,Uijs],\displaystyle\in[-U_{ij}^{s},U_{ij}^{s}],
1C(πoi(𝐒x,j𝐒y,ji)C)\displaystyle\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})-C) [Uijt,Uijt],\displaystyle\in[-U_{ij}^{t},U_{ij}^{t}],
g21(𝐒x,j𝐒y,ji)\displaystyle g_{21}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}) [Uijδp,Uijδp],\displaystyle\in[-U_{ij}^{\delta^{p}},U_{ij}^{\delta^{p}}],
g13(𝐒x,j𝐒y,ji)\displaystyle g_{13}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}) [Uijδq,Uijδq],\displaystyle\in[-U_{ij}^{\delta^{q}},U_{ij}^{\delta^{q}}],
g23(𝐒x,j𝐒y,ji)\displaystyle g_{23}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}) [Uijpq,Uijpq].\displaystyle\in[-U_{ij}^{pq},U_{ij}^{pq}].

Note that g21(),g13(),g23()g_{21}(\cdot),g_{13}(\cdot),g_{23}(\cdot) are linear functions and 𝐒x,j𝐒y,ji\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i} is bounded. Thus g21(𝐒x,j𝐒y,ji),g13(𝐒x,j𝐒y,ji),g23(𝐒x,j𝐒y,ji)g_{21}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),g_{13}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}),g_{23}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}) are all bounded, and Uijδp,Uijδq,UijpqU_{ij}^{\delta^{p}},U_{ij}^{\delta q},U_{ij}^{pq} are large numbers chosen to cover these bounds. Since πoi(𝐒x,j𝐒y,ji)\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}) is piecewise linear, both 1C(πoi(𝐒x,j𝐒y,ji)+C)\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})+C) and 1C(πoi(𝐒x,j𝐒y,ji)C)\frac{1}{C}(\pi_{\texttt{oi}}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i})-C) are bounded, and Uijs,UijtU_{ij}^{s},U_{ij}^{t} are large numbers chosen to cover their bounds. Besides, since the gi()g_{i}(\cdot) are linear functions, all gi(𝐒x,j𝐒y,ji)g_{i}(\mathbf{S}_{x,j}-\mathbf{S}_{y,j}^{i}) above can be written as gi(𝐒x,j)gi(𝐒y,ji)g_{i}(\mathbf{S}_{x,j})-g_{i}(\mathbf{S}_{y,j}^{i}).
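As an illustration of how this reformulated best response could be assembled in code, the sketch below builds the objective of (23) with gurobipy, reusing the two helper functions sketched after the proof of Lemma 4. It is a simplified sketch, not the exact implementation used in the paper: the coefficient vectors c defining g1,g2,g3g_{1},g_{2},g_{3} are placeholders to be filled in from Section IV-C, the graph reachability constraints on 𝐒x\mathbf{S}_{x} are omitted (only the per-type simplex constraints are kept), and a single conservative big-M bound U replaces the per-term bounds UijsU_{ij}^{s}, UijtU_{ij}^{t}, and so on.

import gurobipy as gp
from gurobipy import GRB

def best_response_cdh(Sy_list, Py, c, N, C=1.5, U=100.0):
    # Sy_list: opponent pure strategies, each a 3 x N array; Py: their probabilities.
    # c: three coefficient vectors (length 3 each) defining g_k(x) = sum_r c[k][r] * x[r] (placeholders).
    m = gp.Model("best_response_cdh")
    m.Params.OutputFlag = 0
    Sx = m.addVars(3, N, lb=0.0, ub=1.0)      # Player 1's allocation: rows = robot types, cols = nodes
    for r in range(3):
        m.addConstr(gp.quicksum(Sx[r, j] for j in range(N)) == 1.0)   # each type's fractions sum to 1
    obj = 0.0
    for Sy, prob in zip(Sy_list, Py):
        for j in range(N):
            diff = [Sx[r, j] - Sy[r][j] for r in range(3)]
            g = [gp.quicksum(c[k][r] * diff[r] for r in range(3)) for k in range(3)]
            p = add_max_f1_f2_zero(m, g[1] - g[0], g[1] - g[2], U)    # max{g2-g1, g2-g3, 0}
            q = add_max_f1_f2_zero(m, g[0] - g[2], g[1] - g[2], U)    # max{g1-g3, g2-g3, 0}
            pi_oi = g[0] + p - q                                      # outcome interface, as in (21)
            s = add_max_f_zero(m, (1.0 / C) * (pi_oi + C), U)
            t = add_max_f_zero(m, (1.0 / C) * (pi_oi - C), U)
            obj += prob * (s - t - 1.0)                               # clipped per-node outcome
    m.setObjective(obj, GRB.MAXIMIZE)
    m.optimize()
    return [[Sx[r, j].X for j in range(N)] for r in range(3)]

Because the lemmas pin s, t, p, and q to their intended maxima for any feasible allocation, the objective value of this model equals the expected utility against the given mixed strategy, regardless of the optimization direction.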

V Numerical Evaluation

Refer to caption
Figure 5: Three different graphs: G1G_{1} (left), G2G_{2} (middle), G3G_{3} (right).
Refer to caption
Figure 6: DOA achieves the equilibrium (utility 0) in homogeneous robot allocation on G2G_{2}. The blue curve shows the upper utility of the game UuU_{u} and the red curve shows the lower utility of the game UlU_{l} (Algorithm 1). The green curve shows the expected utility of the game in each iteration.
Refer to caption
Figure 7: Comparison of the expected utilities by DOA and other baselines for homogeneous robot allocation on G2G_{2}.
Refer to caption
Figure 8: Illustration of the reachable sets and mixed strategies calculated by DOA. The top three subfigures show the mixed strategies of Player 1 and the bottom three subfigures show that of Player 2. In each subfigure, the light blue region is the next-step reachable set for the initial allocation (represented by the red and blue star for Player 1 and Player 2, respectively). The colorful dots on the reachable set are the mixed strategies for two players, each dot representing one pure strategy. The distribution of the mixed strategies is represented by the heatmap on the right-hand side of each subfigure. The heatmap, transitioning from blue to yellow, represents the relative probability ppminpmaxpmin\frac{p-p_{\texttt{min}}}{p_{\texttt{max}}-p_{\texttt{min}}} of a pure strategy, ranging from low to high.
Refer to caption
Figure 9: DOA achieves equilibrium (utility -0.527925) in CDH robot allocation on G2G_{2}. The blue curve shows the upper value of the game UuU_{u} and the red curve shows the lower value of the game UlU_{l} (Algorithm 1). The green curve shows the expected utility of the game in each iteration.
Refer to caption
((a)) On G1G_{1}
Refer to caption
((b)) On G2G_{2}
Refer to caption
((c)) On G3G_{3}
Figure 10: Comparison of the expected utilities by DOA and other baselines for CDH robot allocation on graphs (a) G1G_{1}, (b) G2G_{2}, and (c) G3G_{3}.
Refer to caption
Figure 11: Comparison of the expected utilities by DOA and other baselines for homogeneous robot allocation on G2G_{2} with 𝐈12=𝐈23=𝐈31=500\mathbf{I}_{12}=\mathbf{I}_{23}=\mathbf{I}_{31}=500.

In this section, we conduct numerical simulations to demonstrate the effectiveness of DOA in computing the Nash equilibrium for the robot allocation games with homogeneous, linear heterogeneous, and CDH robots. Following the settings in [20], we use the fraction of the robot population instead of the number of robots to represent the allocation amount. That is, we use 𝐒ij\mathbf{S}_{ij} to represent the fraction of the population of the ii-th type of robots allocated on the jj-th node. Then j𝐒ij=Ai=1\sum_{j}\mathbf{S}_{ij}=A_{i}=1 for the ii-th robot type. With homogeneous and linear heterogeneous robots, we consider two players allocating robots on a three-node directed graph G2=(𝒱2,2)G_{2}=(\mathcal{V}_{2},\mathcal{E}_{2}) with 𝒱2={1,2,3}\mathcal{V}_{2}=\{1,2,3\} and 2={(1,2),(2,3),(3,1)}\mathcal{E}_{2}=\{(1,2),(2,3),(3,1)\}, i.e., G2G_{2} in Fig. 5. For the CDH robot allocation, we consider three robot types as mentioned in Section I, e.g., R1R_{1} stands for a security robot, R2R_{2} stands for a network intrusion robot, and R3R_{3} stands for a combat robot. We evaluate three different cases. In the first case, two players allocate robots on a three-node directed graph G1=(𝒱1,1)G_{1}=(\mathcal{V}_{1},\mathcal{E}_{1}) with 𝒱1={1,2,3}\mathcal{V}_{1}=\{1,2,3\} and 1={(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)}\mathcal{E}_{1}=\{(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)\}, i.e., G1G_{1} in Fig. 5. In the second case, two players allocate robots on a three-node directed graph G2=(𝒱2,2)G_{2}=(\mathcal{V}_{2},\mathcal{E}_{2}) with 𝒱2={1,2,3}\mathcal{V}_{2}=\{1,2,3\} and 2={(1,2),(2,3),(3,1)}\mathcal{E}_{2}=\{(1,2),(2,3),(3,1)\}, i.e., G2G_{2} in Fig. 5. In the third case, two players allocate robots on a five-node directed graph G3=(𝒱3,3)G_{3}=(\mathcal{V}_{3},\mathcal{E}_{3}) with 𝒱3={1,2,3,4,5}\mathcal{V}_{3}=\{1,2,3,4,5\} and 3={(1,2),(1,5),(2,3),(2,4),(3,4),(4,3),(4,5),(5,1)}\mathcal{E}_{3}=\{(1,2),(1,5),(2,3),(2,4),(3,4),(4,3),(4,5),(5,1)\}, i.e., G3G_{3} in Fig. 5.
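For concreteness, the three graphs can be written down directly as directed edge lists, e.g., in Python; the node labels follow the text above.

# Directed edge lists for the three graphs in Fig. 5, as specified in the text.
G1_edges = [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]   # complete directed graph on 3 nodes
G2_edges = [(1, 2), (2, 3), (3, 1)]                           # directed 3-cycle
G3_edges = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (4, 3), (4, 5), (5, 1)]  # 5-node graph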

The initial robot distributions are denoted as 𝐝x\mathbf{d}_{x} for Player 1 and 𝐝y\mathbf{d}_{y} for Player 2. The reachable sets 𝒳(𝐝x)\mathcal{X}(\mathbf{d}_{x}) and 𝒴(𝐝y)\mathcal{Y}(\mathbf{d}_{y}) (light blue regions in Fig. 8) are calculated based on the extreme action spaces 𝒯^x\hat{\mathcal{T}}_{x} and 𝒯^y\hat{\mathcal{T}}_{y} as in (5) of Section III-B. The elements of the extreme action space can be used to form the boundary of the mixed strategies. The numbers of elements in the extreme action spaces 𝒯^x\hat{\mathcal{T}}_{x} and 𝒯^y\hat{\mathcal{T}}_{y} are denoted as txt_{x} and tyt_{y}, respectively. Then the best response strategy in the DOA (Algorithm 1, line 4) for Player 1 can be calculated by:

max𝐒x,λk=1KPy,ki=1Mj=1Nu(𝐒x,ij,𝐒y,ijk).\displaystyle\max_{\mathbf{S}_{x},\lambda}\sum_{k=1}^{K}P_{y,k}\sum_{i=1}^{M}\sum_{j=1}^{N}u(\mathbf{S}_{x,ij},\mathbf{S}_{y,ij}^{k}). (24)
s.t. j=1N𝐒x,ij=1,i{1,2,3};\displaystyle\sum_{j=1}^{N}\mathbf{S}_{x,ij}=1,i\in\{1,2,3\};
k=1txλki𝐓xk=𝐒x,i,i{1,2,3},𝐓xk𝒯^x;\displaystyle\sum_{k=1}^{t_{x}}\lambda_{k}^{i}\mathbf{T}_{x}^{k}=\mathbf{S}_{x,i},\enspace i\in\{1,2,3\},\enspace\mathbf{T}_{x}^{k}\in\hat{\mathcal{T}}_{x};
k=1txλki=1,i{1,2,3}.\displaystyle\sum_{k=1}^{t_{x}}\lambda_{k}^{i}=1,i\in\{1,2,3\}.
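A sketch of how the convex-combination constraints in (24) could be imposed with gurobipy is given below. Here T_x is assumed to be an array with t_x rows, each row being one extreme action of 𝒯^x\hat{\mathcal{T}}_{x} for a single robot type, and the weights λ\lambda are assumed non-negative so that the allocation stays in the convex hull of the extreme actions, i.e., in the reachable set. This could replace the simplex-only constraints of the earlier best-response sketch.

import gurobipy as gp

def add_reachability(m, Sx_row, T_x):
    # Constrain one robot type's allocation Sx_row (a list of N variables) to lie in the
    # convex hull of the extreme actions T_x (t_x rows, each of length N).
    t_x, N = len(T_x), len(Sx_row)
    lam = m.addVars(t_x, lb=0.0, ub=1.0)   # convex-combination weights (assumed non-negative)
    m.addConstr(gp.quicksum(lam[k] for k in range(t_x)) == 1.0)
    for j in range(N):
        m.addConstr(gp.quicksum(lam[k] * T_x[k][j] for k in range(t_x)) == Sx_row[j])
    return lam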

We set the outcome threshold CC in (20) to be 0.50.5 for the game with homogeneous and linear heterogeneous robots and 1.51.5 for the game with CDH robots. The intrinsic transformation ratios are 𝐈12=𝐈23=𝐈31=2\mathbf{I}_{12}=\mathbf{I}_{23}=\mathbf{I}_{31}=2. The maximum number of iterations is set to 15, and we record 30 trials of the outcomes for the following four baselines.

  • Baseline 1: Change 20%20\% of one player's strategy from the mixed strategy 𝚫X\mathbf{\Delta}_{X} calculated by DOA to random strategies within constraints. The opponent uses the best response strategy.

  • Baseline 2: Change 40%40\% of one player's strategy from the mixed strategy 𝚫X\mathbf{\Delta}_{X} calculated by DOA to random strategies within constraints. The opponent uses the best response strategy.

  • Baseline 3: Change 80%80\% of one player's strategy from the mixed strategy 𝚫X\mathbf{\Delta}_{X} calculated by DOA to random strategies within constraints. The opponent uses the best response strategy.

  • Baseline 4: Change 100%100\% of one player's strategy from the mixed strategy 𝚫X\mathbf{\Delta}_{X} calculated by DOA to random strategies within constraints. The opponent uses the best response strategy.

Suppose Player 1’s strategy after change is 𝚫X\mathbf{\Delta}_{X}^{\prime}, we denote the best response strategy of Player 2 for 𝚫X\mathbf{\Delta}_{X}^{\prime} as 𝐒yδy(𝚫X)\mathbf{S}_{y}^{\prime}\in\delta_{y}(\mathbf{\Delta}_{X}^{\prime}). According to Lemma 2 [37], any change from the equilibrium strategy of one player will benefit her opponent. This means if we change Player 1’s strategy, then the expected utility U(𝚫X,𝐒y)U(\mathbf{\Delta}_{X}^{\prime},\mathbf{S}_{y}^{\prime}) tends to decrease from equilibrium value U(𝚫X,𝚫Y)U(\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*}), namely U(𝚫X,𝐒y)<U(𝚫X,𝚫Y)U(\mathbf{\Delta}_{X}^{\prime},\mathbf{S}_{y}^{\prime})<U(\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*}). Particularly, we test two strategy variation schemes. In the first scheme, we adjust Player 2’s strategy according to the baselines and calculate Player 1’s best response strategy. The outcome of this scheme is represented by box plots with a darker shade in Figs. 7, 10, and 11. Conversely, in the second scheme, we modify Player 1’s strategy according to the baselines and calculate Player 2’s best response strategy. Its outcome is represented by box plots with a lighter shade in Figs. 7, 10, and 11.

V-A Homogeneous or Linear Heterogeneous Robots

With homogeneous robots, the utility function u()u(\cdot) in (24) is the classic function (1) introduced in Section II-B. In [37], the allocation game with homogeneous robots has been solved without considering graph constraints. For linear heterogeneous robot allocation, the approach is the same except that we first transform different robot types into one type, as mentioned in Section IV-B. Therefore, we study these two cases together.

Fig. 6 shows the convergence of DOA, where the blue curve represents the upper utility of the game UuU_{u} and the red curve represents the lower utility of the game UlU_{l} (Algorithm 1). In iteration 20, their values converge. According to Lemma 2, this demonstrates that the mixed strategies computed by DOA achieve the equilibrium.

We compare our approach (i.e., DOA) with the four baselines across ten trials where we randomly generate the initial allocations for the two players on G2G_{2}. Within each trial, we run each baseline, which involves randomness, 1000 times and illustrate the results via box plots. The results are shown in Fig. 7. Recall that Lemma 2 states that, in the context of an equilibrium mixed strategy set (𝚫X,𝚫Y)(\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*}), any variation in one party's strategy invariably shifts the outcome towards a direction more advantageous to the opposing party. Here, the utility function is u(𝐒x𝐒y)u(\mathbf{S}_{x}-\mathbf{S}_{y}) (Section II-B). Thus, a larger uu benefits Player 1 while a smaller uu benefits Player 2. Fig. 7 shows that the expected values via the first scheme (i.e., darker shades) are always greater than the value obtained by DOA. That is because, with the first scheme, Player 2's strategy is randomly changed (with different levels for different baselines). This change benefits Player 1, as reflected by the increase in the game's expected value. Analogously, the expected values via the second scheme (i.e., lighter shades) are always less than the value obtained by DOA. This is because, with the second scheme, Player 1's strategy is randomly changed, which benefits Player 2, as reflected by the decrease in the game's expected value. In other words, any variation from DOA leads to a worse utility for the corresponding player. Therefore, this further demonstrates that the mixed strategies calculated by DOA achieve the equilibrium of the game with homogeneous or linear heterogeneous robots.

V-B CDH Robots

For CDH robot allocation on three-node graphs G1G_{1} and G2G_{2}, we set the initial robot distribution for Player 1 𝐝x\mathbf{d}_{x} and for Player 2 𝐝y\mathbf{d}_{y} as:

𝐝x=[0.70.10.20.40.40.20.30.10.6],𝐝y=[0.20.20.60.350.150.50.40.20.4].\mathbf{d}_{x}=\begin{bmatrix}0.7&0.1&0.2\\ 0.4&0.4&0.2\\ 0.3&0.1&0.6\end{bmatrix},\quad\mathbf{d}_{y}=\begin{bmatrix}0.2&0.2&0.6\\ 0.35&0.15&0.5\\ 0.4&0.2&0.4\end{bmatrix}.

For the five-node graph G3G_{3}, we set the initial robot distributions as:

𝐝x\displaystyle\mathbf{d}_{x} =[0.20.30.10.10.30.30.10.40.10.10.20.10.10.10.5]\displaystyle=\begin{bmatrix}0.2&0.3&0.1&0.1&0.3\\ 0.3&0.1&0.4&0.1&0.1\\ 0.2&0.1&0.1&0.1&0.5\end{bmatrix}
𝐝y\displaystyle\mathbf{d}_{y} =[0.10.20.30.10.30.350.150.10.10.30.150.20.350.10.2].=\begin{bmatrix}0.1&0.2&0.3&0.1&0.3\\ 0.35&0.15&0.1&0.1&0.3\\ 0.15&0.2&0.35&0.1&0.2\end{bmatrix}.

The columns of 𝐝x\mathbf{d}_{x} correspond to different nodes and the rows of 𝐝x\mathbf{d}_{x} correspond to different robot types. The utility function is uCDH()u_{\texttt{CDH}}(\cdot) in (20).

Fig. 8 gives an example of the mixed strategies obtained by DOA. It is observed that for both players the mixed strategies calculated by DOA lie in the reachable set, which reflects the graph constraints. In Fig. 9, the upper utility of the game UuU_{u} (blue curve) and the lower utility of the game UlU_{l} (red curve) converge to the expected utility of the game after 40 iterations. According to Lemma 2, this demonstrates that the mixed strategies computed by DOA are the equilibrium (or optimal) strategies.

In Fig. 10, we compare the expected utilities by DOA and other baselines (formed via the two schemes) for CDH robot allocation on different graphs G1G_{1}, G2G_{2} and G3G_{3}. Similar to the results of homogeneous robot allocation, the expected utility U(𝚫X,𝚫Y)U(\mathbf{\Delta}_{X}^{*},\mathbf{\Delta}_{Y}^{*}) is nearly zero and any change from the DOA strategies of one player always benefits her opponent, which is in line with the properties of equilibrium (Lemma 2). In Fig. 11, we set the intrinsic transformation ratio as a large number, i.e., 𝐈12=𝐈23=𝐈31=500\mathbf{I}_{12}=\mathbf{I}_{23}=\mathbf{I}_{31}=500 to model the case of the absolute dominance heterogeneous robot allocation. The result also aligns with Lemma 2. Therefore, Fig. 10 and Fig. 11 further verify that the mixed strategies calculated by DOA are the equilibrium strategies of the game with the CDH robots.

In Fig. 7, Fig. 10-(b), (c), and Fig. 11, we observe that the expected utility calculated by DOA is not zero. Notably, the conventional equi-resource Colonel Blotto game is a zero-sum game [27], [26] whose utility at the equilibrium is zero. This is demonstrated by Fig. 10-(a), where the two players have the same number of robots and the robots can move flexibly since G1G_{1} is a complete graph. However, due to the graph constraints in Fig. 7, Fig. 10-(b), (c), and Fig. 11, the game between the two players is not symmetric under different initial robot distributions. This asymmetry leads to a non-zero utility value at the equilibrium.

VI Conclusions and Future Work

In this paper, we formulated a two-player robot allocation game on graphs. Then we leveraged the DOA to calculate the equilibrium of the games with homogeneous, linear heterogeneous, and CDH robots. Particularly, for CDH robot allocation, we designed a new transformation approach that offers reasonable comparisons between different robot types. Based on that, we designed a novel utility function to quantify the outcome of the game. Finally, we conducted extensive simulations to demonstrate the effectiveness of DOA in finding the Nash equilibrium.

Our first direction for future work is to include spatial and temporal factors in the robot allocation problem. The current setting assumes instantaneous transitions of robots between nodes. However, in realistic scenarios, spatial and temporal factors such as the distances between nodes and the varying transition speeds of the robots need to be considered. Second, we will incorporate elements of deception into the game, which differs from the current setting that assumes both players have real-time awareness of each other's strategy and allocation. This would encourage the players to strategically mislead their opponents by sacrificing some immediate gains to achieve larger benefits in the long term.

References

  • [1] B. Zhou, H. Xu, and S. Shen, “Racer: Rapid collaborative exploration with a decentralized multi-uav system,” IEEE Transactions on Robotics, 2023.
  • [2] Y. Tao, Y. Wu, B. Li, F. Cladera, A. Zhou, D. Thakur, and V. Kumar, “Seer: Safe efficient exploration for aerial robots using learning to predict information gain,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 1235–1241.
  • [3] A. Ahmadzadeh, J. Keller, G. Pappas, A. Jadbabaie, and V. Kumar, “An optimization-based approach to time-critical cooperative surveillance and coverage with uavs,” in Experimental Robotics: The 10th International Symposium on Experimental Robotics.   Springer, 2008, pp. 491–500.
  • [4] S. Bhattacharya, R. Ghrist, and V. Kumar, “Multi-robot coverage and exploration on riemannian manifolds with boundaries,” The International Journal of Robotics Research, vol. 33, no. 1, pp. 113–137, 2014.
  • [5] R. K. Ramachandran, L. Zhou, J. A. Preiss, and G. S. Sukhatme, “Resilient coverage: Exploring the local-to-global trade-off,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2020, pp. 11 740–11 747.
  • [6] V. D. Sharma, L. Zhou, and P. Tokekar, “D2coplan: A differentiable decentralized planner for multi-robot coverage,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 3425–3431.
  • [7] B. Grocholsky, J. Keller, V. Kumar, and G. Pappas, “Cooperative air and ground surveillance,” IEEE Robotics & Automation Magazine, vol. 13, no. 3, pp. 16–25, 2006.
  • [8] D. Thakur, M. Likhachev, J. Keller, V. Kumar, V. Dobrokhodov, K. Jones, J. Wurz, and I. Kaminer, “Planning for opportunistic surveillance with multiple robots,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.   IEEE, 2013, pp. 5750–5757.
  • [9] P. Dames, P. Tokekar, and V. Kumar, “Detecting, localizing, and tracking an unknown number of moving targets using a team of mobile robots,” The International Journal of Robotics Research, vol. 36, no. 13-14, pp. 1540–1553, 2017.
  • [10] L. Zhou and P. Tokekar, “Active target tracking with self-triggered communications in multi-robot teams,” IEEE Transactions on Automation Science and Engineering, vol. 16, no. 3, pp. 1085–1096, 2018.
  • [11] L. Zhou, V. Tzoumas, G. J. Pappas, and P. Tokekar, “Resilient active target tracking with multiple robots,” IEEE Robotics and Automation Letters, vol. 4, no. 1, pp. 129–136, 2018.
  • [12] L. Zhou and P. Tokekar, “Sensor assignment algorithms to improve observability while tracking targets,” IEEE Transactions on Robotics, vol. 35, no. 5, pp. 1206–1219, 2019.
  • [13] S. Mayya, R. K. Ramachandran, L. Zhou, V. Senthil, D. Thakur, G. S. Sukhatme, and V. Kumar, “Adaptive and risk-aware target tracking for robot teams with heterogeneous sensors,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 5615–5622, 2022.
  • [14] L. Zhou, V. D. Sharma, Q. Li, A. Prorok, A. Ribeiro, P. Tokekar, and V. Kumar, “Graph neural networks for decentralized multi-robot target tracking,” in 2022 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR).   IEEE, 2022, pp. 195–202.
  • [15] L. Zhou and V. Kumar, “Robust multi-robot active target tracking against sensing and communication attacks,” IEEE Transactions on Robotics, 2023.
  • [16] D. B. Reynolds, “Oil exploration game with incomplete information: an experimental study,” Energy Sources, vol. 23, no. 6, pp. 571–578, 2001.
  • [17] R. B. Myerson, “Incentives to cultivate favored minorities under alternative electoral systems,” American Political Science Review, vol. 87, no. 4, pp. 856–869, 1993.
  • [18] D. Kovenock and B. Roberson, “Coalitional Colonel Blotto games with application to the economics of alliances,” Journal of Public Economic Theory, vol. 14, no. 4, pp. 653–676, 2012.
  • [19] D. Kovenock and B. Roberson, “Conflicts with multiple battlefields,” Oxford Handbooks Online, 2010.
  • [20] S. Berman, A. Halász, M. A. Hsieh, and V. Kumar, “Optimized stochastic policies for task allocation in swarms of robots,” IEEE Transactions on Robotics, vol. 25, no. 4, pp. 927–937, 2009.
  • [21] A. Prorok, M. A. Hsieh, and V. Kumar, “The impact of diversity on optimal control policies for heterogeneous robot swarms,” IEEE Transactions on Robotics, vol. 33, no. 2, pp. 346–358, 2017.
  • [22] E. Borel, “La théorie du jeu et les équations intégralesa noyau symétrique,” Comptes rendus de l’Académie des Sciences, vol. 173, no. 1304-1308, p. 58, 1921.
  • [23] E. Borel, “The theory of play and integral equations with skew symmetric kernels,” Econometrica: journal of the Econometric Society, pp. 97–100, 1953.
  • [24] E. Borel, … Applications [de la théorie des probabilités] aux jeux de hasard: cours professé à la Faculté des sciences de Paris.   Gauthier-Villars, 1938, vol. 2.
  • [25] O. Gross and R. Wagner, A continuous Colonel Blotto game.   Rand Corporation, 1950.
  • [26] B. Roberson, “The Colonel Blotto game,” Economic Theory, vol. 29, no. 1, pp. 1–24, 2006.
  • [27] S. Behnezhad, S. Dehghani, M. Derakhshan, M. HajiAghayi, and S. Seddighin, “Faster and simpler algorithm for optimal strategies of Blotto game,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.
  • [28] A. Ahmadinejad, S. Dehghani, M. Hajiaghayi, B. Lucier, H. Mahini, and S. Seddighin, “From duels to battlefields: Computing equilibria of Blotto and other games,” Mathematics of Operations Research, vol. 44, no. 4, pp. 1304–1325, 2019.
  • [29] D. Shishika, Y. Guan, M. Dorothy, and V. Kumar, “Dynamic defender-attacker Blotto game,” in 2022 American Control Conference (ACC).   IEEE, 2022, pp. 4422–4428.
  • [30] M. Jain, D. Korzhyk, O. Vaněk, V. Conitzer, M. Pěchouček, and M. Tambe, “A double oracle algorithm for zero-sum security games on graphs,” in The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 1.   Citeseer, 2011, pp. 327–334.
  • [31] A. Szolnoki, M. Mobilia, L.-L. Jiang, B. Szczesny, A. M. Rucklidge, and M. Perc, “Cyclic dominance in evolutionary games: a review,” Journal of the Royal Society Interface, vol. 11, no. 100, p. 20140735, 2014.
  • [32] M. A. Nowak and K. Sigmund, “Evolutionary dynamics of biological games,” Science, vol. 303, no. 5659, pp. 793–799, 2004.
  • [33] J. C. Claussen and A. Traulsen, “Cyclic dominance and biodiversity in well-mixed populations,” Physical Review Letters, vol. 100, no. 5, p. 058104, 2008.
  • [34] H. B. McMahan, G. J. Gordon, and A. Blum, “Planning in the presence of cost functions controlled by an adversary,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 536–543.
  • [35] B. Bosansky, C. Kiekintveld, V. Lisy, and M. Pechoucek, “Iterative algorithm for solving two-player zero-sum extensive-form games with imperfect information,” in Proceedings of the 20th European Conference on Artificial Intelligence (ECAI), 2012, pp. 193–198.
  • [36] B. Bosansky, C. Kiekintveld, V. Lisy, J. Cermak, and M. Pechoucek, “Double-oracle algorithm for computing an exact Nash equilibrium in zero-sum extensive-form games,” in Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems, 2013, pp. 335–342.
  • [37] L. Adam, R. Horčík, T. Kasl, and T. Kroupa, “Double oracle algorithm for computing equilibria in continuous games,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 6, 2021, pp. 5070–5077.
  • [38] J. Bang-Jensen and G. Gutin, “Basic terminology, notation and results,” Classes of Directed Graphs, pp. 1–34, 2018.
  • [39] J. Gómez, E. A. Canale, and X. Muñoz, “Unilaterally connected large digraphs and generalized cycles,” Networks: An International Journal, vol. 42, no. 4, pp. 181–188, 2003.
  • [40] I. L. Glicksberg, “A further generalization of the Kakutani fixed point theorem, with application to Nash equilibrium points,” Proceedings of the American Mathematical Society, vol. 3, no. 1, pp. 170–174, 1952.
  • [41] B. Bosansky, C. Kiekintveld, V. Lisy, and M. Pechoucek, “An exact double-oracle algorithm for zero-sum extensive-form games with imperfect information,” Journal of Artificial Intelligence Research, vol. 51, pp. 829–866, 2014.
  • [42] S. Ovchinnikov, “Max-min representation of piecewise linear functions,” arXiv preprint math/0009026, 2000.
  • [43] Gurobi Optimization, LLC, “Gurobi Optimizer Reference Manual,” 2023. [Online]. Available: https://www.gurobi.com

Appendix

We use an example to explain the hardness caused by increasing the number of robot types to four, i.e., M=4M=4. Suppose there are four types of robots R1,R2,R3,R4R_{1},R_{2},R_{3},R_{4}, and the intrinsic matrix is

𝐈=[121/21/2121/21221/21]\mathbf{I}=\begin{bmatrix}1&2&-&1/2\\ 1/2&1&2&-\\ -&1/2&1&2\\ 2&-&1/2&1\end{bmatrix}.

Note that the entries 𝐈13,𝐈24,𝐈31,𝐈42\mathbf{I}_{13},\mathbf{I}_{24},\mathbf{I}_{31},\mathbf{I}_{42} (denoted as - in the intrinsic matrix) are not defined, as they are not involved in the calculation. For instance, on a node, Player 1's allocation is (u1,v1,w1,z1)=(1,0,1,0)(u_{1},v_{1},w_{1},z_{1})=(1,0,1,0) while Player 2's allocation is (u2,v2,w2,z2)=(0,1,0,1)(u_{2},v_{2},w_{2},z_{2})=(0,1,0,1). Therefore, the remaining robot distribution after subtraction is (δu,δv,δw,δz)=(1,1,1,1)(\delta u,\delta v,\delta w,\delta z)=(1,-1,1,-1). This scenario makes it challenging to determine a clear winner. Consider first the intrinsic relationship between R1R_{1} and R2R_{2} and that between R3R_{3} and R4R_{4}. Given that R1R_{1} equals 2R22R_{2}, eliminating Player 2's 1R21R_{2} against Player 1's 1R11R_{1} leaves a remaining balance of 0.5R10.5R_{1}. Similarly, eliminating Player 2's 1R41R_{4} against Player 1's 1R31R_{3} leaves a balance of 0.5R30.5R_{3}. The remaining robots, represented as (0.5,0,0.5,0)(0.5,0,0.5,0), indicate that Player 1 wins. However, if we instead consider the countering relationship between R2R_{2} and R3R_{3} and that between R4R_{4} and R1R_{1}, then following the same logic, the outcome becomes (0,0.5,0,0.5)(0,-0.5,0,-0.5), showing that Player 2 is the winner. It is important to notice that both approaches adhere to the elimination transformation rule in Section IV-C. This dichotomy shows the complexity and ambiguity in adjudicating a definitive winner in this context. It implies that in the case of M>3M>3, it is necessary to introduce new mechanisms for determining win or loss on each node.
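The two pairings can be retraced arithmetically; the following snippet only reproduces the bookkeeping of the example above (positive entries are Player 1's robots, negative entries are Player 2's).

# delta = (du, dv, dw, dz) = (1, -1, 1, -1): Player 1 holds 1 R1 and 1 R3, Player 2 holds 1 R2 and 1 R4.
# Pairing A: R1 vs R2 (R1 = 2 R2) and R3 vs R4 (R3 = 2 R4).
outcome_A = (1 - 1 / 2, 0, 1 - 1 / 2, 0)        # (0.5, 0, 0.5, 0): Player 1 wins
# Pairing B: R2 vs R3 (R2 = 2 R3) and R4 vs R1 (R4 = 2 R1).
outcome_B = (0, -(1 - 1 / 2), 0, -(1 - 1 / 2))  # (0, -0.5, 0, -0.5): Player 2 wins
print(outcome_A, outcome_B)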

Notably, the intrinsic matrix in the above example is not deliberately constructed; in fact, any intrinsic matrix that reflects the CDH relationships between different robot types will inevitably lead to such issues. In Section IV-C2, we prove that for M=3M=3, the outcome interface πoi(𝐱)=0\pi_{oi}(\mathbf{x})=0 is the demarcation surface between winning and losing. However, the winning condition described in Section IV-C does not hold for M>3M>3, and thus this demarcation surface no longer applies. We conjecture that the demarcation for M>3M>3 is a polytope, which we leave for future study.