This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Deviate or Not: Learning Coalition Structures
with Multiple-bit Observations in Games

Yixuan Even Xu Carnegie Mellon University. Email: [email protected]    Zhe Feng Google Research. Email: [email protected]    Fei Fang Carnegie Mellon University. Email: [email protected]
Abstract

We consider the Coalition Structure Learning (CSL) problem in multi-agent systems, motivated by the existence of coalitions in many real-world systems, e.g., trading platforms and auction systems. In this problem, there is a hidden coalition structure within a set of nn agents, which affects the behavior of the agents in games. Our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the hidden coalition structure. In particular, we consider the setting where in each round, we design and present a game together with a strategy profile to the agents, and receive a multiple-bit observation – for each agent, we observe whether or not they would like to deviate from the specified strategy. We show that we can learn the coalition structure in O(logn)O(\log n) rounds if we are allowed to design any normal-form game, matching the information-theoretical lower bound. For practicality, we extend the result to settings where we can only choose games of a specific format, and design algorithms to learn the coalition structure in these settings. For most settings, our complexity matches the theoretical lower bound up to a constant factor.

1 Introduction

Coalitions and collusive behavior are common in the real world. In auction systems, bidders may form coalitions to coordinate bids and exploit mechanisms for better odds of winning [1]. On ridesharing platforms like Uber and Lyft, some drivers might disconnect simultaneously to create an artificial shortage, triggering a price surge they can later benefit from [2, 3, 4]. While illegal collusion, like price fixing, is regulated by agencies such as the SEC in the United States, it remains challenging to eliminate, as seen in scandals like the LIBOR Scandal [5]. As AI systems become more advanced and widely used in domains like stock trading and autonomous driving, regulating these AI agents presents a growing challenge as seen by regulators and researchers [6, 7].

In response to such concerns, the problem of coalition structure learning (CSL) [8] has recently gained attention. In CSL, a platform aims to figure out which agents are in a coalition by designing a small number of games for the agents to play and make observations on the outcomes of these games. Notably, more than one coalition may exist simultaneously. In practice, the platform could represent a regulator, like an auction or ridesharing platform, aiming to detect collusion or design better mechanisms.

However, the existing study on CSL relies on a strong assumption that the platform can only observe whether a specified strategy profile is a Nash equilibrium (NE) in each round. With this single-bit observation oracle and other assumptions, optimal algorithms require Θ(nlogn)\Theta(n\log n) rounds of games to learn the coalition structure, where nn is the number of agents. Such an observation oracle is too restrictive for two reasons. First, for real-world systems with a large number of agents, Θ(nlogn)\Theta(n\log n) rounds is still prohibitively large. Second, it may be possible for the platform to know which agents would deviate from the specified strategy profile. For example, a ridesharing platform could observe which drivers disconnect from the platform given a pricing scheme.

In this paper, we aim to address this limitation and consider a multiple-bit observation oracle: the platform can observe an nn-dimensional binary vector in each round, where the ii-th bit indicates whether or not agent ii would like to deviate from the specified strategy profile. This oracle allows the platform to gain nn bits of information in each round, which is significantly more informative than the single-bit observation oracle, allowing the platform to learn the coalition structure with much fewer rounds. However, from a technical perspective, switching from the single-bit observation oracle to the multiple-bit observation oracle fundamentally changes the problem. With the single-bit observation, the platform can choose the game to use to get the next bit of observation, which means it is easy to ensure that each bit of observation brings in information. However, with multi-bit observation, the platform will get nn-bit information in a batch, and it is often inevitable that these nn-bits contain repetitive or highly correlated information. Therefore, the key challenge in algorithm design is how to design the games so that the nn bits of observation are more de-correlated and contain more information.

1.1 Our Results

Type of Games Lower Bound Complexity Section
Normal-Form log2nO(loglogn)\log_{2}n-O(\log\log n) log2n+2\log_{2}n+2 Section 3
Congestion log2nO(loglogn)\log_{2}n-O(\log\log n) log2n+2\log_{2}n+2 Appendix A
Graphical max(log2n,n/d)O(loglogn)\max(\log_{2}n,n/d)-O(\log\log n) 2n/d+2log2d+12n/d+2\log_{2}d+1 Appendix B
Auctions log2nO(loglogn)\log_{2}n-O(\log\log n) (1+log2n)(1+c)+1(1+\log_{2}n)(1+c)+1 Section 4
Table 1: Summary of results.

Depending on the specific scenario of application, sometimes external constraints may limit the type of games that can be designed. For example, in an auction platform, the algorithm may only use auctions. Therefore, we study different settings of the type of games the algorithm may design. We summarize our results for different settings in Table 1.

Information-theoretical lower bound. We first show that any algorithm that learns the coalition structure requires at least log2nO(loglogn)\log_{2}n-O(\log\log n) rounds regardless of the type of games used, where nn is the number of agents.

Normal-form games. Normal-form games are a general class of games. We consider the case when the platform can design any normal-form game and there are no external constraints on the type of games that can be designed. We show an algorithm, simultaneous binary search, that can learn the coalition structure in log2n+2\log_{2}n+2 rounds using directed prisoner’s dilemmas, a type of designed games for agents to play to elicit coalition structures. The number of rounds matches the lower bound almost exactly.

Congestion games. Congestion games [9] are a well-studied type of games that model the allocation of resources, and has demonstrated its importance in real-world transportation system. For CSL in congestion games, we show that with a different type of games – directed Braess’s paradoxes – the simultaneous binary search algorithm can be adapted to learn the coalition structure with congestion games. The number of rounds required is also log2n+2\log_{2}n+2.

Graphical games. Graphical games [10] are another thoroughly researched class of games. Investigating CSL with graphical games can be useful in contexts like social networks. We consider degree-dd graphical games, i.e., each agent has at most dd neighbors in the underlying graph. In the negative direction, we show that any algorithm for the problem requires at least n1d\lceil\frac{n-1}{d}\rceil rounds. In the positive direction, we design an algorithm that couples simultaneous binary search with the idea of block decomposition, i.e., to divide the agents into blocks of size d2\lfloor\frac{d}{2}\rfloor. See the construction of Algorithms 5 and 5 for details of this technique. Our algorithm learns the coalition structure in 2n/d+2log2d+12n/d+2\log_{2}d+1 rounds. Given the two lower bounds, this algorithm is optimal up to a constant factor.

Auctions. Finally, we study CSL with auctions, in particular, second-price auctions with personalized reserves, which are widely adopted in online advertising systems. The format of auctions is much more restrictive and challenging than the previous types of games since there is only one winner in an auction, and it is difficult to elicit information from the other players. Our algorithm in this case learns the coalition structure in (1+log2n)(1+c)+1(1+\log_{2}n)(1+c)+1 rounds, where cc is the size of the largest coalition. When the coalitions are small, the number of rounds required is close to the lower bound.

1.2 Related Work

Our work lies in the general research direction of learning in games [11]. In particular, our work is closely related to the line of research on Inverse Game Theory, e.g., [12, 13, 14, 15], where these previous papers aim to infer the parameters and underlying payoffs of a game by observing the behavior of players. Our work adopts this perspective but seeks to understand the coalition structure in games, which is not considered in prior works. By strategically and adaptively selecting defender strategies and observing the attacker’s best responses, [16, 17, 18] show that one can uncover the underlying utility functions driving attacker behavior in Stackelberg security games. Our work learns in a similar adaptive manner but focuses on the coalition structure among agents. In addition, learning coalition structure has a fundamental difference from the above existing literature since the space of all possible coalition structures is exponentially large.

The closest related work is [8], where they focus on learning coalition structure through a single bit feedback, i.e., the game designed in the learning process only returns one-bit information – whether a specified strategy profile is a NE or not. In this work, we strictly generalize the model proposed by [8] that allows us to observe multiple-bit feedback rather than just binary feedback. This flexibility strictly increases the set of possible observations and thus the difficulty of designing the games. Moreover, [19, 20] provide general approaches to detect one (collusion) coalition in multi-agent games, whereas, our work aims to find the entire coalition structure in games.

There is also a line of research on learning with revealed preference feedback, e.g., [21, 22, 23, 24, 25, 26], in which they target to predict the strategic agents’ behavior or optimize the profit of the decision maker, through the revealed preference feedback from strategic agents. In this paper, we assume that we can observe the best response feedback of the agents for the games, where each game is carefully designed to elicit information on the coalition structure within a set of agents. Loosely related work includes no-regret learning in games [27, 28]. Indeed, if there is a cost to perform one game to learn coalition structure, our learning process achieves constant regret given the number of rounds needed to identify all coalitions only depends on the number of agents and the structure of the underlying games.

2 Notations and Preliminaries

Consider a set N={1,2,,n}N=\{1,2,\dots,n\} of nn strategic agents. A coalition SNS\subseteq N is a nonempty subset of the agents, in which the agents can coordinate with each other in games. We assume there is a coalition structure 𝒮={S1,S2,,Sm}\mathcal{S}^{*}=\{{S_{1}}^{*},{S_{2}}^{*},\dots,{S_{m}}^{*}\} among these agents, which is a set partition of NN, and each agent in NN belongs to exactly one of these mutually disjoint coalitions. We use [i]𝒮[i]_{\mathcal{S}^{*}} to denote the coalition containing agent ii under 𝒮\mathcal{S}^{*}. 𝒮\mathcal{S}^{*} is not public information, but agents in each coalition know the other agents in the same coalition. In the Coalition Structure Learning (CSL) problem, our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the underlying coalition structure 𝒮\mathcal{S}^{*}. Specifically, we interact with the agents in a sequence of rounds. Each round, we design a game 𝒢\mathcal{G} along with a strategy profile Σ=(σ1,σ2,,σn)\Sigma=(\sigma_{1},\sigma_{2},\dots,\sigma_{n}) in 𝒢\mathcal{G} for the agents, and we make an observation 𝒪(𝒢,Σ)\mathcal{O}(\mathcal{G},\Sigma) about the strategy profile Σ\Sigma in 𝒢\mathcal{G}. The task is to learn the underlying coalition structure 𝒮\mathcal{S}^{*}.

Behavior model.

We assume that the agents are rational and strategic. When playing a game 𝒢\mathcal{G}, agents in the same coalition under 𝒮\mathcal{S}^{*} can coordinate with each other to choose actions and will share rewards after the game. Thus, they effectively play as a joint player whose action space and utility are, respectively, the Cartesian product of the action spaces and the sum of utilities of the agents in that coalition. Since the agents are not aware of the underlying coalition structure beyond their own coalitions, the information in the game is incomplete. We do not assume the agents’ full behavior model, but focus on the agents’ deviation decisions from the specified strategy profile.

Multiple-bit observation oracle.

In this paper, we consider the multiple-bit observation oracle as opposed to the single-bit observation oracle considered in [8]. The observation 𝒪(𝒢,Σ)\mathcal{O}(\mathcal{G},\Sigma) is an nn-dimensional binary vector, where the ii-th bit 𝒪i(𝒢,Σ){True,False}\mathcal{O}_{i}(\mathcal{G},\Sigma)\in\{\mathrm{True},\mathrm{False}\} indicates whether or not agent ii would like to deviate from the specified strategy σi\sigma_{i} when the coalition is best responding as a whole. In particular, given a game 𝒢\mathcal{G} and a strategy profile Σ\Sigma, for each coalition S𝒮S\in\mathcal{S}^{*}, agents in SS would first determine whether ΣS\Sigma_{S} is a joint best response to the other agents’ specified strategy ΣS\Sigma_{-S} in 𝒢\mathcal{G}. If it is, then 𝒪i=False\mathcal{O}_{i}=\mathrm{False} for each agent iSi\in S, i.e., they would decide not to deviate. Otherwise, they would compute a possible joint best response BR(S,Σ)\mathrm{BR}(S,\Sigma) to the other agents’ specified strategy ΣS\Sigma_{-S} in 𝒢\mathcal{G}. Then, they would compute the deviation decision 𝒪i=𝕀[BRi(S,Σ)σi]\mathcal{O}_{i}=\mathbb{I}\left[\vphantom{\sum}\mathrm{BR}_{i}(S,\Sigma)\neq\sigma_{i}\right] for each agent iSi\in S. The observation 𝒪(𝒢,Σ)\mathcal{O}(\mathcal{G},\Sigma) is the concatenation of the decisions of all agents in NN. If there are multiple best responses for a coalition, we assume that the agents in that coalition would choose any of them in the worst case. We will use “query the observation oracle” or “query (𝒢,Σ)(\mathcal{G},\Sigma)” to refer to the process of designing a game 𝒢\mathcal{G} and a strategy profile Σ\Sigma and receiving the observation 𝒪(𝒢,Σ)\mathcal{O}(\mathcal{G},\Sigma).

2.1 Lower Bound of the Number of Rounds

We first show a lower bound on the number of rounds required to solve the Multiple-bit CSL problem. The lower bound is based on an information-theoretical argument similar to [8]. Note that this lower bound holds for any type of games that the algorithm may design.

Theorem 2.1.

Any algorithm that solves the Multiple-bit CSL problem requires at least log2nO(loglogn)\log_{2}n-O(\log\log n) rounds of interactions with the agents in the worst case.

Proof.

In each round, the algorithm receives at most nn bits of information. As the number of possible set partitions of NN is the Bell number BnB_{n}, to distinguish between them, we need at least log2Bn=nlog2nO(nlog2log2n)\left\lceil\log_{2}B_{n}\right\rceil=n\log_{2}n-O(n\log_{2}\log_{2}n) bits of information. The equation follows from the asymptotic expression of Bell number established in [29]. The theorem then follows. ∎

3 Multiple-bit CSL with Normal-form Games

We show in this section an algorithm that solves Multiple-bit CSL with normal-form games in log2n+2\log_{2}n+2 rounds. Note that the number of rounds matches the lower bound in Theorem 2.1 up to O(loglogn)O(\log\log n). The main idea of the algorithm is to use binary search to find another agent in the same coalition for each agent simultaneously. We start by defining game-strategy pairs and a special pair that we call the directed prisoner’s dilemma, which we use in the algorithm.

Definition 3.1.

A game-strategy pair (𝒢,Σ)(\mathcal{G},\Sigma) is a nn-player normal-form game along with a pure strategy profile Σ\Sigma in 𝒢\mathcal{G}. Here, Σ\Sigma is called the specified strategy profile of (𝒢,Σ)(\mathcal{G},\Sigma).

Definition 3.2.

For iN,jNi\in N,j\in N, a directed prisoner’s dilemma 𝒫(i,j)\mathcal{P}(i,j) is a game-strategy pair (𝒢,Σ)(\mathcal{G},\Sigma). In game 𝒢\mathcal{G}, agent jj has two actions {C,D}\{\textup{C},\textup{D}\}, and all other agents only have one action {D}\{\textup{D}\}. If agent jj plays C, then ii and jj receive utility 22 and 1-1 respectively. Otherwise they both receive 0 utility. All other agents always receive 0 utility. Σ\Sigma is the pure strategy profile where all agents play D.

CDD(2,1)(0,0)\begin{array}[]{|c|c|c|}\hline\cr&\textup{C}&\textup{D}\\ \hline\cr\textup{D}&(2,-1)&(0,0)\\ \hline\cr\end{array}

Figure 1: The payoff table of agents ii and jj in the directed prisoner’s dilemma 𝒫(i,j)\mathcal{P}(i,j). Agent ii is the row player and agent jj is the column player. The others only have one action and are not shown in the table.

Given the construction of directed prisoner’s dilemma, we have the following observation.

Lemma 3.1.

Let i,jNi,j\in N and 𝐎=𝒪(𝒫(i,j))\mathbf{O}=\mathcal{O}(\mathcal{P}(i,j)). Then, Oj=TrueO_{j}=\mathrm{True} if and only if agent ii and agent jj are in the same coalition under 𝒮\mathcal{S}^{*}. Moreover, Ok=FalseO_{k}=\mathrm{False} for kN{j}k\in N\setminus\{j\}.

Proof.

If agent ii and agent jj are in the same coalition under 𝒮\mathcal{S}^{*}, then playing C is a dominant strategy for agent jj’s coalition. Otherwise, playing D is a dominant strategy. Therefore, Oj=TrueO_{j}=\mathrm{True} if and only if agent ii and agent jj are in the same coalition under 𝒮\mathcal{S}^{*}. Moreover, since all agents other than jj only have one action {D}\{\textup{D}\}, Ok=FalseO_{k}=\mathrm{False} for kN{j}k\in N\setminus\{j\}. ∎

Lemma 3.1 shows that we can use directed prisoner’s dilemmas to determine whether two agents are in the same coalition under 𝒮\mathcal{S}^{*}. However, querying a single directed prisoner’s dilemma is not efficient as we only get 11 bit of information. To utilize the information from the observation oracle more efficiently, we will need to consider the product of multiple directed prisoner’s dilemmas.

Definition 3.3 ([8]).

Let (𝒢1,Σ1),(𝒢2,Σ2)(\mathcal{G}_{1},\Sigma_{1}),(\mathcal{G}_{2},\Sigma_{2}) be two game-strategy pairs where Ax,i,ux,iA_{x,i},u_{x,i} are the action set and utility function of agent ii in 𝒢x\mathcal{G}_{x} respectively for x{1,2}x\in\{1,2\}. Let Σ1=(σ1,i)iN\Sigma_{1}=(\sigma_{1,i})_{i\in N} and Σ2=(σ2,i)iN\Sigma_{2}=(\sigma_{2,i})_{i\in N}. The product of (𝒢1,Σ1)(\mathcal{G}_{1},\Sigma_{1}) and (𝒢2,Σ2)(\mathcal{G}_{2},\Sigma_{2}) is a game-strategy pair (𝒢p,Σp)(\mathcal{G}_{p},\Sigma_{p}). Here, 𝒢p\mathcal{G}_{p} is a normal-form game with action set A1,i×A2,iA_{1,i}\times A_{2,i} and utility function u1,i+u2,iu_{1,i}+u_{2,i} for each iNi\in N. Σp,i=(σ1,i,σ2,i)iN\Sigma_{p,i}=(\sigma_{1,i},\sigma_{2,i})_{i\in N}. We denote the product of (𝒢1,Σ1)(\mathcal{G}_{1},\Sigma_{1}) and (𝒢2,Σ2)(\mathcal{G}_{2},\Sigma_{2}) as (𝒢1,Σ1)×(𝒢2,Σ2)(\mathcal{G}_{1},\Sigma_{1})\times(\mathcal{G}_{2},\Sigma_{2}) or x=12(𝒢x,Σx)\prod_{x=1}^{2}(\mathcal{G}_{x},\Sigma_{x}). The game-strategy pairs (𝒢1,Σ1),(𝒢2,Σ2)(\mathcal{G}_{1},\Sigma_{1}),(\mathcal{G}_{2},\Sigma_{2}) are called the factors of (𝒢p,Σp)(\mathcal{G}_{p},\Sigma_{p}).

Lemma 3.2.

Let {(ix,jx)x{1,2,,k}}\{(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}\} be kk pairs of agents and 𝐎=𝒪(x=1k𝒫(ix,jx))\mathbf{O}=\mathcal{O}(\prod_{x=1}^{k}\mathcal{P}(i_{x},j_{x})). Then, Oj=TrueO_{j}=\mathrm{True} if and only if there exists x{1,2,,k} such that jx=jx\in\{1,2,\dots,k\}\textbf{ such that }j_{x}=j and ix[j]𝒮i_{x}\in[j]_{\mathcal{S}^{*}}.

Proof.

By Definition 3.3, playing the product game is equivalent to separately playing each factor game, and summing up the resulting utilities of each agent. Therefore, agent jj decides to deviate in the product game x=1k𝒫(ix,jx)\prod_{x=1}^{k}\mathcal{P}(i_{x},j_{x}) if and only if agent jj decides to deviate in at least one factor game. The result then follows from Lemma 3.1. ∎

A graphical interpretation of Lemma 3.2 is as follows. When we query the game-strategy pair x=1k𝒫(ix,jx)\prod_{x=1}^{k}\mathcal{P}(i_{x},j_{x}), consider a directed graph where each vertex represents an agent and each edge (jx,ix)(j_{x},i_{x}), x{1,2,,k}x\in\{1,2,\dots,k\} represents 𝒫(ix,jx)\mathcal{P}(i_{x},j_{x}). The observation tells us for each vertex if any of its outgoing neighbors in this constructed graph is in the same coalition. With Lemma 3.2, we are ready for our Algorithm 1 for Multiple-bit CSL with normal-form games.

1
Input: The number of agents nn and the multiple-bit observation oracle 𝒪\mathcal{O}.
Output: A coalition structure 𝒮\mathcal{S} of the agents.
2
3Query iN,jN,i<j𝒫(i,j)\prod_{i\in N,j\in N,i<j}\mathcal{P}(i,j) and observe 𝐎\mathbf{O};
4 Let Tj{1,2,,j1} if Oj=True else T_{j}\leftarrow\{1,2,\dots,j-1\}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }\varnothing  for each jN\textbf{ for each }j\in N;
5 while jN such that |Tj|2\exists j\in N\textbf{ such that }|T_{j}|\geq 2 do
6       Let Lj{the smallest |Tj|2 elements in Tj}L_{j}\leftarrow\{\text{the smallest $\lfloor\frac{|T_{j}|}{2}\rfloor$ elements in $T_{j}$}\} for each jNj\in N;
7       Let RjTjLjR_{j}\leftarrow T_{j}\setminus L_{j} for each jNj\in N;
8       Query jN,iLj𝒫(i,j)\prod_{j\in N,i\in L_{j}}\mathcal{P}(i,j) and observe 𝐎\mathbf{O};
9       Let TjLj if Oj=True else RjT_{j}\leftarrow L_{j}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }R_{j}  for each jN\textbf{ for each }j\in N;
10      
11Let 𝒮{{1},{2},,{n}}\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\};
12 for jN such that |Tj|j\in N\textbf{ such that }|T_{j}|\neq\varnothing do
13       Let ithe only element in Tji\leftarrow\text{the only element in $T_{j}$};
14       Merge [i]𝒮[i]_{\mathcal{S}} and [j]𝒮[j]_{\mathcal{S}} in 𝒮\mathcal{S};
15      
16return 𝒮\mathcal{S};
Algorithm 1 Simultaneous Binary Search

Intuitively, given an agent jj, if we want to find another agent i(i<j)i\ (i<j) in the same coalition with jj, we can query a game-strategy pair kTj𝒫(k,j)\prod_{k\in T_{j}}\mathcal{P}(k,j) where Tj={1,2,,j1}T_{j}=\{1,2,\ldots,j-1\} initially. According to Lemma 3.2, we will know whether there is such an agent in TjT_{j} from the jj-th bit of the observation. If there is such an agent, we can then bisect the set TjT_{j} and apply binary search to find this agent. Moreover, this binary search can be done simultaneously for different agents. For example, given two agents jj and jj^{\prime}, we can construct the game-strategy pairs kTj𝒫(k,j)×kTj𝒫(k,j)\prod_{k\in T_{j}}\mathcal{P}(k,j)\times\prod_{k\in T_{j^{\prime}}}\mathcal{P}(k,j^{\prime}) and apply binary search at the same time on TjT_{j} and TjT_{j^{\prime}} using the jj-th and jj^{\prime}-th bits of the observation. This is because the jj-th bit in the observation vector only depends on whether there exists another agent in TjT_{j} in the same coalition with agent jj, and the same argument applies to agent jj^{\prime}.

Algorithm 1 leverages this intuition and works in two stages. In the first stage (Lines 1 to 7), we try to find for each agent jj the smallest index of the agents in [j]𝒮[j]_{\mathcal{S}^{*}}. To do this, we first query iN,jN,i<j𝒫(i,j)\prod_{i\in N,j\in N,i<j}\mathcal{P}(i,j) (Line 1). In this game, agent jj has the incentive to deviate only when another agent ii with a smaller index than jj is in the same coalition with jj. Therefore, using the jj-th bit from the observation, we can determine whether agent jj is the agent with the smallest index in [j]𝒮[j]_{\mathcal{S}^{*}} (Line 2). For agents who are not, we then use binary search to locate the smallest indexed agents in their coalitions simultaneously (Lines 3 to 6). In the second stage (Lines 8 to 11), it merges each coalition in 𝒮\mathcal{S}^{*} together according to the result of the first stage.

We provide an example of Algorithm 1 in Fig. 2.

1324
(a) First query
1324T3T_{3}T4T_{4}
(b) Second query
1324T3T_{3}T4T_{4}
(c) Final result
Figure 2: Example execution of Algorithm 1 when 𝒮={{1,4},{2,3}}\mathcal{S}^{*}=\{\{1,4\},\{2,3\}\}. The vertices represent the agents, the edges represent the directed prisoner’s dilemmas that the algorithm queries each time, and the dashed rectangles represent the sets TjT_{j}. In the first query, the algorithm queries iN,jN,i<j𝒫(i,j)\prod_{i\in N,j\in N,i<j}\mathcal{P}(i,j) (Line 1) as shown in (a). Using the observations, the algorithm sets T1=T2=T_{1}=T_{2}=\varnothing, T3={1,2}T_{3}=\{1,2\}, and T4={1,2,3}T_{4}=\{1,2,3\} (Line 2). In the second query, T3T_{3} is bisected into L3={1},R3={2}L_{3}=\{1\},R_{3}=\{2\}, T4T_{4} is bisected into L4={1},R4={2,3}L_{4}=\{1\},R_{4}=\{2,3\}, and the algorithm queries jN,iLj𝒫(i,j)\prod_{j\in N,i\in L_{j}}\mathcal{P}(i,j) (Lines 3 to 6) as shown in (b). Using the observations, the algorithm sets T3=L3={1}T_{3}=L_{3}=\{1\} and T4=R4={2,3}T_{4}=R_{4}=\{2,3\} (Line 7) as shown in (c). Finally, the algorithm merges {1}\{1\} and {4}\{4\}, and {2}\{2\} and {3}\{3\} together to recover the coalition structure (Lines 8 to 11).
Theorem 3.1.

Algorithm 1 solves Multiple-bit CSL with normal-form games in log2n+2\log_{2}n+2 rounds.

Proof.

We first show the correctness of the algorithm. To do this, we will show two claims: (i) after Lines 1 to 7, for each jNj\in N, if jj is the agent with the smallest index in [j]S[j]_{S^{*}}, Tj=T_{j}=\varnothing, otherwise, TjT_{j} contains only the smallest index of the agents in [j]S[j]_{S^{*}}; (ii) after Lines 8 to 11, 𝒮=𝒮\mathcal{S}=\mathcal{S}^{*}.

For (i), if jj is the agent with the smallest index in [j]S[j]_{S^{*}}, then by Lemma 3.2, Oj=FalseO_{j}=\mathrm{False}, and Tj=T_{j}=\varnothing on Line 2. Throughout the algorithm, TjT_{j} remains \varnothing. Otherwise, TjT_{j} contains the smallest index of the agents in [j]S[j]_{S^{*}} after Line 2. In the while loop (Line 3 to 7), TjT_{j} is updated to LjL_{j} if Oj=TrueO_{j}=\mathrm{True}, and to RjR_{j} otherwise. Since LjL_{j} contains the smallest |Tj|/2\lfloor|T_{j}|/2\rfloor elements in TjT_{j}, by Lemma 3.2, Oj=TrueO_{j}=\mathrm{True} if and only if the smallest index of the agents in [j]S[j]_{S^{*}} is in LjL_{j}. Therefore, after one iteration of the loop, TjT_{j} still contains only the smallest index of the agents in [j]S[j]_{S^{*}}, while the size of TjT_{j} is halved. The loop terminates after |Tj|=1|T_{j}|=1, and TjT_{j} contains only the smallest index of the agents in [j]S[j]_{S^{*}}.

For (ii), since every agent jj is either the agent with the smallest index in [j]S[j]_{S^{*}} or merged with that agent, each coalition is merged in 𝒮\mathcal{S} after Lines 8 to 11. Thus 𝒮\mathcal{S} becomes the same as 𝒮\mathcal{S}^{*}.

Next, we show the complexity of the algorithm. The while loop in Lines 3 to 7 runs at most log2n\lceil\log_{2}n\rceil times, and each iteration requires 11 query. Together with the query on Line 1, the total number of queries is at most log2n+1log2n+2\lceil\log_{2}n\rceil+1\leq\log_{2}n+2. ∎

Multiple-bit CSL with Other Types of Games.

A natural next step after this section is to consider the case where the algorithm can only design games of a specific type. We will show that the idea of simultaneous binary search from Algorithm 1 can be adapted to solve the Multiple-bit CSL problem with congestion games (Appendix A), graphical games (Appendix B), or auctions (Section 4). The adaptation to congestion games is natural once we construct a new game-strategy pair that is similar in spirit to the directed prisoner’s dilemma. However, the adaptation to graphical games and auctions requires more innovations in algorithm design.

4 Multiple-bit CSL with Auctions

We consider in this section the case of Multiple-bit CSL with auctions. The auction format we consider is theoretically well-studied second-price auctions with personalized reserves [30], which is widely used in practice. We first formally define this format.

Definition 4.1.

A second-price auction with personalized reserves 𝒢\mathcal{G} is a tuple (N,𝐯,𝐫)(N,\mathbf{v},\mathbf{r}) where NN is the set of agents, 𝐯=(v1,v2,,vn)\mathbf{v}=(v_{1},v_{2},\dots,v_{n}) is the valuation vector, and 𝐫=(r1,r2,,rn)\mathbf{r}=(r_{1},r_{2},\dots,r_{n}) is the reserve price vector. The auction proceeds as follows: Each agent ii submits a bid σi0\sigma_{i}\in\mathbb{R}_{\geq 0} and the agent with the highest bid wins the auction with ties broken at uniform random. If the winner ii bids greater than rir_{i}, then the winner gets the item at a price equal to the maximum between the second highest bid and rir_{i}. The item can be reallocated among [i]𝒮[i]_{\mathcal{S}^{*}}. Otherwise, the item is not allocated.

Our algorithm for the Multiple-bit CSL problem with auctions works in (1+log2n)(1+c)+1(1+\log_{2}n)(1+c)+1 rounds, where cc is size of the largest coalition in 𝒮\mathcal{S}^{*}. To introduce the algorithm, we first define the game-strategy pairs that we will use in the algorithm, the auction game gadgets, where we will specify the valuation and reserve prices of every agents in each partition bucket.

Definition 4.2.

Let {X,Y,Z}\{X,Y,Z\} be a partition of NN, i.e., XY=XZ=YZ=X\cap Y=X\cap Z=Y\cap Z=\varnothing and XYZ=NX\cup Y\cup Z=N. An auction gadget 𝒜(X,Y,Z)\mathcal{A}(X,Y,Z) is a game-strategy pair (𝒢,Σ)(\mathcal{G},\Sigma) where 𝒢=(N,𝐯,𝐫)\mathcal{G}=(N,\mathbf{v},\mathbf{r}) is a second-price auction with personalized reserves, and Σ\Sigma is a specified bid vector for the agents, where

vi={1(iX)0(iY)0(iZ),ri={1(iX)0(iY)1(iZ),andσi=0v_{i}=\left\{\begin{array}[]{ll}1&(i\in X)\\ 0&(i\in Y)\\ 0&(i\in Z),\end{array}\right.r_{i}=\left\{\begin{array}[]{ll}1&(i\in X)\\ 0&(i\in Y)\\ 1&(i\in Z),\end{array}\right.\text{and}\ \ \ \sigma_{i}=0

Note that in the above auction game gadgets, we do not require each agent to know all the other agents’ valuations so that the valuation of each agent is still private across strategic agents. Only agents in the same coalition know each other’s true valuation.

Lemma 4.1.

For {X,Y,Z}\{X,Y,Z\} that is a partition of NN, let 𝐎=𝒪(𝒜(X,Y,Z))\mathbf{O}=\mathcal{O}(\mathcal{A}(X,Y,Z)). For each coalition S𝒮S\in\mathcal{S}^{*}, if SXS\cap X\neq\varnothing and SYS\cap Y\neq\varnothing, then, there is exactly one agent iSYi\in S\cap Y with Oi=TrueO_{i}=\mathrm{True} and Oj=FalseO_{j}=\mathrm{False} for all jS{i}j\in S\setminus\{i\}. Otherwise, Oi=FalseO_{i}=\mathrm{False} for all iSi\in S.

Proof.

For each coalition S𝒮S\in\mathcal{S}^{*}. We first consider the case where SXS\cap X\neq\varnothing and SYS\cap Y\neq\varnothing. Let agent iSXi\in S\cap X and agent jSYj\in S\cap Y. If the coalition follows the specified strategy profile ΣS\Sigma_{S}, then the item will not be allocated, so the utility of the agents in SS is 0. However, If agent jj raises its bid to σj>0\sigma^{\prime}_{j}>0, then it becomes the winner of the auction, and the item will be allocated to jj with price max{secondhighestbid,rj}=0\max\{second\ highest\ bid,r_{j}\}=0. The item can then be reallocated to agent ii, giving utility 11 to the agents in SS. Thus, ΣS\Sigma_{S} is not a joint best response to ΣS\Sigma_{-S}. Moreover, for agents in SS, the only joint best response is to let exactly one agent in SYS\cap Y raise its bid to positive, and the other agents in SS keep their bids at 0. Thus, the observation 𝐎\mathbf{O} is as described in the lemma.

Otherwise, either SX=S\cap X=\varnothing or SY=S\cap Y=\varnothing. If SX=S\cap X=\varnothing, then no one in SS has a positive valuation for the item, so the utility of the agents in SS cannot be greater than 0. If SY=S\cap Y=\varnothing, then no one in SS has a reserve price that is smaller than 11, which is the largest possible valuation. Thus, the utility of the agents in SS cannot be greater than 0, either. In both cases, keeping the bids at 0 is a joint best response to ΣS\Sigma_{-S}, so the observation 𝐎\mathbf{O} is as described in the lemma. ∎

The auction gadgets are important building blocks for our algorithm that works in (1+log2n)(1+c)+1(1+\log_{2}n)(1+c)+1 rounds. To help demonstrate how they can be used to solve Multiple-bit CSL, we present a simpler algorithm that works in n1n-1 rounds using the auction gadgets below.

1
Input: The number of agents nn and the multiple-bit observation oracle 𝒪\mathcal{O}.
Output: A coalition structure 𝒮\mathcal{S} of the agents.
2
3Let TNT\leftarrow N;
4 Let 𝒮{{1},{2},,{n}}\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\};
5 while |T|2|T|\geq 2 do
6       Let ithe first element in Ti\leftarrow\text{the first element in $T$};
7       while |T|2|T|\geq 2 and iTi\in T do
8             Let X{i},YT{i}X\leftarrow\{i\},Y\leftarrow T\setminus\{i\} and ZNSZ\leftarrow N\setminus S;
9             Query 𝒜(X,Y,Z)\mathcal{A}(X,Y,Z) and observe 𝐎\mathbf{O};
10             if j\exists j such that Oj=TrueO_{j}=\mathrm{True} then
11                   Merge [i]𝒮[i]_{\mathcal{S}} and [j]𝒮[j]_{\mathcal{S}} in 𝒮\mathcal{S};
12                   TT{j}T\leftarrow T\setminus\{j\};
13                  
14            else
15                   TT{i}T\leftarrow T\setminus\{i\};
16                  
17            
18      
19return 𝒮\mathcal{S};
Algorithm 2 Iterative Location with Auctions

To provide some intuitions, consider an agent iNi\in N. If we want to find another agent in the same coalition as ii, we can query the auction gadget 𝒜({i},N{i},)\mathcal{A}(\{i\},N\setminus\{i\},\varnothing). According to Lemma 4.1, if the observation 𝐎\mathbf{O} has Oj=TrueO_{j}=\mathrm{True} for some jN{i}j\in N\setminus\{i\}, then ii and jj are in the same coalition under 𝒮\mathcal{S}^{*}. Otherwise, ii is in a coalition by itself. Then, suppose we have found an agent jj in ii’s coalition, and we would like to find a different agent in that coalition. We can query the auction gadget 𝒜({i},N{i,j},{j})\mathcal{A}(\{i\},N\setminus\{i,j\},\{j\}). By moving agent jj to the set ZZ, we ensure that jj does not have the incentive to deviate, and either another agent in the same coalition as ii will have Ok=TrueO_{k}=\mathrm{True} for some kN{i,j}k\in N\setminus\{i,j\}, or we can conclude that ii and jj are in a coalition by themselves. We can then repeat this process until we have found all the agents in the same coalition as ii. After that, we can move ii to the set ZZ and proceed with the next agent in NN. In this way, we can recover the coalition structure 𝒮\mathcal{S}^{*}. Moreover, since each time we query an auction gadget, we move one agent to set ZZ, and the algorithm terminates when |Z|=n1|Z|=n-1, it uses n1n-1 queries in total.

Theorem 4.1.

Algorithm 2 solves the Multiple-bit CSL problem with auctions in n1n-1 rounds.

We defer the proof to Appendix C. Other than the auction gadgets, we will also use another game-strategy pair in the algorithm. We prove the following lemma about the observations obtained by querying this game-strategy pair.

Lemma 4.2.

Let 𝐎=𝒪((N,𝐯=𝟏,𝐫=𝟎),Σ=𝟎)\mathbf{O}=\mathcal{O}((N,\mathbf{v}=\mathbf{1},\mathbf{r}=\mathbf{0}),\Sigma=\mathbf{0}), where 𝟎,𝟏\mathbf{0},\mathbf{1} are vectors consisting purely of 0s and 11s respectively. For each coalition S𝒮S\in\mathcal{S}^{*}, exactly one agent iSi\in S has Oi=TrueO_{i}=\mathrm{True}.

Proof.

For each coalition S𝒮S\in\mathcal{S}^{*}, if any agent ii in SS raises its bid to positive, then the item will be allocated to ii with price 0, giving utility 11 to the agents in SS. Thus, ΣS\Sigma_{S} is not a joint best response to ΣS\Sigma_{-S}. Moreover, the only joint best response for agents in SS is to let exactly one agent in SS raise its bid to positive, and the other agents in SS keep their bids at 0. The lemma then follows. ∎

Then, we are ready to present our algorithm that works in (1+log2n)(1+c)+1(1+\log_{2}n)(1+c)+1 rounds for Multiple-bit CSL with auctions. The algorithm is shown in Algorithm 3.

1
Input: The number of agents nn and the multiple-bit observation oracle 𝒪\mathcal{O}.
Output: A coalition structure 𝒮\mathcal{S} of the agents.
2
3Query ((N,𝐯=𝟏,𝐫=𝟎),Σ=𝟎)((N,\mathbf{v}=\mathbf{1},\mathbf{r}=\mathbf{0}),\Sigma=\mathbf{0}) and observe 𝐎\mathbf{O};
4 Let Tx{iOi=True}T_{x}\leftarrow\{i\mid O_{i}=\mathrm{True}\} and TyNTxT_{y}\leftarrow N\setminus T_{x};
5 for b{0,1,,log2n}b\in\{0,1,\dots,\lfloor\log_{2}n\rfloor\} do
6       Let X{iiTx and X\leftarrow\{i\mid i\in T_{x}\textbf{ and } the b-th lowest binary bit of i is 1}\text{the $b$-th lowest binary bit of $i$ is $1$}\};
7       Let TFalseTyT_{\mathrm{False}}\leftarrow T_{y};
8       repeat
9             Let YTFalseY\leftarrow T_{\mathrm{False}} and ZN(XY)Z\leftarrow N\setminus(X\cup Y);
10             Query 𝒜(X,Y,Z)\mathcal{A}(X,Y,Z) and observe 𝐎\mathbf{O};
11             Let TTrue{iiTFalse and Oi=True}T_{\mathrm{True}}\leftarrow\{i\mid i\in T_{\mathrm{False}}\textbf{ and }O_{i}=\mathrm{True}\};
12             TFalseTFalseTTrueT_{\mathrm{False}}\leftarrow T_{\mathrm{False}}\setminus T_{\mathrm{True}};
13            
14      until TTrue=T_{\mathrm{True}}=\varnothing;
15      Let Oi(b)𝕀[iTFalse] for each iTyO^{(b)}_{i}\leftarrow\mathbb{I}\left[\vphantom{\sum}i\not\in T_{\mathrm{False}}\right]\textbf{ for each }i\in T_{y};
16      
17Let 𝒮{{1},{2},,{n}}\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\};
18 for iTyi\in T_{y} do
19       Let jb=0log2n2bOi(b)j\leftarrow\sum_{b=0}^{\lfloor\log_{2}n\rfloor}2^{b}\cdot O^{(b)}_{i};
20       Merge [i]𝒮[i]_{\mathcal{S}} and [j]𝒮[j]_{\mathcal{S}} in 𝒮\mathcal{S};
21      
22return 𝒮\mathcal{S};
Algorithm 3 Bitwise Search with Auctions
135246TxT_{x}TyT_{y}XXTTrueT_{\mathrm{True}}TFalseT_{\mathrm{False}}
(a) When b=0b=0
135264TxT_{x}TyT_{y}XXTTrueT_{\mathrm{True}}TFalseT_{\mathrm{False}}
(b) When b=1b=1
135624TxT_{x}TyT_{y}XXTTrueT_{\mathrm{True}}TFalseT_{\mathrm{False}}
(c) When b=2b=2
Figure 3: Example execution of the bitwise search (Lines 1 to 12) in Algorithm 3 when 𝒮={{1,4},{2,3},{5,6}}\mathcal{S}^{*}=\{\{1,4\},\{2,3\},\{5,6\}\}. The vertices represent the agents, and the dashed lines represent the sets used in the algorithm. Using the first query, the algorithm identifies one agent in each coalition and groups them as Tx={1,3,5}T_{x}=\{1,3,5\}. The rest are Ty={2,4,6}T_{y}=\{2,4,6\} (Lines 1 to 2). Then, as shown above, for each b{0,1,2}b\in\{0,1,2\}, the algorithm picks the set of agents in TxT_{x} with the bb-th lowest binary bit as 11 as XX, and partitions TyT_{y} into TTrueT_{\mathrm{True}}, each of which is cooperating with some agent in XX, and TFalseT_{\mathrm{False}}, each of which is not cooperating with any agent in XX (Lines 3 to 12).

The algorithm starts by querying the game-strategy pair ((N,𝐯=𝟏,𝐫=𝟎),Σ=𝟎)((N,\mathbf{v}=\mathbf{1},\mathbf{r}=\mathbf{0}),\Sigma=\mathbf{0}) (Line 1). Using the obtained observation, by Lemma 4.2, the algorithm identifies exactly one agent in each coalition, groups them into set TxT_{x} and the rest of the agents into set TyT_{y} (Lines 2). Essentially, set TxT_{x} contains a set of agents who are definitely not in the same coalition with each other. Furthermore, for each agent ii in TyT_{y}, there is exactly one “teammate” of ii in TxT_{x}. We denote the index of this agent as α(i)\alpha(i).

An important idea behind Algorithm 3 is bitwise search, i.e., to determine each binary bit of α(i)\alpha(i) separately. To do this, for each b{0,1,,log2n}b\in\{0,1,\dots,\lfloor\log_{2}n\rfloor\}, the algorithm picks the set of agents in TxT_{x} such that the bb-th lowest binary bits of their index are 11 as XX (Line 4), and tries to tell which agents in TyT_{y} are in the same coalition as some agent in XX by querying auction gadgets. Intuitively, if we let Y=Ty,Z=N(XY)Y=T_{y},Z=N\setminus(X\cup Y) and query 𝒜(X,Y,Z)\mathcal{A}(X,Y,Z), by Lemma 4.1, for each coalition that contains an agent in XX, one of its members in YY will decide to deviate. Thus, we can move these agents to ZZ like how we did in Algorithm 2, until no agents in YY decide to deviate. The remaining agents in YY are then the agents that are not in the same coalition as any agent in XX. The algorithm leverages this idea to determine each binary bit of α(i)\alpha(i) separately, and stores the results in Oi(b)O^{(b)}_{i} (Lines 5 to 12). Note that for each bb, the algorithm uses at most 1+c1+c queries, where cc is the size of the largest coalition in 𝒮\mathcal{S}^{*}. We illustrate the execution of the bitwise search with an example in Fig. 3.

Finally, the algorithm recovers α(i)\alpha(i) for each agent ii in TyT_{y} by summing up the values of the binary bits in Oi(b)O^{(b)}_{i} (Line 15), and merges ii with α(i)\alpha(i) accordingly (Line 16).

Theorem 4.2.

Algorithm 3 solves the Multiple-bit CSL problem with auctions in (1+log2n)(1+c)+1(1+\log_{2}n)(1+c)+1 rounds, where cc is size of the largest coalition in 𝒮\mathcal{S}^{*}.

Proof.

We first show the correctness of the algorithm. By Lemma 4.2, after Line 2, TxT_{x} contains exactly one agent in each coalition. This means that for each agent iTyi\in T_{y}, there is exactly one agent jTxj\in T_{x} such that ii and jj are in the same coalition. We denote this agent jj as α(i)\alpha(i). Then, we will show two claims: (i) after Lines 3 to 12, for each b{0,1,,log2n}b\in\{0,1,\dots,\lfloor\log_{2}n\rfloor\} and each iTyi\in T_{y}, Oi(b)=TrueO^{(b)}_{i}=\mathrm{True} if and only if the bb-th lowest binary bit of α(i)\alpha(i) is 11, and (ii) after Lines 13 to 16, 𝒮=𝒮\mathcal{S}=\mathcal{S}^{*}.

For (i), we only need to show that after the repeat loop in Lines 6 to 11, TFalseT_{\mathrm{False}} contains exactly the agents iTyi\in T_{y} such that the bb-th lowest binary bit of α(i)\alpha(i) is 0. If an agent iTyi\in T_{y} has the bb-th lowest binary bit of α(i)\alpha(i) as 0, then α(i)X\alpha(i)\not\in X after Line 4. Thus, by Lemma 4.1, OiO_{i} will never be True\mathrm{True} on Line 8, and iTFalsei\in T_{\mathrm{False}} after the repeat loop. If an agent iTyi\in T_{y} has the bb-th lowest binary bit of α(i)\alpha(i) as 11, then α(i)X\alpha(i)\in X after Line 4. By Lemma 4.1, in each iteration of the repeat loop, one agent j[i]𝒮TFalsej\in[i]_{\mathcal{S}^{*}}\cap T_{\mathrm{False}} will have Oj=TrueO_{j}=\mathrm{True}, and will be removed from TFalseT_{\mathrm{False}}. Since the repeat loop terminates only when no agent is removed from TFalseT_{\mathrm{False}} in one iteration, iTFalsei\not\in T_{\mathrm{False}} after the loop.

For (ii), given each binary bit of α(i)\alpha(i), the algorithm computes the index of the agent in α(i)\alpha(i) in TyT_{y} by summing up the values of the binary bits on Line 15. Then, the algorithm merges ii with α(i)\alpha(i) accordingly on Line 16. The correctness of the algorithm then follows.

For the complexity, the algorithm uses 11 query on Line 1, and for each b{0,1,,log2n}b\in\{0,1,\dots,\lfloor\log_{2}n\rfloor\}, the algorithm uses at most 1+c1+c queries in the repeat loop in Lines 6 to 11. This shows that the total number of queries used by the algorithm is at most (1+log2n)(1+c)+1(1+\log_{2}n)(1+c)+1. ∎

5 Conclusion and Future Work

In this paper, we study the Coalition Structure Learning (CSL) problem under the multiple-bit observation oracle. We consider various settings of the type of games that the algorithm may design, including normal-form games, congestion games, graphical games, and auctions. In each of the settings, we present an algorithm that learns the coalition structure with a sublinear number of rounds. We also complement our algorithmic results with lower bounds on the number of rounds required to learn the coalition structure, demonstrating their optimality in most of the settings. Compared to the existing work, our results significantly reduce the number of rounds required to learn the coalition structure, and greatly improve the potential applicability of the CSL problem in real-world systems. Regarding CSL, there are various different settings for future work that are not covered by this paper. It would be interesting to consider the CSL problem (i) when the underlying coalition structure is dynamic, (ii) when the agents are aware of the algorithm and respond strategically, (iii) when the algorithm can only observe a subset of the agents in each round, and (iv) when the observations are noisy or the agents are boundedly rational.

Acknowledgements

This work was supported in part by NSF grant IIS-2046640 (CAREER) and NSF IIS-2200410.

References

  • Milgrom [2004] Paul Robert Milgrom. Putting auction theory to work. Cambridge University Press, 2004.
  • Hamilton [2019] Isobel Asher Hamilton. Uber drivers are reportedly colluding to trigger ‘surge’ prices because they say the company is not paying them enough. Business Insider, 2019.
  • Sweeney [2019] Sam Sweeney. Uber, lyft drivers manipulate fares at reagan national causing artificial price surges. WJLA, 2019.
  • Dowling [2023] Joshua Dowling. How uber drivers trigger fake surge price periods when no delays exist. Drive, 2023.
  • Wikipedia [2024] Wikipedia. Libor scandal. https://en.wikipedia.org/wiki/Libor˙scandal, 2024. Accessed: 2024-08-11.
  • EPRS [2020] EPRS. The ethics of artificial intelligence: Issues and initiatives. European Parliament Research Service, Scientific Foresight Unit (STOA), Panel for the Future of Science and Technology, 2020.
  • Qu et al. [2023] Ao Qu, Yihong Tang, and Wei Ma. Adversarial attacks on deep reinforcement learning-based traffic signal control systems with colluding vehicles. ACM Transactions on Intelligent Systems and Technology, 14(6):1–22, 2023.
  • Xu et al. [2024] Yixuan Even Xu, Chun Kai Ling, and Fei Fang. Learning coalition structures with games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 9944–9951, 2024. doi: 10.1609/aaai.v38i9.28856. URL https://ojs.aaai.org/index.php/AAAI/article/view/28856.
  • Rosenthal [1973] Robert W Rosenthal. A class of games possessing pure-strategy nash equilibria. International Journal of Game Theory, 2:65–67, 1973.
  • Kearns et al. [2001] Michael Kearns, Michael L Littman, and Satinder Singh. Graphical models for game theory. In Proceedings of the 17th Conference in Uncertainty in Artificial, Intelligence, 2001, pages 253–260, 2001.
  • Fudenberg and Levine [1998] Drew Fudenberg and David K. Levine. The Theory of Learning in Games, volume 1 of MIT Press Books. The MIT Press, December 1998.
  • Waugh et al. [2011] Kevin Waugh, Brian D. Ziebart, and J. Andrew Bagnell. Computational rationalization: the inverse equilibrium problem. In Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, page 1169–1176, Madison, WI, USA, 2011. Omnipress. ISBN 9781450306195.
  • Kuleshov and Schrijvers [2015] Volodymyr Kuleshov and Okke Schrijvers. Inverse game theory: Learning utilities in succinct games. In Evangelos Markakis and Guido Schäfer, editors, Web and Internet Economics, pages 413–427, Berlin, Heidelberg, 2015. Springer Berlin Heidelberg.
  • Ling et al. [2018] Chun Kai Ling, Fei Fang, and J. Zico Kolter. What game are we playing? end-to-end learning in normal and extensive form games. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 396–402. International Joint Conferences on Artificial Intelligence Organization, 7 2018. doi: 10.24963/ijcai.2018/55. URL https://doi.org/10.24963/ijcai.2018/55.
  • Letchford et al. [2009] Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In Marios Mavronicolas and Vicky G. Papadopoulou, editors, Algorithmic Game Theory, pages 250–262, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
  • Balcan et al. [2015] Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in stackelberg security games. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC ’15, page 61–78, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450334105.
  • Haghtalab et al. [2016] Nika Haghtalab, Fei Fang, Thanh H. Nguyen, Arunesh Sinha, Ariel D. Procaccia, and Milind Tambe. Three strategies to success: learning adversary models in security games. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, page 308–314. AAAI Press, 2016. ISBN 9781577357704.
  • Wu et al. [2022] Jibang Wu, Weiran Shen, Fei Fang, and Haifeng Xu. Inverse game theory for stackelberg games: the blessing of bounded rationality. Advances in Neural Information Processing Systems, 35:32186–32198, 2022.
  • Bonjour et al. [2022] Trevor Bonjour, Vaneet Aggarwal, and Bharat Bhargava. Information theoretic approach to detect collusion in multi-agent games. In James Cussens and Kun Zhang, editors, Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 of Proceedings of Machine Learning Research, pages 223–232. PMLR, 01–05 Aug 2022.
  • Mazrooei et al. [2013] Parisa Mazrooei, Christopher Archibald, and Michael Bowling. Automating collusion detection in sequential games. Proceedings of the AAAI Conference on Artificial Intelligence, 27(1):675–682, Jun. 2013. doi: 10.1609/aaai.v27i1.8674. URL https://ojs.aaai.org/index.php/AAAI/article/view/8674.
  • Beigman and Vohra [2006] Eyal Beigman and Rakesh Vohra. Learning from revealed preference. In Proceedings of the 7th ACM Conference on Electronic Commerce, EC ’06, page 36–42, New York, NY, USA, 2006. Association for Computing Machinery. ISBN 1595932364.
  • Zadimoghaddam and Roth [2012] Morteza Zadimoghaddam and Aaron Roth. Efficiently learning from revealed preference. In Paul W. Goldberg, editor, Internet and Network Economics, pages 114–127, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
  • Balcan et al. [2014] Maria-Florina Balcan, Amit Daniely, Ruta Mehta, Ruth Urner, and Vijay V. Vazirani. Learning economic parameters from revealed preferences. In Tie-Yan Liu, Qi Qi, and Yinyu Ye, editors, Web and Internet Economics, pages 338–353, Cham, 2014. Springer International Publishing.
  • Blum et al. [2014] Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Learning optimal commitment to overcome insecurity. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceedings.neurips.cc/paper˙files/paper/2014/file/cc1aa436277138f61cda703991069eaf-Paper.pdf.
  • Amin et al. [2015] Kareem Amin, Rachel Cummings, Lili Dworkin, Michael Kearns, and Aaron Roth. Online learning and profit maximization from revealed preferences. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, page 770–776. AAAI Press, 2015. ISBN 0262511290.
  • Roth et al. [2016] Aaron Roth, Jonathan Ullman, and Zhiwei Steven Wu. Watch and learn: optimizing from revealed preferences feedback. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’16, page 949–962, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450341325.
  • Cesa-Bianchi and Lugosi [2006] Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
  • Daskalakis et al. [2021] Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. Near-optimal no-regret learning in general games. Advances in Neural Information Processing Systems, 34:27604–27616, 2021.
  • De Bruijn [1981] Nicolaas Govert De Bruijn. Asymptotic methods in analysis, volume 4. Courier Corporation, 1981.
  • Paes Leme et al. [2016] Renato Paes Leme, Martin Pal, and Sergei Vassilvitskii. A field guide to personalized reserve prices. In Proceedings of the 25th international conference on world wide web, pages 1093–1102, 2016.

Appendix A Multiple-bit CSL with Congestion Games

To state our result for congestion games, we first recall the definition of a congestion game.

Definition A.1 ([9]).

A congestion game 𝒢\mathcal{G} is a tuple (N,R,(Ai)iN,(ci)rR)(N,R,(A_{i})_{i\in N},(c_{i})_{r\in R}) where NN is the set of agents, RR is the set of resources, Ai2R{}A_{i}\subseteq 2^{R}\setminus\{\varnothing\} is the strategy space of agent ii, and cr:c_{r}:\mathbb{N}\to\mathbb{R} is the cost function of resource rr. For a strategy profile Σ=(σi)iN\Sigma=(\sigma_{i})_{i\in N} where σiAi\sigma_{i}\in A_{i}, the utility of agent ii is given by ui(Σ)=rσicr(|{jNrσj}|)u_{i}(\Sigma)=-\sum_{r\in\sigma_{i}}c_{r}(|\{j\in N\mid r\in\sigma_{j}\}|).

Clearly, a directed prisoner’s dilemma is not a congestion game, so we cannot directly apply Algorithm 1 to solve Multiple-bit CSL with congestion games. However, as we will show in this subsection, we can design another type of game-strategy pair, directed Braess’s paradox, which is a congestion game, and functions similarly to directed prisoner’s dilemmas in the context of Multiple-bit CSL. We will then show that using directed Braess’s paradoxes, we can solve Multiple-bit CSL with congestion games in log2n+2\log_{2}n+2 rounds with an algorithm similar to Algorithm 1.

Definition A.2.

For iN,jNi\in N,j\in N, a directed Braess’s paradox (i,j)\mathcal{B}(i,j) is a game-strategy pair (𝒢,Σ)(\mathcal{G},\Sigma). Here, 𝒢\mathcal{G} is a congestion game (N,R,(Ai)iN,(ci)rR)(N,R,(A_{i})_{i\in N},(c_{i})_{r\in R}) where R={r1,r2,r3}R=\{r_{1},r_{2},r_{3}\}. The strategy spaces of agent ii and jj are {{r1}}\{\{r_{1}\}\} and {{r2},{r1,r3}}\{\{r_{2}\},\{r_{1},r_{3}\}\} respectively, and the strategy spaces for all other agents are {}\{\varnothing\}. The cost functions are cr1(x)=xc_{r_{1}}(x)=x, cr2(x)=2.5c_{r_{2}}(x)=2.5 and cr3(x)=0c_{r_{3}}(x)=0 for all xx\in\mathbb{N}. The strategy profile Σ\Sigma is such that σi={r1}\sigma_{i}=\{r_{1}\}, σj={r1,r3}\sigma_{j}=\{r_{1},r_{3}\} and σk=\sigma_{k}=\varnothing for all kN{i,j}k\in N\setminus\{i,j\}.

We illustrate the definition of directed Braess’s paradox in Fig. 4.

SiS_{i}SjS_{j}TTc(x)=2.5c(x)=2.5c(x)=xc(x)=xc(x)=0c(x)=0
Figure 4: Illustration of the directed Braess’s paradox (i,j)\mathcal{B}(i,j). Agent ii needs to choose a path from SiS_{i} to TT and agent jj needs to choose a path from SjS_{j} to TT. The cost functions are indicated on the edges.
Lemma A.1.

Let i,ji,j be two agents in NN and 𝐎=𝒪((i,j))\mathbf{O}=\mathcal{O}(\mathcal{B}(i,j)). Then, Oj=TrueO_{j}=\mathrm{True} if and only if agent ii and agent jj are in the same coalition under 𝒮\mathcal{S}^{*}. Moreover, Ok=FalseO_{k}=\mathrm{False} for kN{j}k\in N\setminus\{j\}.

Proof.

If agent ii and agent jj are in the same coalition under 𝒮\mathcal{S}^{*}, then choosing {r2}\{r_{2}\} is a dominant strategy for agent jj’s coalition. Otherwise, choosing {r1,r3}\{r_{1},r_{3}\} is a dominant strategy. Therefore, Oj=TrueO_{j}=\mathrm{True} if and only if agent ii and agent jj are in the same coalition under 𝒮\mathcal{S}^{*}. Moreover, since all agents other than jj only have one choice (either {r1}\{r_{1}\} or {}\{\varnothing\}), Ok=FalseO_{k}=\mathrm{False} for kN{j}k\in N\setminus\{j\}. ∎

Lemma A.1 shows that directed Braess’s paradoxes function just like directed prisoner’s dilemmas in Multiple-bit CSL. With an argument similar to the proof from Lemma 3.1 to Lemma 3.2, we can also show the following corollary about the product of multiple directed Braess’s paradoxes.

Corollary A.2.

Let {(ix,jx)x{1,2,,k}}\{(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}\} be kk pairs of agents and 𝐎=𝒪(x=1k(ix,jx))\mathbf{O}=\mathcal{O}(\prod_{x=1}^{k}\mathcal{B}(i_{x},j_{x})). Then, Oj=TrueO_{j}=\mathrm{True} if and only if there exists x{1,2,,k} such that jx=jx\in\{1,2,\dots,k\}\textbf{ such that }j_{x}=j and ix[j]𝒮i_{x}\in[j]_{\mathcal{S}^{*}}.

To use the product of multiple directed Braess’s paradoxes in an algorithm to solve Multiple-bit CSL with congestion games, we also need to show that it is still a congestion game. We will show that congestion games are closed under the product operation in the following lemma.

Lemma A.3.

Let (𝒢1,Σ1),(𝒢2,Σ2)(\mathcal{G}_{1},\Sigma_{1}),(\mathcal{G}_{2},\Sigma_{2}) be two game-strategy pairs and let (𝒢p,Σp)=(𝒢1,Σ1)×(𝒢2,Σ2)(\mathcal{G}_{p},\Sigma_{p})=(\mathcal{G}_{1},\Sigma_{1})\times(\mathcal{G}_{2},\Sigma_{2}). If 𝒢1\mathcal{G}_{1} and 𝒢2\mathcal{G}_{2} are both congestion games on NN, then 𝒢p\mathcal{G}_{p} is also a congestion game on NN.

Proof.

Let 𝒢1=(N,R1,(A1,i)iN,(c1,i)rR1)\mathcal{G}_{1}=(N,R_{1},(A_{1,i})_{i\in N},(c_{1,i})_{r\in R_{1}}) and 𝒢2=(N,R2,(A2,i)iN,(c2,i)rR2)\mathcal{G}_{2}=(N,R_{2},(A_{2,i})_{i\in N},(c_{2,i})_{r\in R_{2}}). We will write 𝒢p\mathcal{G}_{p} as a congestion game. According to Definition 3.3, in 𝒢p\mathcal{G}_{p}, the strategy space of agent ii is Ai=A1,i×A2,iA_{i}=A_{1,i}\times A_{2,i}. Moreover, given a strategy profile Σ=(σ1,i,σ2,i)iN\Sigma=(\sigma_{1,i},\sigma_{2,i})_{i\in N}, the utility of agent ii is given by ui(Σ)=rσ1,ic1,r(|{jNrσ1,j}|)rσ2,ic2,r(|{jNrσ2,j}|)u_{i}(\Sigma)=-\sum_{r\in\sigma_{1,i}}c_{1,r}(|\{j\in N\mid r\in\sigma_{1,j}\}|)-\sum_{r\in\sigma_{2,i}}c_{2,r}(|\{j\in N\mid r\in\sigma_{2,j}\}|). This shows that 𝒢p=(N,(R1,R2),(Ai)iN,((c1,i)rR1,(c2,i)rR2))\mathcal{G}_{p}=(N,(R_{1},R_{2}),(A_{i})_{i\in N},((c_{1,i})_{r\in R_{1}},\\ (c_{2,i})_{r\in R_{2}})) is a congestion game on NN. ∎

With Corollaries A.2 and A.3, we solve Multiple-bit CSL with congestion games just like we did with normal-form games in Algorithm 1. The only difference is that we query directed Braess’s paradoxes instead of directed prisoner’s dilemmas. We present the algorithm below.

1
Input: The number of agents nn and the multiple-bit observation oracle 𝒪\mathcal{O}.
Output: A coalition structure 𝒮\mathcal{S} of the agents.
2
3Query iN,jN,i<j(i,j)\prod_{i\in N,j\in N,i<j}\mathcal{B}(i,j) and observe 𝐎\mathbf{O};
4 Let Tj{1,2,,j1} if Oj=True else T_{j}\leftarrow\{1,2,\dots,j-1\}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }\varnothing  for each jN\textbf{ for each }j\in N;
5 while jN such that |Tj|2\exists j\in N\textbf{ such that }|T_{j}|\geq 2 do
6       Let Lj{the smallest |Tj|2 elements in Tj}L_{j}\leftarrow\{\text{the smallest $\lfloor\frac{|T_{j}|}{2}\rfloor$ elements in $T_{j}$}\}  for each jN\textbf{ for each }j\in N;
7       Let RjTjLjR_{j}\leftarrow T_{j}\setminus L_{j} for each jNj\in N;
8       Query jN,iLj(i,j)\prod_{j\in N,i\in L_{j}}\mathcal{B}(i,j) and observe 𝐎\mathbf{O};
9       Let TjLj if Oj=True else RjT_{j}\leftarrow L_{j}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }R_{j}  for each jN\textbf{ for each }j\in N;
10      
11Let 𝒮{{1},{2},,{n}}\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\};
12 for jN such that |Tj|j\in N\textbf{ such that }|T_{j}|\neq\varnothing do
13       Let ithe only element in Tji\leftarrow\text{the only element in $T_{j}$};
14       Merge [i]𝒮[i]_{\mathcal{S}} and [j]𝒮[j]_{\mathcal{S}} in 𝒮\mathcal{S};
15      
16return 𝒮\mathcal{S};
Algorithm 4 Simultaneous Binary Search with Congestion Games
Theorem A.1.

Algorithm 4 solves Multiple-bit CSL with congestion games in log2n+2\log_{2}n+2 rounds.

Proof.

The proof is almost identical to the proof of Theorem 3.1. We first show the correctness of the algorithm. To do this, we will show three claims: (i) after Lines 1 to 7, for each jNj\in N, if jj is the agent with the smallest index in [j]S[j]_{S^{*}}, Tj=T_{j}=\varnothing, otherwise, TjT_{j} contains only the smallest index of the agents in [j]S[j]_{S^{*}}; (ii) after Lines 8 to 11, 𝒮=𝒮\mathcal{S}=\mathcal{S}^{*}; (iii) the games queried in the algorithm are congestion games. (iii) is implied by Lemma A.3, so we only need to show (i) and (ii).

For (i), if jj is the agent with the smallest index in [j]S[j]_{S^{*}}, then by Corollary A.2, Oj=FalseO_{j}=\mathrm{False}, and Tj=T_{j}=\varnothing on Line 2. Throughout the algorithm, TjT_{j} remains \varnothing. Otherwise, TjT_{j} contains the smallest index of the agents in [j]S[j]_{S^{*}} after Line 2. In the while loop, TjT_{j} is updated to LjL_{j} if Oj=TrueO_{j}=\mathrm{True}, and to RjR_{j} otherwise. Since LjL_{j} contains the smallest |Tj|/2\lfloor|T_{j}|/2\rfloor elements in TjT_{j}, by Corollary A.2, Oj=TrueO_{j}=\mathrm{True} if and only if the smallest index of the agents in [j]S[j]_{S^{*}} is in LjL_{j}. Therefore, after one iteration of the loop, TjT_{j} still contains only the smallest index of the agents in [j]S[j]_{S^{*}}, while the size of TjT_{j} is halved. The loop terminates after |Tj|=1|T_{j}|=1, and TjT_{j} contains only the smallest index of the agents in [j]S[j]_{S^{*}}.

For (ii), since every agent jj is either the agent with the smallest index in [j]S[j]_{S^{*}} or merged with that agent, each coalition is merged in 𝒮\mathcal{S} after Lines 8 to 11. Thus 𝒮\mathcal{S} becomes the same as 𝒮\mathcal{S}^{*}.

Next, we show the complexity of the algorithm. The while loop in Lines 3 to 7 runs at most log2n\lceil\log_{2}n\rceil times, and each iteration requires 11 query. Together with the query on Line 1, the total number of queries is at most log2n+1log2n+2\lceil\log_{2}n\rceil+1\leq\log_{2}n+2. ∎

Appendix B Multiple-bit CSL with Graphical Games

We then proceed to the case of Multiple-bit CSL with graphical games. Recall that in a graphical game [10], each agent ii is associated with a vertex in an undirected graph GG. We use NiNN_{i}\subseteq N to denote the set of neighbors of agent ii in GG, including agent ii itself.

Definition B.1.

A graphical game 𝒢\mathcal{G} is a tuple (N,G,(Ai)iN,(Mi)iN)(N,G,(A_{i})_{i\in N},(M_{i})_{i\in N}) where NN is the set of agents, GG is an undirected graph, AiA_{i} is the strategy space of agent ii, and Mi:jNiAjM_{i}:\prod_{j\in N_{i}}A_{j}\to\mathbb{R} is the local game matrix of agent ii. For a strategy profile Σ=(σi)iN\Sigma=(\sigma_{i})_{i\in N} where σiAi\sigma_{i}\in A_{i}, the utility of agent ii is given by ui(Σ)=Mi(ΣNi)u_{i}(\Sigma)=M_{i}(\Sigma_{N_{i}}), where ΣNi=(σj)jNi\Sigma_{N_{i}}=(\sigma_{j})_{j\in N_{i}} consists only of the strategies of agents in NiN_{i}. Here, NiN_{i} denotes the set of neighbors of agent ii in undirected graph GG.

If we let the undirected graph GG be a complete graph, then we recover the normal-form game setting. However, the reason why we consider Multiple-bit CSL with graphical games is that graphical games are compact representations of normal-form games. When the degree of each vertex in the graph is small, the graphical game representation is far succincter than the normal-form game representation. Therefore, we consider in this section the case where the algorithm can only design degree-dd graphical games, where the degree of each vertex in the graph is at most dd. In this case, we will need at least n1d\lceil\frac{n-1}{d}\rceil rounds to solve the Multiple-bit CSL problem. To prove this claim formally, we first show with the following lemma that in some cases, the observations reveal no information about whether a certain pair of agents are in the same coalition under 𝒮\mathcal{S}^{*}.

Lemma B.1.

Let i,ji,j be two agents in NN and let 𝒮0={{1},{2},,{n}},𝒮1=𝒮0{{i,j}}{{i},{j}}\mathcal{S}_{0}=\{\{1\},\{2\},\dots,\{n\}\},\mathcal{S}_{1}=\mathcal{S}_{0}\cup\{\{i,j\}\}\setminus\{\{i\},\{j\}\} be two coalition structures. If we query the oracle for a graphical game 𝒢\mathcal{G} where edge (i,j)(i,j) is not in the undirected graph GG, then the observation reveals no information about whether 𝒮=𝒮0\mathcal{S}^{*}=\mathcal{S}_{0} or 𝒮=𝒮1\mathcal{S}^{*}=\mathcal{S}_{1}.

Proof.

Let 𝒢=(N,G,(Ai)iN,(Mi)iN)\mathcal{G}=(N,G,(A_{i})_{i\in N},(M_{i})_{i\in N}) and let Σ=(σi)iN\Sigma=(\sigma_{i})_{i\in N} be the specified strategy profile in 𝒢\mathcal{G}. Consider agent ii’s deviation decision 𝒪i(𝒢,Σ)\mathcal{O}_{i}(\mathcal{G},\Sigma). Let BRi,𝒮0({i},Σ)\mathrm{BR}_{i,\mathcal{S}_{0}}(\{i\},\Sigma) and BRj,𝒮0({j},Σ)\mathrm{BR}_{j,\mathcal{S}_{0}}(\{j\},\Sigma) be the set of best responses of coalitions {i}\{i\} and {j}\{j\} to Σi\Sigma_{-i} and Σj\Sigma_{-j} if 𝒮=𝒮0\mathcal{S}^{*}=\mathcal{S}_{0} respectively. Since the local game matrix MiM_{i} only depends on ΣNi∌j\Sigma_{N_{i}}\not\ni j, no matter what strategy agent jj plays, the best response of {i}\{i\} to Σi\Sigma_{-i} is the same. Therefore, the set of best responses of coalition {i,j}\{i,j\} if 𝒮=𝒮1\mathcal{S}^{*}=\mathcal{S}_{1} is BRi,𝒮0({i},Σ)×BRj,𝒮0({j},Σ)\mathrm{BR}_{i,\mathcal{S}_{0}}(\{i\},\Sigma)\times\mathrm{BR}_{j,\mathcal{S}_{0}}(\{j\},\Sigma). This shows that whether 𝒮=𝒮0\mathcal{S}^{*}=\mathcal{S}_{0} or 𝒮=𝒮1\mathcal{S}^{*}=\mathcal{S}_{1} does not affect the deviation decision of agent ii. The same argument applies to agent jj. ∎

Theorem B.1.

Any algorithm that solves the Multiple-bit CSL problem with degree-dd graphical games requires at least n1d\lceil\frac{n-1}{d}\rceil rounds of interactions with the agents in the worst case.

Proof.

Suppose that the algorithm has queried the observation oracle kk times with game-strategy pairs {(𝒢1,Σ1),(𝒢2,Σ2),,(𝒢k,Σk)}\{(\mathcal{G}_{1},\Sigma_{1}),(\mathcal{G}_{2},\Sigma_{2}),\dots,(\mathcal{G}_{k},\Sigma_{k})\}, where 𝒢i=(N,Gi,(Ai)iN,(Mi)iN)\mathcal{G}_{i}=(N,G_{i},(A_{i})_{i\in N},(M_{i})_{i\in N}) is a degree-dd graphical game. According to Lemma B.1, if the observations always behave as if 𝒮={{1},{2},,{n}}\mathcal{S}^{*}=\{\{1\},\{2\},\dots,\{n\}\}, then the algorithm cannot finalize the answer until for each pair of agents i,ji,j, the algorithm has queried the oracle for a graphical game where edge (i,j)(i,j) is in the undirected graph. Since the degree of each vertex in the graph is at most dd, the algorithm needs to query the oracle for at least n1d\lceil\frac{n-1}{d}\rceil different graphical games to include all edges adjacent to vertex 11. The theorem follows. ∎

Then, we present an algorithm that solves the Multiple-bit CSL problem with degree-dd graphical games in 2nd+2log2d+1\frac{2n}{d}+2\log_{2}d+1 rounds of interactions with the agents in the worst case. Note that dnd\leq n, given the lower bounds in Theorems 2.1 and B.1, the algorithm is optimal up to a constant factor.

The algorithm uses the product of multiple directed prisoner’s dilemmas to query the observation oracle. We will first show that such a product can be represented as a degree-dd graphical game.

Lemma B.2.

Let {(ix,jx)x{1,2,,k}}\{(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}\} be kk pairs of agents and (𝒢p,Σp)=x=1k𝒫(ix,jx)(\mathcal{G}_{p},\Sigma_{p})=\prod_{x=1}^{k}\mathcal{P}(i_{x},j_{x}). Then, 𝒢p\mathcal{G}_{p} can be written as a degree-dd graphical game, where d=max{x=1k(𝕀[i=ix]+𝕀[i=jx])iN}d=\max\{\sum_{x=1}^{k}(\mathbb{I}\left[\vphantom{\sum}i=i_{x}\right]+\mathbb{I}\left[\vphantom{\sum}i=j_{x}\right])\mid i\in N\}, i.e., the maximum number of occurrences of any agent in the pairs.

Proof.

For two agents iN,jNi\in N,j\in N, if (i,j)(i,j) does not occur in any of the pairs, then, according to Definition 3.2, the utility of agent ii does not depend on agent jj’s strategy in any of the directed prisoner’s dilemmas 𝒫(ix,jx)x{1,2,,k}\mathcal{P}(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}. Since for each agent, the utility in the product game is the sum of the utilities in the factor games, the utility of agent ii in the product game also does not depend on agent jj’s strategy. This shows that 𝒢p\mathcal{G}_{p} can be represented as a graphical game with the graph’s edge set being {(ix,jx)x{1,2,,k}}\{(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}\}. The lemma then follows. ∎

We then present the algorithm as Algorithm 5.

1
Input: The number of agents nn, the degree limit dd and the multiple-bit observation oracle 𝒪\mathcal{O}.
Output: A coalition structure 𝒮\mathcal{S} of the agents.
2
3Let sized2size\leftarrow\lfloor\frac{d}{2}\rfloor and cntnsizecnt\leftarrow\lceil\frac{n}{size}\rceil;
4 Let belongjj1size for each jNbelong_{j}\leftarrow\lfloor\frac{j-1}{size}\rfloor\textbf{ for each }j\in N;
5 for δ{0,1,,cnt1}\delta\in\{0,1,\dots,cnt-1\} do
6       Query iN,jN,i<j,belongjbelongi=δ𝒫(i,j)\prod_{i\in N,j\in N,i<j,belong_{j}-belong_{i}=\delta}\mathcal{P}(i,j) and observe 𝐎(δ)\mathbf{O}^{(\delta)};
7      
8Let Sj{δOj(δ)=True} for each jNS_{j}\leftarrow\{\delta\mid O^{(\delta)}_{j}=\mathrm{True}\}\textbf{ for each }j\in N;
9 Let δj1 if Sj= else min(Sj)\delta_{j}\leftarrow-1\textbf{ if }S_{j}=\varnothing\textbf{ else }\min(S_{j})  for each jN\textbf{ for each }j\in N;
10 Let Tj{ii<j,belongjbelongi=δj} if δj1 else  for each jNT_{j}\leftarrow\{i\mid i<j,belong_{j}-belong_{i}=\delta_{j}\}\textbf{ if }\delta_{j}\neq-1\textbf{ else }\varnothing\textbf{ for each }j\in N;
11 while jN such that δj=0,|Tj|2\exists j\in N\textbf{ such that }\delta_{j}=0,|T_{j}|\geq 2 do
12       Let Lj{the smallest |Tj|2 elements in Tj}L_{j}\leftarrow\{\text{the smallest $\lfloor\frac{|T_{j}|}{2}\rfloor$ elements in $T_{j}$}\} for each jN,δj=0j\in N,\delta_{j}=0;
13       Let RjTjLjR_{j}\leftarrow T_{j}\setminus L_{j} for each jN,δj=0j\in N,\delta_{j}=0;
14       Query jN,δj=0,iRj𝒫(i,j)\prod_{j\in N,\delta_{j}=0,i\in R_{j}}\mathcal{P}(i,j) and observe 𝐎\mathbf{O};
15       Let TjRj if Oj=True else LjT_{j}\leftarrow R_{j}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }L_{j}  for each jN,δj=0\textbf{ for each }j\in N,\delta_{j}=0;
16      
17while jN such that δj1,|Tj|2\exists j\in N\textbf{ such that }\delta_{j}\geq 1,|T_{j}|\geq 2 do
18       Let Lj{the smallest |Tj|2 elements in Tj}L_{j}\leftarrow\{\text{the smallest $\lfloor\frac{|T_{j}|}{2}\rfloor$ elements in $T_{j}$}\} for each jN,δj1j\in N,\delta_{j}\geq 1;
19       Let RjTjLjR_{j}\leftarrow T_{j}\setminus L_{j} for each jN,δj1j\in N,\delta_{j}\geq 1;
20       Query jN,δj1,iRj𝒫(i,j)\prod_{j\in N,\delta_{j}\geq 1,i\in R_{j}}\mathcal{P}(i,j) and observe 𝐎\mathbf{O};
21       Let TjRj if Oj=True else LjT_{j}\leftarrow R_{j}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }L_{j}  for each jN,δj1\textbf{ for each }j\in N,\delta_{j}\geq 1;
22      
23Let 𝒮{{1},{2},,{n}}\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\};
24 for jN such that |Tj|j\in N\textbf{ such that }|T_{j}|\neq\varnothing do
25       Let ithe only element in Tji\leftarrow\text{the only element in $T_{j}$};
26       Merge [i]𝒮[i]_{\mathcal{S}} and [j]𝒮[j]_{\mathcal{S}} in 𝒮\mathcal{S};
27      
28return 𝒮\mathcal{S};
Algorithm 5 Block Decomposition and Simultaneous Binary Search with Graphical Games

Compared with the normal-form game setting, with degree-dd graphical games, the algorithm can only query products of multiple directed prisoner’s dilemmas subject to the degree constraint dd as described in Lemma B.2. To comply with this constraint, Algorithm 5 introduces an important idea, block decomposition, i.e., to partition the agents into blocks of size d2\lfloor\frac{d}{2}\rfloor and choose pairs to query according to the decomposition (Lines 1 to 2). As long as for each query, each agent is involved in directed prisoner’s dilemmas with at most two blocks of agents, the constraint is not violated.

The general idea of Algorithm 5 is to find for each agent jj, the “predecessor” of jj, i.e., the agent with the largest index that is smaller than jj in [j]𝒮[j]_{\mathcal{S}^{*}}. To do this, the algorithm takes a two-step approach. First, the algorithm tries to find which block the predecessor of each agent belongs to (Lines 3 to 6). We illustrate this process with an example in Fig. 5. The algorithm enumerates block index gap δ\delta and constructs one query for each δ\delta. The query for a given δ\delta includes 𝒫(i,j)\mathcal{P}(i,j) of all pairs of agents i,ji,j such that the difference between the block indices they belong to, i.e., belongjbelongibelong_{j}-belong_{i}, is δ\delta (Lines 3 to 4). In this way, an agent jj will be involved in directed prisoner’s dilemmas with agents in at most two blocks, block belongj+δbelong_{j}+\delta and block belongjδbelong_{j}-\delta in one query. Moreover, after the enumeration, since the algorithm observes for each agent ii and each block before belongibelong_{i}, whether there is an agent in the same coalition of ii in that block, it gets to determine the block index of each agent’s predecessor or conclude that it has no predecessors (Lines 5 to 6).

Then, the algorithm carries out a simultaneous binary search to find the predecessor of each agent (Lines 7 to 17). The algorithm first deals with the agents whose predecessors are in the same block as themselves (Lines 7 to 12). In this process, each agent jj will be involved in games only within its own block in the queries. Next, the algorithm deals with the agents whose predecessors are in a different block (Lines 13 to 17). In this process, each agent jj will be involved in games with its predecessor’s block, and agents outside block belongjbelong_{j} whose predecessors are in block belongjbelong_{j}. The number of such agents is also at most twice the block size.

Finally, the algorithm merges each agent with its predecessor (Lines 18 to 21).

135246Block 0Block 11Block 22
(a) When δ=0\delta=0
135246Block 0Block 11Block 22
(b) When δ=1\delta=1
135246Block 0Block 11Block 22
(c) When δ=2\delta=2
Figure 5: Example execution of Lines 1 to 6 in Algorithm 5 when n=6n=6 and size=2size=2. The vertices represent the agents and the edges represent the directed prisoner’s dilemmas that the algorithm queries for each δ={0,1,2}\delta=\{0,1,2\}. The algorithm first partitions the agents into cnt=nsize=3cnt=\lceil\frac{n}{size}\rceil=3 blocks, each containing at most size=2size=2 agents (Lines 1 to 2). Then, for each δ={0,1,2}\delta=\{0,1,2\}, the algorithm queries the oracle for the directed prisoner’s dilemmas that correspond to the edges shown in the figure (Lines 3 to 4). Using the observations, the algorithm then determines for each agent jj, δj=belongjbelongi\delta_{j}=belong_{j}-belong_{i}, where ii is jj’s predecessor in [j]𝒮[j]_{\mathcal{S}^{*}} (Lines 5 to 6). Note that in each query, any agent jj is involved in at most 2sized2size\leq d games.
Theorem B.2.

Let d2d\geq 2 be an even number, Algorithm 5 solves the Multiple-bit CSL problem with degree-dd graphical games in 2nd+2log2d+1\frac{2n}{d}+2\log_{2}d+1 rounds of interactions with the agents in the worst case.

Proof.

We first show the correctness of the algorithm. To do this, we will show four claims: (i) after Lines 1 to 6, for each jNj\in N, if jj is the agent with the smallest index in [j]S[j]_{S^{*}}, δj=1\delta_{j}=-1; otherwise, let ii be the largest index in [j]S[j]_{S^{*}} that is smaller than jj, δj=belongjbelongi\delta_{j}=belong_{j}-belong_{i}; (ii) after Lines 7 to 17, for each jNj\in N, if jj is the agent with the smallest index in [j]S[j]_{S^{*}}, Tj=T_{j}=\varnothing; otherwise, TjT_{j} contains only the largest index in [j]S[j]_{S^{*}} that is smaller than jj; (iii) after Lines 8 to 21, 𝒮=𝒮\mathcal{S}=\mathcal{S}^{*}; (iv) the games queried in Lines 4, 11, and 16 are degree-dd graphical games.

For (i), if jj is the agent with the smallest index in [j]S[j]_{S^{*}}, then by Lemma 3.2, Oj(δ)=FalseO^{(\delta)}_{j}=\mathrm{False}, for all δ{0,1,,cnt1}\delta\in\{0,1,\dots,cnt-1\} on Line 4. As a result, Sj=S_{j}=\varnothing on Line 5, and δj=1\delta_{j}=-1 on Line 6. Otherwise, let ii be the largest index in [j]S[j]_{S^{*}} that is smaller than jj. Then, by Lemma 3.2, Oj(δ)=TrueO^{(\delta)}_{j}=\mathrm{True} for δ=belongjbelongi\delta=belong_{j}-belong_{i} on Line 4 since 𝒫(i,j)\mathcal{P}(i,j) is included as a factor game. Moreover, Oj(δ)=FalseO^{(\delta)}_{j}=\mathrm{False} for all δ<belongjbelongi\delta<belong_{j}-belong_{i} since ii is the largest index in [j]S[j]_{S^{*}} that is smaller than jj. Therefore, belongjbelongibelong_{j}-belong_{i} becomes the minimum of SjS_{j} on Line 5, and thus δj=belongjbelongi\delta_{j}=belong_{j}-belong_{i} on Line 6.

For (ii), if jj is the agent with the smallest index in [j]S[j]_{S^{*}}, then Tj=T_{j}=\varnothing on Line 7. Throughout the algorithm, TjT_{j} remains \varnothing. Otherwise, TjT_{j} contains the largest index, ii, in [j]S[j]_{S^{*}} that is smaller than jj after Line 7. Depending on whether belongjbelongi=0belong_{j}-belong_{i}=0 or not, either Lines 8 to 12 or Lines 13 to 17 will carry out a binary search to find this index ii. Note that in the while loop, TjT_{j} is updated to RjR_{j} if Oj=TrueO_{j}=\mathrm{True}, and to LjL_{j} otherwise. Since RjR_{j} contains the largest |Tj|/2\lceil|T_{j}|/2\rceil elements in TjT_{j}, by Lemma 3.2, Oj=TrueO_{j}=\mathrm{True} if and only if the largest index ii in [j]S[j]_{S^{*}} that is smaller than jj is in RjR_{j}. Thus, the binary search will find this index ii and end up with Tj={i}T_{j}=\{i\}.

For (iii), since every agent jj is either the agent with the smallest index in [j]S[j]_{S^{*}} or merged with its predecessor (the agent with the largest index that is smaller than jj) in [j]S[j]_{S^{*}}, each coalition is merged in 𝒮\mathcal{S} after Lines 18 to 21. Thus 𝒮\mathcal{S} becomes the same as 𝒮\mathcal{S}^{*}.

For (iv), consider an agent jj. In the game queried on line 4, agent jj is involved in at most 2sized2size\leq d factor games with agents {i|belongjbelongi|=δ}\{i\mid|belong_{j}-belong_{i}|=\delta\}; in the game queried on line 11, agent jj is involved in at most sizedsize\leq d factor games with agents {ibelongi=belongj}\{i\mid belong_{i}=belong_{j}\}; in the game queried on line 16, agent jj is involved in at most 2sized2size\leq d factor games with agents {ii<j,belongjbelongi=δj}{ii>j,δi1,belongibelongj=δi}\{i\mid i<j,belong_{j}-belong_{i}=\delta_{j}\}\cup\{i\mid i>j,\delta_{i}\geq 1,belong_{i}-belong_{j}=\delta_{i}\}. Therefore, the games queried in Lines 4, 11, and 16 are degree-dd graphical games by Lemma B.2.

Next, we show the complexity of the algorithm. The for loop in Lines 3 to 4 requires cnt=nsize=2ndcnt=\lceil\frac{n}{size}\rceil=\lceil\frac{2n}{d}\rceil queries, and the while loops in Lines 8 to 12 and Lines 13 to 17 require at most log2size=log2d1\lceil\log_{2}size\rceil=\lceil\log_{2}d\rceil-1 queries each. Therefore, the total number of queries is at most 2nd+2log2d22nd+2log2d+1\lceil\frac{2n}{d}\rceil+2\lceil\log_{2}d\rceil-2\leq\frac{2n}{d}+2\log_{2}d+1. This completes the proof. ∎

Appendix C Omitted Proofs

C.1 Proof of Theorem 4.1

See 4.1

Proof.

We first show the correctness of the algorithm. Consider the if statement on Line 8, if there exists jj such that Oj=TrueO_{j}=\mathrm{True}, then by Lemma 4.1, exactly one agent jj in TT has Oj=TrueO_{j}=\mathrm{True}, and agents ii and jj are in the same coalition under 𝒮\mathcal{S}^{*}. The algorithm then merges [i]𝒮[i]_{\mathcal{S}} and [j]𝒮[j]_{\mathcal{S}} in 𝒮\mathcal{S}, and removes jj from TT. Otherwise, by Lemma 4.1, no agent in TT is in [i]𝒮[i]_{\mathcal{S}^{*}}. The algorithm then removes ii from TT. Therefore, the inner while loop on Lines 5 to 12 finalizes agent ii’s coalition in 𝒮\mathcal{S} correctly, and removes the whole coalition from TT. The correctness of the algorithm then follows.

For the complexity, the algorithm starts with T=NT=N, and after each query on Line 7, it removes one agent from TT. Since the algorithm terminates when |T|=1|T|=1, it uses n1n-1 queries in total. ∎