Deviate or Not: Learning Coalition Structures
with Multiple-bit Observations in Games

Yixuan Even Xu Carnegie Mellon University. Email: [email protected] Zhe Feng Google Research. Email: [email protected] Fei Fang Carnegie Mellon University. Email: [email protected]

Abstract

We consider the Coalition Structure Learning (CSL) problem in multi-agent systems, motivated by the existence of coalitions in many real-world systems, e.g., trading platforms and auction systems. In this problem, there is a hidden coalition structure within a set of $n$ agents, which affects the behavior of the agents in games. Our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the hidden coalition structure. In particular, we consider the setting where in each round, we design and present a game together with a strategy profile to the agents, and receive a multiple-bit observation – for each agent, we observe whether or not they would like to deviate from the specified strategy. We show that we can learn the coalition structure in $O(\log n)$ rounds if we are allowed to design any normal-form game, matching the information-theoretical lower bound. For practicality, we extend the result to settings where we can only choose games of a specific format, and design algorithms to learn the coalition structure in these settings. For most settings, our complexity matches the theoretical lower bound up to a constant factor.

1 Introduction

Coalitions and collusive behavior are common in the real world. In auction systems, bidders may form coalitions to coordinate bids and exploit mechanisms for better odds of winning [1]. On ridesharing platforms like Uber and Lyft, some drivers might disconnect simultaneously to create an artificial shortage, triggering a price surge they can later benefit from [2, 3, 4]. While illegal collusion, like price fixing, is regulated by agencies such as the SEC in the United States, it remains challenging to eliminate, as seen in scandals like the LIBOR Scandal [5]. As AI systems become more advanced and widely used in domains like stock trading and autonomous driving, regulating these AI agents presents a growing challenge as seen by regulators and researchers [6, 7].

In response to such concerns, the problem of coalition structure learning (CSL) [8] has recently gained attention. In CSL, a platform aims to figure out which agents are in a coalition by designing a small number of games for the agents to play and make observations on the outcomes of these games. Notably, more than one coalition may exist simultaneously. In practice, the platform could represent a regulator, like an auction or ridesharing platform, aiming to detect collusion or design better mechanisms.

However, the existing study on CSL relies on a strong assumption that the platform can only observe whether a specified strategy profile is a Nash equilibrium (NE) in each round. With this single-bit observation oracle and other assumptions, optimal algorithms require $\Theta(n\log n)$ rounds of games to learn the coalition structure, where $n$ is the number of agents. Such an observation oracle is too restrictive for two reasons. First, for real-world systems with a large number of agents, $\Theta(n\log n)$ rounds is still prohibitively large. Second, it may be possible for the platform to know which agents would deviate from the specified strategy profile. For example, a ridesharing platform could observe which drivers disconnect from the platform given a pricing scheme.

In this paper, we aim to address this limitation and consider a multiple-bit observation oracle: the platform can observe an $n$ -dimensional binary vector in each round, where the $i$ -th bit indicates whether or not agent $i$ would like to deviate from the specified strategy profile. This oracle allows the platform to gain $n$ bits of information in each round, which is significantly more informative than the single-bit observation oracle, allowing the platform to learn the coalition structure with much fewer rounds. However, from a technical perspective, switching from the single-bit observation oracle to the multiple-bit observation oracle fundamentally changes the problem. With the single-bit observation, the platform can choose the game to use to get the next bit of observation, which means it is easy to ensure that each bit of observation brings in information. However, with multi-bit observation, the platform will get $n$ -bit information in a batch, and it is often inevitable that these $n$ -bits contain repetitive or highly correlated information. Therefore, the key challenge in algorithm design is how to design the games so that the $n$ bits of observation are more de-correlated and contain more information.

1.1 Our Results

Type of Games	Lower Bound	Complexity	Section
Normal-Form	$\log_{2}n-O(\log\log n)$	$\log_{2}n+2$	Section 3
Congestion	$\log_{2}n-O(\log\log n)$	$\log_{2}n+2$	Appendix A
Graphical	$\max(\log_{2}n,n/d)-O(\log\log n)$	$2n/d+2\log_{2}d+1$	Appendix B
Auctions	$\log_{2}n-O(\log\log n)$	$(1+\log_{2}n)(1+c)+1$	Section 4

Table 1: Summary of results.

Depending on the specific scenario of application, sometimes external constraints may limit the type of games that can be designed. For example, in an auction platform, the algorithm may only use auctions. Therefore, we study different settings of the type of games the algorithm may design. We summarize our results for different settings in Table 1.

Information-theoretical lower bound. We first show that any algorithm that learns the coalition structure requires at least $\log_{2}n-O(\log\log n)$ rounds regardless of the type of games used, where $n$ is the number of agents.

Normal-form games. Normal-form games are a general class of games. We consider the case when the platform can design any normal-form game and there are no external constraints on the type of games that can be designed. We show an algorithm, simultaneous binary search, that can learn the coalition structure in $\log_{2}n+2$ rounds using directed prisoner’s dilemmas, a type of designed games for agents to play to elicit coalition structures. The number of rounds matches the lower bound almost exactly.

Congestion games. Congestion games [9] are a well-studied type of games that model the allocation of resources, and has demonstrated its importance in real-world transportation system. For CSL in congestion games, we show that with a different type of games – directed Braess’s paradoxes – the simultaneous binary search algorithm can be adapted to learn the coalition structure with congestion games. The number of rounds required is also $\log_{2}n+2$ .

Graphical games. Graphical games [10] are another thoroughly researched class of games. Investigating CSL with graphical games can be useful in contexts like social networks. We consider degree- $d$ graphical games, i.e., each agent has at most $d$ neighbors in the underlying graph. In the negative direction, we show that any algorithm for the problem requires at least $\lceil\frac{n-1}{d}\rceil$ rounds. In the positive direction, we design an algorithm that couples simultaneous binary search with the idea of block decomposition, i.e., to divide the agents into blocks of size $\lfloor\frac{d}{2}\rfloor$ . See the construction of Algorithms 5 and 5 for details of this technique. Our algorithm learns the coalition structure in $2n/d+2\log_{2}d+1$ rounds. Given the two lower bounds, this algorithm is optimal up to a constant factor.

Auctions. Finally, we study CSL with auctions, in particular, second-price auctions with personalized reserves, which are widely adopted in online advertising systems. The format of auctions is much more restrictive and challenging than the previous types of games since there is only one winner in an auction, and it is difficult to elicit information from the other players. Our algorithm in this case learns the coalition structure in $(1+\log_{2}n)(1+c)+1$ rounds, where $c$ is the size of the largest coalition. When the coalitions are small, the number of rounds required is close to the lower bound.

1.2 Related Work

Our work lies in the general research direction of learning in games [11]. In particular, our work is closely related to the line of research on Inverse Game Theory, e.g., [12, 13, 14, 15], where these previous papers aim to infer the parameters and underlying payoffs of a game by observing the behavior of players. Our work adopts this perspective but seeks to understand the coalition structure in games, which is not considered in prior works. By strategically and adaptively selecting defender strategies and observing the attacker’s best responses, [16, 17, 18] show that one can uncover the underlying utility functions driving attacker behavior in Stackelberg security games. Our work learns in a similar adaptive manner but focuses on the coalition structure among agents. In addition, learning coalition structure has a fundamental difference from the above existing literature since the space of all possible coalition structures is exponentially large.

The closest related work is [8], where they focus on learning coalition structure through a single bit feedback, i.e., the game designed in the learning process only returns one-bit information – whether a specified strategy profile is a NE or not. In this work, we strictly generalize the model proposed by [8] that allows us to observe multiple-bit feedback rather than just binary feedback. This flexibility strictly increases the set of possible observations and thus the difficulty of designing the games. Moreover, [19, 20] provide general approaches to detect one (collusion) coalition in multi-agent games, whereas, our work aims to find the entire coalition structure in games.

There is also a line of research on learning with revealed preference feedback, e.g., [21, 22, 23, 24, 25, 26], in which they target to predict the strategic agents’ behavior or optimize the profit of the decision maker, through the revealed preference feedback from strategic agents. In this paper, we assume that we can observe the best response feedback of the agents for the games, where each game is carefully designed to elicit information on the coalition structure within a set of agents. Loosely related work includes no-regret learning in games [27, 28]. Indeed, if there is a cost to perform one game to learn coalition structure, our learning process achieves constant regret given the number of rounds needed to identify all coalitions only depends on the number of agents and the structure of the underlying games.

2 Notations and Preliminaries

Consider a set $N=\{1,2,\dots,n\}$ of $n$ strategic agents. A coalition $S\subseteq N$ is a nonempty subset of the agents, in which the agents can coordinate with each other in games. We assume there is a coalition structure $\mathcal{S}^{*}=\{{S_{1}}^{*},{S_{2}}^{*},\dots,{S_{m}}^{*}\}$ among these agents, which is a set partition of $N$ , and each agent in $N$ belongs to exactly one of these mutually disjoint coalitions. We use $[i]_{\mathcal{S}^{*}}$ to denote the coalition containing agent $i$ under $\mathcal{S}^{*}$ . $\mathcal{S}^{*}$ is not public information, but agents in each coalition know the other agents in the same coalition. In the Coalition Structure Learning (CSL) problem, our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the underlying coalition structure $\mathcal{S}^{*}$ . Specifically, we interact with the agents in a sequence of rounds. Each round, we design a game $\mathcal{G}$ along with a strategy profile $\Sigma=(\sigma_{1},\sigma_{2},\dots,\sigma_{n})$ in $\mathcal{G}$ for the agents, and we make an observation $\mathcal{O}(\mathcal{G},\Sigma)$ about the strategy profile $\Sigma$ in $\mathcal{G}$ . The task is to learn the underlying coalition structure $\mathcal{S}^{*}$ .

Behavior model.

We assume that the agents are rational and strategic. When playing a game $\mathcal{G}$ , agents in the same coalition under $\mathcal{S}^{*}$ can coordinate with each other to choose actions and will share rewards after the game. Thus, they effectively play as a joint player whose action space and utility are, respectively, the Cartesian product of the action spaces and the sum of utilities of the agents in that coalition. Since the agents are not aware of the underlying coalition structure beyond their own coalitions, the information in the game is incomplete. We do not assume the agents’ full behavior model, but focus on the agents’ deviation decisions from the specified strategy profile.

Multiple-bit observation oracle.

In this paper, we consider the multiple-bit observation oracle as opposed to the single-bit observation oracle considered in [8]. The observation $\mathcal{O}(\mathcal{G},\Sigma)$ is an $n$ -dimensional binary vector, where the $i$ -th bit $\mathcal{O}_{i}(\mathcal{G},\Sigma)\in\{\mathrm{True},\mathrm{False}\}$ indicates whether or not agent $i$ would like to deviate from the specified strategy $\sigma_{i}$ when the coalition is best responding as a whole. In particular, given a game $\mathcal{G}$ and a strategy profile $\Sigma$ , for each coalition $S\in\mathcal{S}^{*}$ , agents in $S$ would first determine whether $\Sigma_{S}$ is a joint best response to the other agents’ specified strategy $\Sigma_{-S}$ in $\mathcal{G}$ . If it is, then $\mathcal{O}_{i}=\mathrm{False}$ for each agent $i\in S$ , i.e., they would decide not to deviate. Otherwise, they would compute a possible joint best response $\mathrm{BR}(S,\Sigma)$ to the other agents’ specified strategy $\Sigma_{-S}$ in $\mathcal{G}$ . Then, they would compute the deviation decision $\mathcal{O}_{i}=\mathbb{I}\left[\vphantom{\sum}\mathrm{BR}_{i}(S,\Sigma)\neq\sigma_{i}\right]$ for each agent $i\in S$ . The observation $\mathcal{O}(\mathcal{G},\Sigma)$ is the concatenation of the decisions of all agents in $N$ . If there are multiple best responses for a coalition, we assume that the agents in that coalition would choose any of them in the worst case. We will use “query the observation oracle” or “query $(\mathcal{G},\Sigma)$ ” to refer to the process of designing a game $\mathcal{G}$ and a strategy profile $\Sigma$ and receiving the observation $\mathcal{O}(\mathcal{G},\Sigma)$ .

2.1 Lower Bound of the Number of Rounds

We first show a lower bound on the number of rounds required to solve the Multiple-bit CSL problem. The lower bound is based on an information-theoretical argument similar to [8]. Note that this lower bound holds for any type of games that the algorithm may design.

Theorem 2.1.

Any algorithm that solves the Multiple-bit CSL problem requires at least $\log_{2}n-O(\log\log n)$ rounds of interactions with the agents in the worst case.

Proof.

In each round, the algorithm receives at most $n$ bits of information. As the number of possible set partitions of $N$ is the Bell number $B_{n}$ , to distinguish between them, we need at least $\left\lceil\log_{2}B_{n}\right\rceil=n\log_{2}n-O(n\log_{2}\log_{2}n)$ bits of information. The equation follows from the asymptotic expression of Bell number established in [29]. The theorem then follows. ∎

3 Multiple-bit CSL with Normal-form Games

We show in this section an algorithm that solves Multiple-bit CSL with normal-form games in $\log_{2}n+2$ rounds. Note that the number of rounds matches the lower bound in Theorem 2.1 up to $O(\log\log n)$ . The main idea of the algorithm is to use binary search to find another agent in the same coalition for each agent simultaneously. We start by defining game-strategy pairs and a special pair that we call the directed prisoner’s dilemma, which we use in the algorithm.

Definition 3.1.

A game-strategy pair $(\mathcal{G},\Sigma)$ is a $n$ -player normal-form game along with a pure strategy profile $\Sigma$ in $\mathcal{G}$ . Here, $\Sigma$ is called the specified strategy profile of $(\mathcal{G},\Sigma)$ .

Definition 3.2.

For $i\in N,j\in N$ , a directed prisoner’s dilemma $\mathcal{P}(i,j)$ is a game-strategy pair $(\mathcal{G},\Sigma)$ . In game $\mathcal{G}$ , agent $j$ has two actions $\{\textup{C},\textup{D}\}$ , and all other agents only have one action $\{\textup{D}\}$ . If agent $j$ plays C, then $i$ and $j$ receive utility $2$ and $-1$ respectively. Otherwise they both receive $0$ utility. All other agents always receive $0$ utility. $\Sigma$ is the pure strategy profile where all agents play D.

\begin{array}[]{|c|c|c|}\hline\cr&\textup{C}&\textup{D}\\ \hline\cr\textup{D}&(2,-1)&(0,0)\\ \hline\cr\end{array}

Figure 1: The payoff table of agents

i

and

j

in the directed prisoner’s dilemma

\mathcal{P}(i,j)

. Agent

i

is the row player and agent

j

is the column player. The others only have one action and are not shown in the table.

Given the construction of directed prisoner’s dilemma, we have the following observation.

Lemma 3.1.

Let $i,j\in N$ and $\mathbf{O}=\mathcal{O}(\mathcal{P}(i,j))$ . Then, $O_{j}=\mathrm{True}$ if and only if agent $i$ and agent $j$ are in the same coalition under $\mathcal{S}^{*}$ . Moreover, $O_{k}=\mathrm{False}$ for $k\in N\setminus\{j\}$ .

Proof.

If agent $i$ and agent $j$ are in the same coalition under $\mathcal{S}^{*}$ , then playing C is a dominant strategy for agent $j$ ’s coalition. Otherwise, playing D is a dominant strategy. Therefore, $O_{j}=\mathrm{True}$ if and only if agent $i$ and agent $j$ are in the same coalition under $\mathcal{S}^{*}$ . Moreover, since all agents other than $j$ only have one action $\{\textup{D}\}$ , $O_{k}=\mathrm{False}$ for $k\in N\setminus\{j\}$ . ∎

Lemma 3.1 shows that we can use directed prisoner’s dilemmas to determine whether two agents are in the same coalition under $\mathcal{S}^{*}$ . However, querying a single directed prisoner’s dilemma is not efficient as we only get $1$ bit of information. To utilize the information from the observation oracle more efficiently, we will need to consider the product of multiple directed prisoner’s dilemmas.

Definition 3.3 ([8]).

Let $(\mathcal{G}_{1},\Sigma_{1}),(\mathcal{G}_{2},\Sigma_{2})$ be two game-strategy pairs where $A_{x,i},u_{x,i}$ are the action set and utility function of agent $i$ in $\mathcal{G}_{x}$ respectively for $x\in\{1,2\}$ . Let $\Sigma_{1}=(\sigma_{1,i})_{i\in N}$ and $\Sigma_{2}=(\sigma_{2,i})_{i\in N}$ . The product of $(\mathcal{G}_{1},\Sigma_{1})$ and $(\mathcal{G}_{2},\Sigma_{2})$ is a game-strategy pair $(\mathcal{G}_{p},\Sigma_{p})$ . Here, $\mathcal{G}_{p}$ is a normal-form game with action set $A_{1,i}\times A_{2,i}$ and utility function $u_{1,i}+u_{2,i}$ for each $i\in N$ . $\Sigma_{p,i}=(\sigma_{1,i},\sigma_{2,i})_{i\in N}$ . We denote the product of $(\mathcal{G}_{1},\Sigma_{1})$ and $(\mathcal{G}_{2},\Sigma_{2})$ as $(\mathcal{G}_{1},\Sigma_{1})\times(\mathcal{G}_{2},\Sigma_{2})$ or $\prod_{x=1}^{2}(\mathcal{G}_{x},\Sigma_{x})$ . The game-strategy pairs $(\mathcal{G}_{1},\Sigma_{1}),(\mathcal{G}_{2},\Sigma_{2})$ are called the factors of $(\mathcal{G}_{p},\Sigma_{p})$ .

Lemma 3.2.

Let $\{(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}\}$ be $k$ pairs of agents and $\mathbf{O}=\mathcal{O}(\prod_{x=1}^{k}\mathcal{P}(i_{x},j_{x}))$ . Then, $O_{j}=\mathrm{True}$ if and only if there exists $x\in\{1,2,\dots,k\}\textbf{ such that }j_{x}=j$ and $i_{x}\in[j]_{\mathcal{S}^{*}}$ .

Proof.

By Definition 3.3, playing the product game is equivalent to separately playing each factor game, and summing up the resulting utilities of each agent. Therefore, agent $j$ decides to deviate in the product game $\prod_{x=1}^{k}\mathcal{P}(i_{x},j_{x})$ if and only if agent $j$ decides to deviate in at least one factor game. The result then follows from Lemma 3.1. ∎

A graphical interpretation of Lemma 3.2 is as follows. When we query the game-strategy pair $\prod_{x=1}^{k}\mathcal{P}(i_{x},j_{x})$ , consider a directed graph where each vertex represents an agent and each edge $(j_{x},i_{x})$ , $x\in\{1,2,\dots,k\}$ represents $\mathcal{P}(i_{x},j_{x})$ . The observation tells us for each vertex if any of its outgoing neighbors in this constructed graph is in the same coalition. With Lemma 3.2, we are ready for our Algorithm 1 for Multiple-bit CSL with normal-form games.

Input: The number of agents

n

and the multiple-bit observation oracle

\mathcal{O}

Output: A coalition structure

\mathcal{S}

of the agents.

3Query

\prod_{i\in N,j\in N,i<j}\mathcal{P}(i,j)

and observe

\mathbf{O}

;

4 Let

T_{j}\leftarrow\{1,2,\dots,j-1\}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }\varnothing

\textbf{ for each }j\in N

;

5 while $\exists j\in N\textbf{ such that }|T_{j}|\geq 2$ do

6 Let

L_{j}\leftarrow\{\text{the smallest $\lfloor\frac{|T_{j}|}{2}\rfloor$ elements in $T_{j}$}\}

for each

j\in N

;

7 Let

R_{j}\leftarrow T_{j}\setminus L_{j}

for each

j\in N

;

8 Query

\prod_{j\in N,i\in L_{j}}\mathcal{P}(i,j)

and observe

\mathbf{O}

;

9 Let

T_{j}\leftarrow L_{j}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }R_{j}

\textbf{ for each }j\in N

;

11Let

\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\}

;

12 for $j\in N\textbf{ such that }|T_{j}|\neq\varnothing$ do

13 Let

i\leftarrow\text{the only element in $T_{j}$}

;

14 Merge

[i]_{\mathcal{S}}

and

[j]_{\mathcal{S}}

\mathcal{S}

;

16return $\mathcal{S}$ ;

Algorithm 1 Simultaneous Binary Search

Intuitively, given an agent $j$ , if we want to find another agent $i\ (i<j)$ in the same coalition with $j$ , we can query a game-strategy pair $\prod_{k\in T_{j}}\mathcal{P}(k,j)$ where $T_{j}=\{1,2,\ldots,j-1\}$ initially. According to Lemma 3.2, we will know whether there is such an agent in $T_{j}$ from the $j$ -th bit of the observation. If there is such an agent, we can then bisect the set $T_{j}$ and apply binary search to find this agent. Moreover, this binary search can be done simultaneously for different agents. For example, given two agents $j$ and $j^{\prime}$ , we can construct the game-strategy pairs $\prod_{k\in T_{j}}\mathcal{P}(k,j)\times\prod_{k\in T_{j^{\prime}}}\mathcal{P}(k,j^{\prime})$ and apply binary search at the same time on $T_{j}$ and $T_{j^{\prime}}$ using the $j$ -th and $j^{\prime}$ -th bits of the observation. This is because the $j$ -th bit in the observation vector only depends on whether there exists another agent in $T_{j}$ in the same coalition with agent $j$ , and the same argument applies to agent $j^{\prime}$ .

Algorithm 1 leverages this intuition and works in two stages. In the first stage (Lines 1 to 7), we try to find for each agent $j$ the smallest index of the agents in $[j]_{\mathcal{S}^{*}}$ . To do this, we first query $\prod_{i\in N,j\in N,i<j}\mathcal{P}(i,j)$ (Line 1). In this game, agent $j$ has the incentive to deviate only when another agent $i$ with a smaller index than $j$ is in the same coalition with $j$ . Therefore, using the $j$ -th bit from the observation, we can determine whether agent $j$ is the agent with the smallest index in $[j]_{\mathcal{S}^{*}}$ (Line 2). For agents who are not, we then use binary search to locate the smallest indexed agents in their coalitions simultaneously (Lines 3 to 6). In the second stage (Lines 8 to 11), it merges each coalition in $\mathcal{S}^{*}$ together according to the result of the first stage.

We provide an example of Algorithm 1 in Fig. 2.

(a) First query

(b) Second query

Figure 2: Example execution of Algorithm 1 when

\mathcal{S}^{*}=\{\{1,4\},\{2,3\}\}

. The vertices represent the agents, the edges represent the directed prisoner’s dilemmas that the algorithm queries each time, and the dashed rectangles represent the sets

T_{j}

. In the first query, the algorithm queries

\prod_{i\in N,j\in N,i<j}\mathcal{P}(i,j)

(Line 1) as shown in (a). Using the observations, the algorithm sets

T_{1}=T_{2}=\varnothing

T_{3}=\{1,2\}

, and

T_{4}=\{1,2,3\}

(Line 2). In the second query,

T_{3}

is bisected into

L_{3}=\{1\},R_{3}=\{2\}

T_{4}

is bisected into

L_{4}=\{1\},R_{4}=\{2,3\}

, and the algorithm queries

\prod_{j\in N,i\in L_{j}}\mathcal{P}(i,j)

(Lines 3 to 6) as shown in (b). Using the observations, the algorithm sets

T_{3}=L_{3}=\{1\}

and

T_{4}=R_{4}=\{2,3\}

(Line 7) as shown in (c). Finally, the algorithm merges

\{1\}

and

\{4\}

, and

\{2\}

and

\{3\}

together to recover the coalition structure (Lines 8 to 11).

Theorem 3.1.

Algorithm 1 solves Multiple-bit CSL with normal-form games in $\log_{2}n+2$ rounds.

Proof.

We first show the correctness of the algorithm. To do this, we will show two claims: (i) after Lines 1 to 7, for each $j\in N$ , if $j$ is the agent with the smallest index in $[j]_{S^{*}}$ , $T_{j}=\varnothing$ , otherwise, $T_{j}$ contains only the smallest index of the agents in $[j]_{S^{*}}$ ; (ii) after Lines 8 to 11, $\mathcal{S}=\mathcal{S}^{*}$ .

For (i), if $j$ is the agent with the smallest index in $[j]_{S^{*}}$ , then by Lemma 3.2, $O_{j}=\mathrm{False}$ , and $T_{j}=\varnothing$ on Line 2. Throughout the algorithm, $T_{j}$ remains $\varnothing$ . Otherwise, $T_{j}$ contains the smallest index of the agents in $[j]_{S^{*}}$ after Line 2. In the while loop (Line 3 to 7), $T_{j}$ is updated to $L_{j}$ if $O_{j}=\mathrm{True}$ , and to $R_{j}$ otherwise. Since $L_{j}$ contains the smallest $\lfloor|T_{j}|/2\rfloor$ elements in $T_{j}$ , by Lemma 3.2, $O_{j}=\mathrm{True}$ if and only if the smallest index of the agents in $[j]_{S^{*}}$ is in $L_{j}$ . Therefore, after one iteration of the loop, $T_{j}$ still contains only the smallest index of the agents in $[j]_{S^{*}}$ , while the size of $T_{j}$ is halved. The loop terminates after $|T_{j}|=1$ , and $T_{j}$ contains only the smallest index of the agents in $[j]_{S^{*}}$ .

For (ii), since every agent $j$ is either the agent with the smallest index in $[j]_{S^{*}}$ or merged with that agent, each coalition is merged in $\mathcal{S}$ after Lines 8 to 11. Thus $\mathcal{S}$ becomes the same as $\mathcal{S}^{*}$ .

Next, we show the complexity of the algorithm. The while loop in Lines 3 to 7 runs at most $\lceil\log_{2}n\rceil$ times, and each iteration requires $1$ query. Together with the query on Line 1, the total number of queries is at most $\lceil\log_{2}n\rceil+1\leq\log_{2}n+2$ . ∎

Multiple-bit CSL with Other Types of Games.

A natural next step after this section is to consider the case where the algorithm can only design games of a specific type. We will show that the idea of simultaneous binary search from Algorithm 1 can be adapted to solve the Multiple-bit CSL problem with congestion games (Appendix A), graphical games (Appendix B), or auctions (Section 4). The adaptation to congestion games is natural once we construct a new game-strategy pair that is similar in spirit to the directed prisoner’s dilemma. However, the adaptation to graphical games and auctions requires more innovations in algorithm design.

4 Multiple-bit CSL with Auctions

We consider in this section the case of Multiple-bit CSL with auctions. The auction format we consider is theoretically well-studied second-price auctions with personalized reserves [30], which is widely used in practice. We first formally define this format.

Definition 4.1.

A second-price auction with personalized reserves $\mathcal{G}$ is a tuple $(N,\mathbf{v},\mathbf{r})$ where $N$ is the set of agents, $\mathbf{v}=(v_{1},v_{2},\dots,v_{n})$ is the valuation vector, and $\mathbf{r}=(r_{1},r_{2},\dots,r_{n})$ is the reserve price vector. The auction proceeds as follows: Each agent $i$ submits a bid $\sigma_{i}\in\mathbb{R}_{\geq 0}$ and the agent with the highest bid wins the auction with ties broken at uniform random. If the winner $i$ bids greater than $r_{i}$ , then the winner gets the item at a price equal to the maximum between the second highest bid and $r_{i}$ . The item can be reallocated among $[i]_{\mathcal{S}^{*}}$ . Otherwise, the item is not allocated.

Our algorithm for the Multiple-bit CSL problem with auctions works in $(1+\log_{2}n)(1+c)+1$ rounds, where $c$ is size of the largest coalition in $\mathcal{S}^{*}$ . To introduce the algorithm, we first define the game-strategy pairs that we will use in the algorithm, the auction game gadgets, where we will specify the valuation and reserve prices of every agents in each partition bucket.

Definition 4.2.

Let $\{X,Y,Z\}$ be a partition of $N$ , i.e., $X\cap Y=X\cap Z=Y\cap Z=\varnothing$ and $X\cup Y\cup Z=N$ . An auction gadget $\mathcal{A}(X,Y,Z)$ is a game-strategy pair $(\mathcal{G},\Sigma)$ where $\mathcal{G}=(N,\mathbf{v},\mathbf{r})$ is a second-price auction with personalized reserves, and $\Sigma$ is a specified bid vector for the agents, where

v_{i}=\left\{\begin{array}[]{ll}1&(i\in X)\\ 0&(i\in Y)\\ 0&(i\in Z),\end{array}\right.r_{i}=\left\{\begin{array}[]{ll}1&(i\in X)\\ 0&(i\in Y)\\ 1&(i\in Z),\end{array}\right.\text{and}\ \ \ \sigma_{i}=0

Note that in the above auction game gadgets, we do not require each agent to know all the other agents’ valuations so that the valuation of each agent is still private across strategic agents. Only agents in the same coalition know each other’s true valuation.

Lemma 4.1.

For $\{X,Y,Z\}$ that is a partition of $N$ , let $\mathbf{O}=\mathcal{O}(\mathcal{A}(X,Y,Z))$ . For each coalition $S\in\mathcal{S}^{*}$ , if $S\cap X\neq\varnothing$ and $S\cap Y\neq\varnothing$ , then, there is exactly one agent $i\in S\cap Y$ with $O_{i}=\mathrm{True}$ and $O_{j}=\mathrm{False}$ for all $j\in S\setminus\{i\}$ . Otherwise, $O_{i}=\mathrm{False}$ for all $i\in S$ .

Proof.

For each coalition $S\in\mathcal{S}^{*}$ . We first consider the case where $S\cap X\neq\varnothing$ and $S\cap Y\neq\varnothing$ . Let agent $i\in S\cap X$ and agent $j\in S\cap Y$ . If the coalition follows the specified strategy profile $\Sigma_{S}$ , then the item will not be allocated, so the utility of the agents in $S$ is $0$ . However, If agent $j$ raises its bid to $\sigma^{\prime}_{j}>0$ , then it becomes the winner of the auction, and the item will be allocated to $j$ with price $\max\{second\ highest\ bid,r_{j}\}=0$ . The item can then be reallocated to agent $i$ , giving utility $1$ to the agents in $S$ . Thus, $\Sigma_{S}$ is not a joint best response to $\Sigma_{-S}$ . Moreover, for agents in $S$ , the only joint best response is to let exactly one agent in $S\cap Y$ raise its bid to positive, and the other agents in $S$ keep their bids at $0$ . Thus, the observation $\mathbf{O}$ is as described in the lemma.

Otherwise, either $S\cap X=\varnothing$ or $S\cap Y=\varnothing$ . If $S\cap X=\varnothing$ , then no one in $S$ has a positive valuation for the item, so the utility of the agents in $S$ cannot be greater than $0$ . If $S\cap Y=\varnothing$ , then no one in $S$ has a reserve price that is smaller than $1$ , which is the largest possible valuation. Thus, the utility of the agents in $S$ cannot be greater than $0$ , either. In both cases, keeping the bids at $0$ is a joint best response to $\Sigma_{-S}$ , so the observation $\mathbf{O}$ is as described in the lemma. ∎

The auction gadgets are important building blocks for our algorithm that works in $(1+\log_{2}n)(1+c)+1$ rounds. To help demonstrate how they can be used to solve Multiple-bit CSL, we present a simpler algorithm that works in $n-1$ rounds using the auction gadgets below.

Input: The number of agents

n

and the multiple-bit observation oracle

\mathcal{O}

Output: A coalition structure

\mathcal{S}

of the agents.

3Let

T\leftarrow N

;

4 Let

\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\}

;

5 while $|T|\geq 2$ do

6 Let

i\leftarrow\text{the first element in $T$}

;

7 while $|T|\geq 2$ and $i\in T$ do

8 Let

X\leftarrow\{i\},Y\leftarrow T\setminus\{i\}

and

Z\leftarrow N\setminus S

;

9 Query

\mathcal{A}(X,Y,Z)

and observe

\mathbf{O}

;

10 if $\exists j$ such that $O_{j}=\mathrm{True}$ then

11 Merge

[i]_{\mathcal{S}}

and

[j]_{\mathcal{S}}

\mathcal{S}

;

T\leftarrow T\setminus\{j\}

;

14 else

T\leftarrow T\setminus\{i\}

;

19return $\mathcal{S}$ ;

Algorithm 2 Iterative Location with Auctions

To provide some intuitions, consider an agent $i\in N$ . If we want to find another agent in the same coalition as $i$ , we can query the auction gadget $\mathcal{A}(\{i\},N\setminus\{i\},\varnothing)$ . According to Lemma 4.1, if the observation $\mathbf{O}$ has $O_{j}=\mathrm{True}$ for some $j\in N\setminus\{i\}$ , then $i$ and $j$ are in the same coalition under $\mathcal{S}^{*}$ . Otherwise, $i$ is in a coalition by itself. Then, suppose we have found an agent $j$ in $i$ ’s coalition, and we would like to find a different agent in that coalition. We can query the auction gadget $\mathcal{A}(\{i\},N\setminus\{i,j\},\{j\})$ . By moving agent $j$ to the set $Z$ , we ensure that $j$ does not have the incentive to deviate, and either another agent in the same coalition as $i$ will have $O_{k}=\mathrm{True}$ for some $k\in N\setminus\{i,j\}$ , or we can conclude that $i$ and $j$ are in a coalition by themselves. We can then repeat this process until we have found all the agents in the same coalition as $i$ . After that, we can move $i$ to the set $Z$ and proceed with the next agent in $N$ . In this way, we can recover the coalition structure $\mathcal{S}^{*}$ . Moreover, since each time we query an auction gadget, we move one agent to set $Z$ , and the algorithm terminates when $|Z|=n-1$ , it uses $n-1$ queries in total.

Theorem 4.1.

Algorithm 2 solves the Multiple-bit CSL problem with auctions in $n-1$ rounds.

We defer the proof to Appendix C. Other than the auction gadgets, we will also use another game-strategy pair in the algorithm. We prove the following lemma about the observations obtained by querying this game-strategy pair.

Lemma 4.2.

Let $\mathbf{O}=\mathcal{O}((N,\mathbf{v}=\mathbf{1},\mathbf{r}=\mathbf{0}),\Sigma=\mathbf{0})$ , where $\mathbf{0},\mathbf{1}$ are vectors consisting purely of $0$ s and $1$ s respectively. For each coalition $S\in\mathcal{S}^{*}$ , exactly one agent $i\in S$ has $O_{i}=\mathrm{True}$ .

Proof.

For each coalition $S\in\mathcal{S}^{*}$ , if any agent $i$ in $S$ raises its bid to positive, then the item will be allocated to $i$ with price $0$ , giving utility $1$ to the agents in $S$ . Thus, $\Sigma_{S}$ is not a joint best response to $\Sigma_{-S}$ . Moreover, the only joint best response for agents in $S$ is to let exactly one agent in $S$ raise its bid to positive, and the other agents in $S$ keep their bids at $0$ . The lemma then follows. ∎

Then, we are ready to present our algorithm that works in $(1+\log_{2}n)(1+c)+1$ rounds for Multiple-bit CSL with auctions. The algorithm is shown in Algorithm 3.

Input: The number of agents

n

and the multiple-bit observation oracle

\mathcal{O}

Output: A coalition structure

\mathcal{S}

of the agents.

3Query

((N,\mathbf{v}=\mathbf{1},\mathbf{r}=\mathbf{0}),\Sigma=\mathbf{0})

and observe

\mathbf{O}

;

4 Let

T_{x}\leftarrow\{i\mid O_{i}=\mathrm{True}\}

and

T_{y}\leftarrow N\setminus T_{x}

;

5 for $b\in\{0,1,\dots,\lfloor\log_{2}n\rfloor\}$ do

6 Let

X\leftarrow\{i\mid i\in T_{x}\textbf{ and }

\text{the $b$-th lowest binary bit of $i$ is $1$}\}

;

7 Let

T_{\mathrm{False}}\leftarrow T_{y}

;

8 repeat

9 Let

Y\leftarrow T_{\mathrm{False}}

and

Z\leftarrow N\setminus(X\cup Y)

;

10 Query

\mathcal{A}(X,Y,Z)

and observe

\mathbf{O}

;

11 Let

T_{\mathrm{True}}\leftarrow\{i\mid i\in T_{\mathrm{False}}\textbf{ and }O_{i}=\mathrm{True}\}

;

T_{\mathrm{False}}\leftarrow T_{\mathrm{False}}\setminus T_{\mathrm{True}}

;

14 until $T_{\mathrm{True}}=\varnothing$ ;

15 Let

O^{(b)}_{i}\leftarrow\mathbb{I}\left[\vphantom{\sum}i\not\in T_{\mathrm{False}}\right]\textbf{ for each }i\in T_{y}

;

17Let

\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\}

;

18 for $i\in T_{y}$ do

19 Let

j\leftarrow\sum_{b=0}^{\lfloor\log_{2}n\rfloor}2^{b}\cdot O^{(b)}_{i}

;

20 Merge

[i]_{\mathcal{S}}

and

[j]_{\mathcal{S}}

\mathcal{S}

;

22return $\mathcal{S}$ ;

Algorithm 3 Bitwise Search with Auctions

(a) When

b=0

(b) When

b=1

b=2

Figure 3: Example execution of the bitwise search (Lines 1 to 12) in Algorithm 3 when

\mathcal{S}^{*}=\{\{1,4\},\{2,3\},\{5,6\}\}

. The vertices represent the agents, and the dashed lines represent the sets used in the algorithm. Using the first query, the algorithm identifies one agent in each coalition and groups them as

T_{x}=\{1,3,5\}

. The rest are

T_{y}=\{2,4,6\}

(Lines 1 to 2). Then, as shown above, for each

b\in\{0,1,2\}

, the algorithm picks the set of agents in

T_{x}

with the

b

-th lowest binary bit as

1

X

, and partitions

T_{y}

into

T_{\mathrm{True}}

, each of which is cooperating with some agent in

X

, and

T_{\mathrm{False}}

, each of which is not cooperating with any agent in

X

(Lines 3 to 12).

The algorithm starts by querying the game-strategy pair $((N,\mathbf{v}=\mathbf{1},\mathbf{r}=\mathbf{0}),\Sigma=\mathbf{0})$ (Line 1). Using the obtained observation, by Lemma 4.2, the algorithm identifies exactly one agent in each coalition, groups them into set $T_{x}$ and the rest of the agents into set $T_{y}$ (Lines 2). Essentially, set $T_{x}$ contains a set of agents who are definitely not in the same coalition with each other. Furthermore, for each agent $i$ in $T_{y}$ , there is exactly one “teammate” of $i$ in $T_{x}$ . We denote the index of this agent as $\alpha(i)$ .

An important idea behind Algorithm 3 is bitwise search, i.e., to determine each binary bit of $\alpha(i)$ separately. To do this, for each $b\in\{0,1,\dots,\lfloor\log_{2}n\rfloor\}$ , the algorithm picks the set of agents in $T_{x}$ such that the $b$ -th lowest binary bits of their index are $1$ as $X$ (Line 4), and tries to tell which agents in $T_{y}$ are in the same coalition as some agent in $X$ by querying auction gadgets. Intuitively, if we let $Y=T_{y},Z=N\setminus(X\cup Y)$ and query $\mathcal{A}(X,Y,Z)$ , by Lemma 4.1, for each coalition that contains an agent in $X$ , one of its members in $Y$ will decide to deviate. Thus, we can move these agents to $Z$ like how we did in Algorithm 2, until no agents in $Y$ decide to deviate. The remaining agents in $Y$ are then the agents that are not in the same coalition as any agent in $X$ . The algorithm leverages this idea to determine each binary bit of $\alpha(i)$ separately, and stores the results in $O^{(b)}_{i}$ (Lines 5 to 12). Note that for each $b$ , the algorithm uses at most $1+c$ queries, where $c$ is the size of the largest coalition in $\mathcal{S}^{*}$ . We illustrate the execution of the bitwise search with an example in Fig. 3.

Finally, the algorithm recovers $\alpha(i)$ for each agent $i$ in $T_{y}$ by summing up the values of the binary bits in $O^{(b)}_{i}$ (Line 15), and merges $i$ with $\alpha(i)$ accordingly (Line 16).

Theorem 4.2.

Algorithm 3 solves the Multiple-bit CSL problem with auctions in $(1+\log_{2}n)(1+c)+1$ rounds, where $c$ is size of the largest coalition in $\mathcal{S}^{*}$ .

Proof.

We first show the correctness of the algorithm. By Lemma 4.2, after Line 2, $T_{x}$ contains exactly one agent in each coalition. This means that for each agent $i\in T_{y}$ , there is exactly one agent $j\in T_{x}$ such that $i$ and $j$ are in the same coalition. We denote this agent $j$ as $\alpha(i)$ . Then, we will show two claims: (i) after Lines 3 to 12, for each $b\in\{0,1,\dots,\lfloor\log_{2}n\rfloor\}$ and each $i\in T_{y}$ , $O^{(b)}_{i}=\mathrm{True}$ if and only if the $b$ -th lowest binary bit of $\alpha(i)$ is $1$ , and (ii) after Lines 13 to 16, $\mathcal{S}=\mathcal{S}^{*}$ .

For (i), we only need to show that after the repeat loop in Lines 6 to 11, $T_{\mathrm{False}}$ contains exactly the agents $i\in T_{y}$ such that the $b$ -th lowest binary bit of $\alpha(i)$ is $0$ . If an agent $i\in T_{y}$ has the $b$ -th lowest binary bit of $\alpha(i)$ as $0$ , then $\alpha(i)\not\in X$ after Line 4. Thus, by Lemma 4.1, $O_{i}$ will never be $\mathrm{True}$ on Line 8, and $i\in T_{\mathrm{False}}$ after the repeat loop. If an agent $i\in T_{y}$ has the $b$ -th lowest binary bit of $\alpha(i)$ as $1$ , then $\alpha(i)\in X$ after Line 4. By Lemma 4.1, in each iteration of the repeat loop, one agent $j\in[i]_{\mathcal{S}^{*}}\cap T_{\mathrm{False}}$ will have $O_{j}=\mathrm{True}$ , and will be removed from $T_{\mathrm{False}}$ . Since the repeat loop terminates only when no agent is removed from $T_{\mathrm{False}}$ in one iteration, $i\not\in T_{\mathrm{False}}$ after the loop.

For (ii), given each binary bit of $\alpha(i)$ , the algorithm computes the index of the agent in $\alpha(i)$ in $T_{y}$ by summing up the values of the binary bits on Line 15. Then, the algorithm merges $i$ with $\alpha(i)$ accordingly on Line 16. The correctness of the algorithm then follows.

For the complexity, the algorithm uses $1$ query on Line 1, and for each $b\in\{0,1,\dots,\lfloor\log_{2}n\rfloor\}$ , the algorithm uses at most $1+c$ queries in the repeat loop in Lines 6 to 11. This shows that the total number of queries used by the algorithm is at most $(1+\log_{2}n)(1+c)+1$ . ∎

5 Conclusion and Future Work

In this paper, we study the Coalition Structure Learning (CSL) problem under the multiple-bit observation oracle. We consider various settings of the type of games that the algorithm may design, including normal-form games, congestion games, graphical games, and auctions. In each of the settings, we present an algorithm that learns the coalition structure with a sublinear number of rounds. We also complement our algorithmic results with lower bounds on the number of rounds required to learn the coalition structure, demonstrating their optimality in most of the settings. Compared to the existing work, our results significantly reduce the number of rounds required to learn the coalition structure, and greatly improve the potential applicability of the CSL problem in real-world systems. Regarding CSL, there are various different settings for future work that are not covered by this paper. It would be interesting to consider the CSL problem (i) when the underlying coalition structure is dynamic, (ii) when the agents are aware of the algorithm and respond strategically, (iii) when the algorithm can only observe a subset of the agents in each round, and (iv) when the observations are noisy or the agents are boundedly rational.

Acknowledgements

This work was supported in part by NSF grant IIS-2046640 (CAREER) and NSF IIS-2200410.

References

Milgrom [2004] Paul Robert Milgrom. Putting auction theory to work. Cambridge University Press, 2004.
Hamilton [2019] Isobel Asher Hamilton. Uber drivers are reportedly colluding to trigger ‘surge’ prices because they say the company is not paying them enough. Business Insider, 2019.
Sweeney [2019] Sam Sweeney. Uber, lyft drivers manipulate fares at reagan national causing artificial price surges. WJLA, 2019.
Dowling [2023] Joshua Dowling. How uber drivers trigger fake surge price periods when no delays exist. Drive, 2023.
Wikipedia [2024] Wikipedia. Libor scandal. https://en.wikipedia.org/wiki/Libor˙scandal, 2024. Accessed: 2024-08-11.
EPRS [2020] EPRS. The ethics of artificial intelligence: Issues and initiatives. European Parliament Research Service, Scientific Foresight Unit (STOA), Panel for the Future of Science and Technology, 2020.
Qu et al. [2023] Ao Qu, Yihong Tang, and Wei Ma. Adversarial attacks on deep reinforcement learning-based traffic signal control systems with colluding vehicles. ACM Transactions on Intelligent Systems and Technology, 14(6):1–22, 2023.
Xu et al. [2024] Yixuan Even Xu, Chun Kai Ling, and Fei Fang. Learning coalition structures with games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 9944–9951, 2024. doi: 10.1609/aaai.v38i9.28856. URL https://ojs.aaai.org/index.php/AAAI/article/view/28856.
Rosenthal [1973] Robert W Rosenthal. A class of games possessing pure-strategy nash equilibria. International Journal of Game Theory, 2:65–67, 1973.
Kearns et al. [2001] Michael Kearns, Michael L Littman, and Satinder Singh. Graphical models for game theory. In Proceedings of the 17th Conference in Uncertainty in Artificial, Intelligence, 2001, pages 253–260, 2001.
Fudenberg and Levine [1998] Drew Fudenberg and David K. Levine. The Theory of Learning in Games, volume 1 of MIT Press Books. The MIT Press, December 1998.
Waugh et al. [2011] Kevin Waugh, Brian D. Ziebart, and J. Andrew Bagnell. Computational rationalization: the inverse equilibrium problem. In Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, page 1169–1176, Madison, WI, USA, 2011. Omnipress. ISBN 9781450306195.
Kuleshov and Schrijvers [2015] Volodymyr Kuleshov and Okke Schrijvers. Inverse game theory: Learning utilities in succinct games. In Evangelos Markakis and Guido Schäfer, editors, Web and Internet Economics, pages 413–427, Berlin, Heidelberg, 2015. Springer Berlin Heidelberg.
Ling et al. [2018] Chun Kai Ling, Fei Fang, and J. Zico Kolter. What game are we playing? end-to-end learning in normal and extensive form games. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 396–402. International Joint Conferences on Artificial Intelligence Organization, 7 2018. doi: 10.24963/ijcai.2018/55. URL https://doi.org/10.24963/ijcai.2018/55.
Letchford et al. [2009] Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In Marios Mavronicolas and Vicky G. Papadopoulou, editors, Algorithmic Game Theory, pages 250–262, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
Balcan et al. [2015] Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in stackelberg security games. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC ’15, page 61–78, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450334105.
Haghtalab et al. [2016] Nika Haghtalab, Fei Fang, Thanh H. Nguyen, Arunesh Sinha, Ariel D. Procaccia, and Milind Tambe. Three strategies to success: learning adversary models in security games. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, page 308–314. AAAI Press, 2016. ISBN 9781577357704.
Wu et al. [2022] Jibang Wu, Weiran Shen, Fei Fang, and Haifeng Xu. Inverse game theory for stackelberg games: the blessing of bounded rationality. Advances in Neural Information Processing Systems, 35:32186–32198, 2022.
Bonjour et al. [2022] Trevor Bonjour, Vaneet Aggarwal, and Bharat Bhargava. Information theoretic approach to detect collusion in multi-agent games. In James Cussens and Kun Zhang, editors, Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 of Proceedings of Machine Learning Research, pages 223–232. PMLR, 01–05 Aug 2022.
Mazrooei et al. [2013] Parisa Mazrooei, Christopher Archibald, and Michael Bowling. Automating collusion detection in sequential games. Proceedings of the AAAI Conference on Artificial Intelligence, 27(1):675–682, Jun. 2013. doi: 10.1609/aaai.v27i1.8674. URL https://ojs.aaai.org/index.php/AAAI/article/view/8674.
Beigman and Vohra [2006] Eyal Beigman and Rakesh Vohra. Learning from revealed preference. In Proceedings of the 7th ACM Conference on Electronic Commerce, EC ’06, page 36–42, New York, NY, USA, 2006. Association for Computing Machinery. ISBN 1595932364.
Zadimoghaddam and Roth [2012] Morteza Zadimoghaddam and Aaron Roth. Efficiently learning from revealed preference. In Paul W. Goldberg, editor, Internet and Network Economics, pages 114–127, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
Balcan et al. [2014] Maria-Florina Balcan, Amit Daniely, Ruta Mehta, Ruth Urner, and Vijay V. Vazirani. Learning economic parameters from revealed preferences. In Tie-Yan Liu, Qi Qi, and Yinyu Ye, editors, Web and Internet Economics, pages 338–353, Cham, 2014. Springer International Publishing.
Blum et al. [2014] Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Learning optimal commitment to overcome insecurity. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceedings.neurips.cc/paper˙files/paper/2014/file/cc1aa436277138f61cda703991069eaf-Paper.pdf.
Amin et al. [2015] Kareem Amin, Rachel Cummings, Lili Dworkin, Michael Kearns, and Aaron Roth. Online learning and profit maximization from revealed preferences. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, page 770–776. AAAI Press, 2015. ISBN 0262511290.
Roth et al. [2016] Aaron Roth, Jonathan Ullman, and Zhiwei Steven Wu. Watch and learn: optimizing from revealed preferences feedback. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’16, page 949–962, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450341325.
Cesa-Bianchi and Lugosi [2006] Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
Daskalakis et al. [2021] Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. Near-optimal no-regret learning in general games. Advances in Neural Information Processing Systems, 34:27604–27616, 2021.
De Bruijn [1981] Nicolaas Govert De Bruijn. Asymptotic methods in analysis, volume 4. Courier Corporation, 1981.
Paes Leme et al. [2016] Renato Paes Leme, Martin Pal, and Sergei Vassilvitskii. A field guide to personalized reserve prices. In Proceedings of the 25th international conference on world wide web, pages 1093–1102, 2016.

Appendix A Multiple-bit CSL with Congestion Games

To state our result for congestion games, we first recall the definition of a congestion game.

Definition A.1 ([9]).

A congestion game $\mathcal{G}$ is a tuple $(N,R,(A_{i})_{i\in N},(c_{i})_{r\in R})$ where $N$ is the set of agents, $R$ is the set of resources, $A_{i}\subseteq 2^{R}\setminus\{\varnothing\}$ is the strategy space of agent $i$ , and $c_{r}:\mathbb{N}\to\mathbb{R}$ is the cost function of resource $r$ . For a strategy profile $\Sigma=(\sigma_{i})_{i\in N}$ where $\sigma_{i}\in A_{i}$ , the utility of agent $i$ is given by $u_{i}(\Sigma)=-\sum_{r\in\sigma_{i}}c_{r}(|\{j\in N\mid r\in\sigma_{j}\}|)$ .

Clearly, a directed prisoner’s dilemma is not a congestion game, so we cannot directly apply Algorithm 1 to solve Multiple-bit CSL with congestion games. However, as we will show in this subsection, we can design another type of game-strategy pair, directed Braess’s paradox, which is a congestion game, and functions similarly to directed prisoner’s dilemmas in the context of Multiple-bit CSL. We will then show that using directed Braess’s paradoxes, we can solve Multiple-bit CSL with congestion games in $\log_{2}n+2$ rounds with an algorithm similar to Algorithm 1.

Definition A.2.

For $i\in N,j\in N$ , a directed Braess’s paradox $\mathcal{B}(i,j)$ is a game-strategy pair $(\mathcal{G},\Sigma)$ . Here, $\mathcal{G}$ is a congestion game $(N,R,(A_{i})_{i\in N},(c_{i})_{r\in R})$ where $R=\{r_{1},r_{2},r_{3}\}$ . The strategy spaces of agent $i$ and $j$ are $\{\{r_{1}\}\}$ and $\{\{r_{2}\},\{r_{1},r_{3}\}\}$ respectively, and the strategy spaces for all other agents are $\{\varnothing\}$ . The cost functions are $c_{r_{1}}(x)=x$ , $c_{r_{2}}(x)=2.5$ and $c_{r_{3}}(x)=0$ for all $x\in\mathbb{N}$ . The strategy profile $\Sigma$ is such that $\sigma_{i}=\{r_{1}\}$ , $\sigma_{j}=\{r_{1},r_{3}\}$ and $\sigma_{k}=\varnothing$ for all $k\in N\setminus\{i,j\}$ .

We illustrate the definition of directed Braess’s paradox in Fig. 4.

Figure 4: Illustration of the directed Braess’s paradox

\mathcal{B}(i,j)

. Agent

i

needs to choose a path from

S_{i}

T

and agent

j

needs to choose a path from

S_{j}

T

. The cost functions are indicated on the edges.

Lemma A.1.

Let $i,j$ be two agents in $N$ and $\mathbf{O}=\mathcal{O}(\mathcal{B}(i,j))$ . Then, $O_{j}=\mathrm{True}$ if and only if agent $i$ and agent $j$ are in the same coalition under $\mathcal{S}^{*}$ . Moreover, $O_{k}=\mathrm{False}$ for $k\in N\setminus\{j\}$ .

Proof.

If agent $i$ and agent $j$ are in the same coalition under $\mathcal{S}^{*}$ , then choosing $\{r_{2}\}$ is a dominant strategy for agent $j$ ’s coalition. Otherwise, choosing $\{r_{1},r_{3}\}$ is a dominant strategy. Therefore, $O_{j}=\mathrm{True}$ if and only if agent $i$ and agent $j$ are in the same coalition under $\mathcal{S}^{*}$ . Moreover, since all agents other than $j$ only have one choice (either $\{r_{1}\}$ or $\{\varnothing\}$ ), $O_{k}=\mathrm{False}$ for $k\in N\setminus\{j\}$ . ∎

Lemma A.1 shows that directed Braess’s paradoxes function just like directed prisoner’s dilemmas in Multiple-bit CSL. With an argument similar to the proof from Lemma 3.1 to Lemma 3.2, we can also show the following corollary about the product of multiple directed Braess’s paradoxes.

Corollary A.2.

Let $\{(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}\}$ be $k$ pairs of agents and $\mathbf{O}=\mathcal{O}(\prod_{x=1}^{k}\mathcal{B}(i_{x},j_{x}))$ . Then, $O_{j}=\mathrm{True}$ if and only if there exists $x\in\{1,2,\dots,k\}\textbf{ such that }j_{x}=j$ and $i_{x}\in[j]_{\mathcal{S}^{*}}$ .

To use the product of multiple directed Braess’s paradoxes in an algorithm to solve Multiple-bit CSL with congestion games, we also need to show that it is still a congestion game. We will show that congestion games are closed under the product operation in the following lemma.

Lemma A.3.

Let $(\mathcal{G}_{1},\Sigma_{1}),(\mathcal{G}_{2},\Sigma_{2})$ be two game-strategy pairs and let $(\mathcal{G}_{p},\Sigma_{p})=(\mathcal{G}_{1},\Sigma_{1})\times(\mathcal{G}_{2},\Sigma_{2})$ . If $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are both congestion games on $N$ , then $\mathcal{G}_{p}$ is also a congestion game on $N$ .

Proof.

Let $\mathcal{G}_{1}=(N,R_{1},(A_{1,i})_{i\in N},(c_{1,i})_{r\in R_{1}})$ and $\mathcal{G}_{2}=(N,R_{2},(A_{2,i})_{i\in N},(c_{2,i})_{r\in R_{2}})$ . We will write $\mathcal{G}_{p}$ as a congestion game. According to Definition 3.3, in $\mathcal{G}_{p}$ , the strategy space of agent $i$ is $A_{i}=A_{1,i}\times A_{2,i}$ . Moreover, given a strategy profile $\Sigma=(\sigma_{1,i},\sigma_{2,i})_{i\in N}$ , the utility of agent $i$ is given by $u_{i}(\Sigma)=-\sum_{r\in\sigma_{1,i}}c_{1,r}(|\{j\in N\mid r\in\sigma_{1,j}\}|)-\sum_{r\in\sigma_{2,i}}c_{2,r}(|\{j\in N\mid r\in\sigma_{2,j}\}|)$ . This shows that $\mathcal{G}_{p}=(N,(R_{1},R_{2}),(A_{i})_{i\in N},((c_{1,i})_{r\in R_{1}},\\ (c_{2,i})_{r\in R_{2}}))$ is a congestion game on $N$ . ∎

With Corollaries A.2 and A.3, we solve Multiple-bit CSL with congestion games just like we did with normal-form games in Algorithm 1. The only difference is that we query directed Braess’s paradoxes instead of directed prisoner’s dilemmas. We present the algorithm below.

Input: The number of agents

n

and the multiple-bit observation oracle

\mathcal{O}

Output: A coalition structure

\mathcal{S}

of the agents.

3Query

\prod_{i\in N,j\in N,i<j}\mathcal{B}(i,j)

and observe

\mathbf{O}

;

4 Let

T_{j}\leftarrow\{1,2,\dots,j-1\}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }\varnothing

\textbf{ for each }j\in N

;

5 while $\exists j\in N\textbf{ such that }|T_{j}|\geq 2$ do

6 Let

L_{j}\leftarrow\{\text{the smallest $\lfloor\frac{|T_{j}|}{2}\rfloor$ elements in $T_{j}$}\}

\textbf{ for each }j\in N

;

7 Let

R_{j}\leftarrow T_{j}\setminus L_{j}

for each

j\in N

;

8 Query

\prod_{j\in N,i\in L_{j}}\mathcal{B}(i,j)

and observe

\mathbf{O}

;

9 Let

T_{j}\leftarrow L_{j}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }R_{j}

\textbf{ for each }j\in N

;

11Let

\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\}

;

12 for $j\in N\textbf{ such that }|T_{j}|\neq\varnothing$ do

13 Let

i\leftarrow\text{the only element in $T_{j}$}

;

14 Merge

[i]_{\mathcal{S}}

and

[j]_{\mathcal{S}}

\mathcal{S}

;

16return $\mathcal{S}$ ;

Algorithm 4 Simultaneous Binary Search with Congestion Games

Theorem A.1.

Algorithm 4 solves Multiple-bit CSL with congestion games in $\log_{2}n+2$ rounds.

Proof.

The proof is almost identical to the proof of Theorem 3.1. We first show the correctness of the algorithm. To do this, we will show three claims: (i) after Lines 1 to 7, for each $j\in N$ , if $j$ is the agent with the smallest index in $[j]_{S^{*}}$ , $T_{j}=\varnothing$ , otherwise, $T_{j}$ contains only the smallest index of the agents in $[j]_{S^{*}}$ ; (ii) after Lines 8 to 11, $\mathcal{S}=\mathcal{S}^{*}$ ; (iii) the games queried in the algorithm are congestion games. (iii) is implied by Lemma A.3, so we only need to show (i) and (ii).

For (i), if $j$ is the agent with the smallest index in $[j]_{S^{*}}$ , then by Corollary A.2, $O_{j}=\mathrm{False}$ , and $T_{j}=\varnothing$ on Line 2. Throughout the algorithm, $T_{j}$ remains $\varnothing$ . Otherwise, $T_{j}$ contains the smallest index of the agents in $[j]_{S^{*}}$ after Line 2. In the while loop, $T_{j}$ is updated to $L_{j}$ if $O_{j}=\mathrm{True}$ , and to $R_{j}$ otherwise. Since $L_{j}$ contains the smallest $\lfloor|T_{j}|/2\rfloor$ elements in $T_{j}$ , by Corollary A.2, $O_{j}=\mathrm{True}$ if and only if the smallest index of the agents in $[j]_{S^{*}}$ is in $L_{j}$ . Therefore, after one iteration of the loop, $T_{j}$ still contains only the smallest index of the agents in $[j]_{S^{*}}$ , while the size of $T_{j}$ is halved. The loop terminates after $|T_{j}|=1$ , and $T_{j}$ contains only the smallest index of the agents in $[j]_{S^{*}}$ .

Appendix B Multiple-bit CSL with Graphical Games

We then proceed to the case of Multiple-bit CSL with graphical games. Recall that in a graphical game [10], each agent $i$ is associated with a vertex in an undirected graph $G$ . We use $N_{i}\subseteq N$ to denote the set of neighbors of agent $i$ in $G$ , including agent $i$ itself.

Definition B.1.

A graphical game $\mathcal{G}$ is a tuple $(N,G,(A_{i})_{i\in N},(M_{i})_{i\in N})$ where $N$ is the set of agents, $G$ is an undirected graph, $A_{i}$ is the strategy space of agent $i$ , and $M_{i}:\prod_{j\in N_{i}}A_{j}\to\mathbb{R}$ is the local game matrix of agent $i$ . For a strategy profile $\Sigma=(\sigma_{i})_{i\in N}$ where $\sigma_{i}\in A_{i}$ , the utility of agent $i$ is given by $u_{i}(\Sigma)=M_{i}(\Sigma_{N_{i}})$ , where $\Sigma_{N_{i}}=(\sigma_{j})_{j\in N_{i}}$ consists only of the strategies of agents in $N_{i}$ . Here, $N_{i}$ denotes the set of neighbors of agent $i$ in undirected graph $G$ .

If we let the undirected graph $G$ be a complete graph, then we recover the normal-form game setting. However, the reason why we consider Multiple-bit CSL with graphical games is that graphical games are compact representations of normal-form games. When the degree of each vertex in the graph is small, the graphical game representation is far succincter than the normal-form game representation. Therefore, we consider in this section the case where the algorithm can only design degree- $d$ graphical games, where the degree of each vertex in the graph is at most $d$ . In this case, we will need at least $\lceil\frac{n-1}{d}\rceil$ rounds to solve the Multiple-bit CSL problem. To prove this claim formally, we first show with the following lemma that in some cases, the observations reveal no information about whether a certain pair of agents are in the same coalition under $\mathcal{S}^{*}$ .

Lemma B.1.

Let $i,j$ be two agents in $N$ and let $\mathcal{S}_{0}=\{\{1\},\{2\},\dots,\{n\}\},\mathcal{S}_{1}=\mathcal{S}_{0}\cup\{\{i,j\}\}\setminus\{\{i\},\{j\}\}$ be two coalition structures. If we query the oracle for a graphical game $\mathcal{G}$ where edge $(i,j)$ is not in the undirected graph $G$ , then the observation reveals no information about whether $\mathcal{S}^{*}=\mathcal{S}_{0}$ or $\mathcal{S}^{*}=\mathcal{S}_{1}$ .

Proof.

Let $\mathcal{G}=(N,G,(A_{i})_{i\in N},(M_{i})_{i\in N})$ and let $\Sigma=(\sigma_{i})_{i\in N}$ be the specified strategy profile in $\mathcal{G}$ . Consider agent $i$ ’s deviation decision $\mathcal{O}_{i}(\mathcal{G},\Sigma)$ . Let $\mathrm{BR}_{i,\mathcal{S}_{0}}(\{i\},\Sigma)$ and $\mathrm{BR}_{j,\mathcal{S}_{0}}(\{j\},\Sigma)$ be the set of best responses of coalitions $\{i\}$ and $\{j\}$ to $\Sigma_{-i}$ and $\Sigma_{-j}$ if $\mathcal{S}^{*}=\mathcal{S}_{0}$ respectively. Since the local game matrix $M_{i}$ only depends on $\Sigma_{N_{i}}\not\ni j$ , no matter what strategy agent $j$ plays, the best response of $\{i\}$ to $\Sigma_{-i}$ is the same. Therefore, the set of best responses of coalition $\{i,j\}$ if $\mathcal{S}^{*}=\mathcal{S}_{1}$ is $\mathrm{BR}_{i,\mathcal{S}_{0}}(\{i\},\Sigma)\times\mathrm{BR}_{j,\mathcal{S}_{0}}(\{j\},\Sigma)$ . This shows that whether $\mathcal{S}^{*}=\mathcal{S}_{0}$ or $\mathcal{S}^{*}=\mathcal{S}_{1}$ does not affect the deviation decision of agent $i$ . The same argument applies to agent $j$ . ∎

Theorem B.1.

Any algorithm that solves the Multiple-bit CSL problem with degree- $d$ graphical games requires at least $\lceil\frac{n-1}{d}\rceil$ rounds of interactions with the agents in the worst case.

Proof.

Suppose that the algorithm has queried the observation oracle $k$ times with game-strategy pairs $\{(\mathcal{G}_{1},\Sigma_{1}),(\mathcal{G}_{2},\Sigma_{2}),\dots,(\mathcal{G}_{k},\Sigma_{k})\}$ , where $\mathcal{G}_{i}=(N,G_{i},(A_{i})_{i\in N},(M_{i})_{i\in N})$ is a degree- $d$ graphical game. According to Lemma B.1, if the observations always behave as if $\mathcal{S}^{*}=\{\{1\},\{2\},\dots,\{n\}\}$ , then the algorithm cannot finalize the answer until for each pair of agents $i,j$ , the algorithm has queried the oracle for a graphical game where edge $(i,j)$ is in the undirected graph. Since the degree of each vertex in the graph is at most $d$ , the algorithm needs to query the oracle for at least $\lceil\frac{n-1}{d}\rceil$ different graphical games to include all edges adjacent to vertex $1$ . The theorem follows. ∎

Then, we present an algorithm that solves the Multiple-bit CSL problem with degree- $d$ graphical games in $\frac{2n}{d}+2\log_{2}d+1$ rounds of interactions with the agents in the worst case. Note that $d\leq n$ , given the lower bounds in Theorems 2.1 and B.1, the algorithm is optimal up to a constant factor.

The algorithm uses the product of multiple directed prisoner’s dilemmas to query the observation oracle. We will first show that such a product can be represented as a degree- $d$ graphical game.

Lemma B.2.

Let $\{(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}\}$ be $k$ pairs of agents and $(\mathcal{G}_{p},\Sigma_{p})=\prod_{x=1}^{k}\mathcal{P}(i_{x},j_{x})$ . Then, $\mathcal{G}_{p}$ can be written as a degree- $d$ graphical game, where $d=\max\{\sum_{x=1}^{k}(\mathbb{I}\left[\vphantom{\sum}i=i_{x}\right]+\mathbb{I}\left[\vphantom{\sum}i=j_{x}\right])\mid i\in N\}$ , i.e., the maximum number of occurrences of any agent in the pairs.

Proof.

For two agents $i\in N,j\in N$ , if $(i,j)$ does not occur in any of the pairs, then, according to Definition 3.2, the utility of agent $i$ does not depend on agent $j$ ’s strategy in any of the directed prisoner’s dilemmas $\mathcal{P}(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}$ . Since for each agent, the utility in the product game is the sum of the utilities in the factor games, the utility of agent $i$ in the product game also does not depend on agent $j$ ’s strategy. This shows that $\mathcal{G}_{p}$ can be represented as a graphical game with the graph’s edge set being $\{(i_{x},j_{x})\mid x\in\{1,2,\dots,k\}\}$ . The lemma then follows. ∎

We then present the algorithm as Algorithm 5.

Input: The number of agents

n

, the degree limit

d

and the multiple-bit observation oracle

\mathcal{O}

Output: A coalition structure

\mathcal{S}

of the agents.

3Let

size\leftarrow\lfloor\frac{d}{2}\rfloor

and

cnt\leftarrow\lceil\frac{n}{size}\rceil

;

4 Let

belong_{j}\leftarrow\lfloor\frac{j-1}{size}\rfloor\textbf{ for each }j\in N

;

5 for $\delta\in\{0,1,\dots,cnt-1\}$ do

6 Query

\prod_{i\in N,j\in N,i<j,belong_{j}-belong_{i}=\delta}\mathcal{P}(i,j)

and observe

\mathbf{O}^{(\delta)}

;

8Let

S_{j}\leftarrow\{\delta\mid O^{(\delta)}_{j}=\mathrm{True}\}\textbf{ for each }j\in N

;

9 Let

\delta_{j}\leftarrow-1\textbf{ if }S_{j}=\varnothing\textbf{ else }\min(S_{j})

\textbf{ for each }j\in N

;

10 Let

T_{j}\leftarrow\{i\mid i<j,belong_{j}-belong_{i}=\delta_{j}\}\textbf{ if }\delta_{j}\neq-1\textbf{ else }\varnothing\textbf{ for each }j\in N

;

11 while $\exists j\in N\textbf{ such that }\delta_{j}=0,|T_{j}|\geq 2$ do

12 Let

L_{j}\leftarrow\{\text{the smallest $\lfloor\frac{|T_{j}|}{2}\rfloor$ elements in $T_{j}$}\}

for each

j\in N,\delta_{j}=0

;

13 Let

R_{j}\leftarrow T_{j}\setminus L_{j}

for each

j\in N,\delta_{j}=0

;

14 Query

\prod_{j\in N,\delta_{j}=0,i\in R_{j}}\mathcal{P}(i,j)

and observe

\mathbf{O}

;

15 Let

T_{j}\leftarrow R_{j}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }L_{j}

\textbf{ for each }j\in N,\delta_{j}=0

;

17while $\exists j\in N\textbf{ such that }\delta_{j}\geq 1,|T_{j}|\geq 2$ do

18 Let

L_{j}\leftarrow\{\text{the smallest $\lfloor\frac{|T_{j}|}{2}\rfloor$ elements in $T_{j}$}\}

for each

j\in N,\delta_{j}\geq 1

;

19 Let

R_{j}\leftarrow T_{j}\setminus L_{j}

for each

j\in N,\delta_{j}\geq 1

;

20 Query

\prod_{j\in N,\delta_{j}\geq 1,i\in R_{j}}\mathcal{P}(i,j)

and observe

\mathbf{O}

;

21 Let

T_{j}\leftarrow R_{j}\textbf{ if }O_{j}=\mathrm{True}\textbf{ else }L_{j}

\textbf{ for each }j\in N,\delta_{j}\geq 1

;

23Let

\mathcal{S}\leftarrow\{\{1\},\{2\},\dots,\{n\}\}

;

24 for $j\in N\textbf{ such that }|T_{j}|\neq\varnothing$ do

25 Let

i\leftarrow\text{the only element in $T_{j}$}

;

26 Merge

[i]_{\mathcal{S}}

and

[j]_{\mathcal{S}}

\mathcal{S}

;

28return $\mathcal{S}$ ;

Algorithm 5 Block Decomposition and Simultaneous Binary Search with Graphical Games

Compared with the normal-form game setting, with degree- $d$ graphical games, the algorithm can only query products of multiple directed prisoner’s dilemmas subject to the degree constraint $d$ as described in Lemma B.2. To comply with this constraint, Algorithm 5 introduces an important idea, block decomposition, i.e., to partition the agents into blocks of size $\lfloor\frac{d}{2}\rfloor$ and choose pairs to query according to the decomposition (Lines 1 to 2). As long as for each query, each agent is involved in directed prisoner’s dilemmas with at most two blocks of agents, the constraint is not violated.

The general idea of Algorithm 5 is to find for each agent $j$ , the “predecessor” of $j$ , i.e., the agent with the largest index that is smaller than $j$ in $[j]_{\mathcal{S}^{*}}$ . To do this, the algorithm takes a two-step approach. First, the algorithm tries to find which block the predecessor of each agent belongs to (Lines 3 to 6). We illustrate this process with an example in Fig. 5. The algorithm enumerates block index gap $\delta$ and constructs one query for each $\delta$ . The query for a given $\delta$ includes $\mathcal{P}(i,j)$ of all pairs of agents $i,j$ such that the difference between the block indices they belong to, i.e., $belong_{j}-belong_{i}$ , is $\delta$ (Lines 3 to 4). In this way, an agent $j$ will be involved in directed prisoner’s dilemmas with agents in at most two blocks, block $belong_{j}+\delta$ and block $belong_{j}-\delta$ in one query. Moreover, after the enumeration, since the algorithm observes for each agent $i$ and each block before $belong_{i}$ , whether there is an agent in the same coalition of $i$ in that block, it gets to determine the block index of each agent’s predecessor or conclude that it has no predecessors (Lines 5 to 6).

Then, the algorithm carries out a simultaneous binary search to find the predecessor of each agent (Lines 7 to 17). The algorithm first deals with the agents whose predecessors are in the same block as themselves (Lines 7 to 12). In this process, each agent $j$ will be involved in games only within its own block in the queries. Next, the algorithm deals with the agents whose predecessors are in a different block (Lines 13 to 17). In this process, each agent $j$ will be involved in games with its predecessor’s block, and agents outside block $belong_{j}$ whose predecessors are in block $belong_{j}$ . The number of such agents is also at most twice the block size.

Finally, the algorithm merges each agent with its predecessor (Lines 18 to 21).

(a) When

\delta=0

(b) When

\delta=1

\delta=2

Figure 5: Example execution of Lines 1 to 6 in Algorithm 5 when

n=6

and

size=2

. The vertices represent the agents and the edges represent the directed prisoner’s dilemmas that the algorithm queries for each

\delta=\{0,1,2\}

. The algorithm first partitions the agents into

cnt=\lceil\frac{n}{size}\rceil=3

blocks, each containing at most

size=2

agents (Lines 1 to 2). Then, for each

\delta=\{0,1,2\}

, the algorithm queries the oracle for the directed prisoner’s dilemmas that correspond to the edges shown in the figure (Lines 3 to 4). Using the observations, the algorithm then determines for each agent

j

\delta_{j}=belong_{j}-belong_{i}

, where

i

j

’s predecessor in

[j]_{\mathcal{S}^{*}}

(Lines 5 to 6). Note that in each query, any agent

j

is involved in at most

2size\leq d

games.

Theorem B.2.

Let $d\geq 2$ be an even number, Algorithm 5 solves the Multiple-bit CSL problem with degree- $d$ graphical games in $\frac{2n}{d}+2\log_{2}d+1$ rounds of interactions with the agents in the worst case.

Proof.

We first show the correctness of the algorithm. To do this, we will show four claims: (i) after Lines 1 to 6, for each $j\in N$ , if $j$ is the agent with the smallest index in $[j]_{S^{*}}$ , $\delta_{j}=-1$ ; otherwise, let $i$ be the largest index in $[j]_{S^{*}}$ that is smaller than $j$ , $\delta_{j}=belong_{j}-belong_{i}$ ; (ii) after Lines 7 to 17, for each $j\in N$ , if $j$ is the agent with the smallest index in $[j]_{S^{*}}$ , $T_{j}=\varnothing$ ; otherwise, $T_{j}$ contains only the largest index in $[j]_{S^{*}}$ that is smaller than $j$ ; (iii) after Lines 8 to 21, $\mathcal{S}=\mathcal{S}^{*}$ ; (iv) the games queried in Lines 4, 11, and 16 are degree- $d$ graphical games.

For (i), if $j$ is the agent with the smallest index in $[j]_{S^{*}}$ , then by Lemma 3.2, $O^{(\delta)}_{j}=\mathrm{False}$ , for all $\delta\in\{0,1,\dots,cnt-1\}$ on Line 4. As a result, $S_{j}=\varnothing$ on Line 5, and $\delta_{j}=-1$ on Line 6. Otherwise, let $i$ be the largest index in $[j]_{S^{*}}$ that is smaller than $j$ . Then, by Lemma 3.2, $O^{(\delta)}_{j}=\mathrm{True}$ for $\delta=belong_{j}-belong_{i}$ on Line 4 since $\mathcal{P}(i,j)$ is included as a factor game. Moreover, $O^{(\delta)}_{j}=\mathrm{False}$ for all $\delta<belong_{j}-belong_{i}$ since $i$ is the largest index in $[j]_{S^{*}}$ that is smaller than $j$ . Therefore, $belong_{j}-belong_{i}$ becomes the minimum of $S_{j}$ on Line 5, and thus $\delta_{j}=belong_{j}-belong_{i}$ on Line 6.

For (ii), if $j$ is the agent with the smallest index in $[j]_{S^{*}}$ , then $T_{j}=\varnothing$ on Line 7. Throughout the algorithm, $T_{j}$ remains $\varnothing$ . Otherwise, $T_{j}$ contains the largest index, $i$ , in $[j]_{S^{*}}$ that is smaller than $j$ after Line 7. Depending on whether $belong_{j}-belong_{i}=0$ or not, either Lines 8 to 12 or Lines 13 to 17 will carry out a binary search to find this index $i$ . Note that in the while loop, $T_{j}$ is updated to $R_{j}$ if $O_{j}=\mathrm{True}$ , and to $L_{j}$ otherwise. Since $R_{j}$ contains the largest $\lceil|T_{j}|/2\rceil$ elements in $T_{j}$ , by Lemma 3.2, $O_{j}=\mathrm{True}$ if and only if the largest index $i$ in $[j]_{S^{*}}$ that is smaller than $j$ is in $R_{j}$ . Thus, the binary search will find this index $i$ and end up with $T_{j}=\{i\}$ .

For (iii), since every agent $j$ is either the agent with the smallest index in $[j]_{S^{*}}$ or merged with its predecessor (the agent with the largest index that is smaller than $j$ ) in $[j]_{S^{*}}$ , each coalition is merged in $\mathcal{S}$ after Lines 18 to 21. Thus $\mathcal{S}$ becomes the same as $\mathcal{S}^{*}$ .

For (iv), consider an agent $j$ . In the game queried on line 4, agent $j$ is involved in at most $2size\leq d$ factor games with agents $\{i\mid|belong_{j}-belong_{i}|=\delta\}$ ; in the game queried on line 11, agent $j$ is involved in at most $size\leq d$ factor games with agents $\{i\mid belong_{i}=belong_{j}\}$ ; in the game queried on line 16, agent $j$ is involved in at most $2size\leq d$ factor games with agents $\{i\mid i<j,belong_{j}-belong_{i}=\delta_{j}\}\cup\{i\mid i>j,\delta_{i}\geq 1,belong_{i}-belong_{j}=\delta_{i}\}$ . Therefore, the games queried in Lines 4, 11, and 16 are degree- $d$ graphical games by Lemma B.2.

Next, we show the complexity of the algorithm. The for loop in Lines 3 to 4 requires $cnt=\lceil\frac{n}{size}\rceil=\lceil\frac{2n}{d}\rceil$ queries, and the while loops in Lines 8 to 12 and Lines 13 to 17 require at most $\lceil\log_{2}size\rceil=\lceil\log_{2}d\rceil-1$ queries each. Therefore, the total number of queries is at most $\lceil\frac{2n}{d}\rceil+2\lceil\log_{2}d\rceil-2\leq\frac{2n}{d}+2\log_{2}d+1$ . This completes the proof. ∎

Appendix C Omitted Proofs

C.1 Proof of Theorem 4.1

See 4.1

Proof.

We first show the correctness of the algorithm. Consider the if statement on Line 8, if there exists $j$ such that $O_{j}=\mathrm{True}$ , then by Lemma 4.1, exactly one agent $j$ in $T$ has $O_{j}=\mathrm{True}$ , and agents $i$ and $j$ are in the same coalition under $\mathcal{S}^{*}$ . The algorithm then merges $[i]_{\mathcal{S}}$ and $[j]_{\mathcal{S}}$ in $\mathcal{S}$ , and removes $j$ from $T$ . Otherwise, by Lemma 4.1, no agent in $T$ is in $[i]_{\mathcal{S}^{*}}$ . The algorithm then removes $i$ from $T$ . Therefore, the inner while loop on Lines 5 to 12 finalizes agent $i$ ’s coalition in $\mathcal{S}$ correctly, and removes the whole coalition from $T$ . The correctness of the algorithm then follows.

For the complexity, the algorithm starts with $T=N$ , and after each query on Line 7, it removes one agent from $T$ . Since the algorithm terminates when $|T|=1$ , it uses $n-1$ queries in total. ∎

Deviate or Not: Learning Coalition Structures with Multiple-bit Observations in Games

Abstract

1 Introduction

1.1 Our Results

1.2 Related Work

2 Notations and Preliminaries

Behavior model.

Multiple-bit observation oracle.

2.1 Lower Bound of the Number of Rounds

Theorem 2.1.

Proof.

3 Multiple-bit CSL with Normal-form Games

Definition 3.1.

Definition 3.2.

Lemma 3.1.

Proof.

Definition 3.3 ([8]).

Lemma 3.2.

Proof.

Theorem 3.1.

Proof.

Multiple-bit CSL with Other Types of Games.

4 Multiple-bit CSL with Auctions

Definition 4.1.

Definition 4.2.

Lemma 4.1.

Proof.

Theorem 4.1.

Lemma 4.2.

Proof.

Theorem 4.2.

Proof.

5 Conclusion and Future Work

Acknowledgements

References

Appendix A Multiple-bit CSL with Congestion Games

Definition A.1 ([9]).

Definition A.2.

Lemma A.1.

Proof.

Corollary A.2.

Lemma A.3.

Proof.

Theorem A.1.

Proof.

Appendix B Multiple-bit CSL with Graphical Games

Definition B.1.

Lemma B.1.

Proof.

Theorem B.1.

Proof.

Lemma B.2.

Proof.

Theorem B.2.

Proof.

Appendix C Omitted Proofs

C.1 Proof of Theorem 4.1

Proof.

Deviate or Not: Learning Coalition Structures
with Multiple-bit Observations in Games