Deviate or Not: Learning Coalition Structures
with Multiple-bit Observations in Games
Abstract
We consider the Coalition Structure Learning (CSL) problem in multi-agent systems, motivated by the existence of coalitions in many real-world systems, e.g., trading platforms and auction systems. In this problem, there is a hidden coalition structure within a set of agents, which affects the behavior of the agents in games. Our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the hidden coalition structure. In particular, we consider the setting where in each round, we design and present a game together with a strategy profile to the agents, and receive a multiple-bit observation – for each agent, we observe whether or not they would like to deviate from the specified strategy. We show that we can learn the coalition structure in rounds if we are allowed to design any normal-form game, matching the information-theoretical lower bound. For practicality, we extend the result to settings where we can only choose games of a specific format, and design algorithms to learn the coalition structure in these settings. For most settings, our complexity matches the theoretical lower bound up to a constant factor.
1 Introduction
Coalitions and collusive behavior are common in the real world. In auction systems, bidders may form coalitions to coordinate bids and exploit mechanisms for better odds of winning [1]. On ridesharing platforms like Uber and Lyft, some drivers might disconnect simultaneously to create an artificial shortage, triggering a price surge they can later benefit from [2, 3, 4]. While illegal collusion, like price fixing, is regulated by agencies such as the SEC in the United States, it remains challenging to eliminate, as seen in scandals like the LIBOR Scandal [5]. As AI systems become more advanced and widely used in domains like stock trading and autonomous driving, regulating these AI agents presents a growing challenge as seen by regulators and researchers [6, 7].
In response to such concerns, the problem of coalition structure learning (CSL) [8] has recently gained attention. In CSL, a platform aims to figure out which agents are in a coalition by designing a small number of games for the agents to play and make observations on the outcomes of these games. Notably, more than one coalition may exist simultaneously. In practice, the platform could represent a regulator, like an auction or ridesharing platform, aiming to detect collusion or design better mechanisms.
However, the existing study on CSL relies on a strong assumption that the platform can only observe whether a specified strategy profile is a Nash equilibrium (NE) in each round. With this single-bit observation oracle and other assumptions, optimal algorithms require rounds of games to learn the coalition structure, where is the number of agents. Such an observation oracle is too restrictive for two reasons. First, for real-world systems with a large number of agents, rounds is still prohibitively large. Second, it may be possible for the platform to know which agents would deviate from the specified strategy profile. For example, a ridesharing platform could observe which drivers disconnect from the platform given a pricing scheme.
In this paper, we aim to address this limitation and consider a multiple-bit observation oracle: the platform can observe an -dimensional binary vector in each round, where the -th bit indicates whether or not agent would like to deviate from the specified strategy profile. This oracle allows the platform to gain bits of information in each round, which is significantly more informative than the single-bit observation oracle, allowing the platform to learn the coalition structure with much fewer rounds. However, from a technical perspective, switching from the single-bit observation oracle to the multiple-bit observation oracle fundamentally changes the problem. With the single-bit observation, the platform can choose the game to use to get the next bit of observation, which means it is easy to ensure that each bit of observation brings in information. However, with multi-bit observation, the platform will get -bit information in a batch, and it is often inevitable that these -bits contain repetitive or highly correlated information. Therefore, the key challenge in algorithm design is how to design the games so that the bits of observation are more de-correlated and contain more information.
1.1 Our Results
Type of Games | Lower Bound | Complexity | Section |
---|---|---|---|
Normal-Form | Section 3 | ||
Congestion | Appendix A | ||
Graphical | Appendix B | ||
Auctions | Section 4 |
Depending on the specific scenario of application, sometimes external constraints may limit the type of games that can be designed. For example, in an auction platform, the algorithm may only use auctions. Therefore, we study different settings of the type of games the algorithm may design. We summarize our results for different settings in Table 1.
Information-theoretical lower bound. We first show that any algorithm that learns the coalition structure requires at least rounds regardless of the type of games used, where is the number of agents.
Normal-form games. Normal-form games are a general class of games. We consider the case when the platform can design any normal-form game and there are no external constraints on the type of games that can be designed. We show an algorithm, simultaneous binary search, that can learn the coalition structure in rounds using directed prisoner’s dilemmas, a type of designed games for agents to play to elicit coalition structures. The number of rounds matches the lower bound almost exactly.
Congestion games. Congestion games [9] are a well-studied type of games that model the allocation of resources, and has demonstrated its importance in real-world transportation system. For CSL in congestion games, we show that with a different type of games – directed Braess’s paradoxes – the simultaneous binary search algorithm can be adapted to learn the coalition structure with congestion games. The number of rounds required is also .
Graphical games. Graphical games [10] are another thoroughly researched class of games. Investigating CSL with graphical games can be useful in contexts like social networks. We consider degree- graphical games, i.e., each agent has at most neighbors in the underlying graph. In the negative direction, we show that any algorithm for the problem requires at least rounds. In the positive direction, we design an algorithm that couples simultaneous binary search with the idea of block decomposition, i.e., to divide the agents into blocks of size . See the construction of Algorithms 5 and 5 for details of this technique. Our algorithm learns the coalition structure in rounds. Given the two lower bounds, this algorithm is optimal up to a constant factor.
Auctions. Finally, we study CSL with auctions, in particular, second-price auctions with personalized reserves, which are widely adopted in online advertising systems. The format of auctions is much more restrictive and challenging than the previous types of games since there is only one winner in an auction, and it is difficult to elicit information from the other players. Our algorithm in this case learns the coalition structure in rounds, where is the size of the largest coalition. When the coalitions are small, the number of rounds required is close to the lower bound.
1.2 Related Work
Our work lies in the general research direction of learning in games [11]. In particular, our work is closely related to the line of research on Inverse Game Theory, e.g., [12, 13, 14, 15], where these previous papers aim to infer the parameters and underlying payoffs of a game by observing the behavior of players. Our work adopts this perspective but seeks to understand the coalition structure in games, which is not considered in prior works. By strategically and adaptively selecting defender strategies and observing the attacker’s best responses, [16, 17, 18] show that one can uncover the underlying utility functions driving attacker behavior in Stackelberg security games. Our work learns in a similar adaptive manner but focuses on the coalition structure among agents. In addition, learning coalition structure has a fundamental difference from the above existing literature since the space of all possible coalition structures is exponentially large.
The closest related work is [8], where they focus on learning coalition structure through a single bit feedback, i.e., the game designed in the learning process only returns one-bit information – whether a specified strategy profile is a NE or not. In this work, we strictly generalize the model proposed by [8] that allows us to observe multiple-bit feedback rather than just binary feedback. This flexibility strictly increases the set of possible observations and thus the difficulty of designing the games. Moreover, [19, 20] provide general approaches to detect one (collusion) coalition in multi-agent games, whereas, our work aims to find the entire coalition structure in games.
There is also a line of research on learning with revealed preference feedback, e.g., [21, 22, 23, 24, 25, 26], in which they target to predict the strategic agents’ behavior or optimize the profit of the decision maker, through the revealed preference feedback from strategic agents. In this paper, we assume that we can observe the best response feedback of the agents for the games, where each game is carefully designed to elicit information on the coalition structure within a set of agents. Loosely related work includes no-regret learning in games [27, 28]. Indeed, if there is a cost to perform one game to learn coalition structure, our learning process achieves constant regret given the number of rounds needed to identify all coalitions only depends on the number of agents and the structure of the underlying games.
2 Notations and Preliminaries
Consider a set of strategic agents. A coalition is a nonempty subset of the agents, in which the agents can coordinate with each other in games. We assume there is a coalition structure among these agents, which is a set partition of , and each agent in belongs to exactly one of these mutually disjoint coalitions. We use to denote the coalition containing agent under . is not public information, but agents in each coalition know the other agents in the same coalition. In the Coalition Structure Learning (CSL) problem, our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the underlying coalition structure . Specifically, we interact with the agents in a sequence of rounds. Each round, we design a game along with a strategy profile in for the agents, and we make an observation about the strategy profile in . The task is to learn the underlying coalition structure .
Behavior model.
We assume that the agents are rational and strategic. When playing a game , agents in the same coalition under can coordinate with each other to choose actions and will share rewards after the game. Thus, they effectively play as a joint player whose action space and utility are, respectively, the Cartesian product of the action spaces and the sum of utilities of the agents in that coalition. Since the agents are not aware of the underlying coalition structure beyond their own coalitions, the information in the game is incomplete. We do not assume the agents’ full behavior model, but focus on the agents’ deviation decisions from the specified strategy profile.
Multiple-bit observation oracle.
In this paper, we consider the multiple-bit observation oracle as opposed to the single-bit observation oracle considered in [8]. The observation is an -dimensional binary vector, where the -th bit indicates whether or not agent would like to deviate from the specified strategy when the coalition is best responding as a whole. In particular, given a game and a strategy profile , for each coalition , agents in would first determine whether is a joint best response to the other agents’ specified strategy in . If it is, then for each agent , i.e., they would decide not to deviate. Otherwise, they would compute a possible joint best response to the other agents’ specified strategy in . Then, they would compute the deviation decision for each agent . The observation is the concatenation of the decisions of all agents in . If there are multiple best responses for a coalition, we assume that the agents in that coalition would choose any of them in the worst case. We will use “query the observation oracle” or “query ” to refer to the process of designing a game and a strategy profile and receiving the observation .
2.1 Lower Bound of the Number of Rounds
We first show a lower bound on the number of rounds required to solve the Multiple-bit CSL problem. The lower bound is based on an information-theoretical argument similar to [8]. Note that this lower bound holds for any type of games that the algorithm may design.
Theorem 2.1.
Any algorithm that solves the Multiple-bit CSL problem requires at least rounds of interactions with the agents in the worst case.
Proof.
In each round, the algorithm receives at most bits of information. As the number of possible set partitions of is the Bell number , to distinguish between them, we need at least bits of information. The equation follows from the asymptotic expression of Bell number established in [29]. The theorem then follows. ∎
3 Multiple-bit CSL with Normal-form Games
We show in this section an algorithm that solves Multiple-bit CSL with normal-form games in rounds. Note that the number of rounds matches the lower bound in Theorem 2.1 up to . The main idea of the algorithm is to use binary search to find another agent in the same coalition for each agent simultaneously. We start by defining game-strategy pairs and a special pair that we call the directed prisoner’s dilemma, which we use in the algorithm.
Definition 3.1.
A game-strategy pair is a -player normal-form game along with a pure strategy profile in . Here, is called the specified strategy profile of .
Definition 3.2.
For , a directed prisoner’s dilemma is a game-strategy pair . In game , agent has two actions , and all other agents only have one action . If agent plays C, then and receive utility and respectively. Otherwise they both receive utility. All other agents always receive utility. is the pure strategy profile where all agents play D.
Given the construction of directed prisoner’s dilemma, we have the following observation.
Lemma 3.1.
Let and . Then, if and only if agent and agent are in the same coalition under . Moreover, for .
Proof.
If agent and agent are in the same coalition under , then playing C is a dominant strategy for agent ’s coalition. Otherwise, playing D is a dominant strategy. Therefore, if and only if agent and agent are in the same coalition under . Moreover, since all agents other than only have one action , for . ∎
Lemma 3.1 shows that we can use directed prisoner’s dilemmas to determine whether two agents are in the same coalition under . However, querying a single directed prisoner’s dilemma is not efficient as we only get bit of information. To utilize the information from the observation oracle more efficiently, we will need to consider the product of multiple directed prisoner’s dilemmas.
Definition 3.3 ([8]).
Let be two game-strategy pairs where are the action set and utility function of agent in respectively for . Let and . The product of and is a game-strategy pair . Here, is a normal-form game with action set and utility function for each . . We denote the product of and as or . The game-strategy pairs are called the factors of .
Lemma 3.2.
Let be pairs of agents and . Then, if and only if there exists and .
Proof.
By Definition 3.3, playing the product game is equivalent to separately playing each factor game, and summing up the resulting utilities of each agent. Therefore, agent decides to deviate in the product game if and only if agent decides to deviate in at least one factor game. The result then follows from Lemma 3.1. ∎
A graphical interpretation of Lemma 3.2 is as follows. When we query the game-strategy pair , consider a directed graph where each vertex represents an agent and each edge , represents . The observation tells us for each vertex if any of its outgoing neighbors in this constructed graph is in the same coalition. With Lemma 3.2, we are ready for our Algorithm 1 for Multiple-bit CSL with normal-form games.
Intuitively, given an agent , if we want to find another agent in the same coalition with , we can query a game-strategy pair where initially. According to Lemma 3.2, we will know whether there is such an agent in from the -th bit of the observation. If there is such an agent, we can then bisect the set and apply binary search to find this agent. Moreover, this binary search can be done simultaneously for different agents. For example, given two agents and , we can construct the game-strategy pairs and apply binary search at the same time on and using the -th and -th bits of the observation. This is because the -th bit in the observation vector only depends on whether there exists another agent in in the same coalition with agent , and the same argument applies to agent .
Algorithm 1 leverages this intuition and works in two stages. In the first stage (Lines 1 to 7), we try to find for each agent the smallest index of the agents in . To do this, we first query (Line 1). In this game, agent has the incentive to deviate only when another agent with a smaller index than is in the same coalition with . Therefore, using the -th bit from the observation, we can determine whether agent is the agent with the smallest index in (Line 2). For agents who are not, we then use binary search to locate the smallest indexed agents in their coalitions simultaneously (Lines 3 to 6). In the second stage (Lines 8 to 11), it merges each coalition in together according to the result of the first stage.
We provide an example of Algorithm 1 in Fig. 2.
Theorem 3.1.
Algorithm 1 solves Multiple-bit CSL with normal-form games in rounds.
Proof.
We first show the correctness of the algorithm. To do this, we will show two claims: (i) after Lines 1 to 7, for each , if is the agent with the smallest index in , , otherwise, contains only the smallest index of the agents in ; (ii) after Lines 8 to 11, .
For (i), if is the agent with the smallest index in , then by Lemma 3.2, , and on Line 2. Throughout the algorithm, remains . Otherwise, contains the smallest index of the agents in after Line 2. In the while loop (Line 3 to 7), is updated to if , and to otherwise. Since contains the smallest elements in , by Lemma 3.2, if and only if the smallest index of the agents in is in . Therefore, after one iteration of the loop, still contains only the smallest index of the agents in , while the size of is halved. The loop terminates after , and contains only the smallest index of the agents in .
For (ii), since every agent is either the agent with the smallest index in or merged with that agent, each coalition is merged in after Lines 8 to 11. Thus becomes the same as .
Next, we show the complexity of the algorithm. The while loop in Lines 3 to 7 runs at most times, and each iteration requires query. Together with the query on Line 1, the total number of queries is at most . ∎
Multiple-bit CSL with Other Types of Games.
A natural next step after this section is to consider the case where the algorithm can only design games of a specific type. We will show that the idea of simultaneous binary search from Algorithm 1 can be adapted to solve the Multiple-bit CSL problem with congestion games (Appendix A), graphical games (Appendix B), or auctions (Section 4). The adaptation to congestion games is natural once we construct a new game-strategy pair that is similar in spirit to the directed prisoner’s dilemma. However, the adaptation to graphical games and auctions requires more innovations in algorithm design.
4 Multiple-bit CSL with Auctions
We consider in this section the case of Multiple-bit CSL with auctions. The auction format we consider is theoretically well-studied second-price auctions with personalized reserves [30], which is widely used in practice. We first formally define this format.
Definition 4.1.
A second-price auction with personalized reserves is a tuple where is the set of agents, is the valuation vector, and is the reserve price vector. The auction proceeds as follows: Each agent submits a bid and the agent with the highest bid wins the auction with ties broken at uniform random. If the winner bids greater than , then the winner gets the item at a price equal to the maximum between the second highest bid and . The item can be reallocated among . Otherwise, the item is not allocated.
Our algorithm for the Multiple-bit CSL problem with auctions works in rounds, where is size of the largest coalition in . To introduce the algorithm, we first define the game-strategy pairs that we will use in the algorithm, the auction game gadgets, where we will specify the valuation and reserve prices of every agents in each partition bucket.
Definition 4.2.
Let be a partition of , i.e., and . An auction gadget is a game-strategy pair where is a second-price auction with personalized reserves, and is a specified bid vector for the agents, where
Note that in the above auction game gadgets, we do not require each agent to know all the other agents’ valuations so that the valuation of each agent is still private across strategic agents. Only agents in the same coalition know each other’s true valuation.
Lemma 4.1.
For that is a partition of , let . For each coalition , if and , then, there is exactly one agent with and for all . Otherwise, for all .
Proof.
For each coalition . We first consider the case where and . Let agent and agent . If the coalition follows the specified strategy profile , then the item will not be allocated, so the utility of the agents in is . However, If agent raises its bid to , then it becomes the winner of the auction, and the item will be allocated to with price . The item can then be reallocated to agent , giving utility to the agents in . Thus, is not a joint best response to . Moreover, for agents in , the only joint best response is to let exactly one agent in raise its bid to positive, and the other agents in keep their bids at . Thus, the observation is as described in the lemma.
Otherwise, either or . If , then no one in has a positive valuation for the item, so the utility of the agents in cannot be greater than . If , then no one in has a reserve price that is smaller than , which is the largest possible valuation. Thus, the utility of the agents in cannot be greater than , either. In both cases, keeping the bids at is a joint best response to , so the observation is as described in the lemma. ∎
The auction gadgets are important building blocks for our algorithm that works in rounds. To help demonstrate how they can be used to solve Multiple-bit CSL, we present a simpler algorithm that works in rounds using the auction gadgets below.
To provide some intuitions, consider an agent . If we want to find another agent in the same coalition as , we can query the auction gadget . According to Lemma 4.1, if the observation has for some , then and are in the same coalition under . Otherwise, is in a coalition by itself. Then, suppose we have found an agent in ’s coalition, and we would like to find a different agent in that coalition. We can query the auction gadget . By moving agent to the set , we ensure that does not have the incentive to deviate, and either another agent in the same coalition as will have for some , or we can conclude that and are in a coalition by themselves. We can then repeat this process until we have found all the agents in the same coalition as . After that, we can move to the set and proceed with the next agent in . In this way, we can recover the coalition structure . Moreover, since each time we query an auction gadget, we move one agent to set , and the algorithm terminates when , it uses queries in total.
Theorem 4.1.
Algorithm 2 solves the Multiple-bit CSL problem with auctions in rounds.
We defer the proof to Appendix C. Other than the auction gadgets, we will also use another game-strategy pair in the algorithm. We prove the following lemma about the observations obtained by querying this game-strategy pair.
Lemma 4.2.
Let , where are vectors consisting purely of s and s respectively. For each coalition , exactly one agent has .
Proof.
For each coalition , if any agent in raises its bid to positive, then the item will be allocated to with price , giving utility to the agents in . Thus, is not a joint best response to . Moreover, the only joint best response for agents in is to let exactly one agent in raise its bid to positive, and the other agents in keep their bids at . The lemma then follows. ∎
Then, we are ready to present our algorithm that works in rounds for Multiple-bit CSL with auctions. The algorithm is shown in Algorithm 3.
The algorithm starts by querying the game-strategy pair (Line 1). Using the obtained observation, by Lemma 4.2, the algorithm identifies exactly one agent in each coalition, groups them into set and the rest of the agents into set (Lines 2). Essentially, set contains a set of agents who are definitely not in the same coalition with each other. Furthermore, for each agent in , there is exactly one “teammate” of in . We denote the index of this agent as .
An important idea behind Algorithm 3 is bitwise search, i.e., to determine each binary bit of separately. To do this, for each , the algorithm picks the set of agents in such that the -th lowest binary bits of their index are as (Line 4), and tries to tell which agents in are in the same coalition as some agent in by querying auction gadgets. Intuitively, if we let and query , by Lemma 4.1, for each coalition that contains an agent in , one of its members in will decide to deviate. Thus, we can move these agents to like how we did in Algorithm 2, until no agents in decide to deviate. The remaining agents in are then the agents that are not in the same coalition as any agent in . The algorithm leverages this idea to determine each binary bit of separately, and stores the results in (Lines 5 to 12). Note that for each , the algorithm uses at most queries, where is the size of the largest coalition in . We illustrate the execution of the bitwise search with an example in Fig. 3.
Finally, the algorithm recovers for each agent in by summing up the values of the binary bits in (Line 15), and merges with accordingly (Line 16).
Theorem 4.2.
Algorithm 3 solves the Multiple-bit CSL problem with auctions in rounds, where is size of the largest coalition in .
Proof.
We first show the correctness of the algorithm. By Lemma 4.2, after Line 2, contains exactly one agent in each coalition. This means that for each agent , there is exactly one agent such that and are in the same coalition. We denote this agent as . Then, we will show two claims: (i) after Lines 3 to 12, for each and each , if and only if the -th lowest binary bit of is , and (ii) after Lines 13 to 16, .
For (i), we only need to show that after the repeat loop in Lines 6 to 11, contains exactly the agents such that the -th lowest binary bit of is . If an agent has the -th lowest binary bit of as , then after Line 4. Thus, by Lemma 4.1, will never be on Line 8, and after the repeat loop. If an agent has the -th lowest binary bit of as , then after Line 4. By Lemma 4.1, in each iteration of the repeat loop, one agent will have , and will be removed from . Since the repeat loop terminates only when no agent is removed from in one iteration, after the loop.
For (ii), given each binary bit of , the algorithm computes the index of the agent in in by summing up the values of the binary bits on Line 15. Then, the algorithm merges with accordingly on Line 16. The correctness of the algorithm then follows.
For the complexity, the algorithm uses query on Line 1, and for each , the algorithm uses at most queries in the repeat loop in Lines 6 to 11. This shows that the total number of queries used by the algorithm is at most . ∎
5 Conclusion and Future Work
In this paper, we study the Coalition Structure Learning (CSL) problem under the multiple-bit observation oracle. We consider various settings of the type of games that the algorithm may design, including normal-form games, congestion games, graphical games, and auctions. In each of the settings, we present an algorithm that learns the coalition structure with a sublinear number of rounds. We also complement our algorithmic results with lower bounds on the number of rounds required to learn the coalition structure, demonstrating their optimality in most of the settings. Compared to the existing work, our results significantly reduce the number of rounds required to learn the coalition structure, and greatly improve the potential applicability of the CSL problem in real-world systems. Regarding CSL, there are various different settings for future work that are not covered by this paper. It would be interesting to consider the CSL problem (i) when the underlying coalition structure is dynamic, (ii) when the agents are aware of the algorithm and respond strategically, (iii) when the algorithm can only observe a subset of the agents in each round, and (iv) when the observations are noisy or the agents are boundedly rational.
Acknowledgements
This work was supported in part by NSF grant IIS-2046640 (CAREER) and NSF IIS-2200410.
References
- Milgrom [2004] Paul Robert Milgrom. Putting auction theory to work. Cambridge University Press, 2004.
- Hamilton [2019] Isobel Asher Hamilton. Uber drivers are reportedly colluding to trigger ‘surge’ prices because they say the company is not paying them enough. Business Insider, 2019.
- Sweeney [2019] Sam Sweeney. Uber, lyft drivers manipulate fares at reagan national causing artificial price surges. WJLA, 2019.
- Dowling [2023] Joshua Dowling. How uber drivers trigger fake surge price periods when no delays exist. Drive, 2023.
- Wikipedia [2024] Wikipedia. Libor scandal. https://en.wikipedia.org/wiki/Libor˙scandal, 2024. Accessed: 2024-08-11.
- EPRS [2020] EPRS. The ethics of artificial intelligence: Issues and initiatives. European Parliament Research Service, Scientific Foresight Unit (STOA), Panel for the Future of Science and Technology, 2020.
- Qu et al. [2023] Ao Qu, Yihong Tang, and Wei Ma. Adversarial attacks on deep reinforcement learning-based traffic signal control systems with colluding vehicles. ACM Transactions on Intelligent Systems and Technology, 14(6):1–22, 2023.
- Xu et al. [2024] Yixuan Even Xu, Chun Kai Ling, and Fei Fang. Learning coalition structures with games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 9944–9951, 2024. doi: 10.1609/aaai.v38i9.28856. URL https://ojs.aaai.org/index.php/AAAI/article/view/28856.
- Rosenthal [1973] Robert W Rosenthal. A class of games possessing pure-strategy nash equilibria. International Journal of Game Theory, 2:65–67, 1973.
- Kearns et al. [2001] Michael Kearns, Michael L Littman, and Satinder Singh. Graphical models for game theory. In Proceedings of the 17th Conference in Uncertainty in Artificial, Intelligence, 2001, pages 253–260, 2001.
- Fudenberg and Levine [1998] Drew Fudenberg and David K. Levine. The Theory of Learning in Games, volume 1 of MIT Press Books. The MIT Press, December 1998.
- Waugh et al. [2011] Kevin Waugh, Brian D. Ziebart, and J. Andrew Bagnell. Computational rationalization: the inverse equilibrium problem. In Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, page 1169–1176, Madison, WI, USA, 2011. Omnipress. ISBN 9781450306195.
- Kuleshov and Schrijvers [2015] Volodymyr Kuleshov and Okke Schrijvers. Inverse game theory: Learning utilities in succinct games. In Evangelos Markakis and Guido Schäfer, editors, Web and Internet Economics, pages 413–427, Berlin, Heidelberg, 2015. Springer Berlin Heidelberg.
- Ling et al. [2018] Chun Kai Ling, Fei Fang, and J. Zico Kolter. What game are we playing? end-to-end learning in normal and extensive form games. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 396–402. International Joint Conferences on Artificial Intelligence Organization, 7 2018. doi: 10.24963/ijcai.2018/55. URL https://doi.org/10.24963/ijcai.2018/55.
- Letchford et al. [2009] Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In Marios Mavronicolas and Vicky G. Papadopoulou, editors, Algorithmic Game Theory, pages 250–262, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
- Balcan et al. [2015] Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in stackelberg security games. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC ’15, page 61–78, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450334105.
- Haghtalab et al. [2016] Nika Haghtalab, Fei Fang, Thanh H. Nguyen, Arunesh Sinha, Ariel D. Procaccia, and Milind Tambe. Three strategies to success: learning adversary models in security games. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, page 308–314. AAAI Press, 2016. ISBN 9781577357704.
- Wu et al. [2022] Jibang Wu, Weiran Shen, Fei Fang, and Haifeng Xu. Inverse game theory for stackelberg games: the blessing of bounded rationality. Advances in Neural Information Processing Systems, 35:32186–32198, 2022.
- Bonjour et al. [2022] Trevor Bonjour, Vaneet Aggarwal, and Bharat Bhargava. Information theoretic approach to detect collusion in multi-agent games. In James Cussens and Kun Zhang, editors, Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 of Proceedings of Machine Learning Research, pages 223–232. PMLR, 01–05 Aug 2022.
- Mazrooei et al. [2013] Parisa Mazrooei, Christopher Archibald, and Michael Bowling. Automating collusion detection in sequential games. Proceedings of the AAAI Conference on Artificial Intelligence, 27(1):675–682, Jun. 2013. doi: 10.1609/aaai.v27i1.8674. URL https://ojs.aaai.org/index.php/AAAI/article/view/8674.
- Beigman and Vohra [2006] Eyal Beigman and Rakesh Vohra. Learning from revealed preference. In Proceedings of the 7th ACM Conference on Electronic Commerce, EC ’06, page 36–42, New York, NY, USA, 2006. Association for Computing Machinery. ISBN 1595932364.
- Zadimoghaddam and Roth [2012] Morteza Zadimoghaddam and Aaron Roth. Efficiently learning from revealed preference. In Paul W. Goldberg, editor, Internet and Network Economics, pages 114–127, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
- Balcan et al. [2014] Maria-Florina Balcan, Amit Daniely, Ruta Mehta, Ruth Urner, and Vijay V. Vazirani. Learning economic parameters from revealed preferences. In Tie-Yan Liu, Qi Qi, and Yinyu Ye, editors, Web and Internet Economics, pages 338–353, Cham, 2014. Springer International Publishing.
- Blum et al. [2014] Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Learning optimal commitment to overcome insecurity. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceedings.neurips.cc/paper˙files/paper/2014/file/cc1aa436277138f61cda703991069eaf-Paper.pdf.
- Amin et al. [2015] Kareem Amin, Rachel Cummings, Lili Dworkin, Michael Kearns, and Aaron Roth. Online learning and profit maximization from revealed preferences. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, page 770–776. AAAI Press, 2015. ISBN 0262511290.
- Roth et al. [2016] Aaron Roth, Jonathan Ullman, and Zhiwei Steven Wu. Watch and learn: optimizing from revealed preferences feedback. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’16, page 949–962, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450341325.
- Cesa-Bianchi and Lugosi [2006] Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
- Daskalakis et al. [2021] Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. Near-optimal no-regret learning in general games. Advances in Neural Information Processing Systems, 34:27604–27616, 2021.
- De Bruijn [1981] Nicolaas Govert De Bruijn. Asymptotic methods in analysis, volume 4. Courier Corporation, 1981.
- Paes Leme et al. [2016] Renato Paes Leme, Martin Pal, and Sergei Vassilvitskii. A field guide to personalized reserve prices. In Proceedings of the 25th international conference on world wide web, pages 1093–1102, 2016.
Appendix A Multiple-bit CSL with Congestion Games
To state our result for congestion games, we first recall the definition of a congestion game.
Definition A.1 ([9]).
A congestion game is a tuple where is the set of agents, is the set of resources, is the strategy space of agent , and is the cost function of resource . For a strategy profile where , the utility of agent is given by .
Clearly, a directed prisoner’s dilemma is not a congestion game, so we cannot directly apply Algorithm 1 to solve Multiple-bit CSL with congestion games. However, as we will show in this subsection, we can design another type of game-strategy pair, directed Braess’s paradox, which is a congestion game, and functions similarly to directed prisoner’s dilemmas in the context of Multiple-bit CSL. We will then show that using directed Braess’s paradoxes, we can solve Multiple-bit CSL with congestion games in rounds with an algorithm similar to Algorithm 1.
Definition A.2.
For , a directed Braess’s paradox is a game-strategy pair . Here, is a congestion game where . The strategy spaces of agent and are and respectively, and the strategy spaces for all other agents are . The cost functions are , and for all . The strategy profile is such that , and for all .
We illustrate the definition of directed Braess’s paradox in Fig. 4.
Lemma A.1.
Let be two agents in and . Then, if and only if agent and agent are in the same coalition under . Moreover, for .
Proof.
If agent and agent are in the same coalition under , then choosing is a dominant strategy for agent ’s coalition. Otherwise, choosing is a dominant strategy. Therefore, if and only if agent and agent are in the same coalition under . Moreover, since all agents other than only have one choice (either or ), for . ∎
Lemma A.1 shows that directed Braess’s paradoxes function just like directed prisoner’s dilemmas in Multiple-bit CSL. With an argument similar to the proof from Lemma 3.1 to Lemma 3.2, we can also show the following corollary about the product of multiple directed Braess’s paradoxes.
Corollary A.2.
Let be pairs of agents and . Then, if and only if there exists and .
To use the product of multiple directed Braess’s paradoxes in an algorithm to solve Multiple-bit CSL with congestion games, we also need to show that it is still a congestion game. We will show that congestion games are closed under the product operation in the following lemma.
Lemma A.3.
Let be two game-strategy pairs and let . If and are both congestion games on , then is also a congestion game on .
Proof.
Let and . We will write as a congestion game. According to Definition 3.3, in , the strategy space of agent is . Moreover, given a strategy profile , the utility of agent is given by . This shows that is a congestion game on . ∎
With Corollaries A.2 and A.3, we solve Multiple-bit CSL with congestion games just like we did with normal-form games in Algorithm 1. The only difference is that we query directed Braess’s paradoxes instead of directed prisoner’s dilemmas. We present the algorithm below.
Theorem A.1.
Algorithm 4 solves Multiple-bit CSL with congestion games in rounds.
Proof.
The proof is almost identical to the proof of Theorem 3.1. We first show the correctness of the algorithm. To do this, we will show three claims: (i) after Lines 1 to 7, for each , if is the agent with the smallest index in , , otherwise, contains only the smallest index of the agents in ; (ii) after Lines 8 to 11, ; (iii) the games queried in the algorithm are congestion games. (iii) is implied by Lemma A.3, so we only need to show (i) and (ii).
For (i), if is the agent with the smallest index in , then by Corollary A.2, , and on Line 2. Throughout the algorithm, remains . Otherwise, contains the smallest index of the agents in after Line 2. In the while loop, is updated to if , and to otherwise. Since contains the smallest elements in , by Corollary A.2, if and only if the smallest index of the agents in is in . Therefore, after one iteration of the loop, still contains only the smallest index of the agents in , while the size of is halved. The loop terminates after , and contains only the smallest index of the agents in .
For (ii), since every agent is either the agent with the smallest index in or merged with that agent, each coalition is merged in after Lines 8 to 11. Thus becomes the same as .
Next, we show the complexity of the algorithm. The while loop in Lines 3 to 7 runs at most times, and each iteration requires query. Together with the query on Line 1, the total number of queries is at most . ∎
Appendix B Multiple-bit CSL with Graphical Games
We then proceed to the case of Multiple-bit CSL with graphical games. Recall that in a graphical game [10], each agent is associated with a vertex in an undirected graph . We use to denote the set of neighbors of agent in , including agent itself.
Definition B.1.
A graphical game is a tuple where is the set of agents, is an undirected graph, is the strategy space of agent , and is the local game matrix of agent . For a strategy profile where , the utility of agent is given by , where consists only of the strategies of agents in . Here, denotes the set of neighbors of agent in undirected graph .
If we let the undirected graph be a complete graph, then we recover the normal-form game setting. However, the reason why we consider Multiple-bit CSL with graphical games is that graphical games are compact representations of normal-form games. When the degree of each vertex in the graph is small, the graphical game representation is far succincter than the normal-form game representation. Therefore, we consider in this section the case where the algorithm can only design degree- graphical games, where the degree of each vertex in the graph is at most . In this case, we will need at least rounds to solve the Multiple-bit CSL problem. To prove this claim formally, we first show with the following lemma that in some cases, the observations reveal no information about whether a certain pair of agents are in the same coalition under .
Lemma B.1.
Let be two agents in and let be two coalition structures. If we query the oracle for a graphical game where edge is not in the undirected graph , then the observation reveals no information about whether or .
Proof.
Let and let be the specified strategy profile in . Consider agent ’s deviation decision . Let and be the set of best responses of coalitions and to and if respectively. Since the local game matrix only depends on , no matter what strategy agent plays, the best response of to is the same. Therefore, the set of best responses of coalition if is . This shows that whether or does not affect the deviation decision of agent . The same argument applies to agent . ∎
Theorem B.1.
Any algorithm that solves the Multiple-bit CSL problem with degree- graphical games requires at least rounds of interactions with the agents in the worst case.
Proof.
Suppose that the algorithm has queried the observation oracle times with game-strategy pairs , where is a degree- graphical game. According to Lemma B.1, if the observations always behave as if , then the algorithm cannot finalize the answer until for each pair of agents , the algorithm has queried the oracle for a graphical game where edge is in the undirected graph. Since the degree of each vertex in the graph is at most , the algorithm needs to query the oracle for at least different graphical games to include all edges adjacent to vertex . The theorem follows. ∎
Then, we present an algorithm that solves the Multiple-bit CSL problem with degree- graphical games in rounds of interactions with the agents in the worst case. Note that , given the lower bounds in Theorems 2.1 and B.1, the algorithm is optimal up to a constant factor.
The algorithm uses the product of multiple directed prisoner’s dilemmas to query the observation oracle. We will first show that such a product can be represented as a degree- graphical game.
Lemma B.2.
Let be pairs of agents and . Then, can be written as a degree- graphical game, where , i.e., the maximum number of occurrences of any agent in the pairs.
Proof.
For two agents , if does not occur in any of the pairs, then, according to Definition 3.2, the utility of agent does not depend on agent ’s strategy in any of the directed prisoner’s dilemmas . Since for each agent, the utility in the product game is the sum of the utilities in the factor games, the utility of agent in the product game also does not depend on agent ’s strategy. This shows that can be represented as a graphical game with the graph’s edge set being . The lemma then follows. ∎
We then present the algorithm as Algorithm 5.
Compared with the normal-form game setting, with degree- graphical games, the algorithm can only query products of multiple directed prisoner’s dilemmas subject to the degree constraint as described in Lemma B.2. To comply with this constraint, Algorithm 5 introduces an important idea, block decomposition, i.e., to partition the agents into blocks of size and choose pairs to query according to the decomposition (Lines 1 to 2). As long as for each query, each agent is involved in directed prisoner’s dilemmas with at most two blocks of agents, the constraint is not violated.
The general idea of Algorithm 5 is to find for each agent , the “predecessor” of , i.e., the agent with the largest index that is smaller than in . To do this, the algorithm takes a two-step approach. First, the algorithm tries to find which block the predecessor of each agent belongs to (Lines 3 to 6). We illustrate this process with an example in Fig. 5. The algorithm enumerates block index gap and constructs one query for each . The query for a given includes of all pairs of agents such that the difference between the block indices they belong to, i.e., , is (Lines 3 to 4). In this way, an agent will be involved in directed prisoner’s dilemmas with agents in at most two blocks, block and block in one query. Moreover, after the enumeration, since the algorithm observes for each agent and each block before , whether there is an agent in the same coalition of in that block, it gets to determine the block index of each agent’s predecessor or conclude that it has no predecessors (Lines 5 to 6).
Then, the algorithm carries out a simultaneous binary search to find the predecessor of each agent (Lines 7 to 17). The algorithm first deals with the agents whose predecessors are in the same block as themselves (Lines 7 to 12). In this process, each agent will be involved in games only within its own block in the queries. Next, the algorithm deals with the agents whose predecessors are in a different block (Lines 13 to 17). In this process, each agent will be involved in games with its predecessor’s block, and agents outside block whose predecessors are in block . The number of such agents is also at most twice the block size.
Finally, the algorithm merges each agent with its predecessor (Lines 18 to 21).
Theorem B.2.
Let be an even number, Algorithm 5 solves the Multiple-bit CSL problem with degree- graphical games in rounds of interactions with the agents in the worst case.
Proof.
We first show the correctness of the algorithm. To do this, we will show four claims: (i) after Lines 1 to 6, for each , if is the agent with the smallest index in , ; otherwise, let be the largest index in that is smaller than , ; (ii) after Lines 7 to 17, for each , if is the agent with the smallest index in , ; otherwise, contains only the largest index in that is smaller than ; (iii) after Lines 8 to 21, ; (iv) the games queried in Lines 4, 11, and 16 are degree- graphical games.
For (i), if is the agent with the smallest index in , then by Lemma 3.2, , for all on Line 4. As a result, on Line 5, and on Line 6. Otherwise, let be the largest index in that is smaller than . Then, by Lemma 3.2, for on Line 4 since is included as a factor game. Moreover, for all since is the largest index in that is smaller than . Therefore, becomes the minimum of on Line 5, and thus on Line 6.
For (ii), if is the agent with the smallest index in , then on Line 7. Throughout the algorithm, remains . Otherwise, contains the largest index, , in that is smaller than after Line 7. Depending on whether or not, either Lines 8 to 12 or Lines 13 to 17 will carry out a binary search to find this index . Note that in the while loop, is updated to if , and to otherwise. Since contains the largest elements in , by Lemma 3.2, if and only if the largest index in that is smaller than is in . Thus, the binary search will find this index and end up with .
For (iii), since every agent is either the agent with the smallest index in or merged with its predecessor (the agent with the largest index that is smaller than ) in , each coalition is merged in after Lines 18 to 21. Thus becomes the same as .
For (iv), consider an agent . In the game queried on line 4, agent is involved in at most factor games with agents ; in the game queried on line 11, agent is involved in at most factor games with agents ; in the game queried on line 16, agent is involved in at most factor games with agents . Therefore, the games queried in Lines 4, 11, and 16 are degree- graphical games by Lemma B.2.
Next, we show the complexity of the algorithm. The for loop in Lines 3 to 4 requires queries, and the while loops in Lines 8 to 12 and Lines 13 to 17 require at most queries each. Therefore, the total number of queries is at most . This completes the proof. ∎
Appendix C Omitted Proofs
C.1 Proof of Theorem 4.1
See 4.1
Proof.
We first show the correctness of the algorithm. Consider the if statement on Line 8, if there exists such that , then by Lemma 4.1, exactly one agent in has , and agents and are in the same coalition under . The algorithm then merges and in , and removes from . Otherwise, by Lemma 4.1, no agent in is in . The algorithm then removes from . Therefore, the inner while loop on Lines 5 to 12 finalizes agent ’s coalition in correctly, and removes the whole coalition from . The correctness of the algorithm then follows.
For the complexity, the algorithm starts with , and after each query on Line 7, it removes one agent from . Since the algorithm terminates when , it uses queries in total. ∎