This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Balancing Asymptotic and Transient Efficiency Guarantees
in Set Covering Games

Rohit Konda, Rahul Chandan, David Grimsman and Jason R. Marden R. Konda ([email protected]), R. Chandan, D. Grimsman and J. R. Marden are with the Department of Electrical and Computer Engineering at the University of California, Santa Barbara, CA. This work is supported by ONR Grant #N00014-20-1-2359 and AFOSR Grant #FA9550-20-1-0054.
Abstract

Game theoretic approaches have gained traction as robust methodologies for designing distributed local algorithms that induce a desired overall system configuration in multi-agent settings. However, much of the emphasis in these approaches is on providing asymptotic guarantees on the performance of a network of agents, and there is a gap in the study of efficiency guarantees along transients of these distributed algorithms. Therefore, in this paper, we study the transient efficiency guarantees of a natural game-theoretic algorithm in the class of set covering games, which have been used to model a variety of applications. Our main results characterize the optimal utility design that maximizes the guaranteed efficiency along the transient of the natural dynamics. Furthermore, we characterize the Pareto-optimal frontier with regards to guaranteed efficiency in the transient and the asymptote under a class of game-theoretic designs. Surprisingly, we show that there exists an extreme trade-off between the long-term and short-term guarantees in that an asymptotically optimal game-theoretic design can perform arbitrarily bad in the transient.

I Introduction

Recently, game theory has emerged as a significant framework in the study of multi-agent systems: the agents in the system are modeled as agents in a game, each with its own actions and individual rewards that are functions of the joint action of the other agents. Some prominent examples where these models have found success include wireless communication networks [1], UAV swarm task allocation [2], news subscription services [3], vaccinations during an epidemic [4], security systems [5], facility location [6], coordinating the charging of electric vehicles [7], and national defense [8], among others. Indeed, game theory has played a significant role in many applications and across various disciplines.

A central focus in using game-theoretic models to control multi-agent systems is the design of local objective (or reward) functions for the agents that result in a desirable emergent system-level outcome, as measured by a system welfare function [9]. Generally the emergent outcome is assumed to be a coordinated decision which is a Nash equilibrium of the game, wherein each agent cannot improve its own objective function by switching actions. A significant research effort has been devoted to designing local objective functions that ensure good performance of Nash equilibria. For example, [10] shows that for the class of set covering games, there exists a set of local objective functions which guarantee that the performance of any Nash equilibrium is within a multiplicative factor of 11e1-\frac{1}{e} of the optimal decision set. Furthermore, no other polynomial time algorithm can perform better [11].

Rather than focusing solely on the system welfare at the Nash equilibria, this work studies the welfare of decision sets along the transient path toward equilibrium. This may be important when the relevant system parameters evolve on faster time scales, agent modalities vary with time etc. The class of potential games [12] is particularly relevant to this goal as the class of best response/reply processes and its many variants (see, e.g., [13]) are known to converge to a Nash equilibrium asymptotically from any initial configuration. Best reply processes are intimately tied with the development of game theory, present in Cournot processes, Nash’s seminal paper [14], as well as stable and evolutionary equilibrium [15]. Efficiency guarantees along these best response processes are much less studied than the asymptotic guarantees that come with Nash equilibrium. In this work, we restrict our attention to an important subclass of potential games known as set covering games, in which agents with limited capabilities try to coordinate to cover as large of an area as possible. For set covering games, we characterize bounds on the efficiency for best response processes, and compare utility designs that maximize either the short- or long-efficiency.

While few works study the guarantees that can arise through best response processes, we highlight a subset of important works that deal with analyzing the emergent behavior of best response dynamics. The authors in [16, 17] analyze lower bounds for the efficiency guarantees resulting after a single best response round in linear congestion and cut games. These guarantees were verified through a linear programming approach in [18] and extended to quadratic and cubic congestion games. Best response dynamics are also studied in [19], where it was found that the efficiency resulting from best response dynamics can be arbitrarily bad for non-potential games. The speed of convergence to efficient states is studied in [20]. While similar in spirit, our paper extends the previous literature by moving from characterization results of prespecified utility designs to instead identifying optimal game-theoretic designs that maximize the efficiency guarantees along a best response process.

The contributions of this paper are all with respect to the class of set covering games and are as follows:

  1. 1.

    We identify the local objective that optimizes the transient efficiency guarantee along a best response dynamic;

  2. 2.

    We identify a trade-off between transient and asymptotic guarantees, and provide explicit characterizations of the Pareto frontier and Pareto-optimal agent objective functions.

The rest of the paper is organized as follows: In Section II, we introduce the necessary notation and definitions for best response dynamics, as well as for set covering games. We then outline our main results in Section III. Section IV contains the proof of the main result. We present a simulation study in Section V to examine how our findings translate to practice. Finally, we conclude in Section VI.

II Model

Notation. We use \mathbb{R} and \mathbb{N} to denote the sets of real and natural numbers respectively. Given a set 𝒮\mathcal{S}, |𝒮||\mathcal{S}| represents its cardinality. 𝟏S\mathbf{1}_{S} describes the indicator function for set SS (𝟏S(e)=1\mathbf{1}_{S}(e)=1 if eSe\in S, 0 otherwise).

II-A Set Covering Games and Best Response Dynamics

We present all of our results in the context of set covering games [10], a well-studied class of potential games. We first recall some definitions to outline our game-theoretic model, which is used to describe the given class of distributed scenarios. Set covering games are characterized by a finite set of resources \mathcal{R} that can be possibly covered by a set of nn agents ={1,,n}\mathcal{I}=\{1,\dots,n\}. Each resource rr has an associated value vr0v_{r}\geq 0 determining its relative importance to the system. Each agent ii has a finite action set 𝒜i={ai1,,aic}2\mathcal{A}_{i}=\{a^{1}_{i},\dots,a_{i}^{c}\}\subset 2^{\mathcal{R}}, representing the sets of resources that each agent can opt to select. The performance of a joint action a=(a1,,an)𝒜=𝒜1××𝒜na=(a_{1},\dots,a_{n})\in\mathcal{A}=\mathcal{A}_{1}\times\cdots\times\mathcal{A}_{n} comprised of every agent’s actions is evaluated by a system-level objective function W:𝒜0\mathrm{W}:\mathcal{A}\to\mathbb{R}_{\geq 0}, defined as the total value covered by the agents, i.e.,

W(a)=riaivr.\mathrm{W}(a)=\sum_{r\in\bigcup_{i}a_{i}}{v_{r}}. (1)

The goal is to coordinate the agents to a joint action aopta^{\mathrm{opt}} that maximizes this welfare W(a)\mathrm{W}(a). Thus, each agent ii\in\mathcal{I} is endowed with its own utility function Ui:𝒜\mathrm{U}_{i}:\mathcal{A}\to\mathbb{R} that dictates its preferences among the joint actions. We use ai=(a1,,ai1,ai+1,an)a_{-i}=(a_{1},\dots,a_{i-1},a_{i+1},\dots a_{n}) to denote the joint action without the action of agent ii. A given set covering game G\mathrm{G} can be summarized as the tuple G=(,𝒜,W,{Ui}i)\mathrm{G}=(\mathcal{I},\mathcal{A},\mathrm{W},\{\mathrm{U}_{i}\}_{i\in\mathcal{I}}). We consider the utility functions of the form

Ui(ai,ai)=raivrf(|a|r),\mathrm{U}_{i}(a_{i},a_{-i})=\sum_{r\in a_{i}}v_{r}\cdot f(|a|_{r}), (2)

where |a|r|a|_{r} is the number of agents that choose resource rr in action profile aa, and the utility rule f:{1,,n}0f:\{1,\dots,n\}\to\mathbb{R}_{\geq 0} defines the resource independent agent utility determined by |a|r|a|_{r}. This utility rule will be our design choice for defining the agents preferences towards each of their decisions. To motivate our results, we provide an example application of set covering games below, which represents just one among many [21, 22, 23].

Example 1 (Wireless Sensor Coverage [24]).

Consider a group of sensors s1,s2,,sns_{1},s_{2},\dots,s_{n} that can sense a particular region depending on the orientation, physical placement of the sensor, etc. The choice of these parameters constitute the different possible actions 𝒜i\mathcal{A}_{i} for each sensor. We partition the entire space into disjoint regions rr\in\mathcal{R} and each region rr is attributed a weight vr0v_{r}\geq 0 to signify the importance of covering it. As a whole, the set of sensors wish to maximize the total weight of the regions covered. Using a game-theoretic model of this scenario allows us to abstract away much of the complexity that arises from having to consider specific sensor types and their respective capabilities, in contrast to [24]. In this scenario, the results in the paper allow us to study guarantees of performance of local coordination algorithms that the sensors may employ. We further note that short-term performance guarantees may also be desirable especially when there is some drift in the parameters of the game (e.g., in vrv_{r} or 𝒜i\mathcal{A}_{i}).

Next, we introduce some notation relevant to the study of best reply processes. For a given joint action a¯𝒜\bar{a}\in\mathcal{A}, we say the action a^i\hat{a}_{i} is a best response for agent ii if

a^iBRi(a¯i)=argmaxai𝒜iUi(ai,a¯i).\hat{a}_{i}\in\ \text{BR}_{i}(\bar{a}_{-i})=\arg\max_{a_{i}\in\mathcal{A}_{i}}\mathrm{U}_{i}(a_{i},\bar{a}_{-i}). (3)

Note that BRi(a¯i)\text{BR}_{i}(\bar{a}_{-i}) may be set valued. In this fashion, the underlying discrete-time dynamics of the best response process is

ai[m+1]\displaystyle a_{i}[m+1] BRi(ai[m]) for some agent i\displaystyle\in\text{BR}_{i}(a_{-i}[m])\text{ for some agent }i (4)
ai[m+1]\displaystyle a_{-i}[m+1] =ai[m],\displaystyle=a_{-i}[m],

where at each step, a single agent acts in a best response to the current joint action profile. It is important to stress that the fixed points of this best response process are Nash equilibria of the game, i.e., a joint action ane𝒜a^{\mathrm{ne}}\in\mathcal{A} such that

aineBRi(aine) for all agents i.a^{\mathrm{ne}}_{i}\in\text{BR}_{i}(a^{\mathrm{ne}}_{-i})\text{ for all agents }i\in\mathcal{I}. (5)

In the case of potential games [12], the set of Nash equilibrium NE\mathrm{NE} is also globally attractive with respect to the best response dynamics. A potential game is defined as any game for which there exists a potential function ϕ\phi where for all agents ii and pairs of actions a,aa,a^{\prime},

ϕ(ai,ai)ϕ(ai,ai)=Ui(ai,ai)Ui(ai,ai).\phi(a_{i},a_{-i})-\phi(a_{i}^{\prime},a_{-i})=\mathrm{U}_{i}(a_{i},a_{-i})-\mathrm{U}_{i}(a_{i}^{\prime},a_{-i}). (6)

In these games, the potential function decreases monotonically along the best response process in a similar manner to a Lyapunov function. For non-potential games, there is a possibility that the best response process ends up in a cycle [25], but this falls outside the scope of this paper.

In this work we assume that the best response process is such that the agents are ordered from 11 to nn and perform successive best responses in that order. While other best response processes have been previously studied (see, e.g., kk covering walks [16]), we limit our analysis to this specific process for tractability. We also assume that each agent has a null action aia^{\varnothing}_{i} available to them representing their non-participation in the game G\mathrm{G}.

Definition 1.

Given a game G\mathrm{G} with nn agents, a kk-best response round is a nkn\cdot k step best reply process with an initial state a[0]=aa[0]=a^{\varnothing} and the corresponding dynamics

ai[m+1]\displaystyle a_{i}[m+1] BRi(ai[m]) for agent i=(m+1 mod n)\displaystyle\in\text{BR}_{i}(a_{-i}[m])\text{ for agent }i=(m+1\text{ mod }n)
ai[m+1]\displaystyle a_{-i}[m+1] =ai[m].\displaystyle=a_{-i}[m]. (7)

This process results in a set of possible action trajectories Σkσk=(a[0],a[1],a[kn1],a[kn])\Sigma^{k}\ni\sigma^{k}=(a[0],a[1]\dots,a[kn-1],a[kn]) with a set of possible end action states end(k)={σk(kn):σkΣk}\mathrm{end}(k)=\{\sigma^{k}(kn):\sigma^{k}\in\Sigma^{k}\} resulting after running for knk\cdot n time steps.

II-B Price of Best Response and Price of Anarchy

The primary focus of this paper is to analyze guarantees on the efficiency of possible end actions end(k)\mathrm{end}(k) after a kk-best response round with respect to the system welfare W\mathrm{W}. In this regard, we introduce the price of best response metric to evaluate the worst case ratio between the welfare of the outcome of the kk-best response round and the optimal welfare in a game.

PoB(G,k)=minaend(k)W(a)maxa𝒜W(a).\mathrm{PoB}(\mathrm{G},k)=\frac{\min_{a\in\mathrm{end}(k)}{\mathrm{W}(a)}}{\max_{a\in\mathcal{A}}{\mathrm{W}(a)}}. (8)

Our study of the price of best response will be in direct comparison with the standard game-theoretic metric of price of anarchy, which is defined as

PoA(G)=minaNEW(a)maxa𝒜W(a).\mathrm{PoA}(\mathrm{G})=\frac{\min_{a\in\mathrm{NE}}{\mathrm{W}(a)}}{\max_{a\in\mathcal{A}}{\mathrm{W}(a)}}. (9)

The price of anarchy is a well established metric, with a number of results on its characterization, complexity, and design [26]. We frame price of anarchy as a backdrop to many of our results, where Nash equilibria are the set of end states that are possible, following a kk-best response round in the limit as kk\to\infty. Under this notation, we can analyze the transient guarantees, which have been relatively unexplored, in these distributed frameworks in comparison to the asymptotic setting.

To isolate the efficiency analysis over the utility rule ff and the number of agents nn, we consider the price of best response with respect to the class of all set covering games 𝒢f,n\mathcal{G}_{f,n} with a fixed utility rule ff and nn number of agents, and any resource set \mathcal{R} and corresponding values vrv_{r} and set of agent actions 𝒜\mathcal{A}:

PoB(f,n,k)=infG𝒢f,nPoB(G,k).\mathrm{PoB}(f,n,k)=\inf_{\mathrm{G}\in\mathcal{G}_{f,n}}\mathrm{PoB}(\mathrm{G},k). (10)

Similarly, we define the price of anarchy over 𝒢f,n\mathcal{G}_{f,n} as

PoA(f,n)=infG𝒢f,nPoA(G).\mathrm{PoA}(f,n)=\inf_{\mathrm{G}\in\mathcal{G}_{f,n}}\mathrm{PoA}(\mathrm{G}). (11)

When appropriate, we may also remove the dependence on the number of agents by denoting PoB(f,k)=infnPoB(f,n,k)\mathrm{PoB}(f,k)=\inf_{n\in\mathbb{N}}\mathrm{PoB}(f,n,k) (and likewise for PoA\mathrm{PoA}) to take the worst case guarantees across any number of agents.

We are interested in the transient and asymptotic system guarantees that can result from a given utility rule. Therefore we bring to attention two well-motivated utility rules to highlight the spectrum of guarantees that are possible. The first utility rule fPoAargmaxfPoA(f)f_{\mathrm{PoA}}\in\arg\max_{f}\mathrm{PoA}(f), introduced in [10], attains the optimal price of anarchy guarantee for the class of set covering games and is defined as

fPoA(j)==j(j1)!!(e1).f_{\mathrm{PoA}}(j)=\sum_{\ell=j}^{\infty}{\frac{(j-1)!}{\ell!(e-1)}}. (12)

The second rule is the marginal contribution (MC) rule. The MC rule is a well studied [27] utility rule and is defined as

fMC(j)={1,for j=10,otherwise}.f_{MC}(j)=\left\{\begin{array}[]{lr}1,&\text{for }j=1\\ 0,&\text{otherwise}\\ \end{array}\right\}. (13)

Note that the induced utility under this mechanism is Ui(ai,ai)=W(ai,ai)W(ai,ai)\mathrm{U}_{i}(a_{i},a_{-i})=\mathrm{W}(a_{i},a_{-i})-\mathrm{W}(a^{\varnothing}_{i},a_{-i}), where the utility for agent ii is the added welfare from playing its null action aia^{\varnothing}_{i} to playing aia_{i}. Using fMCf_{MC} ensures that all Nash equilibria of the game are local optima of the welfare function. However, it is known that the MC rule has sub-optimal asymptotic guarantees as PoA(fMC)=12<11e=PoA(fPoA)\mathrm{PoA}(f_{MC})=\frac{1}{2}<1-\frac{1}{e}=\mathrm{PoA}(f_{\mathrm{PoA}}) [28]. Thus, under the standard tools of asymptotic analysis, fPoAf_{\mathrm{PoA}} has been considered to be ‘optimal’ in inducing desirable system behavior. Utilizing the notion of PoB\mathrm{PoB}, we inquire if the favorable characteristics of fPoAf_{\mathrm{PoA}} translate to good short-term behavior of the system. Intuitively, since fPoAf_{\mathrm{PoA}} guarantees optimal long-term performance, we expect the same in the short-term as well. This discussion motivates the following three questions:

  1. (a)

    For a given k1k\geq 1, is PoB(fPoA,n,k)\mathrm{PoB}(f_{\mathrm{PoA}},n,k) always greater than PoB(fMC,n,k)\mathrm{PoB}(f_{MC},n,k)?.

  2. (b)

    If not, then for how many rounds does the best response algorithm under fPoAf_{\mathrm{PoA}} have to be run for the guarantees on the end state performance to surpass the asymptotic performance of marginal contribution? In other words, is there k1k\geq 1 for which PoB(fPoA,n,k)PoA(fMC,n)=12\mathrm{PoB}(f_{\mathrm{PoA}},n,k)\geq\mathrm{PoA}(f_{MC},n)=\frac{1}{2}?

  3. (c)

    For a given k1k\geq 1, what is the utility mechanism that maximizes the end state performance after running the best response algorithm for kk rounds, i.e. what is argmaxfPoB(f,n,k)\arg\max_{f}\mathrm{PoB}(f,n,k)?

In the next section, we answer these questions in detail.

III Main Results

In this section, we outline our main results for price of best response guarantees. In this setting, utility functions are designed by carefully selecting the utility rule ff and improvement in transient system level behavior corresponds with a higher price of best response. Hence, we characterize the optimal ff that achieves the highest possible PoB\mathrm{PoB}. We formally state the optimality of the MC rule for the class of set covering games below.

Theorem 1.

Consider the class of set covering games with n2n\geq 2 users. The following statements are true:

  1. (a)

    For any number of rounds k1k\geq 1, the transient performance guarantees associated with the utility mechanisms fMCf_{MC} and fPoAf_{\mathrm{PoA}} satisfy

    PoB(fMC,n,k)\displaystyle\mathrm{PoB}(f_{MC},n,k) =1/2,\displaystyle=1/2, (14)
    PoB(fPoA,n,k)\displaystyle\mathrm{PoB}(f_{\mathrm{PoA}},n,k) 1/2.\displaystyle\leq 1/2. (15)

    Furthermore, the marginal contribution rule achieves its asymptotic guarantees after a single round, i.e.,

    PoB(fMC,n,k)=PoB(fMC,n,1).\mathrm{PoB}(f_{MC},n,k)=\mathrm{PoB}(f_{MC},n,1). (16)
  2. (b)

    For any number of rounds k1k\geq 1, the marginal contribution rule fMCf_{MC} optimizes the PoB\mathrm{PoB}, i.e.,

    maxfPoB(f,n,k)=PoB(fMC,n,k)=1/2.\max_{f}\mathrm{PoB}(f,n,k)=\mathrm{PoB}(f_{MC},n,k)=1/2. (17)

    Alternatively, there is no utility rule ff with PoB(f,n,k)>1/2\mathrm{PoB}(f,n,k)>1/2 for any kk.

Proof.

To prove the statements outlined in items (a)\rm{(a)} and (b)\rm{(b)} of the theorem statement, we show, more generally, that for any ff and any kk, the following series of inequalities is true:

12=PoB(fMC,n,1)=PoB(fMC,n,k)PoB(f,n,k).\displaystyle\frac{1}{2}=\mathrm{PoB}(f_{MC},n,1)=\mathrm{PoB}(f_{MC},n,k)\geq\mathrm{PoB}(f,n,k).

The fact that PoB(fMC,n,1)=1/2\mathrm{PoB}(f_{MC},n,1)=1/2 is a result of Lemma 1 (stated later in the paper) for n2n\geq 2 and the definition of fMCf_{MC}. For proving PoB(fMC,n,k)=12\mathrm{PoB}(f_{MC},n,k)=\frac{1}{2}, we first show that

PoB(fMC,n,k)PoB(fMC,n,1).\mathrm{PoB}(f_{MC},n,k)\geq\mathrm{PoB}(f_{MC},n,1). (18)

Let a,a𝒜a,a^{\prime}\in\mathcal{A} be joint actions such that aa^{\prime} is a best response to aa for agent ii. If the MC rule is used as in Equation (13), then

W(a)W(a)\displaystyle\mathrm{W}(a^{\prime})-\mathrm{W}(a) =W(ai,ai)W(ai,ai)\displaystyle=\mathrm{W}(a_{i}^{\prime},a_{-i})-\mathrm{W}(a^{\varnothing}_{i},a_{-i})
+W(ai,ai)W(ai,ai)\displaystyle\qquad+\mathrm{W}(a^{\varnothing}_{i},a_{-i})-\mathrm{W}(a_{i},a_{-i})
=Ui(ai,ai)Ui(ai,ai)\displaystyle=\mathrm{U}_{i}(a_{i}^{\prime},a_{-i})-\mathrm{U}_{i}(a_{i},a_{-i})
0.\displaystyle\geq 0.

Then, after any m1m\geq 1 best response rounds, W(end(m))W(end(1))\mathrm{W}(\mathrm{end}(m))\geq\mathrm{W}(\mathrm{end}(1)), and therefore the claim is shown. For the other direction as well as the inequality 12PoB(f,n,k)\frac{1}{2}\geq\mathrm{PoB}(f,n,k), we construct a game construction (also shown in Figure 1) such that for any utility rule ff, PoB(Gf,k)12\mathrm{PoB}(\mathrm{G}^{f},k)\leq\frac{1}{2}. The game Gf\mathrm{G}^{f} is constructed as follows with 22 agents. Let the resource set be ={r1,r2,r3,r4}\mathcal{R}=\{r_{1},r_{2},r_{3},r_{4}\} with vr1=1v_{r_{1}}=1, vr2=1v_{r_{2}}=1, vr3=f(2)v_{r_{3}}=f(2), and vr4=0v_{r_{4}}=0. For agent 11, let 𝒜1={a1,a11,a12}\mathcal{A}_{1}=\{a^{\varnothing}_{1},a^{1}_{1},a^{2}_{1}\} with a11={r1}a^{1}_{1}=\{r_{1}\} and a12={r2}a^{2}_{1}=\{r_{2}\}. For agent 22, let 𝒜2={a2,a21,a22}\mathcal{A}_{2}=\{a^{\varnothing}_{2},a^{1}_{2},a^{2}_{2}\} with a21={r3}a^{1}_{2}=\{r_{3}\} and a22={r1}a^{2}_{2}=\{r_{1}\}. It can be seen in this game, if f(2)1f(2)\leq 1, the optimal joint action is (a12,a22)(a^{2}_{1},a^{2}_{2}) resulting in a welfare of 22, and if f(2)>1f(2)>1, the optimal joint action is (a12,a21)(a^{2}_{1},a^{1}_{2}) resulting in a welfare of 1+f(2)21+f(2)\geq 2. Additionally, starting from (a1,a2)(a^{\varnothing}_{1},a^{\varnothing}_{2}), note the best response path ((a1,a2),(a11,a2)(a11,a21),,(a11,a21),(a11,a21),(a11,a22))((a^{\varnothing}_{1},a^{\varnothing}_{2}),(a^{1}_{1},a^{\varnothing}_{2})(a^{1}_{1},a^{1}_{2}),\dots,(a^{1}_{1},a^{1}_{2}),(a^{1}_{1},a^{1}_{2}),(a^{1}_{1},a^{2}_{2})) exists, ending in the action (a11,a22)(a^{1}_{1},a^{2}_{2}) with a welfare 11. Since (a11,a21)(a^{1}_{1},a^{1}_{2}) is a Nash equilibrium, both agents stay playing their first action a1a^{1} for k1k-1 rounds and agent 22 only switches to a22a^{2}_{2} in the kk-round. Therefore PoB(Gf,k)12\mathrm{PoB}(\mathrm{G}^{f},k)\leq\frac{1}{2} for this game. This game construction Gf\mathrm{G}^{f} can be extended to n>2n>2 by fixing the actions of agents i2i\geq 2 to be 𝒜i={ai,ai1}\mathcal{A}_{i}=\{a^{\varnothing}_{i},a^{1}_{i}\}, where the only nonempty action available to them is ai1={r4}a^{1}_{i}=\{r_{4}\} that selects the 0 value resource.

Since Gf\mathrm{G}^{f} may not be the worst case game, PoB(f,n,k)PoB(Gf,k)12\mathrm{PoB}(f,n,k)\leq\mathrm{PoB}(\mathrm{G}^{f},k)\leq\frac{1}{2} and the final claim is shown. ∎

Refer to caption
Figure 1: Under this nn-agent game construction Gf\mathrm{G}^{f}, outlined in Theorem 1’s proof, PoB(Gf,k)12\mathrm{PoB}(\mathrm{G}^{f},k)\leq\frac{1}{2} for any mechanism ff. There are four resources ={r1,r2,r3,r4}\mathcal{R}=\{r_{1},r_{2},r_{3},r_{4}\} with values {1,1,f(2),0}\{1,1,f(2),0\} respectively. Under the allocation (a12,a22)(a^{2}_{1},a^{2}_{2}), the resources r1r_{1}, r2r_{2}, and r4r_{4} are selected to produce a welfare of 22. However for any number of rounds k1k\geq 1, there exists a best response path that ends in an allocation that selects only the resources r1r_{1} and r4r_{4} to produce a welfare of 11. Then PoB(f,n,k)PoB(Gf,k)12\mathrm{PoB}(f,n,k)\leq\mathrm{PoB}(\mathrm{G}^{f},k)\leq\frac{1}{2}.

Observe that Theorem 1 answers Questions (a)–(c) posed in the previous section. Specifically, we have shown that for any number of rounds, the performance guarantees of fPoAf_{\mathrm{PoA}} never overcome the performance guarantees of fMCf_{MC}. Remarkably, for any fixed number of rounds kk, PoB(fMC,k)\mathrm{PoB}(f_{MC},k) remains greater than PoB(fPoA,k)\mathrm{PoB}(f_{\mathrm{PoA}},k) and the advantages of using fPoAf_{\mathrm{PoA}} only appear when the system settles to a Nash equilibrium as kk\to\infty. Furthermore, the marginal contribution rule is in fact the utility mechanism that optimizes the price of best response and that this optimal efficiency guarantee is achieved after 11 round of best response. Thus, the transient guarantees of fMCf_{MC} are quite strong in comparison to fPoAf_{\mathrm{PoA}}. Note that this is in stark contrast with their price of anarchy values, where fPoAf_{\mathrm{PoA}} significantly outperforms fMCf_{MC} [10, 28]. Clearly, there exists a trade-off between short-term and long-term performance guarantees in this setting.

In our next theorem, we characterize the trade-off between short-term and long-term performance guarantees by providing explicit expressions for the Pareto curve between PoB(f,1)\mathrm{PoB}(f,1) and PoA\mathrm{PoA} and the corresponding Pareto-optimal utility mechanisms. Surprisingly, we show that PoB(fPoA,1)\mathrm{PoB}(f_{\mathrm{PoA}},1) can be arbitrarily bad.

Theorem 2.

For a fixed PoA(f)=C[12,11e]\mathrm{PoA}(f)=C\in[\frac{1}{2},1-\frac{1}{e}], the optimal PoB(f,1)\mathrm{PoB}(f,1) is

maxfPoB(f,1)=[j=0max{j!(11CCτ=1j1τ!),0}+1]1.\small\max_{f}\mathrm{PoB}(f,1)=\Bigg{[}\sum^{\infty}_{j=0}\max\{j!(1-\frac{1-C}{C}\sum^{j}_{\tau=1}{\frac{1}{\tau!}}),0\}+1\Bigg{]}^{-1}. (19)

Furthermore, if PoA(f)=11e\mathrm{PoA}(f)=1-\frac{1}{e} for a given ff, then PoB(f,1)\mathrm{PoB}(f,1) must be equal to 0.

The price of anarchy is only taken on the interval [12,11e][\frac{1}{2},1-\frac{1}{e}], since the greatest achievable PoA(f)\mathrm{PoA}(f) is 11e1-\frac{1}{e} [10] and the price of anarchy of fMCf_{MC} is 12\frac{1}{2} (which attains the optimal PoB(f,1)\mathrm{PoB}(f,1) as shown in Theorem 1). The corresponding Pareto-optimal utility rules are given in Lemma 2. The optimal trade-off curve between PoA\mathrm{PoA} and PoB\mathrm{PoB} is plotted in Figure 2.

Refer to caption
Figure 2: All possible pairs (PoA(f),PoB(f,1))[0,1]×[0,1](\mathrm{PoA}(f),\mathrm{PoB}(f,1))\in[0,1]\times[0,1] are depicted in this figure. The attainable set where there exists a mechanism ff that can achieve either a greater PoB(f,1)\mathrm{PoB}(f,1) or PoA(f)\mathrm{PoA}(f) is depicted in green and the complement is depicted in red. The Pareto curve that results from the optimal trade-off described in Theorem 2 is depicted by the black curve. We observe that there is an extreme drop in transient performance to PoB(f,1)=0\mathrm{PoB}(f,1)=0 when using any utility rule that optimizes PoA(f)\mathrm{PoA}(f).

Theorem 2 highlights the disparity between the efficiency guarantees of the asymptotic and transient values, quantified by the pair of achievable PoA\mathrm{PoA} and PoB\mathrm{PoB}. While there is a decline in PoA\mathrm{PoA} guarantees when using the MC rule, the deterioration in PoB\mathrm{PoB} that results from using the mechanism fPoAf_{\mathrm{PoA}} that optimizes PoA\mathrm{PoA} is much more extreme. In fact, the transient performance of fPoAf_{\mathrm{PoA}} can be arbitrarily bad. This is quite surprising, it appears that a utility rule’s asymptotic performance guarantees do not reflect on its transient guarantees. This extreme trade-off has significant consequences on distributed design, especially when the short-term performance of the multi-agent system is critical, and prompts a more careful interpretation of the asymptotic results in such scenarios.

IV Proof of Theorem 2

In this section, we prove the result in Theorem 2. First, we characterize the 11-round price of best response for any utility rule ff in the next lemma. The proof of this lemma is based on a worst case game construction using a smoothness argument. The worst case game construction is well structured and is shown in Figure 3.

Lemma 1.

Given a utility rule ff and nn number of agents, the 11-round price of best response is

1PoB(f,n,1)=(j=1nf(j)minjf(j))/f(1)+1.\frac{1}{\mathrm{PoB}(f,n,1)}=\Big{(}\sum_{j=1}^{n}{f(j)}-\min_{j}{f(j)}\Big{)}/f(1)+1. (20)
Proof.

To characterize PoB(f,n,1)\mathrm{PoB}(f,n,1), we first formulate it as an optimization program, where we search over all possible set covering games with a fixed ff and nn for the worst 11-round outcome. We give a proof sketch of the correctness of the proposed optimization program in (22). Without loss of generality, we can only consider games where each agent ii has 22 actions aiopta^{\mathrm{opt}}_{i} and aibra^{\mathrm{br}}_{i}, in which aopta^{\mathrm{opt}} is the action that optimizes the welfare W\mathrm{W} and abra^{\mathrm{br}} is the action that results from a 11-round best response. Informally, the optimization program involves searching for a set covering game that maximizes W(aopt)\mathrm{W}(a^{\mathrm{opt}}) with the constraints W(abr)=1\mathrm{W}(a^{\mathrm{br}})=1 and that abra^{\mathrm{br}} is indeed the state that is achieved after a 11-round best response. Since the welfare W(a)\mathrm{W}(a) as well as Ui(a)\mathrm{U}_{i}(a) can be described fully by vrv_{r} and ff, each game is parametrized through the resource values vrv_{r} from a common resource set \mathcal{R}^{*}. Let

𝐚br(r)={i:raibr}\mathbf{a}_{\mathrm{br}}(r)=\{i\in\mathcal{I}:r\in a^{\mathrm{br}}_{i}\} (21)

denote the set of agents that select the resource rr in abra^{\mathrm{br}}. We define 𝐚opt(r)\mathbf{a}_{\mathrm{opt}}(r) similarly. Additionally, we can define 𝐚br<i(r)={1j<i:rajbr}\mathbf{a}_{\mathrm{br}}^{<i}(r)=\{1\leq j<i:r\in a^{\mathrm{br}}_{j}\} to denote the set of agents {1,,i1}\{1,\dots,i-1\} that select the resource rr in in aibra^{\mathrm{br}}_{i}. The common resource set \mathcal{R}^{*} is comprised of all the resources rr that have unique signature from 𝐚br(r)\mathbf{a}_{\mathrm{br}}(r) and 𝐚opt(r)\mathbf{a}_{\mathrm{opt}}(r). Taking the dual of the program in the previous outline results in the following program.

PoB(f,n,1)1\displaystyle\mathrm{PoB}(f,n,1)^{-1} =min{λi}i,μμ s.t.\displaystyle=\min_{\{\lambda_{i}\}_{i\in\mathcal{I}},\mu}\mu\text{ s.t.} (22)
μmin(|𝐚br(r)|,1)\displaystyle\mu\min(|\mathbf{a}_{\mathrm{br}}(r)|,1) min(|𝐚opt(r)|,1)+\displaystyle\geq\min(|\mathbf{a}_{\mathrm{opt}}(r)|,1)+ (23)
\medmathiλi[(𝟏𝐚br(r)(i)\displaystyle\medmath{\sum_{i\in\mathcal{I}}\lambda_{i}\Big{[}\big{(}\mathbf{1}_{\mathbf{a}_{\mathrm{br}}(r)}(i)} \medmath𝟏𝐚opt(r)(i))f(|𝐚br<i(r)|+1)]0r\displaystyle\medmath{-\mathbf{1}_{\mathbf{a}_{\mathrm{opt}}(r)}(i)\big{)}f(|\mathbf{a}_{\mathrm{br}}^{<i}(r)|+1)\Big{]}\geq 0}\ \forall r\in\mathcal{R}^{*}

Now we reduce the constraints of this program to give a closed form expression of PoB(f,n,k)\mathrm{PoB}(f,n,k). We first examine the constraints in (23) associated with rr\in\mathcal{R}^{*} with 𝐚br(r)=\mathbf{a}_{\mathrm{br}}(r)=\varnothing. Simplifying the corresponding constraints gives

jPλjf(1)1\sum_{j\in P}\lambda_{j}f(1)\geq 1 (24)

for any subset of agents PP\subset\mathcal{I}. Only considering P={j}P=\{j\} gives the set of binding constraints λj1/f(1)\lambda_{j}\geq 1/f(1) for any agent jj.

Now we consider the converse constraints in (23) where 𝐚br(r)\mathbf{a}_{\mathrm{br}}(r)\neq\varnothing. For this set of constraints, the binding constraint is postulated to be

μ(j=1nλjf(j)minjf(j))+1\mu\geq\Big{(}\sum_{j=1}^{n}{\lambda_{j}f(j)}-\min_{j}{f(j)}\Big{)}+1 (25)

corresponding to the resource r¯\bar{r} with 𝐚br(r¯)=\mathbf{a}_{\mathrm{br}}(\bar{r})=\mathcal{I} and 𝐚opt(r¯)={argminjf(j)}\mathbf{a}_{\mathrm{opt}}(\bar{r})=\{\arg\min_{j}f(j)\}. Under this assumption, the optimal dual variables λj\lambda_{j} are λj=1/f(1)j\lambda_{j}=1/f(1)\ \forall j. This follows from the constraint in (24) and the fact that decreasing any λj\lambda_{j} results in a lower binding constraint in (25).

Now we can verify that the constraint in (25) with 𝐚br(r¯)=\mathbf{a}_{\mathrm{br}}(\bar{r})=\mathcal{I} and 𝐚opt(r¯)={argminjf(j)}\mathbf{a}_{\mathrm{opt}}(\bar{r})=\{\arg\min_{j}f(j)\} is indeed the binding constraint under the dual variables λj=1/f(1)\lambda_{j}=1/f(1). The resulting dual program assuming 𝐚br(r)\mathbf{a}_{\mathrm{br}}(r)\neq\varnothing is

minμμ s.t.\displaystyle\min_{\mu}\mu\text{ s.t.} (26)
μ\displaystyle\mu j(𝟏𝐚br(r)(j)𝟏𝐚opt(r)(j))\displaystyle\geq\sum_{j}\big{(}\mathbf{1}_{\mathbf{a}_{\mathrm{br}}(r)}(j)-\mathbf{1}_{\mathbf{a}_{\mathrm{opt}}(r)}(j)\big{)} (27)
f(|𝐚br<j(r)|+1)/f(1)+min(|𝐚opt(r)|,1)r.\displaystyle f(|\mathbf{a}_{\mathrm{br}}^{<j}(r)|+1)/f(1)+\min(|\mathbf{a}_{\mathrm{opt}}(r)|,1)\ \forall r.

Since any valid utility rule has f(j)0f(j)\geq 0 for any jj, the terms in (27) is maximized if 𝟏𝐚br(r)(j)=1\mathbf{1}_{\mathbf{a}_{\mathrm{br}}(r)}(j)=1 for all jj which implies that 𝐚br(r)=\mathbf{a}_{\mathrm{br}}(r)=\mathcal{I}. Also note that |𝐚opt(r)|1|\mathbf{a}_{\mathrm{opt}}(r)|\leq 1, as

min(|𝐚opt(r)|,1)j𝟏𝐚opt(r)(j))f(|𝐚br<j(r)|+1)/f(1)\min(|\mathbf{a}_{\mathrm{opt}}(r)|,1)-\sum_{j}\mathbf{1}_{\mathbf{a}_{\mathrm{opt}}(r)}(j)\big{)}f(|\mathbf{a}_{\mathrm{br}}^{<j}(r)|+1)/f(1) (28)

is less binding if |𝐚opt(r)|>1|\mathbf{a}_{\mathrm{opt}}(r)|>1. Since 𝐚br(r)=\mathbf{a}_{\mathrm{br}}(r)=\mathcal{I} for the binding constraint, if |𝐚opt(r)|=1|\mathbf{a}_{\mathrm{opt}}(r)|=1, the terms in (28) reduces to 1f(j)/f(1)1-f(j)/f(1) for some jj. Similarly, if |𝐚opt(r)|=0|\mathbf{a}_{\mathrm{opt}}(r)|=0, the terms in (28) reduces to 0. Note that 1f(1)/f(1)01-f(1)/f(1)\geq 0, and so 1minj{f(j)}/f(1)01-\min_{j}\{f(j)\}/f(1)\geq 0 as well. Thus |𝐚opt(r)|=1|\mathbf{a}_{\mathrm{opt}}(r)|=1 is binding. Taking any 𝐚opt(r¯)argminjf(j)\mathbf{a}_{\mathrm{opt}}(\bar{r})\in\arg\min_{j}f(j) attains the maximum value for 1minj{f(j)}/f(1)1-\min_{j}\{f(j)\}/f(1), and so we have shown the claim that the binding constraint for μ\mu is in equation (25). Taking the binding constraint with equality gives

μ=(j=1nf(j)/f(1)minjf(j))+1,\mu=\Big{(}\sum_{j=1}^{n}{f(j)}/f(1)-\min_{j}{f(j)}\Big{)}+1,

giving the expression for PoB\mathrm{PoB} as stated. ∎

Refer to caption
Figure 3: In this worst case game example, we assume that f(n)=minjf(j)f(n)=\min_{j}{f(j)}. Then in a 11-round best response, each agent can choose r1r_{1} with v1=1v_{1}=1 successively, resulting in a welfare of 11. When agent i<ni<n moves, note that if the previous i1i-1 agents have selected r1r_{1} , agent ii is indifferent towards selecting the r1r_{1} or ri+1r_{i+1} resource. Agent nn can only select the resource r1r_{1}. The optimal allocation is when all agents select different resources, resulting in a welfare that is reported in Equation (20).

To characterize the trade-off, we now provide an explicit expression of Pareto optimal utility rules, i.e., utility rules ff that satisfy either PoB(f,1)PoB(f,1)\mathrm{PoB}(f,1)\geq\mathrm{PoB}(f^{\prime},1) or PoA(f)PoA(f)\mathrm{PoA}(f)\geq\mathrm{PoA}(f^{\prime}) for all fff^{\prime}\neq f.

Lemma 2.

For a given 𝒳0\mathcal{X}\geq 0, a utility rule ff that satisfies PoA(f)1/(1+𝒳)\mathrm{PoA}(f)\geq 1/(1+\mathcal{X}) while maximizing PoB(f,1)\mathrm{PoB}(f,1) is defined as in the following recursive formula:

f𝒳(1)\displaystyle f^{\mathcal{X}}(1) =1\displaystyle=1 (29)
f𝒳(j+1)\displaystyle f^{\mathcal{X}}(j+1) =max{jf𝒳(j)𝒳,0}.\displaystyle=\max\{jf^{\mathcal{X}}(j)-\mathcal{X},0\}. (30)
Proof.

According to a modified version of Theorem 22 in [26], the price of anarchy (PoA\mathrm{PoA}) can be written as

1PoA(f)\displaystyle\frac{1}{\mathrm{PoA}(f)} =1+(max1jn1{(j+1)f(j+1)f(1),\displaystyle=1+\Big{(}\max_{1\leq j\leq n-1}\{(j+1)f(j+1)-f(1),
jf(j)f(j+1),jf(j+1)})/f(1).\displaystyle jf(j)-f(j+1),jf(j+1)\}\Big{)}/f(1). (31)

We first show that if ff is Pareto optimal, then it must also be non-increasing. Otherwise, we show that another ff exists that achieves at least the same PoB\mathrm{PoB}, but a higher PoA\mathrm{PoA}, contradicting our assumption that ff is Pareto optimal. Assume, by contradiction, that there exists ff that is Pareto optimal and decreasing, i.e., there exists a j1j^{\prime}\geq 1, in which f(j)<f(j+1)f(j^{\prime})<f(j^{\prime}+1). Notice that switching the value f(j)f(j^{\prime}) with f(j+1)f(j^{\prime}+1) results in an unchanged PoB\mathrm{PoB} according to (20) in Lemma 1 if j>1j^{\prime}>1. We show that ff^{\prime} with the values switched has a higher PoA\mathrm{PoA} than ff.

For any 1jn11\leq j^{\prime}\leq n-1, the expressions from (IV) that include f(j)f(j^{\prime}) or f(j+1)f(j^{\prime}+1) are

(jf(j)f(1)))/f(1)+1,\displaystyle(j^{\prime}f(j^{\prime})-f(1)))/f(1)+1,
((j+1)f(j+1)f(1))/f(1)+1,\displaystyle((j^{\prime}+1)f(j^{\prime}+1)-f(1))/f(1)+1,
((j1)f(j1)f(j))/f(1)+1,\displaystyle((j^{\prime}-1)f(j^{\prime}-1)-f(j^{\prime}))/f(1)+1,
(jf(j)f(j+1))/f(1)+1,\displaystyle(j^{\prime}f(j^{\prime})-f(j^{\prime}+1))/f(1)+1,
((j+1)f(j+1)f(j+2))/f(1)+1,\displaystyle((j^{\prime}+1)f(j^{\prime}+1)-f(j^{\prime}+2))/f(1)+1,
((j1)f(j))/f(1)+1,\displaystyle((j^{\prime}-1)f(j^{\prime}))/f(1)+1,
(jf(j+1))/f(1)+1,\displaystyle(j^{\prime}f(j^{\prime}+1))/f(1)+1,

After switching, the relevant expressions for ff^{\prime} are

(jf(j+1)f(1)))/f(1)+1,\displaystyle(j^{\prime}f(j^{\prime}+1)-f(1)))/f(1)+1,
((j+1)f(j)f(1))/f(1)+1,\displaystyle((j^{\prime}+1)f(j^{\prime})-f(1))/f(1)+1,
((j1)f(j1)f(j+1))/f(1)+1,\displaystyle((j^{\prime}-1)f(j^{\prime}-1)-f(j^{\prime}+1))/f(1)+1,
(jf(j+1)f(j))/f(1)+1,\displaystyle(j^{\prime}f(j^{\prime}+1)-f(j^{\prime}))/f(1)+1,
((j+1)f(j)f(j+2))/f(1)+1,\displaystyle((j^{\prime}+1)f(j^{\prime})-f(j^{\prime}+2))/f(1)+1,
((j1)f(j+1))/f(1)+1,\displaystyle((j^{\prime}-1)f(j^{\prime}+1))/f(1)+1,
(jf(j))/f(1)+1.\displaystyle(j^{\prime}f(j^{\prime}))/f(1)+1.

Since f(j)<f(j+1)f(j^{\prime})<f(j^{\prime}+1), switching the values results in a strictly looser set of constraints, and the value of the binding constraint in (IV) for ff^{\prime} is at most the value of the binding constraint for ff. Therefore PoA(f)PoA(f)\mathrm{PoA}(f)\leq\mathrm{PoA}(f^{\prime}). Note that if j=1j^{\prime}=1, then PoB(f)>PoB(f)\mathrm{PoB}(f^{\prime})>\mathrm{PoB}(f) as well. This contradicts our assumption that ff is Pareto optimal.

Now we restrict our focus to non-increasing ff with f(1)=1f(1)=1. Observe that for any ff, the transformation f(j)=f(j)/Cf^{\prime}(j)=f(j)/C, C>0C>0, doesn’t change the price of best response and price of anarchy. Thus, we can take f(1)=1f(1)=1 without loss of generality. Given a ff that satisfies the above constraints, the price of anarchy is

1PoA(f)=1+max1jn1{jf(j)f(j+1),(n1)f(n)},\frac{1}{\mathrm{PoA}(f)}=1+\\ \max_{1\leq j\leq n-1}\{jf(j)-f(j+1),(n-1)f(n)\}, (32)

as detailed in Corollary 22 in [26]. Let

𝒳f=max1jn1{jf(j)f(j+1),(n1)f(n)}.\mathcal{X}_{f}=\max_{1\leq j\leq n-1}\{jf(j)-f(j+1),(n-1)f(n)\}. (33)

For ff to be Pareto optimal, we claim that 𝒳f=jf(j)f(j+1)\mathcal{X}_{f}=jf(j)-f(j+1) must hold for all jj. Consider any other ff^{\prime} with 𝒳f=𝒳f\mathcal{X}_{f}=\mathcal{X}_{f^{\prime}}. It follows that PoA(f)=PoA(f)=1/(1+𝒳f)\mathrm{PoA}(f)=\mathrm{PoA}(f^{\prime})=1/(1+\mathcal{X}_{f}). By induction, we show that f(j)f(j)f(j)\leq f^{\prime}(j) for all jj. The base case is satisfied, as 1=f(1)f(1)=11=f(1)\leq f^{\prime}(1)=1. Under the assumption f(j)f(j)f(j)\leq f^{\prime}(j), we also have that

jf(j)𝒳f=f(j+1)f(j+1)=jf(j)𝒳fj,jf(j)-\mathcal{X}_{f}=f(j+1)\leq f^{\prime}(j+1)=jf(j)-\mathcal{X}^{j}_{f}, (34)

where 𝒳fj=jf(j)f(j+1)𝒳f\mathcal{X}^{j}_{f}=jf(j)-f(j+1)\leq\mathcal{X}_{f} by definition in (33), and so f(j)f(j)f(j)\leq f^{\prime}(j) for all jj. Therefore the summation j=1nf(j)minjf(j)\sum_{j=1}^{n}{f(j)}-\min_{j}{f(j)} in (20) is diminished and PoB(f)PoB(f)\mathrm{PoB}(f)\geq\mathrm{PoB}(f^{\prime}), proving our claim. As ff must satisfy f(j)0f(j)\geq 0 for all jj to be a valid utility rule, f(j+1)f(j+1) is set to be max{jf(j)𝒳,0}\max\{jf(j)-\mathcal{X},0\}. Then we get the recursive definition for the maximal f𝒳f^{\mathcal{X}}. Finally, we note that for infinite nn, 𝒳1e1\mathcal{X}\leq\frac{1}{e-1} is not achievable, as shown in [10]. ∎

With the two previous lemmas, we can move to showing the result detailed in Theorem 2.

Proof of Theorem 2.

We first characterize a closed form expression of the maximal utility rule f𝒳f^{\mathcal{X}}, which is given in Lemma 2. We fix 𝒳\mathcal{X} so that PoA(f𝒳)=1𝒳+1\mathrm{PoA}(f^{\mathcal{X}})=\frac{1}{\mathcal{X}+1} = CC. To calculate the expression for f𝒳f^{\mathcal{X}} for a given 𝒳\mathcal{X}, a corresponding time varying discrete time system to (30) is constructed as follows.

x[m+1]\displaystyle x[m+1] =mx[m]u[m],\displaystyle=mx[m]-u[m], (35)
y[m]\displaystyle y[m] =max{x[m],0},\displaystyle=\max\{x[m],0\}, (36)
u[m]\displaystyle u[m] =𝒳,\displaystyle=\mathcal{X}, (37)
x[1]\displaystyle x[1] =1,\displaystyle=1, (38)

where y[m]y[m] corresponds to f𝒳(j)f^{\mathcal{X}}(j). Solving for the explicit expression for y[m]y[m] gives

y[1]\displaystyle y[1] =1\displaystyle=1 (39)
y[m]\displaystyle y[m] =max[=1m1𝒳(τ=1m2=τ+1m1)𝒳,0]m>1.\displaystyle=\max\Big{[}\prod_{\ell=1}^{m-1}\ell-\mathcal{X}\big{(}\sum_{\tau=1}^{m-2}\prod_{\ell=\tau+1}^{m-1}\ell\big{)}-\mathcal{X},0\Big{]}\ \ m>1. (40)

Simplifying the expression and substituting for f𝒳(j)f^{\mathcal{X}}(j) gives

f(j)=max[(j1)!(1𝒳τ=1j11τ!),0]j1.f(j)=\max\Big{[}(j-1)!(1-\mathcal{X}\sum_{\tau=1}^{j-1}\frac{1}{\tau!}\big{)},0\Big{]}\ \ \ j\geq 1. (41)

Substituting the expression for the maximal ff into (20) gives the PoB\mathrm{PoB}. Notice that for 𝒳1e1\mathcal{X}\geq\frac{1}{e-1}, limjf(j)=0\lim_{j\to\infty}f(j)=0, and therefore minjf(j)=0\min_{j}{f(j)}=0. Shifting variables j=j+1j^{\prime}=j+1, we get the statement in (19).

Finally, we prove that if PoA(f)=11e\mathrm{PoA}(f)=1-\frac{1}{e} for a given ff, then PoB(f,1)\mathrm{PoB}(f,1) must be equal to 0. According to (19), among utility rules ff with PoA(f)=11e\mathrm{PoA}(f)=1-\frac{1}{e}, the best achievable PoB(f,1)\mathrm{PoB}(f,1) can be written as

maxfPoB(f,1)\displaystyle\max_{f}\mathrm{PoB}(f,1) =(j=0j!τ=j+11τ!e1+1)1\displaystyle=\Big{(}\sum_{j=0}^{\infty}{\frac{j!\sum_{\tau=j+1}^{\infty}{\frac{1}{\tau!}}}{e-1}}+1\Big{)}^{-1}
(1e1j=01j+1+1)10\displaystyle\leq\Big{(}\frac{1}{e-1}\sum_{j=0}^{\infty}{\frac{1}{j+1}}+1\Big{)}^{-1}\leq 0

Since 0PoB(f,1)maxfPoB(f,1)00\leq\mathrm{PoB}(f,1)\leq\max_{f}\mathrm{PoB}(f,1)\leq 0, it follows that PoB(f,1)=0\mathrm{PoB}(f,1)=0. ∎

V Simulations

We analyze our theoretical results empirically through a Monte Carlo simulation, presented in Figure 5. We simulate 200200 instances where a random game G\mathrm{G} is generated and the welfare along a 44-round best response path is recorded. Each game G\mathrm{G} has 1010 agents and two sets of 1010 resources, where the values vrv_{r} are drawn uniformly from the continuous interval [0,1][0,1]. Each agent in the game has two singleton actions, where in one action the agent selects a resource in the first set and in the other action the agent selects a resource in the second set, as seen in Figure 4. The two resources in each agent’s actions are selected uniformly, at random. These parameters were chosen to induce a natural set of game circumstances that have a rich best response behavior. Additionally, the game is either endowed with the utility rule fMCf_{MC} that optimizes PoB(f,1)\mathrm{PoB}(f,1) or fPoAf_{\mathrm{PoA}} that optimizes PoA(f)\mathrm{PoA}(f). Then the best response round algorithm is performed, where each agent successively performs a best response. The ratio of the welfare achieved at each time step following the two utility rules fMCf_{MC}, fPoAf_{\mathrm{PoA}} is displayed in Figure 5.

Refer to caption
Figure 4: An example of a game that was generated within the Monte Carlo simulation described in Section V. The highlighted boxes represent the possible single resource selections that each agent can select. The two sets of resources are divided by a bold line. For each game generated, the welfare ratio between the best response sequences corresponding with the utility rules fMCf_{MC} and fPoAf_{\mathrm{PoA}} was recorded at each time step.
Refer to caption
Figure 5: A Monte Carlo simulation with 1010 agents and 22 set of resources was conducted, where the utility rules fMCf_{MC} and fPoAf_{\mathrm{PoA}} were used in separate runs. All the agents are randomly assigned a single resource from each set in their 22 actions. The ratio WMC(a)/WPoA(a)\mathrm{W}_{MC}(a)/\mathrm{W}_{\mathrm{PoA}}(a) (from using either of the utility rules) is plotted at each time step along the two better response paths. The average ratio across all runs is displayed in black and the minimum and maximum ratios are outlined with a red line. While on average, the performance from using fMCf_{MC} is greater than fPoAf_{\mathrm{PoA}} for 11 round, this performance difference is recuperated after running the best response algorithm for more steps.

Several significant observations can be made from Figure 5. Observe that after one round (after 3030 best response steps), the performance of fMCf_{MC} exceeds the performance fPoAf_{\mathrm{PoA}} on average. This is indicative of the trade-off shown in Theorem 2. Furthermore, running the best-response algorithm for additional steps recovers the performance difference between fMCf_{MC} and fPoAf_{\mathrm{PoA}}, with fPoAf_{\mathrm{PoA}} having a slightly better end performance on average. These results support our claim that optimizing for asymptotic results may induce sub-optimal system behavior in the short term, and those results must be carefully interpreted for application in multi-agent scenarios. By analyzing the performance directly along these dynamics, we see that a richer story emerges in both the theoretical and the empirical results.

VI Conclusion

There are many works that focus on efficiency guarantees for the asymptotic behavior of multi-agent systems under the game-theoretic notion of Nash equilibrium. However, much less is understood about the efficiency guarantees in the transient behavior of such systems. In this paper, we examine a natural class of best response dynamics and characterize the efficiency along the resulting transient states. We characterize a utility design that performs optimally in the transient for the class of set covering games. Furthermore, we complement these results by providing an explicit characterization of the efficiency trade-off between 11-round best response and Nash equilibrium that can result from a possible utility design. From these results, we observe that there is a significant misalignment between the optimization of transient and asymptotic behavior for these systems. Interesting future directions include extending our results to a more general class of games and considering other classes of learning dynamics from the literature.

References

  • [1] Z. Han, D. Niyato, W. Saad, T. Başar, and A. Hjørungnes, Game theory in wireless and communication networks: theory, models, and applications. Cambridge university press, 2012.
  • [2] J. J. Roldán, J. Del Cerro, and A. Barrientos, “Should we compete or should we cooperate? applying game theory to task allocation in drone swarms,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5366–5371, IEEE, 2018.
  • [3] C.-C. Hsu, A. Ajorlou, M. Yildiz, and A. Jadbabaie, “Information disclosure and network formation in news subscription services,” in 2020 59th IEEE Conference on Decision and Control (CDC), pp. 5586–5591, IEEE, 2020.
  • [4] A. R. Hota and S. Sundaram, “Game-theoretic vaccination against networked sis epidemics and impacts of human decision-making,” IEEE Transactions on Control of Network Systems, vol. 6, no. 4, pp. 1461–1472, 2019.
  • [5] K. Paarporn, R. Chandan, M. Alizadeh, and J. R. Marden, “Characterizing the interplay between information and strength in blotto games,” in 2019 IEEE 58th Conference on Decision and Control (CDC), pp. 5977–5982, 2019.
  • [6] A. Vetta, “Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions,” in The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., pp. 416–425, 2002.
  • [7] J. Martinez-Piazuelo, N. Quijano, and C. Ocampo-Martinez, “Decentralized charging coordination of electric vehicles using multi-population games,” in 2020 59th IEEE Conference on Decision and Control (CDC), pp. 506–511, IEEE, 2020.
  • [8] E. S. Lee, D. Shishika, and V. Kumar, “Perimeter-defense game between aerial defender and ground intruder,” in 2020 59th IEEE Conference on Decision and Control (CDC), pp. 1530–1536, IEEE, 2020.
  • [9] J. R. Marden and J. S. Shamma, “Game theory and distributed control,” in Handbook of game theory with economic applications, vol. 4, pp. 861–899, Elsevier, 2015.
  • [10] M. Gairing, “Covering games: Approximation through non-cooperation,” in International Workshop on Internet and Network Economics, pp. 184–195, Springer, 2009.
  • [11] U. Feige, “A threshold of ln n for approximating set cover,” Journal of the ACM (JACM), vol. 45, no. 4, pp. 634–652, 1998.
  • [12] D. Monderer and L. S. Shapley, “Potential games,” Games and economic behavior, vol. 14, no. 1, pp. 124–143, 1996.
  • [13] H. P. Young, “The evolution of conventions,” Econometrica: Journal of the Econometric Society, pp. 57–84, 1993.
  • [14] J. F. Nash et al., “Equilibrium points in n-person games,” Proceedings of the national academy of sciences, vol. 36, no. 1, pp. 48–49, 1950.
  • [15] W. H. Sandholm, “Evolutionary game theory,” Complex Social and Behavioral Systems: Game Theory and Agent-Based Models, pp. 573–608, 2020.
  • [16] G. Christodoulou, V. S. Mirrokni, and A. Sidiropoulos, “Convergence and approximation in potential games,” in Annual Symposium on Theoretical Aspects of Computer Science, pp. 349–360, Springer, 2006.
  • [17] V. Bilò, A. Fanelli, M. Flammini, and L. Moscardelli, “Performances of one-round walks in linear congestion games,” in International Symposium on Algorithmic Game Theory, pp. 311–322, Springer, 2009.
  • [18] V. Bilo, “A unifying tool for bounding the quality of non-cooperative solutions in weighted congestion games,” Theory of Computing Systems, vol. 62, no. 5, pp. 1288–1317, 2018.
  • [19] V. S. Mirrokni and A. Vetta, “Convergence issues in competitive games,” in Approximation, randomization, and combinatorial optimization. algorithms and techniques, pp. 183–194, Springer, 2004.
  • [20] A. Fanelli, M. Flammini, and L. Moscardelli, “The speed of convergence in congestion games under best-response dynamics,” in International Colloquium on Automata, Languages, and Programming, pp. 796–807, Springer, 2008.
  • [21] D. S. Hochbaum and W. Maass, “Approximation schemes for covering and packing problems in image processing and vlsi,” Journal of the ACM (JACM), vol. 32, no. 1, pp. 130–136, 1985.
  • [22] S. Mandal, N. Patra, and M. Pal, “Covering problem on fuzzy graphs and its application in disaster management system,” Soft Computing, vol. 25, no. 4, pp. 2545–2557, 2021.
  • [23] J. M. Nash, “Optimal allocation of tracking resources,” in 1977 IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and A Special Symposium on Fuzzy Set Theory and Applications, pp. 1177–1180, IEEE, 1977.
  • [24] C.-F. Huang and Y.-C. Tseng, “The coverage problem in a wireless sensor network,” Mobile networks and Applications, vol. 10, no. 4, pp. 519–528, 2005.
  • [25] M. Goemans, V. Mirrokni, and A. Vetta, “Sink equilibria and convergence,” in 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS’05), pp. 142–151, IEEE, 2005.
  • [26] D. Paccagnan and J. R. Marden, “Utility design for distributed resource allocation–part ii: Applications to submodular, covering, and supermodular problems,” arXiv preprint arXiv:1807.01343, 2018.
  • [27] A. Vetta, “Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions,” in The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., pp. 416–425, IEEE, 2002.
  • [28] V. Ramaswamy, D. Paccagnan, and J. R. Marden, “Multiagent maximum coverage problems: The trade-off between anarchy and stability,” in 2019 18th European Control Conference (ECC), pp. 1043–1048, IEEE, 2019.