
Distributed Task Allocation for Self-Interested Agents with Partially Unknown Rewards

Nirabhra Mandal, Mohammad Khajenejad,
and Sonia Martínez
This work is supported by the ARL grant W911NF-23-2-0009. Nirabhra Mandal, Mohammad Khajenejad, and Sonia Martínez are with the Mechanical & Aerospace Engineering Department, University of California San Diego. {nmandal,mkhajenejad,soniamd}@ucsd.edu.
Abstract

This paper provides a novel solution to a task allocation problem, by which a group of agents decides on the assignment of a discrete set of tasks in a distributed manner. In this setting, heterogeneous agents have individual preferences and associated rewards for doing each task; however, these rewards are only known asymptotically. We start by formulating the assignment problem as a combinatorial partition game for known rewards, with no constraint on the number of tasks per agent. We then relax this into a weight game which, together with the former, is shown to contain the optimal task allocation in its set of Nash Equilibria (NE). We then propose a projected, best-response, ascending gradient dynamics (PBRAG) that converges to a NE in finite time. This forms the basis of a distributed online version that can deal with a converging sequence of rewards by means of an agreement sub-routine. We present simulations that support our results.

Index Terms:
Best response, partition game, projected gradient ascent, unknown reward, weight game

1 Introduction

A prototypical multi-agent coordination problem aims to find an efficient assignment of a group of agents to complete a collection of tasks. These tasks can range from abstract sets of objectives to specific physical jobs, the nature of which may not be completely known. In addition, the agents composing the group may have heterogeneous capabilities, and react to different sets of incentives that are being learned progressively. This necessitates novel task-assignment algorithms that can adapt and react online as new information arises. Motivated by this, we study a discrete task allocation problem modeled as a game of self-interested agents that have partial knowledge of their rewards. This requires addressing the problem's combinatorial nature, and designing provably correct distributed dynamics that adapt to dynamic rewards revealed online. To the best of our knowledge, algorithms that combine all these features are not available in the literature.

Literature review

The problem of task allocation with known rewards has been widely considered; see e.g. [1, 2, 3]. A centralized solution to this problem, where the number $m$ of tasks and agents are equal and a task-agent matching is sought, is the optimization-based Hungarian algorithm [4], and its distributed version [5]. The latter, which reproduces the Hungarian algorithm locally, requires tracking of the agents' identities associated with each task, has a time complexity of $O(m^{3})$ and a communication cost of $O(m\log m)$ (per communication round). Thus, the algorithm can be computationally and memory-intensive for large problems, and hard to adapt as new tasks are generated or their valuations change online. The work in [6] provides a tractable, sub-optimal solution to the same NP-hard problem, while the research in [7] showed that the sub-optimality can be resolved by restricting heterogeneous agents to be of certain types. In the same vein, the works in [8, 9, 10] consider submodular functions, which allow rewards to take any non-negative value. However, submodular optimization can be applied only in specific domains where the property naturally arises, such as in certain economics and distributed sensing problems. Alternatively, a well-known approach to (unconstrained) task assignment problems is given by $k$-means clustering and Lloyd's algorithm [11]. By interpreting tasks as generated by a probability distribution, this approach can handle tasks generated dynamically [12, 13, 14, 15]. However, it is well known that Lloyd's algorithm is sensitive to the initial task assignment for a small number of agents, and converges to a local minimum.

Game-theoretic models have also been proposed to find solutions to task allocation problems. For sensor networks, each agent is equipped with an appropriate utility function [16, 17, 18] and the optimal task allocation is related to the Nash equilibrium of this game. Any Nash-seeking algorithm [19] returns a solution; but often, these algorithms require strong assumptions on the utility functions and their derivatives. Distributed versions of Nash-seeking with consensus in continuous time have been explored in [20, 21], while [22, 23] addressed the problem in discrete time. All the work in [20]-[23] assumes complete and perfect information for the agents, while in practice, agents may have limited or imperfect information about the tasks and the capabilities of other agents. Potential games can be used in this regard, but they do not work when the reward parameters are unknown. To that end, [18] characterizes transient behavior for set covering games, and [24] looks at a general potential game approach for task allocation. In the latter case, the agents are homogeneous and tasks have the same rewards for all agents.

Contributions

We consider a task-assignment problem where a number of agents is to be matched to an unrestricted set of tasks. In the considered formulation, the number of tasks per agent is not constrained, yet the optimal assignment problem remains combinatorial as the number of tasks is discrete. To deal with arbitrary heterogeneous agents, we derive a game-theoretic partition problem formulation that favors task distribution. We then relax the game into a weight game, one per task. We obtain characterizations of the NE of each game and their relationship, and identify conditions under which the NE lead to an optimal solution of the original assignment problem. Leveraging the relaxed formulation, and under a full-information assumption, we derive a projected best-response dynamics that is shown to converge to the optimal task allocation in finite time. This forms the basis for a new distributed algorithm, d-PBRAG, which does not require knowledge of other agent identities, and converges to the optimal task allocation, also in finite time, as rewards are revealed online.

2 Preliminaries

Here, we formalize the notations and briefly list some well-known concepts that are used to solve the problem formulated in the following section.

2-A Notations

The sets of real numbers, non-negative real numbers, and non-negative integers are denoted by $\mathbb{R}$, $\mathbb{R}_{\geq 0}$, and $\mathbb{Z}_{\geq 0}$, respectively. For a set $\mathcal{S}$, $|\mathcal{S}|$ denotes its cardinality, $2^{\mathcal{S}}$ represents the class of all its subsets, $\mathcal{S}^{n}$ denotes the $n$-fold Cartesian product of $\mathcal{S}$ with itself, and $\mathcal{S}^{n\times m}$ collects all $n\times m$ matrices whose $(i,j)^{\text{th}}$ entry lies in $\mathcal{S}$. Given $\mathbf{M}\in\mathcal{S}^{n\times m}$, $m^{j}_{i}$ is its $(i,j)^{\text{th}}$ entry, and $\mathbf{m}_{i}^{\top}\in\mathcal{S}^{m}$ (resp. $\mathbf{m}^{j}\in\mathcal{S}^{n}$) is its $i^{\text{th}}$ row (resp. its $j^{\text{th}}$ column). For $x\in\mathbb{R}$, $[x]_{0}^{1}:=\max\{0,\min\{x,1\}\}$. For a set $\mathcal{S}$, define $\operatorname{max}^{(2)}\mathcal{S}:=\max\{s\in\mathcal{S}\,|\,s\neq\max\mathcal{S}\}$. For a vector $\mathbf{x}\in\mathbb{R}^{n}$ and a set $\mathcal{S}\subseteq\mathbb{R}^{n}$, $d(\mathbf{x},\mathcal{S}):=\inf_{\mathbf{y}\in\mathcal{S}}\|\mathbf{x}-\mathbf{y}\|_{1}$ is the distance of the vector from the set. Lastly, the empty set is denoted by $\varnothing$.

2-B Game theory

A strategic form game [25] is a tuple $\mathscr{G}:=\left<\mathcal{A},\{\mathcal{S}_{i}\}_{i\in\mathcal{A}},\{\psi_{i}\}_{i\in\mathcal{A}}\right>$ consisting of the following components:

  1. a set of players (or agents) $\mathcal{A}$;

  2. a set of strategies $s_{i}\in\mathcal{S}_{i}$ available to each $i\in\mathcal{A}$;

  3. a set of utility functions $\psi_{i}:\times_{i\in\mathcal{A}}\mathcal{S}_{i}\to\mathbb{R}$ over the strategy profiles of all the agents.

In what follows, $s_{-i}$ denotes the strategy profile of all players other than $i\in\mathcal{A}$. Next, we formally state the definition of the NE of a strategic form game.

Definition 2.1 (Nash equilibrium).

The strategy profile $(\widehat{s}_{i},\widehat{s}_{-i})$ is a Nash equilibrium (NE) of $\mathscr{G}$ if and only if

\psi_{i}(\widehat{s}_{i},\widehat{s}_{-i})\geq\psi_{i}(s_{i},\widehat{s}_{-i}),\,\,\forall s_{i}\in\mathcal{S}_{i},\,\,\forall i\in\mathcal{A}.

$\mathcal{NE}(\mathscr{G})$ denotes the set of all Nash equilibria of $\mathscr{G}$. $\bullet$

2-C Graph theory

A directed graph [26] $\mathcal{G}:=(\mathcal{A},\mathcal{E})$ is a tuple consisting of (a) a set of nodes (here, agents $\mathcal{A}$); (b) a set of arcs $\mathcal{E}\subseteq\mathcal{A}\times\mathcal{A}$ between the nodes. The set $\mathcal{N}_{i}:=\{j\in\mathcal{A}\,|\,(j,i)\in\mathcal{E}\}$ denotes the (in-)neighbors of node $i\in\mathcal{A}$ and $\overline{\mathcal{N}}_{i}:=\mathcal{N}_{i}\cup\{i\}$. A path is an ordered set of non-repeating nodes such that each pair of adjacent nodes belongs to $\mathcal{E}$. The graph $\mathcal{G}$ is said to be strongly connected if there exists a path from every node to every other node. The diameter of the graph, $\mathrm{diam}(\mathcal{G})$, is the length of the longest shortest path between any two nodes.

3 Problem Formulation

A group of agents $\mathcal{A}:=\{1,\cdots,n\}$ is to complete a set of tasks $\mathcal{Q}:=\{1,\cdots,m\}$, where possibly $n\neq m$, in a distributed manner. For this purpose, each agent $i\in\mathcal{A}$ encodes via $\phi_{i}:\mathcal{Q}\to\mathbb{R}_{\geq 0}$ the importance of each task (the higher $\phi_{i}(q)$, the larger the agent's capability/fondness for $q\in\mathcal{Q}$) and via $r_{i}:\mathcal{Q}\to\mathbb{R}_{\geq 0}$ the reward for completing each task. For the sake of brevity, define $f_{i}(q):=r_{i}(q)\phi_{i}(q)\geq 0$, $\forall q\in\mathcal{Q}$, $\forall i\in\mathcal{A}$. An optimal task assignment is a solution to

\max_{\mathcal{P}=(\mathcal{V}_{1},\cdots,\mathcal{V}_{n})\subseteq\mathcal{Q}^{n}}\quad J(\mathcal{P}):=\sum_{i\in\mathcal{A}}\sum_{q\in\mathcal{V}_{i}}f_{i}(q),   (1a)
\mathrm{s.t.}\quad\bigcup_{i\in\mathcal{A}}\mathcal{V}_{i}=\mathcal{Q};\quad\mathcal{V}_{i}\cap\mathcal{V}_{j}=\varnothing,\,\,\mathrm{if}\,\,i\neq j.   (1b)

Here, $\mathcal{V}_{i}\subseteq\mathcal{Q}$ is the set of tasks assigned to agent $i\in\mathcal{A}$ and $\mathcal{P}=(\mathcal{V}_{1},\cdots,\mathcal{V}_{n})$ is the ordered collection of sets that defines a partition of $\mathcal{Q}$ (as in (1b)).
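
As a small numerical illustration of (1), the sketch below evaluates $J(\mathcal{P})$ and checks the partition constraint (1b) for a candidate assignment; the reward values used here are hypothetical.

from itertools import chain

f = {1: {1: 0.4, 2: 0.9, 3: 0.1},
     2: {1: 0.7, 2: 0.2, 3: 0.3}}        # hypothetical rewards f_i(q), n = 2, m = 3
tasks = {1, 2, 3}

def is_partition(P):
    # Constraint (1b): the cells cover Q and are pairwise disjoint.
    cells = list(P.values())
    union = set(chain.from_iterable(cells))
    return union == tasks and sum(len(c) for c in cells) == len(union)

def J(P):
    # Objective (1a): total reward of the assigned tasks.
    return sum(f[i][q] for i, V in P.items() for q in V)

P = {1: {2}, 2: {1, 3}}                  # V_1 = {2}, V_2 = {1, 3}
print(is_partition(P), J(P))             # True 1.9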

The group of agents is to compute an optimal partition of the task set $\mathcal{Q}$ on their own. Naturally, each agent $i\in\mathcal{A}$ aims to get the tasks $q\in\mathcal{Q}$ for which $f_{i}(q)$ is the largest. This motivates the following definition.

Definition 3.1 (Task specific dominating agent).

An agent $i\in\mathcal{A}$ is said to be a dominating agent for task $q\in\mathcal{Q}$ (or $i$ is dominating for $q$), if $f_{i}(q)\geq f_{j}(q)$, $\forall j\in\mathcal{A}$. If a task $q\in\mathcal{Q}$ has exactly one dominating agent, we say that there exists a unique dominating agent for task $q$. $\bullet$

The collection of possible strategies for each agent is a combinatorial class, which grows exponentially as the number of tasks $m$ increases. To address this problem, we first assume that each $i\in\mathcal{A}$ measures the utility of a subset of tasks $\mathcal{V}_{i}\subseteq\mathcal{Q}$ via

H_{i}(\mathcal{V}_{i},\mathcal{V}_{-i}):=\sum_{q\in\mathcal{V}_{i}}\Big[f_{i}(q)-\max_{j\in\mathcal{A}\setminus\{i\},\,q\in\mathcal{V}_{j}}f_{j}(q)\Big].   (2)

This leads to the partition game

\mathscr{G}_{P}:=\left<\mathcal{A},\{2^{\mathcal{Q}}\}_{i\in\mathcal{A}},\{H_{i}\}_{i\in\mathcal{A}}\right>,

where the strategy of each agent is to choose a subset $\mathcal{V}_{i}$ of $\mathcal{Q}$ to maximize $H_{i}$. In this way, agent strategies are no longer required to form a valid partition, but the utility in (2) penalizes each agent for taking tasks that others have chosen.
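
A minimal sketch of the utility (2); here the inner max over an empty set (no other agent holds task $q$) is taken to be zero, an assumption made for illustration and consistent with the non-negative rewards.

f = {1: {1: 0.4, 2: 0.9}, 2: {1: 0.7, 2: 0.2}}   # hypothetical rewards f_i(q)

def H(i, P):
    # Utility (2) of agent i under the strategy profile P = {agent: set of chosen tasks}.
    total = 0.0
    for q in P[i]:
        rivals = [f[j][q] for j in P if j != i and q in P[j]]
        total += f[i][q] - (max(rivals) if rivals else 0.0)   # empty max taken as 0 (assumption)
    return total

print(H(1, {1: {1, 2}, 2: {1}}))   # (0.4 - 0.7) + (0.9 - 0) = 0.6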

Second, we further relax this game by reducing the decision of each agent $i\in\mathcal{A}$ regarding task $q\in\mathcal{Q}$ to the computation of a weight $w^{q}_{i}\in[0,1]$. Briefly, this defines $\mathbf{W}\in[0,1]^{n\times m}$ as the matrix whose $(i,q)^{\text{th}}$ entry is $w^{q}_{i}$. Thus, $\mathbf{w}_{i}^{\top}\in[0,1]^{m}$ (resp. $\mathbf{w}^{q}\in[0,1]^{n}$) represents the weights that agent $i\in\mathcal{A}$ gives to each task (resp. the weights given to task $q\in\mathcal{Q}$ by each agent). Agent $i\in\mathcal{A}$ is equipped with the utility function:

U_{i}(\mathbf{w}_{i},\mathbf{w}_{-i})=\sum_{q\in\mathcal{Q}}\Big[f_{i}(q)w^{q}_{i}-\max_{j\in\mathcal{A}\setminus\{i\}}f_{j}(q)\,w^{q}_{j}\,w^{q}_{i}\Big],   (3)

which collectively define the weight game

\mathscr{G}_{W}:=\left<\mathcal{A},\mathbf{W},\{U_{i}\}_{i\in\mathcal{A}}\right>.

In this way, the product of weights in the second term of the sum in (3) relaxes the check on overlapping tasks in (2). In this paper, we ignore the trivial case where all agents get the same payoff for a task, as stated in the following.

Assumption 3.2 (Non-trivial task assignment).

Not all agents are dominating for each task $q\in\mathcal{Q}$. $\bullet$
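
As a numerical illustration of the weight-game utility (3) introduced above (the rewards and weights below are hypothetical):

f = {1: {1: 0.4, 2: 0.9}, 2: {1: 0.7, 2: 0.2}}   # hypothetical rewards f_i(q)
W = {1: {1: 1.0, 2: 1.0}, 2: {1: 0.5, 2: 0.0}}   # hypothetical weights w_i^q

def U(i):
    # Utility (3): the product w_j^q * w_i^q only penalizes overlap when both weights are positive.
    total = 0.0
    for q in f[i]:
        rival = max(f[j][q] * W[j][q] for j in f if j != i)
        total += f[i][q] * W[i][q] - rival * W[i][q]
    return total

print(U(1))   # (0.4 - 0.7*0.5)*1.0 + (0.9 - 0.2*0.0)*1.0 = 0.95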

The above framework allows us to deal with the case where the $f_{i}(q)$ are unknown to the agents, but where these values can be learned progressively by an external mechanism until convergence. More precisely, we assume the following.

Assumption 3.3 (Converging reward sequence).

For each $i\in\mathcal{A}$, $q\in\mathcal{Q}$, there exists a sequence $\{z^{q}_{i}(t)\}_{t\in\mathbb{Z}_{\geq 0}}$ such that $z^{q}_{i}(t)\to f_{i}(q)$ as $t\to\infty$. $\bullet$

In what follows, we first study the games when the reward parameters are known. Then we adapt the results for the case when only a converging reward sequence is available.

Now we formally state the goals of this work.

Problem 3.4.

Given the aforementioned setup and the non-trivial task assignment assumption, find

  1. a relationship between the NE of $\mathscr{G}_{P}$ and $\mathscr{G}_{W}$,

  2. a relationship between the NE and optimal partitions according to (1),

  3. a distributed algorithm that converges to the NE of the limiting weight game $\mathscr{G}_{W}$ under the converging reward sequence assumption. $\bullet$

4 On Nash Equilibria and Optimal Partitions

We start by addressing the first two problems above. Thus, we first characterize the NE of the partition game $\mathscr{G}_{P}$.

Lemma 4.1 (Nash equilibria of $\mathscr{G}_{P}$).

The strategy $(\widehat{\mathcal{V}}_{i},\widehat{\mathcal{V}}_{-i})\in\mathcal{NE}(\mathscr{G}_{P})$ if and only if:

  1. for each $q\in\mathcal{Q}$, $\exists\,i\in\mathcal{A}$ dominating for $q$ such that $q\in\widehat{\mathcal{V}}_{i}$;

  2. if $j$ is not a dominating agent for task $q$, then $q\not\in\widehat{\mathcal{V}}_{j}$.

Proof. First, we show the necessity of Properties 1 and 2. Suppose that $(\widehat{\mathcal{V}}_{i},\widehat{\mathcal{V}}_{-i})\in\mathcal{NE}(\mathscr{G}_{P})$. We prove Property 1 by contradiction and assume $\exists\,q\in\mathcal{Q}$ such that $\forall i\in\mathcal{A}$ dominating for task $q$, $q\notin\widehat{\mathcal{V}}_{i}$. Pick one such agent $i$ and take $\mathcal{V}_{i}=\widehat{\mathcal{V}}_{i}\cup\{q\}$. Then,

H_{i}(\mathcal{V}_{i},\widehat{\mathcal{V}}_{-i})-H_{i}(\widehat{\mathcal{V}}_{i},\widehat{\mathcal{V}}_{-i})=f_{i}(q)-\max_{k\neq i,\,q\in\widehat{\mathcal{V}}_{k}}f_{k}(q)>0.

The inequality is strict since $i$ is dominating and the max is over agents that are not dominating for $q$ (by assumption). This contradicts $(\widehat{\mathcal{V}}_{i},\widehat{\mathcal{V}}_{-i})\in\mathcal{NE}(\mathscr{G}_{P})$.

The necessity of Property 2 also follows by contradiction. Suppose $\exists\,q\in\mathcal{Q}$ and $j\in\mathcal{A}$ not dominating for $q$ with $q\in\widehat{\mathcal{V}}_{j}$. From Property 1, there is an $i\in\mathcal{A}$ dominating for $q$ with $q\in\widehat{\mathcal{V}}_{i}$. Thus, with strategy $(\widehat{\mathcal{V}}_{j},\widehat{\mathcal{V}}_{-j})$,

f_{j}(q)-\max_{k\neq j,\,q\in\widehat{\mathcal{V}}_{k}}f_{k}(q)=f_{j}(q)-\max_{k\in\mathcal{A}}f_{k}(q)<0.

Now, consider the strategy $\mathcal{V}_{j}=\widehat{\mathcal{V}}_{j}\setminus\{q\}$. It follows that $H_{j}(\mathcal{V}_{j},\widehat{\mathcal{V}}_{-j})-H_{j}(\widehat{\mathcal{V}}_{j},\widehat{\mathcal{V}}_{-j})>0$, which contradicts $(\widehat{\mathcal{V}}_{j},\widehat{\mathcal{V}}_{-j})\in\mathcal{NE}(\mathscr{G}_{P})$.

Now, we show the sufficiency of Properties 1 and 2. Let $(\widehat{\mathcal{V}}_{i},\widehat{\mathcal{V}}_{-i})$ satisfy Properties 1 and 2 and let $i\in\mathcal{A}$ be an arbitrary but fixed agent. Suppose that $\mathcal{V}_{i}\neq\widehat{\mathcal{V}}_{i}$ is any other strategy. Then, the proof follows from three cases:

Case (i): $\exists\,q\in\mathcal{V}_{i}$ such that $q\notin\widehat{\mathcal{V}}_{i}$ and $i$ is dominating for $q$. Then, since $\exists\,j\in\mathcal{A}$ dominating for $q$ with $q\in\widehat{\mathcal{V}}_{j}$,

f_{i}(q)-\max_{k\neq i,\,q\in\widehat{\mathcal{V}}_{k}}f_{k}(q)=f_{i}(q)-f_{j}(q)=0.

Case (ii): $\exists\,q\in\mathcal{V}_{i}$ such that $q\notin\widehat{\mathcal{V}}_{i}$ and $i$ does not dominate $q$. Then, as $\exists\,j\in\mathcal{A}$ dominating for $q$ with $q\in\widehat{\mathcal{V}}_{j}$, we have

f_{i}(q)-\max_{k\neq i,\,q\in\widehat{\mathcal{V}}_{k}}f_{k}(q)=f_{i}(q)-f_{j}(q)<0.

Case (iii): $\exists\,q\in\widehat{\mathcal{V}}_{i}$ such that $q\notin\mathcal{V}_{i}$. This can only happen if $i$ is dominating for $q$ (else, by Property 2, $q\notin\widehat{\mathcal{V}}_{i}$). Then,

f_{i}(q)-\max_{k\neq i,\,q\in\widehat{\mathcal{V}}_{k}}f_{k}(q)\geq 0.

From the above, any deviation from $(\widehat{\mathcal{V}}_{i},\widehat{\mathcal{V}}_{-i})$ cannot increase the utility of $i$: the change $H_{i}(\mathcal{V}_{i},\widehat{\mathcal{V}}_{-i})-H_{i}(\widehat{\mathcal{V}}_{i},\widehat{\mathcal{V}}_{-i})$ decomposes task-by-task into terms of the form $f_{i}(q)-\max_{k\neq i,\,q\in\widehat{\mathcal{V}}_{k}}f_{k}(q)$, where each task added to $\widehat{\mathcal{V}}_{i}$ contributes a non-positive term (Cases (i)-(ii)), and each task removed from $\widehat{\mathcal{V}}_{i}$ removes a non-negative term (Case (iii)). $\blacksquare$

From the previous result, in any NE strategy of $\mathscr{G}_{P}$ each task is assigned to at least one of its dominating agents. However, this does not preclude two dominating agents being assigned the same task. Next, we show that the NE of the relaxed game $\mathscr{G}_{W}$ are equivalent to the NE of $\mathscr{G}_{P}$.

Lemma 4.2 (Nash equilibria of $\mathscr{G}_{W}$).

The strategy $(\widehat{\mathbf{w}}_{i},\widehat{\mathbf{w}}_{-i})\in\mathcal{NE}(\mathscr{G}_{W})$ if and only if:

  1. for each $q\in\mathcal{Q}$, $\exists\,i\in\mathcal{A}$ dominating for $q$ such that $\widehat{w}^{q}_{i}=1$;

  2. if $j$ is not a dominating agent for task $q$, then $\widehat{w}^{q}_{j}=0$.

Proof. First, we show the necessity of both properties. Suppose $(\widehat{\mathbf{w}}_{i},\widehat{\mathbf{w}}_{-i})\in\mathcal{NE}(\mathscr{G}_{W})$. We show Property 1 is necessary by contradiction. Consider an arbitrary $q\in\mathcal{Q}$ and suppose that for all dominating agents $i^{*}_{q}\in\mathcal{A}$ for task $q$, it holds that $\widehat{w}^{q}_{i^{*}_{q}}<1$. In particular, for any such $i^{*}_{q}$, we have $\max_{j\neq i^{*}_{q}}f_{j}(q)\widehat{w}^{q}_{j}<f_{i^{*}_{q}}(q)$. Now consider the strategy $\mathbf{w}_{i^{*}_{q}}$, where $w^{q}_{i^{*}_{q}}=1$ and $w^{p}_{i^{*}_{q}}=\widehat{w}^{p}_{i^{*}_{q}}$, $\forall p\in\mathcal{Q}$, $p\neq q$. Then,

U_{i^{*}_{q}}(\mathbf{w}_{i^{*}_{q}},\widehat{\mathbf{w}}_{-i^{*}_{q}})-U_{i^{*}_{q}}(\widehat{\mathbf{w}}_{i^{*}_{q}},\widehat{\mathbf{w}}_{-i^{*}_{q}})=\Big[f_{i^{*}_{q}}(q)-\max_{j\neq i^{*}_{q}}f_{j}(q)\widehat{w}^{q}_{j}\Big]\big[1-\widehat{w}^{q}_{i^{*}_{q}}\big]>0.

This contradicts $(\widehat{\mathbf{w}}_{i},\widehat{\mathbf{w}}_{-i})\in\mathcal{NE}(\mathscr{G}_{W})$.

We similarly show Property 2 is necessary by contradiction. Let $q\in\mathcal{Q}$ be an arbitrary task, and suppose that $\exists\,j\in\mathcal{A}$ which is not dominating for $q$ but for which $\widehat{w}^{q}_{j}>0$. Due to Property 1, let $i^{*}_{q}$ be a dominating agent for $q$ such that $\widehat{w}^{q}_{i^{*}_{q}}=1$. Now define a new strategy $\mathbf{w}_{j}$, with $w^{q}_{j}=0$ and $w^{p}_{j}=\widehat{w}^{p}_{j}$, $\forall p\in\mathcal{Q}$, $p\neq q$. Then,

U_{j}(\mathbf{w}_{j},\widehat{\mathbf{w}}_{-j})-U_{j}(\widehat{\mathbf{w}}_{j},\widehat{\mathbf{w}}_{-j})=\big[f_{j}(q)-f_{i^{*}_{q}}(q)\big]\,\big[-\widehat{w}^{q}_{j}\big]>0,

where the inequality holds because both factors are negative. This contradicts $(\widehat{\mathbf{w}}_{i},\widehat{\mathbf{w}}_{-i})\in\mathcal{NE}(\mathscr{G}_{W})$.

Next, we show sufficiency. Let $(\widehat{\mathbf{w}}_{i},\widehat{\mathbf{w}}_{-i})\in[0,1]^{n\times m}$ be a candidate strategy satisfying Properties 1-2 and let $i\in\mathcal{A}$. Take any other $\mathbf{w}_{i}\neq\widehat{\mathbf{w}}_{i}$ and a task $q\in\mathcal{Q}$. The proof follows from the following cases.

Case (i): $i$ is a dominating agent for task $q$ and $\widehat{w}^{q}_{i}=1$. Then, from Definition 3.1,

\Big[f_{i}(q)-\max_{j\neq i}f_{j}(q)\widehat{w}^{q}_{j}\Big]w^{q}_{i}\leq\Big[f_{i}(q)-\max_{j\neq i}f_{j}(q)\widehat{w}^{q}_{j}\Big]\widehat{w}^{q}_{i}.

Case (ii): $i$ is a dominating agent for task $q$ and $\widehat{w}^{q}_{i}<1$. Then, since $\exists\,j\in\mathcal{A}$ dominating for $q$ with $\widehat{w}^{q}_{j}=1$,

\Big[f_{i}(q)-\max_{k\neq i}f_{k}(q)\widehat{w}^{q}_{k}\Big]w^{q}_{i}=\big[f_{i}(q)-f_{j}(q)\big]w^{q}_{i}=\big[f_{i}(q)-f_{j}(q)\big]\widehat{w}^{q}_{i}=\Big[f_{i}(q)-\max_{k\neq i}f_{k}(q)\widehat{w}^{q}_{k}\Big]\widehat{w}^{q}_{i}=0,

since $f_{i}(q)-f_{j}(q)=0$.

Case (iii): $i$ is not a dominating agent for task $q$ (and hence $\widehat{w}^{q}_{i}=0$). Again, $\exists\,j\in\mathcal{A}$ dominating for $q$ with $\widehat{w}^{q}_{j}=1$. Then,

\Big[f_{i}(q)-\max_{k\neq i}f_{k}(q)\widehat{w}^{q}_{k}\Big]w^{q}_{i}\leq\big[f_{i}(q)-f_{j}(q)\big]\widehat{w}^{q}_{i}=\Big[f_{i}(q)-\max_{k\neq i}f_{k}(q)\widehat{w}^{q}_{k}\Big]\widehat{w}^{q}_{i},

since $f_{i}(q)<f_{j}(q)$ and $w^{q}_{i}\geq 0$. Now, using these three cases, it is easy to see that any deviation from $(\widehat{\mathbf{w}}_{i},\widehat{\mathbf{w}}_{-i})$ does not increase the utility of $i$. $\blacksquare$

As a direct implication of Lemma 4.2, for any $\mathbf{W}\in[0,1]^{n\times m}$ we can define $C:[0,1]^{n\times m}\to(2^{\mathcal{Q}})^{n}$ as

C(\mathbf{W}):=(\mathrm{tsupp}\,(\mathbf{w}_{1}),\cdots,\mathrm{tsupp}\,(\mathbf{w}_{n})),   (4)

where $\mathrm{tsupp}\,(\mathbf{w}_{i}):=\{q\in\mathcal{Q}\,|\,w^{q}_{i}=1\}$, $\forall i\in\mathcal{A}$. Then,

\mathcal{NE}(\mathscr{G}_{P})=C(\mathcal{NE}(\mathscr{G}_{W})).   (5)

Next, we relate the optimal partition and the NE of the two games through the following theorem.

Theorem 4.3 (Optimal partitions and Nash equilibria).

Given the problem in (1), $\mathcal{O}\subseteq C(\mathcal{NE}(\mathscr{G}_{W}))$, where

\mathcal{O}:=\{\mathcal{P}^{*}\,\,|\,\,\mathcal{P}^{*}\,\,\text{is a solution to (1)}\}.   (6)

Proof. By (5), we can equivalently show that $\mathcal{O}\subseteq\mathcal{NE}(\mathscr{G}_{P})$. Let $\mathcal{P}^{*}\in\mathcal{O}$. It is easy to see that $q\in\mathcal{V}^{*}_{i}$ only if $i\in\mathcal{A}$ is a dominating agent for task $q$. Moreover, if $j\in\mathcal{A}$ is not dominating for $q$, then $q\notin\mathcal{V}^{*}_{j}$. Then, from Lemma 4.1, $\mathcal{P}^{*}\in\mathcal{NE}(\mathscr{G}_{P})$. The rest follows from (5). $\blacksquare$

The above result states that the set of partitions obtained by applying the map $C$ to the NE of $\mathscr{G}_{W}$ is a superset of the set of optimizers of (1). The extra solutions arise when a task has more than one dominating agent. When every task has a unique dominating agent, the next result shows that $\mathscr{G}_{W}$ has a unique NE. This follows immediately from Lemma 4.2, so we skip a formal proof.

Corollary 4.4 (Uniqueness of Nash equilibria).

Suppose that for each $q\in\mathcal{Q}$, $i^{*}_{q}$ is the unique dominating agent for task $q$. Then $\mathcal{NE}(\mathscr{G}_{W})=\{\widehat{\mathbf{W}}\}$ where, for each $q\in\mathcal{Q}$, $\widehat{\mathbf{W}}$ satisfies 1) $\widehat{w}^{q}_{i^{*}_{q}}=1$, and 2) $\widehat{w}^{q}_{j}=0$, $\forall\,j\neq i^{*}_{q}$. Further, $\mathcal{P}^{*}=C(\widehat{\mathbf{W}})$ is the unique solution to (1). $\blacksquare$

In general, $\mathcal{NE}(\mathscr{G}_{W})$ is a superset of the set of optimal partitions. The next example makes this clear.

Example 4.5 (Optimal partitions and Nash equilibria).

Let $\mathcal{A}=\{1,2\}$ and $\mathcal{Q}=\{\mathfrak{a},\mathfrak{b}\}$. Assume the values $f_{1}(\mathfrak{a})=f_{2}(\mathfrak{a})=0.5$, $f_{1}(\mathfrak{b})=0.7$ and $f_{2}(\mathfrak{b})=0.3$. Then, $\mathcal{O}=\{(\{\mathfrak{a},\mathfrak{b}\},\varnothing),(\{\mathfrak{b}\},\{\mathfrak{a}\})\}$; $\mathcal{NE}(\mathscr{G}_{P})=\{(\{\mathfrak{a},\mathfrak{b}\},\varnothing),(\{\mathfrak{a},\mathfrak{b}\},\{\mathfrak{a}\}),(\{\mathfrak{b}\},\{\mathfrak{a}\})\}$; and

\mathcal{NE}(\mathscr{G}_{W})=\left\{\begin{bmatrix}1&1\\ \lambda&0\end{bmatrix},\begin{bmatrix}1&1\\ 1&0\end{bmatrix},\begin{bmatrix}\mu&1\\ 1&0\end{bmatrix}\right\},

where $\lambda,\mu$ can independently take any value in $[0,1)$. Thus, in this case $\mathcal{O}\subsetneq\mathcal{NE}(\mathscr{G}_{P})=C(\mathcal{NE}(\mathscr{G}_{W}))$. Interestingly, note that there is an optimal partition in which agent $2$ does not get any task. $\bullet$
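
A brute-force check of this example (a sketch; as before, the empty inner max in (2) is taken to be zero, an assumption consistent with the non-negative rewards):

from itertools import combinations

f = {1: {'a': 0.5, 'b': 0.7}, 2: {'a': 0.5, 'b': 0.3}}     # values from Example 4.5
subsets = [frozenset(c) for r in range(3) for c in combinations(['a', 'b'], r)]

def H(i, P):
    # Utility (2); the empty inner max is taken as 0 (assumption).
    return sum(f[i][q] - max([f[j][q] for j in P if j != i and q in P[j]], default=0.0)
               for q in P[i])

def is_NE(P):
    # No agent can strictly improve by a unilateral deviation.
    return all(H(i, P) >= H(i, {**P, i: S}) - 1e-12 for i in P for S in subsets)

print([(set(V1), set(V2)) for V1 in subsets for V2 in subsets if is_NE({1: V1, 2: V2})])
# -> the three profiles of NE(G_P) listed above (here 'a' stands for task 𝔞 and 'b' for 𝔟)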

Next, we design a dynamical system by which the agents can compute an optimal partition on their own.

5 Best Response Projected Gradient Ascent

From the previous section, we know that if the agents play the weight game $\mathscr{G}_{W}$, then (with a slight abuse of notation) the NE form a superset of the optimal task partitions. Thus, here we let the agents update their weights (from any initial feasible weight) using the gradient of their utility, while assuming the others do not change their weights. For such a dynamical system, we aim to relate its equilibria to the NE of $\mathscr{G}_{W}$, and hence also to the set $\mathcal{O}$ of optimal solutions to (1). Now, from (3), it can be seen that

\frac{\partial}{\partial w^{q}_{i}}U_{i}=f_{i}(q)-\max_{j\in\mathcal{A}\setminus\{i\}}f_{j}(q)w^{q}_{j}=:u^{q}_{i}(\mathbf{w}^{q}).   (7)

Thus, the weights are updated using the following dynamics:

w^{q}_{i}(t+1)=\big[w^{q}_{i}(t)+\gamma^{q}_{i}\,u^{q}_{i}(\mathbf{w}^{q}(t))\big]_{0}^{1},   (8)

with $\gamma^{q}_{i}\in\mathbb{R}_{>0}$, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$. We call this the projected best-response ascending gradient dynamics (PBRAG). From (7) and (8), note that in order to compute the weight updates, each agent $i\in\mathcal{A}$ needs to know $f_{j}(q)w^{q}_{j}$ for all $j\neq i$. This requires each agent to talk to every other agent to compute its own gradient. The equilibrium points of these dynamics are given by

\mathcal{W}:=\Big\{\mathbf{W}\in[0,1]^{n\times m}\ \Big|\ \big[w^{q}_{i}+\gamma^{q}_{i}\,u^{q}_{i}(\mathbf{w}^{q})\big]_{0}^{1}=w^{q}_{i},\,\,\forall i\in\mathcal{A},\,\forall q\in\mathcal{Q}\Big\}.   (9)

For this, the following result can be stated immediately.

Lemma 5.1 (Equilibrium weights are Nash equilibria).

The weight matrix $\overline{\mathbf{W}}\in\mathcal{W}$ if and only if $\overline{\mathbf{W}}\in\mathcal{NE}(\mathscr{G}_{W})$.

Proof. We prove this by showing that $\overline{\mathbf{W}}\in\mathcal{W}$ if and only if $\overline{\mathbf{W}}$ satisfies Properties 1 and 2 of Lemma 4.2. Suppose that $\overline{\mathbf{W}}\in\mathcal{W}$ and consider an arbitrary but fixed $q\in\mathcal{Q}$. We prove Property 1 by contradiction and assume that $\forall i\in\mathcal{A}$ dominating for $q$, $\overline{w}^{q}_{i}<1$. Now, for any dominating agent $i^{*}_{q}\in\mathcal{A}$, $\max_{j\in\mathcal{A}\setminus\{i^{*}_{q}\}}f_{j}(q)\overline{w}^{q}_{j}<f_{i^{*}_{q}}(q)$. Thus, $u^{q}_{i^{*}_{q}}(\overline{\mathbf{w}}^{q})>0$. Since $\overline{w}^{q}_{i^{*}_{q}}<1$ and $\gamma^{q}_{i^{*}_{q}}>0$, this contradicts $\overline{\mathbf{W}}\in\mathcal{W}$.

Next, we prove Property 2, also by contradiction. Suppose that $\exists\,j\in\mathcal{A}$ not dominating for task $q$ but $\overline{w}^{q}_{j}>0$. Due to Property 1, let $i^{*}_{q}\in\mathcal{A}$ be a dominating agent for task $q$ such that $\overline{w}^{q}_{i^{*}_{q}}=1$. Then,

u^{q}_{j}(\overline{\mathbf{w}}^{q})=f_{j}(q)-\max_{k\in\mathcal{A}\setminus\{j\}}f_{k}(q)\overline{w}^{q}_{k}=f_{j}(q)-f_{i^{*}_{q}}(q)<0.

Again, as $\overline{w}^{q}_{j}>0$ and $\gamma^{q}_{j}>0$, this contradicts $\overline{\mathbf{W}}\in\mathcal{W}$.

To show sufficiency, let $\overline{\mathbf{W}}$ satisfy Properties 1 and 2. Then it is easy to see that for each task $q\in\mathcal{Q}$, $u^{q}_{i}(\overline{\mathbf{w}}^{q})\geq 0$ if $i\in\mathcal{A}$ is dominating for $q$ with $\overline{w}^{q}_{i}=1$, $u^{q}_{i}(\overline{\mathbf{w}}^{q})=0$ if $i\in\mathcal{A}$ is dominating for $q$ with $\overline{w}^{q}_{i}<1$, and $u^{q}_{j}(\overline{\mathbf{w}}^{q})<0$ if $j\in\mathcal{A}$ is not dominating for $q$ (hence $\overline{w}^{q}_{j}=0$). Then, $\overline{\mathbf{W}}\in\mathcal{W}$ follows since $\gamma^{q}_{j}>0$. $\blacksquare$

From Lemma 5.1, we can also infer that if there is a unique dominating agent for each task, then the equilibrium set becomes a singleton and has the same structure as in Corollary 4.4.

In what follows, we show that starting from any initial weights, the dynamics (8) converge to an equilibrium.

Theorem 5.2 (PBRAG  converges to an equilibrium weight).

Suppose Assumption 3.2 holds. Consider the dynamics (8) with an initial condition $\mathbf{W}(0)\in[0,1]^{n\times m}$ and let $\mathbf{W}(t)$ be the solution trajectory. Then $\lim_{t\to\infty}\mathbf{W}(t)=\overline{\mathbf{W}}\in\mathcal{W}$.

Proof. Notice that for the dynamics (8), the weight associated with each task evolves independently from the weights associated with other tasks. Thus, consider an arbitrary but fixed $q\in\mathcal{Q}$. Next, consider any $i\in\mathcal{A}$ that is dominating for $q$. From (7), $u^{q}_{i}(\mathbf{w}^{q})\geq 0$, $\forall\,\mathbf{w}^{q}\in[0,1]^{n}$. Thus, since $\gamma^{q}_{i}>0$, $\forall i\in\mathcal{A}$, $w^{q}_{i}(t)$ is non-decreasing. Hence, $w^{q}_{i}(t)\to\widehat{w}^{q}_{i}\in[0,1]$ as $t\to\infty$, since $[0,1]$ is compact. Now consider $\mathcal{I}^{q}:=\{j\in\mathcal{A}\,|\,j\,\,\text{is dominating for}\,\,q\}$, the set $\mathcal{X}:=\{\mathbf{v}\in[0,1]^{|\mathcal{I}^{q}|}\,|\,v_{j}=1\,\,\text{for some}\,\,j\in\mathcal{I}^{q}\}$, and define the continuous function $V(\mathbf{w}^{q}):=d(\{w^{q}_{i}\}_{i\in\mathcal{I}^{q}},\mathcal{X})$. It is clear that $V(\mathbf{w}^{q}(t+1))\leq V(\mathbf{w}^{q}(t))$, $\forall t\in\mathbb{Z}_{\geq 0}$. Applying the LaSalle invariance principle, there is convergence to the largest invariant set in $\{V(\mathbf{w}^{q}(t))=V(\mathbf{w}^{q}(t+1))\,\,\text{for all}\,\,t\}$. We argue this set is necessarily $\mathcal{X}$. Otherwise, invariance implies that $u^{q}_{i}(\mathbf{w}^{q})=0$ for any dominating agent $i\in\mathcal{A}$. However, this occurs if and only if $\exists\,i^{\prime}\in\mathcal{A}$, $i^{\prime}\neq i$, another dominating agent for task $q$ such that $w^{q}_{i^{\prime}}=1$; otherwise, $u^{q}_{i}(\mathbf{w}^{q})>0$. Thus, $\{w^{q}_{i}(t)\}_{i\in\mathcal{I}^{q}}\to\mathcal{X}$ as $t\to\infty$. This, along with the previous discussion, proves that $\widehat{w}^{q}_{i}$ satisfies Property 1 of Lemma 4.2.

Next, consider any $j\in\mathcal{A}$ that is not dominating for $q$. From the previous part of the proof, we know that there is an $i\in\mathcal{A}$ dominating for $q$ for which $w^{q}_{i}(t)\to 1$, and thus $f_{i}(q)w^{q}_{i}(t)\to f_{i}(q)$ as $t\to\infty$. This implies that $\exists\,\tau\in\mathbb{Z}_{\geq 0}$ such that $\max_{k\in\mathcal{A}\setminus\{j\}}f_{k}(q)w^{q}_{k}(t)\geq f_{j}(q)+\nu$, for some $\nu>0$ and $\forall t\geq\tau$. Then, as $u^{q}_{j}(\mathbf{w}^{q}(t))\leq-\nu<0$, $w^{q}_{j}(t)$ is a strictly decreasing sequence (after $\tau$ time steps) until it reaches zero. Thus, from the dynamics in (8), $w^{q}_{j}(t)\to\widehat{w}^{q}_{j}=0$ as $t\to\infty$. Hence, $\widehat{w}^{q}_{j}$ satisfies Property 2 of Lemma 4.2. $\blacksquare$

When there is a unique dominating agent for each task, we can guarantee finite-time convergence to an optimal partition.

Theorem 5.3 (PBRAG  converges in finite time).

Suppose Assumption 3.2 holds and suppose that for each $q\in\mathcal{Q}$, there exists a unique dominating agent $i_{q}^{*}\in\mathcal{A}$. Define $\gamma:=\min_{i\in\mathcal{A},q\in\mathcal{Q}}\gamma^{q}_{i}>0$, and let $\delta:=\min_{q\in\mathcal{Q}}\big[f_{i^{*}_{q}}(q)-\max_{j\neq i^{*}_{q}}f_{j}(q)\big]>0$. Consider the dynamics (8) starting from $\mathbf{W}(0)\in[0,1]^{n\times m}$, with the solution trajectory $\mathbf{W}(t)\to\overline{\mathbf{W}}\in\mathcal{W}$ as $t\to\infty$. Then $w^{q}_{i}(t)=\overline{w}^{q}_{i}$, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$, $\forall t\geq 2\left\lceil(\gamma\delta)^{-1}\right\rceil$.

Proof. From Lemma 5.1 and Corollary 4.4, it is clear that $\mathcal{W}$ is a singleton set. Let $\overline{\mathbf{W}}\in\mathcal{W}$ be the unique equilibrium point. From Theorem 5.2, we know that $\mathbf{W}(t)\to\overline{\mathbf{W}}$ as $t\to\infty$. From (7), $u^{q}_{i^{*}_{q}}(\mathbf{w}^{q}(t))\geq\delta>0$, $\forall t\in\mathbb{Z}_{\geq 0}$, and hence from (8), $\forall t\in\mathbb{Z}_{\geq 0}$, $w^{q}_{i^{*}_{q}}(t+1)\geq[w^{q}_{i^{*}_{q}}(t)+\gamma^{q}_{i^{*}_{q}}\,\delta]_{0}^{1}\geq[w^{q}_{i^{*}_{q}}(0)+(t+1)\,\gamma^{q}_{i^{*}_{q}}\,\delta]_{0}^{1}$. The inequality holds since $[\cdot]_{0}^{1}$ is a non-decreasing function. Thus, $w^{q}_{i^{*}_{q}}(t)=1$ for all $t\geq\left\lceil(\gamma\delta)^{-1}\right\rceil\geq(\gamma\delta)^{-1}\geq[1-w^{q}_{i^{*}_{q}}(0)][\gamma^{q}_{i^{*}_{q}}\delta]^{-1}$. Now define $\tau:=\left\lceil(\gamma\delta)^{-1}\right\rceil$, consider any $j\neq i^{*}_{q}$, and notice from (7) that $u^{q}_{j}(\mathbf{w}^{q}(t))\leq-\delta<0$, $\forall t\geq\tau$. Thus, $w^{q}_{j}(t)\leq[w^{q}_{j}(\tau)-(t-\tau)\,\gamma^{q}_{j}\,\delta]_{0}^{1}$, $\forall t\geq\tau$. The inequality again holds since $[\cdot]_{0}^{1}$ is non-decreasing. So, $w^{q}_{j}(t)=0$ for all $t\geq\tau+\left\lceil(\gamma\delta)^{-1}\right\rceil\geq\tau+[\gamma^{q}_{j}\delta]^{-1}\geq\tau+w^{q}_{j}(\tau)[\gamma^{q}_{j}\delta]^{-1}$. $\blacksquare$
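
A minimal sketch of the synchronous PBRAG update (8) in the known-reward setting; the random rewards and the large step size below are hypothetical choices (the latter mirroring Remark 5.4), not prescriptions.

import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 8
F = rng.random((n, m))        # hypothetical rewards f_i(q)
gamma = 1e6                   # large constant step size, in the spirit of Remark 5.4
W = np.zeros((n, m))          # initial weights W(0) = 0

for t in range(3):
    FW = F * W
    U = np.empty_like(W)
    for i in range(n):
        # u_i^q = f_i(q) - max_{j != i} f_j(q) w_j^q, see (7)
        U[i] = F[i] - np.delete(FW, i, axis=0).max(axis=0)
    W = np.clip(W + gamma * U, 0.0, 1.0)   # projected step (8)

# With a unique dominating agent per task (a.s. for random F), the weights settle
# in two steps: w = 1 for the dominating agent and 0 for everyone else.
print(np.argmax(F, axis=0))
print(np.argmax(W, axis=0))   # should coincide with the line above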

Remark 5.4 (On the effect of step-size on convergence).

By Theorem 5.2, the dynamics (8) converge to an equilibrium weight whenever $\gamma^{q}_{i}>0$, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$. Thus, agents can choose any constant positive step size and guarantee convergence to a NE of the weight game. Further inspection of Theorem 5.3 leads to the following interesting observation. Since $\delta^{-1}>0$, the individual $\gamma^{q}_{i}$'s can be chosen in such a way that $0<(\gamma\delta)^{-1}<1$. Then $2\left\lceil(\gamma\delta)^{-1}\right\rceil=2$. That is, by choosing a sufficiently large step size and communicating with every other agent, the agents can reach the NE in at most two time steps. $\bullet$

In order to avoid all-to-all communication, it is possible to adapt (8) by introducing a consensus subroutine. In the next section, we utilize this idea to handle decentralization together with unknown rewards.

6 Distributed Task Allocation

Here, we provide a solution to Problem 3.4(3). Recall that in Section 5, each agent $i\in\mathcal{A}$ computes $\max_{j\neq i}f_{j}(q)w^{q}_{j}$ using information from all other agents. Here, we introduce a communication graph $\mathcal{G}:=(\mathcal{A},\mathcal{E})$ with vertex set $\mathcal{A}$. The arc set $\mathcal{E}$ defines the connections between agents, with $(i,j)\in\mathcal{E}$ if and only if $i\in\mathcal{A}$ can send information to $j\in\mathcal{A}$. For the sake of brevity, let $\mathrm{d}:=\mathrm{diam}(\mathcal{G})$.

For such a setup, the following result gives a way to find the maximum and the second largest (distinct) values in a distributed fashion. This is useful in designing a distributed PBRAG (d-PBRAG).

Lemma 6.1 (Agreement on the two largest variables in a network).

Let $\mathcal{G}$ be a strongly connected graph and consider

M^{q}_{i}(t+1)=\max_{j\in\overline{\mathcal{N}}_{i}}M^{q}_{j}(t),   (10a)
S^{q}_{i}(t+1)=\operatorname{max}^{(2)}\Big\{\{S^{q}_{j}(t)\}_{j\in\overline{\mathcal{N}}_{i}},M^{q}_{i}(t),v^{q}_{i}\Big\},   (10b)

with initial conditions $M^{q}_{i}(0)=S^{q}_{i}(0)=v^{q}_{i}\in\mathbb{R}_{\geq 0}$, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$. Then, $\forall\,q\in\mathcal{Q}$ and $\forall\,i\in\mathcal{A}$:

  1. $M^{q}_{i}(t)=\max\{v^{q}_{j}\}_{j\in\mathcal{A}}$, $\forall\,t\geq\mathrm{d}$;

  2. $S^{q}_{i}(t)=\operatorname{max}^{(2)}\{v^{q}_{j}\}_{j\in\mathcal{A}}$, $\forall\,t\geq 2\,\mathrm{d}$.

Proof. We show this for an arbitrary but fixed $q\in\mathcal{Q}$. Let $i^{*}_{q}\in\operatorname{argmax}\{v^{q}_{j}\}_{j\in\mathcal{A}}$. Then, from (10a), $M^{q}_{i^{*}_{q}}(t)=v^{q}_{i^{*}_{q}}$, $\forall t\in\mathbb{Z}_{\geq 0}$. Thus, $\forall\,i^{*}_{q}\in\operatorname{argmax}\{v^{q}_{j}\}_{j\in\mathcal{A}}$, $M^{q}_{j}(t)=v^{q}_{i^{*}_{q}}$ for every $j$ with $i^{*}_{q}\in\mathcal{N}_{j}$, $\forall t\geq 1$. Continuing this argument inductively proves Property 1, since $\mathcal{G}$ is strongly connected.

To show Property 2, we use Property 1. Now let $i^{*}_{q}\in\operatorname{argmax}^{(2)}\{v^{q}_{j}\}_{j\in\mathcal{A}}$. Then, from (10b), $S^{q}_{i^{*}_{q}}(t)=v^{q}_{i^{*}_{q}}$, $\forall t\geq\mathrm{d}$. Thus, similarly, $\forall\,i^{*}_{q}\in\operatorname{argmax}^{(2)}\{v^{q}_{j}\}_{j\in\mathcal{A}}$, $S^{q}_{j}(t)=v^{q}_{i^{*}_{q}}$ for every $j$ with $i^{*}_{q}\in\mathcal{N}_{j}$, $\forall t\geq\mathrm{d}+1$. Again, continuing this argument proves Property 2, since $\mathcal{G}$ is strongly connected. $\blacksquare$
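
A minimal sketch of the agreement subroutine (10) on a directed 4-cycle; the initial values $v_i$ below are hypothetical.

def second_max(values):
    # max^(2): the largest element strictly below the maximum (assumes such an element exists).
    top = max(values)
    return max(v for v in values if v != top)

v = [3.0, 7.0, 5.0, 1.0]                     # hypothetical initial values v_i
in_nbrs = {0: [3], 1: [0], 2: [1], 3: [2]}   # directed cycle 0 -> 1 -> 2 -> 3 -> 0
M, S = list(v), list(v)                      # M_i(0) = S_i(0) = v_i

d = 3                                        # diameter of the directed 4-cycle
for t in range(2 * d):
    M_new = [max([M[i]] + [M[j] for j in in_nbrs[i]]) for i in range(4)]        # (10a)
    S_new = [second_max([S[i]] + [S[j] for j in in_nbrs[i]] + [M[i], v[i]])
             for i in range(4)]                                                  # (10b)
    M, S = M_new, S_new

print(M)  # every agent holds max_j v_j = 7.0 after d steps
print(S)  # every agent holds max^(2)_j v_j = 5.0 after 2d steps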

To compute the gradient and update the weights $w^{q}_{i}$ simultaneously, we propose the following dynamics:

w^{q}_{i}(t+1)=\Big[w^{q}_{i}(t)+\gamma^{q}_{i}(t)\Big(z^{q}_{i}(t)-\tfrac{1}{2}\big(M^{q}_{i}(t)+S^{q}_{i}(t)\big)\Big)\Big]_{0}^{1},   (11a)
M^{q}_{i}(t+1)=\sigma_{\mathrm{sw}}\Big(\max_{j\in\overline{\mathcal{N}}_{i}}M^{q}_{j}(t),\,e^{q}_{i}(t+1),\,t+1,\,T\Big),   (11b)
S^{q}_{i}(t+1)=\sigma_{\mathrm{sw}}\Big(\operatorname{max}^{(2)}\Big\{\{S^{q}_{j}(t)\}_{j\in\overline{\mathcal{N}}_{i}},M^{q}_{i}(t),e^{q}_{i}(t)\Big\},\,e^{q}_{i}(t+1),\,t+1,\,T\Big),   (11c)
e^{q}_{i}(t+1)=\sigma_{\mathrm{sw}}\Big(e^{q}_{i}(t),\,z^{q}_{i}(t+1),\,t+1,\,T\Big),   (11d)

for some $T\in\mathbb{R}_{\geq 0}$, where $\sigma_{\mathrm{sw}}$ is the switching function

\sigma_{\mathrm{sw}}(m,z,t,T):=\begin{cases}z,&\mathrm{if}\,\,t\,\operatorname{mod}\,T=0,\\ m,&\mathrm{otherwise}.\end{cases}   (12)
Remark 6.2 (d-PBRAG  with agreement and periodic input injection).

Note that $\forall\,i\in\mathcal{A}$, $\forall\,q\in\mathcal{Q}$, the weight update in (11a) uses the sequence $\{z^{q}_{i}(t)\}_{t\in\mathbb{Z}_{\geq 0}}$ and a time-varying step size $\gamma^{q}_{i}(t)$ instead of $f_{i}(q)$ and a constant step size $\gamma^{q}_{i}$, respectively, as in (8). The periodic switching function $\sigma_{\mathrm{sw}}$ ensures that $e^{q}_{i}(t)$ holds the value $z^{q}_{i}(kT)$ for the next $T$ time steps. This in turn allows (11b) and (11c) to run an agreement subroutine as in (10) every $T$ time steps with $v^{q}_{i}=z^{q}_{i}(kT)$, for $k\in\mathbb{Z}_{\geq 0}$. Thus, at every time step which is a multiple of $T$, each agent resets its belief that its own value is the maximum and corrects this belief over the next $T-1$ time steps. $\bullet$
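
A minimal sketch of the d-PBRAG iteration (11) for a single task on a directed 4-cycle; the reward sequences, the constant step size, and the period below are hypothetical choices for illustration, not the tuned values of Theorems 6.3-6.4.

import numpy as np

rng = np.random.default_rng(1)
n, T, d = 4, 12, 3                                  # agents, period, diameter of the cycle
in_nbrs = {0: [3], 1: [0], 2: [1], 3: [2]}          # directed cycle 0 -> 1 -> 2 -> 3 -> 0
f = np.array([0.9, 0.5, 0.3, 0.2])                  # hypothetical limits f_i(q); agent 0 dominates
a, b, c = rng.random(n), 10 * rng.random(n), 0.2 + rng.random(n)

def z(t):
    # converging reward sequences z_i^q(t) -> f_i(q), as in Assumption 3.3
    return f + a * np.cos(b * t) * np.exp(-c * t)

def second_max(vals):
    top = max(vals)
    below = [v for v in vals if v != top]
    return max(below) if below else top             # guard: fall back to top if all entries are equal

gamma = 0.5                                         # constant step size (hypothetical)
w = np.zeros(n)
e, M, S = z(0), z(0), z(0)

for t in range(20 * T):
    w = np.clip(w + gamma * (z(t) - 0.5 * (M + S)), 0.0, 1.0)                        # (11a)
    M_new = np.array([max([M[i]] + [M[j] for j in in_nbrs[i]]) for i in range(n)])
    S_new = np.array([second_max([S[i]] + [S[j] for j in in_nbrs[i]] + [M[i], e[i]])
                      for i in range(n)])
    if (t + 1) % T == 0:
        M, S, e = z(t + 1), z(t + 1), z(t + 1)       # switching (12): re-inject the local values
    else:
        M, S = M_new, S_new                          # otherwise keep running the agreement (10)

print(np.round(w, 2))   # after the transient: weight near 1 for the dominating agent, near 0 otherwise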

Theorem 6.3 (Asymptotic behavior of d-PBRAG).

Suppose Assumptions 3.2 and 3.3 hold. Define

\Delta^{q}:=\max_{i\in\mathcal{A}}f_{i}(q)-\min_{i\in\mathcal{A}}f_{i}(q)>0,\,\,\forall q\in\mathcal{Q}.   (13)

Consider any $\varepsilon\in(0,1)$. Suppose that $\forall t\in\mathbb{Z}_{\geq 0}$, $\gamma^{q}_{i}(t)=\alpha^{q}_{i}$, with $0<\alpha^{q}_{i}\leq\varepsilon\,(2\,\mathrm{d}\,\Delta^{q})^{-1}$, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$. Next, define $\alpha:=\min_{i\in\mathcal{A},q\in\mathcal{Q}}\alpha^{q}_{i}$, $\mu^{q}:=0.5\,\big(\max\{f_{i}(q)\}_{i\in\mathcal{A}}-\operatorname{max}^{(2)}\{f_{i}(q)\}_{i\in\mathcal{A}}\big)$, and

\mu:=(1-\nu)\min_{q\in\mathcal{Q}}\mu^{q}>0,   (14)

with $\nu\in(0,1)$. Further, suppose $T>2\,\mathrm{d}+(\alpha\,\mu)^{-1}+1$. Let $\mathbf{W}(t)$ be the solution trajectory of (11) starting from $\mathbf{W}(0)\in[0,1]^{n\times m}$. Then $\exists\,\tau(\mathbf{W}(0))\in\mathbb{Z}_{\geq 0}$ such that $\forall\,t\geq\tau$:

  1. $w^{q}_{i}(t)=1$ if $i\in\mathcal{A}$ is dominating for $q\in\mathcal{Q}$;

  2. $w^{q}_{j}(t)\leq\varepsilon$ if $j\in\mathcal{A}$ is not dominating for $q\in\mathcal{Q}$.

Thus, $C(\mathbf{W}(t))$ converges in a finite number of time steps.

Proof. First, note that the bounds on the $\alpha^{q}_{i}$'s and $T$ are valid because of Assumption 3.2. Then, we show the claims for an arbitrary but fixed $q\in\mathcal{Q}$.

Recall that, because of Assumption 3.3, $\exists\,\tau_{0}\in\mathbb{Z}_{\geq 0}$ such that $\forall t,t^{\prime}\geq\tau_{0}$, $z^{q}_{i}(t)-0.5\,(z^{q}_{i}(t)+z^{q}_{j}(t^{\prime}))<\Delta^{q}$, $\forall\,i,j\in\mathcal{A}$. Moreover, $\tau_{0}$ can be chosen such that $z^{q}_{i^{*}_{q}}(t)-0.5\,(z^{q}_{i^{*}_{q}}(t)+z^{q}_{j}(t^{\prime}))\geq(1-\nu)\,\mu^{q}>0$, for any $\nu\in(0,1)$, if $i^{*}_{q}\in\operatorname{argmax}_{i\in\mathcal{A}}f_{i}(q)$ and $\forall j\in\mathcal{A}$ such that $j\notin\operatorname{argmax}_{i\in\mathcal{A}}f_{i}(q)$.

Now consider any $i^{*}_{q}\in\operatorname{argmax}_{i\in\mathcal{A}}f_{i}(q)$ and any $\nu\in(0,1)$. Then, from Remark 6.2 and the previous discussion, it is clear that, $\forall t\geq\tau_{0}$,

\alpha^{q}_{i^{*}_{q}}\big(z^{q}_{i^{*}_{q}}(t)-0.5\big(M^{q}_{i^{*}_{q}}(t)+S^{q}_{i^{*}_{q}}(t)\big)\big)\geq\alpha^{q}_{i^{*}_{q}}(1-\nu)\,\mu^{q}\geq\alpha^{q}_{i^{*}_{q}}\,\mu.

This proves Property 1 of this theorem, as $\alpha^{q}_{i^{*}_{q}}\mu>0$.

Next, consider any $j\notin\operatorname{argmax}_{i\in\mathcal{A}}f_{i}(q)$. Note from Remark 6.2 that, $\forall t\geq\tau_{0}$ such that $t\in\{kT+2\mathrm{d},\cdots,(k+1)T-1\}$ for some $k\in\mathbb{Z}_{\geq 0}$, $w^{q}_{j}(t)$ strictly decreases while positive, since

z^{q}_{j}(t)-0.5\,(M^{q}_{j}(t)+S^{q}_{j}(t))\leq-(1-\nu)\mu^{q}<0.

Consider any $t\geq\tau_{0}$ such that $t\in\{kT+2\mathrm{d},\cdots,(k+1)T-1\}$ with $k\in\mathbb{Z}_{\geq 0}$. Then, since $w^{q}_{j}(kT+2\mathrm{d}-1)\leq 1$,

w^{q}_{j}(t)\leq 1-\big((t\,\operatorname{mod}\,T)-2\mathrm{d}\big)\,\alpha\,\mu,

and hence, because of the bound on $T$, $w^{q}_{j}((k+1)T-1)=0$. Finally, consider any $t\geq\tau_{0}$ such that $t\in\{kT,\cdots,kT+2\mathrm{d}-1\}$ with $k\in\mathbb{Z}_{\geq 0}$. Then, since $w^{q}_{j}(kT-1)=0$ (from the previous arguments), we have $w^{q}_{j}(t)\leq(t\,\operatorname{mod}\,T)\,\alpha^{q}_{j}\,\Delta^{q}\leq 2\,\mathrm{d}\,\alpha^{q}_{j}\,\Delta^{q}\leq\varepsilon$. Combining all these arguments proves Property 2.

The final claim follows from the previous ones and (4). $\blacksquare$

Note that the previous result does not guarantee that the weights converge. This stems from the fact that at periodic times, each agent believes that it gets the maximum reward for each task. Moreover, the previous result needs information about the limits of the converging sequences to provide bounds for the step sizes and the period of input injection. This can be avoided by allowing time-varying step sizes as stated next.

Theorem 6.4 (d-PBRAG  converges to Nash equilibrium).

Suppose Assumptions 3.2 and 3.3 hold. Let $\mathbf{W}(t)$ be the solution trajectory of (11) from $\mathbf{W}(0)\in[0,1]^{n\times m}$, with $T>2\,\mathrm{d}+1$ and, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$,

\gamma^{q}_{i}(t)=\begin{cases}\alpha^{q}_{i}(k)>0,&\mathrm{if}\,t\in\{kT,\cdots,kT+2\mathrm{d}-1\},\\ \beta^{q}_{i}(k)>0,&\mathrm{if}\,t\in\{kT+2\mathrm{d},\cdots,(k+1)T-1\},\end{cases}

with $k\in\mathbb{Z}_{\geq 0}$. Further, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$, take sequences $\alpha^{q}_{i}(k)\to 0$ and $\beta^{q}_{i}(k)\to\infty$ as $k\to\infty$. Then $\mathbf{W}(t)\to\overline{\mathbf{W}}\in\mathcal{NE}(\mathscr{G}_{W})$ as $t\to\infty$.

Proof. By hypothesis, $\forall\varepsilon>0$, $\exists\,K\in\mathbb{Z}_{\geq 0}$ such that $\forall t\in\{kT,\cdots,kT+2\mathrm{d}-1\}$ with $k\geq K$, $\alpha^{q}_{i}(k)\leq\varepsilon\,(2\,\mathrm{d}\,\Delta^{q})^{-1}$, with $\Delta^{q}$ as in (13). Moreover, $K$ can be chosen such that $\forall t\in\{kT+2\mathrm{d},\cdots,(k+1)T-1\}$ with $k\geq K$, $T>2\mathrm{d}+\big((1-\nu)\,\mu\min_{i\in\mathcal{A},q\in\mathcal{Q}}\alpha^{q}_{i}\beta^{q}_{i}(k)\big)^{-1}+1$ for any $\nu\in(0,1)$, with $\mu$ as in (14). Then the claim of this result is a consequence of applying arguments similar to those in the proof of Theorem 6.3 and using (5). $\blacksquare$
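
One concrete (hypothetical) choice of the schedule in Theorem 6.4, with $\alpha(k)$ vanishing and $\beta(k)$ growing, is sketched below; the decay and growth rates are illustrative assumptions.

def gamma(t, T, d, alpha0=0.1, beta0=1.0):
    # Time-varying step size of Theorem 6.4: small during the 2d agreement steps of
    # each period of length T, large during the remaining steps of the period.
    k, phase = divmod(t, T)
    if phase < 2 * d:
        return alpha0 / (k + 1)       # alpha_i^q(k) -> 0 as k -> infinity
    return beta0 * (k + 1)            # beta_i^q(k) -> infinity as k -> infinity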

We conclude this section by discussing some interesting observations about the parameters in (11).

Remark 6.5 (On the implementation of d-PBRAG).

Note that even though $\mathrm{diam}(\mathcal{G})$ is an intrinsic property of the communication graph $\mathcal{G}$ and requires some structural knowledge of it, the claims in Theorems 6.3 and 6.4 remain true if $\mathrm{d}$ is replaced with $n$, because $\mathrm{diam}(\mathcal{G})\leq n$. Moreover, these results can be extended to time-varying communication graphs with periodic connectivity, because the agreement subroutine still works. Further, note that the conditions in Theorem 6.4 are only sufficient for convergence. In fact, $\beta^{q}_{i}(k)$ need not grow unbounded, but then knowledge of the limits of the converging reward sequences is required for proper functioning of the algorithm. For example, if $\mu$ is large, small values of $\beta^{q}_{i}(k)$ are sufficient to guarantee convergence; but if $\mu$ is small, then the $\beta^{q}_{i}(k)$ values have to be sufficiently large in order to guarantee that non-dominating agents are not assigned the task. Finally, notice that in order for the algorithm to work, each agent $i\in\mathcal{A}$ has to pass two values ($M^{q}_{i}(t)$, $S^{q}_{i}(t)$) for each task $q\in\mathcal{Q}$ to its neighbors at each time step. This makes the local communication cost of the algorithm $O(m)$ per iteration. $\bullet$

TABLE I: Approximate $f_{i}(q)$ values for the simulations. For each task $q$ (column), the starred entry is $\max_{i\in\mathcal{A}}f_{i}(q)$.

Agent $i$ \ Task $q$      1         2         3         4         5         6         7         8
1                      0.4536    0.4407*   0.2881    0.0055    0.0049    0.2394    0.3152*   0.2217
2                      0.7504    0.2228    0.0411    0.2801*   0.2374    0.0768    0.0852    0.1760
3                      0.7656*   0.0987    0.1381    0.2491    0.2969    0.1003    0.1471    0.6902*
4                      0.3023    0.2211    0.3334*   0.2462    0.3033*   0.4991*   0.1231    0.5931
Figure 1: PBRAG using (8) and a large step size $\gamma$. The plots share a common legend.

7 Simulations

In this section, we verify our major claims and illustrate some interesting features of our algorithms.

7-A Fast convergence of PBRAG  with large step-size

Here, we simulate $n=4$ agents optimally allocating $m=8$ tasks, with $r_{i}(q),\phi_{i}(q)\sim\mathrm{Unif}[0,1]$, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$. In particular, Table I gives the approximate values of $f_{i}(q)$, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$. For each $q\in\mathcal{Q}$, the starred entry represents $\max_{i\in\mathcal{A}}f_{i}(q)$.

We first verify the claim in Remark 5.4. Figure 1 shows the evolution of the solution of (8) from the initial condition $\mathbf{W}(0)=\mathbf{0}$. The optimal partition, as seen in Figure 1, is $\mathcal{V}_{1}=\{2,7\}$, $\mathcal{V}_{2}=\{4\}$, $\mathcal{V}_{3}=\{1,8\}$, $\mathcal{V}_{4}=\{3,5,6\}$. Here, since the values $f_{i}(q)\in[0,1]$, step sizes $\gamma^{q}_{i}$ of the order of $10^{6}$ were required to make the solutions converge in two time steps. For larger gaps between the values of the $f_{i}$'s, much smaller values of the $\gamma^{q}_{i}$'s achieve a similar effect.

7-B Effect of constant step-size on d-PBRAG

Here, we illustrate the claims in Theorem 6.3 for $n=8$ agents optimally allocating $m=1$ task. We take $f_{1}(1)=B$, $f_{2}(1)=0.9\,B$, and $f_{i}(1)=0.3\,B/i$, $\forall\,i\in\{3,\cdots,8\}$, with $B=1000$. Thus, agent $1$ is the dominating agent. Further, we consider an unknown reward structure with $z^{q}_{i}(t)=f_{i}(q)+a^{q}_{i}\,\cos(b^{q}_{i}\,t)\,\mathrm{exp}(-c^{q}_{i}\,t)$, $\forall i\in\mathcal{A}$, $\forall q\in\mathcal{Q}$, where $a^{q}_{i}\sim\mathrm{Unif}[0,f_{i}(q)]$, $b^{q}_{i}\sim\mathrm{Unif}[0,10]$, and $c^{q}_{i}\sim\mathrm{Unif}[0,1]$. We set the communication graph $\mathcal{G}=(\mathcal{A},\mathcal{E})$ with $\mathcal{E}=\{(1,2),(2,3),(3,4),(4,1)\}$.
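
A sketch of the converging reward sequences used in this experiment, generated as described above (the random seed is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(0)
B, n = 1000.0, 8
f = np.array([B, 0.9 * B] + [0.3 * B / i for i in range(3, n + 1)])  # f_i(1)

a = rng.uniform(0, f)          # a_i^q ~ Unif[0, f_i(q)]
b = rng.uniform(0, 10, n)      # b_i^q ~ Unif[0, 10]
c = rng.uniform(0, 1, n)       # c_i^q ~ Unif[0, 1]

def z(t):
    # z_i^q(t) = f_i(q) + a cos(bt) exp(-ct) -> f_i(q), satisfying Assumption 3.3
    return f + a * np.cos(b * t) * np.exp(-c * t)

print(np.round(z(0), 1))       # initial (perturbed) rewards
print(np.round(z(50), 1))      # much closer to f as the perturbation decays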

Figure 2 shows the evolution of the solution of (11) with constant step size from the initial condition $\mathbf{W}(0)=\mathbf{0}$. It is interesting to note from Figure 2 that if $\varepsilon$ is large, then $w^{1}_{1}(t)$ reaches $1$ faster, but the weights of the non-dominating agents rise higher. On the other hand, if $\varepsilon$ is small, then the rise in the weights of the non-dominating agents is smaller, but $w^{1}_{1}(t)$ reaches $1$ more slowly. This is because $\varepsilon$ affects the choice of $T$ as well.

7-C d-PBRAG  with time-varying step-sizes

Here, we again simulate $n=4$ agents optimally allocating $m=8$ tasks. We take the unknown reward structure as in Section 7-B with $f_{i}(q)$ as in Table I. Further, we use the distributed approach in (11) with time-varying step sizes as described in Theorem 6.4. We also set the communication graph $\mathcal{G}$ as in Section 7-B.

Figure 3 shows the evolution of the solution of (11) from the initial condition $\mathbf{W}(0)=\mathbf{0}$. The optimal partition, as seen in Figure 3, is $\mathcal{V}_{1}=\{2,7\}$, $\mathcal{V}_{2}=\{4\}$, $\mathcal{V}_{3}=\{1,8\}$, $\mathcal{V}_{4}=\{3,5,6\}$. This is exactly the same as the observation in Section 7-A. Further, notice from Figure 3 that the weights of agents $3$ and $4$ take longer to settle than those of agents $1$ and $2$. In general, the convergence rate of the algorithm depends on the properties of the unknown reward sequences and is hence difficult to characterize.

Figure 2: d-PBRAG for unknown rewards with constant step size using (11) on a cyclic communication graph. The step size $\gamma^{q}_{i}(t)$ and period $T$ were chosen as in Theorem 6.3 with different $\varepsilon$ and $\nu=0.1$. The plots share a common legend. (Top) $\varepsilon=0.9$. (Bottom) $\varepsilon=0.3$.
Figure 3: d-PBRAG for unknown rewards with time-varying step sizes using (11) on a cyclic communication graph. The plots share a common legend.

8 Conclusion and Future Work

In this paper, we presented a game-theoretic formulation of an optimal task allocation problem for a group of agents. By allowing agents to assign weights between zero and one to each task, we relaxed the combinatorial nature of the problem. This led to a partition game and a weight game, whose NE form a superset of the optimal task partitions. Then, we provided a distributed best-response projected gradient ascent for which convergence to the NE of the weight game is guaranteed.

Future work will consider constraints on the number of tasks per agent, and generalizing the setup to a continuous space of tasks and classes of tasks.

References

  • [1] H. L. Choi, L. Brunet, and J. P. How, “Consensus-based decentralized auctions for robust task allocation,” IEEE Transactions on Robotics, vol. 25, no. 4, pp. 912–926, 2009.
  • [2] F. Bullo, E. Frazzoli, M. Pavone, K. Savla, and S. L. Smith, “Dynamic vehicle routing for robotic systems,” Proceedings of the IEEE, vol. 99, no. 9, pp. 1482–1504, 2011.
  • [3] A. Sadeghi and S. L. Smith, “Heterogeneous task allocation and sequencing via decentralized large neighborhood search,” Unmanned Systems, vol. 5, no. 02, pp. 79–95, 2017.
  • [4] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics, vol. 2, no. 1–2, p. 83–97, May 1955.
  • [5] S. Chopra, G. Notarstefano, M. Rice, and M. Egerstedt, “A distributed version of the Hungarian method for multirobot assignment,” IEEE Transactions on Robotics, vol. 33, no. 4, pp. 932–947, 2017.
  • [6] J. Cerquides, A. Farinelli, P. Meseguer, and S. D. Sarvapali, “A tutorial on optimization for multi-agent systems,” The Computer Journal, vol. 57, no. 6, pp. 799–824, 2014.
  • [7] A. Prasad, S. Sundaram, and H. L. Choi, “Min-max tours for task allocation to heterogeneous agents,” in IEEE Int. Conf. on Decision and Control, 2018, pp. 1706–1711.
  • [8] N. Rezazadeh and S. S. Kia, “Distributed strategy selection: A submodular set function maximization approach,” Automatica, vol. 153, p. 111000, 2023.
  • [9] J. Vondrák, “Optimal approximation for the submodular welfare problem in the value oracle model,” in ACM Symposium on Theory of Computing, 2008, pp. 67–74.
  • [10] G. Calinescu, C. Chekuri, M. Pal, and J. Vondrák, “Maximizing a monotone submodular function subject to a matroid constraint,” SIAM Journal on Computing, vol. 40, no. 6, pp. 1740–1766, 2011.
  • [11] S. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, p. 129–137, 1982.
  • [12] M. Santos, Y. Diaz-Mercado, and M. Egerstedt, “Coverage control for multirobot teams with heterogeneous sensing capabilities,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 919–925, 2018.
  • [13] M. Santos and M. Egerstedt, “Coverage control for multi-robot teams with heterogeneous sensing capabilities using limited communications,” in IEEE/RSJ Int. Conf. on Intelligent Robots & Systems, 2018, pp. 5313–5319.
  • [14] J. Cortés, S. Martinez, T. Karatas, and F. Bullo, “Coverage control for mobile sensing networks,” IEEE Transactions on Robotics and Automation, vol. 20, no. 2, pp. 243–255, 2004.
  • [15] E. Frazzoli and F. Bullo, “Decentralized algorithms for vehicle routing in a stochastic time-varying environment,” in IEEE Int. Conf. on Decision and Control, Paradise Island, Bahamas, Dec. 2004, pp. 3357–3363.
  • [16] M. Zhu and S. Martínez, “Distributed coverage games for energy-aware mobile sensor networks,” SIAM Journal on Control and Optimization, vol. 51, no. 1, pp. 1–27, 2013.
  • [17] J. R. Marden, G. Arslan, and J. S. Shamma, “Cooperative control and potential games,” IEEE Transactions on Systems, Man, & Cybernetics. Part B: Cybernetics, vol. 39, no. 6, p. 1393–1407, 2009.
  • [18] R. Konda, R. Chandan, D. Grimsman, and J. R. Marden, “Balancing asymptotic and transient efficiency guarantees in set covering games,” in American Control Conference, 2022, pp. 4416–4421.
  • [19] P. Frihauf, M. Krstic, and T. Basar, “Nash equilibrium seeking for games with non-quadratic payoffs,” in IEEE Int. Conf. on Decision and Control, Atlanta, USA, December 2010, pp. 881–886.
  • [20] M. Ye and G. Hu, “Distributed Nash equilibrium seeking by a consensus based approach,” IEEE Transactions on Automatic Control, vol. 62, no. 9, pp. 4811–4818, 2017.
  • [21] Z. Deng and X. Nian, “Distributed generalized Nash equilibrium seeking algorithm design for aggregative games over weight-balanced digraphs,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 3, pp. 695–706, 2018.
  • [22] J. Koshal, A. Nedić, and U. V. Shanbhag, “A gossip algorithm for aggregative games on graphs,” in IEEE Int. Conf. on Decision and Control, 2012, pp. 4840–4845.
  • [23] F. Salehisadaghiani and L. Pavel, “Distributed Nash equilibrium seeking: A gossip-based algorithm,” Automatica, vol. 72, pp. 209–216, 2016.
  • [24] A. C. Chapman, R. A. Micillo, R. Kota, and N. R. Jennings, “Decentralized dynamic task allocation using overlapping potential games,” The Computer Journal, vol. 53, no. 9, pp. 1462–1477, 2010.
  • [25] Y. Narahari, Game theory and mechanism design.   World Scientific, 2014, vol. 4.
  • [26] R. Diestel, “Graph theory,” Graduate Texts in Mathematics, pp. 173–207, 2017.