Utilitarian Welfare Optimization in the Generalized Vertex Coloring Games: An Implication to Venue Selection in Events Planning

Abstract

We consider a general class of multi-agent games in networks, namely the generalized vertex coloring games (G-VCGs), inspired by real-life applications of the venue selection problem in events planning. Certain utility responding to the contemporary coloring assignment will be received by each agent under some particular mechanism, who, striving to maximize his own utility, is restricted to local information thus self-organizing when choosing another color. Our focus is on maximizing some utilitarian-looking welfare objective function concerning the cumulative utilities across the network in a decentralized fashion. Firstly, we investigate on a special class of the G-VCGs, namely Identical Preference VCGs (IP-VCGs) which recovers the rudimentary work by Chaudhuri et al. (2008). We reveal its convergence even under a completely greedy policy and completely synchronous settings, with a stochastic bound on the converging rate provided. Secondly, regarding the general G-VCGs, a greediness-preserved Metropolis-Hasting based policy is proposed for each agent to initiate with the limited information and its optimality under asynchronous settings is proved using theories from the regular perturbed Markov processes. The policy was also empirically witnessed to be robust under independently synchronous settings. Thirdly, in the spirit of “robust coloring”, we include an expected loss term in our objective function to balance between the utilities and robustness. An optimal coloring for this robust welfare optimization would be derived through a second-stage MH-policy driven algorithm. Simulation experiments are given to showcase the efficiency of our proposed strategy.

keywords:

vertex coloring game , Markov Chain methods , the Metropolis-Hasting algorithm , venue selection problem , welfare optimization

\affiliation

[label1]organization=School of Physical and Mathematical Sciences, Nanyang Technological University, country=Singapore

1 Introduction

Selecting a right venue is crucial to the success of an event and worth emphasis in events planning. Critical criteria for being “right” may consider the venue’s layout and size, the accessibility, technical requirements, the atmosphere or tone the planner wishes to convey, etc., and often differ across specific activities (Allen et al., 2022). A common challenge confronted in reality is for a group of different event managers to select their preferred locations among a limited number of venues, especially when there exist mutual timetable clashes between events, and a beneficial-to-all venue assignment is always desirable. For instance, multiple clubs on a campus who intend to organize respective events in celebration of certain festival may fail to obtain their respective favorite venue due to the limited number of available function halls. A minimum requirement for any event planning is to guarantee that each event would be placed somewhere without a timetable clash with others, which can be modelled by a classical graph (vertex) coloring problem. Vertex coloring has long been a prominent tool for modelling network problems with multifrious applications including scheduling, channel assignment, text or image segmentation, etc.; see, e.g. a survey by Ahmed (2012). Given a connected graph (network) $G=(V,E)$ with $|V|=n$ and a collection of colors $M=\{M_{1},M_{2},\dots,M_{m}\}$ with cardinality $m$ , a coloring is a function $c:V\rightarrow M$ that assigns each vertex $V_{i}\in V$ a color $m_{i}\in M$ . Denote the color assigned to a vertex $V_{i}$ by $c_{i}$ . We say that a clash occurs between two vertices $V_{i}$ and $V_{j}$ when $c_{i}=c_{j}$ , $e_{ij}\in E$ . It is desired for many scenarios to obtain a coloring without clash, namely a proper coloring $c^{p}:V\rightarrow M$ such that $c^{p}(V_{i})\neq c^{p}(V_{j})$ whenever $e_{ij}\in E$ . Denote the space of all possible colorings by $C$ and the space of all proper colorings by $C^{p}$ . For the sake of representation, the graph is often translated to a square location matrix $\mathcal{L}^{n\times n}$ where $\mathcal{L}_{ij}=\mathds{1}_{\{e_{ij}\in E\}}$ . The nighborhood of a vertex $V_{i}\in V$ , $\mathcal{N}(V_{i})$ , is defined as the set $\{V_{j}:\mathcal{L}_{ij}=1\}$ .

Concretely during events planning, one may represent different function halls as different colors. Let each club manager be represented by a vertex, then an edge can be linked between any pair whose event schedules clash, which establishes a connected graph (network). The goal is then to look for a proper coloring; i.e. a coloring in which adjacent vertices are associated with distinguished colors. However, classical procedures of solving such a vertex coloring problem often provide few practical insights for event planners due to two drawbacks. One unfavorable shortcoming is that classical vertex colorings always bother a “central agent” to gather and manipulate with the information contained in the entire network (Kearns et al., 2006; Chaudhuri et al., 2008). In real-life planning, this agent with “omniscient” view is most times absent, and each club manager needs to propose and apply for a venue autonomously. Algorithmically speaking, though centralized colorings may converge to a proper one quickly, they require heavy workload or computation, especially when $n$ is large, which are most times out of reach in reality. The other limitation is regarding the overlooked real value, or social welfare, generated by the eventual coloring. As mentioned, managers do have varied preferences on different venues, thus beyond the properness of the eventual coloring, the satisfiability of each agent under the coloring is worth further attention.

Bearing the above concerns in mind, in this paper, we keep the agents in the network self-organized and extend the pursuit of a proper coloring to more general objectives. To eliminate any dependence on “central agent”, each vertex is motivated to strive for some target coloring themselves through limited information sharing and mutual negotiations. In other words, instead of being “allocated” a color, each vertex is required to “select” a color autonomously in rounds and solve the possible improper situations themselves through compliant communications. Such a “self-organizing” setting better mimics the real dynamics and interactions occuring in the multi-agent system and converts the original static task to an evolutionary game, namely a Vertex Coloring Game (VCG). Meanwhile, in an attempt to include the group satisfiability into consideration, certain preference mechanism is employed to measure personal satisfiabilities. Analogue to event planners’ customized preferences to different function halls, a preference function $\phi_{i}:M\rightarrow\mathbb{R}_{+}$ is developed by each vertex $V_{i}$ to measure its preference to the current color in hold, with the total number of colors provided as a constraint. Like in many conventional game-theoretic settings, we are interested in the social welfare across the entire network which is evaluated via some welfare functional $\Phi:C\rightarrow\mathbb{R}$ defined by $\Phi(c):=\mathbf{f}(\{\phi_{1}(c_{1}),\phi_{2}(c_{2}),\dots,\phi_{n}(c_{n})\})$ which is essentially a function of all individual utilities. Such functions are often referred to as “cardinal” welfare functions and cover a wide range of candidates including utilitarian welfare, minmax welfare, Foster’s welfare, etc. (Keeney and Kirkwood, 1975). Finding a most beneficial venue assignment is thus equivalent to maximizing the social welfare without violating the constraint of a proper coloring, which is summarized in the following constrained welfare optimization problem

\displaystyle\begin{split}\operatorname*{\mathrm{maximize}}_{c\in C}\quad&\Phi(c):=\mathbf{f}(\{\phi_{1}(c_{1}),\phi_{2}(c_{2}),\dots,\phi_{n}(c_{n})\})\\ \mathrm{subject~{}to}\quad&c_{i}\neq c_{j},\forall i,\forall j\in\{j:\mathcal{L}_{ij}=1\}\end{split}.

(

\mathsf{WO1}

)

Remark 1

We would like to accentuate that, the optimal coloring to ( $\mathsf{WO1}$ ), $c^{*}_{WO1}$ , is necessarily a Nash equilibrium. Suppose not, since $u_{i}(c^{*}_{WO1_{i}},c^{*}_{WO1_{-i}})>0$ (by the properness of $c^{*}_{WO1}$ ), the only way for $V_{i}$ to obtain better utility is to select another color he prefers more but different from all his neighbots’ choices, otherwise his utility would be reduced to zero. This operation destines an increase in social welfare thus contracting to the fact that $c^{*}_{WO1}$ is optimal.

Note that the set constraint of proper colorings typically do not enjoy nice geometric structures and the preference functions are completely arbitrary and purely decided by the particular event manager. Consequently, it is often difficult to explicitly control the feasibility of a new coloring when optimizing the objective in Problem $\mathsf{WO1}$ . Hence, it is natural to consider some relaxation to somehow decompose and reflect this network constraint to individual agents; i.e. each agent would commit the duty to satisfy the constraint. To achieve this goal, we define a utility function $u_{i}:C\rightarrow\mathbb{R}_{+}\cup\{0\}$ for each agent as

\displaystyle u_{i}(c):=\phi_{i}(c_{i})\mathds{1}_{c_{i}\neq c_{j},\forall j\in\mathcal{L}_{ij}}

(1)

and formulate another optimization problem

\displaystyle\operatorname*{\mathrm{maximize}}_{c\in C}\quad

\displaystyle\Phi^{\prime}(c):=\mathbf{f}(\{u_{1}(c),u_{2}(c),\dots,u_{n}(c)\}).

(

\mathsf{WO2}

)

Denote the optimal colorings of ( $\mathsf{WO1}$ ) and ( $\mathsf{WO2}$ ) by $c^{*}_{WO1}$ and $c^{*}_{WO2}$ in respective. In Section 3, we will show that $c^{*}_{WO1}=c^{*}_{WO2}$ under several conditions.

As mentioned, the information accessible to each event manager is limited and it is extremely expensive for each event manager to collect the latest situations of all other clubs in the entire network. We integrate this consideration of information scarcity into our modelling and restrict vertices’ exposures to two types of “local” information:

(Info-Type1)

A vertex can access to the information of his neighbors, including the current utility, the current color assignment and the weight in social welfare.
(Info-Type2)

A vertex can access to the information of the members holding the same color, including the current utility, the current color assignment and the weight in social welfare.

Despite its advantages in simulating realistic behaviours, such a VCG with restricted information may be associated with several vulnerabilities. For instance, scarce and unbalanced information can obstruct the convergence to an optimal solution to Problem $\mathsf{WO2}$ , and manager’s personal incentives may inconsilient with the global interest. We illuminate three threatening features of the VCGs as follows:

(Feature 1)

The agents are greedy: if an agent is offered an alternative color that improves his utility, he would have a high incentive to embrace it; otherwise, when confronted with a color he disdains, he would most times reject on the assignment.
(Feature 2)

The agents are uncomplacent: The agents are not shiftless but always active to try some other color, even if he is not aware of whether a better choice exists.
(Feature 3)

The agents are myopic. Since each agent is only informed of the current situation without presaging capabilities, he can hardly realize that a temporary sacrifice on the utility may result in better personal outcomes in later rounds.

In each case, the optimization procedure regarding Problem ( $\mathsf{WO2}$ ) would be affected or even undermined. For instance, the game is likely to fall stuck when any agent occupies a certain color regardless of his neighbor’s benefits, or constantly rejects on taking any color he dislikes. The search space thus severely shrinks and the optimal solution may become inattainable. Fortunately, we will show in later sections that, under mild conditions and some proper strategy, the convergence to an optimal or suboptimal solution for Problem ( $\mathsf{WO2}$ ) is immuned to the above features.

Throughout our following discussions, the games are most modeled by discrete-time Markov Chains (MCs) with carefully designed transition matrices. This is because of the stochastic nature embedded in our game-theoretic setting, and the fact that the G-VCG obeys the Markov property: evolution of the game is memoryless and would only be determined by the present state. The corresponding state space would be the space of all possible colorings. Techniques and methods like the first-step analysis and Markov Chain Monte Carlo (MCMC) would be employed for mathematical derivations and algorithm designs.

To summarize, the main contributions of this paper are as follows.

(1)

We extend the goal of finding a proper coloring (a feasibility problem) to finding some proper max-welfare coloring solution (a constrained optimization problem) through information restrictive VCGs as expatiated in Problem $\mathsf{WO1}$ and Problem $\mathsf{WO2}$ , which better caters for the goal for venue selection in event planning.
(2)

We investigate on a special class of G-VCGs where agents are indifferent about the color assigned but only concern possible clashes, namely the Identical Preference VCGs (IP-VCGs). As a continuation to the work led by Chaudhuri et al. (2008), we further propose and prove a stochastic upper bound of $O_{p}(\log n)$ for the convergence time to the optimal coloring, under mild assumptions.
(3)

Novel discussions are given on broad G-VCGs where an agent’s personal utility is contingent both on whether he is clash-free and the specific color assigned to him. Though a completely greedy policy no longer leads to optimality, we propose and show that a Metropolis-Hasting based policy that, while respecting agents’ feature of being greedy, help the self-organizing network move towards a optimal coloring in asynchronous settings and a special class of independently synchronous settings. Theories from the regular perturbed Markov Process (RPMP) are employed in proofs.
(4)

In a spirit of “robust coloring”, we further integrate into our formulation an extra term representing the expected loss from possible complementary edge connection, to increase the robustness of the optimal coloring in the sense firstly mentioned in yanez2003robust). To our knowledge, this is the first attempt to interact the objective of robust coloring with another optimization scheme such that the welfare and the risk are well balanced. A second-stage Metropolis-Hasting based algorithm taking the optimal solution to $\mathsf{WO1}$ as input is presented to solve RWO.

2 Literature Review

2.1 Network Coloring Game for Conflict Resolving

The concept of the self-organized VCG was informally proposed in a behavioural study reported in Kearns et al. (2006), which was motivated by a self-organizing venue assignment problem among the faculty members. With the same aim of resolving conflicts (i.e. achieving some proper coloring), experiments were conducted against different network topologies and with variations to the extent of limits on information sharing among the agents. The conflict-resolving status and time were well monitored and compared across networks with distinguished features. In Chaudhuri et al. (2008), such social interactions within a network was first formulated as a game with binary payoffs, and theoretical results were derived through ingenious combinatorial arguments. It was shown that, even the agents would adopt completely greedy strategies and are allowed to act simultaneously, a proper coloring is attainable when available colors are two more in number than the maximum degree $\Delta$ of the graph, and the procedure ends in $O(\frac{\log n}{\delta})$ rounds with probability $1-\delta$ . Almost at the same time, an independent work led by Panagopoulou and Spirakis (2008) studied a similar problem with a slightly different payoff scheme from a more game-theoretic perception. The authors showed that every Nash equilibrium of the corresponding VCG is feasible and locally optimal, and a characterization of the Nash equilibria was provided.

The two pioneering works attracted numerous attention to the field of decentralized coloring game. One branch of work focuses on customized results when the VCG is restricted to certain special graph structures (see, e.g., (Enemark, McCubbins, Paturi and Weller, 2011)). Another popular ramification is induced by the flexible game settings, both on game types ((Pelekis and Schauer, 2013; Carosi, Fioravanti, Gualà and Monaco, 2019)) and payoff features ((Kliemann, Sheykhdarabadi and Srivastav, 2015)) and employed strategies ((Hernández and Blum, 2012)). Of course, constant efforts have also been paid on improving the upper bounds for the convergence rate thus the complexity of the VCG (Bermond, Chaintreau, Ducoffe and Mazauric, 2019) or reduing the color supply (Fryganiotis, Papavassiliou and Pelekis, 2023). The idea of the VCG also shed light on many engineering problem-solvings; see, e.g. (Goonewardena, Akbari, Ajib and Elbiaze, 2014; Marden and Wierman, 2013; Touhiduzzaman, Hahn and Srivastava, 2018).

2.2 The Use of Markov Chains in Graph Coloring

A Markov chain is a stochastic process that models a sequence of random events in which the probability of each event depends only on the state of the system at the previous event. It has been widely used in graph coloring, which is NP-hard, due to its advantage in handling large-scale search in a systematic and efficient manner. One of the earliest Markov chain algorithms for graph coloring was proposed by Kirkpatrick, Gelatt Jr and Vecchi (1983) where the transition probability depends on the difference in the number of conflicting edges between the two colorings. The algorithm proved to be effective on small to medium-sized graphs had difficulty with larger graphs. In recent years, there have been several other Markov chain algorithms proposed for graph coloring, such as the Genetic Algorithm (Fleurent and Ferland, 1996; Hindi and Yampolskiy, 2012; Marappan and Sethumadhavan, 2013), Ant Colony Optimization (Salari and Eshghi, 2005; Dowsland and Thompson, 2008), and Particle Swarm Optimization (Cui, Qin, Liu, Wang, Zhang and Cao, 2008; Marappan and Sethumadhavan, 2021). Promising results have been shown on a wide range of graph coloring problems.

Markov chains have also been applied for color samplings. Aiming to approximately counting the number of the k-colorings in a graph, Jerrum (1995) converted this counting problem to estimating the mixing time of a Markov Chain. Inspired by the term of Glauber Dynamic in statistical physics, he presented an approach to ramdomly sampling colorings with maximum degree $\Delta$ in $O(n\log n)$ time with at least $2\Delta+1$ colors provided. Vigoda (2000) later proved that it suffices for the number of colors to be only $\frac{11}{6}\Delta$ for the $O(n\log n)$ bound. Further results were developed for some graphs with special features (see, e.g., (Hayes and Vigoda, 2006; Hayes, Vera and Vigoda, 2007)). Recently, Vigoda’s result on general graphs was improved by Chen, Delcourt, Moitra, Perarnau and Postle (2019) who proved that the chains are rapidly mixing in $O(n\log n)$ time when there are at least $(\frac{11}{6}-\epsilon_{0})\Delta$ colors where $\epsilon_{0}$ is a small positive constant, using the linear programming approach.

2.3 The Metropolis-Hasting Algorithm for Optimization

The Metropolis-Hastings algorithm is a versatile algorithm for solving complex optimization problems that was first introduced in a seminal paper by Metropolis et al. (1953) and further modified by Hastings (1970) who included a correction factor to ensure detailed balance. It relies on constructing a Markov chain that has the desired target distribution as its stationary distribution, from which it generates a sequence of samples. The algorithm has since been adapted for engineering applications emerging in various fields, including but not limited to signal processing (Luengo and Martino, 2013; Vu, Vo and Evans, 2014; Marnissi, Chouzenoux, Benazza-Benyahia and Pesquet, 2020), self-reconfiguration systems (Pickem, Egerstedt and Shamma, 2015), task allocation (Hamza, Toonsi and Shamma, 2021; Moayedikia, Ghaderi and Yeoh, 2020), etc.

A snapshot of the algorithm is as follows: In a finite-state space $S$ , denote the target distribution by $\pi(X)$ where $X$ is the variable ot interest. As an initialization one constructs a starting state $x_{0}$ and an irreducible proposal transition matrix $Q(x^{\prime}|x)$ to generate a new state $x^{\prime}$ from $x$ . The transition probability is then multiplied by an acceptance rate

\displaystyle\alpha(x\rightarrow x^{\prime})=\min\{1,\frac{\pi(x^{\prime})Q(x|x^{\prime})}{\pi(x)Q(x^{\prime}|x)}\}

and the updated transition probability, namely the target transition probability, is

	$\displaystyle P(x^{\prime}\|x)$	$\displaystyle=Q(x^{\prime}\|x)\alpha(x\rightarrow x^{\prime}),\quad x^{\prime}\neq x;$
	$\displaystyle P(x\|x)$	$\displaystyle=1-\sum_{x^{\prime}\neq x}P(x^{\prime}\|x).$

Note that the target transition probability satisfies the local balanced equation

\displaystyle\pi(x)P(x^{\prime}|x)=\pi(x^{\prime})P(x|x^{\prime})

thus the target distribution o $\pi(X)$ is exactly a stationary distribution of the Markov process induced by $P$ . The uniqueness of the stationary distribution is given by irreducibility of the chain.

3 Game Formulation

3.1 Setting

Let us first introduce the notations and terminologies that are used throughout this paper. Notations of the graph structure and the color collection follow from Section 1, and the maximum degree of the graph $G$ is denoted by $\delta(G)$ . We require $m\geq\mathcal{X}(G)$ where $\mathcal{X}(G)$ is the chromatic number of G; i.e. the least number of colors that enables a proper coloring. It is denoted by $V_{j}\rightarrow V_{i}$ if the information of $V_{j}$ can be shared to $V_{i}$ , and $V_{i}$ ’s family is defined as the set $\mathcal{F}_{i}:=\{V_{j}:V_{j}\rightarrow V_{i}\}$ . Let $c^{k}\in C$ (k = 1, 2, …, $|C|$ ) be the coloring realizations. By adding a subscript $i\in\{1,2,\dots,n\}$ we specify the particular color assigned to $V_{i}$ ; e.g. $c^{k}_{i}$ specifies the assignment under the coloring $c^{k}$ . When there is a change in coloring, say from $c^{k}$ to $c^{l}$ , let $\Delta u_{i}(c^{k}\rightarrow c^{l}):=u_{i}(c^{l})-u_{i}(c^{k})$ and $\Delta\Phi(c^{k}\rightarrow c^{l}):=\Phi(c^{l})-\Phi(c^{k})$ denote the difference in $V_{i}$ ’s utility and the welfare respectively under the two colorings. To emphasize on the color choice of a particular agent, we sometimes specify $u_{i}(c)$ as $u_{i}(c_{i},c_{-i})$ .

We say an agent is active in round $t$ if he wishes to update his color choice. Starting from any coloring $c(0)\in C$ , the game continues in discrete rounds. Let $c_{i}(t)$ denote the color assigned to $V_{i}$ in round $t$ and $\mathcal{A}(t)$ denote the active set containing all active agents in round $t$ . In each round, every active agent, driven by either (Feature 1) or (Feature 2), would attempt to update his color from certain available color set (or stategy set) $\mathcal{C}_{i}$ under some policy, while the inactive ones keep their colors unchanged. This induces a discrete-time Markov Chain on the coloring space $C$ with its transition matrix determined by the policy. A formal definition of the Generalized Vertex Coloring Game (G-VCG) is given as follows:

Definition 1 (G-VCG)

A G-VCG is a quadruplet $\mathcal{G}=(V,G,\mathcal{C},u)$ where:

–

$V=\{V_{1},V_{2},\dots,V_{n}\}$ : the set of agents (vertices).
–

$G(V,E)$ : the connected network constraining agents’ information, which is apriori to agents.
–

$\mathcal{C}_{i}\ni c_{i}$ : the set of available colors (pure strategies) for $V_{i}$ .
–

$\mathcal{C}=\prod_{i}\mathcal{C}_{i}$ : the set of coloring profiles.
–

$u=(u_{1},u_{2},\dots,u_{n})$ : the set of utility functions.

In the following sections, there will be discussions and experiments of G-VCGs on both asynchronous and synchronous conditions. A discrete game is said to be asynchronous if at most one agent is active in each round, while synchrony allows multiple active agents to update in the same round. In this paper, two particular types of synchrony, namely independent synchrony and complete synchrony, will be discussed.

Definition 2 (Independent and completely synchronous G-VCGs)

A G-VCG $\mathcal{G}$ is said to be independently synchronous if in each round $t$ of a G-VCG $\mathcal{G}$ , the probability that $V_{i}$ becomes active, denoted by $\omega_{t}\in(0,1]$ , is identical across $\forall V_{i}\in V$ . A completely synchronous game has all agents be active in each round; i.e. $\omega_{t}=1,\forall t\in\mathbb{N}$ .

3.2 Essential Assumptions

In this part of the section, we make several essential assumptions, most of which are to be assumed thoughout the paper unless mentioned otherwise.

Focusing on maximizing the total utility of the network with little regard to the utility distribution, we now make an additional assumption that the welfare function is utilitarian or weighted utilitarian where the weights are priori information to the agents. In later sections we will see that such welfare formulas would enable individual agents to evaluate their contributions to the total welfare by selecting a new color, even without refering to the entire network.

Assumption 1 (Utilitarian/Weighted-Utilitarian Welfare)

The welfare function $\Phi:C\rightarrow\mathbb{R}$ is defined by

\displaystyle\Phi(c)=\sum_{V_{i}\in V}w_{i}u_{i}(c)

where $w_{i}\geq 0$ and $\sum_{i}w_{i}=1$ .

As mentioned in Section 1, our main concern is whether the optimal solutions of ( $\mathsf{WO2}$ ) coincide with the ones of ( $\mathsf{WO1}$ ). It is straitforward that if $c^{*}_{WO2}$ is a proper coloring, then $c^{*}_{WO1}=c^{*}_{WO2}$ ; otherwise $c^{*}_{WO1}$ would be a better solution for ( $\mathsf{WO2}$ ). Yet, whether $c^{*}_{WO2}$ is proper highly depends on the abundance of colors provided. We make another assumption on the relationship between the caridinality of the color set and the maximum degree of the graph.

Assumption 2 (Abundant Colors)

The number of colors provided is at least one more than the maximum degree of the graph; i.e.

\displaystyle|M|\geq\delta+1.

Assumption 2 guarantees $c^{*}_{WO2}$ to be proper because each vertex $V_{i}$ always has a color that is distinguished from the neighborhood; i.e.

\displaystyle\forall c_{-i}\in\prod_{j\neq i}\mathcal{C}_{j},~{}\exists c_{i}\in\mathcal{C}_{i}\text{ such that }u_{i}(c_{i},c_{-i})>0.

(2)

To show that Assumption 2 is the weakest to ensure the properness of $C^{*}_{WO2}$ in whatever graph structures, we would like to give a counterexample when cases with $|M|=\delta$ thus $|M|\geq\delta$ may fail.

Example 1 ( $|M|\geq\delta$ is insufficient)

Consider the network structure in Figure 2 where each vertex has equal weight in contribution to the social welfare. Suppose $M=(Red,Green,Blue)$ so that $|M|=\delta=3$ . The preferences of each vertex on each color are given in Table 1. One can easily observe that the optimal coloring $c^{*}_{WO2}$ is as in Figure 2 where $V_{1}$ , $V_{2}$ and $V_{4}$ all obtain their top-preferred colors, and the optimal welfare is

\displaystyle\Phi(c^{*}_{WO2})=\frac{1}{4}(0\times 1+0\times 1+1\times 10+1\times 10)=5.

In this case, $V_{1}$ and $V_{2}$ compromises to bear in a clash whatever color he selects, therefore $c^{*}_{WO2}$ is not proper.

Figure 1: The network corresponding to Example 1 where

|M|=\delta

fails

Figure 2: The optimal coloring

c^{*}_{WO2}

for Example 1

	$V_{1}$	$V_{2}$	$V_{3}$	$V_{4}$
Red	1	1	1	1
Green	1	1	10	1
Blue	1	1	1	10

Table 1: Preferences on colors for Example 1

Moreover, special attention should be given to the step when the active agents modify their color choices. If not well regulated, the game would easily become vulnerable and two typical jeopardies include:

(i)

Small search space: Due to (Feature 1), every agent would incline to top-preferred colors and is loath to adopt other ones. This would severely reduce the search space of the process thus the optimal coloring is often attainable.
(ii)

Deadlock: Due to (Feature 3) and (Feature 1), an agent is not aware that sacrifice would bring “win-win” outcomes for both himself and the neighbors later. This would result in deadlocks in network, especially when the preference systems differ across the neighbors, when imprudent agents pursues temporary self-seeking colors while impairing the neighborhood’s progress.

In reality, one common approach to avoid these emotional behaviors is to involve in randomness (e.g. by lottery). We make an analogical assumption here yet still respect agents’ acceptance or rejection on the new color.

Assumption 3 (Random Color Update with Policy Acceptance)

We assume the following rules during the color updates of active agents in each round:

(R1)

An active agent, say $V_{i}$ , would first randomly select a color $c_{new}$ from a color set $\mathcal{C}_{i}$ . Let $\mathcal{C}_{i}=M$ in each round unless mentioned otherwise.
(R2)

After given $c_{new}$ , $V_{i}$ is given a transient probation period, during which he is only aware of $u_{i}(c_{new},c_{-i}(t))$ and possible influences to his family members. Utilities under other unchosen colors are not accessible.
(R3)

Based on the possible utility, a decision of acceptance (update the new color) or rejection (keep the old color) would be made by $V_{i}$ under some policy.

When making a decision, the latest information that an active agent can refer to is the current situations of the family members (which are obtained in the last round); i.e. the active agent $V_{i}$ assumes $c_{-i}(t+1)=c_{-i}(t)$ when deciding whether to accept the offer. The completely greedy policy is defined as follows:

Definition 3 (Completely Greedy Policy)

A completely greedy policy accepts whatever colors leading to better utility and rejects any color implying a worse or identically bad transcient utility; i.e. given $V_{i}$ active in round $t$ and a new color $c_{new}$ ,

\displaystyle c_{i}(t+1)=\begin{cases}c_{new}&\textit{if }u_{i}(c_{new},c_{-i}(t))>u_{i}(c_{i}(t),c_{-i}(t))\\ c_{i}(t)&\textit{if }u_{i}(c_{new},c_{-i}(t))\leq u_{i}(c_{i}(t),c_{-i}(t)).\end{cases}

(CGP)

Here is a brief summary on the game procedure:

(i)

Before round 0, an initial coloring $c(0)$ is assigned to the network $G$ . Each agent $V_{i}$ calculates his utility by evaluating $c_{i}(0)$ and detect clashes in his family.
(ii)

In round $t\geq 0$ , each element of $\mathcal{A}(t)$ pursues a new color by randomly sampling a color $j\in M$ . The inactive agents remain their colors in the previous rounds; i.e. $c_{l}(t+1)=c_{l}(t)$ for $V_{l}\notin\mathcal{A}(t)$ .
(iii)

Each $V_{k}\in\mathcal{A}(t)$ decides whether to accept $j$ under certain acceptance policy. If $j$ is accepted, $c_{k}(t+1)=j$ ; otherwise $c_{k}(t+1)=c_{k}(t)$ . The round t is finished and the game steps to round $t+1$ .

4 Identical Preference VCGs

In this section, we focus on a particular category of the G-VCGs called Identical Preference VCGs (IP-VCGs), whose definition is given as below.

Definition 4 (IP-VCGs)

A G-VCG is an IP-VCG if each vertex lays identical preference on every color; i.e.

\displaystyle\phi_{i}(m)=\phi_{i},~{}\forall m\in M

where $\phi_{i}$ is a constant. In other words, for each $V_{i}\in V$ , his utility function $u_{i}$ is indifferent about the specific color assigned. In reality, such situations are frequently witnessed when event managers are in a rush such that their goal is to confirm a feasible venue regardless of other preferences.

Note that the social welfare of an IP-VCG is

\displaystyle\begin{split}\Phi(c)&=\sum_{i=1}^{n}w_{i}\phi_{i}(c_{i})\mathds{1}_{c_{i}\neq c_{j},\forall j\in\mathcal{L}_{ij}}\\ &\leq\sum_{i=1}^{n}w_{i}\phi_{i}\end{split}

where the equality holds if and only if $\mathds{1}_{c_{i}\neq c_{j},\forall j\in\mathcal{L}_{ij}}=1$ for $\forall i\in\{1,2,\dots,n\}$ . Therefore, solving Problem ( $\mathsf{WO2}$ ) for IP-VCGs is equivalent to finding a proper coloring which recovers the binary-payoff setting in previous works (see, e.g. Kearns et al. (2006); Chaudhuri et al. (2008)). Thus, in the rest of this section, we restrict our attention to the simplified problem with utility

\displaystyle u_{i}(c)=\mathds{1}_{c_{i}\neq c_{j},\forall j\in\mathcal{L}_{ij}},i=\{1,2,\dots,n\}.

(3)

4.1 Convergence of Completely Synchronous IP-VCGs to Optimality under the CGP

We examine in this subsection whether the convergence of an IP-VCG to its optimal coloring under CGP which is often driven by (Feature 1). As mentioned, it suffices to answer this question: can a proper coloring be achieved in a game with binary utilities (3) under CGP? It is obvious that an active agent with zero utility would accept any color unused by his neighbor to avoid clashes, and those agents with utility one, though can still be active, would never accept any new color because they have already attained their best personal utilities. Under our Assumption 2, it is straightforward that asynchronous IP-VCGs will converge to an optimal coloring in finite rounds since there are always extra colors different from all neighbors’ for each agent to select, and the probability of such a selection is positive. In general, analysis on an arbitrary synchronous G-VCG could be quixotic for tremendous randomness and the optimality may not be guaranteed. However, we observe that the extreme setting of complete synchrony brings peculiar properties which are worth an extra attention. As such, in the rest of the section, we concentrate our arguments on the completely synchronous cases.

We would like to point out that a similar problem was discussed in the seminal work Chaudhuri et al. (2008) which also concerned about the binary utility VCG. The main difference of their setting is that, unlike in Assumption 3 where we assume $\mathcal{C}_{i}=M$ , they restricted the available color set for each agent to be the colors currently unused by his neighbors; i.e. $\mathcal{C}_{i}(t)=M\backslash\{c_{j}(t):V_{j}\in\mathcal{N}(V_{i})\}$ . Though could be more efficient, their setting suffers from a severely reduced search space because state transfers are often only allowed in one direction, which impairs possible generalizations. Besides, their setting requires a stronger assumption than our Assumption 2 that the number of colors must be at least two more than the maximum degree. Notice that the counterexample they gave on a failure of convergence when $m=\delta(G)+1$ , however, can be solved in our setting for better color availability. We place their example here for the sake of completeness and better understanding.

Example 2 (Theorem 2 in Chaudhuri et al. (2008))

Consider $G$ being a cycle with five vertices $V=\{v_{1},v_{2},v_{3},v_{4},v_{5}\}$ thus $\delta(G)=2$ . Given 3 colors, say $R$ , $G$ and $B$ , and suppose that the initial configuration is $(R,G,B,B,G)$ . In the scenario when all agents with zero utility are active and obtain colors unused by neighbors, $v_{3}$ and $v_{4}$ will always be active yet keep clashing with each other, thus a proper coloring would never be reached. Nevertheless, under our setting and Assumption 3, this conflict can be easily resolved; e.g. when $v_{3}$ selects the $R$ while $v_{4}$ does not, the configuration becomes $(R,G,R,B,G)$ .

Before stepping further, we first introduce a special type of Markov Chain, namely Absorbing Markov Chain (AMC), which will be our main tool of proving the convergence.

Definition 5 (Absorbing Markov Chain)

Given a Markov Chain $(Z_{t})_{t\in\mathbb{N}}$ , a state $i$ is called absorbing if $\Pr[Z_{t+k}=i\mid Z_{t}=i]=1$ , $\forall k\in\mathbb{N}$ , otherwise is called transient. The chain $(Z_{t})_{t\in\mathbb{N}}$ is defined to be an absorbing Markov Chain (AMC) if there exists at least one absorbing state that is accessible from any transient state.

In the context of a 0-1 VCG, which is a simplification of IP-VCGs, one may use a list with binary digits to represent the utility of players after each round. Such utility lists have a length of $n$ with each element representing the utility of a vertex, and there are $2^{n}$ possible outcomes. Additionally, since it is impossible for all but one players to have utility equal to 1, the total number of possible cases is reduced by $n$ . Let $S$ denote the utility space consisting of these $2^{n}-n$ utility lists on which a Markov Chain $(X_{t})_{t\in\mathbb{N}}$ can be run. The transition probability reflects the likelihood of a change in utility. The following Lemma 1 states that $(X_{t})_{t\in\mathbb{N}}$ is an AMC.

Lemma 1 (AMC on Utility Space)

Let Assumptions 1, 2 and 3 hold. Consider a completely synchronous IP-VCG under CGP with utility functions 3. Given the finite utility space $S$ containing all binary utility lists $\{L^{1},L^{2},\dots,L^{2^{n}-n}\}$ , a discrete-time Markov Chain $(X_{t})_{t\in\mathbb{N}}$ is established with transition matrix $P$ whose elements are defined as $P_{kl}=\Pr[X_{t+1}=L^{l}|X_{t}=L^{k}]$ . Then $(X_{t})_{t\in\mathbb{N}}$ is an AMC.

Proof of Lemma 1 1

See Appendix B.1.

Remark 2

Note that $(X_{t})_{t\in\mathbb{N}}$ always jumps from a low-welfare state to a high-welfare state but never in the reverse direction, because agents with utility one would never accept a new color thus remains his utility forever under the completely greedy policy.

A classical property of an AMC is that it will be absorbed eventually, which means the convergence to a proper coloring in our case.

Theorem 2 (Convergence to Optimality)

Let Assumptions 1, 2 and 3 hold. Any completely synchronous IP-VCG under CGP converges to an optimal coloring $c^{*}_{WO2}$ .

Proof of Theorem 2 2

See Appendix B.1.

4.2 Stochastic Boundedness on the Time to Convergence of Completely Synchronous IP-VCGs

Given an optimal convergence of IP-VCGs, we are interested in the time to convergence with complexity measurements. In their setting of $\mathcal{C}_{i}(t)=M\backslash\{c_{j}(t):V_{j}\in\mathcal{N}(V_{i})\}$ , Chaudhuri et al. (2008) proposed and proved that the game ends in $O(\log\frac{n}{\epsilon})$ rounds with probability at least $1-\epsilon$ when $m\geq\delta+2$ . They took advantage of an important property that any zero-utility agent could earn a utility rise in two consecutive rounds with a probability above some positive constant. It supprisingly proves that this property extends to the completely synchronous 0-1 IP-VCGs thus to all IP-VCGs under CGP. This is given as Theorem 3 below and the proof is an anlogue to Lemma 3 in Chaudhuri et al. (2008) with subtle adjustments.

Theorem 3 (Individual Utility Rise in Two Consecutive Rounds)

Let Assumptions 1, 2 and 3 hold. Consider a completely synchronous IP-VCG under CGP with utility functions 3. Suppose $V_{i}\in V$ has a clash in round $t$ thus $u_{i}(t)=0$ , then with at least a constant probability, he would become clash-free and obtain a positive utility after two rounds; i.e.

\displaystyle\Pr[u_{i}(t+2)>0|u_{i}(t)=0]\geq\mathbf{L},\quad\forall t\in\mathbb{N}

where $\mathbf{L}$ is a positive constant number.

Proof of Theorem 3 3

See Appendix B.2.

The constant lower bound, though relatively small, is useful because it provides some deterministic information in a stochastic self-organizing game scenario. More importantly in our discussions, the constant probability for a utility rise in two consecutive rounds sheds light on the behavior of the Markov Chain as defined in Theorem 1, besides the upshot of being absorbed. The following lemma demonstrates respective bounds on the expectation and variance of the number of rounds to convergence.

Lemma 4 (Bounded Expectation and Variance for Time to Convergence)

Let Assumptions 1, 2 and 3 hold. Consider a completely synchronous IP-VCG $\mathcal{G}$ under CGP associated with a Markov Chain $(X_{t})_{t\in\mathbb{N}}$ as defined in Theorem 1. Then $\mathcal{G}$ is expected to converge to an optimal coloring to ( $\mathsf{WO2}$ ), $c^{*}_{WO2}$ , in $\mathcal{O}\log n$ rounds; in addition, the variance of the time to convergence is $\mathcal{O}(\log n)^{2}$ .

Proof of Lemma 4 4

See Appendix B.2. Here is a sketch for our proof: Again we focus on the representative candidate of IP-VCGs with binary utility functions 3. To most utilize the information given in Theorem 3, we generate $(X_{t})_{t\in\mathbb{N}}$ to jump two steps at a time and then double the number of steps. We also pay special attention to the first migration of $(X_{t})_{t\in\mathbb{N}}$ and conduct the so-called “first-step analysis” to derive out certain induction relationship on the expected step between different initial states.

We are now ready to state the main theorem in this section, in which we show the stochastic boundedness of $T$ , the time (i.e. number of steps) to convergence.

Theorem 5 (Stochastic Boundedness of the Time to Convergence)

Let Assumptions 1, 2 and 3 hold. Consider a completely synchronous IP-VCG $\mathcal{G}(V,G,\mathcal{C},u)$ under CGP associated with a Markov Chain $(X_{t})_{t\in\mathbb{N}}$ as defined in Theorem 1. Let $T^{(n)}$ denote the number of steps needed for $\mathcal{G}$ to converge in an optimal coloring $c^{*}_{WO2}$ . Then $T^{(n)}$ is $\mathcal{O}_{p}(\log n)$ ; i.e. $\frac{T^{(n)}}{\log n}$ is bounded in probability as

\displaystyle\forall\epsilon>0,\exists\mathbf{M}_{\epsilon}\in\mathbb{R}_{+},\mathbf{N}_{\epsilon}\in\mathbb{N}_{+}\rightarrow\Pr[|\frac{T^{(n)}}{\log n}|>\mathbf{M}_{\epsilon}]<\epsilon,\forall n>N_{\epsilon}.

Moreover, $\mathbf{N}_{\epsilon}$ can be taken as a constant independent of $\epsilon$ when $n$ is fixed.

Proof of Theorem 5 5

See Appendix B.2.

5 A Metropolis-Hasting Based Optimal Policy for G-VCGs

In this section, we wish to examine the likelihood for the self-organizing G-VCGs in more general settings to attain its optimal coloring, possibly with some carefully-designed acceptance policy. A rudimentary scrutiny would reveal that the completely greedy policy CGP would no longer guarantee a convergence to optimality as a consequence of (Feature 1) and (Feature 3). Below is a simple example.

Example 3

Consider the graph in Figure 3 and the color preference matrix in Table 2. Assumption 2 would be satisfied when three colors are provided, namely $R,G,B$ . Given the initial color assignment $c(0):=(c_{1}(0),c_{2}(0),c_{3}(0))$ to be $(R,G,B)$ . The optimal coloring would be $(G,B,G)$ . However, under CGP, $V_{2}$ would never accept another color because $u_{2}(G,c_{-2})>u_{2}(c_{2},c_{-2})$ for $\forall c_{2}\neq G$ . Consequently, $V_{1}$ and $V_{3}$ would never accept $G$ when active because, for example, $u_{1}(c_{1},G,c_{3})>u_{1}(G,G,c_{3})$ for $\forall c_{1}\neq G$ . The game thus never converges to its optimality.

Figure 3: The graph in Example 3

	$V_{1}$	$V_{2}$	$V_{3}$
Red	1	1	1
Green	10	10	10
Blue	1	2	1

Table 2: The preference matrix in Example 3

Therefore, it is necessary for the agent temporarily with high utility to be a bit more “altruistic” to avoid such standstills.

Among literatures on solving the agent-based evolutionary optimization procedure, a similar network setting was witnessed in Marden and Shamma (2012) where a policy based on “log-linear learning” was evaluated. Nevertheless, this approach does not apply to G-VCGs because they assumed

(i)

The game is potential; i.e. $\Delta u_{i}=\Delta\Phi$ . This is not true in our setting as a modification on an agent’s choice would bring not only a difference to his utility but also to his neighbors’.
(ii)

For any active agent, the utility for all choices must be exhausted to make a decision. This violates (R2) in our Assumption 3 and more space is required to store the latest utilities.

Alternatively, inspired by the Metropolis-Hasting algorithm as reviewed in 2.3, we propose another delicate policy adpative to our setting of G-VCGs with sound converging behaviours. We will call it MH-Policy and details are given in Section 5.2. We would like to mention that the spirit has been informally applied to task allocation of robots in Hamza et al. (2021). Yet, there lacked a mathematical discussion, from agents’ standpoints, on whether the policy violates agents’ incentives (e.g. (Feature 1) and (Feature 3)); otherwise the self-organizing agents woudl be reluctant to follow the rule. The gap is then filled in this section. Note that instead of the explicit stationary distribution, our focus is on the support of its stationary distribution, namely the stochastically stable states.

5.1 Detour: Regular Perturbed Markov Processes and Resistance Tree

It is acknowledged that, despite the mentioned limitations, we still gain parts of our insights from the work led by Marden and Shamma (2012). Specifically, this detour section introduces some preliminary terminologies and results from Young (1993) that will be invoked in the proofs of our later results. We first give the definition of a regular perturbed Markov process (RPMP).

Definition 6 (Regular Perturbed Markov Process)

Let $(Z_{t}^{0})_{t\in\mathbb{N}}$ be a finite-state Markov chain over the finite-state space $S$ with a transition matrix $P^{0}$ . We refer to it as an unperturbed process. Consider a perturbed process $(Z_{t}^{\epsilon})_{t\in\mathbb{N}}$ such that the extent of perturbation can be indexed by a scalar $\epsilon>0$ , and let $P^{\epsilon}$ be the associated transition matrix. Then $(Z_{t}^{\epsilon})_{t\in\mathbb{N}}$ is defined to be a regular perturbed Markov process (RPMP) if the following conditions are satisfied:

(i)

$(Z_{t}^{\epsilon})_{t\in\mathbb{N}}$ is irreducible, $\forall\epsilon>0$ .
(ii)

$\lim_{\epsilon\rightarrow 0^{+}}P_{s\rightarrow s^{\prime}}^{\epsilon}=P_{s\rightarrow s^{\prime}}^{0}$ , $\forall s,s^{\prime}\in S$ .
(iii)

$P_{s\rightarrow s^{\prime}}^{\epsilon}>0$ for some $\epsilon>0$ $\Rightarrow$ $0<\lim_{\epsilon\rightarrow 0^{+}}\frac{P_{s\rightarrow s^{\prime}}^{\epsilon}}{\epsilon^{R(s\rightarrow s^{\prime})}}<\infty$ .

where $R(s\rightarrow s^{\prime})$ in (iii) is some nonnegative real number and is referred to as the resistance of the transition $s\rightarrow s^{\prime}$ .

Remark 3

We would like to remark that since irreducible, a RPMP on a finite-state space has a unique stationary distribution, namely $\pi^{\epsilon}$ , which is uniquely determined by the balance equation $\pi=\pi P$ . See, e.g., Rosenthal (2006) for details.

Another important concept we would like to review here is the construction of a resistance tree with some associated stochastic potential.

Definition 7 (Resistance Tree and Stochastic Potential)

Given the state space $S$ in Definition 6, we construct a complete graph (i.e. a graph with an edge between any pair of its vertices) with $|S|$ vertices, each denoted as $s_{i}$ for $i=1,2,\dots,|S|$ . For any directed edge $(s_{i},s_{j})$ , we assign on it a weight of $\rho_{ij}=R(s_{i}\rightarrow s_{j})$ where $R(\cdot)$ is the corresponding resistance. A directed path $\mathcal{P}$ with length $|\mathcal{P}|$ is a sequence of joint directed edges $\mathcal{P}=\{s^{0}\rightarrow s^{1}\rightarrow\cdots,\rightarrow s^{|\mathcal{P}|}\}$ and the restistence of the path $\mathcal{P}$ is defined as $R(\mathcal{P}):=\sum_{k=1}^{|\mathcal{P}|}R(s^{k-1}\rightarrow s^{k})$ . Let $\mathcal{T}_{i}$ be a tree containing all $|S|$ vertices and rooted at the vertex $s_{i}$ , such that there exists a unique directed path $\mathcal{P}_{j\rightarrow i}$ from any $s_{j}\neq s_{i}$ . The resistance of $\mathcal{T}_{i}$ is defined as

\displaystyle R(\mathcal{T}_{i}):=\sum_{\{s\rightarrow s^{\prime}\}\in\mathcal{T}_{i}}R(s\rightarrow s^{\prime}).

Among $\mathcal{T}_{i}$ for $\forall i=1,2,\dots,|S|$ , there exist $\mathcal{T}_{i}^{*}$ (s) having minimum resistance $\gamma_{i}=\min_{\mathcal{T}_{i}}R(\mathcal{T}_{i})$ . $\gamma_{i}$ is then referred to as stochastic potential of state $s_{i}$ .

The above terminologies will play key roles in our later proofs, via a powerful theorem lemma from Young (1993).

Theorem 6 (Characterization for Stochastically Stable States)

Consider an RPMP $(Z_{t}^{\epsilon})_{t\in\mathbb{N}}$ with a transition matrix $P^{\epsilon}$ defined on the state space $S$ . The stochastically stable states are exactly $\{s_{i}\in S:\gamma_{i}=\min_{j\in\{1,2,\dots,|S|\}}\gamma_{j}\}$ .

Proof of Theorem 6 6

See Young (1993) Lemma 1.

5.2 The MH-Policy in Asynchronous G-VCGs

We start from the easiest setting where agents become active and update their colors one by one. A Markov chain $(Y_{t})_{t\in\mathbb{N}}$ is established on the coloring space $C$ with each coloring as a state. The elements of the transition matrix correspond to the probability of a direct transition between the two states. Our aim is to adjust the transition probabilities of the chain (i.e. the acceptance probability of an active agent) so that stochastically stable states only contain $c^{*}_{WO2}$ ’s.

We would like to explain on some possible concerns regarding the policy before stepping towards the details in the MH-policy. In the policy, one may question that (Info-Type2) seems useless for individual decision-making. Indeed, without a loss of generality, we would restrict our consideration only on (Info-Type1) (i.e. assume (Info-Type2) is not inaccessible) for the sake of simplicity in the rest of this section, which equalizes the neighborhood $\mathcal{N}(V_{i})$ with the family $\mathcal{F}_{i}$ . However, we are to show in Section 6 on why (Info-Type2) is important for some special purposes of the G-VCGs and how it can be tractably integrated in our formulations.

The proposed MH-policy is now presented. In order to maximize the objective value in $\mathsf{WO2}$ , as in the Metropolis-Hasting algorithm, we wish the acceptance probability for each active agent given the selected color $c_{new}$ to be

\displaystyle\alpha[(c_{i}(t),c_{-i}(t))\rightarrow(c_{new},c_{-i}(t))]=\min\{1,e^{\frac{1}{\tau}\Phi(c_{new},c_{-i}(t))-\Phi(c_{i}(t),c_{-i}(t))}\}

(4)

to satisfy the local blanced equation for the stationary distribution $\pi$ , where

\displaystyle\pi(c)=\frac{e^{\frac{1}{\tau}\Phi(c)}}{\sum_{\tilde{c}\in C}e^{\frac{1}{\tau}\Phi(\tilde{c})}}

is a Boltzmann distribution (McQuarrie, 2000) and $\tau$ is some temperature parameter. Here we omit the terms of the proposal probability because all proposal probabilities equal in a randomized selection. Notice that, as one decreases the temperature $\tau$ , the target distribution only puts weights on the state(s) where the social welfare $\Phi(\cdot)$ is maximized.

Input:

G(V,E)

M

c(0)

T

\tau_{0}

Output:

c^{*}_{WO2}

(equivalently,

c^{*}_{WO1}

)

1 Initialize

c\leftarrow c(0)

t\leftarrow 0

\tau(0)\leftarrow\tau_{0}

2 while $t<T$ do

3 Activate

V_{i}\in V

randomly.

4 Sample a color

c_{new}

to be

c_{i}(t+1)

from the available color set

\mathcal{C}_{i}=M

randomly.

c_{-i}(t+1)=c_{-i}(t)

6 Calculate the acceptance ratio

\alpha(t)=\min\{1,e^{\frac{1}{\tau(t)}\sum_{j\in\mathcal{F}_{i}\cup\{V_{i}\}}w_{j}\Delta u_{j}((c_{i}(t),c_{-i}(t))\rightarrow(c_{new},c_{-i}(t))}\}

7 Generate a number

K\in[0,1]

randomly.

8 if $K<\alpha(t)$ then

c_{i}(t+1)=c_{new}

else

c_{i}(t+1)=c_{i}(t)

t\leftarrow t+1

12 Reduce

\tau(t-1)

\tau(t)

under some scheme.

c^{*}_{WO2}\leftarrow c(T)=(c_{1}(T),c_{2}(T),\dots,c_{n}(T))

14 return

c^{*}_{WO2}

Algorithm 1 Realization of the MH-policy in asynchronous G-VCGs

The welfare difference in the acceptance ratio (4) cannot be directly reflected from individual’s utility difference as a modification on an agent’s choice would likely affect his neighbors’ utilities as well. Yet, we notice that, a clever way for an active agent $V_{i}$ to keep aware of the total difference in the entire network is to monitor the imminent local utility difference across his family $\mathcal{F}_{i}$ before he accepts or rejects. This is because an agent choice can affect the utility of nobody other than his neighbors $V_{j}\in\mathcal{N}(V_{i})$ , and we have unified $\mathcal{F}_{i}$ and $\mathcal{N}(V_{i})$ to be equivalent. Formally, when $V_{i}$ is the unique active agent in the game, it holds that

\displaystyle\Delta\Phi(c^{k}\rightarrow c^{l})=\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})

(5)

where $c^{k}=(c_{i}(t),c_{-i}(t))$ and $c^{l}=(c_{new},c_{-i}(t))$ , and the term on the righth-and side is defined as

\displaystyle\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l}):=\sum_{j\in\mathcal{F}_{i}\cup\{V_{i}\}}w_{j}\Delta u_{j}(c^{k}\rightarrow c^{l}).

(6)

Using the terminology in a foundational work Monderer and Shapley (1996), there exists a potential relationship between $\Phi(\cdot)$ and $U_{\mathcal{F}_{i}\cup\{V_{i}\}}(\cdot)$ . The procedure of realizing the MH-policy is summarized in the pseudo-code Algorithm 1 in the asynchronous version.

Before moving on to prove the optimality of Algorithm 1, we would like to demonstrate why the MH-policy can be appealing to agents; in other words, it respects the human nature of (Feature 1) and (Feature 3). This is because when updating under the MH-policy:

(i)

An active agent $V_{i}$ will accept $c_{new}$ whenever his own utility $u_{i}(c_{new},c_{-i}(t))$ increases: if $u_{i}(c_{i}(t),c_{-i}(t))=0$ and $u_{i}(c_{new},c_{-i}(t))>0$ then $u_{i}(c_{new},c_{-i}(t))-u_{i}(c_{i}(t),c_{-i}(t))<\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})$ since $V_{i}$ ’s zero-utility neighbors may also have the clash resolved; else, if $u_{i}(c_{i}(t),c_{-i}(t))>0$ and $u_{i}(c_{new},c_{-i}(t))>u_{i}(c_{i}(t),c_{-i}(t))$ , then $u_{i}(c_{new},c_{-i}(t))-u_{i}(c_{i}(t),c_{-i}(t))=\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})$ as the neighbors’ utilities are unaffected. In both cases, $e^{\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})}>1$ .
(ii)

An active agent $V_{i}$ is not obliged to give up on his old color whenever his own utility $u_{i}(c_{new},c_{-i}(t))$ decreases: if $u_{i}(c_{i}(t),c_{-i}(t))>0$ and $u_{i}(c_{new},c_{-i}(t))=0$ then $u_{i}(c_{new},c_{-i}(t))-u_{i}(c_{i}(t),c_{-i}(t))>\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})$ since $c_{new}$ induces color clashes between $V_{i}$ and some neighbors whose utility would immediately drop to zero; else, if $u_{i}(c_{i}(t),c_{-i}(t))>0$ and $u_{i}(c_{new},c_{-i}(t))>0$ , then $u_{i}(c_{new},c_{-i}(t))-u_{i}(c_{i}(t),c_{-i}(t))=\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})$ as the neighbors’ utilities are unaffected. In both cases, $e^{\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})}<1$ .
(iii)

The more $V_{i}$ ’s utility would drop, the more likely he would keep his old color. The arguments are the same as (ii) along with the fact that the worst utility is zero.

We now step to prove that the MH-policy indeed leads to $c_{WO2}^{*}$ . To do so we first state that the Markov Chain induced by the MH-policy is in fact an RPMP.

Lemma 7 (MH-Policy Induces RPMP)

Let Assumptions 1, 2 and 3 hold. Consider an asynchronous G-VCG $\mathcal{G}$ . The MH-policy induces an RPMP where the unperturbed process only accesses to the colorings bringing in better welfares. The corresponding resistance of a state transition $c\rightarrow c^{\prime}$ is

\displaystyle R(c\rightarrow c^{\prime})=\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c\rightarrow c‘)\}

for $c:=(c_{i},c_{-i})$ and $c^{\prime}:=(c_{i}^{\prime},c_{-i})$ only differing on vertex $V_{i}$ .

Proof of Lemma 7 7

See Appendix C.1.

We are now able to invoke properties of RPMP to investigate on the support of the stationary distribution $\pi(\cdot)$ , i.e. the set of stochastically stable states, which only contains states leading to the global optimal welfare rather than a local optimum. Besides, it turns out that we can establish relationship between the resistances and the final welfares. Equipped with these tools, we substantiate that the MH-policy indeed leads the game to $c^{*}_{WO2}$ .

Theorem 8 (The MH-Policy is Optimal)

Let Assumptions 1, 2 and 3 hold. Consider an asynchronous G-VCG $\mathcal{G}$ . Then $\mathcal{G}$ under the MH-policy converges to $c^{*}_{WO2}$ , the global optimal solution to $\mathsf{WO2}$ (or, equivalently, to $\mathsf{WO1}$ by (2)).

Proof of Theorem 8 8

See Appendix C.1.

5.3 The MH-Policy in Independently Synchronous G-VCGs

We turn to the more general synchronous setting where agents may become active simultaneously. As already mentioned in Section 4, an arbitrary synchronous G-VCG is extremely difficult to trace and optimality may be no longer guaranteed. However, inspired by Marden and Shamma (2012), we show in this section that when restricted to a collection of independently synchronic G-VCGs, the optimality of the MH-policy is still preserved when the probability of each agent being active is small enough.

Analogue to our discussions in Section 5.2, the following Lemma 9 indicates that the MH-policy in independent synchronous settings again induces some RPMP with different resistances of state transitions.

Lemma 9 (MH-Policy Induces RPMP with independent synchrony)

Let Assumptions 1, 2 and 3 hold. Consider an independently synchronous G-VCG $\mathcal{G}$ with an activation parameter $\omega_{t}$ . Define $\epsilon_{t}:=e^{-\frac{1}{\tau(t)}}$ . Then the MH-policy induces an RPMP and the unperturbed process only accesses to the colorings bringing in better welfares. The corresponding resistance of a state transition $c\rightarrow c^{\prime}$ is

\displaystyle R(c\rightarrow c^{\prime})=\sum_{V_{i}\in G}\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))\}

where $G:=\{V_{j}:c_{j}\neq c_{j}^{\prime}\}$ .

Proof of Lemma 9 9

See Appendix C.2.

It remains to check whether a stochastically stable state corresponds to a maximum welfare value. Unfortunately, this is not always the case as a G-VCG would be likely to get stuck in some local maximum state. Nevertheless, in the next Theorem 10, we will show the optimality of the MH-policy is preserved when the activation probability $\omega_{t}$ is small enough when $\log_{\epsilon_{t}}\omega_{t}=\mathcal{K}>0$ is constant.

Theorem 10 (Optimality of the MH-Policy with low- $\omega$ independent synchrony)

Let Assumptions 1, 2 and 3 hold. Consider an independently synchronic G-VCG $\mathcal{G}$ with activation probability $\omega_{t}$ . Define $\epsilon_{t}:=e^{-\frac{1}{\tau(t)}}$ . If $\log_{\epsilon_{t}}\omega_{t}=\mathcal{K}$ is constant and $\mathcal{K}\geq 2\phi^{*}$ , then the MH-policy also induces another RPMP with resistance

\displaystyle R(c\rightarrow c^{\prime})=\sum_{V_{i}\in G}\mathcal{K}|G|+\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))\}

and $\mathcal{G}$ under the MH-policy converges to $c^{*}_{WO2}$ , the global optimal solution to $\mathsf{WO2}$ (or, equivalently, to $\mathsf{WO1}$ by (2)).

Proof of Theorem 10 10

See Appendix C.2.

Remark 4

Theorem 10 serves as a theoretical gurantee for the asymptotic behavior of the RPMP induced by MH policy, yet would be impractical to observe as $\omega_{t}$ becomes extremely small when $t\rightarrow\infty$ . However, empirical evidence from experiments presented in Section 7 show that the MH policy still performs well for independently synchronous cases with some constant $\omega$ . Any relaxation on the assumption made in Theorem 10 would be left to future work.

6 Robust Welfare Optimization for G-VCGs

In previous sections, we exhaust the information of (Info-Type1) for agents’ decision-making while (Info-Type2) is not fully employed. In this section, we would convince the readers on the importance of the latter in real-life events planning and how it can help with some welfare optimization in G-VCGs for additional purposes, namely robust welfare optimization (RWO). Besides the factors introduced above, another key consideration in venue selection of events planning is the time between consecutive events conducted in the same venue. If two events arranged in the same venue have their schedules close to each other, the second one would take higher risk of being affected by a possible overrun of the previous one. Therefore, an ideal welfare function should impose penalties for colorings where two inadjacent vertices highly likely to be linked are share the same colors. The probability of the presence of an complementary edge between vertices $V_{i}$ and $V_{j}$ can be estimated by the below formula proposed by Lim and Wang (2005)

\displaystyle p_{ij}\equiv\frac{\tilde{t}}{\max\{T_{i}^{s},T_{j}^{s}\}-\min\{T_{i}^{e},T_{j}^{e}\}+\alpha}.

(7)

where $\tilde{t}$ is the minimum event transfer time, $(T^{s}_{i},T^{e}_{i})$ and $(T^{s}_{j},T^{e}_{j})$ are the starting and ending time for the two events, and $\alpha$ is some predefined constant. In the rest of this section, we assume such probabilities are public information in each family.

The idea of RWO comes from the concept of robust coloring problem (RCP) firstly proposed in yanez2003robust and soon gained numerous popularities; see, e.g., Lim and Wang (2005); Wang and Xu (2013); Archetti, Bianchessi and Hertz (2014). All these literatures analyzed the robust coloring as a separate optimization problem while, to our knowledge, have not considered its iteraction with other objectives. We therefore fill the gap to include in our discussion the robust welfare optimization as an “extension” to the baseline $\mathsf{WO1}$ and $\mathsf{WO2}$ , thus control the social welfare with more influencing factors in the network. By “extension”, we will explain later that $c^{*}_{WO1}$ would be an input of our algorithm to solve RWO which should have been figured out in Algorithm 1; i.e. one can interpret RWO as a second-stage problem.

6.1 Motivation and Formulation

We would like to give a brief overview on the main idea behind the RCP before formally introducing our RWO problem. Besides requiring the coloring to be proper as classical graph coloring does, a robust coloring takes into account the complementary edges in a network. Given each complementary edge a probability to be connected, the objective is to fathom out a most “robust” coloring in a sense that the probability for the properness to be destroyed by connecting one or more complementary edges is minimized. The number of colors $m$ is fixed as a constraint. An RCP is formulated as follows:

\displaystyle\begin{split}\operatorname*{\mathrm{minimize}}_{c\in C}\quad&\sum_{\{V_{i},V_{j}\}\in\bar{E};c_{i}=c_{j}}p_{ij}\\ \mathrm{subject~{}to}\quad&c_{i}\neq c_{j},\forall i,\forall j\in\{j:\mathcal{L}_{ij}=1\}\end{split}

(RCP)

where $p_{ij}$ is the probability for a complementary $\{V_{i},V_{j}\}$ to be connected.

The literature works as mentioned explored multifarious applications which can be modelled by RCP, which simply assumed an equal “damage” brought by any new pair-connection, to our knowledge. Note that the formulation of RCP reflects on this assumption by giving each pair-connection a damage of 1, without a loss of generality. This is, however, not true in our setting of G-VCGs with utility variations, since the amount of welfare can be affected differently when a different complementary edge is connected. Once a complementary edge is connected between two vertices in the same color, the utilities of both get eliminated from the total social welfare as a new clash occurs. Indeed in reality, robustness is most times not the only pursuit and there is always a trade-off between welfare and risk (i.e. decreasing robustness). To integrate the consideration of such robustness into our welfare optimization problem, we add in $\mathsf{WO1}$ an “expected loss” term $\mathds{E}(\mathfrak{L})$ to be minimized. This is inspired by the spirit of the general stochastic optimization to quantify the average uncertainty. We formulate our RWO problem to be

\displaystyle\begin{split}\operatorname*{\mathrm{maximize}}_{c\in C}\quad&\Phi_{R}(c)=\sum_{V_{i}\in V}w_{i}u_{i}(c)-\mathds{E}(\mathfrak{L})\\ \mathrm{subject~{}to}\quad&c_{i}\neq c_{j},\forall i,\forall j\in\{j:\mathcal{L}_{ij}=1\}.\end{split}

(RWO)

Suppose each complementary edge has a probability $p_{ij}$ to be connected. Denote $\mathfrak{N}\subseteq\bar{E}$ to be the set of new-connected complementary edges. Let $\mathfrak{C}_{i}:=\{\{V_{i},V_{j}\}\in\bar{E}:V_{j}\in V,c_{i}=c_{j}\}$ be the set of complementary edges taking $V_{i}$ as an endpoint and with endpoints’ colors identical . It is important to note that $\mathfrak{L}$ in RWO is not a simple summation of the utility loss in both endpoints from a pair-connection, otherwise the losses would be counted repeatedly. For example, after the utility of $V_{i}\in V$ is eliminated due to a new connected clash, another complementary edge with him as an endpoint would no longer reduce its utility. Instead, we may express $\mathfrak{L}$ as a weighted sum of $n$ random variables $X_{1},X_{2},\dots,X_{n}$ defined by

\displaystyle X_{i}=u_{i}(c)\mathds{1}_{\mathfrak{C}_{i}\cap\mathfrak{N}\neq\emptyset},\quad\forall i\in\{1,2,\dots,n\}

thus

\displaystyle\mathds{E}(\mathfrak{L})=\sum_{i=1}^{n}w_{i}\mathds{E}[u_{i}(c)\mathds{1}_{\mathfrak{C}_{i}\cap\mathfrak{N}\neq\emptyset}]=\sum_{i=1}^{n}w_{i}u_{i}(c)\Pr(\mathfrak{C}_{i}\cap\mathfrak{N}\neq\emptyset).

(8)

6.2 The MH-Policy for RWO

In this subsection, we again employ the Metropolis-Hasting principle to solve RWO in a decentralized fashion. A key difference to the MH-policy in Section 5 is that the family structure is dynamic when (Info-Type2) is involved. Note that the proposed MH-policy for RWO is continued as an extension to solving $\mathsf{WO1}$ , thus taking the output of Algorithm 1 as input.

Assumption 4

We replace Assumption (R1) by the following statement while keep (R2) and (R3) hold.

(R1-RWO)

An active agent, say $V_{i}$ , would first randomly select a color $c_{new}$ from the color set $M/c_{\mathcal{N}(V_{i})}$ ; i.e. $\mathcal{C}_{i}=M/c_{\mathcal{N}(V_{i})}$ in each round.

Remark 5

We would like to clarify on some possible doubts regarding our design:

(i)

Note that Assumption 4 is not a loss of generality. An equivalent approach is to work with Assumption 3 and set the loss for an improper coloring to be $-\infty$ so that the game never accepts improper colorings if starting from $c^{*}_{WO1}$ which is proper. In other words, instead of a change in the general setting, Assumption 4 merely allows an acceleration compared to Assumption 3.
(ii)

The reason why we start our algorithm from $c^{*}_{WO1}$ rather than thoroughly starting from the very beginning arbitrary coloring $c(0)$ is to guarantee the properness of our optimal coloring. Note that if we use the relaxation in $\mathsf{WO2}$ in solving RWO, the optimal coloring may not be proper; for example, when a vertex’s utility is less than the possible loss that might be triggered. Starting from $c^{*}_{WO1}$ along with Assumption 4 makes sure the game keeps running on proper colorings.

Input:

G(V,E)

M

c^{*}_{WO1}

(equivalently

c^{*}_{WO2}

T

\tau_{0}

Output:

c^{*}_{RWO}

1 Initialize

c\leftarrow c^{*}_{WO1}

t\leftarrow 0

\tau(0)\leftarrow\tau_{0}

2 while $t<T$ do

3 Activate

V_{i}\in V

randomly.

4 Sample a color

c_{new}

from the available color set

\mathcal{C}_{i}=M/c_{\mathcal{N}(V_{i})}

randomly.

c_{-i}(t+1)=c_{-i}(t)

6 Calculate the acceptance ratio

\alpha(t)=\min\{1,e^{\frac{1}{\tau(t)}[\Delta U_{\mathcal{N}(V_{i})\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})-\Delta L_{\mathcal{F}_{i}\cup\{V_{i}\}/\mathcal{N}(V_{i})}(c^{k}\rightarrow c^{l})]}\}

7 Generate a number

K\in[0,1]

randomly.

8 if $K<\alpha(t)$ then

c_{i}(t+1)=c_{new}

else

c_{i}(t+1)=c_{i}(t)

t\leftarrow t+1

12 Reduce

\tau(t-1)

\tau(t)

under some scheme.

c^{*}_{RWO}\leftarrow c(T)=(c_{1}(T),c_{2}(T),\dots,c_{n}(T))

14 return

c^{*}_{RWO}

Algorithm 2 Realization of the MH-policy for RWO in asynchronous G-VCGs

As in Section 5, we would first consider asynchronous cases for solving RWO. Denote by $\mathfrak{L}_{\mathfrak{S}^{(c)}}$ the loss occurs in a set of vertices $\mathfrak{S}\subseteq V$ under a coloring $c$ . When the coloring is changed from $c^{k}$ to $c^{l}$ in round $t$ where $c^{k}=(c_{i}(t),c_{-i}(t))$ and $c^{l}=(c_{new},c_{-i}(t))$ , the difference in the total welfare $\Phi_{R}$ would be

\displaystyle\begin{split}\Delta\Phi_{R}(c^{k}\rightarrow c^{l})&=\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})-\Delta L_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})\\ &=\Delta U_{\mathcal{N}(V_{i})\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})-\Delta L_{\mathcal{F}_{i}\cup\{V_{i}\}/\mathcal{N}(V_{i})}(c^{k}\rightarrow c^{l})\end{split}

(9)

where $\Delta U_{\mathcal{N}(V_{i})\cup\{V_{i}\}}(c^{k}\rightarrow c^{l})$ is defined as in 6 and

\displaystyle\Delta L_{\mathcal{F}_{i}\cup\{V_{i}\}/\mathcal{N}(V_{i})}(c^{k}\rightarrow c^{l}):=\mathds{E}[\mathfrak{L}_{\mathcal{F}_{i}^{(c)}\cup\{V_{i}\}/\mathcal{N}(V_{i})}|c=c^{l}]-\mathds{E}[\mathfrak{L}_{\mathcal{F}_{i}^{(c)}\cup\{V_{i}\}/\mathcal{N}(V_{i})}|c=c^{k}].

(10)

The second equality in (9) holds because a color change in $V_{i}$ will only affect the utilities of his neighbors sharing (Info-Type1) and only affect the underlying loss of his non-neighboring family members sharing (Info-Type2), and $c_{i}\notin c_{\mathcal{N}(V_{i})}$ by Assumption 4. Since the probability terms in (8) can be expressed using $p_{ij}$ by

\displaystyle\begin{split}\Pr(\mathfrak{C}_{i}\cap\mathfrak{N}\neq\emptyset)&=1-\Pr(\mathfrak{C}_{i}\cap\mathfrak{N}=\emptyset)\\ &=1-\prod_{j:\{V_{i},V_{j}\}\in\bar{E},c_{i}=c_{j}}(1-p_{ij})\end{split}

(11)

and vertices other than $V_{i}$ keep their colors unchanged, we can decompose 10 as

\displaystyle\begin{split}\Delta L_{\mathcal{F}_{i}\cup\{V_{i}\}/\mathcal{N}(V_{i})}(c^{k}\rightarrow c^{l})&=w_{i}\biggl{[}u_{i}(c^{l})\biggl{(}1-\prod_{j:V_{j}\in\mathcal{F}_{i}^{(c^{l})}/\mathcal{N}(V_{i})}(1-p_{ij})\biggl{)}-u_{i}(c^{k})\biggl{(}1-\prod_{\tilde{j}:V_{\tilde{j}}\in\mathcal{F}_{i}^{(c^{k})}/\mathcal{N}(V_{i})}(1-p_{i\tilde{j}})\biggl{)}\biggl{]}\\ &\quad\quad-\sum_{\tilde{j}:V_{\tilde{j}}\in\mathcal{F}_{i}^{(c^{k})}/\mathcal{N}(V_{i})}p_{i\tilde{j}}w_{\tilde{j}}u_{\tilde{j}}(c^{k})\\ &\quad\quad+\sum_{j:V_{j}\in\mathcal{F}_{i}^{(c^{l})}/\mathcal{N}(V_{i})}p_{ij}w_{j}u_{j}(c^{l})\end{split}.

(12)

Notice that the terms included in 12 are all accessible to $V_{i}$ during his color updating procedure, since losses can only be triggered by vertices in the same color which are all included in the family. Therefore, 9 can be calculated by each decentralized active agent. The pseudo-code for realizing the MH policy for RWO is provided in Algorithm 2.

Remark 6

Note that the probability of a connection in any complementary edge whose both endpoints share the same color is known to the corresponding vertices, since it belongs to (Info-Type2).

The optimality of the MH-policy for RWO in either asynchronous and independently synchronous settings can be again substantiated using theories of RPMP. Arguments are similar to the ones in Section 5 though associated with some other resistance with a minor modification, and we will not repeat the procedure here.

7 Simulation Experiments

We examine the effectiveness of the proposed MH-policies for G-VCGs in this section. The probabilitsitic network stuctures were constructed using the Erdős–Rényi model provided by the networkx package in Python with certain connection probabilities. Meanwhile, we randomly generated preferences from $(0,100)$ for each vertex as well as weights from $(0,1)$ which was further normalized to sum up to 1. We monitor the dynamics induced by the networks with different sizes, different connection probabilities (thus different $\delta$ ) and different temperature reduction schemes. We also checked on the effects of varied activation parameters $\omega$ in synchronous settings to see how synchrony affects the efficiency of the MH-policy driven algorithms. For comparison purposes, we also implemented a tabu search (TS) algorithm adapted from Lim and Wang (2005) as an aid to determine whether the obtained welfares are optimal (or, at least, sub-optimal), of which the details are given in Algorithm 3 in Section A.

Temperature Scheme	Formula	$\tau(0)$
constant	$\tau(t)=\tau(0)$	0.01
exponential multiplicative	$\tau(t)=0.99^{t}\tau(0)$	10
logarithmical	$\tau(t)=\tau(0)/1+\log(100+t)$	0.1
trigonometric additive	$\tau(0)=0.01+0.05(\tau(0)-0.01)(1+\cos(\frac{t\pi}{T}))$	10

Table 3: Different temperature schemes

Four temperature reduction schemes were considered in our experiments: constant, exponential multiplicative, logarithmical and trigonometric additive. The specific parameters are given in Table 3. The experiements were divided into several groups, each of which has one monitored variable with other variables controlled. For each group, we ran three parallel simulations and would present one of them in this section. The number of iterations in each simulation experiment was set to be $T=1\times 10^{4}$ .

Table 4 gives the results for the three groups which focus on solving $\mathsf{WO2}$ . Among the data given by the four temperature schemes, the welfares outperforming TS are labelled in red while the ones in blue suggest that the difference between it and the TS outcome is less than 0.5. The experiments in the first two groups were all conducted in an asynchronous manner as described in Section 3.1. The group “WO-Async 1” aims to detect on the influence made by variations of the network size. It can be observed from Figure 6 that the MH-policy supported by the logarimical and trigonometric addtive schemes achieves higher welfares on this task than TS did, and all methods perform equally well in general. In terms of group “WO-Async 2”, we are able to analyze on the impacts under different network degrees by adjusting the priori connection probability between edges when generating the graph. It turned out that trigonometric additive and logarithmical models still prevail over TS as can be observed from Figure 6.

Group	Network Features	Trig	Exp	Log	Const	TS
WO-Async 1	$n=10$ , $p=0.5$	84.12	83.57	81.43	83.88	83.88
	$n=20$ , $p=0.5$	89.45	87.18	90.77	87.01	88.11
	$n=30$ , $p=0.5$	94.49	94.44	94.47	94.21	94.28
	$n=40$ , $p=0.5$	92.99	93.23	93.34	92.52	93.40
	$n=50$ , $p=0.5$	93.03	95.05	94.16	95.65	95.53
WO-Async 2	$n=20,p=0.25$	89.93	89.74	89.56	88.71	88.87
	$n=20,p=0.5$	89.45	87.18	90.77	87.01	88.11
	$n=20,p=0.75$	92.53	89.91	91.95	90.94	91.38
WO-Sync	$n=20,p=0.5,\omega=0.25$	90.15	88.82	87.39	85.73	88.11
	$n=20,p=0.5,\omega=0.5$	90.83	90.21	86.93	87.70	88.11
	$n=20,p=0.5,\omega=0.75$	90.62	90.44	87.15	88.82	88.11
	$n=20,p=0.5,\omega=1$	90.44	90.63	89.19	88.51	88.11

Table 4: Simulation results for

\mathsf{WO2}

-solving.

Refer to caption — Figure 4: Optimal Welfare Comparison for Group A

The MH-policies under synchronous settings were examined in the third group, namely “WO-Sync”. It is uplifting to witness that the proposed MH-policies can almost achieve more optimal solutions than TS, especially when driven by trignometric addtive and exponential multiplicative schemes. It is also remarkable that the algorithms are fairly robust even under completely synchronous cases. A visualization for this is given in Figure 6.

We further proceed our investigation to solving RWO, the results of which are reported in Table 5. Due to our formulation of the expected loss term in 8, the final objective highly depends on the probability of a future connection of each complementary edge, which is directly related to the degree of the network. Therfore, in the fourth “RWO-Async” group, we had the network size controlled and modified the connection probability $p$ , as did in “WO-Async 2”. It turned out that the MH-policy driven Algorithm 2 performs well when $p$ is relatively large, no matter what scheme it employed, as shown in Figure 8A possible explanation for this phenomenon is that, when $p$ is high, the number of complementary edges falls thus the optimal solution of RWO stays close $c_{WO2}^{*}$ , thus the two-stage MH-policy becomes advantageous. Figure 8 corresponds to the last group which again focused on the impact of synchrony. Like in “WO-Sync”, the effectiveness of the proposed MH-policy confronted by different activation parameters are also well demonstrated.

Group	Network Features	Trig	Exp	Log	Const	TS
RWO-Async	$n=20,p=0.25$	67.17	64.30	66.89	65.65	70.93
	$n=20,p=0.5$	67.73	66.76	74.16	65.83	69.92
	$n=20,p=0.75$	86.80	89.17	87.25	87.55	82.65
RWO-Sync	$n=20,p=0.5,\omega=0.25$	74.35	68.17	72.70	65.35	69.92
	$n=20,p=0.5,\omega=0.5$	73.66	68.53	73.76	69.95	69.92
	$n=20,p=0.5,\omega=0.75$	64.62	70.69	71.31	73.14	69.92
	$n=20,p=0.5,\omega=1$	70.17	59.30	69.75	68.45	69.92

Table 5: Simulation results for RWO-solving.

8 Conclusion

In this paper, we focus on the venue selection problem commonly confronted in event planning, and model its dynamics by the proposed generalized vertex coloring games which go beyond the basic requirement of properness in search of some max-welfare coloring. It was shown that in an IP-VCG, being completely greedy and completely synchronous does not prevent agents reaching the optimal coloring asymptotically, and the converging time was shown to be stochastically bounded by $\mathcal{O}_{p}(\log n)$ . In more general G-VCGs in asynchronous manner, there is still some policy driven by the Metropolis-Hasting algorithm which enables the self-organized agents to reach the optimal coloring without violating their greediness. The optimality of the MH policy even holds in some particular independently synchronous settings. Finally, we integrate the idea of robust coloring into our formulation seeking for a balance between the welfare and the risk, and a corresponding adaptive MH-policy for the robust welfare optimization problem is provided. The effectiveness of the proposed policies are substantiated in our simulation experiments.

9 Acknowledgement

This project was supported by Nanyang Technological University under the URECA Undergraduate Research Programme.

References

Ahmed (2012) Ahmed, S., 2012. Applications of graph coloring in modern computer science. International Journal of Computer and Information Technology 3, 1–7.
Allen et al. (2022) Allen, J., Harris, R., Jago, L., Tantrai, A., Jonson, P., D’Arcy, E., 2022. Festival and special event management. John Wiley & Sons.
Archetti et al. (2014) Archetti, C., Bianchessi, N., Hertz, A., 2014. A branch-and-price algorithm for the robust graph coloring problem. Discrete Applied Mathematics 165, 49–59.
Bermond et al. (2019) Bermond, J.C., Chaintreau, A., Ducoffe, G., Mazauric, D., 2019. How long does it take for all users in a social network to choose their communities? Discrete Applied Mathematics 270, 37–57.
Carosi et al. (2019) Carosi, R., Fioravanti, S., Gualà, L., Monaco, G., 2019. Coalition resilient outcomes in max k-cut games, in: SOFSEM 2019: Theory and Practice of Computer Science: 45th International Conference on Current Trends in Theory and Practice of Computer Science, Novỳ Smokovec, Slovakia, January 27-30, 2019, Proceedings, Springer. pp. 94–107.
Chaudhuri et al. (2008) Chaudhuri, K., Chung Graham, F., Jamall, M.S., 2008. A network coloring game, in: Internet and Network Economics: 4th International Workshop, WINE 2008, Shanghai, China, December 17-20, 2008. Proceedings 4, Springer. pp. 522–530.
Chen et al. (2019) Chen, S., Delcourt, M., Moitra, A., Perarnau, G., Postle, L., 2019. Improved bounds for randomly sampling colorings via linear programming, in: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM. pp. 2216–2234.
Cui et al. (2008) Cui, G., Qin, L., Liu, S., Wang, Y., Zhang, X., Cao, X., 2008. Modified pso algorithm for solving planar graph coloring problem. Progress in Natural Science 18, 353–357.
Dowsland and Thompson (2008) Dowsland, K.A., Thompson, J.M., 2008. An improved ant colony optimisation heuristic for graph colouring. Discrete Applied Mathematics 156, 313–324.
Enemark et al. (2011) Enemark, D.P., McCubbins, M.D., Paturi, R., Weller, N., 2011. Does more connectivity help groups to solve social problems, in: Proceedings of the 12th ACM conference on Electronic commerce, pp. 21–26.
Fleurent and Ferland (1996) Fleurent, C., Ferland, J.A., 1996. Genetic and hybrid algorithms for graph coloring. Annals of operations research 63.
Fryganiotis et al. (2023) Fryganiotis, N., Papavassiliou, S., Pelekis, C., 2023. A note on the network coloring game: A randomized distributed ( $\delta$ + 1)-coloring algorithm. Information Processing Letters 182, 106385.
Goonewardena et al. (2014) Goonewardena, M., Akbari, H., Ajib, W., Elbiaze, H., 2014. On minimum-collisions assignment in heterogeneous self-organizing networks, in: 2014 IEEE Global Communications Conference, IEEE. pp. 4665–4670.
Hamza et al. (2021) Hamza, D., Toonsi, S., Shamma, J.S., 2021. A metropolis-hastings algorithm for task allocation, in: 2021 60th IEEE Conference on Decision and Control (CDC), IEEE. pp. 4539–4545.
Hastings (1970) Hastings, W.K., 1970. Monte carlo sampling methods using markov chains and their applications .
Hayes et al. (2007) Hayes, T.P., Vera, J.C., Vigoda, E., 2007. Randomly coloring planar graphs with fewer colors than the maximum degree, in: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pp. 450–458.
Hayes and Vigoda (2006) Hayes, T.P., Vigoda, E., 2006. Coupling with the stationary distribution and improved sampling for colorings and independent sets .
Hernández and Blum (2012) Hernández, H., Blum, C., 2012. Distributed graph coloring: an approach based on the calling behavior of japanese tree frogs. Swarm Intelligence 6, 117–150.
Hindi and Yampolskiy (2012) Hindi, M.M., Yampolskiy, R.V., 2012. Genetic algorithm applied to the graph coloring problem, in: Proc. 23rd Midwest Artificial Intelligence and Cognitive Science Conf, pp. 61–66.
Jerrum (1995) Jerrum, M., 1995. A very simple algorithm for estimating the number of k-colorings of a low-degree graph. Random Structures & Algorithms 7, 157–165.
Kassir (2018) Kassir, A., 2018. Absorbing Markov chains with random transition matrices and applications. Ph.D. thesis. University of California, Irvine.
Kearns et al. (2006) Kearns, M., Suri, S., Montfort, N., 2006. An experimental study of the coloring problem on human subject networks. science 313, 824–827.
Keeney and Kirkwood (1975) Keeney, R.L., Kirkwood, C.W., 1975. Group decision making using cardinal social welfare functions. Management Science 22, 430–437.
Kirkpatrick et al. (1983) Kirkpatrick, S., Gelatt Jr, C.D., Vecchi, M.P., 1983. Optimization by simulated annealing. science 220, 671–680.
Kliemann et al. (2015) Kliemann, L., Sheykhdarabadi, E.S., Srivastav, A., 2015. Price of anarchy for graph coloring games with concave payoff. arXiv preprint arXiv:1507.08249 .
Lim and Wang (2005) Lim, A., Wang, F., 2005. Robust graph coloring for uncertain supply chain management, in: Proceedings of the 38th Annual Hawaii International Conference on System Sciences, IEEE. pp. 81b–81b.
Luengo and Martino (2013) Luengo, D., Martino, L., 2013. Fully adaptive gaussian mixture metropolis-hastings algorithm, in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE. pp. 6148–6152.
Marappan and Sethumadhavan (2013) Marappan, R., Sethumadhavan, G., 2013. A new genetic algorithm for graph coloring, in: 2013 Fifth International Conference on Computational Intelligence, Modelling and Simulation, IEEE. pp. 49–54.
Marappan and Sethumadhavan (2021) Marappan, R., Sethumadhavan, G., 2021. Solving graph coloring problem using divide and conquer-based turbulent particle swarm optimization. Arabian Journal for Science and Engineering , 1–18.
Marden and Shamma (2012) Marden, J.R., Shamma, J.S., 2012. Revisiting log-linear learning: Asynchrony, completeness and payoff-based implementation. Games and Economic Behavior 75, 788–808.
Marden and Wierman (2013) Marden, J.R., Wierman, A., 2013. Overcoming the limitations of utility design for multiagent systems. IEEE Transactions on Automatic Control 58, 1402–1415.
Marnissi et al. (2020) Marnissi, Y., Chouzenoux, E., Benazza-Benyahia, A., Pesquet, J.C., 2020. Majorize–minimize adapted metropolis–hastings algorithm. IEEE Transactions on Signal Processing 68, 2356–2369.
McQuarrie (2000) McQuarrie, D.A., 2000. Statistical mechanics. Sterling Publishing Company.
Metropolis et al. (1953) Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of state calculations by fast computing machines. The journal of chemical physics 21, 1087–1092.
Moayedikia et al. (2020) Moayedikia, A., Ghaderi, H., Yeoh, W., 2020. Optimizing microtask assignment on crowdsourcing platforms using markov chain monte carlo. Decision Support Systems 139, 113404.
Monderer and Shapley (1996) Monderer, D., Shapley, L.S., 1996. Potential games. Games and economic behavior 14, 124–143.
Panagopoulou and Spirakis (2008) Panagopoulou, P.N., Spirakis, P.G., 2008. A game theoretic approach for efficient graph coloring, in: Algorithms and Computation: 19th International Symposium, ISAAC 2008, Gold Coast, Australia, December 15-17, 2008. Proceedings 19, Springer. pp. 183–195.
Pelekis and Schauer (2013) Pelekis, C., Schauer, M., 2013. Network coloring and colored coin games, in: Search Theory: A Game Theoretic Perspective. Springer, pp. 59–73.
Pickem et al. (2015) Pickem, D., Egerstedt, M., Shamma, J.S., 2015. A game-theoretic formulation of the homogeneous self-reconfiguration problem, in: 2015 54th IEEE Conference on Decision and Control (CDC), IEEE. pp. 2829–2834.
Rosenthal (2006) Rosenthal, J.S., 2006. First Look At Rigorous Probability Theory, A. World Scientific Publishing Company.
Salari and Eshghi (2005) Salari, E., Eshghi, K., 2005. An aco algorithm for graph coloring problem, in: 2005 ICSC Congress on computational intelligence methods and applications, IEEE. pp. 5–pp.
Touhiduzzaman et al. (2018) Touhiduzzaman, M., Hahn, A., Srivastava, A.K., 2018. A diversity-based substation cyber defense strategy utilizing coloring games. IEEE Transactions on Smart Grid 10, 5405–5415.
Vigoda (2000) Vigoda, E., 2000. Improved bounds for sampling colorings. Journal of Mathematical Physics 41, 1555–1569.
Vu et al. (2014) Vu, T., Vo, B.N., Evans, R., 2014. A particle marginal metropolis-hastings multi-target tracker. IEEE transactions on signal processing 62, 3953–3964.
Wang and Xu (2013) Wang, F., Xu, Z., 2013. Metaheuristics for robust graph coloring. Journal of Heuristics 19, 529–548.
(47)

Appendix A Tabu search algorthm used in Section 7

Input:

G(V,E)

M

c(0)

T

, MaxTenure

Output:

c^{*}_{RWO}

1 Initialize

c\leftarrow c(0)

t\leftarrow 0

t^{\prime}\leftarrow 0

tabu_{v}=[0]^{|V|}

tabu_{p}=[0]^{|V|\times|M|}

2 while $t<T$ do

t^{\prime}\leftarrow t^{\prime}+1

c^{\prime}\leftarrow c

\delta\leftarrow 0

6 for $V_{i}\in V$ where $t^{\prime}>tabu_{v}[V_{i}]$ do

c_{i}^{*}\leftarrow\operatorname*{arg\,max}_{c_{i}\in\mathcal{C}_{i}=M/c_{\mathcal{N}(V_{i})}}\Phi((c_{i},c_{-i})-\Phi(c)+\mathds{E}(\mathfrak{L}^{c})-\mathds{E}(\mathfrak{L}^{(c_{i},c_{-i})}))

\delta^{*}\leftarrow\Phi((c_{i}^{*},c_{-i})-\Phi(c)+\mathds{E}(\mathfrak{L}^{c})-\mathds{E}(\mathfrak{L}^{(c_{i}^{*},c_{-i})}))

9 if $\delta^{*}>\delta$ then

V^{\prime}\leftarrow V_{i}

;

c^{*}\leftarrow c_{i}^{*}

\delta\leftarrow\delta^{*}

c^{\prime}_{V^{\prime}}\leftarrow c^{*}

13 if $\delta>0$ then

c\leftarrow c^{\prime}

t\leftarrow 0

tabu_{v}[V^{\prime}]\leftarrow t^{\prime}+\text{randint}(1,\text{MaxTenure})

tabu_{p}[V^{\prime},c^{*}]\leftarrow t^{\prime}+~{}\text{randint}(3,\text{MaxTenure})

else

t\leftarrow t+1

c^{*}_{RWO}\leftarrow c

20 return

c^{*}_{RWO}

Algorithm 3 Tabu Search Algorithm for RWO in asynchronous G-VCGs (Adapted from Lim and Wang (2005))

Appendix B Proof of results in Section 4

B.1 Proof of results in Section 4.1

In this part of the Appendix, we present proofs regarding the lemmas and theorems on the convergence of IP-VCGs, as discussed in Section 4.1.

Proof of Lemma 1 11

Under Assumption 2, the state space $S$ must contain the state $K=(1,1,\dots,1)$ corresponding to any proper coloring. One may easily observe that the state $K=(1,1,\dots,1)$ is absorbing since none of the agents would accept new colors thus the utility remains. All other states are transient because they are likely to alter in later rounds. It then suffices to prove the accessibility between other states to the state $K$ .

For an active zero-utility agent, say $V_{i}$ , consider the probability $p_{i}$ of a utility rise from round $t$ to round $t+1$ . Let $d$ denote the number of $V_{i}$ ’s neighbors and $\tilde{d}$ denote the number of colors held by his neighbors in round $t$ . We also consider the number of neighbors of $V_{i}$ ’s neighbors, denoted by $d_{1}$ , $\cdots$ , $d_{d}$ in respective. Then $V_{i}$ can only have a utility rise when his new color is unused by his neighbors in round $t$ and does not clash with the new colors of his active neighbors. For a completely synchronous game under Assumption 2, we have

\displaystyle\begin{split}p_{i}&>\frac{m-\tilde{d}}{m}(1-\frac{1}{m})^{|\{j:V_{j}\in\mathcal{N}(V),u_{j}(t)=0\}|}\\ &\geq\frac{m-\delta}{m}(1-\frac{1}{m})^{\delta}\\ &\geq\frac{1}{m}(\frac{1}{2})^{\delta}\\ &>0.\\ \end{split}

Note that this constant lower bound for $p_{i}$ is independent of the network structure and neighborhood behaviours. By the definition of the AMC on the utility space,

\displaystyle P_{kl}=\prod_{i:L^{k}_{i}=0,L^{l}_{i}=1}p_{i}\prod_{j:L^{l}_{j}=0}(1-p_{j}).

As such, the probability of transition from a low-welfare state to a high-welfare state is always positive. Then $K=(1,1,\dots,1)$ is accessible from any transcient state, which completes the proof. \qed

Proof of Theorem 2 12

According to the arguments in Section 4, it suffices to prove that a synchronous IP-VCG with utility functions 3 converges to some proper coloring. This is equivalent to show that the AMC defined in Theorem 1 would eventually be absorbed in $K=(1,1,\dots,1)$ .

Suppose, from an initial state $L^{0}$ , the chain requires at least $T_{i}$ rounds to reach an absorbing state and the corresponding probability is $\mathcal{P}_{i}$ ( $0<\mathcal{P}_{i}<1$ because the chain is absorbing). Let $T=\max_{i}T_{i}$ and $\mathcal{P}=\min_{i}\mathcal{P}_{i}$ . Then the probability that the chain does not access to the absorbing state after $kT$ rounds is at most $(1-\mathcal{P})^{k}$ , which converges to 0 when $k\to\infty$ (i.e. $\lim_{t\to\infty}Q^{t}=0$ ). Therefore, the chain will be absorbed in an absorbing state, which is uniquely $K=(1,1,\dots,1)$ in our case. \qed

B.2 Proof of results in Section 4.2

In this part of the Appendix, we present proofs regarding the lemmas and theorems on the stochastic boundedness on the time to convergence of IP-VCGs, as discussed in Section 4.2.

Proof of Theorem 3 13

Denote by $F_{i}$ the set of $V_{i}$ ’s neighbors who have no clash in round $t$ , and define $\tilde{F_{i}}:=\cup_{V_{u}\in F}\{c_{u}(t)\}$ the set with cardinality denoted by $f_{i}$ . Additionally, define $\tilde{H_{i}}:=\cup_{V_{j}\in\mathcal{N}(V_{i})}\{c_{j}(t)\}$ . We also define an indicator variable $Y_{k}$ for each color $k\in M$ as

\displaystyle Y_{k}=\begin{cases}1&\textit{if }c_{j}(t+1)\neq k,\forall V_{j}\in\mathcal{N}(V_{i})\\ 0&\textit{otherwise}.\end{cases}

Define $Y=\sum_{k\in M}Y_{k}$ , which represents the number of colors unused by $V_{i}$ ’s neighbors in round $t+1$ . A color $k$ can belong to one of the three cases:

(case 1)

$k\in\tilde{F_{i}}$ . Then $\Pr(Y_{k}=1)=0$ because $c_{j}(t+1)=c_{j}(t)$ for $\forall V_{j}\in F_{i}$ .
(case 2)

$k\in M\backslash\tilde{H_{i}}$ . Then

$\displaystyle\Pr(Y_{k}=1)=(\frac{m-1}{m})^{|\{V_{j}\in\mathcal{N}(V_{i}):k\notin\tilde{H_{j}}\cup\{c_{j}(t)\}\}|};$

i.e. the color $k$ can only be available in round $t+1$ when it is not taken by the active neighbors with zero utility.

(case 3)

$k\in\tilde{H_{i}}\backslash\tilde{F_{i}}$ . Then

\displaystyle\Pr(Y_{k}=1)=(\frac{m-1}{m})^{|\{V_{j}\in\mathcal{N}(V_{i}):k\notin\tilde{H_{j}}\cup\{c_{j}(t)\}\}|}+\prod_{V_{j}\in\mathcal{N}(V_{i}):c_{j}(t)=k}\frac{|M\backslash\tilde{H_{j}}|}{m}.

The second term results from the fact that an agent would reject any color which is held by at least one of his neighbors and remain the color in the last round.

Therefore, the expectation of $Y$ can be expressed as

\displaystyle\begin{split}\mathds{E}(Y)&=\sum_{k\in M}\Pr(Y_{k}=1)\\ &=\sum_{k\in M\backslash\tilde{H_{i}}}(\frac{m-1}{m})^{|\{V_{j}\in\mathcal{N}(V_{i}):k\notin\tilde{H_{j}}\cup\{c_{j}(t)\}\}|}+\sum_{k\in\tilde{H_{i}}\backslash\tilde{F_{i}}}[(\frac{m-1}{m})^{|\{V_{j}\in\mathcal{N}(V_{i}):k\notin\tilde{H_{j}}\cup\{c_{j}(t)\}\}|}+\prod_{V_{j}\in\mathcal{N}(V_{i}):c_{j}(t)=k}\frac{|M\backslash\tilde{H_{j}}|}{m}]\\ &=\sum_{k\in M\backslash\tilde{F_{i}}}(\frac{m-1}{m})^{|\{V_{j}\in\mathcal{N}(V_{i}):k\notin\tilde{H_{j}}\cup\{c_{j}(t)\}\}|}+\sum_{k\in\tilde{H_{i}}\backslash\tilde{F_{i}}}\prod_{V_{j}\in\mathcal{N}(V_{i}):c_{j}(t)=k}\frac{|M\backslash\tilde{H_{j}}|}{m}\\ &\geq\sum_{k\in M\backslash\tilde{F_{i}}}e^{-\frac{3}{2m}|\{V_{j}\in\mathcal{N}(V_{i}):k\notin\tilde{H_{j}}\cup\{c_{j}(t)\}\}|}.\end{split}

(13)

Assumption 2 implies $m\geq\delta(G)+1\geq 2$ because the graph $G$ is connected. The last inequality in (13) arises from the fact that $1-x\geq e^{-\frac{2}{3}x}$ for $x\in[0,\frac{1}{2}]$ and the nonnegativity of $\sum_{k\in\tilde{H_{i}}\backslash\tilde{F_{i}}}\prod_{V_{j}\in\mathcal{N}(V_{i}):c_{j}(t)=k}\frac{|M\backslash\tilde{H_{j}}|}{m}$ . Defining

\displaystyle\hat{Y_{k}}=\begin{cases}1&\textit{if }c_{j}(t+2)\neq k,\forall V_{j}\in\mathcal{N}(V_{i})\\ 0&\textit{otherwise}\end{cases},

Lemma 4 and Lemma 5 in Chaudhuri et al. (2008) then apply. Therefore, we can adapt from their Lemma 3 that

\displaystyle\Pr[u_{i}(t+2)>0|u_{i}(t)=0]\geq\mathbf{L},\quad\forall t\in\mathbb{N}

for $\mathbf{L}=\frac{1}{44100e^{18}}$ by multiplying the original constant by another $\frac{1}{6}$ and $\frac{1}{7e^{9}}$ due to Assumption (R1) where $\mathcal{C}_{i}=M$ . \qed

To prepare for the proofs regarding Lemma 4, we introduce the canonical form of an AMC.

Definition 8 (Canonical Form of AMC)

Suppose an AMC has $r$ absorbing states and $t$ transient states. The transition matrix $P$ of the AMC is in canonical form if it is divided into four sub-matrices listed as follows:

\displaystyle P=\begin{pmatrix}Q&R\\ 0&I\\ \end{pmatrix}

(CF)

where

•

$Q$ is a $t\times t$ block containing the transition probabilities between transient states.
•

$R$ is a $t\times r$ block containing the transition probabilities from transient to absorbing states.
•

$I$ is the identity matrix of order $r$ and $0$ is the block with null elements

Since an AMC will be eventually absorbed in an absorbing state, as we prove in Theorem 2, we have

\displaystyle\lim_{t\rightarrow\infty}Q^{t}=0.

The following lemma gives several properties of an AMC which would be useful in our later arguments.

Lemma 11 (Properties of AMC (See, e.g. Kassir (2018) Chapter 2))

For an AMC with transition matrix $P$ in canonical form (CF), then

(AMC-1)

$(I-Q)^{-1}$ exists.
(AMC-2)

$\lim_{t\to\infty}P^{t}=$ $\begin{pmatrix}0&(I-Q)^{-1}R\\ 0&I\\ \end{pmatrix}$
(AMC-3)

Suppose the number of states is $k$ . Define a $k\times 1$ column vector $n$ whose i-th element $\mathbf{n}_{i}$ denotes the expected number of steps before absorption, given the initial state $i$ . Then $\mathbf{n}=(I-Q)^{-1}\mathds{1}$ where $\mathds{1}=(1,1,\cdots,1)_{k}^{\intercal}$ .

Proof of Lemma 11 14

The first statement is equivalent to“if $(I-Q)x=0$ , then $x=0$ ” which is easy to prove as

\displaystyle(I-Q)x=0\iff x=Qx\iff x=\lim_{t\to\infty}Q^{t}x\iff x=0.

Let $N=(I-Q)^{-1}$ , then $N=\sum_{t=0}^{\infty}Q^{t}$ by Taylor’s formula. Notice that the n-th power of the transition matrix is

\displaystyle P^{t}=\begin{pmatrix}Q^{t}&(I+Q+\cdots+Q^{t-1})R\\ 0&I\\ \end{pmatrix}

then

\displaystyle\lim_{t\to\infty}P^{t}=\begin{pmatrix}0&(\sum_{t=0}^{\infty}Q^{t})R)\\ 0&I\\ \end{pmatrix}=\begin{pmatrix}0&NR\\ 0&I\\ \end{pmatrix}.

Denote by $X_{ij}$ the times the chain hits state $j$ starting from $i$ and $X_{ij}^{(l)}$ the indicator of arriving at state $j$ in the $l$ th step, where $i$ and $j$ are both transient states. Then

\displaystyle\begin{split}\mathds{E}(X_{ij})&=\sum_{l=0}^{\infty}\Pr{(X_{ij}^{(l)}=1)}\\ &=I_{ij}+Q_{ij}+Q_{ij}^{2}+Q_{ij}^{3}+\cdots\\ &=N_{ij}.\end{split}

Since $\mathbf{n}_{i}:=\mathds{E}(X_{i})$ where $X_{i}$ denotes the number of steps before absorption given the initial state $i$ , we have

\displaystyle\begin{split}\mathbf{n}_{i}&=\mathds{E}(\sum_{j=1}^{k}X_{ij})=\sum_{j=1}^{k}N_{ij}=[N\mathds{1}]_{i}\end{split}

which completes the proof. \qed

Next, we prove Lemma 4.

Proof of Lemma 4 15

It suffices to prove the lemma on the representative candidate of IP-VCGs with binary utility functions (3), as explained in Section 4. Given an arbitrary number $\epsilon\in(0,1)$ and define $\tau=\frac{1}{\mathbf{L}}\ln(\frac{n}{\epsilon})$ . We also define the random variable $T$ to denote the time to convergence of the game, and, specifically, let $T_{i}$ denote the number of steps to convergence of $(X_{t})_{t\in\mathbb{N}}$ initiating from the state $i$ . Invoking Theorem 3 we have

\displaystyle\begin{split}\Pr[u_{i}(c(2\tau))=0|u_{i}(c(0))=0]&=\prod_{t=1}^{\tau}\Pr[u_{i}(c(2t))=0|\bigcap_{s=1}^{t-1}\Pr[u_{i}(c(2s))=0]\\ &\leq(1-\mathbf{L})^{\tau}\\ &\leq e^{-\mathbf{L}\tau}.\end{split}

(14)

The first equality in (14) is due to the fact that agents with utility one will remain his color forever. Thus,

\displaystyle\Pr[u_{i}(c(2\tau))=1|u_{i}(c(0))=0]\geq 1-\frac{\epsilon}{n}.

Taking an union bound over all vertices, we obtain

\displaystyle\Pr[T\leq 2\tau]

\displaystyle\geq(1-\frac{\epsilon}{n})^{n}\geq 1-\epsilon

thus

\displaystyle\Pr[T>2\tau]\leq\epsilon.

(15)

We generate the Markov Chain defined in Lemma 1, $(X_{t})_{t\in\mathbb{N}}$ , to jump two steps at a time and consider its transition matrix $P$ in the form (CF). Define $\tilde{N}:=(I-Q^{2})^{-1}$ . We denote $\mathbf{n}=(\mathbf{n}_{1},\mathbf{n}_{2},\dots,\mathbf{n}_{2^{n}-n-1})^{\mathsf{T}}$ the column vector whose i-th element $\mathbf{n}_{i}$ denotes the expected number of steps before absorption starting from $i\neq K=(1,1,\dots,1)$ , i.e. $\mathds{E}[T_{i}]$ , where we correspond $K$ to the $2^{n}-n$ row in $P$ . Obviously $\mathbf{n}_{2^{n}-n}=0$ corresponding to the absorbing state of any proper coloring. By (AMC-3) of Lemma 11,

\displaystyle 2\tilde{N}\mathds{1}\geq\mathbf{n}

elementwise. For each element $n_{i}$ , equivalently,

\displaystyle\begin{split}\mathbf{n}_{i}&\leq 2\sum_{\tau=0}^{\infty}\sum_{j=1}^{2^{n}-n-1}{Q^{2}}_{ij}^{(\tau)}=2\sum_{\tau=0}^{\infty}\Pr[T>2\tau].\end{split}

Along with (15) for $\epsilon=ne^{-\mathbf{L}\tau}$ , we have

\displaystyle\begin{split}\sum_{\tau=0}^{\infty}\Pr[T>2\tau]&\leq 1\times\lceil\frac{\ln{n}}{\mathbf{L}}\rceil+\sum_{\tau=\lceil\frac{\ln{n}}{\mathbf{L}}\rceil}^{\infty}ne^{-\mathbf{L}\tau}\\ &=\lceil\frac{\ln{n}}{\mathbf{L}}\rceil+n\sum_{\tau=\lceil\frac{\ln{n}}{\mathbf{L}}\rceil}^{\infty}e^{-\mathbf{L}\tau}\\ &\leq\lceil\frac{\ln{n}}{\mathbf{L}}\rceil+n(\frac{1}{n}\frac{1}{1-e^{-\mathbf{L}}})\\ &=\mathcal{O}\log n.\end{split}

(16)

Therefore, $\mathds{E}[T]=\mathcal{O}\log n$ .

Denote $\mathbf{r}=(\mathbf{r}_{1},\mathbf{r}_{2},\dots,\mathbf{r}_{2^{n}-n-1})^{\mathsf{T}}$ be the column vector whose i-th element $\mathbf{r}_{i}$ denotes $\mathrm{Var}[T_{i}]$ , the variance of steps before absorption starting from $i\neq K=(1,1,\dots,1)$ . Then

\displaystyle\mathbf{r}_{i}=\mathds{E}[T_{i}^{2}]-\mathbf{n}_{i}^{2}.

We consider all possible states led by the first step of $X_{ij}^{(l)}$ and derive an expression for $\mathds{E}[T_{i}]$ as a probabilistic combination of the expected number of steps given other initiate states.

\displaystyle\begin{split}\mathds{E}[T_{i}^{2}]&=\sum_{k=1}^{2^{n}-n}P_{ik}\mathds{E}[(1+T_{k})^{2}]\\ &=\sum_{k=1}^{2^{n}-n}P_{ik}\mathds{E}[T_{k}^{2}+2T_{k}+1]\\ &=1+2\sum_{k=1}^{2^{n}-n}P_{ik}\mathds{E}[T_{k}]+\sum_{k=1}^{2^{n}-n}P_{ik}\mathds{E}[T_{k}^{2}]\end{split}.

(17)

This operation is often named “first-step analysis” in stochastic processes.

Let $\mathds{E}[T^{2}]=(\mathds{E}[T_{1}^{2}],\mathds{E}[T_{2}^{2}],\dots,\mathds{E}[T_{2^{n}-n-1}^{2}])^{\mathsf{T}}$ . We rewrite (17) in vector form as

\displaystyle\mathds{E}[T^{2}]=\mathds{1}+2Q\mathbf{n}+Q\mathds{E}[T^{2}]

and simple arrangements give

\displaystyle\begin{split}\mathds{E}[T^{2}]&=(I-Q)^{-1}(\mathds{1}+2Q\mathbf{n})\\ &=\mathbf{n}+2(I-Q)^{-1}Q\mathbf{n}\\ &=\mathbf{n}+2(Q+Q^{2}+\dots)\mathbf{n}\\ &=\mathbf{n}+2[(I-Q)^{-1}-I]\mathbf{n}\\ &=[2\mathbf{n}-I]\mathbf{n}\\ &=\mathcal{O}(\log n)^{2}.\end{split}

Since $(\mathds{E}[T])^{2}$ is also $\mathcal{O}(\log n)^{2}$ , we know that $\mathrm{Var}[T_{i}]=\mathcal{O}(\log n)^{2}$ .

The proof is thus complete. \qed

We now forward to prove Theorem 5.

Proof of Theorem 5 16

Let $\mu^{(n)}=\mathds{E}[T^{(n)}]$ and $\sigma^{(n)}=\sqrt{\mathrm{Var}[T^{(n)}]}$ . By Lemma 4, $\mu^{(n)}\leq D_{1}\log n$ for $n\geq N_{1}$ and $\sigma^{(n)}\leq D_{2}\log n$ for $n\geq N_{2}$ , where $D_{1}$ and $D_{2}$ are two positive constants. By Chebyshev’s inequality, $\forall k\geq 1$ ,

\displaystyle\Pr[T^{(n)}-\mu^{(n)}\geq k\sigma^{(n)}]\leq\Pr[|T^{(n)}-\mu^{(n)}|\geq k\sigma^{(n)}]\leq\frac{1}{k^{2}}.

For $\forall\epsilon>0$ , let $M_{\epsilon}=\frac{D_{2}}{\sqrt{\epsilon}}+D_{1}>D_{1}$ . Plugging $k=\frac{M_{\epsilon}\log n-\mu^{(n)}}{\sigma^{(n)}}$ into the inequality gives

\displaystyle\Pr[T^{(n)}-\mu^{(n)}>M_{\epsilon}\log n-\mu^{(n)}]<\frac{D_{2}^{2}(\log n)^{2}}{(M_{\epsilon}\log n-\mu^{(n)})^{2}}

thus

\displaystyle\Pr[\frac{T^{(n)}}{\log n}>M_{\epsilon}]<\epsilon,\quad\forall n>N_{\epsilon}=\max\{N_{1},N_{2}\}

which completes the proof. \qed

Appendix C Proof of results in Section 5

C.1 Proof of results in Section 5.2

In this part of the Appendix, we present proofs regarding the lemmas and theorems on the the optimality of the proposed MH-policy, as discussed in Section 5.2.

Proof of Lemma 7 17

We prove by definition. Irreducibility follows from the proposal uniform distribution due to ramdom sampling from $\mathcal{C}_{i}=M$ for $\forall V_{i}\in V$ being active. The transition probability from state (coloring) $c$ to state (coloring) $c^{\prime}$ by the MH-policy (Algorithm 1) is

\displaystyle P^{\epsilon}_{c\rightarrow c^{\prime}}=\frac{1}{n}\frac{1}{m}\min\{1,\epsilon^{-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c\rightarrow c^{\prime})}\}

where $\epsilon=e^{-\frac{1}{\tau}}$ . Note that $P^{\epsilon}_{c\rightarrow c^{\prime}}>0$ whatever $c$ and $c^{\prime}$ are when $\epsilon<\infty$ . Then we have

\displaystyle\lim_{\epsilon\rightarrow 0^{+}}P^{\epsilon}_{c\rightarrow c^{\prime}}=\frac{1}{n}\frac{1}{m}\mathds{1}_{U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c)<U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c^{\prime})}

which is the transtion probability of the unperturbed distribution $P^{0}_{c\rightarrow c^{\prime}}$ . Then we have

\displaystyle\frac{P^{\epsilon}_{c\rightarrow c^{\prime}}}{\epsilon^{\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c\rightarrow c‘)\}}}=\frac{1}{nm}>0

thus the MH-policy induced Markov Chain is RPMP with resistances

\displaystyle R(c\rightarrow c^{\prime})=\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}(c\rightarrow c^{\prime})\}.

(18)

\qed

The proof of Theorem 8 is based on Lemma 7.

Proof of Theorem 8 18

We prove by contradiction. Let $c$ be a stochastically stable state and $T^{*}_{c}$ is the corresponding minimum resistance tree rooted at $c$ ; i.e. $R(T^{*}_{c})=\gamma^{*}:=\min_{j\in\{1,2,\dots,|C|\}}\gamma_{j}$ by Theorem 6. Suppose $c\neq\operatorname*{arg\,max}_{\tilde{c}\in C}\Phi(\tilde{c})$ . Consider an optimal state $c^{\prime}$ such that $c^{\prime}=\arg\max_{\tilde{c}\in C}\Phi(\tilde{c})$ . We try to establish a new tree $T_{c^{\prime}}$ rooted at $c^{\prime}$ such that $R(T_{c^{\prime}})<R(T^{*}_{c})$ .

By Definition 7, there exists a unique path

\displaystyle\mathcal{P}(c^{\prime}\rightarrow c)=\{c^{\prime}\rightarrow c^{1}\rightarrow\dots\rightarrow c^{k}\rightarrow c\}

from $c^{\prime}$ to $c$ . Consider the reverse of this path

\displaystyle\mathcal{P}(c\rightarrow c^{\prime})=\{c\rightarrow c^{k}\rightarrow\dots\rightarrow c^{1}\rightarrow c^{\prime}\}

and one can observe that a new tree, namely $T_{c^{\prime}}$ can be established by replacing $\mathcal{P}(c^{\prime}\rightarrow c)$ with $\mathcal{P}(c\rightarrow c^{\prime})$ . The vertices which used to access $c$ without passing $c^{\prime}$ would now access $c^{\prime}$ uniquely through $\mathcal{P}(c\rightarrow c^{\prime})$ . Figure 9 illustrates an example when $|C|=8$ and $k=2$ with $T^{*}_{c}$ and $T_{c^{\prime}}$ represented in red and green respectively. The resistence of $T_{c^{\prime}}$ is then

\displaystyle R(T_{c^{\prime}})=R(T^{*}_{c})+R(\mathcal{P}(c\rightarrow c^{\prime}))-R(\mathcal{P}(c^{\prime}\rightarrow c)).

(19)

Figure 9: Example: establish

T_{c^{\prime}}

from

T^{*}_{c}

By (18) and (5), for adjacent states $c^{l}$ and $c^{l+1}$ (i.e. there is a an edge between $c^{l}$ and $c^{l+1}$ ),

	$\displaystyle R(c^{l}\rightarrow c^{l+1})=\max\{0,\Phi(c^{l})-\Phi(c^{l+1})\}$
	$\displaystyle R(c^{l+1}\rightarrow c^{l})=\max\{0,\Phi(c^{l+1})-\Phi(c^{l})\}$

thus

\displaystyle R(c^{l+1}\rightarrow c^{l})-R(c^{l}\rightarrow c^{l+1})=\Phi(c^{l+1})-\Phi(c^{l}).

Therefore, with $c^{0}=c^{\prime}$ and $c^{k+1}=c$ ,

	$\displaystyle R(\mathcal{P}(c\rightarrow c^{\prime}))-R(\mathcal{P}(c^{\prime}\rightarrow c))$	$\displaystyle=\sum_{l=0}^{k}[R(c^{l}\rightarrow c^{l+1})-R(c^{l+1}\rightarrow c^{l})]$
		$\displaystyle=\Phi(c)-\Phi(c^{\prime})$
		$\displaystyle<0$

because $c^{\prime}$ achieves best welfare by assumption. Plugging back into (19), we have

\displaystyle R(T_{c^{\prime}})<R(T^{*}_{c})

which contradicts to our assumption that $c$ is stochastically stable. We now conclude that any stochastically stable state(coloring) must bring the largest welfare; i.e. a G-VCG converges to $c^{*}_{WO2}$ under the MH-policy. \qed

C.2 Proof of results in Section 5.3

In this part of the Appendix, we present proofs regarding the lemmas and theorems on the the optimality of the proposed MH-policy in independent synchronous settings, as discussed in Section 5.3.

Proof of Lemma 9 19

(i) and (ii) in Definition 6 are obviously satisfied with same arguments as in our proof of Lemma 7. We now examine whether (iii) holds.

\displaystyle\begin{split}P_{t}^{\epsilon_{t}}(c\rightarrow c^{\prime})&=\sum_{S\subseteq V:G\subseteq S}\omega_{t}^{|S|}(1-\omega_{t})^{|V\backslash S|}\times\prod_{V_{i}\in G}\frac{1}{m}\min\{1,\epsilon_{t}^{-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))}\}\\ &\quad\quad\times\prod_{V_{j}\in S\backslash G}[1-\sum_{\tilde{c}\in\mathcal{C}_{j}\backslash\{c_{j}\}}\frac{1}{m}\min\{1,\epsilon_{t}^{-\Delta U_{\mathcal{F}_{j}\cup\{V_{j}\}}((c_{j},c_{-j})\rightarrow(\tilde{c}_{j},c_{-j}))}\}]\\ &=\sum_{S\subseteq V:G\subseteq S}\omega_{t}^{|S|}(1-\omega_{t})^{|V\backslash S|}\frac{1}{m^{|G|}}\epsilon_{t}^{\sum_{V_{i}\in G}\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))\}}\\ &\quad\quad\times\underbrace{\prod_{V_{j}\in S\backslash G}[1-\sum_{\tilde{c}\in M\backslash\{c_{j}\}}\frac{1}{m}\underbrace{\epsilon_{t}^{\sum_{V_{i}\in G}\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(\tilde{c}_{i},c_{-i}))\}}}_{\in(0,1)}]}_{\in(\frac{1}{m^{|S\backslash G|}},1)}\\ \end{split}

(20)

thus

\displaystyle\begin{split}\lim_{t\rightarrow\infty}\frac{P_{t}^{\epsilon_{t}}(c\rightarrow c^{\prime})}{\epsilon_{t}^{R^{\epsilon_{t}}(c\rightarrow c^{\prime})}}&\in(\sum_{S\subseteq V:G\subseteq S}\omega_{t}^{|S|}(1-\omega_{t})^{|V\backslash S|}\frac{1}{m^{|S|}},\sum_{S\subseteq V:G\subseteq S}\omega_{t}^{|S|}(1-\omega_{t})^{|V\backslash S|}\frac{1}{m^{|G|}})\\ &\in(0,+\infty)\end{split}

where

\displaystyle R^{\epsilon_{t}}(c\rightarrow c^{\prime})=\sum_{V_{i}\in G}\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(\tilde{c}_{i},c_{-i}))\}.

\qed

The proof of Theorem 10 is based on Lemma 9.

Proof of Theorem 10 20

The proof is similar to the one in Marden and Shamma (2012) Theorem 4.2. Arbitrarily consider a transition $c\rightarrow c^{\prime}$ where $|G|\equiv\{V_{j}:c_{j}\neq c_{j}^{\prime}\}\geq 2$ in round $t$ . Given the independent activation probability $\omega_{t}=\epsilon_{t}^{\mathcal{K}}$ where $\epsilon_{t}=e^{-\frac{1}{\tau(t)}}$ , the probability of this transition is

\displaystyle\begin{split}P_{t}^{\epsilon_{t}}(c\rightarrow c^{\prime})&=\epsilon_{t}^{\mathcal{K}|G|}\sum_{S\subseteq V:G\subseteq S}\biggl{\{}\epsilon_{t}^{\mathcal{K}(|S|-|G|)}(1-\epsilon_{t}^{\mathcal{K}})^{|V\backslash S|}\times\prod_{V_{i}\in G}\frac{1}{m}\min\{1,\epsilon_{t}^{-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))}\}\\ &\quad\quad\times\prod_{V_{j}\in S\backslash G}[1-\sum_{\tilde{c}\in M\backslash\{c_{j}\}}\frac{1}{m}\min\{1,\epsilon_{t}^{-\Delta U_{\mathcal{F}_{j}\cup\{V_{j}\}}((c_{j},c_{-j})\rightarrow(\tilde{c}_{j},c_{-j}))}\}]\biggl{\}}.\\ \end{split}

(21)

Note that the second equality of (21) results from (R1) in Assumption 3. Dividing $P_{t}^{\epsilon_{t}}(c\rightarrow c^{\prime})$ by $\epsilon_{t}^{\mathcal{K}|G|+\sum_{i\in G}\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))\}}$ and taking limits $t\rightarrow\infty$ on both sides of (21) (equivalently $\tau(t)\rightarrow 0$ and $\epsilon_{t}\rightarrow 0$ ) eliminates the terms when $|S|>|G|$ and gives

\displaystyle\begin{split}&\lim_{t\rightarrow\infty}\frac{P_{t}^{\epsilon_{t}}(c\rightarrow c^{\prime})}{\epsilon_{t}^{\mathcal{K}|G|+\sum_{V_{i}\in G}\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))\}}}\\ =&\lim_{t\rightarrow\infty}\frac{\prod_{V_{i}\in G}\frac{1}{m}\min\{1,\epsilon_{t}^{-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))}\}}{\epsilon_{t}^{\sum_{V_{i}\in G}\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))\}}}\\ =&\lim_{t\rightarrow\infty}\frac{1}{m^{|G|}}\frac{\prod_{V_{i}\in G}\epsilon_{t}^{\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))\}}}{\epsilon_{t}^{\sum_{V_{i}\in G}\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))\}}}\\ =&\frac{1}{m^{|G|}}\in(0,+\infty).\end{split}

Therefore, the process induced by the MH-policy is exactly an RPMP with transition resistance

\displaystyle R(c\rightarrow c^{\prime})=\mathcal{K}|G|+\sum_{V_{i}\in G}\max\{0,-\Delta U_{\mathcal{F}_{i}\cup\{V_{i}\}}((c_{i},c_{-i})\rightarrow(c_{i}^{\prime},c_{-i}))\}.

Denote $\phi^{*}$ to be the maximum preference among all agents and colors. Then for any transition $c\rightarrow c^{\prime}$ with $G:=\{V_{j}:c_{j}\neq c_{j}^{\prime}\}$ ,

\displaystyle\mathcal{K}|G|\leq R(c\rightarrow c^{\prime})\leq\mathcal{K}|G|+\phi^{*}|G|.

(22)

We would like to show that any edge $\{c^{k}\rightarrow c^{l}\}$ in a minimum resistance tree $\mathcal{T}^{*}_{\tilde{c}}$ , whatever the root $\tilde{c}$ is, must have its endpoints deviating only in single element; i.e. $|\{V_{j}:c_{j}^{k}\neq c_{j}^{l}\}|=1$ . Then the optimality of a stochastically stable state is covered by our proof in Theorem 8.

Again we prove by contradiction. Suppose there exists an edge $\{c\rightarrow c^{\prime}\}$ in $\mathcal{T}^{*}_{\tilde{c}}$ such that $c$ and $c^{\prime}$ have more than one deviators; i.e. $G:=\{V_{j}:c_{j}\neq c_{j}^{\prime}\}$ has cardinality $|G|\geq 2$ . We now look for another path $\mathcal{P}_{c\rightarrow c^{\prime}}=\{c\rightarrow c^{1}\rightarrow c^{2}\rightarrow\dots\rightarrow c^{k-1}\rightarrow c^{\prime}\}$ where $k=|G|$ such that the adjacent coloring only differ in one element. By (22),

\displaystyle\begin{split}R(c\rightarrow c^{\prime})&\geq\mathcal{K}|G|\\ R(\mathcal{P}_{c\rightarrow c^{\prime}})&=\sum_{i=0}^{k-1}R(c^{i}\rightarrow c^{i+1})\leq|G|(\mathcal{K}+\phi^{*})\end{split}

where $c^{0}:=c$ and $c^{k}:=c^{\prime}$ . Ignoring the original path $\mathcal{P}_{c^{i}\rightarrow\tilde{c}}$ for $i=0,1,2,\dots,k-1$ and implementing $\mathcal{P}_{c\rightarrow c^{\prime}}$ into $\mathcal{T}^{*}_{\tilde{c}}$ gives another tree $\mathcal{T}_{\tilde{c}}$ . Then

\displaystyle\begin{split}R(\mathcal{T}_{\tilde{c}})&=R(\mathcal{T}^{*}_{\tilde{c}})+R(\mathcal{P}_{c\rightarrow c^{\prime}})-R(c\rightarrow c^{\prime})-\sum_{i=1}^{k-1}R(\mathcal{P}_{c^{i}\rightarrow\tilde{c}})\\ &\leq R(\mathcal{T}^{*}_{\tilde{c}})+\mathcal{K}|G|+\phi^{*}|G|-\mathcal{K}|G|-\mathcal{K}(|G|-1)\\ &=R(\mathcal{T}^{*}_{\tilde{c}})+\phi^{*}|G|-\mathcal{K}(|G|-1).\end{split}

As long as $K>\frac{|G|\phi^{*}}{|G|-1}$ , one has $R(\mathcal{T}_{\tilde{c}})<R(\mathcal{T}_{\tilde{c}^{*}})$ which contradicts out assumption that $\mathcal{T}^{*}_{\tilde{c}}$ has the least resistance. Therefore, a minimum resistance tree must have its endpoints deviating only in single element and the rest follows from Theorem 8.\qed

C.3 Group B: WO-Async 2

In this group, we investigate on the performance of Algorithm 1 when given networks imposed with different connection probabilities (thus different degrees) while control the sizes identical ( $n=20$ ), and compare the results with the TS algorithm 3. See Figure 10 for the results.

C.4 Group C: WO-Sync

In this group, we investigate on the performance of Algorithm 1 under independently synchronous settings with different activation parameters $\omega$ (the game is completely synchronous when $\omega=1$ ) while control the network features to be identical ( $n=20,p=0.5$ ). One can compare the optimal welfare with that in 10(e) for effectiveness evaluation. See Figure 11 for the results.

C.5 Group D: RWO-Async

In this group, we focus on Algorithm 2 on solving RWO. The expected loss term highly depends on the conncection probabilities of the complementary edges which are indirectly determined by the network degrees. Therefore, we investigate on the asynchronous performance under networks with a same number of vertices ( $n=20$ ) yet different priori connection probabilities, as in Group B. See Figure 12 for the results.

C.6 Group E: RWO-Sync

In this group, we again investigate on the influence made by different activation parameters under synchronous settings as in Group C, yet focus on solving RWO. One can compare the optimal welfare with that in 12(e). See Figure 13 for the results.

	$\displaystyle P(x^{\prime}\|x)$	$\displaystyle=Q(x^{\prime}\|x)\alpha(x\rightarrow x^{\prime}),\quad x^{\prime}\neq x;$
	$\displaystyle P(x\|x)$	$\displaystyle=1-\sum_{x^{\prime}\neq x}P(x^{\prime}\|x).$

Utilitarian Welfare Optimization in the Generalized Vertex Coloring Games: An Implication to Venue Selection in Events Planning

Abstract

keywords:

1 Introduction

Remark 1

2 Literature Review

2.1 Network Coloring Game for Conflict Resolving

2.2 The Use of Markov Chains in Graph Coloring

2.3 The Metropolis-Hasting Algorithm for Optimization

3 Game Formulation

3.1 Setting

Definition 1 (G-VCG)

Definition 2 (Independent and completely synchronous G-VCGs)

3.2 Essential Assumptions

Assumption 1 (Utilitarian/Weighted-Utilitarian Welfare)

Assumption 2 (Abundant Colors)

Example 1 (|M|≥δ|M|\geq\delta is insufficient)

Assumption 3 (Random Color Update with Policy Acceptance)

Definition 3 (Completely Greedy Policy)

4 Identical Preference VCGs

Definition 4 (IP-VCGs)

4.1 Convergence of Completely Synchronous IP-VCGs to Optimality under the CGP

Example 2 (Theorem 2 in Chaudhuri et al. (2008))

Definition 5 (Absorbing Markov Chain)

Lemma 1 (AMC on Utility Space)

Proof of Lemma 1 1

Remark 2

Theorem 2 (Convergence to Optimality)

Proof of Theorem 2 2

4.2 Stochastic Boundedness on the Time to Convergence of Completely Synchronous IP-VCGs

Theorem 3 (Individual Utility Rise in Two Consecutive Rounds)

Proof of Theorem 3 3

Lemma 4 (Bounded Expectation and Variance for Time to Convergence)

Proof of Lemma 4 4

Theorem 5 (Stochastic Boundedness of the Time to Convergence)

Proof of Theorem 5 5

5 A Metropolis-Hasting Based Optimal Policy for G-VCGs

Example 3

5.1 Detour: Regular Perturbed Markov Processes and Resistance Tree

Definition 6 (Regular Perturbed Markov Process)

Remark 3

Definition 7 (Resistance Tree and Stochastic Potential)

Theorem 6 (Characterization for Stochastically Stable States)

Proof of Theorem 6 6

5.2 The MH-Policy in Asynchronous G-VCGs

Lemma 7 (MH-Policy Induces RPMP)

Proof of Lemma 7 7

Theorem 8 (The MH-Policy is Optimal)

Proof of Theorem 8 8

5.3 The MH-Policy in Independently Synchronous G-VCGs

Lemma 9 (MH-Policy Induces RPMP with independent synchrony)

Proof of Lemma 9 9

Theorem 10 (Optimality of the MH-Policy with low-ω\omega independent synchrony)

Proof of Theorem 10 10

Remark 4

6 Robust Welfare Optimization for G-VCGs

6.1 Motivation and Formulation

6.2 The MH-Policy for RWO

Assumption 4

Remark 5

Remark 6

7 Simulation Experiments

8 Conclusion

9 Acknowledgement

References

Appendix A Tabu search algorthm used in Section 7

Appendix B Proof of results in Section 4

B.1 Proof of results in Section 4.1

Proof of Lemma 1 11

Proof of Theorem 2 12

B.2 Proof of results in Section 4.2

Proof of Theorem 3 13

Definition 8 (Canonical Form of AMC)

Lemma 11 (Properties of AMC (See, e.g. Kassir (2018) Chapter 2))

Proof of Lemma 11 14

Proof of Lemma 4 15

Proof of Theorem 5 16

Appendix C Proof of results in Section 5

C.1 Proof of results in Section 5.2

Proof of Lemma 7 17

Example 1 ( $|M|\geq\delta$ is insufficient)

Theorem 10 (Optimality of the MH-Policy with low- $\omega$ independent synchrony)