
∗ Equal contribution. 1 Northwestern University; {boyiliu2018, jiayangli2024}@u.northwestern.edu, {y-nie, zhaoran.wang}@northwestern.edu. 2 Yale University; [email protected]. 3 The Chinese University of Hong Kong; [email protected]. 4 University of Minnesota; [email protected]

Inducing Equilibria via Incentives:
Simultaneous Design-and-Play Ensures Global Convergence

Boyi Liu1,∗   Jiayang Li1,∗   Zhuoran Yang2   Hoi-To Wai3   Mingyi Hong4
Yu (Marco) Nie1   Zhaoran Wang1
Abstract

To regulate a social system comprised of self-interested agents, economic incentives are often required to induce a desirable outcome. This incentive design problem naturally possesses a bilevel structure, in which a designer modifies the rewards of the agents with incentives while anticipating the response of the agents, who play a non-cooperative game that converges to an equilibrium. The existing bilevel optimization algorithms raise a dilemma when applied to this problem: anticipating how incentives affect the agents at equilibrium requires solving the equilibrium problem repeatedly, which is computationally inefficient; bypassing the time-consuming step of equilibrium-finding can reduce the computational cost, but may lead the designer to a sub-optimal solution. To address such a dilemma, we propose a method that tackles the designer’s and agents’ problems simultaneously in a single loop. Specifically, at each iteration, both the designer and the agents only move one step. Nevertheless, we allow the designer to gradually learn the overall influence of the incentives on the agents, which guarantees optimality after convergence. The convergence rate of the proposed scheme is also established for a broad class of games.

1 Introduction

A common thread in human history is how to “properly” regulate a social system comprised of self-interested individuals. In a laissez-faire economy, for example, the competitive market itself is the primary regulatory mechanism [47, 16]. However, a laissez-faire economy may falter due to the existence of significant “externalities” [8, 18], which may arise wherever the self-interested agents do not bear the full external cost of their behaviors. The right response, many argue, is to introduce corrective policies in the form of economic incentives (e.g., tolls, taxes, and subsidies) [32]. By modifying the rewards of the agents, these incentives can encourage (discourage) the agents to engage in activities that create positive (negative) side effects for the society, and thus guide the self-interests of the agents towards a socially desirable end. For example, carbon taxes can be levied on carbon emissions to protect the environment during the production of goods and services [35]. Surge pricing has been widely used to boost supply and dampen demand in volatile ride-hail markets [37]. Lately, subsidies and penalties were both introduced to overcome vaccine hesitancy in the world’s hard-fought battle against the COVID-19 pandemic.

The goal of this paper is to develop a provably efficient method for guiding the agents in a non-cooperative game towards a socially desirable outcome — e.g., the one that maximizes the social welfare — by modifying their payoffs with incentives. The resulting problem may be naturally interpreted as a Stackelberg game [50] in which the “incentive designer” is the leader while the agents being regulated are the followers. Hence, it naturally possesses a bilevel structure [3]: at the upper level, the “designer” optimizes the incentives by anticipating and regulating the best response of the agents, who play a non-cooperative game at the lower level. As the lower-level agents pursue their self-interests freely, their best response can be predicted by the Nash equilibrium [39], which dictates that no agent can do better by unilaterally changing their strategy. Accordingly, the incentive design problem is a mathematical program with equilibrium constraints (MPEC) [30].

In the optimization literature, MPECs are well-known for their intractability [10]. Specifically, even getting a first-order derivative through their bilevel structure is a challenge. In the incentive design problem, for example, to calculate the gradient of the designer’s objective at equilibrium, which provides a principled direction for the designer to update the incentives, one must anticipate how the equilibrium is affected by the changes [17]. This is usually achieved by performing a sensitivity analysis, which in turn requires differentiation through the lower-level equilibrium problem, either implicitly or explicitly [25]. No matter how the sensitivity analysis is carried out, the equilibrium problem must be solved before updating the incentives. The resulting algorithm thus admits a double-loop structure: in the outer loop, the designer iteratively moves along the gradient; but to find the gradient, it must allow the lower-level game dynamics to run its course to arrive at the equilibrium given the current incentives.

Because of the inherent inefficiency of the double-loop structure, many heuristic methods have also been developed for bilevel programs in machine learning [27, 29, 14]. When applied to the incentive design problem, these methods assume that the designer does not solve the equilibrium exactly to evaluate the gradient. Instead, at each iteration, the game is allowed to run just a few rounds, enough for the designer to obtain a reasonable approximation of the gradient. Although such a method promises to reduce the computational cost significantly at each iteration, it may never converge to the same optimal solution obtained without the approximation.

Contribution. In a nutshell, correctly anticipating how incentives affect the agents at equilibrium requires solving the equilibrium problem repeatedly, which is computationally inefficient. On the other hand, simply bypassing the time-consuming step of equilibrium-finding may lead the designer to a sub-optimal solution. This dilemma prompts the following question that motivates this study: can we obtain the optimal solution to an incentive design problem without repeatedly solving the equilibrium problem?

In this paper, we propose an efficient principled method that tackles the designer’s problem and agents’ problem simultaneously in a single loop. At the lower level, we use the mirror descent method [40] to model the process through which the agents move towards equilibrium. At the upper level, we use the gradient descent method to update the incentives towards optimality. At each iteration, both the designer and the agents only move one step based on the first-order information. However, as discussed before, the upper gradient relies on the corresponding lower equilibrium, which is not available in the single-loop update. Hence, we propose to use the implicit differentiation formula—with equilibrium strategy replaced by the current strategy—to estimate the upper gradient, which might be biased at the beginning. Nevertheless, we prove that if we improve the lower-level solution with larger step sizes, the upper-level and lower-level problems may converge simultaneously at a fast rate. The proposed scheme hence guarantees optimality because it can anticipate the overall influence of the incentives on the agents eventually after convergence.

Organization. In Section 2, we discuss related work. In Section 3, we provide the mathematical formulation of the incentive design problem. In Section 4, we design algorithms for solving the problem. In Section 5, we establish conditions under which the proposed scheme globally converges to the optimal solution and analyze the convergence rate. The convergence analysis is restricted to games with a unique equilibrium. In Section 6, we discuss how to apply our algorithms to games with multiple equilibria. Finally, we conduct experiments to test our algorithms in Section 7.

Notation. We denote $\langle\cdot,\cdot\rangle$ as the inner product in vector spaces. For a vector $a=(a^{i})$, we denote $a^{-i}=(a^{j})_{j\neq i}$. For a finite set $\mathcal{X}\subseteq\mathbb{R}^{n}$, we denote $\Delta(\mathcal{X})=\{\pi\in\mathbb{R}_{+}^{n}:\sum_{x^{i}\in\mathcal{X}}\pi_{x^{i}}=1\}$. For any vector norm $\|\cdot\|$, we denote $\|\cdot\|_{*}=\sup_{\|z\|\leq 1}\langle\cdot,z\rangle$ as its dual norm. We refer readers to Appendix A for a collection of frequently used problem-specific notations.

2 Related work

The incentive design problem studied in this paper is a special case of mathematical programs with equilibrium constraints (MPEC) [19], a class of optimization problems constrained by equilibrium conditions. MPEC is closely related to bilevel programs [10], which bind two mathematical programs together by treating one program as part of the constraints for the other.

Bilevel Programming. In the optimization literature, bilevel programming was first introduced to tackle resource allocation problems [7] and has since found applications in such diverse topics as revenue management, network design, traffic control, and energy systems. In the past decade, researchers have discovered numerous applications of bilevel programming in machine learning (ML), including meta-learning [14], adversarial learning [22], hyperparameter optimization [31], and neural architecture search [27]. These newly found bilevel programs in ML are often solved by gradient descent methods, which require differentiating through the (usually unconstrained) lower-level optimization problem [28]. The differentiation can be carried out either implicitly on the optimality conditions as in the conventional sensitivity analysis [see e.g., 2, 43, 4], or explicitly by unrolling the numerical procedure used to solve the lower-level problem [see e.g., 31, 15]. In the explicit approach, one may “partially” unroll the solution procedure (i.e., stop after just a few rounds, or even only one round) to reduce the computational cost. Although this popular heuristic has delivered satisfactory performance on many practical tasks [29, 36, 14, 27], it cannot guarantee optimality for bilevel programs under the general setting, as it cannot derive the accurate upper-level gradient at each iteration [53].

MPEC. Unlike bilevel programs, MPEC is relatively under-explored in the ML literature so far. Recently, Li et al. [25] extended the explicit differentiation method for bilevel programs to MPECs. Their algorithm unrolls an iterative projection algorithm for solving the lower-level problem, which is formulated as a variational inequality (VI) problem. Leveraging recent advances in differentiable programming [2], they embedded each projection iteration as a differentiable layer in a computational graph and, accordingly, transformed the explicit differentiation into standard backpropagation through the graph. The algorithm proposed by Li et al. [26] has a similar overall structure, but it casts the lower-level solution process as the imitative logit dynamics [6] drawn from evolutionary game theory, which can be unrolled more efficiently. Although backpropagation is efficient, constructing and storing such a graph — with potentially a large number of projection layers needed to find a good solution to the lower-level problem — is still demanding. To reduce this burden, partially unrolling the iterative projection algorithm is a solution. Yet, it still cannot guarantee optimality for MPECs, for the same reason as for bilevel programs.

The simultaneous design-and-play approach is proposed to address this dilemma. Our approach follows the algorithms of Hong et al. [21] and Chen et al. [9], which solve bilevel programs via single-loop updates. Importantly, they solve both the upper- and the lower-level problems using gradient descent and establish the relationship between the convergence rate of the single-loop algorithm and the step sizes used in gradient descent. However, their algorithms are limited to cases where the lower-level optimization problem is unconstrained. Our work extends these single-loop algorithms to MPECs that have an equilibrium problem at the lower level. We choose mirror descent as the solution method for the lower-level problem because of its broad applicability to optimization problems with constraints [40] and its generality in the behavioral interpretation of games [34, 23]. We show that the convergence of the proposed simultaneous design-and-play approach relies on the setting of the step sizes for both the upper- and lower-level updates, a finding that echoes the key result in [21]. We first give the convergence rate under mirror descent and the unconstrained assumption and then extend the result to the constrained case. For the latter, we show that convergence cannot be guaranteed if the lower-level solution gets too close to the boundary early in the simultaneous solution process. To avoid this trap, the standard mirror descent method is revised to carefully steer the lower-level solution away from the boundary.

3 Problem Formulation

We study incentive design in both atomic games [39] and nonatomic games [45], classified depending on whether the set of agents is endowed with an atomic or a nonatomic measure. In social systems, both types of games can be useful, although the application context varies. Atomic games are typically employed when each agent has a non-trivial influence on the rewards of other agents. In a nonatomic game, on the contrary, a single agent’s influence is negligible and the reward could only be affected by the collective behavior of agents.

Atomic Game. Consider a game played by a finite set of agents $\mathcal{N}=\{1,\ldots,n\}$, where each agent $i\in\mathcal{N}$ selects a strategy $a^{i}\in\mathcal{A}^{i}\subseteq\mathbb{R}^{d^{i}}$ to maximize the reward received, which is determined by a continuously differentiable function $u^{i}:\mathcal{A}=\prod_{i\in\mathcal{N}}\mathcal{A}^{i}\to\mathbb{R}$. Formally, a joint strategy $a_{*}\in\mathcal{A}$ is a Nash equilibrium if

$$u^{i}(a^{i}_{*},a^{-i}_{*})\geq u^{i}(a^{i},a^{-i}_{*}),\quad\forall\,a^{i}\in\mathcal{A}^{i},\quad\forall\,i\in\mathcal{N}.$$

Suppose that for all $i\in\mathcal{N}$, the strategy set $\mathcal{A}^{i}$ is closed and convex, and the reward function $u^{i}$ is concave in $a^{i}$. Then $a_{*}\in\mathcal{A}$ is a Nash equilibrium if and only if there exist $\lambda^{1},\ldots,\lambda^{n}>0$ such that [46]

$$\sum_{i=1}^{n}\lambda^{i}\cdot\bigl\langle\nabla_{a^{i}}u^{i}(a_{*}),a^{i}-a^{i}_{*}\bigr\rangle\leq 0,\quad\text{for all }a\in\mathcal{A}. \tag{3.1}$$
Example 3.1 (Oligopoly model).

In an oligopoly model, there is a finite set $\mathcal{N}=\{1,\ldots,n\}$ of firms, each of which supplies the market with a quantity $a^{i}$ ($a^{i}\geq 0$) of goods. Under this setting, we have $\mathcal{A}=\mathbb{R}^{n}_{+}$. The good is then priced as $p(q)=p_{0}-\gamma\cdot q$, where $p_{0},\gamma>0$ and $q=\sum_{j\in\mathcal{N}}a^{j}$ is the total output. The profit and the marginal profit of firm $i$ are then given by

$$u^{i}(a)=a^{i}\cdot\biggl(p_{0}-\gamma\cdot\sum_{j\in\mathcal{N}}a^{j}\biggr)-c^{i}\cdot a^{i},\qquad\nabla_{a^{i}}u^{i}(a)=p_{0}-\gamma\cdot\biggl(a^{i}+\sum_{j\in\mathcal{N}}a^{j}\biggr)-c^{i},$$

respectively, where $c^{i}$ is the constant marginal cost for firm $i$.¹

¹Throughout this paper, we use the term “reward” to describe the scenario where the agents aim to maximize $u^{i}$, and use “cost” when the agents aim to do the opposite.
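To make the example concrete, here is a minimal numerical sketch (the three-firm instance and all parameter values are illustrative assumptions, not taken from the paper) that evaluates each firm's profit and the marginal-profit map entering the equilibrium condition (3.1).

```python
import numpy as np

# Illustrative three-firm oligopoly instance (all values are assumptions for illustration).
p0, gamma = 100.0, 1.0
c = np.array([1.0, 1.5, 2.0])     # marginal costs c^i
a = np.array([30.0, 25.0, 20.0])  # output levels a^i

def profit(a):
    # u^i(a) = a^i * (p0 - gamma * sum_j a^j) - c^i * a^i
    return a * (p0 - gamma * a.sum()) - c * a

def marginal_profit(a):
    # d u^i / d a^i = p0 - gamma * (a^i + sum_j a^j) - c^i
    return p0 - gamma * (a + a.sum()) - c

print(profit(a))           # per-firm profits
print(marginal_profit(a))  # the gradient map that enters the VI (3.1)
```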

Nonatomic Game. Consider a game played by a continuous set of agents, which can be divided into a finite set of classes $\mathcal{N}=\{1,\ldots,n\}$. We assume that each $i\in\mathcal{N}$ represents a class of infinitesimal and homogeneous agents sharing the finite strategy set $\mathcal{A}^{i}$ with $|\mathcal{A}^{i}|=d^{i}$. The mass distribution for class $i$ is defined as a vector $q^{i}\in\Delta(\mathcal{A}^{i})$ that gives the proportion of agents using each strategy. Let the cost for an agent in class $i$ to select a strategy $a\in\mathcal{A}^{i}$, given $q=(q^{1},\ldots,q^{n})$, be $c^{ia}(q)$. Formally, a joint mass distribution $q_{*}\in\Delta(\mathcal{A})=\prod_{i\in\mathcal{N}}\Delta(\mathcal{A}^{i})$ is a Nash equilibrium, also known as a Wardrop equilibrium [51], if for all $i\in\mathcal{N}$, there exists $b^{i}$ such that

$$\begin{cases}c^{ia}(q_{*})=b^{i},&\text{if }q^{ia}_{*}>0,\\ c^{ia}(q_{*})\geq b^{i},&\text{if }q^{ia}_{*}=0.\end{cases}$$

The following result extends the VI formulation to Nash equilibria in nonatomic games: denote $c^{i}(q)=(c^{ia}(q))_{a\in\mathcal{A}^{i}}$; then $q_{*}$ is a Nash equilibrium if and only if [11]

$$\sum_{i\in\mathcal{N}}\lambda^{i}\cdot\bigl\langle c^{i}(q_{*}),q^{i}-q^{i}_{*}\bigr\rangle\geq 0,\quad\text{for all }q\in\Delta(\mathcal{A}). \tag{3.2}$$
Example 3.2 (Routing game).

Consider a set of agents traveling from source nodes to sink nodes in a directed graph with nodes $\mathcal{V}$ and edges $\mathcal{E}$. Denote $\mathcal{N}\subseteq\mathcal{V}\times\mathcal{V}$ as the set of source-sink pairs, $\mathcal{A}^{i}\subseteq 2^{\mathcal{E}}$ as the set of paths connecting $i\in\mathcal{N}$, and $\mathcal{E}^{ia}\subseteq\mathcal{E}$ as the set of all edges on the path $a\in\mathcal{A}^{i}$. Suppose that each source-sink pair $i\in\mathcal{N}$ is associated with $\rho^{i}$ nonatomic agents aiming to choose a route from $\mathcal{A}^{i}$ to minimize the total cost incurred. Let $q^{ia}$ be the proportion of travelers using the path $a\in\mathcal{A}^{i}$ (so that $q^{i}\in\Delta(\mathcal{A}^{i})$), $x^{e}\in\mathbb{R}_{+}$ be the number of travelers using the edge $e$, and $t^{e}(x^{e})\in\mathbb{R}_{+}$ be the cost for using edge $e$. Then we have $x^{e}=\sum_{i\in\mathcal{N}}\sum_{a\in\mathcal{A}^{i}}\rho^{i}\cdot q^{ia}\cdot\delta^{eia}$, where $\delta^{eia}$ equals 1 if $e\in\mathcal{E}^{ia}$ and 0 otherwise. The total cost for a traveler selecting a path $a\in\mathcal{A}^{i}$ is then $c^{ia}(q)=\sum_{e\in\mathcal{E}}t^{e}(x^{e})\cdot\delta^{eia}$.
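A tiny numerical instance (the two-path network, demand, and linear delay functions below are assumptions made only for illustration) shows how the edge flows $x^{e}$ and path costs $c^{ia}(q)$ are assembled from the path proportions.

```python
import numpy as np

# One O-D pair served by two parallel paths; each path uses one distinct edge
# (topology, demand, and delay functions are illustrative assumptions).
rho = 10.0                   # number of nonatomic agents for the O-D pair
q = np.array([0.6, 0.4])     # path proportions q^{ia}, summing to one
delta = np.eye(2)            # delta[e, a] = 1 if edge e lies on path a

def edge_flows(q):
    # x^e = sum_a rho * q^{a} * delta^{e a}
    return delta @ (rho * q)

def path_costs(q):
    # c^{a}(q) = sum_e t^e(x^e) * delta^{e a}, with an assumed linear delay t^e(x) = 1 + 0.1 * x
    t = 1.0 + 0.1 * edge_flows(q)
    return delta.T @ t

print(edge_flows(q))  # flows on the two edges
print(path_costs(q))  # costs of the two paths
```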

Incentive Design. Despite the difference, we see that an equilibrium of both atomic and nonatomic games can be formulated as a solution to a corresponding VI problem in the following form

$$\sum_{i\in\mathcal{N}}\lambda^{i}\cdot\bigl\langle v^{i}(x_{*}),x^{i}-x^{i}_{*}\bigr\rangle\leq 0,\quad\text{for all }x\in\mathcal{X}=\prod_{i\in\mathcal{N}}\mathcal{X}^{i}, \tag{3.3}$$

where $v^{i}$ and $\mathcal{X}^{i}$ denote different terms in the two types of games. Suppose that there exists an incentive designer aiming to induce a desired equilibrium. To this end, the designer can add incentives $\theta\in\Theta\subseteq\mathbb{R}^{d}$, which are assumed to enter the reward/cost functions and thus lead to a parameterized $v_{\theta}^{i}(x)$. We assume that the designer’s objective is determined by a function $f:\Theta\times\mathcal{X}\to\mathbb{R}$. Denote $S(\theta)$ as the solution set to (3.3). We then obtain the unified formulation of the incentive design problem for both atomic games and nonatomic games

$$\min_{\theta\in\Theta}\ f_{*}(\theta)=f(\theta,x_{*}),\quad\text{s.t.}\quad x_{*}\in S(\theta). \tag{3.4}$$

If the equilibrium problem admits multiple solutions, the agents may converge to different ones, and it would be difficult to determine which one better predicts the behaviors of the agents without additional information. In this paper, we first consider the case where the game admits a unique equilibrium; sufficient conditions under which the game admits a unique equilibrium will also be provided later. We then consider the non-unique case in Section 6 and show that our algorithms remain applicable after adding an appropriate regularizer to the cost function.

Stochastic Environment. In the aforementioned settings, $v_{\theta}^{i}(x)$ is a deterministic function. Although most MPEC algorithms in the optimization literature follow this deterministic setting, in this paper, we hope our algorithm can handle more realistic scenarios. Specifically, in the real world, the environment could be stochastic because some environment parameters fluctuate from day to day. In a traffic system, for example, both adverse weather and special events may affect the road condition, hence the travel time $v_{\theta}^{i}(x)$ experienced by the drivers. We expect our algorithm to still work in the face of such stochasticity. To this end, we assume that $v_{\theta}^{i}(x)$ represents the expected value of the cost function. On each day, however, the agents can only receive a noisy feedback $\widehat{v}_{\theta}^{i}$ as an estimate. In the next section, we develop algorithms based on such noisy feedback.

4 Algorithm

We propose to update $\theta$ and $x$ simultaneously to improve the computational efficiency. The game dynamics at the lower level are modeled using the mirror descent method. Specifically, at stage $k$, given $\theta_{k}$ and $x_{k}$, each agent $i$ first receives $v^{i}_{\theta_{k}}(x_{k})$ as the feedback. After receiving the feedback, the agents update their strategies via

$$x^{i}_{k+1}=\mathop{\mathrm{argmax}}_{x^{i}\in\mathcal{X}^{i}}\bigl\{\langle v^{i}_{\theta_{k}}(x_{k}),x^{i}\rangle-1/\beta^{i}_{k}\cdot D_{\psi^{i}}(x^{i}_{k},x^{i})\bigr\}, \tag{4.1}$$

where $D_{\psi^{i}}(x^{i}_{k},x^{i})$ is the Bregman divergence induced by a strongly convex function $\psi^{i}$. The exact value of $\nabla f_{*}(\theta_{k})$, the gradient of the designer's objective, equals

$$\nabla_{\theta}f\bigl(\theta_{k},x_{*}(\theta_{k})\bigr)+\bigl[\nabla_{\theta}x_{*}(\theta_{k})\bigr]^{\top}\cdot\nabla_{x}f\bigl(\theta_{k},x_{*}(\theta_{k})\bigr),$$

which requires the exact lower-level equilibrium $x_{*}(\theta_{k})$. However, at stage $k$, we only have the current strategy $x_{k}$. Therefore, we have to construct estimators of $\nabla x_{*}(\theta_{k})$ and $\nabla f_{*}(\theta_{k})$ from $x_{k}$, the forms of which will be specified later.

Remark 4.1.

The standard gradient descent method is double-loop because at each $\theta_{k}$ it involves an inner loop for solving for the exact value of $x_{*}(\theta_{k})$ and then calculating the exact gradient.

4.1 Unconstrained Game

We first consider unconstrained games with $\mathcal{X}^{i}=\mathbb{R}^{d^{i}}$ for all $i\in\mathcal{N}$. We select $\psi^{i}(\cdot)$ as a smooth function, i.e., there exists a constant $H_{\psi}\geq 1$ such that for all $i\in\mathcal{N}$ and $x^{i},x^{i\prime}\in\mathcal{X}^{i}$,

$$\bigl\|\nabla\psi^{i}(x^{i})-\nabla\psi^{i}(x^{i\prime})\bigr\|_{2}\leq H_{\psi}\cdot\|x^{i}-x^{i\prime}\|_{2}. \tag{4.2}$$

Examples of $\psi^{i}$ satisfying this assumption include (but are not limited to) $\psi^{i}(x^{i})=(x^{i})^{\top}Q^{i}x^{i}/2$, where $Q^{i}\in\mathbb{R}^{d^{i}\times d^{i}}$ is a positive definite matrix. It can be directly checked that we can set $H_{\psi}=\max_{i\in\mathcal{N}}\delta^{i}$, where $\delta^{i}$ is the largest singular value of $Q^{i}$. In this case, the corresponding Bregman divergence becomes $D_{\psi^{i}}(x^{i},x^{i\prime})=(x^{i}-x^{i\prime})^{\top}Q^{i}(x^{i}-x^{i\prime})/2$, which is known as the squared Mahalanobis distance. Before laying out the algorithm, we first give the following lemma characterizing $\nabla_{\theta}x_{*}(\theta)$.
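As a quick sanity check of this quadratic choice, the snippet below (with an arbitrarily chosen positive definite $Q$, an assumption made only for illustration) verifies numerically that the induced Bregman divergence coincides with the squared Mahalanobis distance.

```python
import numpy as np

# Bregman divergence of psi(x) = x^T Q x / 2 for an arbitrary positive definite Q
# (chosen here only for illustration).
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def psi(x):
    return 0.5 * x @ Q @ x

def grad_psi(x):
    return Q @ x

def bregman(x, y):
    # D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>
    return psi(x) - psi(y) - grad_psi(y) @ (x - y)

x, y = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(bregman(x, y))                # Bregman divergence
print(0.5 * (x - y) @ Q @ (x - y))  # squared Mahalanobis distance: same value
```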

Lemma 4.2.

When $\mathcal{X}^{i}=\mathbb{R}^{d^{i}}$ and $\nabla_{x}v_{\theta}(x_{*}(\theta))$ is non-singular, it holds that

$$\nabla_{\theta}x_{*}(\theta)=-\Bigl[\nabla_{x}v_{\theta}\bigl(x_{*}(\theta)\bigr)\Bigr]^{-1}\cdot\nabla_{\theta}v_{\theta}\bigl(x_{*}(\theta)\bigr).$$
Proof.

See Appendix B.2 for detailed proof. ∎

For any given $\theta\in\Theta$ and $x\in\mathcal{X}$, we define

$$\widetilde{\nabla}f(\theta,x)=\nabla_{\theta}f(\theta,x)-\bigl[\nabla_{\theta}v_{\theta}(x)\bigr]^{\top}\cdot\bigl[\nabla_{x}v_{\theta}(x)\bigr]^{-1}\cdot\nabla_{x}f(\theta,x). \tag{4.3}$$

Although we cannot obtain the exact value of $\nabla f_{*}(\theta_{k})$, we may use $\widetilde{\nabla}f(\theta_{k},x_{k})$ as a surrogate and update $\theta_{k}$ based on it instead. We are now ready to present the bilevel incentive design algorithm for unconstrained games (see Algorithm 1).

Algorithm 1 Bilevel incentive design for unconstrained games
  Input: $\theta_{0}\in\Theta$, $x_{0}\in\mathcal{X}=\mathbb{R}^{d}$, where $d=\sum_{i\in\mathcal{N}}d^{i}$, and a sequence of step sizes $(\alpha_{k},\{\beta^{i}_{k}\}_{i\in\mathcal{N}})$. For $k=0,1,\dots$ do:
  Update strategy profile
$$x^{i}_{k+1}=\mathop{\mathrm{argmax}}_{x^{i}\in\mathcal{X}^{i}}\bigl\{\langle\widehat{v}^{i}_{k},x^{i}\rangle-1/\beta^{i}_{k}\cdot D_{\psi^{i}}(x^{i}_{k},x^{i})\bigr\}, \tag{4.4}$$
for all $i\in\mathcal{N}$, where $\widehat{v}^{i}_{k}$ is an estimator of $v^{i}_{\theta_{k}}(x_{k})$.
  Update incentive parameter
$$\theta_{k+1}=\mathop{\mathrm{argmax}}_{\theta\in\Theta}\bigl\{\langle\widetilde{\nabla}f(\theta_{k},x_{k+1}),\theta\rangle-1/\alpha_{k}\cdot\|\theta-\theta_{k}\|_{2}^{2}\bigr\}.$$
  EndFor
  Output: Last-iterate incentive parameter $\theta_{k+1}$ and strategy profile $x_{k+1}$.

In Algorithm 1, if $\theta_{k}$ and $x_{k}$ converge to fixed points $\bar{\theta}$ and $\bar{x}$, respectively, then $\bar{x}=x_{*}(\bar{\theta})$ is expected to hold, and consequently $\widetilde{\nabla}f(\bar{\theta},\bar{x})=\nabla f_{*}(\bar{\theta})$, which guarantees the optimality of $\bar{\theta}$. In other words, the algorithm finds the optimal solution whenever it converges. The difficult part, instead, is how to design appropriate step sizes that ensure convergence. We provide such conditions in Section 5.1.
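To illustrate the single-loop structure, below is a minimal sketch of Algorithm 1 on a toy unconstrained problem. The one-agent game, the designer's objective, the Euclidean choice of $\psi^{i}$, and the step-size constants are all assumptions made for illustration, and the upper-level step is written as a plain descent step on the designer's objective rather than the exact update rule above.

```python
import numpy as np

# Toy single-loop design-and-play loop (everything below is an illustrative assumption).
rng = np.random.default_rng(0)
d = 3
x_target = np.array([1.0, -2.0, 0.5])        # outcome the designer would like to induce

def v(theta, x):             # one agent with reward u(x) = -||x - theta||^2 / 2, so v = theta - x
    return theta - x
def jac_x_v(theta, x):       # Jacobian of v with respect to x
    return -np.eye(d)
def jac_theta_v(theta, x):   # Jacobian of v with respect to theta
    return np.eye(d)
def grad_theta_f(theta, x):  # designer objective f(theta, x) = ||x - x_target||^2 / 2
    return np.zeros(d)
def grad_x_f(theta, x):
    return x - x_target

theta, x = np.zeros(d), rng.normal(size=d)
for k in range(5000):
    alpha_k = 0.5 / (k + 1)              # upper-level step size, alpha_k ~ 1/k
    beta_k = 0.5 / (k + 1) ** (2 / 3)    # lower-level step size, beta_k ~ k^{-2/3}
    # Lower level: with psi = ||.||^2 / 2 the mirror step reduces to a gradient step.
    x = x + beta_k * v(theta, x)
    # Upper level: surrogate gradient (4.3) evaluated at the current x instead of x_*(theta_k).
    surrogate = grad_theta_f(theta, x) - jac_theta_v(theta, x).T @ np.linalg.solve(
        jac_x_v(theta, x), grad_x_f(theta, x))
    theta = theta - alpha_k * surrogate  # descent step on the designer's objective

print(np.round(theta, 2), np.round(x, 2))    # both iterates approach x_target
```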

4.2 Simplex-Constrained Game

We then consider the case where, for all $i\in\mathcal{N}$, $x^{i}$ is constrained within the probability simplex

$$\Delta\bigl([d^{i}]\bigr)=\bigl\{x^{i}\in\mathbb{R}^{d^{i}}\,\big|\,x^{i}\geq 0,\ (x^{i})^{\top}{\bf 1}_{d^{i}}=1\bigr\},$$

where ${\bf 1}_{d^{i}}\in\mathbb{R}^{d^{i}}$ is the vector of all ones. We remark that many classic game-theoretic models are simplex-constrained. In fact, as long as an agent faces a finite number of choices and adopts a mixed strategy, its decision space is a probability simplex [39]. In addition, some other types of decisions may also be constrained by a simplex. For example, financial investment concerns how to split money across different assets; in such a scenario, the budget constraint can also be represented by a probability simplex.

In such a case, we naturally consider $\psi^{i}(x^{i})=\sum_{j\in[d^{i}]}[x^{i}]_{j}\cdot\log[x^{i}]_{j}$, the negative Shannon entropy. Such a choice gives the Bregman divergence $D_{\psi^{i}}(x^{i},x^{i\prime})=\sum_{j\in[d^{i}]}[x^{i\prime}]_{j}\cdot\log([x^{i\prime}]_{j}/[x^{i}]_{j})$, which is known as the KL divergence. In this case, we still first need to characterize $\nabla_{\theta}x_{*}(\theta)$, which also has an analytic form. Specifically, if we define a function $h_{\theta}(x)=(h_{\theta}^{i}(x^{i}))_{i\in\mathcal{N}}$ that satisfies

$$h_{\theta}^{i}(x^{i})=\mathop{\mathrm{argmax}}_{x^{\prime i}\in\mathcal{X}^{i}}\Bigl\{\bigl\langle v^{i}_{\theta}(x^{i}),x^{\prime i}\bigr\rangle-1/\beta^{i}_{k}\cdot D_{\psi^{i}}(x^{i},x^{\prime i})\Bigr\},$$

then for any $\theta$, $x_{*}(\theta)$ satisfies $x_{*}(\theta)=h_{\theta}(x_{*}(\theta))$ [12]. Implicitly differentiating through this fixed-point equation then yields $\nabla x_{*}(\theta)=\nabla_{\theta}h_{\theta}(x_{*}(\theta))\cdot(I-\nabla_{x}h_{\theta}(x_{*}(\theta)))^{-1}$. Then, similar to (4.3), we may use

$$\widetilde{\nabla}f(\theta,x)=\nabla_{\theta}f(\theta,x)-\nabla_{\theta}h_{\theta}(x)\cdot\bigl(I-\nabla_{x}h_{\theta}(x)\bigr)^{-1}\cdot\nabla_{x}f(\theta,x) \tag{4.5}$$

to approximate the actual gradient $\nabla f_{*}(\theta)$ and then update $\theta_{k}$ based on this approximation instead.

Remark 4.3.

The mapping $h_{\theta}(x)$ has an analytic expression, which reads

$$h_{\theta}^{i}(x)=x^{i}\cdot\exp\bigl(-\beta_{k}^{i}\cdot v_{\theta}^{i}(x)\bigr)\Big/\Bigl\|x^{i}\cdot\exp\bigl(-\beta_{k}^{i}\cdot v_{\theta}^{i}(x)\bigr)\Bigr\|_{1}.$$

Hence, both $\nabla_{x}h_{\theta}(x)$ and $\nabla_{\theta}h_{\theta}(x)$ can also be calculated analytically.

In addition to a different gradient estimate, we also modify Algorithm 1 to keep the iterates $x_{k}$ from hitting the boundary at the early stage. The modification mixes the strategy with the uniform strategy ${\bf 1}_{d^{i}}/d^{i}$, i.e., it imposes the additional step

$$\widetilde{x}^{i}_{k+1}=(1-\nu_{k+1})\cdot x^{i}_{k+1}+\nu_{k+1}\cdot{\bf 1}_{d^{i}}/d^{i}$$

upon finishing the update (4.4), where $\nu_{k+1}\in(0,1)$ is the mixing parameter, which decreases to 0 as $k\to\infty$. Below we give the formal presentation of the modified bilevel incentive design algorithm for simplex-constrained games (see Algorithm 2).

Algorithm 2 Bilevel incentive design for simplex-constrained games
  Input: $\theta_{0}\in\Theta$, $x_{0}\in\mathcal{X}$, step sizes $(\alpha_{k},\{\beta_{k}^{i}\}_{i\in\mathcal{N}})$, $k\geq 0$, and mixing parameters $\nu_{k}$, $k\geq 0$. For $k=0,1,\dots$ do:
  Update strategy profile
$$x^{i}_{k+1}=\mathop{\mathrm{argmax}}_{x^{i}\in\Delta([d^{i}])}\bigl\{\langle\widehat{v}^{i}_{k},x^{i}\rangle-1/\beta^{i}_{k}\cdot D_{\psi^{i}}(\widetilde{x}^{i}_{k},x^{i})\bigr\},\qquad\widetilde{x}^{i}_{k+1}=(1-\nu_{k})\cdot x^{i}_{k+1}+\nu_{k}\cdot{\bf 1}_{d^{i}}/d^{i}, \tag{4.6}$$
for all $i\in\mathcal{N}$, where $\widehat{v}^{i}_{k}$ is an estimator of $v^{i}_{\theta_{k}}(\widetilde{x}_{k})$.
  Update incentive parameter
$$\theta_{k+1}\leftarrow\mathop{\mathrm{argmax}}_{\theta\in\Theta}\bigl\{\langle\widetilde{\nabla}f(\theta_{k},\widetilde{x}_{k+1}),\theta\rangle-1/\alpha_{k}\cdot\|\theta-\theta_{k}\|_{2}^{2}\bigr\}.$$
  EndFor
  Output: Last-iterate incentive parameter $\theta_{k+1}$ and strategy profile $x_{k+1}$.

Similar to Algorithm 1, at the core of the convergence of Algorithm 2 is still the step size. This case is even more complicated, as we need to design αk\alpha_{k}, βk\beta_{k}, and νk\nu_{k} at the same time. In this paper, a provably convergent scheme is provided in Section 5.2.

Before closing this section, we remark that the algorithm can be easily adapted to other types of constraints by using another $h_{\theta}(x)$ to model the game dynamics. In particular, the projected gradient descent dynamics has very broad applicability. In this case, the algorithm for calculating $\nabla_{\theta}h_{\theta}(x)$ and $\nabla_{x}h_{\theta}(x)$ is given by, for example, Amos and Kolter [2]. The additional step (4.6) then becomes unnecessary as it is dedicated to simplex constraints.
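For concreteness, here is a minimal sketch of the strategy update in Algorithm 2: the entropic mirror step of Remark 4.3 followed by the mixing step (4.6). The cost vector, the current strategy, and the step-size constants below are illustrative assumptions.

```python
import numpy as np

def mirror_step(x, v_hat, beta):
    # Closed form of the KL-proximal update in Remark 4.3: the next strategy is
    # proportional to x * exp(-beta * v_hat), renormalized onto the simplex.
    y = x * np.exp(-beta * v_hat)
    return y / y.sum()

def mix_with_uniform(x, nu):
    # Mixing step (4.6): pull the iterate away from the simplex boundary.
    return (1.0 - nu) * x + nu * np.ones_like(x) / x.size

x = np.array([0.5, 0.3, 0.2])        # current mixed strategy over 3 actions (illustrative)
v_hat = np.array([1.0, 2.0, 0.5])    # noisy cost feedback (illustrative)
k = 10
beta_k = 0.5 / (k + 1) ** (2 / 7)    # beta_k ~ k^{-2/7}, as in Theorem 5.5
nu_k = 0.2 / (k + 1) ** (4 / 7)      # nu_k ~ k^{-4/7}, as in Theorem 5.5
x_next = mix_with_uniform(mirror_step(x, v_hat, beta_k), nu_k)
print(np.round(x_next, 4), x_next.sum())   # still a probability vector
```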

5 Convergence Analysis

In this section, we study the convergence of the proposed algorithms. For simplicity, define $\overline{D}_{\psi}(x,x^{\prime})=\sum_{i\in\mathcal{N}}D_{\psi^{i}}(x^{i},x^{i\prime})$. We make the following assumptions.

Assumption 5.1.

The lower-level problem in (3.4) satisfies the following conditions.

  • The strategy set $\mathcal{X}^{i}$ of agent $i$ is a nonempty, compact, and convex subset of $\mathbb{R}^{d^{i}}$.

  • For each $i\in\mathcal{N}$, the gradient $v_{\theta}^{i}(\cdot)$ is $H_{u}$-Lipschitz continuous with respect to $\overline{D}_{\psi}$, i.e., for all $i\in\mathcal{N}$ and $x,x^{\prime}\in\mathcal{X}$, $\|v_{\theta}^{i}(x)-v_{\theta}^{i}(x^{\prime})\|^{2}_{*}\leq H^{2}_{u}\cdot\overline{D}_{\psi}(x,x^{\prime})$.

  • There exist constants $\rho_{\theta},\rho_{x}>0$ such that for all $x\in\mathcal{X}$ and $\theta\in\Theta$, $\|\nabla_{\theta}v_{\theta}(x)\|_{2}<\rho_{\theta}$ and $\|[\nabla_{x}v_{\theta}(x)]^{-1}\|_{2}\leq 1/\rho_{x}$.

  • For all $\theta\in\Theta$, the equilibrium $x_{*}(\theta)$ of the game is strongly stable with respect to $\overline{D}_{\psi}$, i.e., for all $x\in\mathcal{X}$, $\sum_{i\in\mathcal{N}}\lambda^{i}\cdot\langle v^{i}_{\theta}(x),x^{i}_{*}(\theta)-x^{i}\rangle\geq\overline{D}_{\psi}(x_{*}(\theta),x)$.

Assumption 5.2.

The upper-level problem in (3.4) satisfies the following properties.

  • The set $\Theta$ is compact and convex. The function $f_{*}(\theta)$ is $\mu$-strongly convex and $\nabla f_{*}(\theta)$ has 2-norm uniformly bounded by $M$.

  • The extended gradient $\widetilde{\nabla}f(\theta,x)$ is $\widetilde{H}$-Lipschitz continuous with respect to $\overline{D}_{\psi}$, i.e., for all $x,x^{\prime}\in\mathcal{X}$, $\|\widetilde{\nabla}f(\theta,x)-\widetilde{\nabla}f(\theta,x^{\prime})\|^{2}_{2}\leq\widetilde{H}^{2}\cdot\overline{D}_{\psi}(x,x^{\prime})$.

Assumption 5.3.

Define the filtration by $\mathcal{F}_{0}^{\theta}=\{\theta_{0}\}$, $\mathcal{F}_{0}^{x}=\varnothing$, $\mathcal{F}_{k}^{\theta}=\mathcal{F}_{k-1}^{\theta}\cup\{x_{k-1},\theta_{k}\}$, and $\mathcal{F}_{k}^{x}=\mathcal{F}_{k-1}^{x}\cup\{\theta_{k},x_{k}\}$. We impose the following assumptions.

  • The feedback $\widehat{v}_{k}$ is an unbiased estimate, i.e., for all $i\in\mathcal{N}$, we have $\mathbb{E}[\widehat{v}^{i}_{k}\,|\,\mathcal{F}_{k}^{x}]=v^{i}_{\theta_{k}}(x_{k})$.

  • The feedback $\widehat{v}_{k}$ has bounded mean squared estimation errors, i.e., there exists $\delta_{u}>0$ such that $\mathbb{E}[\|\widehat{v}^{i}_{k}-v^{i}_{\theta_{k}}(x_{k})\|^{2}_{*}\,|\,\mathcal{F}_{k}^{x}]\leq\delta_{u}^{2}$ for all $i\in\mathcal{N}$.

Below we discuss when the proposed assumptions hold and, if they are violated, how our algorithm would work. Assumption 5.1 includes the condition that $x_{*}(\theta)$ is strongly stable; in this case, it is the unique Nash equilibrium of the game [34]. This is also a common assumption in the analysis of the mirror descent dynamics itself [13]. We provide sufficient conditions for checking strong stability in Appendix B.1 and refer the readers to Section 6 for an explanation of how to extend our algorithm when this assumption is violated. Assumption 5.2 includes the convexity of the upper-level problem, which is usually necessary to ensure global convergence; without convexity, our algorithm can still converge to a local minimum. Assumption 5.3 becomes unnecessary if we simply assume that the environment is deterministic, in which case the exact value of $v_{\theta}^{i}(x)$ is available. If noise is added to the feedback, assuming that the noisy feedback is unbiased and bounded is still reasonable.

5.1 Unconstrained Game

In this part, we establish the convergence guarantee of Algorithm 1 for unconstrained games. We define the optimality gap $\epsilon^{\theta}_{k}$ and the equilibrium gap $\epsilon^{x}_{k+1}$ as

$$\epsilon^{\theta}_{k}:=\mathbb{E}\bigl[\|\theta_{k}-\theta_{*}\|_{2}^{2}\bigr],\qquad\epsilon_{k+1}^{x}:=\mathbb{E}\Bigl[\overline{D}_{\psi}\bigl(x_{*}(\theta_{k}),x_{k+1}\bigr)\Bigr].$$

We track these two gaps as the convergence criteria in the subsequent results.

Theorem 5.4.

For Algorithm 1, set the step sizes $\alpha_{k}=\alpha/(k+1)$, $\beta_{k}=\beta/(k+1)^{2/3}$, and $\beta_{k}^{i}=\lambda^{i}\cdot\beta_{k}$ with constants $\alpha>0$ and $\beta>0$ satisfying

$$\beta\leq 1/N\cdot H_{u}^{2}\|\lambda\|_{2}^{2},\qquad\alpha/\beta^{3/2}\leq 1/12\cdot H_{\psi}\widetilde{H}H_{*},$$

where $H_{*}=\rho_{\theta}/\rho_{x}$. Suppose that Assumptions 5.1-5.3 hold; then we have

$$\epsilon_{k}^{\theta}=O(k^{-2/3}),\qquad\epsilon_{k}^{x}=O(k^{-2/3}).$$
Proof.

See Appendix C for detailed proof and a detailed expression of convergence rates. ∎

5.2 Simplex-Constrained Game

In this part, we establish the convergence guarantee of Algorithm 2 for simplex-constrained games. We still define the optimality gap as $\epsilon^{\theta}_{k}=\mathbb{E}\bigl[\|\theta_{k}-\theta_{*}\|_{2}^{2}\bigr]$. Yet, corresponding to (4.6), we track $\widetilde{\epsilon}_{k+1}^{x}$ as a measure of convergence for the strategies of the agents, which is defined as

$$\widetilde{\epsilon}_{k+1}^{x}=\mathbb{E}\Bigl[\overline{D}_{\psi}\bigl(\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k+1}\bigr)\Bigr],$$

where $\widetilde{x}_{*}(\theta_{k})=(1-\nu_{k})\cdot x_{*}(\theta_{k})+\nu_{k}\cdot{\bf 1}/d^{i}$. We are now ready to give the convergence guarantee of Algorithm 2.

Theorem 5.5.

For Algorithm 2, set the step sizes $\alpha_{k}=\alpha/(k+1)^{1/2}$, $\beta_{k}=\beta/(k+1)^{2/7}$, $\beta_{k}^{i}=\lambda^{i}\cdot\beta_{k}$, and $\nu_{k}=\nu/(k+1)^{4/7}$ with constants $\alpha>0$ and $\beta>0$ satisfying

$$\beta\leq 1/6\cdot NH_{u}^{2}\|\lambda\|_{2}^{2},\qquad\alpha/\beta^{3/2}\leq 1/7\cdot\widetilde{H}\widetilde{H}_{*},$$

where $\widetilde{H}_{*}=(1+d)\rho_{\theta}/\rho_{x}$. Suppose that Assumptions 5.1-5.3 hold. If there exists some constant $V_{*}>0$ such that $\|v_{\theta}(x_{*}(\theta))\|_{\infty}\leq V_{*}$ for any $\theta\in\Theta$, then we have

$$\epsilon_{k}^{\theta}=O(k^{-2/7}),\qquad\widetilde{\epsilon}_{k}^{x}=O(k^{-2/7}).$$
Proof.

See Appendix D for detailed proof and a detailed form of the convergence rates. ∎

6 Extensions to Games with Multiple Equilibria

We then briefly discuss how to apply our algorithms when the lower-level game has multiple equilibria.

Case I: If the function $v_{\theta}(x)=(v_{\theta}^{i}(x))_{i}$ is strongly monotone in the neighborhood of each equilibrium, then all equilibria are strongly stable in a neighborhood and hence isolated [38]. In this case, our algorithms can be directly applied, as $\nabla_{x}v_{\theta}(x)$ is non-singular in these neighborhoods. It is commonly believed that the most likely equilibrium is the one reached by the game dynamics [52]; our algorithm naturally converges to this one.

Case II: If the function $v_{\theta}(x)=(v_{\theta}^{i}(x))_{i}$ is monotone but not strongly monotone, then the equilibrium set is a convex and closed region [34]. This case is challenging, as the matrix $\nabla_{x}v_{\theta}(x)$ that needs to be inverted may become singular. Nevertheless, we can simply assume the agents are boundedly rational [42, 1, 33]. The bounded rationality results in a quantal response equilibrium for predicting the agents' response. We refer the readers to Appendix F for a detailed explanation (with numerical examples for illustration). Here we briefly sum up the key takeaways: (1) it is equivalent to adding a regularizer $\eta^{i}\cdot(\log(x^{i}+\epsilon)+1)$ to $v_{\theta}^{i}(x)$ for some $\eta^{i}>0$ and $\epsilon\geq 0$; (2) as long as $\eta^{i}>0$, the strong stability condition in Assumption 5.1 is satisfied, and hence a unique equilibrium exists; (3) as long as $\epsilon>0$, the Lipschitz continuity condition in Assumption 5.1 is also not violated. In a nutshell, the bounded rationality assumption simultaneously makes our model more realistic and satisfies the assumptions in Section 5.
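As an illustration of takeaway (1), the following sketch (the coefficients $\eta$ and $\epsilon$ are assumptions chosen only for illustration) adds the entropic regularizer to a cost vector evaluated at a mixed strategy.

```python
import numpy as np

def regularized_v(v_x, x, eta=0.1, eps=1e-3):
    # v_reg(x) = v(x) + eta * (log(x + eps) + 1), the regularizer discussed in Section 6
    # (eta and eps are illustrative choices).
    return v_x + eta * (np.log(x + eps) + 1.0)

x = np.array([0.7, 0.2, 0.1])        # a mixed strategy
v_x = np.array([1.0, 1.5, 0.8])      # unregularized costs at x (illustrative)
print(regularized_v(v_x, x))
```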

7 Numerical Experiments

In this section, we conduct two numerical experiments to test our algorithms. All numerical results reported in this section were produced on a MacBook Pro (15-inch, 2017) with a 2.9 GHz Quad-Core Intel Core i7 CPU.

7.1 Pollution Control via Emission Tax

We first consider the oligopoly model introduced in Example 3.1. We assume that when producing $a^{i}$ units of output, firm $i$ generates $e^{i}=d^{i}a^{i}$ units of emissions. We consider the following social welfare function [44]

$$W(a)=\int_{0}^{\sum_{i=1}^{n}a^{i}}(p_{0}-\gamma\cdot q)\,\mathrm{d}q-\sum_{i=1}^{n}c^{i}\cdot a^{i}-\tau\cdot\sum_{i=1}^{n}d^{i}a^{i},$$

where the first term is the consumers’ surplus, the second term is the total production cost, and the third term is the social damage caused by pollution. To maximize social welfare, an authority can impose emission taxes on production. Specifically, when producing $a^{i}$ units of output, firm $i$ is charged $\pi^{i}\cdot a^{i}$, where $\pi^{i}$ is specialized for firm $i$. In the experiment, we set $n=100$, $p_{0}=100$, $\gamma=1$, $\tau=10$, and $d^{i}=10-\exp(c^{i})+\epsilon^{i}$, where $\{c^{i}\}_{i=1}^{100}$ are evenly spaced between 1 and 2 and $\{\epsilon^{i}\}_{i=1}^{100}$ are white noises. Under this setting, the marginal cost $c^{i}$ and the emission rate $d^{i}$ are negatively correlated, which is realistic: if a firm hopes to reduce its pollution by upgrading its emission control systems, its production cost must increase accordingly.
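As a sketch of the resulting objective, the snippet below evaluates the welfare $W(a)$ and a firm's taxed profit for a small instance (the two-firm parameter values are assumptions, not those used in the experiment).

```python
import numpy as np

# Small illustrative instance of the emission-tax design objective (values are assumptions).
p0, gamma, tau = 100.0, 1.0, 10.0
c = np.array([1.0, 1.5])        # marginal production costs c^i
d_emit = np.array([7.0, 5.0])   # emission rates d^i
pi = np.array([2.0, 1.0])       # per-unit emission taxes pi^i (the design variables)
a = np.array([25.0, 20.0])      # output levels a^i

def welfare(a):
    q = a.sum()
    consumer_surplus = p0 * q - 0.5 * gamma * q ** 2   # integral of (p0 - gamma*q) from 0 to q
    return consumer_surplus - c @ a - tau * (d_emit @ a)

def taxed_profit(a):
    # firm i's profit after paying the emission charge pi^i * a^i
    return a * (p0 - gamma * a.sum()) - c * a - pi * a

print(welfare(a))
print(taxed_profit(a))
```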

Through this small numerical example, we hope to illustrate that the single-loop scheme developed in this paper is indeed much more efficient than previous double-loop algorithms. To this end, we compare our algorithm with two double-loop schemes proposed by Li et al. [25]. In both approaches, the lower-level equilibrium problem is solved exactly first at each iteration. But afterwards, the upper-level gradients are respectively obtained via automatic differentiation (AD) and implicit differentiation (ID). To make a fair comparison, the same hyperparameters — including the initial solutions, the learning rates, and the tolerance values for both upper- and lower-level problems — are employed for the tested algorithms (double-loop AD, double-loop ID, and our algorithm).

Table 1 reports statistics related to the computational performance, including the total CPU time, the total iteration number, and the CPU time per iteration. The results reveal that all tested algorithms take a similar number of iterations to reach the same level of precision. However, the running time per iteration required by our algorithm is significantly lower than that of the two double-loop approaches. Hence, in general, our scheme is more efficient.

Table 1: Performance of the tested algorithms for solving the emission tax design problem.
Method                     Double-loop AD    Double-loop ID    Our algorithm
Total CPU time (s)         71.56             11.29             2.09
Total iteration number     519               626               813
Time per iteration (ms)    137.89            18.04             2.57

7.2 Second-Best Congestion Pricing

We then consider the routing game model introduced in Example 3.2. To minimize the total travel delay, an authority on behalf of the public sector could impose appropriate tolls on selected roads [49]. This problem of determining tolls is commonly known as the congestion pricing problem. The second-best scheme assumes that only a subset of links can be charged [48]. Specifically, we write $\pi\in\mathbb{R}_{+}^{|\mathcal{E}|}$ as the tolls imposed on all the links and $\mathcal{E}_{\text{toll}}$ as the set of tollable links. We model the total cost for a traveler selecting a path $a\in\mathcal{A}^{i}$ as

$$c^{ia}(\pi,q)=\sum_{e\in\mathcal{E}}\bigl(t^{e}(x^{e})+\pi^{e}\bigr)\cdot\delta^{eia}+\eta\cdot\bigl(\log(q^{ia})+1\bigr),$$

where we add an extra term $\eta\cdot(\log(q^{ia})+1)$ to characterize the uncertainties in travelers’ route choices [20, 5]. It results in a quantal response equilibrium, as discussed in Section 6. We test our algorithm on a real-world traffic network: the Sioux Falls network (see Lawphongpanich and Hearn [24] for its structure). We select 20 links (11, 35, 32, 68, 46, 21, 65, 52, 71, 74, 33, 64, 69, 14, 18, 39, 57, 48, 15, 51) for imposing congestion tolls.

We run Algorithm 2 to solve the problem and compare four different settings of the step sizes. Setting A: $\alpha_{k}=\alpha/(k+1)^{1/2}$, $\beta_{k}=\beta/(k+1)^{2/7}$, and $\nu_{k}=\nu/(k+1)^{4/7}$; it ensures convergence according to Theorem 5.5. Setting B: $\alpha_{k}=\alpha/(k+1)^{1/2}$, $\beta_{k}=\beta/(k+1)$, and $\nu_{k}=\nu/(k+1)^{4/7}$; it only increases the decreasing rate of the lower-level step size. Setting C: $\alpha_{k}=\alpha/(k+1)$, $\beta_{k}=\beta/(k+1)$, and $\nu_{k}=\nu/(k+1)$; all step sizes decrease at the classic $O(1/k)$ rate. Setting D: $\alpha_{k}=\alpha/(k+1)^{1/2}$, $\beta_{k}=\beta/(k+1)^{2/7}$, and $\nu_{k}=0$; it does not adopt the mixing step proposed in our paper. We add white noise to the costs received by the agents based on Gaussian distributions and run our algorithm under each setting 10 times. The mean values of the upper-level optimality gaps and lower-level equilibrium gaps are reported in Figure 1 (the shaded areas are plotted based on the “mean $\pm$ std” range over all sampled trajectories).
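For reference, the four schedules can be written compactly as below (the constants alpha, beta, and nu are tuning knobs; the values here are placeholders).

```python
# Step-size schedules compared in Section 7.2; alpha, beta, nu are placeholder constants.
alpha, beta, nu = 1.0, 1.0, 0.5
schedules = {
    "A": lambda k: (alpha / (k + 1) ** 0.5, beta / (k + 1) ** (2 / 7), nu / (k + 1) ** (4 / 7)),
    "B": lambda k: (alpha / (k + 1) ** 0.5, beta / (k + 1), nu / (k + 1) ** (4 / 7)),
    "C": lambda k: (alpha / (k + 1), beta / (k + 1), nu / (k + 1)),
    "D": lambda k: (alpha / (k + 1) ** 0.5, beta / (k + 1) ** (2 / 7), 0.0),
}
for name, sched in schedules.items():
    print(name, tuple(round(s, 4) for s in sched(10)))   # (alpha_k, beta_k, nu_k) at k = 10
```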


Figure 1: Comparison across different algorithms and step sizes when solving the second-best congestion pricing problem.

Below we summarize a few observations. First, without the mixing step proposed in our paper, the algorithm indeed hits the boundary too early and fails after just a few iterations. This verifies our earlier claim that directly extending previous methods [21, 9] designed for bilevel optimization problems to MPECs is problematic. Second, the step sizes given by Theorem 5.5 ensure the fastest and most stable convergence.

References

  • Ahmed et al. [2019] Ahmed, Z., Le Roux, N., Norouzi, M. and Schuurmans, D. (2019). Understanding the impact of entropy on policy optimization. In International conference on machine learning. PMLR.
  • Amos and Kolter [2017] Amos, B. and Kolter, J. Z. (2017). Optnet: Differentiable optimization as a layer in neural networks. In Proceedings of the 34th International Conference on Machine Learning.
  • Anandalingam and Friesz [1992] Anandalingam, G. and Friesz, T. L. (1992). Hierarchical optimization: An introduction. Annals of Operations Research 34 1–11.
  • Bai et al. [2019] Bai, S., Kolter, J. Z. and Koltun, V. (2019). Deep equilibrium models. In Advances in Neural Information Processing Systems.
  • Bar-Gera and Boyce [1999] Bar-Gera, H. and Boyce, D. (1999). Route flow entropy maximization in origin-based traffic assignment. In 14th International Symposium on Transportation and Traffic Theory. Transportation Research Institute.
  • Björnerstedt and Weibull [1994] Björnerstedt, J. and Weibull, J. W. (1994). Nash equilibrium and evolution by imitation. Tech. rep., IUI Working Paper.
  • Bracken and McGill [1973] Bracken, J. and McGill, J. T. (1973). Mathematical programs with optimization problems in the constraints. Operations Research 21 37–44.
  • Buchanan and Stubblebine [1962] Buchanan, J. M. and Stubblebine, W. C. (1962). Externality. In Classic papers in natural resource economics. Springer, 138–154.
  • Chen et al. [2021] Chen, T., Sun, Y. and Yin, W. (2021). A single-timescale stochastic bilevel optimization method. arXiv preprint arXiv:2102.04671 .
  • Colson et al. [2007] Colson, B., Marcotte, P. and Savard, G. (2007). An overview of bilevel optimization. Annals of operations research 153 235–256.
  • Dafermos [1980] Dafermos, S. (1980). Traffic equilibrium and variational inequalities. Transportation Science 14 42–54.
  • Dafermos [1983] Dafermos, S. (1983). An iterative scheme for variational inequalities. Mathematical Programming 26 40–47.
  • D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I. and Mitliagkas, I. (2021). Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. arXiv preprint arXiv:2110.15412 .
  • Finn et al. [2017] Finn, C., Abbeel, P. and Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning. PMLR.
  • Franceschi et al. [2017] Franceschi, L., Donini, M., Frasconi, P. and Pontil, M. (2017). Forward and reverse gradient-based hyperparameter optimization. In International Conference on Machine Learning. PMLR.
  • Friedman and Friedman [1990] Friedman, M. and Friedman, R. (1990). Free to choose: A personal statement. Houghton Mifflin Harcourt.
  • Friesz et al. [1990] Friesz, T. L., Tobin, R. L., Cho, H.-J. and Mehta, N. J. (1990). Sensitivity analysis based heuristic algorithms for mathematical programs with variational inequality constraints. Mathematical Programming 48 265–284.
  • Greenwald and Stiglitz [1986] Greenwald, B. C. and Stiglitz, J. E. (1986). Externalities in economies with imperfect information and incomplete markets. The quarterly journal of economics 101 229–264.
  • Harker and Pang [1988] Harker, P. T. and Pang, J.-S. (1988). Existence of optimal solutions to mathematical programs with equilibrium constraints. Operations research letters 7 61–64.
  • Hazelton [1998] Hazelton, M. L. (1998). Some remarks on stochastic user equilibrium. Transportation Research Part B: Methodological 32 101–108.
  • Hong et al. [2020] Hong, M., Wai, H.-T., Wang, Z. and Yang, Z. (2020). A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic. arXiv preprint arXiv:2007.05170 .
  • Jiang et al. [2021] Jiang, H., Chen, Z., Shi, Y., Dai, B. and Zhao, T. (2021). Learning to defend by learning to attack. In International Conference on Artificial Intelligence and Statistics. PMLR.
  • Krichene et al. [2015] Krichene, W., Krichene, S. and Bayen, A. (2015). Convergence of mirror descent dynamics in the routing game. In 2015 European Control Conference (ECC). IEEE.
  • Lawphongpanich and Hearn [2004] Lawphongpanich, S. and Hearn, D. W. (2004). An mpec approach to second-best toll pricing. Mathematical programming 101 33–55.
  • Li et al. [2020] Li, J., Yu, J., Nie, Y. M. and Wang, Z. (2020). End-to-end learning and intervention in games. Advances in Neural Information Processing Systems 33.
  • Li et al. [2022] Li, J., Yu, J., Wang, Q., Liu, B., Wang, Z. and Nie, Y. M. (2022). Differentiable bilevel programming for stackelberg congestion games. arXiv preprint arXiv:2209.07618 .
  • Liu et al. [2018] Liu, H., Simonyan, K. and Yang, Y. (2018). Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055 .
  • Liu et al. [2021] Liu, R., Gao, J., Zhang, J., Meng, D. and Lin, Z. (2021). Investigating bi-level optimization for learning and vision from a unified perspective: A survey and beyond. arXiv preprint arXiv:2101.11517 .
  • Luketina et al. [2016] Luketina, J., Berglund, M., Greff, K. and Raiko, T. (2016). Scalable gradient-based tuning of continuous regularization hyperparameters. In International conference on machine learning. PMLR.
  • Luo et al. [1996] Luo, Z.-Q., Pang, J.-S. and Ralph, D. (1996). Mathematical programs with equilibrium constraints. Cambridge University Press.
  • Maclaurin et al. [2015] Maclaurin, D., Duvenaud, D. and Adams, R. (2015). Gradient-based hyperparameter optimization through reversible learning. In International conference on machine learning. PMLR.
  • Maskin [1994] Maskin, E. S. (1994). The invisible hand and externalities. The American Economic Review 84 333–337.
  • Melo [2021] Melo, E. (2021). On the uniqueness of quantal response equilibria and its application to network games. Economic Theory 1–45.
  • Mertikopoulos and Zhou [2019] Mertikopoulos, P. and Zhou, Z. (2019). Learning in games with continuous action sets and unknown payoff functions. Mathematical Programming 173 465–507.
  • Metcalf and Weisbach [2009] Metcalf, G. E. and Weisbach, D. (2009). The design of a carbon tax. Harv. Envtl. L. Rev. 33 499.
  • Metz et al. [2016] Metz, L., Poole, B., Pfau, D. and Sohl-Dickstein, J. (2016). Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163 .
  • Mguni et al. [2019] Mguni, D., Jennings, J., Sison, E., Valcarcel Macua, S., Ceppi, S. and Munoz de Cote, E. (2019). Coordinating the crowd: Inducing desirable equilibria in non-cooperative systems. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems.
  • Nagurney [2013] Nagurney, A. (2013). Network economics: A variational inequality approach, vol. 10. Springer Science & Business Media.
  • Nash [1951] Nash, J. (1951). Non-cooperative games. Annals of mathematics 286–295.
  • Nemirovskij and Yudin [1983] Nemirovskij, A. S. and Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization. A Wiley-Interscience publication, Wiley.
  • Parise and Ozdaglar [2017] Parise, F. and Ozdaglar, A. (2017). Sensitivity analysis for network aggregative games. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE.
  • Prashker and Bekhor [2004] Prashker, J. N. and Bekhor, S. (2004). Route choice models used in the stochastic user equilibrium problem: a review. Transport reviews 24 437–463.
  • Rajeswaran et al. [2019] Rajeswaran, A., Finn, C., Kakade, S. and Levine, S. (2019). Meta-learning with implicit gradients. arXiv preprint arXiv:1909.04630 .
  • Requate [1993] Requate, T. (1993). Pollution control in a cournot duopoly via taxes or permits. Journal of Economics 58 255–291.
  • Schmeidler [1973] Schmeidler, D. (1973). Equilibrium points of nonatomic games. Journal of statistical Physics 7 295–300.
  • Scutari et al. [2010] Scutari, G., Palomar, D. P., Facchinei, F. and Pang, J.-S. (2010). Convex optimization, game theory, and variational inequality theory. IEEE Signal Processing Magazine 27 35–49.
  • Smith [1791] Smith, A. (1791). An Inquiry into the Nature and Causes of the Wealth of Nations, vol. 1. Librito Mondi.
  • Verhoef [2002] Verhoef, E. T. (2002). Second-best congestion pricing in general networks. heuristic algorithms for finding second-best optimal toll levels and toll points. Transportation Research Part B: Methodological 36 707–729.
  • Vickrey [1969] Vickrey, W. S. (1969). Congestion theory and transport investment. The American Economic Review 59 251–260.
  • von Stackelberg [1952] von Stackelberg, H. (1952). The theory of the market economy. Oxford University Press.
  • Wardrop [1952] Wardrop, J. G. (1952). Some theoretical aspects of road traffic research. In Proceedings of the Institution of Civil Engineers, vol. 1.
  • Weibull [1997] Weibull, J. W. (1997). Evolutionary game theory. MIT press.
  • Wu et al. [2018] Wu, Y., Ren, M., Liao, R. and Grosse, R. (2018). Understanding short-horizon bias in stochastic meta-optimization. arXiv preprint arXiv:1803.02021 .

Appendix A Notations

We collect the most frequently used notations here.

Game (lower level)
  $\mathcal{N}$: set of all $N$ agents
  $x^{i}\in\mathcal{X}^{i}\subseteq\mathbb{R}^{d^{i}}$: agent $i$'s strategy
  $u^{i}:\mathcal{X}^{i}\to\mathbb{R}$: reward function of agent $i$
  $v^{i}:\mathcal{X}^{i}\to\mathbb{R}^{d^{i}}$: $\nabla_{x^{i}}u^{i}$, unit reward of agent $i$
  $\lambda^{i}\in\mathbb{R}_{+}$: variational stability coefficient of agent $i$
  $u_{\theta}^{i},v_{\theta}^{i}$: incentivized reward and unit reward
Incentive designer (upper level)
  $\theta\in\Theta$: incentive distributed to the agents
  $f:\mathcal{X}\times\Theta\to\mathbb{R}$: objective of the incentive designer
  $x_{*}\in S(\theta)$: equilibrium induced by incentive $\theta$
  $f_{*}:\Theta\to\mathbb{R}$: designer's objective value under $x_{*}\in S(\theta)$
Algorithm ($k$-th iteration)
  $D_{\psi^{i}}$: Bregman divergence in agent $i$'s strategy update
  $\beta_{k}^{i}$: equilibrium learning rate for agent $i$
  $\alpha_{k}$: incentive learning rate for the designer
  $\widehat{v}_{k}^{i}$: estimate of incentivized unit cost $v_{\theta_{k}}^{i}(x_{k}^{i})$
  $\widetilde{\nabla}f$: approximate incentive gradient
  $h_{\theta_{k}}^{i}$: equilibrium learning dynamic
  $\widetilde{x}^{i}_{k+1}$: agent $i$'s strategy mixed with the uniform strategy
Analysis (assumptions)
  $H_{u}$: Lipschitz continuity coefficient of $v_{\theta}^{i}$
  $\rho_{\theta}$: spectral upper bound of $\nabla_{\theta}v_{\theta}$
  $\rho_{x}$: spectral lower bound of $\nabla_{x}v_{\theta}$
  $\mu$: strong convexity coefficient of $f_{*}$
  $M$: $\ell_{2}$ upper bound of $\nabla f_{*}$
  $\widetilde{H}$: Lipschitz continuity coefficient of $\widetilde{\nabla}f$
  $\delta_{u}$: upper bound on the estimation MSE of $\widehat{v}_{k}^{i}$

Appendix B Detailed Discussions on the Games

In this section, we present detailed discussions on the properties of the games.

B.1 Strong Stability

We establish the following sufficient condition for strong stability.

Lemma B.1.

Define the matrix Hλ(x)H^{\lambda}(x) as a block matrix with (i,j)(i,j)-th block taking the form of

Hi,jλ(x)=λixjvi(x).\displaystyle H^{\lambda}_{i,j}(x)=\lambda^{i}\cdot\nabla_{x^{j}}v^{i}(x).

Suppose that $\psi^{i}$ satisfies the smoothness condition (4.2). If $H^{\lambda}(x)+H^{\lambda}(x)^{\top}\prec-2\cdot H_{\psi}\cdot I_{d}$ for some $\lambda\in\mathbb{R}_{+}^{N}$, then the Nash equilibrium $x_{*}$ is $\lambda$-strongly stable with respect to the Bregman divergence $\overline{D}_{\psi}$.

Proof.

We define the λ\lambda-weighted gradient of the game as

vλ(x)=Diag(λ1Id1,,λNIdN)v(x).\displaystyle v^{\lambda}(x)={\rm Diag}(\lambda^{1}\cdot I_{d^{1}},\dots,\lambda^{N}\cdot I_{d^{N}})\cdot v(x).

By definition, we can verify that Hλ(x)H^{\lambda}(x) is the Jacobian matrix of vλ(x)v^{\lambda}(x) with respect to xx. For any x,x𝒳x,x^{\prime}\in\mathcal{X}, let xω=ωx+(1ω)xx_{\omega}=\omega\cdot x+(1-\omega)\cdot x^{\prime}. We have

vλ(x)vλ(x)\displaystyle v^{\lambda}(x)-v^{\lambda}(x^{\prime}) =01dvλ(xω)dωdω\displaystyle=\int_{0}^{1}\frac{{\mathrm{d}}v^{\lambda}(x_{\omega})}{{\mathrm{d}}\omega}\cdot{\mathrm{d}}\omega
=01Hλ(xω)dxωdωdω=01Hλ(xω)(xx)dω.\displaystyle=\int_{0}^{1}H^{\lambda}(x_{\omega})\cdot\frac{{\mathrm{d}}x_{\omega}}{{\mathrm{d}}\omega}\cdot{\mathrm{d}}\omega=\int_{0}^{1}H^{\lambda}(x_{\omega})(x-x^{\prime})\cdot{\mathrm{d}}\omega. (B.1)

Taking additional dot product ,xx\langle\,\cdot\,,x-x^{\prime}\rangle on both sides of (B.1), we have

vλ(x)vλ(x),xx=1/201(xx)(Hλ(xω)+Hλ(xω))(xx)dω01Hψxx22dω=Hψxx22,\begin{split}\bigl{\langle}v^{\lambda}(x)-v^{\lambda}(x^{\prime}),x-x^{\prime}\bigr{\rangle}&=1/2\cdot\int_{0}^{1}(x-x^{\prime})^{\top}\bigl{(}H^{\lambda}(x_{\omega})+H^{\lambda}(x_{\omega})^{\top}\bigr{)}(x-x^{\prime})\cdot{\mathrm{d}}\omega\\ &\leq-\int_{0}^{1}H_{\psi}\cdot\|x-x^{\prime}\|_{2}^{2}\cdot{\mathrm{d}}\omega=-H_{\psi}\cdot\|x-x^{\prime}\|_{2}^{2},\end{split} (B.2)

where the inequality follows from $H^{\lambda}(x_{\omega})+H^{\lambda}(x_{\omega})^{\top}\preceq-2\cdot H_{\psi}\cdot I_{d}$. Setting $x^{\prime}=x_{*}$, we further have

vλ(x),xxvλ(x),xxHψxx22Hψxx22,\bigl{\langle}v^{\lambda}(x),x-x_{*}\bigr{\rangle}\leq\bigl{\langle}v^{\lambda}(x_{*}),x-x_{*}\bigr{\rangle}-H_{\psi}\cdot\|x-x_{*}\|_{2}^{2}\leq-H_{\psi}\cdot\|x-x_{*}\|_{2}^{2}, (B.3)

where the second inequality follows from the fact that xx_{*} is a Nash equilibrium. Eventually, as ψi\psi^{i} satisfies (4.2), we have

D¯ψ(x,x)=i𝒩Dψi(xi,xi)Hψi𝒩xixi22=Hψxx22.\overline{D}_{\psi}(x_{*},x)=\sum_{i\in\mathcal{N}}D_{\psi_{i}}(x_{*}^{i},x^{i})\leq H_{\psi}\cdot\sum_{i\in\mathcal{N}}\|x^{i}-x^{i}_{*}\|_{2}^{2}=H_{\psi}\cdot\|x-x_{*}\|_{2}^{2}. (B.4)

Combining (B.3) and (B.4), we then conclude the proof. ∎
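To make the matrix condition in Lemma B.1 concrete, the following minimal sketch (Python) checks it numerically for a hypothetical two-agent game with affine unit rewards; the payoff matrices, the weights $\lambda$, and the constant $H_{\psi}$ below are illustrative assumptions, not quantities taken from the paper.

import numpy as np

# Hypothetical two-agent game (d^1 = d^2 = 1) with affine unit rewards,
# so nabla_{x^j} v^i is constant in x.
A1 = np.array([[-3.0, 0.5]])   # nabla_x v^1(x), a 1-by-2 row
A2 = np.array([[0.4, -2.5]])   # nabla_x v^2(x), a 1-by-2 row
lam = np.array([1.0, 1.0])     # candidate weights lambda^i
H_psi = 0.5                    # smoothness constant in (4.2) (assumed)

# Assemble H^lambda(x); for affine v it does not depend on x.
H_lam = np.vstack([lam[0] * A1, lam[1] * A2])
shifted = H_lam + H_lam.T + 2.0 * H_psi * np.eye(2)

# Lemma B.1 requires H^lambda + (H^lambda)^T < -2 * H_psi * I, i.e.,
# the shifted matrix must be negative definite.
print("condition holds:", np.max(np.linalg.eigvalsh(shifted)) < 0)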

Lemma B.2.

Define the matrix H~λ(x)\widetilde{H}^{\lambda}(x) as

H~i,jλ(x)=λixjv~i(x),wherev~i(x)=vi(x)+1/λilogxi.\displaystyle\widetilde{H}^{\lambda}_{i,j}(x)=\lambda^{i}\cdot\nabla_{x^{j}}\widetilde{v}^{i}(x),\quad\text{where}~{}\widetilde{v}^{i}(x)={v}^{i}(x)+1/\lambda^{i}\cdot\log x^{i}.

Suppose that ψi\psi^{i} is the negative entropy function. If H~λ(x)+H~λ(x)0\widetilde{H}^{\lambda}(x)+\widetilde{H}^{\lambda}(x)^{\top}\prec 0 for some λ+N\lambda\in{\mathbb{R}}_{+}^{N}, then Nash equilibrium xx_{*} is λ\lambda-strongly stable with respect to the KL divergence.

Proof.

For any x,x𝒳x,x^{\prime}\in\mathcal{X}, let xω=ωx+(1ω)xx_{\omega}=\omega\cdot x+(1-\omega)\cdot x^{\prime}. Similar to (B.2), we first obtain

v~λ(x)v~λ(x),xx=1/201(xx)(H~λ(xω)+H~λ(xω))(xx)dω0,\begin{split}\bigl{\langle}\widetilde{v}^{\lambda}(x)-\widetilde{v}^{\lambda}(x^{\prime}),x-x^{\prime}\bigr{\rangle}&=1/2\cdot\int_{0}^{1}(x-x^{\prime})^{\top}\bigl{(}\widetilde{H}^{\lambda}(x_{\omega})+\widetilde{H}^{\lambda}(x_{\omega})^{\top}\bigr{)}(x-x^{\prime})\cdot{\mathrm{d}}\omega\leq 0,\end{split} (B.5)

Note that

logxilogxi,xixi\displaystyle\langle\log x^{i}-\log x^{i\prime},x^{i}-x^{i\prime}\rangle =logxi/xi,xi+logxi/xi,xi=KL(xi,xi)+KL(xi,xi),\displaystyle=\langle\log x^{i}/x^{i\prime},x^{i}\rangle+\langle\log x^{i\prime}/x^{i},x^{i\prime}\rangle={\rm KL}(x^{i},x^{i\prime})+{\rm KL}(x^{i\prime},x^{i}),

by setting x=xx^{\prime}=x_{*} in (B.5), we further have

vλ(x),xxvλ(x),xxi=1nKL(xi,xi)i=1nKL(xi,xi)i=1nKL(xi,xi),\displaystyle\bigl{\langle}v^{\lambda}(x),x-x_{*}\bigr{\rangle}\leq\bigl{\langle}v^{\lambda}(x_{*}),x-x_{*}\bigr{\rangle}-\sum_{i=1}^{n}{\rm KL}(x^{i}_{*},x^{i})-\sum_{i=1}^{n}{\rm KL}(x^{i},x^{i}_{*})\leq-\sum_{i=1}^{n}{\rm KL}(x^{i}_{*},x^{i}),

where the second inequality follows from the fact that xx_{*} is a Nash equilibrium. ∎

B.2 Sensitivity of Nash Equilibrium

Unconstrained Game. We now provide the proof of Lemma 4.2.

Proof of Lemma 4.2.

Since $\mathcal{X}^{i}=\mathbb{R}^{d^{i}}$ for all $i\in\mathcal{N}$, by the definition of $x_{*}(\theta)$, we have $v_{\theta}(x_{*}(\theta))=0$ for all $\theta\in\mathbb{R}^{d}$. Then, differentiating this equality with respect to $\theta$ on both sides, for any $i\in\mathcal{N}$, we have

x,xi2uθi(x(θ))θx(θ)+θ,xi2uθi(x(θ))=0.\displaystyle\nabla^{2}_{x,x^{i}}u_{\theta}^{i}\bigl{(}x_{*}({\theta})\bigr{)}\nabla_{\theta}x_{*}({\theta})+\nabla^{2}_{\theta,x^{i}}u^{i}_{\theta}\bigl{(}x_{*}(\theta)\bigr{)}=0.

Thus, we have

θx(θ)=[xvθ(x)]1θvθ(x)|x=x(θ).\displaystyle\nabla_{\theta}x_{*}({\theta})=-\bigl{[}\nabla_{x}v_{\theta}(x)\bigr{]}^{-1}\nabla_{\theta}v_{\theta}(x)\big{|}_{x=x_{*}(\theta)}.
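The formula above can be sanity-checked numerically. The sketch below (Python) uses a hypothetical unconstrained game whose incentivized pseudo-gradient is affine, $v_{\theta}(x)=Ax+B\theta$, and compares the implicit-gradient expression with a finite-difference estimate of $\nabla_{\theta}x_{*}(\theta)$; the matrices are placeholders.

import numpy as np

rng = np.random.default_rng(0)
d, p = 3, 2
A = -(4.0 * np.eye(d) + 0.3 * rng.standard_normal((d, d)))  # nabla_x v_theta (well conditioned)
B = rng.standard_normal((d, p))                              # nabla_theta v_theta

def x_star(theta):
    # Unconstrained equilibrium: v_theta(x) = A x + B theta = 0.
    return np.linalg.solve(A, -B @ theta)

theta = rng.standard_normal(p)
J_implicit = -np.linalg.solve(A, B)   # the formula from Lemma 4.2
eps = 1e-6
J_fd = np.column_stack([
    (x_star(theta + eps * e) - x_star(theta - eps * e)) / (2 * eps) for e in np.eye(p)
])
print(np.max(np.abs(J_implicit - J_fd)))  # agreement up to floating-point error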

As a consequence of Lemma 4.2, we have the following lemma addressing the sensitivity of Nash equilibrium with respect to the incentive parameter θ\theta.

Lemma B.3.

Under Assumption 5.1, we have for {x~k}k0\{\widetilde{x}_{k}\}_{k\geq 0} generated by Algorithm 1,

x(θk)x(θk1)2Hθkθk12,\displaystyle\bigl{\|}x_{*}(\theta_{k})-x_{*}(\theta_{k-1})\bigr{\|}_{2}\leq{H}_{*}\cdot\|\theta_{k}-\theta_{k-1}\|_{2},

where H=ρθ/ρx{H}_{*}=\rho_{\theta}/\rho_{x}.

Proof.

We have for some θ¯=ωθk+(1ω)θk1\overline{\theta}=\omega\cdot\theta_{k}+(1-\omega)\cdot\theta_{k-1} with ω[0,1]\omega\in[0,1],

x(θk)x(θk1)2\displaystyle\bigl{\|}x_{*}(\theta_{k})-x_{*}(\theta_{k-1})\bigr{\|}_{2} =θx(θ¯)2θkθk12\displaystyle=\bigl{\|}\nabla_{\theta}x_{*}(\overline{\theta})\bigr{\|}_{2}\cdot\|\theta_{k}-\theta_{k-1}\|_{2}
=[xvθ(x(θ¯))]1θvθ(x(θ¯))2θkθk12\displaystyle=\biggl{\|}\Bigl{[}\nabla_{x}v_{\theta}\bigl{(}x_{*}(\overline{\theta})\bigr{)}\Bigr{]}^{-1}\nabla_{\theta}v_{\theta}\bigl{(}x_{*}(\overline{\theta})\bigr{)}\biggr{\|}_{2}\cdot\|\theta_{k}-\theta_{k-1}\|_{2}
[xvθ(x(θ¯))]12θvθ(x(θ¯))2θkθk12\displaystyle\leq\biggl{\|}\Bigl{[}\nabla_{x}v_{\theta}\bigl{(}x_{*}(\overline{\theta})\bigr{)}\Bigr{]}^{-1}\biggr{\|}_{2}\cdot\Bigl{\|}\nabla_{\theta}v_{\theta}\bigl{(}x_{*}(\overline{\theta})\bigr{)}\Bigr{\|}_{2}\cdot\|\theta_{k}-\theta_{k-1}\|_{2}
\leq\rho_{\theta}/\rho_{x}\cdot\|\theta_{k}-\theta_{k-1}\|_{2},

where the last inequality follows from Assumption 5.1. ∎

Simplex-Constrained Game. We first provide the following lemma.

Lemma B.4 (Theorem 1 in [41]).

When 𝒳i=Δ([di]){\mathcal{X}}^{i}=\Delta([d^{i}]) and xvθ(x(θ))\nabla_{x}v_{\theta}(x_{*}(\theta)) is non-singular, the Jacobian of x(θ)x_{*}(\theta) takes the form

θx(θ)=Jθθvθ(x)|x=x(θ),\displaystyle\nabla_{\theta}x_{*}(\theta)=-J_{\theta}\cdot\nabla_{\theta}v_{\theta}(x)\big{|}_{x=x_{*}(\theta)},

where

Jθ=LLA[ALA]1AL,L=[xvθ(x(θ))]1.\displaystyle J_{\theta}=L-LA^{\top}[ALA^{\top}]^{-1}AL,\quad L=\Bigl{[}\nabla_{x}v_{\theta}\bigl{(}x_{*}(\theta)\bigr{)}\Bigr{]}^{-1}.

Here A=[blkdiag(B1,,Bn);blkdiag(𝟏d1,,𝟏dn)]A=[\mathrm{blkdiag}(B^{1},\ldots,B^{n});\mathrm{blkdiag}({\bf 1}_{d^{1}},\ldots,{\bf 1}_{d^{n}})], where BiB^{i} is obtained by deleting the rows jj in the identity matrix IdiI_{d^{i}} with [xi(θ)]j=0[x_{*}^{i}(\theta)]_{j}=0.
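For reference, the expression in Lemma B.4 can be assembled directly from $\nabla_{x}v_{\theta}$, $\nabla_{\theta}v_{\theta}$, and the equilibrium supports. The sketch below (Python) follows the statement literally, including the construction of $A$; all inputs are hypothetical placeholders supplied by the caller, and the helper name is ours.

import numpy as np
from scipy.linalg import block_diag

def equilibrium_jacobian(dx_v, dth_v, x_star, dims):
    """Evaluate nabla_theta x_*(theta) via Lemma B.4.

    dx_v:   nabla_x v_theta(x_*(theta))      (d x d, assumed non-singular)
    dth_v:  nabla_theta v_theta(x_*(theta))  (d x p)
    x_star: equilibrium, concatenated over agents
    dims:   list of per-agent dimensions d^i
    """
    L = np.linalg.inv(dx_v)
    blocks_B, blocks_one, offset = [], [], 0
    for d_i in dims:
        xi = x_star[offset:offset + d_i]
        # B^i: identity of size d^i with the rows j such that [x_*^i]_j = 0 deleted,
        # exactly as stated in the lemma.
        blocks_B.append(np.eye(d_i)[xi > 0, :])
        blocks_one.append(np.ones((1, d_i)))
        offset += d_i
    A = np.vstack([block_diag(*blocks_B), block_diag(*blocks_one)])
    J_theta = L - L @ A.T @ np.linalg.inv(A @ L @ A.T) @ A @ L
    return -J_theta @ dth_v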

As a consequence of Lemma B.4, we have the following lemma as the simplex constrained version of Lemma B.3.

Lemma B.5.

Under Assumption 5.1, we have

x(θk)x(θk1)1H~θkθk12,\displaystyle\bigl{\|}x_{*}(\theta_{k})-x_{*}(\theta_{k-1})\bigr{\|}_{1}\leq\widetilde{H}_{*}\cdot\|\theta_{k}-\theta_{k-1}\|_{2},

where H~=(1+d)ρθ/ρx\widetilde{H}_{*}=(1+d)\rho_{\theta}/\rho_{x}.

Proof.

Recall that JθJ_{\theta} is defined in Lemma B.4 and that 2\|\cdot\|_{2} is the spectral norm when operating on a matrix. We have

Jθ2\displaystyle\|J_{\theta}\|_{2} L2+LA[ALA]1AL2\displaystyle\leq\|L\|_{2}+\bigl{\|}LA^{\top}[ALA^{\top}]^{-1}AL\bigr{\|}_{2}
L2+L2tr(A[ALA]1AL)\displaystyle\leq\|L\|_{2}+\|L\|_{2}\cdot{{\rm tr}\bigl{(}A^{\top}[ALA^{\top}]^{-1}AL\bigr{)}}
L2+L2tr([ALA]1ALA)\displaystyle\leq\|L\|_{2}+{\|L\|_{2}\cdot{\rm tr}\bigl{(}[ALA^{\top}]^{-1}ALA^{\top}\bigr{)}}
=(1+d)L2(1+d)/ρx.\displaystyle=(1+d)\cdot\|L\|_{2}\leq(1+d)/\rho_{x}.

Thus, by Lemma B.4, we have for some θ¯=ωθk+(1ω)θk1\overline{\theta}=\omega\cdot\theta_{k}+(1-\omega)\cdot\theta_{k-1} with ω[0,1]\omega\in[0,1],

x(θk)x(θk1)\displaystyle\|x_{*}(\theta_{k})-x_{*}(\theta_{k-1})\| θx(θ¯)2θkθk12\displaystyle\leq\bigl{\|}\nabla_{\theta}x_{*}(\overline{\theta})\bigr{\|}_{2}\cdot\|\theta_{k}-\theta_{k-1}\|_{2}
Jθ2θvθ(x)2θkθk12\displaystyle\leq\|J_{\theta}\|_{2}\cdot\bigl{\|}\nabla_{\theta}v_{\theta}(x)\bigr{\|}_{2}\cdot\|\theta_{k}-\theta_{k-1}\|_{2}
(1+d)ρθ/ρxθkθk12,\displaystyle\leq(1+d)\rho_{\theta}/\rho_{x}\cdot\|\theta_{k}-\theta_{k-1}\|_{2},

where the last inequality follows from Assumption 5.1. ∎

Appendix C Proof of Theorem 5.4

We first present the following two lemmas under the conditions presented in Theorem 5.4.

Lemma C.1.

For all k0k\geq 0, we have

ϵk+1x\displaystyle\epsilon_{k+1}^{x} ϵ0xj=0k(1βj/8)+[M2/4H~2+δu2λ22]l=0k[βl2j=l+1k(1βj/8)].\displaystyle\leq\epsilon_{0}^{x}\cdot\prod^{k}_{j=0}(1-\beta_{j}/8)+\bigl{[}M^{2}/4\widetilde{H}^{2}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\bigr{]}\cdot\sum_{l=0}^{k}\biggl{[}\beta^{2}_{l}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr{]}.
Proof.

See Appendix C.1 for detailed proof. ∎

Lemma C.2.

For all k0k\geq 0, we have

ϵk+1θ\displaystyle\epsilon_{k+1}^{\theta} j=0k(1μαj)ϵ0θ+2M2l=0k[αl2j=l+1k(1μαj)]\displaystyle\leq\prod^{k}_{j=0}(1-\mu\alpha_{j})\cdot\epsilon_{0}^{\theta}+2M^{2}\cdot\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}
+H~2l=0k[(2αl2+αl/μ)ϵl+1xj=l+1k(1μαj)].\displaystyle\qquad+\widetilde{H}^{2}\cdot\sum_{l=0}^{k}\bigg{[}(2\alpha_{l}^{2}+\alpha_{l}/\mu)\cdot\epsilon_{l+1}^{x}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}.
Proof.

See Appendix C.2 for detailed proof. ∎

Now we are ready to present the proof of Theorem 5.4.

Proof of Theorem 5.4.

For $\beta_{k}=\beta/(k+1)^{2/3}$, by [21, Lemma C.4], we have

j=1k(1βj/8)l=0k[βl2j=l+1k(1βj/8)]8βk.\displaystyle\prod^{k}_{j=1}(1-\beta_{j}/8)\leq\sum_{l=0}^{k}\biggl{[}\beta^{2}_{l}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr{]}\leq 8\beta_{k}.

Thus, by Lemma C.1, we have

\epsilon_{k+1}^{x}\leq\epsilon_{0}^{x}\cdot\prod^{k}_{j=0}(1-\beta_{j}/8)+\bigl[M^{2}/4\widetilde{H}^{2}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\bigr]\cdot\sum_{l=0}^{k}\biggl[\beta^{2}_{l}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr]\leq\bigl[8\epsilon_{0}^{x}+2M^{2}/\widetilde{H}^{2}+8\delta_{u}^{2}\|\lambda\|_{2}^{2}\bigr]\cdot\beta_{k}=O(k^{-2/3}). (C.1)

Similarly, we have for αk=α/(k+1)\alpha_{k}=\alpha/(k+1),

j=0k(1μαj)l=0k[αl2j=l+1k(1μαj)]l=0k[αl5/3j=l+1k(1μαj)]2/μαk2/3.\displaystyle\prod^{k}_{j=0}(1-\mu\alpha_{j})\leq\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}\leq\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{5/3}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}\leq 2/\mu\cdot\alpha_{k}^{2/3}.

Thus, combining (C.1) with Lemma C.2, we obtain

ϵk+1θ\displaystyle\epsilon_{k+1}^{\theta} j=0k(1μαj)ϵ0θ+2M2l=0k[αl2j=l+1k(1μαj)]\displaystyle\leq\prod^{k}_{j=0}(1-\mu\alpha_{j})\cdot\epsilon_{0}^{\theta}+2M^{2}\cdot\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]} (C.2)
+H~2l=0k[(2αl2+αl/μ)ϵl+1xj=l+1k(1μαj)]\displaystyle\qquad+\widetilde{H}^{2}\cdot\sum_{l=0}^{k}\bigg{[}(2\alpha_{l}^{2}+\alpha_{l}/\mu)\cdot\epsilon_{l+1}^{x}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}
2ϵ0θ/μαk2/3+(2M2+2H~2)l=0k[αl2j=l+1k(1μαj)]\displaystyle\leq 2\epsilon_{0}^{\theta}/\mu\cdot\alpha_{k}^{2/3}+(2M^{2}+2\widetilde{H}^{2})\cdot\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}
+[8ϵ0x+2M2/H~2+8δu2λ22]H~2/μl=0k[αlβl+1j=l+1k(1μαj)]\displaystyle\qquad+\bigl{[}8\epsilon_{0}^{x}+2M^{2}/\widetilde{H}^{2}+8\delta_{u}^{2}\|\lambda\|_{2}^{2}\bigr{]}\cdot\widetilde{H}^{2}/\mu\cdot\sum_{l=0}^{k}\bigg{[}\alpha_{l}\beta_{l+1}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}
2ϵ0θ/μαk2/3+4(M2+H~2)/μαk2/3\displaystyle\leq 2\epsilon_{0}^{\theta}/\mu\cdot\alpha_{k}^{2/3}+4\cdot(M^{2}+\widetilde{H}^{2})/\mu\cdot\alpha_{k}^{2/3}
+[8ϵ0x+2M2/H~2+8δu2λ22]2βH~2/α2/3μ2αk2/3=O(k2/3).\displaystyle\qquad+\bigl{[}8\epsilon_{0}^{x}+2M^{2}/\widetilde{H}^{2}+8\delta_{u}^{2}\|\lambda\|_{2}^{2}\bigr{]}\cdot 2\beta\widetilde{H}^{2}/\alpha^{2/3}\mu^{2}\cdot\alpha_{k}^{2/3}=O(k^{-2/3}).

Here the third inequality follows from $\beta_{k}=\beta/\alpha^{2/3}\cdot\alpha_{k}^{2/3}$. Therefore, (C.1) and (C.2) conclude the proof of Theorem 5.4. ∎
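The two recursions behind this proof, (C.11) and (C.13), can also be unrolled numerically to visualize the $O(k^{-2/3})$ decay. The sketch below (Python) does so with the step sizes $\alpha_{k}=\alpha/(k+1)$ and $\beta_{k}=\beta/(k+1)^{2/3}$; all constants are illustrative placeholders rather than the problem-dependent quantities of Theorem 5.4.

import numpy as np

# Illustrative constants (placeholders for M, H-tilde, delta_u, ||lambda||_2^2, mu).
M, H_t, delta_u, lam_sq, mu = 1.0, 1.0, 0.5, 1.0, 0.5
alpha, beta = 0.5, 0.5

eps_x, eps_th = 1.0, 1.0
K = 100000
for k in range(K):
    a_k = alpha / (k + 1)
    b_k = beta / (k + 1) ** (2 / 3)
    # Recursion (C.11) for the equilibrium error ...
    eps_x = (1 - b_k / 8) * eps_x + (M**2 / (4 * H_t**2) + delta_u**2 * lam_sq) * b_k**2
    # ... and recursion (C.13) for the incentive error, which uses the updated eps_x.
    eps_th = (1 - mu * a_k) * eps_th + 2 * M**2 * a_k**2 + H_t**2 * (2 * a_k**2 + a_k / mu) * eps_x

# Rescaling by K^{2/3} shows that both errors decay at roughly the claimed rate.
print(eps_x * K ** (2 / 3), eps_th * K ** (2 / 3))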

C.1 Proof of Lemma C.1

Proof.

Recall that

xk+1i\displaystyle x^{i}_{k+1} =argmaxxi𝒳i{v^ki,xi1/βkiDψi(xki,xi)}.\displaystyle=\mathop{\mathrm{argmax}}_{x^{i}\in{\mathcal{X}}^{i}}\bigl{\{}\langle\widehat{v}^{i}_{k},x^{i}\rangle-1/\beta^{i}_{k}\cdot D_{\psi^{i}}(x^{i}_{k},x^{i})\bigr{\}}.

We have

Dψi(xi(θk),xk+1i)\displaystyle D_{\psi^{i}}\bigl{(}x^{i}_{*}(\theta_{k}),x^{i}_{k+1}\bigr{)} Dψi(xi(θk),xki)βkiv^ki,xi(θk)xk+1iDψi(xki,xk+1i)\displaystyle\leq D_{\psi^{i}}\bigl{(}x^{i}_{*}(\theta_{k}),x^{i}_{k}\bigr{)}-\beta^{i}_{k}\cdot\bigl{\langle}\widehat{v}^{i}_{k},x^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr{\rangle}-D_{\psi^{i}}\bigl{(}x^{i}_{k},x^{i}_{k+1}\bigr{)}
Dψi(xi(θk),xki)βkiv^ki,xi(θk)xki\displaystyle\leq D_{\psi^{i}}\bigl{(}x^{i}_{*}(\theta_{k}),x^{i}_{k}\bigr{)}-\beta^{i}_{k}\cdot\bigl{\langle}\widehat{v}^{i}_{k},x^{i}_{*}(\theta_{k})-x^{i}_{k}\bigr{\rangle}
βkiv^ki,xkixk+1i1/2xkixk+1i2,\displaystyle\qquad-\beta^{i}_{k}\cdot\bigl{\langle}\widehat{v}^{i}_{k},x^{i}_{k}-x^{i}_{k+1}\bigr{\rangle}-1/2\cdot\|x^{i}_{k}-x^{i}_{k+1}\|^{2},

where the second inequality follows from the fact that Dψi(xi,xi)1/2xixi2D_{\psi^{i}}(x^{i},x^{i\prime})\geq 1/2\cdot\|x^{i}-x^{i\prime}\|^{2}. Taking the conditional expectation given kx\mathcal{F}^{x}_{k}, we obtain

𝔼[Dψi(xi(θk),xk+1i)|kx]\displaystyle\mathbb{E}\Bigl{[}D_{\psi^{i}}\bigl{(}x^{i}_{*}(\theta_{k}),x^{i}_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]} Dψi(xi(θk),xki)βki𝔼[v^ki,xi(θk)xki|kx]\displaystyle\leq D_{\psi^{i}}\bigl{(}x^{i}_{*}(\theta_{k}),x^{i}_{k}\bigr{)}-\beta^{i}_{k}\cdot\mathbb{E}\Bigl{[}\bigl{\langle}\widehat{v}^{i}_{k},x^{i}_{*}(\theta_{k})-x^{i}_{k}\bigr{\rangle}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]} (C.3)
+𝔼[βkiv^ki,xkixk+1i1/2xkixk+1i2|kx]\displaystyle\qquad+\mathbb{E}\Bigl{[}-\beta^{i}_{k}\cdot\bigl{\langle}\widehat{v}^{i}_{k},x^{i}_{k}-x^{i}_{k+1}\bigr{\rangle}-1/2\cdot\bigl{\|}x^{i}_{k}-x^{i}_{k+1}\bigr{\|}^{2}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]}
Dψi(xi(θk),xki)βkivθki(xk),xi(θk)xki+(βki)2/2𝔼[v^ki2|kx],\displaystyle\leq D_{\psi^{i}}\bigl{(}x^{i}_{*}(\theta_{k}),x^{i}_{k}\bigr{)}-\beta^{i}_{k}\cdot\bigl{\langle}v^{i}_{\theta_{k}}(x_{k}),x^{i}_{*}(\theta_{k})-x^{i}_{k}\bigr{\rangle}+(\beta^{i}_{k})^{2}/2\cdot\mathbb{E}\bigl{[}\|\widehat{v}^{i}_{k}\|_{*}^{2}\,\big{|}\,\mathcal{F}^{x}_{k}\bigr{]},

where the second inequality follows from Assumption 5.3. Summing up (C.3) for all i𝒩i\in{\mathcal{N}}, we have

𝔼[D¯ψ(x(θk),xk+1)|kx]\displaystyle\mathbb{E}\Bigl{[}\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]} D¯ψ(x(θk),xk)βki𝒩λivθki(xk),xi(θk)xki\displaystyle\leq\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k}\bigr{)}-\beta_{k}\cdot\sum_{i\in{\mathcal{N}}}\lambda^{i}\cdot\bigl{\langle}v^{i}_{\theta_{k}}(x_{k}),x^{i}_{*}(\theta_{k})-x^{i}_{k}\bigr{\rangle}
+βk2/2i𝒩(λi)2𝔼[v^ki2|kx].\displaystyle\qquad+\beta_{k}^{2}/2\cdot\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}\cdot\mathbb{E}\bigl{[}\|\widehat{v}^{i}_{k}\|_{*}^{2}\,\big{|}\,\mathcal{F}^{x}_{k}\bigr{]}. (C.4)

By the $\lambda$-strong variational stability of $x_{*}(\theta_{k})$, we have

i𝒩λivθki(xk),xi(θk)xkiD¯ψ(x(θk),xk).\displaystyle-\sum_{i\in{\mathcal{N}}}\lambda^{i}\cdot\bigl{\langle}v^{i}_{\theta_{k}}(x_{k}),x^{i}_{*}(\theta_{k})-x^{i}_{k}\bigr{\rangle}\leq-\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k}\bigr{)}. (C.5)

Moreover, by Assumption 5.3 we have

𝔼[v^ki2|kx]\displaystyle\mathbb{E}\bigl{[}\|\widehat{v}^{i}_{k}\|_{*}^{2}\,\big{|}\,\mathcal{F}^{x}_{k}\bigr{]} 2𝔼[v^kivθki(xk)2+vθki(xk)2|kx]2δu2+2vθki(xk)2,\displaystyle\leq 2\cdot\mathbb{E}\Bigl{[}\bigl{\|}\widehat{v}^{i}_{k}-v^{i}_{\theta_{k}}(x_{k})\bigr{\|}_{*}^{2}+\bigl{\|}v^{i}_{\theta_{k}}(x_{k})\bigr{\|}_{*}^{2}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]}\leq 2\delta_{u}^{2}+2\bigl{\|}v^{i}_{\theta_{k}}(x_{k})\bigr{\|}_{*}^{2},

Summing up over $i\in\mathcal{N}$ with weights $(\lambda^{i})^{2}$ gives

i𝒩(λi)2𝔼[v^ki2|kx]\displaystyle\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}\cdot\mathbb{E}\bigl{[}\|\widehat{v}^{i}_{k}\|_{*}^{2}\,\big{|}\,\mathcal{F}^{x}_{k}\bigr{]} 2δu2i𝒩(λi)2+2i𝒩(λi)2vθki(xk)vθki(x(θk))2\displaystyle\leq 2\delta_{u}^{2}\cdot\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}+2\cdot\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}\cdot\Bigl{\|}v^{i}_{\theta_{k}}(x_{k})-v^{i}_{\theta_{k}}\bigl{(}x_{*}(\theta_{k})\bigr{)}\Bigr{\|}_{*}^{2}
2δu2i𝒩(λi)2+2NHu2i𝒩(λi)2D¯ψ(x(θk),xk)\displaystyle\leq 2\delta_{u}^{2}\cdot\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}+2NH_{u}^{2}\cdot\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k}\bigr{)}
=2δu2λ22+2NHu2λ22D¯ψ(x(θk),xk),\displaystyle=2\delta_{u}^{2}\|\lambda\|^{2}_{2}+2NH_{u}^{2}\|\lambda\|_{2}^{2}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k}\bigr{)}, (C.6)

where the first inequality follows from the optimality condition $v_{\theta_{k}}(x_{*}(\theta_{k}))=0$, and the second inequality follows from Assumption 5.1. Thus, taking (C.5) and (C.6) into (C.4), we obtain

𝔼[D¯ψ(x(θk),xk+1)|kx]\displaystyle\mathbb{E}\Bigl{[}\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]} (C.7)
(1βk)D¯ψ(x(θk),xk)+δu2λ22βk2+2NHu2λ22βk2D¯ψ(x(θk),xk)2\displaystyle\quad\leq(1-\beta_{k})\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k}\bigr{)}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\beta_{k}^{2}+2NH_{u}^{2}\|\lambda\|_{2}^{2}\beta_{k}^{2}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k}\bigr{)}^{2}
(1βk+NHu2λ22βk2)D¯ψ(x(θk),xk)+δu2λ22βk2\displaystyle\quad\leq\bigl{(}1-\beta_{k}+NH_{u}^{2}\|\lambda\|_{2}^{2}\beta^{2}_{k}\bigr{)}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k}\bigr{)}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\beta^{2}_{k}
(1βk/2)D¯ψ(x(θk),xk)+δu2λ22βk2,\displaystyle\quad\leq(1-\beta_{k}/2)\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k}\bigr{)}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\beta_{k}^{2},

where the last inequality holds when $NH_{u}^{2}\|\lambda\|_{2}^{2}\beta^{2}_{k}\leq\beta_{k}/2$, which is satisfied by $\beta\leq 1/(2NH_{u}^{2}\|\lambda\|_{2}^{2})$. By Lemma E.1, we have for any $\gamma>\max\{1,H_{\psi}^{2}\}$,

D¯ψ(x(θk),xk)(1+1γ)D¯ψ(x(θk1),xk)\displaystyle\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k}\bigr{)}-\biggl{(}1+\frac{1}{\gamma}\biggr{)}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k-1}),x_{k}\bigr{)} (C.8)
Hψ2(1+γ)2(1+γ)2γi𝒩xi(θk)xi(θk1)2\displaystyle\quad\leq\frac{H_{\psi}^{2}\cdot(1+\gamma)^{2}-(1+\gamma)}{2\gamma}\cdot\sum_{i\in{\mathcal{N}}}\bigl{\|}x^{i}_{*}(\theta_{k})-x^{i}_{*}(\theta_{k-1})\bigr{\|}^{2}
Hψ2(1+γ)2(1+γ)2γx(θk)x(θk1)2\displaystyle\quad\leq\frac{H_{\psi}^{2}\cdot(1+\gamma)^{2}-(1+\gamma)}{2\gamma}\cdot\bigl{\|}x_{*}(\theta_{k})-x_{*}(\theta_{k-1})\bigr{\|}^{2}
Hψ2(1+γ)2(1+γ)2γH2θkθk122Hψ2(1+γ)2(1+γ)2γH2αk12~fk122,\displaystyle\quad\leq\frac{H_{\psi}^{2}\cdot(1+\gamma)^{2}-(1+\gamma)}{2\gamma}\cdot H_{*}^{2}\cdot\|\theta_{k}-\theta_{k-1}\|^{2}_{2}\leq\frac{H_{\psi}^{2}\cdot(1+\gamma)^{2}-(1+\gamma)}{2\gamma}\cdot H_{*}^{2}\alpha^{2}_{k-1}\cdot\|\widetilde{\nabla}f_{k-1}\|^{2}_{2},

where the third inequality follows from Lemma B.3, and the last inequality follows from the fact that the proximal mapping is non-expansive. Taking (C.8) into (C.7) and choosing $\gamma=(4-2\beta_{k})/\beta_{k}$, we obtain

𝔼[D¯ψ(x(θk),xk+1)|kx]\displaystyle\mathbb{E}\Bigl{[}\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]} (1βk/2)(1+1γ)D¯ψ(x(θk1),xk)+δu2λ22βk2\displaystyle\leq(1-\beta_{k}/2)\cdot\biggl{(}1+\frac{1}{\gamma}\biggr{)}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k-1}),x_{k}\bigr{)}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\beta_{k}^{2} (C.9)
+(1βk/2)Hψ2(1+γ)2(1+γ)2γH2αk12𝔼[~fk122|kx],\displaystyle\qquad+(1-\beta_{k}/2)\cdot\frac{H_{\psi}^{2}\cdot(1+\gamma)^{2}-(1+\gamma)}{2\gamma}\cdot H_{*}^{2}\alpha^{2}_{k-1}\cdot\mathbb{E}\bigl{[}\|\widetilde{\nabla}f_{k-1}\|^{2}_{2}\,\big{|}\,\mathcal{F}^{x}_{k}\bigr{]},
(1βk/4)D¯ψ(x(θk1),xk)+δu2λ22βk2\displaystyle\leq(1-\beta_{k}/4)\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k-1}),x_{k}\bigr{)}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\beta_{k}^{2}
+(1βk/4)Hψ2(1+γ)12H2αk12𝔼[~fk122|kx].\displaystyle\qquad+(1-\beta_{k}/4)\cdot\frac{H_{\psi}^{2}\cdot(1+\gamma)-1}{2}\cdot H_{*}^{2}\alpha^{2}_{k-1}\cdot\mathbb{E}\bigl{[}\|\widetilde{\nabla}f_{k-1}\|^{2}_{2}\,\big{|}\,\mathcal{F}^{x}_{k}\bigr{]}.

By Assumptions 5.2 and 5.3, we have

𝔼[~fk122|k1θ]\displaystyle\mathbb{E}\bigl{[}\|\widetilde{\nabla}f_{k-1}\|^{2}_{2}\,\big{|}\,\mathcal{F}_{k-1}^{\theta}\bigr{]} (C.10)
2𝔼[~f(θk1,xk)f(θk1)22+f(θk1)22|k1θ]\displaystyle\quad\leq 2\mathbb{E}\Bigl{[}\bigl{\|}\widetilde{\nabla}f(\theta_{k-1},x_{k})-\nabla f_{*}(\theta_{k-1})\bigr{\|}_{2}^{2}+\bigl{\|}\nabla f_{*}(\theta_{k-1})\bigr{\|}_{2}^{2}\,\Big{|}\,\mathcal{F}_{k-1}^{\theta}\Bigr{]}
2H~2D¯ψ(x(θk1),xk)+2M2,\displaystyle\quad\leq 2\widetilde{H}^{2}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k-1}),x_{k}\bigr{)}+2M^{2},

taking which into (C.9) and taking expectation on both sides, we further obtain

ϵk+1x\displaystyle\epsilon^{x}_{k+1} {1βk/4+[Hψ2(1+γ)1]H~2H2αk12}ϵkx+δu2λ22βk2\displaystyle\leq\Bigl{\{}1-\beta_{k}/4+\bigl{[}H_{\psi}^{2}\cdot(1+\gamma)-1\bigr{]}\cdot\widetilde{H}^{2}H_{*}^{2}\alpha^{2}_{k-1}\Bigr{\}}\cdot\epsilon_{k}^{x}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\beta_{k}^{2}
+(1βk/4)Hψ2(1+γ)12H2αk122M2.\displaystyle\qquad+(1-\beta_{k}/4)\cdot\frac{H_{\psi}^{2}\cdot(1+\gamma)-1}{2}\cdot H_{*}^{2}\alpha^{2}_{k-1}\cdot 2M^{2}.

By γ=(42βk)/βk\gamma=(4-2\beta_{k})/\beta_{k}, we have

[Hψ2(1+γ)1]H~2H2αk12\displaystyle\bigl{[}H_{\psi}^{2}\cdot(1+\gamma)-1\bigr{]}\cdot\widetilde{H}^{2}H_{*}^{2}\alpha^{2}_{k-1} =[Hψ2(4/βk1)1]H~2H2αk12\displaystyle=\bigl{[}H_{\psi}^{2}\cdot(4/\beta_{k}-1)-1\bigr{]}\cdot\widetilde{H}^{2}H_{*}^{2}\alpha^{2}_{k-1}
16Hψ2H~2H2αk2/βkβk2/8βk/8,\displaystyle\leq 16H_{\psi}^{2}\widetilde{H}^{2}H_{*}^{2}\alpha^{2}_{k}/\beta_{k}\leq\beta^{2}_{k}/8\leq\beta_{k}/8,

where the first inequality follows from αk12αk\alpha_{k-1}\leq 2\alpha_{k}, and the second inequality follows from αk=α/(k+1)\alpha_{k}=\alpha/(k+1), βk=β/(k+1)2/3\beta_{k}=\beta/(k+1)^{2/3}, and α/β3/21/12HψH~H\alpha/\beta^{3/2}\leq 1/12H_{\psi}\widetilde{H}H_{*}. Thus, we obtain

ϵk+1x(1βk/8)ϵkx+[M2/4H~2+δu2λ22]βk2.\displaystyle\epsilon^{x}_{k+1}\leq(1-\beta_{k}/8)\cdot\epsilon_{k}^{x}+\bigl{[}M^{2}/4\widetilde{H}^{2}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\bigr{]}\cdot\beta_{k}^{2}. (C.11)

Recursively applying (C.11), we obtain

\epsilon_{k+1}^{x}\leq\epsilon_{0}^{x}\cdot\prod^{k}_{j=0}(1-\beta_{j}/8)+\bigl[M^{2}/4\widetilde{H}^{2}+\delta_{u}^{2}\|\lambda\|_{2}^{2}\bigr]\cdot\sum_{l=0}^{k}\biggl[\beta^{2}_{l}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr].

Thus, we conclude the proof of Lemma C.1. ∎

C.2 Proof of Lemma C.2

Proof.

Since the projection onto $\Theta$ is non-expansive, we have

\|\theta_{k+1}-\theta_{*}\|^{2}\leq\|\theta_{k}+\alpha_{k}\cdot\widetilde{\nabla}f_{k}-\theta_{*}\|_{2}^{2}=\|\theta_{k}-\theta_{*}\|_{2}^{2}+\alpha_{k}^{2}\cdot\|\widetilde{\nabla}f_{k}\|_{2}^{2}+2\alpha_{k}\cdot\langle\widetilde{\nabla}f_{k},\theta_{k}-\theta_{*}\rangle.

Taking the conditional expectation given kθ\mathcal{F}_{k}^{\theta}, we obtain

𝔼[θk+1θ2|kθ]\displaystyle\mathbb{E}\bigl{[}\|\theta_{k+1}-\theta_{*}\|^{2}\,\big{|}\,\mathcal{F}_{k}^{\theta}] θkθ22+2αkθf(θk),θkθ+αk2𝔼[~fk22|kθ]\displaystyle\leq\|\theta_{k}-\theta_{*}\|_{2}^{2}+2\alpha_{k}\cdot\bigl{\langle}\nabla_{\theta}f_{*}(\theta_{k}),\theta_{k}-\theta_{*}\bigr{\rangle}+\alpha_{k}^{2}\cdot\mathbb{E}\bigl{[}\|\widetilde{\nabla}f_{k}\|_{2}^{2}\,\big{|}\,\mathcal{F}^{\theta}_{k}\bigr{]}
+2αk𝔼[~f(θk,xk+1)θf(θk),θkθ|kθ]\displaystyle\qquad+2\alpha_{k}\cdot\mathbb{E}\Bigl{[}\bigl{\langle}\widetilde{\nabla}f(\theta_{k},x_{k+1})-\nabla_{\theta}f_{*}(\theta_{k}),\theta_{k}-\theta_{*}\bigr{\rangle}\,\Big{|}\,\mathcal{F}^{\theta}_{k}\Bigr{]}
(12μαk)θkθ22+αk2𝔼[~fk22|kθ]\displaystyle\leq(1-2\mu\alpha_{k})\cdot\|\theta_{k}-\theta_{*}\|_{2}^{2}+\alpha_{k}^{2}\cdot\mathbb{E}\bigl{[}\|\widetilde{\nabla}f_{k}\|_{2}^{2}\,\big{|}\,\mathcal{F}^{\theta}_{k}\bigr{]}
+αk/μ𝔼[~f(θk,xk+1)θf(θk)2|kθ]+μαkθkθ22\displaystyle\qquad+\alpha_{k}/\mu\cdot\mathbb{E}\Bigl{[}\bigl{\|}\widetilde{\nabla}f(\theta_{k},x_{k+1})-\nabla_{\theta}f_{*}(\theta_{k})\bigr{\|}_{2}\,\Big{|}\,\mathcal{F}^{\theta}_{k}\Bigr{]}+\mu\alpha_{k}\cdot\|\theta_{k}-\theta_{*}\|_{2}^{2}
(1μαk)θkθ22+H~2αk/μ𝔼[D¯ψ(x(θk),xk+1)|kθ]\displaystyle\leq(1-\mu\alpha_{k})\cdot\|\theta_{k}-\theta_{*}\|_{2}^{2}+\widetilde{H}^{2}\alpha_{k}/\mu\cdot\mathbb{E}\Bigl{[}\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),x_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{\theta}_{k}\Bigr{]}
+αk2𝔼[~fk22|kθ],\displaystyle\qquad+\alpha_{k}^{2}\cdot\mathbb{E}\bigl{[}\|\widetilde{\nabla}f_{k}\|_{2}^{2}\,\big{|}\,\mathcal{F}^{\theta}_{k}\bigr{]}, (C.12)

where the second inequality follows from Assumption 5.2 and the Cauchy-Schwarz inequality, and the last inequality follows from Assumption 5.2. Applying (C.10) to (C.12) and taking expectation on both sides, we get

ϵk+1θ\displaystyle\epsilon_{k+1}^{\theta} (1μαk)ϵkθ+2M2αk2+H~2(2αk2+αk/μ)ϵk+1x.\displaystyle\leq(1-\mu\alpha_{k})\cdot\epsilon_{k}^{\theta}+2M^{2}\cdot\alpha_{k}^{2}+\widetilde{H}^{2}\cdot(2\alpha_{k}^{2}+\alpha_{k}/\mu)\cdot\epsilon_{k+1}^{x}. (C.13)

Recursively applying (C.13), we obtain

ϵk+1θ\displaystyle\epsilon_{k+1}^{\theta} j=0k(1μαj)ϵ0θ+2M2l=0k[αl2j=l+1k(1μαj)]\displaystyle\leq\prod^{k}_{j=0}(1-\mu\alpha_{j})\cdot\epsilon_{0}^{\theta}+2M^{2}\cdot\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}
+H~2l=0k[(2αl2+αl/μ)ϵl+1xj=l+1k(1μαj)].\displaystyle\qquad+\widetilde{H}^{2}\cdot\sum_{l=0}^{k}\bigg{[}(2\alpha_{l}^{2}+\alpha_{l}/\mu)\cdot\epsilon_{l+1}^{x}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}.

Thus, we conclude the proof of Lemma C.2. ∎
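To complement the analysis above, the following schematic sketch (Python) runs a simultaneous design-and-play loop on a hypothetical unconstrained quadratic game with a quadratic designer objective, using the step sizes from Theorem 5.4. It is a simplified illustration under these assumptions, not Algorithm 1 verbatim: the agents take one gradient step on their incentivized rewards, and the designer takes one ascent step with an approximate incentive gradient formed from the current (rather than equilibrium) strategy profile.

import numpy as np

rng = np.random.default_rng(0)
d = 3
A = -(4.0 * np.eye(d) + 0.3 * rng.standard_normal((d, d)))  # nabla_x v_theta of the game
B = np.eye(d)                                                # nabla_theta v_theta
x_target = np.array([1.0, -0.5, 0.3])                        # profile the designer wants to induce

def v(x, theta):                 # incentivized pseudo-gradient of the agents
    return A @ x + B @ theta

def grad_f_x(x, theta):          # partials of f(x, theta) = -||x - x_target||^2 - 0.1 ||theta||^2
    return -2.0 * (x - x_target)

def grad_f_theta(x, theta):
    return -0.2 * theta

J = -np.linalg.solve(A, B)       # nabla_theta x_*(theta), constant for this affine game (Lemma 4.2)
x, theta = np.zeros(d), np.zeros(d)
for k in range(20000):
    beta_k = 0.2 / (k + 1) ** (2 / 3)
    alpha_k = 0.5 / (k + 1)
    x = x + beta_k * v(x, theta)                                                    # agents' step
    theta = theta + alpha_k * (grad_f_theta(x, theta) + J.T @ grad_f_x(x, theta))   # designer's step

x_star = np.linalg.solve(A, -B @ theta)
print(np.linalg.norm(x - x_star), np.linalg.norm(x - x_target))  # tracking error and distance to target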

Appendix D Proof of Theorem 5.5

We first present the following two lemmas under the conditions presented in Theorem 5.5.

Lemma D.1.

For all k0k\geq 0, we have

ϵ~k+1x\displaystyle\widetilde{\epsilon}_{k+1}^{x} ϵ~0xj=0k(1βj/8)+[(δu2+3V2)λ22+(M2+3NH~2)/4H~2][l=0kβl2j=l+1k(1βj/8)]\displaystyle\leq\widetilde{\epsilon}_{0}^{x}\cdot\prod^{k}_{j=0}(1-\beta_{j}/8)+\bigl{[}(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}+(M^{2}+3N\widetilde{H}^{2})/4\widetilde{H}^{2}\bigr{]}\cdot\biggl{[}\sum_{l=0}^{k}\beta_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr{]}
+N[l=0k(βlνllog(1/νl)+2νl+1+2νl2)j=l+1k(1βj/8)].\displaystyle\qquad+N\cdot\biggl{[}\sum_{l=0}^{k}\bigl{(}\beta_{l}\nu_{l}\log(1/\nu_{l})+2\nu_{l+1}+2\nu_{l}^{2}\bigr{)}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr{]}.
Proof.

See Appendix D.1 for detailed proof. ∎

Lemma D.2.

For all k0k\geq 0, we have

ϵk+1θ\displaystyle\epsilon_{k+1}^{\theta} j=0k(1μαj)ϵ0θ+(3M2+6NH~2)l=0k[αl2j=l+1k(1μαj)]\displaystyle\leq\prod^{k}_{j=0}(1-\mu\alpha_{j})\cdot\epsilon_{0}^{\theta}+(3M^{2}+6N\widetilde{H}^{2})\cdot\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}
+H~2l=0k[(2αl2+αl/μ)ϵl+1xj=l+1k(1μαj)].\displaystyle\qquad+\widetilde{H}^{2}\cdot\sum_{l=0}^{k}\bigg{[}(2\alpha_{l}^{2}+\alpha_{l}/\mu)\cdot\epsilon_{l+1}^{x}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}.
Proof.

See Appendix D.2 for detailed proof. ∎

Now we are ready to prove Theorem 5.5.

Proof of Theorem 5.5.

For $\beta_{k}=\beta/(k+1)^{2/7}$, by [21, Lemma C.4], we have

j=1k(1βj/8)l=0k[βl2j=l+1k(1βj/8)]8βk.\displaystyle\prod^{k}_{j=1}(1-\beta_{j}/8)\leq\sum_{l=0}^{k}\biggl{[}\beta^{2}_{l}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr{]}\leq 8\beta_{k}.

Thus, by Lemma D.1, we have

ϵ~k+1x\displaystyle\widetilde{\epsilon}_{k+1}^{x} ϵ~0xj=0k(1βj/8)+[(δu2+3V2)λ22+(M2+3NH~2)/4H~2][l=0kβl2j=l+1k(1βj/8)]\displaystyle\leq\widetilde{\epsilon}_{0}^{x}\cdot\prod^{k}_{j=0}(1-\beta_{j}/8)+\bigl{[}(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}+(M^{2}+3N\widetilde{H}^{2})/4\widetilde{H}^{2}\bigr{]}\cdot\biggl{[}\sum_{l=0}^{k}\beta_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr{]}
+N[l=0k(βlνllog(1/νl)+2νl+1+2νl2)j=l+1k(1βj/8)]\displaystyle\qquad+N\cdot\biggl{[}\sum_{l=0}^{k}\bigl{(}\beta_{l}\nu_{l}\log(1/\nu_{l})+2\nu_{l+1}+2\nu_{l}^{2}\bigr{)}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr{]}
[8ϵ~0x+8(δu2+3V2)λ22+(2M2+6NH~2)/H~2]βk\displaystyle\leq\bigl{[}8\widetilde{\epsilon}_{0}^{x}+8(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}+(2M^{2}+6N\widetilde{H}^{2})/\widetilde{H}^{2}\bigr{]}\cdot\beta_{k}
+N[l=0k(βlνllog(1/νl)+2νl+1+2νl2)j=l+1k(1βj/8)].\displaystyle\qquad+N\cdot\biggl{[}\sum_{l=0}^{k}\bigl{(}\beta_{l}\nu_{l}\log(1/\nu_{l})+2\nu_{l+1}+2\nu_{l}^{2}\bigr{)}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr{]}. (D.1)

Since νl=1/(l+1)4/7\nu_{l}=1/(l+1)^{4/7}, we have

βlνllog(1/νl)+2νl+1+2νl2(4/β2+4/7β)βl2,\displaystyle\beta_{l}\nu_{l}\log(1/\nu_{l})+2\nu_{l+1}+2\nu_{l}^{2}\leq(4/\beta^{2}+4/7\beta)\cdot\beta^{2}_{l},

taking which into (D.1), we obtain

ϵ~k+1x\displaystyle\widetilde{\epsilon}_{k+1}^{x} [8ϵ~0x+8(δu2+3V2)λ22+(2M2+6NH~2)/H~2]βk+32N(1/β2+1/7β)βk\displaystyle\leq\bigl{[}8\widetilde{\epsilon}_{0}^{x}+8(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}+(2M^{2}+6N\widetilde{H}^{2})/\widetilde{H}^{2}\bigr{]}\cdot\beta_{k}+32N\cdot(1/\beta^{2}+1/7\beta)\cdot\beta_{k}
=c~βk=O(k2/7),\displaystyle=\widetilde{c}\cdot\beta_{k}=O(k^{-2/7}), (D.2)

where

\widetilde{c}=8\widetilde{\epsilon}_{0}^{x}+8(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}+(2M^{2}+6N\widetilde{H}^{2})/\widetilde{H}^{2}+32N\cdot(1/\beta^{2}+1/7\beta).

Similarly, we have for αk=α/(k+1)1/2\alpha_{k}=\alpha/(k+1)^{1/2},

j=0k(1μαj)l=0k[αl2j=l+1k(1μαj)]l=0k[αl11/7j=l+1k(1μαj)]2/μαk4/7\displaystyle\prod^{k}_{j=0}(1-\mu\alpha_{j})\leq\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}\leq\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{11/7}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}\leq 2/\mu\cdot\alpha_{k}^{4/7}

Thus, combining (D.2) with Lemma D.2, we obtain

ϵk+1θ\displaystyle\epsilon_{k+1}^{\theta} j=0k(1μαj)ϵ0θ+(3M2+6NH~2)l=0k[αl2j=l+1k(1μαj)]\displaystyle\leq\prod^{k}_{j=0}(1-\mu\alpha_{j})\cdot\epsilon_{0}^{\theta}+(3M^{2}+6N\widetilde{H}^{2})\cdot\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]} (D.3)
+H~2l=0k[(2αl2+αl/μ)ϵl+1xj=l+1k(1μαj)]\displaystyle\qquad+\widetilde{H}^{2}\cdot\sum_{l=0}^{k}\bigg{[}(2\alpha_{l}^{2}+\alpha_{l}/\mu)\cdot\epsilon_{l+1}^{x}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}
2ϵ0θ/μαk4/7+(3M2+6NH~2+2H~2)l=0k[αl2j=l+1k(1μαj)]\displaystyle\leq 2\epsilon_{0}^{\theta}/\mu\cdot\alpha_{k}^{4/7}+(3M^{2}+6N\widetilde{H}^{2}+2\widetilde{H}^{2})\cdot\sum_{l=0}^{k}\biggl{[}\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}
+c~H~2/μl=0k[αlβl+1j=l+1k(1μαj)]\displaystyle\qquad+\widetilde{c}\cdot\widetilde{H}^{2}/\mu\cdot\sum_{l=0}^{k}\bigg{[}\alpha_{l}\beta_{l+1}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr{]}
2ϵ0θ/μαk4/7+2(3M2+6NH~2+2H~2)/μαk4/7+c~2βH~2/α4/7μ2αk4/7\displaystyle\leq 2\epsilon_{0}^{\theta}/\mu\cdot\alpha_{k}^{4/7}+2\cdot(3M^{2}+6N\widetilde{H}^{2}+2\widetilde{H}^{2})/\mu\cdot\alpha_{k}^{4/7}+\widetilde{c}\cdot 2\beta\widetilde{H}^{2}/\alpha^{4/7}\mu^{2}\cdot\alpha_{k}^{4/7}
=O(k2/7).\displaystyle=O(k^{-2/7}).

Here the third inequality follows from $\beta_{k}=\beta/\alpha^{4/7}\cdot\alpha_{k}^{4/7}$. Therefore, (D.2) and (D.3) conclude the proof of Theorem 5.5. ∎

D.1 Proof of Lemma D.1

Proof.

We first show that

D_{\psi^{i}}\bigl(\widetilde{x}^{i}_{*}(\theta_{k}),{x}^{i}_{k+1}\bigr)=D_{\psi^{i}}\bigl(\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr)-\beta_{k}^{i}\cdot\bigl\langle\widehat{v}_{k}^{i},\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr\rangle-D_{\psi^{i}}(x^{i}_{k+1},\widetilde{x}_{k}^{i}). (D.4)

Since

xk+1i\displaystyle x^{i}_{k+1} =argmaxxi𝒳i{v^ki,xi1/βkiDψi(xki,xi)},\displaystyle=\mathop{\mathrm{argmax}}_{x^{i}\in{\mathcal{X}}^{i}}\bigl{\{}\langle\widehat{v}^{i}_{k},x^{i}\rangle-1/\beta^{i}_{k}\cdot D_{\psi^{i}}(x^{i}_{k},x^{i})\bigr{\}},

we have the exact form of xk+1ix^{i}_{k+1} as

xk+1ix~kiexp{βkiv^ki}.\displaystyle x^{i}_{k+1}\propto\widetilde{x}_{k}^{i}\cdot\exp\{\beta_{k}^{i}\cdot\widehat{v}_{k}^{i}\}. (D.5)

By the definition of KL divergence, we have

Dψi(x~i(θk),xk+1i)Dψi(x~i(θk),x~ki)\displaystyle D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),{x}^{i}_{k+1}\bigr{)}-D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr{)} (D.6)
=log(x~ki/xk+1i),x~i(θk)xk+1iDψi(xk+1i,x~ki)\displaystyle\quad=\bigl{\langle}\log(\widetilde{x}^{i}_{k}/{x}^{i}_{k+1}),\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr{\rangle}-D_{\psi^{i}}(x^{i}_{k+1},\widetilde{x}_{k}^{i})
=log(x~ki/xk+1i)+βkiv^ki,x~i(θk)xk+1iβkiv^ki,x~i(θk)xk+1iDψi(xk+1i,x~ki).\displaystyle\quad=\bigl{\langle}\log(\widetilde{x}^{i}_{k}/{x}^{i}_{k+1})+\beta_{k}^{i}\cdot\widehat{v}_{k}^{i},\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr{\rangle}-\beta_{k}^{i}\cdot\bigl{\langle}\widehat{v}_{k}^{i},\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr{\rangle}-D_{\psi^{i}}(x^{i}_{k+1},\widetilde{x}_{k}^{i}).

Let $Z_{k}^{i}=\sum_{j\in[d^{i}]}[\widetilde{x}_{k}^{i}]_{j}\cdot\exp\{\beta_{k}^{i}\cdot[\widehat{v}_{k}^{i}]_{j}\}$ be the normalization factor of the closed form of $x_{k+1}^{i}$ in (D.5). We have

log(x~ki/xk+1i)+βkiv^ki,x~i(θk)xk+1i\displaystyle\bigl{\langle}\log(\widetilde{x}^{i}_{k}/{x}^{i}_{k+1})+\beta_{k}^{i}\cdot\widehat{v}_{k}^{i},\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr{\rangle} (D.7)
=logx~kilogxk+1i+βkiv^ki,x~i(θk)xk+1i\displaystyle\quad=\bigl{\langle}\log\widetilde{x}^{i}_{k}-\log{x}^{i}_{k+1}+\beta_{k}^{i}\cdot\widehat{v}_{k}^{i},\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr{\rangle}
=logx~kilogx~kiβkiv^ki+logZki+βkiv^ki,x~i(θk)xk+1i\displaystyle\quad=\bigl{\langle}\log\widetilde{x}^{i}_{k}-\log\widetilde{x}^{i}_{k}-\beta_{k}^{i}\cdot\widehat{v}^{i}_{k}+\log{Z}_{k}^{i}+\beta_{k}^{i}\cdot\widehat{v}_{k}^{i},\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr{\rangle}
=\log Z_{k}^{i}\cdot\bigl\langle{\bf 1}_{d^{i}},\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr\rangle=0,

where the last equality follows from the fact that j[di][x~i(θk)]j=j[di][xk+1i]j=1\sum_{j\in[d^{i}]}[\widetilde{x}^{i}_{*}(\theta_{k})]_{j}=\sum_{j\in[d^{i}]}[{x}^{i}_{k+1}]_{j}=1. Taking (D.7) into (D.6), we obtain

Dψi(x~i(θk),xk+1i)=Dψi(x~i(θk),x~ki)βkiv^ki,x~i(θk)xk+1iDψi(xk+1i,x~ki),\displaystyle D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),{x}^{i}_{k+1}\bigr{)}=D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr{)}-\beta_{k}^{i}\cdot\bigl{\langle}\widehat{v}_{k}^{i},\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr{\rangle}-D_{\psi^{i}}(x^{i}_{k+1},\widetilde{x}_{k}^{i}),

which concludes the proof of (D.4).
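As a brief implementation note before continuing, the closed form (D.5), combined with the uniform mixing used to define $\widetilde{x}$, amounts to one multiplicative-weights (entropic mirror ascent) step per agent. A minimal sketch (Python) is given below; the reward estimate, step size, and mixing weight are arbitrary placeholders.

import numpy as np

def entropic_step(x_tilde, v_hat, beta):
    # Closed form (D.5): x_{k+1} is proportional to x_tilde_k * exp(beta_k * v_hat_k).
    w = x_tilde * np.exp(beta * v_hat)
    return w / w.sum()

def mix_with_uniform(x, nu):
    # x_tilde_{k+1} = (1 - nu_{k+1}) * x_{k+1} + nu_{k+1} * uniform, keeping iterates in the interior.
    return (1 - nu) * x + nu / x.size

x_tilde = np.ones(4) / 4                    # current mixed strategy of one agent
v_hat = np.array([0.3, -0.1, 0.7, 0.0])     # placeholder estimate of the incentivized unit reward
x_next = entropic_step(x_tilde, v_hat, beta=0.5)
x_tilde_next = mix_with_uniform(x_next, nu=0.1)
print(x_next, x_tilde_next)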

Continuing from (D.4), we have

Dψi(x~i(θk),xk+1i)\displaystyle D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),{x}^{i}_{k+1}\bigr{)} =Dψi(x~i(θk),x~ki)βkiv^ki,x~i(θk)xk+1iDψi(xk+1i,x~ki)\displaystyle=D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr{)}-\beta_{k}^{i}\cdot\bigl{\langle}\widehat{v}_{k}^{i},\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{k+1}\bigr{\rangle}-D_{\psi^{i}}(x^{i}_{k+1},\widetilde{x}_{k}^{i})
Dψi(x~i(θk),x~ki)βkiv^ki,x~i(θk)x~ki\displaystyle\leq D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr{)}-\beta_{k}^{i}\cdot\bigl{\langle}\widehat{v}_{k}^{i},\widetilde{x}^{i}_{*}(\theta_{k})-\widetilde{x}^{i}_{k}\bigr{\rangle}
βkiv^ki,x~kixk+1i1/2xk+1ix~ki12,\displaystyle\qquad-\beta_{k}^{i}\cdot\bigl{\langle}\widehat{v}_{k}^{i},\widetilde{x}^{i}_{k}-x^{i}_{k+1}\bigr{\rangle}-1/2\cdot\|x^{i}_{k+1}-\widetilde{x}_{k}^{i}\|_{1}^{2},

where the inequality follows from the fact that $D_{\psi^{i}}(x^{i},x^{i\prime})\geq 1/2\cdot\|x^{i}-x^{i\prime}\|_{1}^{2}$ (Pinsker's inequality). Taking the conditional expectation given $\mathcal{F}^{x}_{k}$, we obtain

𝔼[Dψi(x~i(θk),xk+1i)|kx]\displaystyle\mathbb{E}\Bigl{[}D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),x^{i}_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]} Dψi(x~i(θk),x~ki)βki𝔼[v^ki,x~i(θk)x~ki|kx]\displaystyle\leq D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr{)}-\beta^{i}_{k}\cdot\mathbb{E}\Bigl{[}\bigl{\langle}\widehat{v}^{i}_{k},\widetilde{x}^{i}_{*}(\theta_{k})-\widetilde{x}^{i}_{k}\bigr{\rangle}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]} (D.8)
+𝔼[βkiv^ki,x~kixk+1i1/2x~kixk+1i12|kx]\displaystyle\qquad+\mathbb{E}\Bigl{[}-\beta^{i}_{k}\cdot\bigl{\langle}\widehat{v}^{i}_{k},\widetilde{x}^{i}_{k}-x^{i}_{k+1}\bigr{\rangle}-1/2\cdot\|\widetilde{x}^{i}_{k}-x^{i}_{k+1}\|_{1}^{2}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]}
Dψi(x~i(θk),x~ki)βkivθki(x~k),x~i(θk)xi(θk)\displaystyle\leq D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr{)}-\bigl{\langle}\beta^{i}_{k}\cdot v^{i}_{\theta_{k}}(\widetilde{x}_{k}),\widetilde{x}^{i}_{*}(\theta_{k})-{x}^{i}_{*}(\theta_{k})\bigr{\rangle}
βkivθki(x~k),xi(θk)x~ki+(βki)2/2𝔼[v^ki2|kx]\displaystyle\qquad-\beta^{i}_{k}\cdot\bigl{\langle}v^{i}_{\theta_{k}}(\widetilde{x}_{k}),{x}^{i}_{*}(\theta_{k})-\widetilde{x}^{i}_{k}\bigr{\rangle}+(\beta^{i}_{k})^{2}/2\cdot\mathbb{E}\bigl{[}\|\widehat{v}^{i}_{k}\|_{\infty}^{2}\,\big{|}\,\mathcal{F}^{x}_{k}\bigr{]}
Dψi(x~i(θk),x~ki)βkivθki(x~k),xi(θk)x~ki\displaystyle\leq D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr{)}-\beta^{i}_{k}\cdot\bigl{\langle}v^{i}_{\theta_{k}}(\widetilde{x}_{k}),{x}^{i}_{*}(\theta_{k})-\widetilde{x}^{i}_{k}\bigr{\rangle}
+(βki)2/2𝔼[v^ki2+vθki(x~k)2|kx]+2νk2.\displaystyle\qquad+(\beta^{i}_{k})^{2}/2\cdot\mathbb{E}\Bigl{[}\|\widehat{v}^{i}_{k}\|_{\infty}^{2}+\bigl{\|}{v}^{i}_{\theta_{k}}(\widetilde{x}_{k})\bigr{\|}_{\infty}^{2}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]}+2\nu_{k}^{2}.

where the last inequality follows from the Cauchy-Schwarz inequality and the fact that $\|\widetilde{x}_{*}^{i}(\theta_{k})-x_{*}^{i}(\theta_{k})\|_{1}=\nu_{k}\cdot\|{x}_{*}^{i}(\theta_{k})-{\bf 1}_{d^{i}}/d^{i}\|_{1}\leq 2\nu_{k}$. By the $\lambda$-strong variational stability of $x_{*}(\theta_{k})$, we have

i𝒩λivθki(x~k),xi(θk)x~kiD¯ψ(x(θk),x~k).\displaystyle-\sum_{i\in{\mathcal{N}}}\lambda^{i}\cdot\bigl{\langle}v^{i}_{\theta_{k}}(\widetilde{x}_{k}),x^{i}_{*}(\theta_{k})-\widetilde{x}^{i}_{k}\bigr{\rangle}\leq-\overline{D}_{\psi}\bigl{(}{x}_{*}(\theta_{k}),\widetilde{x}_{k}\bigr{)}. (D.9)

Moreover, by Assumption 5.3, we have

𝔼[v^ki2+vθki(x~k)2|kx]\displaystyle\mathbb{E}\Bigl{[}\|\widehat{v}^{i}_{k}\|_{\infty}^{2}+\bigl{\|}{v}^{i}_{\theta_{k}}(\widetilde{x}_{k})\bigr{\|}_{\infty}^{2}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]}
𝔼[2v^kivθki(x~k)2+3vθki(x~k)2|kx]2δu2+3vθki(x~k)2,\displaystyle\quad\leq\mathbb{E}\Bigl{[}2\bigl{\|}\widehat{v}^{i}_{k}-v^{i}_{\theta_{k}}(\widetilde{x}_{k})\bigr{\|}_{\infty}^{2}+3\bigl{\|}v^{i}_{\theta_{k}}(\widetilde{x}_{k})\bigr{\|}_{\infty}^{2}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]}\leq 2\delta_{u}^{2}+3\bigl{\|}v^{i}_{\theta_{k}}(\widetilde{x}_{k})\bigr{\|}_{\infty}^{2},

Summing up over $i\in\mathcal{N}$ with weights $(\lambda^{i})^{2}$ gives

i𝒩(λi)2𝔼[v^ki2+vθki(x~k)2|kx]\displaystyle\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}\cdot\mathbb{E}\Bigl{[}\|\widehat{v}^{i}_{k}\|_{\infty}^{2}+\bigl{\|}{v}^{i}_{\theta_{k}}(\widetilde{x}_{k})\bigr{\|}_{\infty}^{2}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]} (D.10)
2δu2i𝒩(λi)2+6i𝒩(λi)2(vθki(x~k)vθki(x(θk))2+vθki(x(θk))2)\displaystyle\quad\leq 2\delta_{u}^{2}\cdot\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}+6\cdot\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}\cdot\biggl{(}\Bigl{\|}v^{i}_{\theta_{k}}(\widetilde{x}_{k})-v^{i}_{\theta_{k}}\bigl{(}x_{*}(\theta_{k})\bigr{)}\Bigr{\|}_{\infty}^{2}+\Bigl{\|}v^{i}_{\theta_{k}}\bigl{(}x_{*}(\theta_{k})\bigr{)}\Bigr{\|}_{\infty}^{2}\biggr{)}
(2δu2+6V2)i𝒩(λi)2+6NHu2i𝒩(λi)2D¯ψ(x(θk),x~k)\displaystyle\quad\leq(2\delta_{u}^{2}+6V_{*}^{2})\cdot\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}+6NH_{u}^{2}\cdot\sum_{i\in{\mathcal{N}}}(\lambda^{i})^{2}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),\widetilde{x}_{k}\bigr{)}
=(2δu2+6V2)λ22+6NHu2λ22D¯ψ(x(θk),x~k),\displaystyle\quad=(2\delta_{u}^{2}+6V_{*}^{2})\cdot\|\lambda\|^{2}_{2}+6NH_{u}^{2}\|\lambda\|_{2}^{2}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),\widetilde{x}_{k}\bigr{)},

where the second inequality follows from $\|v_{\theta}(x_{*}(\theta))\|_{\infty}\leq V_{*}$ and Assumption 5.1. Thus, taking (D.9) and (D.10) into (D.8), we obtain, for $6NH_{u}^{2}\|\lambda\|_{2}^{2}\beta^{2}_{k}\leq\beta_{k}$ (i.e., $\beta\leq 1/6NH_{u}^{2}\|\lambda\|_{2}^{2}$), that

𝔼[D¯ψ(x~(θk),xk+1)|kx]\displaystyle\mathbb{E}\Bigl{[}\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k}),x_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]}
D¯ψ(x~(θk),x~k)(βk3NHu2λ22βk2)D¯ψ(x(θk),x~k)+(δu2+3V2)λ22βk2+2Nνk2\displaystyle\quad\leq\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k}\bigr{)}-\bigl{(}\beta_{k}-3NH_{u}^{2}\|\lambda\|_{2}^{2}\beta^{2}_{k}\bigr{)}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k}),\widetilde{x}_{k}\bigr{)}+(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}\beta^{2}_{k}+2N\nu_{k}^{2}
(1βk/2)D¯ψ(x~(θk),x~k)+(δu2+3V2)λ22βk2+Nβkνklog(1/νk)+2Nνk2,\displaystyle\quad\leq(1-\beta_{k}/2)\cdot\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k}\bigr{)}+(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}\beta_{k}^{2}+N\beta_{k}\nu_{k}\log(1/\nu_{k})+2N\nu_{k}^{2},

where the second inequality follows from (E.4) in Lemma E.3. By (E.3) in Lemma E.3, we further have, for $\nu_{k}\leq O(1/k)$,

𝔼[D¯ψ(x~(θk),x~k+1)|kx]2Nνk+1\displaystyle\mathbb{E}\Bigl{[}\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]}-2N\nu_{k+1} (D.11)
𝔼[D¯ψ(x~(θk),xk+1)|kx]\displaystyle\quad\leq\mathbb{E}\Bigl{[}\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k}),x_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]}
(1βk/2)D¯ψ(x~(θk),x~k)+(δu2+3V2)λ22βk2+Nβkνklog(1/νk)+2Nνk2,\displaystyle\quad\leq(1-\beta_{k}/2)\cdot\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k}\bigr{)}+(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}\beta_{k}^{2}+N\beta_{k}\nu_{k}\log(1/\nu_{k})+2N\nu_{k}^{2},

By Lemma E.2, we have for any γ>max{1,1/νk2}\gamma>\max\{1,1/\nu_{k}^{2}\},

D¯ψ(x~(θk),x~k)(1+1γ)D¯ψ(x~(θk1),x~k)\displaystyle\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k}\bigr{)}-\biggl{(}1+\frac{1}{\gamma}\biggr{)}\cdot\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k-1}),\widetilde{x}_{k}\bigr{)} (D.12)
(1+γ)2/νk2(1+γ)2γx~(θk)x~(θk1)12\displaystyle\quad\leq\frac{(1+\gamma)^{2}/\nu_{k}^{2}-(1+\gamma)}{2\gamma}\cdot\bigl{\|}\widetilde{x}_{*}(\theta_{k})-\widetilde{x}_{*}(\theta_{k-1})\bigr{\|}_{1}^{2}
(1+γ)2/νk2(1+γ)2γθkθk122(1+γ)2/νk2(1+γ)2γH~2αk12~fk122,\displaystyle\quad\leq\frac{(1+\gamma)^{2}/\nu_{k}^{2}-(1+\gamma)}{2\gamma}\cdot\|\theta_{k}-\theta_{k-1}\|^{2}_{2}\leq\frac{(1+\gamma)^{2}/\nu_{k}^{2}-(1+\gamma)}{2\gamma}\cdot\widetilde{H}_{*}^{2}\alpha_{k-1}^{2}\cdot\|\widetilde{\nabla}f_{k-1}\|^{2}_{2},

where the second inequality follows from Lemma B.5. Taking (D.12) into (D.11) and choosing $\gamma=(4-2\beta_{k})/\beta_{k}$, we obtain

𝔼[D¯ψ(x~(θk),x~k+1)|kx]\displaystyle\mathbb{E}\Bigl{[}\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k+1}\bigr{)}\,\Big{|}\,\mathcal{F}^{x}_{k}\Bigr{]} (D.13)
(1βk/2)(1+1γ)D¯ψ(x(θk1),xk)+(δu2+3V2)λ22βk2+Nβkνklog(1/νk)\displaystyle\quad\leq(1-\beta_{k}/2)\cdot\biggl{(}1+\frac{1}{\gamma}\biggr{)}\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k-1}),x_{k}\bigr{)}+(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}\beta_{k}^{2}+N\beta_{k}\nu_{k}\log(1/\nu_{k})
+2N(νk2+νk+1)+(1βk/2)(1+γ)2/νk2(1+γ)2γH~2αk12𝔼[~fk122|kx]\displaystyle\quad\qquad+2N\cdot(\nu_{k}^{2}+\nu_{k+1})+(1-\beta_{k}/2)\cdot\frac{(1+\gamma)^{2}/\nu_{k}^{2}-(1+\gamma)}{2\gamma}\cdot\widetilde{H}_{*}^{2}\alpha^{2}_{k-1}\cdot\mathbb{E}\bigl{[}\|\widetilde{\nabla}f_{k-1}\|^{2}_{2}\,\big{|}\,\mathcal{F}^{x}_{k}\bigr{]}
=(1βk/4)D¯ψ(x(θk1),xk)+(δu2+3V2)λ22βk2+Nβkνklog(1/νk)\displaystyle\quad=(1-\beta_{k}/4)\cdot\overline{D}_{\psi}\bigl{(}x_{*}(\theta_{k-1}),x_{k}\bigr{)}+(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}\beta_{k}^{2}+N\beta_{k}\nu_{k}\log(1/\nu_{k})
+2N(νk2+νk+1)+(1βk/4)(1+γ)/νk212H~2αk12𝔼[~fk122|kx].\displaystyle\quad\qquad+2N\cdot(\nu_{k}^{2}+\nu_{k+1})+(1-\beta_{k}/4)\cdot\frac{(1+\gamma)/\nu_{k}^{2}-1}{2}\cdot\widetilde{H}_{*}^{2}\alpha^{2}_{k-1}\cdot\mathbb{E}\bigl{[}\|\widetilde{\nabla}f_{k-1}\|^{2}_{2}\,\big{|}\,\mathcal{F}^{x}_{k}\bigr{]}.

By Assumption 5.3, we have

𝔼[~fk122|k1θ]\displaystyle\mathbb{E}\bigl{[}\|\widetilde{\nabla}f_{k-1}\|^{2}_{2}\,\big{|}\,\mathcal{F}_{k-1}^{\theta}\bigr{]} (D.14)
3𝔼[~f(θk1,x~k)~f(θk1,x~(θk1))22|k1θ]\displaystyle\quad\leq 3\cdot\mathbb{E}\biggl{[}\Bigl{\|}\widetilde{\nabla}f(\theta_{k-1},\widetilde{x}_{k})-\widetilde{\nabla}f\bigl{(}\theta_{k-1},\widetilde{x}_{*}(\theta_{k-1})\bigr{)}\Bigr{\|}_{2}^{2}\,\bigg{|}\,\mathcal{F}_{k-1}^{\theta}\biggr{]}
+3𝔼[~f(θk1,x~(θk1))f(θk1)22|k1θ]+3𝔼[f(θk1)22|k1θ]\displaystyle\qquad+3\cdot\mathbb{E}\biggl{[}\Bigl{\|}\widetilde{\nabla}f\bigl{(}\theta_{k-1},\widetilde{x}_{*}(\theta_{k-1})\bigr{)}-\nabla f_{*}(\theta_{k-1})\Bigr{\|}_{2}^{2}\,\bigg{|}\,\mathcal{F}_{k-1}^{\theta}\biggr{]}+3\cdot\mathbb{E}\Bigl{[}\bigl{\|}\nabla f_{*}(\theta_{k-1})\bigr{\|}_{2}^{2}\,\Big{|}\,\mathcal{F}_{k-1}^{\theta}\Bigr{]}
3H~2D¯ψ(x~(θk1),x~k)+3H~2D¯ψ(x(θk1),x~(θk1))+3M2\displaystyle\quad\leq 3\widetilde{H}^{2}\cdot\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k-1}),\widetilde{x}_{k}\bigr{)}+3\widetilde{H}^{2}\cdot\overline{D}_{\psi}\bigl{(}{x}_{*}(\theta_{k-1}),\widetilde{x}_{*}(\theta_{k-1})\bigr{)}+3M^{2}
3H~2D¯ψ(x~(θk1),x~k)+3M2+6NH~2νk1,\displaystyle\quad\leq 3\widetilde{H}^{2}\cdot\overline{D}_{\psi}\bigl{(}\widetilde{x}_{*}(\theta_{k-1}),\widetilde{x}_{k}\bigr{)}+3M^{2}+6N\widetilde{H}^{2}\nu_{k-1},

where the second inequality follows from Assumptions 5.2 and 5.3, and the last inequality follows from Lemma E.3. Taking (D.14) into (D.13) and taking expectation on both sides, we obtain

ϵ~k+1x\displaystyle\widetilde{\epsilon}_{k+1}^{x} {1βk/4+3[(1+γ)/νk21]/2H~2H~2αk12}ϵ~kx+(δu2+3V2)λ22βk2\displaystyle\leq\Bigl{\{}1-\beta_{k}/4+3\bigl{[}(1+\gamma)/\nu_{k}^{2}-1\bigr{]}/2\cdot\widetilde{H}^{2}\widetilde{H}_{*}^{2}\alpha^{2}_{k-1}\Bigr{\}}\cdot\widetilde{\epsilon}_{k}^{x}+(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}\beta_{k}^{2}
+(1βk/4)(1+γ)/νk212H~2αk12(2M2+6NH~2νk1)\displaystyle\qquad+(1-\beta_{k}/4)\cdot\frac{(1+\gamma)/\nu_{k}^{2}-1}{2}\cdot\widetilde{H}_{*}^{2}\alpha^{2}_{k-1}\cdot(2M^{2}+6N\widetilde{H}^{2}\nu_{k-1})
Nβkνklogνk+2N(νk+1+νk2).\displaystyle\qquad-N\beta_{k}\nu_{k}\log\nu_{k}+2N\cdot(\nu_{k+1}+\nu_{k}^{2}). (D.15)

With $\gamma=(4-2\beta_{k})/\beta_{k}$, $\alpha_{k}=\alpha/(k+1)$, $\beta_{k}=\beta/(k+1)^{2/7}$, $\nu_{k}=1/(k+1)^{4/7}$, and $\alpha/\beta^{3/2}\leq 1/7\widetilde{H}\widetilde{H}_{*}$, we have

3[(1+γ)/νk21]/2H~2H~2αk12\displaystyle 3\bigl{[}(1+\gamma)/\nu_{k}^{2}-1\bigr{]}/2\cdot\widetilde{H}^{2}\widetilde{H}_{*}^{2}\alpha^{2}_{k-1} =3[(4/βk1)/νk21]/2H~2H~2αk12\displaystyle=3\bigl{[}(4/\beta_{k}-1)/\nu_{k}^{2}-1\bigr{]}/2\cdot\widetilde{H}^{2}\widetilde{H}_{*}^{2}\alpha^{2}_{k-1}
6H~2H~2αk2/βkνk2βk2/8βk/8,\displaystyle\leq 6\widetilde{H}^{2}\widetilde{H}_{*}^{2}\alpha^{2}_{k}/\beta_{k}\nu_{k}^{2}\leq\beta^{2}_{k}/8\leq\beta_{k}/8,

taking which into (D.15), we obtain

ϵ~k+1x\displaystyle\widetilde{\epsilon}_{k+1}^{x} (1βk/8)ϵ~kx+(δu2+3V2)λ22βk2Nβkνklogνk+2N(νk+1+νk2)\displaystyle\leq(1-\beta_{k}/8)\cdot\widetilde{\epsilon}_{k}^{x}+(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}\beta_{k}^{2}-N\beta_{k}\nu_{k}\log\nu_{k}+2N\cdot(\nu_{k+1}+\nu_{k}^{2})
+βk2(M2+3NH~2νk1)/4H~2\displaystyle\qquad+\beta^{2}_{k}\cdot(M^{2}+3N\widetilde{H}^{2}\nu_{k-1})/4\widetilde{H}^{2}
(1βk/8)ϵkx+(δu2+3V2)λ22βk2Nβkνklogνk+2N(νk+1+νk2)\displaystyle\leq(1-\beta_{k}/8)\cdot\epsilon_{k}^{x}+(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}\beta_{k}^{2}-N\beta_{k}\nu_{k}\log\nu_{k}+2N\cdot(\nu_{k+1}+\nu_{k}^{2})
+βk2(M2+3NH~2)/4H~2.\displaystyle\qquad+\beta^{2}_{k}\cdot(M^{2}+3N\widetilde{H}^{2})/4\widetilde{H}^{2}. (D.16)

Recursively applying (D.16), we obtain

\widetilde{\epsilon}_{k+1}^{x}\leq\widetilde{\epsilon}_{0}^{x}\cdot\prod^{k}_{j=0}(1-\beta_{j}/8)+\bigl[(\delta_{u}^{2}+3V_{*}^{2})\cdot\|\lambda\|_{2}^{2}+(M^{2}+3N\widetilde{H}^{2})/4\widetilde{H}^{2}\bigr]\cdot\biggl[\sum_{l=0}^{k}\beta_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr]+N\cdot\biggl[\sum_{l=0}^{k}\bigl(\beta_{l}\nu_{l}\log(1/\nu_{l})+2\nu_{l+1}+2\nu_{l}^{2}\bigr)\cdot\prod^{k}_{j=l+1}(1-\beta_{j}/8)\biggr],
which concludes the proof of Lemma D.1. ∎

D.2 Proof of Lemma D.2

Proof.

Applying (D.14) to (C.2) and taking expectation on both sides, we get

ϵk+1θ\displaystyle\epsilon_{k+1}^{\theta} (1μαk)ϵkθ+(3M2+6NH~2νk)αk2+H~2(3αk2+αk/μ)ϵk+1x\displaystyle\leq(1-\mu\alpha_{k})\cdot\epsilon_{k}^{\theta}+(3M^{2}+6N\widetilde{H}^{2}\nu_{k})\cdot\alpha_{k}^{2}+\widetilde{H}^{2}\cdot(3\alpha_{k}^{2}+\alpha_{k}/\mu)\cdot\epsilon_{k+1}^{x}
(1μαk)ϵkθ+(3M2+6NH~2)αk2+H~2(3αk2+αk/μ)ϵk+1x,\displaystyle\leq(1-\mu\alpha_{k})\cdot\epsilon_{k}^{\theta}+(3M^{2}+6N\widetilde{H}^{2})\cdot\alpha_{k}^{2}+\widetilde{H}^{2}\cdot(3\alpha_{k}^{2}+\alpha_{k}/\mu)\cdot\epsilon_{k+1}^{x}, (D.17)

where the second inequality follows from $\nu_{k}\leq 1$. Recursively applying (D.17), we obtain

\epsilon_{k+1}^{\theta}\leq\prod^{k}_{j=0}(1-\mu\alpha_{j})\cdot\epsilon_{0}^{\theta}+(3M^{2}+6N\widetilde{H}^{2})\cdot\sum_{l=0}^{k}\biggl[\alpha_{l}^{2}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr]+\widetilde{H}^{2}\cdot\sum_{l=0}^{k}\biggl[(2\alpha_{l}^{2}+\alpha_{l}/\mu)\cdot\epsilon_{l+1}^{x}\cdot\prod^{k}_{j=l+1}(1-\mu\alpha_{j})\biggr],
which concludes the proof of Lemma D.2. ∎

Appendix E Properties of the Bregman Divergence

The following lemma is used in the analysis of unconstrained games.

Lemma E.1.

Let ψ()\psi(\cdot) be 11-strongly convex with respect to the norm \|\cdot\|. Assume that (4.2) holds. We have for any γ>Hψ21\gamma>H_{\psi}^{2}\geq 1,

Dψ(x,z)(1+1γ)Dψ(y,z)Hψ2(1+γ)2(1+γ)2γxy2.\displaystyle D_{\psi}(x,z)-\biggl{(}1+\frac{1}{\gamma}\biggr{)}\cdot D_{\psi}(y,z)\leq\frac{H_{\psi}^{2}\cdot(1+\gamma)^{2}-(1+\gamma)}{2\gamma}\cdot\|x-y\|^{2}.
Proof.

By the definition of Bregman divergence, we have for any γ>0\gamma>0,

Dψ(x,z)Dψ(y,z)\displaystyle D_{\psi}(x,z)-D_{\psi}(y,z) =ψ(x)ψ(z),xzψ(y)+ψ(z),yz\displaystyle=\psi(x)-\bigl{\langle}\nabla\psi(z),x-z\bigr{\rangle}-\psi(y)+\bigl{\langle}\nabla\psi(z),y-z\bigr{\rangle}
=Dψ(x,y)+ψ(z)ψ(x),yx\displaystyle=-D_{\psi}(x,y)+\bigl{\langle}\nabla\psi(z)-\nabla\psi(x),y-x\bigr{\rangle}
1/2xy2+ψ(z)ψ(x)yx,\displaystyle\leq-1/2\cdot\|x-y\|^{2}+\bigl{\|}\nabla\psi(z)-\nabla\psi(x)\bigr{\|}_{*}\cdot\|y-x\|,

where the inequality follows from 11-strong convexity of ψ()\psi(\cdot) and the Cauchy-Schwartz inequality. By (4.2), we further have

Dψ(x,z)Dψ(y,z)\displaystyle D_{\psi}(x,z)-D_{\psi}(y,z) 1/2xy2+Hψzxyx,\displaystyle\leq-1/2\cdot\|x-y\|^{2}+H_{\psi}\cdot\|z-x\|\cdot\|y-x\|,
12(1+γ)xz2+Hψ2(1+γ)12xy2\displaystyle\leq\frac{1}{2(1+\gamma)}\cdot\|x-z\|^{2}+\frac{H_{\psi}^{2}\cdot(1+\gamma)-1}{2}\cdot\|x-y\|^{2}
11+γDψ(x,z)+Hψ2(1+γ)12xy2,\displaystyle\leq\frac{1}{1+\gamma}\cdot D_{\psi}(x,z)+\frac{H_{\psi}^{2}\cdot(1+\gamma)-1}{2}\cdot\|x-y\|^{2}, (E.1)

where the second inequality follows from Young's inequality, and the third inequality follows from the $1$-strong convexity of $\psi(\cdot)$. Rearranging the terms in (E.1), we finish the proof of Lemma E.1. ∎
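In the Euclidean case $\psi(x)=\|x\|_{2}^{2}/2$ (so that $H_{\psi}=1$ and $D_{\psi}(x,z)=\|x-z\|_{2}^{2}/2$), Lemma E.1 reduces to a standard Young-type inequality. The short sketch below (Python) verifies it on random triples of points; it is only a numerical illustration.

import numpy as np

rng = np.random.default_rng(1)
H_psi, gamma = 1.0, 3.0

def D(x, z):
    # Bregman divergence generated by psi(x) = 0.5 * ||x||_2^2.
    return 0.5 * np.sum((x - z) ** 2)

violations = 0
for _ in range(10000):
    x, y, z = rng.standard_normal((3, 5))
    lhs = D(x, z) - (1 + 1 / gamma) * D(y, z)
    rhs = (H_psi**2 * (1 + gamma) ** 2 - (1 + gamma)) / (2 * gamma) * np.sum((x - y) ** 2)
    violations += lhs > rhs + 1e-12
print(violations)  # expected: 0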

The following two lemmas are involved in the analysis of simplex constrained games.

Lemma E.2.

Let $\psi(\cdot)$ be the negative entropy. We have, for any $\gamma>\max\{1,1/\nu_{k}^{2}\}$,

Dψi(x~i(θk),x~ki)(1+1γ)Dψi(x~i(θk1),x~ki)\displaystyle D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr{)}-\biggl{(}1+\frac{1}{\gamma}\biggr{)}\cdot D_{\psi^{i}}\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k-1}),\widetilde{x}^{i}_{k}\bigr{)} (1+γ)2/νk2(1+γ)2γx~i(θk)x~i(θk1)12,\displaystyle\leq\frac{(1+\gamma)^{2}/\nu_{k}^{2}-(1+\gamma)}{2\gamma}\cdot\bigl{\|}\widetilde{x}_{*}^{i}(\theta_{k})-\widetilde{x}_{*}^{i}(\theta_{k-1})\bigr{\|}_{1}^{2},

for {x~k}k0\{\widetilde{x}_{k}\}_{k\geq 0} generated by Algorithm 2.

Proof.

For the negative entropy $\psi(\cdot)$, we have $\nabla_{x_{j}}\psi(\widetilde{x})=1+\log[\widetilde{x}]_{j}$, which gives

|[xi]jψ(x~i(θk))[xi]jψ(x~ki)|=|log[x~i(θk)]jlog[x~ki]j|1/νk|[x~i(θk)]j[x~ki]j|.\displaystyle\Bigl{|}\nabla_{[x^{i}]_{j}}\psi\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k})\bigr{)}-\nabla_{[x^{i}]_{j}}\psi(\widetilde{x}^{i}_{k})\Bigr{|}=\Bigl{|}\log\bigl{[}\widetilde{x}^{i}_{*}(\theta_{k})\bigr{]}_{j}-\log[\widetilde{x}^{i}_{k}]_{j}\Bigr{|}\leq 1/\nu_{k}\cdot\Bigl{|}\bigl{[}\widetilde{x}^{i}_{*}(\theta_{k})\bigr{]}_{j}-[\widetilde{x}^{i}_{k}]_{j}\Bigr{|}.

Thus, we have

ψ(x~i(θk))ψ(x~ki)1/νkx~i(θk)x~ki1.\displaystyle\Bigl{\|}\nabla\psi\bigl{(}\widetilde{x}^{i}_{*}(\theta_{k})\bigr{)}-\nabla\psi(\widetilde{x}^{i}_{k})\Bigr{\|}_{\infty}\leq 1/\nu_{k}\cdot\bigl{\|}\widetilde{x}^{i}_{*}(\theta_{k})-\widetilde{x}^{i}_{k}\bigr{\|}_{1}.

Replacing HψH_{\psi} with 1/νk1/\nu_{k} in the proof of Lemma E.1, we conclude the proof of Lemma E.2. ∎

Lemma E.3.

Let {xk}k0\{x_{k}\}_{k\geq 0} and {x~k}k0\{\widetilde{x}_{k}\}_{k\geq 0} be the sequences of strategy profiles generated by Algorithm 2 with νkO(1/k)\nu_{k}\leq O(1/k). We have

\displaystyle\overline{D}_{\psi}\bigl(\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k}\bigr)-\overline{D}_{\psi}\bigl(\widetilde{x}_{*}(\theta_{k}),x_{k}\bigr) \leq 2N\nu_{k}, (E.2)
\displaystyle\overline{D}_{\psi}\bigl(\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k+1}\bigr)-\overline{D}_{\psi}\bigl(\widetilde{x}_{*}(\theta_{k}),x_{k+1}\bigr) \leq 2N\nu_{k+1}, (E.3)
\displaystyle\overline{D}_{\psi}\bigl(\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k}\bigr)-\overline{D}_{\psi}\bigl(x_{*}(\theta_{k}),\widetilde{x}_{k}\bigr) \leq 2N\nu_{k}\log(1/\nu_{k}). (E.4)
Proof.

By the definition of the KL divergence (the Bregman divergence induced by the Shannon entropy), we have

\displaystyle D_{\psi}\bigl(\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k+1}\bigr)-D_{\psi}\bigl(\widetilde{x}^{i}_{*}(\theta_{k}),x^{i}_{k+1}\bigr)=\bigl\langle\widetilde{x}_{*}^{i}(\theta_{k}),\log(x^{i}_{k+1}/\widetilde{x}_{k+1}^{i})\bigr\rangle. (E.5)

By (4.6), we have for all $\nu_{k}\leq O(k^{-1})$ that

\displaystyle\log\frac{[x^{i}_{k+1}]_{j}}{[\widetilde{x}^{i}_{k+1}]_{j}} =\log\biggl\{\frac{[x^{i}_{k+1}]_{j}}{(1-\nu_{k+1})\cdot[x^{i}_{k+1}]_{j}+\nu_{k+1}/d^{i}}\biggr\}
\displaystyle =\log\biggl\{1+\frac{\nu_{k+1}\cdot\bigl([x^{i}_{k+1}]_{j}-1/d^{i}\bigr)}{(1-\nu_{k+1})\cdot[x^{i}_{k+1}]_{j}+\nu_{k+1}/d^{i}}\biggr\}\leq 2\nu_{k+1}. (E.6)

Here the last inequality in (E.6) holds since $\log(1+u)\leq u$ and the fraction is at most $\nu_{k+1}/(1-\nu_{k+1})\leq 2\nu_{k+1}$ whenever $\nu_{k+1}\leq 1/2$. Plugging (E.6) into (E.5), we obtain

\displaystyle\overline{D}_{\psi}\bigl(\widetilde{x}_{*}(\theta_{k}),\widetilde{x}_{k+1}\bigr)-\overline{D}_{\psi}\bigl(\widetilde{x}_{*}(\theta_{k}),x_{k+1}\bigr)\leq\sum_{i\in\mathcal{N}}\bigl\langle\widetilde{x}^{i}_{*}(\theta_{k}),{\bf 1}_{d^{i}}\bigr\rangle\cdot 2\nu_{k+1}=2N\nu_{k+1},

which concludes the proof of (E.3). The same argument with $k+1$ replaced by $k$ yields (E.2). Also, we have

\displaystyle D_{\psi^{i}}\bigl(\widetilde{x}^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr)-D_{\psi^{i}}\bigl(x^{i}_{*}(\theta_{k}),\widetilde{x}^{i}_{k}\bigr)
\displaystyle\quad=\Bigl\langle\widetilde{x}^{i}_{*}(\theta_{k}),\log\bigl(\widetilde{x}^{i}_{*}(\theta_{k})/\widetilde{x}^{i}_{k}\bigr)\Bigr\rangle-\Bigl\langle x^{i}_{*}(\theta_{k}),\log\bigl(x^{i}_{*}(\theta_{k})/\widetilde{x}^{i}_{k}\bigr)\Bigr\rangle
\displaystyle\quad=\Bigl\langle\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{*}(\theta_{k}),\log\bigl(\widetilde{x}^{i}_{*}(\theta_{k})/\widetilde{x}^{i}_{k}\bigr)\Bigr\rangle-D_{\psi^{i}}\bigl(x_{*}^{i}(\theta_{k}),\widetilde{x}_{*}^{i}(\theta_{k})\bigr)
\displaystyle\quad\leq\bigl\|\widetilde{x}^{i}_{*}(\theta_{k})-x^{i}_{*}(\theta_{k})\bigr\|_{1}\cdot\Bigl\|\log\bigl(\widetilde{x}^{i}_{*}(\theta_{k})/\widetilde{x}^{i}_{k}\bigr)\Bigr\|_{\infty}\leq 2\nu_{k}\log(1/\nu_{k}).

Summing the above inequality over $i\in\mathcal{N}$ gives (E.4). ∎

Appendix F Extensions to Games with Multiple Equilibria

In this section, we provide more details on how to extend our algorithms to games with multiple equilibria. As discussed in Section 6, the only challenging case is when the function $v_{\theta}(x)$ is monotone but not strongly monotone, so that the equilibrium set is a closed convex region rather than a singleton.

Example F.1.

We consider a routing game (see Example 3.2) played on the simple network shown in Figure 2. Suppose that the number of nonatomic agents traveling from node 1 to node 4 is $d=10$.

Figure 2: A 3-node-4-link network.

The costs of using the 4 links are given by $t^{1}(x^{1})=4+(x^{1})^{4}$, $t^{2}(x^{2})=20+5\cdot(x^{2})^{4}$, $t^{3}(x^{3})=1+30\cdot(x^{3})^{4}$, and $t^{4}(x^{4})=30+(x^{4})^{4}$, respectively. There are 4 paths connecting node 1 and node 4 (path 1 uses links 1 and 3, path 2 uses links 2 and 4, path 3 uses links 1 and 4, path 4 uses links 2 and 3). Let $q=(q^{a})_{a=1}^{4}$ be the number of agents using each path and $c(q)=(c^{a}(q))_{a=1}^{4}$ be the cost of using each path. Then we have

c(q)=\Delta^{\mathsf{T}}t(x),

where the link loads induced by the path flows are $x=\Delta q$ and $\Delta=[1,0,1,0;\,0,1,0,1;\,1,0,0,1;\,0,1,1,0]$. Therefore, we have $\nabla c(q)=\Delta^{\mathsf{T}}\text{Diag}(t^{\prime}(x))\Delta$, which is positive semidefinite but not positive definite because the matrix $\Delta$ is singular. Under this setting, the set of equilibria of this routing game can be written as

\{q^{*}\geq 0:\Delta q^{*}=x^{*}\},\quad\text{where}~x^{*}=[6,4,3,7]^{\mathsf{T}}. (F.1)
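As a sanity check, the following minimal NumPy sketch (not part of the original analysis; the link-path orientation of $\Delta$ and the particular feasible path flow $q^{*}=(3,4,3,0)$ are our own illustrative choices) verifies that the link loads $x^{*}$ in (F.1) equalize all path costs, and that $\nabla c(q)$ is positive semidefinite but singular.

```python
import numpy as np

# Link-path incidence matrix Delta (rows: links 1-4, columns: paths 1-4),
# consistent with the path definitions above.
Delta = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 0, 1],
                  [0, 1, 1, 0]], dtype=float)

def t(x):
    # Link cost functions from the example.
    return np.array([4 + x[0]**4,
                     20 + 5 * x[1]**4,
                     1 + 30 * x[2]**4,
                     30 + x[3]**4])

def t_prime(x):
    # Their derivatives.
    return np.array([4 * x[0]**3,
                     20 * x[1]**3,
                     120 * x[2]**3,
                     4 * x[3]**3])

x_star = np.array([6.0, 4.0, 3.0, 7.0])   # equilibrium link loads in (F.1)

# One feasible path flow with Delta @ q_star = x_star (illustrative choice).
q_star = np.array([3.0, 4.0, 3.0, 0.0])
assert np.allclose(Delta @ q_star, x_star)

path_costs = Delta.T @ t(x_star)
print(path_costs)                          # all entries equal 3731 -> Wardrop equilibrium

# Jacobian of the path-cost map: symmetric PSD but singular, since Delta is singular.
J = Delta.T @ np.diag(t_prime(x_star)) @ Delta
print(np.round(np.linalg.eigvalsh(J), 6))  # smallest eigenvalue is 0
print(np.linalg.matrix_rank(Delta))        # rank 3 < 4, so Delta is singular
```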

To circumvent this issue, we can simply add a regularizer $\eta^{i}\cdot(\log(x^{i}+\epsilon)+1)$ to each $v_{\theta}^{i}(x)$. We note that the Jacobian of $v_{\theta}^{i}(x)+\eta^{i}\cdot(\log(x^{i}+\epsilon)+1)$ is $\nabla v_{\theta}^{i}(x)+\eta^{i}\cdot\text{Diag}(1/(x^{i}+\epsilon))$, so the regularized game admits a unique equilibrium, which is strongly stable (see Lemma B.2). Meanwhile, if $\epsilon>0$, the Jacobian is also upper-bounded, hence the function $v_{\theta}^{i}(x)+\eta^{i}\cdot(\log(x^{i}+\epsilon)+1)$ remains Lipschitz continuous. We summarize a few properties of this regularizer. First, as long as $\eta^{i}>0$ and $\epsilon>0$, all conditions in Assumption 5.1 are satisfied. Second, if $\eta^{i}>0$ and $\epsilon=0$, the resulting equilibrium is often referred to as a quantal response equilibrium, which is more realistic than a Nash (Wardrop) equilibrium if we believe that human beings are not always fully rational [42, 1, 33]. Third, if $\eta^{i}>0$ and $\epsilon>0$, the resulting equilibrium is a “smoothed” quantal response equilibrium: it is close to the quantal response equilibrium when $\epsilon$ is sufficiently small, while preserving Lipschitz continuity.
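To illustrate the effect of this regularizer in the routing example above, the short sketch below (again assuming NumPy; the values of $\eta$ and $\epsilon$ are arbitrary illustrative choices, and the regularizer is applied in the path-flow coordinates) compares the smallest eigenvalue of the Jacobian before and after regularization: the unregularized Jacobian is only positive semidefinite, while the regularized one is positive definite, which is what restores strong monotonicity and hence uniqueness of the equilibrium.

```python
import numpy as np

# Illustrative regularization parameters; not values used in the paper.
eta, eps = 1.0, 1e-3

Delta = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
t_prime = lambda x: np.array([4 * x[0]**3, 20 * x[1]**3, 120 * x[2]**3, 4 * x[3]**3])

q = np.array([3.0, 4.0, 3.0, 0.0])          # a feasible path flow
x = Delta @ q                                # induced link loads

J = Delta.T @ np.diag(t_prime(x)) @ Delta    # unregularized Jacobian: PSD, singular
J_reg = J + eta * np.diag(1.0 / (q + eps))   # Jacobian after adding eta*(log(q+eps)+1)

print(min(np.linalg.eigvalsh(J)))            # ~0: merely monotone
print(min(np.linalg.eigvalsh(J_reg)))        # > 0: strongly monotone, unique equilibrium
```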