Gradient-Free Nash Equilibrium Seeking in N-Cluster Games with Uncoordinated Constant Step-Sizes

Yipeng Pang and Guoqiang Hu

This research was supported by Singapore Ministry of Education Academic Research Fund Tier 1 RG180/17(2017-T1-002-158). Y. Pang and G. Hu are with the School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore [email protected], [email protected].

Abstract

This work investigates the problem of simultaneous global cost minimization and Nash equilibrium seeking, which commonly arises in $N$ -cluster non-cooperative games. Specifically, the agents in the same cluster collaborate to minimize a global cost function, given by the sum of their individual cost functions, and jointly play a non-cooperative game with the other clusters as players. We suppose that the explicit analytical expressions of the agents’ local cost functions are unknown, but the function values can be measured. We propose a gradient-free Nash equilibrium seeking algorithm that combines Gaussian smoothing techniques with gradient tracking. Furthermore, instead of using a uniform coordinated step-size, we allow the agents in different clusters to choose different constant step-sizes. When the largest step-size is sufficiently small, we prove linear convergence of the agents’ actions to a neighborhood of the unique Nash equilibrium under a strongly monotone game mapping condition, with the error gap being proportional to the largest step-size and the smoothing parameter. The performance of the proposed algorithm is validated by numerical simulations.

Index Terms:

Nash equilibrium (NE) seeking, gradient-free methods, non-cooperative games.

I Introduction

Cooperation and competition among multiple interacting agents have been extensively studied in recent years, especially in the context of distributed optimization and Nash equilibrium (NE) seeking in non-cooperative games. Specifically, distributed optimization deals with a cooperative minimization problem among a network of agents. On the other hand, NE seeking in non-cooperative games is concerned with a number of agents (also known as players), each of whom is self-interested and minimizes its individual cost given the other agents’ decisions.

To simultaneously model the cooperative and competitive behaviors in networked systems, an $N$ -cluster game is formulated. This game is essentially a non-cooperative game played among $N$ interacting clusters with each cluster being a virtual player. In each cluster, there is a group of agents who collaboratively minimize a cluster-level cost function given by a summation of their individual local cost functions. With these features, the $N$ -cluster game naturally accommodates both collaboration and competition in a unified framework, which motivates us to study and propose solutions to find its NE. In this paper, we consider such an $N$ -cluster non-cooperative game. We further suppose that the explicit analytical expressions of the agents’ local cost functions are unknown, but the function values can be measured.

A substantial body of work on NE seeking algorithms for non-cooperative games has been reported in the recent literature, including [1, 2, 3, 4, 5], to list a few. The focus of the aforementioned works is mainly on the competitive nature of non-cooperative games. In contrast, the works in [6, 7] considered two-subnetwork zero-sum games, where each subnetwork owns an opposing cost function to be cooperatively minimized by the agents in the corresponding subnetwork. An extension of such problems to $N$ subnetworks was first formulated in [8], and is known as an $N$ -cluster (or coalition) game. This problem has since received considerable research interest, including [9, 10, 11, 12, 13, 14, 15, 16]. Most of the above works focus on continuous-time methods, such as gradient play [9, 10, 11, 12], subgradient dynamics [13], projected primal-dual dynamics [14], and extremum-seeking techniques [15]. Our previous work in [16] introduced a discrete-time NE seeking strategy based on a synthesis of gradient-free and gradient-tracking techniques. This paper revisits the $N$ -cluster game, and aims to extend the results to uncoordinated step-sizes across different clusters.

Contributions: As compared to the aforementioned relevant works, the contributions of this work can be summarized as follows. 1) In contrast to the problem setups in [9, 10, 11, 12, 13, 14], we restrict the agents’ access to the cost functions: no explicit analytical expressions but only the values of the local cost functions can be utilized in the update laws. In this case, no gradient information can be directly utilized in the design of the algorithm. Hence, gradient-free techniques are adopted in this work. 2) As compared to our prior work [16], we extend the gradient-tracking method to allow uncoordinated constant step-sizes across different clusters, which further reduces the coordination among players from different clusters. 3) The technical challenges in the convergence analysis brought by gradient-tracking methods in games and by uncoordinated step-sizes are addressed in this work. Regarding convergence, we obtain linear convergence to a neighborhood of the unique NE, with the error being proportional to the largest step-size and the smoothing parameter under appropriate settings.

Notations: We use $\mathbf{1}_{m}$ ( $\mathbf{0}_{m}$ ) for an $m$ -dimensional vector with all elements being 1 (0), and $I_{m}$ for an $m\times m$ identity matrix. For a vector $\pi$ , we use $\text{diag}(\pi)$ to denote a diagonal matrix formed by the elements of $\pi$ . For any two vectors $u,v$ , their inner product is denoted by $\langle u,v\rangle$ . The transpose of $u$ is denoted by $u^{\top}$ . Moreover, we use $\|u\|$ for its standard Euclidean norm, i.e., $\|u\|=\sqrt{\langle u,u\rangle}$ . For vector $u$ , we use $[u]_{i}$ to denote its $i$ -th entry. The transpose and spectral norm of a matrix $A$ are denoted by $A^{\top}$ and $\|A\|$ , respectively. We use $\rho(A)$ to represent the spectral radius of a square matrix $A$ . The expectation operator is denoted by $\mathbb{E}[\cdot]$ .

II Problem Statement

II-A Problem Formulation

An $N$ -cluster game is defined by $\Gamma(\mathcal{N},\{f^{i}\},\{\mathbb{R}^{n_{i}}\})$ , where each cluster, indexed by $i\in\mathcal{N}\triangleq\{1,2,\ldots,N\}$ , consists of a group of agents, denoted by $\mathcal{V}^{i}\triangleq\{1,2,\ldots,n_{i}\}$ . Denote $n\triangleq\sum_{i=1}^{N}n_{i}$ . These agents aim to minimize a cluster-level cost function $f^{i}:\mathbb{R}^{n}\to\mathbb{R}$ , defined as

f^{i}(\mathbf{x}^{i},\mathbf{x}^{-i})\triangleq\frac{1}{n_{i}}\sum_{j=1}^{n_{i}}f^{i}_{j}(\mathbf{x}^{i},\mathbf{x}^{-i}),\quad\forall i\in\mathcal{N},

where $f^{i}_{j}(\mathbf{x}^{i},\mathbf{x}^{-i})$ is a local cost function of agent $j$ in cluster $i$ , $\mathbf{x}^{i}\triangleq[x^{i\top}_{1},\ldots,x^{i\top}_{n_{i}}]^{\top}\in\mathbb{R}^{n_{i}}$ is a collection of all agents’ actions in cluster $i$ with $x^{i}_{j}\in\mathbb{R}$ being the action of agent $j$ in cluster $i$ , and $\mathbf{x}^{-i}\in\mathbb{R}^{n-n_{i}}$ denotes a collection of all agents’ actions except cluster $i$ . Denote $\mathbf{x}\triangleq(\mathbf{x}^{i},\mathbf{x}^{-i})=[\mathbf{x}^{1\top},\ldots,\mathbf{x}^{N\top}]^{\top}$ .

Definition 1

(NE of $N$ -Cluster Games). A vector $\mathbf{x}^{*}\triangleq(\mathbf{x}^{i*},\mathbf{x}^{-i*})\in\mathbb{R}^{n}$ is said to be an NE of the $N$ -cluster non-cooperative game $\Gamma(\mathcal{N},\{f^{i}\},\{\mathbb{R}^{n_{i}}\})$ , if and only if

\displaystyle f^{i}(\mathbf{x}^{i*},\mathbf{x}^{-i*})\leq f^{i}(\mathbf{x}^{i},\mathbf{x}^{-i*}),\quad\forall\mathbf{x}^{i}\in\mathbb{R}^{n_{i}},\quad\forall i\in\mathcal{N}.

Within each cluster $i\in\mathcal{N}$ , there is an underlying directed communication network, denoted by $\mathcal{G}_{i}(\mathcal{V}^{i},\mathcal{E}^{i})$ with an adjacency matrix $\mathcal{A}^{i}\triangleq[a^{i}_{jk}]\in\mathbb{R}^{n_{i}\times n_{i}}$ . In particular, $a^{i}_{jk}>0$ if agent $j$ can directly pass information to agent $k$ , and $a^{i}_{jk}=0$ otherwise. We suppose $a^{i}_{jj}>0,\forall j\in\mathcal{V}^{i}$ . Regarding the communication network, we make the following standard assumption.

Assumption 1

For $i\in\mathcal{N}$ , the digraph $\mathcal{G}_{i}(\mathcal{V}^{i},\mathcal{E}^{i})$ is strongly connected. The adjacency matrix $\mathcal{A}^{i}$ is doubly-stochastic.

Noting that $\sigma_{\mathcal{A}^{i}}\triangleq\|\mathcal{A}^{i}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top}\|<1$ [17, Lemma 1], we define $\bar{\sigma}\triangleq\max_{i\in\mathcal{N}}\sigma_{\mathcal{A}^{i}}$ and $\varsigma\triangleq\max_{i\in\mathcal{N}}(1+\sigma_{\mathcal{A}^{i}}^{2})/(1-\sigma_{\mathcal{A}^{i}}^{2})$ .
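As an illustration of these quantities, the following minimal sketch computes $\sigma_{\mathcal{A}^{i}}$ , $\bar{\sigma}$ and $\varsigma$ for two hypothetical doubly-stochastic weight matrices. The matrices `A1` and `A2`, the helper names, and the use of power iteration to estimate the spectral norm are assumptions made here for illustration; they are not part of the original development.

```python
import math

def spectral_norm(M, iters=200):
    """Estimate ||M|| (largest singular value) by power iteration on M^T M."""
    rows, cols = len(M), len(M[0])
    v = [float(c + 1) for c in range(cols)]  # non-constant start vector
    for _ in range(iters):
        w = [sum(M[r][c] * v[c] for c in range(cols)) for r in range(rows)]  # w = M v
        u = [sum(M[r][c] * w[r] for r in range(rows)) for c in range(cols)]  # u = M^T w
        norm_u = math.sqrt(sum(val * val for val in u))
        if norm_u < 1e-300:  # M annihilates the iterate; treat the norm as zero
            return 0.0
        v = [val / norm_u for val in u]
    w = [sum(M[r][c] * v[c] for c in range(cols)) for r in range(rows)]
    return math.sqrt(sum(val * val for val in w))

def sigma(A):
    """sigma_A = ||A - (1/n) 1 1^T|| for a square weight matrix A."""
    n = len(A)
    return spectral_norm([[A[r][c] - 1.0 / n for c in range(n)] for r in range(n)])

# Hypothetical doubly-stochastic weights for two clusters (3 and 2 agents).
A1 = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
A2 = [[0.75, 0.25], [0.25, 0.75]]

sigmas = [sigma(A1), sigma(A2)]
sigma_bar = max(sigmas)
varsigma = max((1 + s * s) / (1 - s * s) for s in sigmas)
print(sigmas, sigma_bar, varsigma)
```

Both example matrices are symmetric, so $\sigma_{\mathcal{A}^{i}}$ coincides with the second-largest eigenvalue magnitude; for a general digraph the power iteration above is only a numerical estimate.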

Moreover, we consider the scenario where the explicit analytical expressions of the agents’ local cost functions are unknown, but the function values can be measured, similar to the settings in [15, 16, 18, 19]. Regarding the cost functions, we make the following standard assumption.

Assumption 2

For each $j\in\mathcal{V}^{i},i\in\mathcal{N}$ , the local cost function $f^{i}_{j}(\mathbf{x}^{i},\mathbf{x}^{-i})$ is convex in $\mathbf{x}^{i}$ , and continuously differentiable in $\mathbf{x}$ . The total gradient $\nabla f^{i}_{j}(\mathbf{x})$ is $L$ -Lipschitz continuous in $\mathbf{x}$ , i.e., for any $\mathbf{x},\mathbf{x}^{\prime}\in\mathbb{R}^{n}$ , $\|\nabla f^{i}_{j}(\mathbf{x})-\nabla f^{i}_{j}(\mathbf{x}^{\prime})\|\leq L\|\mathbf{x}-\mathbf{x}^{\prime}\|$ .

The game mapping of $\Gamma(\mathcal{N},\{f^{i}\},\{\mathbb{R}^{n_{i}}\})$ is defined as

\displaystyle\Phi(\mathbf{x})\triangleq[\nabla_{\mathbf{x}^{1}}f^{1}(\mathbf{x})^{\top},\ldots,\nabla_{\mathbf{x}^{N}}f^{N}(\mathbf{x})^{\top}]^{\top}.

We make the following standard assumption on the game mapping $\Phi(\mathbf{x})$ .

Assumption 3

The game mapping $\Phi$ of game $\Gamma$ is strongly monotone with a constant $\chi>0$ , i.e., for any $\mathbf{x},\mathbf{x}^{\prime}\in\mathbb{R}^{n}$ , we have $\langle\Phi(\mathbf{x})-\Phi(\mathbf{x}^{\prime}),\mathbf{x}-\mathbf{x}^{\prime}\rangle\geq\chi\|\mathbf{x}-\mathbf{x}^{\prime}\|^{2}$ .

Remark 1

It is known that under Assumptions 2 and 3, game $\Gamma$ admits a unique NE.
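To make Assumption 3 concrete, consider a hypothetical two-cluster quadratic game (an assumption introduced only for this example) with cluster costs $f^{i}(\mathbf{x})=\frac{1}{2}\|\mathbf{x}^{i}-\bar{\mathbf{c}}^{i}\|^{2}+b(\mathbf{1}^{\top}\mathbf{x}^{i})(\mathbf{1}^{\top}\mathbf{x}^{-i})$ , $n_{1}=n_{2}=2$ and $b=0.2$ . Its game mapping is affine with a symmetric Jacobian whose smallest eigenvalue is $1-2b$ , so it should be strongly monotone with $\chi=0.6$ . The sketch below checks the defining inequality on random pairs of points.

```python
import random

SIZES = [2, 2]                      # n_1, n_2 (hypothetical)
B_COUPLING = 0.2                    # coupling weight b (hypothetical)
C_BAR = [1.0, -1.0, 2.0, -2.0]      # stacked cluster-average targets (hypothetical)
CHI = 1.0 - 2.0 * B_COUPLING        # claimed strong-monotonicity constant

def game_mapping(x):
    """Phi(x): stacked partial gradients of the toy cluster costs."""
    out, start, total = [], 0, sum(x)
    for n_i in SIZES:
        own = x[start:start + n_i]
        others = total - sum(own)   # 1^T x^{-i}
        out.extend(xk - ck + B_COUPLING * others
                   for xk, ck in zip(own, C_BAR[start:start + n_i]))
        start += n_i
    return out

rng = random.Random(0)
min_ratio = float("inf")
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(4)]
    z = [rng.uniform(-5, 5) for _ in range(4)]
    d_phi = [a - b for a, b in zip(game_mapping(x), game_mapping(z))]
    d_x = [a - b for a, b in zip(x, z)]
    inner = sum(p * q for p, q in zip(d_phi, d_x))
    min_ratio = min(min_ratio, inner / sum(q * q for q in d_x))
print(min_ratio, CHI)
```

The smallest observed ratio $\langle\Phi(\mathbf{x})-\Phi(\mathbf{x}^{\prime}),\mathbf{x}-\mathbf{x}^{\prime}\rangle/\|\mathbf{x}-\mathbf{x}^{\prime}\|^{2}$ stays at or above $\chi$ ; a random check of this kind only illustrates the inequality and does not prove it.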

II-B Preliminaries

This part presents some preliminary results on gradient-free techniques based on Gaussian smoothing [20].

For $j\in\mathcal{V}^{i}$ , $i\in\mathcal{N}$ , a Gaussian-smoothed function of the local cost function $f^{i}_{j}(\mathbf{x})$ can be defined as

\displaystyle f^{i}_{j,\mu}(\mathbf{x})\triangleq\mathbb{E}_{\zeta\sim\mathcal{N}(\mathbf{0}_{n},I_{n})}[f^{i}_{j}(\mathbf{x}+\mu\zeta)],

(1)

where $\zeta$ is generated from a Gaussian distribution $\mathcal{N}(\mathbf{0}_{n},I_{n})$ , and $\mu\geq 0$ is a smoothing parameter.

For each cluster $i\in\mathcal{N}$ , the randomized gradient-free oracle of $f^{i}_{j}(\mathbf{x})$ for agent $j$ with respect to agent $k$ , $j,k\in\mathcal{V}^{i}$ , at time step $t\geq 0$ is defined as

\displaystyle{g}^{i}_{jk}(\mathbf{x}_{t})\triangleq\frac{f^{i}_{j}(\mathbf{x}_{t}+\mu\zeta^{i}_{j,t})-f^{i}_{j}(\mathbf{x}_{t})}{\mu}[\zeta^{i}_{j,t}]^{i}_{k},

(2)

where $[\zeta^{i}_{j,t}]^{i}_{k}$ denotes the $(\sum_{l=0}^{i-1}n_{l}+k)$ -th element of $\zeta^{i}_{j,t}$ with $n_{0}=0$ , $\zeta^{i}_{j,t}$ is agent $j$ ’s own realization of $\zeta$ at time step $t$ , and $\mu>0$ . The oracle (2) is useful because it is an unbiased estimate of the partial gradient of the Gaussian-smoothed cost function, $\nabla_{x^{i}_{k}}f^{i}_{j,\mu}(\mathbf{x}_{t})$ . The following results for $f^{i}_{j,\mu}(\mathbf{x})$ and ${g}^{i}_{jk}(\mathbf{x})$ can be readily obtained according to [20].
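A minimal sketch of the oracle (2) is given below. The quadratic test function `f`, the cluster sizes, and the helper names `global_index` and `oracle` are hypothetical choices made for illustration. The entry of $\zeta$ that is used is the one at the position of agent $k$ of cluster $i$ in the stacked vector $\mathbf{x}$ , i.e., after the entries of all preceding clusters. For a quadratic function the smoothed gradient coincides with the true gradient, so a Monte Carlo average of the oracle should be close to the exact partial derivative.

```python
import random

def global_index(sizes, i, k):
    """0-based position in x of agent k (1-based) of cluster i (1-based)."""
    return sum(sizes[:i - 1]) + (k - 1)

def oracle(f, x, mu, idx, rng):
    """Two-point Gaussian-smoothing estimate of the idx-th partial derivative."""
    zeta = [rng.gauss(0.0, 1.0) for _ in x]          # fresh direction zeta ~ N(0, I)
    shifted = [xv + mu * z for xv, z in zip(x, zeta)]
    return (f(shifted) - f(x)) / mu * zeta[idx]

CENTER = [0.0, 1.0, 0.0]                             # hypothetical minimizer

def f(x):
    """Hypothetical local cost: 0.5 * ||x - CENTER||^2."""
    return 0.5 * sum((xv - c) ** 2 for xv, c in zip(x, CENTER))

sizes = [1, 2]                                       # n_1 = 1, n_2 = 2
x = [1.0, 0.0, 0.5]
idx = global_index(sizes, 2, 1)                      # agent 1 of cluster 2
rng = random.Random(0)
samples = 100000
estimate = sum(oracle(f, x, 1e-3, idx, rng) for _ in range(samples)) / samples
true_partial = x[idx] - CENTER[idx]                  # exact partial derivative
print(idx, estimate, true_partial)
```

Each call uses two function evaluations and no derivative information. A single call is noisy; only the average over many calls approaches the partial derivative.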

Lemma 1

(see [20]) Under Assumption 2, for $\forall j,k\in\mathcal{V}^{i},i\in\mathcal{N}$ , the following properties hold.

1) The function $f^{i}_{j,\mu}(\mathbf{x})$ is convex in $\mathbf{x}^{i}$ and totally differentiable in $\mathbf{x}$ .

2) The total gradient $\nabla f^{i}_{j,\mu}(\mathbf{x})$ is $L$ -Lipschitz continuous in $\mathbf{x}$ , i.e., $\forall\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}$ , $\|\nabla f^{i}_{j,\mu}(\mathbf{x})-\nabla f^{i}_{j,\mu}(\mathbf{y})\|\leq L\|\mathbf{x}-\mathbf{y}\|$ ; and satisfies that $\|\nabla f^{i}_{j,\mu}(\mathbf{x})-\nabla f^{i}_{j}(\mathbf{x})\|\leq\frac{1}{2}(n+3)^{\frac{3}{2}}L\mu$ .

3) The randomized gradient-free oracle ${g}^{i}_{jk}(\mathbf{x})$ satisfies that $\mathbb{E}[{g}^{i}_{jk}(\mathbf{x})]=\nabla_{x^{i}_{k}}f^{i}_{j,\mu}(\mathbf{x})$ , and $\mathbb{E}[\|{g}^{i}_{jk}(\mathbf{x})\|^{2}]\leq 4(n+4)\|\nabla f^{i}_{j,\mu}(\mathbf{x})\|^{2}+3(n+4)^{3}\mu^{2}L^{2}$ .
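The smoothing bias bound in the second property can be checked numerically in a scalar case. The sketch below assumes the hypothetical cost $f(x)=\log(1+e^{x})$ with $n=1$ , whose derivative is the sigmoid function and is Lipschitz with $L=1/4$ , so the bound $\frac{1}{2}(n+3)^{\frac{3}{2}}L\mu$ reduces to $\mu$ . The smoothed derivative $\mathbb{E}[f^{\prime}(x+\mu\zeta)]$ is approximated by a trapezoidal rule; the function and the quadrature parameters are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Derivative of the hypothetical cost f(x) = log(1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-z))

def smoothed_derivative(x, mu, half_width=10.0, steps=4000):
    """Trapezoidal approximation of E[f'(x + mu * zeta)], zeta ~ N(0, 1)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for s in range(steps + 1):
        z = -half_width + s * h
        weight = 0.5 if s in (0, steps) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * sigmoid(x + mu * z) * density
    return total * h

n, lipschitz, x0 = 1, 0.25, 0.7
results = []
for mu in (0.01, 0.1, 0.5):
    bias = abs(smoothed_derivative(x0, mu) - sigmoid(x0))
    bound = 0.5 * (n + 3) ** 1.5 * lipschitz * mu
    results.append((mu, bias, bound))
    print(mu, bias, bound)
```

In this example the observed bias is far below the bound, which is consistent with the bound being a worst-case estimate over all functions satisfying Assumption 2.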

We define a Gaussian-smoothed game associated with the $N$ -cluster game $\Gamma$ , denoted by $\Gamma_{\mu}(\mathcal{N},\{f^{i}_{\mu}\},\{\mathbb{R}^{n_{i}}\})$ , having the same set of clusters and action sets as game $\Gamma$ , but the cost function is given by

\displaystyle f^{i}_{\mu}(\mathbf{x}^{i},\mathbf{x}^{-i})\triangleq\frac{1}{n_{i}}\sum_{j=1}^{n_{i}}f^{i}_{j,\mu}(\mathbf{x}^{i},\mathbf{x}^{-i}),\quad\forall i\in\mathcal{N},

where $f^{i}_{j,\mu}$ is a Gaussian-smoothed function of $f^{i}_{j}$ defined in (1). Similar to the game mapping of $\Gamma$ , we define the game mapping of $\Gamma_{\mu}$ by $\Phi_{\mu}(\mathbf{x})\triangleq[\nabla_{\mathbf{x}^{1}}f^{1}_{\mu}(\mathbf{x})^{\top},\ldots,\nabla_{\mathbf{x}^{N}}f^{N}_{\mu}(\mathbf{x})^{\top}]^{\top}$ . The following lemma shows the strong monotonicity condition of $\Phi_{\mu}(\mathbf{x})$ , and quantifies the distance between the NE of the smoothed game $\Gamma_{\mu}$ and the NE of the original game $\Gamma$ in terms of the smoothing parameter $\mu$ .

Lemma 2

(see [16, Lemma 1]) Under Assumptions 2 and 3, for $\forall\mu\geq 0$ , the smoothed game $\Gamma_{\mu}(\mathcal{N},\{f^{i}_{\mu}\},\{\mathbb{R}^{n_{i}}\})$ holds that

1) The game mapping $\Phi_{\mu}(\mathbf{x})$ is $\chi$ -strongly monotone.

2) The smoothed game $\Gamma_{\mu}$ admits a unique NE (denoted by $\mathbf{x}_{\mu}^{*}$ ) satisfying that

\displaystyle\|\mathbf{x}_{\mu}^{*}-\mathbf{x}^{*}\|\leq\frac{n(n+3)^{\frac{3}{2}}L\gamma}{2(1-\sqrt{1-\gamma\chi})}\mu,

where $\mathbf{x}^{*}$ is the unique NE of the original game $\Gamma$ , and $\gamma\in(0,\frac{\chi}{n^{2}L^{2}}]$ is a constant.
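To get a feeling for the size of this bound, the following sketch evaluates the coefficient multiplying $\mu$ for hypothetical values of $n$ , $L$ and $\chi$ (these numbers are assumptions, not taken from any experiment). Since $1-\sqrt{1-z}\geq z/2$ for $z\in[0,1]$ , the coefficient is never larger than $n(n+3)^{\frac{3}{2}}L/\chi$ , which the code also reports.

```python
import math

def ne_gap_coefficient(n, lipschitz, chi, gamma=None):
    """Coefficient c in ||x_mu^* - x^*|| <= c * mu from Lemma 2."""
    gamma_max = chi / (n ** 2 * lipschitz ** 2)
    if gamma is None:
        gamma = gamma_max                      # largest admissible gamma
    if not 0.0 < gamma <= gamma_max:
        raise ValueError("gamma must lie in (0, chi / (n^2 L^2)]")
    denom = 2.0 * (1.0 - math.sqrt(1.0 - gamma * chi))
    return n * (n + 3) ** 1.5 * lipschitz * gamma / denom

# Hypothetical problem constants.
n, lipschitz, chi = 4, 1.5, 0.6
coeff = ne_gap_coefficient(n, lipschitz, chi)
simple_upper = n * (n + 3) ** 1.5 * lipschitz / chi
print(coeff, simple_upper)
```

Under these assumed constants, a smoothing parameter of $\mu=10^{-3}$ would guarantee a gap between the two equilibria of at most `coeff * 1e-3`.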

It follows from Lemma 2 that $\mathbf{x}^{*}_{\mu}$ is the unique NE of the smoothed game $\Gamma_{\mu}(\mathcal{N},\{f^{i}_{\mu}\},\{\mathbb{R}^{n_{i}}\})$ , and hence satisfies $\Phi_{\mu}(\mathbf{x}^{*}_{\mu})=\mathbf{0}_{n}$ . We define $G\triangleq\max_{j\in\mathcal{V}^{i},i\in\mathcal{N}}\|\nabla f^{i}_{j,\mu}(\mathbf{x}^{*}_{\mu})\|$ .

III NE Seeking Algorithm for N-Cluster Games

III-A Algorithm

In this part, we present an NE seeking strategy for the $N$ -cluster game. Specifically, each agent $j\in\mathcal{V}^{i}$ , $i\in\mathcal{N}$ maintains its own action variable $x^{i}_{j}$ , together with auxiliary variables $y^{i}_{jk}$ and gradient tracker variables $\varphi^{i}_{jk}$ for all $k\in\mathcal{V}^{i}$ . Let $x^{i}_{j,t},y^{i}_{jk,t},\varphi^{i}_{jk,t}$ denote the values of these variables at time-step $t$ . The update laws for each agent $j\in\mathcal{V}^{i}$ , $i\in\mathcal{N}$ are designed as


$\displaystyle y^{i}_{jk,t+1}$	$\displaystyle=\sum_{l=1}^{n_{i}}a^{i}_{jl}y^{i}_{lk,t}-\alpha^{i}\varphi^{i}_{jk,t},$	(3a)
$\displaystyle x^{i}_{j,t+1}$	$\displaystyle=y^{i}_{jj,t+1},$	(3b)
$\displaystyle\varphi^{i}_{jk,t+1}$	$\displaystyle=\sum_{l=1}^{n_{i}}a^{i}_{jl}\varphi^{i}_{lk,t}+{g}^{i}_{jk}(\mathbf{x}_{t+1})-{g}^{i}_{jk}(\mathbf{x}_{t}),$	(3c)

with arbitrary $x^{i}_{j,0},y^{i}_{jk,0}\in\mathbb{R}$ and $\varphi^{i}_{jk,0}={g}^{i}_{jk}(\mathbf{x}_{0})$ , where ${g}^{i}_{jk}(\mathbf{x}_{t})$ is the gradient estimator given by (2), and $\alpha^{i}>0$ is a constant step-size adopted by all agents in cluster $i\in\mathcal{N}$ . Denote the largest step-size by ${\alpha}_{\max}\triangleq\max_{i\in\mathcal{N}}\alpha^{i}$ and the average of all step-sizes by $\bar{\alpha}\triangleq\frac{1}{n}\sum_{i\in\mathcal{N}}n_{i}\alpha^{i}$ . Define the heterogeneity of the step-sizes as the ratio $\epsilon_{\alpha}\triangleq\|\bm{\alpha}-\bar{\bm{\alpha}}\|/\|\bar{\bm{\alpha}}\|$ , where $\bm{\alpha}\triangleq[\alpha^{1}\mathbf{1}^{\top}_{n_{1}},\ldots,\alpha^{N}\mathbf{1}^{\top}_{n_{N}}]^{\top}$ and $\bar{\bm{\alpha}}\triangleq\bar{\alpha}\mathbf{1}_{n}$ .
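The update laws (3a)-(3c) can be sketched in a few lines of code. The example below runs them on a hypothetical two-cluster quadratic game with local costs $f^{i}_{j}(\mathbf{x})=\frac{1}{2}\|\mathbf{x}^{i}-\mathbf{c}^{i}_{j}\|^{2}+b(\mathbf{1}^{\top}\mathbf{x}^{i})(\mathbf{1}^{\top}\mathbf{x}^{-i})$ , chosen here so that the NE is $\mathbf{x}^{*}=[1,-1,2,-2]^{\top}$ . The game, the weights, the step-sizes and the smoothing parameter are all assumptions made for illustration and have not been checked against the bounds of the convergence analysis. Each agent draws one Gaussian direction per iteration and stores its previous oracle values, so that the term ${g}^{i}_{jk}(\mathbf{x}_{t})$ in (3c) is the same realization that was computed at time-step $t$ .

```python
import random

SIZES = [2, 2]                    # n_1, n_2 (hypothetical toy game)
OFFSETS = [0, 2]                  # first coordinate of each cluster in x
N_TOTAL = sum(SIZES)
B_COUPLING = 0.2                  # coupling weight b (hypothetical)
C = [[[2.0, 0.0], [0.0, -2.0]],   # C[i][j]: target vector of agent j in cluster i
     [[3.0, -1.0], [1.0, -3.0]]]
X_STAR = [1.0, -1.0, 2.0, -2.0]   # NE of this toy game
A = [[0.5, 0.5], [0.5, 0.5]]      # doubly-stochastic weights used by both clusters
ALPHA = [0.005, 0.006]            # uncoordinated step-sizes alpha^1, alpha^2
MU = 1e-3                         # smoothing parameter

def local_cost(i, j, x):
    own = x[OFFSETS[i]:OFFSETS[i] + SIZES[i]]
    others = sum(x) - sum(own)
    quad = 0.5 * sum((xk - ck) ** 2 for xk, ck in zip(own, C[i][j]))
    return quad + B_COUPLING * sum(own) * others

def oracle(i, j, x, rng):
    """[g^i_{jk}(x) for k in cluster i], with one Gaussian direction for agent j."""
    zeta = [rng.gauss(0.0, 1.0) for _ in range(N_TOTAL)]
    shifted = [xv + MU * z for xv, z in zip(x, zeta)]
    slope = (local_cost(i, j, shifted) - local_cost(i, j, x)) / MU
    return [slope * zeta[OFFSETS[i] + k] for k in range(SIZES[i])]

def all_oracles(x, rng):
    return [[oracle(i, j, x, rng) for j in range(SIZES[i])] for i in range(len(SIZES))]

def mix(i, table, j, k):
    """Consensus term sum_l a^i_{jl} * table[i][l][k]."""
    return sum(A[j][l] * table[i][l][k] for l in range(SIZES[i]))

def run(steps, seed=0):
    rng = random.Random(seed)
    clusters = range(len(SIZES))
    y = [[[0.0] * SIZES[i] for _ in range(SIZES[i])] for i in clusters]
    x = [y[i][j][j] for i in clusters for j in range(SIZES[i])]
    g = all_oracles(x, rng)
    phi = [[row[:] for row in g[i]] for i in clusters]        # phi_0 = g(x_0)
    errors = []
    for _ in range(steps):
        # (3a): consensus on y plus a step along the current tracker
        y = [[[mix(i, y, j, k) - ALPHA[i] * phi[i][j][k] for k in range(SIZES[i])]
              for j in range(SIZES[i])] for i in clusters]
        # (3b): each agent applies its own coordinate
        x = [y[i][j][j] for i in clusters for j in range(SIZES[i])]
        # (3c): tracker update with new and stored oracle values
        g_new = all_oracles(x, rng)
        phi = [[[mix(i, phi, j, k) + g_new[i][j][k] - g[i][j][k]
                 for k in range(SIZES[i])] for j in range(SIZES[i])] for i in clusters]
        g = g_new
        errors.append(sum((a - b) ** 2 for a, b in zip(x, X_STAR)))
    return x, errors

x_final, errors = run(4000)
tail_error = sum(errors[-500:]) / 500     # average squared error near the end
print(errors[0], tail_error)
```

In this toy run the squared distance to the NE drops from roughly $\|\mathbf{x}^{*}\|^{2}=10$ to a small residual level and then fluctuates there, which is the qualitative behavior one would expect from a constant step-size combined with a noisy gradient estimator.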

III-B Main Results

This part presents the main result for the proposed algorithm, as stated in the following theorem. A detailed proof is given in Sec. IV-B.

Theorem 1

Suppose Assumptions 1, 2 and 3 hold. Generate the auxiliary variables $\{y^{i}_{jk,t}\}_{t\geq 0}$ , the agent’s action $\{x^{i}_{j,t}\}_{t\geq 0}$ and gradient tracker $\{\varphi^{i}_{jk,t}\}_{t\geq 0}$ by (3) with the uncoordinated constant step-size $\alpha^{i}$ satisfying $\epsilon_{\alpha}<\frac{\chi}{2\sqrt{n}L}$ and

\displaystyle 0<{\alpha}_{\max}<\min\bigg{\{}\alpha_{1},\alpha_{2},\alpha_{3},\frac{1}{\chi-2\sqrt{n}L\epsilon_{\alpha}},1\bigg{\}},

where $\alpha_{1}$ , $\alpha_{2}$ and $\alpha_{3}$ are defined in Sec. IV-B. Then, all players’ decisions $\mathbf{x}_{t}$ converge linearly to a neighborhood of the unique NE $\mathbf{x}^{*}$ , and

\displaystyle\limsup_{t\to\infty}\mathbb{E}[\|\mathbf{x}_{t}-\mathbf{x}^{*}\|^{2}]\leq\mathcal{O}({\alpha}_{\max})+\mathcal{O}(\mu).

Remark 2

Theorem 1 characterizes the convergence performance of the proposed algorithm. It shows that the agents’ actions converge to a neighborhood of the NE linearly with the error bounded by two terms: one is proportional to the largest step-size, and the other is proportional to the smoothing parameter due to the gradient estimation.
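The step-size quantities entering Theorem 1 are straightforward to compute. The sketch below evaluates ${\alpha}_{\max}$ , $\bar{\alpha}$ and $\epsilon_{\alpha}$ for given per-cluster step-sizes and checks the heterogeneity condition $\epsilon_{\alpha}<\frac{\chi}{2\sqrt{n}L}$ . The numerical values of $\chi$ and $L$ are hypothetical, and the constants $\alpha_{1}$ , $\alpha_{2}$ , $\alpha_{3}$ are not included, so `partial_step_bound` covers only the last two terms of the minimum.

```python
import math

def step_size_stats(alphas, sizes):
    """Return (alpha_max, alpha_bar, eps_alpha) for per-cluster step-sizes."""
    n = sum(sizes)
    alpha_max = max(alphas)
    alpha_bar = sum(n_i * a for n_i, a in zip(sizes, alphas)) / n
    # ||alpha - alpha_bar * 1||: the step-size of cluster i is repeated n_i times.
    dev = math.sqrt(sum(n_i * (a - alpha_bar) ** 2 for n_i, a in zip(sizes, alphas)))
    eps_alpha = dev / (math.sqrt(n) * alpha_bar)
    return alpha_max, alpha_bar, eps_alpha

def heterogeneity_ok(eps_alpha, chi, n, lipschitz):
    """Check eps_alpha < chi / (2 sqrt(n) L)."""
    return eps_alpha < chi / (2.0 * math.sqrt(n) * lipschitz)

def partial_step_bound(eps_alpha, chi, n, lipschitz):
    """min{1 / (chi - 2 sqrt(n) L eps_alpha), 1}; alpha_1..alpha_3 are NOT included."""
    margin = chi - 2.0 * math.sqrt(n) * lipschitz * eps_alpha
    if margin <= 0.0:
        raise ValueError("heterogeneity condition violated")
    return min(1.0 / margin, 1.0)

# Hypothetical constants and step-sizes.
chi, lipschitz, sizes = 0.6, 1.5, [2, 2]
n = sum(sizes)
alpha_max, alpha_bar, eps_alpha = step_size_stats([0.005, 0.006], sizes)
ok_mild = heterogeneity_ok(eps_alpha, chi, n, lipschitz)
ok_wide = heterogeneity_ok(step_size_stats([0.005, 0.008], sizes)[2], chi, n, lipschitz)
bound = partial_step_bound(eps_alpha, chi, n, lipschitz)
print(alpha_max, alpha_bar, eps_alpha, ok_mild, ok_wide, bound)
```

With these assumed constants, the mildly heterogeneous pair $(0.005,0.006)$ satisfies the condition while the pair $(0.005,0.008)$ does not, which shows that the condition limits how far the clusters’ step-sizes may spread relative to their average.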

IV Convergence Analysis

Let $\mathcal{H}_{t}$ denote the $\sigma$ -field generated by the entire history of the random variables from time-step 0 to $t-1$ . We introduce the following notation. Let $n_{s}\triangleq\sum_{i=1}^{N}n_{i}^{2}$ and $n_{c}\triangleq\sum_{i=1}^{N}n_{i}^{3}$ . For all $k\in\mathcal{V}^{i},i\in\mathcal{N}$ , define $\mathbf{y}^{i}_{k,t}\triangleq[y^{i}_{1k,t},\ldots,y^{i}_{n_{i}k,t}]^{\top}\in\mathbb{R}^{n_{i}}$ , $\bar{y}^{i}_{k,t}\triangleq\frac{1}{n_{i}}\mathbf{1}_{n_{i}}^{\top}\mathbf{y}^{i}_{k,t}\in\mathbb{R}$ , $\bar{\mathbf{y}}^{i}_{t}\triangleq[\bar{y}^{i}_{1,t},\ldots,\bar{y}^{i}_{n_{i},t}]^{\top}\in\mathbb{R}^{n_{i}}$ , $\bar{\mathbf{y}}_{t}\triangleq[\bar{\mathbf{y}}^{1\top}_{t},\ldots,\bar{\mathbf{y}}^{N\top}_{t}]^{\top}\in\mathbb{R}^{n}$ , $\varphi^{i}_{k,t}\triangleq[\varphi^{i}_{1k,t},\ldots,\varphi^{i}_{n_{i}k,t}]^{\top}\in\mathbb{R}^{n_{i}}$ , $\bar{\varphi}^{i}_{k,t}\triangleq\frac{1}{n_{i}}\mathbf{1}_{n_{i}}^{\top}\varphi^{i}_{k,t}\in\mathbb{R}$ , $\bar{\bm{\varphi}}^{i}_{t}\triangleq[\bar{\varphi}^{i}_{1,t},\ldots,\bar{\varphi}^{i}_{n_{i},t}]^{\top}\in\mathbb{R}^{n_{i}}$ , $\mathbf{g}^{i}_{k}\triangleq[{g}^{i}_{1k},\ldots,{g}^{i}_{n_{i}k}]^{\top}\in\mathbb{R}^{n_{i}}$ , $\bar{{g}}^{i}_{k}\triangleq\frac{1}{n_{i}}\mathbf{1}_{n_{i}}^{\top}\mathbf{g}^{i}_{k}\in\mathbb{R}$ , and $\nabla_{x^{i}_{k}}\mathbf{f}^{i}_{\mu}(\mathbf{x})\triangleq[\nabla_{x^{i}_{k}}f^{i}_{1,\mu}(\mathbf{x}),\ldots,\nabla_{x^{i}_{k}}f^{i}_{n_{i},\mu}(\mathbf{x})]^{\top}\in\mathbb{R}^{n_{i}}$ . Then, the update laws (3a) and (3c) read:


$\displaystyle\mathbf{y}^{i}_{k,t+1}$	$\displaystyle=\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\alpha^{i}\varphi^{i}_{k,t},$	(4a)
$\displaystyle\varphi^{i}_{k,t+1}$	$\displaystyle=\mathcal{A}^{i}\varphi^{i}_{k,t}+\mathbf{g}^{i}_{k}(\mathbf{x}_{t+1})-\mathbf{g}^{i}_{k}(\mathbf{x}_{t}).$	(4b)

Pre-multiplying both sides of (4a) by $\frac{1}{n_{i}}\mathbf{1}_{n_{i}}^{\top}$ and augmenting the relation for $k\in\mathcal{V}^{i}$ , we have

\displaystyle\bar{\mathbf{y}}^{i}_{t+1}=\bar{\mathbf{y}}^{i}_{t}-\alpha^{i}\bar{\bm{\varphi}}^{i}_{t}.

(5)

The convergence analysis of the proposed algorithm will be conducted by: 1) constructing a linear system of three terms $\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$ , $\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}$ and $\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}$ in terms of their past iterations and some constants, and 2) analyzing the convergence of the established linear system.

IV-A Auxiliary Analysis

We first derive some results for the averaged gradient tracker $\bar{\varphi}^{i}_{k,t}$ .

Lemma 3

Under Assumptions 1 and 2, the averaged gradient tracker $\bar{\varphi}^{i}_{k,t},\forall k\in\mathcal{V}^{i},i\in\mathcal{N}$ holds that

1) $\bar{\varphi}^{i}_{k,t}=\bar{{g}}^{i}_{k}(\mathbf{x}_{t})$ ,

2) $\mathbb{E}[\bar{\varphi}^{i}_{k,t}|\mathcal{H}_{t}]=\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}_{t})$ ,

3) $\mathbb{E}[\|\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]\leq 12(n+4)L^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}+12(n+4)L^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+12(n+4)G^{2}+3(n+4)^{3}\mu^{2}L^{2}$ .

Proof: For 1), multiplying $\frac{1}{n_{i}}\mathbf{1}_{n_{i}}^{\top}$ from the left on both sides of (4b), and noting that $\mathcal{A}^{i}$ is doubly stochastic, we have

\displaystyle\bar{\varphi}^{i}_{k,t+1}=\bar{\varphi}^{i}_{k,t}+\bar{{g}}^{i}_{k}(\mathbf{x}_{t+1})-\bar{{g}}^{i}_{k}(\mathbf{x}_{t}).

Recursively expanding the above relation and noting that $\varphi^{i}_{k,0}=\mathbf{g}^{i}_{k}(\mathbf{x}_{0})$ completes the proof.

For 2), following the result of part 1) and Lemma 1-3), we obtain

\displaystyle\mathbb{E}[\bar{\varphi}^{i}_{k,t}|\mathcal{H}_{t}]=\mathbb{E}[\bar{{g}}^{i}_{k}(\mathbf{x}_{t})|\mathcal{H}_{t}]=\frac{1}{n_{i}}\mathbf{1}_{n_{i}}^{\top}\mathbb{E}[\mathbf{g}^{i}_{k}|\mathcal{H}_{t}]=\frac{1}{n_{i}}\mathbf{1}_{n_{i}}^{\top}\nabla_{x^{i}_{k}}\mathbf{f}^{i}_{\mu}=\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}_{t}).

For 3), it follows that

\displaystyle\mathbb{E}[\|\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]=\mathbb{E}[\|\bar{{g}}^{i}_{k}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}]=\frac{1}{n_{i}^{2}}\mathbb{E}[\|\mathbf{1}_{n_{i}}^{\top}\mathbf{g}^{i}_{k}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}]\leq\frac{1}{n_{i}}\sum_{j=1}^{n_{i}}\mathbb{E}[\|{g}^{i}_{jk}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}].

On the other hand, it follows from Lemma 1-3) that

$\displaystyle\mathbb{E}[\|{g}^{i}_{jk}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}]$	$\displaystyle\leq 4(n+4)\|\nabla f^{i}_{j,\mu}(\mathbf{x}_{t})\|^{2}+3(n+4)^{3}\mu^{2}L^{2}$
	$\displaystyle\leq 12(n+4)\|\nabla f^{i}_{j,\mu}(\mathbf{x}_{t})-\nabla f^{i}_{j,\mu}(\bar{\mathbf{y}}_{t})\|^{2}+12(n+4)\|\nabla f^{i}_{j,\mu}(\bar{\mathbf{y}}_{t})-\nabla f^{i}_{j,\mu}(\mathbf{x}^{*}_{\mu})\|^{2}$
	$\displaystyle\quad+12(n+4)\|\nabla f^{i}_{j,\mu}(\mathbf{x}^{*}_{\mu})\|^{2}+3(n+4)^{3}\mu^{2}L^{2}$
	$\displaystyle\leq 12(n+4)L^{2}\|\mathbf{x}_{t}-\bar{\mathbf{y}}_{t}\|^{2}+12(n+4)L^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}$
	$\displaystyle\quad+12(n+4)G^{2}+3(n+4)^{3}\mu^{2}L^{2}$
	$\displaystyle\leq 12(n+4)L^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$
	$\displaystyle\quad+12(n+4)(L^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+G^{2})+3(n+4)^{3}\mu^{2}L^{2},$	(6)

where $G\triangleq\max_{j\in\mathcal{V}^{i},i\in\mathcal{N}}\|\nabla f^{i}_{j,\mu}(\mathbf{x}^{*}_{\mu})\|$ and the last inequality follows from (3b) that

\displaystyle\|\mathbf{x}_{t}-\bar{\mathbf{y}}_{t}\|^{2}=\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|y^{i}_{kk,t}-\bar{y}^{i}_{k,t}\|^{2}\leq\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}.

(7)

The proof is completed by combining the above relations. ∎
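The averaging relations used above, namely (5) and the first property of Lemma 3, rely only on the weights being doubly stochastic and on the initialization $\varphi^{i}_{k,0}=\mathbf{g}^{i}_{k}(\mathbf{x}_{0})$ . They therefore hold for any sequence of oracle values. The sketch below checks both identities for one coordinate $k$ of a hypothetical three-agent cluster on a directed ring, with uniformly random numbers standing in for the oracle values; the weights, step-size and stand-in values are assumptions for illustration.

```python
import random

# Hypothetical doubly-stochastic weights of a directed ring with self-loops.
A = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
ALPHA = 0.05

def mean(v):
    return sum(v) / len(v)

def mix(v):
    """One consensus step A v."""
    return [sum(A[j][l] * v[l] for l in range(len(v))) for j in range(len(A))]

rng = random.Random(1)
y = [rng.uniform(-1, 1) for _ in range(3)]
g = [rng.uniform(-1, 1) for _ in range(3)]    # stand-in for g_k(x_0)
phi = g[:]                                    # phi_0 = g_k(x_0)
max_gap_avg = 0.0
max_gap_tracker = 0.0
for _ in range(200):
    y_next = [m - ALPHA * p for m, p in zip(mix(y), phi)]          # (4a)
    # (5): the cluster average moves along the averaged tracker
    gap = abs(mean(y_next) - (mean(y) - ALPHA * mean(phi)))
    max_gap_avg = max(max_gap_avg, gap)
    g_next = [rng.uniform(-1, 1) for _ in range(3)]                # stand-in oracle
    phi = [m + gn - go for m, gn, go in zip(mix(phi), g_next, g)]  # (4b)
    y, g = y_next, g_next
    # Lemma 3-1): averaged tracker equals averaged oracle value
    max_gap_tracker = max(max_gap_tracker, abs(mean(phi) - mean(g)))
print(max_gap_avg, max_gap_tracker)
```

Both gaps stay at the level of floating-point round-off, even though the individual trackers differ from one another.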

Then, we provide a bound on the stacked gradient tracker $\varphi^{i}_{k,t}$ .

Lemma 4

Under Assumptions 1 and 2, the stacked gradient tracker $\{\varphi^{i}_{k,t}\}_{t\geq 0},\forall k\in\mathcal{V}^{i},i\in\mathcal{N}$ holds that

	$\displaystyle\mathbb{E}[\|\varphi^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]$	$\displaystyle\leq 2\mathbb{E}[\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]+24n_{i}^{2}(n+4)L^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$
		$\displaystyle\quad+24n_{i}^{2}(n+4)(L^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+G^{2})+6n_{i}^{2}(n+4)^{3}\mu^{2}L^{2}.$

Proof: It is noted that

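\displaystyle\|\varphi^{i}_{k,t}\|^{2}\leq 2\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}+2\|\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}\leq 2\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}+2n_{i}^{2}\|\bar{\varphi}^{i}_{k,t}\|^{2},

where the last step uses $\|\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}=n_{i}\|\bar{\varphi}^{i}_{k,t}\|^{2}\leq n_{i}^{2}\|\bar{\varphi}^{i}_{k,t}\|^{2}$ .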

The proof is completed by taking the conditional expectation on $\mathcal{H}_{t}$ on both sides and substituting Lemma 3-3). ∎

Now, we are ready to establish an inequality relation for the first term, $\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$ in Lemma 5.

Lemma 5

Under Assumptions 1, 2 and 3, the total consensus error of the auxiliary variables $\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$ satisfies that

	$\displaystyle\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\mathbf{y}^{i}_{k,t+1}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t+1}\|^{2}|\mathcal{H}_{t}]\leq\bigg{(}\frac{1+\bar{\sigma}^{2}}{2}+24(n+4)n_{c}\varsigma L^{2}\alpha_{\max}^{2}\bigg{)}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$
	$\displaystyle\quad+24(n+4)n_{c}\varsigma L^{2}\alpha_{\max}^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+2\varsigma\alpha_{\max}^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad+24(n+4)n_{c}\varsigma G^{2}\alpha_{\max}^{2}+6(n+4)^{3}n_{c}\varsigma\mu^{2}L^{2}\alpha_{\max}^{2}.$

Proof: It follows from (4a) that for $i\in\mathcal{N}$

	$\displaystyle\|\mathbf{y}^{i}_{k,t+1}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t+1}\|^{2}$	$\displaystyle=\|\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\alpha^{i}\varphi^{i}_{k,t}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top}(\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\alpha^{i}\varphi^{i}_{k,t})\|^{2}$
		$\displaystyle\leq\|\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}+\|\alpha^{i}(I_{n_{i}}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top})\varphi^{i}_{k,t}\|^{2}$
		$\displaystyle\quad-2\alpha^{i}\langle\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t},(I_{n_{i}}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top})\varphi^{i}_{k,t}\rangle.$

Taking the conditional expectation on $\mathcal{H}_{t}$ and noting that $\|I_{n_{i}}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top}\|=1$ , we obtain

	$\displaystyle\mathbb{E}[\|\mathbf{y}^{i}_{k,t+1}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t+1}\|^{2}|\mathcal{H}_{t}]\leq\sigma_{\mathcal{A}^{i}}^{2}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}+\alpha_{\max}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\quad+\frac{1-\sigma_{\mathcal{A}^{i}}^{2}}{2\sigma_{\mathcal{A}^{i}}^{2}}\mathbb{E}[\|\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]+\frac{2\sigma_{\mathcal{A}^{i}}^{2}}{1-\sigma_{\mathcal{A}^{i}}^{2}}\alpha_{\max}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\leq\frac{1+\sigma_{\mathcal{A}^{i}}^{2}}{2}\mathbb{E}[\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]+\frac{1+\sigma_{\mathcal{A}^{i}}^{2}}{1-\sigma_{\mathcal{A}^{i}}^{2}}\alpha_{\max}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\leq\frac{1+\bar{\sigma}^{2}}{2}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}+\varsigma\alpha_{\max}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}\|^{2}|\mathcal{H}_{t}].$

Applying Lemma 4 and summing over $k=1$ to $n_{i}$ , $i=1$ to $N$ completes the proof. ∎
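For the reader’s convenience, the treatment of the cross term in the proof of Lemma 5 can be spelled out as follows. This is a sketch of the intermediate step, written with the shorthand $\sigma$ , $a$ and $b$ introduced only for this display, and it assumes $\sigma_{\mathcal{A}^{i}}>0$ (for $\sigma_{\mathcal{A}^{i}}=0$ the vector $a$ vanishes and the cross term is zero).

```latex
% Shorthand (for this sketch only): \sigma = \sigma_{\mathcal{A}^{i}} > 0,
%   a = \mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t},
%   b = (I_{n_{i}}-\tfrac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top})\varphi^{i}_{k,t}.
\begin{align*}
% Young's inequality 2<u,v> <= c||u||^2 + (1/c)||v||^2 with c = (1-\sigma^2)/(2\sigma^2)
-2\alpha^{i}\langle a,b\rangle
  &\leq \frac{1-\sigma^{2}}{2\sigma^{2}}\|a\|^{2}
      +\frac{2\sigma^{2}}{1-\sigma^{2}}(\alpha^{i})^{2}\|b\|^{2},\\
% a = (\mathcal{A}^{i}-\tfrac{1}{n_{i}}\mathbf{1}\mathbf{1}^{\top})(\mathbf{y}^{i}_{k,t}-\mathbf{1}\bar{y}^{i}_{k,t}), and the projection has norm 1
\|a\|^{2} &\leq \sigma^{2}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2},
\qquad \|b\|^{2}\leq\|\varphi^{i}_{k,t}\|^{2},\\
% collecting the coefficients of the two squared norms
\sigma^{2}+\frac{1-\sigma^{2}}{2} &= \frac{1+\sigma^{2}}{2},
\qquad 1+\frac{2\sigma^{2}}{1-\sigma^{2}} = \frac{1+\sigma^{2}}{1-\sigma^{2}}.
\end{align*}
```

The particular weight in Young’s inequality is what turns the contraction factor $\sigma_{\mathcal{A}^{i}}^{2}$ into $\frac{1+\sigma_{\mathcal{A}^{i}}^{2}}{2}<1$ , at the price of the factor $\varsigma$ in front of the tracker term.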

Then, we proceed to build the inequality relation for the second term, $\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}$ in Lemma 6.

Lemma 6

Under Assumptions 1, 2 and 3, the gap between the stacked averaged auxiliary variable and the NE of game $\Gamma_{\mu}$ , $\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}$ holds that

	$\displaystyle\mathbb{E}[\|\bar{\mathbf{y}}_{t+1}-\mathbf{x}^{*}_{\mu}\|^{2}|\mathcal{H}_{t}]$	$\displaystyle\leq(1-(\chi-2\sqrt{n}L\epsilon_{\alpha})\bar{\alpha}+12n(n+4)L^{2}\alpha_{\max}^{2})\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}$
		$\displaystyle\quad+\bigg{(}12n(n+4)L^{2}\alpha_{\max}^{2}+\frac{n^{2}L^{2}\alpha_{\max}}{\chi}\bigg{)}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$
		$\displaystyle\quad+12n(n+4)G^{2}\alpha_{\max}^{2}+3n(n+4)^{3}\mu^{2}L^{2}\alpha_{\max}^{2}.$

Proof: From (5), we know


$\displaystyle\|\bar{\mathbf{y}}_{t+1}-\mathbf{x}^{*}_{\mu}\|^{2}$	$\displaystyle=\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu}-\alpha^{i}\bar{\varphi}^{i}_{k,t}\|^{2}$
	$\displaystyle\leq\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+\alpha_{\max}^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\bar{\varphi}^{i}_{k,t}\|^{2}$	(8a)
	$\displaystyle\quad-2\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\alpha^{i}\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\bar{\varphi}^{i}_{k,t}-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}_{t})\rangle$	(8b)
	$\displaystyle\quad-2\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\alpha^{i}\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})\rangle$	(8c)
	$\displaystyle\quad-2\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\alpha^{i}\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}^{*}_{\mu})\rangle.$	(8d)

For the second term in (8a), applying Lemma 3-3) yields

	$\displaystyle\alpha_{\max}^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]\leq\alpha_{\max}^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\bigg{(}12(n+4)L^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$
	$\displaystyle\quad\quad+12(n+4)(L^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+G^{2})+3(n+4)^{3}\mu^{2}L^{2}\bigg{)}$
	$\displaystyle\quad\leq 12n(n+4)L^{2}\alpha_{\max}^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}+12n(n+4)L^{2}\alpha_{\max}^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}$
	$\displaystyle\quad\quad+12n(n+4)G^{2}\alpha_{\max}^{2}+3n(n+4)^{3}\mu^{2}L^{2}\alpha_{\max}^{2}.$		(9)

For (8b),

\displaystyle\mathbb{E}[-2\alpha^{i}\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\bar{\varphi}^{i}_{k,t}-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}_{t})\rangle|\mathcal{H}_{t}]=0.

(10)

For (8c),

	$\displaystyle-2\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\alpha^{i}\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})\rangle$
	$\displaystyle\quad\leq 2\alpha_{\max}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu}\|\|\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})\|$
	$\displaystyle\quad\leq 2L\alpha_{\max}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu}\|\|\mathbf{x}_{t}-\bar{\mathbf{y}}_{t}\|\leq 2\sqrt{n}L\alpha_{\max}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|\|\mathbf{x}_{t}-\bar{\mathbf{y}}_{t}\|$
	$\displaystyle\quad\leq\chi\bar{\alpha}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+\frac{nL^{2}\alpha_{\max}^{2}}{\chi\bar{\alpha}}\|\mathbf{x}_{t}-\bar{\mathbf{y}}_{t}\|^{2}$
	$\displaystyle\quad\leq\chi\bar{\alpha}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+\frac{nL^{2}\alpha_{\max}^{2}}{\chi\bar{\alpha}}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$
	$\displaystyle\quad\leq\chi\bar{\alpha}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+\frac{n^{2}L^{2}\alpha_{\max}}{\chi}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2},$		(11)

where the last inequality is due to $\frac{{\alpha}_{\max}}{\bar{\alpha}}<n$ . For (8d),


	$\displaystyle-2\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\alpha^{i}\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}^{*}_{\mu})\rangle$
	$\displaystyle\quad=-2\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}(\alpha^{i}-\bar{\alpha})\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}^{*}_{\mu})\rangle$		(12a)
	$\displaystyle\quad\quad-2\bar{\alpha}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}^{*}_{\mu})\rangle.$		(12b)

For (12a),

	$\displaystyle-2\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}(\alpha^{i}-\bar{\alpha})\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}^{*}_{\mu})\rangle$
	$\displaystyle\quad\leq 2\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}|\alpha^{i}-\bar{\alpha}|\|\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu}\|\|\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}^{*}_{\mu})\|$
	$\displaystyle\quad\leq 2L\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|\sum_{i=1}^{N}|\alpha^{i}-\bar{\alpha}|\sum_{k=1}^{n_{i}}\|\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu}\|$
	$\displaystyle\quad\leq 2L\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|\sum_{i=1}^{N}\sqrt{n_{i}}|\alpha^{i}-\bar{\alpha}|\|\bar{\mathbf{y}}^{i}_{t}-\mathbf{x}^{i*}_{\mu}\|\leq 2\sqrt{n}L\epsilon_{\alpha}\bar{\alpha}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}.$		(13)

For (12b),

	$\displaystyle-2\bar{\alpha}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}^{*}_{\mu})\rangle$
	$\displaystyle\quad=-2\bar{\alpha}\langle\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu},\Phi_{\mu}(\bar{\mathbf{y}}_{t})-\Phi_{\mu}(\mathbf{x}^{*}_{\mu})\rangle\leq-2\chi\bar{\alpha}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}.$		(14)

Combining (13) and (14), we obtain for (8d) that

\displaystyle-2\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\alpha^{i}\langle\bar{y}^{i}_{k,t}-x^{i*}_{k,\mu},\nabla_{x^{i}_{k}}f^{i}_{\mu}(\bar{\mathbf{y}}_{t})-\nabla_{x^{i}_{k}}f^{i}_{\mu}(\mathbf{x}^{*}_{\mu})\rangle\leq 2(\sqrt{n}L\epsilon_{\alpha}-\chi)\bar{\alpha}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}.

(15)

Finally, taking the conditional expectation for (8) on $\mathcal{H}_{t}$ , and substituting (9), (10), (11) and (15) into it, we obtain the desired result. ∎

Finally, we derive an inequality relation for the third term, the total gradient tracking error $\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}$, in Lemma 7.

Lemma 7

Under Assumptions 1 and 2, the total gradient tracking error $\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}$ satisfies

	$\displaystyle\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\varphi^{i}_{k,t+1}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t+1}\|^{2}|\mathcal{H}_{t}]\leq 24(n+4)n_{s}\varsigma L^{2}\bigg{(}\frac{3+\bar{\sigma}^{2}}{2}+\frac{n^{2}L^{2}\alpha_{\max}}{\chi}$
	$\displaystyle\quad\quad+24(n+4)n_{c}\varsigma L^{2}\alpha_{\max}^{2}+12n(n+4)L^{2}\alpha_{\max}^{2}\bigg{)}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$
	$\displaystyle\quad+\bigg{(}\frac{1+\bar{\sigma}^{2}}{2}+48(n+4)n_{s}\varsigma^{2}L^{2}\alpha_{\max}^{2}\bigg{)}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad+24(n+4)n_{s}\varsigma L^{2}[2+12n(n+4)L^{2}\alpha_{\max}^{2}+24(n+4)n_{c}\varsigma L^{2}\alpha_{\max}^{2}]\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}$
	$\displaystyle\quad+24(n+4)n_{s}\varsigma L^{2}[24(n+4)n_{c}\varsigma G^{2}+6(n+4)^{3}n_{c}\varsigma\mu^{2}L^{2}+12n(n+4)G^{2}$
	$\displaystyle\quad\quad+3n(n+4)^{3}\mu^{2}L^{2}]\alpha_{\max}^{2}+12(n+4)^{3}n_{s}\varsigma\mu^{2}L^{2}+48(n+4)n_{s}\varsigma G^{2}.$

Proof: It is obtained from (4b) that

	$\displaystyle\|\varphi^{i}_{k,t+1}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t+1}\|^{2}$	$\displaystyle=\|\mathcal{A}^{i}\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}+\bigg{\|}\bigg{(}I_{n_{i}}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top}\bigg{)}(\mathbf{g}^{i}_{k}(\mathbf{x}_{t+1})-\mathbf{g}^{i}_{k}(\mathbf{x}_{t}))\bigg{\|}^{2}$
		$\displaystyle\quad+2\bigg{\langle}\mathcal{A}^{i}\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t},\bigg{(}I_{n_{i}}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top}\bigg{)}(\mathbf{g}^{i}_{k}(\mathbf{x}_{t+1})-\mathbf{g}^{i}_{k}(\mathbf{x}_{t}))\bigg{\rangle}.$

It is noted that $\|I_{n_{i}}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top}\|=1$ . Taking the conditional expectation on $\mathcal{H}_{t}$ yields

	$\displaystyle\mathbb{E}[\|\varphi^{i}_{k,t+1}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t+1}\|^{2}|\mathcal{H}_{t}]\leq\sigma_{\mathcal{A}^{i}}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]+\mathbb{E}[\|\mathbf{g}^{i}_{k}(\mathbf{x}_{t+1})-\mathbf{g}^{i}_{k}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\quad+2\mathbb{E}[\|\mathcal{A}^{i}\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|\|\mathbf{g}^{i}_{k}(\mathbf{x}_{t+1})-\mathbf{g}^{i}_{k}(\mathbf{x}_{t})\||\mathcal{H}_{t}]$
	$\displaystyle\quad\leq\sigma_{\mathcal{A}^{i}}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]+\mathbb{E}[\|\mathbf{g}^{i}_{k}(\mathbf{x}_{t+1})-\mathbf{g}^{i}_{k}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\quad+\frac{1-\sigma_{\mathcal{A}^{i}}^{2}}{2}\mathbb{E}[\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]+\frac{2\sigma_{\mathcal{A}^{i}}^{2}}{1-\sigma_{\mathcal{A}^{i}}^{2}}\mathbb{E}[\|\mathbf{g}^{i}_{k}(\mathbf{x}_{t+1})-\mathbf{g}^{i}_{k}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\leq\frac{1+\bar{\sigma}^{2}}{2}\mathbb{E}[\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]+\varsigma\mathbb{E}[\|\mathbf{g}^{i}_{k}(\mathbf{x}_{t+1})-\mathbf{g}^{i}_{k}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}].$		(16)
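The cross term in the chain above is handled by a weighted Young's inequality, $2\sigma ab\leq\frac{1-\sigma^{2}}{2}a^{2}+\frac{2\sigma^{2}}{1-\sigma^{2}}b^{2}$ for $\sigma\in(0,1)$ and $a,b\geq 0$, where $a$ and $b$ stand for the tracking-error norm and the norm of the estimator difference. The following Python sketch checks this inequality on random samples; the function name `young_gap` and the sampling ranges are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def young_gap(sigma, a, b):
    """Right-hand side minus left-hand side of the weighted Young's inequality."""
    lhs = 2.0 * sigma * a * b
    rhs = (1.0 - sigma**2) / 2.0 * a**2 + 2.0 * sigma**2 / (1.0 - sigma**2) * b**2
    return rhs - lhs

# Random samples with sigma in (0, 1) and non-negative a, b (illustrative ranges).
rng = np.random.default_rng(0)
sigma = rng.uniform(0.01, 0.99, size=10_000)
a = rng.uniform(0.0, 10.0, size=10_000)
b = rng.uniform(0.0, 10.0, size=10_000)

# The gap is a perfect square, so it should never be negative (up to rounding).
min_gap = young_gap(sigma, a, b).min()
print("smallest gap over all samples:", min_gap)
```

The gap equals $\big(\sqrt{(1-\sigma^{2})/2}\,a-\sigma\sqrt{2/(1-\sigma^{2})}\,b\big)^{2}$, which is why it cannot be negative.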

The last term of (16) follows from (6) that

	$\displaystyle\mathbb{E}[\|\mathbf{g}^{i}_{k}(\mathbf{x}_{t+1})-\mathbf{g}^{i}_{k}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}]\leq 2\sum_{j=1}^{n_{i}}(\mathbb{E}[\|g^{i}_{jk}(\mathbf{x}_{t+1})\|^{2}|\mathcal{H}_{t}]+\mathbb{E}[\|g^{i}_{jk}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}])$
	$\displaystyle\quad\leq 24n_{i}(n+4)L^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\mathbf{y}^{i}_{k,t+1}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t+1}\|^{2}|\mathcal{H}_{t}]+24n_{i}(n+4)L^{2}\mathbb{E}[\|\bar{\mathbf{y}}_{t+1}-\mathbf{x}^{*}_{\mu}\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\quad+24n_{i}(n+4)L^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}+24n_{i}(n+4)L^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+12n_{i}(n+4)^{3}\mu^{2}L^{2}$
	$\displaystyle\quad\quad+48n_{i}(n+4)G^{2}.$

Invoking Lemmas 5 and 6 in the above relation, and summing (16) over $k\in\mathcal{V}^{i},i\in\mathcal{N}$ complete the proof. ∎

IV-B Proof of Theorem 1

Now, we proceed to the proof of Theorem 1. Based on the results in Lemmas 5, 6 and 7, taking the total expectation of the corresponding relations yields the following linear system of inequalities:

\displaystyle\Psi_{t+1}\leq\mathbf{M}\Psi_{t}+\Upsilon,

(17)

where

	$\displaystyle\Psi_{t}$	$\displaystyle\triangleq\begin{bmatrix}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}]\\ \mathbb{E}[\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}]\\ \sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\varphi^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{\varphi}^{i}_{k,t}\|^{2}]\end{bmatrix},\quad\Upsilon\triangleq\begin{bmatrix}m_{12}{\alpha}_{\max}^{2}\\ m_{13}{\alpha}_{\max}^{2}\\ m_{14}+m_{15}{\alpha}_{\max}^{2}\end{bmatrix},$
	$\displaystyle\mathbf{M}$	$\displaystyle\triangleq\begin{bmatrix}1-m_{1}+m_{2}{\alpha}_{\max}^{2}&m_{2}{\alpha}_{\max}^{2}&m_{3}{\alpha}_{\max}^{2}\\ m_{4}{\alpha}_{\max}+m_{5}{\alpha}_{\max}^{2}&1-m_{6}\bar{\alpha}+m_{5}{\alpha}_{\max}^{2}&0\\ m_{7}+m_{8}{\alpha}_{\max}+m_{9}{\alpha}_{\max}^{2}&m_{10}+m_{9}{\alpha}_{\max}^{2}&1-m_{1}+m_{11}{\alpha}_{\max}^{2}\end{bmatrix},$

$m_{1}\triangleq\frac{1-\bar{\sigma}^{2}}{2}$ , $m_{2}\triangleq 24(n+4)n_{c}\varsigma L^{2}$ , $m_{3}\triangleq 2\varsigma$ , $m_{4}\triangleq n^{2}L^{2}/\chi$ , $m_{5}\triangleq 12n(n+4)L^{2}$ , $m_{6}\triangleq\chi-2\sqrt{n}L\epsilon_{\alpha}$ , $m_{7}\triangleq 12(n+4)n_{s}\varsigma L^{2}(3+\bar{\sigma}^{2})$ , $m_{8}\triangleq 24(n+4)n_{s}\varsigma L^{2}m_{4}$ , $m_{9}\triangleq 24(n+4)n_{s}\varsigma L^{2}(m_{2}+m_{5})$ , $m_{10}\triangleq 48(n+4)n_{s}\varsigma L^{2}$ , $m_{11}\triangleq\varsigma m_{10}$ , $m_{12}\triangleq 24(n+4)n_{c}\varsigma G^{2}+6(n+4)^{3}n_{c}\varsigma\mu^{2}L^{2}$ , $m_{13}\triangleq 12n(n+4)G^{2}+3n(n+4)^{3}\mu^{2}L^{2}$ , $m_{14}\triangleq 48(n+4)n_{s}\varsigma G^{2}+12(n+4)^{3}n_{s}\varsigma\mu^{2}L^{2}$ and $m_{15}\triangleq 24(n+4)n_{s}\varsigma L^{2}(m_{12}+m_{13})$ .

For the linear system (17), we aim to prove $\rho(\mathbf{M})<1$ such that each component of $\Psi_{t}$ can linearly converge to a neighborhood of 0 [21].
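To see why $\rho(\mathbf{M})<1$ gives linear convergence to a neighborhood of 0, the following Python sketch iterates the recursion (17) with equality, which upper-bounds any sequence satisfying the inequality when $\mathbf{M}$ has non-negative entries. The numerical values of `M` and `Upsilon` below are hypothetical and chosen only for illustration; they are not computed from the constants $m_{1},\ldots,m_{15}$.

```python
import numpy as np

# Hypothetical non-negative matrix with row sums below 1, so rho(M) < 1.
M = np.array([[0.5, 0.1, 0.1],
              [0.1, 0.6, 0.0],
              [0.2, 0.1, 0.5]])
Upsilon = np.array([0.01, 0.02, 0.03])  # hypothetical constant offset

rho = max(abs(np.linalg.eigvals(M)))
# Fixed point of Psi = M Psi + Upsilon, i.e. (I - M)^{-1} Upsilon.
limit = np.linalg.solve(np.eye(3) - M, Upsilon)

psi = np.ones(3)   # arbitrary initial value
errors = []
for _ in range(200):
    psi = M @ psi + Upsilon
    errors.append(np.linalg.norm(psi - limit))

print("spectral radius:", rho)
print("distance to fixed point after 200 steps:", errors[-1])
```

The distance to the fixed point shrinks geometrically, while the fixed point itself is non-zero whenever `Upsilon` is non-zero; this is the "neighborhood of 0" referred to above.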

We adopt the following result to guarantee $\rho(\mathbf{M})<1$ :

Lemma 8

(see [21, Cor. 8.1.29]) Let $A\in\mathbb{R}^{m\times m}$ be a matrix with non-negative entries and $\bm{\nu}\in\mathbb{R}^{m}$ be a vector with positive entries. If there exists a constant $\lambda\geq 0$ such that $A\bm{\nu}<\lambda\bm{\nu}$ , then $\rho(A)<\lambda$ .
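As a small numerical illustration of Lemma 8, the Python sketch below tests the condition $A\bm{\nu}<\lambda\bm{\nu}$ for a non-negative matrix and compares the outcome with the spectral radius computed directly. The helper name `perron_bound_holds` and the $2\times 2$ example matrix are assumptions made for this illustration only.

```python
import numpy as np

def perron_bound_holds(A, nu, lam):
    """Check the sufficient condition A @ nu < lam * nu (entrywise) of Lemma 8."""
    A = np.asarray(A, dtype=float)
    nu = np.asarray(nu, dtype=float)
    if np.any(A < 0) or np.any(nu <= 0):
        raise ValueError("A must be non-negative and nu must be positive")
    return bool(np.all(A @ nu < lam * nu))

# Hypothetical example: both row sums equal 0.8, eigenvalues are 0.8 and 0.3.
A = np.array([[0.5, 0.3],
              [0.2, 0.6]])
nu = np.array([1.0, 1.0])

rho = max(abs(np.linalg.eigvals(A)))
print("condition with lambda = 0.9:", perron_bound_holds(A, nu, 0.9))
print("condition with lambda = 0.7:", perron_bound_holds(A, nu, 0.7))
print("spectral radius:", rho)
```

The condition is only sufficient: when it fails for a given $\bm{\nu}$, nothing can be concluded without trying another positive vector.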

To apply Lemma 8, each element of $\mathbf{M}$ should be non-negative. Hence, we may set $m_{6}>0$ and ${\alpha}_{\max}<\frac{1}{m_{6}}$ , i.e.,

\displaystyle{\alpha}_{\max}<\frac{1}{m_{6}},\quad\epsilon_{\alpha}<\frac{\chi}{2\sqrt{n}L}.

Next, based on Lemma 8, it suffices to find a vector $\bm{\nu}\triangleq[\nu_{1},\nu_{2},\nu_{3}]^{\top}$ with $\nu_{1},\nu_{2},\nu_{3}>0$ such that $\mathbf{M}\bm{\nu}<\bm{\nu}$, i.e.,

	$\displaystyle(1-m_{1}+m_{2}{\alpha}_{\max}^{2})\nu_{1}+(m_{2}{\alpha}_{\max}^{2})\nu_{2}+(m_{3}{\alpha}_{\max}^{2})\nu_{3}<\nu_{1},$
	$\displaystyle(m_{4}{\alpha}_{\max}+m_{5}{\alpha}_{\max}^{2})\nu_{1}+(1-m_{6}\bar{\alpha}+m_{5}{\alpha}_{\max}^{2})\nu_{2}<\nu_{2},$
	$\displaystyle(m_{7}+m_{8}{\alpha}_{\max}+m_{9}{\alpha}_{\max}^{2})\nu_{1}+(m_{10}+m_{9}{\alpha}_{\max}^{2})\nu_{2}+(1-m_{1}+m_{11}{\alpha}_{\max}^{2})\nu_{3}<\nu_{3}.$

Without loss of generality, we may set $\nu_{3}=1$ . It remains to find $\nu_{1}$ and $\nu_{2}$ such that the following inequalities hold


	$\displaystyle(m_{2}\nu_{1}+m_{2}\nu_{2}+m_{3}){\alpha}_{\max}^{2}<m_{1}\nu_{1},$		(18a)
	$\displaystyle(m_{5}\nu_{1}+m_{5}\nu_{2}){\alpha}_{\max}<\frac{m_{6}\nu_{2}}{n}-m_{4}\nu_{1},$		(18b)
	$\displaystyle(m_{9}\nu_{1}+m_{9}\nu_{2}+m_{11}){\alpha}_{\max}^{2}<m_{1}-(m_{7}+m_{8})\nu_{1}-m_{10}\nu_{2},$		(18c)

where we have applied $\frac{\bar{\alpha}}{{\alpha}_{\max}}>\frac{1}{n}$ in (18b), and forced ${\alpha}_{\max}<1$ in (18c).

To ensure the existence of ${\alpha}_{\max}$ , the RHS of (18) has to be positive. Hence, we may set

\displaystyle\nu_{1}=\frac{m_{1}m_{6}}{4nm_{4}m_{10}+2m_{6}m_{7}+2m_{6}m_{8}},\quad\nu_{2}=\frac{nm_{1}m_{4}}{2nm_{4}m_{10}+m_{6}m_{7}+m_{6}m_{8}}.

Then, the three inequalities in (18) can be solved, which gives

\displaystyle{\alpha}_{\max}<\alpha_{1},\quad{\alpha}_{\max}<\alpha_{2},\quad{\alpha}_{\max}<\alpha_{3},

where

	$\displaystyle\alpha_{1}\triangleq\sqrt{\frac{m_{1}^{2}m_{6}}{m_{1}m_{2}m_{6}+2nm_{1}m_{2}m_{4}+4nm_{3}m_{4}m_{10}+2m_{3}m_{6}m_{7}+2m_{3}m_{6}m_{8}}},$
	$\displaystyle\alpha_{2}\triangleq\frac{m_{1}m_{4}m_{6}}{m_{1}m_{5}m_{6}+2nm_{1}m_{4}m_{5}},$
	$\displaystyle\alpha_{3}\triangleq\sqrt{\frac{m_{1}(2nm_{4}m_{10}+m_{6}m_{7}+m_{6}m_{8})}{m_{1}m_{6}m_{9}+2nm_{1}m_{4}m_{9}+m_{11}(4nm_{4}m_{10}+2m_{6}m_{7}+2m_{6}m_{8})}}.$

Therefore, the range of the step-size is given by

\displaystyle 0<{\alpha}_{\max}<\min\bigg{\{}\alpha_{1},\alpha_{2},\alpha_{3},\frac{1}{m_{6}},1\bigg{\}},\quad\epsilon_{\alpha}<\frac{\chi}{2\sqrt{n}L}.
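The step-size range can be evaluated numerically once the constants $m_{1},\ldots,m_{11}$ are known. The Python sketch below implements the expressions for $\nu_{1}$, $\nu_{2}$, $\alpha_{1}$, $\alpha_{2}$ and $\alpha_{3}$ given above and then checks $\mathbf{M}\bm{\nu}<\bm{\nu}$ for a step-size inside the range. The function names and the numerical values of the constants are hypothetical; in particular, the constants are not derived from any specific game, and uniform step-sizes ($\bar{\alpha}={\alpha}_{\max}$) are assumed for simplicity.

```python
import numpy as np

def step_size_bound(m, n):
    """Return nu = [nu1, nu2, 1] and min{alpha1, alpha2, alpha3, 1/m6, 1}.

    m maps k to the constant m_k (k = 1..11), all assumed positive.
    """
    d = 2 * n * m[4] * m[10] + m[6] * (m[7] + m[8])
    nu1 = m[1] * m[6] / (2 * d)
    nu2 = n * m[1] * m[4] / d
    a1 = np.sqrt(m[1] ** 2 * m[6]
                 / (m[1] * m[2] * m[6] + 2 * n * m[1] * m[2] * m[4] + 2 * d * m[3]))
    a2 = m[1] * m[4] * m[6] / (m[1] * m[5] * m[6] + 2 * n * m[1] * m[4] * m[5])
    a3 = np.sqrt(m[1] * d
                 / (m[1] * m[6] * m[9] + 2 * n * m[1] * m[4] * m[9] + 2 * d * m[11]))
    return np.array([nu1, nu2, 1.0]), min(a1, a2, a3, 1.0 / m[6], 1.0)

def build_M(m, a_max, a_bar):
    """Assemble the 3x3 matrix M of the linear system (17)."""
    a2 = a_max ** 2
    return np.array([
        [1 - m[1] + m[2] * a2, m[2] * a2, m[3] * a2],
        [m[4] * a_max + m[5] * a2, 1 - m[6] * a_bar + m[5] * a2, 0.0],
        [m[7] + m[8] * a_max + m[9] * a2, m[10] + m[9] * a2, 1 - m[1] + m[11] * a2],
    ])

# Hypothetical constants, chosen only to exercise the formulas.
m = {1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 0.5,
     7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0}
n = 2

nu, bound = step_size_bound(m, n)
a_max = 0.9 * bound                  # any value strictly inside the range
M = build_M(m, a_max, a_bar=a_max)   # uniform step-sizes: a_bar equals a_max
rho = max(abs(np.linalg.eigvals(M)))

print("upper bound on the largest step-size:", bound)
print("M @ nu < nu:", bool(np.all(M @ nu < nu)))
print("spectral radius of M:", rho)
```

For these illustrative constants the binding constraint is $\alpha_{1}$, which comes from the consensus-error row (18a).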

Furthermore, taking the limsup on both sides of (17) yields

\displaystyle\limsup_{t\to\infty}\Psi_{t}\leq\mathbf{M}\limsup_{t\to\infty}\Psi_{t}+\Upsilon,

which gives

\displaystyle(I_{3}-\mathbf{M})\limsup_{t\to\infty}\Psi_{t}\leq\Upsilon,

where

\displaystyle I_{3}-\mathbf{M}=\begin{bmatrix}m_{1}-m_{2}{\alpha}_{\max}^{2}&-m_{2}{\alpha}_{\max}^{2}&-m_{3}{\alpha}_{\max}^{2}\\ -m_{4}{\alpha}_{\max}-m_{5}{\alpha}_{\max}^{2}&m_{6}\bar{\alpha}-m_{5}{\alpha}_{\max}^{2}&0\\ -m_{7}-m_{8}{\alpha}_{\max}-m_{9}{\alpha}_{\max}^{2}&-m_{10}-m_{9}{\alpha}_{\max}^{2}&m_{1}-m_{11}{\alpha}_{\max}^{2}\end{bmatrix}.

It can be obtained that

	$\displaystyle det(I_{3}-\mathbf{M})$	$\displaystyle=(m_{1}-m_{11}{\alpha}_{\max}^{2})[(m_{1}-m_{2}{\alpha}_{\max}^{2})(m_{6}\bar{\alpha}-m_{5}{\alpha}_{\max}^{2})-m_{2}{\alpha}_{\max}^{2}(m_{4}{\alpha}_{\max}+m_{5}{\alpha}_{\max}^{2})]$
		$\displaystyle\quad-m_{3}{\alpha}_{\max}^{2}[(m_{4}{\alpha}_{\max}+m_{5}{\alpha}_{\max}^{2})(m_{10}+m_{9}{\alpha}_{\max}^{2})+(m_{6}\bar{\alpha}-m_{5}{\alpha}_{\max}^{2})(m_{7}+m_{8}{\alpha}_{\max}+m_{9}{\alpha}_{\max}^{2})]$
		$\displaystyle>{\alpha}_{\max}(m_{1}-m_{11}{\alpha}_{\max}^{2})\bigg{[}\frac{m_{1}m_{6}}{n}-m_{1}m_{5}{\alpha}_{\max}-m_{2}\bigg{(}m_{4}+\frac{m_{6}}{n}\bigg{)}{\alpha}_{\max}^{2}\bigg{]}$
		$\displaystyle\quad-m_{3}{\alpha}_{\max}^{3}[(m_{4}+m_{5}{\alpha}_{\max})(m_{10}+m_{9}{\alpha}_{\max}^{2})+(m_{6}-m_{5}{\alpha}_{\max})(m_{7}+m_{8}{\alpha}_{\max}+m_{9}{\alpha}_{\max}^{2})],$
	$\displaystyle adj(I_{3}-\mathbf{M})_{11}$	$\displaystyle\triangleq(m_{1}-m_{11}{\alpha}_{\max}^{2})(m_{6}\bar{\alpha}-m_{5}{\alpha}_{\max}^{2})\leq{\alpha}_{\max}(m_{1}-m_{11}{\alpha}_{\max}^{2})(m_{6}-m_{5}{\alpha}_{\max}),$
	$\displaystyle adj(I_{3}-\mathbf{M})_{12}$	$\displaystyle\triangleq{\alpha}_{\max}^{2}[m_{2}(m_{1}-m_{11}{\alpha}_{\max}^{2})+m_{3}(m_{10}+m_{9}{\alpha}_{\max}^{2})],$
	$\displaystyle adj(I_{3}-\mathbf{M})_{13}$	$\displaystyle\triangleq m_{3}{\alpha}_{\max}^{2}(m_{6}\bar{\alpha}-m_{5}{\alpha}_{\max}^{2})\leq m_{3}{\alpha}_{\max}^{3}(m_{6}-m_{5}{\alpha}_{\max}),$
	$\displaystyle adj(I_{3}-\mathbf{M})_{21}$	$\displaystyle\triangleq{\alpha}_{\max}(m_{1}-m_{11}{\alpha}_{\max}^{2})(m_{4}+m_{5}{\alpha}_{\max}),$
	$\displaystyle adj(I_{3}-\mathbf{M})_{22}$	$\displaystyle\triangleq(m_{1}-m_{2}{\alpha}_{\max}^{2})(m_{1}-m_{11}{\alpha}_{\max}^{2})-m_{3}{\alpha}_{\max}^{2}(m_{7}+m_{8}{\alpha}_{\max}+m_{9}{\alpha}_{\max}^{2}),$
	$\displaystyle adj(I_{3}-\mathbf{M})_{23}$	$\displaystyle\triangleq m_{3}{\alpha}_{\max}^{3}(m_{4}+m_{5}{\alpha}_{\max}).$
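The adjugate entries listed above can be cross-checked numerically through the identity $adj(A)=det(A)A^{-1}$, which holds for any invertible matrix $A$. The Python sketch below does this for the first two rows; the values assigned to $m_{1},\ldots,m_{11}$, ${\alpha}_{\max}$ and $\bar{\alpha}$ are hypothetical and serve only to exercise the formulas.

```python
import numpy as np

# Hypothetical constants m_1..m_11, used only to exercise the formulas.
m1, m2, m3, m4, m5, m6 = 0.25, 1.0, 1.0, 1.0, 1.0, 0.5
m7, m8, m9, m10, m11 = 1.0, 1.0, 1.0, 1.0, 1.0
a, a_bar = 0.05, 0.04   # hypothetical alpha_max and averaged step-size

I_minus_M = np.array([
    [m1 - m2 * a**2, -m2 * a**2, -m3 * a**2],
    [-m4 * a - m5 * a**2, m6 * a_bar - m5 * a**2, 0.0],
    [-m7 - m8 * a - m9 * a**2, -m10 - m9 * a**2, m1 - m11 * a**2],
])

# Numerical adjugate via adj(A) = det(A) * inv(A).
adj_numeric = np.linalg.det(I_minus_M) * np.linalg.inv(I_minus_M)

# Closed-form entries of the first two rows, as listed in the text.
adj_closed = np.array([
    [(m1 - m11 * a**2) * (m6 * a_bar - m5 * a**2),
     a**2 * (m2 * (m1 - m11 * a**2) + m3 * (m10 + m9 * a**2)),
     m3 * a**2 * (m6 * a_bar - m5 * a**2)],
    [a * (m1 - m11 * a**2) * (m4 + m5 * a),
     (m1 - m2 * a**2) * (m1 - m11 * a**2) - m3 * a**2 * (m7 + m8 * a + m9 * a**2),
     m3 * a**3 * (m4 + m5 * a)],
])

rows_match = np.allclose(adj_numeric[:2], adj_closed, rtol=1e-8, atol=1e-12)
print("closed-form adjugate rows match the numerical ones:", rows_match)
```

Only the first two rows are needed, since they are the ones that enter the bounds on the consensus error and on the distance of the averaged action to the equilibrium.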

Then, we have

	$\displaystyle\limsup_{t\to\infty}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}]\leq[(I_{3}-\mathbf{M})^{-1}\Upsilon]_{1}$
	$\displaystyle\quad=\frac{adj(I_{3}-\mathbf{M})_{11}[\Upsilon]_{1}}{det(I_{3}-\mathbf{M})}+\frac{adj(I_{3}-\mathbf{M})_{12}[\Upsilon]_{2}}{det(I_{3}-\mathbf{M})}+\frac{adj(I_{3}-\mathbf{M})_{13}[\Upsilon]_{3}}{det(I_{3}-\mathbf{M})}=\mathcal{O}({\alpha}_{\max}^{2}),$

and

	$\displaystyle\limsup_{t\to\infty}\mathbb{E}[\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}]\leq[(I_{3}-\mathbf{M})^{-1}\Upsilon]_{2}$
	$\displaystyle\quad=\frac{adj(I_{3}-\mathbf{M})_{21}[\Upsilon]_{1}}{det(I_{3}-\mathbf{M})}+\frac{adj(I_{3}-\mathbf{M})_{22}[\Upsilon]_{2}}{det(I_{3}-\mathbf{M})}+\frac{adj(I_{3}-\mathbf{M})_{23}[\Upsilon]_{3}}{det(I_{3}-\mathbf{M})}=\mathcal{O}({\alpha}_{\max}).$

Thus, it follows from (7) that

	$\displaystyle\limsup_{t\to\infty}\mathbb{E}[\|\mathbf{x}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}]\leq 2\limsup_{t\to\infty}\mathbb{E}[\|\mathbf{x}_{t}-\bar{\mathbf{y}}_{t}\|^{2}]+2\limsup_{t\to\infty}\mathbb{E}[\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}]$
	$\displaystyle\quad\leq 2\limsup_{t\to\infty}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\mathbb{E}[\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}]+2\limsup_{t\to\infty}\mathbb{E}[\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}]=\mathcal{O}({\alpha}_{\max}).$

Invoking Lemma 2 yields

\displaystyle\limsup_{t\to\infty}\mathbb{E}[\|\mathbf{x}_{t}-\mathbf{x}^{*}\|^{2}]\leq 2\limsup_{t\to\infty}\mathbb{E}[\|\mathbf{x}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}]+2\|\mathbf{x}^{*}_{\mu}-\mathbf{x}^{*}\|^{2}=\mathcal{O}({\alpha}_{\max})+\mathcal{O}(\mu),

which completes the proof.

V Numerical Simulations

We illustrate the proposed NE seeking strategy on a connectivity control game [22], played among a number of sensor networks. Specifically, there are $N$ sensor networks, where each sensor network contains $n_{i}$ sensors. Let $x^{i}_{j}=[x^{i}_{j,1},x^{i}_{j,2}]^{\top}\in\mathbb{R}^{2}$ denote the position of sensor $j$ (referred to as an agent) from a sensor network $i$ (referred to as a cluster). Then, this sensor aims to seek a tradeoff between a local cost, $l^{i}_{j}(\mathbf{x}^{i})$ (e.g., source seeking and positioning) and the global cost, $h^{i}_{j}(\mathbf{x})$ (e.g., connectivity preservation with other sensor networks). Hence, the cost function to be minimized by this sensor is given by

f^{i}_{j}(\mathbf{x})=l^{i}_{j}(\mathbf{x}^{i})+h^{i}_{j}(\mathbf{x}),

where

	$\displaystyle l^{i}_{j}(\mathbf{x}^{i})$	$\displaystyle=\mathbf{x}^{i\top}a^{i}_{j}\mathbf{x}^{i}+b^{i\top}_{j}\mathbf{x}^{i}+c^{i}_{j},$
	$\displaystyle h^{i}_{j}(\mathbf{x})$	$\displaystyle=\sum_{k\in\mathcal{N}_{i}}d^{i}_{j}\|e^{i}_{jk}x^{i}_{j}-\mathbf{x}^{k}\|^{2},$

and $a^{i}_{j},b^{i}_{j},c^{i}_{j},d^{i}_{j},e^{i}_{jk}$ are constant matrices or vectors of appropriate dimensions, and $\mathcal{N}_{i}$ stands for the set of neighbors of sensor network $i$ in a connected graph characterizing their position dependence. Specifically, if $k\in\mathcal{N}_{i}$ , then the corresponding term $\|e^{i}_{jk}x^{i}_{j}-\mathbf{x}^{k}\|^{2}$ represents the intention of sensor $j$ from a sensor network $i$ to preserve the connectivity with the sensors from sensor network $k$ .

Figure 1: Communication network.

In this simulation, we consider $N=3$ and $n_{i}=4$. The local and global costs are set as $l^{i}_{j}=i[\|x^{i}_{j}\|^{2}+\mathbf{1}^{\top}_{2}x^{i}_{j}+j]$ for $j=1,\ldots,4$, $i=1,2,3$ and $h^{1}_{j}=\|x^{1}_{j}-x^{2}_{j}\|^{2}$, $h^{2}_{j}=\|x^{2}_{j}-x^{3}_{j}\|^{2}$, $h^{3}_{j}=\|x^{3}_{j}-x^{1}_{j}\|^{2}$ for $j=1,\ldots,4$. Then, it is readily verified that Assumptions 2 and 3 hold. The directed communication graph for each sensor network $i$ is shown in Fig. 1. For the algorithm parameters, we let the smoothing parameter be $\mu=10^{-4}$, and the constant step-sizes for the sensors of networks $i=1,2,3$ be $\alpha^{i}=0.1$, $0.08$, $0.06$, respectively. Thus, $\epsilon_{\alpha}=0.2041$. We initialize the algorithm with arbitrary $x^{i}_{j,0}$, $y^{i}_{jk,0}$ and $\varphi^{i}_{jk,0}={g}^{i}_{jk}(\mathbf{x}_{0})$. The trajectories of the sensors’ positions for the three sensor networks are plotted in Fig. 2. It can be seen that the positions of all sensors converge to a small neighborhood of the NE. Also, more ‘zigzags’ can be observed for the networks with a larger step-size, since their updates are more aggressive.
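For reference, the NE of this particular quadratic game can be obtained by solving the first-order stationarity conditions directly. The Python sketch below does so under two assumptions that should be read as part of the illustration rather than of the original experiment: each cluster's cost is the sum of its sensors' costs $f^{i}_{j}=l^{i}_{j}+h^{i}_{j}$, and, because the costs are separable across sensor indices $j$ and across the two coordinates, one scalar unknown per cluster suffices.

```python
import numpy as np

# Per-coordinate first-order conditions, identical for every sensor index j.
# Cluster i minimizes sum_j f^i_j; setting the gradient w.r.t. x^i_j to zero gives
#   i = 1: 2*x1 + 1 + 2*(x1 - x2) = 0
#   i = 2: 4*x2 + 2 + 2*(x2 - x3) = 0
#   i = 3: 6*x3 + 3 + 2*(x3 - x1) = 0
A = np.array([[4.0, -2.0, 0.0],
              [0.0, 6.0, -2.0],
              [-2.0, 0.0, 8.0]])
b = np.array([-1.0, -2.0, -3.0])

# One scalar per cluster; every sensor coordinate in cluster i takes this value.
ne_coordinate = np.linalg.solve(A, b)

# The game mapping is affine with Jacobian A, so a positive definite symmetric
# part of A implies strong monotonicity of the mapping.
chi = np.linalg.eigvalsh((A + A.T) / 2).min()

print("NE coordinate value per cluster:", ne_coordinate)
print("smallest eigenvalue of the symmetric part:", chi)
```

Under these assumptions every sensor position at the NE equals $[-0.5,-0.5]^{\top}$, and the positive smallest eigenvalue is consistent with the strong monotonicity required by the analysis.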

Next, we illustrate the convergence rate results. First, we set the constant step-sizes for the sensors of networks $i=1,2,3$ to $\alpha^{i}=0.1a$, $0.08a$ and $0.06a$, respectively, and let $a=1.2$, $1$ and $0.6$. Hence, the heterogeneity of the step-sizes is fixed at $\epsilon_{\alpha}=0.2041$, while the largest step-size is ${\alpha}_{\max}=0.12$, $0.1$ and $0.06$, respectively. The trajectories of the error gap $\|\mathbf{x}_{t}-\mathbf{x}^{*}\|$ with these settings are plotted in Fig. 3a. Then, we fix the largest step-size to ${\alpha}_{\max}=0.1$ and the averaged step-size to $\bar{\alpha}=0.06$, and set the heterogeneity of the step-sizes to $\epsilon_{\alpha}=0.2041$, $0.4714$, $0.4907$ and $0.5443$, respectively. The trajectories of the error gap $\|\mathbf{x}_{t}-\mathbf{x}^{*}\|$ with these settings are plotted in Fig. 3b. As can be seen from both figures, the error gap descends linearly in all cases. Moreover, the convergence is faster with larger step-sizes and smaller heterogeneity, which is consistent with the results derived in Theorem 1.
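The first set of step-size configurations can be reproduced with a few lines of Python. The definition of $\epsilon_{\alpha}$ is not restated in this section; the sketch below assumes it is the population standard deviation of the step-sizes divided by their mean, an assumption that reproduces the reported value $0.2041$ for $(0.1,0.08,0.06)$. The function name `heterogeneity` is likewise hypothetical.

```python
import numpy as np

def heterogeneity(step_sizes):
    """Assumed definition: population standard deviation divided by the mean."""
    s = np.asarray(step_sizes, dtype=float)
    return s.std() / s.mean()

base = np.array([0.1, 0.08, 0.06])   # step-sizes of networks 1, 2 and 3 for a = 1
scales = (1.2, 1.0, 0.6)

# Scaling all step-sizes by a changes alpha_max but leaves the ratio unchanged.
eps_values = [heterogeneity(a * base) for a in scales]
alpha_max_values = [float((a * base).max()) for a in scales]

for a, eps, a_max in zip(scales, eps_values, alpha_max_values):
    print(f"a = {a}: alpha_max = {a_max:.2f}, epsilon_alpha = {eps:.4f}")
```

Because the assumed measure is scale-invariant, varying $a$ isolates the effect of the largest step-size from the effect of the heterogeneity.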

VI Conclusions

This work has studied an $N$-cluster non-cooperative game problem, where the agents’ cost functions are possibly non-smooth and their explicit expressions are unknown. By integrating Gaussian smoothing techniques with gradient tracking, a gradient-free NE seeking algorithm has been developed, in which the agents are allowed to select their own preferred constant step-sizes. We have shown that, when the largest step-size is sufficiently small, the agents’ actions converge linearly to a neighborhood of the unique NE under a strongly monotone game mapping condition, and the error gap is proportional to the largest step-size and the smoothing parameter. Finally, the derived results have been verified by numerical simulations.

References

[1] M. Ye and G. Hu, “Distributed Nash equilibrium seeking in multiagent games under switching communication topologies,” IEEE Transactions on Cybernetics, vol. 48, no. 11, pp. 3208–3217, 2018.
[2] K. Lu, G. Jing, and L. Wang, “Distributed Algorithms for Searching Generalized Nash Equilibrium of Noncooperative Games,” IEEE Transactions on Cybernetics, vol. 49, no. 6, pp. 2362–2371, 2019.
[3] P. Yi and L. Pavel, “An operator splitting approach for distributed generalized Nash equilibria computation,” Automatica, vol. 102, pp. 111–121, 2019.
[4] C. De Persis and S. Grammatico, “Continuous-Time Integral Dynamics for a Class of Aggregative Games With Coupling Constraints,” IEEE Transactions on Automatic Control, vol. 65, no. 5, pp. 2171–2176, 2020.
[5] Y. Zhang, S. Liang, X. Wang, and H. Ji, “Distributed Nash Equilibrium Seeking for Aggregative Games With Nonlinear Dynamics Under External Disturbances,” IEEE Transactions on Cybernetics, pp. 1–10, 2019.
[6] B. Gharesifard and J. Cortes, “Distributed convergence to Nash equilibria in two-network zero-sum games,” Automatica, vol. 49, no. 6, pp. 1683–1692, 2013.
[7] Y. Lou, Y. Hong, L. Xie, G. Shi, and K. H. Johansson, “Nash Equilibrium Computation in Subnetwork Zero-Sum Games With Switching Communications,” IEEE Transactions on Automatic Control, vol. 61, no. 10, pp. 2920–2935, 2016.
[8] M. Ye, G. Hu, and F. Lewis, “Nash equilibrium seeking for N-coalition noncooperative games,” Automatica, vol. 95, pp. 266–272, 2018.
[9] M. Ye and G. Hu, “Simultaneous social cost minimization and Nash equilibrium seeking in non-cooperative games,” in Chinese Control Conference, CCC. IEEE Computer Society, 2017, pp. 3052–3059.
[10] M. Ye and G. Hu, “A distributed method for simultaneous social cost minimization and Nash equilibrium seeking in multi-agent games,” in IEEE International Conference on Control and Automation, ICCA, 2017, pp. 799–804.
[11] M. Ye, G. Hu, F. L. Lewis, and L. Xie, “A Unified Strategy for Solution Seeking in Graphical N-Coalition Noncooperative Games,” IEEE Transactions on Automatic Control, vol. 64, no. 11, pp. 4645–4652, 2019.
[12] X. Nian, F. Niu, and Z. Yang, “Distributed Nash Equilibrium Seeking for Multicluster Game Under Switching Communication Topologies,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021.
[13] X. Zeng, J. Chen, S. Liang, and Y. Hong, “Generalized Nash equilibrium seeking strategy for distributed nonsmooth multi-cluster game,” Automatica, vol. 103, pp. 20–26, 2019.
[14] C. Sun and G. Hu, “Distributed Generalized Nash Equilibrium Seeking of N-Coalition Games with Full and Distributive Constraints,” arXiv preprint arXiv:2109.12515, Sep. 2021.
[15] M. Ye, G. Hu, and S. Xu, “An extremum seeking-based approach for Nash equilibrium seeking in N-cluster noncooperative games,” Automatica, vol. 114, p. 108815, 2020.
[16] Y. Pang and G. Hu, “Nash Equilibrium Seeking in N-Coalition Games via a Gradient-Free Method,” Automatica, vol. 136, p. 110013, 2022.
[17] S. Pu and A. Nedić, “Distributed stochastic gradient tracking methods,” Mathematical Programming, pp. 1–49, 2020.
[18] Y. Pang and G. Hu, “Distributed Nash Equilibrium Seeking with Limited Cost Function Knowledge via A Consensus-Based Gradient-Free Method,” IEEE Transactions on Automatic Control, vol. 66, no. 4, pp. 1832–1839, 2021.
[19] Y. Pang and G. Hu, “A Gradient-Free Distributed Nash Equilibrium Seeking Method with Uncoordinated Step-Sizes,” in 2020 IEEE 59th Conference on Decision and Control (CDC), 2020, pp. 2291–2296.
[20] Y. Nesterov and V. Spokoiny, “Random Gradient-Free Minimization of Convex Functions,” Foundations of Computational Mathematics, vol. 17, no. 2, pp. 527–566, 2017.
[21] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1990.
[22] M. S. Stankovic, K. H. Johansson, and D. M. Stipanovic, “Distributed Seeking of Nash Equilibria With Applications to Mobile Sensor Networks,” IEEE Transactions on Automatic Control, vol. 57, no. 4, pp. 904–919, 2012.

$\displaystyle\mathbb{E}[\|{g}^{i}_{jk}(\mathbf{x}_{t})\|^{2}|\mathcal{H}_{t}]$	$\displaystyle\leq 4(n+4)\|\nabla f^{i}_{j,\mu}(\mathbf{x}_{t})\|^{2}+3(n+4)^{3}\mu^{2}L^{2}$
	$\displaystyle\leq 12(n+4)\|\nabla f^{i}_{j,\mu}(\mathbf{x}_{t})-\nabla f^{i}_{j,\mu}(\bar{\mathbf{y}}_{t})\|^{2}+12(n+4)\|\nabla f^{i}_{j,\mu}(\bar{\mathbf{y}}_{t})-\nabla f^{i}_{j,\mu}(\mathbf{x}^{*}_{\mu})\|^{2}$
	$\displaystyle\quad+12(n+4)\|\nabla f^{i}_{j,\mu}(\mathbf{x}^{*}_{\mu})\|^{2}+3(n+4)^{3}\mu^{2}L^{2}$
	$\displaystyle\leq 12(n+4)L^{2}\|\mathbf{x}_{t}-\bar{\mathbf{y}}_{t}\|^{2}+12(n+4)L^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}$
	$\displaystyle\quad+12(n+4)G^{2}+3(n+4)^{3}\mu^{2}L^{2}$
	$\displaystyle\leq 12(n+4)L^{2}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$
	$\displaystyle\quad+12(n+4)(L^{2}\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}+G^{2})+3(n+4)^{3}\mu^{2}L^{2},$	(6)

	$\displaystyle\|\mathbf{y}^{i}_{k,t+1}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t+1}\|^{2}$	$\displaystyle=\|\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\alpha^{i}\varphi^{i}_{k,t}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top}(\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\alpha^{i}\varphi^{i}_{k,t})\|^{2}$
		$\displaystyle\leq\|\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}+\|\alpha^{i}(I_{n_{i}}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top})\varphi^{i}_{k,t}\|^{2}$
		$\displaystyle\quad-2\alpha^{i}\langle\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t},(I_{n_{i}}-\frac{1}{n_{i}}\mathbf{1}_{n_{i}}\mathbf{1}_{n_{i}}^{\top})\varphi^{i}_{k,t}\rangle,$

	$\displaystyle\mathbb{E}[\|\mathbf{y}^{i}_{k,t+1}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t+1}\|^{2}|\mathcal{H}_{t}]\leq\sigma_{\mathcal{A}^{i}}^{2}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}+\alpha_{\max}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\quad+\frac{1-\sigma_{\mathcal{A}^{i}}^{2}}{2\sigma_{\mathcal{A}^{i}}^{2}}\mathbb{E}[\|\mathcal{A}^{i}\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]+\frac{2\sigma_{\mathcal{A}^{i}}^{2}}{1-\sigma_{\mathcal{A}^{i}}^{2}}\alpha_{\max}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\leq\frac{1+\sigma_{\mathcal{A}^{i}}^{2}}{2}\mathbb{E}[\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]+\frac{1+\sigma_{\mathcal{A}^{i}}^{2}}{1-\sigma_{\mathcal{A}^{i}}^{2}}\alpha_{\max}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}\|^{2}|\mathcal{H}_{t}]$
	$\displaystyle\quad\leq\frac{1+\bar{\sigma}^{2}}{2}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}+\varsigma\alpha_{\max}^{2}\mathbb{E}[\|\varphi^{i}_{k,t}\|^{2}|\mathcal{H}_{t}].$

	$\displaystyle\mathbb{E}[\|\bar{\mathbf{y}}_{t+1}-\mathbf{x}^{*}_{\mu}\|^{2}|\mathcal{H}_{t}]$	$\displaystyle\leq(1-(\chi-2\sqrt{n}L\epsilon_{\alpha})\bar{\alpha}+12n(n+4)L^{2}\alpha_{\max}^{2})\|\bar{\mathbf{y}}_{t}-\mathbf{x}^{*}_{\mu}\|^{2}$
		$\displaystyle\quad+\bigg{(}12n(n+4)L^{2}\alpha_{\max}^{2}+\frac{n^{2}L^{2}\alpha_{\max}}{\chi}\bigg{)}\sum_{i=1}^{N}\sum_{k=1}^{n_{i}}\|\mathbf{y}^{i}_{k,t}-\mathbf{1}_{n_{i}}\bar{y}^{i}_{k,t}\|^{2}$
		$\displaystyle\quad+12n(n+4)G^{2}\alpha_{\max}^{2}+3n(n+4)^{3}\mu^{2}L^{2}\alpha_{\max}^{2}.$
