
Algorithm-Level Confidentiality for Average Consensus on Time-Varying Directed Graphs

Huan Gao and Yongqiang Wang

This paper has been accepted to IEEE Transactions on Network Science and Engineering as a regular paper. Please cite this paper as: Huan Gao and Yongqiang Wang, “Algorithm-Level Confidentiality for Average Consensus on Time-Varying Directed Graphs,” in IEEE Transactions on Network Science and Engineering, doi: 10.1109/TNSE.2022.3140274. Part of the results was presented at the 2018 IEEE Conference on Communications and Network Security (CNS) [1]. The work was supported in part by the National Science Foundation under Grants ECCS-1912702 and CCF-2106293.

Huan Gao was with the Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, USA. He is now with the School of Automation, Northwestern Polytechnical University, Xi’an 710129, China (email: [email protected]). The work was done while Huan Gao was with Clemson University.

Yongqiang Wang is with the Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, USA (email: [email protected]).
Abstract

Average consensus plays a key role in distributed networks, with applications ranging from time synchronization, information fusion, and load balancing to decentralized control. Existing average consensus algorithms require individual agents to exchange explicit state values with their neighbors, which leads to the undesirable disclosure of sensitive information in the state. In this paper, we propose a novel average consensus algorithm for time-varying directed graphs that can protect the confidentiality of a participating agent against other participating agents. The algorithm injects randomness into interactions to obfuscate information at the algorithm level and can ensure information-theoretic privacy without the assistance of any trusted third party or data aggregator. By leveraging the inherent robustness of consensus dynamics against random variations in interaction, our proposed algorithm also guarantees the accuracy of average consensus. The algorithm is distinctly different from differential-privacy based average consensus approaches, which enable confidentiality by compromising the accuracy of the obtained consensus value. Numerical simulations confirm the effectiveness and efficiency of our proposed approach.

Index Terms:
Average consensus, confidentiality, time-varying directed graphs.

I Introduction

Average consensus is an important tool in distributed computing. For a network of $N$ agents interacting on a graph, average consensus enables the states of all agents to converge to the average of their initial values through local interactions between neighboring agents.

Recently, average consensus has found increased application in load balancing [2, 3], network synchronization [4], distributed information fusion [5, 6], and decentralized control [7, 8]. To ensure that all agents converge to the average of their initial values, conventional average consensus approaches require individual agents to exchange explicit state values with their neighbors. This results in the disclosure of sensitive state information, which is sometimes undesirable in terms of confidentiality. In fact, in many applications such as the smart grid, health-care, or banking networks, confidentiality is crucial for promoting participation in collaboration, since individual agents tend not to trade confidentiality for performance [9, 10, 11]. For instance, a group of people using average consensus to reach a common opinion may want to keep their individual opinions secret [12]. Another typical example is power systems, in which multiple generators have to reach agreement on cost while keeping their individual generation information confidential [13].

To achieve confidentiality in average consensus, results have recently started to emerge. A commonly used confidentiality mechanism is differential privacy from the database literature [14, 15, 16, 17, 18, 19, 20] (and its variants [21, 22]), which injects independent (and hence uncorrelated) noise directly into agents’ states to achieve confidentiality in average consensus. However, the use of independent noise on the states prevents these approaches from converging to the exact average value [23]. To improve consensus accuracy, which is crucial in cyber-physical systems and sensor networks, [24, 25, 26, 27, 28, 29, 30, 31] inject carefully calculated correlated additive noise into agents’ states instead of the independent (and hence uncorrelated) noise used in differential-privacy based approaches. (A similar approach was proposed in [32] to achieve maximum consensus.) However, these prior works only consider average consensus under balanced and static network topologies. Different from injecting noise into agents’ states in the aforementioned approaches, [33] employed carefully designed mask maps to protect the actual states. Observability-based approaches have also been reported to protect the confidentiality of multi-agent consensus [34, 35, 36]. The idea is to design the interaction topology so that the observability from a compromised agent is minimized, which amounts to minimizing the compromised agent’s ability to infer the initial states of other agents. Recently, encryption-based approaches have been proposed to protect confidentiality by encrypting exchanged messages with the assistance of additive homomorphic encryption [37, 38, 39, 40], at the price of increased computation and communication overhead. Another confidentiality approach was proposed in [41], where each agent’s confidentiality is protected by decomposing its state into two sub-states.
However, [41] relies on undirected interactions and is inapplicable to the time-varying directed graphs considered in this paper.

This paper addresses the confidentiality of average consensus under time-varying directed graphs that are not necessarily balanced. Since push-sum based average consensus approaches do not require balanced topologies, we build our confidential average consensus algorithm on the push-sum approach. More specifically, to protect the confidentiality of the initial states of participating agents, in the first several iterations we let agents send random values instead of their actual states to obfuscate their initial values. Of course, to guarantee the accuracy of average consensus, we have to judiciously design the state-update rule so that the randomness added in the first several iterations does not affect the final convergence result. Different from approaches injecting correlated additive noise directly into agents’ states, our approach adds independent (and hence uncorrelated-in-time) randomness directly to the average consensus dynamics, which makes it applicable to time-varying directed graphs. Compared with our prior work in [40], which employs homomorphic encryption to preserve confidentiality, and [41], which protects each agent’s confidentiality by decomposing its state into two sub-states, this paper proposes a different approach that enables confidentiality by judiciously adding randomness to the interaction dynamics. More importantly, both [40] and [41] rely on undirected interactions and hence are inapplicable to the time-varying directed graphs considered in this paper. Some of the results here were presented at the 2018 IEEE Conference on Communications and Network Security (CNS) [1].
Compared with the conference version, this journal version has the following significant differences: 1) it extends the results for constant directed graphs in [1] to time-varying directed graphs; 2) it provides a formal and rigorous analysis of the convergence rate that does not exist in the conference version; 3) it allows multiple adversaries to collude to infer the sensitive value of a target agent, which is not addressed in the conference version; and 4) it revises and enhances the proposed confidential average consensus algorithm to guarantee information-theoretic privacy, which is stronger than the confidentiality achieved in the conference version.

II Preliminaries and Problem Formulation

II-A Graph Representation

We represent a network of $N$ agents as a sequence of time-varying directed graphs $\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\}$, where $\mathcal{V}=\{1,2,\ldots,N\}$ is the set of agents and $k=0,1,\ldots$ is the time index. $\mathcal{E}(k)\subset\mathcal{V}\times\mathcal{V}$ is the edge set at time $k$, whose elements are such that $(i,\,j)\in\mathcal{E}(k)$ holds if and only if there exists a directed edge from agent $j$ to agent $i$ at time $k$, i.e., agent $j$ can send messages to agent $i$ at time $k$. For notational convenience, we assume that there are no self edges, i.e., $(i,\,i)\notin\mathcal{E}(k)$ for all $k$ and $i\in\mathcal{V}$. At time $k$, each edge $(i,\,j)\in\mathcal{E}(k)$ has an associated weight $p_{ij}(k)>0$. The out-neighbor set of agent $i$ at time $k$, which represents the set of agents that can receive messages from agent $i$ at time $k$, is denoted $\mathcal{N}_{i}^{out}(k)=\{j\in\mathcal{V}\,|\,(j,\,i)\in\mathcal{E}(k)\}$. Similarly, the in-neighbor set of agent $i$ at time $k$, which represents the set of agents that can send messages to agent $i$ at time $k$, is denoted $\mathcal{N}_{i}^{in}(k)=\{j\in\mathcal{V}\,|\,(i,\,j)\in\mathcal{E}(k)\}$. From the above definitions, $i\in\mathcal{N}_{j}^{out}(k)$ and $j\in\mathcal{N}_{i}^{in}(k)$ are equivalent. Agent $i$'s out-degree at time instant $k$ is $D_{i}^{out}(k)=|\mathcal{N}_{i}^{out}(k)|$ and its in-degree is $D_{i}^{in}(k)=|\mathcal{N}_{i}^{in}(k)|$, where $|\mathcal{S}|$ denotes the cardinality of the set $\mathcal{S}$.
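The edge and neighbor conventions above can be made concrete with a short sketch (names and the 3-agent example are ours, not from the paper), where an edge $(i,\,j)$ means agent $j$ can send messages to agent $i$:

```python
# Illustrative sketch: build in-/out-neighbor sets and degrees from a
# directed edge list, where an edge (i, j) means "j can send messages to i".
from collections import defaultdict

def neighbor_sets(edges):
    """Return (in_neighbors, out_neighbors) dicts from an edge list."""
    n_in = defaultdict(set)   # N_i^in:  agents that can send to i
    n_out = defaultdict(set)  # N_i^out: agents that can receive from i
    for (i, j) in edges:      # directed edge from agent j to agent i
        n_in[i].add(j)
        n_out[j].add(i)
    return n_in, n_out

# A 3-agent directed cycle 1 -> 2 -> 3 -> 1, written as (receiver, sender):
edges = [(2, 1), (3, 2), (1, 3)]
n_in, n_out = neighbor_sets(edges)
assert n_out[1] == {2} and n_in[2] == {1}          # i in N_j^out  <=>  j in N_i^in
assert all(len(n_out[i]) == 1 for i in (1, 2, 3))  # each out-degree D_i^out = 1
```

The equivalence between $i\in\mathcal{N}_{j}^{out}(k)$ and $j\in\mathcal{N}_{i}^{in}(k)$ is visible in the two symmetric updates inside the loop.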

At iteration $k$, the incidence matrix $\mathbf{C}(k)=[c_{il}(k)]_{N\times E(k)}$ for the graph $\mathcal{G}(k)=(\mathcal{V},\mathcal{E}(k))$ is defined as

\[
c_{il}(k)=\begin{cases}
1 & \text{if the } l\text{-th edge in } \mathcal{E}(k) \text{ is } (i,\,j)\\
-1 & \text{if the } l\text{-th edge in } \mathcal{E}(k) \text{ is } (j,\,i)\\
0 & \text{otherwise}
\end{cases}
\tag{1}
\]

where $E(k)=|\mathcal{E}(k)|$ is the number of edges in $\mathcal{E}(k)$. Note that the $l$-th column of $\mathbf{C}(k)$ corresponds to the $l$-th edge in $\mathcal{E}(k)$, and the sum of each column of $\mathbf{C}(k)$ is 0, i.e., $\mathbf{1}^{T}\mathbf{C}(k)=\mathbf{0}^{T}$.
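The column-sum property $\mathbf{1}^{T}\mathbf{C}(k)=\mathbf{0}^{T}$ is easy to verify numerically; here is a minimal check on our own 3-agent cycle example (not taken from the paper):

```python
# Each column of the incidence matrix C(k) has one +1 (receiving endpoint)
# and one -1 (sending endpoint), so every column sums to zero: 1^T C = 0^T.
import numpy as np

# Edges of E(k) for the cycle 1 -> 2 -> 3 -> 1, listed as (i, j) with the
# convention that (i, j) is a directed edge from agent j to agent i.
edges = [(2, 1), (3, 2), (1, 3)]
N = 3
C = np.zeros((N, len(edges)))
for l, (i, j) in enumerate(edges):
    C[i - 1, l] = 1    # +1 for the receiving endpoint i
    C[j - 1, l] = -1   # -1 for the sending endpoint j

assert np.all(np.ones(N) @ C == 0)  # 1^T C(k) = 0^T
```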

For a sequence of time-varying directed graphs $\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\}$, we define $\mathcal{E}_{\infty}$ as the set of directed edges $(i,\,j)$ that exist at infinitely many time instants, i.e.,

\[
\mathcal{E}_{\infty}=\big\{(i,\,j)\,\big|\,(i,\,j)\in\mathcal{E}(k)\ \text{for infinitely many indices}\ k\big\}
\tag{2}
\]

We focus on time-varying directed graphs that satisfy the following assumptions:

Assumption 1

For a sequence of time-varying directed graphs $\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\}$ and any $i,j\in\mathcal{V}$ with $i\neq j$, there exists at least one directed path from $i$ to $j$ in $(\mathcal{V},\,\mathcal{E}_{\infty})$, i.e., $(\mathcal{V},\,\mathcal{E}_{\infty})$ is strongly connected.

Assumption 2

For a sequence of time-varying directed graphs $\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\}$, there exists an integer $T\geq 1$ such that for every $(i,\,j)\in\mathcal{E}_{\infty}$, agent $j$ directly communicates with agent $i$ at least once in every $T$ consecutive time instants. $T$ is called the intercommunication interval bound.

Assumption 3

We assume that each agent $i$ has access to its out-degree $D_{i}^{out}(k)$ at each iteration $k$.

Remark 1

Assumption 3 is widely used in the existing literature on time-varying directed graphs, such as [42, 43, 44]. In fact, in many directed graphs it is feasible for a node to know its out-neighbors. For example, in many safety-critical systems such as industrial control systems, data exchange occurs in a directed way due to unidirectional gateways (a.k.a. data diodes), whereas control messages (a special type of message used to configure network connections) can be exchanged bidirectionally to establish connections [45].

II-B The Conventional Push-Sum

The conventional push-sum considers $N$ agents interacting on a constant directed graph $\mathcal{G}=(\mathcal{V},\,\mathcal{E})$, with each agent having an initial state $x_{i}^{0}$ ($i=1,2,\ldots,N$) [46, 47]. Denote the average of all initial states as $\bar{x}^{0}=\sum_{j=1}^{N}x_{j}^{0}/N$. The conventional push-sum algorithm conducts two iterative computations simultaneously and allows each agent to obtain the exact average of the initial values $\bar{x}^{0}$ asymptotically. This mechanism is summarized in Algorithm 0 below:

  Algorithm 0: The conventional push-sum algorithm

 

1. $N$ agents interact on a constant directed graph $\mathcal{G}=(\mathcal{V},\,\mathcal{E})$. Each agent $i$ is initialized with $s_{i}(0)=x_{i}^{0}$, $w_{i}(0)=1$, and $\pi_{i}(0)=s_{i}(0)/w_{i}(0)$. The weight $p_{ij}$ associated with the edge $(i,\,j)\in\mathcal{E}$ satisfies $p_{ij}\in(0,1)$ if $j\in\mathcal{N}_{i}^{in}\cup\{i\}$ and $p_{ij}=0$ otherwise. For any given $j=1,2,\ldots,N$, the weights satisfy $\sum_{i=1}^{N}p_{ij}=1$.

2. At iteration step $k$:

(a) Agent $i$ computes $p_{ji}s_{i}(k)$ and $p_{ji}w_{i}(k)$ and sends both values to all of its out-neighbors $j\in\mathcal{N}_{i}^{out}$.

(b) After receiving the values $p_{ij}s_{j}(k)$ and $p_{ij}w_{j}(k)$ from all of its in-neighbors $j\in\mathcal{N}_{i}^{in}$, agent $i$ updates $s_{i}$ and $w_{i}$ as follows:

\[
\begin{cases}
s_{i}(k+1)=\sum_{j\in\mathcal{N}_{i}^{in}\cup\{i\}}p_{ij}s_{j}(k)\\
w_{i}(k+1)=\sum_{j\in\mathcal{N}_{i}^{in}\cup\{i\}}p_{ij}w_{j}(k)
\end{cases}
\tag{3}
\]
(c) Agent $i$ uses the ratio $\pi_{i}(k+1)=s_{i}(k+1)/w_{i}(k+1)$ to estimate the average value $\bar{x}^{0}=\sum_{j=1}^{N}x_{j}^{0}/N$.

 

For notational simplicity, we rewrite (3) in the more compact form:

\[
\begin{cases}
\mathbf{s}(k+1)=\mathbf{P}\mathbf{s}(k)\\
\mathbf{w}(k+1)=\mathbf{P}\mathbf{w}(k)
\end{cases}
\tag{4}
\]

where $\mathbf{s}(k)=[s_{1}(k),s_{2}(k),\ldots,s_{N}(k)]^{T}$, $\mathbf{w}(k)=[w_{1}(k),w_{2}(k),\ldots,w_{N}(k)]^{T}$, and $\mathbf{P}=[p_{ij}]$. From Algorithm 0, we have $\mathbf{s}(0)=[x_{1}^{0},x_{2}^{0},\ldots,x_{N}^{0}]^{T}$ and $\mathbf{w}(0)=\mathbf{1}$. The matrix $\mathbf{P}$ is column-stochastic, i.e., $\sum_{i=1}^{N}p_{ij}=1$ holds for $j=1,2,\ldots,N$.

At iteration step $k$, each agent computes the ratio $\pi_{i}(k+1)=s_{i}(k+1)/w_{i}(k+1)$ to estimate the average value $\bar{x}^{0}=\sum_{j=1}^{N}x_{j}^{0}/N$. Since $\mathcal{G}$ is assumed to be a strongly connected directed graph, $\mathbf{P}^{k}$ converges to a rank-1 matrix exponentially fast [48, 49]. Defining $\mathbf{P}^{\infty}$ as the limit of $\mathbf{P}^{k}$ as $k\rightarrow\infty$, we can write $\mathbf{P}^{\infty}=\mathbf{v}\mathbf{1}^{T}$ with $\mathbf{v}=[v_{1},v_{2},\ldots,v_{N}]^{T}$. Using the facts $\mathbf{s}(k)=\mathbf{P}^{k}\mathbf{s}(0)$ and $\mathbf{w}(k)=\mathbf{P}^{k}\mathbf{w}(0)$, we further have [50]:

\[
\pi_{i}(\infty)=\frac{s_{i}(\infty)}{w_{i}(\infty)}=\frac{[\mathbf{P}^{\infty}\mathbf{s}(0)]_{i}}{[\mathbf{P}^{\infty}\mathbf{w}(0)]_{i}}=\frac{v_{i}\sum_{j=1}^{N}s_{j}(0)}{v_{i}\sum_{j=1}^{N}w_{j}(0)}=\bar{x}^{0}
\tag{5}
\]

where $[\mathbf{P}^{\infty}\mathbf{s}(0)]_{i}$ and $[\mathbf{P}^{\infty}\mathbf{w}(0)]_{i}$ denote the $i$-th elements of $\mathbf{P}^{\infty}\mathbf{s}(0)$ and $\mathbf{P}^{\infty}\mathbf{w}(0)$, respectively. Hence, all estimates $\pi_{1}(k),\pi_{2}(k),\ldots,\pi_{N}(k)$ asymptotically converge to the average $\bar{x}^{0}=\sum_{j=1}^{N}x_{j}^{0}/N$.
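The convergence of the ratios $\pi_{i}(k)$ in (5) can be reproduced with a minimal sketch of Algorithm 0; the graph (a directed ring with self-loops) and the weights are our own toy choices, not from the paper:

```python
# Minimal sketch of Algorithm 0 (conventional push-sum) on a fixed,
# strongly connected directed graph.
import numpy as np

rng = np.random.default_rng(0)
N = 4
# Column-stochastic weight matrix P for a directed ring with self-loops:
# agent j splits its mass between itself and its single out-neighbor.
P = np.zeros((N, N))
for j in range(N):
    P[j, j] = 0.5              # p_jj: weight kept by agent j
    P[(j + 1) % N, j] = 0.5    # p_ij: weight sent to out-neighbor i
assert np.allclose(P.sum(axis=0), 1.0)  # column stochastic

x0 = rng.uniform(0.0, 10.0, size=N)     # initial states x_i^0
s, w = x0.copy(), np.ones(N)            # s(0) = x^0, w(0) = 1
for _ in range(200):                    # iterate s <- P s, w <- P w
    s, w = P @ s, P @ w

pi = s / w                              # each ratio converges to the average
assert np.allclose(pi, x0.mean(), atol=1e-9)
```

Note that neither $\mathbf{s}(k)$ nor $\mathbf{w}(k)$ alone converges to the average; only the ratio does, exactly as in (5).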

II-C Problem Formulation

In this paper, we will address average consensus under time-varying directed graphs while protecting the confidentiality of participating agents against adversaries. To this end, we first present some assumptions and definitions.

Assumption 4

We assume that all agents’ initial states $x_{i}^{0}$ are bounded. Without loss of generality, the lower and upper bounds are denoted $a$ and $b$, respectively. Both $a$ and $b$ are assumed known to all agents.

Remark 2

It is worth noting that although the bounds $a$ and $b$ are assumed known to all agents, this does not mean that the minimum and maximum of all agents’ initial states are known to all agents. In fact, $a$ (resp. $b$) can be arbitrarily small (resp. large) compared with the actual minimum (resp. maximum) of the agents’ initial states.

Definition 1

We define an honest-but-curious adversary as an agent that follows all protocol steps correctly but collects received messages in an attempt to infer the initial values of other participating agents.

Assumption 5

We assume that agents can collude, i.e., a set of honest-but-curious agents $\mathcal{A}$ can share information with each other to infer the initial value $x_{i}^{0}$ of a target agent $i\notin\mathcal{A}$.

Definition 2

We say that the confidentiality (privacy) of the initial value $x_{i}^{0}$ of agent $i$ is preserved if $x_{i}^{0}$ is indistinguishable from the viewpoint of the honest-but-curious adversaries $\mathcal{A}$. By “indistinguishable,” we mean that the probability distribution of the information set accessible to $\mathcal{A}$ does not change when agent $i$’s initial state $x_{i}^{0}$ is altered to any $\tilde{x}_{i}^{0}\neq x_{i}^{0}$, under the constraint that the sum of the initial states of all agents not in $\mathcal{A}$ (i.e., $\sum_{j\in\mathcal{V}\setminus\mathcal{A}}x_{j}^{0}$) is unchanged.

Our definition of confidentiality requires perfect indistinguishability of a target agent’s different initial states from the viewpoint of the honest-but-curious adversaries $\mathcal{A}$, and is therefore more stringent than the confidentiality definition in [31, 32, 51, 52, 53], which defines confidentiality as the inability of an adversary to uniquely determine the sensitive value.

We next show that the conventional push-sum is not confidential. From (3) and (4), an honest-but-curious agent $i$ receives $p_{ij}s_{j}(0)$ and $p_{ij}w_{j}(0)$ from its in-neighbor $j$ after the first iteration step $k=0$. Agent $i$ is then able to uniquely determine $x_{j}^{0}$ as $x_{j}^{0}=s_{j}(0)=\frac{p_{ij}s_{j}(0)}{p_{ij}w_{j}(0)}$, using the fact $w_{j}(0)=1$. Therefore, an honest-but-curious agent can always infer the initial values of all its in-neighbors, and hence the conventional push-sum algorithm cannot provide confidentiality against honest-but-curious adversaries. It is worth noting that, by a similar argument, the conventional push-sum is not confidential even when the weights are allowed to be time-varying (e.g., [47]).
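The breach described above can be stated in a few lines (the numeric values are our own toy choices): from the single pair of messages received at $k=0$, a curious in-neighbor recovers $x_{j}^{0}$ exactly, whatever the unknown weight $p_{ij}$ is, because it cancels in the ratio.

```python
# Sketch of the confidentiality breach in conventional push-sum.
x_j0 = 7.3           # agent j's sensitive initial state, s_j(0) = x_j^0
w_j0 = 1.0           # conventional push-sum initializes w_j(0) = 1
p_ij = 0.42          # weight chosen by j; unknown to the adversary

msg_s = p_ij * x_j0  # what honest-but-curious agent i receives at k = 0
msg_w = p_ij * w_j0

inferred = msg_s / msg_w  # p_ij cancels: equals s_j(0)/w_j(0) = x_j^0
assert abs(inferred - x_j0) < 1e-12
```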

III The Confidentiality Algorithm and Performance Analysis

In this section, we propose a confidential average consensus algorithm for time-varying directed graphs and then provide a rigorous analysis of its convergence rate and the strength of confidentiality it enables.

III-A Confidential Average Consensus Algorithm

The analysis above reveals that using the same weight $p_{ij}$ for both $p_{ij}s_{j}(0)$ and $p_{ij}w_{j}(0)$ discloses the initial state value. Motivated by this observation and the work in [29], we introduce a novel confidential average consensus algorithm which injects randomness into the interaction dynamics in iterations $k=0,\ldots,K$. Here $K$ is a non-negative integer known to every agent; its influence is discussed in detail in Remark 11 and Remark 12.

  Algorithm 1: Confidential average consensus algorithm

 

1. $N$ agents interact on a sequence of time-varying directed graphs $\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\}$. Each agent $i$ is initialized with $s_{i}(0)=\frac{1}{N^{2}}+\frac{(N-2)(x_{i}^{0}-a)}{(b-a)N^{2}}\in[\frac{1}{N^{2}},\,\frac{N-1}{N^{2}}]$, $w_{i}(0)=1$, and $\pi_{i}(0)=\frac{b-a}{N-2}[N\cdot{\rm frac}(\frac{Ns_{i}(0)}{w_{i}(0)})-1]+a$, where ${\rm frac}(x)=x-\lfloor x\rfloor$ denotes the fractional part of a real number $x$ (here $\lfloor x\rfloor$ is the largest integer not greater than $x$).

2. At iteration step $k$:

(a) Agent $i$ generates a set of random weights $\{p_{ji}(k)\in(\varepsilon,\,1)\,|\,j\in\mathcal{N}_{i}^{out}(k)\cup\{i\}\}$ whose sum equals 1, and sets $\Delta w_{ji}(k)=p_{ji}(k)w_{i}(k)$ for $j\in\mathcal{N}_{i}^{out}(k)\cup\{i\}$.

(b) If $k\leq K$, agent $i$ independently generates uniformly distributed values $\Delta s_{ji}(k)\in[0,\,1)$ for its out-neighbors $j\in\mathcal{N}_{i}^{out}(k)$ and sets $\Delta s_{ii}(k)={\rm frac}\big(s_{i}(k)-\sum_{j\in\mathcal{N}_{i}^{out}(k)}\Delta s_{ji}(k)\big)$; otherwise, agent $i$ sets $\Delta s_{ji}(k)=p_{ji}(k)s_{i}(k)$ for $j\in\mathcal{N}_{i}^{out}(k)\cup\{i\}$.

(c) Agent $i$ sends $\Delta s_{ji}(k)$ and $\Delta w_{ji}(k)$ to its out-neighbors $j\in\mathcal{N}_{i}^{out}(k)$.

(d) After receiving $\Delta s_{ij}(k)$ and $\Delta w_{ij}(k)$ from its in-neighbors $j\in\mathcal{N}_{i}^{in}(k)$, agent $i$ updates $s_{i}$ and $w_{i}$ as

\[
s_{i}(k+1)=\begin{cases}
{\rm frac}\Big(\sum_{j\in\mathcal{N}_{i}^{in}(k)\cup\{i\}}\Delta s_{ij}(k)\Big) & \text{for}\ k\leq K\\
\sum_{j\in\mathcal{N}_{i}^{in}(k)\cup\{i\}}\Delta s_{ij}(k) & \text{for}\ k\geq K+1
\end{cases}
\tag{6}
\]

and

\[
w_{i}(k+1)=\sum_{j\in\mathcal{N}_{i}^{in}(k)\cup\{i\}}\Delta w_{ij}(k)\quad\text{for}\ k\geq 0
\tag{7}
\]

respectively.

(e) Agent $i$ uses the ratio $\pi_{i}(k+1)=\frac{b-a}{N-2}[N\cdot{\rm frac}(\frac{Ns_{i}(k+1)}{w_{i}(k+1)})-1]+a$ to estimate the average value $\bar{x}^{0}=\sum_{j=1}^{N}x_{j}^{0}/N$.

 

Remark 3

Compared with the conventional confidentiality-violating push-sum algorithm, which broadcasts messages, Algorithm 1 requires agent $i$ to send different random numbers to different out-neighbors in iterations $k\leq K$. This is the price of obtaining confidentiality without losing accuracy in the time-varying directed topology case.
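Steps 1 and (a)-(e) of Algorithm 1 can be exercised end-to-end in a short simulation. This is a sketch under our own assumptions (a fixed directed ring as a special case of the time-varying graphs, and toy values of $N$, $a$, $b$, $K$, $\varepsilon$), not the paper's experimental setup:

```python
# Runnable sketch of Algorithm 1 on a fixed strongly connected directed ring.
import numpy as np

rng = np.random.default_rng(1)
N, a, b, K, eps = 4, 0.0, 10.0, 5, 0.05
out_nbrs = {i: [(i + 1) % N] for i in range(N)}   # ring i -> i+1 (mod N)
in_nbrs = {i: [(i - 1) % N] for i in range(N)}

frac = lambda x: x - np.floor(x)
x0 = rng.uniform(a, b, size=N)
s = 1 / N**2 + (N - 2) * (x0 - a) / ((b - a) * N**2)  # s_i(0), step 1
w = np.ones(N)

def rand_weights(d):
    """d random weights, each in (eps, 1), summing to 1 (step (a))."""
    r = rng.random(d)
    return eps + (1 - d * eps) * r / r.sum()

for k in range(300):
    ds = np.zeros((N, N))   # ds[j, i] = Delta s_{ji}(k), sent from i to j
    dw = np.zeros((N, N))
    for i in range(N):
        group = out_nbrs[i] + [i]
        p = rand_weights(len(group))
        for j, pj in zip(group, p):
            dw[j, i] = pj * w[i]                  # step (a)
        if k <= K:                                # step (b): obfuscation
            for j in out_nbrs[i]:
                ds[j, i] = rng.random()           # uniform noise on [0, 1)
            ds[i, i] = frac(s[i] - sum(ds[j, i] for j in out_nbrs[i]))
        else:                                     # step (b): push-sum
            for j, pj in zip(group, p):
                ds[j, i] = pj * s[i]
    s_new = np.array([ds[i, in_nbrs[i] + [i]].sum() for i in range(N)])
    w = np.array([dw[i, in_nbrs[i] + [i]].sum() for i in range(N)])
    s = frac(s_new) if k <= K else s_new          # step (d), eqs. (6)-(7)

# Step (e): map the ratio back to the original state range.
pi = (b - a) / (N - 2) * (N * frac(N * s / w) - 1) + a
assert np.allclose(pi, x0.mean(), atol=1e-6)      # exact average recovered
```

Despite the uniform noise sent during $k\leq K$, the final estimates recover the exact average, illustrating how the ${\rm frac}$-based update preserves the quantity the $\pi$-map needs.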

Remark 4

The way randomness is injected into $\Delta s_{ji}(k)$ differs between iterations $k\leq K$ and $k>K$. In fact, in iterations $k\leq K$, $\Delta s_{ji}(k)$ can be nonzero even when $s_{i}(k)$ is zero. This is crucial for enabling strong confidentiality, since receiving a $\Delta s_{ji}(k)$ of value zero does not allow the recipient to infer information about $s_{i}(k)$.

Remark 5

In Algorithm 1, the coupling weights are randomly chosen at every iteration. Compared with the commonly used deterministic setting in which the weights are set as $p_{ji}(k)=1/(D_{i}^{out}(k)+1)$ for all $j\in\mathcal{N}_{i}^{out}(k)\cup\{i\}$, our setting is more general, since it includes the deterministic setting as a special case obtained by fixing $p_{ji}(k)$ to deterministic values. Furthermore, the random weights beyond step $K$ provide additional confidentiality protection for the intermediate states $s_{i}(k)$ after iteration $K$. Given $\Delta s_{ji}(k)=p_{ji}(k)s_{i}(k)$ for $k\geq K+1$, using random weights $p_{ji}(k)$ makes the intermediate state $s_{i}(k)$ more difficult to infer than using deterministic weights.

Setting $\Delta s_{ji}(k)$, $\Delta w_{ji}(k)$, and $p_{ji}(k)$ to 0 for $j\notin\mathcal{N}_{i}^{out}(k)\cup\{i\}$, we can rewrite the update rules of $\mathbf{s}(k)$ for $k\geq K+1$ and $\mathbf{w}(k)$ for $k\geq 0$ as

\[
\begin{cases}
s_{i}(k+1)=\sum_{j=1}^{N}\Delta s_{ij}(k)=\sum_{j=1}^{N}p_{ij}(k)s_{j}(k) & \text{for}\ k\geq K+1\\
w_{i}(k+1)=\sum_{j=1}^{N}\Delta w_{ij}(k)=\sum_{j=1}^{N}p_{ij}(k)w_{j}(k) & \text{for}\ k\geq 0
\end{cases}
\tag{8}
\]

Denoting $\mathbf{s}(k)=[s_{1}(k)\,\cdots\,s_{N}(k)]^{T}$, $\mathbf{w}(k)=[w_{1}(k)\,\cdots\,w_{N}(k)]^{T}$, and $\mathbf{P}(k)=[p_{ij}(k)]_{N\times N}$, we can further rewrite (8) in matrix form:

\[
\begin{cases}
\mathbf{s}(k+1)=\mathbf{P}(k)\mathbf{s}(k) & \text{for}\ k\geq K+1\\
\mathbf{w}(k+1)=\mathbf{P}(k)\mathbf{w}(k) & \text{for}\ k\geq 0
\end{cases}
\tag{9}
\]

For iteration $k=0$, we have $\mathbf{s}(0)=[x_{1}^{0},x_{2}^{0},\ldots,x_{N}^{0}]^{T}$ and $\mathbf{w}(0)=\mathbf{1}$. From Algorithm 1, $\mathbf{P}(k)$ in (9) is time-varying and column-stochastic for $k\geq 0$.

Defining the transition matrix

\[
\mathbf{\Phi}(k:t)=\mathbf{P}(k)\cdots\mathbf{P}(t)
\tag{10}
\]

for all $k$ and $t$ with $k\geq t$, where $\mathbf{\Phi}(k:k)=\mathbf{P}(k)$, we can rewrite (9) as

\[
\begin{cases}
\mathbf{s}(k+1)=\mathbf{\Phi}(k:K+1)\mathbf{s}(K+1) & \text{for}\ k\geq K+1\\
\mathbf{w}(k+1)=\mathbf{\Phi}(k:0)\mathbf{w}(0) & \text{for}\ k\geq 0
\end{cases}
\tag{11}
\]

III-B Convergence Analysis

Next we prove that Algorithm 1 guarantees that the estimates of all agents converge to the exact average of the initial values. We also analyze the rate of convergence of Algorithm 1. Using the convergence definition in [43] and [54], we say the rate of convergence is at least $\gamma\in(0,\,1)$ if there exists a positive constant $C$ such that $\|\boldsymbol{\pi}(k)-\bar{x}^{0}\mathbf{1}\|\leq C\gamma^{k}$ holds for all $k$, where $\boldsymbol{\pi}(k)=[\pi_{1}(k),\ldots,\pi_{N}(k)]^{T}$ and $\bar{x}^{0}=\sum_{j=1}^{N}x_{j}^{0}/N$ is the average value. Note that under this definition, a smaller $\gamma$ corresponds to faster convergence. To analyze the convergence rate of Algorithm 1, we first introduce Lemma 1 below:

Lemma 1

For a network of $N$ agents represented by a sequence of time-varying directed graphs $\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\}$ satisfying Assumptions 1, 2, and 3, under Algorithm 1 each agent $i$ has $w_{i}(k)\geq\varepsilon^{T(N-1)}$ for $k\geq 1$, where $T$ is defined in Assumption 2.

Proof: For $k\geq 1$, from (11) we have

\[
\mathbf{w}(k)=\mathbf{\Phi}(k-1:0)\mathbf{1}
\tag{12}
\]

Define $\delta(k)$ as

\[
\delta(k)\triangleq\min_{1\leq i\leq N}{w}_{i}(k)=\min_{1\leq i\leq N}[\mathbf{\Phi}(k-1:0)\,\mathbf{1}]_{i}
\tag{13}
\]

for $k\geq 1$. To prove $w_{i}(k)\geq\varepsilon^{T(N-1)}$ for $k\geq 1$, it suffices to prove $\delta(k)\geq\varepsilon^{T(N-1)}$ for $k\geq 1$. We divide the proof into two parts: $1\leq k\leq T(N-1)$ and $k\geq T(N-1)+1$.

Part 1: $\delta(k)\geq\varepsilon^{T(N-1)}$ for $1\leq k\leq T(N-1)$. One can verify that the following relationship holds:

\[
[\mathbf{\Phi}(k-1:0)]_{ii}=[\mathbf{P}(k-1)\cdots\mathbf{P}(0)]_{ii}\geq[\mathbf{P}(k-1)]_{ii}\,[\mathbf{P}(k-2)\cdots\mathbf{P}(0)]_{ii}\geq\varepsilon\,[\mathbf{\Phi}(k-2:0)]_{ii}
\tag{14}
\]

Given $[\mathbf{\Phi}(0:0)]_{ii}=[\mathbf{P}(0)]_{ii}\geq\varepsilon$, one can obtain $[\mathbf{\Phi}(k-1:0)]_{ii}\geq\varepsilon^{k}$. Therefore,

\[
[\mathbf{\Phi}(k-1:0)\,\mathbf{1}]_{i}\geq[\mathbf{\Phi}(k-1:0)]_{ii}\geq\varepsilon^{k}\geq\varepsilon^{T(N-1)}
\tag{15}
\]

holds for $i=1,\ldots,N$ and $1\leq k\leq T(N-1)$, implying $\delta(k)\geq\varepsilon^{T(N-1)}$ for $1\leq k\leq T(N-1)$.

Part 2: $\delta(k)\geq\varepsilon^{T(N-1)}$ for $k\geq T(N-1)+1$. Under Assumptions 1 and 2 and the requirements on the weights $p_{ij}(k)$ in Algorithm 1, and following the arguments of Lemma 2 in [55], we obtain

\[
[\mathbf{\Phi}(k-1:k-T(N-1))]_{ij}\geq\varepsilon^{T(N-1)}
\tag{16}
\]

for $1\leq i,\,j\leq N$. Since $k\geq T(N-1)+1$ holds and $\mathbf{P}(k)$ is a column-stochastic matrix,

\[
\mathbf{\Phi}(k-T(N-1)-1:0)=\mathbf{P}(k-T(N-1)-1)\cdots\mathbf{P}(0)
\tag{17}
\]

should also be a column-stochastic matrix. Further using the fact $\mathbf{\Phi}(k-1:0)=\mathbf{\Phi}(k-1:k-T(N-1))\,\mathbf{\Phi}(k-T(N-1)-1:0)$ leads to $[\mathbf{\Phi}(k-1:0)]_{ij}\geq\varepsilon^{T(N-1)}$ for $1\leq i,\,j\leq N$. Therefore, we have

\[
[\mathbf{\Phi}(k-1:0)\,\mathbf{1}]_{i}\geq N\varepsilon^{T(N-1)}\geq\varepsilon^{T(N-1)}
\tag{18}
\]

for $i=1,\ldots,N$, meaning $\delta(k)\geq\varepsilon^{T(N-1)}$ for $k\geq T(N-1)+1$.

Combining the two parts, $\delta(k)\geq\varepsilon^{T(N-1)}$ holds for all $k\geq 1$, and hence $w_{i}(k)\geq\varepsilon^{T(N-1)}$ for $k\geq 1$. $\blacksquare$
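Lemma 1 can be sanity-checked numerically. The sketch below uses our own toy schedule (two alternating edge sets whose union is a strongly connected directed ring, so Assumptions 1 and 2 hold with $T=2$) and random column-stochastic weight matrices whose used entries are all at least $\varepsilon$:

```python
# Numeric sanity check of Lemma 1: w_i(k) >= eps^{T(N-1)} for all k >= 1.
import numpy as np

rng = np.random.default_rng(2)
N, T, eps = 4, 2, 0.1
# Two alternating edge sets (receiver, sender); their union is the directed
# ring 0 -> 1 -> 2 -> 3 -> 0, so every edge recurs within T = 2 steps.
schedule = [[(1, 0), (3, 2)], [(2, 1), (0, 3)]]

def column_stochastic_P(edges):
    """Random P(k): column j spreads mass over {out-neighbors of j, j},
    with every used weight at least eps (as required by Algorithm 1)."""
    P = np.zeros((N, N))
    for j in range(N):
        group = [i for (i, jj) in edges if jj == j] + [j]
        r = rng.random(len(group))
        p = eps + (1 - len(group) * eps) * r / r.sum()
        for i, pi in zip(group, p):
            P[i, j] = pi
    return P

w = np.ones(N)
bound = eps ** (T * (N - 1))          # the lower bound from Lemma 1
for k in range(200):
    w = column_stochastic_P(schedule[k % T]) @ w
    assert w.min() >= bound           # Lemma 1: w_i(k) >= eps^{T(N-1)}
assert np.isclose(w.sum(), N)         # column stochasticity: 1^T w(k) = N
```

The final assertion previews (22): the total mass $\mathbf{1}^{T}\mathbf{w}(k)$ is conserved at $N$ even though individual $w_{i}(k)$ fluctuate.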

Theorem 1

For a network of $N$ agents represented by a sequence of time-varying directed graphs $\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\}$ satisfying Assumptions 1, 2, 3, and 4, under Algorithm 1 the estimate $\pi_{i}(k)=\frac{b-a}{N-2}[N\cdot{\rm frac}(\frac{Ns_{i}(k)}{w_{i}(k)})-1]+a$ of each agent $i$ converges to the average $\bar{x}^{0}=\sum_{j=1}^{N}x_{j}^{0}/N$. More specifically, the rate of convergence of Algorithm 1 is at least $\gamma=(1-\varepsilon^{T(N-1)})^{\frac{1}{T(N-1)}}\in(0,\,1)$, meaning that there exists a positive constant $C$ satisfying $\|\boldsymbol{\pi}(k)-\bar{x}^{0}\mathbf{1}\|\leq C\gamma^{k}$ for all $k$.

Proof: From (6), we have

\[
\begin{aligned}
{\rm frac}\Big(\sum_{i=1}^{N}s_{i}(k+1)\Big)
&={\rm frac}\bigg(\sum_{i=1}^{N}{\rm frac}\Big(\sum_{j\in\mathcal{N}_{i}^{in}(k)\cup\{i\}}\Delta s_{ij}(k)\Big)\bigg)\\
&={\rm frac}\bigg(\sum_{i=1}^{N}\sum_{j\in\mathcal{N}_{i}^{in}(k)}\Delta s_{ij}(k)+\sum_{i=1}^{N}\Delta s_{ii}(k)\bigg)\\
&={\rm frac}\bigg(\sum_{i=1}^{N}\sum_{j\in\mathcal{N}_{i}^{in}(k)}\Delta s_{ij}(k)+\sum_{i=1}^{N}{\rm frac}\Big(s_{i}(k)-\sum_{j\in\mathcal{N}_{i}^{out}(k)}\Delta s_{ji}(k)\Big)\bigg)\\
&={\rm frac}\Big(\sum_{i=1}^{N}s_{i}(k)\Big)
\end{aligned}
\tag{19}
\]

for k\leq K, where in the derivation we used the property {\rm frac}(x+y)={\rm frac}(x+{\rm frac}(y))={\rm frac}({\rm frac}(x)+y)={\rm frac}({\rm frac}(x)+{\rm frac}(y)) for any x,\,y\in\mathbb{R}. According to Assumption 4, x_{i}^{0}\in[a,\,b] holds for each agent i. Given s_{i}(0)=\frac{1}{N^{2}}+\frac{(N-2)(x_{i}^{0}-a)}{(b-a)N^{2}}, one can obtain s_{i}(0)\in[\frac{1}{N^{2}},\,\frac{N-1}{N^{2}}], leading to

frac(i=1Nsi(0))=i=1Nsi(0)[1N,N1N](0, 1){\rm frac}\big{(}\sum_{i=1}^{N}s_{i}(0)\big{)}=\sum_{i=1}^{N}s_{i}(0)\in[\frac{1}{N},\,\frac{N-1}{N}]\subset(0,\,1) (20)

Further combining (19) and (20) yields

frac(i=1Nsi(K+1))==frac(i=1Nsi(0))=i=1Nsi(0){\rm frac}\big{(}\sum_{i=1}^{N}s_{i}(K+1)\big{)}=\cdots={\rm frac}\big{(}\sum_{i=1}^{N}s_{i}(0)\big{)}=\sum_{i=1}^{N}s_{i}(0) (21)
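As an illustrative numerical check (outside the formal development), the frac identity used in the derivation of (19) and the resulting invariance in (21) can be verified directly; the graph, the shares, and all names below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
frac = lambda z: z - np.floor(z)

# The frac identity used in the derivation of (19)
x, y = rng.uniform(0, 10, 2)
d0 = abs(frac(x + y) - frac(frac(x) + frac(y)))
assert min(d0, 1 - d0) < 1e-9          # equal up to wrap-around rounding

# One obfuscation round of (6) on a random digraph with N agents
N = 6
s = rng.uniform(0, 1, N)               # intermediate states s_i(k)
delta = rng.uniform(0, 1, (N, N))      # delta[i, j]: share sent from agent j to agent i
np.fill_diagonal(delta, 0.0)
delta *= rng.random((N, N)) < 0.5      # keep only the edges present in E(k)

s_next = frac(delta.sum(axis=1) + s - delta.sum(axis=0))

# Every share is added at its receiver and subtracted at its sender,
# so frac of the network-wide sum is invariant, as in (21)
d1 = abs(frac(s_next.sum()) - frac(s.sum()))
print(min(d1, 1 - d1))                 # ~0 up to floating-point error
```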

Since 𝐏(k)\mathbf{P}(k) is column stochastic, from (9) we have

𝟏T𝐰(k+1)=𝟏T𝐰(k)==𝟏T𝐰(0)=N\displaystyle\mathbf{1}^{T}\mathbf{w}(k+1)=\mathbf{1}^{T}\mathbf{w}(k)=\cdots=\mathbf{1}^{T}\mathbf{w}(0)=N (22)

for k0k\geq 0. Then we rewrite (11) as

{𝐬(K+l+1)=𝚽(K+l:K+1)𝐬(K+1)𝐰(K+l+1)=𝚽(K+l:K+1)𝐰(K+1)\left\{\begin{aligned} &\mathbf{s}(K+l+1)=\mathbf{\Phi}(K+l:K+1)\mathbf{s}(K+1)\\ &\mathbf{w}(K+l+1)=\mathbf{\Phi}(K+l:K+1)\mathbf{w}(K+1)\\ \end{aligned}\right. (23)

for l\geq 1. Under Assumptions 1 and 2 and the requirements on the weights p_{ij}(k) in Algorithm 1, it follows from Proposition 1(b) in [56] that every column of the transition matrix \mathbf{\Phi}(K+l:K+1) converges to a stochastic vector \boldsymbol{\varphi}(K+l) at a geometric rate, i.e., for all i,j=1,\ldots,N and l\geq 1, we have

|[𝚽(K+l:K+1)]ijφi(K+l)|C0γl1\displaystyle\big{|}[\mathbf{\Phi}(K+l:K+1)]_{ij}-\varphi_{i}(K+l)\big{|}\leq C_{0}\gamma^{l-1} (24)

with C0=2(1+εT(N1))/(1εT(N1))C_{0}=2({1+\varepsilon^{-T(N-1)}})/({1-\varepsilon^{T(N-1)}}) and γ=(1εT(N1))1T(N1)\gamma=(1-\varepsilon^{T(N-1)})^{\frac{1}{T(N-1)}}. Defining 𝐌(K+l:K+1)\mathbf{M}(K+l:K+1) as

𝐌(K+l:K+1)𝚽(K+l:K+1)𝝋(K+l) 1T\displaystyle\mathbf{M}(K+l:K+1)\triangleq\mathbf{\Phi}(K+l:K+1)-\boldsymbol{\varphi}(K+l)\,\mathbf{1}^{T} (25)

we have

|[𝐌(K+l:K+1)]ij|C0γl1\displaystyle\left|\left[\mathbf{M}(K+l:K+1)\right]_{ij}\right|\leq C_{0}\gamma^{l-1} (26)

for all i,j=1,,Ni,j=1,\ldots,N and l1l\geq 1. Further combining (25) with (23) leads to

{𝐬(K+l+1)=𝐌(K+l:K+1)𝐬(K+1)+𝝋(K+l) 1T𝐬(K+1)𝐰(K+l+1)=𝐌(K+l:K+1)𝐰(K+1)+N𝝋(K+l)\left\{\begin{aligned} \mathbf{s}(K+l+1)=&\mathbf{M}(K+l:K+1)\mathbf{s}(K+1)\\ &+\,\boldsymbol{\varphi}(K+l)\,\mathbf{1}^{T}\mathbf{s}(K+1)\\ \mathbf{w}(K+l+1)=&\mathbf{M}(K+l:K+1)\mathbf{w}(K+1)\\ &+\,N\boldsymbol{\varphi}(K+l)\\ \end{aligned}\right. (27)

where in the derivation we used 𝟏T𝐰(K+1)=N\mathbf{1}^{T}\mathbf{w}(K+1)=N from (22). Then from (27), we have

si(K+l+1)wi(K+l+1)j=1Nsj(K+1)N\displaystyle\frac{s_{i}(K+l+1)}{w_{i}(K+l+1)}-\frac{\sum_{j=1}^{N}s_{j}(K+1)}{N} (28)
=\displaystyle= si(K+l+1)wi(K+l+1)𝟏T𝐬(K+1)N\displaystyle\frac{s_{i}(K+l+1)}{w_{i}(K+l+1)}-\frac{\mathbf{1}^{T}\mathbf{s}(K+1)}{N}
=\displaystyle= si(K+l+1)wi(K+l+1)𝟏T𝐬(K+1)wi(K+l+1)Nwi(K+l+1)\displaystyle\frac{s_{i}(K+l+1)}{w_{i}(K+l+1)}-\frac{\mathbf{1}^{T}\mathbf{s}(K+1)w_{i}(K+l+1)}{Nw_{i}(K+l+1)}
=\displaystyle= [𝐌(K+l:K+1)𝐬(K+1)]i+φi(K+l)𝟏T𝐬(K+1)wi(K+l+1)\displaystyle\frac{[\mathbf{M}(K+l:K+1)\mathbf{s}(K+1)]_{i}+\varphi_{i}(K+l)\mathbf{1}^{T}\mathbf{s}(K+1)}{w_{i}(K+l+1)}
𝟏T𝐬(K+1)[𝐌(K+l:K+1)𝐰(K+1)]iNwi(K+l+1)\displaystyle\,-\frac{\mathbf{1}^{T}\mathbf{s}(K+1)[\mathbf{M}(K+l:K+1)\mathbf{w}(K+1)]_{i}}{Nw_{i}(K+l+1)}
𝟏T𝐬(K+1)Nφi(K+l)Nwi(K+l+1)\displaystyle\,-\frac{\mathbf{1}^{T}\mathbf{s}(K+1)N\varphi_{i}(K+l)}{Nw_{i}(K+l+1)}
=\displaystyle= [𝐌(K+l:K+1)𝐬(K+1)]iwi(K+l+1)\displaystyle\frac{[\mathbf{M}(K+l:K+1)\mathbf{s}(K+1)]_{i}}{w_{i}(K+l+1)}
𝟏T𝐬(K+1)[𝐌(K+l:K+1)𝐰(K+1)]iNwi(K+l+1)\displaystyle\,-\frac{\mathbf{1}^{T}\mathbf{s}(K+1)[\mathbf{M}(K+l:K+1)\mathbf{w}(K+1)]_{i}}{Nw_{i}(K+l+1)}

Therefore, for i=1,,Ni=1,\ldots,N and l1l\geq 1, we can obtain

|Nsi(K+l+1)wi(K+l+1)j=1Nsj(K+1)|\displaystyle\big{|}N\frac{s_{i}(K+l+1)}{w_{i}(K+l+1)}-\sum_{j=1}^{N}s_{j}(K+1)\big{|} (29)
\displaystyle\leq N|[𝐌(K+l:K+1)𝐬(K+1)]i|wi(K+l+1)\displaystyle\frac{N\big{|}[\mathbf{M}(K+l:K+1)\mathbf{s}(K+1)]_{i}\big{|}}{w_{i}(K+l+1)}
+N|𝟏T𝐬(K+1)[𝐌(K+l:K+1)𝐰(K+1)]i|Nwi(K+l+1)\displaystyle\,+\frac{N\big{|}\mathbf{1}^{T}\mathbf{s}(K+1)[\mathbf{M}(K+l:K+1)\mathbf{w}(K+1)]_{i}\big{|}}{Nw_{i}(K+l+1)}
\displaystyle\leq NεT(N1)(maxj|[𝐌(K+l:K+1)]ij|)𝐬(K+1)1\displaystyle\frac{N}{\varepsilon^{T(N-1)}}\big{(}\max_{j}\big{|}[\mathbf{M}(K+l:K+1)]_{ij}\big{|}\big{)}\big{\|}\mathbf{s}(K+1)\big{\|}_{1}
+NεT(N1)|𝟏T𝐬(K+1)|(maxj|[𝐌(K+l:K+1)]ij|)\displaystyle\,+\frac{N}{\varepsilon^{T(N-1)}}\big{|}\mathbf{1}^{T}\mathbf{s}(K+1)\big{|}\big{(}\max_{j}\big{|}[\mathbf{M}(K+l:K+1)]_{ij}\big{|}\big{)}

where in the derivation we used wi(K+l+1)εT(N1)w_{i}(K+l+1)\geq\varepsilon^{T(N-1)} from Lemma 1 and 𝐰(K+1)1=i=1N|wi(K+1)|=𝟏T𝐰(K+1)=N\big{\|}\mathbf{w}(K+1)\big{\|}_{1}=\sum_{i=1}^{N}|w_{i}(K+1)|=\mathbf{1}^{T}\mathbf{w}(K+1)=N from (22). Further using the relationship |𝟏T𝐬(K+1)|𝐬(K+1)1\big{|}\mathbf{1}^{T}\mathbf{s}(K+1)\big{|}\leq\big{\|}\mathbf{s}(K+1)\big{\|}_{1} and (26) yields

|Nsi(k)wi(k)j=1Nsj(K+1)|C1γk\displaystyle\big{|}N\frac{s_{i}(k)}{w_{i}(k)}-\sum_{j=1}^{N}s_{j}(K+1)\big{|}\leq C_{1}\gamma^{k} (30)

for kK+2k\geq K+2 with C1C_{1} given by

C1=2NC0𝐬(K+1)1εT(N1)γK2\displaystyle C_{1}=2NC_{0}\big{\|}\mathbf{s}(K+1)\big{\|}_{1}\varepsilon^{-T(N-1)}\gamma^{-K-2} (31)
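The geometric agreement of the columns of \mathbf{\Phi}(K+l:K+1), which is what (24)-(26) quantify and what drives the bound (30), can also be observed numerically. In the sketch below the weight matrices are drawn as normalized uniforms, an illustrative surrogate for (and not identical to) the requirements on p_{ij}(k) in Algorithm 1:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 5, 60

def rand_col_stochastic(rng, N, eps=0.05):
    # positive entries, columns summing to one (illustrative surrogate for P(k))
    P = rng.uniform(eps, 1, (N, N))
    return P / P.sum(axis=0)

Phi = np.eye(N)
spreads = []
for _ in range(L):
    Phi = rand_col_stochastic(rng, N) @ Phi
    # per-row spread across columns: distance of Phi from a rank-one matrix phi 1^T
    spreads.append(float((Phi.max(axis=1) - Phi.min(axis=1)).max()))

print(spreads[0], spreads[-1])   # the spread decays geometrically
```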

Given si(0)=1N2+(N2)(xi0a)(ba)N2s_{i}(0)=\frac{1}{N^{2}}+\frac{(N-2)(x_{i}^{0}-a)}{(b-a)N^{2}}, we have

x¯0\displaystyle\bar{x}^{0} =1Nj=1Nxj0=baN2(Nj=1Nsj(0)1)+a\displaystyle=\frac{1}{N}\sum_{j=1}^{N}x_{j}^{0}=\frac{b-a}{N-2}\Big{(}N\sum_{j=1}^{N}s_{j}(0)-1\Big{)}+a (32)
=baN2(N×frac(j=1Nsj(K+1))1)+a\displaystyle=\frac{b-a}{N-2}\Big{(}N\times{\rm frac}\big{(}\sum_{j=1}^{N}s_{j}(K+1)\big{)}-1\Big{)}+a

where in the derivation we used frac(j=1Nsj(K+1))=j=1Nsj(0){\rm frac}\big{(}\sum_{j=1}^{N}s_{j}(K+1)\big{)}=\sum_{j=1}^{N}s_{j}(0) from (21). Combining (32) with πi(k)=baN2[N×frac(Nsi(k)wi(k))1]+a\pi_{i}(k)=\frac{b-a}{N-2}[N\times{\rm frac}(\frac{Ns_{i}(k)}{w_{i}(k)})-1]+a leads to

πi(k)x¯0\displaystyle\pi_{i}(k)-\bar{x}^{0} (33)
=\displaystyle= baN2N(frac(Nsi(k)wi(k))frac(j=1Nsj(K+1)))\displaystyle\frac{b-a}{N-2}N\Big{(}{\rm frac}\big{(}\frac{Ns_{i}(k)}{w_{i}(k)}\big{)}-{\rm frac}\big{(}\sum_{j=1}^{N}s_{j}(K+1)\big{)}\Big{)}

From (20) and (21), one can obtain frac(j=1Nsj(K+1))=j=1Nsj(0)[1N,N1N]{\rm frac}\big{(}\sum_{j=1}^{N}s_{j}(K+1)\big{)}=\sum_{j=1}^{N}s_{j}(0)\in[\frac{1}{N},\,\frac{N-1}{N}]. Defining η\eta as ηj=1Nsj(0)[1N,N1N]\eta\triangleq\sum_{j=1}^{N}s_{j}(0)\in[\frac{1}{N},\,\frac{N-1}{N}], then there must exist an integer QQ such that j=1Nsj(K+1)=Q+η\sum_{j=1}^{N}s_{j}(K+1)=Q+\eta holds. From (30), we can see that there must exist a positive integer K1K+2K_{1}\geq K+2 such that

|Nsi(k)wi(k)j=1Nsj(K+1)|C1γk<1N\displaystyle\big{|}N\frac{s_{i}(k)}{w_{i}(k)}-\sum_{j=1}^{N}s_{j}(K+1)\big{|}\leq C_{1}\gamma^{k}<\frac{1}{N} (34)

holds for k\geq K_{1}. It then follows that

Q<Q+ηC1γkNsi(k)wi(k)Q+η+C1γk<Q+1\displaystyle Q<Q+\eta-C_{1}\gamma^{k}\leq N\frac{s_{i}(k)}{w_{i}(k)}\leq Q+\eta+C_{1}\gamma^{k}<Q+1 (35)

for kK1k\geq K_{1}, which leads to

0<ηC1γkfrac(Nsi(k)wi(k))η+C1γk<1\displaystyle 0<\eta-C_{1}\gamma^{k}\leq{\rm frac}\big{(}N\frac{s_{i}(k)}{w_{i}(k)}\big{)}\leq\eta+C_{1}\gamma^{k}<1 (36)

for kK1k\geq K_{1}. Thus, we have

|πi(k)x¯0|\displaystyle\big{|}\pi_{i}(k)-\bar{x}^{0}\big{|} =baN2N|frac(Nsi(k)wi(k))η|\displaystyle=\frac{b-a}{N-2}N\big{|}{\rm frac}\big{(}\frac{Ns_{i}(k)}{w_{i}(k)}\big{)}-\eta\big{|} (37)
baN2NC1γk\displaystyle\leq\frac{b-a}{N-2}NC_{1}\gamma^{k}

and further

𝝅(k)x¯0𝟏C2γk\displaystyle\big{\|}\boldsymbol{\pi}(k)-\bar{x}^{0}\mathbf{1}\big{\|}\leq C_{2}\gamma^{k} (38)

for kK1k\geq K_{1} with C2baN2NNC1C_{2}\triangleq\frac{b-a}{N-2}N\sqrt{N}C_{1}.

For kK11k\leq K_{1}-1, from (33), we have

|πi(k)x¯0|<baN2N\displaystyle|\pi_{i}(k)-\bar{x}^{0}|<\frac{b-a}{N-2}N (39)

which further implies

𝝅(k)x¯0𝟏<baN2NN\displaystyle\|\boldsymbol{\pi}(k)-\bar{x}^{0}\mathbf{1}\|<\frac{b-a}{N-2}N\sqrt{N} (40)

for kK11k\leq K_{1}-1. Defining CC as

Cmax{C2,baN2NNγk| 0kK11}\displaystyle C\triangleq\max\big{\{}C_{2},\frac{b-a}{N-2}N\sqrt{N}\gamma^{-k}\,\big{|}\,0\leq k\leq K_{1}-1\big{\}} (41)

one can obtain

𝝅(k)x¯0𝟏Cγk\displaystyle\big{\|}\boldsymbol{\pi}(k)-\bar{x}^{0}\mathbf{1}\big{\|}\leq C\gamma^{k} (42)

for all k. Therefore, the estimate \pi_{i}(k) of each agent i converges to the average value \bar{x}^{0}=\sum_{j=1}^{N}x_{j}^{0}/N with a convergence rate of at least \gamma=(1-\varepsilon^{T(N-1)})^{\frac{1}{T(N-1)}}\in(0,\,1). \blacksquare
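To illustrate Theorem 1, the following toy simulation re-implements the two phases of the algorithm under several simplifying assumptions that are ours, not the paper's: a complete digraph, obfuscation shares \Delta s drawn uniformly from [0, 1), K=3, and weight matrices generated as normalized uniforms as a stand-in for the requirements on p_{ij}(k). Every estimate \pi_{i}(k) recovers the exact average:

```python
import numpy as np

rng = np.random.default_rng(1)
frac = lambda z: z - np.floor(z)

N, a, b, K = 5, 0.0, 10.0, 3
x0 = rng.uniform(a, b, N)                        # confidential initial states x_i^0
s = 1/N**2 + (N - 2)*(x0 - a)/((b - a)*N**2)     # s_i(0)
w = np.ones(N)                                   # w_i(0)

def rand_col_stochastic(rng, N, eps=0.05):
    P = rng.uniform(eps, 1, (N, N))
    return P / P.sum(axis=0)                     # illustrative surrogate for P(k)

# Phase 1 (k <= K): obfuscation with independent shares uniform in [0, 1)
for _ in range(K + 1):
    delta = rng.uniform(0, 1, (N, N))            # delta[i, j]: share from j to i
    np.fill_diagonal(delta, 0.0)
    s = frac(delta.sum(axis=1) + s - delta.sum(axis=0))
    w = rand_col_stochastic(rng, N) @ w          # w runs ordinary push-sum throughout

# Phase 2 (k > K): ordinary push-sum on both s and w
for _ in range(200):
    P = rand_col_stochastic(rng, N)
    s, w = P @ s, P @ w

# Each agent's estimate pi_i(k) recovers the average of x0
pi = (b - a)/(N - 2)*(N*frac(N*s/w) - 1) + a
print(np.max(np.abs(pi - x0.mean())))            # ~0
```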

From Theorem 1, we can see that a smaller \gamma means faster convergence. Under the relationship \gamma=(1-\varepsilon^{T(N-1)})^{\frac{1}{T(N-1)}}, the convergence can be expedited by increasing \varepsilon, which amounts to narrowing the range (\varepsilon,\,1) from which p_{ji}(k) is randomly selected. Note that although a narrower range (\varepsilon,\,1) enables an honest-but-curious adversary to better estimate the range of agent i's intermediate states s_{i}(k) and w_{i}(k) for k\geq K+1 from the received p_{ji}(k)s_{i}(k) and p_{ji}(k)w_{i}(k), it does not affect the confidentiality of agent i's initial state x_{i}^{0}, as will be shown in the following subsection. It is also worth noting that, to meet the requirement of randomly selecting weights in our algorithm, \varepsilon cannot be arbitrarily close to 1; in fact, \varepsilon must satisfy \varepsilon<1/\max_{i,k}(D_{i}^{out}(k)+1). An easy way to select \varepsilon is to set 0<\varepsilon<1/N, since D_{i}^{out}(k)\leq N-1 holds for all k and i\in\mathcal{V}.
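As a quick numerical reading of the bound, the map from \varepsilon to \gamma can be tabulated directly (the values of N, T, and the sample values of \varepsilon below are arbitrary illustrative choices):

```python
# Convergence-rate bound of Theorem 1: gamma = (1 - eps^{T(N-1)})^{1/(T(N-1))}
N, T = 5, 2
M = T * (N - 1)
gamma = lambda eps: (1.0 - eps**M) ** (1.0 / M)

# A larger eps (narrower selection range (eps, 1)) yields a smaller bound gamma,
# i.e., faster guaranteed convergence; here eps must stay below 1/N = 0.2.
for eps in (0.05, 0.10, 0.19):
    print(f"eps = {eps:.2f}  ->  gamma = {gamma(eps):.12f}")
```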

Remark 6

Theorem 1 provides a detailed analysis of the rate of convergence under time-varying directed graphs; such results are sparse in the literature on Push-Sum with time-varying random weights.

III-C Confidentiality Analysis

Before presenting our main confidentiality result, we first introduce the following lemma.

Lemma 2

Consider a network of NN agents represented by a sequence of time-varying directed graphs {𝒢(k)=(𝒱,(k))}\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\} which satisfy Assumptions 1, 2, 3, 4, and 5. Under the proposed Algorithm 1, if at some time instant 0kK0\leq k^{*}\leq K, agent i𝒜i\notin\mathcal{A} has an in-neighbor or out-neighbor ll not belonging to 𝒜\mathcal{A}, then under s{Δsmn(k)|(m,n)(k),(m,n)(i,l),(m,n)(l,i),k=0,1,,K}\mathcal{I}_{s}^{*}\triangleq\{\Delta s_{mn}(k)\,|\,(m,\,n)\in\mathcal{E}(k),(m,\,n)\neq(i,\,l),(m,\,n)\neq(l,\,i),k=0,1,\ldots,K\}, the joint probability distributions of si(K+1)s_{i}(K+1) and sl(K+1)s_{l}(K+1) are identical for any two different initial states 𝐱0,𝐱~0[a,b]N\mathbf{x}^{0},\tilde{\mathbf{x}}^{0}\in[a,\,b]^{N} subject to xi0+xl0=x~i0+x~l0x_{i}^{0}+x_{l}^{0}=\tilde{x}_{i}^{0}+\tilde{x}_{l}^{0} and xj0=x~j0x_{j}^{0}=\tilde{x}_{j}^{0} for j𝒱{i,l}j\in\mathcal{V}\setminus\{i,l\}.

Proof: According to (6), we can rewrite the update rule of si(k)s_{i}(k) as follows

si(k+1)\displaystyle s_{i}(k+1) (43)
=\displaystyle= frac(j𝒩iin(k)Δsij(k)+si(k)j𝒩iout(k)Δsji(k))\displaystyle{\rm frac}\Big{(}\sum_{j\in\mathcal{N}_{i}^{in}(k)}\Delta s_{ij}(k)+s_{i}(k)-\sum_{j\in\mathcal{N}_{i}^{out}(k)}\Delta s_{ji}(k)\Big{)}

for kKk\leq K. We stack all variables Δsmn(k)\Delta s_{mn}(k) into a vector 𝚫𝐬(k)\boldsymbol{\Delta}\mathbf{s}(k) according to indices of edges in (k)\mathcal{E}(k), namely, the index of Δsmn(k)\Delta s_{mn}(k) is determined by the index of the edge (m,n)(m,\,n) in (k)\mathcal{E}(k). Then we can further rewrite (43) in the following more compact form:

𝐬(k+1)=frac(𝐬(k)+𝐂(k)𝚫𝐬(k))\displaystyle\mathbf{s}(k+1)={\rm frac}\big{(}\mathbf{s}(k)+\mathbf{C}(k)\boldsymbol{\Delta}\mathbf{s}(k)\big{)} (44)

for kKk\leq K, where 𝐂(k)\mathbf{C}(k) is the incidence matrix for graph 𝒢(k)\mathcal{G}(k) at iteration kk.

Define the subsets of agents \mathcal{H} and \mathcal{R} as ={i,l}\mathcal{H}=\{i,l\} and =𝒱(𝒜)\mathcal{R}=\mathcal{V}\setminus(\mathcal{H}\cup\mathcal{A}), respectively. Let N𝒜=|𝒜|N_{\mathcal{A}}=|\mathcal{A}| and N=||N_{\mathcal{R}}=|\mathcal{R}| represent the number of agents in 𝒜\mathcal{A} and \mathcal{R}, respectively. It is clear that \mathcal{H}, 𝒜\mathcal{A}, and \mathcal{R} are disjoint, and 𝒜=𝒱\mathcal{H}\cup\mathcal{A}\cup\mathcal{R}=\mathcal{V} holds. For graph 𝒢(k)=(𝒱,(k))\mathcal{G}(k)=(\mathcal{V},\mathcal{E}(k)), we define the subgraph 𝒢(k)\mathcal{G_{H}}(k) as 𝒢(k)=(,(k))\mathcal{G_{H}}(k)=(\mathcal{H},\mathcal{E_{H}}(k)) where (k)(k)\mathcal{E_{H}}(k)\subset\mathcal{E}(k) is the set of edges entirely within \mathcal{H}. The union of subgraphs 𝒢(k)\mathcal{G_{H}}(k) for iterations k=0,1,,Kk=0,1,\ldots,K is further denoted as 𝒢=k=0K𝒢(k)=(,)\mathcal{G}_{\mathcal{H}}^{*}=\bigcup\limits_{k=0}^{K}\mathcal{G_{H}}(k)=(\mathcal{H},\mathcal{E}_{\mathcal{H}}^{*}), where =k=0K(k)\mathcal{E}_{\mathcal{H}}^{*}=\bigcup\limits_{k=0}^{K}\mathcal{E_{H}}(k) is the union of edge sets (k)\mathcal{E_{H}}(k) for iterations k=0,1,,Kk=0,1,\ldots,K. Denote the edge set 𝒜(k)\mathcal{E_{A}}(k) as the collection of edges associated with all agents in 𝒜\mathcal{A} at iteration kk, i.e., 𝒜(k)={(m,n)|(m,n)(k),morn𝒜}\mathcal{E_{A}}(k)=\{(m,\,n)\,|\,(m,\,n)\in\mathcal{E}(k),m\ \textnormal{or}\ n\in\mathcal{A}\}. Then the set of remaining edges not belonging to (k)\mathcal{E_{H}}(k) or 𝒜(k)\mathcal{E_{A}}(k) can be denoted as (k)=(k)((k)𝒜(k))\mathcal{E_{R}}(k)=\mathcal{E}(k)\setminus(\mathcal{E_{H}}(k)\cup\mathcal{E_{A}}(k)), which is a collection of edges that are either entirely within \mathcal{R} or between \mathcal{R} and \mathcal{H}. 
Let E(k)=|(k)|E_{\mathcal{H}}(k)=|\mathcal{E_{H}}(k)|, E𝒜(k)=|𝒜(k)|E_{\mathcal{A}}(k)=|\mathcal{E_{A}}(k)|, and E(k)=|(k)|E_{\mathcal{R}}(k)=|\mathcal{E_{R}}(k)| represent the number of edges in (k)\mathcal{E_{H}}(k), 𝒜(k)\mathcal{E_{A}}(k), and (k)\mathcal{E_{R}}(k), respectively. Note that (k)\mathcal{E_{H}}(k), 𝒜(k)\mathcal{E_{A}}(k), and (k)\mathcal{E_{R}}(k) are disjoint, and we always have (k)𝒜(k)(k)=(k)\mathcal{E_{H}}(k)\cup\mathcal{E_{A}}(k)\cup\mathcal{E_{R}}(k)=\mathcal{E}(k). Without loss of generality, we can partition the state vector 𝐬(k)\mathbf{s}(k) as 𝐬(k)=[𝐬(k)T𝐬𝒜(k)T𝐬(k)T]T\mathbf{s}(k)=[\mathbf{s}_{\mathcal{H}}(k)^{T}\ \mathbf{s}_{\mathcal{A}}(k)^{T}\ \mathbf{s}_{\mathcal{R}}(k)^{T}]^{T} where 𝐬(k)\mathbf{s}_{\mathcal{H}}(k), 𝐬𝒜(k)\mathbf{s}_{\mathcal{A}}(k), and 𝐬(k)\mathbf{s}_{\mathcal{R}}(k) denote the state vectors of agents in \mathcal{H}, 𝒜\mathcal{A}, and \mathcal{R}, respectively. Then we can further rewrite the update rule of 𝐬(k)\mathbf{s}(k) for kKk\leq K as

[𝐬(k+1)𝐬𝒜(k+1)𝐬(k+1)]=frac([𝐬(k)𝐬𝒜(k)𝐬(k)]\displaystyle\begin{bmatrix}\mathbf{s}_{\mathcal{H}}(k+1)\\ \mathbf{s}_{\mathcal{A}}(k+1)\\ \mathbf{s}_{\mathcal{R}}(k+1)\end{bmatrix}={\rm frac}\Bigg{(}\begin{bmatrix}\mathbf{s}_{\mathcal{H}}(k)\\ \mathbf{s}_{\mathcal{A}}(k)\\ \mathbf{s}_{\mathcal{R}}(k)\end{bmatrix} (45)
+[𝐂(k)𝐂𝒜1(k)𝐂1(k)𝟎N𝒜×E(k)𝐂𝒜2(k)𝟎N𝒜×E(k)𝟎N×E(k)𝐂𝒜3(k)𝐂2(k)][𝚫𝐬(k)𝚫𝐬𝒜(k)𝚫𝐬(k)])\displaystyle\quad\ +\begin{bmatrix}\mathbf{C}_{\mathcal{H}}(k)&\mathbf{C}_{\mathcal{A}}^{1}(k)&\mathbf{C}_{\mathcal{R}}^{1}(k)\\ \mathbf{0}_{N_{\mathcal{A}}\times E_{\mathcal{H}}(k)}&\mathbf{C}_{\mathcal{A}}^{2}(k)&\mathbf{0}_{N_{\mathcal{A}}\times E_{\mathcal{R}}(k)}\\ \mathbf{0}_{N_{\mathcal{R}}\times E_{\mathcal{H}}(k)}&\mathbf{C}_{\mathcal{A}}^{3}(k)&\mathbf{C}_{\mathcal{R}}^{2}(k)\\ \end{bmatrix}\begin{bmatrix}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}(k)\\ \boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(k)\\ \boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}(k)\end{bmatrix}\Bigg{)}

where 𝐂(k)\mathbf{C}_{\mathcal{H}}(k) is the incidence matrix for subgraph 𝒢(k)=(,(k))\mathcal{G_{H}}(k)=(\mathcal{H},\mathcal{E_{H}}(k)), and 𝚫𝐬(k)\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}(k), 𝚫𝐬𝒜(k)\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(k), and 𝚫𝐬(k)\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}(k) are vectors stacking Δsmn(k)\Delta s_{mn}(k) corresponding to edge sets (k)\mathcal{E_{H}}(k), 𝒜(k)\mathcal{E_{A}}(k), and (k)\mathcal{E_{R}}(k), respectively. It is obvious that 𝚫𝐬𝒜(k)\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(k) is completely known to agents in 𝒜\mathcal{A} since every element in 𝚫𝐬𝒜(k)\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(k) is either sent or received by the agents in 𝒜\mathcal{A}. From (45), we can further obtain

[𝐬(K+1)𝐬𝒜(K+1)𝐬(K+1)]=frac([𝐬(0)𝐬𝒜(0)𝐬(0)]\displaystyle\begin{bmatrix}\mathbf{s}_{\mathcal{H}}(K+1)\\ \mathbf{s}_{\mathcal{A}}(K+1)\\ \mathbf{s}_{\mathcal{R}}(K+1)\end{bmatrix}={\rm frac}\Bigg{(}\begin{bmatrix}\mathbf{s}_{\mathcal{H}}(0)\\ \mathbf{s}_{\mathcal{A}}(0)\\ \mathbf{s}_{\mathcal{R}}(0)\end{bmatrix} (46)
+[𝐂𝐂𝒜1𝐂1𝟎N𝒜×E𝐂𝒜2𝟎N𝒜×E𝟎N×E𝐂𝒜3𝐂2][𝚫𝐬𝚫𝐬𝒜𝚫𝐬])\displaystyle\qquad\qquad+\begin{bmatrix}\mathbf{C}_{\mathcal{H}}^{*}&\mathbf{C}_{\mathcal{A}}^{1*}&\mathbf{C}_{\mathcal{R}}^{1*}\\ \mathbf{0}_{N_{\mathcal{A}}\times E_{\mathcal{H}}^{*}}&\mathbf{C}_{\mathcal{A}}^{2*}&\mathbf{0}_{N_{\mathcal{A}}\times E_{\mathcal{R}}^{*}}\\ \mathbf{0}_{N_{\mathcal{R}}\times E_{\mathcal{H}}^{*}}&\mathbf{C}_{\mathcal{A}}^{3*}&\mathbf{C}_{\mathcal{R}}^{2*}\end{bmatrix}\begin{bmatrix}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}^{*}\\ \boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}^{*}\\ \boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}^{*}\end{bmatrix}\Bigg{)}

where E=k=0KE(k)E_{\mathcal{H}}^{*}=\sum_{k=0}^{K}E_{\mathcal{H}}(k), E=k=0KE(k)E_{\mathcal{R}}^{*}=\sum_{k=0}^{K}E_{\mathcal{R}}(k), and

𝐂\displaystyle\mathbf{C}_{\mathcal{H}}^{*} =[𝐂(0)𝐂(K)]\displaystyle=\begin{bmatrix}\mathbf{C}_{\mathcal{H}}(0)&\cdots&\mathbf{C}_{\mathcal{H}}(K)\\ \end{bmatrix} (47)
𝐂𝒜t\displaystyle\mathbf{C}_{\mathcal{A}}^{t*} =[𝐂𝒜t(0)𝐂𝒜t(K)],t=1,2,3\displaystyle=\begin{bmatrix}\mathbf{C}_{\mathcal{A}}^{t}(0)&\cdots&\mathbf{C}_{\mathcal{A}}^{t}(K)\\ \end{bmatrix},\ t=1,2,3
𝐂t\displaystyle\mathbf{C}_{\mathcal{R}}^{t*} =[𝐂t(0)𝐂t(K)],t=1,2\displaystyle=\begin{bmatrix}\mathbf{C}_{\mathcal{R}}^{t}(0)&\cdots&\mathbf{C}_{\mathcal{R}}^{t}(K)\\ \end{bmatrix},\ t=1,2
𝚫𝐬\displaystyle\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}^{*} =[𝚫𝐬(0)T𝚫𝐬(K)T]T\displaystyle=\begin{bmatrix}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}(0)^{T}&\cdots&\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}(K)^{T}\\ \end{bmatrix}^{T}
𝚫𝐬𝒜\displaystyle\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}^{*} =[𝚫𝐬𝒜(0)T𝚫𝐬𝒜(K)T]T\displaystyle=\begin{bmatrix}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(0)^{T}&\cdots&\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(K)^{T}\\ \end{bmatrix}^{T}
𝚫𝐬\displaystyle\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}^{*} =[𝚫𝐬(0)T𝚫𝐬(K)T]T\displaystyle=\begin{bmatrix}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}(0)^{T}&\cdots&\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}(K)^{T}\\ \end{bmatrix}^{T}

It is worth noting that 𝐂\mathbf{C}_{\mathcal{H}}^{*} is the incidence matrix for graph 𝒢=k=0K𝒢(k)\mathcal{G}_{\mathcal{H}}^{*}=\bigcup\limits_{k=0}^{K}\mathcal{G_{H}}(k). From (46), we have

𝐬(K+1)\displaystyle\mathbf{s}_{\mathcal{H}}(K+1) (48)
=\displaystyle= frac(𝐬(0)+𝐂𝚫𝐬+𝐂𝒜1𝚫𝐬𝒜+𝐂1𝚫𝐬)\displaystyle{\rm frac}\big{(}\mathbf{s}_{\mathcal{H}}(0)+\mathbf{C}_{\mathcal{H}}^{*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}^{*}+\mathbf{C}_{\mathcal{A}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}^{*}+\mathbf{C}_{\mathcal{R}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}^{*}\big{)}

Denoting 𝐯=[v1v2]T\mathbf{v}=[v_{1}\,v_{2}]^{T} as

𝐯frac(𝐂𝚫𝐬)\displaystyle\mathbf{v}\triangleq{\rm frac}\big{(}\mathbf{C}_{\mathcal{H}}^{*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}^{*}\big{)} (49)

next we prove that 𝐯\mathbf{v} is uniformly distributed in [0, 1)×[0, 1)[0,\,1)\times[0,\,1) subject to frac(v1+v2)=0{\rm frac}(v_{1}+v_{2})=0 if at some time instant 0kK0\leq k^{*}\leq K, agent ii has an in-neighbor or out-neighbor ll not belonging to 𝒜\mathcal{A}.

Given ={i,l}\mathcal{H}=\{i,l\}, if at some time instant 0kK0\leq k^{*}\leq K, agent ii has an in-neighbor or out-neighbor ll not belonging to 𝒜\mathcal{A}, the columns of 𝐂\mathbf{C}_{\mathcal{H}}^{*} are either [11]T[1\ -1]^{T} (ll is an in-neighbor of agent ii) or [1 1]T[-1\ 1]^{T} (ll is an out-neighbor of agent ii). Denote the jj-th column of 𝐂\mathbf{C}_{\mathcal{H}}^{*} as 𝐜j\mathbf{c}_{j}^{*}, then 𝐜j\mathbf{c}_{j}^{*} can be expressed as

𝐜j=rj𝐜1\displaystyle\mathbf{c}_{j}^{*}=r_{j}\mathbf{c}_{1}^{*} (50)

where rj{1,1}r_{j}\in\{1,-1\} is the corresponding coefficient. Defining 𝐫\mathbf{r} as 𝐫=[r1rE]\mathbf{r}=[r_{1}\cdots r_{E_{\mathcal{H}}^{*}}], one can obtain

𝐯\displaystyle\mathbf{v} =frac(𝐂𝚫𝐬)=frac(𝐜1𝐫𝚫𝐬)\displaystyle={\rm frac}\big{(}\mathbf{C}_{\mathcal{H}}^{*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}^{*}\big{)}={\rm frac}\big{(}\mathbf{c}_{1}^{*}\mathbf{r}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}^{*}\big{)} (51)
=frac(𝐜1frac(𝐫𝚫𝐬))\displaystyle={\rm frac}\big{(}\mathbf{c}_{1}^{*}\cdot{\rm frac}(\mathbf{r}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}^{*})\big{)}

Since all the elements in 𝚫𝐬\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}^{*} are independent of each other and uniformly distributed in [0, 1)[0,\,1), we have that frac(𝐫𝚫𝐬){\rm frac}(\mathbf{r}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{H}}^{*}) is uniformly distributed in [0, 1)[0,\,1). As 𝐜1\mathbf{c}_{1}^{*} is either [11]T[1\ -1]^{T} or [1 1]T[-1\ 1]^{T}, we have that 𝐯\mathbf{v} is uniformly distributed in [0, 1)×[0, 1)[0,\,1)\times[0,\,1) subject to frac(v1+v2)=0{\rm frac}(v_{1}+v_{2})=0.
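The distributional facts used above (frac of a sum containing an independent U[0, 1) term is again U[0, 1), and \mathbf{v}={\rm frac}(\mathbf{c}_{1}^{*}t) with \mathbf{c}_{1}^{*}=[1\ -1]^{T} satisfies {\rm frac}(v_{1}+v_{2})=0) can be checked by simulation; the sample size and number of shares below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
frac = lambda z: z - np.floor(z)

n, E = 100_000, 7
r = rng.choice([-1.0, 1.0], size=E)       # coefficients r_j in {1, -1}
ds = rng.uniform(0, 1, (n, E))            # independent shares, each uniform in [0, 1)
t = frac(ds @ r)                          # frac(r Delta_s): claimed uniform on [0, 1)

# Approximate Kolmogorov distance between the empirical CDF of t and U[0, 1)
u = np.sort(t)
ks = float(np.max(np.abs(u - np.arange(1, n + 1) / n)))
print(ks)                                 # O(1/sqrt(n)), i.e., small

# v = frac(c1 * t) with c1 = [1, -1]^T satisfies frac(v1 + v2) = 0
v1, v2 = t, frac(-t)
d = frac(v1 + v2)
print(float(np.max(np.minimum(d, 1 - d))))   # ~0 up to rounding
```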

From (48), we have 𝐬(K+1)=frac(𝐬(0)+𝐯+𝐂𝒜1𝚫𝐬𝒜+𝐂1𝚫𝐬)\mathbf{s}_{\mathcal{H}}(K+1)={\rm frac}\big{(}\mathbf{s}_{\mathcal{H}}(0)+\mathbf{v}+\mathbf{C}_{\mathcal{A}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}^{*}+\mathbf{C}_{\mathcal{R}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}^{*}\big{)}. Further taking into account the fact that 𝐯\mathbf{v} is uniformly distributed in [0, 1)×[0, 1)[0,\,1)\times[0,\,1) subject to frac(v1+v2)=0{\rm frac}(v_{1}+v_{2})=0 if at some time instant 0kK0\leq k^{*}\leq K, agent ii has an in-neighbor or out-neighbor ll not belonging to 𝒜\mathcal{A}, we can obtain that 𝐬(K+1)\mathbf{s}_{\mathcal{H}}(K+1) is also uniformly distributed in [0, 1)×[0, 1)[0,\,1)\times[0,\,1) subject to frac(𝟏T𝐬(K+1))=frac(𝟏T𝐬(0)+𝟏T𝐂𝒜1𝚫𝐬𝒜+𝟏T𝐂1𝚫𝐬){\rm frac}(\mathbf{1}^{T}\mathbf{s}_{\mathcal{H}}(K+1))={\rm frac}(\mathbf{1}^{T}\mathbf{s}_{\mathcal{H}}(0)+\mathbf{1}^{T}\mathbf{C}_{\mathcal{A}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}^{*}+\mathbf{1}^{T}\mathbf{C}_{\mathcal{R}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}^{*}).

Now we analyze the probability distributions of 𝐬(K+1)\mathbf{s}_{\mathcal{H}}(K+1) under different initial conditions of agent ii. For any two different initial conditions 𝐱0,𝐱~0[a,b]N\mathbf{x}^{0},\,\tilde{\mathbf{x}}^{0}\in[a,\,b]^{N} subject to xi0+xl0=x~i0+x~l0x_{i}^{0}+x_{l}^{0}=\tilde{x}_{i}^{0}+\tilde{x}_{l}^{0} and xj0=x~j0x_{j}^{0}=\tilde{x}_{j}^{0} for j𝒱{i,l}j\in\mathcal{V}\setminus\{i,l\}, one can obtain si(0)+sl(0)=s~i(0)+s~l(0)s_{i}(0)+s_{l}(0)=\tilde{s}_{i}(0)+\tilde{s}_{l}(0). Note that all the elements in 𝚫𝐬𝒜\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}^{*} and 𝚫𝐬\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}^{*} belong to the set s={Δsmn(k)|(m,n)(k),(m,n)(i,l),(m,n)(l,i),k=0,1,,K}\mathcal{I}_{s}^{*}=\{\Delta s_{mn}(k)\,|\,(m,\,n)\in\mathcal{E}(k),(m,\,n)\neq(i,\,l),(m,\,n)\neq(l,\,i),k=0,1,\ldots,K\}. Thus, given s\mathcal{I}_{s}^{*}, both 𝐬(K+1)\mathbf{s}_{\mathcal{H}}(K+1) and 𝐬~(K+1)\tilde{\mathbf{s}}_{\mathcal{H}}(K+1) are uniformly distributed in [0, 1)×[0, 1)[0,\,1)\times[0,\,1) subject to frac(𝟏T𝐬(K+1))=frac(𝟏T𝐬(0)+𝟏T𝐂𝒜1𝚫𝐬𝒜+𝟏T𝐂1𝚫𝐬)=frac(𝟏T𝐬~(0)+𝟏T𝐂𝒜1𝚫𝐬𝒜+𝟏T𝐂1𝚫𝐬)=frac(𝟏T𝐬~(K+1)){\rm frac}(\mathbf{1}^{T}\mathbf{s}_{\mathcal{H}}(K+1))={\rm frac}(\mathbf{1}^{T}\mathbf{s}_{\mathcal{H}}(0)+\mathbf{1}^{T}\mathbf{C}_{\mathcal{A}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}^{*}+\mathbf{1}^{T}\mathbf{C}_{\mathcal{R}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}^{*})={\rm frac}(\mathbf{1}^{T}\tilde{\mathbf{s}}_{\mathcal{H}}(0)+\mathbf{1}^{T}\mathbf{C}_{\mathcal{A}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}^{*}+\mathbf{1}^{T}\mathbf{C}_{\mathcal{R}}^{1*}\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}^{*})={\rm frac}(\mathbf{1}^{T}\tilde{\mathbf{s}}_{\mathcal{H}}(K+1)). 
Therefore, given s\mathcal{I}_{s}^{*}, the probability distributions of 𝐬(K+1)\mathbf{s}_{\mathcal{H}}(K+1) under different 𝐱0\mathbf{x}^{0} and 𝐱~0\tilde{\mathbf{x}}^{0} are identical when xj0=x~j0x_{j}^{0}=\tilde{x}_{j}^{0} for j𝒱{i,l}j\in\mathcal{V}\setminus\{i,l\} and xi0+xl0=x~i0+x~l0x_{i}^{0}+x_{l}^{0}=\tilde{x}_{i}^{0}+\tilde{x}_{l}^{0} hold for some agent ll that is an in-neighbor or out-neighbor of agent ii at some time instant 0kK0\leq k^{*}\leq K but does not belong to 𝒜\mathcal{A}. \blacksquare

Now we are in a position to present our main confidentiality result.

Theorem 2

Consider a network of NN agents represented by a sequence of time-varying directed graphs {𝒢(k)=(𝒱,(k))}\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\} which satisfy Assumptions 1, 2, 3, 4, and 5. Under the proposed Algorithm 1, the confidentiality of agent i𝒜i\notin\mathcal{A} can be preserved against 𝒜\mathcal{A} if at some time instant 0kK0\leq k^{*}\leq K, agent ii has an in-neighbor or out-neighbor ll not belonging to 𝒜\mathcal{A}.

Proof: To show that the confidentiality of agent i can be protected, we have to show that the probability distributions of the information set accessible to the honest-but-curious agents in \mathcal{A} are identical under different initial values of agent i. It suffices to prove that the probability distributions of the information sets accessible to \mathcal{A} are identical for all \mathbf{x}^{0},\,\tilde{\mathbf{x}}^{0}\in[a,\,b]^{N} subject to x_{i}^{0}+x_{l}^{0}=\tilde{x}_{i}^{0}+\tilde{x}_{l}^{0} and x_{j}^{0}=\tilde{x}_{j}^{0} for j\in\mathcal{V}\setminus\{i,l\}, where agent l is an in-neighbor or out-neighbor of agent i at some time instant 0\leq k^{*}\leq K and does not belong to \mathcal{A}. Note that under such \mathbf{x}^{0} and \tilde{\mathbf{x}}^{0}, the sum of the initial states of all agents not in \mathcal{A} remains unchanged, i.e., \sum_{j\in\mathcal{V}\setminus\mathcal{A}}x_{j}^{0}=\sum_{j\in\mathcal{V}\setminus\mathcal{A}}\tilde{x}_{j}^{0}. Given s_{i}(0)=\frac{1}{N^{2}}+\frac{(N-2)(x_{i}^{0}-a)}{(b-a)N^{2}}, this is equivalent to proving that the probability distributions of the information sets accessible to \mathcal{A} are identical for all \mathbf{s}(0),\,\tilde{\mathbf{s}}(0)\in[\frac{1}{N^{2}},\,\frac{N-1}{N^{2}}]^{N} subject to s_{i}(0)+s_{l}(0)=\tilde{s}_{i}(0)+\tilde{s}_{l}(0) and s_{j}(0)=\tilde{s}_{j}(0) for j\in\mathcal{V}\setminus\{i,l\}. Denoting \mathcal{I}_{s}(k) and \mathcal{I}_{w}(k) as

s(k)\displaystyle\mathcal{I}_{s}(k) ={𝐬𝒜(k),𝚫𝐬𝒜(k)}\displaystyle=\{\mathbf{s}_{\mathcal{A}}(k),\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(k)\} (52)
w(k)\displaystyle\mathcal{I}_{w}(k) ={𝐰𝒜(k),𝚫𝐰𝒜(k)}\displaystyle=\{\mathbf{w}_{\mathcal{A}}(k),\boldsymbol{\Delta}\mathbf{w}_{\mathcal{A}}(k)\}

where 𝐬𝒜(k)\mathbf{s}_{\mathcal{A}}(k) and 𝐰𝒜(k)\mathbf{w}_{\mathcal{A}}(k) are state vectors of agents in 𝒜\mathcal{A}, and 𝚫𝐬𝒜(k)\boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(k) and 𝚫𝐰𝒜(k)\boldsymbol{\Delta}\mathbf{w}_{\mathcal{A}}(k) are augmented vectors of Δsmn(k)\Delta s_{mn}(k) and Δwmn(k)\Delta w_{mn}(k) corresponding to edge set 𝒜(k)\mathcal{E_{A}}(k), respectively. We can see that agents in 𝒜\mathcal{A} have access to both s(k)\mathcal{I}_{s}(k) and w(k)\mathcal{I}_{w}(k) at each iteration kk. Further denote 1s\mathcal{I}_{1}^{s}, 1w\mathcal{I}_{1}^{w}, 2s\mathcal{I}_{2}^{s}, 2w\mathcal{I}_{2}^{w}, 1\mathcal{I}_{1}, and 2\mathcal{I}_{2} as

1s\displaystyle\mathcal{I}_{1}^{s} =k=0Ks(k)1w=k=0Kw(k){𝐰(0)}\displaystyle=\bigcup\limits_{k=0}^{K}\mathcal{I}_{s}(k)\qquad\quad\,\mathcal{I}_{1}^{w}=\bigcup\limits_{k=0}^{K}\mathcal{I}_{w}(k)\cup\{\mathbf{w}(0)\} (53)
2s\displaystyle\mathcal{I}_{2}^{s} =k=K+1s(k)2w=k=K+1w(k)\displaystyle=\bigcup\limits_{k=K+1}^{\infty}\mathcal{I}_{s}(k)\qquad\mathcal{I}_{2}^{w}=\bigcup\limits_{k=K+1}^{\infty}\mathcal{I}_{w}(k)
1\displaystyle\mathcal{I}_{1} =1s1w2=2s2w\displaystyle=\mathcal{I}_{1}^{s}\cup\mathcal{I}_{1}^{w}\qquad\quad\ \ \mathcal{I}_{2}=\mathcal{I}_{2}^{s}\cup\mathcal{I}_{2}^{w}

According to Algorithm 1, 12\mathcal{I}\triangleq\mathcal{I}_{1}\cup\mathcal{I}_{2} represents all the information accessible to 𝒜\mathcal{A}. We denote the conditional probability density of \mathcal{I} given 𝐬(0)\mathbf{s}(0) as f(|𝐬(0))f(\mathcal{I}\,|\,\mathbf{s}(0)). Thus, to show that the confidentiality of agent ii can be protected, it is equivalent to proving f(|𝐬(0))=f(|𝐬~(0))f(\mathcal{I}\,|\,\mathbf{s}(0))=f(\mathcal{I}\,|\,\tilde{\mathbf{s}}(0)) for all 𝐬(0),𝐬~(0)[1N2,N1N2]N\mathbf{s}(0),\,\tilde{\mathbf{s}}(0)\in[\frac{1}{N^{2}},\,\frac{N-1}{N^{2}}]^{N} subject to si(0)+sl(0)=s~i(0)+s~l(0)s_{i}(0)+s_{l}(0)=\tilde{s}_{i}(0)+\tilde{s}_{l}(0) and sj(0)=s~j(0)s_{j}(0)=\tilde{s}_{j}(0) for j𝒱{i,l}j\in\mathcal{V}\setminus\{i,l\}.

Since

f(|𝐬(0))=f(1,2|𝐬(0))=f(1|𝐬(0))f(2|1,𝐬(0))\displaystyle f(\mathcal{I}\,|\,\mathbf{s}(0))=f(\mathcal{I}_{1},\mathcal{I}_{2}\,|\,\mathbf{s}(0))=f(\mathcal{I}_{1}\,|\,\mathbf{s}(0))f(\mathcal{I}_{2}\,|\,\mathcal{I}_{1},\mathbf{s}(0)) (54)

holds, to show f(|𝐬(0))=f(|𝐬~(0))f(\mathcal{I}\,|\,\mathbf{s}(0))=f(\mathcal{I}\,|\,\tilde{\mathbf{s}}(0)), it suffices to prove that

f(1|𝐬(0))=f(1|𝐬~(0))\displaystyle f(\mathcal{I}_{1}\,|\,\mathbf{s}(0))=f(\mathcal{I}_{1}\,|\,\tilde{\mathbf{s}}(0)) (55)

and

f(2|1,𝐬(0))=f(2|1,𝐬~(0))\displaystyle f(\mathcal{I}_{2}\,|\,\mathcal{I}_{1},\mathbf{s}(0))=f(\mathcal{I}_{2}\,|\,\mathcal{I}_{1},\tilde{\mathbf{s}}(0)) (56)

hold for any 𝐬(0),𝐬~(0)[1N2,N1N2]N\mathbf{s}(0),\,\tilde{\mathbf{s}}(0)\in[\frac{1}{N^{2}},\,\frac{N-1}{N^{2}}]^{N} subject to si(0)+sl(0)=s~i(0)+s~l(0)s_{i}(0)+s_{l}(0)=\tilde{s}_{i}(0)+\tilde{s}_{l}(0) and sj(0)=s~j(0)s_{j}(0)=\tilde{s}_{j}(0) for j𝒱{i,l}j\in\mathcal{V}\setminus\{i,l\}.

We first show f(\mathcal{I}_{1}\,|\,\mathbf{s}(0))=f(\mathcal{I}_{1}\,|\,\tilde{\mathbf{s}}(0)). Since \mathcal{I}_{1}^{w} is independent of \mathcal{I}_{1}^{s} and \mathbf{s}(0), one can obtain f(\mathcal{I}_{1}\,|\,\mathbf{s}(0))=f(\mathcal{I}_{1}^{s},\mathcal{I}_{1}^{w}\,|\,\mathbf{s}(0))=f(\mathcal{I}_{1}^{w})f(\mathcal{I}_{1}^{s}\,|\,\mathbf{s}(0)). From (45), we can see that given \mathbf{s}_{\mathcal{A}}(0), \mathcal{I}_{1}^{s} is independent of \mathbf{s}_{\mathcal{H}}(0) and \mathbf{s}_{\mathcal{R}}(0), where \mathcal{H}=\{i,l\} and \mathcal{R}=\mathcal{V}\setminus(\mathcal{H}\cup\mathcal{A}). So f(\mathcal{I}_{1}^{s}\,|\,\mathbf{s}(0))=f(\mathcal{I}_{1}^{s}\,|\,\mathbf{s}_{\mathcal{A}}(0),\mathbf{s}_{\mathcal{H}}(0),\mathbf{s}_{\mathcal{R}}(0))=f(\mathcal{I}_{1}^{s}\,|\,\mathbf{s}_{\mathcal{A}}(0)) holds. Therefore, we have

f(1|𝐬(0))\displaystyle f(\mathcal{I}_{1}\,|\,\mathbf{s}(0)) =f(1w)f(1s|𝐬𝒜(0))=f(1w)f(1s|𝐬~𝒜(0))\displaystyle=f(\mathcal{I}_{1}^{w})f(\mathcal{I}_{1}^{s}\,|\,\mathbf{s}_{\mathcal{A}}(0))=f(\mathcal{I}_{1}^{w})f(\mathcal{I}_{1}^{s}\,|\,\tilde{\mathbf{s}}_{\mathcal{A}}(0)) (57)
=f(1|𝐬~(0))\displaystyle=f(\mathcal{I}_{1}\,|\,\tilde{\mathbf{s}}(0))

where we used 𝐬𝒜(0)=𝐬~𝒜(0)\mathbf{s}_{\mathcal{A}}(0)=\tilde{\mathbf{s}}_{\mathcal{A}}(0) in the derivation.

To show f(2|1,𝐬(0))=f(2|1,𝐬~(0))f(\mathcal{I}_{2}\,|\,\mathcal{I}_{1},\mathbf{s}(0))=f(\mathcal{I}_{2}\,|\,\mathcal{I}_{1},\tilde{\mathbf{s}}(0)), it suffices to prove

f(2,𝐬(K+1),𝐰(K+1)|1,𝐬(0))\displaystyle f(\mathcal{I}_{2},\mathbf{s}(K+1),\mathbf{w}(K+1)\,|\,\mathcal{I}_{1},\mathbf{s}(0)) (58)
=\displaystyle= f(2,𝐬(K+1),𝐰(K+1)|1,𝐬~(0))\displaystyle f(\mathcal{I}_{2},\mathbf{s}(K+1),\mathbf{w}(K+1)\,|\,\mathcal{I}_{1},\tilde{\mathbf{s}}(0))

Given 𝐬(K+1)\mathbf{s}(K+1) and 𝐰(K+1)\mathbf{w}(K+1), 2\mathcal{I}_{2} is independent of 1\mathcal{I}_{1}, which further leads to

f(2,𝐬(K+1),𝐰(K+1)|1,𝐬(0))\displaystyle f(\mathcal{I}_{2},\mathbf{s}(K+1),\mathbf{w}(K+1)\,|\,\mathcal{I}_{1},\mathbf{s}(0)) (59)
=\displaystyle= f(2|𝐬(K+1),𝐰(K+1),1,𝐬(0))\displaystyle f(\mathcal{I}_{2}\,|\,\mathbf{s}(K+1),\mathbf{w}(K+1),\mathcal{I}_{1},\mathbf{s}(0))
×f(𝐬(K+1),𝐰(K+1)|1,𝐬(0))\displaystyle\times f(\mathbf{s}(K+1),\mathbf{w}(K+1)\,|\,\mathcal{I}_{1},\mathbf{s}(0))
=\displaystyle= f(2|𝐬(K+1),𝐰(K+1))\displaystyle f(\mathcal{I}_{2}\,|\,\mathbf{s}(K+1),\mathbf{w}(K+1))
×f(𝐬(K+1),𝐰(K+1)|1,𝐬(0))\displaystyle\times f(\mathbf{s}(K+1),\mathbf{w}(K+1)\,|\,\mathcal{I}_{1},\mathbf{s}(0))
=\displaystyle= f(2|𝐬(K+1),𝐰(K+1))\displaystyle f(\mathcal{I}_{2}\,|\,\mathbf{s}(K+1),\mathbf{w}(K+1))
×f(𝐬(K+1)|𝐰(K+1),1,𝐬(0))f(𝐰(K+1)|1,𝐬(0))\displaystyle\times f(\mathbf{s}(K+1)\,|\,\mathbf{w}(K+1),\mathcal{I}_{1},\mathbf{s}(0))f(\mathbf{w}(K+1)\,|\,\mathcal{I}_{1},\mathbf{s}(0))

Further taking into account the facts that 1) 𝐬(K+1)\mathbf{s}(K+1) is conditionally independent of 𝐰(K+1)\mathbf{w}(K+1) and 1w\mathcal{I}_{1}^{w} given 1s\mathcal{I}_{1}^{s} and 𝐬(0)\mathbf{s}(0); and 2) 𝐰(K+1)\mathbf{w}(K+1) is conditionally independent of 1s\mathcal{I}_{1}^{s} and 𝐬(0)\mathbf{s}(0) given 1w\mathcal{I}_{1}^{w}, one can obtain

f(2,𝐬(K+1),𝐰(K+1)|1,𝐬(0))\displaystyle f(\mathcal{I}_{2},\mathbf{s}(K+1),\mathbf{w}(K+1)\,|\,\mathcal{I}_{1},\mathbf{s}(0)) (60)
=\displaystyle= f(2|𝐬(K+1),𝐰(K+1))f(𝐬(K+1)|1s,𝐬(0))\displaystyle f(\mathcal{I}_{2}\,|\,\mathbf{s}(K+1),\mathbf{w}(K+1))f(\mathbf{s}(K+1)\,|\,\mathcal{I}_{1}^{s},\mathbf{s}(0))
×f(𝐰(K+1)|1w)\displaystyle\times f(\mathbf{w}(K+1)\,|\,\mathcal{I}_{1}^{w})

Similarly, one can also obtain

f(2,𝐬(K+1),𝐰(K+1)|1,𝐬~(0))\displaystyle f(\mathcal{I}_{2},\mathbf{s}(K+1),\mathbf{w}(K+1)\,|\,\mathcal{I}_{1},\tilde{\mathbf{s}}(0)) (61)
=\displaystyle= f(2|𝐬(K+1),𝐰(K+1))f(𝐬(K+1)|1s,𝐬~(0))\displaystyle f(\mathcal{I}_{2}\,|\,\mathbf{s}(K+1),\mathbf{w}(K+1))f(\mathbf{s}(K+1)\,|\,\mathcal{I}_{1}^{s},\tilde{\mathbf{s}}(0))
×f(𝐰(K+1)|1w)\displaystyle\times f(\mathbf{w}(K+1)\,|\,\mathcal{I}_{1}^{w})

Therefore, to show f(2|1,𝐬(0))=f(2|1,𝐬~(0))f(\mathcal{I}_{2}\,|\,\mathcal{I}_{1},\mathbf{s}(0))=f(\mathcal{I}_{2}\,|\,\mathcal{I}_{1},\tilde{\mathbf{s}}(0)), it suffices to prove f(𝐬(K+1)|1s,𝐬(0))=f(𝐬(K+1)|1s,𝐬~(0))f(\mathbf{s}(K+1)\,|\,\mathcal{I}_{1}^{s},\mathbf{s}(0))=f(\mathbf{s}(K+1)\,|\,\mathcal{I}_{1}^{s},\tilde{\mathbf{s}}(0)). Denote s\mathcal{I}_{\mathcal{R}}^{s} as s{𝚫𝐬(k)|k=0,1,,K}\mathcal{I}_{\mathcal{R}}^{s}\triangleq\{\boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}(k)\,|\,k=0,1,\ldots,K\}. To prove f(𝐬(K+1)|1s,𝐬(0))=f(𝐬(K+1)|1s,𝐬~(0))f(\mathbf{s}(K+1)\,|\,\mathcal{I}_{1}^{s},\mathbf{s}(0))=f(\mathbf{s}(K+1)\,|\,\mathcal{I}_{1}^{s},\tilde{\mathbf{s}}(0)), we only need to prove

f(𝐬(K+1),s|1s,𝐬(0))=f(𝐬(K+1),s|1s,𝐬~(0))\displaystyle f(\mathbf{s}(K+1),\mathcal{I}_{\mathcal{R}}^{s}\,|\,\mathcal{I}_{1}^{s},\mathbf{s}(0))=f(\mathbf{s}(K+1),\mathcal{I}_{\mathcal{R}}^{s}\,|\,\mathcal{I}_{1}^{s},\tilde{\mathbf{s}}(0)) (62)

From (46), we can obtain the following facts: 1) 𝐬(K+1)\mathbf{s}_{\mathcal{H}}(K+1) is conditionally independent of 𝐬𝒜(K+1)\mathbf{s}_{\mathcal{A}}(K+1) and 𝐬(K+1)\mathbf{s}_{\mathcal{R}}(K+1) given s\mathcal{I}_{\mathcal{R}}^{s}, 1s\mathcal{I}_{1}^{s}, and 𝐬(0)\mathbf{s}(0); and 2) 𝐬𝒜(K+1)\mathbf{s}_{\mathcal{A}}(K+1) and 𝐬(K+1)\mathbf{s}_{\mathcal{R}}(K+1) are conditionally independent of 𝐬(0)\mathbf{s}_{\mathcal{H}}(0) given s\mathcal{I}_{\mathcal{R}}^{s}, 1s\mathcal{I}_{1}^{s}, 𝐬𝒜(0)\mathbf{s}_{\mathcal{A}}(0), and 𝐬(0)\mathbf{s}_{\mathcal{R}}(0). Taking into account these facts, we have

f(𝐬(K+1),s|1s,𝐬(0))\displaystyle f(\mathbf{s}(K+1),\mathcal{I}_{\mathcal{R}}^{s}\,|\,\mathcal{I}_{1}^{s},\mathbf{s}(0)) (63)
=\displaystyle= f(𝐬(K+1),𝐬𝒜(K+1),𝐬(K+1),s|1s,𝐬(0))\displaystyle f(\mathbf{s}_{\mathcal{H}}(K+1),\mathbf{s}_{\mathcal{A}}(K+1),\mathbf{s}_{\mathcal{R}}(K+1),\mathcal{I}_{\mathcal{R}}^{s}\,|\,\mathcal{I}_{1}^{s},\mathbf{s}(0))
=\displaystyle= f(𝐬(K+1)|𝐬𝒜(K+1),𝐬(K+1),s,1s,𝐬(0))\displaystyle f(\mathbf{s}_{\mathcal{H}}(K+1)\,|\,\mathbf{s}_{\mathcal{A}}(K+1),\mathbf{s}_{\mathcal{R}}(K+1),\mathcal{I}_{\mathcal{R}}^{s},\mathcal{I}_{1}^{s},\mathbf{s}(0))
×f(𝐬𝒜(K+1),𝐬(K+1)|s,1s,𝐬(0))f(s|1s,𝐬(0))\displaystyle\times f(\mathbf{s}_{\mathcal{A}}(K+1),\mathbf{s}_{\mathcal{R}}(K+1)\,|\,\mathcal{I}_{\mathcal{R}}^{s},\mathcal{I}_{1}^{s},\mathbf{s}(0))f(\mathcal{I}_{\mathcal{R}}^{s}\,|\,\mathcal{I}_{1}^{s},\mathbf{s}(0))
=\displaystyle= f(𝐬(K+1)|s,1s,𝐬(0))\displaystyle f(\mathbf{s}_{\mathcal{H}}(K+1)\,|\,\mathcal{I}_{\mathcal{R}}^{s},\mathcal{I}_{1}^{s},\mathbf{s}(0))
×f(𝐬𝒜(K+1),𝐬(K+1)|s,1s,𝐬𝒜(0),𝐬(0))f(s)\displaystyle\times f(\mathbf{s}_{\mathcal{A}}(K+1),\mathbf{s}_{\mathcal{R}}(K+1)\,|\,\mathcal{I}_{\mathcal{R}}^{s},\mathcal{I}_{1}^{s},\mathbf{s}_{\mathcal{A}}(0),\mathbf{s}_{\mathcal{R}}(0))f(\mathcal{I}_{\mathcal{R}}^{s})

where in the derivation we used the independence between s\mathcal{I}_{\mathcal{R}}^{s} and {1s,𝐬(0)}\{\mathcal{I}_{1}^{s},\mathbf{s}(0)\}. Similarly, one can obtain

f(𝐬(K+1),s|1s,𝐬~(0))\displaystyle f(\mathbf{s}(K+1),\mathcal{I}_{\mathcal{R}}^{s}\,|\,\mathcal{I}_{1}^{s},\tilde{\mathbf{s}}(0)) (64)
=\displaystyle= f(𝐬(K+1)|s,1s,𝐬~(0))\displaystyle f(\mathbf{s}_{\mathcal{H}}(K+1)\,|\,\mathcal{I}_{\mathcal{R}}^{s},\mathcal{I}_{1}^{s},\tilde{\mathbf{s}}(0))
×f(𝐬𝒜(K+1),𝐬(K+1)|s,1s,𝐬~𝒜(0),𝐬~(0))f(s)\displaystyle\times f(\mathbf{s}_{\mathcal{A}}(K+1),\mathbf{s}_{\mathcal{R}}(K+1)\,|\,\mathcal{I}_{\mathcal{R}}^{s},\mathcal{I}_{1}^{s},\tilde{\mathbf{s}}_{\mathcal{A}}(0),\tilde{\mathbf{s}}_{\mathcal{R}}(0))f(\mathcal{I}_{\mathcal{R}}^{s})

From Lemma 2, we have that if at some time instant 0\leq k^{*}\leq K, agent ii has an in-neighbor or out-neighbor ll not belonging to \mathcal{A}, then f(\mathbf{s}_{\mathcal{H}}(K+1)\,|\,\mathcal{I}_{s}^{*},\mathbf{s}(0))=f(\mathbf{s}_{\mathcal{H}}(K+1)\,|\,\mathcal{I}_{s}^{*},\tilde{\mathbf{s}}(0)) holds, where \mathcal{I}_{s}^{*}=\{\Delta s_{mn}(k)\,|\,(m,\,n)\in\mathcal{E}(k),(m,\,n)\neq(i,\,l),(m,\,n)\neq(l,\,i),k=0,1,\ldots,K\}=\{\Delta s_{mn}(k)\,|\,(m,\,n)\in\mathcal{E}_{\mathcal{A}}(k)\cup\mathcal{E}_{\mathcal{R}}(k),k=0,1,\ldots,K\} is the collection of all elements \Delta s_{mn}(k) in \boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(k) and \boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}(k) from iteration 0 to iteration KK. Given that \boldsymbol{\Delta}\mathbf{s}_{\mathcal{A}}(k)\in\mathcal{I}_{1}^{s} and \boldsymbol{\Delta}\mathbf{s}_{\mathcal{R}}(k)\in\mathcal{I}_{\mathcal{R}}^{s} hold for k=0,1,\ldots,K, we further have f(\mathbf{s}_{\mathcal{H}}(K+1)\,|\,\mathcal{I}_{\mathcal{R}}^{s},\mathcal{I}_{1}^{s},\mathbf{s}(0))=f(\mathbf{s}_{\mathcal{H}}(K+1)\,|\,\mathcal{I}_{\mathcal{R}}^{s},\mathcal{I}_{1}^{s},\tilde{\mathbf{s}}(0)). Based on \mathbf{s}_{\mathcal{A}}(0)=\tilde{\mathbf{s}}_{\mathcal{A}}(0) and \mathbf{s}_{\mathcal{R}}(0)=\tilde{\mathbf{s}}_{\mathcal{R}}(0), we have f(\mathbf{s}(K+1),\mathcal{I}_{\mathcal{R}}^{s}\,|\,\mathcal{I}_{1}^{s},\mathbf{s}(0))=f(\mathbf{s}(K+1),\mathcal{I}_{\mathcal{R}}^{s}\,|\,\mathcal{I}_{1}^{s},\tilde{\mathbf{s}}(0)), implying f(\mathcal{I}_{2}\,|\,\mathcal{I}_{1},\mathbf{s}(0))=f(\mathcal{I}_{2}\,|\,\mathcal{I}_{1},\tilde{\mathbf{s}}(0)) if at some time instant 0\leq k^{*}\leq K, agent ii has an in-neighbor or out-neighbor ll not belonging to \mathcal{A}.

Therefore, we have f(|𝐬(0))=f(|𝐬~(0))f(\mathcal{I}\,|\,\mathbf{s}(0))=f(\mathcal{I}\,|\,\tilde{\mathbf{s}}(0)) for any 𝐬(0),𝐬~(0)[1N2,N1N2]N\mathbf{s}(0),\,\tilde{\mathbf{s}}(0)\in[\frac{1}{N^{2}},\,\frac{N-1}{N^{2}}]^{N} subject to si(0)+sl(0)=s~i(0)+s~l(0)s_{i}(0)+s_{l}(0)=\tilde{s}_{i}(0)+\tilde{s}_{l}(0) and sj(0)=s~j(0)s_{j}(0)=\tilde{s}_{j}(0) for j𝒱{i,l}j\in\mathcal{V}\setminus\{i,l\}, meaning that the confidentiality of agent ii can be preserved if at some time instant 0kK0\leq k^{*}\leq K, agent ii has an in-neighbor or out-neighbor ll that does not belong to 𝒜\mathcal{A}. \blacksquare

Remark 7

Compared with our previous work [1], which defines privacy as the positive probability that the adversaries' accessible information set is the same under two different initial states, in this work we significantly strengthen the confidentiality result by proving that the probability distributions of the information sets are identical under different initial states, meaning that the initial states are perfectly indistinguishable from the viewpoint of adversaries.

Remark 8

It is worth noting that although the confidentiality approach in [30] looks similar to ours (both protocols employ random values in the first several iterations), they are in fact significantly different. More specifically, to guarantee the accuracy of average consensus, the protocol in [30] not only requires pairwise local averaging of exchanged random values in the first several iterations, but also needs to compensate the errors induced by the random values immediately afterwards, which is equivalent to introducing time-correlated additive random noises to agents' states. In contrast, our approach exchanges time-uncorrelated random values for iterations k\leq K. To ensure consensus accuracy, each agent ii uses a carefully designed \Delta s_{ii}(k) to compensate the errors induced by the random values at each iteration k\leq K. Therefore, the random values used in our approach are space-correlated. As shown in Theorem 2, this space-correlated randomness makes the probability distributions of the information sets accessible to adversaries identical under different initial states and hence achieves information-theoretic privacy, which is stronger than the confidentiality achieved using time-correlated noises in [30].
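The space-correlated compensation described above can be sketched numerically: in each obfuscated iteration k\leq K, an agent may send arbitrary random shares to its out-neighbors and keep the remainder \Delta s_{ii}(k) for itself, so the network-wide sum of the s-states (and hence the eventual average) is unchanged. The digraph and share distributions below are our own hypothetical choices, a sketch of the mechanism rather than the full Algorithm 1:

```python
import random

random.seed(0)
N = 5
s = [random.uniform(1 / N**2, (N - 1) / N**2) for _ in range(N)]
total = sum(s)

# hypothetical static digraph: each agent sends to the next two agents
out = {i: [(i + 1) % N, (i + 2) % N] for i in range(N)}

new_s = [0.0] * N
for i in range(N):
    # arbitrary random shares sent to out-neighbors (time-uncorrelated)
    shares = [random.uniform(-1, 1) for _ in out[i]]
    # Delta_s_ii(k) = s_i(k) - sum(shares) compensates the injected randomness
    new_s[i] += s[i] - sum(shares)
    for m, d in zip(out[i], shares):
        new_s[m] += d

# the network-wide sum is invariant despite the randomness
assert abs(sum(new_s) - total) < 1e-12
```

Because each agent's self-retained share exactly cancels what it injected, accuracy of the average is preserved while individual transmissions remain random.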

Next we proceed to show that if the conditions in Theorem 2 are not met, then the confidentiality of agent ii may be breached.

Theorem 3

Consider a network of NN agents represented by a sequence of time-varying directed graphs {𝒢(k)=(𝒱,(k))}\{\mathcal{G}(k)=(\mathcal{V},\,\mathcal{E}(k))\} which satisfy Assumptions 1, 2, 3, 4, and 5. Under the proposed Algorithm 1, the confidentiality of agent i𝒜i\notin\mathcal{A} cannot be preserved against 𝒜\mathcal{A} if all in-neighbors and out-neighbors of agent ii belong to 𝒜\mathcal{A}, i.e., {𝒩iin(k)𝒩iout(k)|k0}𝒜\{\mathcal{N}_{i}^{in}(k)\cup\mathcal{N}_{i}^{out}(k)\,\big{|}\,k\geq 0\}\subset\mathcal{A}. In fact, when {𝒩iin(k)𝒩iout(k)|k0}𝒜\{\mathcal{N}_{i}^{in}(k)\cup\mathcal{N}_{i}^{out}(k)\,\big{|}\,k\geq 0\}\subset\mathcal{A} is true, the initial value xi0x_{i}^{0} of agent ii can be uniquely determined by honest-but-curious agents in 𝒜\mathcal{A}.

Proof: From (43) we have

frac(si(k+1)si(k))\displaystyle{\rm frac}\big{(}s_{i}(k+1)-s_{i}(k)\big{)} (65)
=\displaystyle= frac(n𝒩iin(k)Δsin(k)m𝒩iout(k)Δsmi(k))\displaystyle{\rm frac}\Big{(}\sum_{n\in\mathcal{N}_{i}^{in}(k)}\Delta s_{in}(k)-\sum_{m\in\mathcal{N}_{i}^{out}(k)}\Delta s_{mi}(k)\Big{)}

for kKk\leq K and further

frac(si(K+1)si(0))\displaystyle{\rm frac}\big{(}s_{i}(K+1)-s_{i}(0)\big{)} (66)
=\displaystyle= frac(k=0K[n𝒩iin(k)Δsin(k)m𝒩iout(k)Δsmi(k)])\displaystyle{\rm frac}\bigg{(}\sum_{k=0}^{K}\Big{[}\sum_{n\in\mathcal{N}_{i}^{in}(k)}\Delta s_{in}(k)-\sum_{m\in\mathcal{N}_{i}^{out}(k)}\Delta s_{mi}(k)\Big{]}\bigg{)}

Since si(0)[1N2,N1N2]s_{i}(0)\in[\frac{1}{N^{2}},\,\frac{N-1}{N^{2}}] holds, one can obtain

si(0)=\displaystyle s_{i}(0)= frac(si(K+1)\displaystyle{\rm frac}\bigg{(}s_{i}(K+1) (67)
k=0K[n𝒩iin(k)Δsin(k)m𝒩iout(k)Δsmi(k)])\displaystyle-\sum_{k=0}^{K}\Big{[}\sum_{n\in\mathcal{N}_{i}^{in}(k)}\Delta s_{in}(k)-\sum_{m\in\mathcal{N}_{i}^{out}(k)}\Delta s_{mi}(k)\Big{]}\bigg{)}

According to the requirements in Algorithm 1, we have Δsii(k)+m𝒩iout(k)Δsmi(k)=si(k)\Delta s_{ii}(k)+\sum_{m\in\mathcal{N}_{i}^{out}(k)}{\Delta s_{mi}(k)}=s_{i}(k) for kK+1k\geq K+1 and Δwii(k)+m𝒩iout(k)Δwmi(k)=wi(k)\Delta w_{ii}(k)+\sum_{m\in\mathcal{N}_{i}^{out}(k)}{\Delta w_{mi}(k)}=w_{i}(k) for k0k\geq 0. Plugging these relationships into (6) and (7), we can obtain

si(k+1)si(k)=\displaystyle s_{i}(k+1)-s_{i}(k)= (68)
n𝒩iin(k)Δsin(k)m𝒩iout(k)Δsmi(k)\displaystyle\qquad\sum\limits_{n\in\mathcal{N}_{i}^{in}(k)}{\Delta s_{in}(k)}-\sum\limits_{m\in\mathcal{N}_{i}^{out}(k)}{\Delta s_{mi}(k)}

for kK+1k\geq K+1 and

wi(k+1)wi(k)=\displaystyle w_{i}(k+1)-w_{i}(k)= (69)
n𝒩iin(k)Δwin(k)m𝒩iout(k)Δwmi(k)\displaystyle\qquad\sum\limits_{n\in\mathcal{N}_{i}^{in}(k)}{\Delta w_{in}(k)}-\sum\limits_{m\in\mathcal{N}_{i}^{out}(k)}{\Delta w_{mi}(k)}

for k0k\geq 0, which further lead to

si(k)si(K+1)=\displaystyle s_{i}(k)-s_{i}(K+1)= (70)
l=K+1k1[n𝒩iin(k)Δsin(l)m𝒩iout(k)Δsmi(l)]\displaystyle\qquad\sum\limits_{l=K+1}^{k-1}\Big{[}\sum\limits_{n\in\mathcal{N}_{i}^{in}(k)}{\Delta s_{in}(l)}-\sum\limits_{m\in\mathcal{N}_{i}^{out}(k)}{\Delta s_{mi}(l)}\Big{]}

for kK+1k\geq K+1 and

wi(k)wi(0)=\displaystyle w_{i}(k)-w_{i}(0)= (71)
l=0k1[n𝒩iin(k)Δwin(l)m𝒩iout(k)Δwmi(l)]\displaystyle\qquad\sum\limits_{l=0}^{k-1}\Big{[}\sum\limits_{n\in\mathcal{N}_{i}^{in}(k)}{\Delta w_{in}(l)}-\sum\limits_{m\in\mathcal{N}_{i}^{out}(k)}{\Delta w_{mi}(l)}\Big{]}

for k0k\geq 0.

Under Assumption 5, if {𝒩iout(k)𝒩iin(k)|k0}𝒜\{\mathcal{N}_{i}^{out}(k)\cup\mathcal{N}_{i}^{in}(k)\,\big{|}\,k\geq 0\}\subset\mathcal{A} is true, then all terms on the right-hand side of (70) and (71) are known to the honest-but-curious agents in 𝒜\mathcal{A}. Further taking into account wi(0)=1w_{i}(0)=1, we have that agents in 𝒜\mathcal{A} can uniquely determine wi(k)w_{i}(k) for all kk. Under Assumption 1 and {𝒩iout(k)𝒩iin(k)|k0}𝒜\{\mathcal{N}_{i}^{out}(k)\cup\mathcal{N}_{i}^{in}(k)\,\big{|}\,k\geq 0\}\subset\mathcal{A}, there must exist at least one agent j𝒜j\in\mathcal{A} such that (j,i)(j,\,i)\in\mathcal{E}_{\infty} is true. This is because otherwise graph (𝒱,)(\mathcal{V},\,\mathcal{E}_{\infty}) is not strongly connected, which does not satisfy Assumption 1. According to Assumption 2, agent ii directly communicates with agent j𝒜j\in\mathcal{A} at least once in every TT consecutive time instants. So there must exist kK+1k^{\prime}\geq K+1 at which agent ii directly communicates with agent jj, i.e., agent ii sends Δsji(k)\Delta s_{ji}(k^{\prime}) and Δwji(k)\Delta w_{ji}(k^{\prime}) to agent jj at iteration kk^{\prime}. As j𝒜j\in\mathcal{A}, every honest-but-curious agent in 𝒜\mathcal{A} has access to Δsji(k)\Delta s_{ji}(k^{\prime}) and Δwji(k)\Delta w_{ji}(k^{\prime}). So agents in 𝒜\mathcal{A} can easily infer si(k)s_{i}(k^{\prime}) by using the following relationship

si(k)=Δsji(k)Δwji(k)wi(k)=pji(k)si(k)pji(k)wi(k)wi(k)s_{i}(k^{\prime})=\frac{\Delta s_{ji}(k^{\prime})}{\Delta w_{ji}(k^{\prime})}w_{i}(k^{\prime})=\frac{p_{ji}(k^{\prime})s_{i}(k^{\prime})}{p_{ji}(k^{\prime})w_{i}(k^{\prime})}w_{i}(k^{\prime}) (72)

and then determine the value of si(K+1)s_{i}(K+1) using (70). Further making use of (67), agents in 𝒜\mathcal{A} can infer the value of si(0)s_{i}(0), and then uniquely determine the initial value of agent ii using xi0=baN2(N2si(0)1)+ax_{i}^{0}=\frac{b-a}{N-2}(N^{2}s_{i}(0)-1)+a. Therefore, the confidentiality of agent i𝒜i\notin\mathcal{A} cannot be preserved against 𝒜\mathcal{A} if all in-neighbors and out-neighbors of agent ii belong to 𝒜\mathcal{A}. \blacksquare
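The recovery step in this proof can be checked numerically. The snippet below is an illustrative sketch with hypothetical values (N, a, b, the initial value, and the integer offset accumulated by the \Delta-sums): since s_{i}(0)\in[\frac{1}{N^{2}},\,\frac{N-1}{N^{2}}]\subset(0,1), the frac operation in (67) returns s_{i}(0) exactly, and the affine map then yields x_{i}^{0}:

```python
import math

# Hypothetical values for illustration: N agents, initial values drawn from (a, b)
N = 5
a, b = -50.0, 50.0
x0_true = 17.3                                     # agent i's private initial value
# inverse of the affine map x_i^0 = (b-a)/(N-2) * (N^2 * s_i(0) - 1) + a
s0 = ((x0_true - a) / (b - a) * (N - 2) + 1) / N**2
assert 1 / N**2 <= s0 <= (N - 1) / N**2            # so s0 lies in (0, 1)

frac = lambda v: v - math.floor(v)
# s_i(K+1) minus the Delta-sums known to the adversary differs from s_i(0)
# only by an integer (a hypothetical offset of 3 here), so frac recovers s_i(0)
observed = s0 + 3.0
recovered_s0 = frac(observed)
recovered_x0 = (b - a) / (N - 2) * (N**2 * recovered_s0 - 1) + a
assert abs(recovered_x0 - x0_true) < 1e-9
```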

Remark 9

It is worth noting that in confidential average consensus, topology requirements such as the ones in Theorem 2 are widely used. In fact, to guarantee both accuracy and confidentiality, [31, 26, 27, 28, 29, 30, 33, 34, 35, 39, 40, 41] all rely on similar topology requirements.

Remark 10

Our algorithm can protect the confidentiality of an agent even when all of its neighbors interact with adversaries, as long as at least one of them does not collude, which is not allowed in [27] and [31].

Remark 11

From the above analysis, we know that the randomness each agent ii injects into the interaction dynamics for k\leq K is key to protecting confidentiality against honest-but-curious agents. It is worth noting that, compared with the conventional push-sum approach which does not take confidentiality into consideration, the introduced randomness has no influence on the convergence rate \gamma. However, it does delay the convergence process and hence leads to a trade-off between confidentiality preservation and convergence time. This is confirmed in our numerical simulations in Fig. 2, which shows that convergence only initiates after k=K+1.

Remark 12

If an adversary can obtain side information, then a larger KK protects the confidentiality of more intermediate states si(k)s_{i}(k) for 1kK1\leq k\leq K. This is because for kK+1k\geq K+1, si(k)s_{i}(k) can be easily obtained by its out-neighbor jj due to the relationship si(k)=wi(k)Δsji(k)/Δwji(k)s_{i}(k)=w_{i}(k)\Delta s_{ji}(k)/\Delta w_{ji}(k) if side information about wi(k)w_{i}(k) is available to the adversary jj. Therefore, although a larger KK leads to more delay in the convergence process, as discussed in Remark 11, it can protect more intermediate states (si(k)s_{i}(k) for 1kK1\leq k\leq K) when an adversary can obtain side information. Of course, if side information is not of concern, a smaller KK is preferable to minimize the delay in the convergence process.

Remark 13

Our algorithm can be extended to preserve confidentiality against external eavesdroppers wiretapping all communication links, without compromising algorithmic accuracy, by incorporating partially homomorphic encryption. More specifically, using public-key cryptosystems (e.g., Paillier [57], RSA [58], and ElGamal [59]), each agent generates and floods its public key before the consensus iteration starts. Then, in the decentralized implementation, an agent encrypts the messages it sends, which can be decrypted by a legitimate recipient without the help of any third party. Note that since public-key cryptosystems can only deal with integers, the final consensus result is subject to a quantization error. However, as indicated in our previous work [40], the quantization error can be made arbitrarily small in implementation.
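To illustrate the additively homomorphic property this extension relies on, the toy Paillier sketch below uses deliberately tiny primes (insecure; for illustration only) to show that multiplying two ciphertexts yields a ciphertext of the sum; a real deployment would use a vetted cryptographic library and proper key sizes:

```python
from math import gcd
import random

# Toy Paillier cryptosystem with tiny primes, for illustration only.
p, q = 47, 59
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)     # lcm(p-1, q-1)
# mu = L(g^lam mod n^2)^(-1) mod n, where L(u) = (u - 1) / n
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

m1, m2 = 123, 456
c = encrypt(m1) * encrypt(m2) % n2               # homomorphic addition
assert decrypt(c) == (m1 + m2) % n               # decrypts to 579
```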

IV Results Validation

We conducted numerical simulations to verify the correctness and the effectiveness of our proposed approach.

We first evaluated our proposed Algorithm 1 under a network of N=5N=5 agents interacting on a time-varying directed graph. More specifically, we used the interaction graph in Fig. 1(a) when kk is even and Fig. 1(b) when kk is odd. It can be verified that this time-varying directed graph satisfies Assumptions 1 and 2. Parameter ε\varepsilon was set to 0.050.05. The initial values xi0x_{i}^{0} for i=1,,Ni=1,\ldots,N were randomly chosen from (50, 50)(-50,\,50). We used e(k)e(k) to measure the estimation error between the estimate πi(k)=si(k)/wi(k)\pi_{i}(k)=s_{i}(k)/w_{i}(k) and the true average value x¯0=j=1Nxj0/N\bar{x}^{0}={\sum_{j=1}^{N}x_{j}^{0}}/{N} at iteration kk, i.e.,

e(k)=𝝅(k)x¯0𝟏=(i=1N(πi(k)x¯0)2)1/2\displaystyle e(k)=\big{\|}\boldsymbol{\pi}(k)-\bar{x}^{0}\mathbf{1}\big{\|}=\big{(}\sum\limits_{i=1}^{N}(\pi_{i}(k)-\bar{x}^{0})^{2}\big{)}^{1/2} (73)

Three experiments were conducted with parameter KK being 1010, 2020, and 3030, respectively. The evolution of e(k)e(k) is shown in Fig. 2. It can be seen that e(k)e(k) approached 0, meaning that every agent converged to the average value x¯0=j=1Nxj0/N\bar{x}^{0}=\sum_{j=1}^{N}x_{j}^{0}/{N}. From Fig. 2, we can also see that Algorithm 1 did not start to converge in the first K+1K+1 iterations due to the introduced randomness, which confirms our analysis in Remark 11.

Figure 1: A time-varying directed graph with 55 agents.
Figure 2: The evolution of error e(k)e(k) under different KK.

We also evaluated the influence of parameter ε\varepsilon on the convergence rate γ\gamma. The interaction graph was the same as above. KK was set to 1010. The simulation results are given in Fig. 3 where the mean and variance of γ\gamma from 1,0001,000 runs of the algorithm are shown under different values of ε\varepsilon. Fig. 3 shows that as ε\varepsilon increases, the convergence rate γ\gamma decreases (i.e., the convergence speed increases), which confirms our analysis in Sec. III-B.

Figure 3: The influence of ε\varepsilon on the convergence rate γ\gamma.

We then compared the proposed Algorithm 1 with existing data-obfuscation based approaches, more specifically, the differential-privacy based approach in [14], the decaying-noise approach in [27], and the finite-noise-sequence approach in [31]. Under the same setup as in the previous simulation, we chose the initial values as {10,15,20,25,30}\{10,15,20,25,30\}, which led to an average value 2020. We adopted the weight matrix 𝐖\mathbf{W} from [14], i.e., the ijij-th entry was wij=1/(|𝒩jout|+1)w_{ij}={1}/{(|\mathcal{N}_{j}^{out}|+1)} for i𝒩jout{j}i\in\mathcal{N}_{j}^{out}\cup\{j\} and wij=0w_{ij}=0 for i𝒩jout{j}i\notin\mathcal{N}_{j}^{out}\cup\{j\}. As the graph is directed and imbalanced, and does not meet the undirected or balanced assumption in [14, 31, 27], all three approaches failed to achieve average consensus, as shown in the numerical simulation results in Fig. 4, Fig. 5, and Fig. 6, respectively.

Figure 4: The evolution of xi(k)x_{i}(k) under the approach in [14].
Figure 5: The evolution of xi(k)x_{i}(k) under the approach in [27].
Figure 6: The evolution of xi(k)x_{i}(k) under the approach in [31].

Finally, we conducted numerical simulations to verify the scalability of our proposed Algorithm 1 using a network of N=1,000N=1,000 agents. At every iteration kk, each agent ii was assumed to have three out-neighbors, i.e.,

𝒩iout(k)={{i¯+1,i+1¯+1,i+2¯+1}ifkis even{i2¯+1,i3¯+1,i4¯+1}ifkis odd\mathcal{N}_{i}^{out}(k)=\left\{\begin{aligned} &\left\{\overline{i}+1,\overline{i+1}+1,\overline{i+2}+1\right\}\qquad\textnormal{if}\ k\ \textnormal{is even}\\ &\left\{\overline{i-2}+1,\overline{i-3}+1,\overline{i-4}+1\right\}\ \textnormal{if}\ k\ \textnormal{is odd}\\ \end{aligned}\right. (74)

where the superscript “¯\bar{\quad}” represents modulo operation on NN, i.e., i¯imodN\overline{i}\triangleq i\mod N. The initial values xi0x_{i}^{0} for i=1,2,,Ni=1,2,\ldots,N were randomly chosen from (50, 50)(-50,\,50). ε\varepsilon and KK were set to 0.050.05 and 1010, respectively. The evolution of estimation error e(k)=𝝅(k)x¯0𝟏e(k)=\|\boldsymbol{\pi}(k)-\bar{x}^{0}\mathbf{1}\| is shown in Fig. 7. It can be seen that e(k)e(k) converged to 0, implying that our proposed algorithm can guarantee the convergence of all agents to the actual average value even when the network size is large.
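The qualitative behavior of this experiment can be reproduced with the following minimal sketch of plain push-sum (without Algorithm 1's obfuscation phase) on the time-varying circulant digraph of (74); the smaller network size and the equal splitting of (s, w) among an agent and its out-neighbors are our own assumptions:

```python
import numpy as np

def out_neighbors(i, k, N):
    # Eq. (74): agents are 1-indexed and bar(i) denotes i mod N
    if k % 2 == 0:
        return [(i % N) + 1, ((i + 1) % N) + 1, ((i + 2) % N) + 1]
    return [((i - 2) % N) + 1, ((i - 3) % N) + 1, ((i - 4) % N) + 1]

def push_sum(x0, iters):
    N = len(x0)
    s = np.asarray(x0, dtype=float).copy()
    w = np.ones(N)
    for k in range(iters):
        s_new, w_new = np.zeros(N), np.zeros(N)
        for i in range(1, N + 1):
            nbrs = out_neighbors(i, k, N)
            share = 1.0 / (len(nbrs) + 1)        # equal split, one share kept
            for j in [i] + nbrs:
                s_new[j - 1] += share * s[i - 1]
                w_new[j - 1] += share * w[i - 1]
        s, w = s_new, w_new
    return s / w                                  # estimates pi_i = s_i / w_i

rng = np.random.default_rng(0)
x0 = rng.uniform(-50, 50, size=20)
pi = push_sum(x0, 600)
err = np.linalg.norm(pi - x0.mean())              # e(k) as in Eq. (73)
assert err < 1e-8
```

Because the graph sequence is strongly connected over time and each splitting step is column-stochastic, the ratios s_i/w_i converge geometrically to the true average, mirroring the decay of e(k) in Fig. 7.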

Figure 7: The evolution of error e(k)e(k) in a network of N=1,000N=1,000 agents.

V Conclusions

We proposed a confidential average consensus algorithm for time-varying directed graphs. In contrast to existing differential-privacy based approaches, which enable confidentiality by compromising the accuracy of the obtained consensus value, we leveraged the inherent robustness of average consensus to embed randomness in the interaction dynamics, which guarantees the confidentiality of participating agents without sacrificing the accuracy of average consensus. Finally, we provided numerical simulation results that confirm the effectiveness and efficiency of our proposed approach.

References

  • [1] H. Gao, C. Zhang, M. Ahmad, and Y. Q. Wang, “Privacy-preserving average consensus on directed graphs using push-sum,” in 2018 IEEE Conference on Communications and Network Security (CNS), 2018.
  • [2] J. E. Boillat, “Load balancing and poisson equation in a graph,” Concurrency and Computation: Practice and Experience, vol. 2, no. 4, pp. 289–313, 1990.
  • [3] G. Cybenko, “Dynamic load balancing for distributed memory multiprocessors,” J. Parallel Distrib. Comput., vol. 7, no. 2, pp. 279–301, 1989.
  • [4] N. A. Lynch, Distributed algorithms.   San Francisco, CA: Morgan Kaufmann Publishers, 1996.
  • [5] D. S. Scherber and H. C. Papadopoulos, “Locally constructed algorithms for distributed computations in ad-hoc networks,” in Proc. Information Processing Sensor Networks (IPSN), 2004, pp. 11–19.
  • [6] L. Xiao, S. Boyd, and S. Lall, “A scheme for robust distributed sensor fusion based on average consensus,” in Proc. Information Processing Sensor Networks (IPSN), 2005, pp. 63–70.
  • [7] R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents with switching topology and time-delays,” IEEE Trans. Autom. Control, vol. 49, no. 9, pp. 1520–1533, 2004.
  • [8] W. Ren and R. W. Beard, “Consensus seeking in multiagent systems under dynamically changing interaction topologies,” IEEE Trans. Autom. Control, vol. 50, no. 5, pp. 655–661, 2005.
  • [9] O. L. Mangasarian, “Privacy-preserving linear programming,” Optimization Letters, vol. 5, no. 1, pp. 165–172, 2011.
  • [10] R. Hoenkamp, G. B. Huitema, and A. J. C. de Moor-van Vugt, “The neglected consumer: the case of the smart meter rollout in the netherlands,” Renew. Energy Law Policy Rev., pp. 269–282, 2011.
  • [11] Y. Lou, L. Yu, S. Wang, and P. Yi, “Privacy preservation in distributed subgradient optimization algorithms,” IEEE Trans. Cybern., vol. 48, no. 7, pp. 2154–2165, 2017.
  • [12] J. N. Tsitsiklis, “Problems in decentralized decision making and computation,” Ph.D. dissertation, 1984.
  • [13] Z. Zhang and M. Y. Chow, “Incremental cost consensus algorithm in a smart grid environment,” in 2011 IEEE Power and Energy Society General Meeting, 2011, pp. 1–6.
  • [14] Z. Huang, S. Mitra, and G. Dullerud, “Differentially private iterative synchronous consensus,” in Proceedings of the 2012 ACM workshop on Privacy in the electronic society, 2012, pp. 81–90.
  • [15] Z. Huang, S. Mitra, and N. Vaidya, “Differentially private distributed optimization,” in Proc. Int. Conf. Distrib. Comput. Netw., 2015, pp. 4:1–4:10.
  • [16] E. Nozari, P. Tallapragada, and J. Cortés, “Differentially private average consensus with optimal noise selection,” IFAC-PapersOnLine, vol. 48, no. 22, pp. 203–208, 2015.
  • [17] ——, “Differentially private average consensus: obstructions, trade-offs, and optimal algorithm design,” Automatica, vol. 81, pp. 221–231, 2017.
  • [18] V. Katewa, F. Pasqualetti, and V. Gupta, “On privacy vs. cooperation in multi-agent systems,” Int. J. Control, vol. 91, no. 7, pp. 1693–1707, 2018.
  • [19] L. Gao, S. Deng, W. Ren, and C. Hu, “Differentially private consensus with quantized communication,” IEEE Trans. Cybern., 2019.
  • [20] D. Ye, T. Zhu, W. Zhou, and S. Y. Philip, “Differentially private malicious agent avoidance in multiagent advising learning,” IEEE Trans. Cybern., 2019.
  • [21] M. Kefayati, M. S. Talebi, B. H. Khalaj, and H. R. Rabiee, “Secure consensus averaging in sensor networks using random offsets,” in Proc. IEEE Int. Conf. Telecommun. Malaysia Int. Conf. Commun., 2007, pp. 556–560.
  • [22] X. Wang, J. He, P. Cheng, and J. Chen, “Privacy preserving average consensus with different privacy guarantee,” in 2018 Annual American Control Conference (ACC).   IEEE, 2018, pp. 5189–5194.
  • [23] Y. Wang, Z. Huang, S. Mitra, and G. E. Dullerud, “Differential privacy in linear distributed control systems: Entropy minimizing mechanisms and performance tradeoffs,” IEEE Trans. Control Netw. Syst., vol. 4, no. 1, pp. 118–130, 2017.
  • [24] J. He, L. Cai, P. Cheng, J. Pan, and L. Shi, “Distributed privacy-preserving data aggregation against dishonest nodes in network systems,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 1462–1470, 2018.
  • [25] ——, “Consensus-based data-privacy preserving data aggregation,” IEEE Transactions on Automatic Control, vol. 64, no. 12, pp. 5222–5229, 2019.
  • [26] J. He, L. Cai, C. Zhao, P. Cheng, and X. Guan, “Privacy-preserving average consensus: privacy analysis and algorithm design,” IEEE Transactions on Signal and Information Processing over Networks, vol. 5, no. 1, pp. 127–138, 2018.
  • [27] Y. Mo and R. M. Murray, “Privacy preserving average consensus,” IEEE Trans. Autom. Control, vol. 62, no. 2, pp. 753–765, 2017.
  • [28] T. Charalambous, N. E. Manitara, and C. N. Hadjicostis, “Privacy-preserving average consensus over digraphs in the presence of time delays,” in 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).   IEEE, 2019, pp. 238–245.
  • [29] N. Gupta, J. Kat, and N. Chopra, “Statistical privacy in distributed average consensus on bounded real inputs,” in 2019 American Control Conference (ACC).   IEEE, 2019, pp. 1836–1841.
  • [30] A. B. Pilet, D. Frey, and F. Taiani, “Robust privacy-preserving gossip averaging,” in International Symposium on Stabilizing, Safety, and Security of Distributed Systems.   Springer, 2019, pp. 38–52.
  • [31] N. E. Manitara and C. N. Hadjicostis, “Privacy-preserving asymptotic average consensus,” in 2013 European Control Conference, 2013, pp. 760–765.
  • [32] X. Duan, J. He, P. Cheng, Y. Mo, and J. Chen, “Privacy preserving maximum consensus,” in 2015 IEEE 54th Annual Conference on Decision and Control (CDC), 2015, pp. 4517–4522.
  • [33] C. Altafini, “A dynamical approach to privacy preserving average consensus,” in 2019 IEEE 58th Conference on Decision and Control (CDC).   IEEE, 2019, pp. 4501–4506.
  • [34] S. Pequito, S. Kar, S. Sundaram, and A. P. Aguiar, “Design of communication networks for distributed computation with privacy guarantees,” in 2014 IEEE 53rd Annual Conference on Decision and Control (CDC), 2014, pp. 1370–1376.
  • [35] I. D. Ridgley, R. A. Freeman, and K. M. Lynch, “Simple, private, and accurate distributed averaging,” in 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).   IEEE, 2019, pp. 446–452.
  • [36] A. Alaeddini, K. Morgansen, and M. Mesbahi, “Adaptive communication networks with privacy guarantees,” in Proceedings of 2017 American Control Conference, 2017, pp. 4460–4465.
  • [37] M. Kishida, “Encrypted average consensus with quantized control law,” in 2018 IEEE Conference on Decision and Control (CDC).   IEEE, 2018, pp. 5850–5856.
  • [38] C. N. Hadjicostis and A. D. Dominguez-Garcia, “Privacy-preserving distributed averaging via homomorphically encrypted ratio consensus,” IEEE Trans. Autom. Control, 2020.
  • [39] W. Fang, M. Zamani, and Z. Chen, “Secure and privacy preserving consensus for second-order systems based on paillier encryption,” arXiv preprint arXiv:1805.01065, 2018.
  • [40] M. Ruan, H. Gao, and Y. Wang, “Secure and privacy-preserving consensus,” IEEE Trans. Autom. Control, vol. 64, no. 10, pp. 4035–4049, 2019.
  • [41] Y. Wang, “Privacy-preserving average consensus via state decomposition,” IEEE Trans. Autom. Control, vol. 64, no. 11, pp. 4711–4716, 2019.
  • [42] A. Nedić and A. Olshevsky, “Distributed optimization over time-varying directed graphs,” IEEE Transactions on Automatic Control, vol. 60, no. 3, pp. 601–615, 2014.
  • [43] A. Nedić, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM J. Optim., vol. 27, no. 4, pp. 2597–2633, 2017.
  • [44] J. Zhu, C. Xu, J. Guan, and D. O. Wu, “Differentially private distributed online algorithms over time-varying directed networks,” IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 1, pp. 4–17, 2018.
  • [45] A. Scott, “Tactical data diodes in industrial automation and control systems,” SANS Institute InfoSec Reading Room, 2015.
  • [46] D. Kempe, A. Dobra, and J. Gehrke, “Gossip-based computation of aggregate information,” in Proc. 44th IEEE Symp. Found. Comput. Sci., 2003, pp. 482–491.
  • [47] F. Bénézit, V. Blondel, P. Thiran, J. Tsitsiklis, and M. Vetterli, “Weighted gossip: Distributed averaging using non-doubly stochastic matrices,” in Proc. IEEE Int. Symp. Inf. Theory, 2010, pp. 1753–1757.
  • [48] E. Seneta, Non-negative Matrices and Markov Chains. Springer, 1973.
  • [49] J. A. Fill, “Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process,” Ann. Appl. Probab., pp. 62–87, 1991.
  • [50] C. N. Hadjicostis, A. D. Domínguez-García, and T. Charalambous, “Distributed averaging and balancing in network systems: with applications to coordination and control,” Foundations and Trends® in Systems and Control, vol. 5, no. 2-3, pp. 99–292, 2018.
  • [51] K. Liu, H. Kargupta, and J. Ryan, “Random projection-based multiplicative data perturbation for privacy preserving distributed data mining,” IEEE Trans. Knowl. Data Eng., vol. 18, no. 1, pp. 92–106, 2006.
  • [52] S. Han, W. K. Ng, L. Wan, and V. C. S. Lee, “Privacy-preserving gradient-descent methods,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 6, pp. 884–899, 2010.
  • [53] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 1, pp. 222–233, 2014.
  • [54] A. Nedić, A. Olshevsky, and W. Shi, “A geometrically convergent method for distributed optimization over time-varying graphs,” in Proc. IEEE 55th Conf. Decis. Control, 2016, pp. 1023–1029.
  • [55] A. Nedić and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Trans. Autom. Control, vol. 54, no. 1, pp. 48–61, 2009.
  • [56] A. Nedić, A. Ozdaglar, and P. A. Parrilo, “Constrained consensus and optimization in multi-agent networks,” IEEE Trans. Autom. Control, vol. 55, no. 4, pp. 922–938, 2010.
  • [57] P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” in International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 1999, pp. 223–238.
  • [58] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Communications of the ACM, vol. 21, no. 2, pp. 120–126, 1978.
  • [59] T. ElGamal, “A public key cryptosystem and a signature scheme based on discrete logarithms,” IEEE Trans. Inf. Theory, vol. 31, no. 4, pp. 469–472, 1985.
Huan Gao was born in Shandong, China. He received the B.S. degree in automation and the M.Sc. degree in control theory and control engineering from Northwestern Polytechnical University, Xi’an, Shaanxi, China, in 2011 and 2015, respectively, and the Ph.D. degree in electrical engineering from Clemson University, Clemson, SC, USA, in 2020. He is currently an Associate Professor with the School of Automation, Northwestern Polytechnical University. His research interests include decentralized optimization, cooperative control, and privacy preservation in distributed systems.
Yongqiang Wang was born in Shandong, China. He received the B.S. degree in electrical engineering and automation and the B.S. degree in computer science and technology from Xi’an Jiaotong University, Xi’an, Shaanxi, China, in 2004, and the M.Sc. and Ph.D. degrees in control science and engineering from Tsinghua University, Beijing, China, in 2009. From 2007 to 2008, he was a Visiting Student with the University of Duisburg-Essen, Germany. He was a Project Scientist with the University of California, Santa Barbara. He is currently an Associate Professor with the Department of Electrical and Computer Engineering, Clemson University. His current research interests include distributed control, optimization, and learning, with an emphasis on privacy protection. He currently serves as an Associate Editor for the IEEE Transactions on Control of Network Systems and the IEEE Transactions on Automatic Control.