A Modified Distributed Optimization Algorithm via Alternating Direction Method of Multipliers
Abstract
The Alternating Direction Method of Multipliers (ADMM) is widely adopted for solving distributed optimization problems (DOPs).
In this paper, a modified ADMM algorithm is proposed for unconstrained distributed optimization problems. Our algorithm allows agents to update their local states and dual variables in a completely distributed and parallel manner by adding extra designed terms to the sequential ADMM of [22].
Moreover, the updating rules and the storage scheme for the variables are illustrated.
It is shown that all agents can reach a consensus by asymptotically converging to the optimal solution.
Besides, the global cost function converges to the optimal value at a rate of $O(1/k)$.
A numerical simulation is given to show the effectiveness of the proposed algorithm.
Keywords: ADMM, distributed optimization, parallel algorithm.
1 Introduction
Optimization in networked multi-agent systems has widespread applications including wireless sensor networks [1], the internet of things [2], distributed privacy preservation [3], distributed Nash equilibrium seeking [4, 5] and machine learning [6]. Such a problem was initially handled in a centralized manner [7, 8, 9], which means that a centralized fusion center is required to collect and process the information of all the agents through an underlying communication network. Clearly, tremendous communication and computation resources need to be consumed to implement centralized optimization methods [10]. For instance, in linear regression problems, it is often prohibitive to gather all the data from different applications of interest [9]. In economic dispatch (ED) problems [11, 12], a centralized manner cannot satisfy the plug-and-play demand.
To remove the mentioned limitations, the distributed optimization problem (DOP) is considered. A common DOP is formulated as follows,
$$\min_{x} \ \sum_{i=1}^{N} f_i(x) \qquad (1)$$
where $x$ is the global variable to be optimized and $f_i$ is the local cost function privately known by agent $i$, while unknown to any other agent $j \neq i$. In each agent, an estimate of the optimal solution, which is also known as the state, is generated by applying an iterative algorithm. Then the states often need to be transmitted among networked agents for the purpose of reaching a consensus as the algorithm runs. Finally, the global optimization problem (1) can be solved through collaboration among different agents.
To solve the DOP, two main classes of distributed optimization algorithms have been proposed, namely subgradient-based methods and dual-based methods. In the former class, a subgradient computation needs to be conducted in each iteration [13]. Each agent updates its state with the local subgradient information and states collected from other agents. The subgradient-based methods have been developed for various scenarios [14, 15, 16, 17, 18, 19]. In the dual-based methods, a Lagrangian-related function needs to be constructed in the update steps. The Alternating Direction Method of Multipliers (ADMM) is a well-known dual-based method, of which the early works can be traced back to [20, 21, 35]. By treating the states as primal variables, ADMM minimizes an augmented Lagrangian function and updates both primal and dual variables alternately. In recent years, ADMM has become a popular way to solve DOPs in various areas. Compared with the subgradient-based methods, ADMM-based methods can achieve a faster convergence rate [7, 25] and improved robustness against Gaussian noise [37].
It is worth noting that for some early works on ADMM-based algorithms developed to solve optimization problems in multi-agent systems, the utilization of centralized fusion units cannot be avoided [7]. In [23], an ADMM-based optimization algorithm named Jacobi-Proximal ADMM is proposed. It adds a proximal term for each primal variable and a damping parameter for the update step of the dual variable. Though the method shows satisfactory performance, the dual variables need to be handled in a centralized manner. To avoid the need for centralized data processing, some distributed ADMM-based algorithms have been proposed. A sequential ADMM is proposed in [22], in which each agent updates its states by following a predefined order. Both primal and dual variables are computed in a distributed manner. However, the predefined updating order may cost more computational resources, which makes it not well suited for large-scale systems. A novel algorithm named parallel ADMM is proposed in [8]. It duplicates the dual variables for each edge and doubles the dimension of the edge-node incidence matrix to enable parallelism, hence the algorithm may need extra storage space.
In this paper, the sequential ADMM algorithm in [22] is modified by adding extra terms to the primal variable updating rule. The primal variables from different iterations are then adopted to update the dual variable of each agent. The main features of the newly presented algorithm are summarized as follows:
1) The presented algorithm is fully distributed, in the sense that a central unit for data collection and processing is not needed for solving the optimization problem. The update of both primal and dual variables for each agent only depends on its local and neighboring information.
2) The algorithm is suitable for parallel computing. By modifying the existing algorithm in [22], each agent updates its primal variable in a parallel manner rather than in a preset sequential order.
The remainder of our paper is organized as follows. In Section II, the considered unconstrained DOP is presented and the sequential ADMM algorithm in [22] is reviewed. In Section III, the modified ADMM for solving Problem (1) in a distributed and parallel manner is developed. In Section IV, numerical simulation results are provided to validate the effectiveness of the proposed method, followed by the conclusion drawn in Section V.
Notation: For a column vector $x$, $x^{\top}$ denotes the transpose of $x$. $\|x\|$ denotes the standard Euclidean norm, $\|x\| = \sqrt{x^{\top}x}$. For a matrix $A$, $A^{\top}$ denotes the transpose of $A$, $[A]_{ij}$ denotes the entry located on the $i$th row and $j$th column of $A$, and $[A]_{:,j}$ and $[A]_{i,:}$ denote the $j$th column and $i$th row of $A$, respectively. $\mathbf{0}$ represents a matrix of all zeros. $\partial f(x)$ represents the set of all subgradients of the function $f$ at $x$. For a nonempty and finite set $S$, $|S|$ denotes the cardinality of the set.
2 Problem Statement
In this paper, we shall revise the distributed ADMM algorithm in [22], such that it can be implemented in a parallel manner. Before we present the details of our proposed algorithm, some necessary preliminaries are provided in this section.
2.1 Problem Statement
In this paper, an unconstrained distributed optimization problem, as shown in (1), will be considered. To be more specific, $N$ agents indexed by $i \in \{1,\dots,N\}$ aim to solve the optimization problem collaboratively under a fixed undirected graph $G = (V, E)$, where $V$ and $E$ denote the set of nodes and the set of edges connecting two distinct agents, respectively. Suppose that there is at most one edge $e_{ij}$ connecting agents $i$ and $j$; we prescribe that the first subscript and the second subscript satisfy $i < j$. For example, if agent 1 and agent 2 are connected, then $e_{12}$ is the sole edge between them.
Let us recall the basic idea of solving a distributed optimization problem [7, 13, 22]
$$\min_{x} \ \sum_{i=1}^{N} f_i(x) \qquad (2)$$
where $x$ is the global variable to be optimized and $f_i$ is the local cost function stored in agent $i$.
The goal of Problem (2) is to find the minimizer $x^{\ast}$, such that the function $\sum_{i=1}^{N} f_i(x)$ is minimized at $x = x^{\ast}$. To develop a distributed algorithm for solving Problem (2), this problem often has to be reformulated to accommodate the distributed setting. Normally, each agent $i$ holds an estimate of $x^{\ast}$, denoted by $x_i$. As the optimization algorithm proceeds, each agent $i$, for $i \in \{1,\dots,N\}$, utilizes its local estimate and the information received from its neighbors up to the current iteration to update its local estimate for the next iteration. After sufficiently many iterations, the estimates of all agents tend to reach a consensual value. The form of Problem (2) can thus be changed as follows,
$$\min_{x_1,\dots,x_N} \ \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \ x_i = x_j, \ \forall e_{ij} \in E \qquad (3)$$
The linear constraint enforces the consensus mentioned above; Problem (3) and Problem (2) have the same solution.
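As a quick sanity check of this equivalence, consider hypothetical quadratic local costs $f_i(x) = (x - a_i)^2$ (illustrative only, not the paper's cost functions). The consensus-constrained reformulation recovers the same minimizer as the original single-variable problem:

```python
import numpy as np

# Hypothetical local costs f_i(x) = (x - a_i)^2, for illustration only.
a = np.array([1.0, 2.0, 3.0, 4.0])

# Problem (2): minimize sum_i f_i(x) over one global variable x.
# For these quadratics the minimizer is the average of the a_i.
x_star = a.mean()

# Problem (3): one copy x_i per agent, constrained to agree on every edge.
# Under the consensus constraint the copies collapse to a single value, so a
# brute-force search over consensual values recovers the same minimizer.
grid = np.linspace(0.0, 5.0, 50001)
objective = ((grid[None, :] - a[:, None]) ** 2).sum(axis=0)
x_consensus = grid[objective.argmin()]

assert abs(x_consensus - x_star) < 1e-3
print(x_star)  # 2.5
```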
2.2 Distributed Sequential ADMM Algorithm [22]
To solve Problem (3), a distributed sequential ADMM algorithm is presented in [22]. Firstly, the updating rules of agent $i$ in the $k$th iteration are developed as follows,
(4)
(5)
where $\rho$ is a positive penalty parameter, $\lambda_{ij}$ denotes the dual variable associated with the constraint on edge $e_{ij}$, and $N_i$ is the set of neighbors of agent $i$. Furthermore, the subsets of neighbors $P_i = \{j \in N_i : j < i\}$ and $S_i = \{j \in N_i : j > i\}$ are called the predecessors and successors of agent $i$, respectively. Take the sample graph of 4 agents in Fig. 1 as an example.
The detailed procedure of the distributed sequential ADMM algorithm developed in [22] is provided below.
Remark 1
The distributed sequential ADMM algorithm in [22] is developed based on the standard ADMM [7]. The distributed feature of the algorithm can be observed from the updating rules (4)-(5). More specifically, only locally available information received from its neighbors is needed to update both the primal and the dual variables of each agent. In other words, a central node is not necessary for data collection and processing to implement the algorithm.
Nevertheless, the updating process in (4) is calculated in a sequential order from agent to agent. Each agent needs to collect its predecessors' information obtained in the current iteration to update its own local state in that iteration. It implies that the updating processes for the entire group must be operated from the first agent to the last in a sequential order, as depicted in Fig. 2. Hence, the sequential ADMM algorithm may not be suitable for solving distributed optimization problems in large-scale multi-agent systems, since the total waiting time increases to an unacceptable level when the number of agents is large.
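The waiting-time argument can be made concrete with a small scheduling sketch (hypothetical unit-time updates, not a model of any specific hardware): because agent $i$ must wait for all its predecessors within every iteration, the sequential scheme needs $N$ rounds per iteration while a parallel scheme needs one.

```python
def rounds_needed(num_agents: int, num_iterations: int, parallel: bool) -> int:
    """Count unit-time rounds when each agent's update takes one round.

    Sequential: within every iteration, agent i waits for agents 1..i-1,
    so one iteration costs num_agents rounds.
    Parallel: all agents update simultaneously, one round per iteration.
    """
    rounds = 0
    for _ in range(num_iterations):
        rounds += 1 if parallel else num_agents
    return rounds

# For 100 agents and 50 iterations the gap is a factor of N = 100.
assert rounds_needed(100, 50, parallel=False) == 5000
assert rounds_needed(100, 50, parallel=True) == 50
```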

3 A Modified Distributed Parallel ADMM algorithm
To remove the limitation of distributed sequential ADMM algorithm [22] as summarized in Section II, a new distributed ADMM algorithm is presented in this section. We first present the updating rules of each agent in the th iteration:
(6)
(7)
where $\rho$ is a positive penalty parameter and the additional coefficient is a constant restricted to a prescribed range. The predecessor and successor sets $P_i$ and $S_i$ are defined the same as in Section II. $\lambda_{ij}$ denotes the dual variable associated with the constraint on edge $e_{ij}$. Notice that, according to the subscripts, the dual variables are stored in different agents in the two algorithms. The detailed procedure of the newly presented distributed ADMM algorithm is summarized below.
Remark 2
Note that the definitions of the predecessors and successors in the distributed sequential ADMM algorithm of [22] and in our newly proposed algorithm are mathematically the same.
The proposed algorithm changes the sequential updating manner to a parallel one. As observed from (6), no terms related to the current iteration of other agents are needed to update the local state. Hence all the agents can update both primal and dual variables simultaneously, rather than waiting for their predecessors to complete their updating procedures. To ease understanding of the major difference between Algorithms 1 and 2, the parallel updating manner for the considered group of 4 agents is plotted in Fig. 3.
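Since the exact updates (6)-(7) are not reproduced above, the parallel idea can be illustrated with the global-variable consensus ADMM of [7] (a stand-in for, not the same as, the proposed algorithm): every agent's primal and dual update within an iteration uses only its own data, so all updates can run simultaneously. Hypothetical quadratic costs $f_i(x) = \tfrac{1}{2}(x - a_i)^2$ are assumed:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical data, f_i(x) = 0.5*(x - a_i)**2
rho = 1.0                             # penalty parameter
N = len(a)

x = np.zeros(N)   # primal variables, one per agent
u = np.zeros(N)   # scaled dual variables, one per agent
z = 0.0           # consensus variable

for _ in range(200):
    # x-update: each agent solves argmin_x f_i(x) + (rho/2)(x - z + u_i)^2.
    # For quadratic f_i this has the closed form below; it uses only the
    # agent's own data, so all N updates can run in parallel.
    x = (a + rho * (z - u)) / (1.0 + rho)
    z = np.mean(x + u)                # consensus (averaging) step
    u = u + x - z                     # dual update, also fully parallel

assert np.allclose(x, a.mean(), atol=1e-6)
print(x)  # all entries approach 2.5
```

With strongly convex quadratics the iterates contract geometrically, so 200 iterations is far more than enough for the agents to agree on the minimizer of the summed cost.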
4 Convergence Analysis
In this section, we shall firstly show that all the agents’ states will converge to a unique optimal solution as the number of iterations approaches infinity. Then the convergence rate will be analyzed.
Denote the global cost function by $F(\mathbf{x}) = \sum_{i=1}^{N} f_i(x_i)$, where $\mathbf{x} = [x_1, \dots, x_N]^{\top}$, and the optimal solution vector by $\mathbf{x}^{\ast}$.
Let the vector $\lambda$ be the Lagrange multiplier vector. For clarity, $\lambda$ is stacked by the dual variables $\lambda_{ij}$ with subscript pairs arranged in ascending lexicographical order. Next, an edge-node incidence matrix $A$ as defined in [22] is reintroduced to describe the existence of edges between two distinct agents for a given graph. Suppose that edge $e_{ij}$ is the $q$th element of the ordered edge set; it corresponds to the $q$th row of $A$ (denoted by $[A]_{q,:}$). Note that the elements in $[A]_{q,:}$ satisfy $[A]_{qi} = 1$ and $[A]_{qj} = -1$, and all other entries are $0$. For example, with the edges ordered as above, the edge-node incidence matrix of Fig. 1 is shown as follows,
(8)
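The construction of $A$ can be sketched for a hypothetical 4-agent graph (the actual edge set of Fig. 1 is not reproduced here): each edge $e_{ij}$ with $i < j$ contributes one row with $+1$ in column $i$ and $-1$ in column $j$, so that $A\mathbf{x} = \mathbf{0}$ encodes $x_i = x_j$ on every edge.

```python
import numpy as np

def incidence_matrix(num_nodes, edges):
    """Edge-node incidence matrix: row q has +1 at i and -1 at j for edge (i, j).

    Edges are 1-indexed pairs sorted lexicographically, matching the ordering
    used to stack the dual variables.
    """
    A = np.zeros((len(edges), num_nodes))
    for q, (i, j) in enumerate(sorted(edges)):
        assert i < j, "subscripts are prescribed to satisfy i < j"
        A[q, i - 1] = 1.0
        A[q, j - 1] = -1.0
    return A

# Hypothetical edge set for a 4-agent graph (illustration only).
A = incidence_matrix(4, [(1, 2), (2, 3), (2, 4), (3, 4)])

# A x = 0 holds exactly when all entries of x agree (connected graph).
x_consensus = np.full(4, 7.0)
assert np.allclose(A @ x_consensus, 0.0)
# Every row sums to zero by construction.
assert np.allclose(A.sum(axis=1), 0.0)
```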
Now Problem (3) can be rewritten in the following compact form,
$$\min_{\mathbf{x}} \ F(\mathbf{x}) \quad \text{s.t.} \ A\mathbf{x} = \mathbf{0} \qquad (9)$$
Next, two auxiliary matrices are introduced for further illustration.
Denote the Lagrangian function of Problem (1) by $L(\mathbf{x}, \lambda)$; next, some typical assumptions are imposed.
Assumption 1: Each cost function $f_i$ is closed, proper and convex.
The next assumption states the existence of the saddle point.
Assumption 2: The Lagrangian function has a saddle point, i.e., there exists a solution-multiplier pair $(\mathbf{x}^{\ast}, \lambda^{\ast})$ with
(10)
The following lemma shows that Problem (3) can be transformed into a variational inequality, based on which we can complete the proof of optimality; this step is not presented in [22]. The details are demonstrated in this section.
Before establishing the main theorem, we first present three important lemmas, beginning with the following preliminary result.
Lemma 1 [23]: Problem (3) is equivalent to the following variational inequality,
(11)
where the operator and the constraint set are defined accordingly, and the starred point is the solution of the variational inequality (11).
The classification of neighbors in Section II and Section III serves the convergence proof of our algorithm and, without loss of generality, has a special property, shown in the next lemma. This property plays a crucial role in the convergence analysis. The proof of this lemma is provided in the Appendix.
Lemma 2: In an undirected graph with $N$ nodes, the following equation holds:
(12)
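The exact statement (12) is not reproduced above, but one counting identity of this kind (each edge $e_{ij}$ with $i < j$ contributes agent $i$ to $P_j$ and agent $j$ to $S_i$, so the predecessor and successor counts both sum to $|E|$) can be checked numerically; the graph below is hypothetical:

```python
import itertools, random

random.seed(0)
N = 8
# A random undirected graph on N nodes (hypothetical, for illustration).
edges = {(i, j) for i, j in itertools.combinations(range(1, N + 1), 2)
         if random.random() < 0.4}

# Predecessors/successors induced by the index ordering on neighbors.
P = {i: {j for j in range(1, N + 1) if (j, i) in edges} for i in range(1, N + 1)}
S = {i: {j for j in range(1, N + 1) if (i, j) in edges} for i in range(1, N + 1)}

# Each edge yields exactly one predecessor relation and one successor relation.
assert sum(len(P[i]) for i in P) == len(edges)
assert sum(len(S[i]) for i in S) == len(edges)
```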
Next a basic lemma for convergence analysis is established.
Lemma 3: The following inequality holds at every iteration:
(13)
Proof: We define as follows,
(14)
Then the gradient of is
(15)
where the equation holds by utilizing (7) and the relation above. Then, denoting the combined quantity accordingly, the stated identities hold, and the subgradient relation gives the following inequality:
(16)
By substituting with (4) we obtain
(17)
Note that the following relations hold:
(18)
(19)
(20)
(21)
where relations (19) and (20) hold due to the matrix relation .
The following theorem illustrates the main result in our algorithm that all agents’ states converge to the optimal point.
Theorem 1: The state generated by Algorithm 2 in each agent converges to the optimal solution of Problem (1), and the global cost function converges to the optimal value, i.e.,
$$\lim_{k\to\infty} x_i^{k} = x^{\ast}, \quad \forall i \in \{1,\dots,N\} \qquad (22)$$
$$\lim_{k\to\infty} F(\mathbf{x}^{k}) = F^{\ast} \qquad (23)$$
where $x^{\ast}$ and $F^{\ast}$ denote the optimal solution and the optimal value, respectively.
Proof: According to the relation derived from (7) and the optimality of the starred solution, the following relations hold:
(24)
(25)
(26)
(27)
Then the following relation holds:
(28)
Recalling (4), substituting the appropriate point and rearranging the terms, the following inequality is established:
(29)
where in the second inequality we use the saddle-point relation of the Lagrangian function.
Then rearranging and summing the preceding relation over the iterations yields
(31)
Inequality (4) implies the sequence is bounded, because the right-hand side is bounded and the left-hand side of (4) is nonnegative. Thus the following equation holds for any arbitrary constant.
(32)
Equation (4) implies that the states of the agents converge to the same point as the number of iterations goes to infinity. We now show that this point is the optimal point of the mentioned optimization Problem (1).
It is trivial to obtain that and from and . By summing up (4) over , for , and , we obtain that
(33)
Notice that the optimal solution vector satisfies , which implies the optimal solution is the consensual value for all , i.e., . For relation (4), now we let , then denote and . Then for arbitrary , inequality (4) can be written as
(34)
By applying Lemma 2, (4) can be simplified as
(35)
Relation (35) indicates that, for sufficiently large iteration counts, the limit vector is the solution to (11). Recall that Lemma 1 implies the solution of the preceding variational inequality is equivalent to the solution of our primary optimization problem. Using this lemma and the foregoing arguments, the proof of Theorem 1 is complete.
Remark 3
The term with the tunable coefficient in (6) is the standard form of our algorithm. It can also be proved that without that coefficient the algorithm still satisfies Theorem 1, but it shows worse simulation performance.
The next theorem reveals the convergence rate of our algorithm, which matches the rate reported in [22, 10].
Theorem 2: Algorithm 2 converges at a rate of $O(1/k)$.
Denote , summing up from to we have
(37)
Let . Since is convex, we have . Then we obtain
(38)
Finally, by rearranging (4), we obtain
(39)
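For reference, the standard shape of such an ergodic $O(1/K)$ bound (as established for the sequential scheme in [22]; the constant below is schematic, not the paper's exact expression) reads:

```latex
\bar{\mathbf{x}}^{K} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{x}^{k}, \qquad
F(\bar{\mathbf{x}}^{K}) - F^{\ast} \;\le\; \frac{C\!\left(\mathbf{x}^{0}, \lambda^{0}, \rho\right)}{K},
```

where, by convexity of $F$ (Jensen's inequality) and telescoping the per-iteration inequality, the constant $C$ collects the finite initial primal and dual distances to a saddle point; hence the objective error at the averaged iterate decays at rate $O(1/K)$.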
5 Simulation
5.1 Experiment Setup
A distributed sensing problem is considered with a network of 9 nodes shown in Fig. 4. For an unknown signal $x$ (one-dimensional for convenience), each agent $i$ measures $x$ via $y_i = M_i x + e_i$, where $M_i$ and $e_i$ are data following the standard Gaussian distribution. Then a consensus least-squares problem is to be solved as follows,
$$\min_{\{x_i\}} \ \sum_{i=1}^{9} \left(y_i - M_i x_i\right)^2 \quad \text{s.t.} \ x_i = x_j, \ \forall e_{ij} \in E \qquad (40)$$
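The measurement model can be sketched as follows (a hypothetical instantiation assuming the scalar model $y_i = M_i x + e_i$ suggested by the setup; the paper's actual data are not reproduced), together with the centralized least-squares baseline that the distributed algorithms should approach:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 9                                   # network of 9 sensing nodes
x_true = rng.standard_normal()          # unknown one-dimensional signal

M = rng.standard_normal(N)              # per-agent measurement coefficients
e = rng.standard_normal(N)              # standard Gaussian measurement noise
y = M * x_true + e                      # local measurements y_i = M_i x + e_i

# Centralized least squares: minimize sum_i (y_i - M_i x)^2 over scalar x.
# Closed form: x_hat = (sum_i M_i y_i) / (sum_i M_i^2).
x_hat = (M @ y) / (M @ M)

# Sanity check against the generic solver.
x_lstsq, *_ = np.linalg.lstsq(M.reshape(-1, 1), y, rcond=None)
assert np.isclose(x_hat, x_lstsq[0])
```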
We record the residual at every iteration. Both primal and dual variables are initialized to zero. The penalty parameter is chosen to be 1. For comparison, some classical distributed optimization algorithms are also considered, such as JADMM in [23] and DSM in [13]. The former is another distributed ADMM-based algorithm with a parallel manner, and the latter is one of the best-known subgradient-based algorithms.
5.2 JADMM algorithm
We reformulate the typical form of JADMM in [23] as follows,
(41)
and we let the penalty parameter match the aforementioned notation.
5.3 DSM algorithm
A well-known algorithm named distributed subgradient method (DSM)[13] is shown as follows,
$$x_i^{k+1} = \sum_{j=1}^{N} w_{ij}\, x_j^{k} - \alpha_k\, g_i^{k} \qquad (42)$$
where the coefficients $w_{ij}$ before the states denote the weights, $\alpha_k$ is the diminishing stepsize used in the simulation, and $g_i^{k}$ is a subgradient of $f_i$ at $x_i^{k}$.
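A minimal sketch of rule (42) on hypothetical quadratic costs $f_i(x) = \tfrac{1}{2}(x - a_i)^2$ over a 4-node path graph, with Metropolis-style doubly stochastic weights and stepsize $\alpha_k = 1/k$ (both choices are illustrative assumptions, not the paper's simulation setup):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])      # hypothetical local data; optimum is mean(a)

# Doubly stochastic weight matrix for the path graph 1-2-3-4 (Metropolis weights).
W = np.array([[2/3, 1/3, 0.0, 0.0],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [0.0, 0.0, 1/3, 2/3]])

x = np.zeros(4)
for k in range(1, 5001):
    grad = x - a                         # subgradient of f_i at x_i (here a gradient)
    x = W @ x - (1.0 / k) * grad         # DSM update (42): mix with neighbors, then step

# Subgradient methods with diminishing stepsizes converge, but slowly,
# which is the behavior the comparison in the paper highlights.
assert np.allclose(x, a.mean(), atol=0.05)
```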
5.4 Simulation Performance
All the simulations are run in MATLAB R2018b with the convex optimization toolbox (CVX) developed at Stanford. The experimental results are depicted in Figs. 5-8.
Fig. 5 depicts that the original Problem (2) converges to the optimal value under Algorithm 2. The state updating records are given in Fig. 6, which shows that the states reach a consensual optimal value. Fig. 7 shows that our proposed algorithm achieves good performance and is the fastest among the compared algorithms. Fig. 8 illustrates the effect of varying the adjustable parameter and shows that a higher value may cause worse performance.
6 Conclusion
In this paper, we improve the sequential distributed ADMM algorithm in [22] and propose a new ADMM algorithm that makes distributed ADMM parallel, without adding more parameters or extra procedures, and we fully prove optimality both for the global cost function and for the variables stored in the agents. We also give a numerical example to verify the feasibility and effectiveness of our results. Our results show performance similar to JPADMM in the given example. Future work includes adding a privacy-preservation property to the proposed method and considering more general scenarios, such as constrained distributed optimization problems in smart grids.
7 APPENDIX
7.1 Proof of Lemma 2
Proof: We prove this lemma by mathematical induction. For the base case, it is easy to verify that the relation is true. Assume relation (12) holds for the case of $N$ nodes. Then, for $N+1$ nodes, we add one node with index $N+1$ to the graph. Without loss of generality, suppose that certain agents are the neighbors of agent $N+1$. In the meantime, the successor count increases by 1 for each agent belonging to the neighbor set of agent $N+1$, i.e., the following relations hold:
(43)
It is implied by (43) that relation (12) is true for the case of $N+1$ nodes. The proof of this lemma is complete.
References
- [1] Y. Zhang, Y. Lou, Y. Hong, and L. Xie, “Distributed projection-based algorithms for source localization in wireless sensor networks,” IEEE Trans. Wirel. Commun., vol. 15, no. 6, Jun. 2015.
- [2] H. Wu, and W. Wang, “A game theory based collaborative security detection method for internet of things systems,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 6, pp. 1432-1445, Jun. 2018.
- [3] C. Zhang, M. Ahmad, and Y. Wang, “ADMM based privacy-preserving decentralized optimization,” IEEE Trans. Inf. Forensics Security, vol. 14, no. 3, pp. 565-580, Mar. 2019.
- [4] F. Salehisadaghiani, W. Shi, L. Pavel, “Distributed Nash equilibrium seeking under partial-decision information via the alternating direction method of multipliers,” Automatica, vol. 103, pp. 27-35, 2019.
- [5] F. Salehisadaghiani, L. Pavel, “Distributed Nash equilibrium seeking: A gossip-based algorithm,” Automatica, vol. 72, pp. 209–216, 2016.
- [6] C. Cortes, and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
- [7] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
- [8] J. Yan et al., “Parallel alternating direction method of multipliers,” Inf. Sci., vol. 507, pp. 185–196, 2020.
- [9] J. A. Bazerque, G. Mateos, G. B. Giannakis, “Distributed lasso for in-network linear regression,” in Proc. IEEE ICASSP, Dallas, TX, USA, 2010, pp. 2978-2981.
- [10] T. Yang et al., “A survey of distributed optimization,” Annu. Rev. Control, vol. 47, pp. 278–305, 2019.
- [11] F. Guo et al., “Hierarchical decentralized optimization architecture for economic dispatch: A New Approach for Large-Scale Power System” IEEE Trans. Ind. Inform., vol. 14, no. 2, pp. 523-534, Feb. 2018.
- [12] F. Guo, C. Wen, J. Mao, G. Li, and Y. -D. Song, “A distributed hierarchical algorithm for multi-cluster constrained optimization,” Automatica, vol. 77, pp. 230–238, 2017.
- [13] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Trans. Automat. Control, vol. 54, no. 1, pp. 48–61, Jan. 2009.
- [14] A. Nedic, A. Ozdaglar, and P. A. Parrilo, “Constrained consensus and optimization in multi-agent networks,” IEEE Trans. Automat. Control, vol. 55, no. 4, pp. 922–938, Apr. 2010.
- [15] J. Xu, S. Zhu, Y. C. Soh, and L. Xie, “Convergence of asynchronous distributed gradient methods over stochastic networks,” IEEE Trans. Automat. Control, vol. 63, no. 2, pp.434-448 Feb. 2018.
- [16] J. C. Duchi, A. Agarwal, M. J. Wainwright, “Dual averaging for distributed optimization: convergence analysis and network scaling,” IEEE Trans. Automat. Control, vol. 57, no. 3, pp.592-606, Mar. 2012.
- [17] C. Xi and U. A. Khan, “DEXTRA: A fast algorithm for optimization over directed graphs,” IEEE Trans. Automat. Control, vol. 62, no. 10, pp. 4980–4993, Feb. 2017.
- [18] W. Shi, Q. Ling, G. Wu, and W. Yin, “Extra: An exact first-order algorithm for decentralized consensus optimization,” SIAM J. Optim., vol. 25, no. 2, pp. 944–966, Nov. 2015.
- [19] S. Pu, W. Shi, J. Xu, and A. Nedic, “Push-pull gradient methods for distributed optimization in networks,” IEEE Trans. Automat. Control, vol. 66, no. 1, pp. 1-16, Jan. 2021.
- [20] H. H. Bauschke, and J. M. Borwein, “Dykstra’s alternating projection algorithm for two sets,” J. Approx. Theory, vol. 79, no. 3, pp. 418– 443, 1994.
- [21] P. L. Combettes, and J. C. Pesquet, “A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery,” IEEE J. Sel. Top. Signal Process, vol. 1, no. 4, pp. 564–574, 2007.
- [22] E. Wei and A. Ozdaglar, “Distributed alternating direction method of multipliers,” in Proc. 51st IEEE Conf. Decis. Control, Dec. 2012, pp. 5445–5450.
- [23] W. Deng, M. Lai, Z. Peng, and W. Yin, “Parallel multi-block ADMM with convergence,” J. Sci. Comput., vol. 71, no. 2, pp. 712–736, 2017.
- [25] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, “On the linear convergence of the ADMM in decentralized consensus optimization,” IEEE Trans. Signal Process., vol. 62, no. 7, pp. 1750–1761, Apr. 2014.
- [26] R. A. Horn, C. R. Johnson, “Positive and nonnegative matrices,” in Matrix Analysis, 2nd ed. Cambridge University Press, Cambridge, UK, 1985, ch. 8, pp. 517-553.
- [29] V. S. Mai and E. H. Abed, “Distributed optimization over weighted directed graphs using row stochastic matrix,” in American Control Conference, pp. 7165–7170, 2016.
- [30] Y. Wang, W. Yin, J. Zeng, “Global convergence of ADMM in nonconvex nonsmooth optimization,” J. Sci. Comput., pp. 1–35, 2015.
- [31] E. Wei and A. Ozdaglar, “On the convergence of asynchronous distributed alternating direction method of multipliers.” in Proc. IEEE CDC, Maui, HI, USA, Dec. 10-13, 2012, pp. 5445–5450.
- [32] A. Nedic and A. Olshevsky, “Distributed optimization over time-varying directed graphs,” IEEE Trans. Automat. Control, vol. 60, no. 3, pp. 601–615, Mar. 2015.
- [33] A. Nedic and A. Olshevsky, “Stochastic gradient-push for strongly convex functions on time-varying directed graphs,” IEEE Trans. Automat. Control, vol. 61, no. 12, pp. 3936–3947, Dec. 2016.
- [34] C. Xi, V. S. Mai, R. Xin, E. H. Abed, and U. A. Khan, “Linear convergence in optimization over directed graphs with row-stochastic matrices,” IEEE Trans. Automat. Control, vol. 63, no. 10, pp. 3558–3565, Oct. 2018.
- [35] R. Zhang and J. T. Kwok, “Asynchronous distributed ADMM for consensus optimization,” in Proc. Int. Conf. Mach. Learn., Jun. 2014, pp. 1701–1709.
- [36] B. S. He, H. K. Xu, and X. M. Yuan, “On the proximal jacobian decomposition of ALM for multiple-block separable convex minimization problems and its relationship to ADMM,” J. Sci. Comput., vol. 66, no. 3, pp. 1204–1217, 2016.
- [37] A. Simonetto, G. Leus, “Distributed maximum likelihood sensor network localization,” IEEE Trans. Signal Process., vol. 62, no. 6, pp. 1424–1437, Mar. 2014.