
Overlapping Schwarz Decomposition for Constrained Quadratic Programs

Sungho Shin, Mihai Anitescu, Victor M. Zavala. S. Shin is with the Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706 USA (e-mail: [email protected]). M. Anitescu is with the Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL 60439, USA, and also with the Department of Statistics, University of Chicago, Chicago, IL 60637, USA (e-mail: [email protected]). V. M. Zavala is with the Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706 USA, and with the Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL 60439, USA (e-mail: [email protected]).
Abstract

We present an overlapping Schwarz decomposition algorithm for constrained quadratic programs (QPs). Schwarz algorithms have been traditionally used to solve linear algebra systems arising from partial differential equations, but we have recently shown that they are also effective at solving structured optimization problems. In the proposed scheme, we consider QPs whose algebraic structure can be represented by graphs. The graph domain is partitioned into overlapping subdomains (yielding a set of coupled subproblems), solutions for the subproblems are computed in parallel, and convergence is enforced by updating primal-dual information in the overlapping regions. We show that convergence is guaranteed if the overlap is sufficiently large and that the convergence rate improves exponentially with the size of the overlap. Convergence results rely on a key property of graph-structured problems that is known as exponential decay of sensitivity. Here, we establish conditions under which this property holds for constrained QPs (as those found in network optimization and optimal control), thus extending existing work that addresses unconstrained QPs. The numerical behavior of the Schwarz scheme is demonstrated by using a DC optimal power flow problem defined over a network with 9,241 nodes.

I Introduction

Structured optimization problems arise in many applications such as trajectory planning and model predictive control [1], multistage stochastic programming [2, 3], optimization with embedded partial differential equations (PDEs) [4], and in optimization of energy infrastructures [5, 6, 7]. Decomposition schemes that exploit the problem structure have been used to overcome scalability limits of centralized schemes [8, 9, 10, 11, 12]. A wide range of decomposition schemes have been proposed in the literature such as Lagrangian decomposition [13], the alternating direction method of multipliers (ADMM) [14], and Jacobi/Gauss-Seidel methods [15]. The basic tenet behind such algorithms is to decompose the original problem into subproblems and to coordinate subproblem solutions by using primal-dual information. These decomposition schemes are different from waterfilling methods [16] in that the constraints are not necessarily satisfied at each iteration within the algorithm. Moreover, these schemes are advantageous in that they can handle a wide variety of objective/constraint formulations and in that they have well-understood convergence properties. However, a disadvantage of these schemes is that convergence can be rather slow [17].

Overlapping Schwarz decomposition schemes have been recently used to solve structured optimization problems, and they have demonstrated to outperform popular schemes such as ADMM and Jacobi/Gauss-Seidel in certain problem classes [18]. Schwarz algorithms were originally developed for the parallel solution of linear algebra systems arising in PDEs, but such schemes can also be used to handle general linear systems and optimization problems by exploiting their underlying algebraic topology [19]. As the name suggests, overlapping Schwarz algorithms decompose the full problem (the underlying graph) into subproblems that are defined over overlapping subdomains. Solutions for the subproblems are computed in parallel and convergence is enforced by updating primal-dual information in the overlapping regions. In the context of quadratic programs (QP) that are unconstrained and convex, we have shown that the convergence rate of Schwarz algorithms improves exponentially with the size of the overlapping region [18]. Overlapping Schwarz schemes provide a bridge between fully decentralized Jacobi/Gauss-Seidel algorithms (no overlap) and centralized algorithms (the overlap is the entire domain).

This paper presents an overlapping Schwarz algorithm for constrained QPs. We analyze the convergence properties of the algorithm and derive an explicit relationship between its convergence rate and the size of the overlap. This result extends existing convergence results for unconstrained QPs [18] to equality- and inequality-constrained QPs. In particular, we show that the algorithm converges with sufficiently large overlap and that the convergence rate improves exponentially with the size of overlap. This convergence result relies on a property called exponential decay of sensitivity [20, 21, 22, 23, 18]. The property states that the sensitivity of the primal-dual solution at a given node decays exponentially with respect to the distance from the perturbation. Such a property has been established for discrete-time [20, 21] and continuous-time [22, 23] optimal control problems (the graph is a line) and for unconstrained QPs (general graph) [18]. This paper establishes the exponential sensitivity property for constrained QPs over general graphs, thus making the results broadly applicable.

The paper is organized as follows. In the remainder of this section we introduce basic notation and the problem under study. In Section II we introduce the overlapping Schwarz algorithm. In Section III we present the main theoretical results. We first analyze the sensitivity of the solution of structured QPs against parametric perturbations and then use the results to establish convergence conditions for the algorithm. Numerical results are given in Section IV.

Notation. The set of real numbers and the set of integers are denoted by $\mathbb{R}$ and $\mathbb{I}$, respectively, and we define $\mathbb{I}_{a:b}:=\mathbb{I}\cap[a,b]$, $\mathbb{I}_{>0}:=\mathbb{I}\cap(0,\infty)$, $\mathbb{R}_{>0}:=(0,\infty)$, and $\overline{\mathbb{R}}:=\mathbb{R}\cup\{\infty\}$. By default, vectors are assumed to be column vectors, and we use the syntax $(M_{1},\cdots,M_{n}):=[M_{1}^{\top}\,\cdots\,M_{n}^{\top}]^{\top}$, $\{M_{i}\}_{i\in\mathcal{I}}:=(M_{i_{1}},\cdots,M_{i_{m}})$, and $\{M_{i,j}\}_{i\in\mathcal{I},j\in\mathcal{J}}:=\{\{M_{i,j}^{\top}\}_{j\in\mathcal{J}}^{\top}\}_{i\in\mathcal{I}}$, where $\mathcal{I}=\{i_{1}<\cdots<i_{m}\}$ and $\mathcal{J}=\{j_{1}<\cdots<j_{n}\}$. Furthermore, $v[i]$ is the $i$th component of $v$, $M[i,j]$ is the $(i,j)$th component of $M$, $v[\mathcal{I}]:=\{v[i]\}_{i\in\mathcal{I}}$, and $M[\mathcal{I},\mathcal{J}]:=\{M[i,j]\}_{i\in\mathcal{I},j\in\mathcal{J}}$. Vector 2-norms and induced 2-norms are denoted by $\|\cdot\|$. For matrices $A$ and $B$, $A\succeq B$ indicates that $A-B$ is positive semi-definite, while $A\geq B$ represents a componentwise inequality.

Setting. We consider a (potentially infinite) parent graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E}$ is the set of edges. We also consider a finite node subset $U\subseteq\mathcal{V}$ and the following QP:

\[
\min_{\{x_{i}\}_{i\in U}}\;\sum_{i\in U}\sum_{j\in N_{U}[i]}\frac{1}{2}x_{i}^{\top}Q_{i,j}x_{j}-\sum_{i\in U}f_{i}^{\top}x_{i} \tag{1a}
\]
\[
\text{s.t.}\;\sum_{j\in N_{U}[i]}A^{\mathcal{E}}_{i,j}x_{j}=g^{\mathcal{E}}_{i},\;(\lambda^{\mathcal{E}}_{i}),\;i\in U \tag{1b}
\]
\[
\phantom{\text{s.t.}\;}\sum_{j\in N_{U}[i]}A^{\mathcal{I}}_{i,j}x_{j}\geq g^{\mathcal{I}}_{i},\;(\lambda^{\mathcal{I}}_{i}),\;i\in U. \tag{1c}
\]

Here, $x_{i}\in\mathbb{R}^{r_{i}}$ are the decision variables for node $i$; $\lambda^{\mathcal{E}}_{i}\in\mathbb{R}^{m^{\mathcal{E}}_{i}}$ and $\lambda^{\mathcal{I}}_{i}\in\mathbb{R}^{m^{\mathcal{I}}_{i}}$ are the dual variables; $Q_{i,j}\in\mathbb{R}^{r_{i}\times r_{j}}$, $A^{\mathcal{E}}_{i,j}\in\mathbb{R}^{m^{\mathcal{E}}_{i}\times r_{j}}$, $A^{\mathcal{I}}_{i,j}\in\mathbb{R}^{m^{\mathcal{I}}_{i}\times r_{j}}$, $f_{i}\in\mathbb{R}^{r_{i}}$, $g^{\mathcal{E}}_{i}\in\mathbb{R}^{m^{\mathcal{E}}_{i}}$, and $g^{\mathcal{I}}_{i}\in\mathbb{R}^{m^{\mathcal{I}}_{i}}$ are the data; and $N_{U}[X]:=N_{U}(X)\cup X$, where $N_{U}(X):=\{j\in U\setminus X:\exists i\in X\text{ such that }\{i,j\}\in\mathcal{E}\}$ and the argument is treated as a singleton if $X$ is a single node. We define $A_{i,j}:=(A^{\mathcal{E}}_{i,j},A^{\mathcal{I}}_{i,j})$, $g_{i}:=(g^{\mathcal{E}}_{i},g^{\mathcal{I}}_{i})$, $\lambda_{i}:=(\lambda^{\mathcal{E}}_{i},\lambda^{\mathcal{I}}_{i})$, $z_{i}:=(x_{i},\lambda_{i})$, $d_{i}:=(f_{i},g_{i})$, $m_{i}:=m^{\mathcal{E}}_{i}+m^{\mathcal{I}}_{i}$, and $n_{i}:=r_{i}+m_{i}$. We assume that $Q_{i,j}=Q_{j,i}^{\top}$.

An equivalent problem can be written in a compact form:

\[
P_{U}(\boldsymbol{d}_{U}):\;\min_{\boldsymbol{x}_{U}}\;\frac{1}{2}\boldsymbol{x}_{U}^{\top}\boldsymbol{Q}_{U}\boldsymbol{x}_{U}-\boldsymbol{f}^{\top}_{U}\boldsymbol{x}_{U}\quad\text{s.t.}\quad\boldsymbol{A}^{\mathcal{E}}_{U}\boldsymbol{x}_{U}=\boldsymbol{g}^{\mathcal{E}}_{U},\;(\boldsymbol{\lambda}^{\mathcal{E}}_{U}),\qquad\boldsymbol{A}^{\mathcal{I}}_{U}\boldsymbol{x}_{U}\geq\boldsymbol{g}^{\mathcal{I}}_{U},\;(\boldsymbol{\lambda}^{\mathcal{I}}_{U}).
\]

Here, $\boldsymbol{x}_{U}:=\{x_{i}\}_{i\in U}$; $\boldsymbol{\lambda}^{\mathcal{E}}_{U}:=\{\lambda^{\mathcal{E}}_{i}\}_{i\in U}$; $\boldsymbol{\lambda}^{\mathcal{I}}_{U}:=\{\lambda^{\mathcal{I}}_{i}\}_{i\in U}$; $\boldsymbol{\lambda}_{U}:=\{\lambda_{i}\}_{i\in U}$; $\boldsymbol{z}_{U}:=\{z_{i}\}_{i\in U}$; $\boldsymbol{f}_{U}:=\{f_{i}\}_{i\in U}$; $\boldsymbol{g}^{\mathcal{E}}_{U}:=\{g^{\mathcal{E}}_{i}\}_{i\in U}$; $\boldsymbol{g}^{\mathcal{I}}_{U}:=\{g^{\mathcal{I}}_{i}\}_{i\in U}$; $\boldsymbol{g}_{U}:=\{g_{i}\}_{i\in U}$; $\boldsymbol{d}_{U}:=\{d_{i}\}_{i\in U}$; $\boldsymbol{Q}_{U}:=\{Q_{i,j}\}_{i,j\in U}$; $\boldsymbol{A}^{\mathcal{E}}_{U}:=\{A^{\mathcal{E}}_{i,j}\}_{i,j\in U}$; $\boldsymbol{A}^{\mathcal{I}}_{U}:=\{A^{\mathcal{I}}_{i,j}\}_{i,j\in U}$; $\boldsymbol{A}_{U}:=\{A_{i,j}\}_{i,j\in U}$; $r_{U}:=\sum_{i\in U}r_{i}$; $m_{U}:=\sum_{i\in U}m_{i}$; and $n_{U}:=\sum_{i\in U}n_{i}$. The problem in this parametric form is denoted by $P_{U}(\boldsymbol{d}_{U})$.
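To make this assembly concrete, the following minimal sketch (a hypothetical 3-node path graph with scalar blocks; all data made up) stacks node-level blocks $Q_{i,j}$, $A_{i,j}$, $f_i$, $g_i$ into $\boldsymbol{Q}_U$, $\boldsymbol{A}_U$, $\boldsymbol{f}_U$, $\boldsymbol{g}_U$, and $\boldsymbol{d}_U$:

```python
import numpy as np

# Toy 3-node path graph 0-1-2 with one variable and one constraint row per node.
U = [0, 1, 2]
edges = {(0, 1), (1, 2)}
def nbhd(i):                              # closed neighborhood N_U[i]
    return [j for j in U if i == j or (min(i, j), max(i, j)) in edges]

# Node-level blocks: Q[(i,j)] couples x_i and x_j; A[(i,j)] couples row block i and x_j.
Q = {(i, j): np.array([[2.0 if i == j else -0.5]]) for i in U for j in nbhd(i)}
A = {(i, j): np.array([[1.0 if i == j else 0.3]]) for i in U for j in nbhd(i)}
f = {i: np.array([1.0]) for i in U}
g = {i: np.array([0.0]) for i in U}

blk = lambda D, i, j: D.get((i, j), np.zeros((1, 1)))   # zero block if uncoupled
Q_U = np.block([[blk(Q, i, j) for j in U] for i in U])  # stacked Hessian
A_U = np.block([[blk(A, i, j) for j in U] for i in U])  # stacked constraint Jacobian
f_U = np.concatenate([f[i] for i in U])
g_U = np.concatenate([g[i] for i in U])
d_U = np.concatenate([np.concatenate([f[i], g[i]]) for i in U])  # d_i = (f_i, g_i)
```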

II Overlapping Schwarz Algorithm

This section introduces the Schwarz algorithm for the solution of $P_{V}(\boldsymbol{d}_{V})$ (referred to as the full problem) with $V\subseteq\mathcal{V}$. We consider a non-overlapping partition $\{V_{k}\subseteq V\}_{k=1}^{K}$ of $V$ and an overlapping partition $\{W_{k}\subseteq V\}_{k=1}^{K}$ of $V$ such that $V_{k}\subseteq W_{k}$ holds for $k\in\mathbb{I}_{1:K}$. We call $V_{1},\cdots,V_{K}$ the original subdomains and $W_{1},\cdots,W_{K}$ the expanded subdomains. The Schwarz algorithm is stated below.

Algorithm 1 Overlapping Schwarz Algorithm
Input: $\boldsymbol{z}^{(0)}_{V}$, $\{V_{k}\}_{k=1}^{K}$, $\{W_{k}\}_{k=1}^{K}$
1: for $\ell=0,1,\cdots$ do
2:     for (in parallel) $k=1$ to $K$ do
3:         $\boldsymbol{z}^{(\ell+1)}_{V_{k}}=T_{V_{k}\leftarrow W_{k}}\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}})$
4:     end for
5: end for
Output: $\boldsymbol{z}^{(\ell)}_{V}$

Here, we use a syntax that can be applied to any $U\subseteq\mathcal{V}$: $\boldsymbol{H}_{U}:=\{H_{i,j}\}_{i,j\in U}$; $\boldsymbol{H}_{-U}:=\{H_{i,j}\}_{i\in U,j\in N_{V}(U)}$; $H_{i,j}:=[Q_{i,j}\;A_{j,i}^{\top};A_{i,j}\;0]$; and $\boldsymbol{z}_{-U}:=\{z_{i}\}_{i\in N_{V}(U)}$. Furthermore, $\boldsymbol{z}^{*}_{U}(\cdot)$ is the primal-dual solution mapping of the parametric optimization problem $P_{U}(\cdot)$, and $T_{U_{1}\leftarrow U_{2}}:=\{T_{i,j}\}_{i\in U_{1},j\in U_{2}}$, where $U_{1},U_{2}\subseteq\mathcal{V}$ and $T_{i,j}=I_{n_{i}\times n_{i}}$ if $i=j$ and $0_{n_{i}\times n_{j}}$ otherwise. Note that $\boldsymbol{z}_{-U}$ represents the solution information that is complementary to $U$. The full complementary solution information includes the solution on $V\setminus U$; however, the variables and constraints in $U$ are coupled only with $N_{V}(U)$, so it suffices to incorporate information only for the coupled complementary region $N_{V}(U)$. We therefore abuse the term complementary solution to mean the coupled complementary solution $\boldsymbol{z}_{-U}$.

The core part of the algorithm (line 3) consists of three steps: subproblem solution, solution restriction, and primal-dual exchange. In the first step, one formulates the subproblem for the $k$th subdomain as $P_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}})$ (this formulation will be justified later in Lemma 6). The subproblem incorporates the complementary solution information $\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}}$, which is obtained during the third step of the previous iteration. The subproblem is solved to obtain its solution $\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}})$. Here, solution multiplicity exists in the overlapping regions. To remove such multiplicity, we restrict the solution in the second step: we abandon the primal-dual solutions associated with $W_{k}\setminus V_{k}$ (the subdomain region acquired by expansion) and keep only those associated with $V_{k}$ (the original subdomain region). This procedure is represented by the restriction operator $T_{V_{k}\leftarrow W_{k}}$. After restriction, the solutions are assembled over $k\in\mathbb{I}_{1:K}$ to form the next guess of the solution $\boldsymbol{z}^{(\ell+1)}_{V}$. In the third step, the primal-dual solutions are exchanged across the subdomains to update the complementary information $\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}}$ for each subproblem; a sketch of this procedure in code is given below. The schematic of the algorithm is shown in Fig. 1. The algorithm can be implemented in a fully decentralized manner, and different updating schemes can be used (e.g., Gauss-Seidel or asynchronous) [18].
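To illustrate the update on line 3 of Algorithm 1, the sketch below specializes to the equality-constrained case, where the solution mapping $\boldsymbol{z}^{*}_{W_k}(\cdot)$ reduces to a linear solve with the KKT matrix. The function name and the synchronous (Jacobi-type) update loop are our own illustrative choices, not the implementation used in the paper.

```python
import numpy as np

def schwarz_iteration(H, d, V_parts, W_parts, z0, num_iters=100):
    """Jacobi-type overlapping Schwarz on the KKT system H z = d.
    V_parts[k] / W_parts[k] are index arrays for V_k and its expansion W_k."""
    n = len(d)
    z = z0.copy()
    for _ in range(num_iters):
        z_new = z.copy()
        for Vk, Wk in zip(V_parts, W_parts):
            comp = np.setdiff1d(np.arange(n), Wk)     # indices outside W_k
            # subproblem: the complementary solution enters through the rhs
            rhs = d[Wk] - H[np.ix_(Wk, comp)] @ z[comp]
            zW = np.linalg.solve(H[np.ix_(Wk, Wk)], rhs)
            # restriction: keep only the entries of the original subdomain V_k
            mask = np.isin(Wk, Vk)
            z_new[Wk[mask]] = zW[mask]
        z = z_new                                     # synchronous exchange
    return z

# usage on a tridiagonal (path-graph) KKT system with an overlap of one node
H = np.diag(np.full(8, 4.0)) + np.diag(np.ones(7), 1) + np.diag(np.ones(7), -1)
d = np.ones(8)
V_parts = [np.arange(0, 4), np.arange(4, 8)]
W_parts = [np.arange(0, 5), np.arange(3, 8)]
z = schwarz_iteration(H, d, V_parts, W_parts, np.zeros(8))
print(np.linalg.norm(z - np.linalg.solve(H, d)))      # small residual error
```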

Figure 1: Schematic representation of the overlapping Schwarz algorithm.

III Main Results

In this section we analyze the convergence of the Schwarz algorithm. We will see that parametric sensitivity plays a central role in the convergence behavior because, intuitively, the primal-dual solutions of the neighbors of a subdomain enter as parametric perturbations. We first analyze the parametric sensitivity of $P_{U}(\cdot)$ in a way that can be applied to any node subset $U\subseteq\mathcal{V}$. We then exploit this sensitivity result to establish convergence.

III-A Preliminaries

Assumption 1.

The following is assumed for $U\subseteq\mathcal{V}$.

  • (a) $\boldsymbol{Q}_{U}\succeq 0$.

  • (b) Given a convex set $\mathbb{D}_{U}\subseteq\mathbb{R}^{n_{U}}$, for any $P_{U}(\boldsymbol{d}_{U})$ with $\boldsymbol{d}_{U}\in\mathbb{D}_{U}$, there exists a primal-dual solution at which the second-order sufficient condition (SOSC) and the linear independence constraint qualification (LICQ) hold.

By Assumption 1, there exists a unique primal-dual solution of $P_{U}(\boldsymbol{d}_{U})$ for any $\boldsymbol{d}_{U}\in\mathbb{D}_{U}$. Thus, the primal-dual solution mapping $\boldsymbol{z}^{*}_{U}:\mathbb{D}_{U}\rightarrow\mathbb{R}^{n_{U}}$ is well defined.

Definition 1.

Consider $P_{U}(\cdot)$ and $B\subseteq\mathbb{I}_{1:n_{U}}$.

  • (a) $B$ is called a basis if $\boldsymbol{H}_{U}[B,B]$ is nonsingular.

  • (b) $\boldsymbol{z}_{U}(\boldsymbol{d}_{U})\in\mathbb{R}^{n_{U}}$ is called a basic solution of $P_{U}(\boldsymbol{d}_{U})$ if

\[
\boldsymbol{H}_{U}[B,B]\,\boldsymbol{z}_{U}(\boldsymbol{d}_{U})[B]=\boldsymbol{d}_{U}[B], \tag{2a}
\]
\[
\boldsymbol{z}_{U}(\boldsymbol{d}_{U})[\mathbb{I}_{1:n_{U}}\setminus B]=0. \tag{2b}
\]

  • (c) $B$ is feasible (optimal) for $P_{U}(\boldsymbol{d}_{U})$ if its basic solution is feasible (optimal) for $P_{U}(\boldsymbol{d}_{U})$.

Definition 1 is an extension of the notion of basis for linear programs (studied in [21]). Note that the basis associated with a basic solution may not be unique when there exist zero components in $\boldsymbol{x}^{*}_{U}(\boldsymbol{d}_{U})$ or when strict complementarity does not hold. We write $\boldsymbol{z}^{B}_{U}:\mathbb{R}^{n_{U}}\rightarrow\mathbb{R}^{n_{U}}$ for the basic solution mapping associated with $B$.
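In code, the basic solution mapping of Definition 1 amounts to a single linear solve on the basis indices (a minimal sketch; basic_solution is a hypothetical helper operating on dense arrays):

```python
import numpy as np

def basic_solution(H_U, d_U, B):
    """Basic solution z^B_U(d_U) of Definition 1(b): solve the basis block
    H_U[B,B] z[B] = d_U[B] and set the remaining components to zero."""
    z = np.zeros(len(d_U))
    B = np.asarray(B)
    z[B] = np.linalg.solve(H_U[np.ix_(B, B)], d_U[B])
    return z
```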

Lemma 1.

Let Assumption 1 hold. For $\boldsymbol{d}_{U}\in\mathbb{D}_{U}$, there exists $B\in\mathbb{B}_{U}$ that is optimal for $P_{U}(\boldsymbol{d}_{U})$, where $\mathbb{B}_{U}:=\{B\subseteq\mathbb{I}_{1:n_{U}}:\exists\boldsymbol{d}_{U}\in\mathbb{D}_{U}\text{ such that }B\text{ is optimal for }P_{U}(\boldsymbol{d}_{U})\}$.

Proof.

We let

\[
\widehat{\boldsymbol{H}}:=\begin{bmatrix}\boldsymbol{Q}_{U}&\boldsymbol{A}_{U}[\mathcal{A}^{*}(\boldsymbol{d}_{U}),:]^{\top}\\ \boldsymbol{A}_{U}[\mathcal{A}^{*}(\boldsymbol{d}_{U}),:]&0\end{bmatrix}, \tag{3}
\]

where $\mathcal{A}^{*}(\boldsymbol{d}_{U})$ is the active constraint set evaluated at the solution of $P_{U}(\boldsymbol{d}_{U})$. From Assumption 1, $\boldsymbol{A}_{U}[\mathcal{A}^{*}(\boldsymbol{d}_{U}),:]$ has full row rank by LICQ, and the reduced Hessian associated with $\widehat{\boldsymbol{H}}$ is positive definite by SOSC. These imply that $\widehat{\boldsymbol{H}}$ is nonsingular [24, Lemma 16.1]. We choose $B:=\{i\in\mathbb{I}_{1:n_{U}}:\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})[i]\neq 0\}$. One can show that $\boldsymbol{H}_{U}[B,B]$ is a permuted principal submatrix of $\widehat{\boldsymbol{H}}$. Therefore, $\boldsymbol{H}_{U}[B,B]$ is nonsingular, and this yields that $B$ is a basis. From the KKT conditions for $P_{U}(\boldsymbol{d}_{U})$ and the definition of $B$, (2) holds for $\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})$. Thus, we have $\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U})=\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})$. ∎

Lemma 2.

Let $q_{1},\cdots,q_{N_{q}}:(a,b)\rightarrow\mathbb{R}$ be quadratic functions, and let $q:(a,b)\rightarrow\mathbb{R}$ be $q(\cdot):=\min_{i\in\mathbb{I}_{1:N_{q}}}q_{i}(\cdot)$. Then there exists $\{a_{0}=a,a_{1},\cdots,a_{K_{q}}=b\}$ such that for each $k\in\mathbb{I}_{1:K_{q}}$, there exists $i\in\mathbb{I}_{1:N_{q}}$ such that $q_{i}(\cdot)=q(\cdot)$ on $(a_{k-1},a_{k})$.

Proof.

For each $(i,j)\in\mathbb{I}_{1:N_{q}}\times\mathbb{I}_{1:N_{q}}$, we let $I_{i,j}:=\{x\in(a,b):q_{i}(x)\leq q_{j}(x)\}$. Since $q_{i}(\cdot)$ and $q_{j}(\cdot)$ are quadratic, $I_{i,j}$ is a union of intervals (not necessarily open or closed). We then define $I_{i}:=\bigcap_{j\in\mathbb{I}_{1:N_{q}}}I_{i,j}$; since the $I_{i,j}$ are unions of intervals, $I_{i}$ is also a union of intervals. By collecting the endpoints of the intervals in $\{I_{i}:i\in\mathbb{I}_{1:N_{q}}\}$, we can construct $\{a_{0},\cdots,a_{K_{q}}\}$; in particular, $\{a_{0},\cdots,a_{K_{q}}\}=\bigcup_{i\in\mathbb{I}_{1:N_{q}}}\text{closure}(I_{i})\setminus\text{interior}(I_{i})$. We observe that $\bigcup_{i\in\mathbb{I}_{1:N_{q}}}I_{i}=(a,b)$; thus, for any $k\in\mathbb{I}_{1:K_{q}}$, $(a_{k-1},a_{k})\subseteq I_{i}$ for some $i\in\mathbb{I}_{1:N_{q}}$. This means $q_{i}(\cdot)=q(\cdot)$ on $(a_{k-1},a_{k})$. The proof is complete. ∎

Lemma 3.

Let Assumption 1 hold. For $\boldsymbol{d}_{U},\boldsymbol{d}_{U}^{\prime}\in\mathbb{D}_{U}$, there exist $\{s_{0}=0<\cdots<s_{N_{d}}=1\}$ and $\{B_{k}\in\mathbb{B}_{U}\}_{k=1}^{N_{d}}$ such that for $k\in\mathbb{I}_{1:N_{d}}$, $B_{k}$ is optimal for $P_{U}(\boldsymbol{d}^{s}_{U})$ for any $s\in[s_{k-1},s_{k}]$, where $\boldsymbol{d}^{s}_{U}:=(1-s)\boldsymbol{d}_{U}+s\boldsymbol{d}^{\prime}_{U}$.

Proof.

Let $\mathbb{B}_{U}=\{B_{(1)},\cdots,B_{(T)}\}$ (note that $\mathbb{B}_{U}$ is finite). We define $\pi_{(t)}:[0,1]\rightarrow\overline{\mathbb{R}}$ to be the mapping from $s\in[0,1]$ to the objective value of $\boldsymbol{z}^{B_{(t)}}_{U}(\boldsymbol{d}^{s}_{U})$ for $P_{U}(\boldsymbol{d}^{s}_{U})$ (the value is $+\infty$ if $B_{(t)}$ is infeasible). Also, we define $\pi:[0,1]\rightarrow\overline{\mathbb{R}}$ by $\pi(s):=\min_{t\in\mathbb{I}_{1:T}}\pi_{(t)}(s)$. By Lemma 1, $\pi(\cdot)$ is the mapping from $s\in[0,1]$ to the objective value of $P_{U}(\boldsymbol{d}^{s}_{U})$.

One can see that $\boldsymbol{z}^{B_{(t)}}_{U}(\boldsymbol{d}^{s}_{U})$ is affine in $s$; thus the feasibility conditions for $\boldsymbol{z}^{B_{(t)}}_{U}(\boldsymbol{d}^{s}_{U})$ can be expressed by a finite number of affine equalities and inequalities. This implies that the set of $s\in[0,1]$ on which $\pi_{(t)}(s)<\infty$ is a closed interval in $[0,1]$. Accordingly, $\pi_{(t)}(\cdot)$ is a quadratic function supported on a closed interval. Now we collect the endpoints of such intervals over $t\in\mathbb{I}_{1:T}$ to construct $\widetilde{\Pi}:=\{\widetilde{s}_{0}=0,\cdots,\widetilde{s}_{\widetilde{N}_{d}}=1\}$. For each $k\in\mathbb{I}_{1:\widetilde{N}_{d}}$, we collect $\mathcal{T}_{k}:=\{t\in\mathbb{I}_{1:T}:\pi_{(t)}(\cdot)<\infty\text{ on }(\widetilde{s}_{k-1},\widetilde{s}_{k})\}$. Observe that (i) $\pi_{(t)}$ with $t\in\mathcal{T}_{k}$ are quadratic on $(\widetilde{s}_{k-1},\widetilde{s}_{k})$ and (ii) $\pi(\cdot)=\min_{t\in\mathcal{T}_{k}}\pi_{(t)}(\cdot)$ on $(\widetilde{s}_{k-1},\widetilde{s}_{k})$. By Lemma 2 (applicable due to (i)), each $(\widetilde{s}_{k-1},\widetilde{s}_{k})$ can be further divided by using $\widetilde{\widetilde{\Pi}}_{k}:=\{\widetilde{\widetilde{s}}_{k,0}=\widetilde{s}_{k-1},\cdots,\widetilde{\widetilde{s}}_{k,\widetilde{\widetilde{N}}_{d,k}}=\widetilde{s}_{k}\}$, where for each $(\widetilde{\widetilde{s}}_{k,k^{\prime}-1},\widetilde{\widetilde{s}}_{k,k^{\prime}})$ with $k^{\prime}\in\mathbb{I}_{1:\widetilde{\widetilde{N}}_{d,k}}$ and $k\in\mathbb{I}_{1:\widetilde{N}_{d}}$, there exists $t\in\mathcal{T}_{k}$ such that $\pi_{(t)}=\min_{t^{\prime}\in\mathcal{T}_{k}}\pi_{(t^{\prime})}(\cdot)=\pi(\cdot)$ (recall observation (ii)).

We now let $\{s_{0},\cdots,s_{N_{d}}\}=\bigcup_{k\in\mathbb{I}_{1:\widetilde{N}_{d}}}\widetilde{\widetilde{\Pi}}_{k}$. One can observe that, for each $k\in\mathbb{I}_{1:N_{d}}$, there exists $t\in\mathbb{I}_{1:T}$ such that $\pi_{(t)}(\cdot)=\pi(\cdot)$ on $(s_{k-1},s_{k})$; we choose such $B_{(t)}$ as $B_{k}$. It is known that the objective value mapping of a QP is continuous on its support (e.g., see [25, Corollary 9] or [26, Theorem 5.53]); thus $\pi(\cdot)$ is continuous on its support. By the continuity of $\pi(\cdot)$ and $\pi_{(t)}(\cdot)$ on their supports, $\pi_{(t)}(\cdot)=\pi(\cdot)$ on $(s_{k-1},s_{k})$ implies that the same holds on $[s_{k-1},s_{k}]$. Finally, one can check that $B_{k}$ is optimal for $P_{U}(\boldsymbol{d}^{s}_{U})$ for $s\in[s_{k-1},s_{k}]$. The desired $\{s_{k}\}_{k=0}^{N_{d}}$ and $\{B_{k}\}_{k=1}^{N_{d}}$ are thus obtained. ∎

An important implication of Lemma 3 is that the solution path obtained by perturbing $\boldsymbol{d}_{U}$ can be divided into multiple paths, each of which is a basic solution mapping. Thus, given Lemma 3, it suffices to study the sensitivities of the basic solution mappings.

III-B Exponential Decay of Sensitivity

We now establish our main sensitivity result for the constrained QP, known as exponential decay of sensitivity.

Theorem 1 (Exponential Decay of Sensitivity).

Let Assumption 1 hold. The following holds for $\boldsymbol{d}_{U},\boldsymbol{d}^{\prime}_{U}\in\mathbb{D}_{U}$:

\[
\left\|T_{i\leftarrow U}\left(\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})-\boldsymbol{z}^{*}_{U}(\boldsymbol{d}^{\prime}_{U})\right)\right\|\leq\sum_{j\in U}\Gamma_{U}\,\rho_{U}^{\lceil\frac{\Delta_{U}(i,j)-1}{2}\rceil}\,\|d_{j}-d^{\prime}_{j}\|
\]

with $\Gamma_{U}:=\overline{\sigma}_{U}/\underline{\sigma}^{2}_{U}$; $\rho_{U}:=(\overline{\sigma}_{U}^{2}-\underline{\sigma}_{U}^{2})/(\overline{\sigma}_{U}^{2}+\underline{\sigma}_{U}^{2})$; $\underline{\sigma}_{U}:=\min_{B\in\mathbb{B}_{U}}\sigma_{\min}(\boldsymbol{H}_{U}[B,B])$; and $\overline{\sigma}_{U}:=\max_{B\in\mathbb{B}_{U}}\sigma_{\max}(\boldsymbol{H}_{U}[B,B])$.

Here $\lceil\cdot\rceil$ denotes the ceiling operator; $\Delta_{U}(i,j)$ denotes the geodesic distance between $i,j\in U$ on the subgraph of $\mathcal{G}(\mathcal{V},\mathcal{E})$ induced by $U$ (the number of edges in the shortest path $\{e_{q}\in\mathcal{E}:e_{q}\subseteq U\}_{q=1}^{n_{e}}$ from $i$ to $j$); and $\sigma_{\min}$ and $\sigma_{\max}$ denote the minimum and maximum singular values of their matrix argument.

To facilitate the discussion, we define for $M\in\mathbb{R}^{|B|\times|B|}$ and $v\in\mathbb{R}^{|B|}$ the following: $M_{[i][j]}:=M[\mathcal{I}_{i},\mathcal{I}_{j}]$ and $v_{[i]}:=v[\mathcal{I}_{i}]$, where $\mathcal{I}_{i}:=\{q\in\mathbb{I}_{1:|B|}:b_{q}\in\{\sum_{k\in U,k<i}n_{k}+1,\cdots,\sum_{k\in U,k\leq i}n_{k}\}\}$ and $B=\{b_{1}<\cdots<b_{|B|}\}$. Note that $\mathcal{I}_{i}$ is the index set that extracts the indices associated with $z_{i}$, and $\{\mathcal{I}_{i}\}_{i\in U}$ partitions $\mathbb{I}_{1:|B|}$.

Lemma 4.

If $\Delta_{U}(i,j)>q\in\mathbb{I}_{>0}$, then $(\boldsymbol{H}_{U}[B,B]^{q})_{[i][j]}=0$.

Proof.

We use the notation $\widetilde{\boldsymbol{H}}:=\boldsymbol{H}_{U}[B,B]$ and proceed by induction. From the fact that $H_{i,j}=0$ if $\Delta_{U}(i,j)>1$, one can observe that $\widetilde{\boldsymbol{H}}_{[i][j]}=0$ if $\Delta_{U}(i,j)>1$; hence, the claim holds for $q=1$. Now suppose the claim holds for $q=q^{\prime}$. One can easily see that the triangle inequality holds for the distance $\Delta_{U}(\cdot,\cdot)$. From the triangle inequality, if $\Delta_{U}(i,j)>q^{\prime}+1$, then either $\Delta_{U}(i,l)>q^{\prime}$ or $\Delta_{U}(l,j)>1$ holds for any $l\in U$. Thus, if $\Delta_{U}(i,j)>q^{\prime}+1$, then $(\widetilde{\boldsymbol{H}}^{q^{\prime}+1})_{[i][j]}=\sum_{l\in U}(\widetilde{\boldsymbol{H}}^{q^{\prime}})_{[i][l]}\widetilde{\boldsymbol{H}}_{[l][j]}=0$. ∎
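Lemma 4 is easy to check numerically; the following sketch verifies it for scalar blocks on a path graph (hypothetical data):

```python
import numpy as np

# Check of Lemma 4 on a 6-node path graph: (H^q)[i][j] = 0 whenever the
# graph distance between nodes i and j exceeds q.
n = 6
H = np.diag(np.full(n, 4.0)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
for q in range(1, 4):
    Hq = np.linalg.matrix_power(H, q)
    for i in range(n):
        for j in range(n):
            if abs(i - j) > q:            # |i - j| is the path-graph distance
                assert abs(Hq[i, j]) < 1e-12
print("Lemma 4 verified on the path graph for q = 1, 2, 3")
```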

Lemma 5.

If $B\in\mathbb{B}_{U}$, then $\|(\boldsymbol{H}_{U}[B,B]^{-1})_{[i][j]}\|\leq\Gamma_{U}\,\rho_{U}^{\lceil\frac{\Delta_{U}(i,j)-1}{2}\rceil}$.

Proof.

By definition, $\underline{\sigma}_{U}^{2}I\preceq\widetilde{\boldsymbol{H}}^{2}\preceq\overline{\sigma}_{U}^{2}I$, and thus

\[
\frac{\underline{\sigma}_{U}^{2}-\overline{\sigma}_{U}^{2}}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}I\preceq I-\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}^{2}\preceq\frac{\overline{\sigma}_{U}^{2}-\underline{\sigma}_{U}^{2}}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}I. \tag{4}
\]

Moreover,

\[
\widetilde{\boldsymbol{H}}^{-1}=\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}\left(\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}^{2}\right)^{-1} \tag{5a}
\]
\[
\phantom{\widetilde{\boldsymbol{H}}^{-1}}=\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\sum_{q=0}^{\infty}\widetilde{\boldsymbol{H}}\left(I-\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}^{2}\right)^{q}, \tag{5b}
\]

where the second equality follows from [27, Theorem 5.6.9 and Corollary 5.6.16], (4), and the fact that the series in (5b) converges. By Lemma 4, if $\Delta_{U}(i,j)>2q+1$, then

\[
\left(\widetilde{\boldsymbol{H}}\left(I-\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}^{2}\right)^{q}\right)_{[i][j]}=0 \tag{6}
\]

holds. By extracting submatrices from (5), one establishes

\[
(\widetilde{\boldsymbol{H}}^{-1})_{[i][j]}=\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\sum_{q=q_{0}}^{\infty}\left(\widetilde{\boldsymbol{H}}\left(I-\frac{2\widetilde{\boldsymbol{H}}^{2}}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\right)^{q}\right)_{[i][j]},
\]

where $q_{0}:=\lceil\frac{\Delta_{U}(i,j)-1}{2}\rceil$, since the summation over $q=0,\cdots,q_{0}-1$ adds up to zero by (6). Using the triangle inequality and the fact that the norm of a submatrix is always smaller than that of the original matrix, we have

\[
\|(\widetilde{\boldsymbol{H}}^{-1})_{[i][j]}\|\leq\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\sum_{q=q_{0}}^{\infty}\left\|\widetilde{\boldsymbol{H}}\left(I-\frac{2\widetilde{\boldsymbol{H}}^{2}}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\right)^{q}\right\|\leq\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\sum_{q=q_{0}}^{\infty}\overline{\sigma}_{U}\left(\frac{\overline{\sigma}_{U}^{2}-\underline{\sigma}_{U}^{2}}{\overline{\sigma}_{U}^{2}+\underline{\sigma}_{U}^{2}}\right)^{q}\leq\frac{\overline{\sigma}_{U}}{\underline{\sigma}_{U}^{2}}\left(\frac{\overline{\sigma}_{U}^{2}-\underline{\sigma}_{U}^{2}}{\overline{\sigma}_{U}^{2}+\underline{\sigma}_{U}^{2}}\right)^{\lceil(\Delta_{U}(i,j)-1)/2\rceil}, \tag{7}
\]

where the second inequality follows from the submultiplicativity of the matrix norm and the fact that the induced 2-norm of a symmetric matrix equals the largest absolute value of its eigenvalues, and the last inequality follows from the geometric series. ∎

Proof of Theorem 1.

From Lemma 3, we have

\[
\boldsymbol{z}^{*}_{U}(\boldsymbol{d}^{\prime}_{U})-\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})=\sum_{k=1}^{N_{d}}\boldsymbol{z}^{B_{k}}_{U}(\boldsymbol{d}^{s_{k}}_{U})-\boldsymbol{z}^{B_{k}}_{U}(\boldsymbol{d}^{s_{k-1}}_{U}). \tag{8}
\]

The solution does not jump when the basis changes, because of solution uniqueness. We write $B:=B_{k}$ and $\widetilde{\boldsymbol{H}}:=\boldsymbol{H}_{U}[B,B]$. From the block multiplication formula and $\boldsymbol{d}^{s_{k}}_{U}-\boldsymbol{d}^{s_{k-1}}_{U}=(s_{k}-s_{k-1})(\boldsymbol{d}^{\prime}_{U}-\boldsymbol{d}_{U})$, we have

\[
\left(\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U}^{s_{k}})[B]-\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U}^{s_{k-1}})[B]\right)_{[i]}=\sum_{j\in U}\left(\widetilde{\boldsymbol{H}}^{-1}\right)_{[i][j]}(s_{k}-s_{k-1})(\boldsymbol{d}^{\prime}_{U}[B]-\boldsymbol{d}_{U}[B])_{[j]}. \tag{9}
\]

From the definition of bases and basic solutions, we have

\[
\left\|\left(\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U}^{s_{k}})[B]-\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U}^{s_{k-1}})[B]\right)_{[i]}\right\|=\left\|T_{i\leftarrow U}(\boldsymbol{z}^{B}_{U}(\boldsymbol{d}^{s_{k}}_{U})-\boldsymbol{z}^{B}_{U}(\boldsymbol{d}^{s_{k-1}}_{U}))\right\|. \tag{10}
\]

From (9)-(10), the triangle inequality, the submultiplicativity of matrix norms, Lemma 5, and the fact that the norm of a subvector is not greater than the norm of the original vector,

\[
\left\|T_{i\leftarrow U}(\boldsymbol{z}^{B}_{U}(\boldsymbol{d}^{s_{k}}_{U})-\boldsymbol{z}^{B}_{U}(\boldsymbol{d}^{s_{k-1}}_{U}))\right\|\leq\sum_{j\in U}\Gamma_{U}\,\rho_{U}^{\lceil(\Delta_{U}(i,j)-1)/2\rceil}(s_{k}-s_{k-1})\|d_{j}-d^{\prime}_{j}\|. \tag{11}
\]

By multiplying (8) by $T_{i\leftarrow U}$ and applying (11) and the triangle inequality, we obtain the result. ∎

The coefficient $\Gamma_{U}\,\rho_{U}^{\lceil\frac{\Delta_{U}(i,j)-1}{2}\rceil}$ in Theorem 1 is the componentwise Lipschitz constant of the solution mapping $\boldsymbol{z}^{*}_{U}(\cdot)$. Recall that $\rho_{U}\in(0,1)$; hence, the coefficient decays exponentially with the distance $\Delta_{U}(i,j)$. Therefore, the effect of a perturbation decays as one moves away from the perturbation location.
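This decay can be observed directly. The sketch below (a path graph with scalar blocks and no inequality constraints, so that the solution mapping is a single linear solve; data made up) perturbs the data at node 0 and prints how the solution change shrinks along the graph:

```python
import numpy as np

# Illustration of Theorem 1 on a 30-node path graph: a perturbation of the
# data at node 0 changes the solution at node i by an amount that decays
# exponentially with the distance from node 0.
n = 30
H = np.diag(np.full(n, 4.0)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
d = np.ones(n)
d_pert = d.copy()
d_pert[0] += 1.0                             # perturb the data at node 0
dz = np.linalg.solve(H, d_pert) - np.linalg.solve(H, d)
print(np.abs(dz[:8]))                        # magnitudes shrink geometrically
```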

III-C Convergence Analysis

In this subsection we formally establish the convergence of Algorithm 1. To facilitate the discussion, we first introduce some notation: $\boldsymbol{z}^{\dagger}_{U}:=\{z^{\dagger}_{i}\}_{i\in U}$, where $z^{\dagger}_{i}:=T_{i\leftarrow V}\boldsymbol{z}^{*}_{V}(\boldsymbol{d}_{V})$ and $U\subseteq V$. Furthermore, we define $\|\cdot\|_{U,\infty}:=\max_{i\in U}\|T_{i\leftarrow U}(\cdot)\|$ and $\|\cdot\|_{U,U^{\prime},1}:=\sum_{i\in U}\sum_{j\in U^{\prime}}\|T_{i\leftarrow U}(\cdot)T^{\top}_{j\leftarrow U^{\prime}}\|$.

Assumption 2.

Assumption 1 holds with $\mathbb{D}_{U}:=\{\boldsymbol{d}_{U}-\boldsymbol{H}_{-U}\boldsymbol{z}_{-U}:\boldsymbol{z}_{-U}\in\mathbb{R}^{n_{-U}}\}$ for any $U\subseteq V$.

Here $n_{-U}:=\sum_{i\in N_{V}(U)}n_{i}$. While Assumption 2 is strong, we believe that it can be relaxed; doing so meaningfully, however, is a technically extensive endeavor beyond the scope of this communication. We note that the assumption is always satisfied for bound constraints and for augmented Lagrangian reformulations of the original problem (introduce slacks so that the only inequality constraints are bounds and then penalize all equality constraints); such reformulations can approximate the solution set of the original problem arbitrarily well.

Assumption 3.

$\omega:=\min_{k\in\mathbb{I}_{1:K}}\Delta_{V}(V_{k},N_{V}(W_{k}))-1\geq 1$.

Here, we abuse the notation by letting $\Delta_{V}(U,U^{\prime}):=\min_{u\in U,u^{\prime}\in U^{\prime}}\Delta_{V}(u,u^{\prime})$, where $U,U^{\prime}\subseteq V$. We call $\omega$ the size of overlap. Note that an overlapping partition $\{W_{k}\}_{k=1}^{K}$ with size $\omega$ can be constructed from a non-overlapping partition $\{V_{k}\}_{k=1}^{K}$ by expanding each original subdomain using $W_{k}=\{i\in V:\Delta_{V}(i,V_{k})\leq\omega\}$; a sketch of this construction is given below.
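A minimal sketch of this expansion (assuming an adjacency-list representation of the graph; the helper name is hypothetical) is:

```python
from collections import deque

def expand(Vk, adj, omega):
    """Expand V_k to W_k = {i : Delta_V(i, V_k) <= omega} by breadth-first
    search up to depth omega. adj[i] lists the neighbors of node i."""
    dist = {i: 0 for i in Vk}
    queue = deque(Vk)
    while queue:
        u = queue.popleft()
        if dist[u] == omega:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(dist)

# example: path graph 0-1-2-3-4-5, V_1 = {0,1,2}, omega = 1 -> W_1 = [0,1,2,3]
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
print(expand([0, 1, 2], adj, 1))
```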

Assumption 4.

$\underline{\sigma}:=\inf_{U\subseteq\mathcal{V}}\underline{\sigma}_{U}>0$ and $\overline{\sigma}:=\sup_{U\subseteq\mathcal{V}}\overline{\sigma}_{U}<\infty$.

Assumption 4 trivially holds if the parent graph $\mathcal{G}(\mathcal{V},\mathcal{E})$ is finite, but it may be violated if the parent graph is infinite; in particular, $\underline{\sigma}_{V}$ or $\overline{\sigma}_{V}$ may tend to zero or infinity as $V$ grows. Checking the validity of Assumption 4 for the infinite parent graph case is beyond the scope of this paper; sufficient conditions for it to hold will be studied in the future.

We note, however, that Assumptions 2 and 4 hold for augmented Lagrangian reformulations with bounded data whenever the objective matrix has bounded entries and is strongly diagonally dominant; this is a way to approximate most QPs with some regularization. In contrast, Assumption 3 can be satisfied by construction. Therefore, our setup covers a large set of problems or close approximations thereof.

Lemma 6.

Let Assumption 2 hold. For any $U\subseteq V$ we have that $\boldsymbol{z}^{\dagger}_{U}=\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U}-\boldsymbol{H}_{-U}\boldsymbol{z}^{\dagger}_{-U})$.

Proof.

Since $P_{U}(\cdot)$ is a convex QP, the KKT conditions are necessary and sufficient for optimality. By Assumption 2, the primal-dual solution is unique; therefore, it suffices to prove that $\boldsymbol{z}^{\dagger}_{U}$ satisfies the KKT conditions of $P_{U}(\boldsymbol{d}_{U}-\boldsymbol{H}_{-U}\boldsymbol{z}^{\dagger}_{-U})$. From the KKT conditions of $P_{V}(\boldsymbol{d}_{V})$, we have

\[
\boldsymbol{Q}_{V}\boldsymbol{x}^{\dagger}_{V}+\boldsymbol{A}_{V}^{\top}\boldsymbol{\lambda}_{V}^{\dagger}=\boldsymbol{f}_{V},\quad\boldsymbol{A}_{V}^{\mathcal{E}}\boldsymbol{x}_{V}^{\dagger}=\boldsymbol{g}_{V}^{\mathcal{E}},\quad\boldsymbol{A}_{V}^{\mathcal{I}}\boldsymbol{x}_{V}^{\dagger}\geq\boldsymbol{g}_{V}^{\mathcal{I}},
\]
\[
\boldsymbol{\lambda}_{V}^{{\dagger},\mathcal{I}}\geq 0,\quad\text{diag}(\boldsymbol{\lambda}_{V}^{{\dagger},\mathcal{I}})(\boldsymbol{A}_{V}^{\mathcal{I}}\boldsymbol{x}_{V}^{\dagger}-\boldsymbol{g}_{V}^{\mathcal{I}})=0. \tag{12}
\]

By extracting the rows associated with $U$ and rearranging the equations and inequalities, we obtain

\[
\boldsymbol{Q}_{U}\boldsymbol{x}^{\dagger}_{U}+\boldsymbol{A}_{U}^{\top}\boldsymbol{\lambda}^{\dagger}_{U}=\boldsymbol{f}_{U}-\boldsymbol{Q}_{-U}\boldsymbol{x}^{\dagger}_{-U}-\boldsymbol{A}^{\top}_{-U}\boldsymbol{\lambda}^{\dagger}_{-U},
\]
\[
\boldsymbol{A}^{\mathcal{E}}_{U}\boldsymbol{x}^{\dagger}_{U}=\boldsymbol{g}^{\mathcal{E}}_{U}-\boldsymbol{A}^{\mathcal{E}}_{-U}\boldsymbol{x}^{\dagger}_{-U},\quad\boldsymbol{A}^{\mathcal{I}}_{U}\boldsymbol{x}^{\dagger}_{U}\geq\boldsymbol{g}^{\mathcal{I}}_{U}-\boldsymbol{A}^{\mathcal{I}}_{-U}\boldsymbol{x}^{\dagger}_{-U}, \tag{13}
\]
\[
\boldsymbol{\lambda}^{{\dagger},\mathcal{I}}_{U}\geq 0,\quad\text{diag}(\boldsymbol{\lambda}^{{\dagger},\mathcal{I}}_{U})(\boldsymbol{A}^{\mathcal{I}}_{U}\boldsymbol{x}^{\dagger}_{U}-\boldsymbol{g}^{\mathcal{I}}_{U}+\boldsymbol{A}_{-U}^{\mathcal{I}}\boldsymbol{x}^{\dagger}_{-U})=0.
\]

Here, note that $\boldsymbol{A}_{-U}:=\{A_{i,j}\}_{i\in U,j\in N_{V}(U)}$ and $\boldsymbol{A}^{\top}_{-U}:=\{A_{j,i}^{\top}\}_{i\in U,j\in N_{V}(U)}$ (they are not transposes of each other). Conditions (13) imply that $\boldsymbol{z}^{\dagger}_{U}$ satisfies the KKT conditions for $P_{U}(\boldsymbol{d}_{U}-\boldsymbol{H}_{-U}\boldsymbol{z}^{\dagger}_{-U})$; thus, it is the solution. ∎

Lemma 6 provides a family of consistent subproblems whose solutions recover the corresponding pieces of the full solution as long as the complementary solution information is accurate. Thus, it justifies the subproblem formulation in Algorithm 1; a numerical check is sketched below.
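Lemma 6 can be checked numerically in the equality-constrained case, where the KKT conditions form a linear system (toy tridiagonal KKT matrix; data made up):

```python
import numpy as np

# Check of Lemma 6: the restriction of the full solution z† to U solves the
# subproblem on U whose data are corrected with the exact complementary
# solution on N_V(U).
rng = np.random.default_rng(0)
n = 8
off = rng.uniform(-1.0, 1.0, n - 1)
H = np.diag(np.full(n, 5.0)) + np.diag(off, 1) + np.diag(off, -1)  # path-graph coupling
d = rng.normal(size=n)
z_dagger = np.linalg.solve(H, d)           # full solution z†
U = np.arange(2, 6)                        # node subset U
NU = np.array([1, 6])                      # coupled complement N_V(U)
rhs = d[U] - H[np.ix_(U, NU)] @ z_dagger[NU]
assert np.allclose(np.linalg.solve(H[np.ix_(U, U)], rhs), z_dagger[U])
```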

The main result of the paper is stated as follows.

Theorem 2 (Convergence of Overlapping Schwarz).

Let Assumptions 2-4 hold. The sequence $\{\boldsymbol{z}^{(\ell)}_{V}\}_{\ell=0}^{\infty}$ generated by Algorithm 1 satisfies

\[
\|\boldsymbol{z}^{(\ell)}_{V}-\boldsymbol{z}^{\dagger}_{V}\|_{V,\infty}\leq\left(\alpha_{V}(\omega)\right)^{\ell}\|\boldsymbol{z}^{(0)}_{V}-\boldsymbol{z}^{\dagger}_{V}\|_{V,\infty}. \tag{14}
\]

Here, $\alpha_{V}(\omega):=R_{V}\Gamma\rho^{\lceil(\omega-1)/2\rceil}$; $\Gamma:=\overline{\sigma}/\underline{\sigma}^{2}$; $\rho:=(\overline{\sigma}^{2}-\underline{\sigma}^{2})/(\overline{\sigma}^{2}+\underline{\sigma}^{2})$; and $R_{V}:=\max_{U\subseteq V}\|\boldsymbol{H}_{-U}\|_{U,-U,1}$.

Proof.

By Lemma 6, we have that, for any $i\in V_{k}$,

\[
\left\|z^{(\ell+1)}_{i}-z^{\dagger}_{i}\right\|=\left\|T_{i\leftarrow W_{k}}\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}})-T_{i\leftarrow W_{k}}\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{\dagger}_{-W_{k}})\right\|\leq\sum_{j\in W_{k}}\Gamma_{W_{k}}\,\rho^{\lceil(\Delta_{W_{k}}(i,j)-1)/2\rceil}_{W_{k}}\left\|T_{j\leftarrow W_{k}}\boldsymbol{H}_{-W_{k}}(\boldsymbol{z}^{(\ell)}_{-W_{k}}-\boldsymbol{z}^{\dagger}_{-W_{k}})\right\|,
\]

where the inequality follows from Theorem 1. A key observation is that $T_{j\leftarrow W_{k}}\boldsymbol{H}_{-W_{k}}\neq 0$ only if $\Delta_{V}(j,N_{V}(W_{k}))=1$. Such a $j$ satisfies $\Delta_{W_{k}}(i,j)\geq\omega$ for any $i\in V_{k}$, by the definition of $\omega$, the triangle inequality for $\Delta_{V}(\cdot,\cdot)$, and $\Delta_{W_{k}}(i,j)\geq\Delta_{V}(i,j)$. Therefore, for any $i\in V$, we have

\[
\left\|z^{(\ell+1)}_{i}-z^{\dagger}_{i}\right\|\leq\max_{k\in\mathbb{I}_{1:K}}\underbrace{\|\boldsymbol{H}_{-W_{k}}\|_{W_{k},-W_{k},1}\,\Gamma_{W_{k}}\,\rho^{\lceil\frac{\omega-1}{2}\rceil}_{W_{k}}}_{\alpha_{k}}\left\|\boldsymbol{z}^{(\ell)}_{V}-\boldsymbol{z}^{\dagger}_{V}\right\|_{V,\infty}.
\]

One can show that $\max_{k\in\mathbb{I}_{1:K}}\alpha_{k}\leq\alpha_{V}(\omega)$ under Assumptions 3-4. This implies $\|\boldsymbol{z}^{(\ell+1)}_{V}-\boldsymbol{z}^{\dagger}_{V}\|_{V,\infty}\leq\alpha_{V}(\omega)\|\boldsymbol{z}^{(\ell)}_{V}-\boldsymbol{z}^{\dagger}_{V}\|_{V,\infty}$, which yields (14). ∎

The upper bound on the convergence rate, $\alpha_{V}(\omega)$, decays as the size of overlap $\omega$ increases (recall that $\rho\in(0,1)$). In the case of a finite parent graph, Algorithm 1 converges if $\omega$ is sufficiently large, since either (i) $\alpha_{V}(\omega)<1$ or (ii) each subproblem becomes the full problem (the solution is obtained in one iteration).

If the parent graph is infinite (such a setting is relevant for time grids, unbounded physical space domains, and scenario trees), we can make $V$ arbitrarily large, and thus the limiting behavior of $\alpha_{V}$ is of interest. Theorem 2 suggests that the constant $\Gamma$ and the exponential decay factor $\rho$ do not deteriorate as $V$ grows, but $R_{V}$ may grow with $V$. Here, $R_{V}$ represents the maximum coupling between a subdomain $U\subseteq V$ and its surroundings $V\setminus U$. Accordingly, the growth of $R_{V}$ is determined by the topology (as long as the $\|H_{i,j}\|$ are uniformly bounded). In particular, if the parent graph $\mathcal{G}(\mathcal{V},\mathcal{E})$ is finite-dimensional (e.g., grid points in $\mathbb{I}^{d}$ with $d<\infty$), $R_{V}$ grows polynomially with the diameter of $V$. In such a case, a sufficiently large $\omega$ can be found so that the decay of $\rho^{\lceil(\omega-1)/2\rceil}$ offsets the growth of $R_{V}$ (an exponentially decaying function times a polynomial converges). On the other hand, in the case of scenario trees, where $R_{V}$ may grow exponentially, the bound derived in Theorem 2 may not be tight, and $\alpha_{V}(\omega)$ may diverge as $V$ grows no matter how large $\omega$ is.

Note that there exists an inherent trade-off between the convergence rate and the subproblem complexity: the convergence rate improves with $\omega$, but the subproblem solution times also increase with $\omega$. Therefore, to achieve maximum performance, one needs to tune $\omega$.

III-D Monitoring Convergence

Convergence can be monitored by checking the residuals of the KKT conditions of the full problem $P_{V}(\boldsymbol{d}_{V})$. However, a more convenient surrogate of the full KKT residuals can be derived as follows.

Proposition 1.

Suppose that Assumptions 2-3 hold, let $\{\boldsymbol{z}_{V}^{(\ell)}\}_{\ell=0}^{\infty}$ be a sequence generated by Algorithm 1, and let $\boldsymbol{z}^{(\ell,k)}_{W_{k}}:=\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell-1)}_{-W_{k}})$. Then $\boldsymbol{z}_{V}^{(\ell)}\rightarrow\boldsymbol{z}^{\dagger}_{V}$ as $\ell\rightarrow\infty$ if the following holds for $k\in\mathbb{I}_{1:K}$:

\[
E^{(\ell)}_{k}:=\boldsymbol{z}^{(\ell,k)}_{N_{V}(V_{k})}-\boldsymbol{z}^{(\ell)}_{N_{V}(V_{k})}\rightarrow 0\;\text{as}\;\ell\rightarrow\infty. \tag{15}
\]
Proof.

We make two observations. (i) The KKT conditions of $P_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell-1)}_{-W_{k}})$ and the fact that $N_{V}[V_{k}]\subseteq W_{k}$ (from Assumption 3) imply that (13) holds when we replace $U\leftarrow V_{k}$, $\boldsymbol{z}^{\dagger}_{U}\leftarrow\boldsymbol{z}^{(\ell,k)}_{V_{k}}$, and $\boldsymbol{z}^{\dagger}_{-U}\leftarrow\boldsymbol{z}^{(\ell,k)}_{N_{V}(V_{k})}$, for any $k\in\mathbb{I}_{1:K}$. (ii) The residuals of the full KKT system can be obtained from the residuals of (13) after replacing $U\leftarrow V_{k}$, $\boldsymbol{z}^{\dagger}_{U}\leftarrow\boldsymbol{z}^{(\ell,k)}_{V_{k}}$, and $\boldsymbol{z}^{\dagger}_{-U}\leftarrow\boldsymbol{z}^{(\ell)}_{N_{V}(V_{k})}$ and collecting the residuals for $k\in\mathbb{I}_{1:K}$. Note that the residuals of the KKT conditions (13) are continuous with respect to $\boldsymbol{z}^{\dagger}_{-U}$. By a continuity argument, (15), and observations (i)-(ii), the limit points of $\{\boldsymbol{z}^{(\ell)}\}_{\ell=1}^{\infty}$ satisfy (13) for $k\in\mathbb{I}_{1:K}$ (i.e., the full KKT conditions (12) hold). By the uniqueness of the solution, such a limit point is unique, and this implies $\boldsymbol{z}_{V}^{(\ell)}\rightarrow\boldsymbol{z}^{\dagger}_{V}$ as $\ell\rightarrow\infty$. ∎

We can define primal-dual errors as $\epsilon_{\text{pr}}:=\max_{k\in\mathbb{I}_{1:K}}\|\text{pr}(E^{(\ell)}_{k})\|_{\infty}$ and $\epsilon_{\text{du}}:=\max_{k\in\mathbb{I}_{1:K}}\|\text{du}(E^{(\ell)}_{k})\|_{\infty}$, where $\text{pr}(\cdot)$ and $\text{du}(\cdot)$ extract the components associated with the primal and dual variables, respectively. The convergence criterion can thus be set as follows: stop if $(\epsilon_{\text{pr}}<\epsilon_{\text{pr}}^{\text{tol}})\land(\epsilon_{\text{du}}<\epsilon_{\text{du}}^{\text{tol}})$.
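In an implementation, this stopping test might be evaluated as follows (a sketch; the containers E, primal_idx, and dual_idx are hypothetical bookkeeping produced by the solver loop):

```python
import numpy as np

def check_convergence(E, primal_idx, dual_idx, tol_pr=1e-2, tol_du=1e2):
    """Evaluate the stopping test of Section III-D from the per-subdomain
    mismatches E[k] = z^(l,k) - z^(l) on N_V(V_k)."""
    eps_pr = max(np.max(np.abs(E[k][primal_idx[k]])) for k in range(len(E)))
    eps_du = max(np.max(np.abs(E[k][dual_idx[k]])) for k in range(len(E)))
    return eps_pr < tol_pr and eps_du < tol_du
```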

IV Numerical Example

Consider the regularized DC optimal power flow (OPF) problem [28] over a network $\mathcal{G}(\mathcal{V},\mathcal{E})$:

\[
\min_{\{\theta_{i}\}_{i\in\mathcal{V}},\,\{P_{q}\}_{q\in\Omega}}\;\sum_{q\in\Omega}c_{q,1}P_{q}+c_{q,2}P^{2}_{q}+\frac{\gamma}{2}\sum_{\{i,j\}\in\mathcal{E}}(\theta_{i}-\theta_{j})^{2} \tag{16a}
\]
\[
\text{s.t.}\;\sum_{q\in\Omega_{i}}P_{q}-\sum_{j\in N_{\mathcal{V}}[i]}B_{i,j}(\theta_{i}-\theta_{j})=P^{L}_{i},\;i\in\mathcal{V} \tag{16b}
\]
\[
\phantom{\text{s.t.}\;}\underline{P}_{q}\leq P_{q}\leq\overline{P}_{q},\;q\in\Omega,\quad\theta_{i}=\theta^{\text{ref}}_{i},\;i\in\mathcal{V}^{\text{ref}} \tag{16c}
\]
\[
\phantom{\text{s.t.}\;}-\overline{\theta}_{i,j}\leq\theta_{i}-\theta_{j}\leq\overline{\theta}_{i,j},\;\{i,j\}\in\mathcal{E}. \tag{16d}
\]

Here, $\Omega_{i}$ is the set of generators at node $i$; $\Omega:=\bigcup_{i\in\mathcal{V}}\Omega_{i}$ is the set of all generators; $\mathcal{V}^{\text{ref}}$ is the set of reference nodes; $\theta_{i}$ are the voltage angles; $P_{q}$ are the active power generations; $c_{q,1}$ and $c_{q,2}$ are the generation cost coefficients; $\gamma$ is the regularization coefficient; $B_{i,j}$ are the line susceptances; $P^{L}_{i}$ are the active power loads; $\underline{P}_{q}$ and $\overline{P}_{q}$ are the lower and upper bounds of active power generation, respectively; $\overline{\theta}_{i,j}$ are the voltage angle separation limits; and $\theta^{\text{ref}}_{i}$ are the reference voltage angles. The problems are modeled in the algebraic modeling language JuMP [29] and solved with the nonlinear programming solver Ipopt [30]. The 9,241-bus test case obtained from pglib-opf (v19.05) is used [31]. We modified the data by placing artificial storage with infinite capacity and high charge/discharge cost in each node. The network node set is partitioned into 16 subdomains using the graph partitioning tool Metis [32], and each subdomain is expanded to obtain an overlapping partition. The Schwarz scheme is run on a multicore parallel computing server (shared memory and 32 CPUs of Intel Xeon E5-2698 v3 running at 2.30 GHz) using the Julia package Distributed.jl. One master process and 16 worker processes are used (one process per subproblem). We use $\gamma=10^{5}$, $\epsilon^{\text{tol}}_{\text{pr}}=10^{-2}$, and $\epsilon^{\text{tol}}_{\text{du}}=10^{2}$. The scripts to reproduce the results can be found at https://github.com/zavalab/JuliaBox/tree/master/SchwarzQPcons.
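For reference, the sketch below states problem (16) on a hypothetical 3-bus ring with one generator per bus (made-up data), using the Python package cvxpy rather than the JuMP/Ipopt stack used in the experiments:

```python
import numpy as np
import cvxpy as cp

n = 3
edges = [(0, 1), (1, 2), (0, 2)]            # ring network
B = {e: 10.0 for e in edges}                # line susceptances
PL = np.array([0.5, 0.8, 0.3])              # active power loads
c1, c2, gamma = 1.0, 0.5, 1e5               # cost and regularization coefficients

theta = cp.Variable(n)                      # voltage angles
P = cp.Variable(n)                          # generation (one generator per bus)

def net_outflow(i):                         # sum_j B_ij (theta_i - theta_j)
    out = 0
    for a, b in edges:
        if i == a:
            out += B[(a, b)] * (theta[a] - theta[b])
        elif i == b:
            out += B[(a, b)] * (theta[b] - theta[a])
    return out

constraints = [P[i] - net_outflow(i) == PL[i] for i in range(n)]        # (16b)
constraints += [P >= 0, P <= 2.0, theta[0] == 0.0]                      # (16c)
constraints += [cp.abs(theta[a] - theta[b]) <= 0.2 for a, b in edges]   # (16d)
objective = cp.sum(c1 * P + c2 * cp.square(P)) \
    + gamma / 2 * sum(cp.square(theta[a] - theta[b]) for a, b in edges) # (16a)
print(cp.Problem(cp.Minimize(objective), constraints).solve())
```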

Convergence results are shown in Fig. 2; we vary the size of overlap $\omega$ and show the evolution of the objective value and of the primal-dual error. The black dashed lines represent the optimal objective value (obtained by solving the problem with Ipopt) and the error tolerances. The total computation times and iteration counts are compared in Table I. We can see that increasing $\omega$ accelerates convergence (it reduces the iteration count roughly in proportion to the size of the overlap) but does not always reduce the computation time. The reason is that the subproblem complexity increases with $\omega$; thus, an optimal overlap exists. However, the overlapping scheme is still considerably slower than the centralized solver; in particular, Ipopt solves the full problem in 1.92 seconds. We expect that computational savings can be achieved for larger problem instances. In the future, we will investigate the algorithm on several large-scale problems and study the conditions under which the overlapping scheme becomes more effective.

Figure 2: Profiles for objective and primal error for different overlap sizes.
TABLE I: Effect of overlap on solution times and iteration counts

                  $\omega=1$    $\omega=5$    $\omega=10$
Solution time     42.5 sec      17.7 sec      24.2 sec
Iterations        190           39            25

V Conclusions

We have presented an overlapping Schwarz algorithm for solving constrained quadratic programs. We showed that convergence relies on exponential decay of sensitivity, a property that we established for the setting of interest. The behavior of the algorithm was demonstrated by using a large DC optimal power flow problem. In future work we will study convergence in a nonlinear setting, and we will conduct more extensive benchmark studies.

Acknowledgments

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) under Contract DE-AC02-06CH11347 and by NSF through award CNS-1545046. We also acknowledge partial support from the National Science Foundation under award NSF-EECS-1609183.

References

  • [1] J. B. Rawlings and D. Q. Mayne, “Model predictive control: Theory and design,” 2009.
  • [2] Z. J. Yu and L. T. Biegler, “Advanced-step multistage nonlinear model predictive control: Robustness and stability,” Journal of Process Control, vol. 84, pp. 192–206, 2019.
  • [3] M. V. Pereira and L. M. Pinto, “Multi-stage stochastic optimization applied to energy planning,” Mathematical programming, vol. 52, no. 1-3, pp. 359–375, 1991.
  • [4] L. T. Biegler, O. Ghattas, M. Heinkenschloss, D. Keyes, and B. van Bloemen Waanders, Real-time PDE-constrained Optimization.   SIAM, 2007.
  • [5] J. Lavaei and S. H. Low, “Zero duality gap in optimal power flow problem,” IEEE Transactions on Power Systems, vol. 27, no. 1, pp. 92–107, 2011.
  • [6] C. Coffrin, R. Bent, K. Sundar, Y. Ng, and M. Lubin, “PowerModels.jl: An open-source framework for exploring power flow formulations,” in 2018 Power Systems Computation Conference (PSCC).   IEEE, 2018, pp. 1–8.
  • [7] A. Zlotnik, M. Chertkov, and S. Backhaus, “Optimal control of transient flow in natural gas networks,” in 2015 54th IEEE conference on decision and control (CDC).   IEEE, 2015, pp. 4563–4570.
  • [8] A. Rantzer, “Dynamic dual decomposition for distributed control,” in 2009 American Control Conference.   IEEE, 2009, pp. 884–888.
  • [9] C. Conte, T. Summers, M. N. Zeilinger, M. Morari, and C. N. Jones, “Computational aspects of distributed optimization in model predictive control,” in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).   IEEE, 2012, pp. 6819–6824.
  • [10] S. Shin, P. Hart, T. Jahns, and V. M. Zavala, “A hierarchical optimization architecture for large-scale power networks,” IEEE Transactions on Control of Network Systems, 2019.
  • [11] A. Engelmann, Y. Jiang, B. Houska, and T. Faulwasser, “Decomposition of non-convex optimization via bi-level distributed ALADIN,” IEEE Transactions on Control of Network Systems, 2020.
  • [12] J. R. Jackson and I. E. Grossmann, “Temporal decomposition scheme for nonlinear multisite production planning and distribution models,” Industrial & engineering chemistry research, vol. 42, no. 13, pp. 3045–3055, 2003.
  • [13] C. Lemaréchal, “Lagrangian relaxation,” in Computational combinatorial optimization.   Springer, 2001, pp. 112–156.
  • [14] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011.
  • [15] S. Shin and V. M. Zavala, “Multi-grid schemes for multi-scale coordination of energy systems,” in Energy Markets and Responsive Grids.   Springer, 2018, pp. 195–222.
  • [16] R. Carli and M. Dotoli, “A distributed control algorithm for waterfilling of networked control systems via consensus,” IEEE control systems letters, vol. 1, no. 2, pp. 334–339, 2017.
  • [17] A. Kozma, C. Conte, and M. Diehl, “Benchmarking large-scale distributed convex quadratic programming algorithms,” Optimization Methods and Software, vol. 30, no. 1, pp. 191–214, 2015.
  • [18] S. Shin, V. M. Zavala, and M. Anitescu, “Decentralized schemes with overlap for solving graph-structured optimization problems,” IEEE Transactions on Control of Network Systems, 2020.
  • [19] A. Frommer and D. B. Szyld, “An algebraic convergence theory for restricted additive Schwarz methods using weighted max norms,” SIAM Journal on Numerical Analysis, vol. 39, no. 2, pp. 463–479, 2001.
  • [20] S. Na and M. Anitescu, “Exponential decay in the sensitivity analysis of nonlinear dynamic programming,” arXiv preprint arXiv:1912.06734, 2019.
  • [21] S. Shin and V. M. Zavala, “Diffusing-horizon model predictive control,” arXiv preprint arXiv:2002.08556, 2020.
  • [22] L. Grüne, M. Schaller, and A. Schiela, “Sensitivity analysis of optimal control for a class of parabolic PDEs motivated by model predictive control,” SIAM Journal on Control and Optimization, vol. 57, no. 4, pp. 2753–2774, 2019.
  • [23] L. Grüne, M. Schaller, and A. Schiela, “Exponential sensitivity and turnpike analysis for linear quadratic optimal control of general evolution equations,” Journal of Differential Equations, vol. 268, no. 12, pp. 7311–7341, 2020.
  • [24] J. Nocedal and S. Wright, Numerical optimization.   Springer Science & Business Media, 2006.
  • [25] A. G. Hadigheh, O. Romanko, and T. Terlaky, “Sensitivity analysis in convex quadratic optimization: simultaneous perturbation of the objective and right-hand-side vectors,” Algorithmic Operations Research, vol. 2, no. 2, pp. 94–94, 2007.
  • [26] J. F. Bonnans and A. Shapiro, Perturbation analysis of optimization problems.   Springer Science & Business Media, 2013.
  • [27] R. A. Horn and C. R. Johnson, Matrix analysis.   Cambridge university press, 2012.
  • [28] J. Sun and L. Tesfatsion, “DC optimal power flow formulation and solution using QuadProgJ,” 2010.
  • [29] I. Dunning, J. Huchette, and M. Lubin, “JuMP: A modeling language for mathematical optimization,” SIAM Review, vol. 59, no. 2, pp. 295–320, 2017.
  • [30] A. Wächter and L. T. Biegler, “On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming,” Mathematical programming, vol. 106, no. 1, pp. 25–57, 2006.
  • [31] S. Babaeinejadsarookolaee, A. Birchfield, R. D. Christie, C. Coffrin, C. DeMarco, R. Diao, M. Ferris, S. Fliscounakis, S. Greene, R. Huang et al., “The power grid library for benchmarking AC optimal power flow algorithms,” arXiv preprint arXiv:1908.02788, 2019.
  • [32] G. Karypis and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular graphs,” SIAM Journal on scientific Computing, vol. 20, no. 1, pp. 359–392, 1998.