
Overlapping Schwarz Decomposition for Constrained Quadratic Programs

Sungho Shin, Mihai Anitescu, Victor M. Zavala. S. Shin is with the Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706 USA (e-mail: [email protected]). M. Anitescu is with the Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL 60439, USA, and also with the Department of Statistics, University of Chicago, Chicago, IL 60637, USA (e-mail: [email protected]). V. M. Zavala is with the Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706 USA, and with the Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL 60439, USA (e-mail: [email protected]).
Abstract

We present an overlapping Schwarz decomposition algorithm for constrained quadratic programs (QPs). Schwarz algorithms have been traditionally used to solve linear algebra systems arising from partial differential equations, but we have recently shown that they are also effective at solving structured optimization problems. In the proposed scheme, we consider QPs whose algebraic structure can be represented by graphs. The graph domain is partitioned into overlapping subdomains (yielding a set of coupled subproblems), solutions for the subproblems are computed in parallel, and convergence is enforced by updating primal-dual information in the overlapping regions. We show that convergence is guaranteed if the overlap is sufficiently large and that the convergence rate improves exponentially with the size of the overlap. Convergence results rely on a key property of graph-structured problems that is known as exponential decay of sensitivity. Here, we establish conditions under which this property holds for constrained QPs (as those found in network optimization and optimal control), thus extending existing work that addresses unconstrained QPs. The numerical behavior of the Schwarz scheme is demonstrated by using a DC optimal power flow problem defined over a network with 9,241 nodes.

I Introduction

Structured optimization problems arise in many applications such as trajectory planning and model predictive control [1], multistage stochastic programming [2, 3], optimization with embedded partial differential equations (PDEs) [4], and in optimization of energy infrastructures [5, 6, 7]. Decomposition schemes that exploit the problem structure have been used to overcome scalability limits of centralized schemes [8, 9, 10, 11, 12]. A wide range of decomposition schemes have been proposed in the literature such as Lagrangian decomposition [13], the alternating direction method of multipliers (ADMM) [14], and Jacobi/Gauss-Seidel methods [15]. The basic tenet behind such algorithms is to decompose the original problem into subproblems and to coordinate subproblem solutions by using primal-dual information. These decomposition schemes are different from waterfilling methods [16] in that the constraints are not necessarily satisfied at each iteration within the algorithm. Moreover, these schemes are advantageous in that they can handle a wide variety of objective/constraint formulations and in that they have well-understood convergence properties. However, a disadvantage of these schemes is that convergence can be rather slow [17].

Overlapping Schwarz decomposition schemes have been recently used to solve structured optimization problems, and they have demonstrated to outperform popular schemes such as ADMM and Jacobi/Gauss-Seidel in certain problem classes [18]. Schwarz algorithms were originally developed for the parallel solution of linear algebra systems arising in PDEs, but such schemes can also be used to handle general linear systems and optimization problems by exploiting their underlying algebraic topology [19]. As the name suggests, overlapping Schwarz algorithms decompose the full problem (the underlying graph) into subproblems that are defined over overlapping subdomains. Solutions for the subproblems are computed in parallel and convergence is enforced by updating primal-dual information in the overlapping regions. In the context of quadratic programs (QP) that are unconstrained and convex, we have shown that the convergence rate of Schwarz algorithms improves exponentially with the size of the overlapping region [18]. Overlapping Schwarz schemes provide a bridge between fully decentralized Jacobi/Gauss-Seidel algorithms (no overlap) and centralized algorithms (the overlap is the entire domain).

This paper presents an overlapping Schwarz algorithm for constrained QPs. We analyze the convergence properties of the algorithm and derive an explicit relationship between its convergence rate and the size of the overlap. This result extends existing convergence results for unconstrained QPs [18] to equality- and inequality-constrained QPs. In particular, we show that the algorithm converges with sufficiently large overlap and that the convergence rate improves exponentially with the size of overlap. This convergence result relies on a property called exponential decay of sensitivity [20, 21, 22, 23, 18]. The property states that the sensitivity of the primal-dual solution at a given node decays exponentially with respect to the distance from the perturbation. Such a property has been established for discrete-time [20, 21] and continuous-time [22, 23] optimal control problems (the graph is a line) and for unconstrained QPs (general graph) [18]. This paper establishes the exponential sensitivity property for constrained QPs over general graphs, thus making the results broadly applicable.

The paper is organized as follows. In the remainder of this section we introduce basic notation and the problem under study. In Section II we introduce the overlapping Schwarz algorithm. In Section III we present the main theoretical results. We first analyze the sensitivity of the solution of structured QPs against parametric perturbations and then use the results to establish convergence conditions for the algorithm. Numerical results are given in Section IV.

Notation. The set of real numbers and the set of integers are denoted by $\mathbb{R}$ and $\mathbb{I}$, respectively, and we define $\mathbb{I}_{a:b}:=\mathbb{I}\cap[a,b]$, $\mathbb{I}_{>0}:=\mathbb{I}\cap(0,\infty)$, $\mathbb{R}_{>0}:=(0,\infty)$, and $\overline{\mathbb{R}}:=\mathbb{R}\cup\{\infty\}$. By default, vectors are assumed to be column vectors, and we use the syntax $(M_{1},\cdots,M_{n}):=[M_{1}^{\top}\,\cdots\,M_{n}^{\top}]^{\top}$, $\{M_{i}\}_{i\in\mathcal{I}}:=(M_{i_{1}},\cdots,M_{i_{m}})$, and $\{M_{i,j}\}_{i\in\mathcal{I},j\in\mathcal{J}}:=\{\{M_{i,j}^{\top}\}_{j\in\mathcal{J}}^{\top}\}_{i\in\mathcal{I}}$, where $\mathcal{I}=\{i_{1}<\cdots<i_{m}\}$ and $\mathcal{J}=\{j_{1}<\cdots<j_{n}\}$. Furthermore, $v[i]$ is the $i$th component of $v$, $M[i,j]$ is the $(i,j)$th component of $M$, $v[\mathcal{I}]:=\{v[i]\}_{i\in\mathcal{I}}$, and $M[\mathcal{I},\mathcal{J}]:=\{M[i,j]\}_{i\in\mathcal{I},j\in\mathcal{J}}$. Vector 2-norms and induced 2-norms are denoted by $\|\cdot\|$. For matrices $A$ and $B$, $A\succeq B$ indicates that $A-B$ is positive semi-definite, while $A\geq B$ represents a componentwise inequality.

Setting. We consider a (potentially infinite) parent graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E}$ is the set of edges. We also consider a finite node subset $U\subseteq\mathcal{V}$ and the following QP:

\[
\min_{\{x_{i}\}_{i\in U}}\;\sum_{i\in U}\sum_{j\in N_{U}[i]}\frac{1}{2}x_{i}^{\top}Q_{i,j}x_{j}-\sum_{i\in U}f_{i}^{\top}x_{i} \tag{1a}
\]
\[
\text{s.t.}\;\sum_{j\in N_{U}[i]}A^{\mathcal{E}}_{i,j}x_{j}=g^{\mathcal{E}}_{i},\;(\lambda^{\mathcal{E}}_{i}),\;i\in U \tag{1b}
\]
\[
\phantom{\text{s.t.}\;}\sum_{j\in N_{U}[i]}A^{\mathcal{I}}_{i,j}x_{j}\geq g^{\mathcal{I}}_{i},\;(\lambda^{\mathcal{I}}_{i}),\;i\in U. \tag{1c}
\]

Here, $x_{i}\in\mathbb{R}^{r_{i}}$ are the decision variables for node $i$; $\lambda^{\mathcal{E}}_{i}\in\mathbb{R}^{m^{\mathcal{E}}_{i}}$ and $\lambda^{\mathcal{I}}_{i}\in\mathbb{R}^{m^{\mathcal{I}}_{i}}$ are the dual variables; $Q_{i,j}\in\mathbb{R}^{r_{i}\times r_{j}}$, $A^{\mathcal{E}}_{i,j}\in\mathbb{R}^{m^{\mathcal{E}}_{i}\times r_{j}}$, $A^{\mathcal{I}}_{i,j}\in\mathbb{R}^{m^{\mathcal{I}}_{i}\times r_{j}}$, $f_{i}\in\mathbb{R}^{r_{i}}$, $g^{\mathcal{E}}_{i}\in\mathbb{R}^{m^{\mathcal{E}}_{i}}$, and $g^{\mathcal{I}}_{i}\in\mathbb{R}^{m^{\mathcal{I}}_{i}}$ are the data; and $N_{U}[X]:=N_{U}(X)\cup X$, where $N_{U}(X):=\{j\in U\setminus X:\exists i\in X\text{ such that }\{i,j\}\in\mathcal{E}\}$ and the argument is treated as a singleton if $X$ is a single node. We define $A_{i,j}:=(A^{\mathcal{E}}_{i,j},A^{\mathcal{I}}_{i,j})$, $g_{i}:=(g^{\mathcal{E}}_{i},g^{\mathcal{I}}_{i})$, $\lambda_{i}:=(\lambda^{\mathcal{E}}_{i},\lambda^{\mathcal{I}}_{i})$, $z_{i}:=(x_{i},\lambda_{i})$, $d_{i}:=(f_{i},g_{i})$, $m_{i}:=m^{\mathcal{E}}_{i}+m^{\mathcal{I}}_{i}$, and $n_{i}:=r_{i}+m_{i}$. We assume that $Q_{i,j}=Q_{j,i}^{\top}$.

An equivalent problem can be written in a compact form:

\[
P_{U}(\boldsymbol{d}_{U}):\;\min_{\boldsymbol{x}_{U}}\;\frac{1}{2}\boldsymbol{x}_{U}^{\top}\boldsymbol{Q}_{U}\boldsymbol{x}_{U}-\boldsymbol{f}^{\top}_{U}\boldsymbol{x}_{U}\quad\text{s.t.}\quad\boldsymbol{A}^{\mathcal{E}}_{U}\boldsymbol{x}_{U}=\boldsymbol{g}^{\mathcal{E}}_{U},\;(\boldsymbol{\lambda}^{\mathcal{E}}_{U}),\qquad\boldsymbol{A}^{\mathcal{I}}_{U}\boldsymbol{x}_{U}\geq\boldsymbol{g}^{\mathcal{I}}_{U},\;(\boldsymbol{\lambda}^{\mathcal{I}}_{U}).
\]

Here, $\boldsymbol{x}_{U}:=\{x_{i}\}_{i\in U}$; $\boldsymbol{\lambda}^{\mathcal{E}}_{U}:=\{\lambda^{\mathcal{E}}_{i}\}_{i\in U}$; $\boldsymbol{\lambda}^{\mathcal{I}}_{U}:=\{\lambda^{\mathcal{I}}_{i}\}_{i\in U}$; $\boldsymbol{\lambda}_{U}:=\{\lambda_{i}\}_{i\in U}$; $\boldsymbol{z}_{U}:=\{z_{i}\}_{i\in U}$; $\boldsymbol{f}_{U}:=\{f_{i}\}_{i\in U}$; $\boldsymbol{g}^{\mathcal{E}}_{U}:=\{g^{\mathcal{E}}_{i}\}_{i\in U}$; $\boldsymbol{g}^{\mathcal{I}}_{U}:=\{g^{\mathcal{I}}_{i}\}_{i\in U}$; $\boldsymbol{g}_{U}:=\{g_{i}\}_{i\in U}$; $\boldsymbol{d}_{U}:=\{d_{i}\}_{i\in U}$; $\boldsymbol{Q}_{U}:=\{Q_{i,j}\}_{i,j\in U}$; $\boldsymbol{A}^{\mathcal{E}}_{U}:=\{A^{\mathcal{E}}_{i,j}\}_{i,j\in U}$; $\boldsymbol{A}^{\mathcal{I}}_{U}:=\{A^{\mathcal{I}}_{i,j}\}_{i,j\in U}$; $\boldsymbol{A}_{U}:=\{A_{i,j}\}_{i,j\in U}$; $r_{U}:=\sum_{i\in U}r_{i}$; $m_{U}:=\sum_{i\in U}m_{i}$; and $n_{U}:=\sum_{i\in U}n_{i}$. The problem in this parametric form is denoted by $P_{U}(\boldsymbol{d}_{U})$.
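To make this assembly concrete, the following minimal sketch (a hypothetical 3-node path graph with scalar blocks; all data made up) stacks node-level blocks $Q_{i,j}$, $A_{i,j}$, $f_i$, $g_i$ into $\boldsymbol{Q}_U$, $\boldsymbol{A}_U$, $\boldsymbol{f}_U$, $\boldsymbol{g}_U$, and $\boldsymbol{d}_U$:

```python
import numpy as np

# Toy 3-node path graph 0-1-2 with one variable and one constraint row per node.
U = [0, 1, 2]
edges = {(0, 1), (1, 2)}
def nbhd(i):                              # closed neighborhood N_U[i]
    return [j for j in U if i == j or (min(i, j), max(i, j)) in edges]

# Node-level blocks: Q[(i,j)] couples x_i and x_j; A[(i,j)] couples row block i and x_j.
Q = {(i, j): np.array([[2.0 if i == j else -0.5]]) for i in U for j in nbhd(i)}
A = {(i, j): np.array([[1.0 if i == j else 0.3]]) for i in U for j in nbhd(i)}
f = {i: np.array([1.0]) for i in U}
g = {i: np.array([0.0]) for i in U}

blk = lambda D, i, j: D.get((i, j), np.zeros((1, 1)))   # zero block if uncoupled
Q_U = np.block([[blk(Q, i, j) for j in U] for i in U])  # stacked Hessian
A_U = np.block([[blk(A, i, j) for j in U] for i in U])  # stacked constraint Jacobian
f_U = np.concatenate([f[i] for i in U])
g_U = np.concatenate([g[i] for i in U])
d_U = np.concatenate([np.concatenate([f[i], g[i]]) for i in U])  # d_i = (f_i, g_i)
```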

II Overlapping Schwarz Algorithm

This section introduces the Schwarz algorithm for the solution of $P_{V}(\boldsymbol{d}_{V})$ (referred to as the full problem) with $V\subseteq\mathcal{V}$. We consider a non-overlapping partition $\{V_{k}\subseteq V\}_{k=1}^{K}$ of $V$ and an overlapping partition $\{W_{k}\subseteq V\}_{k=1}^{K}$ of $V$ such that $V_{k}\subseteq W_{k}$ holds for $k\in\mathbb{I}_{1:K}$. We call $V_{1},\cdots,V_{K}$ the original subdomains and $W_{1},\cdots,W_{K}$ the expanded subdomains. The Schwarz algorithm is stated below.

Algorithm 1 Overlapping Schwarz Algorithm
Input: $\boldsymbol{z}^{(0)}_{V}$, $\{V_{k}\}_{k=1}^{K}$, $\{W_{k}\}_{k=1}^{K}$
1: for $\ell=0,1,\cdots$ do
2:     for (in parallel) $k=1$ to $K$ do
3:         $\boldsymbol{z}^{(\ell+1)}_{V_{k}}=T_{V_{k}\leftarrow W_{k}}\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}})$
4:     end for
5: end for
Output: $\boldsymbol{z}^{(\ell)}_{V}$

Here, we use a syntax that can be applied to any $U\subseteq\mathcal{V}$: $\boldsymbol{H}_{U}:=\{H_{i,j}\}_{i,j\in U}$; $\boldsymbol{H}_{-U}:=\{H_{i,j}\}_{i\in U,j\in N_{V}(U)}$; $H_{i,j}:=[Q_{i,j}\;A_{j,i}^{\top};A_{i,j}\;0]$; and $\boldsymbol{z}_{-U}:=\{z_{i}\}_{i\in N_{V}(U)}$. Furthermore, $\boldsymbol{z}^{*}_{U}(\cdot)$ is the primal-dual solution mapping of the parametric optimization problem $P_{U}(\cdot)$, and $T_{U_{1}\leftarrow U_{2}}:=\{T_{i,j}\}_{i\in U_{1},j\in U_{2}}$, where $U_{1},U_{2}\subseteq\mathcal{V}$ and $T_{i,j}=I_{n_{i}\times n_{i}}$ if $i=j$ and $0_{n_{i}\times n_{j}}$ otherwise. Note that $\boldsymbol{z}_{-U}$ represents the solution information that is complementary to $U$. The full complementary solution information includes the solution on $V\setminus U$; however, the variables and constraints in $U$ are coupled only with $N_{V}(U)$, so it suffices to incorporate information only for the coupled complementary region $N_{V}(U)$. We therefore abuse the term complementary solution to mean the coupled complementary solution $\boldsymbol{z}_{-U}$.

The core part of the algorithm (line 3) consists of three steps: subproblem solution, solution restriction, and primal-dual exchange. In the first step, one formulates the subproblem for the $k$th subdomain as $P_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}})$ (this formulation will be justified later in Lemma 6). The subproblem incorporates the complementary solution information $\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}}$, which is obtained during the third step of the previous iteration. The subproblem is solved to obtain its solution $\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}})$. Here, solution multiplicity exists in the overlapping regions. To remove such multiplicity, we restrict the solution in the second step: we abandon the primal-dual solutions associated with $W_{k}\setminus V_{k}$ (the subdomain region acquired by expansion) and keep only those associated with $V_{k}$ (the original subdomain region). This procedure is represented by the restriction operator $T_{V_{k}\leftarrow W_{k}}$. After restriction, the solutions are assembled over $k\in\mathbb{I}_{1:K}$ to form the next guess of the solution $\boldsymbol{z}^{(\ell+1)}_{V}$. In the third step, the primal-dual solutions are exchanged across the subdomains to update the complementary information $\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}}$ for each subproblem; a sketch of this procedure in code is given below. The schematic of the algorithm is shown in Fig. 1. The algorithm can be implemented in a fully decentralized manner, and different updating schemes can be used (e.g., Gauss-Seidel or asynchronous) [18].
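To illustrate the update on line 3 of Algorithm 1, the sketch below specializes to the equality-constrained case, where the solution mapping $\boldsymbol{z}^{*}_{W_k}(\cdot)$ reduces to a linear solve with the KKT matrix. The function name and the synchronous (Jacobi-type) update loop are our own illustrative choices, not the implementation used in the paper.

```python
import numpy as np

def schwarz_iteration(H, d, V_parts, W_parts, z0, num_iters=100):
    """Jacobi-type overlapping Schwarz on the KKT system H z = d.
    V_parts[k] / W_parts[k] are index arrays for V_k and its expansion W_k."""
    n = len(d)
    z = z0.copy()
    for _ in range(num_iters):
        z_new = z.copy()
        for Vk, Wk in zip(V_parts, W_parts):
            comp = np.setdiff1d(np.arange(n), Wk)     # indices outside W_k
            # subproblem: the complementary solution enters through the rhs
            rhs = d[Wk] - H[np.ix_(Wk, comp)] @ z[comp]
            zW = np.linalg.solve(H[np.ix_(Wk, Wk)], rhs)
            # restriction: keep only the entries of the original subdomain V_k
            mask = np.isin(Wk, Vk)
            z_new[Wk[mask]] = zW[mask]
        z = z_new                                     # synchronous exchange
    return z

# usage on a tridiagonal (path-graph) KKT system with an overlap of one node
H = np.diag(np.full(8, 4.0)) + np.diag(np.ones(7), 1) + np.diag(np.ones(7), -1)
d = np.ones(8)
V_parts = [np.arange(0, 4), np.arange(4, 8)]
W_parts = [np.arange(0, 5), np.arange(3, 8)]
z = schwarz_iteration(H, d, V_parts, W_parts, np.zeros(8))
print(np.linalg.norm(z - np.linalg.solve(H, d)))      # small residual error
```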

Figure 1: Schematic representation of the overlapping Schwarz algorithm.

III Main Results

In this section we analyze the convergence of the Schwarz algorithm. We will see that parametric sensitivity plays a central role in the convergence behavior because, intuitively, the primal-dual solutions of the neighbors of a subdomain enter as parametric perturbations. We first analyze the parametric sensitivity of $P_{U}(\cdot)$ in a way that can be applied to any node subset $U\subseteq\mathcal{V}$. We then exploit this sensitivity result to establish convergence.

III-A Preliminaries

Assumption 1.

The following is assumed for $U\subseteq\mathcal{V}$.

  • (a) $\boldsymbol{Q}_{U}\succeq 0$.

  • (b) Given a convex set $\mathbb{D}_{U}\subseteq\mathbb{R}^{n_{U}}$, for any $P_{U}(\boldsymbol{d}_{U})$ with $\boldsymbol{d}_{U}\in\mathbb{D}_{U}$, there exists a primal-dual solution at which the second-order sufficient condition (SOSC) and the linear independence constraint qualification (LICQ) hold.

By Assumption 1, there exists a unique primal-dual solution of $P_{U}(\boldsymbol{d}_{U})$ for any $\boldsymbol{d}_{U}\in\mathbb{D}_{U}$. Thus, the primal-dual solution mapping $\boldsymbol{z}^{*}_{U}:\mathbb{D}_{U}\rightarrow\mathbb{R}^{n_{U}}$ is well defined.

Definition 1.

Consider $P_{U}(\cdot)$ and $B\subseteq\mathbb{I}_{1:n_{U}}$.

  • (a) $B$ is called a basis if $\boldsymbol{H}_{U}[B,B]$ is nonsingular.

  • (b) $\boldsymbol{z}_{U}(\boldsymbol{d}_{U})\in\mathbb{R}^{n_{U}}$ is called a basic solution of $P_{U}(\boldsymbol{d}_{U})$ if

\[
\boldsymbol{H}_{U}[B,B]\,\boldsymbol{z}_{U}(\boldsymbol{d}_{U})[B]=\boldsymbol{d}_{U}[B], \tag{2a}
\]
\[
\boldsymbol{z}_{U}(\boldsymbol{d}_{U})[\mathbb{I}_{1:n_{U}}\setminus B]=0. \tag{2b}
\]

  • (c) $B$ is feasible (optimal) for $P_{U}(\boldsymbol{d}_{U})$ if its basic solution is feasible (optimal) for $P_{U}(\boldsymbol{d}_{U})$.

Definition 1 is an extension of the notion of basis for linear programs (studied in [21]). Note that the basis associated with a basic solution may not be unique when there exist zero components in $\boldsymbol{x}^{*}_{U}(\boldsymbol{d}_{U})$ or when strict complementarity does not hold. We write $\boldsymbol{z}^{B}_{U}:\mathbb{R}^{n_{U}}\rightarrow\mathbb{R}^{n_{U}}$ for the basic solution mapping associated with $B$.
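In code, the basic solution mapping of Definition 1 amounts to a single linear solve on the basis indices (a minimal sketch; basic_solution is a hypothetical helper operating on dense arrays):

```python
import numpy as np

def basic_solution(H_U, d_U, B):
    """Basic solution z^B_U(d_U) of Definition 1(b): solve the basis block
    H_U[B,B] z[B] = d_U[B] and set the remaining components to zero."""
    z = np.zeros(len(d_U))
    B = np.asarray(B)
    z[B] = np.linalg.solve(H_U[np.ix_(B, B)], d_U[B])
    return z
```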

Lemma 1.

Let Assumption 1 hold. For $\boldsymbol{d}_{U}\in\mathbb{D}_{U}$, there exists $B\in\mathbb{B}_{U}$ that is optimal for $P_{U}(\boldsymbol{d}_{U})$, where $\mathbb{B}_{U}:=\{B\subseteq\mathbb{I}_{1:n_{U}}:\exists\boldsymbol{d}_{U}\in\mathbb{D}_{U}\text{ such that }B\text{ is optimal for }P_{U}(\boldsymbol{d}_{U})\}$.

Proof.

We let

\[
\widehat{\boldsymbol{H}}:=\begin{bmatrix}\boldsymbol{Q}_{U}&\boldsymbol{A}_{U}[\mathcal{A}^{*}(\boldsymbol{d}_{U}),:]^{\top}\\ \boldsymbol{A}_{U}[\mathcal{A}^{*}(\boldsymbol{d}_{U}),:]&0\end{bmatrix}, \tag{3}
\]

where $\mathcal{A}^{*}(\boldsymbol{d}_{U})$ is the active constraint set evaluated at the solution of $P_{U}(\boldsymbol{d}_{U})$. From Assumption 1, $\boldsymbol{A}_{U}[\mathcal{A}^{*}(\boldsymbol{d}_{U}),:]$ has full row rank by LICQ, and the reduced Hessian associated with $\widehat{\boldsymbol{H}}$ is positive definite by SOSC. These imply that $\widehat{\boldsymbol{H}}$ is nonsingular [24, Lemma 16.1]. We choose $B:=\{i\in\mathbb{I}_{1:n_{U}}:\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})[i]\neq 0\}$. One can show that $\boldsymbol{H}_{U}[B,B]$ is a permuted principal submatrix of $\widehat{\boldsymbol{H}}$. Therefore, $\boldsymbol{H}_{U}[B,B]$ is nonsingular, and this yields that $B$ is a basis. From the KKT conditions for $P_{U}(\boldsymbol{d}_{U})$ and the definition of $B$, (2) holds for $\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})$. Thus, we have $\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U})=\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})$. ∎

Lemma 2.

Let $q_{1},\cdots,q_{N_{q}}:(a,b)\rightarrow\mathbb{R}$ be quadratic functions, and let $q:(a,b)\rightarrow\mathbb{R}$ be $q(\cdot):=\min_{i\in\mathbb{I}_{1:N_{q}}}q_{i}(\cdot)$. Then there exists $\{a_{0}=a,a_{1},\cdots,a_{K_{q}}=b\}$ such that for each $k\in\mathbb{I}_{1:K_{q}}$, there exists $i\in\mathbb{I}_{1:N_{q}}$ such that $q_{i}(\cdot)=q(\cdot)$ on $(a_{k-1},a_{k})$.

Proof.

For each $(i,j)\in\mathbb{I}_{1:N_{q}}\times\mathbb{I}_{1:N_{q}}$, we let $I_{i,j}:=\{x\in(a,b):q_{i}(x)\leq q_{j}(x)\}$. Since $q_{i}(\cdot)$ and $q_{j}(\cdot)$ are quadratic, $I_{i,j}$ is a union of intervals (not necessarily open or closed). We then define $I_{i}:=\bigcap_{j\in\mathbb{I}_{1:N_{q}}}I_{i,j}$; since the $I_{i,j}$ are unions of intervals, $I_{i}$ is also a union of intervals. By collecting the endpoints of the intervals in $\{I_{i}:i\in\mathbb{I}_{1:N_{q}}\}$, we can construct $\{a_{0},\cdots,a_{K_{q}}\}$; in particular, $\{a_{0},\cdots,a_{K_{q}}\}=\bigcup_{i\in\mathbb{I}_{1:N_{q}}}\text{closure}(I_{i})\setminus\text{interior}(I_{i})$. We observe that $\bigcup_{i\in\mathbb{I}_{1:N_{q}}}I_{i}=(a,b)$; thus, for any $k\in\mathbb{I}_{1:K_{q}}$, $(a_{k-1},a_{k})\subseteq I_{i}$ for some $i\in\mathbb{I}_{1:N_{q}}$. This means $q_{i}(\cdot)=q(\cdot)$ on $(a_{k-1},a_{k})$. The proof is complete. ∎

Lemma 3.

Let Assumption 1 hold. For $\boldsymbol{d}_{U},\boldsymbol{d}_{U}^{\prime}\in\mathbb{D}_{U}$, there exist $\{s_{0}=0<\cdots<s_{N_{d}}=1\}$ and $\{B_{k}\in\mathbb{B}_{U}\}_{k=1}^{N_{d}}$ such that for $k\in\mathbb{I}_{1:N_{d}}$, $B_{k}$ is optimal for $P_{U}(\boldsymbol{d}^{s}_{U})$ for any $s\in[s_{k-1},s_{k}]$, where $\boldsymbol{d}^{s}_{U}:=(1-s)\boldsymbol{d}_{U}+s\boldsymbol{d}^{\prime}_{U}$.

Proof.

Let $\mathbb{B}_{U}=\{B_{(1)},\cdots,B_{(T)}\}$ (note that $\mathbb{B}_{U}$ is finite). We define $\pi_{(t)}:[0,1]\rightarrow\overline{\mathbb{R}}$ to be the mapping from $s\in[0,1]$ to the objective value of $\boldsymbol{z}^{B_{(t)}}_{U}(\boldsymbol{d}^{s}_{U})$ for $P_{U}(\boldsymbol{d}^{s}_{U})$ (the value is $+\infty$ if $B_{(t)}$ is infeasible). Also, we define $\pi:[0,1]\rightarrow\overline{\mathbb{R}}$ by $\pi(s):=\min_{t\in\mathbb{I}_{1:T}}\pi_{(t)}(s)$. By Lemma 1, $\pi(\cdot)$ is the mapping from $s\in[0,1]$ to the objective value of $P_{U}(\boldsymbol{d}^{s}_{U})$.

One can see that $\boldsymbol{z}^{B_{(t)}}_{U}(\boldsymbol{d}^{s}_{U})$ is affine in $s$; thus the feasibility conditions for $\boldsymbol{z}^{B_{(t)}}_{U}(\boldsymbol{d}^{s}_{U})$ can be expressed by a finite number of affine equalities and inequalities. This implies that the set of $s\in[0,1]$ on which $\pi_{(t)}(s)<\infty$ is a closed interval in $[0,1]$. Accordingly, $\pi_{(t)}(\cdot)$ is a quadratic function supported on a closed interval. Now we collect the endpoints of such intervals over $t\in\mathbb{I}_{1:T}$ to construct $\widetilde{\Pi}:=\{\widetilde{s}_{0}=0,\cdots,\widetilde{s}_{\widetilde{N}_{d}}=1\}$. For each $k\in\mathbb{I}_{1:\widetilde{N}_{d}}$, we collect $\mathcal{T}_{k}:=\{t\in\mathbb{I}_{1:T}:\pi_{(t)}(\cdot)<\infty\text{ on }(\widetilde{s}_{k-1},\widetilde{s}_{k})\}$. Observe that (i) $\pi_{(t)}$ with $t\in\mathcal{T}_{k}$ are quadratic on $(\widetilde{s}_{k-1},\widetilde{s}_{k})$ and (ii) $\pi(\cdot)=\min_{t\in\mathcal{T}_{k}}\pi_{(t)}(\cdot)$ on $(\widetilde{s}_{k-1},\widetilde{s}_{k})$. By Lemma 2 (applicable due to (i)), each $(\widetilde{s}_{k-1},\widetilde{s}_{k})$ can be further divided by using $\widetilde{\widetilde{\Pi}}_{k}:=\{\widetilde{\widetilde{s}}_{k,0}=\widetilde{s}_{k-1},\cdots,\widetilde{\widetilde{s}}_{k,\widetilde{\widetilde{N}}_{d,k}}=\widetilde{s}_{k}\}$, where for each $(\widetilde{\widetilde{s}}_{k,k^{\prime}-1},\widetilde{\widetilde{s}}_{k,k^{\prime}})$ with $k^{\prime}\in\mathbb{I}_{1:\widetilde{\widetilde{N}}_{d,k}}$ and $k\in\mathbb{I}_{1:\widetilde{N}_{d}}$, there exists $t\in\mathcal{T}_{k}$ such that $\pi_{(t)}=\min_{t^{\prime}\in\mathcal{T}_{k}}\pi_{(t^{\prime})}(\cdot)=\pi(\cdot)$ (recall observation (ii)).

We now let $\{s_{0},\cdots,s_{N_{d}}\}=\bigcup_{k\in\mathbb{I}_{1:\widetilde{N}_{d}}}\widetilde{\widetilde{\Pi}}_{k}$. One can observe that, for each $k\in\mathbb{I}_{1:N_{d}}$, there exists $t\in\mathbb{I}_{1:T}$ such that $\pi_{(t)}(\cdot)=\pi(\cdot)$ on $(s_{k-1},s_{k})$; we choose such $B_{(t)}$ as $B_{k}$. It is known that the objective value mapping of a QP is continuous on its support (e.g., see [25, Corollary 9] or [26, Theorem 5.53]); thus $\pi(\cdot)$ is continuous on its support. By the continuity of $\pi(\cdot)$ and $\pi_{(t)}(\cdot)$ on their supports, $\pi_{(t)}(\cdot)=\pi(\cdot)$ on $(s_{k-1},s_{k})$ implies that the same holds on $[s_{k-1},s_{k}]$. Finally, one can check that $B_{k}$ is optimal for $P_{U}(\boldsymbol{d}^{s}_{U})$ for $s\in[s_{k-1},s_{k}]$. The desired $\{s_{k}\}_{k=0}^{N_{d}}$ and $\{B_{k}\}_{k=1}^{N_{d}}$ are thus obtained. ∎

An important implication of Lemma 3 is that the solution path obtained by perturbing $\boldsymbol{d}_{U}$ can be divided into multiple paths, each of which is a basic solution mapping. Thus, given Lemma 3, it suffices to study the sensitivities of the basic solution mappings.

III-B Exponential Decay of Sensitivity

We now establish our main sensitivity result for the constrained QP, known as exponential decay of sensitivity.

Theorem 1 (Exponential Decay of Sensitivity).

Let Assumption 1 hold. The following holds for $\boldsymbol{d}_{U},\boldsymbol{d}^{\prime}_{U}\in\mathbb{D}_{U}$:

\[
\left\|T_{i\leftarrow U}\left(\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})-\boldsymbol{z}^{*}_{U}(\boldsymbol{d}^{\prime}_{U})\right)\right\|\leq\sum_{j\in U}\Gamma_{U}\,\rho_{U}^{\lceil\frac{\Delta_{U}(i,j)-1}{2}\rceil}\,\|d_{j}-d^{\prime}_{j}\|
\]

with $\Gamma_{U}:=\overline{\sigma}_{U}/\underline{\sigma}^{2}_{U}$; $\rho_{U}:=(\overline{\sigma}_{U}^{2}-\underline{\sigma}_{U}^{2})/(\overline{\sigma}_{U}^{2}+\underline{\sigma}_{U}^{2})$; $\underline{\sigma}_{U}:=\min_{B\in\mathbb{B}_{U}}\sigma_{\min}(\boldsymbol{H}_{U}[B,B])$; and $\overline{\sigma}_{U}:=\max_{B\in\mathbb{B}_{U}}\sigma_{\max}(\boldsymbol{H}_{U}[B,B])$.

Here $\lceil\cdot\rceil$ denotes the ceiling operator; $\Delta_{U}(i,j)$ denotes the geodesic distance between $i,j\in U$ on the subgraph of $\mathcal{G}(\mathcal{V},\mathcal{E})$ induced by $U$ (the number of edges in the shortest path $\{e_{q}\in\mathcal{E}:e_{q}\subseteq U\}_{q=1}^{n_{e}}$ from $i$ to $j$); and $\sigma_{\min}$ and $\sigma_{\max}$ denote the minimum and maximum singular values of their matrix argument.

To facilitate the discussion, we define for $M\in\mathbb{R}^{|B|\times|B|}$ and $v\in\mathbb{R}^{|B|}$ the following: $M_{[i][j]}:=M[\mathcal{I}_{i},\mathcal{I}_{j}]$ and $v_{[i]}:=v[\mathcal{I}_{i}]$, where $\mathcal{I}_{i}:=\{q\in\mathbb{I}_{1:|B|}:b_{q}\in\{\sum_{k\in U,k<i}n_{k}+1,\cdots,\sum_{k\in U,k\leq i}n_{k}\}\}$ and $B=\{b_{1}<\cdots<b_{|B|}\}$. Note that $\mathcal{I}_{i}$ is the index set that extracts the indices associated with $z_{i}$, and $\{\mathcal{I}_{i}\}_{i\in U}$ partitions $\mathbb{I}_{1:|B|}$.

Lemma 4.

If $\Delta_{U}(i,j)>q\in\mathbb{I}_{>0}$, then $(\boldsymbol{H}_{U}[B,B]^{q})_{[i][j]}=0$.

Proof.

We use the notation $\widetilde{\boldsymbol{H}}:=\boldsymbol{H}_{U}[B,B]$ and proceed by induction. From the fact that $H_{i,j}=0$ if $\Delta_{U}(i,j)>1$, one can observe that $\widetilde{\boldsymbol{H}}_{[i][j]}=0$ if $\Delta_{U}(i,j)>1$; hence, the claim holds for $q=1$. Now suppose the claim holds for $q=q^{\prime}$. One can easily see that the triangle inequality holds for the distance $\Delta_{U}(\cdot,\cdot)$. From the triangle inequality, if $\Delta_{U}(i,j)>q^{\prime}+1$, then either $\Delta_{U}(i,l)>q^{\prime}$ or $\Delta_{U}(l,j)>1$ holds for any $l\in U$. Thus, if $\Delta_{U}(i,j)>q^{\prime}+1$, then $(\widetilde{\boldsymbol{H}}^{q^{\prime}+1})_{[i][j]}=\sum_{l\in U}(\widetilde{\boldsymbol{H}}^{q^{\prime}})_{[i][l]}\widetilde{\boldsymbol{H}}_{[l][j]}=0$. ∎
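Lemma 4 is easy to check numerically; the following sketch verifies it for scalar blocks on a path graph (hypothetical data):

```python
import numpy as np

# Check of Lemma 4 on a 6-node path graph: (H^q)[i][j] = 0 whenever the
# graph distance between nodes i and j exceeds q.
n = 6
H = np.diag(np.full(n, 4.0)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
for q in range(1, 4):
    Hq = np.linalg.matrix_power(H, q)
    for i in range(n):
        for j in range(n):
            if abs(i - j) > q:            # |i - j| is the path-graph distance
                assert abs(Hq[i, j]) < 1e-12
print("Lemma 4 verified on the path graph for q = 1, 2, 3")
```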

Lemma 5.

If $B\in\mathbb{B}_{U}$, then $\|(\boldsymbol{H}_{U}[B,B]^{-1})_{[i][j]}\|\leq\Gamma_{U}\,\rho_{U}^{\lceil\frac{\Delta_{U}(i,j)-1}{2}\rceil}$.

Proof.

By definition, $\underline{\sigma}_{U}^{2}I\preceq\widetilde{\boldsymbol{H}}^{2}\preceq\overline{\sigma}_{U}^{2}I$, and thus

\[
\frac{\underline{\sigma}_{U}^{2}-\overline{\sigma}_{U}^{2}}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}I\preceq I-\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}^{2}\preceq\frac{\overline{\sigma}_{U}^{2}-\underline{\sigma}_{U}^{2}}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}I. \tag{4}
\]

Moreover,

\[
\widetilde{\boldsymbol{H}}^{-1}=\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}\left(\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}^{2}\right)^{-1} \tag{5a}
\]
\[
\phantom{\widetilde{\boldsymbol{H}}^{-1}}=\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\sum_{q=0}^{\infty}\widetilde{\boldsymbol{H}}\left(I-\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}^{2}\right)^{q}, \tag{5b}
\]

where the second equality follows from [27, Theorem 5.6.9 and Corollary 5.6.16], (4), and the fact that the series in (5b) converges. By Lemma 4, if $\Delta_{U}(i,j)>2q+1$, then

\[
\left(\widetilde{\boldsymbol{H}}\left(I-\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\widetilde{\boldsymbol{H}}^{2}\right)^{q}\right)_{[i][j]}=0 \tag{6}
\]

holds. By extracting submatrices from (5), one establishes

\[
(\widetilde{\boldsymbol{H}}^{-1})_{[i][j]}=\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\sum_{q=q_{0}}^{\infty}\left(\widetilde{\boldsymbol{H}}\left(I-\frac{2\widetilde{\boldsymbol{H}}^{2}}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\right)^{q}\right)_{[i][j]},
\]

where $q_{0}:=\lceil\frac{\Delta_{U}(i,j)-1}{2}\rceil$, since the summation over $q=0,\cdots,q_{0}-1$ adds up to zero by (6). Using the triangle inequality and the fact that the norm of a submatrix is always smaller than that of the original matrix, we have

\[
\|(\widetilde{\boldsymbol{H}}^{-1})_{[i][j]}\|\leq\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\sum_{q=q_{0}}^{\infty}\left\|\widetilde{\boldsymbol{H}}\left(I-\frac{2\widetilde{\boldsymbol{H}}^{2}}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\right)^{q}\right\|\leq\frac{2}{\underline{\sigma}_{U}^{2}+\overline{\sigma}_{U}^{2}}\sum_{q=q_{0}}^{\infty}\overline{\sigma}_{U}\left(\frac{\overline{\sigma}_{U}^{2}-\underline{\sigma}_{U}^{2}}{\overline{\sigma}_{U}^{2}+\underline{\sigma}_{U}^{2}}\right)^{q}\leq\frac{\overline{\sigma}_{U}}{\underline{\sigma}_{U}^{2}}\left(\frac{\overline{\sigma}_{U}^{2}-\underline{\sigma}_{U}^{2}}{\overline{\sigma}_{U}^{2}+\underline{\sigma}_{U}^{2}}\right)^{\lceil(\Delta_{U}(i,j)-1)/2\rceil}, \tag{7}
\]

where the second inequality follows from the submultiplicativity of the matrix norm and the fact that the induced 2-norm of a symmetric matrix equals the largest absolute value of its eigenvalues, and the last inequality follows from the geometric series. ∎

Proof of Theorem 1.

From Lemma 3, we have

\[
\boldsymbol{z}^{*}_{U}(\boldsymbol{d}^{\prime}_{U})-\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U})=\sum_{k=1}^{N_{d}}\boldsymbol{z}^{B_{k}}_{U}(\boldsymbol{d}^{s_{k}}_{U})-\boldsymbol{z}^{B_{k}}_{U}(\boldsymbol{d}^{s_{k-1}}_{U}). \tag{8}
\]

The solution does not jump when the basis changes, because of solution uniqueness. We write $B:=B_{k}$ and $\widetilde{\boldsymbol{H}}:=\boldsymbol{H}_{U}[B,B]$. From the block multiplication formula and $\boldsymbol{d}^{s_{k}}_{U}-\boldsymbol{d}^{s_{k-1}}_{U}=(s_{k}-s_{k-1})(\boldsymbol{d}^{\prime}_{U}-\boldsymbol{d}_{U})$, we have

\[
\left(\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U}^{s_{k}})[B]-\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U}^{s_{k-1}})[B]\right)_{[i]}=\sum_{j\in U}\left(\widetilde{\boldsymbol{H}}^{-1}\right)_{[i][j]}(s_{k}-s_{k-1})(\boldsymbol{d}^{\prime}_{U}[B]-\boldsymbol{d}_{U}[B])_{[j]}. \tag{9}
\]

From the definition of bases and basic solutions, we have

\[
\left\|\left(\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U}^{s_{k}})[B]-\boldsymbol{z}^{B}_{U}(\boldsymbol{d}_{U}^{s_{k-1}})[B]\right)_{[i]}\right\|=\left\|T_{i\leftarrow U}(\boldsymbol{z}^{B}_{U}(\boldsymbol{d}^{s_{k}}_{U})-\boldsymbol{z}^{B}_{U}(\boldsymbol{d}^{s_{k-1}}_{U}))\right\|. \tag{10}
\]

From (9)-(10), the triangle inequality, the submultiplicativity of matrix norms, Lemma 5, and the fact that the norm of a subvector is not greater than the norm of the original vector,

\[
\left\|T_{i\leftarrow U}(\boldsymbol{z}^{B}_{U}(\boldsymbol{d}^{s_{k}}_{U})-\boldsymbol{z}^{B}_{U}(\boldsymbol{d}^{s_{k-1}}_{U}))\right\|\leq\sum_{j\in U}\Gamma_{U}\,\rho_{U}^{\lceil(\Delta_{U}(i,j)-1)/2\rceil}(s_{k}-s_{k-1})\|d_{j}-d^{\prime}_{j}\|. \tag{11}
\]

By multiplying (8) by $T_{i\leftarrow U}$ and applying (11) and the triangle inequality, we obtain the result. ∎

The coefficient $\Gamma_{U}\,\rho_{U}^{\lceil\frac{\Delta_{U}(i,j)-1}{2}\rceil}$ in Theorem 1 is the componentwise Lipschitz constant of the solution mapping $\boldsymbol{z}^{*}_{U}(\cdot)$. Recall that $\rho_{U}\in(0,1)$; hence, the coefficient decays exponentially with the distance $\Delta_{U}(i,j)$. Therefore, the effect of a perturbation decays as one moves away from the perturbation location.
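This decay can be observed directly. The sketch below (a path graph with scalar blocks and no inequality constraints, so that the solution mapping is a single linear solve; data made up) perturbs the data at node 0 and prints how the solution change shrinks along the graph:

```python
import numpy as np

# Illustration of Theorem 1 on a 30-node path graph: a perturbation of the
# data at node 0 changes the solution at node i by an amount that decays
# exponentially with the distance from node 0.
n = 30
H = np.diag(np.full(n, 4.0)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
d = np.ones(n)
d_pert = d.copy()
d_pert[0] += 1.0                             # perturb the data at node 0
dz = np.linalg.solve(H, d_pert) - np.linalg.solve(H, d)
print(np.abs(dz[:8]))                        # magnitudes shrink geometrically
```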

III-C Convergence Analysis

In this subsection we formally establish the convergence of Algorithm 1. To facilitate the discussion, we first introduce some notation: $\boldsymbol{z}^{\dagger}_{U}:=\{z^{\dagger}_{i}\}_{i\in U}$, where $z^{\dagger}_{i}:=T_{i\leftarrow V}\boldsymbol{z}^{*}_{V}(\boldsymbol{d}_{V})$ and $U\subseteq V$. Furthermore, we define $\|\cdot\|_{U,\infty}:=\max_{i\in U}\|T_{i\leftarrow U}(\cdot)\|$ and $\|\cdot\|_{U,U^{\prime},1}:=\sum_{i\in U}\sum_{j\in U^{\prime}}\|T_{i\leftarrow U}(\cdot)T^{\top}_{j\leftarrow U^{\prime}}\|$.

Assumption 2.

Assumption 1 holds with $\mathbb{D}_{U}:=\{\boldsymbol{d}_{U}-\boldsymbol{H}_{-U}\boldsymbol{z}_{-U}:\boldsymbol{z}_{-U}\in\mathbb{R}^{n_{-U}}\}$ for any $U\subseteq V$.

Here $n_{-U}:=\sum_{i\in N_{V}(U)}n_{i}$. While Assumption 2 is strong, we believe that it can be relaxed; doing so meaningfully, however, is a technically extensive endeavor beyond the scope of this communication. We note that the assumption is always satisfied for bound constraints and for augmented Lagrangian reformulations of the original problem (introduce slacks so that the only inequality constraints are bounds and then penalize all equality constraints); such reformulations can approximate the solution set of the original problem arbitrarily well.

Assumption 3.

$\omega:=\min_{k\in\mathbb{I}_{1:K}}\Delta_{V}(V_{k},N_{V}(W_{k}))-1\geq 1$.

Here, we abuse the notation by letting $\Delta_{V}(U,U^{\prime}):=\min_{u\in U,u^{\prime}\in U^{\prime}}\Delta_{V}(u,u^{\prime})$, where $U,U^{\prime}\subseteq V$. We call $\omega$ the size of overlap. Note that an overlapping partition $\{W_{k}\}_{k=1}^{K}$ with size $\omega$ can be constructed from a non-overlapping partition $\{V_{k}\}_{k=1}^{K}$ by expanding each original subdomain using $W_{k}=\{i\in V:\Delta_{V}(i,V_{k})\leq\omega\}$; a sketch of this construction is given below.
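A minimal sketch of this expansion (assuming an adjacency-list representation of the graph; the helper name is hypothetical) is:

```python
from collections import deque

def expand(Vk, adj, omega):
    """Expand V_k to W_k = {i : Delta_V(i, V_k) <= omega} by breadth-first
    search up to depth omega. adj[i] lists the neighbors of node i."""
    dist = {i: 0 for i in Vk}
    queue = deque(Vk)
    while queue:
        u = queue.popleft()
        if dist[u] == omega:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(dist)

# example: path graph 0-1-2-3-4-5, V_1 = {0,1,2}, omega = 1 -> W_1 = [0,1,2,3]
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
print(expand([0, 1, 2], adj, 1))
```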

Assumption 4.

$\underline{\sigma}:=\inf_{U\subseteq\mathcal{V}}\underline{\sigma}_{U}>0$ and $\overline{\sigma}:=\sup_{U\subseteq\mathcal{V}}\overline{\sigma}_{U}<\infty$.

Assumption 4 trivially holds if the parent graph $\mathcal{G}(\mathcal{V},\mathcal{E})$ is finite, but it may be violated if the parent graph is infinite; in particular, $\underline{\sigma}_{V}$ or $\overline{\sigma}_{V}$ may tend to zero or infinity as $V$ grows. Checking the validity of Assumption 4 for the infinite parent graph case is beyond the scope of this paper; sufficient conditions for it to hold will be studied in the future.

We note, however, that Assumptions 2 and 4 hold for augmented Lagrangian reformulations with bounded data whenever the objective matrix has bounded entries and is strongly diagonally dominant; this is a way to approximate most QPs with some regularization. In contrast, Assumption 3 can be satisfied by construction. Therefore, our setup covers a large set of problems or close approximations thereof.

Lemma 6.

Let Assumption 2 hold. For any $U\subseteq V$ we have that $\boldsymbol{z}^{\dagger}_{U}=\boldsymbol{z}^{*}_{U}(\boldsymbol{d}_{U}-\boldsymbol{H}_{-U}\boldsymbol{z}^{\dagger}_{-U})$.

Proof.

Since $P_{U}(\cdot)$ is a convex QP, the KKT conditions are necessary and sufficient for optimality. By Assumption 2, the primal-dual solution is unique; therefore, it suffices to prove that $\boldsymbol{z}^{\dagger}_{U}$ satisfies the KKT conditions of $P_{U}(\boldsymbol{d}_{U}-\boldsymbol{H}_{-U}\boldsymbol{z}^{\dagger}_{-U})$. From the KKT conditions of $P_{V}(\boldsymbol{d}_{V})$, we have

\[
\boldsymbol{Q}_{V}\boldsymbol{x}^{\dagger}_{V}+\boldsymbol{A}_{V}^{\top}\boldsymbol{\lambda}_{V}^{\dagger}=\boldsymbol{f}_{V},\quad\boldsymbol{A}_{V}^{\mathcal{E}}\boldsymbol{x}_{V}^{\dagger}=\boldsymbol{g}_{V}^{\mathcal{E}},\quad\boldsymbol{A}_{V}^{\mathcal{I}}\boldsymbol{x}_{V}^{\dagger}\geq\boldsymbol{g}_{V}^{\mathcal{I}},
\]
\[
\boldsymbol{\lambda}_{V}^{{\dagger},\mathcal{I}}\geq 0,\quad\text{diag}(\boldsymbol{\lambda}_{V}^{{\dagger},\mathcal{I}})(\boldsymbol{A}_{V}^{\mathcal{I}}\boldsymbol{x}_{V}^{\dagger}-\boldsymbol{g}_{V}^{\mathcal{I}})=0. \tag{12}
\]

By extracting the rows associated with $U$ and rearranging the equations and inequalities, we obtain

\[
\boldsymbol{Q}_{U}\boldsymbol{x}^{\dagger}_{U}+\boldsymbol{A}_{U}^{\top}\boldsymbol{\lambda}^{\dagger}_{U}=\boldsymbol{f}_{U}-\boldsymbol{Q}_{-U}\boldsymbol{x}^{\dagger}_{-U}-\boldsymbol{A}^{\top}_{-U}\boldsymbol{\lambda}^{\dagger}_{-U},
\]
\[
\boldsymbol{A}^{\mathcal{E}}_{U}\boldsymbol{x}^{\dagger}_{U}=\boldsymbol{g}^{\mathcal{E}}_{U}-\boldsymbol{A}^{\mathcal{E}}_{-U}\boldsymbol{x}^{\dagger}_{-U},\quad\boldsymbol{A}^{\mathcal{I}}_{U}\boldsymbol{x}^{\dagger}_{U}\geq\boldsymbol{g}^{\mathcal{I}}_{U}-\boldsymbol{A}^{\mathcal{I}}_{-U}\boldsymbol{x}^{\dagger}_{-U}, \tag{13}
\]
\[
\boldsymbol{\lambda}^{{\dagger},\mathcal{I}}_{U}\geq 0,\quad\text{diag}(\boldsymbol{\lambda}^{{\dagger},\mathcal{I}}_{U})(\boldsymbol{A}^{\mathcal{I}}_{U}\boldsymbol{x}^{\dagger}_{U}-\boldsymbol{g}^{\mathcal{I}}_{U}+\boldsymbol{A}_{-U}^{\mathcal{I}}\boldsymbol{x}^{\dagger}_{-U})=0.
\]

Here, note that $\boldsymbol{A}_{-U}:=\{A_{i,j}\}_{i\in U,j\in N_{V}(U)}$ and $\boldsymbol{A}^{\top}_{-U}:=\{A_{j,i}^{\top}\}_{i\in U,j\in N_{V}(U)}$ (they are not transposes of each other). Conditions (13) imply that $\boldsymbol{z}^{\dagger}_{U}$ satisfies the KKT conditions for $P_{U}(\boldsymbol{d}_{U}-\boldsymbol{H}_{-U}\boldsymbol{z}^{\dagger}_{-U})$; thus, it is the solution. ∎

Lemma 6 provides a family of consistent subproblems whose solutions recover the corresponding pieces of the full solution as long as the complementary solution information is accurate. Thus, it justifies the subproblem formulation in Algorithm 1; a numerical check is sketched below.
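Lemma 6 can be checked numerically in the equality-constrained case, where the KKT conditions form a linear system (toy tridiagonal KKT matrix; data made up):

```python
import numpy as np

# Check of Lemma 6: the restriction of the full solution z† to U solves the
# subproblem on U whose data are corrected with the exact complementary
# solution on N_V(U).
rng = np.random.default_rng(0)
n = 8
off = rng.uniform(-1.0, 1.0, n - 1)
H = np.diag(np.full(n, 5.0)) + np.diag(off, 1) + np.diag(off, -1)  # path-graph coupling
d = rng.normal(size=n)
z_dagger = np.linalg.solve(H, d)           # full solution z†
U = np.arange(2, 6)                        # node subset U
NU = np.array([1, 6])                      # coupled complement N_V(U)
rhs = d[U] - H[np.ix_(U, NU)] @ z_dagger[NU]
assert np.allclose(np.linalg.solve(H[np.ix_(U, U)], rhs), z_dagger[U])
```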

The main result of the paper is stated as follows.

Theorem 2 (Convergence of Overlapping Schwarz).

Let Assumptions 2-4 hold. The sequence $\{\boldsymbol{z}^{(\ell)}_{V}\}_{\ell=0}^{\infty}$ generated by Algorithm 1 satisfies

\[
\|\boldsymbol{z}^{(\ell)}_{V}-\boldsymbol{z}^{\dagger}_{V}\|_{V,\infty}\leq\left(\alpha_{V}(\omega)\right)^{\ell}\|\boldsymbol{z}^{(0)}_{V}-\boldsymbol{z}^{\dagger}_{V}\|_{V,\infty}. \tag{14}
\]

Here, $\alpha_{V}(\omega):=R_{V}\Gamma\rho^{\lceil(\omega-1)/2\rceil}$; $\Gamma:=\overline{\sigma}/\underline{\sigma}^{2}$; $\rho:=(\overline{\sigma}^{2}-\underline{\sigma}^{2})/(\overline{\sigma}^{2}+\underline{\sigma}^{2})$; and $R_{V}:=\max_{U\subseteq V}\|\boldsymbol{H}_{-U}\|_{U,-U,1}$.

Proof.

By Lemma 6, we have that, for any $i\in V_{k}$,

\[
\left\|z^{(\ell+1)}_{i}-z^{\dagger}_{i}\right\|=\left\|T_{i\leftarrow W_{k}}\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell)}_{-W_{k}})-T_{i\leftarrow W_{k}}\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{\dagger}_{-W_{k}})\right\|\leq\sum_{j\in W_{k}}\Gamma_{W_{k}}\,\rho^{\lceil(\Delta_{W_{k}}(i,j)-1)/2\rceil}_{W_{k}}\left\|T_{j\leftarrow W_{k}}\boldsymbol{H}_{-W_{k}}(\boldsymbol{z}^{(\ell)}_{-W_{k}}-\boldsymbol{z}^{\dagger}_{-W_{k}})\right\|,
\]

where the inequality follows from Theorem 1. A key observation is that $T_{j\leftarrow W_{k}}\boldsymbol{H}_{-W_{k}}\neq 0$ only if $\Delta_{V}(j,N_{V}(W_{k}))=1$. Such a $j$ satisfies $\Delta_{W_{k}}(i,j)\geq\omega$ for any $i\in V_{k}$, by the definition of $\omega$, the triangle inequality for $\Delta_{V}(\cdot,\cdot)$, and $\Delta_{W_{k}}(i,j)\geq\Delta_{V}(i,j)$. Therefore, for any $i\in V$, we have

\[
\left\|z^{(\ell+1)}_{i}-z^{\dagger}_{i}\right\|\leq\max_{k\in\mathbb{I}_{1:K}}\underbrace{\|\boldsymbol{H}_{-W_{k}}\|_{W_{k},-W_{k},1}\,\Gamma_{W_{k}}\,\rho^{\lceil\frac{\omega-1}{2}\rceil}_{W_{k}}}_{\alpha_{k}}\left\|\boldsymbol{z}^{(\ell)}_{V}-\boldsymbol{z}^{\dagger}_{V}\right\|_{V,\infty}.
\]

One can show that $\max_{k\in\mathbb{I}_{1:K}}\alpha_{k}\leq\alpha_{V}(\omega)$ under Assumptions 3-4. This implies $\|\boldsymbol{z}^{(\ell+1)}_{V}-\boldsymbol{z}^{\dagger}_{V}\|_{V,\infty}\leq\alpha_{V}(\omega)\|\boldsymbol{z}^{(\ell)}_{V}-\boldsymbol{z}^{\dagger}_{V}\|_{V,\infty}$, which yields (14). ∎

The upper bound on the convergence rate, $\alpha_{V}(\omega)$, decays as the size of overlap $\omega$ increases (recall that $\rho\in(0,1)$). In the case of a finite parent graph, Algorithm 1 converges if $\omega$ is sufficiently large, since either (i) $\alpha_{V}(\omega)<1$ or (ii) each subproblem becomes the full problem (the solution is obtained in one iteration).

If the parent graph is infinite (such a setting is relevant for time grids, unbounded physical space domains, and scenario trees), we can make $V$ arbitrarily large, and thus the limiting behavior of $\alpha_{V}$ is of interest. Theorem 2 suggests that the constant $\Gamma$ and the exponential decay factor $\rho$ do not deteriorate as $V$ grows, but $R_{V}$ may grow with $V$. Here, $R_{V}$ represents the maximum coupling between a subdomain $U\subseteq V$ and its surroundings $V\setminus U$. Accordingly, the growth of $R_{V}$ is determined by the topology (as long as the $\|H_{i,j}\|$ are uniformly bounded). In particular, if the parent graph $\mathcal{G}(\mathcal{V},\mathcal{E})$ is finite-dimensional (e.g., grid points in $\mathbb{I}^{d}$ with $d<\infty$), $R_{V}$ grows polynomially with the diameter of $V$. In such a case, a sufficiently large $\omega$ can be found so that the decay of $\rho^{\lceil(\omega-1)/2\rceil}$ offsets the growth of $R_{V}$ (an exponentially decaying function times a polynomial converges). On the other hand, in the case of scenario trees, where $R_{V}$ may grow exponentially, the bound derived in Theorem 2 may not be tight, and $\alpha_{V}(\omega)$ may diverge as $V$ grows no matter how large $\omega$ is.

Note that there exists an inherent trade-off between the convergence rate and the subproblem complexity: the convergence rate improves with $\omega$, but the subproblem solution times also increase with $\omega$. Therefore, to achieve maximum performance, one needs to tune $\omega$.

III-D Monitoring Convergence

Convergence can be monitored by checking the residuals of the KKT conditions of the full problem $P_{V}(\boldsymbol{d}_{V})$. However, a more convenient surrogate of the full KKT residuals can be derived as follows.

Proposition 1.

Suppose that Assumptions 2-3 hold, let $\{\boldsymbol{z}_{V}^{(\ell)}\}_{\ell=0}^{\infty}$ be a sequence generated by Algorithm 1, and let $\boldsymbol{z}^{(\ell,k)}_{W_{k}}:=\boldsymbol{z}^{*}_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell-1)}_{-W_{k}})$. Then $\boldsymbol{z}_{V}^{(\ell)}\rightarrow\boldsymbol{z}^{\dagger}_{V}$ as $\ell\rightarrow\infty$ if the following holds for $k\in\mathbb{I}_{1:K}$:

\[
E^{(\ell)}_{k}:=\boldsymbol{z}^{(\ell,k)}_{N_{V}(V_{k})}-\boldsymbol{z}^{(\ell)}_{N_{V}(V_{k})}\rightarrow 0\;\text{as}\;\ell\rightarrow\infty. \tag{15}
\]
Proof.

We make two observations. (i) The KKT conditions of $P_{W_{k}}(\boldsymbol{d}_{W_{k}}-\boldsymbol{H}_{-W_{k}}\boldsymbol{z}^{(\ell-1)}_{-W_{k}})$ and the fact that $N_{V}[V_{k}]\subseteq W_{k}$ (from Assumption 3) imply that (13) holds when we replace $U\leftarrow V_{k}$, $\boldsymbol{z}^{\dagger}_{U}\leftarrow\boldsymbol{z}^{(\ell,k)}_{V_{k}}$, and $\boldsymbol{z}^{\dagger}_{-U}\leftarrow\boldsymbol{z}^{(\ell,k)}_{N_{V}(V_{k})}$, for any $k\in\mathbb{I}_{1:K}$. (ii) The residuals of the full KKT system can be obtained from the residuals of (13) after replacing $U\leftarrow V_{k}$, $\boldsymbol{z}^{\dagger}_{U}\leftarrow\boldsymbol{z}^{(\ell,k)}_{V_{k}}$, and $\boldsymbol{z}^{\dagger}_{-U}\leftarrow\boldsymbol{z}^{(\ell)}_{N_{V}(V_{k})}$ and collecting the residuals for $k\in\mathbb{I}_{1:K}$. Note that the residuals of the KKT conditions (13) are continuous with respect to $\boldsymbol{z}^{\dagger}_{-U}$. By a continuity argument, (15), and observations (i)-(ii), the limit points of $\{\boldsymbol{z}^{(\ell)}\}_{\ell=1}^{\infty}$ satisfy (13) for $k\in\mathbb{I}_{1:K}$ (i.e., the full KKT conditions (12) hold). By the uniqueness of the solution, such a limit point is unique, and this implies $\boldsymbol{z}_{V}^{(\ell)}\rightarrow\boldsymbol{z}^{\dagger}_{V}$ as $\ell\rightarrow\infty$. ∎

We can define primal-dual errors as $\epsilon_{\text{pr}}:=\max_{k\in\mathbb{I}_{1:K}}\|\text{pr}(E^{(\ell)}_{k})\|_{\infty}$ and $\epsilon_{\text{du}}:=\max_{k\in\mathbb{I}_{1:K}}\|\text{du}(E^{(\ell)}_{k})\|_{\infty}$, where $\text{pr}(\cdot)$ and $\text{du}(\cdot)$ extract the components associated with the primal and dual variables, respectively. The convergence criterion can thus be set as follows: stop if $(\epsilon_{\text{pr}}<\epsilon_{\text{pr}}^{\text{tol}})\land(\epsilon_{\text{du}}<\epsilon_{\text{du}}^{\text{tol}})$.
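In an implementation, this stopping test might be evaluated as follows (a sketch; the containers E, primal_idx, and dual_idx are hypothetical bookkeeping produced by the solver loop):

```python
import numpy as np

def check_convergence(E, primal_idx, dual_idx, tol_pr=1e-2, tol_du=1e2):
    """Evaluate the stopping test of Section III-D from the per-subdomain
    mismatches E[k] = z^(l,k) - z^(l) on N_V(V_k)."""
    eps_pr = max(np.max(np.abs(E[k][primal_idx[k]])) for k in range(len(E)))
    eps_du = max(np.max(np.abs(E[k][dual_idx[k]])) for k in range(len(E)))
    return eps_pr < tol_pr and eps_du < tol_du
```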

IV Numerical Example

Consider the regularized DC optimal power flow (OPF) problem [28] over a network $\mathcal{G}(\mathcal{V},\mathcal{E})$:

\[
\min_{\{\theta_{i}\}_{i\in\mathcal{V}},\,\{P_{q}\}_{q\in\Omega}}\;\sum_{q\in\Omega}c_{q,1}P_{q}+c_{q,2}P^{2}_{q}+\frac{\gamma}{2}\sum_{\{i,j\}\in\mathcal{E}}(\theta_{i}-\theta_{j})^{2} \tag{16a}
\]
\[
\text{s.t.}\;\sum_{q\in\Omega_{i}}P_{q}-\sum_{j\in N_{\mathcal{V}}[i]}B_{i,j}(\theta_{i}-\theta_{j})=P^{L}_{i},\;i\in\mathcal{V} \tag{16b}
\]
\[
\phantom{\text{s.t.}\;}\underline{P}_{q}\leq P_{q}\leq\overline{P}_{q},\;q\in\Omega,\quad\theta_{i}=\theta^{\text{ref}}_{i},\;i\in\mathcal{V}^{\text{ref}} \tag{16c}
\]
\[
\phantom{\text{s.t.}\;}-\overline{\theta}_{i,j}\leq\theta_{i}-\theta_{j}\leq\overline{\theta}_{i,j},\;\{i,j\}\in\mathcal{E}. \tag{16d}
\]

Here, $\Omega_{i}$ is the set of generators at node $i$; $\Omega:=\bigcup_{i\in\mathcal{V}}\Omega_{i}$ is the set of all generators; $\mathcal{V}^{\text{ref}}$ is the set of reference nodes; $\theta_{i}$ are the voltage angles; $P_{q}$ are the active power generations; $c_{q,1}$ and $c_{q,2}$ are the generation cost coefficients; $\gamma$ is the regularization coefficient; $B_{i,j}$ are the line susceptances; $P^{L}_{i}$ are the active power loads; $\underline{P}_{q}$ and $\overline{P}_{q}$ are the lower and upper bounds of active power generation, respectively; $\overline{\theta}_{i,j}$ are the voltage angle separation limits; and $\theta^{\text{ref}}_{i}$ are the reference voltage angles. The problems are modeled in the algebraic modeling language JuMP [29] and solved with the nonlinear programming solver Ipopt [30]. The 9,241-bus test case obtained from pglib-opf (v19.05) is used [31]. We modified the data by placing artificial storage with infinite capacity and high charge/discharge cost in each node. The network node set is partitioned into 16 subdomains using the graph partitioning tool Metis [32], and each subdomain is expanded to obtain an overlapping partition. The Schwarz scheme is run on a multicore parallel computing server (shared memory and 32 CPUs of Intel Xeon E5-2698 v3 running at 2.30 GHz) using the Julia package Distributed.jl. One master process and 16 worker processes are used (one process per subproblem). We use $\gamma=10^{5}$, $\epsilon^{\text{tol}}_{\text{pr}}=10^{-2}$, and $\epsilon^{\text{tol}}_{\text{du}}=10^{2}$. The scripts to reproduce the results can be found at https://github.com/zavalab/JuliaBox/tree/master/SchwarzQPcons.
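For reference, the sketch below states problem (16) on a hypothetical 3-bus ring with one generator per bus (made-up data), using the Python package cvxpy rather than the JuMP/Ipopt stack used in the experiments:

```python
import numpy as np
import cvxpy as cp

n = 3
edges = [(0, 1), (1, 2), (0, 2)]            # ring network
B = {e: 10.0 for e in edges}                # line susceptances
PL = np.array([0.5, 0.8, 0.3])              # active power loads
c1, c2, gamma = 1.0, 0.5, 1e5               # cost and regularization coefficients

theta = cp.Variable(n)                      # voltage angles
P = cp.Variable(n)                          # generation (one generator per bus)

def net_outflow(i):                         # sum_j B_ij (theta_i - theta_j)
    out = 0
    for a, b in edges:
        if i == a:
            out += B[(a, b)] * (theta[a] - theta[b])
        elif i == b:
            out += B[(a, b)] * (theta[b] - theta[a])
    return out

constraints = [P[i] - net_outflow(i) == PL[i] for i in range(n)]        # (16b)
constraints += [P >= 0, P <= 2.0, theta[0] == 0.0]                      # (16c)
constraints += [cp.abs(theta[a] - theta[b]) <= 0.2 for a, b in edges]   # (16d)
objective = cp.sum(c1 * P + c2 * cp.square(P)) \
    + gamma / 2 * sum(cp.square(theta[a] - theta[b]) for a, b in edges) # (16a)
print(cp.Problem(cp.Minimize(objective), constraints).solve())
```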

Convergence results are shown in Fig. 2; we vary the size of overlap $\omega$ and show the evolution of the objective value and of the primal-dual error. The black dashed lines represent the optimal objective value (obtained by solving the problem with Ipopt) and the error tolerances. The total computation times and iteration counts are compared in Table I. We can see that increasing $\omega$ accelerates convergence (it reduces the iteration count roughly in proportion to the size of the overlap) but does not always reduce the computation time. The reason is that the subproblem complexity increases with $\omega$; thus, an optimal overlap exists. However, the overlapping scheme is still considerably slower than the centralized solver; in particular, Ipopt solves the full problem in 1.92 seconds. We expect that computational savings can be achieved for larger problem instances. In the future, we will investigate the algorithm on several large-scale problems and study the conditions under which the overlapping scheme becomes more effective.

Figure 2: Profiles for objective and primal error for different overlap sizes.
TABLE I: Effect of overlap on solution times and iteration counts

                  $\omega=1$    $\omega=5$    $\omega=10$
Solution time     42.5 sec      17.7 sec      24.2 sec
Iterations        190           39            25

V Conclusions

We have presented an overlapping Schwarz algorithm for solving constrained quadratic programs. We showed that convergence relies on exponential decay of sensitivity, a property that we established for the setting of interest. The behavior of the algorithm was demonstrated by using a large DC optimal power flow problem. In future work we will study convergence in a nonlinear setting, and we will conduct more extensive benchmark studies.

Acknowledgments

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) under Contract DE-AC02-06CH11347 and by NSF through award CNS-1545046. We also acknowledge partial support from the National Science Foundation under award NSF-EECS-1609183.

References

  • [1] J. B. Rawlings and D. Q. Mayne, “Model predictive control: Theory and design,” 2009.
  • [2] Z. J. Yu and L. T. Biegler, “Advanced-step multistage nonlinear model predictive control: Robustness and stability,” Journal of Process Control, vol. 84, pp. 192–206, 2019.
  • [3] M. V. Pereira and L. M. Pinto, “Multi-stage stochastic optimization applied to energy planning,” Mathematical programming, vol. 52, no. 1-3, pp. 359–375, 1991.
  • [4] L. T. Biegler, O. Ghattas, M. Heinkenschloss, D. Keyes, and B. van Bloemen Waanders, Real-time PDE-constrained Optimization.   SIAM, 2007.
  • [5] J. Lavaei and S. H. Low, “Zero duality gap in optimal power flow problem,” IEEE Transactions on Power Systems, vol. 27, no. 1, pp. 92–107, 2011.
  • [6] C. Coffrin, R. Bent, K. Sundar, Y. Ng, and M. Lubin, “PowerModels.jl: An open-source framework for exploring power flow formulations,” in 2018 Power Systems Computation Conference (PSCC).   IEEE, 2018, pp. 1–8.
  • [7] A. Zlotnik, M. Chertkov, and S. Backhaus, “Optimal control of transient flow in natural gas networks,” in 2015 54th IEEE conference on decision and control (CDC).   IEEE, 2015, pp. 4563–4570.
  • [8] A. Rantzer, “Dynamic dual decomposition for distributed control,” in 2009 American Control Conference.   IEEE, 2009, pp. 884–888.
  • [9] C. Conte, T. Summers, M. N. Zeilinger, M. Morari, and C. N. Jones, “Computational aspects of distributed optimization in model predictive control,” in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).   IEEE, 2012, pp. 6819–6824.
  • [10] S. Shin, P. Hart, T. Jahns, and V. M. Zavala, “A hierarchical optimization architecture for large-scale power networks,” IEEE Transactions on Control of Network Systems, 2019.
  • [11] A. Engelmann, Y. Jiang, B. Houska, and T. Faulwasser, “Decomposition of non-convex optimization via bi-level distributed ALADIN,” IEEE Transactions on Control of Network Systems, 2020.
  • [12] J. R. Jackson and I. E. Grossmann, “Temporal decomposition scheme for nonlinear multisite production planning and distribution models,” Industrial & engineering chemistry research, vol. 42, no. 13, pp. 3045–3055, 2003.
  • [13] C. Lemaréchal, “Lagrangian relaxation,” in Computational combinatorial optimization.   Springer, 2001, pp. 112–156.
  • [14] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011.
  • [15] S. Shin and V. M. Zavala, “Multi-grid schemes for multi-scale coordination of energy systems,” in Energy Markets and Responsive Grids.   Springer, 2018, pp. 195–222.
  • [16] R. Carli and M. Dotoli, “A distributed control algorithm for waterfilling of networked control systems via consensus,” IEEE control systems letters, vol. 1, no. 2, pp. 334–339, 2017.
  • [17] A. Kozma, C. Conte, and M. Diehl, “Benchmarking large-scale distributed convex quadratic programming algorithms,” Optimization Methods and Software, vol. 30, no. 1, pp. 191–214, 2015.
  • [18] S. Shin, V. M. Zavala, and M. Anitescu, “Decentralized schemes with overlap for solving graph-structured optimization problems,” IEEE Transactions on Control of Network Systems, 2020.
  • [19] A. Frommer and D. B. Szyld, “An algebraic convergence theory for restricted additive Schwarz methods using weighted max norms,” SIAM Journal on Numerical Analysis, vol. 39, no. 2, pp. 463–479, 2001.
  • [20] S. Na and M. Anitescu, “Exponential decay in the sensitivity analysis of nonlinear dynamic programming,” arXiv preprint arXiv:1912.06734, 2019.
  • [21] S. Shin and V. M. Zavala, “Diffusing-horizon model predictive control,” arXiv preprint arXiv:2002.08556, 2020.
  • [22] L. Grüne, M. Schaller, and A. Schiela, “Sensitivity analysis of optimal control for a class of parabolic PDEs motivated by model predictive control,” SIAM Journal on Control and Optimization, vol. 57, no. 4, pp. 2753–2774, 2019.
  • [23] L. Grüne, M. Schaller, and A. Schiela, “Exponential sensitivity and turnpike analysis for linear quadratic optimal control of general evolution equations,” Journal of Differential Equations, vol. 268, no. 12, pp. 7311–7341, 2020.
  • [24] J. Nocedal and S. Wright, Numerical optimization.   Springer Science & Business Media, 2006.
  • [25] A. G. Hadigheh, O. Romanko, and T. Terlaky, “Sensitivity analysis in convex quadratic optimization: simultaneous perturbation of the objective and right-hand-side vectors,” Algorithmic Operations Research, vol. 2, no. 2, pp. 94–94, 2007.
  • [26] J. F. Bonnans and A. Shapiro, Perturbation analysis of optimization problems.   Springer Science & Business Media, 2013.
  • [27] R. A. Horn and C. R. Johnson, Matrix analysis.   Cambridge university press, 2012.
  • [28] J. Sun and L. Tesfatsion, “DC optimal power flow formulation and solution using QuadProgJ,” 2010.
  • [29] I. Dunning, J. Huchette, and M. Lubin, “JuMP: A modeling language for mathematical optimization,” SIAM Review, vol. 59, no. 2, pp. 295–320, 2017.
  • [30] A. Wächter and L. T. Biegler, “On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming,” Mathematical programming, vol. 106, no. 1, pp. 25–57, 2006.
  • [31] S. Babaeinejadsarookolaee, A. Birchfield, R. D. Christie, C. Coffrin, C. DeMarco, R. Diao, M. Ferris, S. Fliscounakis, S. Greene, R. Huang et al., “The power grid library for benchmarking AC optimal power flow algorithms,” arXiv preprint arXiv:1908.02788, 2019.
  • [32] G. Karypis and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular graphs,” SIAM Journal on scientific Computing, vol. 20, no. 1, pp. 359–392, 1998.