
Sungho Shin: Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Dr, Madison, WI 53706, USA. e-mail: [email protected]
Victor M. Zavala: Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Dr, Madison, WI 53706, USA. e-mail: [email protected]

Multi-Grid Schemes for Multi-Scale Coordination of Energy Systems

Sungho Shin and Victor M. Zavala
Abstract

We discuss how multi-grid computing schemes can be used to design hierarchical coordination architectures for energy systems. These hierarchical architectures can be used to manage multiple temporal and spatial scales and mitigate fundamental limitations of centralized and decentralized architectures. We present the basic elements of a multi-grid scheme, which includes a smoothing operator (a high-resolution decentralized coordination layer that targets phenomena at high frequencies) and a coarsening operator (a low-resolution centralized coordination layer that targets phenomena at low frequencies). For smoothing, we extend existing convergence results for Gauss-Seidel schemes by applying them to systems that cover unstructured domains. This allows us to target problems with multiple timescales and arbitrary networks. The proposed coordination schemes can be used to guide transactions in decentralized electricity markets. We present a storage control example and a power flow diffusion example to illustrate the developments.

1 Motivation and Setting

We consider the following optimization problem:

\min_{z}\quad \frac{1}{2}z^{T}Qz-c^{T}z \quad (1a)
\text{s.t.}\quad Az+Bd=0,\quad(\nu) \quad (1b)
\phantom{\text{s.t.}}\quad \Pi z=0,\quad(\lambda) \quad (1c)

Here, $z\in\mathbb{R}^{N\cdot n_{z}}$ are decision or primal variables (including states and controls) and $d\in\mathbb{R}^{N\cdot n_{d}}$ is the data (including disturbances and system parameters). These vectors contain elements that are distributed over a mesh with $N\in\mathbb{Z}$ points that covers a certain temporal or spatio-temporal domain of interest $\Omega$. We define the set of points in the mesh as $\mathcal{N}$ with $|\mathcal{N}|=N$. The matrix $Q\in\mathbb{R}^{N\cdot n_{z}\times N\cdot n_{z}}$ is positive definite and $c\in\mathbb{R}^{N\cdot n_{z}}$ is a cost vector. The constraint (1b) (with associated dual variables $\nu\in\mathbb{R}^{m}$) is defined by the matrices $A\in\mathbb{R}^{m\times N\cdot n_{z}}$ and $B\in\mathbb{R}^{m\times N\cdot n_{d}}$, and the matrix $A$ is assumed to have full row rank. These constraints may include discretized dynamic equations (in space and time) and other physical constraints. The constraints (1c) (with associated dual variables $\lambda\in\mathbb{R}^{p}$) are defined by the matrix $\Pi\in\mathbb{R}^{p\times N\cdot n_{z}}$. These constraints model coupling (connectivity) between the primal variables at different mesh points and can also be used to model boundary conditions. Problem (1) captures formulations used in optimization-based control strategies such as model predictive control (MPC) rawlingsbook .

We assume that the size $N$ of the mesh describing problem (1) is so large that the problem cannot be solved in a centralized manner. This is often the case in systems that cover large temporal and spatial domains and/or multiple scales. In an electrical network, for instance, a large number of nodes and harmonics might need to be captured, rendering centralized control impractical. An alternative to address this complexity is to partition the problem into subdomains to create decentralized control architectures. We begin by defining a partitioned version of problem (1):

\min_{z_{k}}\quad \sum_{k\in\mathcal{K}}\frac{1}{2}z_{k}^{T}Q_{k}z_{k}-c_{k}^{T}z_{k} \quad (2a)
\text{s.t.}\quad A_{k}z_{k}+B_{k}d_{k}=0,\quad k\in\mathcal{K}\quad(\nu_{k}) \quad (2b)
\phantom{\text{s.t.}}\quad \sum_{k^{\prime}\in\mathcal{K}}\Pi_{kk^{\prime}}z_{k^{\prime}}=0,\quad k\in\mathcal{K}\quad(\lambda_{k}) \quad (2c)

We denote this problem as $\mathcal{P}$. Here, $\mathcal{K}$ is a set of partitions (subdomains) of the set $\mathcal{N}$ and we define the number of partitions as $K:=|\mathcal{K}|$. Each partition $k\in\mathcal{K}$ contains mesh elements $\mathcal{N}_{k}\subseteq\mathcal{N}$ satisfying $\cup_{k\in\mathcal{K}}\mathcal{N}_{k}=\mathcal{N}$ and $\mathcal{N}_{k}\cap\mathcal{N}_{k^{\prime}}=\emptyset$ for all $k,k^{\prime}\in\mathcal{K}$ with $k\neq k^{\prime}$. The number of elements in a partition is denoted as $N_{k}:=|\mathcal{N}_{k}|$. The variables and data $(z_{k},d_{k})$ are defined over the partition $k\in\mathcal{K}$. We represent the cost function as a sum of partition cost functions with associated positive definite matrices $Q_{k}$ and cost vectors $c_{k}$. The constraints are also split into individual partition constraints with associated matrices $A_{k},B_{k}$, and we link the partition variables by using the coupling constraints (2c) and associated matrices $\Pi_{kk^{\prime}},\,k,k^{\prime}\in\mathcal{K}$. As we will discuss later, we can always obtain such a representation by introducing duplicate decision variables in each partition and by adding suitable coupling constraints. This procedure is known as lifting diehllift .

To avoid centralized coordination, a wide range of decomposition schemes (which we also refer to as decentralized coordination schemes) can be used. A popular approach used in the solution of partial differential equations (PDEs) and in decomposition methods such as the alternating direction method of multipliers (ADMM) is the Gauss-Seidel (GS) coordination scheme borzi ; boydadmm . Here, the problem in each partition $k$ (often called a control agent) is solved independently from the rest and exchanges information with its neighbors to coordinate. For a lifted problem of the form (2), we will derive a decentralized GS scheme that solves problems over individual partitions $k\in\mathcal{K}$ of the form:

z_{k}^{\ell+1}=\mathop{\textrm{argmin}}_{z_{k}}\quad \frac{1}{2}z_{k}^{T}Q_{k}z_{k}-z_{k}^{T}\left(c_{k}-\sum^{k-1}_{k^{\prime}=1}\Pi_{k^{\prime}k}^{T}\lambda_{k^{\prime}}^{\ell+1}-\sum^{K}_{k^{\prime}=k+1}\Pi_{k^{\prime}k}^{T}\lambda_{k^{\prime}}^{\ell}\right) \quad (3a)
\text{s.t.}\quad A_{k}z_{k}+B_{k}d_{k}=0 \quad (3b)
\phantom{\text{s.t.}}\quad \Pi_{kk}z_{k}+\sum_{k^{\prime}=1}^{k-1}\Pi_{kk^{\prime}}z_{k^{\prime}}^{\ell+1}+\sum_{k^{\prime}=k+1}^{K}\Pi_{kk^{\prime}}z_{k^{\prime}}^{\ell}=0\quad(\lambda_{k}). \quad (3c)

We denote by $\mathcal{P}_{k}^{\ell}$ this partition subproblem that is solved at update step $\ell\in\mathbb{Z}_{+}$ (which we call the coordination step). From the solution of this problem we obtain the updated primal variables $z_{k}^{\ell+1}$ and dual variables $\lambda_{k}^{\ell+1}$ (corresponding to the coupling constraints (3c)). Here, $z_{k^{\prime}}^{\ell}$ and $\lambda_{k^{\prime}}^{\ell}$ are primal and dual variables of neighboring partitions connected to partition $k$ that have not yet been updated, while $z_{k^{\prime}}^{\ell+1}$ and $\lambda_{k^{\prime}}^{\ell+1}$ are those of neighboring partitions that have already been updated. We refer to the variables communicated between partitions as the coordination variables. We note that partition $k$ cannot update its primal and dual variables until the variables of a subset of the partitions connected to it have been updated. Consequently, the GS scheme is sequential and synchronous in nature. We also note that the connectivity topology (induced by the coupling matrices $\Pi_{kk^{\prime}}$) determines the communication structure. We highlight, however, that the update order presented in (3) is lexicographic (in the order of the partition number); this choice is arbitrary and can be modified. We will see that the update order can be designed to derive parallel schemes (i.e., in which certain partitions can proceed independently of others) but that the order can affect performance. In Figure 1 we illustrate the configuration of a GS scheme over a 1-D mesh, while in Figure 2 we present a configuration for a 2-D mesh. For the 2-D mesh, the nodes spanning the domain are grouped into sets of the form $\mathcal{N}_{m,n}$, and information is exchanged using the state and dual variables at the boundary of the partitions. In Section 3 we discuss this approach in more detail.

In the next sections we derive and analyze a decentralized GS scheme to solve problem $\mathcal{P}$. The analysis seeks to illustrate how the structure of the partition subproblem $\mathcal{P}_{k}$ arises and to highlight how information of the coordination variables propagates throughout the partitions. We then discuss how to create coarse representations of the full-resolution problem $\mathcal{P}$ to obtain approximations for the coordination variables and thereby accelerate the decentralized GS scheme. This gives rise to the concept of multi-grid schemes, which can be used to design hierarchical coordination architectures. Our analysis is performed on convex quadratic programs (QPs), which will reveal important features of multi-grid schemes.

Figure 1: Configuration of a GS scheme over a 1-D mesh.
Figure 2: Configuration of a GS scheme over a 2-D mesh.

The concepts discussed in this paper seek to extend existing literature on decentralized and hierarchical MPC. Many strategies have been proposed to address the complexity of centralized MPC such as spatial and temporal decomposition methods rawlingscooperative ; kroghdecentralized ; scattoreview ; frasch ; deschutter , fast inexact schemes zavalaasnmpc ; zavalage ; diehlrealtime , and reduced order modeling techniques daoutidis . Decentralized control manages the entire system by coordinating multiple controllers, each operating in a different node or subnetwork. Decentralization also enables resiliency and asynchronicity, which are key practical advantages over centralized MPC. Different schemes for coordinating MPC controllers have been devised scattoreview . Lagrangian dual decomposition is a technique where Lagrange multipliers are used for coordination. This technique is popular in electricity markets because the dual variables can be interpreted as prices that are used as a coordination mechanism illic ; deschutter2 ; arnold . Techniques based on coordinate minimization schemes and distributed gradient methods have also been proposed to coordinate MPC controllers in general settings rawlingsjpc ; mpcadmm ; liuadmm . An important limitation of decentralized schemes is that coordination of subsystems tends to be slow (e.g., convergence rates of existing schemes are at best linear) boydadmm ; luoadmm ; fisher . This slow convergence has been reported in the context of energy networks in arnold . Moreover, spatial decentralization by itself does not address the complexity induced by multiple time scales. In particular, time scales and prediction horizons of different decentralized controllers might be different. To the best of our knowledge, no coordination schemes currently exist to handle such settings.

Hierarchical control seeks to overcome fundamental limitations of decentralized and centralized control schemes. Fundamental concepts of hierarchical control date as far back as the origins of automatic control itself hierarbook . Complex industrial control systems such as the power grid are structured hierarchically in one way or another to deal with multiple time and spatial scales. Existing hierarchies, however, are often constructed in an ad hoc manner by using objectives, physical models, and control formulations at different levels that are often incompatible. For instance, an independent system operator (ISO) solves a hierarchy of optimization problems (unit commitment, economic dispatch, optimal power flow) that use different physical representations of the system. This can lead to lost economic performance, unreachable/infeasible command signals, and instabilities rawlingsunreachable ; zavalaipopt . Hierarchical MPC provides a general framework to tackle dynamics and disturbances occurring at multiple time scales scattohierar ; scattoreview ; hierarbook ; zavalamultigrid and spatial scales farina2017hierarchical . In a traditional hierarchical MPC scheme, one uses a high-level controller to compute coarse control actions that are used as targets (commands) by low-level controllers. This approach has been used recently in microgrids and multi-energy systems hierarresid ; hug . More sophisticated MPC controllers use robustness margins of the high-level controller that are used by the lower-level controller to maintain stability scattohierar . Significant advances in the analysis of multi-scale dynamical systems have also been made, most notably by the use of singular perturbation theory to derive reduced-order representations of complex networks kokotovicpower ; chowpower ; peponides ; simon . The application of such concepts in hierarchical MPC, however, has been rather limited.
In particular, the recent review on hierarchical MPC by Scattolini notes that systematic design methods for hierarchical MPC are still lacking scattoreview . More specifically, no hierarchical MPC schemes have been proposed that aggregate and refine trajectories at multiple scales. In addition, existing schemes have been tailored to achieve feasibility and stability but do not have optimality guarantees. This is important in systems where both economic efficiency and stability must be taken into account. To the best of our knowledge, no attempt has been made to combine hierarchical and decentralized MPC schemes to manage spatial and temporal scales simultaneously. The multi-grid computing concepts presented in this work seek to take a first step towards creating more general hierarchical control architectures. The proposed multi-grid schemes provide a framework to coordinate decentralized electricity markets. This is done by exchanging state (primal) and price (dual) information at the interfaces of the agents' domains. The ability to do this hierarchically enables coordination over multiple spatial scales and, in particular, provides a framework to cover large geographical regions that might involve many market players.

2 Analysis of Gauss-Seidel Schemes

This section presents basic concepts and convergence results for a GS scheme under a general convex QP setting. The results seek to highlight how the structure of the coupling between partition variables as well as the coordination sequence affect the performance of GS schemes.

2.1 Illustrative Setting

To introduce notation, we begin by considering a convex QP with two variable partitions (i.e., $\mathcal{K}=\{1,2\}$). To simplify the presentation, we do not include internal partition constraints and focus on coupling constraints across partitions. Under these assumptions, we have the following lifted optimization problem:

\min_{z_{1},z_{2}}\quad \frac{1}{2}\begin{bmatrix}z_{1}\\ z_{2}\end{bmatrix}^{T}\begin{bmatrix}Q_{1}&\\ &Q_{2}\end{bmatrix}\begin{bmatrix}z_{1}\\ z_{2}\end{bmatrix}-\begin{bmatrix}c_{1}\\ c_{2}\end{bmatrix}^{T}\begin{bmatrix}z_{1}\\ z_{2}\end{bmatrix} \quad (4a)
\text{s.t.}\quad \underbrace{\begin{bmatrix}\Pi_{11}&\Pi_{12}\\ \Pi_{21}&\Pi_{22}\end{bmatrix}}_{\Pi}\underbrace{\begin{bmatrix}z_{1}\\ z_{2}\end{bmatrix}}_{z}=\begin{bmatrix}0\\ 0\end{bmatrix}\quad\begin{array}{c}(\lambda_{1})\\ (\lambda_{2})\end{array} \quad (4b)

Here, the matrices $\Pi_{11},\Pi_{12},\Pi_{21},\Pi_{22}$ capture coupling between the variables $z_{1}$ and $z_{2}$. We highlight that, in general, $\Pi_{12}\neq\Pi_{21}$ (the coupling between partitions is not symmetric). The matrices $Q_{1}$ and $Q_{2}$ are positive definite and we assume that $\Pi_{11}$ and $\Pi_{22}$ have full row rank and that the entire coupling matrix $\Pi$ has full row rank. By positive definiteness of $Q_{1}$ and $Q_{2}$ and the full-rank assumption on $\Pi$, the feasible set is non-empty and the primal-dual solution of (4) is unique. The coupling between partition variables is two-directional (given by the structure of the coupling matrix $\Pi$). This structure is found in 1-D linear networks. When the coupling is only one-directional, as is the case of temporal coupling, we have that $\Pi_{12}=0$ and thus $\Pi$ is block lower triangular, indicating that the primal variables only propagate forward in time. We will see, however, that backward propagation of information also exists, but in the space of the dual variables.

The first-order KKT conditions of (4) are given by the linear system:

\begin{bmatrix}Q_{1}&\Pi_{11}^{T}&&\Pi_{21}^{T}\\ \Pi_{11}&&\Pi_{12}&\\ &\Pi_{12}^{T}&Q_{2}&\Pi_{22}^{T}\\ \Pi_{21}&&\Pi_{22}&\end{bmatrix}\begin{bmatrix}z_{1}\\ \lambda_{1}\\ z_{2}\\ \lambda_{2}\end{bmatrix}=\begin{bmatrix}c_{1}\\ 0\\ c_{2}\\ 0\end{bmatrix} \quad (5)

To avoid solving this system in a centralized manner, we use a decentralized GS scheme with coordination update index $\ell\in\mathbb{Z}_{+}$. The scheme has the form:

\begin{bmatrix}Q_{1}&\Pi_{11}^{T}\\ \Pi_{11}&\end{bmatrix}\begin{bmatrix}z_{1}^{\ell+1}\\ \lambda_{1}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{1}\\ 0\end{bmatrix}-\begin{bmatrix}&\Pi_{21}^{T}\\ \Pi_{12}&\end{bmatrix}\begin{bmatrix}z_{2}^{\ell}\\ \lambda_{2}^{\ell}\end{bmatrix} \quad (6a)
\begin{bmatrix}Q_{2}&\Pi_{22}^{T}\\ \Pi_{22}&\end{bmatrix}\begin{bmatrix}z_{2}^{\ell+1}\\ \lambda_{2}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{2}\\ 0\end{bmatrix}-\begin{bmatrix}&\Pi_{12}^{T}\\ \Pi_{21}&\end{bmatrix}\begin{bmatrix}z_{1}^{\ell+1}\\ \lambda_{1}^{\ell+1}\end{bmatrix} \quad (6b)

This scheme requires an initial guess of the variables $(z_{2},\lambda_{2})$ of the second partition, which is used to update $(z_{1},\lambda_{1})$; we then proceed to update the variables of the second partition. Note, however, that we have picked this coordination order arbitrarily (lexicographic order). In particular, one can start with an initial guess of the first partition to update the second partition and then update the first partition (reverse lexicographic order). This scheme has the form:

\begin{bmatrix}Q_{2}&\Pi_{22}^{T}\\ \Pi_{22}&\end{bmatrix}\begin{bmatrix}z_{2}^{\ell+1}\\ \lambda_{2}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{2}\\ 0\end{bmatrix}-\begin{bmatrix}&\Pi_{12}^{T}\\ \Pi_{21}&\end{bmatrix}\begin{bmatrix}z_{1}^{\ell}\\ \lambda_{1}^{\ell}\end{bmatrix} \quad (7a)
\begin{bmatrix}Q_{1}&\Pi_{11}^{T}\\ \Pi_{11}&\end{bmatrix}\begin{bmatrix}z_{1}^{\ell+1}\\ \lambda_{1}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{1}\\ 0\end{bmatrix}-\begin{bmatrix}&\Pi_{21}^{T}\\ \Pi_{12}&\end{bmatrix}\begin{bmatrix}z_{2}^{\ell+1}\\ \lambda_{2}^{\ell+1}\end{bmatrix} \quad (7b)

We will see that the coordination order affects the convergence properties of the GS scheme. In problems with many partitions, a large number of coordination orders are possible. Moreover, we will see that coordination orders can be designed to enable sophisticated parallel and asynchronous implementations. A key observation is that the linear systems (6) in the GS scheme are the first-order KKT conditions of the following partition problems $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$, respectively:

z_{1}^{\ell+1}=\mathop{\textrm{argmin}}_{z_{1}}\quad \frac{1}{2}z_{1}^{T}Q_{1}z_{1}-z_{1}^{T}\left(c_{1}-\Pi_{21}^{T}\lambda_{2}^{\ell}\right) \quad (8a)
\text{s.t.}\quad \Pi_{11}z_{1}+\Pi_{12}z_{2}^{\ell}=0\quad(\lambda_{1}) \quad (8b)
z_{2}^{\ell+1}=\mathop{\textrm{argmin}}_{z_{2}}\quad \frac{1}{2}z_{2}^{T}Q_{2}z_{2}-z_{2}^{T}\left(c_{2}-\Pi_{12}^{T}\lambda_{1}^{\ell+1}\right) \quad (8c)
\text{s.t.}\quad \Pi_{22}z_{2}+\Pi_{21}z_{1}^{\ell+1}=0\quad(\lambda_{2}) \quad (8d)

The relevance of this observation is that one can implement the GS scheme by directly solving optimization problems, as opposed to performing intrusive linear algebra calculations zavalamultigrid . This has practical benefits, as one can use algebraic modeling languages and handle sophisticated problem formulations. This also reveals that both primal and dual variables are communicated between partitions: primal information enters in the coupling constraints, while dual variables enter as cost terms in the objective. This highlights the fact that dual variables can be interpreted as prices of the primal variable information exchanged between partitions.

We now seek to establish conditions guaranteeing convergence of this simplified GS scheme. To do so, we define the following matrices and vectors:

A_{1}:=\begin{bmatrix}Q_{1}&\Pi_{11}^{T}\\ \Pi_{11}&\end{bmatrix},\quad x_{1}^{\ell}:=\begin{bmatrix}z_{1}^{\ell}\\ \lambda_{1}^{\ell}\end{bmatrix},\quad b_{1}:=\begin{bmatrix}c_{1}\\ 0\end{bmatrix},\quad B_{12}:=\begin{bmatrix}&-\Pi_{21}^{T}\\ -\Pi_{12}&\end{bmatrix} \quad (9a)
A_{2}:=\begin{bmatrix}Q_{2}&\Pi_{22}^{T}\\ \Pi_{22}&\end{bmatrix},\quad x_{2}^{\ell}:=\begin{bmatrix}z_{2}^{\ell}\\ \lambda_{2}^{\ell}\end{bmatrix},\quad b_{2}:=\begin{bmatrix}c_{2}\\ 0\end{bmatrix},\quad B_{21}:=\begin{bmatrix}&-\Pi_{12}^{T}\\ -\Pi_{21}&\end{bmatrix}. \quad (9b)

The partition matrices $A_{1}$, $A_{2}$ are non-singular because the matrices $Q_{1}$, $Q_{2}$ are positive definite and $\Pi_{11}$, $\Pi_{22}$ have full row rank. Non-singularity of $A_{1}$ and $A_{2}$ implies that the partition subproblems $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$ have a unique solution for any values of the primal and dual variables of the neighboring partition. We can now express $(x_{1}^{\ell+1},x_{2}^{\ell+1})$ in terms of $(x_{1}^{\ell},x_{2}^{\ell})$ to obtain a recursion of the form:

\begin{bmatrix}A_{1}&\\ -B_{21}&A_{2}\end{bmatrix}\begin{bmatrix}x_{1}^{\ell+1}\\ x_{2}^{\ell+1}\end{bmatrix}=\begin{bmatrix}&B_{12}\\ &\end{bmatrix}\begin{bmatrix}x_{1}^{\ell}\\ x_{2}^{\ell}\end{bmatrix}+\begin{bmatrix}b_{1}\\ b_{2}\end{bmatrix}. \quad (10)

We can write this system in compact form by defining:

w^{\ell}:=\begin{bmatrix}x_{1}^{\ell}\\ x_{2}^{\ell}\end{bmatrix},\quad S:=\begin{bmatrix}A_{1}&\\ -B_{21}&A_{2}\end{bmatrix}^{-1}\begin{bmatrix}&B_{12}\\ &\end{bmatrix},\quad r:=\begin{bmatrix}A_{1}&\\ -B_{21}&A_{2}\end{bmatrix}^{-1}\begin{bmatrix}b_{1}\\ b_{2}\end{bmatrix}. \quad (11)

The solution of the $\ell$-th update step of (6) can be represented as:

w^{\ell+1}=Sw^{\ell}+r \quad (12a)
\phantom{w^{\ell+1}}=S^{\ell+1}w^{0}+\left(I+S+\cdots+S^{\ell}\right)r. \quad (12b)

The solution of the QP (4) (i.e., the solution of the KKT system (5)) satisfies the fixed-point equation $w=Sw+r$, which can also be expressed as $(I-S)w=r$ or $w=(I-S)^{-1}r$. Consequently, the eigenvalues of the matrix $S$ play a key role in the convergence of the GS scheme (8): the recursion (12) converges to this fixed point from any starting point if and only if the spectral radius of $S$ is smaller than one. We discuss this in more detail in the following section.
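The recursion (9)-(12) can be reproduced numerically. The sketch below uses hypothetical data (not from the paper): a two-variable QP with $Q=[[2,0.5],[0.5,2]]$ and $c=(1,1)$, lifted so that each partition owns one variable and carries a duplicate of its neighbor's variable. It assembles $S$ and $r$, runs the GS sweeps, and compares against the fixed point $w=Sw+r$.

```python
import numpy as np

# Lifted blocks for a hypothetical two-variable QP with Q = [[2, .5], [.5, 2]],
# c = (1, 1): each partition owns one variable and duplicates its neighbor's.
Q1 = np.array([[2.0, 0.25], [0.25, 0.0]])   # cross term 0.5 split in half
Q2 = Q1.copy()
c1 = np.array([1.0, 0.0])                    # zero cost on the duplicate
c2 = c1.copy()
P11 = np.array([[0.0, 1.0]]); P12 = np.array([[-1.0, 0.0]])  # dup in part. 1 = z(2)
P22 = np.array([[0.0, 1.0]]); P21 = np.array([[-1.0, 0.0]])  # dup in part. 2 = z(1)

def kkt(Q, P):
    """Partition KKT matrix [Q P^T; P 0] as in (9)."""
    m = P.shape[0]
    return np.block([[Q, P.T], [P, np.zeros((m, m))]])

A1, A2 = kkt(Q1, P11), kkt(Q2, P22)
B12 = np.block([[np.zeros((2, 2)), -P21.T], [-P12, np.zeros((1, 1))]])
B21 = np.block([[np.zeros((2, 2)), -P12.T], [-P21, np.zeros((1, 1))]])
b = np.concatenate([c1, [0.0], c2, [0.0]])

# GS recursion (10)-(12):  L w^{l+1} = U w^l + b,  S = L^{-1} U,  r = L^{-1} b
L = np.block([[A1, np.zeros((3, 3))], [-B21, A2]])
U = np.block([[np.zeros((3, 3)), B12], [np.zeros((3, 6))]])
S, r = np.linalg.solve(L, U), np.linalg.solve(L, b)

rho = max(abs(np.linalg.eigvals(S)))         # spectral radius governs convergence
w = np.zeros(6)
for _ in range(100):                          # GS coordination sweeps
    w = S @ w + r
w_star = np.linalg.solve(np.eye(6) - S, r)   # fixed point w = S w + r
```

For this data the spectral radius is well below one and the sweeps reproduce the centralized solution $z(1)=z(2)=0.4$ (with both duplicates matching their originals).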

2.2 General Setting

We now extend the previous analysis to a more general QP setting with an arbitrary number of partitions. Here, we seek to illustrate how to perform lifting in a general case where coupling is implicit in the model and how to derive a GS scheme to solve such a problem. Our discussion is based on the following convex QP:

\min_{z}\quad \frac{1}{2}z^{T}Qz-c^{T}z \quad (13)

Here, $z=(z(1),z(2),\cdots,z(N))\in\mathbb{R}^{N}$ are the optimization variables, $Q\in\mathbb{R}^{N\times N}$ is a positive definite matrix, and $c\in\mathbb{R}^{N}$ is the cost vector. For simplicity (and without loss of generality) we assume that $n_{z}=1$ (there is only one variable per node). We let $Q_{ij}$ denote the $(i,j)$-th component of the matrix $Q$ and we let $\mathcal{N}=\{1,2,\cdots,N\}$ be the variable indices (in this case also the node indices). Problem (13) has a unique solution because the matrix $Q$ is positive definite.

We focus our analysis on the QP (13) because the equality constraints $\bar{A}z+\bar{B}d=0$ in (1) (with $\bar{A}^{T}=[A^{T}\,\Pi^{T}]$ and $\bar{B}^{T}=[B^{T}\,0]$) can be eliminated by using a null-space projection procedure. To see how this can be achieved, note that if $\bar{A}$ has full row rank we can always construct a matrix $Z\in\mathbb{R}^{N\times\tilde{N}}$ whose columns span the null space of $\bar{A}$ (i.e., $\bar{A}Z\tilde{z}=0$ for any $\tilde{z}\in\mathbb{R}^{\tilde{N}}$) and with $\tilde{N}<N$. Similarly, we can construct a matrix $Y\in\mathbb{R}^{N\times(N-\tilde{N})}$ whose columns span the range space of $\bar{A}^{T}$ (i.e., $Y\tilde{y}\in\textrm{Range}(\bar{A}^{T})$ for any $\tilde{y}\in\mathbb{R}^{N-\tilde{N}}$). We can express any $z\in\mathbb{R}^{N}$ as $z=Z\tilde{z}+Y\tilde{y}$. We thus have that $\bar{A}z=\bar{A}Y\tilde{y}$ for all $\tilde{z}$; consequently, $\tilde{y}=-(\bar{A}Y)^{-1}\bar{B}d$ and $z=-Y(\bar{A}Y)^{-1}\bar{B}d+Z\tilde{z}$ satisfies $\bar{A}z=-\bar{B}d$ for any $\tilde{z}$. With this, we can express the quadratic objective $\frac{1}{2}z^{T}Qz-c^{T}z$ as $\frac{1}{2}\tilde{z}^{T}Z^{T}QZ\tilde{z}-c^{T}Z\tilde{z}-(Y(\bar{A}Y)^{-1}\bar{B}d)^{T}QZ\tilde{z}+\kappa$, where $\kappa$ is a constant. We thus obtain a QP of the same form as (13) but with matrix $Q\leftarrow Z^{T}QZ$, reduced cost $c\leftarrow Z^{T}c+Z^{T}QY(\bar{A}Y)^{-1}\bar{B}d$, and variable vector $z\leftarrow\tilde{z}$. We highlight that this reduction procedure does not need to be applied in a practical implementation; we only use it to justify that the formulation (13) is general.
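The elimination argument above can be checked numerically. The sketch below uses randomly generated (hypothetical) data and an SVD-based null-space basis with a least-squares particular solution in place of the $Y(\bar{A}Y)^{-1}$ construction; the reduced solution is compared against the full KKT system.

```python
import numpy as np

# Null-space elimination sketch on hypothetical random data (n = 6 variables,
# m = 2 equality constraints); Abar/Bbar mirror the text's A-bar, B-bar.
rng = np.random.default_rng(0)
n, m = 6, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)                  # positive definite
c = rng.standard_normal(n)
Abar = rng.standard_normal((m, n))           # full row rank (generic)
Bbar = rng.standard_normal((m, 3))
d = rng.standard_normal(3)

# Basis Z for null(Abar) from the SVD, and a particular solution of Abar z = -Bbar d
_, _, Vt = np.linalg.svd(Abar)
Z = Vt[m:].T                                 # columns span null(Abar)
zp = np.linalg.lstsq(Abar, -Bbar @ d, rcond=None)[0]

# Reduced unconstrained QP in ztilde:  (Z^T Q Z) ztilde = Z^T (c - Q zp)
ztilde = np.linalg.solve(Z.T @ Q @ Z, Z.T @ (c - Q @ zp))
z_red = zp + Z @ ztilde

# Cross-check against the full KKT system of the constrained QP
K = np.block([[Q, Abar.T], [Abar, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([c, -Bbar @ d]))
```

Since $Q$ is positive definite and $\bar{A}$ has full row rank, both routes yield the same unique primal solution.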

We thus have that the variables $z$ in (13) are coupled implicitly via the matrix $Q$, and we seek to express this problem in the lifted form. We proceed to partition the set $\mathcal{N}$ into a set of partitions $\mathcal{K}=\{1,..,K\}$ to give the partition sets $\mathcal{N}_{1},\mathcal{N}_{2},\cdots,\mathcal{N}_{K}\subseteq\mathcal{N}$ satisfying $\mathcal{N}=\mathcal{N}_{1}\cup\mathcal{N}_{2}\cup\cdots\cup\mathcal{N}_{K}$ and $\mathcal{N}_{k}\cap\mathcal{N}_{k^{\prime}}=\emptyset$ for all $k,k^{\prime}\in\mathcal{K}$ with $k\neq k^{\prime}$. Coupling between variables arises when $Q_{ij}\neq 0$ for $i\in\mathcal{N}_{k}$, $j\in\mathcal{N}_{k^{\prime}}$, and $k\neq k^{\prime}$. We perform lifting by defining the index sets:

\underline{\mathcal{N}}_{k}:=\{j\in\mathcal{N}\setminus\mathcal{N}_{k}\mid\exists\,i\in\mathcal{N}_{k}\text{ s.t. }Q_{ij}\neq 0\},\quad \overline{\mathcal{N}}_{k}:=\mathcal{N}_{k}\cup\underline{\mathcal{N}}_{k} \quad (14)

The set $\underline{\mathcal{N}}_{k}$ contains the nodes outside partition $k$ whose variables are coupled to variables in the partition set $\mathcal{N}_{k}$. The set $\overline{\mathcal{N}}_{k}$ contains all nodes in partition $\mathcal{N}_{k}$ together with its coupled nodes. These definitions allow us to express problem (13) as:

\min_{z}\quad \frac{1}{2}\sum_{k\in\mathcal{K}}\sum_{i\in\mathcal{N}_{k}}\sum_{j\in\overline{\mathcal{N}}_{k}}Q_{ij}z(i)z(j)-\sum_{k\in\mathcal{K}}\sum_{i\in\mathcal{N}_{k}}c_{i}z(i). \quad (15)
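The index sets in (14) can be computed directly from the sparsity pattern of $Q$. A minimal sketch (using a hypothetical tridiagonal $Q$ with six nodes and two partitions, 0-based indices):

```python
import numpy as np

# Sketch of the index sets (14) built from the sparsity pattern of a
# hypothetical tridiagonal Q with N = 6 nodes and two partitions (0-based).
N = 6
Q = (np.diag(np.full(N, 2.0)) + np.diag(np.full(N - 1, -1.0), 1)
     + np.diag(np.full(N - 1, -1.0), -1))
N1, N2 = {0, 1, 2}, {3, 4, 5}                # partition sets

def coupled_sets(Q, Nk):
    """Return (N_under_k, N_over_k): the outside nodes coupled to partition k,
    and the partition augmented with those coupled nodes."""
    under = {j for j in range(Q.shape[0]) if j not in Nk
             and any(Q[i, j] != 0 for i in Nk)}
    return under, Nk | under

u1, o1 = coupled_sets(Q, N1)
u2, o2 = coupled_sets(Q, N2)
```

For the tridiagonal pattern, only the two boundary nodes couple across the partition interface: node 3 is coupled into partition 1 and node 2 into partition 2.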

To induce lifting, we introduce a new set of variables $\{z_{1},z_{2},\cdots,z_{K}\}$ defined as:

z_{k}:=\begin{bmatrix}z(k_{1})\\ \vdots\\ z(k_{N_{k}})\end{bmatrix},\quad \underline{z}_{k}:=\begin{bmatrix}z(k_{N_{k}+1})\\ \vdots\\ z(k_{N_{k}+\underline{N}_{k}})\end{bmatrix},\quad \overline{z}_{k}:=\begin{bmatrix}z_{k}\\ \underline{z}_{k}\end{bmatrix} \quad (16)

where $\mathcal{N}_{k}=\{k_{1},k_{2},\cdots,k_{N_{k}}\}$, $\underline{\mathcal{N}}_{k}=\{k_{N_{k}+1},k_{N_{k}+2},\cdots,k_{N_{k}+\underline{N}_{k}}\}$, $N_{k}$ is the number of variables in partition $\mathcal{N}_{k}$, and $\underline{N}_{k}$ is the number of variables coupled to partition $k$. With this, we can express problem (15) in the following lifted form:

\min_{z}\quad \sum_{k\in\mathcal{K}}\frac{1}{2}\overline{z}_{k}^{T}\overline{Q}_{k}\overline{z}_{k}-\overline{c}_{k}^{T}\overline{z}_{k} \quad (17a)
\text{s.t.}\quad \Pi_{kk}\overline{z}_{k}+\sum_{k^{\prime}\in\mathcal{K}\setminus\{k\}}\Pi_{kk^{\prime}}\overline{z}_{k^{\prime}}=0,\quad k\in\mathcal{K}. \quad (17b)

Here, $\overline{Q}_{k}\in\mathbb{R}^{(N_{k}+\underline{N}_{k})\times(N_{k}+\underline{N}_{k})}$ and $\overline{c}_{k}\in\mathbb{R}^{N_{k}+\underline{N}_{k}}$ are given by:

(\overline{Q}_{k})_{ij}=\begin{cases}Q_{k_{i}k_{j}},&\text{for }k_{i},k_{j}\in\mathcal{N}_{k}\\ \frac{1}{2}Q_{k_{i}k_{j}},&\text{for }k_{i}\in\mathcal{N}_{k}\text{ and }k_{j}\notin\mathcal{N}_{k}\\ \frac{1}{2}Q_{k_{i}k_{j}},&\text{for }k_{i}\notin\mathcal{N}_{k}\text{ and }k_{j}\in\mathcal{N}_{k}\\ 0,&\text{otherwise}\end{cases},\qquad (\overline{c}_{k})_{i}=\begin{cases}c_{k_{i}},&\text{for }k_{i}\in\mathcal{N}_{k}\\ 0,&\text{for }k_{i}\notin\mathcal{N}_{k}\end{cases} \quad (18)

The coefficient matrices $\Pi_{kk}\in\mathbb{R}^{\underline{N}_{k}\times(N_{k}+\underline{N}_{k})}$ are given by:

\Pi_{kk}=\begin{bmatrix}e_{N_{k}+1}^{T}\\ \vdots\\ e_{N_{k}+\underline{N}_{k}}^{T}\end{bmatrix}, \quad (19)

where $e_{i}\in\mathbb{R}^{N_{k}+\underline{N}_{k}}$ are elementary column vectors. Note that a cross term $Q_{ij}\neq 0$ with $i\in\mathcal{N}_{k}$, $j\in\mathcal{N}_{k^{\prime}}$, and $k\neq k^{\prime}$ can be included in the objective function of partition $k$, of partition $k^{\prime}$, or of both. In the lifting scheme shown in (18), we assume that these terms are divided equally between the two partitions. This choice is arbitrary, however, and other lifting schemes are possible; in other words, we can manipulate the lifting scheme so that the partition sets satisfy either $j\in\underline{\mathcal{N}}_{k}$ or $j\notin\underline{\mathcal{N}}_{k}$. Interestingly, one can show that the solution of the lifted problem is unique and coincides with that of problem (13). The proof of this assertion is intricate and is omitted due to lack of space. To simplify the notation, we write the lifted variables $\bar{z}_{k}$, matrices $\bar{Q}_{k}$, and cost vectors $\bar{c}_{k}$ in (17) simply as $z_{k},Q_{k},c_{k}$.
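The equivalence between the lifted problem (17) and the original problem (13) can at least be verified numerically on a small instance. The sketch below uses hypothetical data (a tridiagonal $Q$ with $N=4$, $c=\mathbf{1}$, partitions $\{1,2\}$ and $\{3,4\}$) and assembles the stationarity and coupling conditions of (17) directly.

```python
import numpy as np

# Numerical check that the lifted problem (17) reproduces the solution of (13),
# on a hypothetical tridiagonal instance: N = 4, c = 1, partitions {1,2}, {3,4}.
Q = np.diag([2.0] * 4) + np.diag([-1.0] * 3, 1) + np.diag([-1.0] * 3, -1)
c = np.ones(4)
z_true = np.linalg.solve(Q, c)               # centralized solution of (13)

# Lifted data per (16)-(19): z1bar = (z1, z2, dup of z3), z2bar = (z3, z4, dup of z2);
# the cross terms Q_{23} = Q_{32} = -1 receive weight 1/2 in each partition.
Q1 = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -0.5], [0.0, -0.5, 0.0]])
Q2 = np.array([[2.0, -1.0, -0.5], [-1.0, 2.0, 0.0], [-0.5, 0.0, 0.0]])
c1 = np.array([1.0, 1.0, 0.0]); c2 = c1.copy()
P11 = np.array([[0.0, 0.0, 1.0]]); P12 = np.array([[-1.0, 0.0, 0.0]])  # dup = z3
P22 = np.array([[0.0, 0.0, 1.0]]); P21 = np.array([[0.0, -1.0, 0.0]])  # dup = z2

# Assemble and solve the stationarity/coupling system of (17)
O = np.zeros
K = np.block([
    [Q1,        P11.T,     O((3, 3)), P21.T],
    [P11,       O((1, 1)), P12,       O((1, 1))],
    [O((3, 3)), P12.T,     Q2,        P22.T],
    [P21,       O((1, 1)), P22,       O((1, 1))],
])
sol = np.linalg.solve(K, np.concatenate([c1, [0.0], c2, [0.0]]))
z_lift = np.concatenate([sol[0:2], sol[4:6]])  # owned variables of each partition
```

For this instance the owned variables of the lifted solution coincide with the centralized solution $z=(2,3,3,2)$, and each duplicate matches its original.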

The primal-dual solution of (17) can be obtained by solving the KKT conditions:

[Q1Π11TΠ21TΠK1TΠ11Π12Π1KΠ12TQ2Π22TΠ21Π22Π1KTQKΠKKTΠK1ΠKK][z1λ1z2λ2zKλK]=[c10c20cK0]\begin{bmatrix}Q_{1}&\Pi_{11}^{T}&&\Pi_{21}^{T}&&\cdots&&\Pi_{K1}^{T}\\ \Pi_{11}&&\Pi_{12}&&\cdots&&\Pi_{1K}&\\ &\Pi_{12}^{T}&Q_{2}&\Pi_{22}^{T}&&&&\vdots\\ \Pi_{21}&&\Pi_{22}&&&&\vdots&\\ &\vdots&&&\ddots&&&\vdots\\ \vdots&&&&&\ddots&\vdots&\\ &\Pi_{1K}^{T}&&\cdots&&\cdots&Q_{K}&\Pi_{KK}^{T}\\ \Pi_{K1}&&\cdots&&\cdots&&\Pi_{KK}&\\ \end{bmatrix}\begin{bmatrix}z_{1}\\ \lambda_{1}\\ z_{2}\\ \lambda_{2}\\ \vdots\\ z_{K}\\ \lambda_{K}\end{bmatrix}=\begin{bmatrix}c_{1}\\ {0}\\ c_{2}\\ {0}\\ \vdots\\ c_{K}\\ {0}\end{bmatrix} (20)

By exploiting the structure of this system, we can derive a GS scheme of the form:

[QkΠkkTΠkk][zk+1λk+1]=[ck0]k=1k1[0ΠkkTΠkk0][zk+1λk+1]k=k+1K[0ΠkkTΠkk0][zkλk].\begin{bmatrix}Q_{k}&\Pi_{kk}^{T}\\ \Pi_{kk}&\end{bmatrix}\begin{bmatrix}z_{k}^{\ell+1}\\ \lambda_{k}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{k}\\ {0}\end{bmatrix}-\sum_{k^{\prime}=1}^{k-1}\begin{bmatrix}0&\Pi_{k^{\prime}k}^{T}\\ \Pi_{kk^{\prime}}&0\end{bmatrix}\begin{bmatrix}z_{k^{\prime}}^{\ell+1}\\ \lambda_{k^{\prime}}^{\ell+1}\end{bmatrix}-\sum_{k^{\prime}=k+1}^{K}\begin{bmatrix}0&\Pi_{k^{\prime}k}^{T}\\ \Pi_{kk^{\prime}}&0\end{bmatrix}\begin{bmatrix}z_{k^{\prime}}^{\ell}\\ \lambda_{k^{\prime}}^{\ell}\end{bmatrix}. (21)

Here, we have used a lexicographic coordination order. We note that the solution of the linear system (21) solves the optimization problem:

zk+1=argminzk\displaystyle z_{k}^{\ell+1}=\mathop{\textrm{argmin}}_{z_{k}} 12zkTQkzkzkT(ckk=1k1ΠkkTλk+1k=k+1KΠkkTλk)\displaystyle\quad\frac{1}{2}{z_{k}}^{T}Q_{k}z_{k}-{{{z}}}_{k}^{T}\left({{{c}_{k}}}-\sum^{k-1}_{k^{\prime}=1}\Pi_{k^{\prime}k}^{T}\lambda_{k^{\prime}}^{\ell+1}-\sum^{K}_{k^{\prime}=k+1}\Pi_{k^{\prime}k}^{T}\lambda_{k^{\prime}}^{\ell}\right) (22a)
s.t. Πkkzk+k=1k1Πkkzk+1+k=k+1KΠkkzk=0(λk).\displaystyle\quad{\Pi}_{kk}z_{k}+\sum_{k^{\prime}=1}^{k-1}{\Pi}_{kk^{\prime}}z_{k^{\prime}}^{\ell+1}+\sum_{k^{\prime}=k+1}^{K}{\Pi}_{kk^{\prime}}z_{k^{\prime}}^{\ell}=0\quad(\lambda_{k}). (22b)

From this structure, we can see how the primal and dual variables propagate forward and backward relative to partition $k$, due to the inherent block triangular nature of the GS scheme. We now seek to establish a condition that guarantees convergence of the GS scheme in this more general setting. To do so, we define:

Ak:=[QkΠkkTΠkk],xk:=[zkλk],bk:=[ck0],Bkk:=[ΠkkTΠkk].A_{k}:=\begin{bmatrix}Q_{k}&{\Pi_{kk}}^{T}\\ \Pi_{kk}&\end{bmatrix},\quad{x}_{k}^{\ell}:=\begin{bmatrix}{z}_{k}^{\ell}\\ {\lambda}_{k}^{\ell}\end{bmatrix},\quad{b}_{k}:=\begin{bmatrix}c_{k}\\ {0}\end{bmatrix},\quad B_{kk^{\prime}}:=\begin{bmatrix}&-{\Pi_{k^{\prime}k}}^{T}\\ -\Pi_{kk^{\prime}}&\end{bmatrix}. (23)

By using (23), we can express (21) as:

Akxk+1=bk+k=1k1Bkkxk+1+k=k+1KBkkxk.A_{k}{x}_{k}^{\ell+1}={b}_{k}+\sum^{k-1}_{k^{\prime}=1}B_{kk^{\prime}}{x}_{k^{\prime}}^{\ell+1}+\sum^{K}_{k^{\prime}=k+1}B_{kk^{\prime}}{x}_{k^{\prime}}^{\ell}. (24)

This can be expressed in matrix form as:

[A1B21A2BK1BKK1AK][x1+1x2+1xK+1]=[B12B1KBK1K][x1x2xK]+[b1b2bK].\displaystyle\begin{bmatrix}A_{1}&&&\\ -B_{21}&A_{2}&&\\ \vdots&\ddots&\ddots&\\ -B_{K1}&\cdots&-B_{KK-1}&A_{K}\end{bmatrix}\begin{bmatrix}{x}_{1}^{\ell+1}\\ {x}_{2}^{\ell+1}\\ \vdots\\ {x}_{K}^{\ell+1}\end{bmatrix}=\begin{bmatrix}\quad&B_{12}&\cdots&B_{1K}\\ \quad&&\ddots&\vdots\\ \quad&&&B_{K-1K}\\ \quad\end{bmatrix}\begin{bmatrix}{x}_{1}^{\ell}\\ {x}_{2}^{\ell}\\ \vdots\\ {x}_{K}^{\ell}\end{bmatrix}+\begin{bmatrix}{b}_{1}\\ {b}_{2}\\ \vdots\\ {b}_{K}\end{bmatrix}. (25)
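To make the iteration concrete, the following minimal sketch (not from the paper; all data are synthetic stand-ins for the partitioned system (20)) applies the block GS update (24) to a small symmetric positive definite system and compares the result against a direct solve. The function name `block_gs` and the block sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the partitioned KKT system (20): a random
# symmetric positive definite matrix split into K square diagonal blocks.
K, nb = 4, 3                       # number of partitions, block size
n = K * nb
G = rng.standard_normal((n, n))
M = G @ G.T + n * np.eye(n)        # SPD, so block GS converges
b = rng.standard_normal(n)

def block_gs(M, b, K, nb, sweeps=100):
    """Block Gauss-Seidel sweeps corresponding to the update (24)."""
    x = np.zeros(K * nb)
    blk = lambda k: slice(k * nb, (k + 1) * nb)
    for _ in range(sweeps):
        for k in range(K):
            # b_k minus coupling with the other blocks; since x is
            # updated in place, blocks k' < k already hold new values
            r = b[blk(k)].copy()
            for kp in range(K):
                if kp != k:
                    r -= M[blk(k), blk(kp)] @ x[blk(kp)]
            x[blk(k)] = np.linalg.solve(M[blk(k), blk(k)], r)
    return x

x_gs = block_gs(M, b, K, nb)
x_ref = np.linalg.solve(M, b)
print(np.linalg.norm(x_gs - x_ref))
```

Each inner solve plays the role of one partition subproblem; only coupling terms with neighboring blocks enter the right-hand side.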

We can see that the partition matrices $A_{k}$ are nonsingular by inspecting the block structure of $Q_{k}$. In particular, using the definition of $\overline{Q}_{k}$ in (18a) (recall that we write $\overline{Q}_{k}$ simply as $Q_{k}$) and the definition of $\Pi_{kk}$ in (19), we can express $A_{k}$ as:

Ak=[Q^kQ¯kTQ¯kII]A_{k}=\begin{bmatrix}\hat{Q}_{k}&\underline{Q}_{k}^{T}&\\ {\underline{Q}_{k}}&&I\\ &I&\end{bmatrix} (26)

where the components of $\hat{Q}_{k}\in\mathbb{R}^{N_{k}\times N_{k}}$ and $\underline{Q}_{k}\in\mathbb{R}^{\underline{N}_{k}\times N_{k}}$ are defined in (18a). Since $Q$ is positive definite, $\hat{Q}_{k}$ is also positive definite and hence nonsingular. It follows that the columns of $A_{k}$ are linearly independent and thus $A_{k}$ is nonsingular as well. This implies that the block lower triangular matrix on the left-hand side of (25) is also nonsingular. To simplify notation we define:

w\displaystyle{w}^{\ell} :=[x1x2xK],r:=[A1B21A2BK1BKK1AK]1[b1b2bK]\displaystyle:=\begin{bmatrix}{x}_{1}^{\ell}\\ {x}_{2}^{\ell}\\ \vdots\\ {x}_{K}^{\ell}\end{bmatrix},\quad{r}:=\begin{bmatrix}A_{1}&&&\\ -B_{21}&A_{2}&&\\ \vdots&\ddots&\ddots&\\ -B_{K1}&\cdots&-B_{KK-1}&A_{K}\end{bmatrix}^{-1}\begin{bmatrix}{b}_{1}\\ {b}_{2}\\ \vdots\\ {b}_{K}\end{bmatrix} (27a)
S\displaystyle S :=[A1B21A2BK1BKK1AK]1[B12B1KBK1K].\displaystyle:=\begin{bmatrix}A_{1}&&&\\ -B_{21}&A_{2}&&\\ \vdots&\ddots&\ddots&\\ -B_{K1}&\cdots&-B_{KK-1}&A_{K}\end{bmatrix}^{-1}\begin{bmatrix}\quad&B_{12}&\cdots&B_{1K}\\ \quad&&\ddots&\vdots\\ \quad&&&B_{K-1K}\\ \quad\end{bmatrix}. (27b)

We express (25) in the compact form $w^{\ell+1}=Sw^{\ell}+r$, which yields $w^{\ell}=S^{\ell}w^{0}+\left(I+S+\cdots+S^{\ell-1}\right)r$. The solution of (20) satisfies $w=Sw+r$. This implies that the solution of the lifted problem also solves the original problem (13). We now formally establish the following convergence result for the GS scheme.

Proposition 1

The GS scheme (22) converges to the solution of (13) if all the eigenvalues of the matrix:

Σ:=[Q1Π11TQ2Π12TΠ22TQKΠ1KTΠK1KTΠKKTΠ11Π21Π22ΠK1ΠKK1ΠKK]1[Π21TΠK1TΠKK1TΠ12Π1KΠK1K]\displaystyle\begin{split}\Sigma:=\begin{bmatrix}Q_{1}&&&&\Pi_{11}^{T}&&&\\ &Q_{2}&&&\Pi_{12}^{T}&\Pi_{22}^{T}&&\\ &&\ddots&&\vdots&\ddots&\ddots&\\ &&&Q_{K}&\Pi_{1K}^{T}&\cdots&\Pi_{K-1K}^{T}&\Pi_{KK}^{T}\\ \Pi_{11}&&&&&&&\\ \Pi_{21}&\Pi_{22}&&&&&&\\ \vdots&\ddots&\ddots&&&&&\\ \Pi_{K1}&\cdots&\Pi_{KK-1}&\Pi_{KK}&&&&\\ \end{bmatrix}^{-1}\begin{bmatrix}&&&&&-\Pi_{21}^{T}&\cdots&-\Pi_{K1}^{T}\\ &&&&&&\ddots&\vdots\\ &&&&&&&-\Pi_{KK-1}^{T}\\ &&&&&&&\\ \quad&-\Pi_{12}&\cdots&-\Pi_{1K}&&&&\\ &&\ddots&\vdots&&&&\\ &&&-\Pi_{K-1K}&&&&\\ &&&&&&&\\ \end{bmatrix}\end{split}

have magnitude less than one.

Proof

Matrix $\Sigma$ is a permutation of matrix $S$. The permutation is a similarity transformation, so $S$ has the same eigenvalues as $\Sigma$ and, by assumption, all eigenvalues of $S$ have magnitude less than one. This implies that $S^{\ell}\rightarrow 0$ exponentially as $\ell\rightarrow\infty$ and that the series $I+S+S^{2}+\cdots$ converges to $(I-S)^{-1}$; note that $I-S$ is invertible since all eigenvalues of $S$ have magnitude less than one. Accordingly, the iterates of the scheme (22) satisfy $\lim_{\ell\rightarrow\infty}{w}^{\ell}=(I-S)^{-1}{r}$. The limit $(I-S)^{-1}{r}$ is the unique solution of the equation $w=Sw+r$ and thus solves (17). Problem (17) has the same solution as problem (13), and both problems have unique solutions. Thus, the GS scheme (22) converges to the solution of (13). \Box

Convergence is achieved without any assumptions on the initial guess $w^{0}$. The error between the solution of problem (13) and the $\ell$-th coordination update of the GS scheme (21) is $\epsilon^{\ell}=S^{\ell}\left((I-S)^{-1}r-w^{0}\right)$. We call $\|\epsilon^{\ell}\|$ the error of the $\ell$-th coordination step. Writing the initial error as $\epsilon^{0}$, we have $\|\epsilon^{\ell}\|=\|S^{\ell}\epsilon^{0}\|$. Moreover, if we define the spectral radius $\rho(S):=\max_{i}|\lambda_{i}(S)|$ then, for any $\delta>0$, there exists $\kappa>0$ such that the bound $\|S^{\ell}\|\leq\kappa|\rho(S)+\delta|^{\ell}$ holds for all $\ell$, where $\|\cdot\|$ is a matrix norm. Using this inequality we can establish that $\|\epsilon^{\ell}\|\leq\kappa|\rho(S)+\delta|^{\ell}\|\epsilon^{0}\|$. Since $\delta$ can be chosen arbitrarily small, the decay rate of the error is $O\left(\rho(S)^{\ell}\right)$. Although the error decays exponentially, if $\rho(S)$ is close to one convergence can be slow and a good initial guess $w^{0}$ becomes essential. We also note that $\rho(S)$ is closely related to the topology of the coupling between partitions, indicating that the partition structure affects the convergence rate.
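This error bound can be observed numerically. The sketch below (synthetic data, not the paper's system) forms the GS iteration matrix $S$ from the lower/upper splitting of a small SPD matrix and tracks the error decay of the fixed-point iteration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Small SPD stand-in for the KKT matrix; GS splitting M = (D + L) + U.
n = 8
G = rng.standard_normal((n, n))
M = G @ G.T + 2 * n * np.eye(n)
b = rng.standard_normal(n)

DL = np.tril(M)                        # lower triangular part incl. diagonal
S = np.linalg.solve(DL, DL - M)        # iteration matrix, cf. (27b)
r = np.linalg.solve(DL, b)

rho = max(abs(np.linalg.eigvals(S)))   # spectral radius rho(S) < 1 (SPD case)
x_star = np.linalg.solve(M, b)         # fixed point of x = S x + r

w = np.zeros(n)
errs = []
for _ in range(60):
    w = S @ w + r                      # w^{l+1} = S w^l + r
    errs.append(np.linalg.norm(w - x_star))

# the error norm decays roughly like rho(S)**l
print(rho, errs[0], errs[-1])
```

A smaller spectral radius (looser coupling between partitions) gives faster decay, which is the quantitative content of the $O(\rho(S)^{\ell})$ rate.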

We highlight that the GS concepts discussed here focus on problems without inequality constraints. In practice, however, GS schemes can also be applied to problems with inequality constraints by using projected GS schemes zavalamultigrid ; borzi2005multigrid ; zavala2010real .

2.3 Coordination Orders

In the lexicographic coordination order proposed in (21), we use the update sequence $k=1,2,\cdots,K$. We note that the order affects the structure of the matrix $\Sigma$ and thus convergence can be affected as well. To see this, consider a new order given by the sequence $\sigma(1),\sigma(2),\cdots,\sigma(K)$, where $\sigma:\{1,2,\cdots,K\}\rightarrow\{1,2,\cdots,K\}$ is a bijective mapping. We can use this mapping to rearrange the partition variables as:

[z1,λ1,z2,λ2,zK,λK][zσ(1),λσ(1),zσ(2),λσ(2),zσ(K),λσ(K)].[z_{1},\lambda_{1},z_{2},\lambda_{2},\dots z_{K},\lambda_{K}]\rightarrow[z_{\sigma(1)},\lambda_{\sigma(1)},z_{\sigma(2)},\lambda_{\sigma(2)},\dots z_{\sigma(K)},\lambda_{\sigma(K)}]. (28)

This gives the reordered matrix:

S=[Aσ(1)Bσ(2)σ(1)Aσ(2)Bσ(K)σ(1)Bσ(K)σ(K1)Aσ(K)]1[Bσ(1)σ(2)Bσ(1)σ(K)Bσ(K1)σ(K)]\displaystyle S=\begin{bmatrix}A_{\sigma(1)}&&&\\ -B_{\sigma(2)\sigma(1)}&A_{\sigma(2)}&&\\ \vdots&\ddots&\ddots&\\ -B_{\sigma(K)\sigma(1)}&\cdots&-B_{\sigma(K)\sigma(K-1)}&A_{\sigma(K)}\end{bmatrix}^{-1}\begin{bmatrix}\quad&B_{\sigma(1)\sigma(2)}&\cdots&B_{\sigma(1)\sigma(K)}\\ \quad&&\ddots&\vdots\\ \quad&&&B_{\sigma(K-1)\sigma(K)}\\ \quad\end{bmatrix} (29)

Importantly, the change in update order is not necessarily a similarity transformation of matrix $S$ and thus the eigenvalues can change. The GS scheme converges as long as the eigenvalues of the reordered matrix $S$ have magnitude less than one, but the convergence rate will be affected. Interestingly, GS schemes are highly flexible and allow for a large number of update orders. For instance, in some cases one can derive ordering sequences that enable parallelization. As an example, the 1-D spatial problem has a special structure with $B_{kk^{\prime}}=0$ for any $(k,k^{\prime})$ such that $|k-k^{\prime}|\geq 2$. The GS scheme becomes:

Akxk+1\displaystyle A_{k}{x}_{k}^{\ell+1} =bk+Bkk1xk1+1+Bkk+1xk+1\displaystyle={b}_{k}+B_{kk-1}{x}_{k-1}^{\ell+1}+B_{kk+1}{x}_{k+1}^{\ell} (30)

Instead, we consider the ordering $\sigma(i)=2i-1$ for $1\leq i\leq\frac{K}{2}$ and $\sigma(i)=2i-K$ for $\frac{K}{2}+1\leq i\leq K$, where we assume that the number of partitions is even. This is called a red-black ordering and is widely used in the solution of PDEs. By changing the index using $\sigma(\cdot)$, we can express (30) as:

Aσ(i)xσ(i)+1\displaystyle A_{\sigma(i)}{x}_{\sigma(i)}^{\ell+1} =bσ(i)+Bσ(i)σ(i+K2)xσ(i+K2),i=1\displaystyle={b}_{\sigma(i)}+B_{{\sigma(i)}{\sigma(i+\frac{K}{2})}}{x}_{\sigma(i+\frac{K}{2})}^{\ell},\quad i=1 (31a)
\displaystyle A_{\sigma(i)}{x}_{\sigma(i)}^{\ell+1}=b_{\sigma(i)}+B_{{\sigma(i)}{\sigma(i+\frac{K}{2}-1)}}{x}_{\sigma(i+\frac{K}{2}-1)}^{\ell}+B_{{\sigma(i)}{\sigma(i+\frac{K}{2})}}{x}_{\sigma(i+\frac{K}{2})}^{\ell},\quad 2\leq i\leq\frac{K}{2} (31b)

and

Aσ(i)xσ(i)+1\displaystyle A_{\sigma(i)}{x}_{\sigma(i)}^{\ell+1} =bσ(i)+Bσ(i)σ(iK2)xσ(iK2)+1+Bσ(i)σ(iK2+1)xσ(iK2+1)+1,K2+1iK1.\displaystyle={b}_{\sigma(i)}+B_{{\sigma(i)}{\sigma(i-\frac{K}{2})}}{x}_{\sigma(i-\frac{K}{2})}^{\ell+1}+B_{{\sigma(i)}{\sigma(i-\frac{K}{2}+1)}}{x}_{\sigma(i-\frac{K}{2}+1)}^{\ell+1},\quad\frac{K}{2}+1\leq i\leq K-1. (32a)
Aσ(i)xσ(i)+1\displaystyle A_{\sigma(i)}{x}_{\sigma(i)}^{\ell+1} =bσ(i)+Bσ(i)σ(iK2)xσ(iK2)+1,i=K.\displaystyle={b}_{\sigma(i)}+B_{{\sigma(i)}{\sigma(i-\frac{K}{2})}}{x}_{\sigma(i-\frac{K}{2})}^{\ell+1},\quad i=K. (32b)

We can thus see that the solutions of (31) can be computed independently for all $1\leq i\leq\frac{K}{2}$, because these partitions depend only on the solutions of (32) and not on each other. Likewise, the solutions of (32) can be computed independently for all $\frac{K}{2}+1\leq i\leq K$. Red-black ordering thus enables parallelism. Different coordination orders for 1-D and 2-D meshes are presented in Figures 3 and 4.
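The red-black idea can be sketched on the scalar analogue of (30), a tridiagonal system: all red unknowns update simultaneously from black values, then all black unknowns update from the fresh red values. This is an illustrative sketch with synthetic data; the vectorized half-sweeps stand in for parallel partition solves.

```python
import numpy as np

# 1-D chain: tridiagonal SPD system, the scalar analogue of (30).
n = 10                                   # number of "partitions" (even)
M = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

red = np.arange(0, n, 2)                 # 0-based analogue of the first half of sigma
black = np.arange(1, n, 2)

x = np.zeros(n)
for _ in range(200):
    # red half-sweep: each red unknown couples only to black neighbors,
    # so all red updates are independent (vectorized = parallel)
    x[red] = (b[red] - M[np.ix_(red, black)] @ x[black]) / M[red, red]
    # black half-sweep: uses the freshly updated red values
    x[black] = (b[black] - M[np.ix_(black, red)] @ x[red]) / M[black, black]

print(np.linalg.norm(x - np.linalg.solve(M, b)))
```

Within each color, no unknown reads another unknown of the same color, which is exactly the independence exploited in (31)-(32).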

Refer to caption
Figure 3: Sketch of 1-D ordering methods.
Refer to caption
Figure 4: Sketch of 2-D ordering methods.

2.4 Coarsening

Low-complexity coarse versions of the full-resolution problem (1) can be solved to obtain an initial guess for the GS scheme and thereby accelerate coordination. To illustrate how this can be done, we use the following representation of (1):

minz\displaystyle\min_{{z}}\quad 12zTQzcTz\displaystyle\frac{1}{2}{z}^{T}Q{z}-{c}^{T}{z} (33a)
s.t. [AΠ]A¯z+[B0]B¯d=0(ν,λ)\displaystyle\underbrace{\begin{bmatrix}A\\ \Pi\end{bmatrix}}_{\bar{A}}{z}+\underbrace{\begin{bmatrix}B\\ 0\end{bmatrix}}_{\bar{B}}d=0\quad(\nu,{\lambda}) (33b)

Our goal is to obtain a substantial reduction in the dimension of this problem by introducing a mapping from a coarse variable space to the original space. This is represented by the linear mapping $z=T\tilde{z}$, where $\tilde{z}\in\mathbb{R}^{N_{c}\cdot n_{z}}$ is the coarse variable and the mapping $T\in\mathbb{R}^{N\cdot n_{z}\times N_{c}\cdot n_{z}}$ (a prolongation operator, whose transpose acts as a restriction) is assumed to have full column rank with $N_{c}<N$. We can thus pose the low-dimensional coarse problem:

\displaystyle\min_{\tilde{z}}\quad\displaystyle\frac{1}{2}{\tilde{z}}^{T}T^{T}QT{\tilde{z}}-{c}^{T}T{\tilde{z}} (34a)
s.t. UTA¯Tz~+UTB¯d=0(ν~,λ~)\displaystyle U^{T}\bar{A}T{\tilde{z}}+U^{T}\bar{B}d=0\quad(\tilde{\nu},\tilde{\lambda}) (34b)

A key issue that arises in coarsening is that the columns of the matrix $\bar{A}T$ do not necessarily span the entire range space of $\bar{B}$ (i.e., we might not be able to find a coarse variable $\tilde{z}$ that satisfies $\bar{A}T\tilde{z}+\bar{B}d=0$). Consequently, we introduce a constraint aggregation matrix $U$ with full column rank to ensure that the range space of $U^{T}\bar{A}T$ contains that of $U^{T}\bar{B}$. With this, we can ensure that the feasible set of (34) is non-empty.

After solving the coarse problem (34), we can project the primal-dual solution from the coarse space to the original space. We note that the dimension of the dual space is also reduced because we performed constraint aggregation. The projection for the primal solution is $z=T\tilde{z}$, while the projection for the dual solution is $(\nu,\lambda)=U(\tilde{\nu},\tilde{\lambda})$. The derivation of coarsened representations is application-dependent and often requires domain-specific knowledge. In particular, coarsening can also be performed by using reduced-order modeling techniques such as proper orthogonal decomposition antoulas or coherency-based network aggregation schemes kokotovic1982coherency . In the following sections we demonstrate how to derive coarse representations in certain settings.
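The following sketch illustrates the coarsening idea on an unconstrained analogue of (33)-(34): an aggregation-based prolongation $T$ (piecewise constant over groups of variables) yields the coarse problem $(T^{T}QT)\tilde{z}=T^{T}c$, whose prolonged solution serves as an initial guess. The constraint blocks and the aggregation matrix $U$ are omitted here, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unconstrained analogue of (33)-(34): min 0.5 z'Qz - c'z.
N, Nc = 16, 4
G = rng.standard_normal((N, N))
Q = G @ G.T + N * np.eye(N)            # positive definite
c = rng.standard_normal(N)

# piecewise-constant prolongation: each coarse variable drives a
# contiguous group of N/Nc fine variables (full column rank)
T = np.kron(np.eye(Nc), np.ones((N // Nc, 1)))

# coarse problem, cf. (34): (T' Q T) z_tilde = T' c
z_tilde = np.linalg.solve(T.T @ Q @ T, T.T @ c)
z0 = T @ z_tilde                       # prolong back to the fine space

# Galerkin optimality: z0 is the Q-norm best approximation of the true
# solution in range(T), so it is never worse than the zero initial guess
z_star = np.linalg.solve(Q, c)
eQ = lambda v: v @ Q @ v
print(eQ(z0 - z_star) <= eQ(z_star))
```

The Q-norm optimality of the coarse solution is what makes it a sensible warm start: it removes the low-frequency (aggregate) component of the error before the smoother takes over.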

2.5 Multi-Grid Schemes and Hierarchical Coordination

Multi-grid serves as a bridge between fully centralized and fully decentralized coordination schemes. In particular, a fully centralized scheme would aim to find a solution of the full-resolution problem (2) by gathering all the information in a single processing unit. A fully decentralized scheme such as GS, on the other hand, would proceed by finding solutions to subproblems (3) over each partition, with information shared only between connected partitions through the coordination variables. A drawback of a decentralized scheme is that a potentially large number of coordination steps might be needed to reach a solution of the full-resolution problem, particularly when many partitions are present.

In a multi-grid scheme, we seek to aid the decentralized scheme by using information from a low-resolution central problem that oversees the entire domain. In our context, the information is in the form of states and dual variables defined over the partition interfaces (i.e., the coupling variables and constraints). The key idea of this hierarchical arrangement is that the coarse central scheme can capture effects occurring at low global frequencies while the agents in the decentralized schemes can handle effects occurring at high local frequencies. As can be seen, multi-grid provides a framework to design hierarchical control architectures by leveraging existing and powerful reduced order modeling techniques such as coherency-based aggregation and decentralized control schemes.

Multi-grid is a widely studied computational paradigm. The framework proposed here presents basic elements of this paradigm, but diverse extensions are possible borzi ; brandtalgebraic . For instance, the scheme proposed here involves only a coarse and a fine resolution level, but one can create multi-level schemes that transfer information between multiple scales recursively by using meshes of diverse resolutions. This allows us to cover a wider range of frequencies present in the system. In the following section we illustrate how sequential coarsening can be beneficial.

From an electricity markets perspective, we highlight that multi-grid schemes provide a framework to coordinate transactions at multiple spatial and temporal scales. To see this, consider the case of spatial coordination of electricity markets. In such a setting, we can interpret each partition of the spatial domain as a market player (e.g., a microgrid). The market players have internal resources (e.g., distributed energy resources) that they manage to satisfy their internal load. The players, however, can also transact energy with other players in order to improve their economic performance. The proposed GS scheme provides a mechanism to handle intra-partition decision-making (by solving the partition subproblems) and inter-partition transactions (by exchanging state information such as voltages and dual information such as nodal prices). If transaction information is exchanged multiple times (corresponding to multiple GS iterates), the GS scheme will converge to an equilibrium point corresponding to the solution of the centralized economic maximization problem (e.g., the social welfare problem). This is a useful property of decentralized coordination because centralization of information and decision-making is often impractical. If the players exchange information only once (or a handful of times), they might not reach an optimal equilibrium and an inefficiency will be introduced. Moreover, when a disturbance affects the system, many GS iterations might be needed to reach the new optimal equilibrium. This is where hierarchical optimization becomes beneficial, because one can solve an aggregated spatial representation of the system (in which each partition is treated as a node) to compute approximate dual variables and states at the interfaces of the partitions. This approximation can be used to aid the convergence of the decentralized GS scheme (by conveying global spatial information to local market players).
One can think of the coarse high-level problem as a system operator (supervisor) problem (e.g., at the distribution level). The operator might, at the same time, need to coordinate with other system operators (each of which oversees its own set of market players). These system operators can in turn be aggregated into a higher level representing, for instance, a transmission or regional operator. We can thus see that hierarchical multi-grid schemes enable scalable coordination of a potentially large number of market players over large geographical regions. The hierarchical multi-grid scheme can also be applied to handle multiple timescales of a single market player that needs to manage, for instance, assets with different dynamic characteristics.

3 Case Studies

We now present numerical case studies to demonstrate the concepts in the context of temporal and spatial management of energy systems. We use a multi-scale (in time) optimization problem with the features of a storage management problem and a multi-scale (in space) optimization problem that considers power flow dispatch over a network.

3.1 Multi-Scale Temporal Control

We use a multi-grid scheme to solve the following temporal planning problem $\mathcal{P}$:

minx,u\displaystyle\min_{{x},{u}}\quad i𝒩(x(i)2+u(i)2)\displaystyle\sum_{i\in\mathcal{N}}(x(i)^{2}+{u(i)}^{2}) (35a)
s.t. x(i+1)=x(i)+δ(u(i+1)+d(i+1)),i𝒩\displaystyle x(i+1)=x(i)+\delta(u(i+1)+d(i+1)),\quad i\in\mathcal{N} (35b)
x(0)=0.\displaystyle x(0)=0. (35c)

This problem has a state, a control, and a disturbance defined over $N=M\cdot K$ time points contained in the set $\mathcal{N}$. The state and control are grouped into the decision variable $z(i)=(x(i),u(i))$. The distance between mesh points is given by $\delta$. The structure of this problem resembles that of an inventory (storage) problem in which the disturbance $d(i)$ is a load and $u(i)$ is a charge/discharge flow from the storage. In these types of problems, the load might have frequencies covering multiple timescales (e.g., from seasonal and daily down to seconds). Consequently, the time mesh $\delta$ has to be rather fine to capture all the frequencies. Moreover, the planning horizon (the time domain $N\cdot\delta$) might need to be long so as to capture the low-frequency components of the load. We partition the problem into $K$ partitions, each containing $M$ points. The set of inner points in a partition is defined as $\mathcal{M}$. The optimization problem over a partition $k$ solved in the GS scheme is given by:

minxk,uk\displaystyle\min_{{x}_{k},u_{k}}\quad i(xk(i)2+uk(i)2)+xk(M)λk+1\displaystyle\sum_{i\in\mathcal{M}}\left(x_{k}(i)^{2}+{u_{k}(i)}^{2}\right)+x_{k}(M)\lambda_{k+1}^{\ell} (36a)
s.t. xk(i+1)=xk(i)+δ(uk(i+1)+dk(i+1)),i\displaystyle x_{k}(i+1)=x_{k}(i)+\delta(u_{k}(i+1)+d_{k}(i+1)),\quad i\in\mathcal{M} (36b)
xk(0)=xk1+1(M)(λk).\displaystyle x_{k}(0)=x_{k-1}^{\ell+1}(M)\quad(\lambda_{k}). (36c)

In our numerical experiments, we set $K=10$ and $M=100$, which gives $N=1{,}000$ points. The time mesh points were set to $t(i)=i\cdot\delta$ with $\delta=0.1$. We use a disturbance signal composed of a low and a high frequency: $d(i)=4\sin\left(\frac{4\pi i}{N}\right)+\sin\left(\frac{24\pi i}{N}\right)$, $i\in\mathcal{N}$. The disturbance signal and its frequency components are shown in Figure 5.

Refer to caption
Figure 5: Disturbance profile (left) and frequency components (right) for temporal problem.

We solve a coarse problem to create a hierarchical structure that aids the GS scheme (which operates on the partitions at high resolution). To perform coarsening, we aggregate the grid points of each partition into $M_{c}=4$ coarse points (i.e., we collapse 25 fine points into one coarse point). The set of inner coarse points is $\mathcal{M}_{c}$. The projection is $x_{k}=T_{x}\tilde{x}_{k}$ and $u_{k}=T_{u}\tilde{u}_{k}$ with:

Tx=Tu=[11111]\displaystyle T_{x}=T_{u}=\begin{bmatrix}1\\ &1\\ &\vdots\\ &1\\ &&\ddots\\ &&&1\\ &&&\vdots\\ &&&1\end{bmatrix} (37a)

The projection matrices $T_{x}$ and $T_{u}$ have full column rank. The fine-to-coarse restriction can also be expressed as:

x~k(i~)=xk(i),u~k(i~)=uk(i),if i~=i1M/Mc+1\displaystyle\tilde{x}_{k}(\tilde{i})=x_{k}(i),\quad\tilde{u}_{k}(\tilde{i})=u_{k}(i),\quad\text{if }\quad\tilde{i}=\lfloor{\frac{i-1}{M/M_{c}}}\rfloor+1 (38)

where $\lfloor\cdot\rfloor$ is the floor (round-down) operator. We denote $\varphi(i)=\lfloor\frac{i-1}{M/M_{c}}\rfloor+1$. The coarse problem can then be stated in terms of the coarse variables as follows:

minx~k,u~k\displaystyle\min_{{\tilde{x}}_{k},\tilde{u}_{k}}\quad jc(x~k(j)2+u~k(j)2)+x~k(Mc)λk+1\displaystyle\sum_{j\in\mathcal{M}_{c}}\left(\tilde{x}_{k}(j)^{2}+{\tilde{u}_{k}(j)}^{2}\right)+\tilde{x}_{k}(M_{c}){\lambda}_{k+1}^{\ell} (39a)
s.t. x~k(j+1)=x~k(j)+MMcδ(u~k(j+1)+d~k(j+1)),jc\displaystyle\tilde{x}_{k}(j+1)=\tilde{x}_{k}(j)+\frac{M}{M_{c}}\delta(\tilde{u}_{k}(j+1)+\tilde{d}_{k}(j+1)),\quad j\in\mathcal{M}_{c} (39b)
x~k(0)=x~k1+1(Mc)(λk),\displaystyle\tilde{x}_{k}(0)=\tilde{x}_{k-1}^{\ell+1}(M_{c})\quad(\lambda_{k}), (39c)

where $\tilde{d}_{k}(j)=\frac{M_{c}}{M}\sum_{i\in\varphi^{-1}(j)}d_{k}(i)$ is the coarsened disturbance signal. The dynamic equations are defined over a smaller-dimensional space (over $\mathcal{M}_{c}$) that results from aggregating the dynamic equations of the full-resolution problem (defined over $\mathcal{M}$). We solve the full-resolution problem (35) and compare its solution against that of the pure GS scheme, that of the coarse low-resolution problem, and that of the hierarchical scheme that solves the coarse problem to coordinate the GS scheme. We also solve the coarse problem by using a GS scheme. Figure 6 shows the results. We note that the solution of the coarse problem (39) captures the general long-term trend of the solution but misses the high frequencies. The GS scheme refines this solution and converges in around 30 coordination steps. By comparing Figure 6 (top-right) and (bottom-right), we can see that initializing the GS scheme with the coarse solution significantly reduces the initial error, demonstrating the benefit of the hierarchical scheme.
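The aggregation map $\varphi$ and the coarsened disturbance $\tilde{d}_{k}$ can be sketched as follows (illustrative values for one partition; the two-frequency signal below is a stand-in for $d_{k}$):

```python
import numpy as np

M, Mc = 100, 4                # fine points and coarse points per partition
ratio = M // Mc               # 25 fine points collapse into one coarse point

# aggregation map phi(i) = floor((i-1)/(M/Mc)) + 1, with 1-based indices
phi = lambda i: (i - 1) // ratio + 1

# stand-in fine disturbance over one partition (two frequencies)
i = np.arange(1, M + 1)
d = 4 * np.sin(4 * np.pi * i / M) + np.sin(24 * np.pi * i / M)

# coarsened disturbance: d_tilde(j) = (Mc/M) * sum over phi^{-1}(j),
# i.e., the mean of d over each group of 25 consecutive points
d_tilde = np.array([(Mc / M) * d[phi(i) == j].sum() for j in range(1, Mc + 1)])

print(d_tilde)
```

Because $M_{c}/M$ is the reciprocal of the group size, the coarsened disturbance is simply the group average, which preserves the low-frequency content while averaging out the high frequencies.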

Refer to caption
Figure 6: (Top-left) solution of coarse problem, (top-right) solution of first GS step with coarsening, (bottom-left) solution of 30th GS step with coarsening, (bottom-right) solution of first GS step without coarsening.

3.2 Multi-Scale Spatial Control

Useful insights on how to use multi-grid schemes to create hierarchical network control structures result from interpreting network flows as diffusive processes. To illustrate this, we consider a network with nodes defined on a rectangular mesh. A node $(i,j)$ in the network exchanges flows with its four neighboring nodes $(i,j+1),(i,j-1),(i+1,j),(i-1,j)$ (this is called a stencil). The flows $f(i,j)$ are a function of the node potentials $p(i,j)$ and are given by:

f(i,j;i,j+1)\displaystyle f({i,j;i,j+1}) =D(p(i,j)p(i,j+1))\displaystyle=D(p({i,j})-p({i,j+1})) (40a)
f(i,j;i,j1)\displaystyle f({i,j;i,j-1}) =D(p(i,j)p(i,j1))\displaystyle=D(p({i,j})-p({i,j-1})) (40b)
f(i,j;i+1,j)\displaystyle f({i,j;i+1,j}) =D(p(i,j)p(i+1,j))\displaystyle=D(p({i,j})-p({i+1,j})) (40c)
f(i,j;i1,j)\displaystyle f({i,j;i-1,j}) =D(p(i,j)p(i1,j)).\displaystyle=D(p({i,j})-p({i-1,j})). (40d)

Here, $D\in\mathbb{R}$ is the diffusion constant of the link connecting the nodes (the inverse of the flow resistance). At each node $(i,j)$ we have a load $d(i,j)$ and a source $u(i,j)$ that is used to counteract (balance) the load. This gives the flow balance conservation equation:

f(i,j;i,j+1)+f(i,j;i,j1)+f(i,j;i+1,j)+f(i,j;i1,j)=u(i,j)+d(i,j).\displaystyle f({i,j;i,j+1})+f({i,j;i,j-1})+f({i,j;i+1,j})+f({i,j;i-1,j})=u(i,j)+d(i,j).

This can also be written in terms of the potentials as:

D(4p(i,j)p(i1,j)p(i+1,j)p(i,j1)p(i,j+1))=u(i,j)+d(i,j).\displaystyle D\left(4\cdot p(i,j)-p(i-1,j)-p(i+1,j)-p(i,j-1)-p(i,j+1)\right)=u(i,j)+d(i,j).
Refer to caption
Figure 7: Disturbance field (left) and its components (right) for spatial optimization problem.
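As a sketch (small assumed grid, synthetic potentials), the flow-balance relations above assemble into the familiar 5-point Laplacian matrix, with neighbors outside the grid carrying the zero boundary potential:

```python
import numpy as np

# assemble the 5-point flow-balance operator on an n-by-n grid with
# zero potential outside the grid (Dirichlet boundary), cf. (40)
n, D = 4, 1.0
idx = lambda i, j: i * n + j            # 0-based node numbering

L = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        L[idx(i, j), idx(i, j)] = 4 * D
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < n and 0 <= jj < n:
                L[idx(i, j), idx(ii, jj)] = -D
            # out-of-grid neighbors contribute p = 0 and drop out

# L @ p gives the net outflow at every node: D(4p - sum of neighbors)
p = np.arange(n * n, dtype=float)       # synthetic potentials
net = L @ p
# interior node (1,1): 4*5 - (1 + 9 + 4 + 6) = 0
print(net[idx(1, 1)])
```

The resulting matrix is symmetric positive definite, which is what makes the GS smoother applicable to the spatial problem below.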

We assume a fixed 2-D spatial domain $\Omega:=[0,X]\times[0,Y]$ that is discretized using $M\cdot P$ nodes in each direction. The sets $\mathcal{N}^{x}=\mathcal{N}^{y}:=\{1,2,\cdots,P\cdot M\}$ are the sets of points in each direction. The set of total mesh points is $\mathcal{N}=\mathcal{N}^{x}\times\mathcal{N}^{y}$ and thus $N=(P\cdot M)\cdot(P\cdot M)$. As the number of nodes increases, the node potentials form a continuum described by the 2-D diffusion equation:

D(2p(x,y)x2+2p(x,y)y2)=u(x,y)+d(x,y),(x,y)Ω.\displaystyle D\left(\frac{\partial^{2}p(x,y)}{\partial x^{2}}+\frac{\partial^{2}p(x,y)}{\partial y^{2}}\right)=u(x,y)+d(x,y),\quad(x,y)\in\Omega. (41)

Using this analogy, we consider the following full-space problem:

minp,u\displaystyle\min_{{p},{u}}\quad (i,j)𝒩(p(i,j)2+u(i,j)2)\displaystyle\sum_{(i,j)\in\mathcal{N}}\left(p(i,j)^{2}+u(i,j)^{2}\right) (42a)
s.t. D(4p(i,j)p(i1,j)p(i+1,j)p(i,j1)p(i,j+1))\displaystyle D\left(4\cdot p(i,j)-p(i-1,j)-p(i+1,j)-p(i,j-1)-p(i,j+1)\right)
=u(i,j)+d(i,j),(i,j)𝒩\displaystyle\qquad\qquad=u(i,j)+d(i,j),\quad(i,j)\in\mathcal{N} (42b)
p(0,j)=0,j𝒩y\displaystyle p(0,j)=0,\quad j\in\mathcal{N}^{y} (42c)
p(MP+1,j)=0,j𝒩y\displaystyle p(M\cdot P+1,j)=0,\quad j\in\mathcal{N}^{y} (42d)
p(i,0)=0,i𝒩x\displaystyle p(i,0)=0,\quad i\in\mathcal{N}^{x} (42e)
p(i,MP+1)=0,i𝒩x\displaystyle p(i,M\cdot P+1)=0,\quad i\in\mathcal{N}^{x} (42f)

The goal of the optimization problem is, given the loads $d(i,j)$, to control the potentials $p(i,j)$ at the network nodes by using the sources $u(i,j)$. The decision variables at every node are $z(i,j)=(p(i,j),u(i,j))$. The presence of multiple frequencies in the 2-D disturbance load field $d(i,j)$ might require us to consider fine meshes, making the optimization problem intractable. One can think of the disturbance field as spatial variations of electrical loads observed over a geographical region. If the loads have high-frequency spatial variations, we need high control resolution (i.e., sources at every node in the network to achieve tight control). This can be achieved, for instance, by installing distributed energy resources (DERs). Moreover, if the load has low-frequency variations, the DERs have to cooperate to counteract global variations.

In our experiments, the mesh size was set to $P=10$ and $M=10$, which results in $N=10{,}000$ mesh points. To address this complexity, we partition the 2-D domain into $K=P\cdot P$ partitions, each with $M\cdot M$ points, and we label each partition as $k=(n,m)\in\mathcal{K}$. We can think of each partition $k\in\mathcal{K}$ as a region of the network. We define the inner index sets $\mathcal{M}^{x}=\mathcal{M}^{y}:=\{1,2,\cdots,M\}$ and $\mathcal{M}:=\mathcal{M}^{x}\times\mathcal{M}^{y}$. The GS subproblem for partition $(n,m)$ is given by:

minpn,m,un,m\displaystyle\min_{p_{n,m},{u}_{n,m}}\quad (i,j)(pn,m(i,j)2+un,m(i,j)2)\displaystyle\sum_{(i,j)\in\mathcal{M}}\left(p_{n,m}(i,j)^{2}+u_{n,m}(i,j)^{2}\right) (43a)
+jypn,m(1,j)λn1,m+1(M+1,j)+jypn,m(M,j)λn+1,m(0,j)\displaystyle+\sum_{j\in\mathcal{M}^{y}}p_{n,m}(1,j)\lambda_{n-1,m}^{\ell+1}(M+1,j)+\sum_{j\in\mathcal{M}^{y}}p_{n,m}(M,j)\lambda_{n+1,m}^{\ell}(0,j)
+ixpn,m(i,1)λn,m1+1(i,M+1)+ixpn,m(i,M)λn,m+1(i,0)\displaystyle+\sum_{i\in\mathcal{M}^{x}}p_{n,m}(i,1)\lambda_{n,m-1}^{\ell+1}(i,M+1)+\sum_{i\in\mathcal{M}^{x}}p_{n,m}(i,M)\lambda_{n,m+1}^{\ell}(i,0)
s.t. D(4pn,m(i,j)pn,m(i1,j)pn,m(i+1,j)pn,m(i,j1)pn,m(i,j+1))\displaystyle D\left(4\cdot p_{n,m}(i,j)-p_{n,m}(i-1,j)-p_{n,m}(i+1,j)-p_{n,m}(i,j-1)-p_{n,m}(i,j+1)\right)
=un,m(i,j)+dn,m(i,j),(i,j)\displaystyle\qquad\qquad=u_{n,m}(i,j)+d_{n,m}(i,j),\quad(i,j)\in\mathcal{M} (43b)
pn,m(0,j)=pn1,m+1(M,j),(λn,m(0,j))\displaystyle p_{n,m}(0,j)=p_{n-1,m}^{\ell+1}(M,j),\quad\quad\;\;(\lambda_{n,m}(0,j)) (43c)
pn,m(M+1,j)=pn+1,m(1,j),(λn,m(M+1,j))\displaystyle p_{n,m}(M+1,j)=p_{n+1,m}^{\ell}(1,j),\quad(\lambda_{n,m}(M+1,j)) (43d)
pn,m(i,0)=pn,m1+1(i,M),(λn,m(i,0))\displaystyle p_{n,m}(i,0)=p_{n,m-1}^{\ell+1}(i,M),\quad\qquad(\lambda_{n,m}(i,0)) (43e)
pn,m(i,M+1)=pn,m+1(i,1),(λn,m(i,M+1)),\displaystyle p_{n,m}(i,M+1)=p_{n,m+1}^{\ell}(i,1),\quad\;\;(\lambda_{n,m}(i,M+1)),\quad (43f)

The indices of constraints (43c)-(43d) run over j∈ℳ^y and those of (43e)-(43f) run over i∈ℳ^x.
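For reference, the 5-point stencil on the left-hand side of (43b) can be assembled for a single M×M partition as follows. This is a minimal sketch (the function name is hypothetical); neighbors outside the partition are omitted here because they enter the right-hand side through the boundary conditions (43c)-(43f).

```python
import numpy as np

def laplacian_2d(M, D=1.0):
    """Dense matrix of the 5-point stencil: row (i,j) encodes
    D*(4 p(i,j) - p(i-1,j) - p(i+1,j) - p(i,j-1) - p(i,j+1)),
    with out-of-partition neighbors left to the right-hand side."""
    n = M * M
    L = np.zeros((n, n))
    idx = lambda i, j: i * M + j  # 0-based row-major flattening
    for i in range(M):
        for j in range(M):
            r = idx(i, j)
            L[r, r] = 4.0 * D
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < M and 0 <= jj < M:
                    L[r, idx(ii, jj)] = -1.0 * D
    return L
```

An interior row sums to zero (the stencil annihilates constants), while boundary rows have a positive sum because some neighbors were moved to the right-hand side.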

To perform coarsening, each block of (M/M_c)·(M/M_c) mesh points is collapsed into a single coarse point and the mapping from the coarse space to the original space is:

p~n,m(i~,j~)=pn,m(i,j)if i~=i1M/Mc+1,j~=j1M/Mc+1.\tilde{p}_{n,m}(\tilde{i},\tilde{j})=p_{n,m}(i,j)\quad\text{if }\quad\tilde{i}=\lfloor\frac{i-1}{M/M_{c}}\rfloor+1,\quad\tilde{j}=\lfloor\frac{j-1}{M/M_{c}}\rfloor+1. (44)
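A minimal sketch of the mapping (44): each coarse value is copied onto the (M/M_c)×(M/M_c) block of fine mesh points it represents (piecewise-constant prolongation). The function name is hypothetical.

```python
import numpy as np

def prolongate(p_coarse, M):
    """Piecewise-constant prolongation as in (44): expand an
    Mc x Mc coarse field to an M x M fine field by copying each
    coarse value onto its (M/Mc) x (M/Mc) block."""
    Mc = p_coarse.shape[0]
    assert M % Mc == 0, "coarse mesh must divide the fine mesh"
    r = M // Mc
    # Kronecker product with a block of ones replicates each entry
    return np.kron(p_coarse, np.ones((r, r)))
```

The constraint aggregation used for the coarse problem is the adjoint of this operator (summation over each block).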

As with the temporal case, we also aggregate the constraints in each partition to obtain a coarse representation. In our experiments, we used M_c=2 by default. The disturbance field is given by a linear combination of a 2-D sinusoid and a Gaussian function. The shape of the load field is illustrated in Figure 7. Figure 8 shows the optimal potential field obtained with the coarse problem and that obtained with the GS scheme at the first and tenth steps (initialized with the coarse field). Note that the coarse field captures the global structure of the load field but misses the high frequencies, while the GS scheme corrects the high-frequency load imbalances. In Figure 9 we again illustrate that the hierarchical scheme outperforms the decentralized GS scheme.

Figure 8: (Top) Potential field solution and error of coarse problem, (middle) solution and error of first GS update, and (bottom) solution and error of tenth GS update.
Figure 9: (Left) error for first GS update with coarsening and (right) error without coarsening.

3.3 Effect of Coarsening Strategy

We now compare the effect of the coarsening resolution on the performance of the GS scheme. The results are given in Figure 10 (top left and right) and reveal that using a higher resolution for the coarse problem does not necessarily result in better performance of the GS scheme. This is particularly evident in the temporal case, while for the spatial case increasing the mesh resolution does help. We attribute this difference to the asymmetric nature of the temporal problem compared to the symmetric mesh of the spatial case. For the temporal problem, we have found that the most effective coarsening strategy is to solve a sequence of coarse problems with increasing resolution. At each coarsening level, however, we only perform a single GS coordination step, and the resulting coordination variables are used to initialize the GS coordination at the next level. The error evolution of this sequential coarsening scheme is shown in Figure 10 (top left). We sequentially solved the coarse problems using M_c = 1, 2, 4, 5, 10, 20, 25, 50 (this gives a total of eight GS steps). We note that, at the tenth GS step, the error of this sequential coarsening scheme is about seventy times smaller than that obtained with no coarsening. This can be attributed to the ability of the single-step GS sweeps to cover a wider range of frequencies.
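The sequential coarsening strategy can be sketched as the following cascade, where solve_coarse, prolongate, and gs_sweep stand for problem-specific operations (hypothetical interfaces, not part of the original formulation):

```python
def sequential_coarsening(schedule, solve_coarse, prolongate, gs_sweep, z0=None):
    """Cascade sketch: at each coarsening level Mc, solve the coarse
    problem, lift its solution to the fine mesh, and perform a single
    GS coordination sweep whose result warm-starts the next level."""
    z = z0
    for Mc in schedule:  # e.g., [1, 2, 4, 5, 10, 20, 25, 50]
        z_coarse = solve_coarse(Mc, warm_start=z)
        # one GS step per level; finer levels correct higher frequencies
        z = gs_sweep(prolongate(z_coarse, Mc))
    return z
```

Each level targets a different band of frequencies, which is why a single GS step per level suffices before moving on.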

3.4 Effect of Coordination Order

We solved the multi-scale temporal control problem (36) and the spatial control problem (43) with four different ordering methods for each problem. For the temporal control problem, ordering method 1 was a lexicographic ordering, method 2 a reverse lexicographic ordering, method 3 a forward-backward ordering, and method 4 the red-black scheme. For the spatial control problem, ordering method 1 was a lexicographic ordering, method 2 a spiral-like ordering, method 3 the red-black ordering, and method 4 ordered the partitions by the magnitude of the disturbance. The results are presented in Figure 10 (bottom left and right). As can be seen, in the temporal problem, the performance of the reverse lexicographic ordering is significantly better than that achieved by the other methods. This can be attributed to the asymmetry of the coupling topology: in the temporal problem, primal variable information propagates in the forward direction while dual information propagates in the reverse direction. It can be seen that dual information plays an important role in the convergence of the temporal problem. In the spatial problem, the performance of the different orderings is virtually the same; the red-black ordering (which enables parallelism) achieves the same performance as the rest. We attribute this to the symmetry of the spatial domain.
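For illustration, the partition-visit orders used for the spatial problem can be generated as follows (a sketch; the spiral-like and disturbance-magnitude orderings are omitted):

```python
def orderings(P):
    """Partition-visit orders for a P x P grid of subproblems:
    lexicographic, reverse lexicographic, and red-black. In the
    red-black sweep, all partitions of the same color have no
    shared boundaries and can be solved in parallel."""
    lex = [(n, m) for n in range(1, P + 1) for m in range(1, P + 1)]
    rev = list(reversed(lex))
    red = [(n, m) for (n, m) in lex if (n + m) % 2 == 0]
    black = [(n, m) for (n, m) in lex if (n + m) % 2 == 1]
    return lex, rev, red + black
```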

Refer to caption
Figure 10: (Top-left) error of temporal control with different coarsening schemes, (top-right) error of spatial control with different coarsening schemes, (bottom-left) error of temporal control with different ordering schemes, and (bottom-right) error of spatial control with different ordering schemes.

4 Conclusions and Directions of Future Work

We have presented the basic elements of multi-grid computing schemes and illustrated how to use these to create hierarchical coordination architectures for complex systems. In particular, we discussed how Gauss-Seidel schemes can be seen as decentralized coordination schemes that handle high-frequency effects, while coarse solution operators can be seen as low-resolution centralized coordination schemes that handle low-frequency effects. We believe that multi-grid provides a powerful framework to systematically construct hierarchical coordination architectures, but diverse challenges need to be addressed. In particular, it is important to understand the convergence properties of GS schemes in more complex settings with nonlinear effects and inequality constraints. Moreover, it is necessary to develop effective coarsening (aggregation) schemes that can retain useful information while reducing complexity. Finally, it is desirable to combine hierarchical coordination schemes with existing control theory to analyze stability and robustness properties.

5 Acknowledgements

We acknowledge funding from the National Science Foundation under award NSF-EECS-1609183.

References

  • (1) Albersmeyer, J., Diehl, M.: The lifted Newton method and its application in optimization. SIAM Journal on Optimization 20(3), 1655–1684 (2010)
  • (2) Antoulas, A.C., Sorensen, D.C., Gugercin, S.: A survey of model reduction methods for large-scale systems. Contemporary mathematics 280, 193–220 (2001)
  • (3) Arnold, M., Negenborn, R., Andersson, G., De Schutter, B.: Distributed predictive control for energy hub coordination in coupled electricity and gas networks. In: Intelligent Infrastructures, pp. 235–273. Springer (2010)
  • (4) Baldea, M., Daoutidis, P.: Control of integrated process networks: a multi-time scale perspective. Computers & Chemical Engineering 31(5), 426–444 (2007)
  • (5) Biegler, L.T., Zavala, V.M.: Large-scale nonlinear programming using IPOPT: An integrating framework for enterprise-wide dynamic optimization. Computers & Chemical Engineering 33(3), 575–582 (2009)
  • (6) Borzì, A., Kunisch, K.: A multigrid scheme for elliptic constrained optimal control problems. Computational Optimization and Applications 31(3), 309–333 (2005)
  • (7) Borzì, A., Schulz, V.: Multigrid methods for PDE optimization. SIAM Review 51(2), 361–395 (2009)
  • (8) Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3(1), 1–122 (2011)
  • (9) Brandt, A.: Algebraic multigrid theory: The symmetric case. Applied Mathematics and Computation 19(1), 23–56 (1986)
  • (10) Camponogara, E., Jia, D., Krogh, B.H., Talukdar, S.: Distributed model predictive control. Control Systems, IEEE 22(1), 44–52 (2002)
  • (11) Chow, J.H., Kokotovic, P.V.: Time scale modeling of sparse dynamic networks. Automatic Control, IEEE Transactions on 30(8), 714–722 (1985)
  • (12) Diehl, M., Bock, H.G., Schlöder, J.P., Findeisen, R., Nagy, Z., Allgöwer, F.: Real-time optimization and nonlinear model predictive control of processes governed by differential-algebraic equations. Journal of Process Control 12(4), 577–585 (2002)
  • (13) Farina, M., Zhang, X., Scattolini, R.: A hierarchical MPC scheme for interconnected systems. arXiv preprint arXiv:1703.02739 (2017)
  • (14) Fisher, M.L.: The Lagrangian relaxation method for solving integer programming problems. Management Science 50(12 supplement), 1861–1871 (2004)
  • (15) Giselsson, P., Doan, M.D., Keviczky, T., De Schutter, B., Rantzer, A.: Accelerated gradient methods and dual decomposition in distributed model predictive control. Automatica 49(3), 829–833 (2013)
  • (16) Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. arXiv preprint arXiv:1208.3922 (2012)
  • (17) Jamshidi, M., Tarokh, M., Shafai, B.: Computer-aided analysis and design of linear control systems. Prentice-Hall, Inc. (1992)
  • (18) Joo, J.Y., Ilic, M.D.: Multi-layered optimization of demand resources using lagrange dual decomposition. Smart Grid, IEEE Transactions on 4(4), 2081–2088 (2013)
  • (19) Kokotovic, P.: Subsystems, time scales and multimodeling. Automatica 17(6), 789–795 (1981)
  • (20) Kokotovic, P., Avramovic, B., Chow, J., Winkelman, J.: Coherency based decomposition and aggregation. Automatica 18(1), 47–56 (1982)
  • (21) Kozma, A., Frasch, J.V., Diehl, M.: A distributed method for convex quadratic programming problems arising in optimal control of distributed systems. In: Decision and Control (CDC), 2013 IEEE 52nd Annual Conference on, pp. 1526–1531. IEEE (2013)
  • (22) Lefort, A., Bourdais, R., Ansanay-Alex, G., Guéguen, H.: Hierarchical control method applied to energy management of a residential house. Energy and Buildings 64, 53–61 (2013)
  • (23) Liu, C., Shahidehpour, M., Wang, J.: Application of augmented Lagrangian relaxation to coordinated scheduling of interdependent hydrothermal power and natural gas systems. Generation, Transmission & Distribution, IET 4(12), 1314–1325 (2010)
  • (24) Negenborn, R.R., De Schutter, B., Hellendoorn, J.: Efficient implementation of serial multi-agent model predictive control by parallelization. In: Networking, Sensing and Control, 2007 IEEE International Conference on, pp. 175–180. IEEE (2007)
  • (25) Peponides, G.M., Kokotovic, P.V.: Weak connections, time scales, and aggregation of nonlinear systems. Automatic Control, IEEE Transactions on 28(6), 729–735 (1983)
  • (26) Rawlings, J.B., Bonné, D., Jørgensen, J.B., Venkat, A.N., Jørgensen, S.B.: Unreachable setpoints in model predictive control. Automatic Control, IEEE Transactions on 53(9), 2209–2215 (2008)
  • (27) Rawlings, J.B., Mayne, D.: Model predictive control. Nob Hill Publishing (2008)
  • (28) Scattolini, R.: Architectures for distributed and hierarchical model predictive control–a review. Journal of Process Control 19(5), 723–731 (2009)
  • (29) Scattolini, R., Colaneri, P.: Hierarchical model predictive control. In: Decision and Control, 2007 46th IEEE Conference on, pp. 4803–4808. IEEE (2007)
  • (30) Simon, H.A., Ando, A.: Aggregation of variables in dynamic systems. Econometrica: Journal of the Econometric Society pp. 111–138 (1961)
  • (31) Stewart, B.T., Venkat, A.N., Rawlings, J.B., Wright, S.J., Pannocchia, G.: Cooperative distributed model predictive control. Systems & Control Letters 59(8), 460–469 (2010)
  • (32) Stewart, B.T., Wright, S.J., Rawlings, J.B.: Cooperative distributed model predictive control for nonlinear systems. Journal of Process Control 21(5), 698–704 (2011)
  • (33) Summers, T.H., Lygeros, J.: Distributed model predictive consensus via the alternating direction method of multipliers. In: Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on, pp. 79–84. IEEE (2012)
  • (34) Zavala, V.M.: New architectures for hierarchical predictive control. IFAC-PapersOnLine 49(7), 43–48 (2016)
  • (35) Zavala, V.M., Anitescu, M.: Real-time nonlinear optimization as a generalized equation. SIAM Journal on Control and Optimization 48(8), 5444–5467 (2010)
  • (37) Zavala, V.M., Biegler, L.T.: The advanced-step NMPC controller: Optimality, stability and robustness. Automatica 45(1), 86–93 (2009)
  • (38) Zhu, D., Yang, R., Hug-Glanzmann, G.: Managing microgrids with intermittent resources: A two-layer multi-step optimal control approach. In: North American Power Symposium (NAPS), 2010, pp. 1–8. IEEE (2010)