
Sungho Shin: Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Dr, Madison, WI 53706, USA. e-mail: [email protected]
Victor M. Zavala: Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Dr, Madison, WI 53706, USA. e-mail: [email protected]

Multi-Grid Schemes for Multi-Scale Coordination of Energy Systems

Sungho Shin and Victor M. Zavala
Abstract

We discuss how multi-grid computing schemes can be used to design hierarchical coordination architectures for energy systems. These hierarchical architectures can be used to manage multiple temporal and spatial scales and mitigate fundamental limitations of centralized and decentralized architectures. We present the basic elements of a multi-grid scheme, which includes a smoothing operator (a high-resolution decentralized coordination layer that targets phenomena at high frequencies) and a coarsening operator (a low-resolution centralized coordination layer that targets phenomena at low frequencies). For smoothing, we extend existing convergence results for Gauss-Seidel schemes by applying them to systems that cover unstructured domains. This allows us to target problems with multiple timescales and arbitrary networks. The proposed coordination schemes can be used to guide transactions in decentralized electricity markets. We present a storage control example and a power flow diffusion example to illustrate the developments.

1 Motivation and Setting

We consider the following optimization problem:

\min_{z}\quad \frac{1}{2}z^{T}Qz-c^{T}z \quad (1a)
\text{s.t.}\quad Az+Bd=0,\quad(\nu) \quad (1b)
\phantom{\text{s.t.}}\quad \Pi z=0,\quad(\lambda) \quad (1c)

Here, $z\in\mathbb{R}^{N\cdot n_{z}}$ are decision or primal variables (including states and controls) and $d\in\mathbb{R}^{N\cdot n_{d}}$ is the data (including disturbances and system parameters). These vectors contain elements that are distributed over a mesh with $N\in\mathbb{Z}$ points that covers a certain temporal or spatio-temporal domain of interest $\Omega$. We define the set of points in the mesh as $\mathcal{N}$ with $|\mathcal{N}|=N$. The matrix $Q\in\mathbb{R}^{N\cdot n_{z}\times N\cdot n_{z}}$ is positive definite and $c\in\mathbb{R}^{N\cdot n_{z}}$ is a cost vector. The constraint (1b) (with associated dual variables $\nu\in\mathbb{R}^{m}$) is defined by the matrices $A\in\mathbb{R}^{m\times N\cdot n_{z}}$ and $B\in\mathbb{R}^{m\times N\cdot n_{d}}$, and the matrix $A$ is assumed to have full row rank. These constraints may include discretized dynamic equations (in space and time) and other physical constraints. The constraints (1c) (with associated dual variables $\lambda\in\mathbb{R}^{p}$) are defined by the matrix $\Pi\in\mathbb{R}^{p\times N\cdot n_{z}}$. These constraints model coupling (connectivity) between the primal variables at different mesh points and can also be used to model boundary conditions. Problem (1) captures formulations used in optimization-based control strategies such as model predictive control (MPC) rawlingsbook .

We assume that the size $N$ of the mesh describing problem (1) is so large that the problem cannot be solved in a centralized manner. This is often the case in systems that cover large temporal and spatial domains and/or multiple scales. In an electrical network, for instance, a large number of nodes and harmonics might need to be captured, rendering centralized control impractical. An alternative to address this complexity is to partition the problem into subdomains to create decentralized control architectures. We begin by defining a partitioned version of problem (1):

\min_{z_{k}}\quad \sum_{k\in\mathcal{K}}\frac{1}{2}z_{k}^{T}Q_{k}z_{k}-c_{k}^{T}z_{k} \quad (2a)
\text{s.t.}\quad A_{k}z_{k}+B_{k}d_{k}=0,\quad k\in\mathcal{K}\quad(\nu_{k}) \quad (2b)
\phantom{\text{s.t.}}\quad \sum_{k^{\prime}\in\mathcal{K}}\Pi_{kk^{\prime}}z_{k^{\prime}}=0,\quad k\in\mathcal{K}\quad(\lambda_{k}) \quad (2c)

We denote this problem as $\mathcal{P}$. Here, $\mathcal{K}$ is a set of partitions (subdomains) of the set $\mathcal{N}$ and we define the number of partitions as $K:=|\mathcal{K}|$. Each partition $k\in\mathcal{K}$ contains mesh elements $\mathcal{N}_{k}\subseteq\mathcal{N}$ satisfying $\cup_{k\in\mathcal{K}}\mathcal{N}_{k}=\mathcal{N}$ and $\mathcal{N}_{k}\cap\mathcal{N}_{k^{\prime}}=\emptyset$ for all $k,k^{\prime}\in\mathcal{K}$ with $k\neq k^{\prime}$. The number of elements in a partition is denoted as $N_{k}:=|\mathcal{N}_{k}|$. The variables and data $(z_{k},d_{k})$ are defined over the partition $k\in\mathcal{K}$. We represent the cost function as a sum of partition cost functions with associated positive definite matrices $Q_{k}$ and cost vectors $c_{k}$. The constraints are also split into individual partition constraints with associated matrices $A_{k},B_{k}$, and we link the partition variables by using the coupling constraints (2c) and associated matrices $\Pi_{kk^{\prime}},\,k,k^{\prime}\in\mathcal{K}$. As we will discuss later, we can always obtain such a representation by introducing duplicate decision variables in each partition and by adding suitable coupling constraints. This procedure is known as lifting diehllift .

To avoid centralized coordination, a wide range of decomposition schemes (which we also refer to as decentralized coordination schemes) can be used. A popular approach used in the solution of partial differential equations (PDEs) and in decomposition methods such as the alternating direction method of multipliers (ADMM) is the Gauss-Seidel (GS) coordination scheme borzi ; boydadmm . Here, the problem in each partition $k$ (often called a control agent) is solved independently from the rest and exchanges information with its neighbors to coordinate. For a lifted problem of the form (2), we will derive a decentralized GS scheme that solves problems over individual partitions $k\in\mathcal{K}$ of the form:

z_{k}^{\ell+1}=\mathop{\textrm{argmin}}_{z_{k}}\quad \frac{1}{2}z_{k}^{T}Q_{k}z_{k}-z_{k}^{T}\left(c_{k}-\sum^{k-1}_{k^{\prime}=1}\Pi_{k^{\prime}k}^{T}\lambda_{k^{\prime}}^{\ell+1}-\sum^{K}_{k^{\prime}=k+1}\Pi_{k^{\prime}k}^{T}\lambda_{k^{\prime}}^{\ell}\right) \quad (3a)
\text{s.t.}\quad A_{k}z_{k}+B_{k}d_{k}=0 \quad (3b)
\phantom{\text{s.t.}}\quad \Pi_{kk}z_{k}+\sum_{k^{\prime}=1}^{k-1}\Pi_{kk^{\prime}}z_{k^{\prime}}^{\ell+1}+\sum_{k^{\prime}=k+1}^{K}\Pi_{kk^{\prime}}z_{k^{\prime}}^{\ell}=0\quad(\lambda_{k}). \quad (3c)

We denote by $\mathcal{P}_{k}^{\ell}$ this partition subproblem that is solved at update step $\ell\in\mathbb{Z}_{+}$ (which we call the coordination step). From the solution of this problem we obtain the updated primal variables $z_{k}^{\ell+1}$ and dual variables $\lambda_{k}^{\ell+1}$ (corresponding to the coupling constraints (3c)). Here, $z_{k^{\prime}}^{\ell}$ and $\lambda_{k^{\prime}}^{\ell}$ are primal and dual variables of neighboring partitions connected to partition $k$ that have not yet been updated, while $z_{k^{\prime}}^{\ell+1}$ and $\lambda_{k^{\prime}}^{\ell+1}$ are those of neighboring partitions that have already been updated. We refer to the variables communicated between partitions as the coordination variables. We note that partition $k$ cannot update its primal and dual variables until the variables of a subset of the partitions connected to it have been updated. Consequently, the GS scheme is sequential and synchronous in nature. We also note that the connectivity topology (induced by the coupling matrices $\Pi_{kk^{\prime}}$) determines the communication structure. We highlight, however, that the update order presented in (3) is lexicographic (in the order of the partition number); this choice is arbitrary and can be modified. We will see that the update order can be designed to derive parallel schemes (i.e., in which certain partitions can proceed independently of others) but that the order can affect performance. In Figure 1 we illustrate the configuration of a GS scheme over a 1-D mesh, while in Figure 2 we present a configuration for a 2-D mesh. For the 2-D mesh, the nodes spanning the domain are grouped into sets of the form $\mathcal{N}_{m,n}$, and information is exchanged using the state and dual variables at the boundary of the partitions. In Section 3 we discuss this approach in more detail.

In the next sections we derive and analyze a decentralized GS scheme to solve problem $\mathcal{P}$. The analysis seeks to illustrate how the structure of the partition subproblem $\mathcal{P}_{k}$ arises and to highlight how information of the coordination variables propagates throughout the partitions. We then discuss how to create coarse representations of the full-resolution problem $\mathcal{P}$ to obtain approximations for the coordination variables and thereby accelerate the decentralized GS scheme. This gives rise to the concept of multi-grid schemes, which can be used to design hierarchical coordination architectures. Our analysis is performed on convex quadratic programs (QPs), which will reveal important features of multi-grid schemes.

Figure 1: Configuration of a GS scheme over a 1-D mesh.
Figure 2: Configuration of a GS scheme over a 2-D mesh.

The concepts discussed in this paper seek to extend existing literature on decentralized and hierarchical MPC. Many strategies have been proposed to address the complexity of centralized MPC such as spatial and temporal decomposition methods rawlingscooperative ; kroghdecentralized ; scattoreview ; frasch ; deschutter , fast inexact schemes zavalaasnmpc ; zavalage ; diehlrealtime , and reduced order modeling techniques daoutidis . Decentralized control manages the entire system by coordinating multiple controllers, each operating in a different node or subnetwork. Decentralization also enables resiliency and asynchronicity, which are key practical advantages over centralized MPC. Different schemes for coordinating MPC controllers have been devised scattoreview . Lagrangian dual decomposition is a technique where Lagrange multipliers are used for coordination. This technique is popular in electricity markets because the dual variables can be interpreted as prices that are used as a coordination mechanism illic ; deschutter2 ; arnold . Techniques based on coordinate minimization schemes and distributed gradient methods have also been proposed to coordinate MPC controllers in general settings rawlingsjpc ; mpcadmm ; liuadmm . An important limitation of decentralized schemes is that coordination of subsystems tends to be slow (e.g., convergence rates of existing schemes are at best linear) boydadmm ; luoadmm ; fisher . This slow convergence has been reported in the context of energy networks in arnold . Moreover, spatial decentralization by itself does not address the complexity induced by multiple time scales. In particular, time scales and prediction horizons of different decentralized controllers might be different. To the best of our knowledge, no coordination schemes currently exist to handle such settings.

Hierarchical control seeks to overcome fundamental limitations of decentralized and centralized control schemes. Fundamental concepts of hierarchical control date as far back as the origins of automatic control itself hierarbook . Complex industrial control systems such as the power grid are structured hierarchically in one way or another to deal with multiple time and spatial scales. Existing hierarchies, however, are often constructed in an ad hoc manner by using objectives, physical models, and control formulations at different levels that are often incompatible. For instance, an independent system operator (ISO) solves a hierarchy of optimization problems (unit commitment, economic dispatch, optimal power flow) that use different physical representations of the system. This can lead to lost economic performance, unreachable/infeasible command signals, and instabilities rawlingsunreachable ; zavalaipopt . Hierarchical MPC provides a general framework to tackle dynamics and disturbances occurring at multiple time scales scattohierar ; scattoreview ; hierarbook ; zavalamultigrid and spatial scales farina2017hierarchical . In a traditional hierarchical MPC scheme, one uses a high-level controller to compute coarse control actions that are used as targets (commands) by low-level controllers. This approach has been used recently in microgrids and multi-energy systems hierarresid ; hug . More sophisticated MPC controllers use robustness margins of the high-level controller that are used by the lower-level controller to maintain stability scattohierar . Significant advances in the analysis of multi-scale dynamical systems have also been made, most notably by the use of singular perturbation theory to derive reduced-order representations of complex networks kokotovicpower ; chowpower ; peponides ; simon . The application of such concepts in hierarchical MPC, however, has been rather limited.
In particular, the recent review on hierarchical MPC by Scattolini notes that systematic design methods for hierarchical MPC are still lacking scattoreview . More specifically, no hierarchical MPC schemes have been proposed that aggregate and refine trajectories at multiple scales. In addition, existing schemes have been tailored to achieve feasibility and stability but do not have optimality guarantees. This is important in systems where both economic efficiency and stability must be taken into account. To the best of our knowledge, no attempt has been made to combine hierarchical and decentralized MPC schemes to manage spatial and temporal scales simultaneously. The multi-grid computing concepts presented in this work seek to take a first step towards creating more general hierarchical control architectures. The proposed multi-grid schemes provide a framework to coordinate decentralized electricity markets. This is done by exchanging state (primal) and price (dual) information at the interfaces of the agents' domains. The ability to do this hierarchically enables coordination over multiple spatial scales and, in particular, provides a framework to cover large geographical regions that might involve many market players.

2 Analysis of Gauss-Seidel Schemes

This section presents basic concepts and convergence results for a GS scheme under a general convex QP setting. The results seek to highlight how the structure of the coupling between partition variables as well as the coordination sequence affect the performance of GS schemes.

2.1 Illustrative Setting

To introduce notation, we begin by considering a convex QP with two variable partitions (i.e., $\mathcal{K}=\{1,2\}$). To simplify the presentation, we do not include internal partition constraints and focus on coupling constraints across partitions. Under these assumptions, we have the following lifted optimization problem:

\min_{z_{1},z_{2}}\quad \frac{1}{2}\begin{bmatrix}z_{1}\\ z_{2}\end{bmatrix}^{T}\begin{bmatrix}Q_{1}&\\ &Q_{2}\end{bmatrix}\begin{bmatrix}z_{1}\\ z_{2}\end{bmatrix}-\begin{bmatrix}c_{1}\\ c_{2}\end{bmatrix}^{T}\begin{bmatrix}z_{1}\\ z_{2}\end{bmatrix} \quad (4a)
\text{s.t.}\quad \underbrace{\begin{bmatrix}\Pi_{11}&\Pi_{12}\\ \Pi_{21}&\Pi_{22}\end{bmatrix}}_{\Pi}\underbrace{\begin{bmatrix}z_{1}\\ z_{2}\end{bmatrix}}_{z}=\begin{bmatrix}0\\ 0\end{bmatrix}\quad\begin{array}{c}(\lambda_{1})\\ (\lambda_{2})\end{array} \quad (4b)

Here, the matrices $\Pi_{11},\Pi_{12},\Pi_{21},\Pi_{22}$ capture coupling between the variables $z_{1}$ and $z_{2}$. We highlight that, in general, $\Pi_{12}\neq\Pi_{21}$ (the coupling between partitions is not symmetric). The matrices $Q_{1}$ and $Q_{2}$ are positive definite and we assume that $\Pi_{11}$ and $\Pi_{22}$ have full row rank and that the entire coupling matrix $\Pi$ has full row rank. By positive definiteness of $Q_{1}$ and $Q_{2}$ and the full-rank assumption on $\Pi$, the feasible set is non-empty and the primal-dual solution of (4) is unique. The coupling between partition variables is two-directional (given by the structure of the coupling matrix $\Pi$). This structure is found in 1-D linear networks. When the coupling is only one-directional, as is the case of temporal coupling, we have that $\Pi_{12}=0$ and thus $\Pi$ is block lower triangular, indicating that the primal variables only propagate forward in time. We will see, however, that backward propagation of information also exists, but in the space of the dual variables.

The first-order KKT conditions of (4) are given by the linear system:

\begin{bmatrix}Q_{1}&\Pi_{11}^{T}&&\Pi_{21}^{T}\\ \Pi_{11}&&\Pi_{12}&\\ &\Pi_{12}^{T}&Q_{2}&\Pi_{22}^{T}\\ \Pi_{21}&&\Pi_{22}&\end{bmatrix}\begin{bmatrix}z_{1}\\ \lambda_{1}\\ z_{2}\\ \lambda_{2}\end{bmatrix}=\begin{bmatrix}c_{1}\\ 0\\ c_{2}\\ 0\end{bmatrix} \quad (5)

To avoid solving this system in a centralized manner, we use a decentralized GS scheme with coordination update index $\ell\in\mathbb{Z}_{+}$. The scheme has the form:

\begin{bmatrix}Q_{1}&\Pi_{11}^{T}\\ \Pi_{11}&\end{bmatrix}\begin{bmatrix}z_{1}^{\ell+1}\\ \lambda_{1}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{1}\\ 0\end{bmatrix}-\begin{bmatrix}&\Pi_{21}^{T}\\ \Pi_{12}&\end{bmatrix}\begin{bmatrix}z_{2}^{\ell}\\ \lambda_{2}^{\ell}\end{bmatrix} \quad (6a)
\begin{bmatrix}Q_{2}&\Pi_{22}^{T}\\ \Pi_{22}&\end{bmatrix}\begin{bmatrix}z_{2}^{\ell+1}\\ \lambda_{2}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{2}\\ 0\end{bmatrix}-\begin{bmatrix}&\Pi_{12}^{T}\\ \Pi_{21}&\end{bmatrix}\begin{bmatrix}z_{1}^{\ell+1}\\ \lambda_{1}^{\ell+1}\end{bmatrix} \quad (6b)

This scheme requires an initial guess of the variables $(z_{2},\lambda_{2})$ of the second partition, which is used to update $(z_{1},\lambda_{1})$; we then proceed to update the variables of the second partition. Note, however, that we have picked this coordination order arbitrarily (lexicographic order). In particular, one can start with an initial guess of the first partition to update the second partition and then update the first partition (reverse lexicographic order). This scheme has the form:

\begin{bmatrix}Q_{2}&\Pi_{22}^{T}\\ \Pi_{22}&\end{bmatrix}\begin{bmatrix}z_{2}^{\ell+1}\\ \lambda_{2}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{2}\\ 0\end{bmatrix}-\begin{bmatrix}&\Pi_{12}^{T}\\ \Pi_{21}&\end{bmatrix}\begin{bmatrix}z_{1}^{\ell}\\ \lambda_{1}^{\ell}\end{bmatrix} \quad (7a)
\begin{bmatrix}Q_{1}&\Pi_{11}^{T}\\ \Pi_{11}&\end{bmatrix}\begin{bmatrix}z_{1}^{\ell+1}\\ \lambda_{1}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{1}\\ 0\end{bmatrix}-\begin{bmatrix}&\Pi_{21}^{T}\\ \Pi_{12}&\end{bmatrix}\begin{bmatrix}z_{2}^{\ell+1}\\ \lambda_{2}^{\ell+1}\end{bmatrix} \quad (7b)

We will see that the coordination order affects the convergence properties of the GS scheme. In problems with many partitions, a large number of coordination orders are possible. Moreover, we will see that coordination orders can be designed to enable sophisticated parallel and asynchronous implementations. A key observation is that the linear systems (6) in the GS scheme are the first-order KKT conditions of the following partition problems $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$, respectively:

z_{1}^{\ell+1}=\mathop{\textrm{argmin}}_{z_{1}}\quad \frac{1}{2}z_{1}^{T}Q_{1}z_{1}-z_{1}^{T}\left(c_{1}-\Pi_{21}^{T}\lambda_{2}^{\ell}\right) \quad (8a)
\text{s.t.}\quad \Pi_{11}z_{1}+\Pi_{12}z_{2}^{\ell}=0\quad(\lambda_{1}) \quad (8b)
z_{2}^{\ell+1}=\mathop{\textrm{argmin}}_{z_{2}}\quad \frac{1}{2}z_{2}^{T}Q_{2}z_{2}-z_{2}^{T}\left(c_{2}-\Pi_{12}^{T}\lambda_{1}^{\ell+1}\right) \quad (8c)
\text{s.t.}\quad \Pi_{22}z_{2}+\Pi_{21}z_{1}^{\ell+1}=0\quad(\lambda_{2}) \quad (8d)

The relevance of this observation is that one can implement the GS scheme by directly solving optimization problems, as opposed to performing intrusive linear algebra calculations zavalamultigrid . This has practical benefits, as one can use algebraic modeling languages and handle sophisticated problem formulations. This also reveals that both primal and dual variables are communicated between partitions: primal information enters in the coupling constraints, while dual variables enter as cost terms in the objective. This highlights the fact that dual variables can be interpreted as prices of the primal variable information exchanged between partitions.

We now seek to establish conditions guaranteeing convergence of this simplified GS scheme. To do so, we define the following matrices and vectors:

A_{1}:=\begin{bmatrix}Q_{1}&\Pi_{11}^{T}\\ \Pi_{11}&\end{bmatrix},\quad x_{1}^{\ell}:=\begin{bmatrix}z_{1}^{\ell}\\ \lambda_{1}^{\ell}\end{bmatrix},\quad b_{1}:=\begin{bmatrix}c_{1}\\ 0\end{bmatrix},\quad B_{12}:=\begin{bmatrix}&-\Pi_{21}^{T}\\ -\Pi_{12}&\end{bmatrix} \quad (9a)
A_{2}:=\begin{bmatrix}Q_{2}&\Pi_{22}^{T}\\ \Pi_{22}&\end{bmatrix},\quad x_{2}^{\ell}:=\begin{bmatrix}z_{2}^{\ell}\\ \lambda_{2}^{\ell}\end{bmatrix},\quad b_{2}:=\begin{bmatrix}c_{2}\\ 0\end{bmatrix},\quad B_{21}:=\begin{bmatrix}&-\Pi_{12}^{T}\\ -\Pi_{21}&\end{bmatrix}. \quad (9b)

The partition matrices $A_{1}$, $A_{2}$ are non-singular because the matrices $Q_{1}$, $Q_{2}$ are positive definite and $\Pi_{11}$, $\Pi_{22}$ have full row rank. Non-singularity of $A_{1}$ and $A_{2}$ implies that the partition subproblems $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$ have a unique solution for any values of the primal and dual variables of the neighboring partition. We can now express $(x_{1}^{\ell+1},x_{2}^{\ell+1})$ in terms of $(x_{1}^{\ell},x_{2}^{\ell})$ to obtain a recursion of the form:

\begin{bmatrix}A_{1}&\\ -B_{21}&A_{2}\end{bmatrix}\begin{bmatrix}x_{1}^{\ell+1}\\ x_{2}^{\ell+1}\end{bmatrix}=\begin{bmatrix}&B_{12}\\ &\end{bmatrix}\begin{bmatrix}x_{1}^{\ell}\\ x_{2}^{\ell}\end{bmatrix}+\begin{bmatrix}b_{1}\\ b_{2}\end{bmatrix}. \quad (10)

We can write this system in compact form by defining:

w^{\ell}:=\begin{bmatrix}x_{1}^{\ell}\\ x_{2}^{\ell}\end{bmatrix},\quad S:=\begin{bmatrix}A_{1}&\\ -B_{21}&A_{2}\end{bmatrix}^{-1}\begin{bmatrix}&B_{12}\\ &\end{bmatrix},\quad r:=\begin{bmatrix}A_{1}&\\ -B_{21}&A_{2}\end{bmatrix}^{-1}\begin{bmatrix}b_{1}\\ b_{2}\end{bmatrix}. \quad (11)

The solution of the $\ell$-th update step of (6) can be represented as:

w^{\ell+1}=Sw^{\ell}+r \quad (12a)
\phantom{w^{\ell+1}}=S^{\ell+1}w^{0}+\left(I+S+\cdots+S^{\ell}\right)r. \quad (12b)

The solution of the QP (4) (i.e., the solution of the KKT system (5)) satisfies the fixed-point equation $w=Sw+r$, which can also be expressed as $(I-S)w=r$ or $w=(I-S)^{-1}r$. Consequently, the eigenvalues of the matrix $S$ play a key role in the convergence of the GS scheme (8): the recursion (12) converges to this fixed point from any starting point if and only if the spectral radius of $S$ is smaller than one. We discuss this in more detail in the following section.
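The recursion (9)-(12) can be reproduced numerically. The sketch below uses hypothetical data (not from the paper): a two-variable QP with $Q=[[2,0.5],[0.5,2]]$ and $c=(1,1)$, lifted so that each partition owns one variable and carries a duplicate of its neighbor's variable. It assembles $S$ and $r$, runs the GS sweeps, and compares against the fixed point $w=Sw+r$.

```python
import numpy as np

# Lifted blocks for a hypothetical two-variable QP with Q = [[2, .5], [.5, 2]],
# c = (1, 1): each partition owns one variable and duplicates its neighbor's.
Q1 = np.array([[2.0, 0.25], [0.25, 0.0]])   # cross term 0.5 split in half
Q2 = Q1.copy()
c1 = np.array([1.0, 0.0])                    # zero cost on the duplicate
c2 = c1.copy()
P11 = np.array([[0.0, 1.0]]); P12 = np.array([[-1.0, 0.0]])  # dup in part. 1 = z(2)
P22 = np.array([[0.0, 1.0]]); P21 = np.array([[-1.0, 0.0]])  # dup in part. 2 = z(1)

def kkt(Q, P):
    """Partition KKT matrix [Q P^T; P 0] as in (9)."""
    m = P.shape[0]
    return np.block([[Q, P.T], [P, np.zeros((m, m))]])

A1, A2 = kkt(Q1, P11), kkt(Q2, P22)
B12 = np.block([[np.zeros((2, 2)), -P21.T], [-P12, np.zeros((1, 1))]])
B21 = np.block([[np.zeros((2, 2)), -P12.T], [-P21, np.zeros((1, 1))]])
b = np.concatenate([c1, [0.0], c2, [0.0]])

# GS recursion (10)-(12):  L w^{l+1} = U w^l + b,  S = L^{-1} U,  r = L^{-1} b
L = np.block([[A1, np.zeros((3, 3))], [-B21, A2]])
U = np.block([[np.zeros((3, 3)), B12], [np.zeros((3, 6))]])
S, r = np.linalg.solve(L, U), np.linalg.solve(L, b)

rho = max(abs(np.linalg.eigvals(S)))         # spectral radius governs convergence
w = np.zeros(6)
for _ in range(100):                          # GS coordination sweeps
    w = S @ w + r
w_star = np.linalg.solve(np.eye(6) - S, r)   # fixed point w = S w + r
```

For this data the spectral radius is well below one and the sweeps reproduce the centralized solution $z(1)=z(2)=0.4$ (with both duplicates matching their originals).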

2.2 General Setting

We now extend the previous analysis to a more general QP setting with an arbitrary number of partitions. Here, we seek to illustrate how to perform lifting in a general case where coupling is implicit in the model and how to derive a GS scheme to solve such a problem. Our discussion is based on the following convex QP:

\min_{z}\quad \frac{1}{2}z^{T}Qz-c^{T}z \quad (13)

Here, $z=(z(1),z(2),\cdots,z(N))\in\mathbb{R}^{N}$ are the optimization variables, $Q\in\mathbb{R}^{N\times N}$ is a positive definite matrix, and $c\in\mathbb{R}^{N}$ is the cost vector. For simplicity (and without loss of generality) we assume that $n_{z}=1$ (there is only one variable per node). We let $Q_{ij}$ denote the $(i,j)$-th component of the matrix $Q$ and we let $\mathcal{N}=\{1,2,\cdots,N\}$ be the variable indices (in this case also the node indices). Problem (13) has a unique solution because the matrix $Q$ is positive definite.

We focus our analysis on the QP (13) because the equality constraints $\bar{A}z+\bar{B}d=0$ in (1) (with $\bar{A}^{T}=[A^{T}\,\Pi^{T}]$ and $\bar{B}^{T}=[B^{T}\,0]$) can be eliminated by using a null-space projection procedure. To see how this can be achieved, note that if $\bar{A}$ has full row rank we can always construct a matrix $Z\in\mathbb{R}^{N\times\tilde{N}}$ whose columns span the null space of $\bar{A}$ (i.e., $\bar{A}Z\tilde{z}=0$ for any $\tilde{z}\in\mathbb{R}^{\tilde{N}}$) and with $\tilde{N}<N$. Similarly, we can construct a matrix $Y\in\mathbb{R}^{N\times(N-\tilde{N})}$ whose columns span the range space of $\bar{A}^{T}$ (i.e., $Y\tilde{y}\in\textrm{Range}(\bar{A}^{T})$ for any $\tilde{y}\in\mathbb{R}^{N-\tilde{N}}$). We can express any $z\in\mathbb{R}^{N}$ as $z=Z\tilde{z}+Y\tilde{y}$. We thus have that $\bar{A}z=\bar{A}Y\tilde{y}$ for all $\tilde{z}$; consequently, $\tilde{y}=-(\bar{A}Y)^{-1}\bar{B}d$ and $z=-Y(\bar{A}Y)^{-1}\bar{B}d+Z\tilde{z}$ satisfies $\bar{A}z=-\bar{B}d$ for any $\tilde{z}$. With this, we can express the quadratic objective $\frac{1}{2}z^{T}Qz-c^{T}z$ as $\frac{1}{2}\tilde{z}^{T}Z^{T}QZ\tilde{z}-c^{T}Z\tilde{z}-(Y(\bar{A}Y)^{-1}\bar{B}d)^{T}QZ\tilde{z}+\kappa$, where $\kappa$ is a constant. We thus obtain a QP of the same form as (13) but with matrix $Q\leftarrow Z^{T}QZ$, reduced cost $c\leftarrow Z^{T}c+Z^{T}QY(\bar{A}Y)^{-1}\bar{B}d$, and variable vector $z\leftarrow\tilde{z}$. We highlight that this reduction procedure does not need to be applied in a practical implementation; we only use it to justify that the formulation (13) is general.
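The elimination argument above can be checked numerically. The sketch below uses randomly generated (hypothetical) data and an SVD-based null-space basis with a least-squares particular solution in place of the $Y(\bar{A}Y)^{-1}$ construction; the reduced solution is compared against the full KKT system.

```python
import numpy as np

# Null-space elimination sketch on hypothetical random data (n = 6 variables,
# m = 2 equality constraints); Abar/Bbar mirror the text's A-bar, B-bar.
rng = np.random.default_rng(0)
n, m = 6, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)                  # positive definite
c = rng.standard_normal(n)
Abar = rng.standard_normal((m, n))           # full row rank (generic)
Bbar = rng.standard_normal((m, 3))
d = rng.standard_normal(3)

# Basis Z for null(Abar) from the SVD, and a particular solution of Abar z = -Bbar d
_, _, Vt = np.linalg.svd(Abar)
Z = Vt[m:].T                                 # columns span null(Abar)
zp = np.linalg.lstsq(Abar, -Bbar @ d, rcond=None)[0]

# Reduced unconstrained QP in ztilde:  (Z^T Q Z) ztilde = Z^T (c - Q zp)
ztilde = np.linalg.solve(Z.T @ Q @ Z, Z.T @ (c - Q @ zp))
z_red = zp + Z @ ztilde

# Cross-check against the full KKT system of the constrained QP
K = np.block([[Q, Abar.T], [Abar, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([c, -Bbar @ d]))
```

Since $Q$ is positive definite and $\bar{A}$ has full row rank, both routes yield the same unique primal solution.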

We thus have that the variables $z$ in (13) are coupled implicitly via the matrix $Q$, and we seek to express this problem in the lifted form. We proceed to partition the set $\mathcal{N}$ into a set of partitions $\mathcal{K}=\{1,..,K\}$ to give the partition sets $\mathcal{N}_{1},\mathcal{N}_{2},\cdots,\mathcal{N}_{K}\subseteq\mathcal{N}$ satisfying $\mathcal{N}=\mathcal{N}_{1}\cup\mathcal{N}_{2}\cup\cdots\cup\mathcal{N}_{K}$ and $\mathcal{N}_{k}\cap\mathcal{N}_{k^{\prime}}=\emptyset$ for all $k,k^{\prime}\in\mathcal{K}$ with $k\neq k^{\prime}$. Coupling between variables arises when $Q_{ij}\neq 0$ for $i\in\mathcal{N}_{k}$, $j\in\mathcal{N}_{k^{\prime}}$, and $k\neq k^{\prime}$. We perform lifting by defining the index sets:

\underline{\mathcal{N}}_{k}:=\{j\in\mathcal{N}\setminus\mathcal{N}_{k}\mid\exists\,i\in\mathcal{N}_{k}\text{ s.t. }Q_{ij}\neq 0\},\quad \overline{\mathcal{N}}_{k}:=\mathcal{N}_{k}\cup\underline{\mathcal{N}}_{k} \quad (14)

The set $\underline{\mathcal{N}}_{k}$ contains the nodes outside partition $k$ whose variables are coupled to variables in the partition set $\mathcal{N}_{k}$. The set $\overline{\mathcal{N}}_{k}$ contains all nodes in partition $\mathcal{N}_{k}$ together with its coupled nodes. These definitions allow us to express problem (13) as:

\min_{z}\quad \frac{1}{2}\sum_{k\in\mathcal{K}}\sum_{i\in\mathcal{N}_{k}}\sum_{j\in\overline{\mathcal{N}}_{k}}Q_{ij}z(i)z(j)-\sum_{k\in\mathcal{K}}\sum_{i\in\mathcal{N}_{k}}c_{i}z(i). \quad (15)
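The index sets in (14) can be computed directly from the sparsity pattern of $Q$. A minimal sketch (using a hypothetical tridiagonal $Q$ with six nodes and two partitions, 0-based indices):

```python
import numpy as np

# Sketch of the index sets (14) built from the sparsity pattern of a
# hypothetical tridiagonal Q with N = 6 nodes and two partitions (0-based).
N = 6
Q = (np.diag(np.full(N, 2.0)) + np.diag(np.full(N - 1, -1.0), 1)
     + np.diag(np.full(N - 1, -1.0), -1))
N1, N2 = {0, 1, 2}, {3, 4, 5}                # partition sets

def coupled_sets(Q, Nk):
    """Return (N_under_k, N_over_k): the outside nodes coupled to partition k,
    and the partition augmented with those coupled nodes."""
    under = {j for j in range(Q.shape[0]) if j not in Nk
             and any(Q[i, j] != 0 for i in Nk)}
    return under, Nk | under

u1, o1 = coupled_sets(Q, N1)
u2, o2 = coupled_sets(Q, N2)
```

For the tridiagonal pattern, only the two boundary nodes couple across the partition interface: node 3 is coupled into partition 1 and node 2 into partition 2.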

To induce lifting, we introduce a new set of variables $\{z_{1},z_{2},\cdots,z_{K}\}$ defined as:

z_{k}:=\begin{bmatrix}z(k_{1})\\ \vdots\\ z(k_{N_{k}})\end{bmatrix},\quad \underline{z}_{k}:=\begin{bmatrix}z(k_{N_{k}+1})\\ \vdots\\ z(k_{N_{k}+\underline{N}_{k}})\end{bmatrix},\quad \overline{z}_{k}:=\begin{bmatrix}z_{k}\\ \underline{z}_{k}\end{bmatrix} \quad (16)

where $\mathcal{N}_{k}=\{k_{1},k_{2},\cdots,k_{N_{k}}\}$, $\underline{\mathcal{N}}_{k}=\{k_{N_{k}+1},k_{N_{k}+2},\cdots,k_{N_{k}+\underline{N}_{k}}\}$, $N_{k}$ is the number of variables in partition $\mathcal{N}_{k}$, and $\underline{N}_{k}$ is the number of variables coupled to partition $k$. With this, we can express problem (15) in the following lifted form:

\min_{z}\quad \sum_{k\in\mathcal{K}}\frac{1}{2}\overline{z}_{k}^{T}\overline{Q}_{k}\overline{z}_{k}-\overline{c}_{k}^{T}\overline{z}_{k} \quad (17a)
\text{s.t.}\quad \Pi_{kk}\overline{z}_{k}+\sum_{k^{\prime}\in\mathcal{K}\setminus\{k\}}\Pi_{kk^{\prime}}\overline{z}_{k^{\prime}}=0,\quad k\in\mathcal{K}. \quad (17b)

Here, $\overline{Q}_{k}\in\mathbb{R}^{(N_{k}+\underline{N}_{k})\times(N_{k}+\underline{N}_{k})}$ and $\overline{c}_{k}\in\mathbb{R}^{N_{k}+\underline{N}_{k}}$ are given by:

(\overline{Q}_{k})_{ij}=\begin{cases}Q_{k_{i}k_{j}},&\text{for }k_{i},k_{j}\in\mathcal{N}_{k}\\ \frac{1}{2}Q_{k_{i}k_{j}},&\text{for }k_{i}\in\mathcal{N}_{k}\text{ and }k_{j}\notin\mathcal{N}_{k}\\ \frac{1}{2}Q_{k_{i}k_{j}},&\text{for }k_{i}\notin\mathcal{N}_{k}\text{ and }k_{j}\in\mathcal{N}_{k}\\ 0,&\text{otherwise}\end{cases},\qquad (\overline{c}_{k})_{i}=\begin{cases}c_{k_{i}},&\text{for }k_{i}\in\mathcal{N}_{k}\\ 0,&\text{for }k_{i}\notin\mathcal{N}_{k}\end{cases} \quad (18)

The coefficient matrices $\Pi_{kk}\in\mathbb{R}^{\underline{N}_{k}\times(N_{k}+\underline{N}_{k})}$ are given by:

\Pi_{kk}=\begin{bmatrix}e_{N_{k}+1}^{T}\\ \vdots\\ e_{N_{k}+\underline{N}_{k}}^{T}\end{bmatrix}, \quad (19)

where $e_{i}\in\mathbb{R}^{N_{k}+\underline{N}_{k}}$ are elementary column vectors. Note that a cross term $Q_{ij}\neq 0$ with $i\in\mathcal{N}_{k}$, $j\in\mathcal{N}_{k^{\prime}}$, and $k\neq k^{\prime}$ can be included in the objective function of partition $k$, of partition $k^{\prime}$, or of both. In the lifting scheme shown in (18), we assume that these terms are divided equally between the two partitions. This choice is arbitrary, however, and other lifting schemes are possible; in other words, we can manipulate the lifting scheme so that the partition sets satisfy either $j\in\underline{\mathcal{N}}_{k}$ or $j\notin\underline{\mathcal{N}}_{k}$. Interestingly, one can show that the solution of the lifted problem is unique and coincides with that of problem (13). The proof of this assertion is intricate and is omitted due to lack of space. To simplify the notation, we write the lifted variables $\bar{z}_{k}$, matrices $\bar{Q}_{k}$, and cost vectors $\bar{c}_{k}$ in (17) simply as $z_{k},Q_{k},c_{k}$.
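The equivalence between the lifted problem (17) and the original problem (13) can at least be verified numerically on a small instance. The sketch below uses hypothetical data (a tridiagonal $Q$ with $N=4$, $c=\mathbf{1}$, partitions $\{1,2\}$ and $\{3,4\}$) and assembles the stationarity and coupling conditions of (17) directly.

```python
import numpy as np

# Numerical check that the lifted problem (17) reproduces the solution of (13),
# on a hypothetical tridiagonal instance: N = 4, c = 1, partitions {1,2}, {3,4}.
Q = np.diag([2.0] * 4) + np.diag([-1.0] * 3, 1) + np.diag([-1.0] * 3, -1)
c = np.ones(4)
z_true = np.linalg.solve(Q, c)               # centralized solution of (13)

# Lifted data per (16)-(19): z1bar = (z1, z2, dup of z3), z2bar = (z3, z4, dup of z2);
# the cross terms Q_{23} = Q_{32} = -1 receive weight 1/2 in each partition.
Q1 = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -0.5], [0.0, -0.5, 0.0]])
Q2 = np.array([[2.0, -1.0, -0.5], [-1.0, 2.0, 0.0], [-0.5, 0.0, 0.0]])
c1 = np.array([1.0, 1.0, 0.0]); c2 = c1.copy()
P11 = np.array([[0.0, 0.0, 1.0]]); P12 = np.array([[-1.0, 0.0, 0.0]])  # dup = z3
P22 = np.array([[0.0, 0.0, 1.0]]); P21 = np.array([[0.0, -1.0, 0.0]])  # dup = z2

# Assemble and solve the stationarity/coupling system of (17)
O = np.zeros
K = np.block([
    [Q1,        P11.T,     O((3, 3)), P21.T],
    [P11,       O((1, 1)), P12,       O((1, 1))],
    [O((3, 3)), P12.T,     Q2,        P22.T],
    [P21,       O((1, 1)), P22,       O((1, 1))],
])
sol = np.linalg.solve(K, np.concatenate([c1, [0.0], c2, [0.0]]))
z_lift = np.concatenate([sol[0:2], sol[4:6]])  # owned variables of each partition
```

For this instance the owned variables of the lifted solution coincide with the centralized solution $z=(2,3,3,2)$, and each duplicate matches its original.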

The primal-dual solution of (17) can be obtained by solving the KKT conditions:

[Q1Π11TΠ21TΠK1TΠ11Π12Π1KΠ12TQ2Π22TΠ21Π22Π1KTQKΠKKTΠK1ΠKK][z1λ1z2λ2zKλK]=[c10c20cK0]\begin{bmatrix}Q_{1}&\Pi_{11}^{T}&&\Pi_{21}^{T}&&\cdots&&\Pi_{K1}^{T}\\ \Pi_{11}&&\Pi_{12}&&\cdots&&\Pi_{1K}&\\ &\Pi_{12}^{T}&Q_{2}&\Pi_{22}^{T}&&&&\vdots\\ \Pi_{21}&&\Pi_{22}&&&&\vdots&\\ &\vdots&&&\ddots&&&\vdots\\ \vdots&&&&&\ddots&\vdots&\\ &\Pi_{1K}^{T}&&\cdots&&\cdots&Q_{K}&\Pi_{KK}^{T}\\ \Pi_{K1}&&\cdots&&\cdots&&\Pi_{KK}&\\ \end{bmatrix}\begin{bmatrix}z_{1}\\ \lambda_{1}\\ z_{2}\\ \lambda_{2}\\ \vdots\\ z_{K}\\ \lambda_{K}\end{bmatrix}=\begin{bmatrix}c_{1}\\ {0}\\ c_{2}\\ {0}\\ \vdots\\ c_{K}\\ {0}\end{bmatrix} (20)

By exploiting the structure of this system, we can derive a GS scheme of the form:

[QkΠkkTΠkk][zk+1λk+1]=[ck0]k=1k1[0ΠkkTΠkk0][zk+1λk+1]k=k+1K[0ΠkkTΠkk0][zkλk].\begin{bmatrix}Q_{k}&\Pi_{kk}^{T}\\ \Pi_{kk}&\end{bmatrix}\begin{bmatrix}z_{k}^{\ell+1}\\ \lambda_{k}^{\ell+1}\end{bmatrix}=\begin{bmatrix}c_{k}\\ {0}\end{bmatrix}-\sum_{k^{\prime}=1}^{k-1}\begin{bmatrix}0&\Pi_{k^{\prime}k}^{T}\\ \Pi_{kk^{\prime}}&0\end{bmatrix}\begin{bmatrix}z_{k^{\prime}}^{\ell+1}\\ \lambda_{k^{\prime}}^{\ell+1}\end{bmatrix}-\sum_{k^{\prime}=k+1}^{K}\begin{bmatrix}0&\Pi_{k^{\prime}k}^{T}\\ \Pi_{kk^{\prime}}&0\end{bmatrix}\begin{bmatrix}z_{k^{\prime}}^{\ell}\\ \lambda_{k^{\prime}}^{\ell}\end{bmatrix}. (21)

Here, we have used a lexicographic coordination order. We note that the solution of the linear system (21) solves the optimization problem:

zk+1=argminzk\displaystyle z_{k}^{\ell+1}=\mathop{\textrm{argmin}}_{z_{k}} 12zkTQkzkzkT(ckk=1k1ΠkkTλk+1k=k+1KΠkkTλk)\displaystyle\quad\frac{1}{2}{z_{k}}^{T}Q_{k}z_{k}-{{{z}}}_{k}^{T}\left({{{c}_{k}}}-\sum^{k-1}_{k^{\prime}=1}\Pi_{k^{\prime}k}^{T}\lambda_{k^{\prime}}^{\ell+1}-\sum^{K}_{k^{\prime}=k+1}\Pi_{k^{\prime}k}^{T}\lambda_{k^{\prime}}^{\ell}\right) (22a)
s.t. Πkkzk+k=1k1Πkkzk+1+k=k+1KΠkkzk=0(λk).\displaystyle\quad{\Pi}_{kk}z_{k}+\sum_{k^{\prime}=1}^{k-1}{\Pi}_{kk^{\prime}}z_{k^{\prime}}^{\ell+1}+\sum_{k^{\prime}=k+1}^{K}{\Pi}_{kk^{\prime}}z_{k^{\prime}}^{\ell}=0\quad(\lambda_{k}). (22b)

From this structure, we can see how the primal and dual variables propagate forward and backward relative to partition $k$, due to the inherent block triangular nature of the GS scheme. We now seek to establish a condition that guarantees convergence of the GS scheme in this more general setting. To do so, we define:

Ak:=[QkΠkkTΠkk],xk:=[zkλk],bk:=[ck0],Bkk:=[ΠkkTΠkk].A_{k}:=\begin{bmatrix}Q_{k}&{\Pi_{kk}}^{T}\\ \Pi_{kk}&\end{bmatrix},\quad{x}_{k}^{\ell}:=\begin{bmatrix}{z}_{k}^{\ell}\\ {\lambda}_{k}^{\ell}\end{bmatrix},\quad{b}_{k}:=\begin{bmatrix}c_{k}\\ {0}\end{bmatrix},\quad B_{kk^{\prime}}:=\begin{bmatrix}&-{\Pi_{k^{\prime}k}}^{T}\\ -\Pi_{kk^{\prime}}&\end{bmatrix}. (23)

By using (23), we can express (21) as:

Akxk+1=bk+k=1k1Bkkxk+1+k=k+1KBkkxk.A_{k}{x}_{k}^{\ell+1}={b}_{k}+\sum^{k-1}_{k^{\prime}=1}B_{kk^{\prime}}{x}_{k^{\prime}}^{\ell+1}+\sum^{K}_{k^{\prime}=k+1}B_{kk^{\prime}}{x}_{k^{\prime}}^{\ell}. (24)

This can be expressed in matrix form as:

[A1B21A2BK1BKK1AK][x1+1x2+1xK+1]=[B12B1KBK1K][x1x2xK]+[b1b2bK].\displaystyle\begin{bmatrix}A_{1}&&&\\ -B_{21}&A_{2}&&\\ \vdots&\ddots&\ddots&\\ -B_{K1}&\cdots&-B_{KK-1}&A_{K}\end{bmatrix}\begin{bmatrix}{x}_{1}^{\ell+1}\\ {x}_{2}^{\ell+1}\\ \vdots\\ {x}_{K}^{\ell+1}\end{bmatrix}=\begin{bmatrix}\quad&B_{12}&\cdots&B_{1K}\\ \quad&&\ddots&\vdots\\ \quad&&&B_{K-1K}\\ \quad\end{bmatrix}\begin{bmatrix}{x}_{1}^{\ell}\\ {x}_{2}^{\ell}\\ \vdots\\ {x}_{K}^{\ell}\end{bmatrix}+\begin{bmatrix}{b}_{1}\\ {b}_{2}\\ \vdots\\ {b}_{K}\end{bmatrix}. (25)
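To make the iteration concrete, the following minimal sketch (not from the paper; all data are synthetic stand-ins for the partitioned system (20)) applies the block GS update (24) to a small symmetric positive definite system and compares the result against a direct solve. The function name `block_gs` and the block sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the partitioned KKT system (20): a random
# symmetric positive definite matrix split into K square diagonal blocks.
K, nb = 4, 3                       # number of partitions, block size
n = K * nb
G = rng.standard_normal((n, n))
M = G @ G.T + n * np.eye(n)        # SPD, so block GS converges
b = rng.standard_normal(n)

def block_gs(M, b, K, nb, sweeps=100):
    """Block Gauss-Seidel sweeps corresponding to the update (24)."""
    x = np.zeros(K * nb)
    blk = lambda k: slice(k * nb, (k + 1) * nb)
    for _ in range(sweeps):
        for k in range(K):
            # b_k minus coupling with the other blocks; since x is
            # updated in place, blocks k' < k already hold new values
            r = b[blk(k)].copy()
            for kp in range(K):
                if kp != k:
                    r -= M[blk(k), blk(kp)] @ x[blk(kp)]
            x[blk(k)] = np.linalg.solve(M[blk(k), blk(k)], r)
    return x

x_gs = block_gs(M, b, K, nb)
x_ref = np.linalg.solve(M, b)
print(np.linalg.norm(x_gs - x_ref))
```

Each inner solve plays the role of one partition subproblem; only coupling terms with neighboring blocks enter the right-hand side.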

We can see that the partition matrices $A_{k}$ are nonsingular by inspecting the block structure of $Q_{k}$. In particular, using the definition of $\overline{Q}_{k}$ in (18a) (recall that we write $\overline{Q}_{k}$ simply as $Q_{k}$) and the definition of $\Pi_{kk}$ in (19), we can express $A_{k}$ as:

Ak=[Q^kQ¯kTQ¯kII]A_{k}=\begin{bmatrix}\hat{Q}_{k}&\underline{Q}_{k}^{T}&\\ {\underline{Q}_{k}}&&I\\ &I&\end{bmatrix} (26)

where the components of $\hat{Q}_{k}\in\mathbb{R}^{N_{k}\times N_{k}}$ and $\underline{Q}_{k}\in\mathbb{R}^{\underline{N}_{k}\times N_{k}}$ are defined in (18a). Since $Q$ is positive definite, $\hat{Q}_{k}$ is also positive definite and hence nonsingular. It follows that the columns of $A_{k}$ are linearly independent and thus $A_{k}$ is nonsingular as well. This implies that the block lower triangular matrix on the left-hand side of (25) is also nonsingular. To simplify notation we define:

w\displaystyle{w}^{\ell} :=[x1x2xK],r:=[A1B21A2BK1BKK1AK]1[b1b2bK]\displaystyle:=\begin{bmatrix}{x}_{1}^{\ell}\\ {x}_{2}^{\ell}\\ \vdots\\ {x}_{K}^{\ell}\end{bmatrix},\quad{r}:=\begin{bmatrix}A_{1}&&&\\ -B_{21}&A_{2}&&\\ \vdots&\ddots&\ddots&\\ -B_{K1}&\cdots&-B_{KK-1}&A_{K}\end{bmatrix}^{-1}\begin{bmatrix}{b}_{1}\\ {b}_{2}\\ \vdots\\ {b}_{K}\end{bmatrix} (27a)
S\displaystyle S :=[A1B21A2BK1BKK1AK]1[B12B1KBK1K].\displaystyle:=\begin{bmatrix}A_{1}&&&\\ -B_{21}&A_{2}&&\\ \vdots&\ddots&\ddots&\\ -B_{K1}&\cdots&-B_{KK-1}&A_{K}\end{bmatrix}^{-1}\begin{bmatrix}\quad&B_{12}&\cdots&B_{1K}\\ \quad&&\ddots&\vdots\\ \quad&&&B_{K-1K}\\ \quad\end{bmatrix}. (27b)

We express (25) in the compact form $w^{\ell+1}=Sw^{\ell}+r$, which yields $w^{\ell}=S^{\ell}w^{0}+\left(I+S+\cdots+S^{\ell-1}\right)r$. The solution of (20) satisfies $w=Sw+r$. This implies that the solution of the lifted problem also solves the original problem (13). We now formally establish the following convergence result for the GS scheme.

Proposition 1

The GS scheme (22) converges to the solution of (13) if all the eigenvalues of the matrix:

Σ:=[Q1Π11TQ2Π12TΠ22TQKΠ1KTΠK1KTΠKKTΠ11Π21Π22ΠK1ΠKK1ΠKK]1[Π21TΠK1TΠKK1TΠ12Π1KΠK1K]\displaystyle\begin{split}\Sigma:=\begin{bmatrix}Q_{1}&&&&\Pi_{11}^{T}&&&\\ &Q_{2}&&&\Pi_{12}^{T}&\Pi_{22}^{T}&&\\ &&\ddots&&\vdots&\ddots&\ddots&\\ &&&Q_{K}&\Pi_{1K}^{T}&\cdots&\Pi_{K-1K}^{T}&\Pi_{KK}^{T}\\ \Pi_{11}&&&&&&&\\ \Pi_{21}&\Pi_{22}&&&&&&\\ \vdots&\ddots&\ddots&&&&&\\ \Pi_{K1}&\cdots&\Pi_{KK-1}&\Pi_{KK}&&&&\\ \end{bmatrix}^{-1}\begin{bmatrix}&&&&&-\Pi_{21}^{T}&\cdots&-\Pi_{K1}^{T}\\ &&&&&&\ddots&\vdots\\ &&&&&&&-\Pi_{KK-1}^{T}\\ &&&&&&&\\ \quad&-\Pi_{12}&\cdots&-\Pi_{1K}&&&&\\ &&\ddots&\vdots&&&&\\ &&&-\Pi_{K-1K}&&&&\\ &&&&&&&\\ \end{bmatrix}\end{split}

have magnitude less than one.

Proof

Matrix $\Sigma$ is a permutation of matrix $S$. The permutation is a similarity transformation, so $S$ has the same eigenvalues as $\Sigma$ and, by assumption, all eigenvalues of $S$ have magnitude less than one. This implies that $S^{\ell}\rightarrow 0$ exponentially as $\ell\rightarrow\infty$ and that the series $I+S+S^{2}+\cdots$ converges to $(I-S)^{-1}$; note that $I-S$ is invertible since all eigenvalues of $S$ have magnitude less than one. Accordingly, the iterates of the scheme (22) satisfy $\lim_{\ell\rightarrow\infty}{w}^{\ell}=(I-S)^{-1}{r}$. The limit $(I-S)^{-1}{r}$ is the unique solution of the equation $w=Sw+r$ and thus solves (17). Problem (17) has the same solution as problem (13), and both problems have unique solutions. Thus, the GS scheme (22) converges to the solution of (13). \Box

Convergence is achieved without any assumptions on the initial guess $w^{0}$. The error between the solution of problem (13) and the $\ell$-th coordination update of the GS scheme (21) is $\epsilon^{\ell}=S^{\ell}\left((I-S)^{-1}r-w^{0}\right)$. We call $\|\epsilon^{\ell}\|$ the error of the $\ell$-th coordination step. Writing the initial error as $\epsilon^{0}$, we have $\|\epsilon^{\ell}\|=\|S^{\ell}\epsilon^{0}\|$. Moreover, if we define the spectral radius $\rho(S):=\max_{i}|\lambda_{i}(S)|$ then, for any $\delta>0$, there exists $\kappa>0$ such that the bound $\|S^{\ell}\|\leq\kappa|\rho(S)+\delta|^{\ell}$ holds for all $\ell$, where $\|\cdot\|$ is a matrix norm. Using this inequality we can establish that $\|\epsilon^{\ell}\|\leq\kappa|\rho(S)+\delta|^{\ell}\|\epsilon^{0}\|$. Since $\delta$ can be chosen arbitrarily small, the decay rate of the error is $O\left(\rho(S)^{\ell}\right)$. Although the error decays exponentially, if $\rho(S)$ is close to one convergence can be slow and a good initial guess $w^{0}$ becomes essential. We also note that $\rho(S)$ is closely related to the topology of the coupling between partitions, indicating that the partition structure affects the convergence rate.
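This error bound can be observed numerically. The sketch below (synthetic data, not the paper's system) forms the GS iteration matrix $S$ from the lower/upper splitting of a small SPD matrix and tracks the error decay of the fixed-point iteration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Small SPD stand-in for the KKT matrix; GS splitting M = (D + L) + U.
n = 8
G = rng.standard_normal((n, n))
M = G @ G.T + 2 * n * np.eye(n)
b = rng.standard_normal(n)

DL = np.tril(M)                        # lower triangular part incl. diagonal
S = np.linalg.solve(DL, DL - M)        # iteration matrix, cf. (27b)
r = np.linalg.solve(DL, b)

rho = max(abs(np.linalg.eigvals(S)))   # spectral radius rho(S) < 1 (SPD case)
x_star = np.linalg.solve(M, b)         # fixed point of x = S x + r

w = np.zeros(n)
errs = []
for _ in range(60):
    w = S @ w + r                      # w^{l+1} = S w^l + r
    errs.append(np.linalg.norm(w - x_star))

# the error norm decays roughly like rho(S)**l
print(rho, errs[0], errs[-1])
```

A smaller spectral radius (looser coupling between partitions) gives faster decay, which is the quantitative content of the $O(\rho(S)^{\ell})$ rate.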

We highlight that the GS concepts discussed here focus on problems without inequality constraints. In practice, however, GS schemes can also be applied to problems with inequality constraints by using projected GS schemes zavalamultigrid ; borzi2005multigrid ; zavala2010real .

2.3 Coordination Orders

In the lexicographic coordination order proposed in (21), we use the update sequence $k=1,2,\cdots,K$. We note that the order affects the structure of the matrix $\Sigma$ and thus convergence can be affected as well. To see this, consider a new order given by the sequence $\sigma(1),\sigma(2),\cdots,\sigma(K)$, where $\sigma:\{1,2,\cdots,K\}\rightarrow\{1,2,\cdots,K\}$ is a bijective mapping. We can use this mapping to rearrange the partition variables as:

[z1,λ1,z2,λ2,zK,λK][zσ(1),λσ(1),zσ(2),λσ(2),zσ(K),λσ(K)].[z_{1},\lambda_{1},z_{2},\lambda_{2},\dots z_{K},\lambda_{K}]\rightarrow[z_{\sigma(1)},\lambda_{\sigma(1)},z_{\sigma(2)},\lambda_{\sigma(2)},\dots z_{\sigma(K)},\lambda_{\sigma(K)}]. (28)

This gives the reordered matrix:

S=[Aσ(1)Bσ(2)σ(1)Aσ(2)Bσ(K)σ(1)Bσ(K)σ(K1)Aσ(K)]1[Bσ(1)σ(2)Bσ(1)σ(K)Bσ(K1)σ(K)]\displaystyle S=\begin{bmatrix}A_{\sigma(1)}&&&\\ -B_{\sigma(2)\sigma(1)}&A_{\sigma(2)}&&\\ \vdots&\ddots&\ddots&\\ -B_{\sigma(K)\sigma(1)}&\cdots&-B_{\sigma(K)\sigma(K-1)}&A_{\sigma(K)}\end{bmatrix}^{-1}\begin{bmatrix}\quad&B_{\sigma(1)\sigma(2)}&\cdots&B_{\sigma(1)\sigma(K)}\\ \quad&&\ddots&\vdots\\ \quad&&&B_{\sigma(K-1)\sigma(K)}\\ \quad\end{bmatrix} (29)

Importantly, the change in update order is not necessarily a similarity transformation of matrix $S$ and thus the eigenvalues can change. The GS scheme converges as long as the eigenvalues of the reordered matrix $S$ have magnitude less than one, but the convergence rate will be affected. Interestingly, GS schemes are highly flexible and allow for a large number of update orders. For instance, in some cases one can derive ordering sequences that enable parallelization. As an example, the 1-D spatial problem has a special structure with $B_{kk^{\prime}}=0$ for any $(k,k^{\prime})$ such that $|k-k^{\prime}|\geq 2$. The GS scheme becomes:

Akxk+1\displaystyle A_{k}{x}_{k}^{\ell+1} =bk+Bkk1xk1+1+Bkk+1xk+1\displaystyle={b}_{k}+B_{kk-1}{x}_{k-1}^{\ell+1}+B_{kk+1}{x}_{k+1}^{\ell} (30)

Instead, we consider the ordering $\sigma(i)=2i-1$ for $1\leq i\leq\frac{K}{2}$ and $\sigma(i)=2i-K$ for $\frac{K}{2}+1\leq i\leq K$, where we assume that the number of partitions is even. This is called a red-black ordering and is widely used in the solution of PDEs. By changing the index using $\sigma(\cdot)$, we can express (30) as:

Aσ(i)xσ(i)+1\displaystyle A_{\sigma(i)}{x}_{\sigma(i)}^{\ell+1} =bσ(i)+Bσ(i)σ(i+K2)xσ(i+K2),i=1\displaystyle={b}_{\sigma(i)}+B_{{\sigma(i)}{\sigma(i+\frac{K}{2})}}{x}_{\sigma(i+\frac{K}{2})}^{\ell},\quad i=1 (31a)
\displaystyle A_{\sigma(i)}{x}_{\sigma(i)}^{\ell+1}=b_{\sigma(i)}+B_{{\sigma(i)}{\sigma(i+\frac{K}{2}-1)}}{x}_{\sigma(i+\frac{K}{2}-1)}^{\ell}+B_{{\sigma(i)}{\sigma(i+\frac{K}{2})}}{x}_{\sigma(i+\frac{K}{2})}^{\ell},\quad 2\leq i\leq\frac{K}{2} (31b)

and

Aσ(i)xσ(i)+1\displaystyle A_{\sigma(i)}{x}_{\sigma(i)}^{\ell+1} =bσ(i)+Bσ(i)σ(iK2)xσ(iK2)+1+Bσ(i)σ(iK2+1)xσ(iK2+1)+1,K2+1iK1.\displaystyle={b}_{\sigma(i)}+B_{{\sigma(i)}{\sigma(i-\frac{K}{2})}}{x}_{\sigma(i-\frac{K}{2})}^{\ell+1}+B_{{\sigma(i)}{\sigma(i-\frac{K}{2}+1)}}{x}_{\sigma(i-\frac{K}{2}+1)}^{\ell+1},\quad\frac{K}{2}+1\leq i\leq K-1. (32a)
Aσ(i)xσ(i)+1\displaystyle A_{\sigma(i)}{x}_{\sigma(i)}^{\ell+1} =bσ(i)+Bσ(i)σ(iK2)xσ(iK2)+1,i=K.\displaystyle={b}_{\sigma(i)}+B_{{\sigma(i)}{\sigma(i-\frac{K}{2})}}{x}_{\sigma(i-\frac{K}{2})}^{\ell+1},\quad i=K. (32b)

We can thus see that the solutions of (31) can be computed independently for all $1\leq i\leq\frac{K}{2}$, because these partitions depend only on the solutions of (32) and not on each other. Likewise, the solutions of (32) can be computed independently for all $\frac{K}{2}+1\leq i\leq K$. Red-black ordering thus enables parallelism. Different coordination orders for 1-D and 2-D meshes are presented in Figures 3 and 4.
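The red-black idea can be sketched on the scalar analogue of (30), a tridiagonal system: all red unknowns update simultaneously from black values, then all black unknowns update from the fresh red values. This is an illustrative sketch with synthetic data; the vectorized half-sweeps stand in for parallel partition solves.

```python
import numpy as np

# 1-D chain: tridiagonal SPD system, the scalar analogue of (30).
n = 10                                   # number of "partitions" (even)
M = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

red = np.arange(0, n, 2)                 # 0-based analogue of the first half of sigma
black = np.arange(1, n, 2)

x = np.zeros(n)
for _ in range(200):
    # red half-sweep: each red unknown couples only to black neighbors,
    # so all red updates are independent (vectorized = parallel)
    x[red] = (b[red] - M[np.ix_(red, black)] @ x[black]) / M[red, red]
    # black half-sweep: uses the freshly updated red values
    x[black] = (b[black] - M[np.ix_(black, red)] @ x[red]) / M[black, black]

print(np.linalg.norm(x - np.linalg.solve(M, b)))
```

Within each color, no unknown reads another unknown of the same color, which is exactly the independence exploited in (31)-(32).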

Refer to caption
Figure 3: Sketch of 1-D ordering methods.
Refer to caption
Figure 4: Sketch of 2-D ordering methods.

2.4 Coarsening

Low-complexity coarse versions of the full-resolution problem (1) can be solved to obtain an initial guess for the GS scheme and thereby accelerate coordination. To illustrate how this can be done, we use the following representation of (1):

minz\displaystyle\min_{{z}}\quad 12zTQzcTz\displaystyle\frac{1}{2}{z}^{T}Q{z}-{c}^{T}{z} (33a)
s.t. [AΠ]A¯z+[B0]B¯d=0(ν,λ)\displaystyle\underbrace{\begin{bmatrix}A\\ \Pi\end{bmatrix}}_{\bar{A}}{z}+\underbrace{\begin{bmatrix}B\\ 0\end{bmatrix}}_{\bar{B}}d=0\quad(\nu,{\lambda}) (33b)

Our goal is to obtain a substantial reduction in the dimension of this problem by introducing a mapping from a coarse variable space to the original space. This is represented by the linear mapping $z=T\tilde{z}$, where $\tilde{z}\in\mathbb{R}^{N_{c}\cdot n_{z}}$ is the coarse variable and the mapping $T\in\mathbb{R}^{N\cdot n_{z}\times N_{c}\cdot n_{z}}$ (a prolongation operator, whose transpose acts as a restriction) is assumed to have full column rank with $N_{c}<N$. We can thus pose the low-dimensional coarse problem:

\displaystyle\min_{\tilde{z}}\quad\displaystyle\frac{1}{2}{\tilde{z}}^{T}T^{T}QT{\tilde{z}}-{c}^{T}T{\tilde{z}} (34a)
s.t. UTA¯Tz~+UTB¯d=0(ν~,λ~)\displaystyle U^{T}\bar{A}T{\tilde{z}}+U^{T}\bar{B}d=0\quad(\tilde{\nu},\tilde{\lambda}) (34b)

A key issue that arises in coarsening is that the columns of the matrix $\bar{A}T$ do not necessarily span the entire range space of $\bar{B}$ (i.e., we might not be able to find a coarse variable $\tilde{z}$ that satisfies $\bar{A}T\tilde{z}+\bar{B}d=0$). Consequently, we introduce a constraint aggregation matrix $U$ with full column rank to ensure that the range space of $U^{T}\bar{A}T$ contains that of $U^{T}\bar{B}$. With this, we can ensure that the feasible set of (34) is non-empty.

After solving the coarse problem (34), we can project the primal-dual solution from the coarse space to the original space. We note that the dimension of the dual space is also reduced because we performed constraint aggregation. The projection for the primal solution is $z=T\tilde{z}$, while the projection for the dual solution is $(\nu,\lambda)=U(\tilde{\nu},\tilde{\lambda})$. The derivation of coarsened representations is application-dependent and often requires domain-specific knowledge. In particular, coarsening can also be performed by using reduced-order modeling techniques such as proper orthogonal decomposition antoulas or coherency-based network aggregation schemes kokotovic1982coherency . In the following sections we demonstrate how to derive coarse representations in certain settings.
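The following sketch illustrates the coarsening idea on an unconstrained analogue of (33)-(34): an aggregation-based prolongation $T$ (piecewise constant over groups of variables) yields the coarse problem $(T^{T}QT)\tilde{z}=T^{T}c$, whose prolonged solution serves as an initial guess. The constraint blocks and the aggregation matrix $U$ are omitted here, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unconstrained analogue of (33)-(34): min 0.5 z'Qz - c'z.
N, Nc = 16, 4
G = rng.standard_normal((N, N))
Q = G @ G.T + N * np.eye(N)            # positive definite
c = rng.standard_normal(N)

# piecewise-constant prolongation: each coarse variable drives a
# contiguous group of N/Nc fine variables (full column rank)
T = np.kron(np.eye(Nc), np.ones((N // Nc, 1)))

# coarse problem, cf. (34): (T' Q T) z_tilde = T' c
z_tilde = np.linalg.solve(T.T @ Q @ T, T.T @ c)
z0 = T @ z_tilde                       # prolong back to the fine space

# Galerkin optimality: z0 is the Q-norm best approximation of the true
# solution in range(T), so it is never worse than the zero initial guess
z_star = np.linalg.solve(Q, c)
eQ = lambda v: v @ Q @ v
print(eQ(z0 - z_star) <= eQ(z_star))
```

The Q-norm optimality of the coarse solution is what makes it a sensible warm start: it removes the low-frequency (aggregate) component of the error before the smoother takes over.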

2.5 Multi-Grid Schemes and Hierarchical Coordination

Multi-grid serves as a bridge between fully centralized and fully decentralized coordination schemes. In particular, a fully centralized scheme would aim to find a solution of the full-resolution problem (2) by gathering all the information in a single processing unit. A fully decentralized scheme such as GS, on the other hand, would proceed by finding solutions to subproblems (3) over each partition, with information shared only between connected partitions through the coordination variables. A drawback of a decentralized scheme is that a potentially large number of coordination steps might be needed to reach a solution of the full-resolution problem, particularly when many partitions are present.

In a multi-grid scheme, we seek to aid the decentralized scheme by using information from a low-resolution central problem that oversees the entire domain. In our context, the information is in the form of states and dual variables defined over the partition interfaces (i.e., the coupling variables and constraints). The key idea of this hierarchical arrangement is that the coarse central scheme can capture effects occurring at low global frequencies while the agents in the decentralized schemes can handle effects occurring at high local frequencies. As can be seen, multi-grid provides a framework to design hierarchical control architectures by leveraging existing and powerful reduced order modeling techniques such as coherency-based aggregation and decentralized control schemes.

Multi-grid is a widely studied computational paradigm. The framework proposed here presents basic elements of this paradigm, but diverse extensions are possible borzi ; brandtalgebraic . For instance, the scheme proposed here involves only a coarse and a fine resolution level, but one can create multi-level schemes that transfer information between multiple scales recursively by using meshes of diverse resolutions. This allows us to cover a wider range of frequencies present in the system. In the following section we illustrate how sequential coarsening can be beneficial.

From an electricity markets perspective, we highlight that multi-grid schemes provide a framework to coordinate transactions at multiple spatial and temporal scales. To see this, consider the case of spatial coordination of electricity markets. In such a setting, we can interpret each partition of the spatial domain as a market player (e.g., a microgrid). The market players have internal resources (e.g., distributed energy resources) that they manage to satisfy their internal load. The players, however, can also transact energy with other players in order to improve their economic performance. The proposed GS scheme provides a mechanism to handle intra-partition decision-making (by solving the partition subproblems) and inter-partition transactions (by exchanging state information such as voltages and dual information such as nodal prices). If transaction information is exchanged multiple times (corresponding to multiple GS iterates), the GS scheme will converge to an equilibrium point corresponding to the solution of the centralized economic maximization problem (e.g., the social welfare problem). This is a useful property of decentralized coordination because centralization of information and decision-making is often impractical. If the players exchange information only once (or a handful of times), they might not reach an optimal equilibrium and an inefficiency will be introduced. Moreover, when a disturbance affects the system, many GS iterations might be needed to reach the new optimal equilibrium. This is where hierarchical optimization becomes beneficial, because one can solve an aggregated spatial representation of the system (in which each partition is treated as a node) to compute approximate dual variables and states at the interfaces of the partitions. This approximation can be used to aid the convergence of the decentralized GS scheme (by conveying global spatial information to local market players).
One can think of the coarse high-level problem as a system operator (supervisor) problem (e.g., at the distribution level). The operator might, at the same time, need to coordinate with other system operators (each of which oversees its own set of market players). These system operators can in turn be aggregated into a higher level representing, for instance, a transmission or regional operator. We can thus see that hierarchical multi-grid schemes enable scalable coordination of a potentially large number of market players over large geographical regions. The hierarchical multi-grid scheme can also be applied to handle multiple timescales of a single market player that needs to manage, for instance, assets with different dynamic characteristics.

3 Case Studies

We now present numerical case studies to demonstrate the concepts in the context of temporal and spatial management of energy systems. We use a multi-scale (in time) optimization problem with the features of a storage management problem and a multi-scale (in space) optimization problem that considers power flow dispatch over a network.

3.1 Multi-Scale Temporal Control

We use a multi-grid scheme to solve the following temporal planning problem $\mathcal{P}$:

minx,u\displaystyle\min_{{x},{u}}\quad i𝒩(x(i)2+u(i)2)\displaystyle\sum_{i\in\mathcal{N}}(x(i)^{2}+{u(i)}^{2}) (35a)
s.t. x(i+1)=x(i)+δ(u(i+1)+d(i+1)),i𝒩\displaystyle x(i+1)=x(i)+\delta(u(i+1)+d(i+1)),\quad i\in\mathcal{N} (35b)
x(0)=0.\displaystyle x(0)=0. (35c)

This problem has a state, a control, and a disturbance defined over $N=M\cdot K$ time points contained in the set $\mathcal{N}$. The state and control are grouped into the decision variable $z(i)=(x(i),u(i))$. The distance between mesh points is given by $\delta$. The structure of this problem resembles that of an inventory (storage) problem in which the disturbance $d(i)$ is a load and $u(i)$ is a charge/discharge flow from the storage. In these types of problems, the load might have frequencies covering multiple timescales (e.g., from seasonal and daily down to seconds). Consequently, the time mesh $\delta$ has to be rather fine to capture all the frequencies. Moreover, the planning horizon (the time domain $N\cdot\delta$) might need to be long so as to capture the low-frequency components of the load. We partition the problem into $K$ partitions, each containing $M$ points. The set of inner points in a partition is defined as $\mathcal{M}$. The optimization problem over a partition $k$ solved in the GS scheme is given by:

minxk,uk\displaystyle\min_{{x}_{k},u_{k}}\quad i(xk(i)2+uk(i)2)+xk(M)λk+1\displaystyle\sum_{i\in\mathcal{M}}\left(x_{k}(i)^{2}+{u_{k}(i)}^{2}\right)+x_{k}(M)\lambda_{k+1}^{\ell} (36a)
s.t. xk(i+1)=xk(i)+δ(uk(i+1)+dk(i+1)),i\displaystyle x_{k}(i+1)=x_{k}(i)+\delta(u_{k}(i+1)+d_{k}(i+1)),\quad i\in\mathcal{M} (36b)
xk(0)=xk1+1(M)(λk).\displaystyle x_{k}(0)=x_{k-1}^{\ell+1}(M)\quad(\lambda_{k}). (36c)

In our numerical experiments, we set $K=10$ and $M=100$, which gives $N=1{,}000$ points. The time mesh points were set to $t(i)=i\cdot\delta$ with $\delta=0.1$. We use a disturbance signal composed of a low and a high frequency: $d(i)=4\sin\left(\frac{4\pi i}{N}\right)+\sin\left(\frac{24\pi i}{N}\right)$, $i\in\mathcal{N}$. The disturbance signal and its frequency components are shown in Figure 5.

Refer to caption
Figure 5: Disturbance profile (left) and frequency components (right) for temporal problem.

We solve a coarse problem to create a hierarchical structure that aids the GS scheme (which operates on the partitions at high resolution). To perform coarsening, we aggregate the grid points of each partition into $M_{c}=4$ coarse points (i.e., we collapse 25 fine points into one coarse point). The set of inner coarse points is $\mathcal{M}_{c}$. The projection is $x_{k}=T_{x}\tilde{x}_{k}$ and $u_{k}=T_{u}\tilde{u}_{k}$ with:

Tx=Tu=[11111]\displaystyle T_{x}=T_{u}=\begin{bmatrix}1\\ &1\\ &\vdots\\ &1\\ &&\ddots\\ &&&1\\ &&&\vdots\\ &&&1\end{bmatrix} (37a)

The projection matrices $T_{x}$ and $T_{u}$ have full column rank. The fine-to-coarse restriction can also be expressed as:

x~k(i~)=xk(i),u~k(i~)=uk(i),if i~=i1M/Mc+1\displaystyle\tilde{x}_{k}(\tilde{i})=x_{k}(i),\quad\tilde{u}_{k}(\tilde{i})=u_{k}(i),\quad\text{if }\quad\tilde{i}=\lfloor{\frac{i-1}{M/M_{c}}}\rfloor+1 (38)

where $\lfloor\cdot\rfloor$ is the floor (round-down) operator. We denote $\varphi(i)=\lfloor\frac{i-1}{M/M_{c}}\rfloor+1$. The coarse problem can then be stated in terms of the coarse variables as follows:

minx~k,u~k\displaystyle\min_{{\tilde{x}}_{k},\tilde{u}_{k}}\quad jc(x~k(j)2+u~k(j)2)+x~k(Mc)λk+1\displaystyle\sum_{j\in\mathcal{M}_{c}}\left(\tilde{x}_{k}(j)^{2}+{\tilde{u}_{k}(j)}^{2}\right)+\tilde{x}_{k}(M_{c}){\lambda}_{k+1}^{\ell} (39a)
s.t. x~k(j+1)=x~k(j)+MMcδ(u~k(j+1)+d~k(j+1)),jc\displaystyle\tilde{x}_{k}(j+1)=\tilde{x}_{k}(j)+\frac{M}{M_{c}}\delta(\tilde{u}_{k}(j+1)+\tilde{d}_{k}(j+1)),\quad j\in\mathcal{M}_{c} (39b)
x~k(0)=x~k1+1(Mc)(λk),\displaystyle\tilde{x}_{k}(0)=\tilde{x}_{k-1}^{\ell+1}(M_{c})\quad(\lambda_{k}), (39c)

where $\tilde{d}_{k}(j)=\frac{M_{c}}{M}\sum_{i\in\varphi^{-1}(j)}d_{k}(i)$ is the coarsened disturbance signal. The dynamic equations are defined over a smaller-dimensional space (over $\mathcal{M}_{c}$) that results from aggregating the dynamic equations of the full-resolution problem (defined over $\mathcal{M}$). We solve the full-resolution problem (35) and compare its solution against that of the pure GS scheme, that of the coarse low-resolution problem, and that of the hierarchical scheme that solves the coarse problem to coordinate the GS scheme. We also solve the coarse problem by using a GS scheme. Figure 6 shows the results. We note that the solution of the coarse problem (39) captures the general long-term trend of the solution but misses the high frequencies. The GS scheme refines this solution and converges in around 30 coordination steps. By comparing Figure 6 (top-right) and (bottom-right), we can see that initializing the GS scheme with the coarse solution significantly reduces the initial error, demonstrating the benefit of the hierarchical scheme.
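The aggregation map $\varphi$ and the coarsened disturbance $\tilde{d}_{k}$ can be sketched as follows (illustrative values for one partition; the two-frequency signal below is a stand-in for $d_{k}$):

```python
import numpy as np

M, Mc = 100, 4                # fine points and coarse points per partition
ratio = M // Mc               # 25 fine points collapse into one coarse point

# aggregation map phi(i) = floor((i-1)/(M/Mc)) + 1, with 1-based indices
phi = lambda i: (i - 1) // ratio + 1

# stand-in fine disturbance over one partition (two frequencies)
i = np.arange(1, M + 1)
d = 4 * np.sin(4 * np.pi * i / M) + np.sin(24 * np.pi * i / M)

# coarsened disturbance: d_tilde(j) = (Mc/M) * sum over phi^{-1}(j),
# i.e., the mean of d over each group of 25 consecutive points
d_tilde = np.array([(Mc / M) * d[phi(i) == j].sum() for j in range(1, Mc + 1)])

print(d_tilde)
```

Because $M_{c}/M$ is the reciprocal of the group size, the coarsened disturbance is simply the group average, which preserves the low-frequency content while averaging out the high frequencies.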

Refer to caption
Figure 6: (Top-left) solution of coarse problem, (top-right) solution of first GS step with coarsening, (bottom-left) solution of 30th GS step with coarsening, (bottom-right) solution of first GS step without coarsening.

3.2 Multi-Scale Spatial Control

Useful insights on how to use multi-grid schemes to create hierarchical network control structures result from interpreting network flows as diffusive processes. To illustrate this, we consider a network with nodes defined on a rectangular mesh. A node $(i,j)$ in the network exchanges flows with its four neighboring nodes $(i,j+1),(i,j-1),(i+1,j),(i-1,j)$ (this is called a stencil). The flows $f(i,j)$ are a function of the node potentials $p(i,j)$ and are given by:

f(i,j;i,j+1)\displaystyle f({i,j;i,j+1}) =D(p(i,j)p(i,j+1))\displaystyle=D(p({i,j})-p({i,j+1})) (40a)
f(i,j;i,j1)\displaystyle f({i,j;i,j-1}) =D(p(i,j)p(i,j1))\displaystyle=D(p({i,j})-p({i,j-1})) (40b)
f(i,j;i+1,j)\displaystyle f({i,j;i+1,j}) =D(p(i,j)p(i+1,j))\displaystyle=D(p({i,j})-p({i+1,j})) (40c)
f(i,j;i1,j)\displaystyle f({i,j;i-1,j}) =D(p(i,j)p(i1,j)).\displaystyle=D(p({i,j})-p({i-1,j})). (40d)

Here, $D\in\mathbb{R}$ is the diffusion constant of the link connecting the nodes (the inverse of the flow resistance). At each node $(i,j)$ we have a load $d(i,j)$ and a source $u(i,j)$ that is used to counteract (balance) the load. This gives the flow balance conservation equation:

f(i,j;i,j+1)+f(i,j;i,j1)+f(i,j;i+1,j)+f(i,j;i1,j)=u(i,j)+d(i,j).\displaystyle f({i,j;i,j+1})+f({i,j;i,j-1})+f({i,j;i+1,j})+f({i,j;i-1,j})=u(i,j)+d(i,j).

This can also be written in terms of the potentials as:

D(4p(i,j)p(i1,j)p(i+1,j)p(i,j1)p(i,j+1))=u(i,j)+d(i,j).\displaystyle D\left(4\cdot p(i,j)-p(i-1,j)-p(i+1,j)-p(i,j-1)-p(i,j+1)\right)=u(i,j)+d(i,j).
Refer to caption
Figure 7: Disturbance field (left) and its components (right) for spatial optimization problem.
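As a sketch (small assumed grid, synthetic potentials), the flow-balance relations above assemble into the familiar 5-point Laplacian matrix, with neighbors outside the grid carrying the zero boundary potential:

```python
import numpy as np

# assemble the 5-point flow-balance operator on an n-by-n grid with
# zero potential outside the grid (Dirichlet boundary), cf. (40)
n, D = 4, 1.0
idx = lambda i, j: i * n + j            # 0-based node numbering

L = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        L[idx(i, j), idx(i, j)] = 4 * D
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < n and 0 <= jj < n:
                L[idx(i, j), idx(ii, jj)] = -D
            # out-of-grid neighbors contribute p = 0 and drop out

# L @ p gives the net outflow at every node: D(4p - sum of neighbors)
p = np.arange(n * n, dtype=float)       # synthetic potentials
net = L @ p
# interior node (1,1): 4*5 - (1 + 9 + 4 + 6) = 0
print(net[idx(1, 1)])
```

The resulting matrix is symmetric positive definite, which is what makes the GS smoother applicable to the spatial problem below.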

We assume a fixed 2-D spatial domain $\Omega:=[0,X]\times[0,Y]$ that is discretized using $M\cdot P$ nodes in each direction. The sets $\mathcal{N}^{x}=\mathcal{N}^{y}:=\{1,2,\cdots,P\cdot M\}$ are the sets of points in each direction. The set of total mesh points is $\mathcal{N}=\mathcal{N}^{x}\times\mathcal{N}^{y}$ and thus $N=(P\cdot M)\cdot(P\cdot M)$. As the number of nodes increases, the node potentials form a continuum described by the 2-D diffusion equation:

D(2p(x,y)x2+2p(x,y)y2)=u(x,y)+d(x,y),(x,y)Ω.\displaystyle D\left(\frac{\partial^{2}p(x,y)}{\partial x^{2}}+\frac{\partial^{2}p(x,y)}{\partial y^{2}}\right)=u(x,y)+d(x,y),\quad(x,y)\in\Omega. (41)

Using this analogy, we consider the following full-space problem:

minp,u\displaystyle\min_{{p},{u}}\quad (i,j)𝒩(p(i,j)2+u(i,j)2)\displaystyle\sum_{(i,j)\in\mathcal{N}}\left(p(i,j)^{2}+u(i,j)^{2}\right) (42a)
s.t. D(4p(i,j)p(i1,j)p(i+1,j)p(i,j1)p(i,j+1))\displaystyle D\left(4\cdot p(i,j)-p(i-1,j)-p(i+1,j)-p(i,j-1)-p(i,j+1)\right)
=u(i,j)+d(i,j),(i,j)𝒩\displaystyle\qquad\qquad=u(i,j)+d(i,j),\quad(i,j)\in\mathcal{N} (42b)
p(0,j)=0,j𝒩y\displaystyle p(0,j)=0,\quad j\in\mathcal{N}^{y} (42c)
p(MP+1,j)=0,j𝒩y\displaystyle p(M\cdot P+1,j)=0,\quad j\in\mathcal{N}^{y} (42d)
p(i,0)=0,i𝒩x\displaystyle p(i,0)=0,\quad i\in\mathcal{N}^{x} (42e)
p(i,MP+1)=0,i𝒩x\displaystyle p(i,M\cdot P+1)=0,\quad i\in\mathcal{N}^{x} (42f)

The goal of the optimization problem is, given the loads $d(i,j)$, to control the potentials $p(i,j)$ at the network nodes by using the sources $u(i,j)$. The decision variables at every node are $z(i,j)=(p(i,j),u(i,j))$. The presence of multiple frequencies in the 2-D disturbance load field $d(i,j)$ might require us to consider fine meshes, making the optimization problem intractable. One can think of the disturbance field as spatial variations of electrical loads observed over a geographical region. If the loads have high-frequency spatial variations, we need high control resolution (i.e., sources at every node in the network to achieve tight control). This can be achieved, for instance, by installing distributed energy resources (DERs). Moreover, if the load has low-frequency variations, the DERs have to cooperate to counteract global variations.

In our experiments, the mesh size was set to $P=10$ and $M=10$, which results in $N=10{,}000$ mesh points. To address this complexity, we partition the 2-D domain into $K=P\cdot P$ partitions, each with $M\cdot M$ points, and we label each partition as $k=(n,m)\in\mathcal{K}$. We can think of each partition $k\in\mathcal{K}$ as a region of the network. We define the inner index sets $\mathcal{M}^{x}=\mathcal{M}^{y}:=\{1,2,\cdots,M\}$ and $\mathcal{M}:=\mathcal{M}^{x}\times\mathcal{M}^{y}$. The GS subproblem for partition $(n,m)$ is given by:

minpn,m,un,m\displaystyle\min_{p_{n,m},{u}_{n,m}}\quad (i,j)(pn,m(i,j)2+un,m(i,j)2)\displaystyle\sum_{(i,j)\in\mathcal{M}}\left(p_{n,m}(i,j)^{2}+u_{n,m}(i,j)^{2}\right) (43a)
+jypn,m(1,j)λn1,m+1(M+1,j)+jypn,m(M,j)λn+1,m(0,j)\displaystyle+\sum_{j\in\mathcal{M}^{y}}p_{n,m}(1,j)\lambda_{n-1,m}^{\ell+1}(M+1,j)+\sum_{j\in\mathcal{M}^{y}}p_{n,m}(M,j)\lambda_{n+1,m}^{\ell}(0,j)
+ixpn,m(i,1)λn,m1+1(i,M+1)+ixpn,m(i,M)λn,m+1(i,0)\displaystyle+\sum_{i\in\mathcal{M}^{x}}p_{n,m}(i,1)\lambda_{n,m-1}^{\ell+1}(i,M+1)+\sum_{i\in\mathcal{M}^{x}}p_{n,m}(i,M)\lambda_{n,m+1}^{\ell}(i,0)
s.t. D(4pn,m(i,j)pn,m(i1,j)pn,m(i+1,j)pn,m(i,j1)pn,m(i,j+1))\displaystyle D\left(4\cdot p_{n,m}(i,j)-p_{n,m}(i-1,j)-p_{n,m}(i+1,j)-p_{n,m}(i,j-1)-p_{n,m}(i,j+1)\right)
=un,m(i,j)+dn,m(i,j),(i,j)\displaystyle\qquad\qquad=u_{n,m}(i,j)+d_{n,m}(i,j),\quad(i,j)\in\mathcal{M} (43b)
pn,m(0,j)=pn1,m+1(M,j),(λn,m(0,j))\displaystyle p_{n,m}(0,j)=p_{n-1,m}^{\ell+1}(M,j),\quad\quad\;\;(\lambda_{n,m}(0,j)) (43c)
pn,m(M+1,j)=pn+1,m(1,j),(λn,m(M+1,j))\displaystyle p_{n,m}(M+1,j)=p_{n+1,m}^{\ell}(1,j),\quad(\lambda_{n,m}(M+1,j)) (43d)
pn,m(i,0)=pn,m1+1(i,M),(λn,m(i,0))\displaystyle p_{n,m}(i,0)=p_{n,m-1}^{\ell+1}(i,M),\quad\qquad(\lambda_{n,m}(i,0)) (43e)
pn,m(i,M+1)=pn,m+1(i,1),(λn,m(i,M+1)),\displaystyle p_{n,m}(i,M+1)=p_{n,m+1}^{\ell}(i,1),\quad\;\;(\lambda_{n,m}(i,M+1)),\quad (43f)

The indices of constraints (43c)-(43d) run over j∈ℳ^y and those of (43e)-(43f) run over i∈ℳ^x.
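For reference, the 5-point stencil on the left-hand side of (43b) can be assembled for a single M×M partition as follows. This is a minimal sketch (the function name is hypothetical); neighbors outside the partition are omitted here because they enter the right-hand side through the boundary conditions (43c)-(43f).

```python
import numpy as np

def laplacian_2d(M, D=1.0):
    """Dense matrix of the 5-point stencil: row (i,j) encodes
    D*(4 p(i,j) - p(i-1,j) - p(i+1,j) - p(i,j-1) - p(i,j+1)),
    with out-of-partition neighbors left to the right-hand side."""
    n = M * M
    L = np.zeros((n, n))
    idx = lambda i, j: i * M + j  # 0-based row-major flattening
    for i in range(M):
        for j in range(M):
            r = idx(i, j)
            L[r, r] = 4.0 * D
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < M and 0 <= jj < M:
                    L[r, idx(ii, jj)] = -1.0 * D
    return L
```

An interior row sums to zero (the stencil annihilates constants), while boundary rows have a positive sum because some neighbors were moved to the right-hand side.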

To perform coarsening, each block of (M/M_c)·(M/M_c) mesh points is collapsed into a single coarse point and the mapping from the coarse space to the original space is:

p~n,m(i~,j~)=pn,m(i,j)if i~=i1M/Mc+1,j~=j1M/Mc+1.\tilde{p}_{n,m}(\tilde{i},\tilde{j})=p_{n,m}(i,j)\quad\text{if }\quad\tilde{i}=\lfloor\frac{i-1}{M/M_{c}}\rfloor+1,\quad\tilde{j}=\lfloor\frac{j-1}{M/M_{c}}\rfloor+1. (44)
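A minimal sketch of the mapping (44): each coarse value is copied onto the (M/M_c)×(M/M_c) block of fine mesh points it represents (piecewise-constant prolongation). The function name is hypothetical.

```python
import numpy as np

def prolongate(p_coarse, M):
    """Piecewise-constant prolongation as in (44): expand an
    Mc x Mc coarse field to an M x M fine field by copying each
    coarse value onto its (M/Mc) x (M/Mc) block."""
    Mc = p_coarse.shape[0]
    assert M % Mc == 0, "coarse mesh must divide the fine mesh"
    r = M // Mc
    # Kronecker product with a block of ones replicates each entry
    return np.kron(p_coarse, np.ones((r, r)))
```

The constraint aggregation used for the coarse problem is the adjoint of this operator (summation over each block).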

As with the temporal case, we also aggregate the constraints in each partition to obtain a coarse representation. In our experiments, we used M_c=2 by default. The disturbance field is given by a linear combination of a 2-D sinusoid and a Gaussian function. The shape of the load field is illustrated in Figure 7. Figure 8 shows the optimal potential field obtained with the coarse problem and that obtained with the GS scheme at the first and tenth steps (initialized with the coarse field). Note that the coarse field captures the global structure of the load field but misses the high frequencies, while the GS scheme corrects the high-frequency load imbalances. In Figure 9 we again illustrate that the hierarchical scheme outperforms the decentralized GS scheme.

Figure 8: (Top) Potential field solution and error of coarse problem, (middle) solution and error of first GS update, and (bottom) solution and error of tenth GS update.
Figure 9: (Left) error for first GS update with coarsening and (right) error without coarsening.

3.3 Effect of Coarsening Strategy

We now compare the effect of the coarsening resolution on the performance of the GS scheme. The results are given in Figure 10 (top left and right) and reveal that using a higher resolution for the coarse problem does not necessarily result in better performance of the GS scheme. This is particularly evident in the temporal case, while for the spatial case increasing the mesh resolution does help. We attribute this difference to the asymmetric nature of the temporal problem compared to the symmetric mesh of the spatial case. For the temporal problem, we have found that the most effective coarsening strategy is to solve a sequence of coarse problems with increasing resolution. At each coarsening level, however, we only perform a single GS coordination step, and the resulting coordination variables are used to initialize the GS coordination at the next level. The error evolution of this sequential coarsening scheme is shown in Figure 10 (top left). We sequentially solved the coarse problems using M_c = 1, 2, 4, 5, 10, 20, 25, 50 (this gives a total of eight GS steps). We note that, at the tenth GS step, the error of this sequential coarsening scheme is about seventy times smaller than that obtained with no coarsening. This can be attributed to the ability of the single-step GS sweeps to cover a wider range of frequencies.
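The sequential coarsening strategy can be sketched as the following cascade, where solve_coarse, prolongate, and gs_sweep stand for problem-specific operations (hypothetical interfaces, not part of the original formulation):

```python
def sequential_coarsening(schedule, solve_coarse, prolongate, gs_sweep, z0=None):
    """Cascade sketch: at each coarsening level Mc, solve the coarse
    problem, lift its solution to the fine mesh, and perform a single
    GS coordination sweep whose result warm-starts the next level."""
    z = z0
    for Mc in schedule:  # e.g., [1, 2, 4, 5, 10, 20, 25, 50]
        z_coarse = solve_coarse(Mc, warm_start=z)
        # one GS step per level; finer levels correct higher frequencies
        z = gs_sweep(prolongate(z_coarse, Mc))
    return z
```

Each level targets a different band of frequencies, which is why a single GS step per level suffices before moving on.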

3.4 Effect of Coordination Order

We solved the multi-scale temporal control problem (36) and the spatial control problem (43) with four different ordering methods for each problem. For the temporal control problem, ordering method 1 was a lexicographic ordering, method 2 a reverse lexicographic ordering, method 3 a forward-backward ordering, and method 4 the red-black scheme. For the spatial control problem, ordering method 1 was a lexicographic ordering, method 2 a spiral-like ordering, method 3 the red-black ordering, and method 4 ordered the partitions by the magnitude of the disturbance. The results are presented in Figure 10 (bottom left and right). As can be seen, in the temporal problem, the performance of the reverse lexicographic ordering is significantly better than that achieved by the other methods. This can be attributed to the asymmetry of the coupling topology: in the temporal problem, primal variable information propagates in the forward direction while dual information propagates in the reverse direction. It can be seen that dual information plays an important role in the convergence of the temporal problem. In the spatial problem, the performance of the different orderings is virtually the same; the red-black ordering (which enables parallelism) achieves the same performance as the rest. We attribute this to the symmetry of the spatial domain.
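For illustration, the partition-visit orders used for the spatial problem can be generated as follows (a sketch; the spiral-like and disturbance-magnitude orderings are omitted):

```python
def orderings(P):
    """Partition-visit orders for a P x P grid of subproblems:
    lexicographic, reverse lexicographic, and red-black. In the
    red-black sweep, all partitions of the same color have no
    shared boundaries and can be solved in parallel."""
    lex = [(n, m) for n in range(1, P + 1) for m in range(1, P + 1)]
    rev = list(reversed(lex))
    red = [(n, m) for (n, m) in lex if (n + m) % 2 == 0]
    black = [(n, m) for (n, m) in lex if (n + m) % 2 == 1]
    return lex, rev, red + black
```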

Refer to caption
Figure 10: (Top-left) error of temporal control with different coarsening schemes, (top-right) error of spatial control with different coarsening schemes, (bottom-left) error of temporal control with different ordering schemes, and (bottom-right) error of spatial control with different ordering schemes.

4 Conclusions and Directions of Future Work

We have presented the basic elements of multi-grid computing schemes and illustrated how to use these to create hierarchical coordination architectures for complex systems. In particular, we discussed how Gauss-Seidel schemes can be seen as decentralized coordination schemes that handle high-frequency effects, while coarse solution operators can be seen as low-resolution centralized coordination schemes that handle low-frequency effects. We believe that multi-grid provides a powerful framework to systematically construct hierarchical coordination architectures, but diverse challenges need to be addressed. In particular, it is important to understand the convergence properties of GS schemes in more complex settings with nonlinear effects and inequality constraints. Moreover, it is necessary to develop effective coarsening (aggregation) schemes that can retain useful information while reducing complexity. Finally, it is desirable to combine hierarchical coordination schemes with existing control theory to analyze stability and robustness properties.

5 Acknowledgements

We acknowledge funding from the National Science Foundation under award NSF-EECS-1609183.

References

  • (1) Albersmeyer, J., Diehl, M.: The lifted Newton method and its application in optimization. SIAM Journal on Optimization 20(3), 1655–1684 (2010)
  • (2) Antoulas, A.C., Sorensen, D.C., Gugercin, S.: A survey of model reduction methods for large-scale systems. Contemporary mathematics 280, 193–220 (2001)
  • (3) Arnold, M., Negenborn, R., Andersson, G., De Schutter, B.: Distributed predictive control for energy hub coordination in coupled electricity and gas networks. In: Intelligent Infrastructures, pp. 235–273. Springer (2010)
  • (4) Baldea, M., Daoutidis, P.: Control of integrated process networks: a multi-time scale perspective. Computers & Chemical Engineering 31(5), 426–444 (2007)
  • (5) Biegler, L.T., Zavala, V.M.: Large-scale nonlinear programming using IPOPT: An integrating framework for enterprise-wide dynamic optimization. Computers & Chemical Engineering 33(3), 575–582 (2009)
  • (6) Borzì, A., Kunisch, K.: A multigrid scheme for elliptic constrained optimal control problems. Computational Optimization and Applications 31(3), 309–333 (2005)
  • (7) Borzì, A., Schulz, V.: Multigrid methods for PDE optimization. SIAM Review 51(2), 361–395 (2009)
  • (8) Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3(1), 1–122 (2011)
  • (9) Brandt, A.: Algebraic multigrid theory: The symmetric case. Applied Mathematics and Computation 19(1), 23–56 (1986)
  • (10) Camponogara, E., Jia, D., Krogh, B.H., Talukdar, S.: Distributed model predictive control. Control Systems, IEEE 22(1), 44–52 (2002)
  • (11) Chow, J.H., Kokotovic, P.V.: Time scale modeling of sparse dynamic networks. Automatic Control, IEEE Transactions on 30(8), 714–722 (1985)
  • (12) Diehl, M., Bock, H.G., Schlöder, J.P., Findeisen, R., Nagy, Z., Allgöwer, F.: Real-time optimization and nonlinear model predictive control of processes governed by differential-algebraic equations. Journal of Process Control 12(4), 577–585 (2002)
  • (13) Farina, M., Zhang, X., Scattolini, R.: A hierarchical MPC scheme for interconnected systems. arXiv preprint arXiv:1703.02739 (2017)
  • (14) Fisher, M.L.: The Lagrangian relaxation method for solving integer programming problems. Management Science 50(12 supplement), 1861–1871 (2004)
  • (15) Giselsson, P., Doan, M.D., Keviczky, T., De Schutter, B., Rantzer, A.: Accelerated gradient methods and dual decomposition in distributed model predictive control. Automatica 49(3), 829–833 (2013)
  • (16) Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. arXiv preprint arXiv:1208.3922 (2012)
  • (17) Jamshidi, M., Tarokh, M., Shafai, B.: Computer-aided analysis and design of linear control systems. Prentice-Hall, Inc. (1992)
  • (18) Joo, J.Y., Ilic, M.D.: Multi-layered optimization of demand resources using lagrange dual decomposition. Smart Grid, IEEE Transactions on 4(4), 2081–2088 (2013)
  • (19) Kokotovic, P.: Subsystems, time scales and multimodeling. Automatica 17(6), 789–795 (1981)
  • (20) Kokotovic, P., Avramovic, B., Chow, J., Winkelman, J.: Coherency based decomposition and aggregation. Automatica 18(1), 47–56 (1982)
  • (21) Kozma, A., Frasch, J.V., Diehl, M.: A distributed method for convex quadratic programming problems arising in optimal control of distributed systems. In: Decision and Control (CDC), 2013 IEEE 52nd Annual Conference on, pp. 1526–1531. IEEE (2013)
  • (22) Lefort, A., Bourdais, R., Ansanay-Alex, G., Guéguen, H.: Hierarchical control method applied to energy management of a residential house. Energy and Buildings 64, 53–61 (2013)
  • (23) Liu, C., Shahidehpour, M., Wang, J.: Application of augmented Lagrangian relaxation to coordinated scheduling of interdependent hydrothermal power and natural gas systems. Generation, Transmission & Distribution, IET 4(12), 1314–1325 (2010)
  • (24) Negenborn, R.R., De Schutter, B., Hellendoorn, J.: Efficient implementation of serial multi-agent model predictive control by parallelization. In: Networking, Sensing and Control, 2007 IEEE International Conference on, pp. 175–180. IEEE (2007)
  • (25) Peponides, G.M., Kokotovic, P.V.: Weak connections, time scales, and aggregation of nonlinear systems. Automatic Control, IEEE Transactions on 28(6), 729–735 (1983)
  • (26) Rawlings, J.B., Bonné, D., Jørgensen, J.B., Venkat, A.N., Jørgensen, S.B.: Unreachable setpoints in model predictive control. Automatic Control, IEEE Transactions on 53(9), 2209–2215 (2008)
  • (27) Rawlings, J.B., Mayne, D.: Model predictive control. Nob Hill Publishing (2008)
  • (28) Scattolini, R.: Architectures for distributed and hierarchical model predictive control–a review. Journal of Process Control 19(5), 723–731 (2009)
  • (29) Scattolini, R., Colaneri, P.: Hierarchical model predictive control. In: Decision and Control, 2007 46th IEEE Conference on, pp. 4803–4808. IEEE (2007)
  • (30) Simon, H.A., Ando, A.: Aggregation of variables in dynamic systems. Econometrica: Journal of the Econometric Society pp. 111–138 (1961)
  • (31) Stewart, B.T., Venkat, A.N., Rawlings, J.B., Wright, S.J., Pannocchia, G.: Cooperative distributed model predictive control. Systems & Control Letters 59(8), 460–469 (2010)
  • (32) Stewart, B.T., Wright, S.J., Rawlings, J.B.: Cooperative distributed model predictive control for nonlinear systems. Journal of Process Control 21(5), 698–704 (2011)
  • (33) Summers, T.H., Lygeros, J.: Distributed model predictive consensus via the alternating direction method of multipliers. In: Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on, pp. 79–84. IEEE (2012)
  • (34) Zavala, V.M.: New architectures for hierarchical predictive control. IFAC-PapersOnLine 49(7), 43–48 (2016)
  • (35) Zavala, V.M., Anitescu, M.: Real-time nonlinear optimization as a generalized equation. SIAM Journal on Control and Optimization 48(8), 5444–5467 (2010)
  • (37) Zavala, V.M., Biegler, L.T.: The advanced-step NMPC controller: Optimality, stability and robustness. Automatica 45(1), 86–93 (2009)
  • (38) Zhu, D., Yang, R., Hug-Glanzmann, G.: Managing microgrids with intermittent resources: A two-layer multi-step optimal control approach. In: North American Power Symposium (NAPS), 2010, pp. 1–8. IEEE (2010)