
Structured linear quadratic control computations over 2D grids

Armaghan Zafar and Ian R. Manchester The authors are with the Australian Centre for Field Robotics, and Sydney Institute for Robotics and Intelligent Systems, The University of Sydney, Sydney, NSW 2006, Australia (e-mail: armaghan.zafar. [email protected]).
Abstract

In this paper, we present a structured solver based on the preconditioned conjugate gradient method (PCGM) for solving the linear quadratic (LQ) optimal control problem for $K \times N$ sub-systems connected in a two-dimensional (2D) grid structure. Our main contribution is the development of a structured preconditioner based on a fixed number of inner-outer iterations of the nested block Jacobi method. We establish that the proposed preconditioner is positive definite. Moreover, the proposed approach retains structure in both spatial dimensions as well as in the temporal dimension of the problem. The arithmetic complexity of each PCGM step scales as $O(KNT)$, where $T$ is the length of the time horizon. The computations involved at each step of the proposed PCGM are decomposable and amenable to distributed implementation on parallel processors connected in a 2D grid structure with localized data exchange. We also provide results of numerical experiments performed on two example systems.

Notations

In this paper, we use $\mathbb{R}:=(-\infty,+\infty)$ and $\mathbb{N}:=\{1,2,\ldots\}$ to denote the set of real numbers and the set of natural numbers, respectively. We denote an $n$-dimensional real vector by $\mathbb{R}^{n}$ and a real matrix with $n$ rows and $m$ columns by $\mathbb{R}^{n\times m}$. $\mathcal{A}\backslash\{b\}$ denotes all elements of the set $\mathcal{A}$ except the element $b$. $A'$ denotes the transpose of a matrix. $A\succ 0$ means the symmetric matrix $A=A'\in\mathbb{R}^{n\times n}$ is positive definite (i.e., there exists $c>0$ such that $x'Ax\geq cx'x$ for all $x\in\mathbb{R}^{n}$), and $A\succeq 0$ means $A$ is positive semi-definite (i.e., $x'Ax\geq 0$ for all $x\in\mathbb{R}^{n}$). $\mathrm{blkdiag}(\cdot)$ denotes the construction of a block diagonal matrix from the input arguments, and $\mathrm{col}(\cdot)$ denotes the concatenation of the input arguments as a column vector.

I Introduction

We are interested in solving finite-horizon linear-quadratic (LQ) optimal control problems with discrete-time dynamics arising from the interconnection of sub-systems in a two-dimensional (2D) grid graph structure, as shown in Fig. 1.

Figure 1: Bi-directional flow of information between subsystems connected in a 2D grid

Each node is a sub-system and is labeled with subscript $(i,j)$, with $i\in\mathcal{K}:=\{1,2,\ldots,K\}\subset\mathbb{N}$ and $j\in\mathcal{N}:=\{1,2,\ldots,N\}\subset\mathbb{N}$, where $K$ is the total number of sub-systems along the vertical direction and $N$ is the total number of sub-systems along the horizontal direction. We assume that there is a bi-directional flow of information between adjacent sub-systems, as shown in Fig. 1. That is, each sub-system $(i,j)$ is coupled with at most four of the neighbouring sub-systems. Spatio-temporal structure of this kind is relevant in many applications, such as automated water irrigation networks [1], radial power distribution networks [2], control of traffic signals [3], spreading processes such as wildfires [4, 5], and the discretization of 2D partial differential equations [6].

The LQ optimal control problem that we are interested in is a large-scale quadratic program with $O(KNT)$ decision variables, where $T$ is the length of the time horizon. We can solve this problem by solving the linear system of equations that arises from the corresponding first-order optimality conditions, known as the KKT conditions. Due to the special spatio-temporal structure of the problem, this linear system of equations is also highly structured.

Previous research has studied the temporal structure of the LQ optimal control problem in [7, 8, 9, 10], but these approaches do not utilize the structure in the spatial dimensions of the problem. Structured solvers for LQ optimal control problems related to path-graph networks (i.e., 1D chains) are proposed in [11, 12, 13], and the computations are decomposable and amenable to distributed implementation on parallel processors with path-graph data exchange. However, these approaches do not exploit the structure in both spatial dimensions simultaneously, as we do in this paper. In [14, 15], structured methods are proposed for solving optimal control problems of multi-agent networks, which can be used to solve the LQ optimal control problem we are interested in. However, these approaches often take a large number of iterations to converge. All of these approaches offer decomposable computations, but they do not exploit the complete three-dimensional spatio-temporal structure of the problem. In [16], a structured solver is proposed for a special case where sub-systems are connected in a directed tree graph structure.

The main contribution of this paper extends the developments presented in [13] to the case of 2D grids. We present a structured solver based on the preconditioned conjugate gradient method (PCGM) for solving the KKT conditions. Specifically, we propose a structured preconditioner based on the nested block Jacobi method (NBJM) to improve the conditioning of the linear system. The proposed approach retains structure in all three dimensions of the problem, extending the work in [13], which explored structure in two dimensions. We show that the proposed preconditioner is positive definite. Moreover, the arithmetic complexity of each PCGM step scales as $O(KNT)$. The computations are decomposable and amenable to distributed implementation on $O(NT)$ parallel processors connected in a 2D grid with localized data exchange.

The rest of the paper is organized as follows. In Section II, we present the structured formulation of the problem and the corresponding KKT conditions. In Section III, we provide an overview of the PCGM. The structured preconditioner based on the NBJM is developed in Section IV. We explore the performance of the proposed PCGM numerically, for a 2D mass-spring-damper network and an automated water irrigation network, and present results in Section V. Finally, some concluding remarks and possible directions for future work are presented in Section VI.

II Problem Formulation

The state space dynamics of each sub-system (i,j)(i,j) are

$x_{i,j,t+1} = A_{i,j,t}x_{i,j,t}+B_{i,j,t}u_{i,j,t}+E_{i,j,t}x_{i,j-1,t}+F_{i,j,t}x_{i,j+1,t}+G_{i,j,t}x_{i-1,j,t}+H_{i,j,t}x_{i+1,j,t}$   (1)

where $x_{i,j,t}\in\mathbb{R}^{n_{i,j}}$ and $u_{i,j,t}\in\mathbb{R}^{m_{i,j}}$ are the state and input of sub-system $(i,j)\in\mathcal{K}\times\mathcal{N}$ at time $t\in\mathcal{T}:=\{0,1,\ldots,T\}\subset\mathbb{N}\cup\{0\}$, respectively. Initial conditions are given by $x_{i,j,0}=\gamma_{i,j}\in\mathbb{R}^{n_{i,j}}$, and the spatial boundary conditions are given by $x_{0,j,t}=\underline{\alpha}_{j,t}\in\mathbb{R}^{n_{0,j}}$, $x_{K+1,j,t}=\overline{\alpha}_{j,t}\in\mathbb{R}^{n_{K+1,j}}$ and $x_{i,0,t}=\underline{\beta}_{i,t}\in\mathbb{R}^{n_{i,0}}$, $x_{i,N+1,t}=\overline{\beta}_{i,t}\in\mathbb{R}^{n_{i,N+1}}$ for $t\in\mathcal{T}$.
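To make the coupling pattern in (1) concrete, the following is a minimal simulation sketch of one time step, assuming scalar states and inputs ($n_{i,j}=m_{i,j}=1$), time-invariant dynamics, and zero boundary values; all numerical values are hypothetical.

```python
import numpy as np

K, N = 4, 5                        # grid dimensions (rows x columns)
rng = np.random.default_rng(0)
A = rng.uniform(0.8, 1.0, (K, N))  # A_{i,j}: self-dynamics coefficients
B = np.full((K, N), 0.5)           # B_{i,j}: input gains
E = F = G = H = 0.1                # neighbour coupling coefficients

def step(x, u):
    """x, u: (K, N) arrays; returns the state at t+1 according to (1),
    with out-of-grid neighbours (the boundary terms) held at zero."""
    xp = np.zeros_like(x)
    for i in range(K):
        for j in range(N):
            xp[i, j] = A[i, j] * x[i, j] + B[i, j] * u[i, j]
            if j > 0:     xp[i, j] += E * x[i, j - 1]   # left neighbour
            if j < N - 1: xp[i, j] += F * x[i, j + 1]   # right neighbour
            if i > 0:     xp[i, j] += G * x[i - 1, j]   # upper neighbour
            if i < K - 1: xp[i, j] += H * x[i + 1, j]   # lower neighbour
    return xp

x = rng.standard_normal((K, N))
print(step(x, np.zeros((K, N))))   # one step of the 2D grid dynamics
```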

We are interested in the following finite-horizon linear-quadratic (LQ) optimal control problem:

$\min_{x_{i,j,t},\,u_{i,j,t}} \quad \frac{1}{2}\sum_{t\in\mathcal{T}}\sum_{j\in\mathcal{N}}\sum_{i\in\mathcal{K}} \ell_{i,j,t}(x_{i,j,t},u_{i,j,t})$   (2a)
subject to
(1) for $(i,j,t)\in\mathcal{K}\times\mathcal{N}\times(\mathcal{T}\backslash\{T\})$,   (2b)
$x_{0,j,t}=\underline{\alpha}_{j,t},~x_{K+1,j,t}=\overline{\alpha}_{j,t}$ for $j\in\mathcal{N}$, $t\in\mathcal{T}$,   (2c)
$x_{i,0,t}=\underline{\beta}_{i,t},~x_{i,N+1,t}=\overline{\beta}_{i,t}$ for $i\in\mathcal{K}$, $t\in\mathcal{T}$,   (2d)
$x_{i,j,0}=\gamma_{i,j}$ for $i\in\mathcal{K}$, $j\in\mathcal{N}$,   (2e)

where $\ell_{i,j,t}(x,u)=x'Q_{i,j,t}x+u'R_{i,j,t}u$. For $i\in\mathcal{K}$, $j\in\mathcal{N}$ and $t\in\mathcal{T}\backslash\{T\}$, it is assumed that $Q_{i,j,t}=Q_{i,j,t}'\succ 0$ and $R_{i,j,t}=R_{i,j,t}'\succ 0$. Moreover, for every $i\in\mathcal{K}$, $j\in\mathcal{N}$, $Q_{i,j,T}\succ 0$ but $R_{i,j,T}=0$, so that $u_{i,j,T}$ can be removed as a decision variable.

Note that problem (2) is a large-scale LQ optimal control problem with $O(KNT)$ decision variables. While the cost is separable across the sub-systems in both spatial dimensions as well as across the time horizon, the problem is not separable, due to the spatial coupling between the states of neighbouring sub-systems and the inter-temporal coupling in the equality constraint (2b). A solution of problem (2) can be obtained by solving the large-scale linear system of equations arising from the corresponding first-order optimality conditions, known as the KKT conditions. However, due to the structured coupling of variables (i.e., states) in the equality constraint (2b), this linear system of equations is also structured.

II-A Structured Reformulation

By defining $\bar{m}_{j}=\sum_{i=1}^{K}m_{i,j}$, $\bar{n}_{j}=\sum_{i=1}^{K}n_{i,j}$, $\bar{u}_{j,t}=\mathrm{col}(u_{1,j,t},\ldots,u_{K,j,t})\in\mathbb{R}^{\bar{m}_{j}}$, and $\bar{x}_{j,t}=\mathrm{col}(x_{1,j,t},\ldots,x_{K,j,t})\in\mathbb{R}^{\bar{n}_{j}}$, problem (2) can be reformulated as the following quadratic program:

$\min_{\bar{x}_{j,t},\,\bar{u}_{j,t}} \quad \frac{1}{2}\sum_{t\in\mathcal{T}}\sum_{j\in\mathcal{N}} \bar{\ell}_{j,t}(\bar{x}_{j,t},\bar{u}_{j,t})$   (3a)
subject to
$\bar{x}_{j,t+1} = \bar{A}_{j,t}\bar{x}_{j,t}+\bar{B}_{j,t}\bar{u}_{j,t}+\bar{E}_{j,t}\bar{x}_{j-1,t}+\bar{F}_{j,t}\bar{x}_{j+1,t}+\bar{M}_{j,t}\bm{\alpha}_{j,t}$, $j\in\mathcal{N}$, $t\in(\mathcal{T}\backslash\{T\})$,   (3b)
$\bar{x}_{0,t}=\underline{\beta}_{t}$, $\bar{x}_{N+1,t}=\overline{\beta}_{t}$, $t\in\mathcal{T}$,   (3c)
$\bar{x}_{j,0}=\gamma_{j}$, $j\in\mathcal{N}$,   (3d)

where $\bar{\ell}_{j,t}(\bar{x},\bar{u})=\bar{x}'\bar{Q}_{j,t}\bar{x}+\bar{u}'\bar{R}_{j,t}\bar{u}$ for $j\in\mathcal{N}$ and $t\in\mathcal{T}$, $\bar{R}_{j,T}=0$,

$\bar{Q}_{j,t}=\mathrm{blkdiag}(Q_{1,j,t},\ldots,Q_{K,j,t})\in\mathbb{R}^{\bar{n}_{j}\times\bar{n}_{j}}$,
$\bar{R}_{j,t}=\mathrm{blkdiag}(R_{1,j,t},\ldots,R_{K,j,t})\in\mathbb{R}^{\bar{m}_{j}\times\bar{m}_{j}}$,
$\bar{B}_{j,t}=\mathrm{blkdiag}(B_{1,j,t},\ldots,B_{K,j,t})\in\mathbb{R}^{\bar{n}_{j}\times\bar{m}_{j}}$,
$\bar{E}_{j,t}=\mathrm{blkdiag}(E_{1,j,t},\ldots,E_{K,j,t})\in\mathbb{R}^{\bar{n}_{j}\times\bar{n}_{j-1}}$,
$\bar{F}_{j,t}=\mathrm{blkdiag}(F_{1,j,t},\ldots,F_{K,j,t})\in\mathbb{R}^{\bar{n}_{j}\times\bar{n}_{j+1}}$,
$\underline{\beta}_{t}=\mathrm{col}(\underline{\beta}_{1,t},\ldots,\underline{\beta}_{K,t})\in\mathbb{R}^{\bar{n}_{0}}$,
$\overline{\beta}_{t}=\mathrm{col}(\overline{\beta}_{1,t},\ldots,\overline{\beta}_{K,t})\in\mathbb{R}^{\bar{n}_{N+1}}$,
$\bm{\alpha}_{j,t}=\mathrm{col}(\underline{\alpha}_{j,t},\overline{\alpha}_{j,t})\in\mathbb{R}^{n_{0,j}+n_{K+1,j}}$,

$\bar{A}_{j,t}=\begin{bmatrix} A_{1,j,t} & H_{1,j,t} & & \\ G_{2,j,t} & A_{2,j,t} & \ddots & \\ & \ddots & \ddots & H_{K-1,j,t} \\ & & G_{K,j,t} & A_{K,j,t} \end{bmatrix}, \quad \bar{M}_{j,t}=\begin{bmatrix} G_{1,j,t} & 0 \\ 0 & \vdots \\ \vdots & 0 \\ 0 & H_{K,j,t} \end{bmatrix}.$

Note that $\bar{A}_{j,t}\in\mathbb{R}^{\bar{n}_{j}\times\bar{n}_{j}}$ and $\bar{M}_{j,t}\in\mathbb{R}^{\bar{n}_{j}\times(n_{0,j}+n_{K+1,j})}$. All the matrices in the vertically stacked QP (3) are block diagonal except $\bar{A}_{j,t}$, which is block tri-diagonal due to the spatial coupling of the system dynamics along the vertical direction in the optimal control problem (2).
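As an illustration of this structure, the following sketch assembles $\bar{A}_{j,t}$ from hypothetical sub-system matrices, assuming for simplicity a uniform state dimension $n$ for all $K$ sub-systems in a column.

```python
import numpy as np

def assemble_Abar(A_blocks, G_blocks, H_blocks):
    """A_blocks: K diagonal blocks A_{i,j,t}; G_blocks: K-1 sub-diagonal
    blocks G_{2,j,t},...,G_{K,j,t}; H_blocks: K-1 super-diagonal blocks
    H_{1,j,t},...,H_{K-1,j,t}."""
    K = len(A_blocks)
    n = A_blocks[0].shape[0]
    Abar = np.zeros((K * n, K * n))
    for i in range(K):
        Abar[i*n:(i+1)*n, i*n:(i+1)*n] = A_blocks[i]
        if i > 0:
            Abar[i*n:(i+1)*n, (i-1)*n:i*n] = G_blocks[i-1]    # coupling to row i-1
        if i < K - 1:
            Abar[i*n:(i+1)*n, (i+1)*n:(i+2)*n] = H_blocks[i]  # coupling to row i+1
    return Abar

K, n = 3, 2
rng = np.random.default_rng(1)
Abar = assemble_Abar([rng.standard_normal((n, n)) for _ in range(K)],
                     [0.1 * np.eye(n) for _ in range(K - 1)],
                     [0.1 * np.eye(n) for _ in range(K - 1)])
print(Abar.round(2))   # block tri-diagonal, as in (3)
```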

Now, defining $\hat{m}=\sum_{j=1}^{N}\bar{m}_{j}$, $\hat{n}=\sum_{j=1}^{N}\bar{n}_{j}$, $\hat{u}_{t}=\mathrm{col}(\bar{u}_{1,t},\ldots,\bar{u}_{N,t})\in\mathbb{R}^{\hat{m}}$, and $\hat{x}_{t}=\mathrm{col}(\bar{x}_{1,t},\ldots,\bar{x}_{N,t})\in\mathbb{R}^{\hat{n}}$, problem (3) can be reformulated by stacking across the horizontal dimension of the 2D grid:

$\min_{\hat{x}_{t},\,\hat{u}_{t}} \quad \frac{1}{2}\biggl(\sum_{t\in\mathcal{T}\backslash\{T\}}\begin{bmatrix}\hat{x}_{t}\\ \hat{u}_{t}\end{bmatrix}'\begin{bmatrix}\hat{Q}_{t}&0\\ 0&\hat{R}_{t}\end{bmatrix}\begin{bmatrix}\hat{x}_{t}\\ \hat{u}_{t}\end{bmatrix}\biggr)+\hat{x}_{T}'\hat{Q}_{T}\hat{x}_{T}$   (4a)
subject to
$\hat{x}_{t+1}=\hat{A}_{t}\hat{x}_{t}+\hat{B}_{t}\hat{u}_{t}+\hat{M}_{t}\hat{\bm{\alpha}}_{t}+\hat{N}_{t}\hat{\bm{\beta}}_{t}$, $t\in(\mathcal{T}\backslash\{T\})$,   (4b)
$\hat{x}_{0}=\hat{\bm{\gamma}}$,   (4c)

where

$\hat{Q}_{t}=\mathrm{blkdiag}(\bar{Q}_{1,t},\ldots,\bar{Q}_{N,t})\in\mathbb{R}^{\hat{n}\times\hat{n}}$,
$\hat{R}_{t}=\mathrm{blkdiag}(\bar{R}_{1,t},\ldots,\bar{R}_{N,t})\in\mathbb{R}^{\hat{m}\times\hat{m}}$,
$\hat{B}_{t}=\mathrm{blkdiag}(\bar{B}_{1,t},\ldots,\bar{B}_{N,t})\in\mathbb{R}^{\hat{n}\times\hat{m}}$,
$\hat{M}_{t}=\mathrm{blkdiag}(\bar{M}_{1,t},\ldots,\bar{M}_{N,t})\in\mathbb{R}^{\hat{n}\times\sum_{j\in\mathcal{N}}(n_{0,j}+n_{K+1,j})}$,
$\hat{\bm{\alpha}}_{t}=\mathrm{col}(\bm{\alpha}_{1,t},\ldots,\bm{\alpha}_{N,t})\in\mathbb{R}^{\sum_{j\in\mathcal{N}}(n_{0,j}+n_{K+1,j})}$,
$\hat{\bm{\beta}}_{t}=\mathrm{col}(\underline{\beta}_{t},\overline{\beta}_{t})\in\mathbb{R}^{\bar{n}_{0}+\bar{n}_{N+1}}$,

$\hat{A}_{t}=\begin{bmatrix}\bar{A}_{1,t}&\bar{F}_{1,t}&&\\ \bar{E}_{2,t}&\bar{A}_{2,t}&\ddots&\\ &\ddots&\ddots&\bar{F}_{N-1,t}\\ &&\bar{E}_{N,t}&\bar{A}_{N,t}\end{bmatrix}, \quad \hat{N}_{t}=\begin{bmatrix}\bar{E}_{1,t}&0\\ 0&\vdots\\ \vdots&0\\ 0&\bar{F}_{N,t}\end{bmatrix},$

and $\hat{N}_{t}\in\mathbb{R}^{\hat{n}\times(\bar{n}_{0}+\bar{n}_{N+1})}$. The matrix $\hat{A}_{t}\in\mathbb{R}^{\hat{n}\times\hat{n}}$ is a block tri-diagonal matrix of size $O(NK)$, and each of its diagonal blocks is again block tri-diagonal of size $O(K)$.

Finally, by defining $\tilde{u}=\mathrm{col}(\hat{u}_{0},\ldots,\hat{u}_{T-1})\in\mathbb{R}^{\hat{m}T}$ and $\tilde{x}=\mathrm{col}(\hat{x}_{0},\ldots,\hat{x}_{T})\in\mathbb{R}^{\hat{n}(T+1)}$, problem (4) can be reformulated by stacking across the temporal dimension of the problem:

$\min_{\tilde{x},\,\tilde{u}} \quad \frac{1}{2}\begin{bmatrix}\tilde{x}\\ \tilde{u}\end{bmatrix}'\begin{bmatrix}\tilde{Q}&0\\ 0&\tilde{R}\end{bmatrix}\begin{bmatrix}\tilde{x}\\ \tilde{u}\end{bmatrix}$   (5a)
subject to
$\tilde{A}\tilde{x}+\tilde{B}\tilde{u}+\tilde{\omega}=0$,   (5b)

where

$\tilde{Q}=\mathrm{blkdiag}(\hat{Q}_{0},\ldots,\hat{Q}_{T})\in\mathbb{R}^{\hat{n}(T+1)\times\hat{n}(T+1)}$,
$\tilde{R}=\mathrm{blkdiag}(\hat{R}_{0},\ldots,\hat{R}_{T-1})\in\mathbb{R}^{\hat{m}T\times\hat{m}T}$,
$\tilde{\omega}=\mathrm{col}(0,\hat{\omega}_{0},\ldots,\hat{\omega}_{T-1})\in\mathbb{R}^{\hat{n}(T+1)}$,
$\hat{\omega}_{0}=\hat{M}_{0}\hat{\bm{\alpha}}_{0}+\hat{N}_{0}\hat{\bm{\beta}}_{0}+\hat{\gamma}\in\mathbb{R}^{\hat{n}}$,
$\hat{\omega}_{t}=\hat{M}_{t}\hat{\bm{\alpha}}_{t}+\hat{N}_{t}\hat{\bm{\beta}}_{t}\in\mathbb{R}^{\hat{n}}$, $t\in\mathcal{T}\backslash\{0,T\}$,

$\tilde{A}=\begin{bmatrix}-I&&&\\ \hat{A}_{0}&-I&&\\ &\ddots&\ddots&\\ &&\hat{A}_{T-1}&-I\end{bmatrix}, \quad \tilde{B}=\begin{bmatrix}0&\ldots&0\\ \hat{B}_{0}&\ddots&\vdots\\ &\ddots&0\\ &&\hat{B}_{T-1}\end{bmatrix},$

while $\tilde{B}\in\mathbb{R}^{\hat{n}(T+1)\times\hat{m}T}$, and $\tilde{A}\in\mathbb{R}^{\hat{n}(T+1)\times\hat{n}(T+1)}$ is block bi-diagonal due to the temporal coupling of variables in (5).

II-B KKT Conditions

For problem (5), the first-order optimality conditions, also known as the KKT conditions, are given by

$\tilde{Q}\tilde{x}+\tilde{A}'\tilde{\delta}=0$,   (6a)
$\tilde{R}\tilde{u}+\tilde{B}'\tilde{\delta}=0$,   (6b)
$\tilde{A}\tilde{x}+\tilde{B}\tilde{u}=r_{\tilde{\delta}}$,   (6c)

where $r_{\tilde{\delta}}=-\tilde{\omega}$. The variables $\tilde{\delta}=\mathrm{col}(\hat{\delta}_{0},\ldots,\hat{\delta}_{T})\in\mathbb{R}^{\hat{n}(T+1)}$ are Lagrange multipliers. Since $\tilde{Q}$ and $\tilde{R}$ are positive definite, problem (5) is strictly convex and the KKT conditions are necessary and sufficient for optimality [17]. Notice that (6) is a large-scale, sparse and structured linear system of equations whose size scales as $O(NKT)$. We can solve (6) as

$\Delta\tilde{\delta}=r_{\tilde{\delta}}$,   (7a)
$\tilde{x}=\tilde{Q}^{-1}r_{\tilde{x}}$,   (7b)
$\tilde{u}=\tilde{R}^{-1}r_{\tilde{u}}$,   (7c)

where $\Delta=\tilde{A}\tilde{Q}^{-1}\tilde{A}'+\tilde{B}\tilde{R}^{-1}\tilde{B}'$, $r_{\tilde{x}}=-\tilde{A}'\tilde{\delta}$ and $r_{\tilde{u}}=-\tilde{B}'\tilde{\delta}$.

Note that the matrices $\tilde{Q}\succ 0$ and $\tilde{R}\succ 0$ are block diagonal with block sizes independent of $N$, $K$ and $T$. Therefore, the computation of $\tilde{Q}^{-1}$ and $\tilde{R}^{-1}$ can be carried out efficiently. However, solving (7a) is computationally more expensive. Since $\tilde{Q}$ and $\tilde{R}$ are positive definite, their inverses are also positive definite, and $\tilde{A}$ has full row rank. Therefore, $\Delta$ is positive definite and has the block tri-diagonal structure

$\Delta=\begin{bmatrix}\hat{\Psi}_{0}&\hat{\Xi}_{0}'&&&\\ \hat{\Xi}_{0}&\hat{\Psi}_{1}&\hat{\Xi}_{1}'&&\\ &\ddots&\ddots&\ddots&\\ &&\hat{\Xi}_{T-2}&\hat{\Psi}_{T-1}&\hat{\Xi}_{T-1}'\\ &&&\hat{\Xi}_{T-1}&\hat{\Psi}_{T}\end{bmatrix}\in\mathbb{R}^{\hat{n}(T+1)\times\hat{n}(T+1)}$,   (8)

where

$\hat{\Psi}_{t}=\hat{Q}_{t}^{-1}+\hat{A}_{t-1}\hat{Q}_{t-1}^{-1}\hat{A}_{t-1}'+\hat{B}_{t-1}\hat{R}_{t-1}^{-1}\hat{B}_{t-1}'$, $t\in\mathcal{T}\backslash\{0\}$,
$\hat{\Xi}_{t}=-\hat{A}_{t}\hat{Q}_{t}^{-1}\in\mathbb{R}^{\hat{n}\times\hat{n}}$, $t\in\mathcal{T}\backslash\{T\}$,

and $\hat{\Psi}_{0}=\hat{Q}_{0}^{-1}\in\mathbb{R}^{\hat{n}\times\hat{n}}$. Note that each $\hat{\Psi}_{t}\in\mathbb{R}^{\hat{n}\times\hat{n}}$, $t\in\mathcal{T}\backslash\{0\}$, has a block penta-diagonal structure because the $\hat{A}_{t}$ are block tri-diagonal of size $O(NK)$. Let $\mathcal{V}=\{1,2,\ldots,V\}$, with $V=N/2$ when $N$ is even and $V=(N+1)/2$ otherwise. Also define

$\bar{\Phi}_{v,t}=\begin{bmatrix}\bar{Z}_{2v-1,t}&\bar{Y}_{2v,t}'\\ \bar{Y}_{2v,t}&\bar{Z}_{2v,t}\end{bmatrix}$, $v\in\mathcal{V}\backslash\{V\}$,   (9a)
$\bar{\Phi}_{V,t}=\begin{cases}\begin{bmatrix}\bar{Z}_{N-1,t}&\bar{Y}_{N,t}'\\ \bar{Y}_{N,t}&\bar{Z}_{N,t}\end{bmatrix},&\text{$N$ even},\\ \bar{Z}_{N,t},&\text{$N$ odd},\end{cases}$   (9b)
$\bar{\Omega}_{v,t}=\begin{bmatrix}\bar{V}_{2v-1,t}&\bar{Y}_{2v-1,t}\\ 0&\bar{V}_{2v,t}\end{bmatrix}$, $v\in\mathcal{V}\backslash\{1,V\}$,   (9c)
$\bar{\Omega}_{V,t}=\begin{cases}\begin{bmatrix}\bar{V}_{N-1,t}&\bar{Y}_{N-1,t}\\ 0&\bar{V}_{N,t}\end{bmatrix},&\text{$N$ even},\\ \begin{bmatrix}\bar{V}_{N,t}&\bar{Y}_{N,t}\end{bmatrix},&\text{$N$ odd},\end{cases}$   (9d)

where

$\bar{Z}_{j,t}=\bar{A}_{j,t}\bar{Q}_{j,t}^{-1}\bar{A}_{j,t}'+\bar{E}_{j,t}\bar{Q}_{j-1,t}^{-1}\bar{E}_{j,t}'+\bar{F}_{j,t}\bar{Q}_{j+1,t}^{-1}\bar{F}_{j,t}'$,
$\bar{Y}_{j,t}=\bar{E}_{j,t}\bar{Q}_{j-1,t}^{-1}\bar{A}_{j-1,t}'+\bar{A}_{j,t}\bar{Q}_{j,t}^{-1}\bar{F}_{j-1,t}'$,
$\bar{V}_{j,t}=\bar{E}_{j,t}\bar{Q}_{j-1,t}^{-1}\bar{F}_{j-2,t}'$,

for $j\in\mathcal{N}$, $t\in\mathcal{T}\backslash\{T\}$, with $\bar{Q}_{0,t}^{-1}=0$, $\bar{Q}_{N+1,t}^{-1}=0$ and $\bar{F}_{0,t}=0$. Given this, $\hat{\Psi}_{t}$ for $t\in\mathcal{T}\backslash\{T\}$ becomes a block tri-diagonal matrix given by

$\hat{\Psi}_{t}=\hat{\Phi}_{t}-\hat{\Omega}_{t}=\begin{bmatrix}\bar{\Phi}_{1,t}&\bar{\Omega}_{2,t}'&&&\\ \bar{\Omega}_{2,t}&\bar{\Phi}_{2,t}&\bar{\Omega}_{3,t}'&&\\ &\ddots&\ddots&\ddots&\\ &&\bar{\Omega}_{V-1,t}&\bar{\Phi}_{V-1,t}&\bar{\Omega}_{V,t}'\\ &&&\bar{\Omega}_{V,t}&\bar{\Phi}_{V,t}\end{bmatrix}$,   (11)

where $\hat{\Phi}_{t}=\mathrm{blkdiag}(\bar{\Phi}_{1,t},\ldots,\bar{\Phi}_{V,t})$ and $\hat{\Omega}_{t}=\hat{\Phi}_{t}-\hat{\Psi}_{t}$. This structured splitting of $\hat{\Psi}_{t}$ will be needed in the next section.
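As a numerical check of the structure described above, the following sketch forms $\Delta=\tilde{A}\tilde{Q}^{-1}\tilde{A}'+\tilde{B}\tilde{R}^{-1}\tilde{B}'$ for a small instance with identity costs and random, hypothetical dynamics, and verifies that it is symmetric positive definite, consistent with (8).

```python
import numpy as np
import scipy.sparse as sp

n_hat, m_hat, T = 4, 2, 3
rng = np.random.default_rng(2)
A_hat = [0.5 * rng.standard_normal((n_hat, n_hat)) for _ in range(T)]
B_hat = [rng.standard_normal((n_hat, m_hat)) for _ in range(T)]

I = sp.identity(n_hat)
A_tilde = sp.block_diag([-I] * (T + 1)).tolil()   # -I on the block diagonal
B_tilde = sp.lil_matrix((n_hat * (T + 1), m_hat * T))
for t in range(T):
    A_tilde[(t+1)*n_hat:(t+2)*n_hat, t*n_hat:(t+1)*n_hat] = A_hat[t]
    B_tilde[(t+1)*n_hat:(t+2)*n_hat, t*m_hat:(t+1)*m_hat] = B_hat[t]
A_tilde, B_tilde = A_tilde.tocsr(), B_tilde.tocsr()

Q_inv = sp.identity(n_hat * (T + 1))   # Q~^{-1} = I in this sketch
R_inv = sp.identity(m_hat * T)         # R~^{-1} = I in this sketch
Delta = (A_tilde @ Q_inv @ A_tilde.T + B_tilde @ R_inv @ B_tilde.T).toarray()

print(np.allclose(Delta, Delta.T))            # symmetric
print(np.all(np.linalg.eigvalsh(Delta) > 0))  # positive definite
```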

In the next section, we propose an iterative algorithm based on the preconditioned conjugate gradient method (PCGM) to solve (7a). The main contribution of this paper is the development of a structured preconditioner based on nested block Jacobi iterations, whose properties are detailed in Section IV.

III Preconditioned Conjugate Gradient Method

The positive definite linear system of equations (7a) can be solved using the conjugate gradient method (CGM) [18]. Let $e^{(k)}=\tilde{\delta}^{(k)}-\tilde{\delta}^{*}$ be the error between the $k$-th iterate $\tilde{\delta}^{(k)}$ of the CGM and the exact solution $\tilde{\delta}^{*}$ of (7a). It can be shown that $e^{(k)}$ satisfies the following [19, Thm. 6.29]:

$\|e^{(k)}\|_{\Delta}\leq 2\left((\sqrt{\kappa(\Delta)}-1)\big/(\sqrt{\kappa(\Delta)}+1)\right)^{k}\|e^{(0)}\|_{\Delta}$,   (12)

where $\|e\|_{\Delta}=e'\Delta e$, $\kappa(\Delta)=\lambda_{\mathrm{max}}(\Delta)/\lambda_{\mathrm{min}}(\Delta)$ is the condition number, and $\lambda_{\mathrm{max}}(\Delta)$, $\lambda_{\mathrm{min}}(\Delta)$ are the maximum and minimum eigenvalues of $\Delta$, respectively. This shows that the CGM converges faster if $\kappa(\Delta)$ is closer to $1$. Therefore, to improve the performance of the CGM, the system of equations (7a) is transformed such that the condition number of the transformed coefficient matrix is improved. The transformed system of equations is known as the preconditioned system. Let $P=P'$ be a positive definite matrix; then solving (7a) is equivalent to solving the transformed system

$P^{-1/2}\Delta P^{-1/2}\tilde{\gamma}=r_{\tilde{\gamma}}$,   (13)

where $\tilde{\gamma}=P^{1/2}\tilde{\delta}$ and $r_{\tilde{\gamma}}=P^{-1/2}r_{\tilde{\delta}}$. An efficient implementation of the PCGM is given in Algorithm 1, from [20].

Algorithm 1 PCGM [20] for (7a) with preconditioner $P$.
1: initialize $\tilde{\delta}^{(0)}$, $\epsilon$, $\text{iter}_{\text{max}}$
2: $r^{(0)}=r_{\tilde{\delta}}-\Delta\tilde{\delta}^{(0)}$
3: solve $Pd^{(0)}=r^{(0)}$
4: $\mu^{(0)}=(d^{(0)})'r^{(0)}$
5: $i=0$
6: while $i<\text{iter}_{\text{max}}$ do
7:     $y^{(i)}=\Delta d^{(i)}$
8:     $\vartheta^{(i)}=\mu^{(i)}/((y^{(i)})'d^{(i)})$
9:     $\tilde{\delta}^{(i+1)}=\tilde{\delta}^{(i)}+\vartheta^{(i)}d^{(i)}$
10:    $r^{(i+1)}=r^{(i)}-\vartheta^{(i)}y^{(i)}$
11:    if $\|r^{(i+1)}\|_{\infty}<\epsilon$ exit
12:    solve $Pq^{(i+1)}=r^{(i+1)}$
13:    $\mu^{(i+1)}=(q^{(i+1)})'r^{(i+1)}$
14:    $d^{(i+1)}=q^{(i+1)}+\left(\mu^{(i+1)}/\mu^{(i)}\right)d^{(i)}$
15:    $i=i+1$
16: end while

Note that lines 7 to 14 constitute one PCG step. The transformed matrix $P^{-1/2}\Delta P^{-1/2}$ is never formed explicitly. Instead, the preconditioning is performed by solving a linear system of equations at lines 3 and 12 of Algorithm 1. The choice of the preconditioner $P$ determines the overall complexity of the PCGM. Ideally, using the preconditioner $P=\Delta$ would result in $P^{-1/2}\Delta P^{-1/2}=I$. However, the linear system to be solved at the preconditioning lines 3 and 12 then becomes the original problem. An incomplete sparse LU factorization of the coefficient matrix $\Delta$ can be used as a preconditioner [21, 22]. However, to ensure that the resulting preconditioner is both positive definite and effective, it may be necessary to use incomplete LU factors that are denser (i.e., have less structure) than $\Delta$.
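For concreteness, the following is a generic sketch of Algorithm 1, with the preconditioner solves at lines 3 and 12 abstracted as a callback P_solve; the diagonal (Jacobi) preconditioner in the usage example is only a placeholder assumption, not the structured preconditioner developed in the next section.

```python
import numpy as np

def pcgm(Delta, r_delta, P_solve, eps=1e-9, iter_max=1000):
    delta = np.zeros_like(r_delta)          # line 1: initial guess
    r = r_delta - Delta @ delta             # line 2: initial residual
    d = P_solve(r)                          # line 3: preconditioned residual
    mu = d @ r                              # line 4
    for _ in range(iter_max):               # line 6
        y = Delta @ d                       # line 7
        theta = mu / (y @ d)                # line 8: step length
        delta = delta + theta * d           # line 9: update iterate
        r = r - theta * y                   # line 10: update residual
        if np.max(np.abs(r)) < eps:         # line 11: infinity-norm test
            break
        q = P_solve(r)                      # line 12: preconditioner solve
        mu_new = q @ r                      # line 13
        d = q + (mu_new / mu) * d           # line 14: new search direction
        mu = mu_new
    return delta

# usage with a placeholder diagonal preconditioner on hypothetical data
rng = np.random.default_rng(3)
M = rng.standard_normal((20, 20))
Delta = M @ M.T + 20 * np.eye(20)
b = rng.standard_normal(20)
x = pcgm(Delta, b, lambda r: r / np.diag(Delta))
print(np.linalg.norm(Delta @ x - b))        # residual of the computed solution
```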

In the next section, we devise a structured preconditioning approach based on nested block Jacobi iterations. We apply a fixed number of inner iterations at each outer iteration of the block Jacobi method to approximately implement lines 3 and 12 with $P=\Delta$. The proposed approach is inspired by ideas from [23, 24, 25, 26, 27, 28]. It extends the developments presented in [13], where the same ideas were explored within the context of 1D chains.

IV Nested Block Jacobi Preconditioner

Let $\Delta=\tilde{\Psi}-\tilde{\Xi}\succ 0$ be a matrix splitting such that

$\tilde{\Psi}=\mathrm{blkdiag}(\hat{\Psi}_{0},\ldots,\hat{\Psi}_{T})\succ 0$,   (14a)
$\tilde{\Xi}=\tilde{\Psi}-\Delta$.   (14b)

Then, the block Jacobi method for solving (7a) involves the following iteration:

$\tilde{\Psi}\tilde{\delta}^{(s+1)}=r_{\tilde{\delta}}+\tilde{\Xi}\tilde{\delta}^{(s)}$,   (15)

where $s$ indexes the block Jacobi iterations. These iterations converge to the solution of (7a) if and only if

$\rho(\tilde{\Psi}^{-1}\tilde{\Xi})<1$,   (16)

where $\rho(\cdot)$ denotes the spectral radius [20, Thm. 2.16]. Given that $\Delta$ is a positive definite block tri-diagonal matrix, and given the splitting (14), condition (16) always holds when (15) is solved exactly [20, Lem. 4.7 with Thm. 4.18]. However, instead of solving (15) exactly, we propose a fixed number of inner iterations. It is apparent from (11) that $\tilde{\Psi}\succ 0$ has a block tri-diagonal structure with $\bar{\Omega}_{1,t}=0$ and $\bar{\Omega}_{V+1,t}=0$ for $t\in\mathcal{T}$. Let $\tilde{\Psi}=\tilde{\Phi}-\tilde{\Omega}\succ 0$ such that

$\tilde{\Phi}=\mathrm{blkdiag}(\hat{\Phi}_{0},\ldots,\hat{\Phi}_{T})\succ 0$,   (17a)
$\tilde{\Omega}=\tilde{\Phi}-\tilde{\Psi}$.   (17b)

Then, for each outer iteration $s$, we perform $L$ inner iterations. This results in the nested block Jacobi method (NBJM) given in Algorithm 2.

Algorithm 2 NBJM for (7a)
1: initialize $\tilde{\delta}^{(0)}=0$
2: for $s=0,\ldots,S-1$ do
3:     set $r_{\tilde{\theta}}^{(s)}=r_{\tilde{\delta}}+\tilde{\Xi}\tilde{\delta}^{(s)}$ and initialize $\tilde{\theta}^{(0)}=0$
4:     for $l=0,\ldots,L-1$ do
5:         solve $\tilde{\Phi}\tilde{\theta}^{(l+1)}=r_{\tilde{\theta}}^{(s)}+\tilde{\Omega}\tilde{\theta}^{(l)}$
6:     end for
7:     set $\tilde{\delta}^{(s+1)}=\tilde{\theta}^{(L)}$; if $\|\tilde{\delta}^{(s+1)}-\tilde{\delta}^{(s)}\|_{\infty}<\epsilon$ exit
8: end for

Note that for the proposed splitting (17) of the positive definite block tri-diagonal matrix $\tilde{\Psi}$, the condition $\rho(\tilde{\Phi}^{-1}\tilde{\Omega})<1$ always holds [20, Lem. 4.7 with Thm. 4.18].
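A dense-matrix sketch of Algorithm 2 is given below for illustration; the splittings are passed in explicitly, and the direct solve against $\tilde{\Phi}$ stands in for the decomposed block solves discussed in Section IV-B. The toy matrices in the usage example are hypothetical.

```python
import numpy as np

def nbjm(Delta, Psi, Phi, r_delta, S, L):
    """S outer and L inner block Jacobi iterations of Algorithm 2 for
    Delta x = r_delta."""
    Xi = Psi - Delta                 # outer splitting (14): Delta = Psi - Xi
    Omega = Phi - Psi                # inner splitting (17): Psi = Phi - Omega
    delta = np.zeros_like(r_delta)
    for _ in range(S):
        r_theta = r_delta + Xi @ delta        # outer right-hand side
        theta = np.zeros_like(r_delta)        # inner iterations start from 0
        for _ in range(L):
            theta = np.linalg.solve(Phi, r_theta + Omega @ theta)
        delta = theta                         # next outer iterate
    return delta

# usage on a small SPD tri-diagonal system: with S and L fixed, nbjm()
# realizes one application of the preconditioner solve in Algorithm 1
n = 8
Delta = 4 * np.eye(n) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
Psi = Delta.copy()
for k in range(0, n, 2):                      # keep only 2x2 diagonal blocks
    Psi[k:k+2, :k] = 0
    Psi[k:k+2, k+2:] = 0
Phi = np.diag(np.diag(Delta))                 # inner splitting keeps the diagonal
b = np.ones(n)
x_exact = np.linalg.solve(Delta, b)
x_pre = nbjm(Delta, Psi, Phi, b, S=2, L=2)    # approximates Delta^{-1} b
print(np.linalg.norm(x_pre - x_exact) / np.linalg.norm(x_exact))
```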

IV-A Positive-definiteness of Preconditioner

We propose to apply a fixed number of inner-outer iterations of the NBJM (i.e., Algorithm 2) with $P=\Delta$ at lines 3 and 12 of Algorithm 1. Executing this is equivalent to the use of a positive definite preconditioner. First, we present the following technical lemma, which leads to the main result.

Lemma 1.

Recall that for the proposed splitting (17) of the positive definite block tri-diagonal matrix $\tilde{\Psi}$, the condition $\rho(\tilde{\Phi}^{-1}\tilde{\Omega})<1$ always holds. Let $L$ be a positive even integer, i.e., $L\in\{2,4,\ldots\}$. Then the inverse of $\tilde{\Psi}$ can be expressed as

$\tilde{\Psi}^{-1}=\tilde{\Upsilon}_{L}+\tilde{\Pi}_{L}$,   (18)

where $\tilde{\Upsilon}_{L}=\sum_{l=0}^{L-1}(\tilde{\Phi}^{-1}\tilde{\Omega})^{l}\tilde{\Phi}^{-1}\succ 0$ and $\tilde{\Pi}_{L}=\sum_{l=L}^{\infty}(\tilde{\Phi}^{-1}\tilde{\Omega})^{l}\tilde{\Phi}^{-1}\succeq 0$.

Proof.

First note that for the proposed splitting (17) of $\tilde{\Psi}=\tilde{\Phi}-\tilde{\Omega}\succ 0$, with $\Lambda=\mathrm{blkdiag}(\Lambda_{T+1},\ldots,\Lambda_{1})$ and $\Lambda_{t}=(-I)^{t}$ for $t=1,\ldots,T+1$, it holds that $\tilde{\Phi}+\tilde{\Omega}=\Lambda'\tilde{\Psi}\Lambda\succ 0$. Then, using $(\tilde{\Phi}^{-1}\tilde{\Omega})^{l}\tilde{\Phi}^{-1}=\tilde{\Phi}^{-1}(\tilde{\Omega}\tilde{\Phi}^{-1})^{l}$, note that

$\tilde{\Upsilon}_{L}=\sum_{l=0}^{(L/2)-1}(\tilde{\Phi}^{-1}\tilde{\Omega})^{l}\tilde{\Phi}^{-1}(\tilde{\Phi}+\tilde{\Omega})\tilde{\Phi}^{-1}(\tilde{\Omega}\tilde{\Phi}^{-1})^{l} = \sum_{l=1}^{(L/2)-1}(\tilde{\Phi}^{-1}\tilde{\Omega})^{l}\tilde{\Phi}^{-1}(\tilde{\Phi}+\tilde{\Omega})\tilde{\Phi}^{-1}(\tilde{\Omega}\tilde{\Phi}^{-1})^{l} + \tilde{\Phi}^{-1}(\tilde{\Phi}+\tilde{\Omega})\tilde{\Phi}^{-1}$,

and

$\tilde{\Pi}_{L}=\sum_{l=L/2}^{\infty}(\tilde{\Phi}^{-1}\tilde{\Omega})^{l}\tilde{\Phi}^{-1}(\tilde{\Phi}+\tilde{\Omega})\tilde{\Phi}^{-1}(\tilde{\Omega}\tilde{\Phi}^{-1})^{l}$,

for $L\in\{2,4,\ldots\}$. Note that $\tilde{\Phi}^{-1}(\tilde{\Phi}+\tilde{\Omega})\tilde{\Phi}^{-1}\succ 0$ and $(\tilde{\Phi}^{-1}\tilde{\Omega})^{l}\tilde{\Phi}^{-1}(\tilde{\Phi}+\tilde{\Omega})\tilde{\Phi}^{-1}(\tilde{\Omega}\tilde{\Phi}^{-1})^{l}\succeq 0$ for all $l\in\mathbb{N}$. Therefore, $\tilde{\Upsilon}_{L}\succ 0$ and $\tilde{\Pi}_{L}\succeq 0$, as claimed. ∎

Remark 2.

Note that from Lem. 1, $\tilde{\Psi}^{-1}-\tilde{\Upsilon}_{L}=\tilde{\Pi}_{L}\succeq 0$, which implies that $\tilde{\Upsilon}_{L}^{-1}-\tilde{\Psi}\succeq 0$ [29, Cor. 7.7.4]. Also note that for the proposed splitting (14) of $\Delta\succ 0$ in (8), with $\Lambda=\mathrm{blkdiag}(\Lambda_{T+1},\ldots,\Lambda_{1})$ and $\Lambda_{t}=(-I)^{t}$ for $t=1,\ldots,T+1$, it holds that $\tilde{\Psi}+\tilde{\Xi}=\Lambda'\Delta\Lambda\succ 0$. Therefore, $(\tilde{\Upsilon}_{L}^{-1}+\tilde{\Xi})-(\tilde{\Psi}+\tilde{\Xi})\succeq 0 \implies (\tilde{\Upsilon}_{L}^{-1}+\tilde{\Xi})\succeq(\tilde{\Psi}+\tilde{\Xi})\succ 0$.

Now we present the main result.

Theorem 3.

Given $S\in\mathbb{N}$ and $L\in\{2,4,\ldots\}$, the $S^{\mathrm{th}}$ iterate of Algorithm 2 satisfies $P_{S}\tilde{\delta}^{(S)}=r_{\tilde{\delta}}$ with $P_{S}=\varDelta_{S}^{-1}\succ 0$, where $\varDelta_{S}=\sum_{s=0}^{S-1}(\tilde{\Upsilon}_{L}\tilde{\Xi})^{s}\tilde{\Upsilon}_{L}$ and $\tilde{\Upsilon}_{L}=\sum_{l=0}^{L-1}(\tilde{\Phi}^{-1}\tilde{\Omega})^{l}\tilde{\Phi}^{-1}$.

Proof.

Note that for each outer iteration $s$ of Algorithm 2, we execute $L$ inner iterations with $\tilde{\theta}^{(0)}=0$. Therefore, $\tilde{\delta}^{(s)}=\tilde{\Upsilon}_{L}r_{\tilde{\delta}}+(\tilde{\Upsilon}_{L}\tilde{\Xi})\tilde{\delta}^{(s-1)}$, so that $\tilde{\delta}^{(S)}=\varDelta_{S}r_{\tilde{\delta}}+(\tilde{\Upsilon}_{L}\tilde{\Xi})^{S}\tilde{\delta}^{(0)}=\varDelta_{S}r_{\tilde{\delta}}$. Then note that $\varDelta_{1}=\tilde{\Upsilon}_{L}\succ 0$ and $\tilde{\Upsilon}_{L}(\tilde{\Upsilon}_{L}^{-1}+\tilde{\Xi})\tilde{\Upsilon}_{L}\succ 0$, and using $(\tilde{\Upsilon}_{L}\tilde{\Xi})^{s}\tilde{\Upsilon}_{L}=\tilde{\Upsilon}_{L}(\tilde{\Xi}\tilde{\Upsilon}_{L})^{s}$, that

$\varDelta_{2S}=\sum_{s=0}^{S-1}(\tilde{\Upsilon}_{L}\tilde{\Xi})^{s}\tilde{\Upsilon}_{L}(\tilde{\Upsilon}_{L}^{-1}+\tilde{\Xi})\tilde{\Upsilon}_{L}(\tilde{\Xi}\tilde{\Upsilon}_{L})^{s} = \sum_{s=1}^{S-1}(\tilde{\Upsilon}_{L}\tilde{\Xi})^{s}\tilde{\Upsilon}_{L}(\tilde{\Upsilon}_{L}^{-1}+\tilde{\Xi})\tilde{\Upsilon}_{L}(\tilde{\Xi}\tilde{\Upsilon}_{L})^{s} + \tilde{\Upsilon}_{L}(\tilde{\Upsilon}_{L}^{-1}+\tilde{\Xi})\tilde{\Upsilon}_{L}\succ 0$,

and

$\varDelta_{2S+1}=\sum_{s=0}^{2S-1}(\tilde{\Upsilon}_{L}\tilde{\Xi})^{s}\tilde{\Upsilon}_{L}+(\tilde{\Upsilon}_{L}\tilde{\Xi})^{2S}\tilde{\Upsilon}_{L} = \varDelta_{2S}+(\tilde{\Upsilon}_{L}\tilde{\Xi})^{S}\tilde{\Upsilon}_{L}((\tilde{\Upsilon}_{L}\tilde{\Xi})^{S})'\succ 0$,

for $S\in\mathbb{N}$. Therefore, $\varDelta_{S}\succ 0$ and consequently $P_{S}=\varDelta_{S}^{-1}\succ 0$. ∎
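The identity $\tilde{\delta}^{(S)}=\varDelta_{S}r_{\tilde{\delta}}$ and the positive definiteness of $\varDelta_{S}$ can be checked numerically on a small toy splitting, as in the following sketch; the particular matrices are hypothetical and chosen only so that $\rho(\tilde{\Phi}^{-1}\tilde{\Omega})<1$.

```python
import numpy as np

n, S, L = 6, 3, 2
Delta = 4 * np.eye(n) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
Psi = np.diag(np.diag(Delta))        # a diagonal outer splitting of Delta
Phi = Psi + np.eye(n)                # inner splitting Psi = Phi - Omega
Xi, Omega = Psi - Delta, Phi - Psi

Phi_inv = np.linalg.inv(Phi)
# Upsilon_L = sum_{l=0}^{L-1} (Phi^{-1} Omega)^l Phi^{-1}, as in Lemma 1
Ups_L = sum(np.linalg.matrix_power(Phi_inv @ Omega, l) for l in range(L)) @ Phi_inv
# Delta_S = sum_{s=0}^{S-1} (Upsilon_L Xi)^s Upsilon_L, as in Theorem 3
Delta_S = sum(np.linalg.matrix_power(Ups_L @ Xi, s) for s in range(S)) @ Ups_L

r = np.random.default_rng(4).standard_normal(n)
delta = np.zeros(n)
for _ in range(S):                   # Algorithm 2 with exact inner Phi-solves
    r_theta = r + Xi @ delta
    theta = np.zeros(n)
    for _ in range(L):
        theta = Phi_inv @ (r_theta + Omega @ theta)
    delta = theta

print(np.allclose(delta, Delta_S @ r))                            # S-th iterate = Delta_S r
print(np.all(np.linalg.eigvalsh((Delta_S + Delta_S.T) / 2) > 0))  # Delta_S pos. def.
```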

IV-B Decomposable Computations

Note that we do not explicitly construct the preconditioner $P_{S}$. Instead, at each PCG step, we execute $S$ outer iterations of Algorithm 2, each comprising $L$ inner iterations. At each inner iteration, we need to solve

$\tilde{\Phi}\tilde{\theta}^{(l+1)}=r_{\tilde{\theta}}^{(s)}+\tilde{\Omega}\tilde{\theta}^{(l)}$.   (19)

Let $\tilde{\theta}=\mathrm{col}(\hat{\theta}_{0},\ldots,\hat{\theta}_{T})$, where $\hat{\theta}_{t}=\mathrm{col}(\bar{\theta}_{1,t},\ldots,\bar{\theta}_{V,t})$ for $t\in\mathcal{T}$. Then, the computations in (19) can be decomposed into $V\times(T+1)$ smaller problems

$\bar{\Phi}_{v,t}\bar{\theta}_{v,t}^{(l+1)}=\bar{\tau}_{v,t}$,   (20)

for $v\in\mathcal{V}$ and $t\in\mathcal{T}$, where $\bar{\tau}_{v,t}=(r_{\tilde{\theta}}^{(s)})_{v,t}+\bar{\Omega}_{v,t}\bar{\theta}_{v-1,t}^{(l)}+\bar{\Omega}_{v+1,t}'\bar{\theta}_{v+1,t}^{(l)}$, and $(r_{\tilde{\theta}}^{(s)})_{v,t}$ denotes the corresponding partition of the vector $r_{\tilde{\theta}}^{(s)}$. Note that for each smaller problem (20), the coefficient matrix $\bar{\Phi}_{v,t}$ is a block banded matrix of size $O(K)$, where the sub-blocks are of size $O(n_{i,j})$, independent of $K$, $N$ and $T$. Therefore, for each $v\in\mathcal{V}$ and $t\in\mathcal{T}$, the linear system (20) can be solved using a block Cholesky factorization of $\bar{\Phi}_{v,t}$ with an arithmetic complexity of $O(K)$. Note that this factorization is performed once, at the start of Algorithm 1. Therefore, the computations at the preconditioning lines 3 and 12 decompose into $V\times(T+1)$ parallel threads with a per-thread arithmetic complexity of $O(K)$.

By defining $\tilde{\delta}=\mathrm{col}(\hat{\delta}_{0},\ldots,\hat{\delta}_{T})$, where $\hat{\delta}_{t}=\mathrm{col}(\bar{\delta}_{1,t},\ldots,\bar{\delta}_{V,t})$ for $t\in\mathcal{T}$, and partitioning $\Delta$ and $\tilde{\Xi}$ accordingly, the matrix-vector products at line 7 of Algorithm 1 can also be decomposed into $V\times(T+1)$ parallel threads connected in a 2D grid structure with localized data exchange. In Table I, we present the arithmetic complexity of each line of Algorithm 1. It shows the overall arithmetic complexity as well as the per-thread complexity, along with the overhead of scalar-valued data exchange for each thread in the case of a parallel implementation on $V\times(T+1)$ processors. The arithmetic complexity of each PCG step is dominated by the preconditioning line 12. With $L$ and $S$ fixed, the overall arithmetic complexity of each PCG step becomes $O(KNT)$.

Note that the computations involved in forming the dot products at lines 8 and 13, and in updating the infinity norm of the residual at line 11, are sequential in nature. In the case of a parallel implementation, they can be carried out by performing a backward-forward sweep across the 2D grid with localized scalar data exchange. It should be noted that the total arithmetic complexity of the PCGM to converge to a solution of desired accuracy depends on the total number of steps performed. In the worst case, the number of steps can be $O(KNT)$, resulting in $O((KNT)^{2})$ arithmetic complexity. However, the method often performs much better than this worst case, as illustrated in the next section.
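A sketch of the decomposed inner solve (20) is given below: each $(v,t)$ block is handled independently using a Cholesky factorization of $\bar{\Phi}_{v,t}$ computed once up front. The block sizes and matrices are hypothetical stand-ins.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

V, T1, nb = 3, 4, 2                  # V blocks, T+1 time steps, block size
rng = np.random.default_rng(5)
def spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)   # a hypothetical SPD block Phi_{v,t}

# factor each Phi_{v,t} once, before the PCGM loop starts
factors = {(v, t): cho_factor(spd(nb)) for v in range(V) for t in range(T1)}

def inner_solve(tau):
    """tau[(v, t)]: right-hand side of (20); the V*(T+1) solves are
    independent and could run on parallel threads."""
    return {(v, t): cho_solve(factors[v, t], tau[v, t])
            for v in range(V) for t in range(T1)}

tau = {(v, t): rng.standard_normal(nb) for v in range(V) for t in range(T1)}
theta = inner_solve(tau)
print(theta[0, 0])                   # solution block for (v, t) = (0, 0)
```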

| Alg. 1 | Computations (single thread) | Computations per thread ($V\times(T+1)$ threads) | Data exchange per thread |
|---|---|---|---|
| Line 7 | $O(KNT\check{n}^{3})$ | $O(K\check{n}^{3})$ | $O(K\check{n})$ |
| Line 8 | $O(KNT\check{n}^{3})$ | $O(K\check{n}^{3})$ | $O(1)$ |
| Line 9 | $O(KNT\check{n}^{3})$ | $O(K\check{n}^{3})$ | 0 |
| Line 10 | $O(KNT\check{n}^{3})$ | $O(K\check{n}^{3})$ | 0 |
| Line 11 | $O(KNT\check{n}^{3})$ | $O(K\check{n}^{3})$ | $O(1)$ |
| Line 12 | $O(LSKNT\check{n}^{3})$ | $O(K\check{n}^{3})$ | $O(LSK\check{n})$ |
| Line 13 | $O(KNT\check{n}^{3})$ | $O(K\check{n}^{3})$ | $O(1)$ |
| Line 14 | $O(KNT\check{n}^{3})$ | $O(K\check{n}^{3})$ | 0 |

TABLE I: Arithmetic complexity and data-exchange overhead of the main loop of Algorithm 1, where $\check{n}=\max_{i,j}(n_{i,j})$ with $n_{i,j}$ the size of the state $x_{i,j}$, and $L$, $S$ are the fixed numbers of inner and outer iterations of Algorithm 2.

V Numerical Experiments

In this section, we present the results of numerical experiments for solving the LQ optimal control problem in two cases.

Case (I): We consider a 2D grid of mass-spring-damper systems with $N\times K$ masses, where each sub-system $(i,j)$ has discrete-time state space dynamics of the form (1) with four states and two inputs. The cost function has $Q_{i,j,t}=I\in\mathbb{R}^{4\times 4}$ and $R_{i,j,t}=2$ for $i\in\mathcal{K}$, $j\in\mathcal{N}$, $t\in\mathcal{T}$. The mass, spring constant, and damping coefficient parameters are randomly selected between $0.8$ and $1.5$ to create a heterogeneous network.

Case (II): We consider an LQ optimal control problem for an automated irrigation network under distant-downstream control [30], shown in Fig. 2. The network consists of $N$ controlled pools in the primary channel and $K$ controlled pools in each secondary channel originating from the primary channel's pools. The dynamics of each controlled pool can be described by a sub-system of the form (1) with $F_{i,j,t}=0$ and $G_{i,j,t}=0$. That is, there is a direct coupling between the states of sub-systems $(i,j)$ and $(i,j-1)$ in the primary channel, and between $(i,j)$ and $(i-1,j)$ in the secondary channels. However, there is no inter-coupling between the states of sub-systems in different secondary channels. The control inputs $u_{i,j,t}$ are the adjustments of the water-level references, and the states $x_{i,j,t}$ are the deviations of the state trajectory from equilibrium. For each sub-system $i\in\mathcal{K}$, $j\in\mathcal{N}$, the number of states is $n_{i,j}=4$ and the number of inputs is $m_{i,j}=1$. The corresponding cost function has $Q_{i,j}=I\in\mathbb{R}^{4\times 4}$ and $R_{i,j}=1$.

The LQ optimal control problem for both cases is solved via the corresponding KKT conditions of the form (6). We compare the results of the following three methods for solving (7a):

  • The proposed PCGM in Algorithm 1, with the proposed nested block Jacobi preconditioning with $L=2$ and $S=2$.

  • A standard nested block Jacobi method (see e.g. [28]).

  • MATLAB's backslash (CHOLMOD [31]) with sparse matrix representation.

The experiments are conducted with $N=K=T$ varied from $10$ to $100$, such that the number of scalar variables in the largest problems for both cases is $O(10^{6})$. We consider the processor time of a single-thread implementation, which serves as a proxy for the overall arithmetic complexity of each method. We set the stopping criterion for the infinity norm of the residuals of the PCGM and the standard nested block Jacobi method to $\epsilon=10^{-9}$. Note that for solving (7a) with the standard nested block Jacobi method, the outer iterations are performed until the stopping criterion is met, while the number of inner iterations is fixed at $L=2$. All computations are performed on a Windows PC with an Intel Core i7 processor with 2.4 GHz clock speed and 16 GB of RAM.

Case (I): 2D grid mass-spring-damper network. Fig. 3 shows the total number of steps of the proposed PCGM with $L=2$, $S=2$ and the total number of outer steps of the standard NBJM with $L=2$, against the total number of variables in (7a). It shows that the standard NBJM consistently takes a large number of steps, on the order of thousands. In contrast, our proposed PCGM consistently takes fewer than one hundred steps, small compared to the size of the system. For instance, for $K=N=T=40$, the total number of unknowns is $O(10^{5})$, while only $81$ PCGM steps are performed. Such a small number of PCGM steps is particularly favourable for a distributed implementation, as it results in a small data-exchange overhead.

Fig. 4 shows the normalized processor time for solving (7a) using all three approaches. It shows that along the line $K=N=T$, the per-step arithmetic complexity of the PCGM and the standard NBJM scales as $O(K^{3})$. However, due to the large number of steps performed by the NBJM, its overall processor time is an order of magnitude larger than that of the proposed PCGM. In contrast, MATLAB's backslash performs well for smaller systems, but its overall processor time scales as $O(K^{6})$. This is because backslash invokes sparse Cholesky factorization from CHOLMOD [31]. Despite its superior capabilities, including the permutation of variables for computational benefit, the number of fill-ins during the factorization increases rapidly since the structure within the sub-blocks of $\Delta$ is lost. Moreover, for problems with $K=N=T$ greater than $60$, the processor time for backslash also includes memory-management overhead and is thus no longer a good proxy for arithmetic complexity. For problems with $K=N=T=90$ and above, backslash ran out of memory, while the proposed PCGM performs much better, with the added scope for decomposing the computations.

Fig. 5 illustrates the condition number $\kappa(\Delta)$ and the achieved conditioning of the transformed system, $\kappa(P_{S}^{-1/2}\Delta P_{S}^{-1/2})$, using the structured preconditioner $P_{S}$ given in Thm. 3. It can be seen that the conditioning is improved by almost two orders of magnitude with the proposed approach.

Case (II): Automated irrigation network. Fig. 6 compares the total number of PCGM steps and standard NBJM steps against the size of the system. It can be seen that the number of steps performed by the NBJM increases as $O(K^{2})$ with the size of the problem, reaching up to $10^{5}$ for $K=N=T=50$. As such, results of the NBJM for larger systems are not included. By contrast, the number of PCGM steps increases as $O(K)$, and the steps performed remain much fewer than the size of the system.

Fig. 7 shows the overall processor time of the three approaches. The processor time of the NBJM increases as $O(K^{5})$ and that of backslash as $O(K^{6})$, whereas the processor time of the proposed PCGM increases as $O(K^{4})$, which reflects the linear increase in the number of PCGM steps along the line $K=N=T$. Moreover, for problem sizes larger than $K=N=T=40$, backslash suffers from memory-management issues, and it ran out of memory for problems larger than $K=N=T=60$.

Finally, Fig. 8 shows the condition number of $\Delta$ in (7a) and $\kappa(P_{S}^{-1/2}\Delta P_{S}^{-1/2})$ of the preconditioned system. It can be seen that there is an order of magnitude improvement in the conditioning with $L=2$ and $S=2$.

Figure 2: An illustration of an automated water irrigation network
Figure 3: Case (I): Total no. of steps of PCGM and standard NBJM
Figure 4: Case (I): Normalized processor time to solve (7a)
Figure 5: Case (I): $\kappa(\Delta)$ and $\kappa(P_{S}^{-1/2}\Delta P_{S}^{-1/2})$ with $L=2$, $S=2$
Figure 6: Case (II): Total no. of steps of PCGM and standard NBJM
Figure 7: Case (II): Normalized processor time to solve (7a)
Figure 8: Case (II): $\kappa(\Delta)$ and $\kappa(P_{S}^{-1/2}\Delta P_{S}^{-1/2})$ with $L=2$, $S=2$

VI Conclusions

In this paper, we have proposed a structured PCGM for solving LQ optimal control problems for systems with a 2D grid structure. The per-step arithmetic complexity of the proposed approach scales linearly in each spatial dimension as well as in the temporal dimension. The computations are amenable to distributed implementation on $O(NT)$ parallel processors with localized data exchange, mirroring the 2D grid structure of the problem. Future work includes the development of analytical bounds on the achieved conditioning and an implementation of the proposed approach on parallel processing architectures. This will enable a deeper analysis of the distributed performance of the proposed approach, along with the issues related to data-exchange overhead.

VII Acknowledgement

We would like to thank Prof. Michael Cantoni and Dr. Farhad Farokhi from the University of Melbourne, Australia, for their insights and helpful discussions about this work. This work was supported by the Air Force Office of Scientific Research Grant FA2386-19-1-4076 and the NSW Defence Innovation Network.

References

  • [1] Y. Li, M. Cantoni, and E. Weyer, “On Water-Level Error Propagation in Controlled Irrigation Channels,” in Proceedings of the 44th IEEE Conference on Decision and Control.   IEEE, 2005, pp. 2101–2106.
  • [2] Q. Peng and S. H. Low, “Distributed optimal power flow algorithm for radial networks, I: Balanced single phase case,” IEEE Transactions on Smart Grid, vol. 9, no. 1, pp. 111–121, 2018.
  • [3] M. Liu, J. Zhao, S. P. Hoogendoorn, and M. Wang, “An optimal control approach of integrating traffic signals and cooperative vehicle trajectories at intersections,” Transportmetrica B, vol. 10, no. 1, pp. 971–987, 2022.
  • [4] I. Karafyllidis and A. Thanailakis, “A model for predicting forest fire spreading using cellular automata,” Ecological Modelling, vol. 99, no. 1, pp. 87–97, 1997.
  • [5] V. L. Somers and I. R. Manchester, “Priority maps for surveillance and intervention of wildfires and other spreading processes,” Proceedings - IEEE International Conference on Robotics and Automation, vol. 2019-May, pp. 739–745, 2019.
  • [6] H. Maurer and H. D. Mittelmann, “Optimization techniques for solving elliptic control problems with control and state constraints: Part 1. Boundary control,” Computational Optimization and Applications, vol. 16, pp. 29–55, 2000.
  • [7] C. V. Rao, S. J. Wright, and J. B. Rawlings, “Application of Interior-Point Methods to Model Predictive Control,” Journal of Optimization Theory and Applications, vol. 99, no. 3, pp. 723–757, dec 1998.
  • [8] Y. Wang and S. Boyd, “Fast Model Predictive Control Using Online Optimization,” IEEE Transactions on Control Systems Technology, vol. 18, no. 2, pp. 267–278, 2010.
  • [9] A. Shahzad, E. C. Kerrigan, and G. A. Constantinides, “A Stable and Efficient Method for Solving a Convex Quadratic Program with Application to Optimal Control,” SIAM Journal on Optimization, vol. 22, no. 4, pp. 1369–1393, jan 2012.
  • [10] I. Nielsen and D. Axehill, “Direct Parallel Computations of Second-Order Search Directions for Model Predictive Control,” IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2845–2860, 2019.
  • [11] F. Rey, P. Hokayem, and J. Lygeros, “ADMM for Exploiting Structure in MPC Problems,” IEEE Transactions on Automatic Control, pp. 1–1, 2020.
  • [12] A. Zafar, M. Cantoni, and F. Farokhi, “Optimal control computation for cascade systems by structured Jacobi iterations,” IFAC-PapersOnLine, vol. 52, no. 20, pp. 291–296, 2019.
  • [13] ——, “Structured Preconditioning of Conjugate Gradients for Path-Graph Network Optimal Control Problems,” IEEE Transactions on Automatic Control, vol. 67, no. 8, pp. 4115–4122, aug 2022.
  • [14] A. Nedic, A. Ozdaglar, and P. A. Parrilo, “Constrained Consensus and Optimization in Multi-Agent Networks,” IEEE Transactions on Automatic Control, vol. 55, no. 4, pp. 922–938, 2010.
  • [15] A. Falsone, K. Margellos, S. Garatti, and M. Prandini, “Dual Decomposition for Multi-Agent Distributed Optimization with Coupling Constraints,” Automatica, vol. 84, pp. 149–158, 2017.
  • [16] A. Zafar, F. Farokhi, and M. Cantoni, “Linear Quadratic Control Computation for Systems with a Directed Tree Structure,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 6536–6541, 2020.
  • [17] J. Nocedal and S. J. Wright, Numerical Optimization.   Springer, 2000.
  • [18] M. R. Hestenes and E. Stiefel, “Methods of Conjugate Gradients for Solving Linear Systems,” Journal of Research of the National Bureau of Standards, vol. 49, no. 1, pp. 409–436, 1952.
  • [19] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed.   SIAM, 2003.
  • [20] W. Hackbusch, Iterative Solution of Large Sparse Systems of Equations, ser. Applied Mathematical Sciences.   Cham: Springer International Publishing, 2016, vol. 95.
  • [21] M. Benzi and M. Tůma, “A Robust Incomplete Factorization Preconditioner for Positive Definite Matrices,” Numerical Linear Algebra with Applications, vol. 10, no. 5-6, pp. 385–400, 2003.
  • [22] J. Xia and Z. Xin, “Effective and Robust Preconditioning of General SPD Matrices via Structured Incomplete Factorization,” SIAM Journal on Matrix Analysis and Applications, vol. 38, no. 4, pp. 1298–1322, jan 2017.
  • [23] N. K. Nichols, “On the Convergence of Two-Stage Iterative Processes for Solving Linear Equations,” SIAM Journal on Numerical Analysis, vol. 10, no. 3, pp. 460–469, jun 1973.
  • [24] O. G. Johnson, C. A. Micchelli, and G. Paul, “Polynomial preconditioners for conjugate gradient calculations,” SIAM Journal on Numerical Analysis, vol. 20, no. 2, pp. 362–376, 1983.
  • [25] P. J. Lanzkron, D. J. Rose, and D. B. Szyld, “Convergence of nested classical iterative methods for linear systems,” Numerische Mathematik, vol. 58, no. 1, pp. 685–702, dec 1990.
  • [26] V. Migallón and J. Penadés, “Convergence of two-stage iterative methods for Hermitian positive definite matrices,” Applied Mathematics Letters, vol. 10, no. 3, pp. 79–83, 1997.
  • [27] Z.-h. Cao, H.-b. Wu, and Z.-y. Liu, “Two-stage iterative methods for consistent Hermitian positive semidefinite systems,” Linear Algebra and its Applications, vol. 294, no. 1-3, pp. 217–238, jun 1999.
  • [28] Z.-H. Cao, “Convergence of Nested Iterative Methods for Symmetric P-Regular Splittings,” SIAM Journal on Matrix Analysis and Applications, vol. 22, no. 1, pp. 20–32, jan 2000.
  • [29] R. A. Horn and C. R. Johnson, Matrix Analysis.   Cambridge University Press, 2013.
  • [30] M. Cantoni, E. Weyer, Y. Li, S. K. Ooi, I. Mareels, and M. Ryan, “Control of Large-Scale Irrigation Networks,” Proceedings of the IEEE, vol. 95, no. 1, pp. 75–91, jan 2007.
  • [31] Y. Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam, “Algorithm 887: CHOLMOD, supernodal sparse cholesky factorization and update/downdate,” ACM Transactions on Mathematical Software, vol. 35, no. 3, pp. 1–14, 2008.