
An optimal parallel-in-time preconditioner for parabolic optimal control problems

Sean Y. Hon (Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, [email protected]), Po Yin Fung (Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, [email protected]), and Xue-lei Lin (Corresponding author. School of Science, Harbin Institute of Technology, Shenzhen 518055, China, [email protected])
Abstract

In this work, we propose a novel diagonalization-based preconditioner for the all-at-once linear system arising from the optimal control problem of parabolic equations. The proposed preconditioner is constructed based on an $\epsilon$-circulant modification to the rotated block diagonal (RBD) preconditioning technique, and it can be efficiently diagonalized by fast Fourier transforms in a parallel-in-time (PinT) fashion. To our knowledge, this marks the first application of the $\epsilon$-circulant modification to RBD preconditioning. Before our work, studies of PinT preconditioning techniques for the optimal control problem mainly focused on $\epsilon$-circulant modifications to Schur complement based preconditioners, which involve a multiplication of the forward and backward evolutionary processes and thus square the condition number. Compared with those Schur complement based preconditioning techniques in the literature, the advantage of the proposed $\epsilon$-circulant modified RBD preconditioning is that it does not involve such a multiplication. When the generalized minimal residual (GMRES) method is deployed on the preconditioned system, we prove that, when choosing $\epsilon=\mathcal{O}(\sqrt{\tau})$ with $\tau$ being the temporal step size, the convergence rate of the preconditioned GMRES solver is independent of the matrix size and the regularization parameter. This restriction on $\epsilon$ is more relaxed than the assumptions on $\epsilon$ in other works on $\epsilon$-circulant based preconditioning techniques for the optimal control problem. Numerical results are provided to demonstrate the effectiveness of our proposed solvers.

keywords:
Block Toeplitz matrices, parallel-in-time, $\epsilon$-circulant matrices, preconditioning, all-at-once systems
AMS subject classifications:

65F08, 65F10, 65M22, 15B05

1 Introduction

Developing diagonalization-based parallel-in-time (PinT) methods for solving optimization problems constrained by partial differential equations (PDEs) has gained substantial attention in recent years. We refer to [19, 12, 27, 9] and the references therein for a comprehensive overview of these optimization problems. Various efficient PinT preconditioners have been proposed to solve all-at-once linear systems arising from time-dependent PDEs [23, 22, 17, 20, 15] and PDE-constrained optimal control problems [14, 29, 30, 31, 13, 8, 10]; see the review paper [21] on these diagonalization-based methods and the references therein.

In this work, we consider solving the distributed optimal control problem constrained by a parabolic equation. Namely, the following quadratic cost functional is minimized:

(1) \min_{y,u}\ \mathcal{J}(y,u):=\frac{1}{2}\|y-g\|^{2}_{L^{2}(\Omega\times(0,T))}+\frac{\gamma}{2}\|u\|^{2}_{L^{2}(\Omega\times(0,T))},

constrained by a parabolic equation with certain initial and boundary conditions

(2) \left\{\begin{array}{ll}y_{t}-\mathcal{L}y=f+u,&(x,t)\in\Omega\times(0,T],\\ y=0,&(x,t)\in\partial\Omega\times(0,T],\\ y(x,0)=y_{0},&x\in\Omega,\end{array}\right.

where $u,g\in L^{2}(\Omega\times(0,T))$ are the distributed control and the targeted tracking trajectory, respectively, $\gamma>0$ is a regularization parameter, $\mathcal{L}=\nabla\cdot(a(x)\nabla)$, and both $f$ and $y_{0}$ are given functions. The existence, uniqueness, and regularity of the solution, under suitable assumptions, were thoroughly studied in [19]. The optimal solution of (1)-(2) can be characterized by the following system:

(3) \left\{\begin{array}{ll}y_{t}-\mathcal{L}y-\frac{1}{\gamma}p=f,&(x,t)\in\Omega\times(0,T],\\ y=0,&(x,t)\in\partial\Omega\times(0,T],\\ y(x,0)=y_{0},&x\in\Omega,\\ -p_{t}-\mathcal{L}p+y=g,&(x,t)\in\Omega\times(0,T],\\ p=0,&(x,t)\in\partial\Omega\times(0,T],\\ p(x,T)=0,&x\in\Omega,\end{array}\right.

where the control variable uu has been eliminated.

Similar to [31, 30], we discretize (3) using the backward Euler method in time and a typical finite element method in space, which gives

M_{m}\frac{\mathbf{y}_{m}^{(k+1)}-\mathbf{y}_{m}^{(k)}}{\tau}+K_{m}\mathbf{y}_{m}^{(k+1)}=M_{m}\left(\mathbf{f}_{m}^{(k+1)}+\frac{1}{\gamma}\mathbf{p}_{m}^{(k)}\right),
-M_{m}\frac{\mathbf{p}_{m}^{(k+1)}-\mathbf{p}_{m}^{(k)}}{\tau}+K_{m}\mathbf{p}_{m}^{(k)}=M_{m}\left(\mathbf{g}_{m}^{(k)}-\mathbf{y}_{m}^{(k+1)}\right).

Combining the given initial and boundary conditions, one needs to solve the following linear system

(4) \widetilde{\mathcal{A}}\begin{bmatrix}\mathbf{y}\\ \mathbf{p}\end{bmatrix}=\begin{bmatrix}\mathbf{g}\\ \mathbf{f}\end{bmatrix},

where $\mathbf{y}=[\mathbf{y}_{m}^{(1)},\dots,\mathbf{y}_{m}^{(n)}]^{\top}$, $\mathbf{p}=[\mathbf{p}_{m}^{(0)},\dots,\mathbf{p}_{m}^{(n-1)}]^{\top}$,

\mathbf{f}=\begin{bmatrix}M_{m}\tau\mathbf{f}_{m}^{(1)}+M_{m}\mathbf{y}_{m}^{(0)}\\ M_{m}\tau\mathbf{f}_{m}^{(2)}\\ \vdots\\ M_{m}\tau\mathbf{f}_{m}^{(n)}\end{bmatrix},\quad\mathbf{g}=\tau\begin{bmatrix}M_{m}\mathbf{g}_{m}^{(0)}\\ M_{m}\mathbf{g}_{m}^{(1)}\\ \vdots\\ M_{m}\mathbf{g}_{m}^{(n-1)}\end{bmatrix},
(5) \widetilde{\mathcal{A}}=\begin{bmatrix}\tau I_{n}\otimes M_{m}&B_{n}^{\top}\otimes M_{m}+\tau I_{n}\otimes K_{m}\\ B_{n}\otimes M_{m}+\tau I_{n}\otimes K_{m}&-\frac{\tau}{\gamma}I_{n}\otimes M_{m}\end{bmatrix},

and the matrix $B_{n}\in\mathbb{R}^{n\times n}$ is as follows:

(6) B_{n}=\begin{bmatrix}1&&&&\\ -1&1&&&\\ &-1&1&&\\ &&\ddots&\ddots&\\ &&&-1&1\end{bmatrix}.

The matrices $M_{m}$ and $K_{m}$ represent the mass matrix and the stiffness matrix, respectively. For the finite difference method, we have $M_{m}=I_{m}$ and $K_{m}=-L_{m}$, where $-L_{m}$ is the discretization matrix of the negative Laplacian. Throughout, the following conditions on both $M_{m}$ and $K_{m}$ are assumed, which are in line with [17, Section 2]:

  1. The matrices $M_{m}$ and $K_{m}$ are (real) symmetric positive definite;

  2. The matrices $M_{m}$ and $K_{m}$ are sparse, having only $\mathcal{O}(m)$ nonzero elements;

  3. The matrix $M_{m}$ has a condition number uniformly upper bounded with respect to $m$, i.e., there exists a positive finite constant $c_{0}$ independent of $m$ and $n$ such that

    (7) \sup_{m>0}\kappa_{2}(M_{m})\leq c_{0}<+\infty,

    where $\kappa_{2}(M_{m}):=\|M_{m}\|_{2}\|M_{m}^{-1}\|_{2}$ denotes the condition number of $M_{m}$.

In fact, these assumptions are easily met by applying a typical finite element discretization to the spatial operator $\mathcal{L}$.

We then further transform (5) into the following equivalent system

(8) \mathcal{A}\underbrace{\begin{bmatrix}\sqrt{\gamma}\,\mathbf{y}\\ \mathbf{p}\end{bmatrix}}_{:=\mathbf{x}}=\underbrace{\begin{bmatrix}\mathbf{g}\\ -\sqrt{\gamma}\,\mathbf{f}\end{bmatrix}}_{:=\mathbf{b}},

where

(9) \mathcal{A}=\begin{bmatrix}\alpha I_{n}\otimes M_{m}&B_{n}^{\top}\otimes M_{m}+\tau I_{n}\otimes K_{m}\\ -(B_{n}\otimes M_{m}+\tau I_{n}\otimes K_{m})&\alpha I_{n}\otimes M_{m}\end{bmatrix}=\begin{bmatrix}\alpha I_{n}\otimes M_{m}&\mathcal{T}^{\top}\\ -\mathcal{T}&\alpha I_{n}\otimes M_{m}\end{bmatrix}.

Note that $\alpha=\frac{\tau}{\sqrt{\gamma}}$, $I_{k}$ denotes the $k\times k$ identity matrix, and

(10) \mathcal{T}=B_{n}\otimes M_{m}+\tau I_{n}\otimes K_{m}.
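To make the Kronecker-product structure of (9)-(10) concrete, the following is a minimal Python sketch that assembles $\mathcal{T}$ and $\mathcal{A}$ for the finite difference case $M_{m}=I_{m}$, $K_{m}=-L_{m}$ with a one-dimensional Laplacian. All sizes and parameter values below are illustrative assumptions, not settings taken from this paper.

    import numpy as np
    import scipy.sparse as sp

    # Illustrative sizes/parameters (assumed, not from the paper).
    m, n, T, gamma = 8, 16, 1.0, 1e-4
    tau = T / n                      # temporal step size
    alpha = tau / np.sqrt(gamma)     # alpha = tau / sqrt(gamma)
    h = 1.0 / (m + 1)

    K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m)) / h**2  # K_m = -L_m (SPD)
    M = sp.identity(m)                                                # M_m = I_m
    B = sp.identity(n) - sp.eye(n, k=-1)                              # B_n in (6)

    Tmat = sp.kron(B, M) + tau * sp.kron(sp.identity(n), K)           # T in (10)
    A = sp.bmat([[alpha * sp.kron(sp.identity(n), M), Tmat.T],
                 [-Tmat, alpha * sp.kron(sp.identity(n), M)]], format="csr")  # A in (9)

The matrix A is sparse with two bi-diagonal block Toeplitz factors, so a matrix-vector product costs $\mathcal{O}(mn)$ operations, as discussed in Section 2.1.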

In what follows, we will develop a preconditioned generalized minimal residual (GMRES) method for the nonsymmetric system (8).

For $\mathcal{A}$, we first consider the following rotated block diagonal (RBD) preconditioner proposed in [4]:

(11) \mathcal{P}=\frac{1}{2}\begin{bmatrix}\mathcal{T}^{\top}+\alpha I_{n}\otimes M_{m}&\\ &\mathcal{T}+\alpha I_{n}\otimes M_{m}\end{bmatrix}\begin{bmatrix}I_{mn}&I_{mn}\\ -I_{mn}&I_{mn}\end{bmatrix},

where $\mathcal{T}$ is defined in (10). As will be shown in Theorem 2.4, the matrix $\mathcal{P}$ can achieve an excellent preconditioning effect for $\mathcal{A}$. Yet, $\mathcal{P}$ is in general computationally expensive to invert. Thus, we propose the following PinT preconditioner (referred to as the RBD $\epsilon$-circulant preconditioner) as a modification of $\mathcal{P}$, which can be efficiently implemented:

(12) \mathcal{P}_{\epsilon}=\frac{1}{2}\begin{bmatrix}\mathcal{C}^{\top}_{\epsilon}+\alpha I_{n}\otimes M_{m}&\\ &\mathcal{C}_{\epsilon}+\alpha I_{n}\otimes M_{m}\end{bmatrix}\begin{bmatrix}I_{mn}&I_{mn}\\ -I_{mn}&I_{mn}\end{bmatrix},

where

\mathcal{C}_{\epsilon}=C_{\epsilon,n}\otimes M_{m}+\tau I_{n}\otimes K_{m}.

Note that $C_{\epsilon,n}$ is defined as

(13) C_{\epsilon,n}=\begin{bmatrix}1&&&&-\epsilon\\ -1&1&&&\\ &-1&1&&\\ &&\ddots&\ddots&\\ &&&-1&1\end{bmatrix},

and $\epsilon\in(0,1]$. Since $C_{\epsilon,n}$ is an $\epsilon$-circulant matrix, it admits the following decomposition [7, 6]

(14) C_{\epsilon,n}=D_{\epsilon}^{-1}\mathbb{F}_{n}\Lambda_{\epsilon,n}\mathbb{F}_{n}^{*}D_{\epsilon},

where

D_{\epsilon}={\rm diag}\left(\epsilon^{\frac{i-1}{n}}\right)_{i=1}^{n},
\mathbb{F}_{n}=\frac{1}{\sqrt{n}}\left[\theta_{n}^{(i-1)(j-1)}\right]_{i,j=1}^{n}\quad\textrm{with}\quad\theta_{n}=\exp\left(\frac{2\pi\mathbf{i}}{n}\right),

and

\Lambda_{\epsilon,n}={\rm diag}\left(\lambda_{i-1}^{(\epsilon)}\right)_{i=1}^{n}\quad\textrm{with}\quad\lambda_{k}^{(\epsilon)}=1-\epsilon^{\frac{1}{n}}\theta_{n}^{-k}.
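As a sanity check, the decomposition (14) can be verified numerically. The snippet below is a minimal sketch with assumed small values of $n$ and $\epsilon$, building $\mathbb{F}_{n}$ densely for clarity.

    import numpy as np

    n, eps = 6, 0.3                                  # assumed illustrative values
    C = np.eye(n) - np.eye(n, k=-1)
    C[0, -1] = -eps                                  # C_{eps,n} in (13)

    D = np.diag(eps ** (np.arange(n) / n))           # D_eps
    F = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)
    lam = 1 - eps ** (1 / n) * np.exp(-2j * np.pi * np.arange(n) / n)  # lambda_k

    # Check C = D^{-1} F Lambda F^* D as in (14).
    print(np.allclose(C, np.linalg.inv(D) @ F @ np.diag(lam) @ F.conj().T @ D))  # True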

Several efficient preconditioned iterative solvers have been proposed for (5). One notable class of such methods is the circulant-based preconditioners [31, 8, 10], whose main idea is to replace the Toeplitz matrix $B_{n}$ defined in (6) by a circulant or skew-circulant matrix in order to form an efficient preconditioner. Despite their numerically observed success, these preconditioners have not been theoretically shown to be parameter-robust, due to the low-rank approximation inherent in the circulant-based approach.

Another major class of existing methods is preconditioning techniques based on Schur complement approximations [30, 14, 18], a typical preconditioning approach in the context of saddle point systems (see, e.g., [26, 28]), whose effectiveness rests on approximating the Schur complements by multiplications of easily invertible matrices (e.g., [25, 14, 2, 24]). Recently, $\epsilon$-circulant modifications have been introduced to such approximate-Schur-complement preconditioning techniques for the optimal control problems, which leads to a significant improvement in computational efficiency and parallelism along the time dimension (see, e.g., [17, 20, 15, 16, 30, 21]).

Our proposed preconditioning method, while still based on $\epsilon$-circulant matrices, diverges from the approximate-Schur-complement approach. Specifically, the Toeplitz matrix $B_{n}$ is replaced by the $\epsilon$-circulant matrix detailed in (13). An important advancement of our method is that it achieves an optimal convergence rate when $\epsilon=\mathcal{O}(\tau^{1/2})$. This contrasts with $\epsilon$-circulant modified approximate-Schur-complement preconditioners, such as the one described in [18], which requires the more restrictive choice $\epsilon=\mathcal{O}(\tau^{2})$. Additionally, while the other $\epsilon$-circulant modified approximate-Schur-complement approach in [30] also allows for $\epsilon=\mathcal{O}(\tau^{1/2})$, it imposes the theoretical constraint that $n$ must be an even number, a limitation our method does not share.

Remark 1.1.

In addition to the RBD preconditioner of interest in this work, other effective preconditioners, such as the preconditioned square block matrix [1], could potentially be combined with the $\epsilon$-circulant preconditioning technique. We have chosen the RBD preconditioner due to its simple form: its inversion is mainly determined by that of the block diagonal matrices $\mathcal{C}_{\epsilon}+\alpha I_{n}\otimes M_{m}$ and $\mathcal{C}^{\top}_{\epsilon}+\alpha I_{n}\otimes M_{m}$. Such an inversion can be straightforwardly implemented in a PinT way.

The paper is organized as follows. In Section 2, we present our main results on the spectral analysis of the proposed preconditioner. Numerical examples supporting the performance of our proposed preconditioner are given in Section 3. Lastly, conclusions are given in Section 4.

2 Main Result

Before presenting our main preconditioning result, we introduce some useful notation in what follows.

Let

\mathcal{G}=\frac{1}{2}\begin{bmatrix}I_{mn}&I_{mn}\\ -I_{mn}&I_{mn}\end{bmatrix},\qquad\mathcal{H}_{\epsilon}=\begin{bmatrix}\mathcal{C}_{\epsilon}^{\top}+\alpha I_{n}\otimes M_{m}&\\ &\mathcal{C}_{\epsilon}+\alpha I_{n}\otimes M_{m}\end{bmatrix},

and

\mathcal{B}=\begin{bmatrix}\mathcal{T}^{\top}+\alpha I_{n}\otimes M_{m}&\mathcal{T}^{\top}-\alpha I_{n}\otimes M_{m}\\ -\mathcal{T}+\alpha I_{n}\otimes M_{m}&\mathcal{T}+\alpha I_{n}\otimes M_{m}\end{bmatrix}.

Then, it is straightforward to verify that

\mathcal{A}=\mathcal{B}\mathcal{G},\qquad\mathcal{P}_{\epsilon}=\mathcal{H}_{\epsilon}\mathcal{G}.

To check the invertibility of $\mathcal{P}_{\epsilon}$, it suffices to verify the invertibility of $\mathcal{G}$ and $\mathcal{H}_{\epsilon}$, respectively. Clearly, $\mathcal{G}$ is invertible. By (14), it is straightforward to see that $\mathcal{C}_{\epsilon}+\alpha I_{n}\otimes M_{m}$ is similar to a block diagonal matrix with each diagonal block having the form $\alpha M_{m}+z_{1}I_{m}+z_{2}\mathbf{i}I_{m}$ ($z_{1}\geq 0$, $z_{2}\in\mathbb{R}$). Consequently, $\mathcal{C}_{\epsilon}+\alpha I_{n}\otimes M_{m}$ is invertible. Moreover, $\mathcal{C}_{\epsilon}^{\top}+\alpha I_{n}\otimes M_{m}$, as the transpose of $\mathcal{C}_{\epsilon}+\alpha I_{n}\otimes M_{m}$, is also invertible. Thus, $\mathcal{H}_{\epsilon}$, the block diagonal matrix with $\mathcal{C}_{\epsilon}^{\top}+\alpha I_{n}\otimes M_{m}$ and $\mathcal{C}_{\epsilon}+\alpha I_{n}\otimes M_{m}$ as its diagonal blocks, is also invertible. Therefore, $\mathcal{P}_{\epsilon}$ is invertible.

In this work, we propose the use of the matrix $\mathcal{P}_{\epsilon}$ to precondition the saddle point system described in (8). In other words, we shall apply the GMRES solver to the following preconditioned system

(15) \mathcal{P}_{\epsilon}^{-1}\mathcal{A}\mathbf{x}=\mathcal{P}_{\epsilon}^{-1}\mathbf{b}.

We will focus on investigating the convergence rate of the GMRES solver for the preconditioned system above, by theoretically estimating a suitable choice of $\epsilon$.

Denote

\mathcal{W}=\begin{bmatrix}I_{n}\otimes M_{m}&\\ &I_{n}\otimes M_{m}\end{bmatrix},\qquad\widetilde{\mathcal{T}}=B_{n}\otimes I_{m}+\tau I_{n}\otimes(M_{m}^{-\frac{1}{2}}K_{m}M_{m}^{-\frac{1}{2}}),
\widetilde{\mathcal{B}}=\begin{bmatrix}\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn}&\widetilde{\mathcal{T}}^{\top}-\alpha I_{mn}\\ -\widetilde{\mathcal{T}}+\alpha I_{mn}&\widetilde{\mathcal{T}}+\alpha I_{mn}\end{bmatrix},\qquad\widetilde{\mathcal{C}}_{\epsilon}=C_{\epsilon,n}\otimes I_{m}+\tau I_{n}\otimes(M_{m}^{-\frac{1}{2}}K_{m}M_{m}^{-\frac{1}{2}}),
\widetilde{\mathcal{H}}_{\epsilon}=\begin{bmatrix}\widetilde{\mathcal{C}}_{\epsilon}^{\top}+\alpha I_{mn}&\\ &\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn}\end{bmatrix},\qquad\widetilde{\mathcal{P}}_{\epsilon}=\widetilde{\mathcal{H}}_{\epsilon}\mathcal{G}.

It is then easy to see that

\mathcal{T}=\mathcal{W}^{\frac{1}{2}}\widetilde{\mathcal{T}}\mathcal{W}^{\frac{1}{2}},\quad\mathcal{B}=\mathcal{W}^{\frac{1}{2}}\widetilde{\mathcal{B}}\mathcal{W}^{\frac{1}{2}},\quad\mathcal{C}_{\epsilon}=\mathcal{W}^{\frac{1}{2}}\widetilde{\mathcal{C}}_{\epsilon}\mathcal{W}^{\frac{1}{2}},\quad\mathcal{H}_{\epsilon}=\mathcal{W}^{\frac{1}{2}}\widetilde{\mathcal{H}}_{\epsilon}\mathcal{W}^{\frac{1}{2}}.

With the equations above, it is straightforward to verify that (15) can be rewritten as

\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\mathcal{W}^{\frac{1}{2}}\mathcal{G}\mathbf{x}=\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\widetilde{\mathcal{H}}_{\epsilon}^{-1}\mathcal{W}^{-\frac{1}{2}}\mathbf{b},

meaning that (15) can be equivalently converted into the following system

(16) \widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\tilde{\mathbf{x}}=\tilde{\mathbf{b}},

where $\tilde{\mathbf{b}}=\widetilde{\mathcal{H}}_{\epsilon}^{-1}\mathcal{W}^{-\frac{1}{2}}\mathbf{b}$; the unknown $\tilde{\mathbf{x}}$ in (16) and the unknown $\mathbf{x}$ in (15) are related by

\tilde{\mathbf{x}}=\mathcal{W}^{\frac{1}{2}}\mathcal{G}\mathbf{x}.

We will examine the theoretical convergence rate of the GMRES solver for the linear system (16) and subsequently relate it to the convergence rate of the GMRES solver for (15) using Theorem 2.2.

The convergence behavior of the GMRES solver is closely related to the Krylov subspace. For a square matrix $\mathbf{E}\in\mathbb{R}^{k\times k}$ and a vector $\mathbf{y}\in\mathbb{R}^{k\times 1}$, a Krylov subspace of degree $j\geq 1$ is defined as follows:

\mathcal{K}_{j}(\mathbf{E},\mathbf{y}):={\rm span}\{\mathbf{y},\mathbf{E}\mathbf{y},\mathbf{E}^{2}\mathbf{y},\dots,\mathbf{E}^{j-1}\mathbf{y}\}.

For a set $\mathcal{S}$ and a point $z$, we denote

z+\mathcal{S}:=\{z+y\,|\,y\in\mathcal{S}\}.

We recall the relation between the iterative solution by GMRES and the Krylov subspace in the following lemma.

Lemma 2.1.

For a nonsingular $k\times k$ real linear system $\mathbf{Z}\mathbf{y}=\mathbf{w}$, let $\mathbf{y}_{j}$ be the GMRES iterative solution at the $j$-th iteration ($j\geq 1$) with $\mathbf{y}_{0}$ as an initial guess. Then, the $j$-th iterate $\mathbf{y}_{j}$ minimizes the residual error over the Krylov subspace $\mathcal{K}_{j}(\mathbf{Z},\mathbf{r}_{0})$ with $\mathbf{r}_{0}=\mathbf{w}-\mathbf{Z}\mathbf{y}_{0}$, i.e.,

\mathbf{y}_{j}=\underset{\mathbf{c}\in\mathbf{y}_{0}+\mathcal{K}_{j}(\mathbf{Z},\mathbf{r}_{0})}{\arg\min}\|\mathbf{w}-\mathbf{Z}\mathbf{c}\|_{2}.

Theorem 2.2.

Let $\tilde{\mathbf{x}}_{0}$ be an arbitrary initial guess for (16), and let $\mathbf{x}_{0}:=\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{x}}_{0}$ be the initial guess for (15). Let $\mathbf{x}_{j}$ ($\tilde{\mathbf{x}}_{j}$, respectively) be the $j$-th ($j\geq 1$) iterate derived by applying the GMRES solver to (15) ((16), respectively) with $\mathbf{x}_{0}$ ($\tilde{\mathbf{x}}_{0}$, respectively) as the initial guess. Then,

\|\mathbf{r}_{j}\|_{2}\leq\sqrt{2}\,\|\mathcal{W}^{-\frac{1}{2}}\|_{2}\,\|\tilde{\mathbf{r}}_{j}\|_{2},\quad j=1,2,\dots,

where $\mathbf{r}_{j}:=\mathcal{P}_{\epsilon}^{-1}\mathbf{b}-\mathcal{P}_{\epsilon}^{-1}\mathcal{A}\mathbf{x}_{j}$ ($\tilde{\mathbf{r}}_{j}:=\tilde{\mathbf{b}}-\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\tilde{\mathbf{x}}_{j}$, respectively) denotes the residual vector at the $j$-th GMRES iteration for (15) ((16), respectively).

Proof 2.3.

Note that

\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{x}}_{j}-\mathbf{x}_{0}=\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{x}}_{j}-\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{x}}_{0}=\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}(\tilde{\mathbf{x}}_{j}-\tilde{\mathbf{x}}_{0}).

By the definition of $\tilde{\mathbf{x}}_{j}$ and $\tilde{\mathbf{x}}_{0}$, we know that $\tilde{\mathbf{x}}_{j}-\tilde{\mathbf{x}}_{0}\in\mathcal{K}_{j}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}},\tilde{\mathbf{r}}_{0})$ with $\tilde{\mathbf{r}}_{0}=\tilde{\mathbf{b}}-\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\tilde{\mathbf{x}}_{0}$, meaning that there exist real numbers $\eta_{0},\eta_{1},\dots,\eta_{j-1}$ such that

\tilde{\mathbf{x}}_{j}-\tilde{\mathbf{x}}_{0}=\sum_{k=0}^{j-1}\eta_{k}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}})^{k}\tilde{\mathbf{r}}_{0}
=\sum_{k=0}^{j-1}\eta_{k}\mathcal{W}^{\frac{1}{2}}\mathcal{G}(\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\mathcal{W}^{\frac{1}{2}}\mathcal{G})^{k}\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{r}}_{0}
=\sum_{k=0}^{j-1}\eta_{k}\mathcal{W}^{\frac{1}{2}}\mathcal{G}(\mathcal{P}_{\epsilon}^{-1}\mathcal{A})^{k}\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{r}}_{0}
=\sum_{k=0}^{j-1}\eta_{k}\mathcal{W}^{\frac{1}{2}}\mathcal{G}(\mathcal{P}_{\epsilon}^{-1}\mathcal{A})^{k}(\mathcal{P}_{\epsilon}^{-1}\mathbf{b}-\mathcal{P}_{\epsilon}^{-1}\mathcal{A}\mathbf{x}_{0})
=\mathcal{W}^{\frac{1}{2}}\mathcal{G}\sum_{k=0}^{j-1}\eta_{k}(\mathcal{P}_{\epsilon}^{-1}\mathcal{A})^{k}\mathbf{r}_{0}.

Therefore, we have $\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}(\tilde{\mathbf{x}}_{j}-\tilde{\mathbf{x}}_{0})\in\mathcal{K}_{j}(\mathcal{P}_{\epsilon}^{-1}\mathcal{A},\mathbf{r}_{0})$. Hence,

\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{x}}_{j}-\mathbf{x}_{0}\in\mathcal{K}_{j}(\mathcal{P}_{\epsilon}^{-1}\mathcal{A},\mathbf{r}_{0}).

In other words, $\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{x}}_{j}\in\mathbf{x}_{0}+\mathcal{K}_{j}(\mathcal{P}_{\epsilon}^{-1}\mathcal{A},\mathbf{r}_{0})$. Then, Lemma 2.1 implies that

\|\mathbf{r}_{j}\|_{2}=\|\mathcal{P}_{\epsilon}^{-1}\mathbf{b}-\mathcal{P}_{\epsilon}^{-1}\mathcal{A}\mathbf{x}_{j}\|_{2}
\leq\|\mathcal{P}_{\epsilon}^{-1}\mathbf{b}-\mathcal{P}_{\epsilon}^{-1}\mathcal{A}\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{x}}_{j}\|_{2}
=\|\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\mathcal{W}^{-\frac{1}{2}}\mathbf{b}-\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\tilde{\mathbf{x}}_{j})\|_{2}
=\|\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{r}}_{j}\|_{2}\leq\|\mathcal{G}^{-1}\|_{2}\|\mathcal{W}^{-\frac{1}{2}}\|_{2}\|\tilde{\mathbf{r}}_{j}\|_{2}=\sqrt{2}\,\|\mathcal{W}^{-\frac{1}{2}}\|_{2}\|\tilde{\mathbf{r}}_{j}\|_{2}.

The proof is complete.

According to Theorem 2.2, any convergence rate estimate for the GMRES solver applied to (16) is also applicable to (15). Namely, the GMRES convergence for (15) is guaranteed by its convergence for (16). As a result, in the following discussion, we will focus on investigating the convergence of the GMRES solver for (16) by first examining the spectrum of its coefficient matrix $\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}$.

Denote

\widetilde{\mathcal{H}}=\begin{bmatrix}\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn}&\\ &\widetilde{\mathcal{T}}+\alpha I_{mn}\end{bmatrix}.

Clearly, $\widetilde{\mathcal{H}}_{\epsilon}$ is an approximation to $\widetilde{\mathcal{H}}$. Next, we study the spectrum of $\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}$ and then relate the spectrum of $\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}$ to it, using the fact that $\lim_{\epsilon\rightarrow 0^{+}}\widetilde{\mathcal{H}}_{\epsilon}=\widetilde{\mathcal{H}}$.

The following theorem shows that $\widetilde{\mathcal{H}}$ is an ideal preconditioner for $\widetilde{\mathcal{B}}$. We remark that the proof mirrors that of [4, Theorem 4.1], with modifications. Nevertheless, for self-containedness, we provide a complete proof below.

Denote by $\sigma(\mathbf{C})$ the spectrum of a square matrix $\mathbf{C}$.

Theorem 2.4.

The matrix $\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}$ is unitarily diagonalizable and its eigenvalues satisfy $\sigma(\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}})\subseteq\{1+\mathbf{i}x\,|\,x\in[-1,1]\}$, where $\mathbf{i}=\sqrt{-1}$ is the imaginary unit.

Proof 2.5.

Consider

\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}=\begin{bmatrix}\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn}&\\ &\widetilde{\mathcal{T}}+\alpha I_{mn}\end{bmatrix}^{-1}\begin{bmatrix}\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn}&\widetilde{\mathcal{T}}^{\top}-\alpha I_{mn}\\ -\widetilde{\mathcal{T}}+\alpha I_{mn}&\widetilde{\mathcal{T}}+\alpha I_{mn}\end{bmatrix}
=\begin{bmatrix}I_{mn}&(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}^{\top}-\alpha I_{mn})\\ (\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}(-\widetilde{\mathcal{T}}+\alpha I_{mn})&I_{mn}\end{bmatrix}
=\begin{bmatrix}I_{mn}&(\widetilde{\mathcal{T}}^{\top}-\alpha I_{mn})(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}\\ (\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}(-\widetilde{\mathcal{T}}+\alpha I_{mn})&I_{mn}\end{bmatrix},

where the last equality comes from the following fact

(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}^{\top}-\alpha I_{mn})
=(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}^{\top}-\alpha I_{mn})[(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}]
=(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}[(\widetilde{\mathcal{T}}^{\top}-\alpha I_{mn})(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})](\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}
=(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}[(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})(\widetilde{\mathcal{T}}^{\top}-\alpha I_{mn})](\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}
=(\widetilde{\mathcal{T}}^{\top}-\alpha I_{mn})(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}.

Consider $\widetilde{\mathcal{E}}=(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}(-\widetilde{\mathcal{T}}+\alpha I_{mn})$ with its singular value decomposition $\widetilde{\mathcal{E}}=\mathcal{U}\Sigma\mathcal{V}^{*}$. Then, we can further decompose $\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}$ as follows:

\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}=\begin{bmatrix}I_{mn}&-\widetilde{\mathcal{E}}^{\top}\\ \widetilde{\mathcal{E}}&I_{mn}\end{bmatrix}=\begin{bmatrix}\mathcal{V}&\\ &\mathcal{U}\end{bmatrix}\begin{bmatrix}I_{mn}&-\Sigma^{*}\\ \Sigma&I_{mn}\end{bmatrix}\begin{bmatrix}\mathcal{V}^{*}&\\ &\mathcal{U}^{*}\end{bmatrix}.

Denote by

\mathcal{Q}=\frac{1}{\sqrt{2}}\begin{bmatrix}I_{mn}&I_{mn}\\ -\mathbf{i}I_{mn}&\mathbf{i}I_{mn}\end{bmatrix},

where $\mathbf{i}=\sqrt{-1}$ is the imaginary unit. Note that $\mathcal{Q}$ is a unitary matrix. It holds that

\begin{bmatrix}I_{mn}&-\Sigma^{*}\\ \Sigma&I_{mn}\end{bmatrix}=\mathcal{Q}\begin{bmatrix}I_{mn}+\mathbf{i}\Sigma&\\ &I_{mn}-\mathbf{i}\Sigma\end{bmatrix}\mathcal{Q}^{*}.

Denote $\mathcal{X}=\begin{bmatrix}\mathcal{V}&\\ &\mathcal{U}\end{bmatrix}\mathcal{Q}$. Consequently, it follows that

\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}=\begin{bmatrix}\mathcal{V}&\\ &\mathcal{U}\end{bmatrix}\mathcal{Q}\begin{bmatrix}I_{mn}+\mathbf{i}\Sigma&\\ &I_{mn}-\mathbf{i}\Sigma\end{bmatrix}\mathcal{Q}^{*}\begin{bmatrix}\mathcal{V}^{*}&\\ &\mathcal{U}^{*}\end{bmatrix}=\mathcal{X}\begin{bmatrix}I_{mn}+\mathbf{i}\Sigma&\\ &I_{mn}-\mathbf{i}\Sigma\end{bmatrix}\mathcal{X}^{*}.

Clearly, the eigenvalues of $\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}$ are located in $\{1+\mathbf{i}x\,|\,x\in[-\|\widetilde{\mathcal{E}}\|_{2},\|\widetilde{\mathcal{E}}\|_{2}]\}\subset\mathbb{C}$. As the matrix $\widetilde{\mathcal{T}}$ is positive definite (i.e., $\widetilde{\mathcal{T}}+\widetilde{\mathcal{T}}^{\top}$ is Hermitian positive definite), we know from [3] that $\|\widetilde{\mathcal{E}}\|_{2}<1$. Therefore,

\sigma(\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}})\subseteq\{1+\mathbf{i}x\,|\,x\in[-\|\widetilde{\mathcal{E}}\|_{2},\|\widetilde{\mathcal{E}}\|_{2}]\}\subseteq\{1+\mathbf{i}x\,|\,x\in[-1,1]\}.

Moreover, it is trivial to see that $\mathcal{X}$ is unitary. Hence, $\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}$ is unitarily diagonalizable. The proof is complete.

It is straightforward to verify that $\widetilde{\mathcal{T}}+\alpha I_{mn}$ can be rewritten as

\widetilde{\mathcal{T}}+\alpha I_{mn}=\begin{bmatrix}T_{0}&&&\\ -I_{m}&T_{0}&&\\ &\ddots&\ddots&\\ &&-I_{m}&T_{0}\end{bmatrix},\quad T_{0}=(1+\alpha)I_{m}+\tau M_{m}^{-\frac{1}{2}}K_{m}M_{m}^{-\frac{1}{2}},

and that $(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}$ has the following expression:

(21) (\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}=\begin{bmatrix}T_{0}^{-1}&&&\\ T_{0}^{-2}&T_{0}^{-1}&&\\ \vdots&\ddots&\ddots&\\ T_{0}^{-n}&\cdots&T_{0}^{-2}&T_{0}^{-1}\end{bmatrix}.

Denote

\mathcal{Z}_{\epsilon}:=\epsilon^{-1}(I_{m}-\epsilon T_{0}^{-n}).
Lemma 2.6.

For $\epsilon\in(0,1]$, both $\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn}$ and $\mathcal{Z}_{\epsilon}$ are invertible. Also, we have

(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})=I_{mn}+(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}E_{1}\mathcal{Z}_{\epsilon}^{-1}E_{n}^{\top}.

Proof 2.7.

It is clear that

\sigma(T_{0})=\sigma\left((\alpha+1)I_{m}+\tau M_{m}^{-\frac{1}{2}}K_{m}M_{m}^{-\frac{1}{2}}\right)\subset(1,+\infty).

Hence, $\sigma(T_{0}^{-1})\subset(0,1)$. Using $\epsilon\in(0,1]$, we have $\sigma(I_{m}-\epsilon T_{0}^{-n})\subset(0,1)$, implying that $\mathcal{Z}_{\epsilon}$ is invertible.

Note that $\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn}=\widetilde{\mathcal{T}}+\alpha I_{mn}-\epsilon E_{1}E_{n}^{\top}$, where $E_{i}=e_{i}\otimes I_{m}$ with $e_{i}$ denoting the $i$-th column of $I_{n}$ and $\otimes$ denoting the Kronecker product. Then, by (21), we see that

\epsilon[I_{m}-\epsilon E_{n}^{\top}(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}E_{1}]^{-1}=\epsilon(I_{m}-\epsilon T_{0}^{-n})^{-1}=\mathcal{Z}_{\epsilon}^{-1}.

By the Sherman–Morrison–Woodbury formula, we have

(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}=(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}+(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}E_{1}\mathcal{Z}_{\epsilon}^{-1}E_{n}^{\top}(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}.

Therefore,

(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})=I_{mn}+(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}E_{1}\mathcal{Z}_{\epsilon}^{-1}E_{n}^{\top}.

The proof is complete.
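For readers who want a quick numerical confirmation of Lemma 2.6, the following sketch checks the identity on a small randomly generated example; the sizes, parameters, and the SPD stand-in for $M_{m}^{-\frac{1}{2}}K_{m}M_{m}^{-\frac{1}{2}}$ are assumptions made purely for illustration.

    import numpy as np

    m, n, alpha, tau, eps = 3, 5, 0.7, 0.1, 0.4      # assumed illustrative values
    rng = np.random.default_rng(0)
    S = rng.standard_normal((m, m)); S = S @ S.T + m * np.eye(m)  # SPD stand-in
    T0 = (1 + alpha) * np.eye(m) + tau * S

    I_m, I_n = np.eye(m), np.eye(n)
    B = np.eye(n) - np.eye(n, k=-1)
    C = B.copy(); C[0, -1] = -eps
    Tt = np.kron(B, I_m) + tau * np.kron(I_n, S) + alpha * np.eye(m * n)  # T~ + alpha I
    Ct = np.kron(C, I_m) + tau * np.kron(I_n, S) + alpha * np.eye(m * n)  # C~_eps + alpha I

    E1 = np.kron(I_n[:, [0]], I_m)                   # E_1 = e_1 (x) I_m
    En = np.kron(I_n[:, [-1]], I_m)                  # E_n = e_n (x) I_m
    Z = (np.eye(m) - eps * np.linalg.matrix_power(np.linalg.inv(T0), n)) / eps

    lhs = np.linalg.solve(Ct, Tt)
    rhs = np.eye(m * n) + np.linalg.solve(Tt, E1) @ np.linalg.solve(Z, En.T)
    print(np.allclose(lhs, rhs))                     # True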

Remark 2.8.

Similar to Lemma 2.6, one can also show that the matrix $\widetilde{\mathcal{C}}_{\epsilon}^{\top}+\alpha I_{mn}$ is invertible for $\epsilon\in(0,1]$ and

(\widetilde{\mathcal{C}}_{\epsilon}^{\top}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})=I_{mn}+(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}E_{n}\mathcal{Z}_{\epsilon}^{-1}E_{1}^{\top}.

Now, $\widetilde{\mathcal{H}}_{\epsilon}={\rm blockdiag}(\widetilde{\mathcal{C}}_{\epsilon}^{\top}+\alpha I_{mn},\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})$, which is clearly invertible.

Denote by $\lambda_{\min}(\mathbf{C})$ the minimum eigenvalue of a Hermitian matrix $\mathbf{C}$.

Theorem 2.9.

The following statements regarding $\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}$ hold.

  1. (i) ${\rm rank}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}-I_{2mn})=2m$. Furthermore, $\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}$ has exactly $2(n-1)m$ eigenvalues equal to $1$.

  2. (ii) Given any constant $\eta\in(0,1)$, for $\epsilon\in(0,\eta]$, we have

    \max_{\lambda\in\sigma(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}})}|\lambda-1|\leq\frac{\epsilon}{1-\eta}.

Proof 2.10.

Using Lemma 2.6 and Remark 2.8, we have

\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}-I_{2mn}=\begin{bmatrix}(\widetilde{\mathcal{C}}_{\epsilon}^{\top}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})&\\ &(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})\end{bmatrix}-I_{2mn}
=\begin{bmatrix}(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}E_{n}\mathcal{Z}_{\epsilon}^{-1}E_{1}^{\top}&\\ &(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}E_{1}\mathcal{Z}_{\epsilon}^{-1}E_{n}^{\top}\end{bmatrix}.

Since ${\rm rank}\left((\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}E_{n}\mathcal{Z}_{\epsilon}^{-1}E_{1}^{\top}\right)=m$ and ${\rm rank}\left((\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}E_{1}\mathcal{Z}_{\epsilon}^{-1}E_{n}^{\top}\right)=m$, we know that ${\rm rank}\left(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}-I_{2mn}\right)=2m$. Moreover, it is straightforward to check that

(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}E_{n}\mathcal{Z}_{\epsilon}^{-1}E_{1}^{\top}=\begin{bmatrix}T_{0}^{-n}\mathcal{Z}_{\epsilon}^{-1}&&&\\ T_{0}^{-n+1}\mathcal{Z}_{\epsilon}^{-1}&&&\\ \vdots&&&\\ T_{0}^{-1}\mathcal{Z}_{\epsilon}^{-1}&&&\end{bmatrix},
(22) (\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}E_{1}\mathcal{Z}_{\epsilon}^{-1}E_{n}^{\top}=\begin{bmatrix}&&&T_{0}^{-1}\mathcal{Z}_{\epsilon}^{-1}\\ &&&T_{0}^{-2}\mathcal{Z}_{\epsilon}^{-1}\\ &&&\vdots\\ &&&T_{0}^{-n}\mathcal{Z}_{\epsilon}^{-1}\end{bmatrix}.

Then, we see that $\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}-I_{2mn}$ has exactly $2(n-1)m$ zero eigenvalues and thus $\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}$ has exactly $2(n-1)m$ eigenvalues equal to $1$.

From the discussion above, we know that

\sigma(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}})=\{1\}\cup\sigma(I_{mn}+(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}E_{1}\mathcal{Z}_{\epsilon}^{-1}E_{n}^{\top})\cup\sigma(I_{mn}+(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})^{-1}E_{n}\mathcal{Z}_{\epsilon}^{-1}E_{1}^{\top})=\{1\}\cup\sigma(I_{m}+T_{0}^{-n}\mathcal{Z}_{\epsilon}^{-1}).

Then,

\max_{\lambda\in\sigma(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}})}|\lambda-1|=\max_{\lambda\in\sigma(T_{0}^{-n}\mathcal{Z}_{\epsilon}^{-1})}|\lambda|.

Recall that $T_{0}^{-n}\mathcal{Z}_{\epsilon}^{-1}=\epsilon T_{0}^{-n}(I_{m}-\epsilon T_{0}^{-n})^{-1}$ and that $T_{0}=(1+\alpha)I_{m}+\tau M_{m}^{-\frac{1}{2}}K_{m}M_{m}^{-\frac{1}{2}}$ is a Hermitian positive definite matrix with $\lambda_{\min}(T_{0})>1$. Therefore,

\max_{\lambda\in\sigma(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}})}|\lambda-1|=\max_{\lambda\in\sigma(T_{0}^{-n}\mathcal{Z}_{\epsilon}^{-1})}|\lambda|
=\max_{\lambda\in\sigma(T_{0}^{-1})}\left|\frac{\epsilon\lambda^{n}}{1-\epsilon\lambda^{n}}\right|
=\max_{\lambda\in\sigma(T_{0}^{-1})}\frac{\epsilon\lambda^{n}}{1-\epsilon\lambda^{n}}\leq\sup_{\lambda\in(0,1)}\frac{\epsilon\lambda^{n}}{1-\epsilon\lambda^{n}}\leq\frac{\epsilon}{1-\epsilon}\leq\frac{\epsilon}{1-\eta}.

The proof is complete.

Lemma 2.11.

Given any constant $\eta\in(0,1)$, for $\epsilon\in(0,\eta]$, we have

\|(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})-I_{mn}\|_{2}\leq\frac{\epsilon\sqrt{n}}{1-\eta}.

Proof 2.12.

From (22) in the proof of Theorem 2.9, we see that

(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})-I_{mn}=(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}E_{1}\mathcal{Z}_{\epsilon}^{-1}E_{n}^{\top}=\begin{bmatrix}&&&T_{0}^{-1}\mathcal{Z}_{\epsilon}^{-1}\\ &&&T_{0}^{-2}\mathcal{Z}_{\epsilon}^{-1}\\ &&&\vdots\\ &&&T_{0}^{-n}\mathcal{Z}_{\epsilon}^{-1}\end{bmatrix}.

Recall that $T_{0}=(1+\alpha)I_{m}+\tau M_{m}^{-\frac{1}{2}}K_{m}M_{m}^{-\frac{1}{2}}$ is Hermitian positive definite. Let $T_{0}^{-1}=\mathcal{Q}_{m}\Lambda\mathcal{Q}_{m}^{\top}$ be the orthogonal diagonalization of $T_{0}^{-1}$, with $\mathcal{Q}_{m}\in\mathbb{R}^{m\times m}$ an orthogonal matrix and $\Lambda\in\mathbb{R}^{m\times m}$ a diagonal matrix with positive diagonal entries. Recall that $\mathcal{Z}_{\epsilon}^{-1}=\epsilon(I_{m}-\epsilon T_{0}^{-n})^{-1}$. Then, one can further decompose $(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})-I_{mn}$ as follows:

(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})-I_{mn}=\epsilon(I_{n}\otimes\mathcal{Q}_{m})\begin{bmatrix}&&&\Lambda^{1}(I_{m}-\epsilon\Lambda^{n})^{-1}\\ &&&\Lambda^{2}(I_{m}-\epsilon\Lambda^{n})^{-1}\\ &&&\vdots\\ &&&\Lambda^{n}(I_{m}-\epsilon\Lambda^{n})^{-1}\end{bmatrix}(I_{n}\otimes\mathcal{Q}_{m}^{\top}).

Then, writing $\Lambda={\rm diag}(\lambda_{i})_{i=1}^{m}$, we have

\|(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})-I_{mn}\|_{2}\leq\epsilon\sqrt{\left\|\sum_{j=1}^{n}(\Lambda^{j}(I_{m}-\epsilon\Lambda^{n})^{-1})^{2}\right\|_{2}}
=\epsilon\sqrt{\max_{1\leq k\leq m}\sum_{j=1}^{n}\left(\frac{\lambda_{k}^{j}}{1-\epsilon\lambda_{k}^{n}}\right)^{2}}\leq\epsilon\sqrt{\sup_{\lambda\in(0,1)}\sum_{j=1}^{n}\left(\frac{\lambda^{j}}{1-\epsilon\lambda^{n}}\right)^{2}},

where the last inequality comes from the fact that $\sigma(T_{0}^{-1})\subset(0,1)$. Since the functions $g_{j}(x)=\frac{x^{j}}{1-\epsilon x^{n}}$ are monotonically increasing on $[0,1]$ for each $j=1,2,\dots,n$, we have

\|(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})-I_{mn}\|_{2}\leq\epsilon\sqrt{\sum_{j=1}^{n}\frac{1}{(1-\epsilon)^{2}}}=\frac{\epsilon\sqrt{n}}{1-\epsilon}\leq\frac{\epsilon\sqrt{n}}{1-\eta}.

The proof is complete.

Remark 2.13.

Note that the corresponding result and proof for $(\widetilde{\mathcal{C}}_{\epsilon}^{\top}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})-I_{mn}$ are the same as in Lemma 2.11. Given any constant $\eta\in(0,1)$, for any $\epsilon\in(0,\eta]$, we have

\|(\widetilde{\mathcal{C}}_{\epsilon}^{\top}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})-I_{mn}\|_{2}\leq\frac{\epsilon\sqrt{n}}{1-\eta}.

Lemma 2.14.

Given any $\eta\in(0,1)$, choose $\epsilon\in(0,\eta]$. Then,

\|\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\|_{2}\leq\sqrt{2}\left(1+\frac{\epsilon\sqrt{n}}{1-\eta}\right).

Proof 2.15.
\|\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\|_{2}=\|\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}\|_{2}\leq\|\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}\|_{2}\|\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}\|_{2}\leq(\|I_{2mn}\|_{2}+\|\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}-I_{2mn}\|_{2})\|\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}\|_{2}.

Since

\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}-I_{2mn}=\begin{bmatrix}(\widetilde{\mathcal{C}}_{\epsilon}^{\top}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})-I_{mn}&\\ &(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})-I_{mn}\end{bmatrix},

Lemma 2.11 and Remark 2.13 immediately imply that

\|\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}-I_{2mn}\|_{2}\leq\frac{\epsilon\sqrt{n}}{1-\eta}.

Therefore,

\|\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\|_{2}\leq\left(1+\frac{\epsilon\sqrt{n}}{1-\eta}\right)\|\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}\|_{2}.

From Theorem 2.4, we know that $\sigma(\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}})\subseteq\{1+\mathbf{i}x\,|\,x\in[-1,1]\}$ and that $\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}$ is unitarily diagonalizable, which implies $\|\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}\|_{2}\leq\sqrt{2}$. Therefore,

\|\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\|_{2}\leq\sqrt{2}\left(1+\frac{\epsilon\sqrt{n}}{1-\eta}\right).

The proof is complete.

For any square matrix $\mathcal{K}$, denote

\mathbb{H}(\mathcal{K}):=\frac{\mathcal{K}+\mathcal{K}^{\top}}{2}.
Lemma 2.16.

Given $\eta\in(0,1)$, for any $\epsilon\in(0,\eta]$, we have

\|\mathbb{H}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}-I_{2mn})\|_{2}\leq\frac{2\epsilon\sqrt{n}}{1-\eta}.

Proof 2.17.

Denote

\mathcal{R}_{1}=(\widetilde{\mathcal{C}}_{\epsilon}^{\top}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}^{\top}+\alpha I_{mn})-I_{mn},\quad\mathcal{R}_{2}=(\widetilde{\mathcal{C}}_{\epsilon}+\alpha I_{mn})^{-1}(\widetilde{\mathcal{T}}+\alpha I_{mn})-I_{mn}.

Recall that $\widetilde{\mathcal{E}}=(\widetilde{\mathcal{T}}+\alpha I_{mn})^{-1}(-\widetilde{\mathcal{T}}+\alpha I_{mn})$ is defined in the proof of Theorem 2.4.

By the proof of Theorem 2.4 and the proof of Lemma 2.11, we know that

\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}=\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}\widetilde{\mathcal{H}}^{-1}\widetilde{\mathcal{B}}
=(I_{2mn}+\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{H}}-I_{2mn})\begin{bmatrix}I_{mn}&-\widetilde{\mathcal{E}}^{\top}\\ \widetilde{\mathcal{E}}&I_{mn}\end{bmatrix}
=\begin{bmatrix}I_{mn}+\mathcal{R}_{1}&\\ &I_{mn}+\mathcal{R}_{2}\end{bmatrix}\begin{bmatrix}I_{mn}&-\widetilde{\mathcal{E}}^{\top}\\ \widetilde{\mathcal{E}}&I_{mn}\end{bmatrix}
=\begin{bmatrix}I_{mn}+\mathcal{R}_{1}&-(I_{mn}+\mathcal{R}_{1})\widetilde{\mathcal{E}}^{\top}\\ (I_{mn}+\mathcal{R}_{2})\widetilde{\mathcal{E}}&I_{mn}+\mathcal{R}_{2}\end{bmatrix}.

By simple calculations, we have

\mathbb{H}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}-I_{2mn})=\frac{1}{2}\begin{bmatrix}\mathcal{R}_{1}+\mathcal{R}_{1}^{\top}&(\mathcal{R}_{2}\widetilde{\mathcal{E}}-\widetilde{\mathcal{E}}\mathcal{R}_{1}^{\top})^{\top}\\ \mathcal{R}_{2}\widetilde{\mathcal{E}}-\widetilde{\mathcal{E}}\mathcal{R}_{1}^{\top}&\mathcal{R}_{2}+\mathcal{R}_{2}^{\top}\end{bmatrix}
=\frac{1}{2}\begin{bmatrix}\mathcal{R}_{1}+\mathcal{R}_{1}^{\top}&\\ &\mathcal{R}_{2}+\mathcal{R}_{2}^{\top}\end{bmatrix}+\frac{1}{2}\begin{bmatrix}&(\mathcal{R}_{2}\widetilde{\mathcal{E}}-\widetilde{\mathcal{E}}\mathcal{R}_{1}^{\top})^{\top}\\ \mathcal{R}_{2}\widetilde{\mathcal{E}}-\widetilde{\mathcal{E}}\mathcal{R}_{1}^{\top}&\end{bmatrix}.

Using the result of Lemma 2.11, we have

\frac{1}{2}\left\|\begin{bmatrix}\mathcal{R}_{1}+\mathcal{R}_{1}^{\top}&\\ &\mathcal{R}_{2}+\mathcal{R}_{2}^{\top}\end{bmatrix}\right\|_{2}\leq\frac{1}{2}\max\{\|\mathcal{R}_{1}+\mathcal{R}_{1}^{\top}\|_{2},\|\mathcal{R}_{2}+\mathcal{R}_{2}^{\top}\|_{2}\}
\leq\frac{1}{2}\max\{\|\mathcal{R}_{1}\|_{2}+\|\mathcal{R}_{1}^{\top}\|_{2},\|\mathcal{R}_{2}\|_{2}+\|\mathcal{R}_{2}^{\top}\|_{2}\}=\max\{\|\mathcal{R}_{1}\|_{2},\|\mathcal{R}_{2}\|_{2}\}\leq\frac{\epsilon\sqrt{n}}{1-\eta}.

Note that $\|\widetilde{\mathcal{E}}\|_{2}<1$ by the result in [3]. Then,

\frac{1}{2}\left\|\begin{bmatrix}&(\mathcal{R}_{2}\widetilde{\mathcal{E}}-\widetilde{\mathcal{E}}\mathcal{R}_{1}^{\top})^{\top}\\ \mathcal{R}_{2}\widetilde{\mathcal{E}}-\widetilde{\mathcal{E}}\mathcal{R}_{1}^{\top}&\end{bmatrix}\right\|_{2}\leq\frac{1}{2}\max\{\|(\mathcal{R}_{2}\widetilde{\mathcal{E}}-\widetilde{\mathcal{E}}\mathcal{R}_{1}^{\top})^{\top}\|_{2},\|\mathcal{R}_{2}\widetilde{\mathcal{E}}-\widetilde{\mathcal{E}}\mathcal{R}_{1}^{\top}\|_{2}\}
=\frac{1}{2}\|\mathcal{R}_{2}\widetilde{\mathcal{E}}-\widetilde{\mathcal{E}}\mathcal{R}_{1}^{\top}\|_{2}\leq\frac{1}{2}(\|\mathcal{R}_{2}\|_{2}\|\widetilde{\mathcal{E}}\|_{2}+\|\widetilde{\mathcal{E}}\|_{2}\|\mathcal{R}_{1}\|_{2})
<\frac{1}{2}\left(\frac{\epsilon\sqrt{n}}{1-\eta}+\frac{\epsilon\sqrt{n}}{1-\eta}\right)=\frac{\epsilon\sqrt{n}}{1-\eta}.

Therefore, it is clear that

\|\mathbb{H}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}-I_{2mn})\|_{2}\leq\frac{2\epsilon\sqrt{n}}{1-\eta}.

The proof is complete.

Denote by $\mathcal{O}$ the zero matrix of appropriate dimensions.

Lemma 2.18.

[5] Let $\Xi\mathbf{q}=\mathbf{w}$ be a real square linear system with $\mathbb{H}(\Xi)\succ\mathcal{O}$. Then, the residuals of the iterates generated by applying GMRES to solving $\Xi\mathbf{q}=\mathbf{w}$ satisfy

\|\mathbf{r}_{k}\|_{2}\leq\left(1-\frac{\lambda_{\min}(\mathbb{H}(\Xi))^{2}}{\|\Xi\|_{2}^{2}}\right)^{k/2}\|\mathbf{r}_{0}\|_{2},

where $\mathbf{r}_{k}=\mathbf{w}-\Xi\mathbf{q}_{k}$ with $\mathbf{q}_{k}$ ($k\geq 1$) being the $k$-th GMRES iterate and $\mathbf{q}_{0}$ being an arbitrary initial guess.
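As a small worked example of how Lemma 2.18 feeds into Lemma 2.19 below, the following sketch evaluates the per-iteration residual reduction factor for assumed values of $\lambda_{\min}(\mathbb{H}(\Xi))$ and $\|\Xi\|_{2}$.

    import numpy as np

    def gmres_reduction_factor(lam_min_H: float, norm_Xi: float) -> float:
        # Factor (1 - lam_min(H(Xi))^2 / ||Xi||_2^2)^{1/2} from Lemma 2.18.
        return np.sqrt(1.0 - (lam_min_H / norm_Xi) ** 2)

    # With lam_min(H(Xi)) >= 1 - delta and ||Xi||_2 <= sqrt(2)(1 + delta/2),
    # as established in the proof of Lemma 2.19 below (delta = 0.5 assumed):
    delta = 0.5
    print(gmres_reduction_factor(1 - delta, np.sqrt(2) * (1 + delta / 2)))
    # ~0.9592, matching sqrt(-delta^2 + 8*delta + 2) / (2 + delta).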

Lemma 2.19.

Given $\delta\in(0,1)$, for any $\epsilon\in(0,c_{\tau}]$, where $c_{\tau}:=\frac{\delta\sqrt{\tau}}{\delta\sqrt{\tau}+2\sqrt{T}}$, the residuals of the iterates generated by applying GMRES to solving the auxiliary linear system (16) satisfy

\|\tilde{\mathbf{r}}_{k}\|_{2}\leq\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\tilde{\mathbf{r}}_{0}\|_{2},

where $\tilde{\mathbf{r}}_{k}=\tilde{\mathbf{b}}-\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\tilde{\mathbf{x}}_{k}$ with $\tilde{\mathbf{x}}_{k}$ ($k\geq 1$) being the $k$-th GMRES iterate and $\tilde{\mathbf{x}}_{0}$ being an arbitrary initial guess.

Proof 2.20.

By Lemma 2.16 and the fact that $n\tau=T$, we have

\|\mathbb{H}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}-I_{2mn})\|_{2}\leq\frac{2\epsilon\sqrt{n}}{1-c_{\tau}}=\frac{\epsilon\delta}{c_{\tau}}\leq\delta.

Then,

\mathbb{H}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}})=I_{2mn}+\mathbb{H}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}-I_{2mn})\succeq(1-\delta)I_{2mn}\succ\mathcal{O}.

Since $\mathbb{H}(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}})\succ\mathcal{O}$, Lemma 2.18 is applicable to the preconditioned system (16). From the estimate above, it is clear that

\lambda_{\min}\left(\mathbb{H}\left(\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\right)\right)^{2}\geq(1-\delta)^{2}.

By Lemma 2.14 and $\frac{\epsilon\sqrt{n}}{1-c_{\tau}}\leq\frac{\delta}{2}$, we have

\|\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\|_{2}\leq\sqrt{2}\left(1+\frac{\epsilon\sqrt{n}}{1-c_{\tau}}\right)\leq\sqrt{2}\left(1+\frac{\delta}{2}\right).

Then, Lemma 2.18 gives

\|\tilde{\mathbf{r}}_{k}\|_{2}\leq\left(1-\frac{(1-\delta)^{2}}{\left(\sqrt{2}\left(1+\frac{\delta}{2}\right)\right)^{2}}\right)^{k/2}\|\tilde{\mathbf{r}}_{0}\|_{2}=\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\tilde{\mathbf{r}}_{0}\|_{2}.

The proof is complete.

Theorem 2.21.

Given $\delta\in(0,1)$, for any $\epsilon\in(0,c_{\tau}]$, where $c_{\tau}:=\frac{\delta\sqrt{\tau}}{\delta\sqrt{\tau}+2\sqrt{T}}$, the residuals of the iterates generated by applying GMRES to solving the preconditioned linear system (15) satisfy

\|\mathbf{r}_{k}\|_{2}\leq\sqrt{c_{0}}\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\mathbf{r}_{0}\|_{2},

where $c_{0}$ defined in (7) is a positive constant independent of the matrix size, and $\mathbf{r}_{k}=\mathcal{P}_{\epsilon}^{-1}\mathbf{b}-\mathcal{P}_{\epsilon}^{-1}\mathcal{A}\mathbf{x}_{k}$ ($k\geq 0$) with $\mathbf{x}_{k}$ ($k\geq 1$) being the $k$-th GMRES iterate and $\mathbf{x}_{0}$ being an arbitrary initial guess.

Proof 2.22.

Let $\tilde{\mathbf{x}}_{0}=\mathcal{W}^{\frac{1}{2}}\mathcal{G}\mathbf{x}_{0}$. Then, it holds that $\mathbf{x}_{0}=\mathcal{G}^{-1}\mathcal{W}^{-\frac{1}{2}}\tilde{\mathbf{x}}_{0}$. Take $\tilde{\mathbf{x}}_{0}$ as the initial guess of the GMRES solver for the auxiliary linear system (16). Then, according to Theorem 2.2, we see that

\|\mathbf{r}_{k}\|_{2}\leq\sqrt{2}\,\|\mathcal{W}^{-\frac{1}{2}}\|_{2}\,\|\tilde{\mathbf{r}}_{k}\|_{2},\quad k=1,2,\dots,

where $\tilde{\mathbf{r}}_{k}:=\tilde{\mathbf{b}}-\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\tilde{\mathbf{x}}_{k}$ denotes the residual vector at the $k$-th GMRES iteration for solving (16) and $\tilde{\mathbf{x}}_{k}$ denotes the $k$-th GMRES iterate for (16). According to Lemma 2.19, we can further estimate $\|\tilde{\mathbf{r}}_{k}\|_{2}$ as

\|\tilde{\mathbf{r}}_{k}\|_{2}\leq\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\tilde{\mathbf{r}}_{0}\|_{2},

where $\tilde{\mathbf{r}}_{0}:=\tilde{\mathbf{b}}-\widetilde{\mathcal{H}}_{\epsilon}^{-1}\widetilde{\mathcal{B}}\tilde{\mathbf{x}}_{0}$ denotes the initial residual vector. Therefore,

\|\mathbf{r}_{k}\|_{2}\leq\sqrt{2}\,\|\mathcal{W}^{-\frac{1}{2}}\|_{2}\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\tilde{\mathbf{r}}_{0}\|_{2},\quad k=1,2,\dots.

Note that $\tilde{\mathbf{r}}_{0}=\mathcal{W}^{\frac{1}{2}}\mathcal{G}\mathbf{r}_{0}$. Therefore,

\|\mathbf{r}_{k}\|_{2}\leq\sqrt{2}\,\|\mathcal{W}^{-\frac{1}{2}}\|_{2}\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\mathcal{W}^{\frac{1}{2}}\mathcal{G}\mathbf{r}_{0}\|_{2}
\leq\sqrt{2}\,\|\mathcal{W}^{-\frac{1}{2}}\|_{2}\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\mathcal{W}^{\frac{1}{2}}\|_{2}\|\mathcal{G}\|_{2}\|\mathbf{r}_{0}\|_{2}
=\kappa_{2}(\mathcal{W}^{\frac{1}{2}})\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\mathbf{r}_{0}\|_{2}
=\sqrt{\kappa_{2}(M_{m})}\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\mathbf{r}_{0}\|_{2}
\leq\sqrt{c_{0}}\left(\frac{\sqrt{-\delta^{2}+8\delta+2}}{2+\delta}\right)^{k}\|\mathbf{r}_{0}\|_{2},\quad k=1,2,\dots.

The proof is complete.

Remark 2.23.

As a consequence of Theorem 2.21, GMRES achieves a mesh-independent convergence rate when $\epsilon=\mathcal{O}(\tau^{1/2})$. For instance, taking $\delta=0.1$ in Theorem 2.21 bounds the convergence factor by $\sqrt{2.79}/2.1\approx 0.80$, independently of the matrix size and the regularization parameter.

2.1 Implementation

First, we discuss the computation of $\mathcal{A}\mathbf{v}$ for a given vector $\mathbf{v}$. The matrix-vector product $\mathcal{A}\mathbf{v}$ can be computed in $\mathcal{O}(mn)$ operations, since $\mathcal{A}$ is a sparse matrix consisting of two simple bidiagonal block Toeplitz matrices. The required storage is also $\mathcal{O}(mn)$.
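As an illustration of this cost, the following Python sketch (illustrative only; the exact blocks of $\mathcal{A}$ are fixed by its definition earlier in the paper, and the names `B`, `L`, `T` below are hypothetical stand-ins) assembles a sparse bidiagonal block Toeplitz factor of the same type and applies it in $\mathcal{O}(mn)$ operations.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical stand-ins: B is a lower-bidiagonal Toeplitz time-stepping
# factor and L a 1D Dirichlet Laplacian; the blocks of the paper's A are
# of this sparse block-Toeplitz type, so the matvec costs O(mn).
m, n, tau = 100, 80, 1e-2
h = 1.0 / (m + 1)
L = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(m, m)) / h**2
B = sp.eye(n) - sp.diags(np.ones(n - 1), -1)   # bidiagonal block Toeplitz pattern
T = sp.kron(B, sp.eye(m)) + tau * sp.kron(sp.eye(n), -L)  # O(mn) nonzeros
v = np.ones(m * n)
w = T @ v                                       # sparse matvec in O(mn)
print(T.nnz, w.shape)
```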

At each GMRES iteration, the matrix-vector product $\mathcal{P}_{\epsilon}^{-1}\mathbf{v}$ needs to be computed for a given vector $\mathbf{v}$. Recall that $\epsilon$-circulant matrices are diagonalizable by the product of a diagonal matrix and the discrete Fourier matrix $\mathbb{F}_{n}=\frac{1}{\sqrt{n}}[\theta_{n}^{(i-1)(j-1)}]_{i,j=1}^{n}\in\mathbb{C}^{n\times n}$ with $\theta_{n}=\exp(\frac{2\pi\mathbf{i}}{n})$. Hence, the matrix $C_{\epsilon,n}$ defined by (13) admits the diagonalization $C_{\epsilon,n}=D_{\epsilon}^{-1}\mathbb{F}_{n}\Lambda_{\epsilon,n}\mathbb{F}_{n}^{*}D_{\epsilon}$, where $\Lambda_{\epsilon,n}$ is a diagonal matrix.
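As a quick numerical check of this factorization, the following Python sketch builds both sides explicitly and verifies that they agree. It assumes, as an illustration, that $C_{\epsilon,n}$ is the backward Euler stepping matrix with a $-\epsilon$ corner entry, which is consistent with the eigenvalues $\lambda_{k}^{(\epsilon)}$ stated below.

```python
import numpy as np

# Check C = D^{-1} F Lam F^* D for the eps-circulant stepping matrix
# (assumed form: unit diagonal, -1 subdiagonal, -eps top-right corner).
n, eps = 64, 0.1
C = np.eye(n) - np.diag(np.ones(n - 1), -1)
C[0, -1] = -eps

idx = np.arange(n)
F = np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)  # theta_n^{jk}/sqrt(n)
D = np.diag(eps ** (idx / n))                                 # D_eps
lam = 1 - eps ** (1.0 / n) * np.exp(-2j * np.pi * idx / n)    # lambda_k^{(eps)}

rhs = np.linalg.inv(D) @ F @ np.diag(lam) @ F.conj().T @ D
print(np.linalg.norm(C - rhs))  # roundoff level, ~1e-13
```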

Hence, we can decompose $\mathcal{P}_{\epsilon}$ from (12) as follows:

\begin{align*}
\mathcal{P}_{\epsilon} &= \frac{1}{2}\begin{bmatrix}\mathcal{C}^{\top}_{\epsilon}+\alpha I_{n}\otimes M_{m}& \\ &\mathcal{C}_{\epsilon}+\alpha I_{n}\otimes M_{m}\end{bmatrix}\begin{bmatrix}I_{mn}&I_{mn}\\ -I_{mn}&I_{mn}\end{bmatrix}\\
&= \mathcal{U}\begin{bmatrix}(\Lambda^{*}_{\epsilon,n}+\alpha I_{n})\otimes M_{m}+\tau I_{n}\otimes(-L_{m})& \\ &(\Lambda_{\epsilon,n}+\alpha I_{n})\otimes M_{m}+\tau I_{n}\otimes(-L_{m})\end{bmatrix}\mathcal{U}^{-1}\times\frac{1}{2}\begin{bmatrix}I_{mn}&I_{mn}\\ -I_{mn}&I_{mn}\end{bmatrix}.
\end{align*}

Note that $\mathcal{U}=\begin{bmatrix}(\mathbb{F}_{n}^{*}D_{\epsilon})^{*}\otimes I_{m}& \\ &D_{\epsilon}^{-1}\mathbb{F}_{n}\otimes I_{m}\end{bmatrix}$, where $D_{\epsilon}={\rm diag}(\epsilon^{\frac{i-1}{n}})_{i=1}^{n}$ and $\mathbb{F}_{n}=\frac{1}{\sqrt{n}}[\theta_{n}^{(i-1)(j-1)}]_{i,j=1}^{n}$ with $\theta_{n}=\exp(\frac{2\pi\mathbf{i}}{n})$. Also, $\Lambda_{\epsilon,n}={\rm diag}(\lambda_{i-1}^{(\epsilon)})_{i=1}^{n}$ with $\lambda_{k}^{(\epsilon)}=1-\epsilon^{\frac{1}{n}}\theta_{n}^{-k}$.

Therefore, the computation of $\mathbf{w}=\mathcal{P}_{\epsilon}^{-1}\mathbf{v}$ can be implemented by the following four steps:

1. Compute $\widehat{\mathbf{v}}=\mathcal{U}^{-1}\mathbf{v}$;
2. Compute
\[
\widetilde{\mathbf{v}}=\begin{bmatrix}(\Lambda^{*}_{\epsilon,n}+\alpha I_{n})\otimes M_{m}+\tau I_{n}\otimes(-L_{m})& \\ &(\Lambda_{\epsilon,n}+\alpha I_{n})\otimes M_{m}+\tau I_{n}\otimes(-L_{m})\end{bmatrix}^{-1}\widehat{\mathbf{v}};
\]
3. Compute $\widetilde{\mathbf{w}}=\mathcal{U}\widetilde{\mathbf{v}}$;
4. Compute $\mathbf{w}=\left(\frac{1}{2}\begin{bmatrix}I_{mn}&I_{mn}\\ -I_{mn}&I_{mn}\end{bmatrix}\right)^{-1}\widetilde{\mathbf{w}}=\begin{bmatrix}I_{mn}&-I_{mn}\\ I_{mn}&I_{mn}\end{bmatrix}\widetilde{\mathbf{w}}$.

Both Steps 1 and 3 can be computed by fast Fourier transforms in $\mathcal{O}(mn\log{n})$ operations. In Step 2, the shifted Laplacian systems can be solved efficiently, for instance, by the multigrid method; a detailed description of such a highly effective implementation can be found in [11]. The matrix-vector multiplication in Step 4 requires only $\mathcal{O}(mn)$ operations.
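To make the four-step procedure concrete, the following Python sketch (illustrative only; the paper's experiments use MATLAB) mirrors Steps 1-4 with small dense matrices and verifies the result against a direct solve with $\mathcal{P}_{\epsilon}$ assembled from the decomposition above. The choices $M_{m}=I_{m}$ and a 1D Dirichlet Laplacian $L_{m}$ are simplifying assumptions, and an explicit Fourier matrix stands in for the FFTs used in practice.

```python
import numpy as np

# Four-step application of P_eps^{-1}, checked against a direct solve.
# Simplifying assumptions: M_m = I_m, L_m = 1D Dirichlet Laplacian.
m, n = 4, 6
tau, alpha, eps = 0.2, 1e-3, 0.1
h = 1.0 / (m + 1)
L = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
     + np.diag(np.ones(m - 1), -1)) / h**2            # L_m
M = np.eye(m)                                         # M_m = I (assumption)
C = np.eye(n) - np.diag(np.ones(n - 1), -1)           # C_{eps,n}
C[0, -1] = -eps

Ceps = np.kron(C, M) + tau * np.kron(np.eye(n), -L)   # script C_eps
AIM = alpha * np.kron(np.eye(n), M)
I, Z = np.eye(m * n), np.zeros((m * n, m * n))
P = 0.5 * np.block([[Ceps.T + AIM, Z], [Z, Ceps + AIM]]) @ np.block([[I, I], [-I, I]])

idx = np.arange(n)
F = np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
D = np.diag(eps ** (idx / n))
lam = 1 - eps ** (1.0 / n) * np.exp(-2j * np.pi * idx / n)
U = np.block([[np.kron(D @ F, np.eye(m)), Z],         # (F^* D)^* x I_m
              [Z, np.kron(np.linalg.inv(D) @ F, np.eye(m))]])

v = np.random.default_rng(1).standard_normal(2 * m * n)
v_hat = np.linalg.solve(U, v.astype(complex))         # Step 1 (FFT in practice)
v_til = np.empty_like(v_hat)
for k in range(n):                                    # Step 2: 2n small solves
    s1 = slice(k * m, (k + 1) * m)
    s2 = slice(m * n + k * m, m * n + (k + 1) * m)
    v_til[s1] = np.linalg.solve((lam[k].conj() + alpha) * M + tau * (-L), v_hat[s1])
    v_til[s2] = np.linalg.solve((lam[k] + alpha) * M + tau * (-L), v_hat[s2])
w_til = U @ v_til                                     # Step 3 (FFT in practice)
w = (np.block([[I, -I], [I, I]]) @ w_til).real        # Step 4
print(np.linalg.norm(P @ w - v))                      # roundoff level: w = P^{-1} v
```

In Step 2, the first half of the systems are the complex conjugates of the second half (since $M_{m}$ and $L_{m}$ are real), so in practice only $n$ distinct shifted Laplacian solves are needed.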

3 Numerical Results

In this section, we provide numerical results to demonstrate the performance of our proposed preconditioner. All numerical experiments are carried out in MATLAB 2023b on a PC with an Intel i5-13600KF 3.50GHz CPU and 32 GB RAM.

The CPU time in seconds is measured using the MATLAB built-in functions \texttt{tic} and \texttt{toc}. Steps 1-3 in Section 2.1 are implemented using the built-in functions \texttt{dst} and \texttt{fft} for the discrete sine transform and the fast Fourier transform, respectively. The GMRES solver is the built-in function \texttt{gmres}. We choose a zero initial guess and a stopping tolerance of $10^{-6}$ based on the reduction of the relative residual norm.

In the related tables, we denote by `Iter' the number of iterations needed by an iterative solver to reach the given accuracy, and by `DoF' the number of unknowns in the linear system. Let $y^{*}$ and $p^{*}$ denote the approximate solutions to $y$ and $p$, respectively. Then, we define the error measure $e_{h}$ as

\[
e_{h}=\left\|\begin{bmatrix}y^{*}\\ p^{*}\end{bmatrix}-\begin{bmatrix}y\\ p\end{bmatrix}\right\|_{L^{\infty}_{\tau}(L^{2}(\Omega))}.
\]
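In discrete form, this amounts to taking the maximum over time steps of the scaled Euclidean norm of the nodal errors. A small Python helper under this interpretation (an assumption on the storage layout, not the paper's code) might read:

```python
import numpy as np

def err_Linf_L2(U_num, U_exact, h):
    # Columns hold the stacked nodal values of (y, p) at each time step on a
    # 2D grid; the discrete L2(Omega) norm of a nodal vector is h*||.||_2
    # (h^{d/2} with d = 2), and L^infty_tau takes the max over time steps.
    return np.max(h * np.linalg.norm(U_num - U_exact, axis=0))
```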

The time interval $[0,T]$ and the spatial domain are partitioned uniformly with step sizes $\tau=T/n=T/h^{-1}$ and $h=1/(m+1)$, respectively; the values of $h$ are given in the related tables.

For each of the following examples, we take $\epsilon=\min\left\{\frac{1}{2},\frac{\tau}{2}\right\}$ for GMRES-$\mathcal{P}_{\epsilon}$.

Example 3.1.

In this example [18], we consider the two-dimensional problem of solving (1), where $\Omega=(0,1)^{2}$, $T=1$, $a(x_{1},x_{2})=1$, and

\begin{align*}
f(x_{1},x_{2},t) &= (2\pi^{2}-1)e^{-t}\sin{(\pi x_{1})}\sin{(\pi x_{2})},\\
g(x_{1},x_{2},t) &= e^{-t}\sin{(\pi x_{1})}\sin{(\pi x_{2})}.
\end{align*}

The corresponding analytical solution is given by

\[
y(x_{1},x_{2},t)=e^{-t}\sin{(\pi x_{1})}\sin{(\pi x_{2})},\quad p=0.
\]

We remark that $K_{m}$ can be diagonalized by the discrete sine transform matrix in this example, so we apply fast sine transforms to solve the shifted Laplacian linear systems in Step 2 of the four-step procedure in Subsection 2.1.
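For instance, with $M_{m}=I_{m}$ as a simplifying assumption, one Step-2 system reduces to an elementwise division between two fast sine transforms. A hedged Python sketch (SciPy's \texttt{dst} standing in for MATLAB's) is given below.

```python
import numpy as np
from scipy.fft import dst

def dst1(z):
    # orthonormal DST-I applied to a possibly complex vector
    return dst(z.real, type=1, norm='ortho') + 1j * dst(z.imag, type=1, norm='ortho')

# One Step-2 system ((lam + alpha) M_m + tau (-L_m)) x = b with M_m = I:
# the symmetric orthogonal DST-I matrix S diagonalizes -L_m, so
# x = S diag(lam + alpha + tau*mu)^{-1} S b.
m = 511
h = 1.0 / (m + 1)
tau, alpha = 1e-2, 1e-6
lam = 1 - 0.5 * np.exp(-2j * np.pi * 3 / 32)           # a sample lambda_k^{(eps)}

k = np.arange(1, m + 1)
mu = 4.0 * np.sin(k * np.pi / (2 * (m + 1)))**2 / h**2  # eigenvalues of -L_m

b = np.random.default_rng(0).standard_normal(m)
x = dst1(dst1(b.astype(complex)) / (lam + alpha + tau * mu))

L = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
     + np.diag(np.ones(m - 1), -1)) / h**2
print(np.linalg.norm(((lam + alpha) * np.eye(m) + tau * (-L)) @ x - b))  # roundoff level
```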

Table 1 presents the iteration counts, CPU times, and errors of GMRES-$\mathcal{P}_{\epsilon}$ for various values of $\gamma$ with the backward Euler method. Our observations are as follows: (i) GMRES-$\mathcal{P}_{\epsilon}$ performs excellently and consistently, maintaining stable iteration counts and CPU times across a wide range of $\gamma$ values; and (ii) the error decreases as the mesh is refined.

Table 1: Results of GMRES-$\mathcal{P}_{\epsilon}$ for Example 3.1

\begin{tabular}{cccccc}
\hline
$\gamma$ & $h$ & DoF & Iter & CPU (s) & $e_{h}$\\
\hline
$10^{-10}$ & $2^{-5}$ & 61504 & 4 & 0.034 & 1.54e-2\\
 & $2^{-6}$ & 508032 & 4 & 0.306 & 7.75e-3\\
 & $2^{-7}$ & 4129024 & 4 & 2.450 & 3.89e-3\\
 & $2^{-8}$ & 33292800 & 4 & 23.478 & 1.95e-3\\
\hline
$10^{-8}$ & $2^{-5}$ & 61504 & 6 & 0.044 & 1.54e-2\\
 & $2^{-6}$ & 508032 & 6 & 0.387 & 7.75e-3\\
 & $2^{-7}$ & 4129024 & 6 & 3.589 & 3.89e-3\\
 & $2^{-8}$ & 33292800 & 7 & 32.822 & 1.95e-3\\
\hline
$10^{-6}$ & $2^{-5}$ & 61504 & 8 & 0.054 & 1.54e-2\\
 & $2^{-6}$ & 508032 & 10 & 0.740 & 7.71e-3\\
 & $2^{-7}$ & 4129024 & 10 & 6.126 & 3.86e-3\\
 & $2^{-8}$ & 33292800 & 12 & 60.325 & 1.93e-3\\
\hline
$10^{-4}$ & $2^{-5}$ & 61504 & 11 & 0.074 & 1.42e-2\\
 & $2^{-6}$ & 508032 & 11 & 0.739 & 7.09e-3\\
 & $2^{-7}$ & 4129024 & 9 & 5.124 & 3.56e-3\\
 & $2^{-8}$ & 33292800 & 6 & 31.904 & 1.78e-3\\
\hline
$10^{-2}$ & $2^{-5}$ & 61504 & 12 & 0.075 & 3.10e-3\\
 & $2^{-6}$ & 508032 & 12 & 0.882 & 1.50e-3\\
 & $2^{-7}$ & 4129024 & 14 & 8.125 & 7.40e-4\\
 & $2^{-8}$ & 33292800 & 14 & 68.417 & 3.67e-4\\
\hline
$1$ & $2^{-5}$ & 61504 & 8 & 0.066 & 7.19e-4\\
 & $2^{-6}$ & 508032 & 8 & 0.531 & 3.65e-4\\
 & $2^{-7}$ & 4129024 & 8 & 4.665 & 1.84e-4\\
 & $2^{-8}$ & 33292800 & 8 & 38.146 & 9.25e-5\\
\hline
\end{tabular}

Example 3.2.

In this example, we consider the two-dimensional problem of solving (1) with a variable coefficient function $a(x_{1},x_{2})$, where $\Omega=(0,1)^{2}$, $T=1$, $a(x_{1},x_{2})=10^{-5}\sin{(\pi x_{1}x_{2})}$, and

\begin{align*}
f(x_{1},x_{2},t) ={}& -\sin{(\pi t)}\sin{(\pi x_{1})}\sin{(\pi x_{2})}\\
&+e^{-t}x_{1}(1-x_{1})\left[2\times 10^{-5}\sin{(\pi x_{1}x_{2})}-x_{2}(1-x_{2})-10^{-5}\pi\cos{(\pi x_{1}x_{2})}x_{1}(1-2x_{2})\right]\\
&+e^{-t}x_{2}(1-x_{2})\left[2\times 10^{-5}\sin{(\pi x_{1}x_{2})}-10^{-5}\pi\cos{(\pi x_{1}x_{2})}x_{2}(1-2x_{1})\right],\\
g(x_{1},x_{2},t) ={}& -\gamma\pi\cos{(\pi t)}\sin{(\pi x_{1})}\sin{(\pi x_{2})}+e^{-t}x_{1}(1-x_{1})x_{2}(1-x_{2})\\
&-10^{-5}\gamma\pi^{2}\sin{(\pi t)}\big[-2\sin{(\pi x_{1}x_{2})}\sin{(\pi x_{1})}\sin{(\pi x_{2})}\\
&\quad+\cos{(\pi x_{1}x_{2})}\left(x_{1}\sin{(\pi x_{1})}\cos{(\pi x_{2})}+x_{2}\cos{(\pi x_{1})}\sin{(\pi x_{2})}\right)\big].
\end{align*}

The corresponding analytical solution is given by

\[
y(x_{1},x_{2},t)=e^{-t}x_{1}(1-x_{1})x_{2}(1-x_{2}),\quad p(x_{1},x_{2},t)=\gamma\sin{(\pi t)}\sin{(\pi x_{1})}\sin{(\pi x_{2})}.
\]

Since $K_{m}$ is not diagonalizable by fast transforms in this example, we apply one iteration of the V-cycle geometric multigrid method to solve the shifted Laplacian linear systems (as detailed in Subsection 2.1), with the Gauss-Seidel method employed as the pre-smoother.
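For illustration, a minimal 1D geometric-multigrid V-cycle with a Gauss-Seidel pre-smoother is sketched below in Python. It is a toy version under simplifying assumptions ($M_{m}=I_{m}$, linear interpolation, Galerkin coarsening), not the MATLAB implementation used in the experiments; the Vanka-type solver of [11] is the more refined option for these complex-shifted systems.

```python
import numpy as np

def gauss_seidel(A, x, b, sweeps=1):
    # forward Gauss-Seidel sweeps (valid for complex diagonally dominant A)
    for _ in range(sweeps):
        for i in range(len(b)):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

def v_cycle(A, b, x):
    # one V-cycle: pre-smooth, coarse-grid correction, post-smooth
    if len(b) <= 3:
        return np.linalg.solve(A, b)         # coarsest-grid direct solve
    x = gauss_seidel(A, x, b)
    r = b - A @ x
    nc = (len(b) - 1) // 2
    P = np.zeros((len(b), nc))
    for j in range(nc):                      # linear interpolation
        P[2 * j, j], P[2 * j + 1, j], P[2 * j + 2, j] = 0.5, 1.0, 0.5
    R = 0.5 * P.T                            # full-weighting restriction
    e = v_cycle(R @ A @ P, R @ r, np.zeros(nc, dtype=complex))  # Galerkin coarse op
    return gauss_seidel(A, x + P @ e, b)

# one shifted Laplacian system from Step 2, with M_m = I as an assumption
m = 63                                       # 2^6 - 1, so the grids nest
h = 1.0 / (m + 1)
tau, alpha = 1e-2, 1e-6
lam = 1 - 0.5 * np.exp(-2j * np.pi / 8)
L = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
     + np.diag(np.ones(m - 1), -1)) / h**2
A = (lam + alpha) * np.eye(m, dtype=complex) + tau * (-L)
b = np.random.default_rng(0).standard_normal(m).astype(complex)

x = np.zeros(m, dtype=complex)
for it in range(5):
    x = v_cycle(A, b, x)
    print(it + 1, np.linalg.norm(b - A @ x))  # residual decreases each cycle
```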

Table 2 presents the iteration counts, CPU times, and errors of GMRES-$\mathcal{P}_{\epsilon}$ for various values of $\gamma$ with the backward Euler method. The purpose of this example is to evaluate the effectiveness of our solver when the coefficient function $a(x_{1},x_{2})$ is non-constant.

Similar to the previous example, the results indicate that (i) GMRES-$\mathcal{P}_{\epsilon}$ achieves stable iteration counts and CPU times across a broad range of $\gamma$ values; and (ii) the errors decrease as expected as the mesh is refined.

Table 2: Results of GMRES-$\mathcal{P}_{\epsilon}$ for Example 3.2

\begin{tabular}{cccccc}
\hline
$\gamma$ & $h$ & DoF & Iter & CPU (s) & $e_{h}$\\
\hline
$10^{-10}$ & $2^{-5}$ & 61504 & 4 & 0.185 & 1.03e-3\\
 & $2^{-6}$ & 508032 & 4 & 0.775 & 5.17e-4\\
 & $2^{-7}$ & 4129024 & 4 & 5.805 & 2.59e-4\\
 & $2^{-8}$ & 33292800 & 4 & 54.357 & 1.30e-4\\
\hline
$10^{-8}$ & $2^{-5}$ & 61504 & 6 & 0.222 & 1.03e-3\\
 & $2^{-6}$ & 508032 & 6 & 1.059 & 5.17e-4\\
 & $2^{-7}$ & 4129024 & 6 & 8.241 & 2.59e-4\\
 & $2^{-8}$ & 33292800 & 7 & 119.212 & 1.30e-4\\
\hline
$10^{-6}$ & $2^{-5}$ & 61504 & 8 & 0.339 & 1.02e-3\\
 & $2^{-6}$ & 508032 & 10 & 2.046 & 5.15e-4\\
 & $2^{-7}$ & 4129024 & 11 & 18.462 & 2.57e-4\\
 & $2^{-8}$ & 33292800 & 13 & 194.312 & 1.29e-4\\
\hline
$10^{-4}$ & $2^{-5}$ & 61504 & 14 & 0.668 & 9.82e-4\\
 & $2^{-6}$ & 508032 & 15 & 2.982 & 4.92e-4\\
 & $2^{-7}$ & 4129024 & 13 & 24.479 & 2.46e-4\\
 & $2^{-8}$ & 33292800 & 11 & 199.911 & 1.23e-4\\
\hline
$10^{-2}$ & $2^{-5}$ & 61504 & 11 & 0.516 & 4.03e-3\\
 & $2^{-6}$ & 508032 & 9 & 1.973 & 2.17e-3\\
 & $2^{-7}$ & 4129024 & 8 & 16.740 & 1.13e-3\\
 & $2^{-8}$ & 33292800 & 7 & 146.326 & 5.76e-4\\
\hline
$1$ & $2^{-5}$ & 61504 & 6 & 0.337 & 2.85e-2\\
 & $2^{-6}$ & 508032 & 6 & 1.610 & 1.43e-2\\
 & $2^{-7}$ & 4129024 & 6 & 13.809 & 7.20e-3\\
 & $2^{-8}$ & 33292800 & 6 & 131.204 & 3.61e-3\\
\hline
\end{tabular}

4 Conclusions

In this study, we developed a novel PinT preconditioner for the all-at-once linear systems arising from optimal control problems constrained by parabolic equations, based on $\epsilon$-circulant matrices that can be applied efficiently via fast Fourier transforms. We proved that, with the theoretically estimated $\epsilon$, the preconditioned GMRES method converges linearly at a rate independent of the matrix size and the regularization parameter. Numerical experiments confirm the effectiveness and efficiency of our preconditioner.

Acknowledgments

The work of Sean Y. Hon was supported in part by the Hong Kong RGC under grant 22300921 and a start-up grant from the Croucher Foundation. The work of Xue-lei Lin was partially supported by research grants 12301480 from NSFC, HA45001143 from Harbin Institute of Technology, Shenzhen, and HA11409084 from Shenzhen.

References

  • [1] Owe Axelsson. Optimality properties of a square block matrix preconditioner with applications. Computers & Mathematics with Applications, 80(2):286–294, 2020.
  • [2] Owe Axelsson and Maya Neytcheva. Eigenvalue estimates for preconditioned saddle point matrices. Numerical Linear Algebra with Applications, 13(4):339–360, 2006.
  • [3] Zhong-Zhi Bai and Apostolos Hadjidimos. Optimization of extrapolated Cayley transform with non-Hermitian positive definite matrix. Linear Algebra and Its Applications, 463:322–339, 2014.
  • [4] Zhong-Zhi Bai and Kang-Ya Lu. Optimal rotated block-diagonal preconditioning for discretized optimal control problems constrained with fractional time-dependent diffusive equations. Applied Numerical Mathematics, 163:126–146, 2021.
  • [5] Bernhard Beckermann, Sergei A. Goreinov, and Eugene E. Tyrtyshnikov. Some remarks on the Elman estimate for GMRES. SIAM Journal on Matrix Analysis and Applications, 27(3):772–778, 2005.
  • [6] Daniele Bertaccini and Michael K. Ng. Block $\omega$-circulant preconditioners for the systems of differential equations. Calcolo, 40(2):71–90, 2003.
  • [7] Dario A. Bini, Guy Latouche, and Beatrice Meini. Numerical Methods for Structured Markov Chains. Oxford University Press, New York, 2005.
  • [8] Arne Bouillon, Giovanni Samaey, and Karl Meerbergen. On generalized preconditioners for time-parallel parabolic optimal control. arXiv preprint, arXiv:2302.06406, 2024.
  • [9] Alfio Borzì and Volker Schulz. Computational Optimization of Systems Governed by Partial Differential Equations. Society for Industrial and Applied Mathematics, 2011.
  • [10] Po Yin Fung and Sean Hon. Block $\omega$-circulant preconditioners for parabolic optimal control problems. arXiv preprint, arXiv:2406.00952, 2024.
  • [11] Yunhui He and Jun Liu. A Vanka-type multigrid solver for complex-shifted Laplacian systems from diagonalization-based parallel-in-time algorithms. Applied Mathematics Letters, 132:108125, 2022.
  • [12] Michael Hinze, Rene Pinnau, Michael Ulbrich, and Stefan Ulbrich. Optimization with PDE Constraints (Vol. 23). Springer Science & Business Media, 2008.
  • [13] Sean Hon, Jiamei Dong, and Stefano Serra-Capizzano. A preconditioned MINRES method for optimal control of wave equations and its asymptotic spectral distribution theory. SIAM Journal on Matrix Analysis and Applications, 44(4):1477–1509, 2023.
  • [14] Santolo Leveque and John W. Pearson. Fast iterative solver for the optimal control of time-dependent PDEs with Crank–Nicolson discretization in time. Numerical Linear Algebra with Applications, 29:e2419, 2022.
  • [15] Congcong Li, Xuelei Lin, Sean Hon, and Shu-Lin Wu. A preconditioned MINRES method for block lower triangular Toeplitz systems. arXiv preprint, arXiv:2307.07749, 2023.
  • [16] Xuelei Lin and Sean Hon. A block $\alpha$-circulant based preconditioned MINRES method for wave equations. arXiv preprint, arXiv:2306.03574, 2024.
  • [17] Xue-lei Lin and Michael Ng. An all-at-once preconditioner for evolutionary partial differential equations. SIAM Journal on Scientific Computing, 43(4):A2766–A2784, 2021.
  • [18] Xue-lei Lin and Shu-Lin Wu. A parallel-in-time preconditioner for Crank–Nicolson discretization of a parabolic optimal control problem. Journal of Computational and Applied Mathematics, 116106, 2024.
  • [19] Jacques Louis Lions. Optimal Control of Systems Governed by Partial Differential Equations. Springer-Verlag, Berlin, Heidelberg, 1971.
  • [20] Jun Liu and Shu-Lin Wu. A fast block $\alpha$-circulant preconditioner for all-at-once systems from wave equations. SIAM Journal on Matrix Analysis and Applications, 41(4):1912–1943, 2021.
  • [21] Martin J. Gander, Jun Liu, Shu-Lin Wu, Xiaoqiang Yue, and Tao Zhou. ParaDiag: parallel-in-time algorithms based on the diagonalization technique. arXiv preprint, arXiv:2005.09158, 2020.
  • [22] Eleanor McDonald, Sean Hon, Jennifer Pestana, and Andy Wathen. Preconditioning for nonsymmetry and time-dependence. In Domain Decomposition Methods in Science and Engineering XXIII, pages 81–91, Springer, 2017.
  • [23] Eleanor McDonald, Jennifer Pestana, and Andy Wathen. Preconditioning and iterative solution of all-at-once systems for evolutionary partial differential equations. SIAM Journal on Scientific Computing, 40(2):A1012–A1033, 2018. https://doi.org/10.1137/16m1062016
  • [24] Malcolm F. Murphy, Gene H. Golub, and Andrew J. Wathen. A note on preconditioning for indefinite linear systems. SIAM Journal on Scientific Computing, 21(6):1969–1972, 2000.
  • [25] John W. Pearson and Andrew J. Wathen. A new approximation of the Schur complement in preconditioners for PDE-constrained optimization. Numerical Linear Algebra with Applications, 19(5):816–829, 2012.
  • [26] John W. Pearson, Martin Stoll, and Andrew J. Wathen. Regularization-robust preconditioners for time-dependent PDE-constrained optimization problems. SIAM Journal on Matrix Analysis and Applications, 33:1126–1152, 2012.
  • [27] Ferdi Tröltzsch. Optimal Control of Partial Differential Equations: Theory, Methods, and Applications (Vol. 112). American Mathematical Society, 2010.
  • [28] Andrew J. Wathen. Preconditioning. Acta Numerica, 24:329–376, 2015.
  • [29] Shu-Lin Wu and Jun Liu. A parallel-in-time block-circulant preconditioner for optimal control of wave equations. SIAM Journal on Scientific Computing, 42(3):A1510–A1540, 2020.
  • [30] Shu-Lin Wu, Zhiyong Wang, and Tao Zhou. PinT preconditioner for forward-backward evolutionary equations. SIAM Journal on Matrix Analysis and Applications, 44(4):1771–1798, 2023.
  • [31] Shu-Lin Wu and Tao Zhou. Diagonalization-based parallel-in-time algorithms for heat PDE-constrained optimization problems. ESAIM: Control, Optimisation and Calculus of Variations, 26:88, 2020.