
Convergence analysis of a two-grid method for nonsymmetric positive definite problems

Xuefeng Xu, Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA. [email protected]
Abstract.

Multigrid is a powerful solver for large-scale linear systems arising from discretized partial differential equations. The convergence theory of multigrid methods for symmetric positive definite problems has been well developed over the past decades, while, for nonsymmetric problems, such theory is still not mature. As a foundation of multigrid analysis, two-grid convergence theory plays an important role in motivating multigrid algorithms. Regarding two-grid methods for nonsymmetric problems, most previous works focus on the spectral radius of the iteration matrix or rely on convergence measures that are typically difficult to compute in practice. Moreover, the existing results are confined to two-grid methods in which the coarse-grid system is solved exactly. In this paper, we analyze the convergence of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from the discretizations of convection-diffusion equations). In the case of an exact coarse solver, we establish an elegant identity for characterizing the two-grid convergence factor, which is measured by a smoother-induced norm. The identity can be conveniently used to derive a class of optimal restriction operators and to analyze how the convergence factor is influenced by restriction. More generally, we present some convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered.

Key words and phrases:
multigrid, two-grid methods, nonsymmetric positive definite problems, convergence theory, inexact coarse solvers
2010 Mathematics Subject Classification:
Primary 65F08, 65F10, 65N15, 65N55; Secondary 15A18, 65N75

1. Introduction

Multigrid is one of the most efficient numerical methods for solving large-scale systems of linear equations that arise from discretized partial differential equations. It has been shown to be a powerful solver, with linear or near-linear computational complexity, for a large class of linear systems; see, e.g., [9, 3, 21, 22]. The fundamental module of multigrid methods is a two-grid scheme, which consists of two complementary error-reduction processes: smoothing and coarse-grid correction. The smoothing process is typically a simple iterative method, such as the (weighted) Jacobi and Gauss–Seidel methods. A limitation of these classical methods is that, in general, they are efficient only at eliminating high-frequency (i.e., oscillatory) error. The remaining low-frequency (i.e., smooth) error is then further reduced by the coarse-grid correction, which is essentially a projection-type method. These two processes are applied iteratively in multigrid algorithms until a desired residual tolerance is achieved.

Much of the multigrid analysis in the literature relies on the problem matrix, $A$, being symmetric positive definite (SPD). For such problems, the convergence theory of multigrid methods has been well developed; see, e.g., [24, 4, 5, 22, 25, 27, 28]. When $A$ is SPD, the convergence analysis of a symmetric two-grid method (i.e., one in which the pre- and postsmoothing steps are performed in a symmetric way) amounts to analyzing the eigenvalues of its iteration matrix (also called the error propagation matrix), because the $A$-norm of the iteration matrix, called the convergence factor, coincides with its spectral radius. In the case of a Galerkin coarse solver, an identity has been established for the convergence factor of two-grid methods [24, 5, 29], which has been widely used to analyze two-grid methods; see, e.g., [5, 25, 1, 26]. Nevertheless, it is often too costly to solve the Galerkin coarse-grid system exactly, especially when its size is relatively large. Recently, Xu and Zhang [27, 28] extended the two-grid convergence theory to the case of non-Galerkin coarse solvers. They proposed two theoretical frameworks for analyzing the convergence of inexact two-grid methods, which have been successfully applied to the convergence analysis of multigrid methods.

Unlike the SPD case, the convergence theory of multigrid methods for nonsymmetric problems is still not mature. Most existing algorithms are based largely on heuristic or incomplete theory. In general, a nonsymmetric matrix fails to induce a norm, and the error propagation matrix of the coarse-grid correction is an oblique projection. Thus, two fundamental questions need to be answered when studying two-grid methods for nonsymmetric problems. The first one is how to choose a suitable convergence measure. Ideally, the selected measure should allow one to derive an identity characterizing the convergence of two-grid methods, as in the SPD case. The simplest choice is the spectral radius of the iteration matrix [17, 23, 15, 6]. However, it describes only the asymptotic convergence and may be a completely misleading indicator [8, 14, 18], especially when only a few iterations are carried out. Some other measures (e.g., the $\sqrt{A^{T}A}$- and $\sqrt{AA^{T}}$-norms) and the associated convergence estimates can be found in [2, 12, 14, 13, 16, 18] and the references therein.

The second question is how to build the intergrid transfer operators $R\in\mathbb{R}^{n_{\rm c}\times n}$ (restriction) and $P\in\mathbb{R}^{n\times n_{\rm c}}$ (prolongation) such that the coarse-grid correction will not increase error [16], where $n_{\rm c}$ is the number of coarse variables and $n$ is the number of fine ones ($A\in\mathbb{R}^{n\times n}$). The error propagation matrix of the correction process takes the form $I-P(RAP)^{-1}RA$, provided that $RAP\in\mathbb{R}^{n_{\rm c}\times n_{\rm c}}$ is nonsingular. Note that $I-P(RAP)^{-1}RA$ is a projection (i.e., an idempotent operator). For any operator norm $\|\cdot\|$, it holds that

\|I-P(RAP)^{-1}RA\|\geq 1.

If $\|I-P(RAP)^{-1}RA\|>1$, then the correction process may increase error, which is contrary to the core idea of multigrid methods. Indeed, the coarse-grid correction is key to the fast convergence of multigrid algorithms. Therefore, for a fixed $R$, one needs to design a prolongation $P$ such that $\|I-P(RAP)^{-1}RA\|=1$ in order to ensure that the correction process will not increase error.

In this paper, we are concerned with the convergence theory of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from discretized convection-diffusion equations). In the two-grid method, an SPD matrix, $M\in\mathbb{R}^{n\times n}$, is used as a smoother, and the induced $M$-norm is used for measuring two-grid convergence. In addition, $M^{-1}A^{T}R^{T}$ is used as a prolongation operator in order to ensure that the coarse-grid correction will not increase error. Such a prolongation can be viewed as a multiplicative perturbation of the classical choice $R^{T}$ (since $M^{-1}A^{T}\approx I$). Unlike the existing estimates, we here establish a succinct identity for the convergence factor of the two-grid method. It can be conveniently used to derive a class of optimal restriction operators and to analyze how the convergence factor is influenced by the row space of $R$. Compared with the exact case, the convergence theory of inexact two-grid methods is of more practical significance, while, for nonsymmetric problems, such theory is scarce in the literature. Motivated by this observation, we present some convergence estimates for an inexact variant of the two-grid method (both linear and nonlinear coarse solvers are considered), which lay a theoretical foundation for developing new algorithms.

The rest of this paper is organized as follows. In Section 2, we present a two-grid method for solving nonsymmetric positive definite linear systems. In Section 3, we establish an identity for the convergence factor of the two-grid method, followed by two applications of the identity. In Section 4, we propose and analyze an inexact variant of the two-grid method. In Section 5, we give some concluding remarks.

2. Preliminaries

For convenience, we first list some notation used in the subsequent discussions.

  • $I_{n}$ denotes the $n\times n$ identity matrix (or $I$ when its size is clear from context).

  • $\lambda_{\min}(\cdot)$, $\lambda_{\min}^{+}(\cdot)$, and $\lambda_{\max}(\cdot)$ stand for the smallest eigenvalue, the smallest positive eigenvalue, and the largest eigenvalue of a matrix, respectively.

  • $\lambda_{i}(\cdot)$ denotes the $i$th smallest eigenvalue of a matrix.

  • $\lambda(\cdot)$ denotes the spectrum of a matrix.

  • $\|\cdot\|_{2}$ denotes the spectral norm of a matrix.

  • $\|\cdot\|_{M}$ denotes the norm induced by an SPD matrix $M\in\mathbb{R}^{n\times n}$: if $\mathbf{v}\in\mathbb{R}^{n}$, then $\|\mathbf{v}\|_{M}=\sqrt{\mathbf{v}^{T}M\mathbf{v}}$; if $X\in\mathbb{R}^{n\times n}$, then $\|X\|_{M}=\max_{\mathbf{v}\in\mathbb{R}^{n}\backslash\{0\}}\frac{\|X\mathbf{v}\|_{M}}{\|\mathbf{v}\|_{M}}$.

  • $\mathbb{E}[\cdot]$ denotes the expectation of a random variable.

Consider solving the linear system

(2.1) A\mathbf{u}=\mathbf{f},

where $A\in\mathbb{R}^{n\times n}$ is nonsymmetric but positive definite (namely, $\mathbf{v}^{T}A\mathbf{v}>0$ for all $\mathbf{v}\in\mathbb{R}^{n}\backslash\{0\}$, which implies that $A$ is nonsingular), $\mathbf{u}\in\mathbb{R}^{n}$ is the solution vector, and $\mathbf{f}\in\mathbb{R}^{n}$ is the right-hand side. A natural splitting of $A$ is given by

A=A_{\rm sym}+A_{\rm skew},

where $A_{\rm sym}=\frac{1}{2}(A+A^{T})$ and $A_{\rm skew}=\frac{1}{2}(A-A^{T})$ are the symmetric and skew-symmetric parts of $A$, respectively. It is easy to check that $A$ is positive definite if and only if the symmetric matrix $A_{\rm sym}$ is positive definite.
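Because $\mathbf{v}^{T}A_{\rm skew}\mathbf{v}=0$ for every real vector $\mathbf{v}$, the quadratic form of $A$ coincides with that of $A_{\rm sym}$, which gives the equivalence just stated. The following minimal NumPy sketch (the random test matrix and all variable names are purely illustrative assumptions, not part of the paper) checks this numerically:

import numpy as np

rng = np.random.default_rng(0)
n = 8
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)      # nonsymmetric but positive definite

A_sym = 0.5 * (A + A.T)                      # symmetric part
A_skew = 0.5 * (A - A.T)                     # skew-symmetric part

v = rng.standard_normal(n)
print(np.isclose(v @ A @ v, v @ A_sym @ v))  # True: the skew part contributes nothing
print(np.linalg.eigvalsh(A_sym).min() > 0)   # True, so A is positive definite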

Several basic assumptions involved in the analysis of two-grid methods are summarized as follows.

  • Let $M\in\mathbb{R}^{n\times n}$ be an SPD smoother such that $\|I-M^{-1}A\|_{M}\leq 1$.

  • Let $R\in\mathbb{R}^{n_{\rm c}\times n}$ be a restriction matrix of rank $n_{\rm c}$, where $n_{\rm c}\,(<n)$ is the number of coarse variables.

  • Let $P\in\mathbb{R}^{n\times n_{\rm c}}$ be a prolongation (or interpolation) matrix of rank $n_{\rm c}$.

  • Assume that the coarse-grid matrix $A_{\rm c}:=RAP\in\mathbb{R}^{n_{\rm c}\times n_{\rm c}}$ is nonsingular.

Remark 2.1.

The assumption $\|I-M^{-1}A\|_{M}\leq 1$ is equivalent to

\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big)\leq 1,

that is,

\lambda_{\min}\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)\geq 0,

where

(2.2) \widetilde{A}:=A+A^{T}-AM^{-1}A^{T}.

As a result, $\|I-M^{-1}A\|_{M}\leq 1$ if and only if the symmetric matrix $\widetilde{A}$ is positive semidefinite, which is a very weak assumption. If $\widetilde{A}$ fails to be positive semidefinite, one only needs to modify $M$ properly, e.g., replace $M$ by $\omega M$ for a suitable parameter $\omega>0$. Indeed, there always exists an SPD matrix $M$ such that $\|I-M^{-1}A\|_{M}\leq 1$.
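To make the remark concrete, here is a small NumPy sketch (the test matrix and the Jacobi-type choice of $M$ are illustrative assumptions, not prescriptions of the paper) that tests whether $\widetilde{A}$ is positive semidefinite and, if not, rescales $M$ by $\omega$; since $\widetilde{A}\to A+A^{T}$ as $\omega\to\infty$, the loop always terminates:

import numpy as np

rng = np.random.default_rng(1)
n = 8
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)        # nonsymmetric positive definite

def A_tilde(A, M):
    # A_tilde = A + A^T - A M^{-1} A^T, cf. (2.2)
    return A + A.T - A @ np.linalg.solve(M, A.T)

M = np.diag(np.diag(A))                        # Jacobi-type SPD candidate smoother
omega = 1.0
while np.linalg.eigvalsh(A_tilde(A, omega * M)).min() < 0:
    omega *= 2.0                               # enlarge omega until A_tilde is PSD
print(omega, np.linalg.eigvalsh(A_tilde(A, omega * M)).min() >= 0)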

With the above assumptions, an exact two-grid method for solving (2.1) can be described by Algorithm 1, in which $A_{\rm c}^{-1}$ is used as a coarse solver.

Algorithm 1  Exact two-grid method.
1: Smoothing: $\mathbf{u}^{(1)}\leftarrow\mathbf{u}^{(0)}+M^{-1}\big(\mathbf{f}-A\mathbf{u}^{(0)}\big)$ \triangleright $\mathbf{u}^{(0)}\in\mathbb{R}^{n}$ is an initial guess
2: Restriction: $\mathbf{r}_{\rm c}\leftarrow R\big(\mathbf{f}-A\mathbf{u}^{(1)}\big)$
3: Coarse-grid correction: $\mathbf{e}_{\rm c}\leftarrow A_{\rm c}^{-1}\mathbf{r}_{\rm c}$
4: Prolongation: $\mathbf{u}_{\rm TG}\leftarrow\mathbf{u}^{(1)}+P\mathbf{e}_{\rm c}$
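A runnable NumPy sketch of Algorithm 1 is given below (the random test problem, the choice $M=\omega I$ with $\omega$ large enough that $\widetilde{A}$ is SPD, and the injection-type restriction are all illustrative assumptions):

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(2)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)              # nonsymmetric positive definite
# M = omega*I with omega chosen so that A_tilde in (2.2) is SPD
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)

R = np.hstack([np.zeros((nc, n - nc)), np.eye(nc)])  # injection-type restriction
P = np.linalg.solve(M, A.T @ R.T)                    # P_star = M^{-1} A^T R^T, cf. (2.6)
Ac = R @ A @ P                                       # coarse-grid matrix A_c = R A P

u_exact = rng.standard_normal(n)
f = A @ u_exact
u = np.zeros(n)                                      # initial guess u^(0)
for _ in range(25):                                  # repeat the two-grid cycle
    u = u + np.linalg.solve(M, f - A @ u)            # step 1: smoothing
    rc = R @ (f - A @ u)                             # step 2: restriction
    ec = np.linalg.solve(Ac, rc)                     # step 3: exact coarse solve
    u = u + P @ ec                                   # step 4: prolongation
print(np.linalg.norm(u_exact - u))                   # the error decays geometrically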
Remark 2.2.

When $A$ is SPD, the pre- and postsmoothing steps are often performed in a symmetric way. The resulting two-grid iteration matrix is symmetric with respect to the $A$-inner product, in which case the $A$-norm of the two-grid iteration matrix coincides with its spectral radius. However, for nonsymmetric problems, the two-grid iteration matrix no longer has such symmetry, even if a postsmoothing step is added to Algorithm 1.

In fact, Algorithm 1 involves the following two error-reduction processes:

(2.3a) \mathbf{u}-\mathbf{u}^{(1)}=(I-M^{-1}A)\big(\mathbf{u}-\mathbf{u}^{(0)}\big),
(2.3b) \mathbf{u}-\mathbf{u}_{\rm TG}=(I-\varPi_{A})\big(\mathbf{u}-\mathbf{u}^{(1)}\big),

where

(2.4) \varPi_{A}:=PA_{\rm c}^{-1}RA.

Then

\big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}\leq\|I-M^{-1}A\|_{M}\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M},
\|\mathbf{u}-\mathbf{u}_{\rm TG}\|_{M}\leq\|I-\varPi_{A}\|_{M}\big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}.

Recall that $\|I-M^{-1}A\|_{M}\leq 1$. Hence, the initial error $\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}$ will not be amplified by (2.3a).

It can be seen from (2.4) that $\varPi_{A}$ is a projection along (or parallel to) $\operatorname{null}(RA)$ onto $\operatorname{range}(P)$, which leads to

\|I-\varPi_{A}\|_{M}\geq 1,

with equality if and only if $\varPi_{A}=M^{-1}\varPi_{A}^{T}M$. To ensure that the error $\big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}$ will not be amplified by (2.3b), it suffices to choose a prolongation $P$ such that

(2.5) \varPi_{A}=M^{-1}A^{T}R^{T}(RAP)^{-T}P^{T}M.

Observe that $\varPi_{A}$ is a projection along $\operatorname{null}(RA)$ onto $\operatorname{range}(P)$, while the right-hand side of (2.5) is a projection along $\operatorname{null}(P^{T}M)$ onto $\operatorname{range}(M^{-1}A^{T}R^{T})$. Hence, (2.5) entails that

\operatorname{range}(P)=\operatorname{range}(M^{-1}A^{T}R^{T})\quad\text{and}\quad\operatorname{null}(P^{T}M)=\operatorname{null}(RA),

which can be satisfied by

P=M^{-1}A^{T}R^{T}W

for any $n_{\rm c}\times n_{\rm c}$ nonsingular matrix $W$. Taking $W=I_{n_{\rm c}}$ yields the prolongation

(2.6) P_{\star}\equiv M^{-1}A^{T}R^{T}.

For any coarse vector $\mathbf{v}_{\rm c}\in\mathbb{R}^{n_{\rm c}}$, $P_{\star}\mathbf{v}_{\rm c}$ can be easily computed, because the action of $M^{-1}$ is always available. Furthermore, it is possible that $P_{\star}$ has a sparse structure. For instance, if $A$ is sparse, $M=\omega\operatorname{diag}(A)$, and $R=\big(0\ \ I_{n_{\rm c}}\big)$, then $P_{\star}$ is sparse.
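The defining property of $P_{\star}$ can be checked numerically: with $P=P_{\star}$, the projection $\varPi_{A}$ is $M$-orthogonal and $\|I-\varPi_{A}\|_{M}=1$, whereas the classical choice $P=R^{T}$ typically yields a norm larger than one. A sketch under the same illustrative assumptions as before:

import numpy as np
import scipy.linalg as sla

def M_norm(X, M):
    # ||X||_M = ||M^{1/2} X M^{-1/2}||_2 for SPD M
    Mh = sla.sqrtm(M).real
    return np.linalg.norm(Mh @ X @ np.linalg.inv(Mh), 2)

rng = np.random.default_rng(3)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)
R = rng.standard_normal((nc, n))                # any full-rank restriction

for P in (np.linalg.solve(M, A.T @ R.T), R.T):  # P_star from (2.6) vs. classical R^T
    Pi = P @ np.linalg.solve(R @ A @ P, R @ A)  # Pi_A from (2.4)
    print(M_norm(np.eye(n) - Pi, M))            # = 1 for P_star, >= 1 in general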

Remark 2.3.

For SPD problems, it is advisable to take $P=R^{T}$, which is the most commonly used prolongation in both geometric and algebraic multigrid methods. Then

\|I-\varPi_{A}\|_{A}=\big\|I-R^{T}(RAR^{T})^{-1}RA\big\|_{A}=1,

which shows that the error $\big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{A}$ will not be amplified by (2.3b). However, for nonsymmetric problems, this goal cannot in general be achieved by the choice $R^{T}$. According to the above discussion, it is suggested to take $P=P_{\star}$, which can be viewed as a multiplicative perturbation of $R^{T}$. Moreover,

\frac{\|P_{\star}-R^{T}\|_{M}}{\|R^{T}\|_{M}}\leq\|I-M^{-1}A^{T}\|_{M}=\|I-M^{-1}A\|_{M}\leq 1.

Combining (2.3a) and (2.3b), we obtain

\mathbf{u}-\mathbf{u}_{\rm TG}=E_{\rm TG}\big(\mathbf{u}-\mathbf{u}^{(0)}\big),

where

(2.7) E_{\rm TG}=(I-\varPi_{A})(I-M^{-1}A),

which is called the iteration matrix (or error propagation matrix) of Algorithm 1. Then, it holds that

\|\mathbf{u}-\mathbf{u}_{\rm TG}\|_{M}\leq\|E_{\rm TG}\|_{M}\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}

with

(2.8) \|E_{\rm TG}\|_{M}=\big\|\big(I-M^{\frac{1}{2}}\varPi_{A}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big\|_{2}.

The quantity (2.8) is referred to as the $M$-convergence factor of Algorithm 1, which will be analyzed in the next section.

3. Convergence analysis of Algorithm 1

In this section, we analyze the convergence of Algorithm 1 with prolongation $P_{\star}$. A succinct identity for characterizing the convergence factor $\|E_{\rm TG}\|_{M}$ is established, followed by discussions on how the identity can be used to derive a class of optimal restriction operators and to analyze the influence of $\operatorname{range}(R^{T})$ on $\|E_{\rm TG}\|_{M}$.

We start with a technical lemma, which provides a necessary and sufficient condition for $\|E_{\rm TG}\|_{M}<1$ (see the proof of Theorem 3.3 for details).

Lemma 3.1.

Let $\widetilde{A}$ and $\varPi_{A}$ be defined by (2.2) and (2.4), respectively. Then

(3.1) \operatorname{null}(\widetilde{A})\cap\operatorname{null}(RA)=\{0\}

if and only if $\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)=n-n_{\rm c}$.

Proof.

Observe that $\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)=n-n_{\rm c}$ is equivalent to

(3.2) \operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)=\operatorname{null}(I-\varPi_{A}).

Thus, it suffices to show that (3.1) and (3.2) are equivalent.

“(3.1)$\Rightarrow$(3.2)”: For any $\mathbf{x}\in\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)$, we have

\widetilde{A}(I-\varPi_{A})\mathbf{x}=0,

which implies that $(I-\varPi_{A})\mathbf{x}\in\operatorname{null}(\widetilde{A})$. On the other hand, $(I-\varPi_{A})\mathbf{x}\in\operatorname{null}(RA)$. If (3.1) holds, then $(I-\varPi_{A})\mathbf{x}=0$, i.e., $\mathbf{x}\in\operatorname{null}(I-\varPi_{A})$. The arbitrariness of $\mathbf{x}$ yields

\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)\subseteq\operatorname{null}(I-\varPi_{A}),

which, together with the fact $\operatorname{null}(I-\varPi_{A})\subseteq\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)$, leads to (3.2).

“(3.2)$\Rightarrow$(3.1)”: Assume that the relation (3.1) does not hold, i.e., there exists a nonzero vector $\mathbf{y}$ in $\operatorname{null}(\widetilde{A})\cap\operatorname{null}(RA)$. Since

\operatorname{null}(RA)=\operatorname{range}(I-\varPi_{A}),

$\mathbf{y}$ can be expressed as $\mathbf{y}=(I-\varPi_{A})\mathbf{z}$ for some $\mathbf{z}\in\mathbb{R}^{n}\backslash\operatorname{range}(P)$, which, combined with the fact $\mathbf{y}\in\operatorname{null}(\widetilde{A})=\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}\big)$, yields

\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\mathbf{z}=0.

Consequently, $\mathbf{z}$ lies in $\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)$ but not in $\operatorname{null}(I-\varPi_{A})$, which contradicts (3.2). This completes the proof. ∎

Remark 3.2.

Obviously, the condition (3.1) is satisfied if $\widetilde{A}$ is SPD, or, equivalently, $\|I-M^{-1}A\|_{M}<1$. In addition, (3.1) implies that $\operatorname{rank}(\widetilde{A})\geq n-n_{\rm c}$.

Using (2.8) and Lemma 3.1, we can derive the following identity.

Theorem 3.3.

Under the assumptions of Algorithm 1 and the condition (3.1), the $M$-convergence factor of Algorithm 1 with prolongation (2.6) can be characterized as

(3.3) \|E_{\rm TG}\|_{M}=\sqrt{1-\sigma_{\rm TG}}

with

(3.4) \sigma_{\rm TG}=\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}(I-\varPi_{A})\big),

where $\widetilde{A}$ and $\varPi_{A}$ are defined by (2.2) and (2.4), respectively.

Proof.

If $P=P_{\star}$, then

A_{\rm c}=RAP_{\star}=P_{\star}^{T}MP_{\star}

and

\varPi_{A}=P_{\star}A_{\rm c}^{-1}RA=P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M.

Let

\varPi=M^{\frac{1}{2}}\varPi_{A}M^{-\frac{1}{2}}.

Then

(3.5) \varPi=M^{\frac{1}{2}}P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M^{\frac{1}{2}}.

Since $\varPi^{T}=\varPi=\varPi^{2}$ and $\operatorname{rank}(\varPi)=n_{\rm c}$, there exists an $n\times n$ orthogonal matrix $Q$ such that

(3.6) Q^{T}\varPi Q=\begin{pmatrix}I_{n_{\rm c}}&0\\0&0\end{pmatrix}.

By (2.8), we have

\|E_{\rm TG}\|_{M}^{2}=\big\|(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big\|_{2}^{2}
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big)
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)(I-\varPi)\big)
=1-\lambda_{\min}\big(\varPi+M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big).

Let

(3.7) Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q=\begin{pmatrix}X_{1}&X_{2}\\X_{2}^{T}&X_{3}\end{pmatrix},

where $X_{1}\in\mathbb{R}^{n_{\rm c}\times n_{\rm c}}$, $X_{2}\in\mathbb{R}^{n_{\rm c}\times(n-n_{\rm c})}$, and $X_{3}\in\mathbb{R}^{(n-n_{\rm c})\times(n-n_{\rm c})}$. The positive semidefiniteness of $Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q$ implies that of $X_{3}$. From the relation

I-Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q=Q^{T}\big(I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)Q=Q^{T}\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)Q,

we deduce that $I_{n-n_{\rm c}}-X_{3}$ is symmetric positive semidefinite (SPSD). It follows that $\lambda(X_{3})\subset[0,1]$. Direct computation yields

\varPi+M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)=Q\begin{pmatrix}I_{n_{\rm c}}&X_{2}\\0&X_{3}\end{pmatrix}Q^{T}.

Then

\|E_{\rm TG}\|_{M}^{2}=1-\lambda_{\min}\big(\varPi+M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big)=1-\min\big\{1,\,\lambda_{\min}(X_{3})\big\}=1-\lambda_{\min}(X_{3}).

Since

(I-\varPi)M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)=Q\begin{pmatrix}0&0\\0&X_{3}\end{pmatrix}Q^{T},

we obtain

\operatorname{rank}(X_{3})=\operatorname{rank}\big((I-\varPi)M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}M^{-\frac{1}{2}}(I-\varPi)\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)=n-n_{\rm c},

where we have used Lemma 3.1 together with the condition (3.1). This means that $X_{3}$ is SPD, i.e., $\lambda_{\min}(X_{3})>0$. Due to

(3.8) M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)=Q\begin{pmatrix}0&X_{2}\\0&X_{3}\end{pmatrix}Q^{T},

it follows that

\lambda_{\min}(X_{3})=\lambda_{\min}^{+}\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big).

Thus,

\|E_{\rm TG}\|_{M}^{2}=1-\lambda_{\min}^{+}\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big)=1-\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)M^{\frac{1}{2}}\big)=1-\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}(I-\varPi_{A})\big),

which leads to the identity (3.3). ∎
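The identity (3.3)–(3.4) is easy to verify numerically. In the following sketch (same illustrative random setup as before), the left-hand side is computed directly from (2.8) and the right-hand side from the smallest positive eigenvalue in (3.4); the two printed values agree up to rounding:

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(4)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)                           # SPD smoother; A_tilde is SPD here
R = rng.standard_normal((nc, n))
P = np.linalg.solve(M, A.T @ R.T)               # P_star
Pi = P @ np.linalg.solve(P.T @ M @ P, P.T @ M)  # Pi_A for P = P_star
At = A + A.T - A @ np.linalg.solve(M, A.T)      # A_tilde, (2.2)

Mh = sla.sqrtm(M).real
E = (np.eye(n) - Pi) @ (np.eye(n) - np.linalg.solve(M, A))       # E_TG, (2.7)
lhs = np.linalg.norm(Mh @ E @ np.linalg.inv(Mh), 2)              # ||E_TG||_M, (2.8)

ev = np.linalg.eigvals(np.linalg.solve(M, At) @ (np.eye(n) - Pi)).real
sigma = min(e for e in ev if e > 1e-10)                          # sigma_TG, (3.4)
print(lhs, np.sqrt(1.0 - sigma))                                 # the two values agree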

The identity (3.3) provides a straightforward approach to analyzing the convergence properties of Algorithm 1. Of particular interest is a restriction operator that minimizes the convergence factor $\|E_{\rm TG}\|_{M}$ when the smoother $M$ is preselected.

The following inequality, known as the Poincaré separation theorem (see, e.g., [11, Corollary 4.3.37]), will be used to analyze the optimal restriction.

Lemma 3.4.

Let $H\in\mathbb{C}^{n\times n}$ be Hermitian, and let $\{\mathbf{q}_{k}\}_{k=1}^{m}\subset\mathbb{C}^{n}$ $(1\leq m\leq n)$ be a set of orthonormal vectors. Then, for any $i=1,\ldots,m$, it holds that

\lambda_{i}(H)\leq\lambda_{i}(\check{H})\leq\lambda_{i+n-m}(H),

where $\check{H}=\big(\mathbf{q}_{i}^{\ast}H\mathbf{q}_{j}\big)_{i,j=1}^{m}\in\mathbb{C}^{m\times m}$ and $\mathbf{q}_{i}^{\ast}$ denotes the conjugate transpose of $\mathbf{q}_{i}$.

Based on (3.3) and Lemma 3.4, we can establish the following optimal restriction theory.

Theorem 3.5.

Let $\widetilde{A}$ be defined by (2.2), and let $\{(\mu_{i},\mathbf{v}_{i})\}_{i=1}^{n}$ be the eigenpairs of the generalized eigenvalue problem

\widetilde{A}\mathbf{v}=\mu M\mathbf{v},

where

\mu_{1}\leq\mu_{2}\leq\cdots\leq\mu_{n}\quad\text{and}\quad\mathbf{v}_{i}^{T}M\mathbf{v}_{j}=\begin{cases}1&\text{if }i=j,\\0&\text{if }i\neq j.\end{cases}

Under the assumptions of Theorem 3.3, it holds that

\|E_{\rm TG}\|_{M}\geq\sqrt{1-\mu_{n_{\rm c}+1}},

with equality if $\operatorname{null}(RA)=\operatorname{span}\{\mathbf{v}_{n_{\rm c}+1},\ldots,\mathbf{v}_{n}\}$.

Proof.

Observe that

\lambda\big(M^{-1}\widetilde{A}\big)=\lambda\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big).

Since $M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}$ and $I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}$ are SPSD, it follows that

0=\mu_{1}=\cdots=\mu_{n-r}<\mu_{n-r+1}\leq\cdots\leq\mu_{n}\leq 1,

where $r=\operatorname{rank}(\widetilde{A})\geq n-n_{\rm c}$.

Let

V=(\mathbf{v}_{1},\ldots,\mathbf{v}_{n})\quad\text{and}\quad U_{1}=V^{-1}P_{\star}\big(P_{\star}^{T}V^{-T}V^{-1}P_{\star}\big)^{-\frac{1}{2}}.

It is easy to check that $V$ is an $n\times n$ nonsingular matrix with $V^{-1}=V^{T}M$ and that $U_{1}$ is an $n\times n_{\rm c}$ matrix with orthonormal columns (i.e., $U_{1}^{T}U_{1}=I_{n_{\rm c}}$). Let $U_{2}$ be an $n\times(n-n_{\rm c})$ matrix such that $(U_{1}\ U_{2})\in\mathbb{R}^{n\times n}$ is orthogonal, i.e.,

(U_{1}\ U_{2})^{-1}=(U_{1}\ U_{2})^{T}.

Then

M^{-1}\widetilde{A}(I-\varPi_{A})=M^{-1}\widetilde{A}\big(I-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M\big)
=M^{-1}\widetilde{A}\big(I-VU_{1}(U_{1}^{T}V^{T}MVU_{1})^{-1}U_{1}^{T}V^{T}M\big)
=M^{-1}\widetilde{A}\big(I-VU_{1}U_{1}^{T}V^{T}M\big)
=M^{-1}\widetilde{A}\big(I-VU_{1}U_{1}^{T}V^{-1}\big)
=M^{-1}\widetilde{A}VU_{2}U_{2}^{T}V^{-1}
=V\Lambda U_{2}U_{2}^{T}V^{-1},

where $\Lambda=\operatorname{diag}\big(0,\ldots,0,\mu_{n-r+1},\ldots,\mu_{n}\big)\in\mathbb{R}^{n\times n}$. Hence,

\lambda\big(\Lambda U_{2}U_{2}^{T}\big)=\lambda\big(M^{-1}\widetilde{A}(I-\varPi_{A})\big)=\lambda\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}M^{\frac{1}{2}}(I-\varPi_{A})M^{-\frac{1}{2}}\big)=\lambda\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big),

which, together with (3.8), yields that $\Lambda U_{2}U_{2}^{T}$ has $n-n_{\rm c}$ positive eigenvalues and the eigenvalue $0$ with multiplicity $n_{\rm c}$. Due to

\begin{pmatrix}U_{1}^{T}\\U_{2}^{T}\end{pmatrix}\Lambda U_{2}U_{2}^{T}\big(U_{1}\ U_{2}\big)=\begin{pmatrix}0&U_{1}^{T}\Lambda U_{2}\\0&U_{2}^{T}\Lambda U_{2}\end{pmatrix},

it follows that $U_{2}^{T}\Lambda U_{2}$ is SPD. Using (3.3) and Lemma 3.4, we obtain

\|E_{\rm TG}\|_{M}=\sqrt{1-\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}(I-\varPi_{A})\big)}=\sqrt{1-\lambda_{\min}^{+}\big(\Lambda U_{2}U_{2}^{T}\big)}=\sqrt{1-\lambda_{1}\big(U_{2}^{T}\Lambda U_{2}\big)}\geq\sqrt{1-\mu_{n_{\rm c}+1}}.

In particular, if $\operatorname{null}(RA)=\operatorname{span}\{\mathbf{v}_{n_{\rm c}+1},\ldots,\mathbf{v}_{n}\}$, then

V^{-1}P_{\star}=V^{-1}M^{-1}A^{T}R^{T}=V^{T}A^{T}R^{T}=(RAV)^{T}=\begin{pmatrix}R_{\rm c}^{T}\\0\end{pmatrix},

where $R_{\rm c}=RA(\mathbf{v}_{1},\ldots,\mathbf{v}_{n_{\rm c}})\in\mathbb{R}^{n_{\rm c}\times n_{\rm c}}$ is nonsingular. We then have

U_{2}U_{2}^{T}=I-U_{1}U_{1}^{T}=I-V^{-1}P_{\star}\big(P_{\star}^{T}V^{-T}V^{-1}P_{\star}\big)^{-1}P_{\star}^{T}V^{-T}=I-\begin{pmatrix}R_{\rm c}^{T}\\0\end{pmatrix}R_{\rm c}^{-T}R_{\rm c}^{-1}\big(R_{\rm c}\ \ 0\big)=\begin{pmatrix}0&0\\0&I_{n-n_{\rm c}}\end{pmatrix}.

Thus,

\|E_{\rm TG}\|_{M}=\sqrt{1-\lambda_{\min}^{+}\big(\Lambda U_{2}U_{2}^{T}\big)}=\sqrt{1-\mu_{n_{\rm c}+1}}.

This completes the proof. ∎
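Theorem 3.5 also yields a constructive recipe: since $V^{-1}=V^{T}M$, choosing the rows of $RA$ as the first $n_{\rm c}$ rows of $V^{T}M$ (equivalently, $R=(\mathbf{v}_{1},\ldots,\mathbf{v}_{n_{\rm c}})^{T}MA^{-1}$, an illustrative dense construction rather than a practical algorithm) enforces $\operatorname{null}(RA)=\operatorname{span}\{\mathbf{v}_{n_{\rm c}+1},\ldots,\mathbf{v}_{n}\}$. A sketch that checks the lower bound is attained:

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(5)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)
At = A + A.T - A @ np.linalg.solve(M, A.T)       # A_tilde

mu, V = sla.eigh(At, M)                          # A_tilde v = mu M v, mu ascending;
                                                 # columns of V are M-orthonormal
R = V[:, :nc].T @ M @ np.linalg.inv(A)           # then null(R A) = span{v_{nc+1},...,v_n}

P = np.linalg.solve(M, A.T @ R.T)                # P_star
Pi = P @ np.linalg.solve(P.T @ M @ P, P.T @ M)
Mh = sla.sqrtm(M).real
E = (np.eye(n) - Pi) @ (np.eye(n) - np.linalg.solve(M, A))
print(np.linalg.norm(Mh @ E @ np.linalg.inv(Mh), 2))   # ||E_TG||_M
print(np.sqrt(1.0 - mu[nc]))                           # optimal value sqrt(1 - mu_{nc+1})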

To analyze the influence of $\operatorname{range}(R^{T})$ on $\|E_{\rm TG}\|_{M}$, we need the following lemma; see, e.g., [11, Corollary 4.3.5].

Lemma 3.6.

Let $H_{1}$ and $H_{2}$ be $n\times n$ Hermitian matrices. If $H_{2}$ is singular, then

\lambda_{i}(H_{1}+H_{2})\leq\lambda_{i+\operatorname{rank}(H_{2})}(H_{1})

for all $i=1,\ldots,n-\operatorname{rank}(H_{2})$.

The following theorem shows that $\|E_{\rm TG}\|_{M}$ does not increase when the row space of $R$ is enlarged.

Theorem 3.7.

Assume that $\widehat{R}\in\mathbb{R}^{\widehat{n}_{\rm c}\times n}$ has full row rank, where $n_{\rm c}\leq\widehat{n}_{\rm c}<n$. If $\operatorname{range}(R^{T})\subseteq\operatorname{range}(\widehat{R}^{T})$, then

(3.9) \|\widehat{E}_{\rm TG}\|_{M}\leq\|E_{\rm TG}\|_{M},

where $E_{\rm TG}$ is given by (2.7) and $\widehat{E}_{\rm TG}$ is formed by replacing $R$ in (2.7) with $\widehat{R}$.

Proof.

Since $\operatorname{range}(R^{T})\subseteq\operatorname{range}(\widehat{R}^{T})$, there exists an $\widehat{n}_{\rm c}\times n_{\rm c}$ matrix $Y$, with full column rank, such that

R^{T}=\widehat{R}^{T}Y.

Let $\widehat{Y}$ be an $\widehat{n}_{\rm c}\times\widehat{n}_{\rm c}$ nonsingular matrix such that

Y=\widehat{Y}\begin{pmatrix}I_{n_{\rm c}}\\0\end{pmatrix}.

Then

R^{T}=\widehat{R}^{T}\widehat{Y}\begin{pmatrix}I_{n_{\rm c}}\\0\end{pmatrix}.

Hence, there exists an $n\times(\widehat{n}_{\rm c}-n_{\rm c})$ matrix $Z_{0}$, with full column rank, such that

(3.10) \widehat{R}^{T}=\big(R^{T}\ \ Z_{0}\big)\widehat{Y}^{-1}.

According to the proof of Theorem 3.3, we have

\sigma_{\rm TG}=\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}\big(I-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M\big)\big)
=\lambda_{n_{\rm c}+1}\big(M^{-1}\widetilde{A}\big(I-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M\big)\big)
=\lambda_{n_{\rm c}+1}\big(\widetilde{A}\big(I-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M\big)M^{-1}\big)
=\lambda_{n_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big).

Similarly, one has

\|\widehat{E}_{\rm TG}\|_{M}=\sqrt{1-\widehat{\sigma}_{\rm TG}},

where

\widehat{\sigma}_{\rm TG}=\lambda_{\widehat{n}_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big)\quad\text{with}\quad\widehat{P}_{\star}=M^{-1}A^{T}\widehat{R}^{T}.

By (3.10), we have

\widehat{P}_{\star}=\big(P_{\star}\ \ Z\big)\widehat{Y}^{-1}\quad\text{with}\quad Z=M^{-1}A^{T}Z_{0}.

Let

D_{\star}=\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T},
L_{\star}=\begin{pmatrix}I_{n_{\rm c}}&0\\-Z^{T}MP_{\star}(P_{\star}^{T}MP_{\star})^{-1}&I_{\widehat{n}_{\rm c}-n_{\rm c}}\end{pmatrix},
S_{\star}=Z^{T}MZ-Z^{T}MP_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}MZ.

Then

D_{\star}=\big(P_{\star}\ \ Z\big)\bigg[\begin{pmatrix}P_{\star}^{T}MP_{\star}&P_{\star}^{T}MZ\\Z^{T}MP_{\star}&Z^{T}MZ\end{pmatrix}^{-1}-\begin{pmatrix}(P_{\star}^{T}MP_{\star})^{-1}&0\\0&0\end{pmatrix}\bigg]\big(P_{\star}\ \ Z\big)^{T}
=\big(P_{\star}\ \ Z\big)\bigg[L_{\star}^{T}\begin{pmatrix}(P_{\star}^{T}MP_{\star})^{-1}&0\\0&S_{\star}^{-1}\end{pmatrix}L_{\star}-\begin{pmatrix}(P_{\star}^{T}MP_{\star})^{-1}&0\\0&0\end{pmatrix}\bigg]\big(P_{\star}\ \ Z\big)^{T}
=\big(P_{\star}\ \ Z\big)\begin{pmatrix}-(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}MZ\\I_{\widehat{n}_{\rm c}-n_{\rm c}}\end{pmatrix}S_{\star}^{-1}\begin{pmatrix}-(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}MZ\\I_{\widehat{n}_{\rm c}-n_{\rm c}}\end{pmatrix}^{T}\big(P_{\star}\ \ Z\big)^{T},

from which we deduce that $D_{\star}$ is SPSD and $\operatorname{rank}(D_{\star})\leq\widehat{n}_{\rm c}-n_{\rm c}$. It follows that

\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}D_{\star}\widetilde{A}^{\frac{1}{2}}\big)\leq\widehat{n}_{\rm c}-n_{\rm c},

which, together with Lemma 3.6, yields

\sigma_{\rm TG}=\lambda_{n_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big)
=\lambda_{n_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}+\widetilde{A}^{\frac{1}{2}}D_{\star}\widetilde{A}^{\frac{1}{2}}\big)
\leq\lambda_{n_{\rm c}+1+\widehat{n}_{\rm c}-n_{\rm c}}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big)
=\lambda_{\widehat{n}_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big)=\widehat{\sigma}_{\rm TG}.

Thus,

\|E_{\rm TG}\|_{M}=\sqrt{1-\sigma_{\rm TG}}\geq\sqrt{1-\widehat{\sigma}_{\rm TG}}=\|\widehat{E}_{\rm TG}\|_{M},

which gives the inequality (3.9). ∎
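Theorem 3.7 is again easy to test numerically: appending extra rows to $R$ enlarges $\operatorname{range}(R^{T})$ and can only decrease the $M$-convergence factor. A sketch under the same illustrative assumptions:

import numpy as np
import scipy.linalg as sla

def tg_factor(A, M, R):
    # ||E_TG||_M for Algorithm 1 with P = P_star = M^{-1} A^T R^T
    n = A.shape[0]
    P = np.linalg.solve(M, A.T @ R.T)
    Pi = P @ np.linalg.solve(P.T @ M @ P, P.T @ M)
    Mh = sla.sqrtm(M).real
    E = (np.eye(n) - Pi) @ (np.eye(n) - np.linalg.solve(M, A))
    return np.linalg.norm(Mh @ E @ np.linalg.inv(Mh), 2)

rng = np.random.default_rng(6)
n = 12
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)

R = rng.standard_normal((3, n))
R_hat = np.vstack([R, rng.standard_normal((2, n))])  # range(R^T) inside range(R_hat^T)
print(tg_factor(A, M, R_hat) <= tg_factor(A, M, R))  # True, by Theorem 3.7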

4. An inexact variant of Algorithm 1

Recall that the coarse-grid system to be solved in Algorithm 1 reads

(4.1) A_{\rm c}\mathbf{e}_{\rm c}=\mathbf{r}_{\rm c}.

In practice, it is often too costly to solve (4.1) exactly when its size is large. Instead, without essential loss of convergence speed, it is advisable to find a good approximation to the exact solution $\mathbf{e}_{\rm c}\equiv A_{\rm c}^{-1}\mathbf{r}_{\rm c}$. Note that, unlike the original problem (2.1), the coefficient matrix in (4.1) is SPD if $P_{\star}$ is used as a prolongation. For such linear systems, there are many efficient numerical methods available in the literature, such as multigrid and conjugate gradient methods.

In what follows, we analyze the convergence of an inexact variant of Algorithm 1, which is formalized as Algorithm 2.

Algorithm 2  Inexact two-grid method.
1: Smoothing: $\mathbf{u}^{(1)}\leftarrow\mathbf{u}^{(0)}+M^{-1}\big(\mathbf{f}-A\mathbf{u}^{(0)}\big)$ \triangleright $\mathbf{u}^{(0)}\in\mathbb{R}^{n}$ is an initial guess
2: Restriction: $\mathbf{r}_{\rm c}\leftarrow R\big(\mathbf{f}-A\mathbf{u}^{(1)}\big)$
3: Coarse-grid correction: $\hat{\mathbf{e}}_{\rm c}\leftarrow\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket$ \triangleright $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket:\mathbb{R}^{n_{\rm c}}\rightarrow\mathbb{R}^{n_{\rm c}}$ and $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket\approx A_{\rm c}^{-1}$
4: Prolongation: $\mathbf{u}_{\rm ITG}\leftarrow\mathbf{u}^{(1)}+P_{\star}\hat{\mathbf{e}}_{\rm c}$ \triangleright $P_{\star}$ is given by (2.6)
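Algorithm 2 differs from Algorithm 1 only in step 3. The sketch below (illustrative assumptions as before) replaces the exact coarse solve by a crude linear solver $B_{\rm c}^{-1}$ with $B_{\rm c}=\lambda_{\max}(A_{\rm c})I_{n_{\rm c}}$, for which $B_{\rm c}+B_{\rm c}^{T}-A_{\rm c}$ is clearly SPD; the iteration still converges, only more slowly:

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(7)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)
R = rng.standard_normal((nc, n))
P = np.linalg.solve(M, A.T @ R.T)               # P_star, so Ac = R A P is SPD
Ac = R @ A @ P
Bc = np.linalg.eigvalsh(Ac).max() * np.eye(nc)  # crude SPD coarse solver matrix

u_exact = rng.standard_normal(n)
f = A @ u_exact
u = np.zeros(n)
for _ in range(60):
    u = u + np.linalg.solve(M, f - A @ u)       # smoothing
    rc = R @ (f - A @ u)                        # restriction
    ec = np.linalg.solve(Bc, rc)                # inexact coarse solve B_c^{-1} r_c
    u = u + P @ ec                              # prolongation
print(np.linalg.norm(u_exact - u))              # converges, more slowly than Algorithm 1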

4.1. Linear coarse solvers

The solver $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket$ in Algorithm 2 is a general mapping from $\mathbb{R}^{n_{\rm c}}$ to $\mathbb{R}^{n_{\rm c}}$, which is expected to be a good approximation to $A_{\rm c}^{-1}$. In this subsection, we consider the linear case $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket=B_{\rm c}^{-1}$, where $B_{\rm c}$ is an $n_{\rm c}\times n_{\rm c}$ nonsingular matrix such that $B_{\rm c}+B_{\rm c}^{T}-A_{\rm c}$ is SPD, or, equivalently, $\big\|I_{n_{\rm c}}-B_{\rm c}^{-1}A_{\rm c}\big\|_{A_{\rm c}}<1$.

From Algorithm 2, we have

\mathbf{u}-\mathbf{u}_{\rm ITG}=E_{\rm ITG}\big(\mathbf{u}-\mathbf{u}^{(0)}\big),

where

E_{\rm ITG}=\big(I-P_{\star}B_{\rm c}^{-1}RA\big)(I-M^{-1}A).

In view of (2.6), $E_{\rm ITG}$ can be expressed as

(4.2) E_{\rm ITG}=\big(I-P_{\star}B_{\rm c}^{-1}P_{\star}^{T}M\big)(I-M^{-1}A).

Define

(4.3) \overline{B}_{\rm c}:=B_{\rm c}\big(B_{\rm c}+B_{\rm c}^{T}-A_{\rm c}\big)^{-1}B_{\rm c}^{T}.

By (4.2), we have

\|E_{\rm ITG}\|_{M}^{2}=\big\|\big(I-P_{\star}B_{\rm c}^{-1}P_{\star}^{T}M\big)(I-M^{-1}A)\big\|_{M}^{2}
=\big\|\big(I-M^{\frac{1}{2}}P_{\star}B_{\rm c}^{-1}P_{\star}^{T}M^{\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big\|_{2}^{2}
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{\frac{1}{2}}P_{\star}\overline{B}_{\rm c}^{-1}P_{\star}^{T}M^{\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big).

Due to

I-A_{\rm c}^{\frac{1}{2}}\overline{B}_{\rm c}^{-1}A_{\rm c}^{\frac{1}{2}}=\big(I-A_{\rm c}^{\frac{1}{2}}B_{\rm c}^{-T}A_{\rm c}^{\frac{1}{2}}\big)\big(I-A_{\rm c}^{\frac{1}{2}}B_{\rm c}^{-1}A_{\rm c}^{\frac{1}{2}}\big),

it follows that $\overline{B}_{\rm c}-A_{\rm c}$ is SPSD. Let

(4.4a) \alpha_{1}=\lambda_{\min}\big(\overline{B}_{\rm c}^{-1}A_{\rm c}\big),
(4.4b) \alpha_{2}=\lambda_{\max}\big(\overline{B}_{\rm c}^{-1}A_{\rm c}\big).

Then

0<\alpha_{1}\leq\alpha_{2}\leq 1.

From (4.4a) and (4.4b), we deduce that $\overline{B}_{\rm c}-\frac{1}{\alpha_{2}}A_{\rm c}$ and $\frac{1}{\alpha_{1}}A_{\rm c}-\overline{B}_{\rm c}$ are SPSD. Hence,

(4.5) \lambda_{\max}(\mathcal{E}_{2})\leq\|E_{\rm ITG}\|_{M}^{2}\leq\lambda_{\max}(\mathcal{E}_{1}),

where

\mathcal{E}_{k}=\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\alpha_{k}\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)

with $\varPi$ given by (3.5).

To estimate the largest eigenvalues of $\mathcal{E}_{1}$ and $\mathcal{E}_{2}$, we need the following eigenvalue identities.

Lemma 4.1.

Let $\varPi$ be given by (3.5). Then

(4.6a) \lambda_{\min}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=0,
(4.6b) \lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=1-\sigma_{\rm TG},
(4.6c) \lambda_{\min}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=0,
(4.6d) \lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=1-\delta_{\rm TG},

where $\sigma_{\rm TG}$ is given by (3.4) and

(4.7) \delta_{\rm TG}=\begin{cases}\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}\varPi_{A}\big)&\text{if }\operatorname{null}(\widetilde{A})\cap\operatorname{range}(P_{\star})=\{0\},\\0&\text{if }\operatorname{null}(\widetilde{A})\cap\operatorname{range}(P_{\star})\neq\{0\}.\end{cases}
Proof.

Recall that $I-\varPi$ is an orthogonal projection. Hence, the matrix $\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)$ is SPSD; it is also singular, because its rank is at most $n-n_{\rm c}$, which leads to (4.6a). Similarly, the identity (4.6c) holds. The identity (4.6b) follows from (3.3) and the fact

\|E_{\rm TG}\|_{M}^{2}=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big).

It remains to prove (4.6d).

By (3.6) and (3.7), we have

(4.8) \big(I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)\varPi=Q\begin{pmatrix}I_{n_{\rm c}}-X_{1}&0\\-X_{2}^{T}&0\end{pmatrix}Q^{T},
(4.9) \varPi M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi=Q\begin{pmatrix}X_{1}&0\\0&0\end{pmatrix}Q^{T}.

Due to

\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)=I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}},

it follows that

\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)\varPi\big)=\max\big\{1-\lambda_{\min}(X_{1}),\,0\big\},

where we have used the fact (4.8). Since $Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q$ and $I-Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q$ are SPSD, we deduce from (3.7) that $X_{1}$ is SPSD and $\lambda(X_{1})\subset[0,1]$. Thus,

(4.10) \lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=1-\lambda_{\min}(X_{1}).

From (4.9), we have

\operatorname{rank}(X_{1})=\operatorname{rank}\big(\varPi M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}M^{-\frac{1}{2}}\varPi\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}P_{\star}\big)\leq n_{\rm c},

with equality if and only if

(4.11) \operatorname{null}(\widetilde{A})\cap\operatorname{range}(P_{\star})=\{0\}.

In fact, if the relation (4.11) holds, then $P_{\star}^{T}\widetilde{A}P_{\star}$ is SPD, because it is SPSD and $\mathbf{v}_{\rm c}^{T}P_{\star}^{T}\widetilde{A}P_{\star}\mathbf{v}_{\rm c}=(P_{\star}\mathbf{v}_{\rm c})^{T}\widetilde{A}(P_{\star}\mathbf{v}_{\rm c})=0$ entails that $\mathbf{v}_{\rm c}=0$. Hence,

\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}P_{\star}\big)=\operatorname{rank}\big(P_{\star}^{T}\widetilde{A}P_{\star}\big)=n_{\rm c}.

Conversely, if $\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}P_{\star}\big)=n_{\rm c}$, then $\operatorname{rank}\big(P_{\star}^{T}\widetilde{A}P_{\star}\big)=n_{\rm c}$, i.e., $P_{\star}^{T}\widetilde{A}P_{\star}$ is SPD. If (4.11) did not hold, there would exist a nonzero vector $\mathbf{w}_{\rm c}\in\mathbb{R}^{n_{\rm c}}$ such that

\widetilde{A}P_{\star}\mathbf{w}_{\rm c}=0.

Accordingly, $\mathbf{w}_{\rm c}^{T}P_{\star}^{T}\widetilde{A}P_{\star}\mathbf{w}_{\rm c}=0$, which contradicts the positive definiteness of $P_{\star}^{T}\widetilde{A}P_{\star}$. The above analysis, together with (4.9), yields

(4.12) \lambda_{\min}(X_{1})=\begin{cases}\lambda_{\min}^{+}\big(\varPi M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi\big)&\text{if (4.11) holds},\\0&\text{otherwise}.\end{cases}

The identity (4.6d) then follows from (4.10), (4.12), and the fact

\lambda_{\min}^{+}\big(\varPi M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi\big)=\lambda_{\min}^{+}\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi\big)=\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}\varPi_{A}\big).

This completes the proof. ∎

To bound $\lambda_{\max}(\mathcal{E}_{1})$ and $\lambda_{\max}(\mathcal{E}_{2})$, we also need an important tool for eigenvalue analysis, namely the following Weyl theorem; see, e.g., [11, Theorem 4.3.1].

Lemma 4.2.

Let $H_{1}$ and $H_{2}$ be $n\times n$ Hermitian matrices. Assume that the spectra of $H_{1}$, $H_{2}$, and $H_{1}+H_{2}$ are $\{\lambda_{i}(H_{1})\}_{i=1}^{n}$, $\{\lambda_{i}(H_{2})\}_{i=1}^{n}$, and $\{\lambda_{i}(H_{1}+H_{2})\}_{i=1}^{n}$, respectively. Then, for each $k=1,\ldots,n$, it holds that

\lambda_{k-j+1}(H_{1})+\lambda_{j}(H_{2})\leq\lambda_{k}(H_{1}+H_{2})\leq\lambda_{k+\ell}(H_{1})+\lambda_{n-\ell}(H_{2})

for all $j=1,\ldots,k$ and $\ell=0,\ldots,n-k$. In particular, one has

(4.13a) \lambda_{\max}(H_{1}+H_{2})\geq\max\big\{\lambda_{\max}(H_{1})+\lambda_{\min}(H_{2}),\,\lambda_{\min}(H_{1})+\lambda_{\max}(H_{2})\big\},
(4.13b) \lambda_{\max}(H_{1}+H_{2})\leq\lambda_{\max}(H_{1})+\lambda_{\max}(H_{2}).

We are now ready to present a convergence estimate for Algorithm 2.

Theorem 4.3.

Let $B_{\rm c}$ be an $n_{\rm c}\times n_{\rm c}$ matrix such that $B_{\rm c}+B_{\rm c}^{T}-A_{\rm c}$ is SPD, and let $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket=B_{\rm c}^{-1}$. Then, the $M$-convergence factor of Algorithm 2 satisfies

(4.14) \mathscr{L}(\alpha_{2})\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}(\alpha_{1})

with

(4.15a) \mathscr{L}(\alpha_{2}):=\sqrt{1-\min\big\{\sigma_{\rm TG},\,\lambda_{\min}(M^{-1}\widetilde{A})+\alpha_{2}(1-\delta_{\rm TG})\big\}},
(4.15b) \mathscr{U}(\alpha_{1}):=\sqrt{1-\alpha_{1}\sigma_{\rm TG}-(1-\alpha_{1})\lambda_{\min}(M^{-1}\widetilde{A})},

where $\sigma_{\rm TG}$, $\delta_{\rm TG}$, $\alpha_{1}$, and $\alpha_{2}$ are given by (3.4), (4.7), (4.4a), and (4.4b), respectively.

Proof.

In view of (4.5), one can obtain a two-sided estimate for $\|E_{\rm ITG}\|_{M}$ by bounding the eigenvalues $\lambda_{\max}(\mathcal{E}_{1})$ and $\lambda_{\max}(\mathcal{E}_{2})$.

By (4.6d) and (4.13a), we have

\lambda_{\max}(\mathcal{E}_{2})=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\alpha_{2}\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
\geq\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)-\alpha_{2}\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big)-\alpha_{2}(1-\delta_{\rm TG})
=1-\lambda_{\min}(M^{-1}\widetilde{A})-\alpha_{2}(1-\delta_{\rm TG}).

On the other hand, we deduce from (4.6b), (4.6c), and (4.13a) that

\lambda_{\max}(\mathcal{E}_{2})=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-\varPi+(1-\alpha_{2})\varPi\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
\geq\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)+(1-\alpha_{2})\lambda_{\min}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
=1-\sigma_{\rm TG}.

Thus,

(4.16) \lambda_{\max}(\mathcal{E}_{2})\geq 1-\min\big\{\sigma_{\rm TG},\,\lambda_{\min}(M^{-1}\widetilde{A})+\alpha_{2}(1-\delta_{\rm TG})\big\}.

Using (4.6b) and (4.13b), we obtain

\lambda_{\max}(\mathcal{E}_{1})=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big((1-\alpha_{1})I+\alpha_{1}(I-\varPi)\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
\leq(1-\alpha_{1})\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)+\alpha_{1}\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
=(1-\alpha_{1})\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big)+\alpha_{1}(1-\sigma_{\rm TG})
=(1-\alpha_{1})\big(1-\lambda_{\min}(M^{-1}\widetilde{A})\big)+\alpha_{1}(1-\sigma_{\rm TG}),

that is,

(4.17) \lambda_{\max}(\mathcal{E}_{1})\leq 1-\alpha_{1}\sigma_{\rm TG}-(1-\alpha_{1})\lambda_{\min}(M^{-1}\widetilde{A}).

Combining (4.5), (4.16), and (4.17), we arrive at the estimate (4.14). ∎

Remark 4.4.

From (4.6b), (4.6d), and (4.13b), we deduce that

2-\sigma_{\rm TG}-\delta_{\rm TG}\geq\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=1-\lambda_{\min}(M^{-1}\widetilde{A}),

which yields

1-\delta_{\rm TG}+\lambda_{\min}(M^{-1}\widetilde{A})\geq\sigma_{\rm TG}.

If $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket=A_{\rm c}^{-1}$, then $\alpha_{1}=\alpha_{2}=1$, in which case one has

\mathscr{L}(\alpha_{2})=\mathscr{U}(\alpha_{1})=\sqrt{1-\sigma_{\rm TG}},

and hence (4.14) reduces to the identity (3.3).
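The two-sided bound (4.14) can be checked numerically. The sketch below (illustrative assumptions as in the previous examples) computes $\sigma_{\rm TG}$, $\delta_{\rm TG}$, $\lambda_{\min}(M^{-1}\widetilde{A})$, and $\alpha_{1},\alpha_{2}$ from (4.3)–(4.4b), and then verifies $\mathscr{L}(\alpha_{2})\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}(\alpha_{1})$:

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(8)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)
R = rng.standard_normal((nc, n))
P = np.linalg.solve(M, A.T @ R.T)                   # P_star
Ac = R @ A @ P                                      # SPD
Bc = np.linalg.eigvalsh(Ac).max() * np.eye(nc)      # SPD; B_c + B_c^T - A_c is SPD
I = np.eye(n)

At = A + A.T - A @ np.linalg.solve(M, A.T)          # A_tilde
Pi = P @ np.linalg.solve(P.T @ M @ P, P.T @ M)      # Pi_A
lmin_plus = lambda X: min(e.real for e in np.linalg.eigvals(X) if e.real > 1e-10)
sigma = lmin_plus(np.linalg.solve(M, At) @ (I - Pi))       # sigma_TG, (3.4)
delta = lmin_plus(np.linalg.solve(M, At) @ Pi)             # delta_TG, (4.7)
lmin = sla.eigh(At, M, eigvals_only=True).min()            # lambda_min(M^{-1} A_tilde)

Bbar = Bc @ np.linalg.solve(Bc + Bc.T - Ac, Bc.T)          # (4.3)
alpha = sla.eigh(Ac, Bbar, eigvals_only=True)              # spectrum of Bbar^{-1} A_c
a1, a2 = alpha.min(), alpha.max()

Mh = sla.sqrtm(M).real
E = (I - P @ np.linalg.solve(Bc, P.T) @ M) @ (I - np.linalg.solve(M, A))
nrm = np.linalg.norm(Mh @ E @ np.linalg.inv(Mh), 2)        # ||E_ITG||_M

lower = np.sqrt(1 - min(sigma, lmin + a2 * (1 - delta)))   # L(alpha_2), (4.15a)
upper = np.sqrt(1 - a1 * sigma - (1 - a1) * lmin)          # U(alpha_1), (4.15b)
print(lower <= nrm <= upper)                               # True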

As a consequence of Theorem 4.3, we have the following corollary.

Corollary 4.5.

Let $\mathscr{L}(\cdot)$ and $\mathscr{U}(\cdot)$ be defined by (4.15a) and (4.15b), respectively. Assume that $B_{\rm c}$ is an $n_{\rm c}\times n_{\rm c}$ SPD matrix such that $2B_{\rm c}-A_{\rm c}$ is SPD, and let

(4.18) \beta_{1}=\lambda_{\min}\big(B_{\rm c}^{-1}A_{\rm c}\big)\quad\text{and}\quad\beta_{2}=\lambda_{\max}\big(B_{\rm c}^{-1}A_{\rm c}\big).

Then, the convergence factor of Algorithm 2 with $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket=B_{\rm c}^{-1}$ satisfies the following estimates.

(i) If $\beta_{2}\leq 1$, then

\mathscr{L}\big((2-\beta_{2})\beta_{2}\big)\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}\big((2-\beta_{1})\beta_{1}\big).

(ii) If $\beta_{1}\leq 1<\beta_{2}$, then

\mathscr{L}(1)\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}\big(\min\big\{(2-\beta_{1})\beta_{1},\,(2-\beta_{2})\beta_{2}\big\}\big).

(iii) If $1<\beta_{1}\leq\beta_{2}<2$, then

\mathscr{L}\big((2-\beta_{1})\beta_{1}\big)\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}\big((2-\beta_{2})\beta_{2}\big).
Proof.

Since $B_{\rm c}$ is SPD, it follows from (4.3) that

\overline{B}_{\rm c}^{-1}A_{\rm c}=\big(2I_{n_{\rm c}}-B_{\rm c}^{-1}A_{\rm c}\big)B_{\rm c}^{-1}A_{\rm c},

so the spectrum of $\overline{B}_{\rm c}^{-1}A_{\rm c}$ is the image of that of $B_{\rm c}^{-1}A_{\rm c}$ under the map $g(\beta)=(2-\beta)\beta$, which is increasing on $(0,1]$ and decreasing on $[1,2)$. With the notation in (4.18), we have

\alpha_{1}=\lambda_{\min}\big(\overline{B}_{\rm c}^{-1}A_{\rm c}\big)=\begin{cases}(2-\beta_{1})\beta_{1}&\text{if }\beta_{2}\leq 1,\\\min\big\{(2-\beta_{1})\beta_{1},\,(2-\beta_{2})\beta_{2}\big\}&\text{if }\beta_{1}\leq 1<\beta_{2},\\(2-\beta_{2})\beta_{2}&\text{if }1<\beta_{1}\leq\beta_{2}<2,\end{cases}

and, similarly, $\alpha_{2}=\lambda_{\max}\big(\overline{B}_{\rm c}^{-1}A_{\rm c}\big)=(2-\beta_{2})\beta_{2}$ if $\beta_{2}\leq 1$ and $\alpha_{2}=(2-\beta_{1})\beta_{1}$ if $1<\beta_{1}\leq\beta_{2}<2$, while in the intermediate case $\beta_{1}\leq 1<\beta_{2}$ one only has $\alpha_{2}\leq 1$; since $\mathscr{L}(\cdot)$ is nonincreasing in its argument, $\mathscr{L}(1)\leq\mathscr{L}(\alpha_{2})$ in that case. The desired result then follows from Theorem 4.3. ∎
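The case analysis in Corollary 4.5 only involves the scalar map $\beta\mapsto(2-\beta)\beta$. A tiny sketch (function and variable names are illustrative) that returns the pair $(\alpha_{1},\alpha_{2})$ fed to $\mathscr{U}(\cdot)$ and $\mathscr{L}(\cdot)$ in each case, using $\alpha_{2}=1$ as the safe value in case (ii):

def alphas(beta1, beta2):
    # Map the extreme eigenvalues of Bc^{-1}Ac to those used in Corollary 4.5
    g = lambda b: (2.0 - b) * b
    if beta2 <= 1.0:                               # case (i)
        return g(beta1), g(beta2)
    if beta1 <= 1.0 < beta2:                       # case (ii)
        return min(g(beta1), g(beta2)), 1.0
    return g(beta2), g(beta1)                      # case (iii): 1 < beta1 <= beta2 < 2

print(alphas(0.5, 0.9))   # (0.75, 0.99)
print(alphas(0.8, 1.6))   # (0.64, 1.0)
print(alphas(1.2, 1.8))   # (0.36, 0.96)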

4.2. Nonlinear coarse solvers

In Theorem 4.3, the extreme eigenvalues $\alpha_{1}$ and $\alpha_{2}$ (or bounds on them) are required in order to obtain an estimate for $\|E_{\rm ITG}\|_{M}$. However, such quantities are unavailable if $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket$ is a nonlinear solver (e.g., the conjugate gradient [10] or generalized minimal residual [20] method). In this subsection, we analyze the convergence of Algorithm 2 with nonlinear coarse solvers.

Theorem 4.6.

If

(4.19) 𝐞cc𝐫cAcε𝐞cAc\big{\|}\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big{\|}_{A_{\rm c}}\leq\varepsilon\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}

for some ε[0,1)\varepsilon\in[0,1), then 𝐮ITG\mathbf{u}_{\rm ITG} produced by Algorithm 2 satisfies

(4.20) 𝐮𝐮ITGM1(1ε2)σTGε2λmin(M1A~)𝐮𝐮(0)M,\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}\leq\sqrt{1-(1-\varepsilon^{2})\sigma_{\rm TG}-\varepsilon^{2}\lambda_{\min}(M^{-1}\widetilde{A})}\,\big{\|}\mathbf{u}-\mathbf{u}^{(0)}\big{\|}_{M},

where $\sigma_{\rm TG}$ is given by (3.4).

Proof.

From (4.19) and the relation $\mathbf{r}_{\rm c}=A_{\rm c}\mathbf{e}_{\rm c}$, we have

\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}-2\mathbf{r}_{\rm c}^{T}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\leq\varepsilon^{2}\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2},

that is,

(4.21) 2\mathbf{r}_{\rm c}^{T}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\geq(1-\varepsilon^{2})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}.

Since

\mathbf{u}-\mathbf{u}_{\rm ITG}=\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket,

it follows that

\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2} = \big(\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big)^{T}M\big(\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big)
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\big(\mathbf{u}-\mathbf{u}^{(1)}\big)^{T}MP_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\big(\mathbf{u}-\mathbf{u}^{(1)}\big)^{T}A^{T}R^{T}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\mathbf{r}_{\rm c}^{T}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}.

Using (4.21), we obtain

\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2} \leq \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\varepsilon^{2})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\varepsilon^{2})\big\|A_{\rm c}^{-1}RA\big(\mathbf{u}-\mathbf{u}^{(1)}\big)\big\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\varepsilon^{2})\big\|A_{\rm c}^{-1}P_{\star}^{T}M\big(\mathbf{u}-\mathbf{u}^{(1)}\big)\big\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\varepsilon^{2})\big(\mathbf{u}-\mathbf{u}^{(1)}\big)^{T}M^{\frac{1}{2}}\varPi M^{\frac{1}{2}}\big(\mathbf{u}-\mathbf{u}^{(1)}\big)
= \big(\mathbf{u}-\mathbf{u}^{(1)}\big)^{T}M^{\frac{1}{2}}\big(I-(1-\varepsilon^{2})\varPi\big)M^{\frac{1}{2}}\big(\mathbf{u}-\mathbf{u}^{(1)}\big),

where $\varPi$ is given by (3.5). Since

\mathbf{u}-\mathbf{u}^{(1)}=(I-M^{-1}A)\big(\mathbf{u}-\mathbf{u}^{(0)}\big),

it follows that

\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2}\leq\big(\mathbf{u}-\mathbf{u}^{(0)}\big)^{T}M^{\frac{1}{2}}\mathcal{E}M^{\frac{1}{2}}\big(\mathbf{u}-\mathbf{u}^{(0)}\big)

with

\mathcal{E}=\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-(1-\varepsilon^{2})\varPi\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big).

Hence,

(4.22) \|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2}\leq\lambda_{\max}(\mathcal{E})\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}^{2}.

By (4.6b) and (4.13b), we have

\lambda_{\max}(\mathcal{E}) = \lambda_{\max}\Big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(\varepsilon^{2}I+(1-\varepsilon^{2})(I-\varPi)\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\Big)
\leq \varepsilon^{2}\lambda_{\max}\Big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\Big) + (1-\varepsilon^{2})\lambda_{\max}\Big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\Big)
= \varepsilon^{2}\lambda_{\max}\Big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\Big) + (1-\varepsilon^{2})(1-\sigma_{\rm TG})
= \varepsilon^{2}\big(1-\lambda_{\min}(M^{-1}\widetilde{A})\big)+(1-\varepsilon^{2})(1-\sigma_{\rm TG})
= 1-(1-\varepsilon^{2})\sigma_{\rm TG}-\varepsilon^{2}\lambda_{\min}(M^{-1}\widetilde{A}),

which, combined with (4.22), yields the estimate (4.20). ∎
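To illustrate how the accuracy requirement (4.19) is met in practice, the sketch below runs $k$ conjugate gradient steps on a coarse system $A_{\rm c}\mathbf{x}=\mathbf{r}_{\rm c}$ (started from zero) and reports the attained factor $\varepsilon=\|\mathbf{e}_{\rm c}-\mathbf{x}_{k}\|_{A_{\rm c}}/\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}$. The SPD test matrix is an illustrative choice of ours, not one from the paper; since CG minimizes the $A_{\rm c}$-norm of the error over Krylov subspaces, $\varepsilon<1$ holds already after one step and decreases monotonically in $k$.

```python
# A small experiment, under our own assumptions, showing how the accuracy
# requirement (4.19) is met by a nonlinear coarse solver: k CG steps on
# A_c x = r_c, started from zero, give an eps < 1 for every k >= 1.
import numpy as np

def cg(A, b, k):
    """Plain CG for SPD A, zero initial guess, k iterations (no stopping test)."""
    x = np.zeros_like(b)
    r = b.copy(); p = r.copy()
    for _ in range(k):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

rng = np.random.default_rng(1)
n_c = 100
X = rng.standard_normal((n_c, n_c))
A_c = X @ X.T + n_c * np.eye(n_c)      # hypothetical SPD coarse-grid matrix
e_c = rng.standard_normal(n_c)         # exact coarse error
r_c = A_c @ e_c                        # coarse residual, r_c = A_c e_c

norm_Ac = lambda v: np.sqrt(v @ (A_c @ v))
for k in (1, 2, 5, 10):
    eps = norm_Ac(e_c - cg(A_c, r_c, k)) / norm_Ac(e_c)
    print(f"k = {k:2d}: eps = {eps:.3e}")
```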

Besides deterministic methods, one may also apply randomized methods to the problem (4.1). In analogy with Theorem 4.6, one can derive the following convergence estimates.

Theorem 4.7.

Assume that $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket:\mathbb{R}^{n_{\rm c}}\rightarrow\mathbb{R}^{n_{\rm c}}$ is a randomized coarse solver, and let $\gamma_{1},\gamma_{2}\in[0,1)$ be two tolerance factors.

(i) If

(4.23) \big\|\mathbb{E}\big[\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2}\leq\gamma_{1}\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2},

then

\|\mathbb{E}[\mathbf{u}-\mathbf{u}_{\rm ITG}]\|_{M}^{2}\leq\big(1-(1-\gamma_{1})\sigma_{\rm TG}-\gamma_{1}\lambda_{\min}(M^{-1}\widetilde{A})\big)\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}^{2}.

(ii) If

(4.24) \mathbb{E}\Big[\big\|\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\Big]\leq\gamma_{2}\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2},

then

\mathbb{E}\big[\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2}\big]\leq\big(1-(1-\gamma_{2})\sigma_{\rm TG}-\gamma_{2}\lambda_{\min}(M^{-1}\widetilde{A})\big)\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}^{2}.
Proof.

From (4.23) and (4.24), respectively, we deduce that

2\mathbf{r}_{\rm c}^{T}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big] \geq (1-\gamma_{1})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}+\big\|\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2},
2\mathbf{r}_{\rm c}^{T}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big] \geq (1-\gamma_{2})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}+\mathbb{E}\Big[\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\Big].

Then

\|\mathbb{E}[\mathbf{u}-\mathbf{u}_{\rm ITG}]\|_{M}^{2} = \big\|\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{M}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\mathbf{r}_{\rm c}^{T}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]+\big\|\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2}
\leq \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\gamma_{1})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2},
\mathbb{E}\big[\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2}\big] = \mathbb{E}\Big[\big\|\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{M}^{2}\Big]
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\mathbf{r}_{\rm c}^{T}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]+\mathbb{E}\Big[\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\Big]
\leq \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\gamma_{2})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}.

The remainder of the proof is analogous to that of Theorem 4.6. ∎
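As one concrete family of randomized coarse solvers, the sketch-and-project methods of [7] satisfy estimates of the form (4.23) and (4.24). The following minimal sketch (a toy setup of our own, not an algorithm from this paper) applies randomized block coordinate descent, i.e., sketch-and-project with coordinate sketches in the $A_{\rm c}$-norm, and estimates the tolerance factor $\gamma_{2}$ in (4.24) by Monte Carlo.

```python
# A minimal sketch (our own toy setup) of a randomized coarse solver of
# sketch-and-project type [7]: randomized block coordinate descent, which
# projects the iterate onto a random constraint set in the A_c-norm.
import numpy as np

def rbcd(A, b, k, block, rng):
    """k sketch-and-project steps with random coordinate blocks, x_0 = 0."""
    x = np.zeros_like(b)
    for _ in range(k):
        idx = rng.choice(len(b), size=block, replace=False)
        r = b - A @ x
        # Project onto {x : (A x)[idx] = b[idx]} in the A-norm.
        x[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return x

rng = np.random.default_rng(3)
n_c = 60
X = rng.standard_normal((n_c, n_c))
A_c = X @ X.T + n_c * np.eye(n_c)      # hypothetical SPD coarse-grid matrix
e_c = rng.standard_normal(n_c)
r_c = A_c @ e_c
sq = lambda v: v @ (A_c @ v)           # squared A_c-norm

# Monte Carlo estimate of the tolerance factor gamma_2 in (4.24).
k, block, runs = 10, 10, 500
errs = [sq(e_c - rbcd(A_c, r_c, k, block, rng)) for _ in range(runs)]
print(f"estimated gamma_2 = {np.mean(errs) / sq(e_c):.3e}")
```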

Remark 4.8.

We remark that the conditions (4.23) and (4.24) can be satisfied by a wide variety of randomized/stochastic algorithms; see, e.g., [7, 19]. For any random vector $\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket$, it holds that (see [7, Lemma 4.1])

\mathbb{E}\Big[\big\|\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\Big]=\big\|\mathbb{E}\big[\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2}+\mathbb{E}\Big[\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket-\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2}\Big].

Since the variance term on the right-hand side is nonnegative, (4.24) with factor $\gamma_{2}$ implies (4.23) with $\gamma_{1}=\gamma_{2}$; in this sense, (4.24) is the stronger of the two conditions.
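The decomposition is also easy to check numerically. The following Monte Carlo sketch, with an illustrative SPD matrix and a toy random vector of our own choosing, verifies the identity up to sampling error.

```python
# A Monte Carlo sanity check of the decomposition cited in Remark 4.8:
# E||e - Z||_A^2 = ||e - E[Z]||_A^2 + E||Z - E[Z]||_A^2 for a random vector Z.
# The SPD matrix A_c and the distribution of Z are illustrative choices of
# ours, not taken from the paper.
import numpy as np

rng = np.random.default_rng(2)
n_c, n_samples = 20, 200_000
X = rng.standard_normal((n_c, n_c))
A_c = X @ X.T + n_c * np.eye(n_c)
e_c = rng.standard_normal(n_c)

# Z = e_c + bias + noise: a crude model of a randomized coarse-solver output.
bias = 0.1 * rng.standard_normal(n_c)
Z = e_c + bias + 0.2 * rng.standard_normal((n_samples, n_c))

sq_norm = lambda V: np.einsum('...i,ij,...j->...', V, A_c, V)
lhs = sq_norm(e_c - Z).mean()                      # E||e_c - Z||_{A_c}^2
Zbar = Z.mean(axis=0)
rhs = sq_norm(e_c - Zbar) + sq_norm(Z - Zbar).mean()
print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}")         # agree up to sampling error
```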

5. Conclusions

In this paper, we analyzed the convergence of a two-grid method for nonsymmetric positive definite problems. In the case of an exact coarse solver, we established an identity that characterizes the $M$-convergence factor of the two-grid method. Based on this identity, we derived a class of optimal restriction operators and analyzed the influence of the row space of the restriction on the convergence factor. Furthermore, we presented convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered. It is worth pointing out that our results extend straightforwardly to the complex (non-Hermitian positive definite) case. In future work, we plan to develop new practical algorithms for nonsymmetric or non-Hermitian positive definite problems based on the two-grid theory established here.

Acknowledgment

The author is grateful to Prof. Chen-Song Zhang (LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences) for his helpful suggestions.

References

  • [1] J. Brannick, F. Cao, K. Kahl, R. D. Falgout, and X. Hu, Optimal interpolation and compatible relaxation in classical algebraic multigrid, SIAM J. Sci. Comput. 40 (2018), A1473–A1493.
  • [2] M. Brezina, T. A. Manteuffel, S. F. McCormick, J. W. Ruge, and G. Sanders, Towards adaptive smoothed aggregation ($\alpha$SA) for nonsymmetric problems, SIAM J. Sci. Comput. 32 (2010), 14–39.
  • [3] W. L. Briggs, V. E. Henson, and S. F. McCormick, A Multigrid Tutorial, 2nd ed., SIAM, Philadelphia, 2000.
  • [4] R. D. Falgout and P. S. Vassilevski, On generalizing the algebraic multigrid framework, SIAM J. Numer. Anal. 42 (2004), 1669–1693.
  • [5] R. D. Falgout, P. S. Vassilevski, and L. T. Zikatanov, On two-grid convergence estimates, Numer. Linear Algebra Appl. 12 (2005), 471–494.
  • [6] L. García Ramos, R. Kehl, and R. Nabben, Projections, deflation, and multigrid for nonsymmetric matrices, SIAM J. Matrix Anal. Appl. 41 (2020), 83–105.
  • [7] R. M. Gower and P. Richtárik, Randomized iterative methods for linear systems, SIAM J. Matrix Anal. Appl. 36 (2015), 1660–1690.
  • [8] A. Greenbaum and Z. Strakoš, Matrices that generate the same Krylov residual spaces, Recent Advances in Iterative Methods, IMA Vol. Math. Appl., vol. 60, Springer, New York, 1994, pp. 95–118.
  • [9] W. Hackbusch, Multi-Grid Methods and Applications, Springer-Verlag, Berlin, 1985.
  • [10] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand. 49 (1952), 409–436.
  • [11] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed., Cambridge University Press, Cambridge, UK, 2013.
  • [12] J. Lottes, Towards Robust Algebraic Multigrid Methods for Nonsymmetric Problems, Springer Theses, Springer International Publishing AG, 2017.
  • [13] T. A. Manteuffel, S. Münzenmaier, J. W. Ruge, and B. S. Southworth, Nonsymmetric reduction-based algebraic multigrid, SIAM J. Sci. Comput. 41 (2019), S242–S268.
  • [14] T. A. Manteuffel, L. N. Olson, J. B. Schroder, and B. S. Southworth, A root-node–based algebraic multigrid method, SIAM J. Sci. Comput. 39 (2017), S723–S756.
  • [15] T. A. Manteuffel, J. W. Ruge, and B. S. Southworth, Nonsymmetric algebraic multigrid based on local approximate ideal restriction ($\ell$AIR), SIAM J. Sci. Comput. 40 (2018), A4105–A4130.
  • [16] T. A. Manteuffel and B. S. Southworth, Convergence in norm of nonsymmetric algebraic multigrid, SIAM J. Sci. Comput. 41 (2019), S269–S296.
  • [17] Y. Notay, Algebraic analysis of two-grid methods: The nonsymmetric case, Numer. Linear Algebra Appl. 17 (2010), 73–96.
  • [18] Y. Notay, Analysis of two-grid methods: The nonnormal case, Math. Comp. 89 (2020), 807–827.
  • [19] P. Richtárik and M. Takáč, Stochastic reformulations of linear systems: Algorithms and convergence theory, SIAM J. Matrix Anal. Appl. 41 (2020), 487–524.
  • [20] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 7 (1986), 856–869.
  • [21] U. Trottenberg, C. W. Oosterlee, and A. Schüller, Multigrid, Academic Press, London, 2001.
  • [22] P. S. Vassilevski, Multilevel Block Factorization Preconditioners: Matrix-Based Analysis and Algorithms for Solving Finite Element Equations, Springer, New York, 2008.
  • [23] T. A. Wiesner, R. S. Tuminaro, W. A. Wall, and M. W. Gee, Multigrid transfers for nonsymmetric systems based on Schur complements and Galerkin projections, Numer. Linear Algebra Appl. 21 (2014), 415–438.
  • [24] J. Xu and L. T. Zikatanov, The method of alternating projections and the method of subspace corrections in Hilbert space, J. Amer. Math. Soc. 15 (2002), 573–597.
  • [25] J. Xu and L. T. Zikatanov, Algebraic multigrid methods, Acta Numer. 26 (2017), 591–721.
  • [26] X. Xu and C.-S. Zhang, On the ideal interpolation operator in algebraic multigrid methods, SIAM J. Numer. Anal. 56 (2018), 1693–1710.
  • [27] X. Xu and C.-S. Zhang, Convergence analysis of inexact two-grid methods: A theoretical framework, SIAM J. Numer. Anal. 60 (2022), 133–156.
  • [28] X. Xu and C.-S. Zhang, A new analytical framework for the convergence of inexact two-grid methods, SIAM J. Matrix Anal. Appl. 43 (2022), 512–533.
  • [29] L. T. Zikatanov, Two-sided bounds on the convergence rate of two-level methods, Numer. Linear Algebra Appl. 15 (2008), 439–454.