
Convergence analysis of a two-grid method for nonsymmetric positive definite problems

Xuefeng Xu, Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA. [email protected]
Abstract.

Multigrid is a powerful solver for large-scale linear systems arising from discretized partial differential equations. The convergence theory of multigrid methods for symmetric positive definite problems has been well developed over the past decades, while, for nonsymmetric problems, such theory is still not mature. As a foundation of multigrid analysis, two-grid convergence theory plays an important role in motivating multigrid algorithms. Regarding two-grid methods for nonsymmetric problems, most previous works focus on the spectral radius of the iteration matrix or rely on convergence measures that are typically difficult to compute in practice. Moreover, the existing results are confined to two-grid methods in which the coarse-grid system is solved exactly. In this paper, we analyze the convergence of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from the discretizations of convection-diffusion equations). In the case of an exact coarse solver, we establish an elegant identity for characterizing the two-grid convergence factor, which is measured by a smoother-induced norm. The identity can be conveniently used to derive a class of optimal restriction operators and to analyze how the convergence factor is influenced by restriction. More generally, we present some convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered.

Key words and phrases:
multigrid, two-grid methods, nonsymmetric positive definite problems, convergence theory, inexact coarse solvers
2010 Mathematics Subject Classification:
Primary 65F08, 65F10, 65N15, 65N55; Secondary 15A18, 65N75

1. Introduction

Multigrid is one of the most efficient numerical methods for solving large-scale systems of linear equations that arise from discretized partial differential equations. It has been shown to be a powerful solver, with linear or near-linear computational complexity, for a large class of linear systems; see, e.g., [9, 3, 21, 22]. The fundamental module of multigrid methods is a two-grid scheme, which consists of two complementary error-reduction processes: smoothing and coarse-grid correction. The smoothing process is typically a simple iterative method, such as the (weighted) Jacobi and Gauss–Seidel methods. A limitation of these classical methods is that, in general, they are efficient only at eliminating high-frequency (i.e., oscillatory) error. The remaining low-frequency (i.e., smooth) error is then further reduced by the coarse-grid correction, which is essentially a projection-type method. These two processes are applied iteratively in multigrid algorithms until a desired residual tolerance is achieved.

Much of the multigrid analysis in the literature relies on the problem matrix, $A$, being symmetric positive definite (SPD). For such problems, the convergence theory of multigrid methods has been well developed; see, e.g., [24, 4, 5, 22, 25, 27, 28]. When $A$ is SPD, the convergence analysis of a symmetric two-grid method (i.e., one in which the pre- and postsmoothing steps are performed in a symmetric way) amounts to analyzing the eigenvalues of its iteration matrix (also called the error propagation matrix), because the $A$-norm of the iteration matrix, called the convergence factor, coincides with its spectral radius. In the case of a Galerkin coarse solver, an identity has been established for the convergence factor of two-grid methods [24, 5, 29], which has been widely used to analyze two-grid methods; see, e.g., [5, 25, 1, 26]. Nevertheless, it is often too costly to solve the Galerkin coarse-grid system exactly, especially when its size is relatively large. Recently, Xu and Zhang [27, 28] extended the two-grid convergence theory to the case of non-Galerkin coarse solvers. They proposed two theoretical frameworks for analyzing the convergence of inexact two-grid methods, which have been successfully applied to the convergence analysis of multigrid methods.

Unlike the SPD case, the convergence theory of multigrid methods for nonsymmetric problems is still not mature. Most existing algorithms are based largely on heuristic or incomplete theory. In general, a nonsymmetric matrix fails to induce a norm, and the error propagation matrix of the coarse-grid correction is an oblique projection. Thus, two fundamental questions need to be answered when studying two-grid methods for nonsymmetric problems. The first one is how to choose a suitable convergence measure. Ideally, the selected measure should allow one to derive an identity characterizing the convergence of two-grid methods, as in the SPD case. The simplest choice is the spectral radius of the iteration matrix [17, 23, 15, 6]. However, it describes only the asymptotic convergence and may be a completely misleading indicator [8, 14, 18], especially when only a few iterations are carried out. Some other measures (e.g., the $\sqrt{A^{T}A}$- and $\sqrt{AA^{T}}$-norms) and the associated convergence estimates can be found in [2, 12, 14, 13, 16, 18] and the references therein.

The second question is how to build the intergrid transfer operators $R\in\mathbb{R}^{n_{\rm c}\times n}$ (restriction) and $P\in\mathbb{R}^{n\times n_{\rm c}}$ (prolongation) such that the coarse-grid correction will not increase error [16], where $n_{\rm c}$ is the number of coarse variables and $n$ is the number of fine ones ($A\in\mathbb{R}^{n\times n}$). The error propagation matrix of the correction process takes the form $I-P(RAP)^{-1}RA$, provided that $RAP\in\mathbb{R}^{n_{\rm c}\times n_{\rm c}}$ is nonsingular. Note that $I-P(RAP)^{-1}RA$ is a projection (i.e., an idempotent operator). For any operator norm $\|\cdot\|$, it holds that

\|I-P(RAP)^{-1}RA\|\geq 1.

If $\|I-P(RAP)^{-1}RA\|>1$, then the correction process may increase error, which is contrary to the core idea of multigrid methods. Indeed, the coarse-grid correction is key to the fast convergence of multigrid algorithms. Therefore, for a fixed $R$, one needs to design a prolongation $P$ such that $\|I-P(RAP)^{-1}RA\|=1$ in order to ensure that the correction process will not increase error.

In this paper, we are concerned with the convergence theory of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from discretized convection-diffusion equations). In the two-grid method, an SPD matrix, $M\in\mathbb{R}^{n\times n}$, is used as a smoother, and the induced $M$-norm is used for measuring two-grid convergence. In addition, $M^{-1}A^{T}R^{T}$ is used as a prolongation operator in order to ensure that the coarse-grid correction will not increase error. Such a prolongation can be viewed as a multiplicative perturbation of the classical choice $R^{T}$ (since $M^{-1}A^{T}\approx I$). Unlike the existing estimates, we here establish a succinct identity for the convergence factor of the two-grid method. It can be conveniently used to derive a class of optimal restriction operators and to analyze how the convergence factor is influenced by the row space of $R$. Compared with the exact case, the convergence theory of inexact two-grid methods is of more practical significance, while, for nonsymmetric problems, such theory is scarce in the literature. Motivated by this observation, we present some convergence estimates for an inexact variant of the two-grid method (both linear and nonlinear coarse solvers are considered), which lay a theoretical foundation for developing new algorithms.

The rest of this paper is organized as follows. In Section 2, we present a two-grid method for solving nonsymmetric positive definite linear systems. In Section 3, we establish an identity for the convergence factor of the two-grid method, followed by two applications of the identity. In Section 4, we propose and analyze an inexact variant of the two-grid method. In Section 5, we give some concluding remarks.

2. Preliminaries

For convenience, we first list some notation used in the subsequent discussions.

  • $I_{n}$ denotes the $n\times n$ identity matrix (or $I$ when its size is clear from context).

  • $\lambda_{\min}(\cdot)$, $\lambda_{\min}^{+}(\cdot)$, and $\lambda_{\max}(\cdot)$ stand for the smallest eigenvalue, the smallest positive eigenvalue, and the largest eigenvalue of a matrix, respectively.

  • $\lambda_{i}(\cdot)$ denotes the $i$th smallest eigenvalue of a matrix.

  • $\lambda(\cdot)$ denotes the spectrum of a matrix.

  • $\|\cdot\|_{2}$ denotes the spectral norm of a matrix.

  • $\|\cdot\|_{M}$ denotes the norm induced by an SPD matrix $M\in\mathbb{R}^{n\times n}$: if $\mathbf{v}\in\mathbb{R}^{n}$, then $\|\mathbf{v}\|_{M}=\sqrt{\mathbf{v}^{T}M\mathbf{v}}$; if $X\in\mathbb{R}^{n\times n}$, then $\|X\|_{M}=\max_{\mathbf{v}\in\mathbb{R}^{n}\backslash\{0\}}\frac{\|X\mathbf{v}\|_{M}}{\|\mathbf{v}\|_{M}}$.

  • $\mathbb{E}[\cdot]$ denotes the expectation of a random variable.

Consider solving the linear system

(2.1) A\mathbf{u}=\mathbf{f},

where $A\in\mathbb{R}^{n\times n}$ is nonsymmetric but positive definite (namely, $\mathbf{v}^{T}A\mathbf{v}>0$ for all $\mathbf{v}\in\mathbb{R}^{n}\backslash\{0\}$, which implies that $A$ is nonsingular), $\mathbf{u}\in\mathbb{R}^{n}$ is the solution vector, and $\mathbf{f}\in\mathbb{R}^{n}$ is the right-hand side. A natural splitting of $A$ is given by

A=A_{\rm sym}+A_{\rm skew},

where $A_{\rm sym}=\frac{1}{2}(A+A^{T})$ and $A_{\rm skew}=\frac{1}{2}(A-A^{T})$ are the symmetric and skew-symmetric parts of $A$, respectively. It is easy to check that $A$ is positive definite if and only if the symmetric matrix $A_{\rm sym}$ is positive definite.
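Because $\mathbf{v}^{T}A_{\rm skew}\mathbf{v}=0$ for every real vector $\mathbf{v}$, the quadratic form of $A$ coincides with that of $A_{\rm sym}$, which gives the equivalence just stated. The following minimal NumPy sketch (the random test matrix and all variable names are purely illustrative assumptions, not part of the paper) checks this numerically:

import numpy as np

rng = np.random.default_rng(0)
n = 8
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)      # nonsymmetric but positive definite

A_sym = 0.5 * (A + A.T)                      # symmetric part
A_skew = 0.5 * (A - A.T)                     # skew-symmetric part

v = rng.standard_normal(n)
print(np.isclose(v @ A @ v, v @ A_sym @ v))  # True: the skew part contributes nothing
print(np.linalg.eigvalsh(A_sym).min() > 0)   # True, so A is positive definite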

Several basic assumptions involved in the analysis of two-grid methods are summarized as follows.

  • Let $M\in\mathbb{R}^{n\times n}$ be an SPD smoother such that $\|I-M^{-1}A\|_{M}\leq 1$.

  • Let $R\in\mathbb{R}^{n_{\rm c}\times n}$ be a restriction matrix of rank $n_{\rm c}$, where $n_{\rm c}\,(<n)$ is the number of coarse variables.

  • Let $P\in\mathbb{R}^{n\times n_{\rm c}}$ be a prolongation (or interpolation) matrix of rank $n_{\rm c}$.

  • Assume that the coarse-grid matrix $A_{\rm c}:=RAP\in\mathbb{R}^{n_{\rm c}\times n_{\rm c}}$ is nonsingular.

Remark 2.1.

The assumption $\|I-M^{-1}A\|_{M}\leq 1$ is equivalent to

\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big)\leq 1,

that is,

\lambda_{\min}\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)\geq 0,

where

(2.2) \widetilde{A}:=A+A^{T}-AM^{-1}A^{T}.

As a result, $\|I-M^{-1}A\|_{M}\leq 1$ if and only if the symmetric matrix $\widetilde{A}$ is positive semidefinite, which is a very weak assumption. If $\widetilde{A}$ fails to be positive semidefinite, one only needs to modify $M$ properly, e.g., replace $M$ by $\omega M$ for a suitable parameter $\omega>0$. Indeed, there always exists an SPD matrix $M$ such that $\|I-M^{-1}A\|_{M}\leq 1$.
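To make the remark concrete, here is a small NumPy sketch (the test matrix and the Jacobi-type choice of $M$ are illustrative assumptions, not prescriptions of the paper) that tests whether $\widetilde{A}$ is positive semidefinite and, if not, rescales $M$ by $\omega$; since $\widetilde{A}\to A+A^{T}$ as $\omega\to\infty$, the loop always terminates:

import numpy as np

rng = np.random.default_rng(1)
n = 8
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)        # nonsymmetric positive definite

def A_tilde(A, M):
    # A_tilde = A + A^T - A M^{-1} A^T, cf. (2.2)
    return A + A.T - A @ np.linalg.solve(M, A.T)

M = np.diag(np.diag(A))                        # Jacobi-type SPD candidate smoother
omega = 1.0
while np.linalg.eigvalsh(A_tilde(A, omega * M)).min() < 0:
    omega *= 2.0                               # enlarge omega until A_tilde is PSD
print(omega, np.linalg.eigvalsh(A_tilde(A, omega * M)).min() >= 0)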

With the above assumptions, an exact two-grid method for solving (2.1) can be described by Algorithm 1, in which $A_{\rm c}^{-1}$ is used as a coarse solver.

Algorithm 1  Exact two-grid method.
1: Smoothing: $\mathbf{u}^{(1)}\leftarrow\mathbf{u}^{(0)}+M^{-1}\big(\mathbf{f}-A\mathbf{u}^{(0)}\big)$ \triangleright $\mathbf{u}^{(0)}\in\mathbb{R}^{n}$ is an initial guess
2: Restriction: $\mathbf{r}_{\rm c}\leftarrow R\big(\mathbf{f}-A\mathbf{u}^{(1)}\big)$
3: Coarse-grid correction: $\mathbf{e}_{\rm c}\leftarrow A_{\rm c}^{-1}\mathbf{r}_{\rm c}$
4: Prolongation: $\mathbf{u}_{\rm TG}\leftarrow\mathbf{u}^{(1)}+P\mathbf{e}_{\rm c}$
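A runnable NumPy sketch of Algorithm 1 is given below (the random test problem, the choice $M=\omega I$ with $\omega$ large enough that $\widetilde{A}$ is SPD, and the injection-type restriction are all illustrative assumptions):

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(2)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)              # nonsymmetric positive definite
# M = omega*I with omega chosen so that A_tilde in (2.2) is SPD
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)

R = np.hstack([np.zeros((nc, n - nc)), np.eye(nc)])  # injection-type restriction
P = np.linalg.solve(M, A.T @ R.T)                    # P_star = M^{-1} A^T R^T, cf. (2.6)
Ac = R @ A @ P                                       # coarse-grid matrix A_c = R A P

u_exact = rng.standard_normal(n)
f = A @ u_exact
u = np.zeros(n)                                      # initial guess u^(0)
for _ in range(25):                                  # repeat the two-grid cycle
    u = u + np.linalg.solve(M, f - A @ u)            # step 1: smoothing
    rc = R @ (f - A @ u)                             # step 2: restriction
    ec = np.linalg.solve(Ac, rc)                     # step 3: exact coarse solve
    u = u + P @ ec                                   # step 4: prolongation
print(np.linalg.norm(u_exact - u))                   # the error decays geometrically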
Remark 2.2.

When $A$ is SPD, the pre- and postsmoothing steps are often performed in a symmetric way. The resulting two-grid iteration matrix is symmetric with respect to the $A$-inner product, in which case the $A$-norm of the two-grid iteration matrix coincides with its spectral radius. However, for nonsymmetric problems, the two-grid iteration matrix no longer has such symmetry, even if a postsmoothing step is added to Algorithm 1.

In fact, Algorithm 1 involves the following two error-reduction processes:

(2.3a) \mathbf{u}-\mathbf{u}^{(1)}=(I-M^{-1}A)\big(\mathbf{u}-\mathbf{u}^{(0)}\big),
(2.3b) \mathbf{u}-\mathbf{u}_{\rm TG}=(I-\varPi_{A})\big(\mathbf{u}-\mathbf{u}^{(1)}\big),

where

(2.4) \varPi_{A}:=PA_{\rm c}^{-1}RA.

Then

\big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}\leq\|I-M^{-1}A\|_{M}\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M},
\|\mathbf{u}-\mathbf{u}_{\rm TG}\|_{M}\leq\|I-\varPi_{A}\|_{M}\big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}.

Recall that $\|I-M^{-1}A\|_{M}\leq 1$. Hence, the initial error $\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}$ will not be amplified by (2.3a).

It can be seen from (2.4) that $\varPi_{A}$ is a projection along (or parallel to) $\operatorname{null}(RA)$ onto $\operatorname{range}(P)$, which leads to

\|I-\varPi_{A}\|_{M}\geq 1,

with equality if and only if $\varPi_{A}=M^{-1}\varPi_{A}^{T}M$. To ensure that the error $\big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}$ will not be amplified by (2.3b), it suffices to choose a prolongation $P$ such that

(2.5) \varPi_{A}=M^{-1}A^{T}R^{T}(RAP)^{-T}P^{T}M.

Observe that $\varPi_{A}$ is a projection along $\operatorname{null}(RA)$ onto $\operatorname{range}(P)$, while the right-hand side of (2.5) is a projection along $\operatorname{null}(P^{T}M)$ onto $\operatorname{range}(M^{-1}A^{T}R^{T})$. Hence, (2.5) entails that

\operatorname{range}(P)=\operatorname{range}(M^{-1}A^{T}R^{T})\quad\text{and}\quad\operatorname{null}(P^{T}M)=\operatorname{null}(RA),

which can be satisfied by

P=M^{-1}A^{T}R^{T}W

for any $n_{\rm c}\times n_{\rm c}$ nonsingular matrix $W$. Taking $W=I_{n_{\rm c}}$ yields the prolongation

(2.6) P_{\star}\equiv M^{-1}A^{T}R^{T}.

For any coarse vector $\mathbf{v}_{\rm c}\in\mathbb{R}^{n_{\rm c}}$, $P_{\star}\mathbf{v}_{\rm c}$ can be easily computed, because the action of $M^{-1}$ is always available. Furthermore, it is possible that $P_{\star}$ has a sparse structure. For instance, if $A$ is sparse, $M=\omega\operatorname{diag}(A)$, and $R=\big(0\ \ I_{n_{\rm c}}\big)$, then $P_{\star}$ is sparse.
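The defining property of $P_{\star}$ can be checked numerically: with $P=P_{\star}$, the projection $\varPi_{A}$ is $M$-orthogonal and $\|I-\varPi_{A}\|_{M}=1$, whereas the classical choice $P=R^{T}$ typically yields a norm larger than one. A sketch under the same illustrative assumptions as before:

import numpy as np
import scipy.linalg as sla

def M_norm(X, M):
    # ||X||_M = ||M^{1/2} X M^{-1/2}||_2 for SPD M
    Mh = sla.sqrtm(M).real
    return np.linalg.norm(Mh @ X @ np.linalg.inv(Mh), 2)

rng = np.random.default_rng(3)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)
R = rng.standard_normal((nc, n))                # any full-rank restriction

for P in (np.linalg.solve(M, A.T @ R.T), R.T):  # P_star from (2.6) vs. classical R^T
    Pi = P @ np.linalg.solve(R @ A @ P, R @ A)  # Pi_A from (2.4)
    print(M_norm(np.eye(n) - Pi, M))            # = 1 for P_star, >= 1 in general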

Remark 2.3.

For SPD problems, it is advisable to take $P=R^{T}$, which is the most commonly used prolongation in both geometric and algebraic multigrid methods. Then

\|I-\varPi_{A}\|_{A}=\big\|I-R^{T}(RAR^{T})^{-1}RA\big\|_{A}=1,

which shows that the error $\big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{A}$ will not be amplified by (2.3b). However, for nonsymmetric problems, this goal cannot in general be achieved by the choice $R^{T}$. According to the above discussion, it is suggested to take $P=P_{\star}$, which can be viewed as a multiplicative perturbation of $R^{T}$. Moreover,

\frac{\|P_{\star}-R^{T}\|_{M}}{\|R^{T}\|_{M}}\leq\|I-M^{-1}A^{T}\|_{M}=\|I-M^{-1}A\|_{M}\leq 1.

Combining (2.3a) and (2.3b), we obtain

\mathbf{u}-\mathbf{u}_{\rm TG}=E_{\rm TG}\big(\mathbf{u}-\mathbf{u}^{(0)}\big),

where

(2.7) E_{\rm TG}=(I-\varPi_{A})(I-M^{-1}A),

which is called the iteration matrix (or error propagation matrix) of Algorithm 1. Then, it holds that

\|\mathbf{u}-\mathbf{u}_{\rm TG}\|_{M}\leq\|E_{\rm TG}\|_{M}\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}

with

(2.8) \|E_{\rm TG}\|_{M}=\big\|\big(I-M^{\frac{1}{2}}\varPi_{A}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big\|_{2}.

The quantity (2.8) is referred to as the $M$-convergence factor of Algorithm 1, which will be analyzed in the next section.

3. Convergence analysis of Algorithm 1

In this section, we analyze the convergence of Algorithm 1 with prolongation $P_{\star}$. A succinct identity for characterizing the convergence factor $\|E_{\rm TG}\|_{M}$ is established, followed by discussions on how the identity can be used to derive a class of optimal restriction operators and to analyze the influence of $\operatorname{range}(R^{T})$ on $\|E_{\rm TG}\|_{M}$.

We start with a technical lemma, which provides a necessary and sufficient condition for $\|E_{\rm TG}\|_{M}<1$ (see the proof of Theorem 3.3 for details).

Lemma 3.1.

Let $\widetilde{A}$ and $\varPi_{A}$ be defined by (2.2) and (2.4), respectively. Then

(3.1) \operatorname{null}(\widetilde{A})\cap\operatorname{null}(RA)=\{0\}

if and only if $\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)=n-n_{\rm c}$.

Proof.

Observe that $\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)=n-n_{\rm c}$ is equivalent to

(3.2) \operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)=\operatorname{null}(I-\varPi_{A}).

Thus, it suffices to show that (3.1) and (3.2) are equivalent.

“(3.1)$\Rightarrow$(3.2)”: For any $\mathbf{x}\in\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)$, we have

\widetilde{A}(I-\varPi_{A})\mathbf{x}=0,

which implies that $(I-\varPi_{A})\mathbf{x}\in\operatorname{null}(\widetilde{A})$. On the other hand, $(I-\varPi_{A})\mathbf{x}\in\operatorname{null}(RA)$. If (3.1) holds, then $(I-\varPi_{A})\mathbf{x}=0$, i.e., $\mathbf{x}\in\operatorname{null}(I-\varPi_{A})$. The arbitrariness of $\mathbf{x}$ yields

\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)\subseteq\operatorname{null}(I-\varPi_{A}),

which, together with the fact $\operatorname{null}(I-\varPi_{A})\subseteq\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)$, leads to (3.2).

“(3.2)$\Rightarrow$(3.1)”: Assume that the relation (3.1) does not hold, i.e., there exists a nonzero vector $\mathbf{y}$ in $\operatorname{null}(\widetilde{A})\cap\operatorname{null}(RA)$. Since

\operatorname{null}(RA)=\operatorname{range}(I-\varPi_{A}),

$\mathbf{y}$ can be expressed as $\mathbf{y}=(I-\varPi_{A})\mathbf{z}$ for some $\mathbf{z}\in\mathbb{R}^{n}\backslash\operatorname{range}(P)$, which, combined with the fact $\mathbf{y}\in\operatorname{null}(\widetilde{A})=\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}\big)$, yields

\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\mathbf{z}=0.

Consequently, $\mathbf{z}$ lies in $\operatorname{null}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)$ but not in $\operatorname{null}(I-\varPi_{A})$, which contradicts (3.2). This completes the proof. ∎

Remark 3.2.

Obviously, the condition (3.1) is satisfied if $\widetilde{A}$ is SPD, or, equivalently, $\|I-M^{-1}A\|_{M}<1$. In addition, (3.1) implies that $\operatorname{rank}(\widetilde{A})\geq n-n_{\rm c}$.

Using (2.8) and Lemma 3.1, we can derive the following identity.

Theorem 3.3.

Under the assumptions of Algorithm 1 and the condition (3.1), the $M$-convergence factor of Algorithm 1 with prolongation (2.6) can be characterized as

(3.3) \|E_{\rm TG}\|_{M}=\sqrt{1-\sigma_{\rm TG}}

with

(3.4) \sigma_{\rm TG}=\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}(I-\varPi_{A})\big),

where $\widetilde{A}$ and $\varPi_{A}$ are defined by (2.2) and (2.4), respectively.

Proof.

If $P=P_{\star}$, then

A_{\rm c}=RAP_{\star}=P_{\star}^{T}MP_{\star}

and

\varPi_{A}=P_{\star}A_{\rm c}^{-1}RA=P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M.

Let

\varPi=M^{\frac{1}{2}}\varPi_{A}M^{-\frac{1}{2}}.

Then

(3.5) \varPi=M^{\frac{1}{2}}P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M^{\frac{1}{2}}.

Since $\varPi^{T}=\varPi=\varPi^{2}$ and $\operatorname{rank}(\varPi)=n_{\rm c}$, there exists an $n\times n$ orthogonal matrix $Q$ such that

(3.6) Q^{T}\varPi Q=\begin{pmatrix}I_{n_{\rm c}}&0\\0&0\end{pmatrix}.

By (2.8), we have

\|E_{\rm TG}\|_{M}^{2}=\big\|(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big\|_{2}^{2}
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big)
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)(I-\varPi)\big)
=1-\lambda_{\min}\big(\varPi+M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big).

Let

(3.7) Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q=\begin{pmatrix}X_{1}&X_{2}\\X_{2}^{T}&X_{3}\end{pmatrix},

where $X_{1}\in\mathbb{R}^{n_{\rm c}\times n_{\rm c}}$, $X_{2}\in\mathbb{R}^{n_{\rm c}\times(n-n_{\rm c})}$, and $X_{3}\in\mathbb{R}^{(n-n_{\rm c})\times(n-n_{\rm c})}$. The positive semidefiniteness of $Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q$ implies that of $X_{3}$. From the relation

I-Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q=Q^{T}\big(I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)Q=Q^{T}\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)Q,

we deduce that $I_{n-n_{\rm c}}-X_{3}$ is symmetric positive semidefinite (SPSD). It follows that $\lambda(X_{3})\subset[0,1]$. Direct computation yields

\varPi+M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)=Q\begin{pmatrix}I_{n_{\rm c}}&X_{2}\\0&X_{3}\end{pmatrix}Q^{T}.

Then

\|E_{\rm TG}\|_{M}^{2}=1-\lambda_{\min}\big(\varPi+M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big)=1-\min\big\{1,\,\lambda_{\min}(X_{3})\big\}=1-\lambda_{\min}(X_{3}).

Since

(I-\varPi)M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)=Q\begin{pmatrix}0&0\\0&X_{3}\end{pmatrix}Q^{T},

we obtain

\operatorname{rank}(X_{3})=\operatorname{rank}\big((I-\varPi)M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}M^{-\frac{1}{2}}(I-\varPi)\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}(I-\varPi_{A})\big)=n-n_{\rm c},

where we have used Lemma 3.1 together with the condition (3.1). This means that $X_{3}$ is SPD, i.e., $\lambda_{\min}(X_{3})>0$. Due to

(3.8) M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)=Q\begin{pmatrix}0&X_{2}\\0&X_{3}\end{pmatrix}Q^{T},

it follows that

\lambda_{\min}(X_{3})=\lambda_{\min}^{+}\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big).

Thus,

\|E_{\rm TG}\|_{M}^{2}=1-\lambda_{\min}^{+}\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big)=1-\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)M^{\frac{1}{2}}\big)=1-\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}(I-\varPi_{A})\big),

which leads to the identity (3.3). ∎
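The identity (3.3)–(3.4) is easy to verify numerically. In the following sketch (same illustrative random setup as before), the left-hand side is computed directly from (2.8) and the right-hand side from the smallest positive eigenvalue in (3.4); the two printed values agree up to rounding:

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(4)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)                           # SPD smoother; A_tilde is SPD here
R = rng.standard_normal((nc, n))
P = np.linalg.solve(M, A.T @ R.T)               # P_star
Pi = P @ np.linalg.solve(P.T @ M @ P, P.T @ M)  # Pi_A for P = P_star
At = A + A.T - A @ np.linalg.solve(M, A.T)      # A_tilde, (2.2)

Mh = sla.sqrtm(M).real
E = (np.eye(n) - Pi) @ (np.eye(n) - np.linalg.solve(M, A))       # E_TG, (2.7)
lhs = np.linalg.norm(Mh @ E @ np.linalg.inv(Mh), 2)              # ||E_TG||_M, (2.8)

ev = np.linalg.eigvals(np.linalg.solve(M, At) @ (np.eye(n) - Pi)).real
sigma = min(e for e in ev if e > 1e-10)                          # sigma_TG, (3.4)
print(lhs, np.sqrt(1.0 - sigma))                                 # the two values agree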

The identity (3.3) provides a straightforward approach to analyzing the convergence properties of Algorithm 1. Of particular interest is a restriction operator that minimizes the convergence factor $\|E_{\rm TG}\|_{M}$ when the smoother $M$ is preselected.

The following inequality, known as the Poincaré separation theorem (see, e.g., [11, Corollary 4.3.37]), will be used to analyze the optimal restriction.

Lemma 3.4.

Let $H\in\mathbb{C}^{n\times n}$ be Hermitian, and let $\{\mathbf{q}_{k}\}_{k=1}^{m}\subset\mathbb{C}^{n}$ $(1\leq m\leq n)$ be a set of orthonormal vectors. Then, for any $i=1,\ldots,m$, it holds that

\lambda_{i}(H)\leq\lambda_{i}(\check{H})\leq\lambda_{i+n-m}(H),

where $\check{H}=\big(\mathbf{q}_{i}^{\ast}H\mathbf{q}_{j}\big)_{i,j=1}^{m}\in\mathbb{C}^{m\times m}$ and $\mathbf{q}_{i}^{\ast}$ denotes the conjugate transpose of $\mathbf{q}_{i}$.

Based on (3.3) and Lemma 3.4, we can establish the following optimal restriction theory.

Theorem 3.5.

Let $\widetilde{A}$ be defined by (2.2), and let $\{(\mu_{i},\mathbf{v}_{i})\}_{i=1}^{n}$ be the eigenpairs of the generalized eigenvalue problem

\widetilde{A}\mathbf{v}=\mu M\mathbf{v},

where

\mu_{1}\leq\mu_{2}\leq\cdots\leq\mu_{n}\quad\text{and}\quad\mathbf{v}_{i}^{T}M\mathbf{v}_{j}=\begin{cases}1&\text{if }i=j,\\0&\text{if }i\neq j.\end{cases}

Under the assumptions of Theorem 3.3, it holds that

\|E_{\rm TG}\|_{M}\geq\sqrt{1-\mu_{n_{\rm c}+1}},

with equality if $\operatorname{null}(RA)=\operatorname{span}\{\mathbf{v}_{n_{\rm c}+1},\ldots,\mathbf{v}_{n}\}$.

Proof.

Observe that

\lambda\big(M^{-1}\widetilde{A}\big)=\lambda\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big).

Since $M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}$ and $I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}$ are SPSD, it follows that

0=\mu_{1}=\cdots=\mu_{n-r}<\mu_{n-r+1}\leq\cdots\leq\mu_{n}\leq 1,

where $r=\operatorname{rank}(\widetilde{A})\geq n-n_{\rm c}$.

Let

V=(\mathbf{v}_{1},\ldots,\mathbf{v}_{n})\quad\text{and}\quad U_{1}=V^{-1}P_{\star}\big(P_{\star}^{T}V^{-T}V^{-1}P_{\star}\big)^{-\frac{1}{2}}.

It is easy to check that $V$ is an $n\times n$ nonsingular matrix with $V^{-1}=V^{T}M$ and that $U_{1}$ is an $n\times n_{\rm c}$ matrix with orthonormal columns (i.e., $U_{1}^{T}U_{1}=I_{n_{\rm c}}$). Let $U_{2}$ be an $n\times(n-n_{\rm c})$ matrix such that $(U_{1}\ U_{2})\in\mathbb{R}^{n\times n}$ is orthogonal, i.e.,

(U_{1}\ U_{2})^{-1}=(U_{1}\ U_{2})^{T}.

Then

M^{-1}\widetilde{A}(I-\varPi_{A})=M^{-1}\widetilde{A}\big(I-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M\big)
=M^{-1}\widetilde{A}\big(I-VU_{1}(U_{1}^{T}V^{T}MVU_{1})^{-1}U_{1}^{T}V^{T}M\big)
=M^{-1}\widetilde{A}\big(I-VU_{1}U_{1}^{T}V^{T}M\big)
=M^{-1}\widetilde{A}\big(I-VU_{1}U_{1}^{T}V^{-1}\big)
=M^{-1}\widetilde{A}VU_{2}U_{2}^{T}V^{-1}
=V\Lambda U_{2}U_{2}^{T}V^{-1},

where $\Lambda=\operatorname{diag}\big(0,\ldots,0,\mu_{n-r+1},\ldots,\mu_{n}\big)\in\mathbb{R}^{n\times n}$. Hence,

\lambda\big(\Lambda U_{2}U_{2}^{T}\big)=\lambda\big(M^{-1}\widetilde{A}(I-\varPi_{A})\big)=\lambda\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}M^{\frac{1}{2}}(I-\varPi_{A})M^{-\frac{1}{2}}\big)=\lambda\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}(I-\varPi)\big),

which, together with (3.8), yields that $\Lambda U_{2}U_{2}^{T}$ has $n-n_{\rm c}$ positive eigenvalues and the eigenvalue $0$ with multiplicity $n_{\rm c}$. Due to

\begin{pmatrix}U_{1}^{T}\\U_{2}^{T}\end{pmatrix}\Lambda U_{2}U_{2}^{T}\big(U_{1}\ U_{2}\big)=\begin{pmatrix}0&U_{1}^{T}\Lambda U_{2}\\0&U_{2}^{T}\Lambda U_{2}\end{pmatrix},

it follows that $U_{2}^{T}\Lambda U_{2}$ is SPD. Using (3.3) and Lemma 3.4, we obtain

\|E_{\rm TG}\|_{M}=\sqrt{1-\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}(I-\varPi_{A})\big)}=\sqrt{1-\lambda_{\min}^{+}\big(\Lambda U_{2}U_{2}^{T}\big)}=\sqrt{1-\lambda_{1}\big(U_{2}^{T}\Lambda U_{2}\big)}\geq\sqrt{1-\mu_{n_{\rm c}+1}}.

In particular, if $\operatorname{null}(RA)=\operatorname{span}\{\mathbf{v}_{n_{\rm c}+1},\ldots,\mathbf{v}_{n}\}$, then

V^{-1}P_{\star}=V^{-1}M^{-1}A^{T}R^{T}=V^{T}A^{T}R^{T}=(RAV)^{T}=\begin{pmatrix}R_{\rm c}^{T}\\0\end{pmatrix},

where $R_{\rm c}=RA(\mathbf{v}_{1},\ldots,\mathbf{v}_{n_{\rm c}})\in\mathbb{R}^{n_{\rm c}\times n_{\rm c}}$ is nonsingular. We then have

U_{2}U_{2}^{T}=I-U_{1}U_{1}^{T}=I-V^{-1}P_{\star}\big(P_{\star}^{T}V^{-T}V^{-1}P_{\star}\big)^{-1}P_{\star}^{T}V^{-T}=I-\begin{pmatrix}R_{\rm c}^{T}\\0\end{pmatrix}R_{\rm c}^{-T}R_{\rm c}^{-1}\big(R_{\rm c}\ \ 0\big)=\begin{pmatrix}0&0\\0&I_{n-n_{\rm c}}\end{pmatrix}.

Thus,

\|E_{\rm TG}\|_{M}=\sqrt{1-\lambda_{\min}^{+}\big(\Lambda U_{2}U_{2}^{T}\big)}=\sqrt{1-\mu_{n_{\rm c}+1}}.

This completes the proof. ∎
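Theorem 3.5 also yields a constructive recipe: since $V^{-1}=V^{T}M$, choosing the rows of $RA$ as the first $n_{\rm c}$ rows of $V^{T}M$ (equivalently, $R=(\mathbf{v}_{1},\ldots,\mathbf{v}_{n_{\rm c}})^{T}MA^{-1}$, an illustrative dense construction rather than a practical algorithm) enforces $\operatorname{null}(RA)=\operatorname{span}\{\mathbf{v}_{n_{\rm c}+1},\ldots,\mathbf{v}_{n}\}$. A sketch that checks the lower bound is attained:

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(5)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)
At = A + A.T - A @ np.linalg.solve(M, A.T)       # A_tilde

mu, V = sla.eigh(At, M)                          # A_tilde v = mu M v, mu ascending;
                                                 # columns of V are M-orthonormal
R = V[:, :nc].T @ M @ np.linalg.inv(A)           # then null(R A) = span{v_{nc+1},...,v_n}

P = np.linalg.solve(M, A.T @ R.T)                # P_star
Pi = P @ np.linalg.solve(P.T @ M @ P, P.T @ M)
Mh = sla.sqrtm(M).real
E = (np.eye(n) - Pi) @ (np.eye(n) - np.linalg.solve(M, A))
print(np.linalg.norm(Mh @ E @ np.linalg.inv(Mh), 2))   # ||E_TG||_M
print(np.sqrt(1.0 - mu[nc]))                           # optimal value sqrt(1 - mu_{nc+1})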

To analyze the influence of $\operatorname{range}(R^{T})$ on $\|E_{\rm TG}\|_{M}$, we need the following lemma; see, e.g., [11, Corollary 4.3.5].

Lemma 3.6.

Let $H_{1}$ and $H_{2}$ be $n\times n$ Hermitian matrices. If $H_{2}$ is singular, then

\lambda_{i}(H_{1}+H_{2})\leq\lambda_{i+\operatorname{rank}(H_{2})}(H_{1})

for all $i=1,\ldots,n-\operatorname{rank}(H_{2})$.

The following theorem shows that $\|E_{\rm TG}\|_{M}$ does not increase when the row space of $R$ is enlarged.

Theorem 3.7.

Assume that $\widehat{R}\in\mathbb{R}^{\widehat{n}_{\rm c}\times n}$ has full row rank, where $n_{\rm c}\leq\widehat{n}_{\rm c}<n$. If $\operatorname{range}(R^{T})\subseteq\operatorname{range}(\widehat{R}^{T})$, then

(3.9) \|\widehat{E}_{\rm TG}\|_{M}\leq\|E_{\rm TG}\|_{M},

where $E_{\rm TG}$ is given by (2.7) and $\widehat{E}_{\rm TG}$ is formed by replacing $R$ in (2.7) with $\widehat{R}$.

Proof.

Since $\operatorname{range}(R^{T})\subseteq\operatorname{range}(\widehat{R}^{T})$, there exists an $\widehat{n}_{\rm c}\times n_{\rm c}$ matrix $Y$, with full column rank, such that

R^{T}=\widehat{R}^{T}Y.

Let $\widehat{Y}$ be an $\widehat{n}_{\rm c}\times\widehat{n}_{\rm c}$ nonsingular matrix such that

Y=\widehat{Y}\begin{pmatrix}I_{n_{\rm c}}\\0\end{pmatrix}.

Then

R^{T}=\widehat{R}^{T}\widehat{Y}\begin{pmatrix}I_{n_{\rm c}}\\0\end{pmatrix}.

Hence, there exists an $n\times(\widehat{n}_{\rm c}-n_{\rm c})$ matrix $Z_{0}$, with full column rank, such that

(3.10) \widehat{R}^{T}=\big(R^{T}\ \ Z_{0}\big)\widehat{Y}^{-1}.

According to the proof of Theorem 3.3, we have

\sigma_{\rm TG}=\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}\big(I-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M\big)\big)
=\lambda_{n_{\rm c}+1}\big(M^{-1}\widetilde{A}\big(I-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M\big)\big)
=\lambda_{n_{\rm c}+1}\big(\widetilde{A}\big(I-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}M\big)M^{-1}\big)
=\lambda_{n_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big).

Similarly, one has

\|\widehat{E}_{\rm TG}\|_{M}=\sqrt{1-\widehat{\sigma}_{\rm TG}},

where

\widehat{\sigma}_{\rm TG}=\lambda_{\widehat{n}_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big)\quad\text{with}\quad\widehat{P}_{\star}=M^{-1}A^{T}\widehat{R}^{T}.

By (3.10), we have

\widehat{P}_{\star}=\big(P_{\star}\ \ Z\big)\widehat{Y}^{-1}\quad\text{with}\quad Z=M^{-1}A^{T}Z_{0}.

Let

D_{\star}=\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T},
L_{\star}=\begin{pmatrix}I_{n_{\rm c}}&0\\-Z^{T}MP_{\star}(P_{\star}^{T}MP_{\star})^{-1}&I_{\widehat{n}_{\rm c}-n_{\rm c}}\end{pmatrix},
S_{\star}=Z^{T}MZ-Z^{T}MP_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}MZ.

Then

D_{\star}=\big(P_{\star}\ \ Z\big)\bigg[\begin{pmatrix}P_{\star}^{T}MP_{\star}&P_{\star}^{T}MZ\\Z^{T}MP_{\star}&Z^{T}MZ\end{pmatrix}^{-1}-\begin{pmatrix}(P_{\star}^{T}MP_{\star})^{-1}&0\\0&0\end{pmatrix}\bigg]\big(P_{\star}\ \ Z\big)^{T}
=\big(P_{\star}\ \ Z\big)\bigg[L_{\star}^{T}\begin{pmatrix}(P_{\star}^{T}MP_{\star})^{-1}&0\\0&S_{\star}^{-1}\end{pmatrix}L_{\star}-\begin{pmatrix}(P_{\star}^{T}MP_{\star})^{-1}&0\\0&0\end{pmatrix}\bigg]\big(P_{\star}\ \ Z\big)^{T}
=\big(P_{\star}\ \ Z\big)\begin{pmatrix}-(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}MZ\\I_{\widehat{n}_{\rm c}-n_{\rm c}}\end{pmatrix}S_{\star}^{-1}\begin{pmatrix}-(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}MZ\\I_{\widehat{n}_{\rm c}-n_{\rm c}}\end{pmatrix}^{T}\big(P_{\star}\ \ Z\big)^{T},

from which we deduce that $D_{\star}$ is SPSD and $\operatorname{rank}(D_{\star})\leq\widehat{n}_{\rm c}-n_{\rm c}$. It follows that

\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}D_{\star}\widetilde{A}^{\frac{1}{2}}\big)\leq\widehat{n}_{\rm c}-n_{\rm c},

which, together with Lemma 3.6, yields

\sigma_{\rm TG}=\lambda_{n_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big)
=\lambda_{n_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}+\widetilde{A}^{\frac{1}{2}}D_{\star}\widetilde{A}^{\frac{1}{2}}\big)
\leq\lambda_{n_{\rm c}+1+\widehat{n}_{\rm c}-n_{\rm c}}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big)
=\lambda_{\widehat{n}_{\rm c}+1}\big(\widetilde{A}^{\frac{1}{2}}\big(M^{-1}-\widehat{P}_{\star}(\widehat{P}_{\star}^{T}M\widehat{P}_{\star})^{-1}\widehat{P}_{\star}^{T}\big)\widetilde{A}^{\frac{1}{2}}\big)=\widehat{\sigma}_{\rm TG}.

Thus,

\|E_{\rm TG}\|_{M}=\sqrt{1-\sigma_{\rm TG}}\geq\sqrt{1-\widehat{\sigma}_{\rm TG}}=\|\widehat{E}_{\rm TG}\|_{M},

which gives the inequality (3.9). ∎
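Theorem 3.7 is again easy to test numerically: appending extra rows to $R$ enlarges $\operatorname{range}(R^{T})$ and can only decrease the $M$-convergence factor. A sketch under the same illustrative assumptions:

import numpy as np
import scipy.linalg as sla

def tg_factor(A, M, R):
    # ||E_TG||_M for Algorithm 1 with P = P_star = M^{-1} A^T R^T
    n = A.shape[0]
    P = np.linalg.solve(M, A.T @ R.T)
    Pi = P @ np.linalg.solve(P.T @ M @ P, P.T @ M)
    Mh = sla.sqrtm(M).real
    E = (np.eye(n) - Pi) @ (np.eye(n) - np.linalg.solve(M, A))
    return np.linalg.norm(Mh @ E @ np.linalg.inv(Mh), 2)

rng = np.random.default_rng(6)
n = 12
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)

R = rng.standard_normal((3, n))
R_hat = np.vstack([R, rng.standard_normal((2, n))])  # range(R^T) inside range(R_hat^T)
print(tg_factor(A, M, R_hat) <= tg_factor(A, M, R))  # True, by Theorem 3.7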

4. An inexact variant of Algorithm 1

Recall that the coarse-grid system to be solved in Algorithm 1 reads

(4.1) A_{\rm c}\mathbf{e}_{\rm c}=\mathbf{r}_{\rm c}.

In practice, it is often too costly to solve (4.1) exactly when its size is large. Instead, without essential loss of convergence speed, it is advisable to find a good approximation to the exact solution $\mathbf{e}_{\rm c}\equiv A_{\rm c}^{-1}\mathbf{r}_{\rm c}$. Note that, unlike the original problem (2.1), the coefficient matrix in (4.1) is SPD if $P_{\star}$ is used as a prolongation. For such linear systems, there are many efficient numerical methods available in the literature, such as multigrid and conjugate gradient methods.

In what follows, we analyze the convergence of an inexact variant of Algorithm 1, which is formalized as Algorithm 2.

Algorithm 2  Inexact two-grid method.
1: Smoothing: $\mathbf{u}^{(1)}\leftarrow\mathbf{u}^{(0)}+M^{-1}\big(\mathbf{f}-A\mathbf{u}^{(0)}\big)$ \triangleright $\mathbf{u}^{(0)}\in\mathbb{R}^{n}$ is an initial guess
2: Restriction: $\mathbf{r}_{\rm c}\leftarrow R\big(\mathbf{f}-A\mathbf{u}^{(1)}\big)$
3: Coarse-grid correction: $\hat{\mathbf{e}}_{\rm c}\leftarrow\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket$ \triangleright $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket:\mathbb{R}^{n_{\rm c}}\rightarrow\mathbb{R}^{n_{\rm c}}$ and $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket\approx A_{\rm c}^{-1}$
4: Prolongation: $\mathbf{u}_{\rm ITG}\leftarrow\mathbf{u}^{(1)}+P_{\star}\hat{\mathbf{e}}_{\rm c}$ \triangleright $P_{\star}$ is given by (2.6)
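Algorithm 2 differs from Algorithm 1 only in step 3. The sketch below (illustrative assumptions as before) replaces the exact coarse solve by a crude linear solver $B_{\rm c}^{-1}$ with $B_{\rm c}=\lambda_{\max}(A_{\rm c})I_{n_{\rm c}}$, for which $B_{\rm c}+B_{\rm c}^{T}-A_{\rm c}$ is clearly SPD; the iteration still converges, only more slowly:

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(7)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)
R = rng.standard_normal((nc, n))
P = np.linalg.solve(M, A.T @ R.T)               # P_star, so Ac = R A P is SPD
Ac = R @ A @ P
Bc = np.linalg.eigvalsh(Ac).max() * np.eye(nc)  # crude SPD coarse solver matrix

u_exact = rng.standard_normal(n)
f = A @ u_exact
u = np.zeros(n)
for _ in range(60):
    u = u + np.linalg.solve(M, f - A @ u)       # smoothing
    rc = R @ (f - A @ u)                        # restriction
    ec = np.linalg.solve(Bc, rc)                # inexact coarse solve B_c^{-1} r_c
    u = u + P @ ec                              # prolongation
print(np.linalg.norm(u_exact - u))              # converges, more slowly than Algorithm 1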

4.1. Linear coarse solvers

The solver $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket$ in Algorithm 2 is a general mapping from $\mathbb{R}^{n_{\rm c}}$ to $\mathbb{R}^{n_{\rm c}}$, which is expected to be a good approximation to $A_{\rm c}^{-1}$. In this subsection, we consider the linear case $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket=B_{\rm c}^{-1}$, where $B_{\rm c}$ is an $n_{\rm c}\times n_{\rm c}$ nonsingular matrix such that $B_{\rm c}+B_{\rm c}^{T}-A_{\rm c}$ is SPD, or, equivalently, $\big\|I_{n_{\rm c}}-B_{\rm c}^{-1}A_{\rm c}\big\|_{A_{\rm c}}<1$.

From Algorithm 2, we have

\mathbf{u}-\mathbf{u}_{\rm ITG}=E_{\rm ITG}\big(\mathbf{u}-\mathbf{u}^{(0)}\big),

where

E_{\rm ITG}=\big(I-P_{\star}B_{\rm c}^{-1}RA\big)(I-M^{-1}A).

In view of (2.6), $E_{\rm ITG}$ can be expressed as

(4.2) E_{\rm ITG}=\big(I-P_{\star}B_{\rm c}^{-1}P_{\star}^{T}M\big)(I-M^{-1}A).

Define

(4.3) \overline{B}_{\rm c}:=B_{\rm c}\big(B_{\rm c}+B_{\rm c}^{T}-A_{\rm c}\big)^{-1}B_{\rm c}^{T}.

By (4.2), we have

\|E_{\rm ITG}\|_{M}^{2}=\big\|\big(I-P_{\star}B_{\rm c}^{-1}P_{\star}^{T}M\big)(I-M^{-1}A)\big\|_{M}^{2}
=\big\|\big(I-M^{\frac{1}{2}}P_{\star}B_{\rm c}^{-1}P_{\star}^{T}M^{\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big\|_{2}^{2}
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{\frac{1}{2}}P_{\star}\overline{B}_{\rm c}^{-1}P_{\star}^{T}M^{\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big).

Due to

I-A_{\rm c}^{\frac{1}{2}}\overline{B}_{\rm c}^{-1}A_{\rm c}^{\frac{1}{2}}=\big(I-A_{\rm c}^{\frac{1}{2}}B_{\rm c}^{-T}A_{\rm c}^{\frac{1}{2}}\big)\big(I-A_{\rm c}^{\frac{1}{2}}B_{\rm c}^{-1}A_{\rm c}^{\frac{1}{2}}\big),

it follows that $\overline{B}_{\rm c}-A_{\rm c}$ is SPSD. Let

(4.4a) \alpha_{1}=\lambda_{\min}\big(\overline{B}_{\rm c}^{-1}A_{\rm c}\big),
(4.4b) \alpha_{2}=\lambda_{\max}\big(\overline{B}_{\rm c}^{-1}A_{\rm c}\big).

Then

0<\alpha_{1}\leq\alpha_{2}\leq 1.

From (4.4a) and (4.4b), we deduce that $\overline{B}_{\rm c}-\frac{1}{\alpha_{2}}A_{\rm c}$ and $\frac{1}{\alpha_{1}}A_{\rm c}-\overline{B}_{\rm c}$ are SPSD. Hence,

(4.5) \lambda_{\max}(\mathcal{E}_{2})\leq\|E_{\rm ITG}\|_{M}^{2}\leq\lambda_{\max}(\mathcal{E}_{1}),

where

\mathcal{E}_{k}=\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\alpha_{k}\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)

with $\varPi$ given by (3.5).

To estimate the largest eigenvalues of $\mathcal{E}_{1}$ and $\mathcal{E}_{2}$, we need the following eigenvalue identities.

Lemma 4.1.

Let $\varPi$ be given by (3.5). Then

(4.6a) \lambda_{\min}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=0,
(4.6b) \lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=1-\sigma_{\rm TG},
(4.6c) \lambda_{\min}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=0,
(4.6d) \lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=1-\delta_{\rm TG},

where $\sigma_{\rm TG}$ is given by (3.4) and

(4.7) \delta_{\rm TG}=\begin{cases}\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}\varPi_{A}\big)&\text{if }\operatorname{null}(\widetilde{A})\cap\operatorname{range}(P_{\star})=\{0\},\\0&\text{if }\operatorname{null}(\widetilde{A})\cap\operatorname{range}(P_{\star})\neq\{0\}.\end{cases}
Proof.

Recall that $I-\varPi$ is an orthogonal projection. Hence, the matrix $\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)$ is SPSD; it is also singular, because its rank is at most $n-n_{\rm c}$, which leads to (4.6a). Similarly, the identity (4.6c) holds. The identity (4.6b) follows from (3.3) and the fact

\|E_{\rm TG}\|_{M}^{2}=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big).

It remains to prove (4.6d).

By (3.6) and (3.7), we have

(4.8) \big(I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)\varPi=Q\begin{pmatrix}I_{n_{\rm c}}-X_{1}&0\\-X_{2}^{T}&0\end{pmatrix}Q^{T},
(4.9) \varPi M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi=Q\begin{pmatrix}X_{1}&0\\0&0\end{pmatrix}Q^{T}.

Due to

\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)=I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}},

it follows that

\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\big)\varPi\big)=\max\big\{1-\lambda_{\min}(X_{1}),\,0\big\},

where we have used the fact (4.8). Since $Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q$ and $I-Q^{T}M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}Q$ are SPSD, we deduce from (3.7) that $X_{1}$ is SPSD and $\lambda(X_{1})\subset[0,1]$. Thus,

(4.10) \lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=1-\lambda_{\min}(X_{1}).

From (4.9), we have

\operatorname{rank}(X_{1})=\operatorname{rank}\big(\varPi M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}M^{-\frac{1}{2}}\varPi\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}P_{\star}(P_{\star}^{T}MP_{\star})^{-1}P_{\star}^{T}\big)=\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}P_{\star}\big)\leq n_{\rm c},

with equality if and only if

(4.11) \operatorname{null}(\widetilde{A})\cap\operatorname{range}(P_{\star})=\{0\}.

In fact, if the relation (4.11) holds, then $P_{\star}^{T}\widetilde{A}P_{\star}$ is SPD, because it is SPSD and $\mathbf{v}_{\rm c}^{T}P_{\star}^{T}\widetilde{A}P_{\star}\mathbf{v}_{\rm c}=(P_{\star}\mathbf{v}_{\rm c})^{T}\widetilde{A}(P_{\star}\mathbf{v}_{\rm c})=0$ entails that $\mathbf{v}_{\rm c}=0$. Hence,

\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}P_{\star}\big)=\operatorname{rank}\big(P_{\star}^{T}\widetilde{A}P_{\star}\big)=n_{\rm c}.

Conversely, if $\operatorname{rank}\big(\widetilde{A}^{\frac{1}{2}}P_{\star}\big)=n_{\rm c}$, then $\operatorname{rank}\big(P_{\star}^{T}\widetilde{A}P_{\star}\big)=n_{\rm c}$, i.e., $P_{\star}^{T}\widetilde{A}P_{\star}$ is SPD. If (4.11) did not hold, there would exist a nonzero vector $\mathbf{w}_{\rm c}\in\mathbb{R}^{n_{\rm c}}$ such that

\widetilde{A}P_{\star}\mathbf{w}_{\rm c}=0.

Accordingly, $\mathbf{w}_{\rm c}^{T}P_{\star}^{T}\widetilde{A}P_{\star}\mathbf{w}_{\rm c}=0$, which contradicts the positive definiteness of $P_{\star}^{T}\widetilde{A}P_{\star}$. The above analysis, together with (4.9), yields

(4.12) \lambda_{\min}(X_{1})=\begin{cases}\lambda_{\min}^{+}\big(\varPi M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi\big)&\text{if (4.11) holds},\\0&\text{otherwise}.\end{cases}

The identity (4.6d) then follows from (4.10), (4.12), and the fact

\lambda_{\min}^{+}\big(\varPi M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi\big)=\lambda_{\min}^{+}\big(M^{-\frac{1}{2}}\widetilde{A}M^{-\frac{1}{2}}\varPi\big)=\lambda_{\min}^{+}\big(M^{-1}\widetilde{A}\varPi_{A}\big).

This completes the proof. ∎

To bound $\lambda_{\max}(\mathcal{E}_{1})$ and $\lambda_{\max}(\mathcal{E}_{2})$, we also need an important tool for eigenvalue analysis, namely the following Weyl theorem; see, e.g., [11, Theorem 4.3.1].

Lemma 4.2.

Let $H_{1}$ and $H_{2}$ be $n\times n$ Hermitian matrices. Assume that the spectra of $H_{1}$, $H_{2}$, and $H_{1}+H_{2}$ are $\{\lambda_{i}(H_{1})\}_{i=1}^{n}$, $\{\lambda_{i}(H_{2})\}_{i=1}^{n}$, and $\{\lambda_{i}(H_{1}+H_{2})\}_{i=1}^{n}$, respectively. Then, for each $k=1,\ldots,n$, it holds that

\lambda_{k-j+1}(H_{1})+\lambda_{j}(H_{2})\leq\lambda_{k}(H_{1}+H_{2})\leq\lambda_{k+\ell}(H_{1})+\lambda_{n-\ell}(H_{2})

for all $j=1,\ldots,k$ and $\ell=0,\ldots,n-k$. In particular, one has

(4.13a) \lambda_{\max}(H_{1}+H_{2})\geq\max\big\{\lambda_{\max}(H_{1})+\lambda_{\min}(H_{2}),\,\lambda_{\min}(H_{1})+\lambda_{\max}(H_{2})\big\},
(4.13b) \lambda_{\max}(H_{1}+H_{2})\leq\lambda_{\max}(H_{1})+\lambda_{\max}(H_{2}).

We are now ready to present a convergence estimate for Algorithm 2.

Theorem 4.3.

Let $B_{\rm c}$ be an $n_{\rm c}\times n_{\rm c}$ matrix such that $B_{\rm c}+B_{\rm c}^{T}-A_{\rm c}$ is SPD, and let $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket=B_{\rm c}^{-1}$. Then, the $M$-convergence factor of Algorithm 2 satisfies

(4.14) \mathscr{L}(\alpha_{2})\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}(\alpha_{1})

with

(4.15a) \mathscr{L}(\alpha_{2}):=\sqrt{1-\min\big\{\sigma_{\rm TG},\,\lambda_{\min}(M^{-1}\widetilde{A})+\alpha_{2}(1-\delta_{\rm TG})\big\}},
(4.15b) \mathscr{U}(\alpha_{1}):=\sqrt{1-\alpha_{1}\sigma_{\rm TG}-(1-\alpha_{1})\lambda_{\min}(M^{-1}\widetilde{A})},

where $\sigma_{\rm TG}$, $\delta_{\rm TG}$, $\alpha_{1}$, and $\alpha_{2}$ are given by (3.4), (4.7), (4.4a), and (4.4b), respectively.

Proof.

In view of (4.5), one can obtain a two-sided estimate for $\|E_{\rm ITG}\|_{M}$ by bounding the eigenvalues $\lambda_{\max}(\mathcal{E}_{1})$ and $\lambda_{\max}(\mathcal{E}_{2})$.

By (4.6d) and (4.13a), we have

\lambda_{\max}(\mathcal{E}_{2})=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\alpha_{2}\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
\geq\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)-\alpha_{2}\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big)-\alpha_{2}(1-\delta_{\rm TG})
=1-\lambda_{\min}(M^{-1}\widetilde{A})-\alpha_{2}(1-\delta_{\rm TG}).

On the other hand, we deduce from (4.6b), (4.6c), and (4.13a) that

\lambda_{\max}(\mathcal{E}_{2})=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-\varPi+(1-\alpha_{2})\varPi\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
\geq\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)+(1-\alpha_{2})\lambda_{\min}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\varPi\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
=1-\sigma_{\rm TG}.

Thus,

(4.16) \lambda_{\max}(\mathcal{E}_{2})\geq 1-\min\big\{\sigma_{\rm TG},\,\lambda_{\min}(M^{-1}\widetilde{A})+\alpha_{2}(1-\delta_{\rm TG})\big\}.

Using (4.6b) and (4.13b), we obtain

\lambda_{\max}(\mathcal{E}_{1})=\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big((1-\alpha_{1})I+\alpha_{1}(I-\varPi)\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
\leq(1-\alpha_{1})\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)+\alpha_{1}\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)
=(1-\alpha_{1})\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big)+\alpha_{1}(1-\sigma_{\rm TG})
=(1-\alpha_{1})\big(1-\lambda_{\min}(M^{-1}\widetilde{A})\big)+\alpha_{1}(1-\sigma_{\rm TG}),

that is,

(4.17) \lambda_{\max}(\mathcal{E}_{1})\leq 1-\alpha_{1}\sigma_{\rm TG}-(1-\alpha_{1})\lambda_{\min}(M^{-1}\widetilde{A}).

Combining (4.5), (4.16), and (4.17), we arrive at the estimate (4.14). ∎

Remark 4.4.

From (4.6b), (4.6d), and (4.13b), we deduce that

2-\sigma_{\rm TG}-\delta_{\rm TG}\geq\lambda_{\max}\big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big)=1-\lambda_{\min}(M^{-1}\widetilde{A}),

which yields

1-\delta_{\rm TG}+\lambda_{\min}(M^{-1}\widetilde{A})\geq\sigma_{\rm TG}.

If $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket=A_{\rm c}^{-1}$, then $\alpha_{1}=\alpha_{2}=1$, in which case one has

\mathscr{L}(\alpha_{2})=\mathscr{U}(\alpha_{1})=\sqrt{1-\sigma_{\rm TG}},

and hence (4.14) reduces to the identity (3.3).
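The two-sided bound (4.14) can be checked numerically. The sketch below (illustrative assumptions as in the previous examples) computes $\sigma_{\rm TG}$, $\delta_{\rm TG}$, $\lambda_{\min}(M^{-1}\widetilde{A})$, and $\alpha_{1},\alpha_{2}$ from (4.3)–(4.4b), and then verifies $\mathscr{L}(\alpha_{2})\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}(\alpha_{1})$:

import numpy as np
import scipy.linalg as sla

rng = np.random.default_rng(8)
n, nc = 12, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n) + (G - G.T)
omega = 1.5 * sla.eigh(A @ A.T, A + A.T, eigvals_only=True).max()
M = omega * np.eye(n)
R = rng.standard_normal((nc, n))
P = np.linalg.solve(M, A.T @ R.T)                   # P_star
Ac = R @ A @ P                                      # SPD
Bc = np.linalg.eigvalsh(Ac).max() * np.eye(nc)      # SPD; B_c + B_c^T - A_c is SPD
I = np.eye(n)

At = A + A.T - A @ np.linalg.solve(M, A.T)          # A_tilde
Pi = P @ np.linalg.solve(P.T @ M @ P, P.T @ M)      # Pi_A
lmin_plus = lambda X: min(e.real for e in np.linalg.eigvals(X) if e.real > 1e-10)
sigma = lmin_plus(np.linalg.solve(M, At) @ (I - Pi))       # sigma_TG, (3.4)
delta = lmin_plus(np.linalg.solve(M, At) @ Pi)             # delta_TG, (4.7)
lmin = sla.eigh(At, M, eigvals_only=True).min()            # lambda_min(M^{-1} A_tilde)

Bbar = Bc @ np.linalg.solve(Bc + Bc.T - Ac, Bc.T)          # (4.3)
alpha = sla.eigh(Ac, Bbar, eigvals_only=True)              # spectrum of Bbar^{-1} A_c
a1, a2 = alpha.min(), alpha.max()

Mh = sla.sqrtm(M).real
E = (I - P @ np.linalg.solve(Bc, P.T) @ M) @ (I - np.linalg.solve(M, A))
nrm = np.linalg.norm(Mh @ E @ np.linalg.inv(Mh), 2)        # ||E_ITG||_M

lower = np.sqrt(1 - min(sigma, lmin + a2 * (1 - delta)))   # L(alpha_2), (4.15a)
upper = np.sqrt(1 - a1 * sigma - (1 - a1) * lmin)          # U(alpha_1), (4.15b)
print(lower <= nrm <= upper)                               # True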

As a consequence of Theorem 4.3, we have the following corollary.

Corollary 4.5.

Let $\mathscr{L}(\cdot)$ and $\mathscr{U}(\cdot)$ be defined by (4.15a) and (4.15b), respectively. Assume that $B_{\rm c}$ is an $n_{\rm c}\times n_{\rm c}$ SPD matrix such that $2B_{\rm c}-A_{\rm c}$ is SPD, and let

(4.18) \beta_{1}=\lambda_{\min}\big(B_{\rm c}^{-1}A_{\rm c}\big)\quad\text{and}\quad\beta_{2}=\lambda_{\max}\big(B_{\rm c}^{-1}A_{\rm c}\big).

Then, the convergence factor of Algorithm 2 with $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket=B_{\rm c}^{-1}$ satisfies the following estimates.

(i) If $\beta_{2}\leq 1$, then

\mathscr{L}\big((2-\beta_{2})\beta_{2}\big)\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}\big((2-\beta_{1})\beta_{1}\big).

(ii) If $\beta_{1}\leq 1<\beta_{2}$, then

\mathscr{L}(1)\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}\big(\min\big\{(2-\beta_{1})\beta_{1},\,(2-\beta_{2})\beta_{2}\big\}\big).

(iii) If $1<\beta_{1}\leq\beta_{2}<2$, then

\mathscr{L}\big((2-\beta_{1})\beta_{1}\big)\leq\|E_{\rm ITG}\|_{M}\leq\mathscr{U}\big((2-\beta_{2})\beta_{2}\big).
Proof.

Since $B_{\rm c}$ is SPD, it follows from (4.3) that

\overline{B}_{\rm c}^{-1}A_{\rm c}=\big(2I_{n_{\rm c}}-B_{\rm c}^{-1}A_{\rm c}\big)B_{\rm c}^{-1}A_{\rm c},

so the spectrum of $\overline{B}_{\rm c}^{-1}A_{\rm c}$ is the image of that of $B_{\rm c}^{-1}A_{\rm c}$ under the map $g(\beta)=(2-\beta)\beta$, which is increasing on $(0,1]$ and decreasing on $[1,2)$. With the notation in (4.18), we have

\alpha_{1}=\lambda_{\min}\big(\overline{B}_{\rm c}^{-1}A_{\rm c}\big)=\begin{cases}(2-\beta_{1})\beta_{1}&\text{if }\beta_{2}\leq 1,\\\min\big\{(2-\beta_{1})\beta_{1},\,(2-\beta_{2})\beta_{2}\big\}&\text{if }\beta_{1}\leq 1<\beta_{2},\\(2-\beta_{2})\beta_{2}&\text{if }1<\beta_{1}\leq\beta_{2}<2,\end{cases}

and, similarly, $\alpha_{2}=\lambda_{\max}\big(\overline{B}_{\rm c}^{-1}A_{\rm c}\big)=(2-\beta_{2})\beta_{2}$ if $\beta_{2}\leq 1$ and $\alpha_{2}=(2-\beta_{1})\beta_{1}$ if $1<\beta_{1}\leq\beta_{2}<2$, while in the intermediate case $\beta_{1}\leq 1<\beta_{2}$ one only has $\alpha_{2}\leq 1$; since $\mathscr{L}(\cdot)$ is nonincreasing in its argument, $\mathscr{L}(1)\leq\mathscr{L}(\alpha_{2})$ in that case. The desired result then follows from Theorem 4.3. ∎
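The case analysis in Corollary 4.5 only involves the scalar map $\beta\mapsto(2-\beta)\beta$. A tiny sketch (function and variable names are illustrative) that returns the pair $(\alpha_{1},\alpha_{2})$ fed to $\mathscr{U}(\cdot)$ and $\mathscr{L}(\cdot)$ in each case, using $\alpha_{2}=1$ as the safe value in case (ii):

def alphas(beta1, beta2):
    # Map the extreme eigenvalues of Bc^{-1}Ac to those used in Corollary 4.5
    g = lambda b: (2.0 - b) * b
    if beta2 <= 1.0:                               # case (i)
        return g(beta1), g(beta2)
    if beta1 <= 1.0 < beta2:                       # case (ii)
        return min(g(beta1), g(beta2)), 1.0
    return g(beta2), g(beta1)                      # case (iii): 1 < beta1 <= beta2 < 2

print(alphas(0.5, 0.9))   # (0.75, 0.99)
print(alphas(0.8, 1.6))   # (0.64, 1.0)
print(alphas(1.2, 1.8))   # (0.36, 0.96)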

4.2. Nonlinear coarse solvers

In Theorem 4.3, the extreme eigenvalues $\alpha_{1}$ and $\alpha_{2}$ (or bounds on them) are required in order to obtain an estimate for $\|E_{\rm ITG}\|_{M}$. However, such quantities are unavailable if $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket$ is a nonlinear solver (e.g., the conjugate gradient [10] or generalized minimal residual [20] method). In this subsection, we analyze the convergence of Algorithm 2 with nonlinear coarse solvers.

Theorem 4.6.

If

(4.19) 𝐞cc𝐫cAcε𝐞cAc\big{\|}\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big{\|}_{A_{\rm c}}\leq\varepsilon\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}

for some ε[0,1)\varepsilon\in[0,1), then 𝐮ITG\mathbf{u}_{\rm ITG} produced by Algorithm 2 satisfies

(4.20) 𝐮𝐮ITGM1(1ε2)σTGε2λmin(M1A~)𝐮𝐮(0)M,\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}\leq\sqrt{1-(1-\varepsilon^{2})\sigma_{\rm TG}-\varepsilon^{2}\lambda_{\min}(M^{-1}\widetilde{A})}\,\big{\|}\mathbf{u}-\mathbf{u}^{(0)}\big{\|}_{M},

where $\sigma_{\rm TG}$ is given by (3.4).

Proof.

From (4.19) and the relation $\mathbf{r}_{\rm c}=A_{\rm c}\mathbf{e}_{\rm c}$, we have

\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}-2\mathbf{r}_{\rm c}^{T}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\leq\varepsilon^{2}\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2},

that is,

(4.21) 2\mathbf{r}_{\rm c}^{T}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\geq(1-\varepsilon^{2})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}.

Since

\mathbf{u}-\mathbf{u}_{\rm ITG}=\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket,

it follows that

\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2} = \big(\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big)^{T}M\big(\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big)
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\big(\mathbf{u}-\mathbf{u}^{(1)}\big)^{T}MP_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\big(\mathbf{u}-\mathbf{u}^{(1)}\big)^{T}A^{T}R^{T}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\mathbf{r}_{\rm c}^{T}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket+\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}.

Using (4.21), we obtain

\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2} \leq \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\varepsilon^{2})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\varepsilon^{2})\big\|A_{\rm c}^{-1}RA\big(\mathbf{u}-\mathbf{u}^{(1)}\big)\big\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\varepsilon^{2})\big\|A_{\rm c}^{-1}P_{\star}^{T}M\big(\mathbf{u}-\mathbf{u}^{(1)}\big)\big\|_{A_{\rm c}}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\varepsilon^{2})\big(\mathbf{u}-\mathbf{u}^{(1)}\big)^{T}M^{\frac{1}{2}}\varPi M^{\frac{1}{2}}\big(\mathbf{u}-\mathbf{u}^{(1)}\big)
= \big(\mathbf{u}-\mathbf{u}^{(1)}\big)^{T}M^{\frac{1}{2}}\big(I-(1-\varepsilon^{2})\varPi\big)M^{\frac{1}{2}}\big(\mathbf{u}-\mathbf{u}^{(1)}\big),

where $\varPi$ is given by (3.5). Since

\mathbf{u}-\mathbf{u}^{(1)}=(I-M^{-1}A)\big(\mathbf{u}-\mathbf{u}^{(0)}\big),

it follows that

\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2}\leq\big(\mathbf{u}-\mathbf{u}^{(0)}\big)^{T}M^{\frac{1}{2}}\mathcal{E}M^{\frac{1}{2}}\big(\mathbf{u}-\mathbf{u}^{(0)}\big)

with

\mathcal{E}=\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-(1-\varepsilon^{2})\varPi\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big).

Hence,

(4.22) \|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2}\leq\lambda_{\max}(\mathcal{E})\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}^{2}.

By (4.6b) and (4.13b), we have

\lambda_{\max}(\mathcal{E}) = \lambda_{\max}\Big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(\varepsilon^{2}I+(1-\varepsilon^{2})(I-\varPi)\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\Big)
\leq \varepsilon^{2}\lambda_{\max}\Big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\Big) + (1-\varepsilon^{2})\lambda_{\max}\Big(\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)(I-\varPi)\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\Big)
= \varepsilon^{2}\lambda_{\max}\Big(\big(I-M^{-\frac{1}{2}}AM^{-\frac{1}{2}}\big)\big(I-M^{-\frac{1}{2}}A^{T}M^{-\frac{1}{2}}\big)\Big) + (1-\varepsilon^{2})(1-\sigma_{\rm TG})
= \varepsilon^{2}\big(1-\lambda_{\min}(M^{-1}\widetilde{A})\big)+(1-\varepsilon^{2})(1-\sigma_{\rm TG})
= 1-(1-\varepsilon^{2})\sigma_{\rm TG}-\varepsilon^{2}\lambda_{\min}(M^{-1}\widetilde{A}),

which, combined with (4.22), yields the estimate (4.20). ∎
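To illustrate how the accuracy requirement (4.19) is met in practice, the sketch below runs $k$ conjugate gradient steps on a coarse system $A_{\rm c}\mathbf{x}=\mathbf{r}_{\rm c}$ (started from zero) and reports the attained factor $\varepsilon=\|\mathbf{e}_{\rm c}-\mathbf{x}_{k}\|_{A_{\rm c}}/\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}$. The SPD test matrix is an illustrative choice of ours, not one from the paper; since CG minimizes the $A_{\rm c}$-norm of the error over Krylov subspaces, $\varepsilon<1$ holds already after one step and decreases monotonically in $k$.

```python
# A small experiment, under our own assumptions, showing how the accuracy
# requirement (4.19) is met by a nonlinear coarse solver: k CG steps on
# A_c x = r_c, started from zero, give an eps < 1 for every k >= 1.
import numpy as np

def cg(A, b, k):
    """Plain CG for SPD A, zero initial guess, k iterations (no stopping test)."""
    x = np.zeros_like(b)
    r = b.copy(); p = r.copy()
    for _ in range(k):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

rng = np.random.default_rng(1)
n_c = 100
X = rng.standard_normal((n_c, n_c))
A_c = X @ X.T + n_c * np.eye(n_c)      # hypothetical SPD coarse-grid matrix
e_c = rng.standard_normal(n_c)         # exact coarse error
r_c = A_c @ e_c                        # coarse residual, r_c = A_c e_c

norm_Ac = lambda v: np.sqrt(v @ (A_c @ v))
for k in (1, 2, 5, 10):
    eps = norm_Ac(e_c - cg(A_c, r_c, k)) / norm_Ac(e_c)
    print(f"k = {k:2d}: eps = {eps:.3e}")
```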

Besides deterministic methods, one may also apply randomized methods to the problem (4.1). In analogy with Theorem 4.6, one can derive the following convergence estimates.

Theorem 4.7.

Assume that $\mathscr{B}_{\rm c}\llbracket\cdot\rrbracket:\mathbb{R}^{n_{\rm c}}\rightarrow\mathbb{R}^{n_{\rm c}}$ is a randomized coarse solver, and let $\gamma_{1},\gamma_{2}\in[0,1)$ be two tolerance factors.

(i) If

(4.23) \big\|\mathbb{E}\big[\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2}\leq\gamma_{1}\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2},

then

\|\mathbb{E}[\mathbf{u}-\mathbf{u}_{\rm ITG}]\|_{M}^{2}\leq\big(1-(1-\gamma_{1})\sigma_{\rm TG}-\gamma_{1}\lambda_{\min}(M^{-1}\widetilde{A})\big)\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}^{2}.

(ii) If

(4.24) \mathbb{E}\Big[\big\|\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\Big]\leq\gamma_{2}\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2},

then

\mathbb{E}\big[\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2}\big]\leq\big(1-(1-\gamma_{2})\sigma_{\rm TG}-\gamma_{2}\lambda_{\min}(M^{-1}\widetilde{A})\big)\big\|\mathbf{u}-\mathbf{u}^{(0)}\big\|_{M}^{2}.
Proof.

From (4.23) and (4.24), respectively, we deduce that

2\mathbf{r}_{\rm c}^{T}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big] \geq (1-\gamma_{1})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}+\big\|\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2},
2\mathbf{r}_{\rm c}^{T}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big] \geq (1-\gamma_{2})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}+\mathbb{E}\Big[\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\Big].

Then

\|\mathbb{E}[\mathbf{u}-\mathbf{u}_{\rm ITG}]\|_{M}^{2} = \big\|\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{M}^{2}
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\mathbf{r}_{\rm c}^{T}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]+\big\|\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2}
\leq \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\gamma_{1})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2},
\mathbb{E}\big[\|\mathbf{u}-\mathbf{u}_{\rm ITG}\|_{M}^{2}\big] = \mathbb{E}\Big[\big\|\mathbf{u}-\mathbf{u}^{(1)}-P_{\star}\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{M}^{2}\Big]
= \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-2\mathbf{r}_{\rm c}^{T}\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]+\mathbb{E}\Big[\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\Big]
\leq \big\|\mathbf{u}-\mathbf{u}^{(1)}\big\|_{M}^{2}-(1-\gamma_{2})\|\mathbf{e}_{\rm c}\|_{A_{\rm c}}^{2}.

The remainder of the proof is analogous to that of Theorem 4.6. ∎
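As one concrete family of randomized coarse solvers, the sketch-and-project methods of [7] satisfy estimates of the form (4.23) and (4.24). The following minimal sketch (a toy setup of our own, not an algorithm from this paper) applies randomized block coordinate descent, i.e., sketch-and-project with coordinate sketches in the $A_{\rm c}$-norm, and estimates the tolerance factor $\gamma_{2}$ in (4.24) by Monte Carlo.

```python
# A minimal sketch (our own toy setup) of a randomized coarse solver of
# sketch-and-project type [7]: randomized block coordinate descent, which
# projects the iterate onto a random constraint set in the A_c-norm.
import numpy as np

def rbcd(A, b, k, block, rng):
    """k sketch-and-project steps with random coordinate blocks, x_0 = 0."""
    x = np.zeros_like(b)
    for _ in range(k):
        idx = rng.choice(len(b), size=block, replace=False)
        r = b - A @ x
        # Project onto {x : (A x)[idx] = b[idx]} in the A-norm.
        x[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return x

rng = np.random.default_rng(3)
n_c = 60
X = rng.standard_normal((n_c, n_c))
A_c = X @ X.T + n_c * np.eye(n_c)      # hypothetical SPD coarse-grid matrix
e_c = rng.standard_normal(n_c)
r_c = A_c @ e_c
sq = lambda v: v @ (A_c @ v)           # squared A_c-norm

# Monte Carlo estimate of the tolerance factor gamma_2 in (4.24).
k, block, runs = 10, 10, 500
errs = [sq(e_c - rbcd(A_c, r_c, k, block, rng)) for _ in range(runs)]
print(f"estimated gamma_2 = {np.mean(errs) / sq(e_c):.3e}")
```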

Remark 4.8.

We remark that the conditions (4.23) and (4.24) can be satisfied by a wide variety of randomized/stochastic algorithms; see, e.g., [7, 19]. For any random vector $\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket$, it holds that (see [7, Lemma 4.1])

\mathbb{E}\Big[\big\|\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big\|_{A_{\rm c}}^{2}\Big]=\big\|\mathbb{E}\big[\mathbf{e}_{\rm c}-\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2}+\mathbb{E}\Big[\big\|\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket-\mathbb{E}\big[\mathscr{B}_{\rm c}\llbracket\mathbf{r}_{\rm c}\rrbracket\big]\big\|_{A_{\rm c}}^{2}\Big].

Since the variance term on the right-hand side is nonnegative, (4.24) with factor $\gamma_{2}$ implies (4.23) with $\gamma_{1}=\gamma_{2}$; in this sense, (4.24) is the stronger of the two conditions.
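The decomposition is also easy to check numerically. The following Monte Carlo sketch, with an illustrative SPD matrix and a toy random vector of our own choosing, verifies the identity up to sampling error.

```python
# A Monte Carlo sanity check of the decomposition cited in Remark 4.8:
# E||e - Z||_A^2 = ||e - E[Z]||_A^2 + E||Z - E[Z]||_A^2 for a random vector Z.
# The SPD matrix A_c and the distribution of Z are illustrative choices of
# ours, not taken from the paper.
import numpy as np

rng = np.random.default_rng(2)
n_c, n_samples = 20, 200_000
X = rng.standard_normal((n_c, n_c))
A_c = X @ X.T + n_c * np.eye(n_c)
e_c = rng.standard_normal(n_c)

# Z = e_c + bias + noise: a crude model of a randomized coarse-solver output.
bias = 0.1 * rng.standard_normal(n_c)
Z = e_c + bias + 0.2 * rng.standard_normal((n_samples, n_c))

sq_norm = lambda V: np.einsum('...i,ij,...j->...', V, A_c, V)
lhs = sq_norm(e_c - Z).mean()                      # E||e_c - Z||_{A_c}^2
Zbar = Z.mean(axis=0)
rhs = sq_norm(e_c - Zbar) + sq_norm(Z - Zbar).mean()
print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}")         # agree up to sampling error
```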

5. Conclusions

In this paper, we analyzed the convergence of a two-grid method for nonsymmetric positive definite problems. In the case of an exact coarse solver, we established an identity that characterizes the $M$-convergence factor of the two-grid method. Based on this identity, we derived a class of optimal restriction operators and analyzed the influence of the row space of the restriction on the convergence factor. Furthermore, we presented convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered. It is worth pointing out that our results extend straightforwardly to the complex (non-Hermitian positive definite) case. In future work, we plan to develop new practical algorithms for nonsymmetric or non-Hermitian positive definite problems based on the two-grid theory established here.

Acknowledgment

The author is grateful to Prof. Chen-Song Zhang (LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences) for his helpful suggestions.

References

  • [1] J. Brannick, F. Cao, K. Kahl, R. D. Falgout, and X. Hu, Optimal interpolation and compatible relaxation in classical algebraic multigrid, SIAM J. Sci. Comput. 40 (2018), A1473–A1493.
  • [2] M. Brezina, T. A. Manteuffel, S. F. McCormick, J. W. Ruge, and G. Sanders, Towards adaptive smoothed aggregation ($\alpha$SA) for nonsymmetric problems, SIAM J. Sci. Comput. 32 (2010), 14–39.
  • [3] W. L. Briggs, V. E. Henson, and S. F. McCormick, A Multigrid Tutorial, 2nd ed., SIAM, Philadelphia, 2000.
  • [4] R. D. Falgout and P. S. Vassilevski, On generalizing the algebraic multigrid framework, SIAM J. Numer. Anal. 42 (2004), 1669–1693.
  • [5] R. D. Falgout, P. S. Vassilevski, and L. T. Zikatanov, On two-grid convergence estimates, Numer. Linear Algebra Appl. 12 (2005), 471–494.
  • [6] L. García Ramos, R. Kehl, and R. Nabben, Projections, deflation, and multigrid for nonsymmetric matrices, SIAM J. Matrix Anal. Appl. 41 (2020), 83–105.
  • [7] R. M. Gower and P. Richtárik, Randomized iterative methods for linear systems, SIAM J. Matrix Anal. Appl. 36 (2015), 1660–1690.
  • [8] A. Greenbaum and Z. Strakoš, Matrices that generate the same Krylov residual spaces, Recent Advances in Iterative Methods, IMA Vol. Math. Appl., vol. 60, Springer, New York, 1994, pp. 95–118.
  • [9] W. Hackbusch, Multi-Grid Methods and Applications, Springer-Verlag, Berlin, 1985.
  • [10] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand. 49 (1952), 409–436.
  • [11] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed., Cambridge University Press, Cambridge, UK, 2013.
  • [12] J. Lottes, Towards Robust Algebraic Multigrid Methods for Nonsymmetric Problems, Springer Theses, Springer International Publishing AG, 2017.
  • [13] T. A. Manteuffel, S. Münzenmaier, J. W. Ruge, and B. S. Southworth, Nonsymmetric reduction-based algebraic multigrid, SIAM J. Sci. Comput. 41 (2019), S242–S268.
  • [14] T. A. Manteuffel, L. N. Olson, J. B. Schroder, and B. S. Southworth, A root-node–based algebraic multigrid method, SIAM J. Sci. Comput. 39 (2017), S723–S756.
  • [15] T. A. Manteuffel, J. W. Ruge, and B. S. Southworth, Nonsymmetric algebraic multigrid based on local approximate ideal restriction ($\ell$AIR), SIAM J. Sci. Comput. 40 (2018), A4105–A4130.
  • [16] T. A. Manteuffel and B. S. Southworth, Convergence in norm of nonsymmetric algebraic multigrid, SIAM J. Sci. Comput. 41 (2019), S269–S296.
  • [17] Y. Notay, Algebraic analysis of two-grid methods: The nonsymmetric case, Numer. Linear Algebra Appl. 17 (2010), 73–96.
  • [18] Y. Notay, Analysis of two-grid methods: The nonnormal case, Math. Comp. 89 (2020), 807–827.
  • [19] P. Richtárik and M. Takáč, Stochastic reformulations of linear systems: Algorithms and convergence theory, SIAM J. Matrix Anal. Appl. 41 (2020), 487–524.
  • [20] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 7 (1986), 856–869.
  • [21] U. Trottenberg, C. W. Oosterlee, and A. Schüller, Multigrid, Academic Press, London, 2001.
  • [22] P. S. Vassilevski, Multilevel Block Factorization Preconditioners: Matrix-Based Analysis and Algorithms for Solving Finite Element Equations, Springer, New York, 2008.
  • [23] T. A. Wiesner, R. S. Tuminaro, W. A. Wall, and M. W. Gee, Multigrid transfers for nonsymmetric systems based on Schur complements and Galerkin projections, Numer. Linear Algebra Appl. 21 (2014), 415–438.
  • [24] J. Xu and L. T. Zikatanov, The method of alternating projections and the method of subspace corrections in Hilbert space, J. Amer. Math. Soc. 15 (2002), 573–597.
  • [25] J. Xu and L. T. Zikatanov, Algebraic multigrid methods, Acta Numer. 26 (2017), 591–721.
  • [26] X. Xu and C.-S. Zhang, On the ideal interpolation operator in algebraic multigrid methods, SIAM J. Numer. Anal. 56 (2018), 1693–1710.
  • [27] X. Xu and C.-S. Zhang, Convergence analysis of inexact two-grid methods: A theoretical framework, SIAM J. Numer. Anal. 60 (2022), 133–156.
  • [28] X. Xu and C.-S. Zhang, A new analytical framework for the convergence of inexact two-grid methods, SIAM J. Matrix Anal. Appl. 43 (2022), 512–533.
  • [29] L. T. Zikatanov, Two-sided bounds on the convergence rate of two-level methods, Numer. Linear Algebra Appl. 15 (2008), 439–454.