Convergence analysis of a two-grid method for nonsymmetric positive definite problems
Abstract.
Multigrid is a powerful solver for large-scale linear systems arising from discretized partial differential equations. The convergence theory of multigrid methods for symmetric positive definite problems has been well developed over the past decades, while, for nonsymmetric problems, such theory is still not mature. As a foundation for multigrid analysis, two-grid convergence theory plays an important role in motivating multigrid algorithms. Regarding two-grid methods for nonsymmetric problems, most previous works focus on the spectral radius of the iteration matrix or rely on convergence measures that are typically difficult to compute in practice. Moreover, the existing results are confined to two-grid methods with exact solution of the coarse-grid system. In this paper, we analyze the convergence of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from the discretizations of convection-diffusion equations). In the case of an exact coarse solver, we establish an elegant identity for characterizing the two-grid convergence factor, which is measured by a smoother-induced norm. The identity can be conveniently used to derive a class of optimal restriction operators and to analyze how the convergence factor is influenced by the restriction. More generally, we present some convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered.
Key words and phrases:
multigrid, two-grid methods, nonsymmetric positive definite problems, convergence theory, inexact coarse solvers
2010 Mathematics Subject Classification:
Primary 65F08, 65F10, 65N15, 65N55; Secondary 15A18, 65N75
1. Introduction
Multigrid is one of the most efficient numerical methods for solving large-scale systems of linear equations that arise from discretized partial differential equations. It has been shown to be a powerful solver, with linear or near-linear computational complexity, for a large class of linear systems; see, e.g., [9, 3, 21, 22]. The fundamental module of multigrid methods is a two-grid scheme, which consists of two complementary error-reduction processes: smoothing and coarse-grid correction. The smoothing process is typically a simple iterative method, such as the (weighted) Jacobi and Gauss–Seidel methods. One limitation of these classical methods is that, in general, they are efficient only at eliminating high-frequency (i.e., oscillatory) error. The remaining low-frequency (i.e., smooth) error is further reduced by the coarse-grid correction, which is essentially a projection-type method. These two processes are applied alternately in multigrid algorithms until a desired residual tolerance is achieved.
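The interplay of the two processes can be sketched numerically. In the following illustration, the 1D Poisson matrix, the weighted-Jacobi smoother, and the linear-interpolation transfer operators are standard textbook choices, not the operators analyzed in this paper.

```python
import numpy as np

def two_grid_cycle(A, b, x, R, P, omega=2.0/3.0, nu=2):
    """One two-grid cycle: weighted-Jacobi smoothing plus exact coarse correction."""
    Dinv = 1.0 / np.diag(A)
    for _ in range(nu):                      # pre-smoothing (damps oscillatory error)
        x = x + omega * Dinv * (b - A @ x)
    A_c = R @ A @ P                          # Galerkin coarse-grid matrix
    x = x + P @ np.linalg.solve(A_c, R @ (b - A @ x))   # coarse-grid correction
    for _ in range(nu):                      # post-smoothing
        x = x + omega * Dinv * (b - A @ x)
    return x

# 1D Poisson model problem; linear interpolation P, full weighting R = P^T / 2
n = 31
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
nc = (n - 1) // 2
P = np.zeros((n, nc))
for j in range(nc):
    i = 2 * j + 1                            # coarse point (0-based fine index)
    P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5
R = 0.5 * P.T
b = np.random.default_rng(0).standard_normal(n)
x = np.zeros(n)
for _ in range(10):
    x = two_grid_cycle(A, b, x, R, P)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))   # tiny relative residual
```

Neither process alone converges this fast: Jacobi stalls on smooth error, and the coarse correction cannot see oscillatory error, but their composition reduces the residual by orders of magnitude per cycle.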
Much of the multigrid analysis in the literature relies on the problem matrix $A$ being symmetric positive definite (SPD). For such problems, the convergence theory of multigrid methods has been well developed; see, e.g., [24, 4, 5, 22, 25, 27, 28]. When $A$ is SPD, the convergence analysis of a symmetric two-grid method (i.e., one whose pre- and postsmoothing steps are performed in a symmetric way) amounts to analyzing the eigenvalues of its iteration matrix (also called the error propagation matrix), because the $A$-norm of the iteration matrix, called the convergence factor, coincides with its spectral radius. In the case of a Galerkin coarse solver, an identity has been established to characterize the convergence factor of two-grid methods [24, 5, 29], which has been widely used to analyze two-grid methods; see, e.g., [5, 25, 1, 26]. Nevertheless, it is often too costly to solve the Galerkin coarse-grid system exactly, especially when its size is relatively large. Recently, Xu and Zhang [27, 28] extended the two-grid convergence theory to the case of non-Galerkin coarse solvers. They proposed two theoretical frameworks for analyzing the convergence of inexact two-grid methods, which have been successfully applied to the convergence analysis of multigrid methods.
Unlike the SPD case, the convergence theory of multigrid methods for nonsymmetric problems is still not mature: most existing algorithms are based largely on heuristic or incomplete theory. In general, a nonsymmetric matrix fails to induce a norm, and the error propagation matrix of the coarse-grid correction is an oblique projection. Thus, two fundamental questions need to be answered when studying two-grid methods for nonsymmetric problems. The first is how to choose a suitable convergence measure. Ideally, the selected measure should make it possible to derive an identity characterizing the convergence of two-grid methods, as in the SPD case. The simplest choice is the spectral radius of the iteration matrix [17, 23, 15, 6]. However, the spectral radius describes only the asymptotic convergence and may be a completely misleading indicator [8, 14, 18], especially when only a few iterations are performed. Some other measures and the associated convergence estimates can be found in [2, 12, 14, 13, 16, 18] and the references therein.
The second question is how to build the intergrid transfer operators $R\in\mathbb{R}^{n_{c}\times n}$ (restriction) and $P\in\mathbb{R}^{n\times n_{c}}$ (prolongation) such that the coarse-grid correction will not increase error [16], where $n_{c}$ is the number of coarse variables and $n$ is the number of fine ones ($n_{c}<n$). The error propagation matrix of the correction process takes the form $I-P(RAP)^{-1}RA$, provided that $RAP$ is nonsingular. Note that $I-P(RAP)^{-1}RA$ is a projection (i.e., an idempotent operator). For any operator norm $\|\cdot\|$, it holds that
\[
\|I-P(RAP)^{-1}RA\|\geq 1.
\]
If $\|I-P(RAP)^{-1}RA\|>1$, then the correction process may increase error, which is contrary to the core idea of multigrid methods. Indeed, the coarse-grid correction is key to the fast convergence of multigrid algorithms. Therefore, for a fixed $R$, one needs to design a prolongation $P$ such that $\|I-P(RAP)^{-1}RA\|=1$ in order to ensure that the correction process will not increase error.
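The norm inequality for projections is easy to observe numerically. The sketch below (illustrative dimensions and random matrices, purely for demonstration) builds an oblique projection of the stated form and checks idempotency and the lower bound on the operator norm.

```python
import numpy as np

rng = np.random.default_rng(1)
n, nc = 8, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # positive definite, nonsymmetric
P = rng.standard_normal((n, nc))
R = rng.standard_normal((nc, n))
Pi = P @ np.linalg.solve(R @ A @ P, R @ A)        # projection onto range(P) along null(RA)

assert np.linalg.norm(Pi @ Pi - Pi) < 1e-8        # idempotent
print(np.linalg.norm(np.eye(n) - Pi, 2))          # operator norm of I - Pi, always >= 1
```

For randomly chosen $P$ and $R$ the projection is oblique, so the spectral norm of $I-\Pi$ typically exceeds one strictly; equality requires a carefully matched prolongation.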
In this paper, we are concerned with the convergence theory of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from discretized convection-diffusion equations). In the two-grid method, an SPD matrix $M$ is used as a smoother, and the induced $M$-norm is used for measuring two-grid convergence. In addition, $M^{-1}A^{T}R^{T}$ is used as a prolongation operator in order to ensure that the coarse-grid correction will not increase error. Such a prolongation can be viewed as a multiplicative perturbation (by the factor $M^{-1}A^{T}$) of the classical choice $R^{T}$. Unlike the existing estimates, we here establish a succinct identity for the convergence factor of the two-grid method. It can be conveniently used to derive a class of optimal restriction operators and to analyze how the convergence factor is influenced by the row space of $R$. Compared with the exact case, the convergence theory of inexact two-grid methods is of more practical significance, while, for nonsymmetric problems, such theory is scarce in the literature. Motivated by this observation, we present some convergence estimates for an inexact variant of the two-grid method (both linear and nonlinear coarse solvers are considered), which lay a theoretical foundation for developing new algorithms.
The rest of this paper is organized as follows. In Section 2, we present a two-grid method for solving nonsymmetric positive definite linear systems. In Section 3, we establish an identity for the convergence factor of the two-grid method, followed by two applications of the identity. In Section 4, we propose and analyze an inexact variant of the two-grid method. In Section 5, we give some concluding remarks.
2. Preliminaries
For convenience, we first list some notation used in the subsequent discussions.
- $I_{n}$ denotes the $n\times n$ identity matrix (or simply $I$ when its size is clear from context).
- $\lambda_{\min}(\cdot)$, $\lambda_{\min}^{+}(\cdot)$, and $\lambda_{\max}(\cdot)$ stand for the smallest eigenvalue, the smallest positive eigenvalue, and the largest eigenvalue of a matrix, respectively.
- $\lambda_{i}(\cdot)$ denotes the $i$th smallest eigenvalue of a matrix.
- $\sigma(\cdot)$ denotes the spectrum of a matrix.
- $\|\cdot\|_{2}$ denotes the spectral norm of a matrix.
- $\|\cdot\|_{X}$ denotes the norm induced by an SPD matrix $X$: if $v$ is a vector, then $\|v\|_{X}=(v^{T}Xv)^{\frac{1}{2}}$; if $Y$ is a matrix, then $\|Y\|_{X}=\|X^{\frac{1}{2}}YX^{-\frac{1}{2}}\|_{2}$.
- $\mathbb{E}[\cdot]$ denotes the expectation of a random variable.
Consider solving the linear system
(2.1) $Ax=b$,
where $A\in\mathbb{R}^{n\times n}$ is nonsymmetric but positive definite (namely, $v^{T}Av>0$ for all $v\in\mathbb{R}^{n}\setminus\{0\}$, which implies that $A$ is nonsingular), $x\in\mathbb{R}^{n}$ is the unknown vector, and $b\in\mathbb{R}^{n}$ is the right-hand side. A natural splitting of $A$ is given by
\[
A=H+S \quad\text{with}\quad H=\tfrac{1}{2}(A+A^{T}) \quad\text{and}\quad S=\tfrac{1}{2}(A-A^{T}),
\]
where $H$ and $S$ are the symmetric and skew-symmetric parts of $A$, respectively. It is easy to check that $A$ is positive definite if and only if the symmetric matrix $H$ is positive definite.
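The equivalence just stated can be verified numerically; the sizes and matrices in this sketch are arbitrary illustrations. The key observation is that the skew-symmetric part contributes nothing to the quadratic form.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
H = rng.standard_normal((n, n))
H = (H + H.T) / 2 + n * np.eye(n)   # SPD symmetric part
S = rng.standard_normal((n, n))
S = (S - S.T) / 2                   # skew-symmetric part: v^T S v = 0
A = H + S                           # nonsymmetric but positive definite

v = rng.standard_normal(n)
assert abs(v @ S @ v) < 1e-12       # the skew part contributes nothing to v^T A v
assert v @ A @ v > 0                # positive definiteness, inherited from H
assert np.all(np.linalg.eigvalsh(H) > 0)
```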
Several basic assumptions involved in the analysis of two-grid methods are summarized as follows.
- Let $M\in\mathbb{R}^{n\times n}$ be an SPD smoother such that $\|I-M^{-1}A\|_{M}\leq 1$.
- Let $R\in\mathbb{R}^{n_{c}\times n}$ be a restriction matrix of rank $n_{c}$, where $n_{c}$ $(<n)$ is the number of coarse variables.
- Let $P\in\mathbb{R}^{n\times n_{c}}$ be a prolongation (or interpolation) matrix of rank $n_{c}$.
- Assume that the coarse-grid matrix $A_{c}:=RAP$ is nonsingular.
Remark 2.1.
The assumption $\|I-M^{-1}A\|_{M}\leq 1$ is equivalent to
\[
(I-M^{-1}A)^{T}M(I-M^{-1}A)\preceq M,
\]
that is,
\[
2H-\widetilde{A}\succeq 0,
\]
where
(2.2) $\widetilde{A}:=A^{T}M^{-1}A$.
As a result, $\|I-M^{-1}A\|_{M}\leq 1$ holds if and only if the symmetric matrix $2H-\widetilde{A}$ is positive semidefinite, which is a very weak assumption. If $2H-\widetilde{A}$ fails to be positive semidefinite, one only needs to modify $M$ properly, e.g., replace $M$ by $cM$ ($c>1$ is a scaling parameter). Indeed, there always exists an SPD matrix $M$ such that $\|I-M^{-1}A\|_{M}\leq 1$.
With the above assumptions, an exact two-grid method for solving (2.1) can be described by Algorithm 1, in which the inverse of the coarse-grid matrix is used as a coarse solver.
Remark 2.2.
When $A$ is SPD, pre- and postsmoothing steps are often performed in a symmetric way. The resulting two-grid iteration matrix is then symmetric with respect to the $A$-inner product, in which case the $A$-norm of the two-grid iteration matrix coincides with its spectral radius. However, for nonsymmetric problems, the two-grid iteration matrix will no longer be symmetric in this sense, even if a postsmoothing step is added in Algorithm 1.
In fact, Algorithm 1 involves the following two error-reduction processes:
(2.3a) $e\longleftarrow(I-M^{-1}A)e$ (smoothing),
(2.3b) $e\longleftarrow(I-\Pi)e$ (coarse-grid correction),
where
(2.4) $\Pi:=P(RAP)^{-1}RA$.
Then the error propagation matrix of Algorithm 1 is given by $(I-\Pi)(I-M^{-1}A)$. Recall that $\|I-M^{-1}A\|_{M}\leq 1$. Thus, the initial error will not be amplified by (2.3a).
It can be seen from (2.4) that $\Pi$ is a projection along (or parallel to) $\operatorname{null}(RA)$ onto $\operatorname{range}(P)$, which leads to
\[
\|I-\Pi\|_{M}\geq 1,
\]
with equality if and only if $\Pi$ is an $M$-orthogonal projection. To ensure that the error will not be amplified by (2.3b), it suffices to choose a prolongation $P$ such that
(2.5) $\Pi=M^{-1}(RA)^{T}\big(RAM^{-1}(RA)^{T}\big)^{-1}RA$.
Observe that $\Pi$ is a projection along $\operatorname{null}(RA)$ onto $\operatorname{range}(P)$, while the right-hand side of (2.5) is an $M$-orthogonal projection along $\operatorname{null}(RA)$ onto $\operatorname{range}(M^{-1}A^{T}R^{T})$. Hence, (2.5) entails that
\[
\operatorname{range}(P)=\operatorname{range}(M^{-1}A^{T}R^{T}),
\]
which can be satisfied by
\[
P=M^{-1}A^{T}R^{T}W
\]
for any nonsingular matrix $W\in\mathbb{R}^{n_{c}\times n_{c}}$. Taking $W=I$ yields the prolongation
(2.6) $P=M^{-1}A^{T}R^{T}$.
For any coarse vector $v_{c}\in\mathbb{R}^{n_{c}}$, the product $Pv_{c}$ can be easily computed, because the action of $M^{-1}$ is always available. Furthermore, it is possible that $P$ has a sparse structure. For instance, if $A$ is sparse, $M$ is diagonal, and $R$ is sparse, then $P$ is sparse.
Remark 2.3.
For SPD problems, it is advisable to take $P=R^{T}$, which is the most commonly used prolongation in both geometric and algebraic multigrid methods. Then
\[
\|I-R^{T}(RAR^{T})^{-1}RA\|_{A}=1,
\]
which shows that the error will not be amplified by (2.3b). However, for nonsymmetric problems, this goal cannot be achieved by the choice $P=R^{T}$ in general. According to the above discussion, it is suggested to take $P=M^{-1}A^{T}R^{T}$, which can be viewed as a multiplicative perturbation of $R^{T}$. Moreover, the resulting coarse-grid matrix $RAP=(RA)M^{-1}(RA)^{T}$ is automatically SPD.
3. Convergence analysis of Algorithm 1
In this section, we analyze the convergence of Algorithm 1 with the prolongation (2.6). A succinct identity for characterizing the convergence factor is established, followed by discussions on how the identity can be used to derive a class of optimal restriction operators and to analyze the influence of the row space of $R$ on the convergence factor.
We start with a technical lemma, which provides a necessary and sufficient condition used later in our analysis (see the proof of Theorem 3.3 for details).
Proof.
Remark 3.2.
Theorem 3.3.
Proof.
If , then
and
Let
Then
(3.5) |
Since and , there exists an orthogonal matrix such that
(3.6) |
By (2.8), we have
Let
(3.7) |
where , , and . The positive semidefiniteness of implies that of . From the relation
we deduce that is SPSD. It follows that . Direct computation yields
Then
The identity (3.3) provides a straightforward approach to analyzing the convergence properties of Algorithm 1. Of particular interest is a restriction operator that minimizes the convergence factor, provided that the smoother is preselected.
The following inequality, known as the Poincaré separation theorem (see, e.g., [11, Corollary 4.3.37]), will be used to analyze the optimal restriction.
Lemma 3.4.
Let $A\in\mathbb{C}^{n\times n}$ be Hermitian, and let $\{u_{i}\}_{i=1}^{m}$ ($m\leq n$) be a set of orthonormal vectors. Then, for any $i\in\{1,\ldots,m\}$, it holds that
\[
\lambda_{i}(A)\leq\lambda_{i}(U^{\ast}AU)\leq\lambda_{i+n-m}(A),
\]
where $U=(u_{1},\ldots,u_{m})\in\mathbb{C}^{n\times m}$ and $U^{\ast}$ denotes the conjugate transpose of $U$.
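The separation inequalities are easy to illustrate numerically (eigenvalues sorted in increasing order; the 0-based indices in the code correspond to the 1-based indices in the lemma):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 9, 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                                   # Hermitian (real symmetric) test matrix
V, _ = np.linalg.qr(rng.standard_normal((n, m)))    # m orthonormal columns

lam = np.sort(np.linalg.eigvalsh(A))                # eigenvalues of A
mu = np.sort(np.linalg.eigvalsh(V.T @ A @ V))       # eigenvalues of the compression
for i in range(m):
    assert lam[i] <= mu[i] + 1e-12                  # lambda_i(A) <= lambda_i(V* A V)
    assert mu[i] <= lam[i + n - m] + 1e-12          # lambda_i(V* A V) <= lambda_{i+n-m}(A)
```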
Theorem 3.5.
Proof.
Observe that
Since and are SPSD, it follows that
where .
Let
It is easy to check that the former is a nonsingular matrix and the latter is a matrix with orthonormal columns. Let the columns be completed to an orthogonal matrix, i.e.,
Then
where . Hence,
which, together with (3.8), yields that has positive eigenvalues and eigenvalue with multiplicity . Due to
it follows that is SPD. Using (3.3) and Lemma 3.4, we obtain
In particular, if , then
where is nonsingular. We then have
Thus,
This completes the proof. ∎
To analyze the influence of the row space of $R$ on the convergence factor, we need the following lemma; see, e.g., [11, Corollary 4.3.5].
Lemma 3.6.
Let $A,B\in\mathbb{C}^{n\times n}$ be Hermitian matrices. If $B$ is positive semidefinite (possibly singular), then
\[
\lambda_{i}(A)\leq\lambda_{i}(A+B)
\]
for all $i\in\{1,\ldots,n\}$.
The following theorem shows that the convergence factor decreases when the row space of $R$ is expanded.
Theorem 3.7.
Proof.
Since , there exists a matrix , with full column rank, such that
Let be a nonsingular matrix such that
Then
Hence, there exists a matrix , with full column rank, such that
(3.10) |
4. An inexact variant of Algorithm 1
Recall that the coarse-grid system to be solved in Algorithm 1 reads
(4.1) $A_{c}e_{c}=r_{c}$,
where $A_{c}=RAP$ is the coarse-grid matrix, $r_{c}$ is the restricted residual, and $e_{c}$ is the unknown coarse-grid correction.
In practice, it is often too costly to solve (4.1) exactly when its size is large. Instead, without essential loss of convergence speed, it is advisable to find a cheap approximation to the exact solution of (4.1). Note that, unlike the original problem (2.1), the coefficient matrix $RAP=(RA)M^{-1}(RA)^{T}$ in (4.1) is SPD when the prolongation (2.6) is used. For such linear systems, there are many efficient numerical methods available in the literature, such as multigrid and conjugate gradient methods.
In what follows, we analyze the convergence of an inexact variant of Algorithm 1, which is formalized as Algorithm 2.
4.1. Linear coarse solvers
The solver in Algorithm 2 is a general mapping from $\mathbb{R}^{n_{c}}$ to $\mathbb{R}^{n_{c}}$, which is expected to be a good approximation to the exact coarse solver. In this subsection, we consider the linear case, in which the coarse solver is defined by a nonsingular matrix $B\in\mathbb{R}^{n_{c}\times n_{c}}$ such that $B^{T}+B-A_{c}$ is SPD, or, equivalently, $\|I-B^{-1}A_{c}\|_{A_{c}}<1$.
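For instance, a fixed number of stationary iterations defines a linear coarse solver of this kind. The sketch below (an SPD stand-in for the coarse-grid matrix; the symbols are illustrative assumptions, not the paper's notation) applies $k$ damped-Jacobi steps, which corresponds to $B^{-1}=\big(I-(I-\omega D^{-1}A_{c})^{k}\big)A_{c}^{-1}$, and checks that the $k$-step error propagator $I-B^{-1}A_{c}=(I-\omega D^{-1}A_{c})^{k}$ is a contraction in the $A_{c}$-norm.

```python
import numpy as np

rng = np.random.default_rng(7)
nc = 12
G = rng.standard_normal((nc, nc))
A_c = G @ G.T + nc * np.eye(nc)          # stand-in SPD coarse-grid matrix
D = np.diag(np.diag(A_c))
Dinv = np.linalg.inv(D)

# Damping chosen so that the Jacobi iteration contracts in the A_c-norm
omega = 1.0 / np.max(np.linalg.eigvals(Dinv @ A_c).real)
J = np.eye(nc) - omega * Dinv @ A_c      # single-step error propagator
k = 3
E = np.linalg.matrix_power(J, k)         # k-step error propagator I - B^{-1} A_c

# A_c-norm of E: || A_c^{1/2} E A_c^{-1/2} ||_2
w, V = np.linalg.eigh(A_c)
Ahalf = V @ np.diag(np.sqrt(w)) @ V.T
Ainvhalf = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
print(np.linalg.norm(Ahalf @ E @ Ainvhalf, 2))   # < 1: a convergent linear coarse solver
```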
Define
(4.3) |
By (4.2), we have
Due to
it follows that is SPSD. Let
(4.4a) | ||||
(4.4b) |
Then
From (4.4a) and (4.4b), we deduce that and are SPSD. Hence,
(4.5) |
where
with being given by (3.5).
To estimate the largest eigenvalues of the above matrices, we need the following eigenvalue identities.
Lemma 4.1.
Proof.
Recall that $\Pi$ is an $M$-orthogonal projection operator when the prolongation (2.6) is used. Then, the matrix is SPSD, which leads to (4.6a). Similarly, the identity (4.6c) holds. The identity (4.6b) follows from (3.3) and the fact
It remains to prove (4.6d).
(4.8) | ||||
(4.9) |
Due to
it follows that
where we have used the fact (4.8). Since and are SPSD, we deduce from (3.7) that is SPSD and . Thus,
(4.10) |
From (4.9), we have
with equality if and only if
(4.11) |
In fact, if the relation (4.11) holds, then is SPD, because it is SPSD and entails that . Hence,
Conversely, if , then , i.e., is SPD. If (4.11) does not hold, then there exists a nonzero vector such that
Accordingly, , which contradicts the positive definiteness of . The above analysis, together with (4.9), yields
(4.12) |
To bound the extreme eigenvalues involved, we also need an important tool for eigenvalue analysis, namely the following theorem of Weyl; see, e.g., [11, Theorem 4.3.1].
Lemma 4.2.
Let $A,B\in\mathbb{C}^{n\times n}$ be Hermitian matrices, with the eigenvalues of $A$, $B$, and $A+B$ arranged in increasing order. Then
\[
\lambda_{i+j-1}(A+B)\leq\lambda_{i}(A)+\lambda_{j}(B) \quad\text{whenever } i+j\leq n+1,
\]
and
\[
\lambda_{i}(A)+\lambda_{j}(B)\leq\lambda_{i+j-n}(A+B) \quad\text{whenever } i+j\geq n+1.
\]
In particular, for each $k\in\{1,\ldots,n\}$, one has
(4.13a) $\lambda_{k}(A)+\lambda_{1}(B)\leq\lambda_{k}(A+B)$,
(4.13b) $\lambda_{k}(A+B)\leq\lambda_{k}(A)+\lambda_{n}(B)$.
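The special case (4.13) is easy to check numerically (eigenvalues sorted in increasing order; random Hermitian matrices for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 7
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # Hermitian test matrices
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
la, lb, lab = (np.sort(np.linalg.eigvalsh(M)) for M in (A, B, A + B))

for k in range(n):
    assert la[k] + lb[0] <= lab[k] + 1e-12           # (4.13a)
    assert lab[k] <= la[k] + lb[-1] + 1e-12          # (4.13b)
```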
We are now ready to present a convergence estimate for Algorithm 2.
Theorem 4.3.
Proof.
In view of (4.5), one can obtain a two-sided estimate for the convergence factor of Algorithm 2 by bounding the extreme eigenvalues involved.
Remark 4.4.
As a consequence of Theorem 4.3, we have the following corollary.
4.2. Nonlinear coarse solvers
In Theorem 4.3, certain extreme eigenvalues (or bounds for them) are required in order to get an estimate for the convergence factor. However, such quantities will be unavailable if the coarse solver is nonlinear, as is the case for the conjugate gradient [10] and generalized minimal residual [20] methods. In this subsection, we analyze the convergence of Algorithm 2 with nonlinear coarse solvers.
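With a nonlinear solver such as CG, one typically controls accuracy through a residual-based stopping rule. The following sketch (an assumed setup, not the paper's notation) runs CG on an SPD stand-in for the coarse-grid system until the relative residual drops below a tolerance, and then measures the achieved relative error in the energy norm, the kind of accuracy condition in which convergence estimates for inexact methods are stated.

```python
import numpy as np

def cg_with_tol(A, b, eps, maxiter=1000):
    """Plain conjugate gradient, stopped by the relative residual."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    nb = np.linalg.norm(b)
    for _ in range(maxiter):
        if np.sqrt(rs) <= eps * nb:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(4)
nc = 15
G = rng.standard_normal((nc, nc))
A_c = G @ G.T + nc * np.eye(nc)          # stand-in SPD coarse-grid matrix
b_c = rng.standard_normal(nc)
e_exact = np.linalg.solve(A_c, b_c)
e_hat = cg_with_tol(A_c, b_c, 1e-8)      # approximate coarse solution

def norm_Ac(v):
    return np.sqrt(v @ A_c @ v)

rel_err = norm_Ac(e_hat - e_exact) / norm_Ac(e_exact)
print(rel_err)                           # small relative error in the energy norm
```

For a well-conditioned SPD matrix, a small relative residual translates into a comparably small relative error in the $A_{c}$-norm, with the conversion factor controlled by the condition number.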
Theorem 4.6.
Proof.
Besides deterministic methods, one may apply randomized methods (see, e.g., [7, 19]) to the problem (4.1). Similarly to Theorem 4.6, one can derive the following convergence estimates.
Theorem 4.7.
Assume that the coarse solver in Algorithm 2 is randomized, and let $\varepsilon_{1}$ and $\varepsilon_{2}$ be two tolerance factors.
(i) If
(4.23) |
then
(ii) If
(4.24) |
then
Proof.
5. Conclusions
In this paper, we analyze the convergence of a two-grid method for nonsymmetric positive definite problems. In the case of an exact coarse solver, we establish an elegant identity for the convergence factor of the two-grid method, measured in the smoother-induced norm. Based on the identity, we derive a class of optimal restriction operators and analyze the influence of the row space of the restriction on the convergence factor. Furthermore, we present some convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered. It is worth pointing out that our results can be extended straightforwardly to the complex case (non-Hermitian positive definite problems). In the future, we expect to develop new practical algorithms for nonsymmetric or non-Hermitian positive definite problems based on our two-grid theory.
Acknowledgment
The author is grateful to Prof. Chen-Song Zhang (LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences) for his helpful suggestions.
References
- [1] J. Brannick, F. Cao, K. Kahl, R. D. Falgout, and X. Hu, Optimal interpolation and compatible relaxation in classical algebraic multigrid, SIAM J. Sci. Comput. 40 (2018), A1473–A1493.
- [2] M. Brezina, T. A. Manteuffel, S. F. McCormick, J. W. Ruge, and G. Sanders, Towards adaptive smoothed aggregation (SA) for nonsymmetric problems, SIAM J. Sci. Comput. 32 (2010), 14–39.
- [3] W. L. Briggs, V. E. Henson, and S. F. McCormick, A Multigrid Tutorial, 2nd ed., SIAM, Philadelphia, 2000.
- [4] R. D. Falgout and P. S. Vassilevski, On generalizing the algebraic multigrid framework, SIAM J. Numer. Anal. 42 (2004), 1669–1693.
- [5] R. D. Falgout, P. S. Vassilevski, and L. T. Zikatanov, On two-grid convergence estimates, Numer. Linear Algebra Appl. 12 (2005), 471–494.
- [6] L. García Ramos, R. Kehl, and R. Nabben, Projections, deflation, and multigrid for nonsymmetric matrices, SIAM J. Matrix Anal. Appl. 41 (2020), 83–105.
- [7] R. M. Gower and P. Richtárik, Randomized iterative methods for linear systems, SIAM J. Matrix Anal. Appl. 36 (2015), 1660–1690.
- [8] A. Greenbaum and Z. Strakos, Matrices that generate the same Krylov residual spaces, Recent Advances in Iterative Methods, IMA Vol. Math. Appl., vol. 60, Springer, New York, 1994, pp. 95–118.
- [9] W. Hackbusch, Multi-Grid Methods and Applications, Springer-Verlag, Berlin, 1985.
- [10] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand. 49 (1952), 409–436.
- [11] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed., Cambridge University Press, Cambridge, UK, 2013.
- [12] J. Lottes, Towards Robust Algebraic Multigrid Methods for Nonsymmetric Problems, Springer Theses, Springer International Publishing AG, 2017.
- [13] T. A. Manteuffel, S. Münzenmaier, J. W. Ruge, and B. S. Southworth, Nonsymmetric reduction-based algebraic multigrid, SIAM J. Sci. Comput. 41 (2019), S242–S268.
- [14] T. A. Manteuffel, L. N. Olson, J. B. Schroder, and B. S. Southworth, A root-node–based algebraic multigrid method, SIAM J. Sci. Comput. 39 (2017), S723–S756.
- [15] T. A. Manteuffel, J. W. Ruge, and B. S. Southworth, Nonsymmetric algebraic multigrid based on local approximate ideal restriction (AIR), SIAM J. Sci. Comput. 40 (2018), A4105–A4130.
- [16] T. A. Manteuffel and B. S. Southworth, Convergence in norm of nonsymmetric algebraic multigrid, SIAM J. Sci. Comput. 41 (2019), S269–S296.
- [17] Y. Notay, Algebraic analysis of two-grid methods: The nonsymmetric case, Numer. Linear Algebra Appl. 17 (2010), 73–96.
- [18] Y. Notay, Analysis of two-grid methods: The nonnormal case, Math. Comp. 89 (2020), 807–827.
- [19] P. Richtárik and M. Takáč, Stochastic reformulations of linear systems: Algorithms and convergence theory, SIAM J. Matrix Anal. Appl. 41 (2020), 487–524.
- [20] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 7 (1986), 856–869.
- [21] U. Trottenberg, C. W. Oosterlee, and A. Schüller, Multigrid, Academic Press, London, 2001.
- [22] P. S. Vassilevski, Multilevel Block Factorization Preconditioners: Matrix-Based Analysis and Algorithms for Solving Finite Element Equations, Springer, New York, 2008.
- [23] T. A. Wiesner, R. S. Tuminaro, W. A. Wall, and M. W. Gee, Multigrid transfers for nonsymmetric systems based on Schur complements and Galerkin projections, Numer. Linear Algebra Appl. 21 (2014), 415–438.
- [24] J. Xu and L. T. Zikatanov, The method of alternating projections and the method of subspace corrections in Hilbert space, J. Amer. Math. Soc. 15 (2002), 573–597.
- [25] J. Xu and L. T. Zikatanov, Algebraic multigrid methods, Acta Numer. 26 (2017), 591–721.
- [26] X. Xu and C.-S. Zhang, On the ideal interpolation operator in algebraic multigrid methods, SIAM J. Numer. Anal. 56 (2018), 1693–1710.
- [27] X. Xu and C.-S. Zhang, Convergence analysis of inexact two-grid methods: A theoretical framework, SIAM J. Numer. Anal. 60 (2022), 133–156.
- [28] X. Xu and C.-S. Zhang, A new analytical framework for the convergence of inexact two-grid methods, SIAM J. Matrix Anal. Appl. 43 (2022), 512–533.
- [29] L. T. Zikatanov, Two-sided bounds on the convergence rate of two-level methods, Numer. Linear Algebra Appl. 15 (2008), 439–454.