
A sine transform based preconditioned MINRES method for all-at-once systems from constant and variable-coefficient evolutionary PDEs

Sean Hon [1] ([email protected]), Po Yin Fung ([email protected]), Jiamei Dong ([email protected]), Stefano Serra-Capizzano ([email protected])

[1] Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, China
[2] Department of Science and High Technology, University of Insubria, Como, 6152, Lombardy, Italy
[3] Department of Information Technology, Uppsala University, Uppsala, 75008, Uppland, Sweden
Abstract

In this work, we propose a simple yet generic preconditioned Krylov subspace method for a large class of nonsymmetric block Toeplitz all-at-once systems arising from discretizing evolutionary partial differential equations. Namely, our main result is two novel symmetric positive definite preconditioners, which can be efficiently diagonalized by the discrete sine transform matrix. More specifically, our approach is to first permute the original linear system to obtain a symmetric one, and subsequently develop the desired preconditioners based on the spectral symbol of the modified matrix. We then show that the eigenvalues of the preconditioned matrix sequences are clustered around $\pm 1$, which entails rapid convergence when the minimal residual method is employed. Alternatively, when the conjugate gradient method on the normal equations is used, we show that our preconditioner is effective in the sense that the eigenvalues of the preconditioned matrix sequence are clustered around unity. An extension of our proposed preconditioning method is given for high-order backward difference time discretization schemes, which can be applied to a wide range of time-dependent equations. Numerical examples are given, also in the variable-coefficient setting, to demonstrate the effectiveness of our proposed preconditioners, which consistently outperform an existing block circulant preconditioner discussed in the relevant literature.

keywords:
sine transform based preconditioners; parallel-in-time; Krylov subspace methods; MINRES; all-at-once discretization; block Toeplitz systems
MSC Classification: 15B05, 65F08, 65F10, 65M22

1 Introduction

Over the past few years, there has been growing interest in developing effective preconditioners for solving the all-at-once linear systems stemming from evolutionary partial differential equations (PDEs); see for instance Bertaccini2001 ; BertacciniNg2003 ; doi:10.1137/16M1062016 ; McDonald2017 ; HON2021113965 ; 0x003ab34e ; https://doi.org/10.1002/nla.2386 ; doi:10.1137/20M1316354 ; LIN2021110221 ; doi:10.1137/19M1309869 ; WU2021110076 and references therein. Instead of solving PDEs in a sequential manner, this class of parallel-in-time (PinT) methods solves for all unknowns simultaneously by constructing a large linear system composed of smaller linear systems at each time level. Our proposed method in this work belongs to the diagonalization-based all-at-once methods 10.1007/978-3-319-18827-0_50 , along with doi:10.1137/16M1062016 ; doi:10.1137/20M1316354 . Related PinT methods include space-time multigrid doi:10.1137/15M1046605 ; doi:10.1137/0916050 , multigrid reduction in time doi:10.1137/130944230 ; doi:10.1137/16M1074096 , and the parareal method LIONS2001661 ; doi:10.1137/05064607X . For a survey of the development of these PinT methods, we refer the readers to 10.1007/978-3-319-23321-5_3 and the references therein.

As a model problem, we consider the following initial-boundary value problem for the heat equation

\left\{\begin{array}{ll}\frac{\partial u(x,t)}{\partial t}=\nabla\cdot(a(x)\nabla u(x,t))+f(x,t),&(x,t)\in\Omega\times(0,T],\\ u=g,&(x,t)\in\partial\Omega\times(0,T],\\ u(x,0)=u_{0}(x),&x\in\Omega,\end{array}\right. \quad (1)

where $\Omega$ is open, $\partial\Omega$ denotes the boundary of $\Omega$, and $a,f,g,u_{0}$ are all given functions.

After discretizing the spatial domain with a mesh and using the simple two-level $\theta$-method for time discretization with $n$ time steps of size $\tau=\frac{T}{n}$, we get

M_{m}\Big(\frac{\mathbf{u}_{m}^{(k+1)}-\mathbf{u}_{m}^{(k)}}{\tau}\Big)=-K_{m}\big(\theta\mathbf{u}_{m}^{(k+1)}+(1-\theta)\mathbf{u}_{m}^{(k)}\big)+\theta\mathbf{f}_{m}^{(k+1)}+(1-\theta)\mathbf{f}_{m}^{(k)},

where $M_{m},K_{m}\in\mathbb{R}^{m\times m}$ are, in the standard finite element terminology, the mass matrix and the stiffness matrix approximating $-\nabla\cdot(a(x)\nabla)$, respectively, $\mathbf{u}_{m}^{(k)}=[u_{1}^{(k)},\dots,u_{m}^{(k)}]^{T}$, $\mathbf{f}_{m}^{(k)}=[f_{1}^{(k)},\dots,f_{m}^{(k)}]^{T}$, and $\theta\in[0,1]$. The forward Euler method corresponds to the choice $\theta=0$, the backward Euler method to $\theta=1$, while the Crank-Nicolson method is associated with $\theta=1/2$.

The discrete equations to be solved read

(M_{m}+\theta\tau K_{m})\mathbf{u}_{m}^{(k+1)}=(M_{m}-(1-\theta)\tau K_{m})\mathbf{u}_{m}^{(k)}+\theta\tau\mathbf{f}_{m}^{(k+1)}+(1-\theta)\tau\mathbf{f}_{m}^{(k)}.

Instead of solving the above equations for $\mathbf{u}_{m}^{(k)}$ sequentially, for $k=1,2,\ldots,n$, we solve the following equivalent all-at-once system

\mathcal{T}_{H}\underbrace{\begin{bmatrix}\mathbf{u}_{m}^{(1)}\\ \mathbf{u}_{m}^{(2)}\\ \mathbf{u}_{m}^{(3)}\\ \vdots\\ \mathbf{u}_{m}^{(n)}\end{bmatrix}}_{=:\mathbf{u}}=\underbrace{\begin{bmatrix}\theta\tau\mathbf{f}_{m}^{(1)}+(1-\theta)\tau\mathbf{f}_{m}^{(0)}+(M_{m}-(1-\theta)\tau K_{m})\mathbf{u}_{m}^{(0)}\\ \theta\tau\mathbf{f}_{m}^{(2)}+(1-\theta)\tau\mathbf{f}_{m}^{(1)}\\ \theta\tau\mathbf{f}_{m}^{(3)}+(1-\theta)\tau\mathbf{f}_{m}^{(2)}\\ \vdots\\ \theta\tau\mathbf{f}_{m}^{(n)}+(1-\theta)\tau\mathbf{f}_{m}^{(n-1)}\end{bmatrix}}_{=:\mathbf{f}_{H}},

where

\mathcal{T}_{H}=\begin{bmatrix}M_{m}+\theta\tau K_{m}&&&\\ -M_{m}+(1-\theta)\tau K_{m}&\ddots&&\\ &\ddots&\ddots&\\ &&-M_{m}+(1-\theta)\tau K_{m}&M_{m}+\theta\tau K_{m}\end{bmatrix} \quad (2)

is an $mn$ by $mn$ nonsymmetric block Toeplitz matrix, which is generated by the matrix-valued function

g_{\theta}(x)=\underbrace{M_{m}+\theta\tau K_{m}}_{=:A_{(0)}}+\underbrace{(-M_{m}+(1-\theta)\tau K_{m})}_{=:A_{(1)}}e^{\mathbf{i}x}. \quad (3)

According to the notion of generating function recalled in Section 2 (see also book-GLT-I and the references reported there), it is noted that if $K_{m}$ and $M_{m}$ commute then $A_{(0)}$ and $A_{(1)}$ commute. If a finite element formulation is used to discretize (1) in space, then $M_{m}$ and $K_{m}$ are simultaneously diagonalizable, provided that a uniform grid is used. Alternatively, if finite difference methods are used for (1), then $M_{m}$, being exactly the identity matrix $I_{m}$ in this case, always commutes with $K_{m}$ (see (doi:10.1137/16M1062016, Section 3.1.1) for a discussion).

Therefore, throughout this work, the following assumptions on both $M_{m}$ and $K_{m}$ are made, which are compatible with (doi:10.1137/20M1316354, Section 2):

  1. Both $M_{m}$ and $K_{m}$ are real symmetric positive definite (SPD);

  2. Both $M_{m}$ and $K_{m}$ are sparse, i.e., they have only $\mathcal{O}(m)$ nonzero entries;

  3. $M_{m}$ and $K_{m}$ commute.

As already observed, the commutation between $M_{m}$ and $K_{m}$ is obvious if $M_{m}$ is the identity matrix or in the constant-coefficient setting. When $a(x)$ is not constant, or there is also a dependency on time, that is $a\equiv a(x,t)$, or the discretization method is different (finite elements, finite volumes, IgA), or graded meshes are used, the commutation is no longer true algebraically, but it holds asymptotically for large enough $m$ and can be proved via multilevel GLT tools (see book-GLT-II). More in detail, assume $m=m(n)$ with $m(n)$ going to infinity as $n$ tends to infinity, so that $1=o(m(n))$ and

c=\lim_{h\rightarrow 0^{+}}\frac{\tau}{h^{2}}>0. \quad (4)

In the case of a variable coefficient $a\equiv a(x,t)$, we obtain that the matrix sequence $\{h^{2}K_{m}\}_{m}$ is spectrally distributed as $p(\psi)a(t,x)$ on $(\psi,x)\in[0,\pi]\times\Omega$, in the sense of Definition 1, since $\{h^{2}K_{m}\}_{m}$ is a real symmetric GLT matrix sequence with GLT symbol $p(\psi)a(t,x)$. The latter statement holds since we exploit a canonical decomposition of $h^{2}K_{m}$ as the product of a diagonal sampling sequence and a Toeplitz matrix sequence, plus a zero-distributed matrix sequence (see again (book-GLT-II, Ch. 6, Sect. 7.3)). Hence, under the assumption that finite differences are used and by exploiting the tensor structure of $\mathcal{T}_{H}$ in (2), the global matrix sequence $\{\mathcal{T}_{H}\}_{m}$ is again a GLT matrix sequence with GLT symbol

c\theta\,p(\psi_{2})a(t,x)+c(1-\theta)\,a(t,x)e^{-\mathbf{i}\psi_{1}}.

The same distribution is still present if condition (4) is violated and $c=\infty$, but with a different scaling factor, and it then applies to the matrix sequence $\{\frac{h^{2}}{\tau}\mathcal{T}_{H}\}_{m}$.

However, in the present setting $m$ is considered fixed, that is $c=0$, and hence the condition $1=o(m(n))$ is also not satisfied. Here a matrix-valued distribution is deduced. When both $n$ and $m$ are allowed to tend to infinity, in the way indicated in the previous lines, the situation changes, and the related type of analysis represents a research line with various cases to be considered in the future.

Instead of directly solving (2), we consider the permuted linear system $\mathcal{Y}\mathcal{T}_{H}\mathbf{u}=\mathcal{Y}\mathbf{f}_{H}$, where

\mathcal{Y}\mathcal{T}_{H}=\begin{bmatrix}&&&A_{(1)}&A_{(0)}\\ &&\iddots&\iddots&\\ &A_{(1)}&A_{(0)}&&\\ A_{(1)}&A_{(0)}&&&\\ A_{(0)}&&&&\end{bmatrix}. \quad (5)

Note that $\mathcal{Y}=Y_{n}\otimes I_{m}$, with $Y_{n}\in\mathbb{R}^{n\times n}$ being the anti-identity (flip) matrix, that is, $[Y_{n}]_{j,k}=1$ if and only if $j+k=n+1$ and $[Y_{n}]_{j,k}=0$ otherwise. Clearly, $\mathcal{Y}\mathcal{T}_{H}$ is symmetric, since $M_{m}$ and $K_{m}$ are assumed symmetric and hence so are $A_{(0)}$ and $A_{(1)}$. Thus, we can deploy a symmetric Krylov subspace method such as the minimal residual (MINRES) method, whose convergence depends only on the eigenvalues, and design an effective preconditioner for $\mathcal{Y}\mathcal{T}_{H}$.
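To make the construction above concrete, the following minimal sketch (Python with NumPy/SciPy, used here purely for illustration; the paper's experiments are in MATLAB) assembles the all-at-once matrix $\mathcal{T}_{H}$ of (2) and the permuted matrix $\mathcal{Y}\mathcal{T}_{H}$ of (5) for the one-dimensional finite difference case, where $M_{m}=I_{m}$ and $K_{m}$ is the standard scaled Laplacian; the sizes and the choice $\theta=1/2$ are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def all_at_once_matrices(m, n, tau, theta):
    """Assemble T_H of (2) and Y*T_H of (5) for the 1D heat equation,
    discretized by centred finite differences on a uniform grid
    (so M_m = I_m and K_m = (1/h^2) * tridiag(-1, 2, -1))."""
    h = 1.0 / (m + 1)
    Im = sp.identity(m, format="csr")
    Km = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m), format="csr") / h**2

    A0 = Im + theta * tau * Km            # diagonal block  M_m + theta*tau*K_m
    A1 = -Im + (1 - theta) * tau * Km     # subdiagonal block -M_m + (1-theta)*tau*K_m

    In = sp.identity(n, format="csr")
    Ln = sp.diags([1.0], [-1], shape=(n, n), format="csr")   # lower shift matrix
    T_H = sp.kron(In, A0) + sp.kron(Ln, A1)                  # block lower bidiagonal Toeplitz

    Yn = sp.csr_matrix(np.flipud(np.eye(n)))                 # anti-identity (flip) matrix
    YT_H = sp.kron(Yn, Im) @ T_H                             # permuted system Y*T_H
    return T_H, YT_H

T_H, YT_H = all_at_once_matrices(m=8, n=16, tau=1e-2, theta=0.5)
print(np.allclose(YT_H.toarray(), YT_H.toarray().T))         # True: Y*T_H is symmetric
```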

In doi:10.1137/16M1062016, a Strang block circulant preconditioner was proposed for $\mathcal{Y}\mathcal{T}_{H}$ and proved effective when $a(x)=1$. The main advantage of this iterative approach is that the inverse of the preconditioner can be computed using parallel fast Fourier transforms over a number of processors. However, the performance of such a preconditioner can be unsatisfactory, as will be shown in the numerical examples in Section 5. We remark that such an unsatisfactory preconditioning effect of circulant matrices Strang1986 ; doi:10.1137/0909051 for certain ill-conditioned Toeplitz systems was discussed in Hon_SC_Wathen.

Thus, instead of using block circulant preconditioners, we propose in this work the following SPD preconditioner for $\mathcal{Y}\mathcal{T}_{H}$, which can be diagonalized by the discrete sine transform matrix:

\mathcal{P}_{H}=\sqrt{\begin{bmatrix}A_{(0)}^{2}+A_{(1)}^{2}&A_{(0)}A_{(1)}&&&\\ A_{(0)}A_{(1)}&A_{(0)}^{2}+A_{(1)}^{2}&\ddots&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&A_{(0)}A_{(1)}\\ &&&A_{(0)}A_{(1)}&A_{(0)}^{2}+A_{(1)}^{2}\end{bmatrix}}, \quad (6)

which is constructed by approximating the asymptotic eigenvalue distribution of $\mathcal{Y}\mathcal{T}_{H}$. As with the block circulant preconditioner, the computation of $\mathcal{P}_{H}^{-1}$ is parallelizable over $n$ processors. We refer readers to Bini1990 for a detailed discussion of such a parallel implementation.

We recall that in the pure Toeplitz setting the algebra of matrices diagonalized by the (real) orthogonal sine transform (i.e., the so-called $\tau$ algebra BC83) was extensively used in the 1990s (see D1 ; D2 ; DS ; Slinear99 and references therein), while more recently the approach based on $\tau$ preconditioning has gained further attention in the context of approximated fractional differential equations (see BEV21 ; DMS16 and references therein). Here, we propose its use in a different setting where the Toeplitz structure is nonsymmetric. We also recall that $\tau$ preconditioning is superior to circulant preconditioning when real symmetric Toeplitz-like structures are considered, as discussed and theoretically motivated in DS ; Slinear99.

In particular, we will show that the eigenvalues of the preconditioned matrix sequence $\{\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}\}_{n}$ are clustered around $\pm 1$, which leads to fast convergence when MINRES is employed.

Yet, $\mathcal{P}_{H}$ requires fast diagonalizability of $M_{m}$ and $K_{m}$ in order to be implemented efficiently. When such diagonalizability is not available, we further propose the following preconditioner $\mathcal{P}_{\theta}$ as a modification of $\mathcal{P}_{H}$:

\mathcal{P}_{\theta}=\mathcal{H}\otimes M_{m}+\mathcal{H}_{\theta}\otimes\tau K_{m}, \quad (7)

where

\mathcal{H}=\sqrt{\begin{bmatrix}2&-1&&&\\ -1&2&-1&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&-1\\ &&&-1&2\end{bmatrix}}

and

\mathcal{H}_{\theta}=\sqrt{\begin{bmatrix}\theta^{2}+(1-\theta)^{2}&\theta(1-\theta)&&&\\ \theta(1-\theta)&\theta^{2}+(1-\theta)^{2}&\ddots&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&\theta(1-\theta)\\ &&&\theta(1-\theta)&\theta^{2}+(1-\theta)^{2}\end{bmatrix}}.

Its derivation and implementation will be discussed in Section 3.1.4.

In general, as mentioned in (ANU:9672992, , Chapter 6), the convergence study of preconditioning strategies for nonsymmetric problems is heuristic, since descriptive convergence bounds are usually not available for the generalized minimal residual (GMRES) method or any of the other applicable nonsymmetric Krylov subspace iterative methods.

The paper is organized as follows. In Section 2, we review some preliminary results on block Toeplitz matrices and matrix-valued functions. We present our main results in Section 3, where our proposed preconditioner and its effectiveness analysis are provided. An extension of our preconditioning method to high-order in time discretization schemes is given in Section 4. Numerical tests are given in Section 5 to support the analysis of our proposed preconditioner. Finally, in Section 6 we draw conclusions and list a few open problems, including the challenging case of variable coefficients when both sizes $m$ and $n$ tend to infinity.

2 Preliminaries

In the following subsections, we provide some useful background knowledge about block Toeplitz matrices and matrix functions.

2.1 Block Toeplitz matrices

We let $L^{1}([-\pi,\pi],\mathbb{C}^{m\times m})$ be the Banach space of all matrix-valued functions that are Lebesgue integrable over $[-\pi,\pi]$. The $L^{1}$-norm induced by the trace norm over $\mathbb{C}^{m\times m}$ is

\|f\|_{L^{1}}=\frac{1}{2\pi}\int_{-\pi}^{\pi}\|f(\psi)\|_{\mathrm{tr}}\,d\psi<\infty,

where $\|A_{n}\|_{\mathrm{tr}}:=\sum_{j=1}^{n}\sigma_{j}(A_{n})$ denotes the trace norm of $A_{n}\in\mathbb{C}^{n\times n}$. The block Toeplitz matrix generated by $f\in L^{1}([-\pi,\pi],\mathbb{C}^{m\times m})$ is denoted by $T_{(n,m)}[f]$, namely

T_{(n,m)}[f]=\begin{bmatrix}A_{(0)}&A_{(-1)}&\cdots&A_{(-n+1)}\\ A_{(1)}&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&A_{(-1)}\\ A_{(n-1)}&\cdots&A_{(1)}&A_{(0)}\end{bmatrix}\in\mathbb{C}^{mn\times mn}, \quad (8)

where the Fourier coefficients of $f$ are

A_{(k)}=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(\psi)e^{-\mathbf{i}k\psi}\,d\psi\in\mathbb{C}^{m\times m},\qquad k=0,\pm 1,\pm 2,\dots.

The function $f$ is called the generating function, or symbol, of $T_{(n,m)}[f]$. For thorough discussions of the related properties of (multilevel) block Toeplitz matrices, we refer the readers to MR2108963 ; Chan:1996:CGM:240441.240445 ; book-GLT-I and references therein.

Before discussing the asymptotic singular value or spectral distribution of $T_{(n,m)}[f]$ associated with $f$, which is crucial for developing our preconditioning theory, we introduce the following definition.

Let $C_{c}(\mathbb{C})$ (respectively $C_{c}(\mathbb{R})$) be the space of complex-valued continuous functions defined on $\mathbb{C}$ (respectively $\mathbb{R}$) with bounded support.

Definition 1.

(Ferrari2019) Let $\{A_{(n,s)}\}_{n}$ be a sequence of $sn$-by-$sn$ matrices, with $s$ a fixed positive integer.

  1. We say that $\{A_{(n,s)}\}_{n}$ has an asymptotic singular value distribution described by an $s$-by-$s$ matrix-valued function $f$ if

    \lim_{n\to\infty}\frac{1}{sn}\sum_{j=1}^{ns}F(\sigma_{j}(A_{(n,s)}))=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{s}\sum_{j=1}^{s}F(\sigma_{j}(f(x)))\,dx,\quad\forall F\in C_{c}(\mathbb{R}),

    and we write $\{A_{(n,s)}\}_{n}\sim_{\sigma}f$.

  2. We say that $\{A_{(n,s)}\}_{n}$ has an asymptotic spectral distribution described by an $s$-by-$s$ matrix-valued function $f$ if

    \lim_{n\to\infty}\frac{1}{sn}\sum_{j=1}^{ns}F(\lambda_{j}(A_{(n,s)}))=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{s}\sum_{j=1}^{s}F(\lambda_{j}(f(x)))\,dx,\quad\forall F\in C_{c}(\mathbb{C}),

    and we write $\{A_{(n,s)}\}_{n}\sim_{\lambda}f$.

Before presenting our main preconditioning results, we introduce the following notation. Given $D\subset\mathbb{R}^{k}$ with $0<\mu_{k}(D)<\infty$, we define $\widetilde{D}_{p}$ as $D\cup D_{p}$, where $p\in\mathbb{R}^{k}$ and $D_{p}=p+D$, with the constraint that $D$ and $D_{p}$ have non-intersecting interiors, i.e., $D^{\circ}\cap D_{p}^{\circ}=\emptyset$. In this way, we have $\mu_{k}(\widetilde{D}_{p})=2\mu_{k}(D)$. Given any $g$ defined over $D$, we define $\psi_{g}$ over $\widetilde{D}_{p}$ as

\psi_{g}(x)=\left\{\begin{array}{ll}g(x),&x\in D,\\ -g(x-p),&x\in D_{p},\ x\notin D.\end{array}\right. \quad (9)
Theorem 1.

(Ferrari2019, Theorem 3.4) Suppose $f\in L^{1}([-\pi,\pi],\mathbb{C}^{m\times m})$ has Hermitian Fourier coefficients. Let $T_{(n,m)}[f]\in\mathbb{C}^{mn\times mn}$ be the block Toeplitz matrix generated by $f$ and let $Y_{(n,m)}=Y_{n}\otimes I_{m}\in\mathbb{R}^{mn\times mn}$. Then

\{Y_{(n,m)}T_{(n,m)}[f]\}_{n}\sim_{\lambda}\psi_{|f|},\qquad|f|=(ff^{*})^{1/2},

over the domain $\widetilde{D}$ with $D=[0,2\pi]$ and $p=-2\pi$, where $\psi_{|f|}$ is defined in (9). That is,

\lim_{n\to\infty}\frac{1}{mn}\sum_{j=1}^{mn}F(\lambda_{j}(Y_{(n,m)}T_{(n,m)}[f]))=\frac{1}{4m\pi}\int_{-\pi}^{\pi}\sum_{j=1}^{m}\big[F(\lambda_{j}(|f|(x)))+F(-\lambda_{j}(|f|(x)))\big]\,dx,

where $\lambda_{j}(|f|(x))$, $j=1,2,\dots,m$, are the eigenvalue functions of $|f|$.

As for the symmetrized matrix $\mathcal{Y}\mathcal{T}_{H}$ given in (5), we see by MazzaPestana2018 ; Ferrari2019 that its eigenvalues are distributed as $\pm|g_{\theta}|$. By Theorem 1, $\mathcal{Y}\mathcal{T}_{H}$ is therefore symmetric indefinite when $n$ is sufficiently large, which gives grounds for the use of MINRES in this work.

2.2 Matrix functions

In this subsection, several useful results on functions of matrices are presented.

If $F$ is analytic on a simply connected open region of the complex plane containing the interval $[-1,1]$, there exist ellipses with foci at $-1$ and $1$ such that $F$ is analytic in their interiors. Let $\alpha>1$ and $\beta>0$ be the half axes of such an ellipse, with $\sqrt{\alpha^{2}-\beta^{2}}=1$. Such an ellipse, denoted by $E_{\mathcal{X}}$, is completely specified once the number $\mathcal{X}:=\alpha+\beta$ is known.

Theorem 2 (Bernstein’s theorem).

(Benzi1999, Theorem 2.1) Let the function $F$ be analytic in the interior of the ellipse $\mathbb{E}_{\mathcal{X}}$ with $\mathcal{X}>1$ and continuous on $\mathbb{E}_{\mathcal{X}}$. In addition, suppose $F(x)$ is real for real $x$. Then, the best approximation error satisfies

E_{k}(F):=\inf\{\|F-p\|_{\infty}:\deg(p)\leq k\}\leq\frac{2M(\mathcal{X})}{\mathcal{X}^{k}(\mathcal{X}-1)},

where $\deg(p)$ denotes the degree of the polynomial $p(x)$ and

\|F-p\|_{\infty}=\max_{-1\leq x\leq 1}|F(x)-p(x)|,\quad M(\mathcal{X})=\max_{x\in\mathbb{E}_{\mathcal{X}}}\{|F(x)|\}.

Let $A_{n}$ be an $n\times n$ symmetric matrix and let $[\lambda_{\min},\lambda_{\max}]$ be the smallest interval containing $\sigma(A_{n})$. If we introduce the linear affine function

\psi(\lambda)=\frac{2\lambda-(\lambda_{\min}+\lambda_{\max})}{\lambda_{\max}-\lambda_{\min}},

then $\psi([\lambda_{\min},\lambda_{\max}])=[-1,1]$ and therefore the spectrum of the symmetric matrix

B_{n}:=\psi(A_{n})=\frac{2}{\lambda_{\max}-\lambda_{\min}}A_{n}-\frac{\lambda_{\min}+\lambda_{\max}}{\lambda_{\max}-\lambda_{\min}}I_{n}

is contained in $[-1,1]$. Given a function $f$ analytic on a simply connected region containing $[\lambda_{\min},\lambda_{\max}]$ and such that $f(\lambda)$ is real when $\lambda$ is real, the function $F=f\circ\psi^{-1}$ satisfies the assumptions of Bernstein's theorem.

In the special case of interest in this work, where $A_{n}$ is SPD and $f(x)=x^{-1/2}$, we have $\psi^{-1}(x)=\frac{(b-a)}{2}x+\frac{a+b}{2}$, and we apply Bernstein's result to the function

F(x)=\frac{1}{\sqrt{\frac{(b-a)}{2}x+\frac{a+b}{2}}}, \quad (10)

where $a=\lambda_{\min}(A_{n})$, $b=\lambda_{\max}(A_{n})$, and $1<\mathcal{X}<\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}$, with $\kappa={b}/{a}$ being the spectral condition number of $A_{n}$.

3 Main results

In this section, we present our results on the proposed preconditioner $\mathcal{P}_{H}$ defined in (6), whose design is based on the following spectral symbol:

|g_{\theta}|^{2}=A_{(0)}^{2}+A_{(1)}^{2}+A_{(0)}A_{(1)}e^{-\mathbf{i}x}+A_{(0)}A_{(1)}e^{\mathbf{i}x}=A_{(0)}^{2}+A_{(1)}^{2}+2A_{(0)}A_{(1)}\cos{(x)}. \quad (11)

Since $\mathcal{P}_{H}=\sqrt{T_{(n,m)}[|g_{\theta}|^{2}]}$, it can be regarded as an optimal preconditioner for $\mathcal{Y}\mathcal{T}_{H}$, in the sense of the spectral distribution theory developed in Ferrari2019. We mainly discuss MINRES combined with our proposed preconditioner, and related issues such as implementation and convergence analysis are addressed as well.

3.1 Preconditioning for heat equations

In this subsection, we provide a MINRES approach for $\mathcal{Y}\mathcal{T}_{H}\mathbf{u}=\mathcal{Y}\mathbf{f}_{H}$.

3.1.1 Implementations

We first discuss the fast computation of $\mathcal{Y}\mathcal{T}_{H}\mathbf{v}$ for any given vector $\mathbf{v}$. Since both $M_{m}$ and $K_{m}$ are sparse, $\mathcal{T}_{H}$ is sparse. Hence, computing the matrix-vector product $\mathcal{T}_{H}\mathbf{v}$ requires only linear complexity of $\mathcal{O}(mn)$. Computing $\mathcal{Y}\mathcal{T}_{H}\mathbf{v}$ has the same complexity, since $\mathcal{Y}$ simply imposes a reordering of the vector $\mathcal{T}_{H}\mathbf{v}$. Alternatively, since $\mathcal{T}_{H}$ itself is a block Toeplitz matrix, it is well known that the product $\mathcal{Y}\mathcal{T}_{H}\mathbf{v}$ can be computed in $\mathcal{O}(mn\log{n})$ operations using FFTs, with related storage of $\mathcal{O}(mn)$, where $m$ can be regarded as a fixed constant and $n$ is the effective dimensional parameter.

We first note that $\mathcal{P}_{H}$ admits the following decomposition:

\mathcal{P}_{H}=\sqrt{T_{(n,m)}[|g_{\theta}|^{2}]}=\sqrt{I_{n}\otimes(A_{(0)}^{2}+A_{(1)}^{2})+P_{n}\otimes 2(A_{(0)}A_{(1)})},

where

P_{n}=\begin{bmatrix}0&\frac{1}{2}&&&\\ \frac{1}{2}&0&\frac{1}{2}&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&\frac{1}{2}\\ &&&\frac{1}{2}&0\end{bmatrix} \quad (12)

is a tridiagonal Toeplitz matrix with eigendecomposition $P_{n}=S_{n}D_{n}S_{n}$, where $S_{n}=\sqrt{\frac{2}{n+1}}\big[\sin{(\frac{ij\pi}{n+1})}\big]^{n}_{i,j=1}\in\mathbb{R}^{n\times n}$ is the symmetric discrete sine transform matrix. Writing $A_{(0)}=U_{m}\Lambda_{(0)}U_{m}^{T}$ and $A_{(1)}=U_{m}\Lambda_{(1)}U_{m}^{T}$ with a common orthogonal eigenvector matrix $U_{m}$ (which exists by the commutation assumption on $M_{m}$ and $K_{m}$), we obtain

\mathcal{P}_{H}=\sqrt{I_{n}\otimes U_{m}(\Lambda^{2}_{(0)}+\Lambda^{2}_{(1)})U_{m}^{T}+P_{n}\otimes(U_{m}2\Lambda_{(0)}\Lambda_{(1)}U_{m}^{T})}
=\sqrt{(S_{n}\otimes U_{m})\big(I_{n}\otimes(\Lambda^{2}_{(0)}+\Lambda^{2}_{(1)})+D_{n}\otimes 2\Lambda_{(0)}\Lambda_{(1)}\big)(S_{n}\otimes U_{m})^{T}}
=(S_{n}\otimes U_{m})\sqrt{I_{n}\otimes(\Lambda^{2}_{(0)}+\Lambda^{2}_{(1)})+D_{n}\otimes 2\Lambda_{(0)}\Lambda_{(1)}}\,(S_{n}\otimes U_{m})^{T}.

The product $\mathbf{z}=\mathcal{P}_{H}^{-1}\mathbf{v}$ can be computed via the following three-step procedure:

  1. Compute $\widetilde{\mathbf{v}}=(S_{n}\otimes U_{m}^{T})\mathbf{v}$;

  2. Compute $\widetilde{\mathbf{z}}=\Big(\sqrt{I_{n}\otimes(\Lambda^{2}_{(0)}+\Lambda^{2}_{(1)})+D_{n}\otimes 2\Lambda_{(0)}\Lambda_{(1)}}\Big)^{-1}\widetilde{\mathbf{v}}$;

  3. Compute $\mathbf{z}=(S_{n}\otimes{U_{m}})\widetilde{\mathbf{z}}$.

When both $M_{m}$ and $K_{m}$ are diagonalizable by $U_{m}=S_{m}$, which holds when the spatial discretization is a finite difference/element method on a uniform square grid with a constant diffusion coefficient $a(x)$, the matrix-vector product $\mathbf{z}=\mathcal{P}_{H}^{-1}\mathbf{v}$ can be computed efficiently in $\mathcal{O}(nm(\log{m}+\log{n}))$ operations using fast sine transforms, with storage of order $\mathcal{O}(nm)$.
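As a concrete illustration, the following minimal Python sketch applies the three-step procedure above in the finite difference case with $U_{m}=S_{m}$ (so $M_{m}=I_{m}$ and $K_{m}$ is diagonalized by the discrete sine transform). The dense construction of $S_{n}$ and $S_{m}$ and the problem sizes are illustrative assumptions only; in practice Steps 1 and 3 would be carried out with fast sine transforms.

```python
import numpy as np

def dst_matrix(n):
    """Symmetric orthogonal DST matrix S_n = sqrt(2/(n+1)) [sin(ij*pi/(n+1))]."""
    i = np.arange(1, n + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(i, i) * np.pi / (n + 1))

def apply_PH_inv(v, m, n, tau, theta):
    """Three-step application of P_H^{-1} from (6), assuming M_m = I_m and
    K_m = (1/h^2) tridiag(-1,2,-1), both diagonalized by U_m = S_m."""
    h = 1.0 / (m + 1)
    Sm, Sn = dst_matrix(m), dst_matrix(n)
    lam_K = (2.0 - 2.0 * np.cos(np.arange(1, m + 1) * np.pi / (m + 1))) / h**2  # eig(K_m)
    lam_A0 = 1.0 + theta * tau * lam_K            # eig(A_(0)) = eig(M_m + theta*tau*K_m)
    lam_A1 = -1.0 + (1.0 - theta) * tau * lam_K   # eig(A_(1))
    d_n = np.cos(np.arange(1, n + 1) * np.pi / (n + 1))   # eigenvalues of P_n in (12)

    V = v.reshape(n, m)                            # row k holds block k of v
    V = Sn @ (V @ Sm)                              # Step 1: (S_n (x) S_m^T) v
    # Step 2: divide entry (k, j) by the diagonalized symbol's square root
    denom = np.sqrt(lam_A0**2 + lam_A1**2 + 2.0 * np.outer(d_n, lam_A0 * lam_A1))
    V = V / denom
    V = Sn @ (V @ Sm)                              # Step 3: (S_n (x) S_m) z~
    return V.reshape(-1)

z = apply_PH_inv(np.random.rand(16 * 8), m=8, n=16, tau=1e-2, theta=0.5)
```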

When the diffusion coefficient varies in space, the matrix $K_{m}$ is no longer a Toeplitz matrix and is not diagonalizable by $S_{m}$. In the one-dimensional case, with spatial grid size $h=\frac{1}{m+1}$, we have the following coefficient matrix

K_{m}=\frac{1}{h^{2}}\begin{bmatrix}a_{\frac{1}{2}}+a_{\frac{3}{2}}&-a_{\frac{3}{2}}&&&\\ -a_{\frac{3}{2}}&a_{\frac{3}{2}}+a_{\frac{5}{2}}&-a_{\frac{5}{2}}&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&-a_{\frac{2m-1}{2}}\\ &&&-a_{\frac{2m-1}{2}}&a_{\frac{2m-1}{2}}+a_{\frac{2m+1}{2}}\end{bmatrix}, \quad (13)

where $a_{j}:=a(x_{\min}+jh)$ denotes the diffusion coefficient sampled at the grid point $x_{\min}+jh$. In this case, we can construct a preconditioner for $K_{m}$ by averaging its entries along the three diagonals, so as to obtain a tridiagonal symmetric Toeplitz matrix. Clearly, such a preconditioner can be quickly diagonalized by $S_{m}$, and thus the overall preconditioner $\mathcal{P}_{H}$ can be diagonalized as well.
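For illustration, the short sketch below (a hypothetical Python helper, not code from the paper) builds the variable-coefficient matrix $K_{m}$ of (13) for a given coefficient function $a(x)$ and its diagonal-averaged tridiagonal Toeplitz approximation, which is then diagonalizable by $S_{m}$.

```python
import numpy as np
import scipy.sparse as sp

def variable_coefficient_Km(a, m, x_min=0.0):
    """Assemble K_m in (13) for a diffusion coefficient a(x) on a uniform grid."""
    h = 1.0 / (m + 1)
    a_half = a(x_min + (np.arange(m + 1) + 0.5) * h)   # a_{1/2}, a_{3/2}, ..., a_{m+1/2}
    diag = a_half[:-1] + a_half[1:]
    off = -a_half[1:-1]
    return sp.diags([off, diag, off], [-1, 0, 1]) / h**2

def averaged_toeplitz_preconditioner(Km):
    """Average the three diagonals of K_m to get a symmetric tridiagonal
    Toeplitz matrix, which the sine transform S_m diagonalizes."""
    m = Km.shape[0]
    d0 = Km.diagonal(0).mean()
    d1 = Km.diagonal(1).mean()
    return sp.diags([d1, d0, d1], [-1, 0, 1], shape=(m, m))

Km = variable_coefficient_Km(lambda x: 1.0 + 0.5 * x, m=64)
Km_toep = averaged_toeplitz_preconditioner(Km)
```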

3.1.2 Eigenvalue analysis of $\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}$

The following proposition guarantees the effectiveness of $\big(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\big)^{-1}$ as a preconditioner for $\mathcal{Y}\mathcal{T}_{H}$, showing that the eigenvalues of $\big(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\big)^{-1}\mathcal{Y}\mathcal{T}_{H}$ are clustered around $\pm 1$, which ensures fast convergence of MINRES.

Proposition 3.

Let $\mathcal{T}_{H}$ and $\mathcal{Y}\mathcal{T}_{H}$ be defined in (2) and (5), respectively. Then

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=\mathcal{Q}_{0},

where $\mathcal{Q}_{0}$ is both symmetric and orthogonal, and has only $-1$ and $1$ as eigenvalues.

Proof: Writing the eigendecomposition of the symmetric matrix $\mathcal{Y}\mathcal{T}_{H}$ as $\mathcal{Y}\mathcal{T}_{H}=E^{\top}\mathcal{Q}E$, we have

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=\left(\sqrt{(\mathcal{Y}\mathcal{T}_{H})^{\top}\mathcal{Y}\mathcal{T}_{H}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=\left(\sqrt{(\mathcal{Y}\mathcal{T}_{H})^{2}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=E^{\top}(\mathcal{Q}^{2})^{-1/2}EE^{\top}\mathcal{Q}E=E^{\top}\hat{\mathcal{Q}}E=\mathcal{Q}_{0},

where $\hat{\mathcal{Q}}$ is a diagonal matrix whose entries are either $-1$ or $1$. Therefore, $\mathcal{Q}_{0}$ is symmetric and orthogonal, and hence its eigenvalues are only $-1$ and $1$.

Now, we turn our focus to the following lemma and proposition, which will be helpful to show our main result.

Lemma 4.

Let $\mathcal{T}_{H},\mathcal{P}_{H}\in\mathbb{R}^{mn\times mn}$ be defined in (2) and (6), respectively. Then,

\textrm{rank}((\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K}-\mathcal{P}_{H}^{2K})\leq Km,

for any positive integer $K$, provided that $n>Km$.

Proof: First, we observe that

\mathcal{T}_{H}^{\top}\mathcal{T}_{H}-\mathcal{P}_{H}^{2}=\begin{bmatrix}&&\\ &&\\ &&-A_{(1)}^{2}\end{bmatrix},

i.e., the difference is zero except for the block $-A_{(1)}^{2}$ in its Southeast corner.

Second, exploiting the simple structures of both $\mathcal{T}_{H}^{\top}\mathcal{T}_{H}$ and $\mathcal{P}_{H}^{2}$, direct computations show that

(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{n_{\alpha}}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H}-\mathcal{P}_{H}^{2})\mathcal{P}_{H}^{2n_{\beta}}=\begin{bmatrix}&&\\ &&\\ &&\begin{matrix}*&\cdots&*\\ \vdots&&\vdots\\ *&\cdots&*\end{matrix}\end{bmatrix} \quad (14)

for nonnegative integers $n_{\alpha}$ and $n_{\beta}$, where $*$ represents a nonzero entry. Namely, $(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{n_{\alpha}}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H}-\mathcal{P}_{H}^{2})\mathcal{P}_{H}^{2n_{\beta}}$ is a block matrix whose only nonzero block sits in its Southeast corner and is of size $(n_{\alpha}+1)m\times(n_{\beta}+1)m$. Thus,

(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K}-\mathcal{P}_{H}^{2K}=\sum_{i=0}^{K-1}\big((\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K-i}\mathcal{P}_{H}^{2i}-(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K-i-1}\mathcal{P}_{H}^{2i+2}\big)=\sum_{i=0}^{K-1}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K-i-1}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H}-\mathcal{P}_{H}^{2})\mathcal{P}_{H}^{2i}

is also a block matrix whose only nonzero block sits in its Southeast corner and is of size $Km\times Km$, provided that $n>Km$. Hence, we have $\textrm{rank}((\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K}-\mathcal{P}_{H}^{2K})\leq Km$.

Remark 1.

The computational lemma used in the proof of Lemma 4 was first given in (doi:10.1137/080720280, Lemma 3.11).

Proposition 5.

Let $\mathcal{T}_{H},\mathcal{P}_{H}\in\mathbb{R}^{mn\times mn}$ be defined in (2) and (6), respectively. Then for any $\epsilon>0$ there exists an integer $K$ such that for all $n>Km$

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-\mathcal{P}_{H}^{-1}=\mathcal{E}+\mathcal{R}_{1},

where $\|\mathcal{E}\|_{2}\leq\epsilon$ and $\textrm{rank}(\mathcal{R}_{1})\leq Km$.

Proof: Let $f(x)=x^{-1/2}$ and let $F(x)$ be defined by (10). By Theorem 2, there exists a polynomial $p_{K}$ of degree at most $K$ such that

\|\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})\|_{2}=\|(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{-\frac{1}{2}}-p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})\|_{2}=\max_{x\in\sigma(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})}|F(x)-p_{K}(x)|\leq\|F-p_{K}\|_{\infty}\leq\frac{2M(\mathcal{X}_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}})}{\mathcal{X}_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}-1}\cdot\frac{1}{\mathcal{X}_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}^{K}},

\|\mathcal{P}_{H}^{-1}-p_{K}(\mathcal{P}_{H}^{2})\|_{2}=\|(\mathcal{P}_{H}^{2})^{-\frac{1}{2}}-p_{K}(\mathcal{P}_{H}^{2})\|_{2}=\max_{x\in\sigma(\mathcal{P}_{H}^{2})}|F(x)-p_{K}(x)|\leq\|F-p_{K}\|_{\infty}\leq\frac{2M(\mathcal{X}_{\mathcal{P}_{H}^{2}})}{\mathcal{X}_{\mathcal{P}_{H}^{2}}-1}\cdot\frac{1}{\mathcal{X}_{\mathcal{P}_{H}^{2}}^{K}},

where

1<\mathcal{X}_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}<\frac{\sqrt{\kappa_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}}+1}{\sqrt{\kappa_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}}-1},\qquad 1<\mathcal{X}_{\mathcal{P}_{H}^{2}}<\frac{\sqrt{\kappa_{\mathcal{P}_{H}^{2}}}+1}{\sqrt{\kappa_{\mathcal{P}_{H}^{2}}}-1},

and $\kappa_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}$ and $\kappa_{\mathcal{P}_{H}^{2}}$ are the condition numbers of $\mathcal{T}_{H}^{\top}\mathcal{T}_{H}$ and $\mathcal{P}_{H}^{2}$, respectively. Thus, for any $\epsilon>0$ there exists an integer $K$ such that

\|\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})\|_{2}\leq\epsilon\quad\textrm{and}\quad\|\mathcal{P}_{H}^{-1}-p_{K}(\mathcal{P}_{H}^{2})\|_{2}\leq\epsilon.

Also, we have

p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})-p_{K}(\mathcal{P}_{H}^{2})=\underbrace{\sum_{i=0}^{K}a_{i}((\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{i}-\mathcal{P}_{H}^{2i})}_{=:\mathcal{R}_{1}}.

By Lemma 4, $\mathcal{R}_{1}$ has the same sparsity structure as $(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K}-\mathcal{P}_{H}^{2K}$. Consequently, we deduce $\textrm{rank}(\mathcal{R}_{1})\leq Km$.

We then obtain

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-\mathcal{P}_{H}^{-1}=\underbrace{(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{-\frac{1}{2}}-p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})+p_{K}(\mathcal{P}_{H}^{2})-\mathcal{P}_{H}^{-1}}_{=:\mathcal{E}}+\underbrace{p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})-p_{K}(\mathcal{P}_{H}^{2})}_{=:\mathcal{R}_{1}},

where $\|\mathcal{E}\|_{2}\leq 2\epsilon$ and $\textrm{rank}(\mathcal{R}_{1})\leq Km$. Therefore, the proof is concluded.

Theorem 6.

Let $\mathcal{Y}\mathcal{T}_{H},\mathcal{P}_{H}\in\mathbb{R}^{mn\times mn}$ be defined in (5) and (6), respectively. Then for any $\epsilon>0$ there exists an integer $K$ such that for all $n>Km$

\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}=\mathcal{Q}_{H}+\mathcal{E}_{H}+\mathcal{R}_{H},

where $\mathcal{Q}_{H}$ is both symmetric and orthogonal, $\|\mathcal{E}_{H}\|_{2}\leq\epsilon$, and $\textrm{rank}(\mathcal{R}_{H})\leq Km$.

Proof: By Proposition 3, we can write

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=\mathcal{Q}_{0}.

By Proposition 5, we then have

\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}=\Big(\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-\mathcal{E}-\mathcal{R}_{1}\Big)\mathcal{Y}\mathcal{T}_{H}=\mathcal{Q}_{0}\underbrace{-\mathcal{E}\mathcal{Y}\mathcal{T}_{H}}_{=:\mathcal{E}_{H}}\underbrace{-\mathcal{R}_{1}\mathcal{Y}\mathcal{T}_{H}}_{=:\mathcal{R}_{H}},

where

\textrm{rank}(\mathcal{R}_{H})=\textrm{rank}(\mathcal{R}_{1}\mathcal{Y}\mathcal{T}_{H})\leq Km

and

\|\mathcal{E}_{H}\|_{2}=\|-\mathcal{E}\mathcal{Y}\mathcal{T}_{H}\|_{2}\leq\|\mathcal{Y}\mathcal{T}_{H}\|_{2}\,\epsilon.

Note that

\|\mathcal{Y}\mathcal{T}_{H}\|_{2}=\|\mathcal{T}_{H}\|_{2}\leq\|g_{\theta}\|_{\infty},

by the general inequalities in (Serra_Tilli_2002, Corollary 4.2), noticing that the Schatten $\infty$-norm in that paper is exactly the spectral norm $\|\cdot\|_{2}$. Furthermore, we have the strict inequality $\|\mathcal{Y}\mathcal{T}_{H}\|_{2}=\|\mathcal{T}_{H}\|_{2}<\|g_{\theta}\|_{\infty}$ whenever the infimum of $|g_{\theta}|$ is strictly smaller than $\|g_{\theta}\|_{\infty}$, as often occurs, including in our case.

Hence, $\|\mathcal{Y}\mathcal{T}_{H}\|_{2}$ is uniformly bounded with respect to $n$ and the proof is concluded.

By (BRANDTS20103100, Corollary 3) and Theorem 1, we know from Theorem 6 that the preconditioned matrix sequence $\{\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}\}_{n}$ has eigenvalues clustered around $\pm 1$, with a number of outliers independent of $n$. Hence, the convergence is independent of the number of time steps, and we can expect fast MINRES convergence in exact arithmetic with $\mathcal{P}_{H}$ as a preconditioner.

3.1.3 Eigenvalue analysis of $(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}$

For a complete eigenvalue analysis, we examine the preconditioned normal equation matrix $(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}$, even though we do not use the conjugate gradient method on the normal equations (CGNE) in this work.

As for implementation, the product $(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}\mathbf{v}$ for any vector $\mathbf{v}$ can be computed efficiently using arguments similar to those in Section 3.1.1.

In the following result, we show that the eigenvalues of $(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}$ are clustered around $1$, with a number of outliers independent of the number of time steps $n$, which implies fast convergence of CGNE.

Theorem 7.

Let $\mathcal{T}_{H},\mathcal{P}_{H}\in\mathbb{R}^{mn\times mn}$ be defined in (2) and (6), respectively. Then,

(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}=I_{mn}+\mathcal{\widetilde{R}}_{H},

where $I_{mn}$ is the $mn$ by $mn$ identity matrix and $\textrm{rank}(\mathcal{\widetilde{R}}_{H})\leq m$.

Proof: Direct computations show that

\mathcal{P}_{H}^{2}-\mathcal{T}_{H}^{T}\mathcal{T}_{H}=\underbrace{\begin{bmatrix}O_{m}&&&\\ &\ddots&&\\ &&O_{m}&\\ &&&A^{2}_{(1)}\end{bmatrix}}_{=:\mathcal{\widetilde{R}}_{0}},

where $\textrm{rank}(\mathcal{\widetilde{R}}_{0})\leq m$. This matrix identity further leads to

I_{mn}-\mathcal{T}_{H}\mathcal{P}_{H}^{-2}\mathcal{T}_{H}^{T}=\underbrace{\mathcal{T}_{H}\mathcal{P}_{H}^{-2}\mathcal{\widetilde{R}}_{0}\mathcal{T}_{H}^{-1}}_{=:\mathcal{\widetilde{R}}_{1}}, \quad (15)

where $\textrm{rank}(\mathcal{\widetilde{R}}_{1})=\textrm{rank}(\mathcal{\widetilde{R}}_{0})\leq m$.

We now consider the normal equation matrix associated with $\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}$, namely

(\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}=\mathcal{Y}\mathcal{T}_{H}\mathcal{P}_{H}^{-2}\mathcal{T}_{H}^{T}\mathcal{Y}=\mathcal{Y}(I_{mn}-\mathcal{\widetilde{R}}_{1})\mathcal{Y}=I_{mn}\underbrace{-\mathcal{Y}\mathcal{\widetilde{R}}_{1}\mathcal{Y}}_{=:\mathcal{\widetilde{R}}_{H}},

where $\textrm{rank}(\mathcal{\widetilde{R}}_{H})=\textrm{rank}(\mathcal{\widetilde{R}}_{1})\leq m$.

Notice that each rank-one term in $\mathcal{\widetilde{R}}_{H}$ can perturb at most one eigenvalue away from $1$. Therefore, by Theorem 7, at least $m(n-1)$ eigenvalues of the normal equation matrix are equal to $1$, which implies rapid convergence of CGNE.

3.1.4 Modified Preconditioner

Following our spectral symbol matching strategy, the construction of the modified preconditioner $\mathcal{P}_{\theta}$ is motivated by the following approximation, which does not require fast diagonalizability of $M_{m}$ and $K_{m}$.

Since the eigenvalues of the preconditioned matrix sequence $\{\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}\}_{n}$ associated with the original preconditioner $\mathcal{P}_{H}$ are clustered around $\pm 1$, where $g_{\theta}(x)=A_{(0)}+A_{(1)}e^{\mathbf{i}x}$ is defined by (3), we approximate the spectral symbol

\sqrt{|g_{\theta}|^{2}}=\sqrt{A_{(0)}^{2}+A_{(1)}^{2}+2A_{(0)}A_{(1)}\cos(x)},

where

A_{(0)}^{2}+A_{(1)}^{2}+2A_{(0)}A_{(1)}\cos(x)=(2-2\cos(x))M_{m}^{2}+(\theta^{2}+(1-\theta)^{2}+2\theta(1-\theta)\cos(x))\tau^{2}K_{m}^{2}+2((2\theta-1)+(1-2\theta)\cos(x))\tau K_{m}M_{m},

by the following matrix-valued function:

\widetilde{g_{\theta}}=\sqrt{2-2\cos(x)}\,M_{m}+\tau\sqrt{\theta^{2}+(1-\theta)^{2}+2\theta(1-\theta)\cos(x)}\,K_{m}.

Hence, we expect the eigenvalues of $\{\mathcal{P}_{\theta}^{-1}\mathcal{Y}\mathcal{T}_{H}\}_{n}$ to also be clustered around $\pm 1$.

In matrix form, the preconditioner generated by $\widetilde{g_{\theta}}$ is precisely our proposed modified preconditioner $\mathcal{P}_{\theta}$ defined by (7).

The product $\mathcal{P}_{\theta}^{-1}\mathbf{v}$ for any vector $\mathbf{v}$ can be computed via the following three-step procedure, which does not require fast diagonalizability of $M_{m}$ and $K_{m}$. Notice that both $\mathcal{H}$ and $\mathcal{H}_{\theta}$ in (7) are $\tau$ (sine transform) matrices. Hence,

\mathcal{P}_{\theta}=\mathcal{H}\otimes M_{m}+\mathcal{H}_{\theta}\otimes\tau K_{m}=(S_{n}\Lambda_{\mathcal{H}}S_{n})\otimes M_{m}+(S_{n}\Lambda_{\mathcal{H}_{\theta}}S_{n})\otimes\tau K_{m}=(S_{n}\otimes I_{m})(\Lambda_{\mathcal{H}}\otimes M_{m}+\Lambda_{\mathcal{H}_{\theta}}\otimes\tau K_{m})(S_{n}\otimes I_{m})^{T},

where $\Lambda_{\mathcal{H}}$ and $\Lambda_{\mathcal{H}_{\theta}}$ are the diagonal matrices containing the eigenvalues of $\mathcal{H}$ and $\mathcal{H}_{\theta}$, respectively.

Hence, the computation of $\mathbf{z}=\mathcal{P}_{\theta}^{-1}\mathbf{v}$ can be implemented via the following three steps:

  1. Compute $\widetilde{\mathbf{v}}=(S_{n}\otimes I_{m})\mathbf{v}$;

  2. Compute $\widetilde{\mathbf{z}}=(\Lambda_{\mathcal{H}}\otimes M_{m}+\Lambda_{\mathcal{H}_{\theta}}\otimes\tau K_{m})^{-1}\widetilde{\mathbf{v}}$;

  3. Compute $\mathbf{z}=(S_{n}\otimes{I_{m}})\widetilde{\mathbf{z}}$.

Steps 1 and 3 can be computed efficiently via fast sine transforms in $\mathcal{O}(mn\log{n})$ operations. As for Step 2, the shifted Laplacian systems can be solved efficiently, for example using multigrid methods doi:10.1137/15M102085X. For more details regarding such an efficient implementation, we refer to (doi:10.1137/16M1062016, Section 3.1).
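As an illustration of these three steps, the sketch below (a hypothetical Python helper using the closed-form eigenvalues of $\mathcal{H}$ and $\mathcal{H}_{\theta}$ and sparse direct solves for Step 2; the sizes and coefficients are assumptions for demonstration only) applies $\mathcal{P}_{\theta}^{-1}$ without requiring $K_{m}$ to be diagonalized.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def apply_Ptheta_inv(v, Mm, Km, n, tau, theta):
    """Apply P_theta^{-1} from (7).  Lambda_H and Lambda_{H_theta} follow from the
    known eigenvalues of tridiag(-1,2,-1) and of the matrix under the square root
    in H_theta; Step 2 solves n independent sparse 'shifted Laplacian' systems."""
    m = Mm.shape[0]
    c = np.cos(np.arange(1, n + 1) * np.pi / (n + 1))
    lam_H = np.sqrt(2.0 - 2.0 * c)                                             # eig(H)
    lam_Ht = np.sqrt(theta**2 + (1 - theta)**2 + 2 * theta * (1 - theta) * c)  # eig(H_theta)

    i = np.arange(1, n + 1)
    Sn = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(i, i) * np.pi / (n + 1))     # DST matrix

    V = v.reshape(n, m)
    V = Sn @ V                                   # Step 1: (S_n (x) I_m) v
    Z = np.empty_like(V)
    for k in range(n):                           # Step 2: block-diagonal solves
        Ak = lam_H[k] * Mm + lam_Ht[k] * tau * Km
        Z[k] = spla.spsolve(Ak.tocsc(), V[k])
    return (Sn @ Z).reshape(-1)                  # Step 3: (S_n (x) I_m) z~

# Illustrative usage with M_m = I and a sparse K_m (assumed values only):
m, n, tau, theta = 32, 16, 1e-2, 0.5
h = 1.0 / (m + 1)
Mm = sp.identity(m, format="csr")
Km = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m), format="csr") / h**2
z = apply_Ptheta_inv(np.random.rand(m * n), Mm, Km, n, tau, theta)
```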

4 Extension to multistep methods

Our proposed preconditioning method can be easily extended to high-order in time discretization schemes. Consider again (1); using an $l$-th order backward difference scheme in time, we have the following all-at-once system $\widehat{\mathcal{T}}\mathbf{u}=\widehat{\mathbf{f}}$ to solve for $\mathbf{u}$, where

\widehat{\mathcal{T}}=\begin{bmatrix}A_{(0)}&&&&&\\ A_{(1)}&\ddots&&&&\\ \vdots&\ddots&\ddots&&&\\ A_{(l)}&\ddots&\ddots&\ddots&&\\ &\ddots&\ddots&\ddots&\ddots&\\ &&A_{(l)}&\cdots&A_{(1)}&A_{(0)}\end{bmatrix} \quad (16)

is an $mn$ by $mn$ block Toeplitz matrix, which is generated by the matrix-valued function

\widehat{g}(x)=A_{(0)}+A_{(1)}e^{\mathbf{i}x}+\cdots+A_{(l)}e^{\mathbf{i}lx}, \quad (17)

and $\widehat{\mathbf{f}}$ is an appropriate vector resulting from the adopted discretization scheme. Note that the matrices $A_{(k)}$, $k=1,\dots,l$, are composed of the mass matrix $M_{m}$ and the stiffness matrix $K_{m}$.

Our approach is to first permute (16) to obtain the symmetric (indefinite) matrix

\mathcal{Y}\widehat{\mathcal{T}}=\begin{bmatrix}&&&A_{(l)}&\cdots&A_{(1)}&A_{(0)}\\ &&\iddots&\iddots&\iddots&\iddots&\\ &\iddots&\iddots&\iddots&\iddots&&\\ A_{(l)}&\iddots&\iddots&\iddots&&&\\ \vdots&\iddots&\iddots&&&&\\ A_{(1)}&\iddots&&&&&\\ A_{(0)}&&&&&&\end{bmatrix}, \quad (18)

and then construct a corresponding sine transform based preconditioner $\widehat{\mathcal{P}}$ for it. We first compute the following matrix-valued function from (17):

|\widehat{g}(x)|^{2}=\underbrace{A_{(0)}^{2}+A_{(1)}^{2}+\cdots+A_{(l)}^{2}}_{=:q_{0}(A)}+\underbrace{(A_{(0)}A_{(1)}+A_{(1)}A_{(2)}+\cdots+A_{(l-1)}A_{(l)})}_{=:q_{1}(A)}2\cos{(x)}+\cdots+\underbrace{(A_{(0)}A_{(l)})}_{=:q_{l}(A)}2\cos{(lx)},

where $q_{k}$, $k=0,\dots,l$, are matrix polynomials composed of $A_{(k)}$, $k=0,\dots,l$. Now, by expressing $\cos{(kx)}$, $k=1,\dots,l$, in terms of powers of $\cos{(x)}$, we can further rewrite $|\widehat{g}|^{2}$ as

|\widehat{g}(x)|^{2}=\overline{q}_{0}(A)+\overline{q}_{1}(A)\cos{(x)}+\cdots+\overline{q}_{l}(A)\cos^{l}{(x)},

where $\overline{q}_{k}$, $k=0,\dots,l$, are certain matrix polynomials composed of $A_{(k)}$, $k=0,\dots,l$.

We propose the following SPD preconditioner, which approximates the symbol $|\widehat{g}|$ in the eigenvalue sense, for $\mathcal{Y}\widehat{\mathcal{T}}$:

\widehat{\mathcal{P}}=\sqrt{I_{n}\otimes\overline{q}_{0}(A)+P_{n}\otimes\overline{q}_{1}(A)+\cdots+P_{n}^{l}\otimes\overline{q}_{l}(A)}, \quad (20)

where $P_{n}$ is the matrix given in (12).

Note that

\widehat{\mathcal{P}}=\sqrt{I_{n}\otimes U_{m}\overline{q}_{0}(\Lambda)U_{m}^{T}+P_{n}\otimes U_{m}\overline{q}_{1}(\Lambda)U_{m}^{T}+\cdots+P_{n}^{l}\otimes U_{m}\overline{q}_{l}(\Lambda)U_{m}^{T}}
=\sqrt{(S_{n}\otimes U_{m})\big(I_{n}\otimes\overline{q}_{0}(\Lambda)+D_{n}\otimes\overline{q}_{1}(\Lambda)+\cdots+D_{n}^{l}\otimes\overline{q}_{l}(\Lambda)\big)(S_{n}\otimes U_{m})^{T}}
=(S_{n}\otimes U_{m})\sqrt{I_{n}\otimes\overline{q}_{0}(\Lambda)+D_{n}\otimes\overline{q}_{1}(\Lambda)+\cdots+D_{n}^{l}\otimes\overline{q}_{l}(\Lambda)}\,(S_{n}\otimes U_{m})^{T}.

Hence, the product $\widehat{\mathcal{P}}^{-1}\mathbf{v}$ for any vector $\mathbf{v}$ can be computed effectively using a three-step procedure analogous to that in Section 3.1.1.
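A minimal sketch of this generalized solve is given below (hypothetical Python, assuming $U_{m}=S_{m}$ and given arrays `qbar[k]` holding the eigenvalues of $\overline{q}_{k}(\Lambda)$); it only illustrates how the powers $D_{n}^{k}$ of the eigenvalues of $P_{n}$ enter Step 2.

```python
import numpy as np

def apply_Phat_inv(v, qbar, n):
    """Apply the multistep preconditioner inverse of (20) in the diagonalized basis.
    qbar is a list of length l+1; qbar[k] holds the m eigenvalues of qbar_k(Lambda),
    and U_m = S_m is assumed so that S_n (x) S_m diagonalizes P_hat."""
    m = qbar[0].size
    im = np.arange(1, m + 1)
    jn = np.arange(1, n + 1)
    Sm = np.sqrt(2.0 / (m + 1)) * np.sin(np.outer(im, im) * np.pi / (m + 1))
    Sn = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(jn, jn) * np.pi / (n + 1))
    d = np.cos(jn * np.pi / (n + 1))                         # eigenvalues of P_n

    V = Sn @ (v.reshape(n, m) @ Sm)                          # Step 1
    # Step 2: entry (k, j) of the diagonalized P_hat^2 is sum_r d_k^r * qbar_r[j]
    denom = np.sqrt(sum(np.outer(d**r, qr) for r, qr in enumerate(qbar)))
    V = V / denom
    return (Sn @ (V @ Sm)).reshape(-1)                       # Step 3
```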

4.1 Eigenvalue analysis of $\widehat{\mathcal{P}}^{-1}\mathcal{Y}\widehat{\mathcal{T}}$

As before, we first introduce the matrix $\big(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\big)^{-1}$ as an auxiliary tool for showing that the eigenvalues of $\widehat{\mathcal{P}}^{-1}\mathcal{Y}\widehat{\mathcal{T}}$ are clustered. The following proposition guarantees the effectiveness of $\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}$ as a preconditioner for $\mathcal{Y}\widehat{\mathcal{T}}$, showing that the eigenvalues of $\{\big(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\big)^{-1}\mathcal{Y}\widehat{\mathcal{T}}\}_{n}$ are clustered around $\pm 1$, which leads to fast convergence of MINRES.

Proposition 8.

Let $\mathcal{Y}\widehat{\mathcal{T}},\widehat{\mathcal{T}}\in\mathbb{R}^{mn\times mn}$ be defined in (18) and (16), respectively. Then, $\big(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\big)^{-1}\mathcal{Y}\widehat{\mathcal{T}}$ is symmetric and orthogonal, and its eigenvalues are only $1$ and $-1$.

Proof: Writing the eigendecomposition of the symmetric matrix $\mathcal{Y}\widehat{\mathcal{T}}$ as $\mathcal{Y}\widehat{\mathcal{T}}=E^{\top}\mathcal{Q}_{0}E$, we have

\left(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\right)^{-1}\mathcal{Y}\widehat{\mathcal{T}}=\left(\sqrt{(\mathcal{Y}\widehat{\mathcal{T}})^{\top}\mathcal{Y}\widehat{\mathcal{T}}}\right)^{-1}\mathcal{Y}\widehat{\mathcal{T}}=\left(\sqrt{(\mathcal{Y}\widehat{\mathcal{T}})^{2}}\right)^{-1}\mathcal{Y}\widehat{\mathcal{T}}=E^{\top}(\mathcal{Q}_{0}^{2})^{-1/2}EE^{\top}\mathcal{Q}_{0}E=E^{\top}\widehat{\mathcal{Q}_{0}}E,

where $\widehat{\mathcal{Q}_{0}}$ is a diagonal matrix whose entries are either $-1$ or $1$. Therefore, $\big(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\big)^{-1}\mathcal{Y}\widehat{\mathcal{T}}$ is symmetric and orthogonal, and hence its eigenvalues are only $-1$ and $1$.

In addition, we require the following lemma and proposition for showing our main result.

Lemma 9.

Let $\widehat{\mathcal{T}},\widehat{\mathcal{P}}\in\mathbb{R}^{mn\times mn}$ be defined in (16) and (20), respectively. Then,

\textrm{rank}((\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{K}-\widehat{\mathcal{P}}^{2K})\leq 2Klm

for any positive integer $K$, provided that $n>2Klm$.

Proof: First, we observe from the expansion of $|\widehat{g}(x)|^{2}$ above that

{T}_{(n,m)}[|\widehat{g}|^{2}]={T}_{(n,m)}[q_{0}(A)+q_{1}(A)2\cos{(x)}+\cdots+q_{l}(A)2\cos{(lx)}]=\begin{bmatrix}q_{0}(A)&q_{1}(A)&\cdots&q_{l}(A)&&&\\ q_{1}(A)&\ddots&\ddots&\ddots&\ddots&&\\ \vdots&\ddots&\ddots&\ddots&\ddots&\ddots&\\ q_{l}(A)&\ddots&\ddots&\ddots&\ddots&\ddots&q_{l}(A)\\ &\ddots&\ddots&\ddots&\ddots&\ddots&\vdots\\ &&\ddots&\ddots&\ddots&\ddots&q_{1}(A)\\ &&&q_{l}(A)&\cdots&q_{1}(A)&q_{0}(A)\end{bmatrix}.

Note that we also have

{T}_{(n,m)}[|\widehat{g}|^{2}]={T}_{(n,m)}[\overline{q}_{0}(A)+\overline{q}_{1}(A)\cos{(x)}+\overline{q}_{2}(A)\cos^{2}{(x)}+\cdots+\overline{q}_{l}(A)\cos^{l}{(x)}]
=I_{n}\otimes\overline{q}_{0}(A)+T_{(n,1)}[\cos{(x)}]\otimes\overline{q}_{1}(A)+T_{(n,1)}[\cos^{2}{(x)}]\otimes\overline{q}_{2}(A)+\cdots+T_{(n,1)}[\cos^{l}{(x)}]\otimes\overline{q}_{l}(A)
=I_{n}\otimes\overline{q}_{0}(A)+T_{(n,1)}[\cos{(x)}]\otimes\overline{q}_{1}(A)+(T_{(n,1)}[\cos{(x)}]^{2}+{R}_{(2)})\otimes\overline{q}_{2}(A)+\cdots+(T_{(n,1)}[\cos{(x)}]^{l}+{R}_{(l)})\otimes\overline{q}_{l}(A),

where ${R}_{(k)}$, $k=2,\dots,l$, are low-rank matrices in the sense that $\textrm{rank}({R}_{(k)})\leq 2(k-1)$.

Recalling from (20) that $\mathcal{\widehat{P}}^{2}=I_{n}\otimes\overline{q}_{0}(A)+P_{n}\otimes\overline{q}_{1}(A)+\cdots+P_{n}^{l}\otimes\overline{q}_{l}(A)$ and noting that $P_{n}=T_{(n,1)}[\cos{(x)}]$, we have

{T}_{(n,m)}[|\widehat{g}|^{2}]=\underbrace{I_{n}\otimes\overline{q}_{0}(A)+P_{n}\otimes\overline{q}_{1}(A)+P_{n}^{2}\otimes\overline{q}_{2}(A)+\cdots+P_{n}^{l}\otimes\overline{q}_{l}(A)}_{=\mathcal{\widehat{P}}^{2}}+\underbrace{{R}_{(2)}\otimes\overline{q}_{2}(A)+\cdots+{R}_{(l)}\otimes\overline{q}_{l}(A)}_{=:\widehat{\mathcal{R}}_{1}}.

One can further show by direct computations that $\widehat{\mathcal{R}}_{1}$ has the following structure:

\widehat{\mathcal{R}}_{1}=\begin{bmatrix}\begin{matrix}*&\cdots&*\\ \vdots&&\vdots\\ *&\cdots&*\end{matrix}&&\\ &&\\ &&\begin{matrix}*&\cdots&*\\ \vdots&&\vdots\\ *&\cdots&*\end{matrix}\end{bmatrix},

where $*$ represents a nonzero entry, since this block structure is determined by the last term ${R}_{(l)}\otimes\overline{q}_{l}(A)$. Namely, $\widehat{\mathcal{R}}_{1}$ is a block matrix with one block in its Northwest corner and one in its Southeast corner, each of size $lm\times lm$.

We can now examine the structure of the matrix 𝒯^𝒯^𝒫^2{\color[rgb]{0,0,0}\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}-\widehat{\mathcal{P}}^{2}. By (16), we have

𝒯^𝒯^𝒫^2\displaystyle\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2} =\displaystyle= 𝒯^𝒯^T(n,m)[|g^|2]+T(n,m)[|g^|2]𝒫^2\displaystyle\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-{T}_{(n,m)}[|\widehat{g}|^{2}]+{T}_{(n,m)}[|\widehat{g}|^{2}]-\widehat{\mathcal{P}}^{2}
=\displaystyle= []+^1,\displaystyle\begin{bmatrix}&&&&&&&&\\ &&&&&&&&\\ &&&&&&&&\\ &&&&&&&&\\ &&&&&&*&\cdots&*\\ &&&&&&\vdots&&\vdots\\ &&&&&&*&\cdots&*\\ \end{bmatrix}+\widehat{\mathcal{R}}_{1},

where the first matrix is a block matrix with a lm×lmlm\times lm block in the corner. Thus, 𝒯^𝒯^𝒫^2{\color[rgb]{0,0,0}\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}-\widehat{\mathcal{P}}^{2} is a block matrix with a lm×lmlm\times lm block in the two corners. We have by direct computations

(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{n_{\alpha}}(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2})\widehat{\mathcal{P}}^{2n_{\beta}}=\begin{bmatrix}*&\cdots&*&&&&&&\\ \vdots&&\vdots&&&&&&\\ *&\cdots&*&&&&&&\\ &&&&&&&&\\ &&&&&&*&\cdots&*\\ &&&&&&\vdots&&\vdots\\ &&&&&&*&\cdots&*\end{bmatrix}.

Similar to (14), by exploiting the simple structure of $\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2}$, we can show that, for given positive integers $n_{\alpha}$ and $n_{\beta}$, the matrix $(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{n_{\alpha}}(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2})\widehat{\mathcal{P}}^{2n_{\beta}}$ is a block matrix whose nonzero entries are confined to its Northwest and Southeast corners, in blocks of size $(n_{\alpha}+1)lm\times(n_{\beta}+1)lm$. Thus,

(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{K}-\widehat{\mathcal{P}}^{2K}=\sum_{i=0}^{K-1}(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{K-i-1}(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2})\widehat{\mathcal{P}}^{2i}

is also a block matrix whose nonzero entries are confined to two corner blocks, each of size $Klm\times Klm$, provided that $n>2Klm$. Hence, $\mathrm{rank}\big((\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{K}-\widehat{\mathcal{P}}^{2K}\big)\leq 2Klm$.
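The rank bound used above for the correction terms $R_{(k)}$ can be checked numerically. The following MATLAB sketch (an illustration only, not part of the proof) compares $T_{n}[\cos^{k}(x)]$ with $T_{n}[\cos(x)]^{k}$ for small $k$, building the Toeplitz coefficients of $\cos^{k}(x)$ from its binomial expansion.

\begin{verbatim}
% Numerical check (illustrative only) of rank(T_n[cos^k x] - T_n[cos x]^k) <= 2(k-1).
n  = 64;
T1 = toeplitz([0; 0.5; zeros(n-2,1)]);          % T_n[cos x]: 1/2 on the off-diagonals
for k = 2:5
    t = zeros(n,1);                             % first column of the symmetric T_n[cos^k x]
    for j = 0:k
        p = 2*j - k;                            % frequency of the j-th term of ((e^{ix}+e^{-ix})/2)^k
        if p >= 0 && p < n
            t(p+1) = t(p+1) + nchoosek(k,j)/2^k;
        end
    end
    Tk = toeplitz(t);
    fprintf('k = %d: rank of correction = %d, bound 2(k-1) = %d\n', ...
            k, rank(Tk - T1^k), 2*(k-1));
end
\end{verbatim}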

The following results, namely Proposition 10 and Theorem 11, follow from arguments similar to those used to prove Proposition 5 and Theorem 6; their proofs are therefore left to the reader.

Proposition 10.

Let $\widehat{\mathcal{T}},\widehat{\mathcal{P}}\in\mathbb{R}^{mn\times mn}$ be defined in (16) and (20), respectively. Then, for any $\epsilon>0$, there exists an integer $K$ such that, for all $n>2Klm$,

(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{-1/2}-\widehat{\mathcal{P}}^{-1}=\widehat{\mathcal{E}}_{0}+\widehat{\mathcal{R}}_{2},

where $\|\widehat{\mathcal{E}}_{0}\|_{2}\leq\epsilon$ and $\mathrm{rank}(\widehat{\mathcal{R}}_{2})\leq 2Klm$.

Theorem 11.

Let $\mathcal{Y}\mathcal{\widehat{T}},\mathcal{\widehat{P}}\in\mathbb{R}^{mn\times mn}$ be defined in (18) and (20), respectively. Then, for any $\epsilon>0$, there exists an integer $K$ such that, for all $n>2Klm$,

\mathcal{\widehat{P}}^{-1}\mathcal{Y}\mathcal{\widehat{T}}=\mathcal{\widehat{Q}}+\mathcal{\widehat{E}}+\mathcal{\widehat{R}},

where $\mathcal{\widehat{Q}}$ is both symmetric and orthogonal, $\|\mathcal{\widehat{E}}\|_{2}\leq\epsilon$, and $\mathrm{rank}(\mathcal{\widehat{R}})\leq 2Klm$.

Since both $\mathcal{\widehat{E}}$ and $\mathcal{\widehat{R}}$ in Theorem 11 are symmetric, by (BRANDTS20103100, Corollary 3) and Theorem 1, the preconditioned matrix sequence $\{\mathcal{\widehat{P}}^{-1}\mathcal{Y}\mathcal{\widehat{T}}\}_{n}$ has eigenvalues clustered around $\pm 1$, with a number of outliers independent of $n$. Hence, the convergence rate is independent of the number of time steps, and MINRES with $\mathcal{\widehat{P}}$ as a preconditioner can be expected to converge rapidly in exact arithmetic.
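For completeness, we recall the well-known MINRES convergence bound that quantifies this expectation: if the eigenvalues of a symmetric matrix are contained in $[-b,-a]\cup[a,b]$ with $0<a\leq b$, then the MINRES residuals satisfy

\frac{\|\mathbf{r}_{2k}\|_{2}}{\|\mathbf{r}_{0}\|_{2}}\leq 2\left(\frac{b-a}{b+a}\right)^{k}.

Once the finitely many outliers are accounted for, the clustering at $\pm 1$ allows $a$ and $b$ to be taken close to $1$, so the bound decays at a rate that does not deteriorate as $n$ grows.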

5 Numerical examples

In this section, we demonstrate the effectiveness of our proposed preconditioners. All numerical experiments are carried out using MATLAB on a Dell R640 server equipped with dual Xeon Gold 6246R 16-core 3.4GHz CPUs and 512GB RAM, running Ubuntu 20.04 LTS. The CPU time, denoted by “CPU” in the tables below, is measured in seconds using the MATLAB built-in functions tic/toc. All Krylov subspace solvers are implemented using the MATLAB built-in functions, and “Iter” refers to the number of iterations required for convergence. Furthermore, we choose a zero initial guess and a stopping tolerance of $10^{-6}$ based on the reduction of the relative residual norm. Throughout all examples, we consider finite difference methods on uniform spatial grids, which results in $M_{m}$ being the identity matrix and $K_{m}$ being diagonalizable by the discrete sine transform. Note that in the examples “DoF” denotes the number of degrees of freedom, and $\mathcal{C}_{H}$ is the existing absolute value block circulant preconditioner proposed in doi:10.1137/16M1062016.
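To make the setting concrete, the following minimal MATLAB sketch assembles the symmetrized all-at-once system for a two-dimensional heat equation discretized by backward Euler on a uniform grid (the setting of Example 1 below) and passes it to the built-in minres solver with the tolerance and zero initial guess stated above. We stress that the block-diagonal preconditioner used here, applied via the discrete sine transform, is only an illustrative stand-in and is not the proposed $\mathcal{P}_{H}$ or $\mathcal{P}_{\theta}$; the problem sizes, variable names, and right-hand side assembly are likewise assumptions of this sketch rather than the exact implementation used for the reported tables.

\begin{verbatim}
% Minimal sketch (illustrative only): symmetrized all-at-once system for the
% 2D heat equation, backward Euler in time, uniform grid, solved by MINRES
% with a block-diagonal SPD stand-in preconditioner applied via the DST.
mm = 31;  n = 32;  a = 1e-5;  Tend = 1;        % interior points per direction, time steps
h  = 1/(mm+1);  tau = Tend/n;  m = mm^2;
K1 = spdiags(ones(mm,1)*[-1 2 -1], -1:1, mm, mm)/h^2;
Km = a*(kron(K1, speye(mm)) + kron(speye(mm), K1));   % 2D Laplacian (Kronecker sum)
A  = speye(m) + tau*Km;                               % backward Euler time-step block

Tmat = kron(speye(n), A) - kron(spdiags(ones(n,1), -1, n, n), speye(m));
Jn   = speye(n);  Jn = Jn(:, n:-1:1);                 % anti-identity (time reversal)
Y    = kron(Jn, speye(m));
B    = Y*Tmat;                                        % symmetric all-at-once matrix

[x, y] = ndgrid((1:mm)*h);                            % initial condition of Example 1
u0  = x.*(x-1).*y.*(y-1);
rhs = zeros(m*n,1);  rhs(1:m) = u0(:);  rhs = Y*rhs;

S1 = sqrt(2/(mm+1))*sin(pi*(1:mm)'*(1:mm)/(mm+1));    % 1D discrete sine transform matrix
S  = kron(S1, S1);                                    % diagonalizes Km on the uniform grid
d  = 1 + tau*diag(S*(Km*S));                          % eigenvalues of A = I + tau*Km
Pfun = @(v) reshape(S*((S*reshape(v, m, n))./d), m*n, 1);   % applies (I_n kron A)^{-1}
u  = minres(B, rhs, 1e-6, 300, Pfun);
\end{verbatim}

The stand-in preconditioner above merely illustrates how a sine transform diagonalizable, symmetric positive definite preconditioner is passed to minres as a function handle; in the experiments below this role is played by $\mathcal{P}_{H}$ and $\mathcal{P}_{\theta}$.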

Example 1.

The first example, taken from doi:10.1137/16M1062016, is a two-dimensional problem of solving (1) with $\Omega=(0,1)\times(0,1)$, $u_{0}(x,y)=x(x-1)y(y-1)$, $a(x,y)=10^{-5}$, $g=0$, $f=0$, and $T=1$.

For this example, we adopt the backward Euler method and the Crank-Nicolson method in time and report the convergence results in Tables 1 and 2, respectively. While the iteration numbers of $\mathcal{C}_{H}$ grow quickly with the problem size for this ill-conditioned example, our proposed preconditioners $\mathcal{P}_{H}$ and $\mathcal{P}_{\theta}$ significantly outperform $\mathcal{C}_{H}$ in terms of both iteration numbers and CPU times. In addition, we observe that $\mathcal{P}_{\theta}$ is only slightly inferior in preconditioning effect to $\mathcal{P}_{H}$; in fact, in most cases their iteration numbers are identical. The use of $\mathcal{P}_{\theta}$ as a preconditioner appears excellent, considering that it is an approximation to $\mathcal{P}_{H}$.

Table 1: Convergence results with MINRES for Example 1 when $\theta=1$ (the backward Euler method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
252^{5} 252^{5} 30752 34 0.24 11 0.096 11 0.071
262^{6} 127008 48 0.65 11 0.17 11 0.18
272^{7} 516128 59 2.18 11 0.57 11 0.56
282^{8} 2080800 82 19.56 11 3.27 11 3.27
262^{6} 252^{5} 61504 34 0.30 11 0.12 11 0.11
262^{6} 254016 48 1.04 11 0.26 11 0.27
272^{7} 1032256 72 6.81 11 1.35 13 1.48
282^{8} 4161600 82 40.82 11 7.28 13 8.37
272^{7} 252^{5} 123008 34 0.49 13 0.21 13 0.22
262^{6} 508032 48 1.92 13 0.62 13 0.61
272^{7} 2064512 72 15.04 13 3.68 13 3.73
282^{8} 8323200 79 81.81 13 17.19 13 17.32
282^{8} 252^{5} 246016 34 0.78 13 0.35 15 0.38
262^{6} 1016064 48 4.62 13 1.61 15 1.78
272^{7} 4129024 71 34.91 13 8.28 15 9.34
282^{8} 16646400 79 157.74 14 35.95 15 38.32
\botrule
Table 2: Convergence results with MINRES for Example 1 when $\theta=0.5$ (the Crank-Nicolson method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
252^{5} 252^{5} 30752 33 0.45 11 0.098 11 0.082
262^{6} 127008 48 0.67 11 0.18 11 0.20
272^{7} 516128 59 2.42 11 0.63 11 0.63
282^{8} 2080800 82 18.83 11 3.44 11 3.49
262^{6} 252^{5} 61504 34 0.38 11 0.15 11 0.14
262^{6} 254016 48 1.10 11 0.30 13 0.36
272^{7} 1032256 73 7.28 11 1.44 13 1.51
282^{8} 4161600 83 44.01 11 7.64 13 8.74
272^{7} 252^{5} 123008 34 0.68 13 0.23 13 0.24
262^{6} 508032 48 2.20 13 0.71 13 0.67
272^{7} 2064512 72 16.06 13 3.98 13 3.97
282^{8} 8323200 80 87.92 13 18.48 13 19.00
282^{8} 252^{5} 246016 34 0.83 13 0.39 15 0.42
262^{6} 1016064 48 5.02 13 1.73 15 1.97
272^{7} 4129024 72 37.34 13 9.14 15 10.34
282^{8} 16646400 79 165.53 14 37.91 15 40.49
\botrule
Example 2.

This example is a two-dimensional problem of solving (1) with $\Omega=(0,1)^{2}$, $u_{0}(x,y)=x(1-x)y(1-y)$, $a(x,y)=10^{-5}\sin(\pi xy)$, $g=0$, $T=1$, and

f(x,y,t)=e^{-t}x(1-x)\big[2\times 10^{-5}\sin(\pi xy)-y(1-y)-\pi\times 10^{-5}\cos(\pi xy)\,x(1-2y)\big]\\
+e^{-t}y(1-y)\big[2\times 10^{-5}\sin(\pi xy)-\pi\times 10^{-5}\cos(\pi xy)\,y(1-2x)\big].

This example has the following closed-form analytical solution:

u(x,y,t)=e^{-t}x(1-x)y(1-y).

Since the exact solution of this example is known, we define the error measure

e_{n,m}=\|\mathbf{u}-\mathbf{u}^{*}\|_{\infty},

where $\mathbf{u}$ denotes the computed iterative solution of the linear system in (2), and $\mathbf{u}^{*}$ denotes the exact solution of the heat equation evaluated on the space-time mesh.
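As an illustration, the error measure above can be evaluated as in the following MATLAB sketch. The helper function and its conventions are hypothetical: it assumes that the computed all-at-once solution stacks the spatial unknowns time level by time level, with the spatial grid ordered column-major and time levels $t_{k}=k\tau$, $k=1,\dots,n$, $\tau=T/n$, which need not coincide with the ordering of the actual implementation.

\begin{verbatim}
function e_nm = example2_error(u, mm, n)
% Hypothetical helper (illustrative only): computes e_{n,m} = ||u - u*||_inf
% for Example 2, assuming u stacks the spatial unknowns time level by time
% level (column-major in space) on an mm-by-mm interior uniform grid with n
% time steps on [0,1], i.e. time levels t_k = k/n.
h = 1/(mm+1);  m = mm^2;  tau = 1/n;
[x, y] = ndgrid((1:mm)*h);                      % interior grid points
ustar  = zeros(m*n, 1);
for k = 1:n
    uk = exp(-k*tau) * (x.*(1-x).*y.*(1-y));    % exact solution at t_k
    ustar((k-1)*m+1 : k*m) = uk(:);
end
e_nm = norm(u - ustar, inf);
end
\end{verbatim}

For instance, example2_error(u, 31, 32) would evaluate the error for the smallest problem size considered below, given a solution vector u stored with the assumed ordering.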

Tables 3 and 4 show the convergence results of MINRES using the backward Euler method and the Crank-Nicolson method in time, respectively, and Table 5 shows the corresponding errors obtained with the different preconditioners. Similar to the previous example, our proposed preconditioners $\mathcal{P}_{H}$ and $\mathcal{P}_{\theta}$ considerably surpass $\mathcal{C}_{H}$ in this variable-coefficient case.

Table 3: Convergence results with MINRES for Example 2 when $\theta=1$ (the backward Euler method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
252^{5} 252^{5} 30752 107 0.63 11 0.084 11 0.084
262^{6} 127008 141 1.95 11 0.17 12 0.16
272^{7} 516128 218 7.91 11 0.57 13 0.64
282^{8} 2080800 315 62.44 12 3.82 16 4.87
262^{6} 252^{5} 61504 106 1.00 11 0.13 13 0.15
262^{6} 254016 154 3.20 11 0.25 13 0.31
272^{7} 1032256 219 21.27 13 1.44 14 1.49
282^{8} 4161600 307 155.67 13 8.18 17 10.69
272^{7} 252^{5} 123008 107 1.49 13 0.22 13 0.21
262^{6} 508032 160 6.30 13 0.58 14 0.63
272^{7} 2064512 218 47.22 13 3.44 15 4.55
282^{8} 8323200 303 315.43 13 17.64 18 23.49
282^{8} 252^{5} 246016 118 2.50 14 0.35 15 0.36
262^{6} 1016064 177 14.86 14 1.58 15 1.45
272^{7} 4129024 220 111.71 14 10.00 17 11.22
282^{8} 16646400 299 601.57 15 39.08 19 48.44
\botrule
Table 4: Convergence results with MINRES for Example 2 when $\theta=0.5$ (the Crank-Nicolson method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
252^{5} 252^{5} 30752 106 0.61 11 0.085 11 0.076
262^{6} 127008 141 1.94 11 0.19 12 0.18
272^{7} 516128 216 8.75 11 0.70 13 0.70
282^{8} 2080800 309 67.75 12 4.22 17 5.63
262^{6} 252^{5} 61504 106 0.96 11 0.12 13 0.14
262^{6} 254016 154 3.38 11 0.27 13 0.32
272^{7} 1032256 218 18.64 11 1.16 14 1.42
282^{8} 4161600 304 164.44 13 9.68 17 11.38
272^{7} 252^{5} 123008 107 1.53 13 0.21 13 0.21
262^{6} 508032 160 6.78 13 0.63 13 0.66
272^{7} 2064512 218 48.20 13 4.12 15 4.63
282^{8} 8323200 301 337.99 13 18.98 19 26.58
282^{8} 252^{5} 246016 117 2.63 14 0.36 15 0.38
262^{6} 1016064 177 15.35 14 1.47 15 1.56
272^{7} 4129024 221 114.99 14 9.39 17 11.16
282^{8} 16646400 299 648.21 15 42.38 19 52.54
\botrule
Table 5: Error results with MINRES for Example 2
Backward Euler   Crank-Nicolson
$n$   $m+1$   DoF   $\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$   $\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
252^{5} 252^{5} 30752 6.14e-4 6.14e-4 6.14e-4 3.12e-6 3.12e-6 3.12e-6
262^{6} 127008 6.14e-4 6.14e-4 6.14e-4 3.12e-6 3.12e-6 3.12e-6
272^{7} 516128 6.14e-4 6.14e-4 6.14e-4 3.12e-6 3.12e-6 3.12e-6
282^{8} 2080800 6.14e-4 6.14e-4 6.14e-4 3.12e-6 3.12e-6 3.12e-6
262^{6} 252^{5} 61504 3.08e-4 3.08e-4 3.08e-4 8.12e-7 7.99e-7 8.02e-7
262^{6} 254016 3.08e-4 3.08e-4 3.08e-4 8.05e-7 8.01e-7 8.03e-7
272^{7} 1032256 3.08e-4 3.08e-4 3.08e-4 8.15e-7 8.07e-7 8.01e-7
282^{8} 4161600 3.08e-4 3.08e-4 3.08e-4 8.15e-7 8.02e-7 8.02e-7
272^{7} 252^{5} 123008 1.54e-4 1.54e-4 1.54e-4 2.38e-7 1.99e-7 2.04e-7
262^{6} 508032 1.54e-4 1.54e-4 1.54e-4 4.08e-7 2.00e-7 2.00e-7
272^{7} 2064512 1.54e-4 1.54e-4 1.54e-4 4.78e-7 2.03e-7 2.56e-7
282^{8} 8323200 1.54e-4 1.54e-4 1.54e-4 4.10e-7 7.08e-7 5.87e-7
282^{8} 252^{5} 246016 7.71e-5 7.71e-5 7.71e-5 1.79e-7 5.79e-8 4.98e-8
262^{6} 1016064 7.71e-5 7.71e-5 7.71e-5 6.37e-7 5.91e-8 6.93e-8
272^{7} 4129024 7.71e-5 7.71e-5 7.71e-5 6.50e-7 1.53e-7 2.58e-7
282^{8} 16646400 7.71e-5 7.71e-5 7.71e-5 3.76e-7 6.42e-7 7.60e-7
\botrule
Example 3.

The last example is a three-dimensional problem of solving (1) with $\Omega=(0,1)^{3}$, $u_{0}(x,y,z)=x(x-1)y(y-1)z(z-1)$, $a(x,y,z)=10^{-3}$, $g=0$, $f=0$, and $T=1$.

Tables 6 and 7 show the convergence results of MINRES when the backward Euler method and the Crank-Nicolson method, respectively, are used for the time discretization. Overall, the preconditioners $\mathcal{P}_{H}$ and $\mathcal{P}_{\theta}$ still perform better than $\mathcal{C}_{H}$ with regard to iteration numbers. In this relatively well-conditioned case, all preconditioned solvers have comparable CPU times.
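For reference, a minimal MATLAB sketch of how the three-dimensional matrix $K_{m}$ of this example can be assembled as a Kronecker sum of one-dimensional discrete Laplacians on the uniform grid is given below; this is the standard second-order finite difference construction, and the sizes used are illustrative assumptions only.

\begin{verbatim}
% Illustrative assembly of the 3D discrete Laplacian of Example 3 as a
% Kronecker sum (uniform grid, second-order central differences); it is
% diagonalized by the threefold Kronecker product of 1D sine transforms.
mm = 7;  h = 1/(mm+1);  a = 1e-3;               % mm interior points per direction
K1 = spdiags(ones(mm,1)*[-1 2 -1], -1:1, mm, mm)/h^2;
I1 = speye(mm);
Km = a*(kron(kron(K1,I1),I1) + kron(kron(I1,K1),I1) + kron(kron(I1,I1),K1));
lam1 = (2 - 2*cos(pi*(1:mm)'/(mm+1)))/h^2;      % 1D eigenvalues
% the eigenvalues of Km are a*(lam1(i) + lam1(j) + lam1(k)), i,j,k = 1..mm
\end{verbatim}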

Table 6: Convergence results with MINRES for Example 3 when $\theta=1$ (the backward Euler method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
232^{3} 232^{3} 2744 11 0.034 10 0.027 13 0.029
242^{4} 27000 18 0.12 12 0.093 14 0.10
252^{5} 238328 21 0.54 13 0.39 16 0.49
262^{6} 2000376 21 5.91 13 5.06 16 6.07
242^{4} 232^{3} 5488 14 0.032 12 0.037 14 0.040
242^{4} 54000 18 0.17 15 0.15 17 0.18
252^{5} 476656 21 1.08 15 0.85 18 0.96
262^{6} 4000752 24 15.77 17 13.32 18 14.01
252^{5} 232^{3} 10976 14 0.059 14 0.070 15 0.075
242^{4} 108000 18 0.29 17 0.27 19 0.31
252^{5} 953312 22 2.37 18 2.00 21 2.30
262^{6} 8001504 24 32.41 19 30.75 22 36.00
262^{6} 232^{3} 21952 14 0.084 15 0.094 17 0.13
242^{4} 216000 18 0.49 18 0.50 21 0.55
252^{5} 1906624 22 6.13 21 7.05 24 7.94
262^{6} 16003008 24 65.54 21 66.41 24 75.87
\botrule
Table 7: Convergence results with MINRES for Example 3 when $\theta=0.5$ (the Crank-Nicolson method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
232^{3} 232^{3} 2744 14 0.040 10 0.034 13 0.030
242^{4} 27000 18 0.10 13 0.095 15 0.11
252^{5} 238328 21 0.61 13 0.44 17 0.55
262^{6} 2000376 21 6.23 13 5.11 17 6.71
242^{4} 232^{3} 5488 14 0.034 12 0.037 15 0.048
242^{4} 54000 18 0.20 15 0.19 17 0.18
252^{5} 476656 21 1.10 15 0.90 20 1.16
262^{6} 4000752 25 17.49 17 13.98 20 16.36
252^{5} 232^{3} 10976 14 0.066 14 0.064 16 0.081
242^{4} 108000 18 0.30 17 0.30 19 0.35
252^{5} 953312 23 3.03 18 2.86 22 3.46
262^{6} 8001504 24 34.45 19 32.64 22 37.47
262^{6} 232^{3} 21952 14 0.10 15 0.095 17 0.11
242^{4} 216000 18 0.51 18 0.50 22 0.61
252^{5} 1906624 23 7.56 21 7.23 25 8.49
262^{6} 16003008 24 69.35 21 69.73 26 85.77
\botrule

6 Conclusion

We have developed in this work a sine transform based preconditioning method for solving a wide range of time-dependent PDEs. Namely, our proposed preconditioners can be applied to a large class of symmetrized all-at-once systems $\mathcal{Y}\mathcal{\widehat{T}}$ resulting from discretizing the underlying equation. The method is generic and can be used with high-order time discretization schemes.

We have adopted MINRES as the Krylov subspace solver by exploiting the symmetry of the permuted matrix. In particular, we emphasize that our sine transform based preconditioners, which are symmetric positive definite and efficient to implement, can be regarded as optimal or quasi-optimal preconditioners in view of previous work on the spectral distribution of symmetrized Toeplitz matrix sequences MazzaPestana2018 ; Ferrari2019. We observe a quasi-optimal behavior in the case of multilevel coefficient matrices, which is a good result given the theoretical barriers proved in SeTy2 for matrix algebra preconditioners with equimodular transforms such as multilevel circulants (see also nega-gen for further discussion and more general results).

Also, being specifically designed for MINRES, our preconditioners consistently outperform an existing block circulant preconditioner in both iteration counts and CPU times, as shown in each numerical example, and they achieve rapid convergence especially in the ill-conditioned case. We stress that the ill-conditioning of the original linear system stems mainly from the spatial and temporal discretization schemes used for the underlying equation. Besides effective preconditioning, one remedy is therefore to develop better discretization schemes in space and time in the first place, which is a potential direction for future research. As long as the resulting linear systems have a lower triangular, banded, block Toeplitz structure, our proposed preconditioners can be incorporated; this structure, arising mainly from the all-at-once time discretization, is not difficult to obtain.

Our preconditioning technique (i.e., $\mathcal{P}_{H}$) utilizes the fast diagonalizability of the Laplacian matrix, which relies mainly on the uniform spatial grid setting. We emphasize, however, that such a uniform grid assumption is a common starting point in the all-at-once preconditioning context, as in many existing works such as LIN2021110221 ; WU2021110076. Along this line, a direction for future research is to develop efficient preconditioning methods when the physical domain is irregular; in this setting, the theory of generalized locally Toeplitz sequences could be exploited for the spectral and convergence analysis GLT-LAA2 ; Ba. Also, our proposed preconditioner $\mathcal{P}_{\theta}$, which does not require such fast diagonalizability and has shown an excellent preconditioning effect in the numerical tests, deserves further investigation.

As another line of future work, it would be interesting to combine the symmetrized MINRES solver with the block $\epsilon$-circulant preconditioning technique doi:10.1137/20M1316354 ; doi:10.1137/19M1309869 ; WU2021110076, which so far has only been shown to be applicable with GMRES. Such a combination would further advance the pioneering block circulant preconditioning theory (combined with MINRES) originally proposed in doi:10.1137/16M1062016, which is still at an early stage of development.

As already observed, a further interesting case is that of variable coefficients. When $a(x)$ is not constant, or when there is also a dependence on time, that is $a\equiv a(x,t)$, the spectral analysis can be performed and the related solvers can be designed via multilevel GLT tools (see book-GLT-II), assuming that both $n$ and $m$ are allowed to tend to infinity. In this setting, instead of dealing with a matrix-valued symbol (whose size is the fixed parameter $m$), we have $d$ further levels in the structure, where $d$ is the spatial dimensionality of the problem, but with scalar-valued symbols. This setting is of course challenging (see SeTy2 ; nega-gen and references therein) and will be considered in future work, even though the numerical tests in this paper already seem promising in this direction.

Declarations

Ethical Approval and Consent to participate

Not applicable.

Consent for publication

The authors consent to publication of this work in Numerical Algorithms of Springer Nature, when accepted.

Human and Animal Ethics

Not applicable.

Availability of supporting data

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Competing interests

The authors declare no competing interests.

Funding

The work of S. Hon was supported in part by the Hong Kong RGC under grant 22300921, a start-up allowance from the Croucher Foundation, and a Tier 2 Start-up Grant from Hong Kong Baptist University. The work of S. Serra-Capizzano was supported in part by INDAM-GNCS. Furthermore, the work of Stefano Serra-Capizzano was funded by the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955701. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Belgium, France, Germany, and Switzerland. Stefano Serra-Capizzano is also grateful for the support of the Laboratory of Theory, Economics and Systems – Department of Computer Science at Athens University of Economics and Business.

Authors’ contributions

S. Hon developed the methodology, wrote the main manuscript text, and conducted the main numerical experiments. P. Y. Fung improved the numerical experiments and modified the manuscript text. J. Dong modified the manuscript text. S. Serra-Capizzano validated the methodology and modified the manuscript text. All authors reviewed the manuscript.

References

  • (1) Barakitis, N., Ekstrom, S. E., Vassalos, P.: Preconditioners for fractional diffusion equations based on the spectral symbol, Numerical Linear Algebra with Applications, e2441 (2022) https://doi.org/10.1002/nla.2441
  • (2) Barbarino, G.: A systematic approach to reduced GLT. BIT Numerical Mathematics. 1–63 (2021) https://doi.org/10.1007/s10543-021-00896-7
  • (3) Benzi, M., Golub, G. H.: Bounds for the entries of matrix functions with applications to preconditioning. BIT Numerical Mathematics. 39(3), 417–438 (1999) https://link.springer.com/article/10.1023/A:1022362401426
  • (4) Bertaccini, D.: Reliable preconditioned iterative linear solvers for some integrators. Numerical Linear Algebra with Applications. 8(2), 111–125 (2001) https://doi.org/10.1002/1099-1506(200103)8:2<111::AID-NLA234>3.0.CO;2-Q
  • (5) Bertaccini, D., Ng, M. K.: Block $\omega$-circulant preconditioners for the systems of differential equations. Calcolo. 40(2), 71–90 (2003) https://link.springer.com/article/10.1007/s100920300004
  • (6) Bini, D., Benedetto, F.: A new preconditioner for the parallel solution of positive definite Toeplitz systems. in Proc. Second ACM Symp. on Parallel Algorithms and Architectures, pp. 220–223 (1990). https://doi.org/10.1145/97444.97688
  • (7) Bini, D., Capovani, M.: Spectral and computational properties of band symmetric Toeplitz matrices. Linear Algebra and Its Applications. 52, 99–126 (1983) https://doi.org/10.1016/0024-3795(83)90009-5
  • (8) Brandts, J. H., da Silva, R. R.: Computable eigenvalue bounds for rank-$k$ perturbations. Linear Algebra and Its Applications. 432(12), 3100–3116 (2010) https://doi.org/10.1016/j.laa.2010.02.010
  • (9) Chan, R. H., Ng, M. K.: Conjugate gradient methods for Toeplitz systems. SIAM Review. 38(3), 427–482 (1996) https://doi.org/10.1137/s0036144594276474
  • (10) Chan, T. F.: An optimal circulant preconditioner for Toeplitz systems. SIAM Journal on Scientific and Statistical Computing. 9(4), 766–771 (1988) https://doi.org/10.1137/0909051
  • (11) Cocquet, P. H., Gander, M. J.: How large a shift is needed in the shifted Helmholtz preconditioner for its effective inversion by multigrid?. SIAM Journal on Scientific and Statistical Computing, 39(2), A438–A478 (2017) https://doi.org/10.1137/15M102085X
  • (12) Danieli, F., Wathen, A. J.: All-at-once solution of linear wave equations. Numerical Linear Algebra with Applications. e2386, (2021) https://doi.org/10.1002/nla.2386
  • (13) Di Benedetto, F.: Analysis of preconditioning techniques for ill-conditioned Toeplitz matrices. SIAM Journal on Scientific Computing. 16(3), 682–697 (1995) https://doi.org/10.1137/0916041
  • (14) Di Benedetto, F.: Solution of Toeplitz normal equations by sine transform based preconditioning. Linear Algebra and Its Applications. 285(1-3), 229–255 (1998) https://doi.org/10.1016/s0024-3795(98)10115-5
  • (15) Di Benedetto, F., Serra-Capizzano, S.: A unifying approach to abstract matrix algebra preconditioning. Numerische Mathematik. 82, 57–90 (1999) https://doi.org/10.1007/s002110050411
  • (16) Dobrev, V. A., Kolev, Tz., Petersson, N.A., Schroder J.B.: Two-level convergence theory for multigrid reduction in time (MGRIT). SIAM Journal on Scientific Computing. 39(5), S501–S527 (2017) https://doi.org/10.1137/16m1074096
  • (17) Donatelli, M., Mazza, M., Serra-Capizzano, S.: Spectral analysis and structure preserving preconditioners for fractional diffusion equations. Journal of Computational Physics. 307, 262–279 (2016) https://doi.org/10.1016/j.jcp.2015.11.061
  • (18) Falgout, R. D., Friedhoff, S., Kolev, T. V., MacLachlan, S. P., Schroder J. B.: Parallel time integration with multigrid. SIAM Journal on Scientific Computing. 36(6), C635–C661 (2014) https://doi.org/10.1002/pamm.201410456
  • (19) Ferrari, P., Furci, I., Hon, S., Mursaleen, M. A., Serra-Capizzano, S.: The eigenvalue distribution of special 2-by-2 block matrix-sequences with applications to the case of symmetrized Toeplitz structures. SIAM Journal on Matrix Analysis and Applications. 40(3), 1066–1086 (2019) https://doi.org/10.1137/18m1207399
  • (20) Gander, M.J.: 50 years of time parallel time integration. Multiple Shooting and Time Domain Decomposition Methods. 69–113. Springer, Cham (2015). Series Editors: T. Carraro, M. Geiger, S. Körkel, R. Rannacher https://doi.org/10.1007/978-3-319-23321-5_3
  • (21) Gander, M. J., Halpern, L., Ryan, J., Tran, T. T. B.: A direct solver for time parallelization. Domain Decomposition Methods in Science and Engineering XXII. 491–499. Springer, Cham (2016). Series Editors: T, Dickopf, M, J. Gander, L. Halpern, R. Krause, L.F. Pavarino https://doi.org/10.1007/978-3-319-18827-0_50
  • (22) Gander, M. J., Neumüller, M.: Analysis of a new space-time parallel multigrid algorithm for parabolic problems. SIAM Journal on Scientific Computing. 38(4) A2173–A2208 (2016) https://doi.org/10.1137/15m1046605
  • (23) Gander, M. J., Vandewalle, S.: Analysis of the parareal time-parallel time-integration method. SIAM Journal on Scientific Computing. 29(2) 556–578 (2007) https://doi.org/10.1137/05064607x
  • (24) Garoni, C., Serra-Capizzano, S.: Generalized locally Toeplitz sequences: Theory and applications. Volume 1, Springer, Cham (2017) https://doi.org/10.1007/978-3-319-53679-8_8
  • (25) Garoni, C., Serra-Capizzano, S.: Generalized locally Toeplitz sequences: Theory and applications. Volume 2, Springer, Cham (2018) https://doi.org/10.1007/978-3-030-02233-4
  • (26) Goddard, A., Wathen, A.: A note on parallel preconditioning for all-at-once evolutionary PDEs. Electronic Transactions on Numerical Analysis. 51, 135–150 (2019) https://doi.org/10.1553/etna_vol51s135
  • (27) Hon, S.: Optimal block circulant preconditioners for block Toeplitz systems with application to evolutionary PDEs. Journal of Computational and Applied Mathematics. 407, 113965 (2021). https://doi.org/10.1016/j.cam.2021.113965
  • (28) Hon, S., Serra-Capizzano, S., Wathen, A.: Band-Toeplitz preconditioners for ill-conditioned Toeplitz systems. BIT Numerical Mathematics. (2021) https://doi.org/10.1007/s10543-021-00889-6
  • (29) Horton, G., Vandewalle, S.: A space-time multigrid method for parabolic partial differential Equations. SIAM Journal on Scientific Computing. 16(4), 848–864 (1995). https://doi.org/10.1137/0916050
  • (30) Lin, X. L., Ng, M. K.: An all-at-once preconditioner for evolutionary partial differential equations. SIAM Journal on Scientific Computing. 43(4), A2766–A2784 (2021) https://doi.org/10.1137/20m1316354
  • (31) Lin, X. L., Ng, M. K., Zhi, Y.: A parallel-in-time two-sided preconditioning for all-at-once system from a non-local evolutionary equation with weakly singular kernel. Journal of Computational Physics. 434, 110221 (2021) https://doi.org/10.1016/j.jcp.2021.110221
  • (32) Lions, J. L., Maday, Y., Turinici, G.: A "parareal" in time discretization of PDE’s. Comptes rendus de l’Académie des sciences. Série 1, Mathématique. 332(7), 661–668 (2001)
  • (33) Liu, J., Wu, S. L.: A fast block $\alpha$-circulant preconditioner for all-at-once systems from wave equations. SIAM Journal on Matrix Analysis and Applications. 41(4), 1912–1943 (2021) https://doi.org/10.1137/19m1309869
  • (34) Mazza, M., Pestana, J.: Spectral properties of flipped Toeplitz matrices and related preconditioning. BIT Numerical Mathematics. 59, 463–482 (2018) https://doi.org/10.1007/s10543-018-0740-y
  • (35) McDonald, E., Hon, S., Pestana, J., Wathen, A.: Preconditioning for nonsymmetry and time-dependence. In Lecture Notes in Computational Science and Engineering. 116, 81–91 (2017). Springer International Publishing. https://doi.org/10.1007/978-3-319-52389-7_7
  • (36) McDonald, E., Pestana, J., Wathen, A.: Preconditioning and iterative solution of all-at-once systems for evolutionary partial differential equations, SIAM Journal on Scientific Computing, 40(2) A1012–A1033 (2018) https://doi.org/10.1137/16m1062016
  • (37) Ng, M. K.: Iterative Methods for Toeplitz Systems. Numerical Mathematics and Scientific Computation. Oxford University Press, New York (2004)
  • (38) Ng, M. K., Pan, J.: Approximate inverse circulant-plus-diagonal preconditioners for Toeplitz-plus-diagonal matrices. SIAM Journal on Scientific Computing, 32(3), 1442–1464 (2010) https://doi.org/10.1137/080720280
  • (39) Serra-Capizzano, S.: The GLT class as a generalized Fourier analysis and applications. Linear Algebra and Its Applications. 419(1), 180–233 (2006) https://doi.org/10.1016/j.laa.2006.04.012
  • (40) Serra-Capizzano, S.: Toeplitz preconditioners constructed from linear approximation processes. SIAM Journal on Matrix Analysis and Applications. 20(2), 446–465 (1999) https://doi.org/10.1137/s0895479897316904
  • (41) Serra-Capizzano, S.: Matrix algebra preconditioners for multilevel Toeplitz matrices are not superlinear. Special issue on structured and infinite systems of linear equations. Linear Algebra and Its Applications. 343/344, 303–319 (2002) https://dx.doi.org/10.1016/S0024-3795(01)00361-5
  • (42) Serra-Capizzano, S., Tilli, P.: On unitarily invariant norms of matrix-valued linear positive operators. Journal of Inequalities and Applications. 7(3), 309–330 (2002) https://doi.org/10.1155/s1025583402000152
  • (43) Serra-Capizzano, S., Tyrtyshnikov., E.: How to prove that a preconditioner cannot be superlinear, Mathematics of Computation, 72(243), 1305–1316 (2003) https://doi.org/10.1090/s0025-5718-03-01506-0
  • (44) Strang, G.: A Proposal for Toeplitz Matrix Calculations, Studies in Applied Mathematics, 74(2), 171–176 (1986) https://doi.org/10.1002/sapm1986742171
  • (45) Wathen, A.: Preconditioning. Acta Numerica. 24, 329–376 (2015) https://doi.org/10.1017/s0962492915000021
  • (46) Wu, S. L., Zhou, T.: Parallel implementation for the two-stage SDIRK methods via diagonalization. Journal of Computational Physics. 428, 110076 (2021) https://doi.org/10.1016/j.jcp.2020.110076