
A sine transform based preconditioned MINRES method for all-at-once systems from constant and variable-coefficient evolutionary PDEs

Sean Hon [1] ([email protected]), Po Yin Fung ([email protected]), Jiamei Dong ([email protected]), Stefano Serra-Capizzano ([email protected])

[1] Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, China
[2] Department of Science and High Technology, University of Insubria, Como, 6152, Lombardy, Italy
[3] Department of Information Technology, Uppsala University, Uppsala, 75008, Uppland, Sweden
Abstract

In this work, we propose a simple yet generic preconditioned Krylov subspace method for a large class of nonsymmetric block Toeplitz all-at-once systems arising from discretizing evolutionary partial differential equations. Namely, our main result is two novel symmetric positive definite preconditioners, which can be efficiently diagonalized by the discrete sine transform matrix. More specifically, our approach is to first permute the original linear system to obtain a symmetric one, and subsequently develop the desired preconditioners based on the spectral symbol of the modified matrix. We then show that the eigenvalues of the preconditioned matrix sequences are clustered around $\pm 1$, which entails rapid convergence when the minimal residual method is employed. Alternatively, when the conjugate gradient method on the normal equations is used, we show that our preconditioner is effective in the sense that the eigenvalues of the preconditioned matrix sequence are clustered around unity. An extension of our proposed preconditioning method is given for high-order backward difference time discretization schemes, which can be applied to a wide range of time-dependent equations. Numerical examples are given, also in the variable-coefficient setting, to demonstrate the effectiveness of our proposed preconditioners, which consistently outperform an existing block circulant preconditioner discussed in the relevant literature.

keywords:
sine transform based preconditioners; parallel-in-time; Krylov subspace methods; MINRES; all-at-once discretization; block Toeplitz systems
MSC Classification: 15B05, 65F08, 65F10, 65M22

1 Introduction

Over the past few years, there has been growing interest in developing effective preconditioners for solving the all-at-once linear systems stemming from evolutionary partial differential equations (PDEs); see for instance Bertaccini2001 ; BertacciniNg2003 ; doi:10.1137/16M1062016 ; McDonald2017 ; HON2021113965 ; 0x003ab34e ; https://doi.org/10.1002/nla.2386 ; doi:10.1137/20M1316354 ; LIN2021110221 ; doi:10.1137/19M1309869 ; WU2021110076 and references therein. Instead of solving PDEs in a sequential manner, this class of parallel-in-time (PinT) methods solves for all unknowns simultaneously by constructing a large linear system composed of smaller linear systems at each time level. Our proposed method in this work belongs to the diagonalization-based all-at-once methods 10.1007/978-3-319-18827-0_50 , along with doi:10.1137/16M1062016 ; doi:10.1137/20M1316354 . Related PinT methods include space-time multigrid doi:10.1137/15M1046605 ; doi:10.1137/0916050 , multigrid reduction in time doi:10.1137/130944230 ; doi:10.1137/16M1074096 , and the parareal method LIONS2001661 ; doi:10.1137/05064607X . For a survey of the development of these PinT methods, we refer the readers to 10.1007/978-3-319-23321-5_3 and the references therein.

As a model problem, we consider the following initial-boundary value problem for the heat equation

\left\{\begin{array}{ll}\frac{\partial u(x,t)}{\partial t}=\nabla\cdot(a(x)\nabla u(x,t))+f(x,t),&(x,t)\in\Omega\times(0,T],\\ u=g,&(x,t)\in\partial\Omega\times(0,T],\\ u(x,0)=u_{0}(x),&x\in\Omega,\end{array}\right. \quad (1)

where $\Omega$ is open, $\partial\Omega$ denotes the boundary of $\Omega$, and $a,f,g,u_{0}$ are all given functions.

After discretizing the spatial domain with a mesh and using the simple two-level $\theta$-method for time discretization with $n$ time steps of size $\tau=\frac{T}{n}$, we get

M_{m}\Big(\frac{\mathbf{u}_{m}^{(k+1)}-\mathbf{u}_{m}^{(k)}}{\tau}\Big)=-K_{m}\big(\theta\mathbf{u}_{m}^{(k+1)}+(1-\theta)\mathbf{u}_{m}^{(k)}\big)+\theta\mathbf{f}_{m}^{(k+1)}+(1-\theta)\mathbf{f}_{m}^{(k)},

where $M_{m},K_{m}\in\mathbb{R}^{m\times m}$ are, in the standard finite element terminology, the mass matrix and the stiffness matrix approximating $-\nabla\cdot(a(x)\nabla)$, respectively, $\mathbf{u}_{m}^{(k)}=[u_{1}^{(k)},\dots,u_{m}^{(k)}]^{T}$, $\mathbf{f}_{m}^{(k)}=[f_{1}^{(k)},\dots,f_{m}^{(k)}]^{T}$, and $\theta\in[0,1]$. The forward Euler method corresponds to the choice $\theta=0$, the backward Euler method to $\theta=1$, while the Crank-Nicolson method is associated with $\theta=1/2$.

The discrete equations to be solved read

(M_{m}+\theta\tau K_{m})\mathbf{u}_{m}^{(k+1)}=(M_{m}-(1-\theta)\tau K_{m})\mathbf{u}_{m}^{(k)}+\theta\tau\mathbf{f}_{m}^{(k+1)}+(1-\theta)\tau\mathbf{f}_{m}^{(k)}.

Instead of solving the above equations for $\mathbf{u}_{m}^{(k)}$ sequentially, for $k=1,2,\ldots,n$, we solve the following equivalent all-at-once system

\mathcal{T}_{H}\underbrace{\begin{bmatrix}\mathbf{u}_{m}^{(1)}\\ \mathbf{u}_{m}^{(2)}\\ \mathbf{u}_{m}^{(3)}\\ \vdots\\ \mathbf{u}_{m}^{(n)}\end{bmatrix}}_{=:\mathbf{u}}=\underbrace{\begin{bmatrix}\theta\tau\mathbf{f}_{m}^{(1)}+(1-\theta)\tau\mathbf{f}_{m}^{(0)}+(M_{m}-(1-\theta)\tau K_{m})\mathbf{u}_{m}^{(0)}\\ \theta\tau\mathbf{f}_{m}^{(2)}+(1-\theta)\tau\mathbf{f}_{m}^{(1)}\\ \theta\tau\mathbf{f}_{m}^{(3)}+(1-\theta)\tau\mathbf{f}_{m}^{(2)}\\ \vdots\\ \theta\tau\mathbf{f}_{m}^{(n)}+(1-\theta)\tau\mathbf{f}_{m}^{(n-1)}\end{bmatrix}}_{=:\mathbf{f}_{H}},

where

\mathcal{T}_{H}=\begin{bmatrix}M_{m}+\theta\tau K_{m}&&&\\ -M_{m}+(1-\theta)\tau K_{m}&\ddots&&\\ &\ddots&\ddots&\\ &&-M_{m}+(1-\theta)\tau K_{m}&M_{m}+\theta\tau K_{m}\end{bmatrix} \quad (2)

is an $mn$ by $mn$ nonsymmetric block Toeplitz matrix, which is generated by the matrix-valued function

g_{\theta}(x)=\underbrace{M_{m}+\theta\tau K_{m}}_{=:A_{(0)}}+\underbrace{(-M_{m}+(1-\theta)\tau K_{m})}_{=:A_{(1)}}e^{\mathbf{i}x}. \quad (3)

According to the notion of generating function recalled in Section 2 (see also book-GLT-I and the references reported there), it is noted that if $K_{m}$ and $M_{m}$ commute then $A_{(0)}$ and $A_{(1)}$ commute. If a finite element formulation is used to discretize (1) in space, then $M_{m}$ and $K_{m}$ are simultaneously diagonalizable, provided that a uniform grid is used. Alternatively, if finite difference methods are used for (1), then $M_{m}$, being exactly the identity matrix $I_{m}$ in this case, always commutes with $K_{m}$ (see (doi:10.1137/16M1062016, Section 3.1.1) for a discussion).

Therefore, throughout this work, the following assumptions on both $M_{m}$ and $K_{m}$ are made, which are compatible with (doi:10.1137/20M1316354, Section 2):

  1. Both $M_{m}$ and $K_{m}$ are real symmetric positive definite (SPD);

  2. Both $M_{m}$ and $K_{m}$ are sparse, i.e., they have only $\mathcal{O}(m)$ nonzero entries;

  3. $M_{m}$ and $K_{m}$ commute.

As already observed, the commutation between $M_{m}$ and $K_{m}$ is obvious if $M_{m}$ is the identity matrix or in the constant-coefficient setting. When $a(x)$ is not constant, or there is also a dependency on time, that is $a\equiv a(x,t)$, or the discretization method is different (finite elements, finite volumes, IgA), or graded meshes are used, the commutation is no longer true algebraically, but it holds asymptotically for large enough $m$ and can be proved via multilevel GLT tools (see book-GLT-II). More in detail, assume $m=m(n)$ with $m(n)$ going to infinity as $n$ tends to infinity, so that $1=o(m(n))$ and

c=\lim_{h\rightarrow 0^{+}}\frac{\tau}{h^{2}}>0. \quad (4)

In the case of a variable coefficient $a\equiv a(x,t)$, we obtain that the matrix sequence $\{h^{2}K_{m}\}_{m}$ is spectrally distributed as $p(\psi)a(t,x)$ on $(\psi,x)\in[0,\pi]\times\Omega$, in the sense of Definition 1, since $\{h^{2}K_{m}\}_{m}$ is a real symmetric GLT matrix sequence with GLT symbol $p(\psi)a(t,x)$. The latter statement holds since we exploit a canonical decomposition of $h^{2}K_{m}$ as the product of a diagonal sampling sequence and a Toeplitz matrix sequence, plus a zero-distributed matrix sequence (see again (book-GLT-II, Ch. 6, Sect. 7.3)). Hence, under the assumption that finite differences are used and by exploiting the tensor structure of $\mathcal{T}_{H}$ in (2), the global matrix sequence $\{\mathcal{T}_{H}\}_{m}$ is again a GLT matrix sequence with GLT symbol

c\theta\,p(\psi_{2})a(t,x)+c(1-\theta)\,a(t,x)e^{-\mathbf{i}\psi_{1}}.

The same distribution is still present if condition (4) is violated and $c=\infty$, but with a different scaling factor, and it then applies to the matrix sequence $\{\frac{h^{2}}{\tau}\mathcal{T}_{H}\}_{m}$.

However, in the present setting $m$ is considered fixed, that is $c=0$, and hence the condition $1=o(m(n))$ is also not satisfied. Here a matrix-valued distribution is deduced. When both $n$ and $m$ are allowed to tend to infinity, in the way indicated in the previous lines, the situation changes, and the related type of analysis represents a research line with various cases to be considered in the future.

Instead of directly solving (2), we consider the permuted linear system $\mathcal{Y}\mathcal{T}_{H}\mathbf{u}=\mathcal{Y}\mathbf{f}_{H}$, where

\mathcal{Y}\mathcal{T}_{H}=\begin{bmatrix}&&&A_{(1)}&A_{(0)}\\ &&\iddots&\iddots&\\ &A_{(1)}&A_{(0)}&&\\ A_{(1)}&A_{(0)}&&&\\ A_{(0)}&&&&\end{bmatrix}. \quad (5)

Note that $\mathcal{Y}=Y_{n}\otimes I_{m}$, with $Y_{n}\in\mathbb{R}^{n\times n}$ being the anti-identity (flip) matrix, that is, $[Y_{n}]_{j,k}=1$ if and only if $j+k=n+1$ and $[Y_{n}]_{j,k}=0$ otherwise. Clearly, $\mathcal{Y}\mathcal{T}_{H}$ is symmetric, since $M_{m}$ and $K_{m}$ are assumed symmetric and hence so are $A_{(0)}$ and $A_{(1)}$. Thus, we can deploy a symmetric Krylov subspace method such as the minimal residual (MINRES) method, whose convergence depends only on the eigenvalues, and design an effective preconditioner for $\mathcal{Y}\mathcal{T}_{H}$.
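To make the construction above concrete, the following minimal sketch (Python with NumPy/SciPy, used here purely for illustration; the paper's experiments are in MATLAB) assembles the all-at-once matrix $\mathcal{T}_{H}$ of (2) and the permuted matrix $\mathcal{Y}\mathcal{T}_{H}$ of (5) for the one-dimensional finite difference case, where $M_{m}=I_{m}$ and $K_{m}$ is the standard scaled Laplacian; the sizes and the choice $\theta=1/2$ are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def all_at_once_matrices(m, n, tau, theta):
    """Assemble T_H of (2) and Y*T_H of (5) for the 1D heat equation,
    discretized by centred finite differences on a uniform grid
    (so M_m = I_m and K_m = (1/h^2) * tridiag(-1, 2, -1))."""
    h = 1.0 / (m + 1)
    Im = sp.identity(m, format="csr")
    Km = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m), format="csr") / h**2

    A0 = Im + theta * tau * Km            # diagonal block  M_m + theta*tau*K_m
    A1 = -Im + (1 - theta) * tau * Km     # subdiagonal block -M_m + (1-theta)*tau*K_m

    In = sp.identity(n, format="csr")
    Ln = sp.diags([1.0], [-1], shape=(n, n), format="csr")   # lower shift matrix
    T_H = sp.kron(In, A0) + sp.kron(Ln, A1)                  # block lower bidiagonal Toeplitz

    Yn = sp.csr_matrix(np.flipud(np.eye(n)))                 # anti-identity (flip) matrix
    YT_H = sp.kron(Yn, Im) @ T_H                             # permuted system Y*T_H
    return T_H, YT_H

T_H, YT_H = all_at_once_matrices(m=8, n=16, tau=1e-2, theta=0.5)
print(np.allclose(YT_H.toarray(), YT_H.toarray().T))         # True: Y*T_H is symmetric
```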

In doi:10.1137/16M1062016, a Strang block circulant preconditioner was proposed for $\mathcal{Y}\mathcal{T}_{H}$ and proved effective when $a(x)=1$. The main advantage of this iterative approach is that the inverse of the preconditioner can be computed using parallel fast Fourier transforms over a number of processors. However, the performance of such a preconditioner can be unsatisfactory, as will be shown in the numerical examples in Section 5. We remark that such an unsatisfactory preconditioning effect of circulant matrices Strang1986 ; doi:10.1137/0909051 for certain ill-conditioned Toeplitz systems was discussed in Hon_SC_Wathen.

Thus, instead of using block circulant preconditioners, we propose in this work the following SPD preconditioner for $\mathcal{Y}\mathcal{T}_{H}$, which can be diagonalized by the discrete sine transform matrix:

\mathcal{P}_{H}=\sqrt{\begin{bmatrix}A_{(0)}^{2}+A_{(1)}^{2}&A_{(0)}A_{(1)}&&&\\ A_{(0)}A_{(1)}&A_{(0)}^{2}+A_{(1)}^{2}&\ddots&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&A_{(0)}A_{(1)}\\ &&&A_{(0)}A_{(1)}&A_{(0)}^{2}+A_{(1)}^{2}\end{bmatrix}}, \quad (6)

which is constructed by approximating the asymptotic eigenvalue distribution of $\mathcal{Y}\mathcal{T}_{H}$. As with the block circulant preconditioner, the computation of $\mathcal{P}_{H}^{-1}$ is parallelizable over $n$ processors. We refer readers to Bini1990 for a detailed discussion of such a parallel implementation.

We recall that in the pure Toeplitz setting the algebra of matrices diagonalized by the (real) orthogonal sine transform (i.e., the so-called $\tau$ algebra BC83) was extensively used in the 1990s (see D1 ; D2 ; DS ; Slinear99 and references therein), while more recently the approach based on $\tau$ preconditioning has gained further attention in the context of approximated fractional differential equations (see BEV21 ; DMS16 and references therein). Here, we propose its use in a different setting where the Toeplitz structure is nonsymmetric. We also recall that $\tau$ preconditioning is superior to circulant preconditioning when real symmetric Toeplitz-like structures are considered, as discussed and theoretically motivated in DS ; Slinear99.

In particular, we will show that the eigenvalues of the preconditioned matrix sequence $\{\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}\}_{n}$ are clustered around $\pm 1$, which leads to fast convergence when MINRES is employed.

Yet, $\mathcal{P}_{H}$ requires fast diagonalizability of $M_{m}$ and $K_{m}$ in order to be implemented efficiently. When such diagonalizability is not available, we further propose the following preconditioner $\mathcal{P}_{\theta}$ as a modification of $\mathcal{P}_{H}$:

\mathcal{P}_{\theta}=\mathcal{H}\otimes M_{m}+\mathcal{H}_{\theta}\otimes\tau K_{m}, \quad (7)

where

\mathcal{H}=\sqrt{\begin{bmatrix}2&-1&&&\\ -1&2&-1&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&-1\\ &&&-1&2\end{bmatrix}}

and

\mathcal{H}_{\theta}=\sqrt{\begin{bmatrix}\theta^{2}+(1-\theta)^{2}&\theta(1-\theta)&&&\\ \theta(1-\theta)&\theta^{2}+(1-\theta)^{2}&\ddots&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&\theta(1-\theta)\\ &&&\theta(1-\theta)&\theta^{2}+(1-\theta)^{2}\end{bmatrix}}.

Its derivation and implementation will be discussed in Section 3.1.4.

In general, as mentioned in (ANU:9672992, , Chapter 6), the convergence study of preconditioning strategies for nonsymmetric problems is heuristic, since descriptive convergence bounds are usually not available for the generalized minimal residual (GMRES) method or any of the other applicable nonsymmetric Krylov subspace iterative methods.

The paper is organized as follows. In Section 2, we review some preliminary results on block Toeplitz matrices and matrix-valued functions. We present our main results in Section 3, where our proposed preconditioner and its effectiveness analysis are provided. An extension of our preconditioning method to high-order in time discretization schemes is given in Section 4. Numerical tests are given in Section 5 to support the analysis of our proposed preconditioner. Finally, in Section 6 we draw conclusions and list a few open problems, including the challenging case of variable coefficients when both sizes $m$ and $n$ tend to infinity.

2 Preliminaries

In the following subsections, we provide some useful background knowledge about block Toeplitz matrices and matrix functions.

2.1 Block Toeplitz matrices

We let $L^{1}([-\pi,\pi],\mathbb{C}^{m\times m})$ be the Banach space of all matrix-valued functions that are Lebesgue integrable over $[-\pi,\pi]$. The $L^{1}$-norm induced by the trace norm over $\mathbb{C}^{m\times m}$ is

\|f\|_{L^{1}}=\frac{1}{2\pi}\int_{-\pi}^{\pi}\|f(\psi)\|_{\mathrm{tr}}\,d\psi<\infty,

where $\|A_{n}\|_{\mathrm{tr}}:=\sum_{j=1}^{n}\sigma_{j}(A_{n})$ denotes the trace norm of $A_{n}\in\mathbb{C}^{n\times n}$. The block Toeplitz matrix generated by $f\in L^{1}([-\pi,\pi],\mathbb{C}^{m\times m})$ is denoted by $T_{(n,m)}[f]$, namely

T_{(n,m)}[f]=\begin{bmatrix}A_{(0)}&A_{(-1)}&\cdots&A_{(-n+1)}\\ A_{(1)}&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&A_{(-1)}\\ A_{(n-1)}&\cdots&A_{(1)}&A_{(0)}\end{bmatrix}\in\mathbb{C}^{mn\times mn}, \quad (8)

where the Fourier coefficients of $f$ are

A_{(k)}=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(\psi)e^{-\mathbf{i}k\psi}\,d\psi\in\mathbb{C}^{m\times m},\qquad k=0,\pm 1,\pm 2,\dots.

The function $f$ is called the generating function, or symbol, of $T_{(n,m)}[f]$. For thorough discussions of the related properties of (multilevel) block Toeplitz matrices, we refer the readers to MR2108963 ; Chan:1996:CGM:240441.240445 ; book-GLT-I and references therein.

Before discussing the asymptotic singular value or spectral distribution of $T_{(n,m)}[f]$ associated with $f$, which is crucial for developing our preconditioning theory, we introduce the following definition.

Let $C_{c}(\mathbb{C})$ (respectively $C_{c}(\mathbb{R})$) be the space of complex-valued continuous functions defined on $\mathbb{C}$ (respectively $\mathbb{R}$) with bounded support.

Definition 1.

(Ferrari2019) Let $\{A_{(n,s)}\}_{n}$ be a sequence of $sn$-by-$sn$ matrices, with $s$ a fixed positive integer.

  1. We say that $\{A_{(n,s)}\}_{n}$ has an asymptotic singular value distribution described by an $s$-by-$s$ matrix-valued function $f$ if

    \lim_{n\to\infty}\frac{1}{sn}\sum_{j=1}^{ns}F(\sigma_{j}(A_{(n,s)}))=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{s}\sum_{j=1}^{s}F(\sigma_{j}(f(x)))\,dx,\quad\forall F\in C_{c}(\mathbb{R}),

    and we write $\{A_{(n,s)}\}_{n}\sim_{\sigma}f$.

  2. We say that $\{A_{(n,s)}\}_{n}$ has an asymptotic spectral distribution described by an $s$-by-$s$ matrix-valued function $f$ if

    \lim_{n\to\infty}\frac{1}{sn}\sum_{j=1}^{ns}F(\lambda_{j}(A_{(n,s)}))=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{s}\sum_{j=1}^{s}F(\lambda_{j}(f(x)))\,dx,\quad\forall F\in C_{c}(\mathbb{C}),

    and we write $\{A_{(n,s)}\}_{n}\sim_{\lambda}f$.

Before presenting our main preconditioning results, we introduce the following notation. Given $D\subset\mathbb{R}^{k}$ with $0<\mu_{k}(D)<\infty$, we define $\widetilde{D}_{p}$ as $D\cup D_{p}$, where $p\in\mathbb{R}^{k}$ and $D_{p}=p+D$, with the constraint that $D$ and $D_{p}$ have non-intersecting interiors, i.e., $D^{\circ}\cap D_{p}^{\circ}=\emptyset$. In this way, we have $\mu_{k}(\widetilde{D}_{p})=2\mu_{k}(D)$. Given any $g$ defined over $D$, we define $\psi_{g}$ over $\widetilde{D}_{p}$ as

\psi_{g}(x)=\left\{\begin{array}{ll}g(x),&x\in D,\\ -g(x-p),&x\in D_{p},\ x\notin D.\end{array}\right. \quad (9)
Theorem 1.

(Ferrari2019, Theorem 3.4) Suppose $f\in L^{1}([-\pi,\pi],\mathbb{C}^{m\times m})$ has Hermitian Fourier coefficients. Let $T_{(n,m)}[f]\in\mathbb{C}^{mn\times mn}$ be the block Toeplitz matrix generated by $f$ and let $Y_{(n,m)}=Y_{n}\otimes I_{m}\in\mathbb{R}^{mn\times mn}$. Then

\{Y_{(n,m)}T_{(n,m)}[f]\}_{n}\sim_{\lambda}\psi_{|f|},\qquad|f|=(ff^{*})^{1/2},

over the domain $\widetilde{D}$ with $D=[0,2\pi]$ and $p=-2\pi$, where $\psi_{|f|}$ is defined in (9). That is,

\lim_{n\to\infty}\frac{1}{mn}\sum_{j=1}^{mn}F(\lambda_{j}(Y_{(n,m)}T_{(n,m)}[f]))=\frac{1}{4m\pi}\int_{-\pi}^{\pi}\sum_{j=1}^{m}\big[F(\lambda_{j}(|f|(x)))+F(-\lambda_{j}(|f|(x)))\big]\,dx,

where $\lambda_{j}(|f|(x))$, $j=1,2,\dots,m$, are the eigenvalue functions of $|f|$.

As for the symmetrized matrix $\mathcal{Y}\mathcal{T}_{H}$ given in (5), we see by MazzaPestana2018 ; Ferrari2019 that its eigenvalues are distributed as $\pm|g_{\theta}|$. By Theorem 1, $\mathcal{Y}\mathcal{T}_{H}$ is therefore symmetric indefinite when $n$ is sufficiently large, which gives grounds for the use of MINRES in this work.

2.2 Matrix functions

In this subsection, several useful results on functions of matrices are presented.

If $F$ is analytic on a simply connected open region of the complex plane containing the interval $[-1,1]$, there exist ellipses with foci at $-1$ and $1$ such that $F$ is analytic in their interiors. Let $\alpha>1$ and $\beta>0$ be the half axes of such an ellipse, with $\sqrt{\alpha^{2}-\beta^{2}}=1$. Such an ellipse, denoted by $E_{\mathcal{X}}$, is completely specified once the number $\mathcal{X}:=\alpha+\beta$ is known.

Theorem 2 (Bernstein’s theorem).

(Benzi1999, Theorem 2.1) Let the function $F$ be analytic in the interior of the ellipse $\mathbb{E}_{\mathcal{X}}$ with $\mathcal{X}>1$ and continuous on $\mathbb{E}_{\mathcal{X}}$. In addition, suppose $F(x)$ is real for real $x$. Then, the best approximation error satisfies

E_{k}(F):=\inf\{\|F-p\|_{\infty}:\deg(p)\leq k\}\leq\frac{2M(\mathcal{X})}{\mathcal{X}^{k}(\mathcal{X}-1)},

where $\deg(p)$ denotes the degree of the polynomial $p(x)$ and

\|F-p\|_{\infty}=\max_{-1\leq x\leq 1}|F(x)-p(x)|,\quad M(\mathcal{X})=\max_{x\in\mathbb{E}_{\mathcal{X}}}\{|F(x)|\}.

Let $A_{n}$ be an $n\times n$ symmetric matrix and let $[\lambda_{\min},\lambda_{\max}]$ be the smallest interval containing $\sigma(A_{n})$. If we introduce the linear affine function

\psi(\lambda)=\frac{2\lambda-(\lambda_{\min}+\lambda_{\max})}{\lambda_{\max}-\lambda_{\min}},

then $\psi([\lambda_{\min},\lambda_{\max}])=[-1,1]$ and therefore the spectrum of the symmetric matrix

B_{n}:=\psi(A_{n})=\frac{2}{\lambda_{\max}-\lambda_{\min}}A_{n}-\frac{\lambda_{\min}+\lambda_{\max}}{\lambda_{\max}-\lambda_{\min}}I_{n}

is contained in $[-1,1]$. Given a function $f$ analytic on a simply connected region containing $[\lambda_{\min},\lambda_{\max}]$ and such that $f(\lambda)$ is real when $\lambda$ is real, the function $F=f\circ\psi^{-1}$ satisfies the assumptions of Bernstein's theorem.

In the special case of interest in this work, where $A_{n}$ is SPD and $f(x)=x^{-1/2}$, we have $\psi^{-1}(x)=\frac{(b-a)}{2}x+\frac{a+b}{2}$, and we apply Bernstein's result to the function

F(x)=\frac{1}{\sqrt{\frac{(b-a)}{2}x+\frac{a+b}{2}}}, \quad (10)

where $a=\lambda_{\min}(A_{n})$, $b=\lambda_{\max}(A_{n})$, and $1<\mathcal{X}<\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}$, with $\kappa={b}/{a}$ being the spectral condition number of $A_{n}$.

3 Main results

In this section, we present our results on the proposed preconditioner $\mathcal{P}_{H}$ defined in (6), whose design is based on the following spectral symbol:

|g_{\theta}|^{2}=A_{(0)}^{2}+A_{(1)}^{2}+A_{(0)}A_{(1)}e^{-\mathbf{i}x}+A_{(0)}A_{(1)}e^{\mathbf{i}x}=A_{(0)}^{2}+A_{(1)}^{2}+2A_{(0)}A_{(1)}\cos{(x)}. \quad (11)

Since $\mathcal{P}_{H}=\sqrt{T_{(n,m)}[|g_{\theta}|^{2}]}$, it can be regarded as an optimal preconditioner for $\mathcal{Y}\mathcal{T}_{H}$, in the sense of the spectral distribution theory developed in Ferrari2019. We mainly discuss MINRES combined with our proposed preconditioner, and related issues such as implementation and convergence analysis are addressed as well.

3.1 Preconditioning for heat equations

In this subsection, we provide a MINRES approach for $\mathcal{Y}\mathcal{T}_{H}\mathbf{u}=\mathcal{Y}\mathbf{f}_{H}$.

3.1.1 Implementations

We first discuss the fast computation of $\mathcal{Y}\mathcal{T}_{H}\mathbf{v}$ for any given vector $\mathbf{v}$. Since both $M_{m}$ and $K_{m}$ are sparse, $\mathcal{T}_{H}$ is sparse. Hence, computing the matrix-vector product $\mathcal{T}_{H}\mathbf{v}$ requires only linear complexity of $\mathcal{O}(mn)$. Computing $\mathcal{Y}\mathcal{T}_{H}\mathbf{v}$ has the same complexity, since $\mathcal{Y}$ simply imposes a reordering of the vector $\mathcal{T}_{H}\mathbf{v}$. Alternatively, since $\mathcal{T}_{H}$ itself is a block Toeplitz matrix, it is well known that the product $\mathcal{Y}\mathcal{T}_{H}\mathbf{v}$ can be computed in $\mathcal{O}(mn\log{n})$ operations using FFTs, with related storage of $\mathcal{O}(mn)$, where $m$ can be regarded as a fixed constant and $n$ is the effective dimensional parameter.

We first note that $\mathcal{P}_{H}$ admits the following decomposition:

\mathcal{P}_{H}=\sqrt{T_{(n,m)}[|g_{\theta}|^{2}]}=\sqrt{I_{n}\otimes(A_{(0)}^{2}+A_{(1)}^{2})+P_{n}\otimes 2(A_{(0)}A_{(1)})},

where

P_{n}=\begin{bmatrix}0&\frac{1}{2}&&&\\ \frac{1}{2}&0&\frac{1}{2}&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&\frac{1}{2}\\ &&&\frac{1}{2}&0\end{bmatrix} \quad (12)

is a tridiagonal Toeplitz matrix with eigendecomposition $P_{n}=S_{n}D_{n}S_{n}$, where $S_{n}=\sqrt{\frac{2}{n+1}}\big[\sin{(\frac{ij\pi}{n+1})}\big]^{n}_{i,j=1}\in\mathbb{R}^{n\times n}$ is the symmetric discrete sine transform matrix. Writing $A_{(0)}=U_{m}\Lambda_{(0)}U_{m}^{T}$ and $A_{(1)}=U_{m}\Lambda_{(1)}U_{m}^{T}$ with a common orthogonal eigenvector matrix $U_{m}$ (which exists by the commutation assumption on $M_{m}$ and $K_{m}$), we obtain

\mathcal{P}_{H}=\sqrt{I_{n}\otimes U_{m}(\Lambda^{2}_{(0)}+\Lambda^{2}_{(1)})U_{m}^{T}+P_{n}\otimes(U_{m}2\Lambda_{(0)}\Lambda_{(1)}U_{m}^{T})}
=\sqrt{(S_{n}\otimes U_{m})\big(I_{n}\otimes(\Lambda^{2}_{(0)}+\Lambda^{2}_{(1)})+D_{n}\otimes 2\Lambda_{(0)}\Lambda_{(1)}\big)(S_{n}\otimes U_{m})^{T}}
=(S_{n}\otimes U_{m})\sqrt{I_{n}\otimes(\Lambda^{2}_{(0)}+\Lambda^{2}_{(1)})+D_{n}\otimes 2\Lambda_{(0)}\Lambda_{(1)}}\,(S_{n}\otimes U_{m})^{T}.

The product $\mathbf{z}=\mathcal{P}_{H}^{-1}\mathbf{v}$ can be computed via the following three-step procedure:

  1. Compute $\widetilde{\mathbf{v}}=(S_{n}\otimes U_{m}^{T})\mathbf{v}$;

  2. Compute $\widetilde{\mathbf{z}}=\Big(\sqrt{I_{n}\otimes(\Lambda^{2}_{(0)}+\Lambda^{2}_{(1)})+D_{n}\otimes 2\Lambda_{(0)}\Lambda_{(1)}}\Big)^{-1}\widetilde{\mathbf{v}}$;

  3. Compute $\mathbf{z}=(S_{n}\otimes{U_{m}})\widetilde{\mathbf{z}}$.

When both $M_{m}$ and $K_{m}$ are diagonalizable by $U_{m}=S_{m}$, which holds when the spatial discretization is a finite difference/element method on a uniform square grid with a constant diffusion coefficient $a(x)$, the matrix-vector product $\mathbf{z}=\mathcal{P}_{H}^{-1}\mathbf{v}$ can be computed efficiently in $\mathcal{O}(nm(\log{m}+\log{n}))$ operations using fast sine transforms, with storage of order $\mathcal{O}(nm)$.
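As a concrete illustration, the following minimal Python sketch applies the three-step procedure above in the finite difference case with $U_{m}=S_{m}$ (so $M_{m}=I_{m}$ and $K_{m}$ is diagonalized by the discrete sine transform). The dense construction of $S_{n}$ and $S_{m}$ and the problem sizes are illustrative assumptions only; in practice Steps 1 and 3 would be carried out with fast sine transforms.

```python
import numpy as np

def dst_matrix(n):
    """Symmetric orthogonal DST matrix S_n = sqrt(2/(n+1)) [sin(ij*pi/(n+1))]."""
    i = np.arange(1, n + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(i, i) * np.pi / (n + 1))

def apply_PH_inv(v, m, n, tau, theta):
    """Three-step application of P_H^{-1} from (6), assuming M_m = I_m and
    K_m = (1/h^2) tridiag(-1,2,-1), both diagonalized by U_m = S_m."""
    h = 1.0 / (m + 1)
    Sm, Sn = dst_matrix(m), dst_matrix(n)
    lam_K = (2.0 - 2.0 * np.cos(np.arange(1, m + 1) * np.pi / (m + 1))) / h**2  # eig(K_m)
    lam_A0 = 1.0 + theta * tau * lam_K            # eig(A_(0)) = eig(M_m + theta*tau*K_m)
    lam_A1 = -1.0 + (1.0 - theta) * tau * lam_K   # eig(A_(1))
    d_n = np.cos(np.arange(1, n + 1) * np.pi / (n + 1))   # eigenvalues of P_n in (12)

    V = v.reshape(n, m)                            # row k holds block k of v
    V = Sn @ (V @ Sm)                              # Step 1: (S_n (x) S_m^T) v
    # Step 2: divide entry (k, j) by the diagonalized symbol's square root
    denom = np.sqrt(lam_A0**2 + lam_A1**2 + 2.0 * np.outer(d_n, lam_A0 * lam_A1))
    V = V / denom
    V = Sn @ (V @ Sm)                              # Step 3: (S_n (x) S_m) z~
    return V.reshape(-1)

z = apply_PH_inv(np.random.rand(16 * 8), m=8, n=16, tau=1e-2, theta=0.5)
```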

When the diffusion coefficient varies in space, the matrix $K_{m}$ is no longer a Toeplitz matrix and is not diagonalizable by $S_{m}$. In the one-dimensional case, with spatial grid size $h=\frac{1}{m+1}$, we have the following coefficient matrix

K_{m}=\frac{1}{h^{2}}\begin{bmatrix}a_{\frac{1}{2}}+a_{\frac{3}{2}}&-a_{\frac{3}{2}}&&&\\ -a_{\frac{3}{2}}&a_{\frac{3}{2}}+a_{\frac{5}{2}}&-a_{\frac{5}{2}}&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\ddots&-a_{\frac{2m-1}{2}}\\ &&&-a_{\frac{2m-1}{2}}&a_{\frac{2m-1}{2}}+a_{\frac{2m+1}{2}}\end{bmatrix}, \quad (13)

where $a_{j}:=a(x_{\min}+jh)$ denotes the diffusion coefficient sampled at the grid point $x_{\min}+jh$. In this case, we can construct a preconditioner for $K_{m}$ by averaging its entries along the three diagonals, so as to obtain a tridiagonal symmetric Toeplitz matrix. Clearly, such a preconditioner can be quickly diagonalized by $S_{m}$, and thus the overall preconditioner $\mathcal{P}_{H}$ can be diagonalized as well.
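For illustration, the short sketch below (a hypothetical Python helper, not code from the paper) builds the variable-coefficient matrix $K_{m}$ of (13) for a given coefficient function $a(x)$ and its diagonal-averaged tridiagonal Toeplitz approximation, which is then diagonalizable by $S_{m}$.

```python
import numpy as np
import scipy.sparse as sp

def variable_coefficient_Km(a, m, x_min=0.0):
    """Assemble K_m in (13) for a diffusion coefficient a(x) on a uniform grid."""
    h = 1.0 / (m + 1)
    a_half = a(x_min + (np.arange(m + 1) + 0.5) * h)   # a_{1/2}, a_{3/2}, ..., a_{m+1/2}
    diag = a_half[:-1] + a_half[1:]
    off = -a_half[1:-1]
    return sp.diags([off, diag, off], [-1, 0, 1]) / h**2

def averaged_toeplitz_preconditioner(Km):
    """Average the three diagonals of K_m to get a symmetric tridiagonal
    Toeplitz matrix, which the sine transform S_m diagonalizes."""
    m = Km.shape[0]
    d0 = Km.diagonal(0).mean()
    d1 = Km.diagonal(1).mean()
    return sp.diags([d1, d0, d1], [-1, 0, 1], shape=(m, m))

Km = variable_coefficient_Km(lambda x: 1.0 + 0.5 * x, m=64)
Km_toep = averaged_toeplitz_preconditioner(Km)
```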

3.1.2 Eigenvalue analysis of $\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}$

The following proposition guarantees the effectiveness of $\big(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\big)^{-1}$ as a preconditioner for $\mathcal{Y}\mathcal{T}_{H}$, showing that the eigenvalues of $\big(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\big)^{-1}\mathcal{Y}\mathcal{T}_{H}$ are clustered around $\pm 1$, which ensures fast convergence of MINRES.

Proposition 3.

Let $\mathcal{T}_{H}$ and $\mathcal{Y}\mathcal{T}_{H}$ be defined in (2) and (5), respectively. Then

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=\mathcal{Q}_{0},

where $\mathcal{Q}_{0}$ is both symmetric and orthogonal, and has only $-1$ and $1$ as eigenvalues.

Proof: Writing the eigendecomposition of the symmetric matrix $\mathcal{Y}\mathcal{T}_{H}$ as $\mathcal{Y}\mathcal{T}_{H}=E^{\top}\mathcal{Q}E$, we have

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=\left(\sqrt{(\mathcal{Y}\mathcal{T}_{H})^{\top}\mathcal{Y}\mathcal{T}_{H}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=\left(\sqrt{(\mathcal{Y}\mathcal{T}_{H})^{2}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=E^{\top}(\mathcal{Q}^{2})^{-1/2}EE^{\top}\mathcal{Q}E=E^{\top}\hat{\mathcal{Q}}E=\mathcal{Q}_{0},

where $\hat{\mathcal{Q}}$ is a diagonal matrix whose entries are either $-1$ or $1$. Therefore, $\mathcal{Q}_{0}$ is symmetric and orthogonal, and hence its eigenvalues are only $-1$ and $1$.

Now, we turn our focus to the following lemma and proposition, which will be helpful to show our main result.

Lemma 4.

Let $\mathcal{T}_{H},\mathcal{P}_{H}\in\mathbb{R}^{mn\times mn}$ be defined in (2) and (6), respectively. Then,

\textrm{rank}((\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K}-\mathcal{P}_{H}^{2K})\leq Km,

for any positive integer $K$, provided that $n>Km$.

Proof: First, we observe that

\mathcal{T}_{H}^{\top}\mathcal{T}_{H}-\mathcal{P}_{H}^{2}=\begin{bmatrix}&&\\ &&\\ &&-A_{(1)}^{2}\end{bmatrix},

i.e., the difference is zero except for the block $-A_{(1)}^{2}$ in its Southeast corner.

Second, exploiting the simple structures of both $\mathcal{T}_{H}^{\top}\mathcal{T}_{H}$ and $\mathcal{P}_{H}^{2}$, direct computations show that

(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{n_{\alpha}}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H}-\mathcal{P}_{H}^{2})\mathcal{P}_{H}^{2n_{\beta}}=\begin{bmatrix}&&\\ &&\\ &&\begin{matrix}*&\cdots&*\\ \vdots&&\vdots\\ *&\cdots&*\end{matrix}\end{bmatrix} \quad (14)

for nonnegative integers $n_{\alpha}$ and $n_{\beta}$, where $*$ represents a nonzero entry. Namely, $(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{n_{\alpha}}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H}-\mathcal{P}_{H}^{2})\mathcal{P}_{H}^{2n_{\beta}}$ is a block matrix whose only nonzero block sits in its Southeast corner and is of size $(n_{\alpha}+1)m\times(n_{\beta}+1)m$. Thus,

(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K}-\mathcal{P}_{H}^{2K}=\sum_{i=0}^{K-1}\big((\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K-i}\mathcal{P}_{H}^{2i}-(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K-i-1}\mathcal{P}_{H}^{2i+2}\big)=\sum_{i=0}^{K-1}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K-i-1}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H}-\mathcal{P}_{H}^{2})\mathcal{P}_{H}^{2i}

is also a block matrix whose only nonzero block sits in its Southeast corner and is of size $Km\times Km$, provided that $n>Km$. Hence, we have $\textrm{rank}((\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K}-\mathcal{P}_{H}^{2K})\leq Km$.

Remark 1.

The computational lemma used in the proof of Lemma 4 was first given in (doi:10.1137/080720280, Lemma 3.11).

Proposition 5.

Let $\mathcal{T}_{H},\mathcal{P}_{H}\in\mathbb{R}^{mn\times mn}$ be defined in (2) and (6), respectively. Then for any $\epsilon>0$ there exists an integer $K$ such that for all $n>Km$

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-\mathcal{P}_{H}^{-1}=\mathcal{E}+\mathcal{R}_{1},

where $\|\mathcal{E}\|_{2}\leq\epsilon$ and $\textrm{rank}(\mathcal{R}_{1})\leq Km$.

Proof: Let $f(x)=x^{-1/2}$ and let $F(x)$ be defined by (10). By Theorem 2, there exists a polynomial $p_{K}$ of degree at most $K$ such that

\|\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})\|_{2}=\|(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{-\frac{1}{2}}-p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})\|_{2}=\max_{x\in\sigma(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})}|F(x)-p_{K}(x)|\leq\|F-p_{K}\|_{\infty}\leq\frac{2M(\mathcal{X}_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}})}{\mathcal{X}_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}-1}\cdot\frac{1}{\mathcal{X}_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}^{K}},

\|\mathcal{P}_{H}^{-1}-p_{K}(\mathcal{P}_{H}^{2})\|_{2}=\|(\mathcal{P}_{H}^{2})^{-\frac{1}{2}}-p_{K}(\mathcal{P}_{H}^{2})\|_{2}=\max_{x\in\sigma(\mathcal{P}_{H}^{2})}|F(x)-p_{K}(x)|\leq\|F-p_{K}\|_{\infty}\leq\frac{2M(\mathcal{X}_{\mathcal{P}_{H}^{2}})}{\mathcal{X}_{\mathcal{P}_{H}^{2}}-1}\cdot\frac{1}{\mathcal{X}_{\mathcal{P}_{H}^{2}}^{K}},

where

1<\mathcal{X}_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}<\frac{\sqrt{\kappa_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}}+1}{\sqrt{\kappa_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}}-1},\qquad 1<\mathcal{X}_{\mathcal{P}_{H}^{2}}<\frac{\sqrt{\kappa_{\mathcal{P}_{H}^{2}}}+1}{\sqrt{\kappa_{\mathcal{P}_{H}^{2}}}-1},

and $\kappa_{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}$ and $\kappa_{\mathcal{P}_{H}^{2}}$ are the condition numbers of $\mathcal{T}_{H}^{\top}\mathcal{T}_{H}$ and $\mathcal{P}_{H}^{2}$, respectively. Thus, for any $\epsilon>0$ there exists an integer $K$ such that

\|\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})\|_{2}\leq\epsilon\quad\textrm{and}\quad\|\mathcal{P}_{H}^{-1}-p_{K}(\mathcal{P}_{H}^{2})\|_{2}\leq\epsilon.

Also, we have

p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})-p_{K}(\mathcal{P}_{H}^{2})=\underbrace{\sum_{i=0}^{K}a_{i}((\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{i}-\mathcal{P}_{H}^{2i})}_{=:\mathcal{R}_{1}}.

By Lemma 4, $\mathcal{R}_{1}$ has the same sparsity structure as $(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{K}-\mathcal{P}_{H}^{2K}$. Consequently, we deduce $\textrm{rank}(\mathcal{R}_{1})\leq Km$.

We then obtain

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-\mathcal{P}_{H}^{-1}=\underbrace{(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})^{-\frac{1}{2}}-p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})+p_{K}(\mathcal{P}_{H}^{2})-\mathcal{P}_{H}^{-1}}_{=:\mathcal{E}}+\underbrace{p_{K}(\mathcal{T}_{H}^{\top}\mathcal{T}_{H})-p_{K}(\mathcal{P}_{H}^{2})}_{=:\mathcal{R}_{1}},

where $\|\mathcal{E}\|_{2}\leq 2\epsilon$ and $\textrm{rank}(\mathcal{R}_{1})\leq Km$. Therefore, the proof is concluded.

Theorem 6.

Let $\mathcal{Y}\mathcal{T}_{H},\mathcal{P}_{H}\in\mathbb{R}^{mn\times mn}$ be defined in (5) and (6), respectively. Then for any $\epsilon>0$ there exists an integer $K$ such that for all $n>Km$

\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}=\mathcal{Q}_{H}+\mathcal{E}_{H}+\mathcal{R}_{H},

where $\mathcal{Q}_{H}$ is both symmetric and orthogonal, $\|\mathcal{E}_{H}\|_{2}\leq\epsilon$, and $\textrm{rank}(\mathcal{R}_{H})\leq Km$.

Proof: By Proposition 3, we can write

\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}\mathcal{Y}\mathcal{T}_{H}=\mathcal{Q}_{0}.

By Proposition 5, we then have

\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}=\Big(\left(\sqrt{\mathcal{T}_{H}^{\top}\mathcal{T}_{H}}\right)^{-1}-\mathcal{E}-\mathcal{R}_{1}\Big)\mathcal{Y}\mathcal{T}_{H}=\mathcal{Q}_{0}\underbrace{-\mathcal{E}\mathcal{Y}\mathcal{T}_{H}}_{=:\mathcal{E}_{H}}\underbrace{-\mathcal{R}_{1}\mathcal{Y}\mathcal{T}_{H}}_{=:\mathcal{R}_{H}},

where

\textrm{rank}(\mathcal{R}_{H})=\textrm{rank}(\mathcal{R}_{1}\mathcal{Y}\mathcal{T}_{H})\leq Km

and

\|\mathcal{E}_{H}\|_{2}=\|-\mathcal{E}\mathcal{Y}\mathcal{T}_{H}\|_{2}\leq\|\mathcal{Y}\mathcal{T}_{H}\|_{2}\,\epsilon.

Note that

\|\mathcal{Y}\mathcal{T}_{H}\|_{2}=\|\mathcal{T}_{H}\|_{2}\leq\|g_{\theta}\|_{\infty},

by the general inequalities in (Serra_Tilli_2002, Corollary 4.2), noticing that the Schatten $\infty$-norm in that paper is exactly the spectral norm $\|\cdot\|_{2}$. Furthermore, we have the strict inequality $\|\mathcal{Y}\mathcal{T}_{H}\|_{2}=\|\mathcal{T}_{H}\|_{2}<\|g_{\theta}\|_{\infty}$ whenever the infimum of $|g_{\theta}|$ is strictly smaller than $\|g_{\theta}\|_{\infty}$, as often occurs, including in our case.

Hence, $\|\mathcal{Y}\mathcal{T}_{H}\|_{2}$ is uniformly bounded with respect to $n$ and the proof is concluded.

By (BRANDTS20103100, Corollary 3) and Theorem 1, we know from Theorem 6 that the preconditioned matrix sequence $\{\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}\}_{n}$ has eigenvalues clustered around $\pm 1$, with a number of outliers independent of $n$. Hence, the convergence is independent of the number of time steps, and we can expect fast MINRES convergence in exact arithmetic with $\mathcal{P}_{H}$ as a preconditioner.

3.1.3 Eigenvalue analysis of $(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}$

For a complete eigenvalue analysis, we examine the preconditioned normal equation matrix $(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}$, even though we do not use the conjugate gradient method on the normal equations (CGNE) in this work.

As for implementation, the product $(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}\mathbf{v}$ for any vector $\mathbf{v}$ can be computed efficiently using arguments similar to those in Section 3.1.1.

In the following result, we show that the eigenvalues of $(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}$ are clustered around $1$, with a number of outliers independent of the number of time steps $n$, which implies fast convergence of CGNE.

Theorem 7.

Let $\mathcal{T}_{H},\mathcal{P}_{H}\in\mathbb{R}^{mn\times mn}$ be defined in (2) and (6), respectively. Then,

(\mathcal{P}_{H}^{-1}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{T}_{H}=I_{mn}+\mathcal{\widetilde{R}}_{H},

where $I_{mn}$ is the $mn$ by $mn$ identity matrix and $\textrm{rank}(\mathcal{\widetilde{R}}_{H})\leq m$.

Proof: Direct computations show that

\mathcal{P}_{H}^{2}-\mathcal{T}_{H}^{T}\mathcal{T}_{H}=\underbrace{\begin{bmatrix}O_{m}&&&\\ &\ddots&&\\ &&O_{m}&\\ &&&A^{2}_{(1)}\end{bmatrix}}_{=:\mathcal{\widetilde{R}}_{0}},

where $\textrm{rank}(\mathcal{\widetilde{R}}_{0})\leq m$. This matrix identity further leads to

I_{mn}-\mathcal{T}_{H}\mathcal{P}_{H}^{-2}\mathcal{T}_{H}^{T}=\underbrace{\mathcal{T}_{H}\mathcal{P}_{H}^{-2}\mathcal{\widetilde{R}}_{0}\mathcal{T}_{H}^{-1}}_{=:\mathcal{\widetilde{R}}_{1}}, \quad (15)

where $\textrm{rank}(\mathcal{\widetilde{R}}_{1})=\textrm{rank}(\mathcal{\widetilde{R}}_{0})\leq m$.

We now consider the normal equation matrix associated with $\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}$, namely

(\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H})^{T}\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}=\mathcal{Y}\mathcal{T}_{H}\mathcal{P}_{H}^{-2}\mathcal{T}_{H}^{T}\mathcal{Y}=\mathcal{Y}(I_{mn}-\mathcal{\widetilde{R}}_{1})\mathcal{Y}=I_{mn}\underbrace{-\mathcal{Y}\mathcal{\widetilde{R}}_{1}\mathcal{Y}}_{=:\mathcal{\widetilde{R}}_{H}},

where $\textrm{rank}(\mathcal{\widetilde{R}}_{H})=\textrm{rank}(\mathcal{\widetilde{R}}_{1})\leq m$.

Notice that each rank-one term in $\mathcal{\widetilde{R}}_{H}$ can perturb at most one eigenvalue away from $1$. Therefore, by Theorem 7, at least $m(n-1)$ eigenvalues of the normal equation matrix are equal to $1$, which implies rapid convergence of CGNE.

3.1.4 Modified Preconditioner

Following our spectral symbol matching strategy, the construction of the modified preconditioner $\mathcal{P}_{\theta}$ is motivated by the following approximation, which does not require fast diagonalizability of $M_{m}$ and $K_{m}$.

Since the eigenvalues of the preconditioned matrix sequence $\{\mathcal{P}_{H}^{-1}\mathcal{Y}\mathcal{T}_{H}\}_{n}$ associated with the original preconditioner $\mathcal{P}_{H}$ are clustered around $\pm 1$, where $g_{\theta}(x)=A_{(0)}+A_{(1)}e^{\mathbf{i}x}$ is defined by (3), we approximate the spectral symbol

\sqrt{|g_{\theta}|^{2}}=\sqrt{A_{(0)}^{2}+A_{(1)}^{2}+2A_{(0)}A_{(1)}\cos(x)},

where

A_{(0)}^{2}+A_{(1)}^{2}+2A_{(0)}A_{(1)}\cos(x)=(2-2\cos(x))M_{m}^{2}+(\theta^{2}+(1-\theta)^{2}+2\theta(1-\theta)\cos(x))\tau^{2}K_{m}^{2}+2((2\theta-1)+(1-2\theta)\cos(x))\tau K_{m}M_{m},

by the following matrix-valued function:

\widetilde{g_{\theta}}=\sqrt{2-2\cos(x)}\,M_{m}+\tau\sqrt{\theta^{2}+(1-\theta)^{2}+2\theta(1-\theta)\cos(x)}\,K_{m}.

Hence, we expect the eigenvalues of $\{\mathcal{P}_{\theta}^{-1}\mathcal{Y}\mathcal{T}_{H}\}_{n}$ to also be clustered around $\pm 1$.

In matrix form, the preconditioner generated by $\widetilde{g_{\theta}}$ is precisely our proposed modified preconditioner $\mathcal{P}_{\theta}$ defined by (7).

The product $\mathcal{P}_{\theta}^{-1}\mathbf{v}$ for any vector $\mathbf{v}$ can be computed via the following three-step procedure, which does not require fast diagonalizability of $M_{m}$ and $K_{m}$. Notice that both $\mathcal{H}$ and $\mathcal{H}_{\theta}$ in (7) are $\tau$ (sine transform) matrices. Hence,

\mathcal{P}_{\theta}=\mathcal{H}\otimes M_{m}+\mathcal{H}_{\theta}\otimes\tau K_{m}=(S_{n}\Lambda_{\mathcal{H}}S_{n})\otimes M_{m}+(S_{n}\Lambda_{\mathcal{H}_{\theta}}S_{n})\otimes\tau K_{m}=(S_{n}\otimes I_{m})(\Lambda_{\mathcal{H}}\otimes M_{m}+\Lambda_{\mathcal{H}_{\theta}}\otimes\tau K_{m})(S_{n}\otimes I_{m})^{T},

where $\Lambda_{\mathcal{H}}$ and $\Lambda_{\mathcal{H}_{\theta}}$ are the diagonal matrices containing the eigenvalues of $\mathcal{H}$ and $\mathcal{H}_{\theta}$, respectively.

Hence, the computation of $\mathbf{z}=\mathcal{P}_{\theta}^{-1}\mathbf{v}$ can be implemented via the following three steps:

  1. Compute $\widetilde{\mathbf{v}}=(S_{n}\otimes I_{m})\mathbf{v}$;

  2. Compute $\widetilde{\mathbf{z}}=(\Lambda_{\mathcal{H}}\otimes M_{m}+\Lambda_{\mathcal{H}_{\theta}}\otimes\tau K_{m})^{-1}\widetilde{\mathbf{v}}$;

  3. Compute $\mathbf{z}=(S_{n}\otimes{I_{m}})\widetilde{\mathbf{z}}$.

Steps 1 and 3 can be computed efficiently via fast sine transforms in $\mathcal{O}(mn\log{n})$ operations. As for Step 2, the shifted Laplacian systems can be solved efficiently, for example using multigrid methods doi:10.1137/15M102085X. For more details regarding such an efficient implementation, we refer to (doi:10.1137/16M1062016, Section 3.1).
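As an illustration of these three steps, the sketch below (a hypothetical Python helper using the closed-form eigenvalues of $\mathcal{H}$ and $\mathcal{H}_{\theta}$ and sparse direct solves for Step 2; the sizes and coefficients are assumptions for demonstration only) applies $\mathcal{P}_{\theta}^{-1}$ without requiring $K_{m}$ to be diagonalized.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def apply_Ptheta_inv(v, Mm, Km, n, tau, theta):
    """Apply P_theta^{-1} from (7).  Lambda_H and Lambda_{H_theta} follow from the
    known eigenvalues of tridiag(-1,2,-1) and of the matrix under the square root
    in H_theta; Step 2 solves n independent sparse 'shifted Laplacian' systems."""
    m = Mm.shape[0]
    c = np.cos(np.arange(1, n + 1) * np.pi / (n + 1))
    lam_H = np.sqrt(2.0 - 2.0 * c)                                             # eig(H)
    lam_Ht = np.sqrt(theta**2 + (1 - theta)**2 + 2 * theta * (1 - theta) * c)  # eig(H_theta)

    i = np.arange(1, n + 1)
    Sn = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(i, i) * np.pi / (n + 1))     # DST matrix

    V = v.reshape(n, m)
    V = Sn @ V                                   # Step 1: (S_n (x) I_m) v
    Z = np.empty_like(V)
    for k in range(n):                           # Step 2: block-diagonal solves
        Ak = lam_H[k] * Mm + lam_Ht[k] * tau * Km
        Z[k] = spla.spsolve(Ak.tocsc(), V[k])
    return (Sn @ Z).reshape(-1)                  # Step 3: (S_n (x) I_m) z~

# Illustrative usage with M_m = I and a sparse K_m (assumed values only):
m, n, tau, theta = 32, 16, 1e-2, 0.5
h = 1.0 / (m + 1)
Mm = sp.identity(m, format="csr")
Km = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m), format="csr") / h**2
z = apply_Ptheta_inv(np.random.rand(m * n), Mm, Km, n, tau, theta)
```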

4 Extension to multistep methods

Our proposed preconditioning method can be easily extended to high-order in time discretization schemes. Consider again (1); using an $l$-th order backward difference scheme in time, we have the following all-at-once system $\widehat{\mathcal{T}}\mathbf{u}=\widehat{\mathbf{f}}$ to solve for $\mathbf{u}$, where

\widehat{\mathcal{T}}=\begin{bmatrix}A_{(0)}&&&&&\\ A_{(1)}&\ddots&&&&\\ \vdots&\ddots&\ddots&&&\\ A_{(l)}&\ddots&\ddots&\ddots&&\\ &\ddots&\ddots&\ddots&\ddots&\\ &&A_{(l)}&\cdots&A_{(1)}&A_{(0)}\end{bmatrix} \quad (16)

is an $mn$ by $mn$ block Toeplitz matrix, which is generated by the matrix-valued function

\widehat{g}(x)=A_{(0)}+A_{(1)}e^{\mathbf{i}x}+\cdots+A_{(l)}e^{\mathbf{i}lx}, \quad (17)

and $\widehat{\mathbf{f}}$ is an appropriate vector resulting from the adopted discretization scheme. Note that the matrices $A_{(k)}$, $k=1,\dots,l$, are composed of the mass matrix $M_{m}$ and the stiffness matrix $K_{m}$.

Our approach is to first permute (16) to obtain the symmetric (indefinite) matrix

\mathcal{Y}\widehat{\mathcal{T}}=\begin{bmatrix}&&&A_{(l)}&\cdots&A_{(1)}&A_{(0)}\\ &&\iddots&\iddots&\iddots&\iddots&\\ &\iddots&\iddots&\iddots&\iddots&&\\ A_{(l)}&\iddots&\iddots&\iddots&&&\\ \vdots&\iddots&\iddots&&&&\\ A_{(1)}&\iddots&&&&&\\ A_{(0)}&&&&&&\end{bmatrix}, \quad (18)

and then construct a corresponding sine transform based preconditioner $\widehat{\mathcal{P}}$ for it. We first compute the following matrix-valued function from (17):

|\widehat{g}(x)|^{2}=\underbrace{A_{(0)}^{2}+A_{(1)}^{2}+\cdots+A_{(l)}^{2}}_{=:q_{0}(A)}+\underbrace{(A_{(0)}A_{(1)}+A_{(1)}A_{(2)}+\cdots+A_{(l-1)}A_{(l)})}_{=:q_{1}(A)}2\cos{(x)}+\cdots+\underbrace{(A_{(0)}A_{(l)})}_{=:q_{l}(A)}2\cos{(lx)},

where $q_{k}$, $k=0,\dots,l$, are matrix polynomials composed of $A_{(k)}$, $k=0,\dots,l$. Now, by expressing $\cos{(kx)}$, $k=1,\dots,l$, in terms of powers of $\cos{(x)}$, we can further rewrite $|\widehat{g}|^{2}$ as

|\widehat{g}(x)|^{2}=\overline{q}_{0}(A)+\overline{q}_{1}(A)\cos{(x)}+\cdots+\overline{q}_{l}(A)\cos^{l}{(x)},

where $\overline{q}_{k}$, $k=0,\dots,l$, are certain matrix polynomials composed of $A_{(k)}$, $k=0,\dots,l$.

We propose the following SPD preconditioner, which approximates the symbol $|\widehat{g}|$ in the eigenvalue sense, for $\mathcal{Y}\widehat{\mathcal{T}}$:

\widehat{\mathcal{P}}=\sqrt{I_{n}\otimes\overline{q}_{0}(A)+P_{n}\otimes\overline{q}_{1}(A)+\cdots+P_{n}^{l}\otimes\overline{q}_{l}(A)}, \quad (20)

where $P_{n}$ is the matrix given in (12).

Note that

\widehat{\mathcal{P}}=\sqrt{I_{n}\otimes U_{m}\overline{q}_{0}(\Lambda)U_{m}^{T}+P_{n}\otimes U_{m}\overline{q}_{1}(\Lambda)U_{m}^{T}+\cdots+P_{n}^{l}\otimes U_{m}\overline{q}_{l}(\Lambda)U_{m}^{T}}
=\sqrt{(S_{n}\otimes U_{m})\big(I_{n}\otimes\overline{q}_{0}(\Lambda)+D_{n}\otimes\overline{q}_{1}(\Lambda)+\cdots+D_{n}^{l}\otimes\overline{q}_{l}(\Lambda)\big)(S_{n}\otimes U_{m})^{T}}
=(S_{n}\otimes U_{m})\sqrt{I_{n}\otimes\overline{q}_{0}(\Lambda)+D_{n}\otimes\overline{q}_{1}(\Lambda)+\cdots+D_{n}^{l}\otimes\overline{q}_{l}(\Lambda)}\,(S_{n}\otimes U_{m})^{T}.

Hence, the product $\widehat{\mathcal{P}}^{-1}\mathbf{v}$ for any vector $\mathbf{v}$ can be computed effectively using a three-step procedure analogous to that in Section 3.1.1.
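A minimal sketch of this generalized solve is given below (hypothetical Python, assuming $U_{m}=S_{m}$ and given arrays `qbar[k]` holding the eigenvalues of $\overline{q}_{k}(\Lambda)$); it only illustrates how the powers $D_{n}^{k}$ of the eigenvalues of $P_{n}$ enter Step 2.

```python
import numpy as np

def apply_Phat_inv(v, qbar, n):
    """Apply the multistep preconditioner inverse of (20) in the diagonalized basis.
    qbar is a list of length l+1; qbar[k] holds the m eigenvalues of qbar_k(Lambda),
    and U_m = S_m is assumed so that S_n (x) S_m diagonalizes P_hat."""
    m = qbar[0].size
    im = np.arange(1, m + 1)
    jn = np.arange(1, n + 1)
    Sm = np.sqrt(2.0 / (m + 1)) * np.sin(np.outer(im, im) * np.pi / (m + 1))
    Sn = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(jn, jn) * np.pi / (n + 1))
    d = np.cos(jn * np.pi / (n + 1))                         # eigenvalues of P_n

    V = Sn @ (v.reshape(n, m) @ Sm)                          # Step 1
    # Step 2: entry (k, j) of the diagonalized P_hat^2 is sum_r d_k^r * qbar_r[j]
    denom = np.sqrt(sum(np.outer(d**r, qr) for r, qr in enumerate(qbar)))
    V = V / denom
    return (Sn @ (V @ Sm)).reshape(-1)                       # Step 3
```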

4.1 Eigenvalue analysis of $\widehat{\mathcal{P}}^{-1}\mathcal{Y}\widehat{\mathcal{T}}$

As before, we first introduce the matrix $\big(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\big)^{-1}$ as an auxiliary tool for showing that the eigenvalues of $\widehat{\mathcal{P}}^{-1}\mathcal{Y}\widehat{\mathcal{T}}$ are clustered. The following proposition guarantees the effectiveness of $\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}$ as a preconditioner for $\mathcal{Y}\widehat{\mathcal{T}}$, showing that the eigenvalues of $\{\big(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\big)^{-1}\mathcal{Y}\widehat{\mathcal{T}}\}_{n}$ are clustered around $\pm 1$, which leads to fast convergence of MINRES.

Proposition 8.

Let $\mathcal{Y}\widehat{\mathcal{T}},\widehat{\mathcal{T}}\in\mathbb{R}^{mn\times mn}$ be defined in (18) and (16), respectively. Then, $\big(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\big)^{-1}\mathcal{Y}\widehat{\mathcal{T}}$ is symmetric and orthogonal, and its eigenvalues are only $1$ and $-1$.

Proof: Writing the eigendecomposition of the symmetric matrix $\mathcal{Y}\widehat{\mathcal{T}}$ as $\mathcal{Y}\widehat{\mathcal{T}}=E^{\top}\mathcal{Q}_{0}E$, we have

\left(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\right)^{-1}\mathcal{Y}\widehat{\mathcal{T}}=\left(\sqrt{(\mathcal{Y}\widehat{\mathcal{T}})^{\top}\mathcal{Y}\widehat{\mathcal{T}}}\right)^{-1}\mathcal{Y}\widehat{\mathcal{T}}=\left(\sqrt{(\mathcal{Y}\widehat{\mathcal{T}})^{2}}\right)^{-1}\mathcal{Y}\widehat{\mathcal{T}}=E^{\top}(\mathcal{Q}_{0}^{2})^{-1/2}EE^{\top}\mathcal{Q}_{0}E=E^{\top}\widehat{\mathcal{Q}_{0}}E,

where $\widehat{\mathcal{Q}_{0}}$ is a diagonal matrix whose entries are either $-1$ or $1$. Therefore, $\big(\sqrt{\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}\big)^{-1}\mathcal{Y}\widehat{\mathcal{T}}$ is symmetric and orthogonal, and hence its eigenvalues are only $-1$ and $1$.

In addition, we require the following lemma and proposition for showing our main result.

Lemma 9.

Let $\widehat{\mathcal{T}},\widehat{\mathcal{P}}\in\mathbb{R}^{mn\times mn}$ be defined in (16) and (20), respectively. Then,

\textrm{rank}((\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{K}-\widehat{\mathcal{P}}^{2K})\leq 2Klm

for any positive integer $K$, provided that $n>2Klm$.

Proof: First, we observe from the expansion of $|\widehat{g}(x)|^{2}$ above that

{T}_{(n,m)}[|\widehat{g}|^{2}]={T}_{(n,m)}[q_{0}(A)+q_{1}(A)2\cos{(x)}+\cdots+q_{l}(A)2\cos{(lx)}]=\begin{bmatrix}q_{0}(A)&q_{1}(A)&\cdots&q_{l}(A)&&&\\ q_{1}(A)&\ddots&\ddots&\ddots&\ddots&&\\ \vdots&\ddots&\ddots&\ddots&\ddots&\ddots&\\ q_{l}(A)&\ddots&\ddots&\ddots&\ddots&\ddots&q_{l}(A)\\ &\ddots&\ddots&\ddots&\ddots&\ddots&\vdots\\ &&\ddots&\ddots&\ddots&\ddots&q_{1}(A)\\ &&&q_{l}(A)&\cdots&q_{1}(A)&q_{0}(A)\end{bmatrix}.

Note that we also have

{T}_{(n,m)}[|\widehat{g}|^{2}]={T}_{(n,m)}[\overline{q}_{0}(A)+\overline{q}_{1}(A)\cos{(x)}+\overline{q}_{2}(A)\cos^{2}{(x)}+\cdots+\overline{q}_{l}(A)\cos^{l}{(x)}]
=I_{n}\otimes\overline{q}_{0}(A)+T_{(n,1)}[\cos{(x)}]\otimes\overline{q}_{1}(A)+T_{(n,1)}[\cos^{2}{(x)}]\otimes\overline{q}_{2}(A)+\cdots+T_{(n,1)}[\cos^{l}{(x)}]\otimes\overline{q}_{l}(A)
=I_{n}\otimes\overline{q}_{0}(A)+T_{(n,1)}[\cos{(x)}]\otimes\overline{q}_{1}(A)+(T_{(n,1)}[\cos{(x)}]^{2}+{R}_{(2)})\otimes\overline{q}_{2}(A)+\cdots+(T_{(n,1)}[\cos{(x)}]^{l}+{R}_{(l)})\otimes\overline{q}_{l}(A),

where ${R}_{(k)}$, $k=2,\dots,l$, are low-rank matrices in the sense that $\textrm{rank}({R}_{(k)})\leq 2(k-1)$.

Recalling from (20) that $\mathcal{\widehat{P}}^{2}=I_{n}\otimes\overline{q}_{0}(A)+P_{n}\otimes\overline{q}_{1}(A)+\cdots+P_{n}^{l}\otimes\overline{q}_{l}(A)$ and noting that $P_{n}=T_{(n,1)}[\cos{(x)}]$, we have

{T}_{(n,m)}[|\widehat{g}|^{2}]=\underbrace{I_{n}\otimes\overline{q}_{0}(A)+P_{n}\otimes\overline{q}_{1}(A)+P_{n}^{2}\otimes\overline{q}_{2}(A)+\cdots+P_{n}^{l}\otimes\overline{q}_{l}(A)}_{=\mathcal{\widehat{P}}^{2}}+\underbrace{{R}_{(2)}\otimes\overline{q}_{2}(A)+\cdots+{R}_{(l)}\otimes\overline{q}_{l}(A)}_{=:\widehat{\mathcal{R}}_{1}}.

One can further show by direct computations that $\widehat{\mathcal{R}}_{1}$ has the following structure:

\widehat{\mathcal{R}}_{1}=\begin{bmatrix}\begin{matrix}*&\cdots&*\\ \vdots&&\vdots\\ *&\cdots&*\end{matrix}&&\\ &&\\ &&\begin{matrix}*&\cdots&*\\ \vdots&&\vdots\\ *&\cdots&*\end{matrix}\end{bmatrix},

where $*$ represents a nonzero entry, since this block structure is determined by the last term ${R}_{(l)}\otimes\overline{q}_{l}(A)$. Namely, $\widehat{\mathcal{R}}_{1}$ is a block matrix with one block in its Northwest corner and one in its Southeast corner, each of size $lm\times lm$.

We can now examine the structure of the matrix 𝒯^𝒯^𝒫^2{\color[rgb]{0,0,0}\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}-\widehat{\mathcal{P}}^{2}. By (16), we have

𝒯^𝒯^𝒫^2\displaystyle\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2} =\displaystyle= 𝒯^𝒯^T(n,m)[|g^|2]+T(n,m)[|g^|2]𝒫^2\displaystyle\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-{T}_{(n,m)}[|\widehat{g}|^{2}]+{T}_{(n,m)}[|\widehat{g}|^{2}]-\widehat{\mathcal{P}}^{2}
=\displaystyle= []+^1,\displaystyle\begin{bmatrix}&&&&&&&&\\ &&&&&&&&\\ &&&&&&&&\\ &&&&&&&&\\ &&&&&&*&\cdots&*\\ &&&&&&\vdots&&\vdots\\ &&&&&&*&\cdots&*\\ \end{bmatrix}+\widehat{\mathcal{R}}_{1},

where the first matrix is a block matrix with a lm×lmlm\times lm block in the corner. Thus, 𝒯^𝒯^𝒫^2{\color[rgb]{0,0,0}\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}}-\widehat{\mathcal{P}}^{2} is a block matrix with a lm×lmlm\times lm block in the two corners. We have by direct computations

(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{n_{\alpha}}(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2})\widehat{\mathcal{P}}^{2n_{\beta}}=\begin{bmatrix}*&\cdots&*&&&&&&\\ \vdots&&\vdots&&&&&&\\ *&\cdots&*&&&&&&\\ &&&&&&&&\\ &&&&&&*&\cdots&*\\ &&&&&&\vdots&&\vdots\\ &&&&&&*&\cdots&*\end{bmatrix}.

Similar to (14), by exploiting the simple structure of $\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2}$, we can show that, for given positive integers $n_{\alpha}$ and $n_{\beta}$, the matrix $(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{n_{\alpha}}(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2})\widehat{\mathcal{P}}^{2n_{\beta}}$ is a block matrix whose nonzero entries are confined to its Northwest and Southeast corners, in blocks of size $(n_{\alpha}+1)lm\times(n_{\beta}+1)lm$. Thus,

(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{K}-\widehat{\mathcal{P}}^{2K}=\sum_{i=0}^{K-1}(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{K-i-1}(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}}-\widehat{\mathcal{P}}^{2})\widehat{\mathcal{P}}^{2i}

is also a block matrix whose nonzero entries are confined to two corner blocks, each of size $Klm\times Klm$, provided that $n>2Klm$. Hence, $\mathrm{rank}\big((\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{K}-\widehat{\mathcal{P}}^{2K}\big)\leq 2Klm$.
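The rank bound used above for the correction terms $R_{(k)}$ can be checked numerically. The following MATLAB sketch (an illustration only, not part of the proof) compares $T_{n}[\cos^{k}(x)]$ with $T_{n}[\cos(x)]^{k}$ for small $k$, building the Toeplitz coefficients of $\cos^{k}(x)$ from its binomial expansion.

\begin{verbatim}
% Numerical check (illustrative only) of rank(T_n[cos^k x] - T_n[cos x]^k) <= 2(k-1).
n  = 64;
T1 = toeplitz([0; 0.5; zeros(n-2,1)]);          % T_n[cos x]: 1/2 on the off-diagonals
for k = 2:5
    t = zeros(n,1);                             % first column of the symmetric T_n[cos^k x]
    for j = 0:k
        p = 2*j - k;                            % frequency of the j-th term of ((e^{ix}+e^{-ix})/2)^k
        if p >= 0 && p < n
            t(p+1) = t(p+1) + nchoosek(k,j)/2^k;
        end
    end
    Tk = toeplitz(t);
    fprintf('k = %d: rank of correction = %d, bound 2(k-1) = %d\n', ...
            k, rank(Tk - T1^k), 2*(k-1));
end
\end{verbatim}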

The following results, namely Proposition 10 and Theorem 11, follow from arguments similar to those used to prove Proposition 5 and Theorem 6; their proofs are therefore left to the reader.

Proposition 10.

Let $\widehat{\mathcal{T}},\widehat{\mathcal{P}}\in\mathbb{R}^{mn\times mn}$ be defined in (16) and (20), respectively. Then, for any $\epsilon>0$, there exists an integer $K$ such that, for all $n>2Klm$,

(\widehat{\mathcal{T}}^{\top}\widehat{\mathcal{T}})^{-1/2}-\widehat{\mathcal{P}}^{-1}=\widehat{\mathcal{E}}_{0}+\widehat{\mathcal{R}}_{2},

where $\|\widehat{\mathcal{E}}_{0}\|_{2}\leq\epsilon$ and $\mathrm{rank}(\widehat{\mathcal{R}}_{2})\leq 2Klm$.

Theorem 11.

Let $\mathcal{Y}\mathcal{\widehat{T}},\mathcal{\widehat{P}}\in\mathbb{R}^{mn\times mn}$ be defined in (18) and (20), respectively. Then, for any $\epsilon>0$, there exists an integer $K$ such that, for all $n>2Klm$,

\mathcal{\widehat{P}}^{-1}\mathcal{Y}\mathcal{\widehat{T}}=\mathcal{\widehat{Q}}+\mathcal{\widehat{E}}+\mathcal{\widehat{R}},

where $\mathcal{\widehat{Q}}$ is both symmetric and orthogonal, $\|\mathcal{\widehat{E}}\|_{2}\leq\epsilon$, and $\mathrm{rank}(\mathcal{\widehat{R}})\leq 2Klm$.

Since both $\mathcal{\widehat{E}}$ and $\mathcal{\widehat{R}}$ in Theorem 11 are symmetric, by (BRANDTS20103100, Corollary 3) and Theorem 1, the preconditioned matrix sequence $\{\mathcal{\widehat{P}}^{-1}\mathcal{Y}\mathcal{\widehat{T}}\}_{n}$ has eigenvalues clustered around $\pm 1$, with a number of outliers independent of $n$. Hence, the convergence rate is independent of the number of time steps, and MINRES with $\mathcal{\widehat{P}}$ as a preconditioner can be expected to converge rapidly in exact arithmetic.
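For completeness, we recall the well-known MINRES convergence bound that quantifies this expectation: if the eigenvalues of a symmetric matrix are contained in $[-b,-a]\cup[a,b]$ with $0<a\leq b$, then the MINRES residuals satisfy

\frac{\|\mathbf{r}_{2k}\|_{2}}{\|\mathbf{r}_{0}\|_{2}}\leq 2\left(\frac{b-a}{b+a}\right)^{k}.

Once the finitely many outliers are accounted for, the clustering at $\pm 1$ allows $a$ and $b$ to be taken close to $1$, so the bound decays at a rate that does not deteriorate as $n$ grows.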

5 Numerical examples

In this section, we demonstrate the effectiveness of our proposed preconditioners. All numerical experiments are carried out using MATLAB on a Dell R640 server equipped with dual Xeon Gold 6246R 16-core 3.4GHz CPUs and 512GB RAM, running Ubuntu 20.04 LTS. The CPU time, denoted by “CPU” in the tables below, is measured in seconds using the MATLAB built-in functions tic/toc. All Krylov subspace solvers are implemented using the MATLAB built-in functions, and “Iter” refers to the number of iterations required for convergence. Furthermore, we choose a zero initial guess and a stopping tolerance of $10^{-6}$ based on the reduction of the relative residual norm. Throughout all examples, we consider finite difference methods on uniform spatial grids, which results in $M_{m}$ being the identity matrix and $K_{m}$ being diagonalizable by the discrete sine transform. Note that in the examples “DoF” denotes the number of degrees of freedom, and $\mathcal{C}_{H}$ is the existing absolute value block circulant preconditioner proposed in doi:10.1137/16M1062016.
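To make the setting concrete, the following minimal MATLAB sketch assembles the symmetrized all-at-once system for a two-dimensional heat equation discretized by backward Euler on a uniform grid (the setting of Example 1 below) and passes it to the built-in minres solver with the tolerance and zero initial guess stated above. We stress that the block-diagonal preconditioner used here, applied via the discrete sine transform, is only an illustrative stand-in and is not the proposed $\mathcal{P}_{H}$ or $\mathcal{P}_{\theta}$; the problem sizes, variable names, and right-hand side assembly are likewise assumptions of this sketch rather than the exact implementation used for the reported tables.

\begin{verbatim}
% Minimal sketch (illustrative only): symmetrized all-at-once system for the
% 2D heat equation, backward Euler in time, uniform grid, solved by MINRES
% with a block-diagonal SPD stand-in preconditioner applied via the DST.
mm = 31;  n = 32;  a = 1e-5;  Tend = 1;        % interior points per direction, time steps
h  = 1/(mm+1);  tau = Tend/n;  m = mm^2;
K1 = spdiags(ones(mm,1)*[-1 2 -1], -1:1, mm, mm)/h^2;
Km = a*(kron(K1, speye(mm)) + kron(speye(mm), K1));   % 2D Laplacian (Kronecker sum)
A  = speye(m) + tau*Km;                               % backward Euler time-step block

Tmat = kron(speye(n), A) - kron(spdiags(ones(n,1), -1, n, n), speye(m));
Jn   = speye(n);  Jn = Jn(:, n:-1:1);                 % anti-identity (time reversal)
Y    = kron(Jn, speye(m));
B    = Y*Tmat;                                        % symmetric all-at-once matrix

[x, y] = ndgrid((1:mm)*h);                            % initial condition of Example 1
u0  = x.*(x-1).*y.*(y-1);
rhs = zeros(m*n,1);  rhs(1:m) = u0(:);  rhs = Y*rhs;

S1 = sqrt(2/(mm+1))*sin(pi*(1:mm)'*(1:mm)/(mm+1));    % 1D discrete sine transform matrix
S  = kron(S1, S1);                                    % diagonalizes Km on the uniform grid
d  = 1 + tau*diag(S*(Km*S));                          % eigenvalues of A = I + tau*Km
Pfun = @(v) reshape(S*((S*reshape(v, m, n))./d), m*n, 1);   % applies (I_n kron A)^{-1}
u  = minres(B, rhs, 1e-6, 300, Pfun);
\end{verbatim}

The stand-in preconditioner above merely illustrates how a sine transform diagonalizable, symmetric positive definite preconditioner is passed to minres as a function handle; in the experiments below this role is played by $\mathcal{P}_{H}$ and $\mathcal{P}_{\theta}$.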

Example 1.

The first example, taken from doi:10.1137/16M1062016, is a two-dimensional problem of solving (1) with $\Omega=(0,1)\times(0,1)$, $u_{0}(x,y)=x(x-1)y(y-1)$, $a(x,y)=10^{-5}$, $g=0$, $f=0$, and $T=1$.

For this example, we adopt the backward Euler method and the Crank-Nicolson method in time and report the convergence results in Tables 1 and 2, respectively. While the iteration numbers of $\mathcal{C}_{H}$ grow quickly with the problem size for this ill-conditioned example, our proposed preconditioners $\mathcal{P}_{H}$ and $\mathcal{P}_{\theta}$ significantly outperform $\mathcal{C}_{H}$ in terms of both iteration numbers and CPU times. In addition, we observe that $\mathcal{P}_{\theta}$ is only slightly inferior in preconditioning effect to $\mathcal{P}_{H}$; in fact, in most cases their iteration numbers are identical. The use of $\mathcal{P}_{\theta}$ as a preconditioner appears excellent, considering that it is an approximation to $\mathcal{P}_{H}$.

Table 1: Convergence results with MINRES for Example 1 when $\theta=1$ (the backward Euler method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
252^{5} 252^{5} 30752 34 0.24 11 0.096 11 0.071
262^{6} 127008 48 0.65 11 0.17 11 0.18
272^{7} 516128 59 2.18 11 0.57 11 0.56
282^{8} 2080800 82 19.56 11 3.27 11 3.27
262^{6} 252^{5} 61504 34 0.30 11 0.12 11 0.11
262^{6} 254016 48 1.04 11 0.26 11 0.27
272^{7} 1032256 72 6.81 11 1.35 13 1.48
282^{8} 4161600 82 40.82 11 7.28 13 8.37
272^{7} 252^{5} 123008 34 0.49 13 0.21 13 0.22
262^{6} 508032 48 1.92 13 0.62 13 0.61
272^{7} 2064512 72 15.04 13 3.68 13 3.73
282^{8} 8323200 79 81.81 13 17.19 13 17.32
282^{8} 252^{5} 246016 34 0.78 13 0.35 15 0.38
262^{6} 1016064 48 4.62 13 1.61 15 1.78
272^{7} 4129024 71 34.91 13 8.28 15 9.34
282^{8} 16646400 79 157.74 14 35.95 15 38.32
\botrule
Table 2: Convergence results with MINRES for Example 1 when $\theta=0.5$ (the Crank-Nicolson method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
252^{5} 252^{5} 30752 33 0.45 11 0.098 11 0.082
262^{6} 127008 48 0.67 11 0.18 11 0.20
272^{7} 516128 59 2.42 11 0.63 11 0.63
282^{8} 2080800 82 18.83 11 3.44 11 3.49
262^{6} 252^{5} 61504 34 0.38 11 0.15 11 0.14
262^{6} 254016 48 1.10 11 0.30 13 0.36
272^{7} 1032256 73 7.28 11 1.44 13 1.51
282^{8} 4161600 83 44.01 11 7.64 13 8.74
272^{7} 252^{5} 123008 34 0.68 13 0.23 13 0.24
262^{6} 508032 48 2.20 13 0.71 13 0.67
272^{7} 2064512 72 16.06 13 3.98 13 3.97
282^{8} 8323200 80 87.92 13 18.48 13 19.00
282^{8} 252^{5} 246016 34 0.83 13 0.39 15 0.42
262^{6} 1016064 48 5.02 13 1.73 15 1.97
272^{7} 4129024 72 37.34 13 9.14 15 10.34
282^{8} 16646400 79 165.53 14 37.91 15 40.49
\botrule
Example 2.

This example is a two-dimensional problem of solving (1) with $\Omega=(0,1)^{2}$, $u_{0}(x,y)=x(1-x)y(1-y)$, $a(x,y)=10^{-5}\sin(\pi xy)$, $g=0$, $T=1$, and

f(x,y,t)=e^{-t}x(1-x)\big[2\times 10^{-5}\sin(\pi xy)-y(1-y)-\pi\times 10^{-5}\cos(\pi xy)\,x(1-2y)\big]\\
+e^{-t}y(1-y)\big[2\times 10^{-5}\sin(\pi xy)-\pi\times 10^{-5}\cos(\pi xy)\,y(1-2x)\big].

This example has the following closed-form analytical solution:

u(x,y,t)=e^{-t}x(1-x)y(1-y).

Since the exact solution of this example is known, we define the error measure

e_{n,m}=\|\mathbf{u}-\mathbf{u}^{*}\|_{\infty},

where $\mathbf{u}$ denotes the computed iterative solution of the linear system in (2), and $\mathbf{u}^{*}$ denotes the exact solution of the heat equation evaluated on the space-time mesh.
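As an illustration, the error measure above can be evaluated as in the following MATLAB sketch. The helper function and its conventions are hypothetical: it assumes that the computed all-at-once solution stacks the spatial unknowns time level by time level, with the spatial grid ordered column-major and time levels $t_{k}=k\tau$, $k=1,\dots,n$, $\tau=T/n$, which need not coincide with the ordering of the actual implementation.

\begin{verbatim}
function e_nm = example2_error(u, mm, n)
% Hypothetical helper (illustrative only): computes e_{n,m} = ||u - u*||_inf
% for Example 2, assuming u stacks the spatial unknowns time level by time
% level (column-major in space) on an mm-by-mm interior uniform grid with n
% time steps on [0,1], i.e. time levels t_k = k/n.
h = 1/(mm+1);  m = mm^2;  tau = 1/n;
[x, y] = ndgrid((1:mm)*h);                      % interior grid points
ustar  = zeros(m*n, 1);
for k = 1:n
    uk = exp(-k*tau) * (x.*(1-x).*y.*(1-y));    % exact solution at t_k
    ustar((k-1)*m+1 : k*m) = uk(:);
end
e_nm = norm(u - ustar, inf);
end
\end{verbatim}

For instance, example2_error(u, 31, 32) would evaluate the error for the smallest problem size considered below, given a solution vector u stored with the assumed ordering.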

Tables 3 and 4 show the convergence results of MINRES using the backward Euler method and the Crank-Nicolson method in time, respectively, and Table 5 shows the corresponding errors obtained with the different preconditioners. Similar to the previous example, our proposed preconditioners $\mathcal{P}_{H}$ and $\mathcal{P}_{\theta}$ considerably surpass $\mathcal{C}_{H}$ in this variable-coefficient case.

Table 3: Convergence results with MINRES for Example 2 when $\theta=1$ (the backward Euler method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
252^{5} 252^{5} 30752 107 0.63 11 0.084 11 0.084
262^{6} 127008 141 1.95 11 0.17 12 0.16
272^{7} 516128 218 7.91 11 0.57 13 0.64
282^{8} 2080800 315 62.44 12 3.82 16 4.87
262^{6} 252^{5} 61504 106 1.00 11 0.13 13 0.15
262^{6} 254016 154 3.20 11 0.25 13 0.31
272^{7} 1032256 219 21.27 13 1.44 14 1.49
282^{8} 4161600 307 155.67 13 8.18 17 10.69
272^{7} 252^{5} 123008 107 1.49 13 0.22 13 0.21
262^{6} 508032 160 6.30 13 0.58 14 0.63
272^{7} 2064512 218 47.22 13 3.44 15 4.55
282^{8} 8323200 303 315.43 13 17.64 18 23.49
282^{8} 252^{5} 246016 118 2.50 14 0.35 15 0.36
262^{6} 1016064 177 14.86 14 1.58 15 1.45
272^{7} 4129024 220 111.71 14 10.00 17 11.22
282^{8} 16646400 299 601.57 15 39.08 19 48.44
\botrule
Table 4: Convergence results with MINRES for Example 2 when $\theta=0.5$ (the Crank-Nicolson method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
252^{5} 252^{5} 30752 106 0.61 11 0.085 11 0.076
262^{6} 127008 141 1.94 11 0.19 12 0.18
272^{7} 516128 216 8.75 11 0.70 13 0.70
282^{8} 2080800 309 67.75 12 4.22 17 5.63
262^{6} 252^{5} 61504 106 0.96 11 0.12 13 0.14
262^{6} 254016 154 3.38 11 0.27 13 0.32
272^{7} 1032256 218 18.64 11 1.16 14 1.42
282^{8} 4161600 304 164.44 13 9.68 17 11.38
272^{7} 252^{5} 123008 107 1.53 13 0.21 13 0.21
262^{6} 508032 160 6.78 13 0.63 13 0.66
272^{7} 2064512 218 48.20 13 4.12 15 4.63
282^{8} 8323200 301 337.99 13 18.98 19 26.58
282^{8} 252^{5} 246016 117 2.63 14 0.36 15 0.38
262^{6} 1016064 177 15.35 14 1.47 15 1.56
272^{7} 4129024 221 114.99 14 9.39 17 11.16
282^{8} 16646400 299 648.21 15 42.38 19 52.54
\botrule
Table 5: Error results with MINRES for Example 2
Backward Euler   Crank-Nicolson
$n$   $m+1$   DoF   $\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$   $\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
252^{5} 252^{5} 30752 6.14e-4 6.14e-4 6.14e-4 3.12e-6 3.12e-6 3.12e-6
262^{6} 127008 6.14e-4 6.14e-4 6.14e-4 3.12e-6 3.12e-6 3.12e-6
272^{7} 516128 6.14e-4 6.14e-4 6.14e-4 3.12e-6 3.12e-6 3.12e-6
282^{8} 2080800 6.14e-4 6.14e-4 6.14e-4 3.12e-6 3.12e-6 3.12e-6
262^{6} 252^{5} 61504 3.08e-4 3.08e-4 3.08e-4 8.12e-7 7.99e-7 8.02e-7
262^{6} 254016 3.08e-4 3.08e-4 3.08e-4 8.05e-7 8.01e-7 8.03e-7
272^{7} 1032256 3.08e-4 3.08e-4 3.08e-4 8.15e-7 8.07e-7 8.01e-7
282^{8} 4161600 3.08e-4 3.08e-4 3.08e-4 8.15e-7 8.02e-7 8.02e-7
272^{7} 252^{5} 123008 1.54e-4 1.54e-4 1.54e-4 2.38e-7 1.99e-7 2.04e-7
262^{6} 508032 1.54e-4 1.54e-4 1.54e-4 4.08e-7 2.00e-7 2.00e-7
272^{7} 2064512 1.54e-4 1.54e-4 1.54e-4 4.78e-7 2.03e-7 2.56e-7
282^{8} 8323200 1.54e-4 1.54e-4 1.54e-4 4.10e-7 7.08e-7 5.87e-7
282^{8} 252^{5} 246016 7.71e-5 7.71e-5 7.71e-5 1.79e-7 5.79e-8 4.98e-8
262^{6} 1016064 7.71e-5 7.71e-5 7.71e-5 6.37e-7 5.91e-8 6.93e-8
272^{7} 4129024 7.71e-5 7.71e-5 7.71e-5 6.50e-7 1.53e-7 2.58e-7
282^{8} 16646400 7.71e-5 7.71e-5 7.71e-5 3.76e-7 6.42e-7 7.60e-7
\botrule
Example 3.

The last example is a three-dimensional problem of solving (1) with $\Omega=(0,1)^{3}$, $u_{0}(x,y,z)=x(x-1)y(y-1)z(z-1)$, $a(x,y,z)=10^{-3}$, $g=0$, $f=0$, and $T=1$.

Tables 6 and 7 show the convergence results of MINRES when the backward Euler method and the Crank-Nicolson method, respectively, are used for the time discretization. Overall, the preconditioners $\mathcal{P}_{H}$ and $\mathcal{P}_{\theta}$ still perform better than $\mathcal{C}_{H}$ with regard to iteration numbers. In this relatively well-conditioned case, all preconditioned solvers have comparable CPU times.
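For reference, a minimal MATLAB sketch of how the three-dimensional matrix $K_{m}$ of this example can be assembled as a Kronecker sum of one-dimensional discrete Laplacians on the uniform grid is given below; this is the standard second-order finite difference construction, and the sizes used are illustrative assumptions only.

\begin{verbatim}
% Illustrative assembly of the 3D discrete Laplacian of Example 3 as a
% Kronecker sum (uniform grid, second-order central differences); it is
% diagonalized by the threefold Kronecker product of 1D sine transforms.
mm = 7;  h = 1/(mm+1);  a = 1e-3;               % mm interior points per direction
K1 = spdiags(ones(mm,1)*[-1 2 -1], -1:1, mm, mm)/h^2;
I1 = speye(mm);
Km = a*(kron(kron(K1,I1),I1) + kron(kron(I1,K1),I1) + kron(kron(I1,I1),K1));
lam1 = (2 - 2*cos(pi*(1:mm)'/(mm+1)))/h^2;      % 1D eigenvalues
% the eigenvalues of Km are a*(lam1(i) + lam1(j) + lam1(k)), i,j,k = 1..mm
\end{verbatim}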

Table 6: Convergence results with MINRES for Example 3 when $\theta=1$ (the backward Euler method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
232^{3} 232^{3} 2744 11 0.034 10 0.027 13 0.029
242^{4} 27000 18 0.12 12 0.093 14 0.10
252^{5} 238328 21 0.54 13 0.39 16 0.49
262^{6} 2000376 21 5.91 13 5.06 16 6.07
242^{4} 232^{3} 5488 14 0.032 12 0.037 14 0.040
242^{4} 54000 18 0.17 15 0.15 17 0.18
252^{5} 476656 21 1.08 15 0.85 18 0.96
262^{6} 4000752 24 15.77 17 13.32 18 14.01
252^{5} 232^{3} 10976 14 0.059 14 0.070 15 0.075
242^{4} 108000 18 0.29 17 0.27 19 0.31
252^{5} 953312 22 2.37 18 2.00 21 2.30
262^{6} 8001504 24 32.41 19 30.75 22 36.00
262^{6} 232^{3} 21952 14 0.084 15 0.094 17 0.13
242^{4} 216000 18 0.49 18 0.50 21 0.55
252^{5} 1906624 22 6.13 21 7.05 24 7.94
262^{6} 16003008 24 65.54 21 66.41 24 75.87
\botrule
Table 7: Convergence results with MINRES for Example 3 when $\theta=0.5$ (the Crank-Nicolson method)
$\mathcal{C}_{H}$   $\mathcal{P}_{H}$   $\mathcal{P}_{\theta}$
$n$   $m+1$   DoF   Iter   CPU   Iter   CPU   Iter   CPU
232^{3} 232^{3} 2744 14 0.040 10 0.034 13 0.030
242^{4} 27000 18 0.10 13 0.095 15 0.11
252^{5} 238328 21 0.61 13 0.44 17 0.55
262^{6} 2000376 21 6.23 13 5.11 17 6.71
242^{4} 232^{3} 5488 14 0.034 12 0.037 15 0.048
242^{4} 54000 18 0.20 15 0.19 17 0.18
252^{5} 476656 21 1.10 15 0.90 20 1.16
262^{6} 4000752 25 17.49 17 13.98 20 16.36
252^{5} 232^{3} 10976 14 0.066 14 0.064 16 0.081
242^{4} 108000 18 0.30 17 0.30 19 0.35
252^{5} 953312 23 3.03 18 2.86 22 3.46
262^{6} 8001504 24 34.45 19 32.64 22 37.47
262^{6} 232^{3} 21952 14 0.10 15 0.095 17 0.11
242^{4} 216000 18 0.51 18 0.50 22 0.61
252^{5} 1906624 23 7.56 21 7.23 25 8.49
262^{6} 16003008 24 69.35 21 69.73 26 85.77
\botrule

6 Conclusion

We have developed in this work a sine transform based preconditioning method for solving a wide range of time-dependent PDEs. Namely, our proposed preconditioners can be applied to a large class of symmetrized all-at-once systems $\mathcal{Y}\mathcal{\widehat{T}}$ resulting from discretizing the underlying equation. The method is generic and can be used with high-order time discretization schemes.

We have adopted MINRES as the Krylov subspace solver by exploiting the symmetry of the permuted matrix. In particular, we emphasize that our sine transform based preconditioners, which are symmetric positive definite and efficient to implement, can be regarded as optimal or quasi-optimal preconditioners in view of previous work on the spectral distribution of symmetrized Toeplitz matrix sequences MazzaPestana2018 ; Ferrari2019. We observe a quasi-optimal behavior in the case of multilevel coefficient matrices, which is a good result given the theoretical barriers proved in SeTy2 for matrix algebra preconditioners with equimodular transforms such as multilevel circulants (see also nega-gen for further discussion and more general results).

Also, being specifically designed for MINRES, our preconditioners consistently outperform an existing block circulant preconditioner in both iteration counts and CPU times, as shown in each numerical example, and they achieve rapid convergence especially in the ill-conditioned case. We stress that the ill-conditioning of the original linear system stems mainly from the spatial and temporal discretization schemes used for the underlying equation. Besides effective preconditioning, one remedy is therefore to develop better discretization schemes in space and time in the first place, which is a potential direction for future research. As long as the resulting linear systems have a lower triangular, banded, block Toeplitz structure, our proposed preconditioners can be incorporated; this structure, arising mainly from the all-at-once time discretization, is not difficult to obtain.

Our preconditioning technique (i.e., $\mathcal{P}_{H}$) utilizes the fast diagonalizability of the Laplacian matrix, which relies mainly on the uniform spatial grid setting. We emphasize, however, that such a uniform grid assumption is a common starting point in the all-at-once preconditioning context, as in many existing works such as LIN2021110221 ; WU2021110076. Along this line, a direction for future research is to develop efficient preconditioning methods when the physical domain is irregular; in this setting, the theory of generalized locally Toeplitz sequences could be exploited for the spectral and convergence analysis GLT-LAA2 ; Ba. Also, our proposed preconditioner $\mathcal{P}_{\theta}$, which does not require such fast diagonalizability and has shown an excellent preconditioning effect in the numerical tests, deserves further investigation.

As another line of future work, it would be interesting to combine the symmetrized MINRES solver with the block $\epsilon$-circulant preconditioning technique doi:10.1137/20M1316354 ; doi:10.1137/19M1309869 ; WU2021110076, which so far has only been shown to be applicable with GMRES. Such a combination would further advance the pioneering block circulant preconditioning theory (combined with MINRES) originally proposed in doi:10.1137/16M1062016, which is still at an early stage of development.

As already observed, a further interesting case is that of variable coefficients. When $a(x)$ is not constant, or when there is also a dependence on time, that is $a\equiv a(x,t)$, the spectral analysis can be performed and the related solvers can be designed via multilevel GLT tools (see book-GLT-II), assuming that both $n$ and $m$ are allowed to tend to infinity. In this setting, instead of dealing with a matrix-valued symbol (whose size is the fixed parameter $m$), we have $d$ further levels in the structure, where $d$ is the spatial dimensionality of the problem, but with scalar-valued symbols. This setting is of course challenging (see SeTy2 ; nega-gen and references therein) and will be considered in future work, even though the numerical tests in this paper already seem promising in this direction.

Declarations

Ethical Approval and Consent to participate

Not applicable.

Consent for publication

The authors consent to publication of this work in Numerical Algorithms of Springer Nature, when accepted.

Human and Animal Ethics

Not applicable.

Availability of supporting data

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Competing interests

The authors declare no competing interests.

Funding

The work of S. Hon was supported in part by the Hong Kong RGC under grant 22300921, a start-up allowance from the Croucher Foundation, and a Tier 2 Start-up Grant from Hong Kong Baptist University. The work of S. Serra-Capizzano was supported in part by INDAM-GNCS. Furthermore, the work of Stefano Serra-Capizzano was funded by the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955701. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Belgium, France, Germany, and Switzerland. Stefano Serra-Capizzano is also grateful for the support of the Laboratory of Theory, Economics and Systems – Department of Computer Science at Athens University of Economics and Business.

Authors’ contributions

S. Hon developed the methodology, wrote the main manuscript text, and conducted the main numerical experiments. P. Y. Fung improved the numerical experiments and modified the manuscript text. J. Dong modified the manuscript text. S. Serra-Capizzano validated the methodology and modified the manuscript text. All authors reviewed the manuscript.

References

  • (1) Barakitis, N., Ekstrom, S. E., Vassalos, P.: Preconditioners for fractional diffusion equations based on the spectral symbol, Numerical Linear Algebra with Applications, e2441 (2022) https://doi.org/10.1002/nla.2441
  • (2) Barbarino, G.: A systematic approach to reduced GLT. BIT Numerical Mathematics. 1–63 (2021) https://doi.org/10.1007/s10543-021-00896-7
  • (3) Benzi, M., Golub, G. H.: Bounds for the entries of matrix functions with applications to preconditioning. BIT Numerical Mathematics. 39(3), 417–438 (1999) https://link.springer.com/article/10.1023/A:1022362401426
  • (4) Bertaccini, D.: Reliable preconditioned iterative linear solvers for some integrators. Numerical Linear Algebra with Applications. 8(2), 111–125 (2001) https://doi.org/10.1002/1099-1506(200103)8:2<111::AID-NLA234>3.0.CO;2-Q
  • (5) Bertaccini, D., Ng, M. K.: Block $\omega$-circulant preconditioners for the systems of differential equations. Calcolo. 40(2), 71–90 (2003) https://link.springer.com/article/10.1007/s100920300004
  • (6) Bini, D., Benedetto, F.: A new preconditioner for the parallel solution of positive definite Toeplitz systems. in Proc. Second ACM Symp. on Parallel Algorithms and Architectures, pp. 220–223 (1990). https://doi.org/10.1145/97444.97688
  • (7) Bini, D., Capovani, M.: Spectral and computational properties of band symmetric Toeplitz matrices. Linear Algebra and Its Applications. 52, 99–126 (1983) https://doi.org/10.1016/0024-3795(83)90009-5
  • (8) Brandts, J. H., da Silva, R. R.: Computable eigenvalue bounds for rank-$k$ perturbations. Linear Algebra and Its Applications. 432(12), 3100–3116 (2010) https://doi.org/10.1016/j.laa.2010.02.010
  • (9) Chan, R. H., Ng, M. K.: Conjugate gradient methods for Toeplitz systems. SIAM Review. 38(3), 427–482 (1996) https://doi.org/10.1137/s0036144594276474
  • (10) Chan, T. F.: An optimal circulant preconditioner for Toeplitz systems. SIAM Journal on Scientific and Statistical Computing. 9(4), 766–771 (1988) https://doi.org/10.1137/0909051
  • (11) Cocquet, P. H., Gander, M. J.: How large a shift is needed in the shifted Helmholtz preconditioner for its effective inversion by multigrid?. SIAM Journal on Scientific and Statistical Computing, 39(2), A438–A478 (2017) https://doi.org/10.1137/15M102085X
  • (12) Danieli, F., Wathen, A. J.: All-at-once solution of linear wave equations. Numerical Linear Algebra with Applications. e2386, (2021) https://doi.org/10.1002/nla.2386
  • (13) Di Benedetto, F.: Analysis of preconditioning techniques for ill-conditioned Toeplitz matrices. SIAM Journal on Scientific Computing. 16(3), 682–697 (1995) https://doi.org/10.1137/0916041
  • (14) Di Benedetto, F.: Solution of Toeplitz normal equations by sine transform based preconditioning. Linear Algebra and Its Applications. 285(1-3), 229–255 (1998) https://doi.org/10.1016/s0024-3795(98)10115-5
  • (15) Di Benedetto, F., Serra-Capizzano, S.: A unifying approach to abstract matrix algebra preconditioning. Numerische Mathematik. 82, 57–90 (1999) https://doi.org/10.1007/s002110050411
  • (16) Dobrev, V. A., Kolev, Tz., Petersson, N.A., Schroder J.B.: Two-level convergence theory for multigrid reduction in time (MGRIT). SIAM Journal on Scientific Computing. 39(5), S501–S527 (2017) https://doi.org/10.1137/16m1074096
  • (17) Donatelli, M., Mazza, M., Serra-Capizzano, S.: Spectral analysis and structure preserving preconditioners for fractional diffusion equations. Journal of Computational Physics. 307, 262–279 (2016) https://doi.org/10.1016/j.jcp.2015.11.061
  • (18) Falgout, R. D., Friedhoff, S., Kolev, T. V., MacLachlan, S. P., Schroder J. B.: Parallel time integration with multigrid. SIAM Journal on Scientific Computing. 36(6), C635–C661 (2014) https://doi.org/10.1002/pamm.201410456
  • (19) Ferrari, P., Furci, I., Hon, S., Mursaleen, M. A., Serra-Capizzano, S.: The eigenvalue distribution of special 2-by-2 block matrix-sequences with applications to the case of symmetrized Toeplitz structures. SIAM Journal on Matrix Analysis and Applications. 40(3), 1066–1086 (2019) https://doi.org/10.1137/18m1207399
  • (20) Gander, M.J.: 50 years of time parallel time integration. Multiple Shooting and Time Domain Decomposition Methods. 69–113. Springer, Cham (2015). Series Editors: T. Carraro, M. Geiger, S. Körkel, R. Rannacher https://doi.org/10.1007/978-3-319-23321-5_3
  • (21) Gander, M. J., Halpern, L., Ryan, J., Tran, T. T. B.: A direct solver for time parallelization. Domain Decomposition Methods in Science and Engineering XXII. 491–499. Springer, Cham (2016). Series Editors: T, Dickopf, M, J. Gander, L. Halpern, R. Krause, L.F. Pavarino https://doi.org/10.1007/978-3-319-18827-0_50
  • (22) Gander, M. J., Neumüller, M.: Analysis of a new space-time parallel multigrid algorithm for parabolic problems. SIAM Journal on Scientific Computing. 38(4) A2173–A2208 (2016) https://doi.org/10.1137/15m1046605
  • (23) Gander, M. J., Vandewalle, S.: Analysis of the parareal time-parallel time-integration method. SIAM Journal on Scientific Computing. 29(2) 556–578 (2007) https://doi.org/10.1137/05064607x
  • (24) Garoni, C., Serra-Capizzano, S.: Generalized locally Toeplitz sequences: Theory and applications. Volume 1, Springer, Cham (2017) https://doi.org/10.1007/978-3-319-53679-8_8
  • (25) Garoni, C., Serra-Capizzano, S.: Generalized locally Toeplitz sequences: Theory and applications. Volume 2, Springer, Cham (2018) https://doi.org/10.1007/978-3-030-02233-4
  • (26) Goddard, A., Wathen, A.: A note on parallel preconditioning for all-at-once evolutionary PDEs. Electronic Transactions on Numerical Analysis. 51, 135–150 (2019) https://doi.org/10.1553/etna_vol51s135
  • (27) Hon, S.: Optimal block circulant preconditioners for block Toeplitz systems with application to evolutionary PDEs. Journal of Computational and Applied Mathematics. 407, 113965 (2021). https://doi.org/10.1016/j.cam.2021.113965
  • (28) Hon, S., Serra-Capizzano, S., Wathen, A.: Band-Toeplitz preconditioners for ill-conditioned Toeplitz systems. BIT Numerical Mathematics. (2021) https://doi.org/10.1007/s10543-021-00889-6
  • (29) Horton, G., Vandewalle, S.: A space-time multigrid method for parabolic partial differential Equations. SIAM Journal on Scientific Computing. 16(4), 848–864 (1995). https://doi.org/10.1137/0916050
  • (30) Lin, X. L., Ng, M. K.: An all-at-once preconditioner for evolutionary partial differential equations. SIAM Journal on Scientific Computing. 43(4), A2766–A2784 (2021) https://doi.org/10.1137/20m1316354
  • (31) Lin, X. L., Ng, M. K., Zhi, Y.: A parallel-in-time two-sided preconditioning for all-at-once system from a non-local evolutionary equation with weakly singular kernel. Journal of Computational Physics. 434, 110221 (2021) https://doi.org/10.1016/j.jcp.2021.110221
  • (32) Lions, J. L., Maday, Y., Turinici, G.: A "parareal" in time discretization of PDE’s. Comptes rendus de l’Académie des sciences. Série 1, Mathématique. 332(7), 661–668 (2001)
  • (33) Liu, J., Wu, S. L.: A fast block $\alpha$-circulant preconditioner for all-at-once systems from wave equations. SIAM Journal on Matrix Analysis and Applications. 41(4), 1912–1943 (2021) https://doi.org/10.1137/19m1309869
  • (34) Mazza, M., Pestana, J.: Spectral properties of flipped Toeplitz matrices and related preconditioning. BIT Numerical Mathematics. 59, 463–482 (2018) https://doi.org/10.1007/s10543-018-0740-y
  • (35) McDonald, E., Hon, S., Pestana, J., Wathen, A.: Preconditioning for nonsymmetry and time-dependence. In Lecture Notes in Computational Science and Engineering. 116, 81–91 (2017). Springer International Publishing. https://doi.org/10.1007/978-3-319-52389-7_7
  • (36) McDonald, E., Pestana, J., Wathen, A.: Preconditioning and iterative solution of all-at-once systems for evolutionary partial differential equations, SIAM Journal on Scientific Computing, 40(2) A1012–A1033 (2018) https://doi.org/10.1137/16m1062016
  • (37) Ng, M. K.: Iterative Methods for Toeplitz Systems. Numerical Mathematics and Scientific Computation. Oxford University Press, New York (2004)
  • (38) Ng, M. K., Pan, J.: Approximate inverse circulant-plus-diagonal preconditioners for Toeplitz-plus-diagonal matrices. SIAM Journal on Scientific Computing, 32(3), 1442–1464 (2010) https://doi.org/10.1137/080720280
  • (39) Serra-Capizzano, S.: The GLT class as a generalized Fourier analysis and applications. Linear Algebra and Its Applications. 419(1), 180–233 (2006) https://doi.org/10.1016/j.laa.2006.04.012
  • (40) Serra-Capizzano, S.: Toeplitz preconditioners constructed from linear approximation processes. SIAM Journal on Matrix Analysis and Applications. 20(2), 446–465 (1999) https://doi.org/10.1137/s0895479897316904
  • (41) Serra-Capizzano, S.: Matrix algebra preconditioners for multilevel Toeplitz matrices are not superlinear. Special issue on structured and infinite systems of linear equations. Linear Algebra and Its Applications. 343/344, 303–319 (2002) https://dx.doi.org/10.1016/S0024-3795(01)00361-5
  • (42) Serra-Capizzano, S., Tilli, P.: On unitarily invariant norms of matrix-valued linear positive operators. Journal of Inequalities and Applications. 7(3), 309–330 (2002) https://doi.org/10.1155/s1025583402000152
  • (43) Serra-Capizzano, S., Tyrtyshnikov., E.: How to prove that a preconditioner cannot be superlinear, Mathematics of Computation, 72(243), 1305–1316 (2003) https://doi.org/10.1090/s0025-5718-03-01506-0
  • (44) Strang, G.: A Proposal for Toeplitz Matrix Calculations, Studies in Applied Mathematics, 74(2), 171–176 (1986) https://doi.org/10.1002/sapm1986742171
  • (45) Wathen, A.: Preconditioning. Acta Numerica. 24, 329–376 (2015) https://doi.org/10.1017/s0962492915000021
  • (46) Wu, S. L., Zhou, T.: Parallel implementation for the two-stage SDIRK methods via diagonalization. Journal of Computational Physics. 428, 110076 (2021) https://doi.org/10.1016/j.jcp.2020.110076