
A note on parallel preconditioning for the all-at-once solution of Riesz fractional diffusion equations

Xian-Ming Gu [email protected] Yong-Liang Zhao [email protected] Xi-Le Zhao [email protected] Bruno Carpentieri [email protected] Yu-Yun Huang [email protected] School of Economic Mathematics/Institute of Mathematics,
Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, P.R. China
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence,
University of Groningen, Nijenborgh 9, P.O. Box 407, 9700 AK Groningen, The Netherlands
School of Mathematical Sciences,
University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, P.R. China
Facoltà di Scienze e Tecnologie informatiche,
Libera Università di Bolzano, Dominikanerplatz 3 - piazza Domenicani, 3 Italy - 39100, Bozen-Bolzano.
Abstract

The p-step backward difference formula (BDF) for solving systems of ODEs can be formulated as all-at-once linear systems that are solved by parallel-in-time preconditioned Krylov subspace solvers (see McDonald, Pestana, and Wathen [SIAM J. Sci. Comput., 40(2) (2018): A1012-A1033] and Lin and Ng [arXiv:2002.01108, 2020, 17 pages]). However, when the BDFp (2\leq p\leq 6) method is used to solve time-dependent PDEs, the generalization of these studies is not straightforward, as the p-step BDF is not self-starting for p\geq 2. In this note, we focus on the 2-step BDF, which is often superior to the trapezoidal rule for solving Riesz fractional diffusion equations, and show that it results in an all-at-once discretized system that is a low-rank perturbation of a block triangular Toeplitz system. We first give an estimate of the condition number of the all-at-once systems and then, capitalizing on previous work, we propose two block circulant (BC) preconditioners. Both the invertibility of these two BC preconditioners and the eigenvalue distributions of the preconditioned matrices are discussed in detail. An efficient implementation of these BC preconditioners is also presented, including the fast computation of dense structured Jacobi matrices. Finally, numerical experiments involving both one- and two-dimensional Riesz fractional diffusion equations are reported to support our theoretical findings.

keywords:
Backward difference formula, all-at-once discretization, parallel-in-time preconditioning, Krylov subspace solver, fractional diffusion equation.
journal: Numerical Mathematics: Theory, Methods and Applications

1 Introduction

In this paper, we are particularly interested in the efficient numerical solution of evolutionary partial differential equations (PDEs) with a first-order temporal derivative and space fractional-order derivative(s). These models arise in various scientific applications in different fields, including physics [1], bioengineering [2], hydrology [3], and finance [4], owing to the potential of fractional calculus to describe rather accurately natural processes that maintain long-memory and hereditary properties in complex systems [5, 2]. In particular, fractional diffusion equations can provide an adequate and accurate description of transport processes that exhibit anomalous diffusion, for example subdiffusive phenomena and Lévy flights [6], which cannot be modelled properly by second-order diffusion equations. As most fractional diffusion equations cannot be solved analytically, approximate numerical solutions are sought by using efficient numerical methods such as, e.g., (compact) finite difference [3, 8, 7, 9, 10, 11], finite element [12] and spectral (Galerkin) methods [13, 14, 15].

Most numerical techniques proposed in the literature for solving this class of problems are conventional time-stepping schemes. They solve the underlying evolutionary PDEs with space fractional derivative(s) by marching in time sequentially, one level after the other. As many time steps are usually necessary to balance the numerical errors arising from the spatial discretization, these conventional time-stepping schemes can be very time-consuming. This concern motivates the recent development of parallel-in-time (PinT) numerical solutions for evolutionary PDEs (especially with space fractional derivative(s)) including, e.g., the inverse Laplace transform method [16], the MGRIT method [17, 18, 12, 19], the exponential integrator [20] and the parareal method [21]. One class of PinT methods, the space-time methods, solves the evolutionary PDEs at all time levels simultaneously by performing an all-at-once discretization that results in a large-scale linear system, typically solved by preconditioned Krylov subspace methods; refer, e.g., to [22, 24, 29, 27, 23, 25, 28, 26, 30, 31] for details. However, most of these studies only focus on the numerical solution of one-dimensional space fractional diffusion equations [23, 25, 30, 32], due to the huge computational cost required for high-dimensional problems.

Recently, McDonald, Pestana and Wathen proposed in [33] a block circulant (BC) preconditioner to accelerate the convergence of Krylov subspace methods for solving the all-at-once linear system arising from the p-step BDF temporal discretization of evolutionary PDEs. Parallel experiments with the BC preconditioner in [33] are reported by Goddard and Wathen in [34]. In [35], a generalized version of the BC preconditioner has been proposed by Lin and Ng, who introduced a parameter \alpha\in(0,1) into the top-right block of the BC preconditioner that can be fine-tuned to handle efficiently the case of very small diffusion coefficients. Both the BC and the generalized BC preconditioners use a modified diagonalization technique originally proposed by Maday and Rønquist [22, 24]. The investigations in [33, 34, 35, 36] mainly focus on the 1-step BDF (i.e., the backward Euler method) for solving the underlying evolutionary PDEs, which results in an exact block lower triangular Toeplitz (BLTT) all-at-once system. On the other hand, when the BDFp (2\leq p\leq 6) method is used to discretize the evolutionary PDEs, a complete BLTT all-at-once system cannot be obtained, as implicit schemes based on BDFp (2\leq p\leq 6) for evolutionary PDEs are not self-starting [13, 39, 40, 37, 41]. For example, when we establish a fully discretized scheme based on the popular BDF2 for evolutionary PDEs, we often need to use the backward Euler method to compute the solution at the first time level [42, 43, 44, 10].

In this study, we particularly consider the second-order accurate implicit difference BDF2 scheme for solving the one- and two-dimensional Riesz fractional diffusion equations (RFDEs). Although the Crank-Nicolson (C-N) method [8] is a very popular option for solving such RFDEs, it is only A-stable rather than L-stable [40, 45]. By contrast, the BDF2 scheme, which has stiff decay, can be more “stable” and slightly cheaper: it is always unconditionally stable, and the numerical solution is often guaranteed to be positive and physically more reliable near the initial time for evolutionary PDEs with either integer- or fractional-order spatial derivatives; see [10, 37, 42, 46] for details. After the spatial discretization of the Riesz fractional derivative(s), we reformulate the BDF2 scheme for the semi-discretized system of ODEs into an all-at-once system, whose coefficient matrix is a BLTT matrix with a low-rank perturbation. Then, we tightly estimate the condition number of the all-at-once systems and adapt the generalized BC (also including the standard BC) preconditioner to such an all-at-once system. Meanwhile, the invertibility of the generalized BC preconditioner is discussed, complementing the work in [33, 35], and the eigenvalue distributions of the preconditioned matrices, which dictate the convergence rate of Krylov subspace methods, are preliminarily investigated. From these discussions, we derive clear arguments explaining the better performance of the generalized BC preconditioner over the BC preconditioner, especially for very small diffusion coefficients.

The practical implementation of the generalized BC preconditioner requires solving a sequence of dense complex-shifted linear systems with real symmetric negative definite (block) Toeplitz coefficient matrices, which arise from the numerical discretization of Riesz fractional derivatives. Because solving such Toeplitz systems is often very time-consuming, classical circulant preconditioners have been adapted to accelerate the convergence of iterative solvers, especially for level-1 Toeplitz systems. However, circulant preconditioners are often less efficient when extended to high-dimensional model problems (i.e., block Toeplitz discretized systems) [28, 48]. In this work, by approximating the real symmetric (block) Toeplitz matrix by a (block) \tau-matrix [47, 48], which can be efficiently diagonalized using discrete sine transforms (DSTs), we present an efficient implementation of the generalized BC preconditioner that avoids any dense matrix storage and only needs to solve the first half of the sequence of complex-shifted systems. Estimates of the computational cost and of the memory requirements of the associated generalized BC preconditioner are given. Our numerical experiments suggest that the BDF2 all-at-once system, solved by preconditioned Krylov subspace methods, can be a competitive solution method for RFDEs.

The rest of this paper is organized as follows. In the next section, the all-at-once linear system derived from the BDF2 scheme for solving the RFDEs is presented. Meanwhile, the invertibility of the pertinent all-at-once system is proved and its condition number is estimated. In Section 3, the generalized BC preconditioner is adapted and discussed; both the properties and the efficient implementation of the preconditioned system are analyzed. In Section 4, numerical results are reported to support our findings and the effectiveness of the proposed preconditioner. Finally, the paper closes with conclusions in Section 5.

2 The all-at-once system of Riesz fractional diffusion equations

In this section, we present the development of a numerical scheme for the initial-boundary value problem of Riesz fractional diffusion equations that preserves the positivity of the solutions. Then, in the next section, we discuss its efficient parallel implementation.

2.1 The all-at-once discretization of Riesz fractional diffusion equation

The governing Riesz fractional diffusion equation of the anomalous diffusion process can be written as follows (it is worth noting that if the Riesz fractional derivative is replaced with the fractional Laplacian, one can follow the work in [50] for the spatial discretization of the one- or two-dimensional model problem, and the numerical techniques proposed in the next sections are easily adapted to such an extension):

\begin{cases}\frac{\partial u(x,t)}{\partial t}=\kappa_{\gamma}\frac{\partial^{\gamma}u(x,t)}{\partial|x|^{\gamma}}+f(x,t),&(x,t)\in(a,b)\times(0,T],\\ u(x,0)=\phi(x),&x\in[a,b],\\ u(a,t)=u(b,t)=0,&t\in[0,T],\end{cases} (2.1)

where u(x,t) may represent, for example, a solute concentration, and the constant \kappa_{\gamma}>0 is the diffusion coefficient. This equation is a superdiffusive model largely used in fluid flow analysis, financial modelling and other applications. The Riesz fractional derivative \partial^{\gamma}u(x,t)/\partial|x|^{\gamma} is defined by [51]

\begin{split}\frac{\partial^{\gamma}u(x,t)}{\partial|x|^{\gamma}}&=-\frac{1}{2\cos\left(\frac{\pi\gamma}{2}\right)}\cdot\frac{1}{\Gamma(2-\gamma)}\cdot\frac{\partial^{2}}{\partial x^{2}}\int^{b}_{a}\frac{u(\xi,t)}{|x-\xi|^{\gamma-1}}d\xi\\ &=-\frac{1}{2\cos\left(\frac{\pi\gamma}{2}\right)}[{}_{a}D^{\gamma}_{x}u(x,t)+{}_{x}D^{\gamma}_{b}u(x,t)],\quad\gamma\in(1,2),\end{split} (2.2)

in which {}_{a}D^{\gamma}_{x} and {}_{x}D^{\gamma}_{b} are the left and right Riemann-Liouville fractional derivatives of order \gamma\in(1,2) given, respectively, by [5]

{}_{a}D^{\gamma}_{x}u(x,t)=\frac{1}{\Gamma(2-\gamma)}\frac{\partial^{2}}{\partial x^{2}}\int^{x}_{a}\frac{u(s,t)}{(x-s)^{\gamma-1}}ds,

and

{}_{x}D^{\gamma}_{b}u(x,t)=\frac{1}{\Gamma(2-\gamma)}\frac{\partial^{2}}{\partial x^{2}}\int^{b}_{x}\frac{u(s,t)}{(s-x)^{\gamma-1}}ds.

Note that, as \gamma\rightarrow 2, Eq. (2.1) degenerates into the classical diffusion equation [51].

Next, we focus on the numerical solution of Eq. (2.1). We consider a rectangle \bar{Q}_{T}=\{(x,t):~a\leq x\leq b,~0\leq t\leq T\} discretized on the mesh \varpi_{h\tau}=\varpi_{h}\times\varpi_{\tau}, where \varpi_{h}=\{x_{i}=a+ih,~i=0,1,\cdots,N;~h=(b-a)/N\} and \varpi_{\tau}=\{t_{k}=k\tau,~k=0,1,\cdots,M;~\tau=T/M\}. We denote by v=\{v_{i}~|~0\leq i\leq N\} any grid function.

A considerable amount of work has been devoted in the past years to the development of fast methods for the approximation of the Riesz and Riemann-Liouville (R-L) fractional derivatives, such as the first-order accurate shifted Grünwald approximation [3, 51] and the second-order accurate weighted-shifted Grünwald difference (WSGD) approximation [9, 52]. Due to the relationship between the two kinds of fractional derivatives, all the numerical schemes proposed for the R-L fractional derivatives can be easily adapted to approximate the Riesz fractional derivative. Although the solution approach described in this work can accommodate any spatial discretization method, we choose the so-called fractional centred difference formula [8] for the Riesz fractional derivatives for clarity.

For any function u(x)\in L^{1}(\mathbb{R}), we denote

\Delta^{\gamma}_{h}u(x)=-\frac{1}{h^{\gamma}}\sum^{[(x-a)/h]}_{\ell=-[(b-x)/h]}\omega^{(\gamma)}_{\ell}u(x-\ell h),\quad x\in\mathbb{R}, (2.3)

where the \gamma-dependent weight coefficients are defined as

\omega^{(\gamma)}_{\ell}=\frac{(-1)^{\ell}\Gamma(1+\gamma)}{\Gamma(1+\gamma/2-\ell)\Gamma(1+\gamma/2+\ell)},\quad\ell\in\mathbb{Z}. (2.4)

As noted in [8], \omega^{(\gamma)}_{\ell}=\mathcal{O}(\ell^{-1-\gamma}) and the fractional centred difference formula \Delta^{\gamma}_{h}u(x) exists for any u(x)\in L^{1}(\mathbb{R}). Some properties of the coefficients \omega^{(\gamma)}_{\ell} and the operator \Delta^{\gamma}_{h}u(x) are presented in the following lemmas.

Lemma 2.1.

([8]) For \gamma\in(1,2), the coefficients \omega^{(\gamma)}_{\ell}, \ell\in\mathbb{Z}, defined in (2.4) fulfil

\begin{cases}\omega^{(\gamma)}_{0}\geq 0,\quad\omega^{(\gamma)}_{-\ell}=\omega^{(\gamma)}_{\ell},\quad|\ell|\geq 1,\\ \sum\limits^{\infty}_{\ell=-\infty}\omega^{(\gamma)}_{\ell}=0,\quad\omega^{(\gamma)}_{0}=\sum\limits^{-1}_{\ell=-\infty}|\omega^{(\gamma)}_{\ell}|+\sum\limits^{\infty}_{\ell=1}|\omega^{(\gamma)}_{\ell}|.\end{cases} (2.5)
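As a quick illustration (our own sketch, not part of the original presentation), the weights (2.4) can be generated by the stable recurrence \omega^{(\gamma)}_{\ell+1}=\big(1-\frac{1+\gamma}{1+\gamma/2+\ell}\big)\omega^{(\gamma)}_{\ell}, which follows from \Gamma(z+1)=z\Gamma(z) and avoids evaluating the Gamma function at negative arguments; the properties in Lemma 2.1 can then be checked numerically:

```python
import math
import numpy as np

def fcd_weights(gamma, L):
    """Weights omega_l^(gamma), l = 0..L, of (2.4), via the recurrence
    omega_{l+1} = (1 - (1+gamma)/(1 + gamma/2 + l)) * omega_l."""
    w = np.empty(L + 1)
    w[0] = math.gamma(1.0 + gamma) / math.gamma(1.0 + gamma / 2.0) ** 2
    for l in range(L):
        w[l + 1] = (1.0 - (1.0 + gamma) / (1.0 + gamma / 2.0 + l)) * w[l]
    return w

w = fcd_weights(1.5, 200000)
print(w[0] > 0.0, bool(np.all(w[1:] < 0.0)))  # omega_0 >= 0 and omega_l < 0 for l >= 1
print(abs(w[0] + 2.0 * w[1:].sum()))          # the symmetric sum tends to 0
```

Because \omega^{(\gamma)}_{\ell}=\mathcal{O}(\ell^{-1-\gamma}), the truncated symmetric sum above approaches zero only at the slow rate \mathcal{O}(L^{-\gamma}), which is why a large truncation length is used.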
Lemma 2.2.

([8, 50]) Suppose that u\in L^{1}(\mathbb{R}) and

u(x)\in\mathscr{C}^{2+\gamma}(\mathbb{R}):=\bigg{\{}u\bigg{|}\int\nolimits_{-\infty}^{+\infty}(1+|\xi|)^{2+\gamma}|\hat{u}(\xi)|d\xi<\infty\bigg{\}},

where \hat{u}(\xi) is the Fourier transform of u(x). Then, for a fixed h, the fractional centred difference operator in (2.3) satisfies

\frac{\partial^{\gamma}u(x)}{\partial|x|^{\gamma}}=\Delta^{\gamma}_{h}u(x)+\mathcal{O}(h^{2})

uniformly for x\in\mathbb{R}, provided that u(x)\equiv 0 for x\in\mathbb{R}\backslash[a,b]. In particular, if \gamma=2, the formula coincides with the standard second-order central difference approximation of the second derivative.

At this stage, let u(x,t)\in\mathcal{C}^{4,3}_{x,t}([a,b]\times[0,T]) be a solution to problem (2.1), and consider Eq. (2.1) at the set of grid points (x,t)=(x_{i},t_{k})\in\bar{Q}_{T} with i=1,2,\cdots,N-1 and k=1,\cdots,M. The first-order time derivative at the point t=t_{k} is approximated by the second-order backward difference formula, i.e.,

\frac{\partial u(x,t)}{\partial t}\Big{|}_{t=t_{k}}=\frac{3u(x,t_{k})-4u(x,t_{k-1})+u(x,t_{k-2})}{2\tau}+\mathcal{O}(\tau^{2}),\quad k\geq 2, (2.6)

Then, defining U^{k}_{i}=u(x_{i},t_{k}) and f^{k}_{i}=f(x_{i},t_{k}), we obtain

\begin{cases}\frac{3U^{k}_{i}-4U^{k-1}_{i}+U^{k-2}_{i}}{2\tau}=\kappa_{\gamma}\Delta^{\gamma}_{h}U^{k}_{i}+f^{k}_{i}+R^{k}_{i},&1\leq i\leq N-1,~{}2\leq k\leq M,\\ U^{0}_{i}=\phi(x_{i}),&1\leq i\leq N-1,\\ U^{k}_{0}=U^{k}_{N}=0,&0\leq k\leq M,\end{cases} (2.7)

where the truncation errors \{R^{k}_{i}\} are small and satisfy the inequality

|R^{k}_{i}|\leq c(\tau^{2}+h^{2}),\quad 1\leq i\leq N-1,~{}2\leq k\leq M.

Note that, even after omitting the small terms R^{k}_{i} in Eq. (2.7), the resulting three-level difference scheme is not self-starting, because the solution u(x_{i},t_{1}) at the first time level is unknown. One of the most popular strategies to compute the first time step solution u^{1}_{i} is to use the first-order backward Euler scheme (which is only used once). This yields the following implicit difference scheme [37, 10] for Eq. (2.1):

\begin{cases}D^{2}_{t}u^{k}_{i}=\kappa_{\gamma}\Delta^{\gamma}_{h}u^{k}_{i}+f^{k}_{i},&1\leq i\leq N-1,~{}1\leq k\leq M,\\ u^{0}_{i}=\phi(x_{i}),&1\leq i\leq N-1,\\ u^{k}_{0}=u^{k}_{N}=0,&0\leq k\leq M,\end{cases} (2.8)

where

D^{2}_{t}u^{k}_{i}=\begin{cases}\frac{3u^{k}_{i}-4u^{k-1}_{i}+u^{k-2}_{i}}{2\tau},&2\leq k\leq M,\\ \frac{u^{1}_{i}-u^{0}_{i}}{\tau},&k=1.\end{cases}

In order to implement the proposed scheme, we define {\bm{u}}^{k}=[u(x_{1},t_{k}),u(x_{2},t_{k}),\cdots,u(x_{N-1},t_{k})]^{\top} and {\bm{f}}^{k}=[f(x_{1},t_{k}),f(x_{2},t_{k}),\cdots,f(x_{N-1},t_{k})]^{\top}; then \kappa_{\gamma}\Delta^{\gamma}_{h}u^{k}_{i} can be written in the matrix-vector product form A{\bm{u}}^{k}, where

A=-\kappa_{\gamma}T_{x}=-\frac{\kappa_{\gamma}}{h^{\gamma}}\begin{bmatrix}\omega^{(\gamma)}_{0}&\omega^{(\gamma)}_{-1}&\omega^{(\gamma)}_{-2}&\cdots&\omega^{(\gamma)}_{3-N}&\omega^{(\gamma)}_{2-N}\\ \omega^{(\gamma)}_{1}&\omega^{(\gamma)}_{0}&\omega^{(\gamma)}_{-1}&\cdots&\omega^{(\gamma)}_{4-N}&\omega^{(\gamma)}_{3-N}\\ \omega^{(\gamma)}_{2}&\omega^{(\gamma)}_{1}&\omega^{(\gamma)}_{0}&\cdots&\omega^{(\gamma)}_{5-N}&\omega^{(\gamma)}_{4-N}\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ \omega^{(\gamma)}_{N-3}&\omega^{(\gamma)}_{N-4}&\omega^{(\gamma)}_{N-5}&\cdots&\omega^{(\gamma)}_{0}&\omega^{(\gamma)}_{-1}\\ \omega^{(\gamma)}_{N-2}&\omega^{(\gamma)}_{N-3}&\omega^{(\gamma)}_{N-4}&\cdots&\omega^{(\gamma)}_{1}&\omega^{(\gamma)}_{0}\end{bmatrix}\in\mathbb{R}^{(N-1)\times(N-1)}. (2.9)

It is easy to prove that T_{x} is a symmetric positive definite (SPD) Toeplitz matrix (see [8]). Therefore, it can be stored with only N-1 entries, and fast Fourier transforms (FFTs) can be applied to carry out the matrix-vector product in \mathcal{O}(N\log N) operations. The matrix-vector form of the 2-step BDF method with the start-up backward Euler step for solving the model problem (2.1) can be formulated as follows:

\frac{{\bm{u}}^{1}-{\bm{u}}^{0}}{\tau}-A{\bm{u}}^{1}={\bm{f}}^{1}, (2.10a)
\frac{3{\bm{u}}^{k}-4{\bm{u}}^{k-1}+{\bm{u}}^{k-2}}{2\tau}-A{\bm{u}}^{k}={\bm{f}}^{k},\quad 2\leq k\leq N_{t}. (2.10b)

We refer to the matrix A as the Jacobian matrix. In particular, we note that the numerical scheme (2.10) has been proved to be unconditionally stable and second-order convergent in the discrete L^{2} norm [10]. Note that it is computationally more efficient than the C-N scheme presented in [8], as it requires one less matrix-vector multiplication per time step.
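The \mathcal{O}(N\log N) Toeplitz matrix-vector product mentioned above can be sketched with the standard circulant-embedding trick (a minimal illustration of ours, written for a generic symmetric Toeplitz matrix; in our setting the first column would be \frac{1}{h^{\gamma}}[\omega^{(\gamma)}_{0},\omega^{(\gamma)}_{1},\cdots,\omega^{(\gamma)}_{N-2}]):

```python
import numpy as np

def sym_toeplitz_matvec(c, v):
    """y = T v for the symmetric Toeplitz matrix T with first column c,
    in O(N log N) operations: embed T into a circulant matrix of order 2N
    whose eigenvalues are obtained by one FFT of its first column."""
    n = len(v)
    col = np.concatenate([c, [0.0], c[:0:-1]])   # first column of the embedding
    eig = np.fft.fft(col)                        # circulant eigenvalues
    y = np.fft.ifft(eig * np.fft.fft(v, 2 * n))
    return y[:n].real                            # top block of the circulant product

rng = np.random.default_rng(0)
c = rng.standard_normal(6)
v = rng.standard_normal(6)
T = np.array([[c[abs(i - j)] for j in range(6)] for i in range(6)])
print(np.allclose(sym_toeplitz_matvec(c, v), T @ v))
```

Only the first column c is ever stored, which is exactly the \mathcal{O}(N) memory requirement stated above.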

Instead of computing the solution of (2.10) step by step, we obtain an all-at-once approximation by solving the following linear system:

\left(\begin{bmatrix}1&\\ -2&\frac{3}{2}\\ \frac{1}{2}&-2&\frac{3}{2}\\ &\ddots&\ddots&\ddots\\ &&\frac{1}{2}&-2&\frac{3}{2}&\\ &&&\frac{1}{2}&-2&\frac{3}{2}\end{bmatrix}\otimes I_{s}-\tau I_{t}\otimes A\right)\begin{bmatrix}{\bm{u}}^{1}\\ {\bm{u}}^{2}\\ \vdots\\ {\bm{u}}^{N_{t}-1}\\ {\bm{u}}^{N_{t}}\\ \end{bmatrix}=\begin{bmatrix}\tau{\bm{f}}^{1}+{\bm{u}}^{0}\\ \tau{\bm{f}}^{2}-\frac{{\bm{u}}^{0}}{2}\\ \vdots\\ \tau{\bm{f}}^{N_{t}-1}\\ \tau{\bm{f}}^{N_{t}}\\ \end{bmatrix}, (2.11)

where I_{s} and I_{t} are two identity matrices of order N_{s}~(=N-1) and N_{t}~(=M), respectively. We denote the above linear system as follows:

\mathcal{A}{\bm{U}}={\bm{F}},\quad{\bm{U}}=\begin{bmatrix}{\bm{u}}^{1}\\ {\bm{u}}^{2}\\ \vdots\\ {\bm{u}}^{N_{t}-1}\\ {\bm{u}}^{N_{t}}\\ \end{bmatrix},\quad{\bm{F}}=\begin{bmatrix}\tau{\bm{f}}^{1}+{\bm{u}}^{0}\\ \tau{\bm{f}}^{2}-\frac{{\bm{u}}^{0}}{2}\\ \vdots\\ \tau{\bm{f}}^{N_{t}-1}\\ \tau{\bm{f}}^{N_{t}}\\ \end{bmatrix}, (2.12)

where \mathcal{A}=C\otimes I_{s}-\tau I_{t}\otimes A with

C=\begin{bmatrix}1&\\ -2&\frac{3}{2}\\ \frac{1}{2}&-2&\frac{3}{2}\\ &\ddots&\ddots&\ddots\\ &&\frac{1}{2}&-2&\frac{3}{2}&\\ &&&\frac{1}{2}&-2&\frac{3}{2}\end{bmatrix}\in\mathbb{R}^{N_{t}\times N_{t}}, (2.13)

and it is clear that \mathcal{A} is invertible, because all its diagonal blocks (i.e., either I_{s}-\tau A or \frac{3}{2}I_{s}-\tau A) are invertible [33, 34].
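To make the construction concrete, the following sketch (our own, with a small random symmetric negative definite matrix standing in for A=-\kappa_{\gamma}T_{x}) assembles \mathcal{A}=C\otimes I_{s}-\tau I_{t}\otimes A and checks that solving the all-at-once system (2.11) reproduces the time-stepping scheme (2.10):

```python
import numpy as np

Ns, Nt, tau = 5, 6, 0.1
rng = np.random.default_rng(1)
B = rng.standard_normal((Ns, Ns))
A = -(B @ B.T + Ns * np.eye(Ns))       # stand-in for A = -kappa_gamma * T_x
f = rng.standard_normal((Nt, Ns))      # right-hand sides f^1, ..., f^{Nt}
u0 = rng.standard_normal(Ns)           # initial vector u^0
Is = np.eye(Ns)

# Time-stepping form (2.10): one backward Euler start-up step, then BDF2.
u = [u0, np.linalg.solve(Is - tau * A, u0 + tau * f[0])]
for k in range(2, Nt + 1):
    rhs = 2.0 * u[k - 1] - 0.5 * u[k - 2] + tau * f[k - 1]
    u.append(np.linalg.solve(1.5 * Is - tau * A, rhs))

# All-at-once form (2.11)-(2.12): Amat = C (x) I_s - tau * I_t (x) A.
C = np.zeros((Nt, Nt))
for k in range(Nt):
    C[k, k] = 1.0 if k == 0 else 1.5
    if k >= 1:
        C[k, k - 1] = -2.0
    if k >= 2:
        C[k, k - 2] = 0.5
Amat = np.kron(C, Is) - tau * np.kron(np.eye(Nt), A)
F = tau * f.copy()
F[0] += u0                              # tau f^1 + u^0
F[1] -= 0.5 * u0                        # tau f^2 - u^0 / 2
U = np.linalg.solve(Amat, F.ravel()).reshape(Nt, Ns)
print(np.allclose(U, np.array(u[1:])))  # both approaches give the same solution
```

Of course, the point of the all-at-once formulation is not to form the Kronecker products explicitly, but to expose all N_{t} time levels to a single preconditioned Krylov solve.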

2.2 Properties of the all-at-once system

In this subsection, we investigate the properties of the discrete all-at-once formulation (2.11). These will guide the design of an efficient solver for such a large linear system.

Lemma 2.3.

For the matrix C in (2.13), we have the following estimates:

  • 1)

    \|C^{-1}\|_{1}\leq\frac{3N_{t}}{2};

  • 2)

    \|C^{-1}\|_{\infty}=N_{t}.

Proof. Consider the following matrix splitting and then perform the matrix factorization,

\begin{split}C&=\begin{bmatrix}\frac{3}{2}&\\ -2&\frac{3}{2}\\ \frac{1}{2}&-2&\frac{3}{2}\\ &\ddots&\ddots&\ddots\\ &&\frac{1}{2}&-2&\frac{3}{2}&\\ &&&\frac{1}{2}&-2&\frac{3}{2}\end{bmatrix}-\begin{bmatrix}\frac{1}{2}&\\ 0&0\\ 0&0&0\\ &\ddots&\ddots&\ddots\\ &&0&0&0&\\ &&&0&0&0\end{bmatrix}\triangleq\hat{C}-\frac{1}{2}{\bm{e}}_{1}{\bm{e}}^{T}_{1}\\ &=\frac{3}{2}\begin{bmatrix}1&\\ -\frac{1}{3}&1\\ &-\frac{1}{3}&1\\ &&\ddots&\ddots\\ &&&-\frac{1}{3}&1&\\ &&&&-\frac{1}{3}&1\end{bmatrix}\cdot\begin{bmatrix}1&\\ -1&1\\ &-1&1\\ &&\ddots&\ddots\\ &&&-1&1&\\ &&&&-1&1\end{bmatrix}-\frac{1}{2}{\bm{e}}_{1}{\bm{e}}^{T}_{1}\end{split} (2.14)

where the vector {\bm{e}}_{1}=[1,0,\cdots,0]^{T}\in\mathbb{R}^{N_{t}}. According to the Sherman-Morrison formula [53], we can write

C^{-1}=\hat{C}^{-1}+\frac{\hat{C}^{-1}\frac{1}{2}{\bm{e}}_{1}{\bm{e}}^{T}_{1}\hat{C}^{-1}}{1-\frac{1}{2}{\bm{e}}^{T}_{1}\hat{C}^{-1}{\bm{e}}_{1}}=\hat{C}^{-1}+\frac{\hat{C}^{-1}{\bm{e}}_{1}{\bm{e}}^{T}_{1}\hat{C}^{-1}}{2-{\bm{e}}^{T}_{1}\hat{C}^{-1}{\bm{e}}_{1}}. (2.15)

On the other hand, we can compute the inverse of C^\hat{C} as follows,

\hat{C}^{-1}=\frac{2}{3}\begin{bmatrix}1&\\ 1&1&\\ 1&1&1&\\ \vdots&\ddots&\ddots&\ddots&\\ 1&1&\cdots&1&1&\\ 1&1&\cdots&1&1&1\end{bmatrix}\cdot\begin{bmatrix}1&\\ \frac{1}{3}&1\\ \frac{1}{3^{2}}&\frac{1}{3}&1\\ \vdots&\ddots&\ddots&\ddots\\ \frac{1}{3^{N_{t}-2}}&\frac{1}{3^{N_{t}-3}}&\cdots&\frac{1}{3}&1&\\ \frac{1}{3^{N_{t}-1}}&\frac{1}{3^{N_{t}-2}}&\cdots&\frac{1}{3^{2}}&\frac{1}{3}&1\\ \end{bmatrix}. (2.16)

Therefore, we know that

\hat{C}^{-1}{\bm{e}}_{1}=\frac{2}{3}\begin{bmatrix}1&\\ 1&1&\\ 1&1&1&\\ \vdots&\ddots&\ddots&\ddots&\\ 1&1&\cdots&1&1&\\ 1&1&\cdots&1&1&1\end{bmatrix}\cdot\begin{bmatrix}1&\\ \frac{1}{3}\\ \frac{1}{3^{2}}\\ \vdots\\ \frac{1}{3^{N_{t}-2}}\\ \frac{1}{3^{N_{t}-1}}\\ \end{bmatrix}=\begin{bmatrix}\frac{2}{3}&\\ 1-\frac{1}{3^{2}}\\ 1-\frac{1}{3^{3}}\\ \vdots\\ 1-\frac{1}{3^{N_{t}-1}}\\ 1-\frac{1}{3^{N_{t}}}\\ \end{bmatrix} (2.17)

and

{\bm{e}}^{T}_{1}\hat{C}^{-1}=\frac{2}{3}{\bm{e}}^{T}_{1},\quad{\bm{e}}^{T}_{1}\hat{C}^{-1}{\bm{e}}_{1}=\frac{2}{3}. (2.18)

The explicit expression of C1C^{-1} has the following form:

C^{-1}=\hat{C}^{-1}+\frac{1}{2}\begin{bmatrix}\frac{2}{3}&\\ 1-\frac{1}{3^{2}}\\ 1-\frac{1}{3^{3}}\\ \vdots\\ 1-\frac{1}{3^{N_{t}-1}}\\ 1-\frac{1}{3^{N_{t}}}\\ \end{bmatrix}{\bm{e}}^{T}_{1}=\hat{C}^{-1}+\frac{1}{2}\begin{bmatrix}\frac{2}{3}&0&\cdots&0\\ 1-\frac{1}{3^{2}}&0&\cdots&0\\ 1-\frac{1}{3^{3}}&0&\cdots&0\\ \vdots&\vdots&\cdots&\vdots\\ 1-\frac{1}{3^{N_{t}-1}}&0&\cdots&0\\ 1-\frac{1}{3^{N_{t}}}&0&\cdots&0\\ \end{bmatrix}. (2.19)

Since \hat{C}^{-1} is a lower triangular Toeplitz matrix with positive entries, \|C^{-1}\|_{\infty} is the absolute sum of the elements of the last row of C^{-1}, i.e.,

\begin{split}\|C^{-1}\|_{\infty}&=\frac{2}{3}\left[\left(1+\cdots+\frac{1}{3^{N_{t}-1}}\right)+\left(1+\cdots+\frac{1}{3^{N_{t}-2}}\right)+\cdots+\left(1+\frac{1}{3}\right)+1\right]+\frac{1}{2}\left(1-\frac{1}{3^{N_{t}}}\right)\\ &=N_{t}-\left(\frac{1}{3}+\frac{1}{3^{2}}+\cdots+\frac{1}{3^{N_{t}}}\right)+\frac{1}{2}\left(1-\frac{1}{3^{N_{t}}}\right)\\ &=N_{t}.\end{split} (2.20)

Analogously, \|C^{-1}\|_{1} is the absolute sum of the elements of its first column, i.e.,

\begin{split}\|C^{-1}\|_{1}&=\frac{2}{3}\left[1+\left(1+\frac{1}{3}\right)+\cdots+\left(1+\cdots+\frac{1}{3^{N_{t}-2}}\right)+\left(1+\cdots+\frac{1}{3^{N_{t}-1}}\right)\right]\\ &\quad+\frac{1}{2}\left(1-\frac{1}{3}+1-\frac{1}{3^{2}}+\cdots+1-\frac{1}{3^{N_{t}}}\right)\\ &=\frac{2}{3}\left(\frac{1-\frac{1}{3^{1}}}{1-\frac{1}{3}}+\frac{1-\frac{1}{3^{2}}}{1-\frac{1}{3}}+\cdots+\frac{1-\frac{1}{3^{N_{t}-1}}}{1-\frac{1}{3}}+\frac{1-\frac{1}{3^{N_{t}}}}{1-\frac{1}{3}}\right)+\frac{1}{2}\left(N_{t}-\frac{1-\frac{1}{3^{N_{t}}}}{1-\frac{1}{3}}\cdot\frac{1}{3}\right)\\ &=\frac{3N_{t}}{2}-\frac{3}{4}\left(1-\frac{1}{3^{N_{t}}}\right)\\ &\leq\frac{3N_{t}-1}{2}\leq\frac{3N_{t}}{2}.\end{split} (2.21)

Therefore, the above estimates are proved. \Box
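The two estimates of Lemma 2.3 are easy to confirm numerically; recall that \|\cdot\|_{1} is the maximum absolute column sum and \|\cdot\|_{\infty} the maximum absolute row sum (the following check is our own sketch):

```python
import numpy as np

def bdf2_C(Nt):
    """The all-at-once time-stepping matrix C of (2.13)."""
    C = np.zeros((Nt, Nt))
    for k in range(Nt):
        C[k, k] = 1.0 if k == 0 else 1.5
        if k >= 1:
            C[k, k - 1] = -2.0
        if k >= 2:
            C[k, k - 2] = 0.5
    return C

for Nt in (4, 16, 64):
    Cinv = np.linalg.inv(bdf2_C(Nt))
    # the maximum absolute row sum equals Nt exactly, while the maximum
    # absolute column sum is 3Nt/2 - 3(1 - 3^{-Nt})/4 <= 3Nt/2
    print(np.isclose(np.linalg.norm(Cinv, np.inf), Nt),
          np.linalg.norm(Cinv, 1) <= 1.5 * Nt)
```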

Theorem 2.1.

Suppose that {\rm Re}(\lambda(A))\leq 0 for every eigenvalue \lambda(A) of A. Then, the following bounds hold for the extreme singular values of \mathcal{A} in (2.11):

\sigma_{max}(\mathcal{A})\leq 4+\tau\|A\|_{2}\quad{\rm and}\quad\sigma_{min}(\mathcal{A})\geq\frac{\sqrt{6}}{3N_{t}}, (2.22)

so that {\rm cond}(\mathcal{A})\leq 2\sqrt{6}N_{t}+\frac{\sqrt{6}T\|A\|_{2}}{2}.

Proof. Since {\rm Re}(\lambda(A))\leq 0, we may claim that \|\mathcal{A}{\bm{z}}\|\geq\|(C\otimes I_{s}){\bm{z}}\| for any vector {\bm{z}}. In particular, using the properties of the Kronecker product, we may estimate \sigma_{min}(\mathcal{A})\geq\sigma_{min}(C). The latter is computed using the spectral norm of C^{-1}, which can be bounded by the following inequality:

\|C^{-1}\|_{2}\leq\sqrt{\|C^{-1}\|_{1}\|C^{-1}\|_{\infty}}. (2.23)

According to Lemma 2.3, we have

\sigma_{min}(\mathcal{A})\geq\sigma_{min}(C)=\frac{1}{\|C^{-1}\|_{2}}\geq\frac{1}{\sqrt{\|C^{-1}\|_{1}\|C^{-1}\|_{\infty}}}\geq\frac{\sqrt{6}}{3N_{t}}, (2.24)

which proves the second bound.

For the first estimate, applying the triangle inequality, we can write

\sigma_{max}(\mathcal{A})=\|\mathcal{A}\|_{2}\leq\|C\|_{2}+\tau\|A\|_{2}, (2.25)

whereas

\|C\|_{1}=\|C\|_{\infty}=4, (2.26)

so that \|C\|_{2}\leq\sqrt{\|C\|_{1}\|C\|_{\infty}}=4. Finally, the proof is completed by recalling that \tau=T/N_{t}. \Box
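A small numerical check of Theorem 2.1 (our own sketch; a random symmetric negative definite matrix is used for A so that the hypothesis {\rm Re}(\lambda(A))\leq 0 holds):

```python
import numpy as np

Ns, Nt, T = 6, 8, 1.0
tau = T / Nt
rng = np.random.default_rng(2)
B = rng.standard_normal((Ns, Ns))
A = -(B @ B.T + np.eye(Ns))            # symmetric negative definite: Re(lambda(A)) <= 0

C = np.zeros((Nt, Nt))                 # the matrix C of (2.13)
for k in range(Nt):
    C[k, k] = 1.0 if k == 0 else 1.5
    if k >= 1:
        C[k, k - 1] = -2.0
    if k >= 2:
        C[k, k - 2] = 0.5
Amat = np.kron(C, np.eye(Ns)) - tau * np.kron(np.eye(Nt), A)

s = np.linalg.svd(Amat, compute_uv=False)
normA = np.linalg.norm(A, 2)
print(s[0] <= 4.0 + tau * normA)                     # upper bound on sigma_max
print(s[-1] >= np.sqrt(6.0) / (3.0 * Nt))            # lower bound on sigma_min
print(s[0] / s[-1] <= 2.0 * np.sqrt(6.0) * Nt + np.sqrt(6.0) * T * normA / 2.0)
```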

The condition number can thus be expected to vary linearly with the main parameters of the system, such as the number of time steps, the length of the time interval, and the norm of the Jacobian matrix [39, 40, 54]. More refined estimates could be provided by taking into account particular properties of A. In Theorem 2.1, the time interval [0,T] need not be equal to the whole observation range of an application. The global time scheme (2.11) can be applied by splitting the desired interval [0,\hat{T}] into a sequence of subintervals [0,T],[T,2T],\cdots,[\hat{T}-T,\hat{T}] of “large” time steps of size T each, solving (2.11) on [(q-1)T,qT],~q=1,\cdots,\hat{T}/T, extracting the last snapshot {\bm{u}}^{N_{t}} and restarting the method using {\bm{u}}^{N_{t}} as the initial state for the next interval. The optimal value of T is the one providing the fastest computation [39]. Another way to accelerate the solution is to reduce the condition number of the linear system by preconditioning. This standard computational aspect will be considered in the next section.

3 Parallel-in-time (PinT) preconditioners

According to Theorem 2.1, an iterative method, such as a Krylov subspace method, applied to the all-at-once system (2.11) can converge very slowly or even stagnate. Therefore, in this section we look at preconditioners that can be efficiently implemented in the PinT framework.

3.1 The structuring-circulant preconditioners

Since the matrix C defined in (2.13) can be viewed as the sum of a Toeplitz matrix and a rank-1 matrix, it is natural to define our first structuring-circulant preconditioner as

\mathcal{P}_{\alpha}=C_{\alpha}\otimes I_{s}-\tau I_{t}\otimes A (3.1)

where

C_{\alpha}=\begin{bmatrix}\frac{3}{2}&&&&\frac{\alpha}{2}&-2\alpha\\ -2&\frac{3}{2}&&&&\frac{\alpha}{2}\\ \frac{1}{2}&-2&\frac{3}{2}\\ &\ddots&\ddots&\ddots\\ &&\frac{1}{2}&-2&\frac{3}{2}&\\ &&&\frac{1}{2}&-2&\frac{3}{2}\end{bmatrix},\quad\alpha\in(0,1], (3.2)

is an \alpha-circulant matrix of Strang type that can be diagonalized as

C_{\alpha}=V_{\alpha}D_{\alpha}V^{-1}_{\alpha} (3.3)

with

V_{\alpha}=\Lambda_{\alpha}\mathbb{F}_{N_{t}}^{*},\quad D_{\alpha}={\rm diag}\big{(}\sqrt{N_{t}}\mathbb{F}_{N_{t}}\Lambda_{\alpha}^{-1}C_{\alpha}(:,1)\big{)}={\rm diag}\big{(}\lambda^{(\alpha)}_{1},\lambda^{(\alpha)}_{2},\cdots,\lambda^{(\alpha)}_{N_{t}}\big{)}, (3.4a)

where \lambda^{(\alpha)}_{n}=\sum^{2}_{j=0}r_{j}\alpha^{j/N_{t}}\theta^{(n-1)j} (with r_{0}=3/2, r_{1}=-2, r_{2}=1/2), ‘*’ denotes the conjugate transpose of a matrix, C_{\alpha}(:,1) is the first column of C_{\alpha}, \Lambda_{\alpha}={\rm diag}\big{(}1,\alpha^{-\frac{1}{N_{t}}},\cdots,\alpha^{-\frac{N_{t}-1}{N_{t}}}\big{)} and

\mathbb{F}_{N_{t}}=\frac{1}{\sqrt{N_{t}}}\begin{bmatrix}1&1&\dots&1\\ 1&\theta&\dots&\theta^{N_{t}-1}\\ \vdots&\vdots&\dots&\vdots\\ 1&\theta^{N_{t}-1}&\dots&\theta^{(N_{t}-1)(N_{t}-1)}\end{bmatrix},\quad\theta=e^{\frac{2\pi{\rm i}}{N_{t}}}. (3.4b)

The matrix 𝔽Nt\mathbb{F}_{N_{t}} in (3.4b) is the discrete Fourier matrix, which is unitary. In addition, it is worth noting that if α\alpha is chosen very small, then the condition number of VαV_{\alpha} grows quickly and large roundoff errors may therefore be introduced in our proposed method; see [36, 19] for a discussion of these issues. Using the properties of the Kronecker product, we can factorize 𝒫α\mathcal{P}_{\alpha} as
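The diagonalization (3.3)–(3.4a) is easy to verify numerically. The following Python snippet (a sanity check we add for illustration; it is not part of the solver) assembles a small C_α with the BDF2 stencil and confirms the factorization:

```python
import numpy as np

nt, alpha = 8, 0.5
r = np.array([1.5, -2.0, 0.5])                  # BDF2 stencil r0, r1, r2

# alpha-circulant matrix C_alpha of Strang type, cf. (3.2)
C = np.zeros((nt, nt))
for j, rj in enumerate(r):
    C += rj * np.eye(nt, k=-j)                  # lower-triangular Toeplitz part
    if j > 0:
        C += alpha * rj * np.eye(nt, k=nt - j)  # alpha-weighted corner entries

theta = np.exp(2j * np.pi / nt)                 # as in (3.4b)
F = np.array([[theta ** (m * n) for n in range(nt)]
              for m in range(nt)]) / np.sqrt(nt)
Lam = np.diag(alpha ** (-np.arange(nt) / nt))   # Lambda_alpha
V = Lam @ F.conj().T                            # V_alpha = Lambda_alpha F*
lam = np.array([sum(rj * alpha ** (j / nt) * theta ** (m * j)
                    for j, rj in enumerate(r)) for m in range(nt)])

# check the factorization C_alpha = V_alpha D_alpha V_alpha^{-1}
assert np.allclose(V @ np.diag(lam) @ np.linalg.inv(V), C)
```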

𝒫α=(VαIs)(DαIsτItA)(Vα1Is),\mathcal{P}_{\alpha}=(V_{\alpha}\otimes I_{s})\left(D_{\alpha}\otimes I_{s}-\tau I_{t}\otimes A\right)(V^{-1}_{\alpha}\otimes I_{s}), (3.5)

and this implies that we can compute 𝒛=𝒫α1𝒗{\bm{z}}=\mathcal{P}^{-1}_{\alpha}{\bm{v}} via the following three steps:

{𝒛1=(Vα1Is)𝒗=(𝔽NtIs)[(Λα1Is)𝒗],Step-(a),(λn(α)IsτA)𝒛2,n=𝒛1,n,n=1,2,,Nt,Step-(b),𝒛=(VαIs)𝒛2=(ΛαIs)[(𝔽NtIs)𝒛2],Step-(c),\begin{cases}{\bm{z}}_{1}=(V^{-1}_{\alpha}\otimes I_{s}){\bm{v}}=(\mathbb{F}_{N_{t}}\otimes I_{s})\left[(\Lambda^{-1}_{\alpha}\otimes I_{s}){\bm{v}}\right],&\text{Step-(a)},\\ \big{(}\lambda^{(\alpha)}_{n}I_{s}-\tau A\big{)}{\bm{z}}_{2,n}={\bm{z}}_{1,n},\quad~{}n=1,2,\dots,{N_{t}},&\text{Step-(b)},\\ {\bm{z}}=(V_{\alpha}\otimes I_{s}){\bm{z}}_{2}=(\Lambda_{\alpha}\otimes I_{s})\left[(\mathbb{F}^{*}_{N_{t}}\otimes I_{s}){\bm{z}}_{2}\right],&\text{Step-(c)},\\ \end{cases}

where 𝒛j=(𝒛j,1,𝒛j,2,,𝒛j,Nt){\bm{z}}_{j}=({\bm{z}}_{j,1}^{\top},{\bm{z}}_{j,2}^{\top},\dots,{\bm{z}}_{j,N_{t}}^{\top})^{\top} (with j=1,2j=1,2) and λn(α)\lambda^{(\alpha)}_{n} is the nn-th eigenvalue of CαC_{\alpha}. The first and third steps only involve matrix-vector multiplications with NtN_{t} subvectors, and these can be computed simultaneously by FFTs222Note that parallel FFT implementations are also available at http://www.fftw.org/parallel/parallel-fftw.html. on NtN_{t} CPUs [54]. The major computational cost is spent in the second step; however, this step is naturally parallel across all the NtN_{t} time steps.
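Steps (a)–(c) can be sketched in a few lines of Python. In this toy illustration (our own, with a small tridiagonal stand-in for A and dense solves in Step-(b)), the FFT plays the role of the products with F and F*, and the result is checked against an explicitly assembled P_α:

```python
import numpy as np

def apply_Palpha_inv(v, A, tau, alpha):
    """Apply P_alpha^{-1} via Steps (a)-(c): a scaled FFT in time, one
    shifted solve per time step (parallel across n), a scaled inverse FFT."""
    nt, ns = v.shape                 # rows of v are the time blocks v_1..v_Nt
    k = np.arange(nt)
    eps = alpha ** (1.0 / nt) * np.exp(2j * np.pi * k / nt)
    lam = 1.5 - 2.0 * eps + 0.5 * eps ** 2       # eigenvalues of C_alpha
    # Step (a): z1 = (F (x) I)(Lambda_alpha^{-1} (x) I) v
    z1 = np.sqrt(nt) * np.fft.ifft(v * alpha ** (k / nt)[:, None], axis=0)
    # Step (b): independent shifted solves (dense here, DST-based in the paper)
    z2 = np.stack([np.linalg.solve(lam[n] * np.eye(ns) - tau * A, z1[n])
                   for n in range(nt)])
    # Step (c): z = (Lambda_alpha (x) I)(F* (x) I) z2
    return (alpha ** (-k / nt)[:, None] * np.fft.fft(z2, axis=0)
            / np.sqrt(nt)).real

nt, ns, tau, alpha = 8, 5, 0.1, 0.5
A = -2.0 * np.eye(ns) + np.eye(ns, k=1) + np.eye(ns, k=-1)   # toy SND matrix
v = np.random.default_rng(0).standard_normal((nt, ns))

# assemble P_alpha explicitly to verify the factorized solve
Ca = 1.5 * np.eye(nt) - 2.0 * np.eye(nt, k=-1) + 0.5 * np.eye(nt, k=-2)
Ca[0, -1] = -2.0 * alpha                     # alpha-weighted corner entries
Ca[0, -2] = Ca[1, -1] = 0.5 * alpha
P = np.kron(Ca, np.eye(ns)) - tau * np.kron(np.eye(nt), A)
z = apply_Palpha_inv(v, A, tau, alpha)
assert np.allclose(P @ z.ravel(), v.ravel())
```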

Next, we study the invertibility of the matrix 𝒫α\mathcal{P}_{\alpha}. According to Eq. (2.13) and because the matrix AA is negative definite, the following result ensures the invertibility of 𝒫α\mathcal{P}_{\alpha}333It is unnecessary to require that r0|r1|+|r2|r_{0}\geq|r_{1}|+|r_{2}|, a condition introduced in [35, Remark 4]..

Proposition 3.1.

For α(0,1]\alpha\in(0,1] and λn(α)\lambda^{(\alpha)}_{n}\in\mathbb{C}, it holds that e(λn(α))0\mathcal{R}e(\lambda^{(\alpha)}_{n})\geq 0, where equality can occur only if α=1\alpha=1.

Proof. Set ε=α1/Ntθ(n1)x+iy\varepsilon=\alpha^{1/N_{t}}\theta^{(n-1)}\triangleq x+{\rm i}y; then x2+y2=α2/Nt1x^{2}+y^{2}=\alpha^{2/N_{t}}\leq 1 and we have

e(λn(α))=e(322ε+ε22)=e(12(1xiy)(3xiy))=12[(1x)(3x)y2]x22x+10,\begin{split}\mathcal{R}e(\lambda^{(\alpha)}_{n})&=\mathcal{R}e\left(\frac{3}{2}-2\varepsilon+\frac{\varepsilon^{2}}{2}\right)\\ &=\mathcal{R}e\left(\frac{1}{2}(1-x-{\rm i}y)(3-x-{\rm i}y)\right)\\ &=\frac{1}{2}[(1-x)(3-x)-y^{2}]\\ &\geq x^{2}-2x+1\\ &\geq 0,\end{split} (3.6)

where the penultimate inequality holds because x21y2x^{2}-1\leq-y^{2}. Moreover, if α1\alpha\neq 1, then x2+y2<1x^{2}+y^{2}<1, i.e., 1<x<1-1<x<1 and x22x+1>0x^{2}-2x+1>0, which completes the proof. \Box

In practice, we often choose α(0,1)\alpha\in(0,1) [35]; then e(λn(α))>0\mathcal{R}e(\lambda^{(\alpha)}_{n})>0 according to Proposition 3.1. In Fig. 1, we plot the complex quantities {λn(α)}n=1Nt\{\lambda^{(\alpha)}_{n}\}^{N_{t}}_{n=1} on the complex plane. We see that for α(0,1)\alpha\in(0,1) it holds that e(λn(α))>0\mathcal{R}e(\lambda^{(\alpha)}_{n})>0, and consequently all the linear systems involved in Step-(b) are positive definite and thus not difficult to solve [31]. As shown in [36], a multigrid method using Richardson iterations as a smoother, with an optimized choice of the damping parameter, is very effective. We also note that a detailed convergence analysis of multigrid preconditioning in the context of the τ\tau algebra considered in this paper is presented in [38].
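The inequality chain in the proof of Proposition 3.1, Re(λ_n^{(α)}) ≥ (x−1)² ≥ 0, together with strict positivity for α ∈ (0,1), can be checked numerically over a range of α and N_t (a quick Python verification we add for illustration):

```python
import numpy as np

for alpha in (0.1, 0.5, 0.9, 1.0):
    for nt in (4, 16, 64):
        n = np.arange(nt)
        eps = alpha ** (1.0 / nt) * np.exp(2j * np.pi * n / nt)
        lam = 1.5 - 2.0 * eps + 0.5 * eps ** 2   # lambda_n^{(alpha)}
        x = eps.real
        # the bound from the proof of Proposition 3.1 ...
        assert np.all(lam.real >= (x - 1.0) ** 2 - 1e-14)
        # ... and strict positivity for alpha in (0, 1)
        if alpha < 1.0:
            assert np.all(lam.real > 0.0)
```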

Fig. 1: The plot of complex quantities {λn(α)}n=1Nt\{\lambda^{(\alpha)}_{n}\}^{N_{t}}_{n=1} with different α\alpha’s.

In addition, it is not hard to derive the following result.

Theorem 3.1.

Let α(0,1]\alpha\in(0,1] and 𝒫α1𝒜=+\mathcal{P}^{-1}_{\alpha}\mathcal{A}=\mathcal{I}+\mathcal{L}, where \mathcal{I} is the identity matrix of order NtNsN_{t}N_{s}. Then it follows that rank()=2Ns{\rm rank}(\mathcal{L})=2N_{s}.

Proof. We consider that

𝒫α1𝒜=𝒫α1[𝒫α+(𝒜𝒫α)]=+𝒫α1[(CCα)Is]+,\begin{split}\mathcal{P}^{-1}_{\alpha}\mathcal{A}&=\mathcal{P}^{-1}_{\alpha}[\mathcal{P}_{\alpha}+(\mathcal{A}-\mathcal{P}_{\alpha})]\\ &=\mathcal{I}+\mathcal{P}^{-1}_{\alpha}[(C-C_{\alpha})\otimes I_{s}]\\ &\triangleq\mathcal{I}+\mathcal{L},\end{split} (3.7)

where rank()=rank(𝒫α1[(CCα)Is])=rank((CCα)Is)=rank(CCα)Ns=2Ns{\rm rank}(\mathcal{L})={\rm rank}\big{(}\mathcal{P}^{-1}_{\alpha}[(C-C_{\alpha})\otimes I_{s}]\big{)}={\rm rank}\big{(}(C-C_{\alpha})\otimes I_{s}\big{)}={\rm rank}(C-C_{\alpha})\cdot N_{s}=2N_{s}, since only the first two rows of CCαC-C_{\alpha} are nonzero. This completes the proof. \Box

Theorem 3.1 implies that the generalized minimal residual (GMRES) method by Saad and Schultz [56] computes the exact solution of the preconditioned system 𝒫α1𝒜\mathcal{P}^{-1}_{\alpha}\mathcal{A} (if it is diagonalizable) in at most 2Ns+12N_{s}+1 iterations [56, 23, 33]. Although such a bound on the convergence of preconditioned GMRES is not sharp when NsN_{s} is not small, it gives useful clues about the effectiveness of the preconditioner 𝒫α\mathcal{P}_{\alpha} as an approximation of the all-at-once matrix 𝒜\mathcal{A}. Moreover, in Section 4 we will show numerically that most of the eigenvalues of the preconditioned matrix 𝒫α1𝒜\mathcal{P}^{-1}_{\alpha}\mathcal{A} actually cluster around the point 1, supporting the theoretical findings of this section on the potential of the proposed preconditioner 𝒫α\mathcal{P}_{\alpha} to accelerate the iterative solution of the all-at-once system.
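The mechanism behind this bound is generic: perturbing the identity by any rank-r matrix leaves all but at most r eigenvalues equal to 1, which is what limits the GMRES iteration count. A small Python illustration (with a random low-rank perturbation, not the actual matrices of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 60, 2                      # matrix size and rank of the perturbation
L = rng.standard_normal((N, r)) @ rng.standard_normal((r, N))
M = np.eye(N) + L                 # mimics P_alpha^{-1} A = I + L

eig = np.linalg.eigvals(M)
# at most r eigenvalues can move away from 1
assert np.sum(np.abs(eig - 1.0) > 1e-6) <= r
```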

Remark 3.1.

If we set α=1\alpha=1, the preconditioner 𝒫α\mathcal{P}_{\alpha} reduces to the BC preconditioner 𝒫1=C1IsτItA\mathcal{P}_{1}=C_{1}\otimes I_{s}-\tau I_{t}\otimes A, where C1C_{1} is the matrix CαC_{\alpha} in (3.2) with α=1\alpha=1. Moreover, the matrix decomposition (3.5), Proposition 3.1, and Theorem 3.1 carry over to the BC preconditioner 𝒫1\mathcal{P}_{1}; refer, e.g., to [33, 35] for details.

Remark 3.2.

We remark that the invertibility of 𝒫α\mathcal{P}_{\alpha} (or 𝒫1\mathcal{P}_{1}) depends entirely on the matrices λn(α)IsτA\lambda^{(\alpha)}_{n}I_{s}-\tau A. However, as α1\alpha\rightarrow 1, the real parts of {λn(α)}n=1Nt\{\lambda^{(\alpha)}_{n}\}^{N_{t}}_{n=1} become very close to zero, cf. Proposition 3.1. Meanwhile, if the diffusion coefficient κγ\kappa_{\gamma} is very small, the eigenvalues of AA will be increasingly close to zero. These facts can make the matrices λn(α)IsτA\lambda^{(\alpha)}_{n}I_{s}-\tau A very ill-conditioned (even singular444If the model problem (2.1) incorporates a Neumann boundary condition and we set α=1\alpha=1, the matrices λn(α)IsτA\lambda^{(\alpha)}_{n}I_{s}-\tau A will be singular because the matrix AA is singular.). Under these circumstances, the BC preconditioner 𝒫1\mathcal{P}_{1} can perform poorly (see [35] and Section 4), whereas the preconditioner 𝒫α\mathcal{P}_{\alpha} with α(0,1)\alpha\in(0,1) should be preferred in a practical implementation.

Remark 3.3.

If one wants to improve the proposed preconditioner further, a polynomial preconditioner, which has the potential to reduce communication costs in Krylov subspace solvers [56], can be derived from the Neumann series:

\begin{split}\mathcal{A}^{-1}&=\left[\mathcal{I}-\mathcal{P}^{-1}_{\alpha}\mathcal{S}_{\alpha}\right]^{-1}\mathcal{P}^{-1}_{\alpha}\\ &=\sum^{\infty}_{k=0}(\mathcal{P}^{-1}_{\alpha}\mathcal{S}_{\alpha})^{k}\mathcal{P}^{-1}_{\alpha},\end{split} (3.8)

where 𝒮α=(CαC)Is\mathcal{S}_{\alpha}=(C_{\alpha}-C)\otimes I_{s}. If 𝒫α1𝒮α<1\|\mathcal{P}^{-1}_{\alpha}\mathcal{S}_{\alpha}\|<1, then the mm-step polynomial preconditioner 𝒫α(m)=k=0m(𝒫α1𝒮α)k𝒫α1\mathcal{P}_{\alpha}(m)=\sum^{m}_{k=0}(\mathcal{P}^{-1}_{\alpha}\mathcal{S}_{\alpha})^{k}\mathcal{P}^{-1}_{\alpha} is well-defined and inherently parallel. Moreover, such a preconditioner can easily be adapted to the all-at-once systems arising from non-uniform temporal discretizations; refer to [34] for a short discussion.
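Applying the m-step polynomial preconditioner amounts to m+1 solves with 𝒫_α interleaved with multiplications by 𝒮_α, accumulated Horner-style. The following Python sketch (our illustration: random dense matrices stand in for 𝒫_α and 𝒮_α, and dense solves replace the factorized application) shows the truncation error decreasing with m:

```python
import numpy as np

def poly_prec_apply(v, P, S, m):
    """m-step polynomial preconditioner sum_{k=0}^m (P^{-1} S)^k P^{-1}
    applied to v, using m+1 solves with P."""
    z = np.linalg.solve(P, v)
    acc = z.copy()
    for _ in range(m):
        z = np.linalg.solve(P, S @ z)
        acc += z
    return acc

rng = np.random.default_rng(2)
n = 30
P = 4.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
S = 0.1 * rng.standard_normal((n, n))   # small enough that ||P^{-1}S|| < 1
A = P - S                                # mimics A = P_alpha - S_alpha
v = rng.standard_normal(n)

errs = [np.linalg.norm(A @ poly_prec_apply(v, P, S, m) - v) for m in (0, 2, 6)]
assert errs[0] > errs[1] > errs[2]       # larger m approximates A^{-1} better
```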

3.2 Efficient implementation of the PinT preconditioner

In this subsection, we discuss how to implement a Krylov subspace method efficiently for solving the left (or right) preconditioned system 𝒫α1𝒜𝑼=𝒫α1𝑭\mathcal{P}^{-1}_{\alpha}\mathcal{A}{\bm{U}}=\mathcal{P}^{-1}_{\alpha}{\bm{F}} (or 𝒜𝒫α1(𝒫α𝑼)=𝑭\mathcal{A}\mathcal{P}^{-1}_{\alpha}(\mathcal{P}_{\alpha}{\bm{U}})={\bm{F}}), α(0,1]\alpha\in(0,1]. We recall that the kernel operations of a Krylov subspace algorithm are the matrix-vector products 𝒫α1𝒗\mathcal{P}^{-1}_{\alpha}{\bm{v}} and 𝒜𝒗\mathcal{A}{\bm{v}} for a given vector 𝒗NtNs×1{\bm{v}}\in\mathbb{R}^{N_{t}N_{s}\times 1}. First, we present a fast implementation for computing the second matrix-vector product:

𝒜𝒗=(CIs)𝒗(τItA)𝒗=vec(VCT)τvec(AV),\begin{split}\mathcal{A}{\bm{v}}&=(C\otimes I_{s}){\bm{v}}-(\tau I_{t}\otimes A){\bm{v}}\\ &={\rm vec}(VC^{T})-\tau\cdot{\rm vec}(AV),\end{split} (3.9)

where VV is the matrix such that 𝒗=vec(V){\bm{v}}={\rm vec}(V), and vec(X){\rm vec}(X) denotes the vectorization of the matrix XX obtained by stacking the columns of XX into a single column vector. The first term can be calculated directly in 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) storage and operations because the matrix CTC^{T} is sparse. Owing to the Toeplitz structure of the matrix AA, the second term can be evaluated in 𝒪(NtNslogNs)\mathcal{O}(N_{t}N_{s}\log N_{s}) operations and 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) storage. This means that the computation of 𝒜𝒗\mathcal{A}{\bm{v}} requires the same order of operations and storage as (ItA)𝒗(I_{t}\otimes A){\bm{v}}. Clearly, the cost of computing 𝒜𝒗\mathcal{A}{\bm{v}} can be further reduced by a parallel implementation.
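The O(N_s log N_s) Toeplitz product is the classical circulant-embedding trick: embed the Toeplitz matrix into a circulant of twice the size and multiply via one FFT pair. A self-contained Python sketch (with generic Toeplitz data, not the actual fractional-difference coefficients):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply the Toeplitz matrix with first column c and first row r
    by x in O(n log n), via a circulant embedding of size 2n."""
    n = len(x)
    col = np.concatenate([c, [0.0], r[:0:-1]])   # first column of the circulant
    y = np.fft.ifft(np.fft.fft(col) *
                    np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n].real

rng = np.random.default_rng(3)
n = 64
c = rng.standard_normal(n)                                # first column
r = np.concatenate([[c[0]], rng.standard_normal(n - 1)])  # first row
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
x = rng.standard_normal(n)
assert np.allclose(toeplitz_matvec(c, r, x), T @ x)
```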

Next, we focus on the fast computation of 𝒫α1𝒗\mathcal{P}^{-1}_{\alpha}{\bm{v}} for a given vector 𝒗{\bm{v}}. According to Step-(a)–Step-(c) described in Section 3.1, the first and third steps of the implementation of 𝒫α\mathcal{P}_{\alpha} require 𝒪(NtNslogNs)\mathcal{O}(N_{t}N_{s}\log N_{s}) operations and 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) storage. If the matrix AA can be diagonalized (or approximated) via fast DSTs, Step-(b) can be directly solved with 𝒪(NtNslogNs)\mathcal{O}(N_{t}N_{s}\log N_{s}) operations and 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) storage, except for the different complex shifts.

Similar to [35, pp. 10-11], it is worth noting that only about half of the NtN_{t} shifted linear systems in Step-(b) need to be solved. From Step-(a), we know that the right-hand sides in Step-(b) can be expressed as

{\bm{z}}_{1,n}=\frac{1}{\sqrt{N_{t}}}\sum^{N_{t}-1}_{j=0}\alpha^{\frac{j}{N_{t}}}\theta^{(n-1)j}{\bm{v}}_{j+1},~{}~{}{\bm{v}}=[{\bm{v}}^{\top}_{1},{\bm{v}}^{\top}_{2},\cdots,{\bm{v}}^{\top}_{N_{t}}]^{\top},~{}~{}n=1,2,\cdots,N_{t}. (3.10)

We recall the expression of coefficient matrices in Step-(b):

Bn1:=λn(α)IsτA=(j=02rjαjNtθ(n1)j)IsτA,n=1,2,,Nt.B_{n-1}:=\lambda^{(\alpha)}_{n}I_{s}-\tau A=\left(\sum^{2}_{j=0}r_{j}\alpha^{\frac{j}{N_{t}}}\theta^{(n-1)j}\right)I_{s}-\tau A,\quad n=1,2,\cdots,N_{t}. (3.11)

Let “conj(){\rm conj}(\cdot)” denote conjugate of a matrix or a vector. Then,

\begin{cases}{\rm conj}({\bm{z}}_{1,n})=\frac{1}{\sqrt{N_{t}}}\sum\limits^{N_{t}-1}_{j=0}\alpha^{\frac{j}{N_{t}}}\theta^{[N_{t}-(n-1)]j}{\bm{v}}_{j+1}={\bm{z}}_{1,N_{t}-n+2},\\ {\rm conj}(B_{n-1})=\left(\sum\limits^{2}_{j=0}r_{j}\alpha^{\frac{j}{N_{t}}}\theta^{-(n-1)j}\right)I_{s}-\tau A=\left(\sum\limits^{2}_{j=0}r_{j}\alpha^{\frac{j}{N_{t}}}\theta^{[N_{t}-(n-1)]j}\right)I_{s}-\tau A=B_{N_{t}-n+1},&n=2,3,\cdots,N_{t}.\end{cases} (3.12)

This means that the unknowns in Step-(b) satisfy 𝒛2,n=conj(𝒛2,Ntn+2){\bm{z}}_{2,n}={\rm conj}({\bm{z}}_{2,N_{t}-n+2}) for n=2,3,,Ntn=2,3,\cdots,N_{t}. Therefore, only the first Nt/2+1\lfloor N_{t}/2\rfloor+1 shifted linear systems in Step-(b) need to be solved, and thus the number of processor cores required in a practical parallel implementation is significantly reduced.
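The conjugate pairing can be verified directly; in the following Python fragment (same kind of toy setting as in Section 3.1: a tridiagonal stand-in for A and dense shifted solves; Python index n corresponds to the paper's index n+1) the Step-(b) solutions indeed come in conjugate pairs:

```python
import numpy as np

nt, ns, tau, alpha = 8, 4, 0.1, 0.5
v = np.random.default_rng(4).standard_normal((nt, ns))   # real right-hand side
k = np.arange(nt)
eps = alpha ** (1.0 / nt) * np.exp(2j * np.pi * k / nt)
lam = 1.5 - 2.0 * eps + 0.5 * eps ** 2

# Step (a), as in Section 3.1
z1 = np.sqrt(nt) * np.fft.ifft(v * alpha ** (k / nt)[:, None], axis=0)
# Step (b) with a toy symmetric negative definite A
A = -2.0 * np.eye(ns) + np.eye(ns, k=1) + np.eye(ns, k=-1)
z2 = np.stack([np.linalg.solve(lam[n] * np.eye(ns) - tau * A, z1[n])
               for n in range(nt)])

for n in range(1, nt):
    assert np.allclose(np.conj(lam[n]), lam[nt - n])     # shifts pair up
    assert np.allclose(np.conj(z2[n]), z2[nt - n])       # solutions pair up
```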

Finally, we discuss the efficient solution of the sequence of shifted linear systems in Step-(b). Since AA is a real negative definite Toeplitz matrix, it can be approximated efficiently by a τ\tau-matrix 𝒯(A)\mathscr{T}(A) [47], which is a real symmetric matrix that can be diagonalized as follows:

\mathscr{T}(A)=Q^{\top}_{N_{s}}\Lambda_{N_{s}}Q_{N_{s}},\quad Q_{N_{s}}=\left[\sqrt{\frac{2}{N_{s}+1}}\sin\left(\frac{\pi ij}{N_{s}+1}\right)\right]_{1\leq i,j\leq N_{s}},\quad\Lambda_{N_{s}}={\rm diag}(\sigma_{1},\sigma_{2},\cdots,\sigma_{N_{s}}). (3.13)

From the relations QNsQNs=IsQ^{\top}_{N_{s}}Q_{N_{s}}=I_{s} and QNs𝒯(A)=ΛNsQNsQ_{N_{s}}\mathscr{T}(A)=\Lambda_{N_{s}}Q_{N_{s}}, we have

\sigma_{i}=\frac{\sum^{N_{s}}_{\ell=1}a_{\ell}\sin\left(\frac{\pi i\ell}{N_{s}+1}\right)}{\sin\left(\frac{\pi i}{N_{s}+1}\right)}, (3.14)

where the vector [a1,a2,,aNs][a_{1},~{}a_{2},\cdots,a_{N_{s}}]^{\top} is the first column of 𝒯(A)\mathscr{T}(A). The set of shifted linear systems in Step-(b) is then reformulated as the following sequence:

\big{[}\lambda^{(\alpha)}_{n}I_{s}-\tau\mathscr{T}(A)\big{]}{\bm{z}}_{2,n}=Q^{\top}_{N_{s}}\big{[}\lambda^{(\alpha)}_{n}I_{s}-\tau\Lambda_{N_{s}}\big{]}Q_{N_{s}}{\bm{z}}_{2,n}={\bm{z}}_{1,n},\quad~{}n=1,2,\dots,N_{t}, (3.15)

which can be solved by fast DSTs without storing any dense matrix. In addition, since the DST only involves real operations, the above matrix diagonalization of the preconditioner does not affect Eqs. (3.10)–(3.12) at all. Overall, the fast computation of 𝒫α1𝒗\mathcal{P}^{-1}_{\alpha}{\bm{v}} using Eq. (3.15) requires 𝒪(NtNslogNs)\mathcal{O}(N_{t}N_{s}\log N_{s}) operations and 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) memory units.
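The τ-algebra machinery can be illustrated in a few lines of Python (the DST matrix Q is formed densely here for clarity, whereas a fast DST would be used in practice, and the Laplacian-like spectrum below is just a stand-in for the true eigenvalues of 𝒯(A)):

```python
import numpy as np

ns = 16
i = np.arange(1, ns + 1)
Q = np.sqrt(2.0 / (ns + 1)) * np.sin(np.pi * np.outer(i, i) / (ns + 1))
sigma = -(2.0 - 2.0 * np.cos(np.pi * i / (ns + 1)))  # stand-in spectrum
T = Q.T @ np.diag(sigma) @ Q                          # a tau-algebra matrix

# eigenvalues recovered from the first column of T, cf. (3.14)
a = T[:, 0]
sigma_rec = (np.sin(np.pi * np.outer(i, i) / (ns + 1)) @ a) \
            / np.sin(np.pi * i / (ns + 1))
assert np.allclose(sigma_rec, sigma)

# shifted solve (lam*I - tau*T) z = b performed entirely in the DST basis
lam, tau = 1.2 + 0.3j, 0.1
b = np.random.default_rng(5).standard_normal(ns)
z = Q.T @ ((Q @ b) / (lam - tau * sigma))
assert np.allclose((lam * np.eye(ns) - tau * T) @ z, b)
```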

Remark 3.4.

It is worth noting that this implementation is also suitable for solving the all-at-once systems arising from spatial discretizations based on compact finite difference, finite element and spectral methods, by simply replacing the identity matrix IsI_{s} in Eq. (3.15) with the mass matrix. Fortunately, for one-dimensional problems the mass matrix is an SPD tridiagonal Toeplitz matrix [25, 30, 13], which can be diagonalized by fast DSTs. For high-dimensional models, the mass matrix is an SPD block tridiagonal Toeplitz matrix in Kronecker product form, and thus it can still be diagonalized via fast DSTs; refer, e.g., to [11].

Remark 3.5.

Finally, it is worth mentioning that although the algorithm and theoretical analysis are presented for the one-dimensional model (2.1), they extend naturally to two-dimensional models (refer to the next section for a discussion), because both are mainly based on the temporal discretization.

4 Numerical experiments

In this section, we assess the performance of the proposed α\alpha-circulant-based preconditioners for solving some examples of one- and two-dimensional model problems (2.1). In all the numerical experiments reported in this section, following the guidelines given in [35], we set α=min{0.5,0.5τ}\alpha=\min\{0.5,0.5\tau\} for the preconditioner 𝒫α\mathcal{P}_{\alpha}. All experiments are performed in MATLAB R2017a on a Gigabyte workstation equipped with an Intel(R) Core(TM) i9-7900X CPU @3.3GHz and 32GB of RAM, running the Windows 10 Education operating system (64-bit version), using double precision floating point arithmetic (with machine epsilon equal to 101610^{-16}). The adaptive Simpler GMRES method (referred to as Ad-SGMRES for short) [55] is employed to solve the right-preconditioned systems in Example 1, while the BiCGSTAB method [56, pp. 244-247] is applied to the left-preconditioned systems in Example 2 using the built-in function available in MATLAB555Since Ad-SGMRES still requires large amounts of storage due to the orthogonalization process, we also use the BiCGSTAB method as an alternative iterative method for solving non-symmetric systems.. The stopping criterion in both algorithms is 𝒓k2/𝒓02<tol=109\|{\bm{r}}_{k}\|_{2}/\|{\bm{r}}_{0}\|_{2}<{\rm tol}=10^{-9}, where 𝒓k{\bm{r}}_{k} is the residual vector at the kk-th Ad-SGMRES or BiCGSTAB iteration. The iterations are always started from the zero vector. All timings shown in the tables are averaged over 20 runs666The MATLAB code will be available at https://github.com/Hsien-Ming-Ku/Group-of-FDEs..

In the tables, the quantity ‘Iter’ denotes the number of Ad-SGMRES or BiCGSTAB iterations, ‘DoF’ is the number of degrees of freedom (unknowns), and ‘CPU’ is the computational time expressed in seconds. The 2-norm of the true relative residual (denoted TRR in the tables, and reported there as log10\log_{10} values) is defined as 𝚃𝚁𝚁=𝑭𝒜𝑼k2/𝑭2\mathtt{TRR}=\|{\bm{F}}-\mathcal{A}{\bm{U}}_{k}\|_{2}/\|{\bm{F}}\|_{2}, and the numerical error (Err) between the approximate and exact solutions at the final time level reads 𝒖𝒖Nt\|{\bm{u}}^{*}-{\bm{u}}^{N_{t}}\|_{\infty}, where 𝑼k{\bm{U}}_{k} is the approximate solution when the preconditioned iterative solver terminates and 𝒖{\bm{u}}^{*} is the exact solution on the mesh. These notations are adopted throughout this section.

Example 1. The first example is the Riesz fractional diffusion equation (2.1) with (a,b)=(0,1)(a,b)=(0,1), T=1T=1, κγ=0.01\kappa_{\gamma}=0.01 and ϕ(x)=15(1+γ/4)x3(1x)3\phi(x)=15(1+\gamma/4)x^{3}(1-x)^{3}. The source term is given by

f(x,t)=15(1+γ/4)etx3(1x)3+15(1+γ/4)κγet2cos(γπ/2)[Γ(4)Γ(4γ)(x3γ+(1x)3γ)3Γ(5)Γ(5γ)(x4γ+(1x)4γ)+3Γ(6)Γ(6γ)(x5γ+(1x)5γ)Γ(7)Γ(7γ)(x6γ+(1x)6γ)].\begin{split}f(x,t)&=15(1+\gamma/4)e^{t}x^{3}(1-x)^{3}+\frac{15(1+\gamma/4)\kappa_{\gamma}e^{t}}{2\cos(\gamma\pi/2)}\Bigg{[}\frac{\Gamma(4)}{\Gamma(4-\gamma)}\big{(}x^{3-\gamma}+(1-x)^{3-\gamma}\big{)}-\frac{3\Gamma(5)}{\Gamma(5-\gamma)}\big{(}x^{4-\gamma}~{}+\\ &\quad~{}(1-x)^{4-\gamma}\big{)}+\frac{3\Gamma(6)}{\Gamma(6-\gamma)}\big{(}x^{5-\gamma}+(1-x)^{5-\gamma}\big{)}-\frac{\Gamma(7)}{\Gamma(7-\gamma)}\big{(}x^{6-\gamma}+(1-x)^{6-\gamma}\big{)}\Bigg{]}.\end{split}

The exact solution is known and reads u(x,t)=15(1+γ/4)etx3(1x)3u(x,t)=15(1+\gamma/4)e^{t}x^{3}(1-x)^{3}. The results of our numerical experiments with the Ad-SGMRES method preconditioned by 𝒫α\mathcal{P}_{\alpha} and 𝒫1\mathcal{P}_{1} for solving the all-at-once discretized systems (2.12) are reported in Tables 1–3.

Table 1: Numerical results of Ad-SGMRES with two different preconditioners on Example 1 with γ=1.2\gamma=1.2.
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hh DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/128 8,128 7 0.033 -9.794 9.7599e-5 19 0.085 -9.412 9.7599e-5
1/256 16,320 7 0.054 -9.257 9.4838e-5 19 0.117 -9.347 9.4838e-5
1/512 32,704 8 0.116 -10.017 9.4147e-5 19 0.247 -9.535 9.4147e-5
1/1024 65,472 8 0.156 -9.537 9.3974e-5 19 0.359 -9.163 9.3975e-5
282^{8} 1/128 32,512 7 0.107 -10.366 9.5721e-6 19 0.258 -9.378 9.5722e-6
1/256 65,280 7 0.158 -9.580 6.8110e-6 19 0.385 -9.318 6.8111e-6
1/512 130,816 7 0.445 -9.020 6.1205e-6 19 1.236 -9.248 6.1208e-6
1/1024 261,888 8 0.787 -9.763 5.9481e-6 19 1.958 -9.156 5.9482e-6
2102^{10} 1/128 130,048 6 0.323 -9.020 5.0121e-6 19 0.964 -9.368 5.0138e-6
1/256 261,120 7 0.756 -9.895 1.2888e-6 19 2.123 -9.309 1.2890e-6
1/512 523,264 7 1.715 -9.326 5.9821e-7 19 4.749 -9.242 5.9870e-7
1/1024 1,047,552 8 3.245 -10.056 4.2607e-7 19 8.103 -9.153 4.2613e-7
Table 2: Numerical results of Ad-SGMRES with two different preconditioners on Example 1 with γ=1.5\gamma=1.5.
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hh DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/128 8,128 8 0.038 -9.892 1.0514e-4 15 0.085 -9.044 1.0515e-4
1/256 16,320 8 0.054 -9.755 9.8789e-5 15 0.095 -9.176 9.8789e-5
1/512 32,704 8 0.112 -9.697 9.7199e-5 15 0.198 -9.513 9.7199e-5
1/1024 65,472 8 0.157 -9.349 9.6802e-5 16 0.308 -9.563 9.6802e-5
282^{8} 1/128 32,512 7 0.112 -9.545 1.4536e-5 16 0.225 -9.831 1.4536e-5
1/256 65,280 7 0.158 -9.103 8.1809e-6 15 0.319 -9.323 8.1810e-6
1/512 130,816 8 0.515 -9.925 6.5922e-6 15 0.979 -9.265 6.5922e-6
1/1024 261,888 8 0.829 -9.587 6.1950e-6 16 1.649 -9.605 6.1950e-6
2102^{10} 1/128 130,048 7 0.392 -9.982 1.3161e-5 15 0.827 -9.008 1.3162e-5
1/256 261,120 7 0.779 -9.423 3.2696e-6 15 1.637 -9.184 3.2692e-6
1/512 523,264 7 1.744 -9.049 9.0813e-7 15 3.769 -9.381 9.0882e-7
1/1024 1,047,552 8 3.221 -9.878 5.1171e-7 16 6.658 -9.631 5.1160e-7
Table 3: Numerical results of Ad-SGMRES with two different preconditioners on Example 1 with γ=1.9\gamma=1.9.
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hh DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/128 8,128 7 0.034 -9.255 1.2052e-4 11 0.052 -9.305 1.2052e-4
1/256 16,320 7 0.047 -9.101 1.0303e-4 11 0.070 -9.195 1.0303e-4
1/512 32,704 8 0.113 -10.012 9.8653e-5 11 0.151 -9.123 9.8653e-5
1/1024 65,472 8 0.159 -10.024 9.7559e-5 11 0.225 -9.058 9.7559e-5
282^{8} 1/128 32,512 7 0.112 -9.831 3.8671e-5 11 0.153 -9.408 3.8671e-5
1/256 65,280 7 0.155 -9.500 1.1924e-5 11 0.235 -9.313 1.1924e-5
1/512 130,816 7 0.456 -9.287 7.5514e-6 11 0.690 -9.255 7.5518e-6
1/1024 261,888 7 0.696 -9.261 6.4585e-6 11 1.093 -9.212 6.4587e-6
2102^{10} 1/128 130,048 6 0.331 -9.416 3.9387e-5 11 0.588 -9.433 3.9387e-5
1/256 261,120 6 0.666 -9.216 9.8111e-6 11 1.187 -9.368 9.8118e-6
1/512 523,264 7 1.736 -9.936 2.4178e-6 11 2.669 -9.317 2.4178e-6
1/1024 1,047,552 7 2.850 -9.941 7.4549e-7 11 4.396 -9.300 7.4557e-7

According to Tables 1–3, we note that Ad-SGMRES with the preconditioner 𝒫α\mathcal{P}_{\alpha} converges much faster than with 𝒫1\mathcal{P}_{1} in terms of both CPU and Iter on Example 1, for the different values of γ\gamma. In terms of accuracy (the values TRR and Err), the two preconditioned Ad-SGMRES methods are almost comparable. The results indicate that introducing the adaptive parameter α(0,1)\alpha\in(0,1) indeed helps improve the performance over 𝒫1\mathcal{P}_{1}. Moreover, the τ\tau-matrix approximation of the Jacobian matrix AA in (3.5) is numerically effective. The iteration count of Ad-SGMRES-𝒫α\mathcal{P}_{\alpha} varies only slightly when NtN_{t} increases (or hh decreases), showing an almost matrix-size independent convergence rate. This favourable numerical scalability of Ad-SGMRES-𝒫α\mathcal{P}_{\alpha} is fully in line with our theoretical analysis presented in Section 3.1, where the eigenvalue distribution of the preconditioned matrices (also partly shown in Fig. 2) is independent of the space-discretization matrix AA. On the other hand, if the diffusion coefficient κγ\kappa_{\gamma} becomes smaller than 10210^{-2}, the performance gap between the preconditioners 𝒫α\mathcal{P}_{\alpha} and 𝒫1\mathcal{P}_{1} becomes even more remarkable; cf. Remark 3.2. However, we do not investigate these circumstances in the present study.

Fig. 2: The eigenvalue distributions of the matrix 𝒜\mathcal{A} and preconditioned matrices 𝒫α1𝒜,𝒫11𝒜\mathcal{P}^{-1}_{\alpha}\mathcal{A},~{}\mathcal{P}^{-1}_{1}\mathcal{A} with γ=1.2,1.5,1.9\gamma=1.2,~{}1.5,~{}1.9 and h=τ=1/64h=\tau=1/64 in Example 1.

Example 2. (the 2D problem [52, 13]) Consider the following Riesz fractional diffusion equation

\begin{cases}\frac{\partial u(x,y,t)}{\partial t}=\kappa_{\gamma_{1}}\frac{\partial^{\gamma_{1}}u(x,y,t)}{\partial|x|^{\gamma_{1}}}+\kappa_{\gamma_{2}}\frac{\partial^{\gamma_{2}}u(x,y,t)}{\partial|y|^{\gamma_{2}}}+f(x,y,t),&(x,y,t)\in\Omega\times(0,T],\\ u(x,y,0)=\phi(x,y),&(x,y)\in\Omega,\\ u(x,y,t)=0,&(x,y,t)\in\partial\Omega\times(0,T),\end{cases} (4.1)

where κγ1=κγ2=0.01\kappa_{\gamma_{1}}=\kappa_{\gamma_{2}}=0.01, T=2T=2, Ω=(0,2)×(0,2)\Omega=(0,2)\times(0,2), and ϕ(x,y)=x4(2x)4y4(2y)4\phi(x,y)=x^{4}(2-x)^{4}y^{4}(2-y)^{4}, so that the source term is exactly given by

\begin{split}f(x,y,t)&=-\frac{1}{3}e^{-t/3}x^{4}(2-x)^{4}y^{4}(2-y)^{4}+\frac{\kappa_{\gamma_{1}}e^{-t/3}}{2\cos(\gamma_{1}\pi/2)}y^{4}(2-y)^{4}\sum^{9}_{\ell=5}\frac{q_{\ell}\Gamma(\ell)\big{[}x^{\ell-1-\gamma_{1}}+(2-x)^{\ell-1-\gamma_{1}}\big{]}}{\Gamma(\ell-\gamma_{1})}\\ &\quad+\frac{\kappa_{\gamma_{2}}e^{-t/3}}{2\cos(\gamma_{2}\pi/2)}x^{4}(2-x)^{4}\sum^{9}_{\ell=5}\frac{q_{\ell}\Gamma(\ell)\big{[}y^{\ell-1-\gamma_{2}}+(2-y)^{\ell-1-\gamma_{2}}\big{]}}{\Gamma(\ell-\gamma_{2})}\end{split}

with q5=16,q6=32,q7=24,q8=8q_{5}=16,~{}q_{6}=-32,~{}q_{7}=24,~{}q_{8}=-8 and q9=1q_{9}=1. The exact solution is u(x,y,t)=et/3x4(2x)4y4(2y)4u(x,y,t)=e^{-t/3}x^{4}(2-x)^{4}y^{4}(2-y)^{4}.

By a derivation similar to the technique described in Section 2.1, we can establish an implicit difference scheme for solving this model problem. For simplicity, we set hx=hy=(ba)/Nh_{x}=h_{y}=(b-a)/N; then it is easy to derive the Jacobian matrix in the following Kronecker product form

A=[(κγ1hγ1Tx)IN1+IN1(κγ2hγ2Ty)],A=-\left[\left(\frac{\kappa_{\gamma_{1}}}{h^{\gamma_{1}}}T_{x}\right)\otimes I_{N-1}+I_{N-1}\otimes\left(\frac{\kappa_{\gamma_{2}}}{h^{\gamma_{2}}}T_{y}\right)\right], (4.2)

where the two SPD Toeplitz matrices are defined by

T_{x}=\begin{bmatrix}\omega^{(\gamma_{1})}_{0}&\omega^{(\gamma_{1})}_{-1}&\omega^{(\gamma_{1})}_{-2}&\cdots&\omega^{(\gamma_{1})}_{3-N}&\omega^{(\gamma_{1})}_{2-N}\\ \omega^{(\gamma_{1})}_{1}&\omega^{(\gamma_{1})}_{0}&\omega^{(\gamma_{1})}_{-1}&\cdots&\omega^{(\gamma_{1})}_{4-N}&\omega^{(\gamma_{1})}_{3-N}\\ \omega^{(\gamma_{1})}_{2}&\omega^{(\gamma_{1})}_{1}&\omega^{(\gamma_{1})}_{0}&\cdots&\omega^{(\gamma_{1})}_{5-N}&\omega^{(\gamma_{1})}_{4-N}\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ \omega^{(\gamma_{1})}_{N-3}&\omega^{(\gamma_{1})}_{N-4}&\omega^{(\gamma_{1})}_{N-5}&\cdots&\omega^{(\gamma_{1})}_{0}&\omega^{(\gamma_{1})}_{-1}\\ \omega^{(\gamma_{1})}_{N-2}&\omega^{(\gamma_{1})}_{N-3}&\omega^{(\gamma_{1})}_{N-4}&\cdots&\omega^{(\gamma_{1})}_{1}&\omega^{(\gamma_{1})}_{0}\end{bmatrix}~{}~{}{\rm and}~{}~{}T_{y}=\begin{bmatrix}\omega^{(\gamma_{2})}_{0}&\omega^{(\gamma_{2})}_{-1}&\omega^{(\gamma_{2})}_{-2}&\cdots&\omega^{(\gamma_{2})}_{3-N}&\omega^{(\gamma_{2})}_{2-N}\\ \omega^{(\gamma_{2})}_{1}&\omega^{(\gamma_{2})}_{0}&\omega^{(\gamma_{2})}_{-1}&\cdots&\omega^{(\gamma_{2})}_{4-N}&\omega^{(\gamma_{2})}_{3-N}\\ \omega^{(\gamma_{2})}_{2}&\omega^{(\gamma_{2})}_{1}&\omega^{(\gamma_{2})}_{0}&\cdots&\omega^{(\gamma_{2})}_{5-N}&\omega^{(\gamma_{2})}_{4-N}\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ \omega^{(\gamma_{2})}_{N-3}&\omega^{(\gamma_{2})}_{N-4}&\omega^{(\gamma_{2})}_{N-5}&\cdots&\omega^{(\gamma_{2})}_{0}&\omega^{(\gamma_{2})}_{-1}\\ \omega^{(\gamma_{2})}_{N-2}&\omega^{(\gamma_{2})}_{N-3}&\omega^{(\gamma_{2})}_{N-4}&\cdots&\omega^{(\gamma_{2})}_{1}&\omega^{(\gamma_{2})}_{0}\end{bmatrix}.

Then AA is a symmetric negative definite matrix (since TxT_{x} and TyT_{y} are SPD), which has no impact on the algorithm and theoretical analyses described in Sections 2–3. The only difference is that the direct application of 𝒫α\mathcal{P}_{\alpha} within Krylov subspace solvers should be avoided, because solving the sequence of dense shifted linear systems in Step-(b) directly would require a huge computational cost and memory storage.

In order to reduce the computational expense, we can follow Eq. (3.13) to approximate the Jacobian matrix AA in (4.2) by the following τ\tau-matrix

\begin{split}\mathscr{T}(A)&=-\left[\left(\frac{\kappa_{\gamma_{1}}}{h^{\gamma_{1}}}\mathscr{T}(T_{x})\right)\otimes I_{N-1}+I_{N-1}\otimes\left(\frac{\kappa_{\gamma_{2}}}{h^{\gamma_{2}}}\mathscr{T}(T_{y})\right)\right]\\ &=-(Q^{\top}_{N-1}\otimes Q^{\top}_{N-1})\left[\left(\frac{\kappa_{\gamma_{1}}}{h^{\gamma_{1}}}\Lambda^{(x)}_{N-1}\right)\otimes I_{N-1}+I_{N-1}\otimes\left(\frac{\kappa_{\gamma_{2}}}{h^{\gamma_{2}}}\Lambda^{(y)}_{N-1}\right)\right](Q_{N-1}\otimes Q_{N-1})\end{split} (4.3)

where 𝒯(Tx)=QN1ΛN1(x)QN1\mathscr{T}(T_{x})=Q^{\top}_{N-1}\Lambda^{(x)}_{N-1}Q_{N-1} and 𝒯(Ty)=QN1ΛN1(y)QN1\mathscr{T}(T_{y})=Q^{\top}_{N-1}\Lambda^{(y)}_{N-1}Q_{N-1} [47, 48, 49]. Then, instead of solving the shifted linear systems in Step-(b), we solve the following sequence of shifted linear systems,

\big{[}\lambda^{(\alpha)}_{n}I_{s}-\tau\mathscr{T}(A)\big{]}{\bm{z}}_{2,n}={\bm{z}}_{1,n},\quad n=1,2,\cdots,N_{t}, (4.4)
\Leftrightarrow(Q^{\top}_{N-1}\otimes Q^{\top}_{N-1})\left[\lambda^{(\alpha)}_{n}I_{s}+\tau\left(\frac{\kappa_{\gamma_{1}}}{h^{\gamma_{1}}}\Lambda^{(x)}_{N-1}\right)\otimes I_{N-1}+\tau I_{N-1}\otimes\left(\frac{\kappa_{\gamma_{2}}}{h^{\gamma_{2}}}\Lambda^{(y)}_{N-1}\right)\right](Q_{N-1}\otimes Q_{N-1}){\bm{z}}_{2,n}={\bm{z}}_{1,n}, (4.5)

where s=(N1)2s=(N-1)^{2}, efficiently by fast discrete sine transforms, avoiding the storage of any dense matrices. The numerical results reported in Tables 4–6 with h=hx=hyh=h_{x}=h_{y} illustrate the effectiveness and robustness of the PinT preconditioner 𝒫α\mathcal{P}_{\alpha} for solving this problem.
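The two-dimensional shifted solves reduce to two-dimensional sine transforms plus a pointwise division. A Python sketch (our illustration, with stand-in spectra for 𝒯(T_x), 𝒯(T_y) and the DST matrix formed explicitly instead of a fast transform):

```python
import numpy as np

m = 12                                     # N - 1 grid points per direction
i = np.arange(1, m + 1)
Q = np.sqrt(2.0 / (m + 1)) * np.sin(np.pi * np.outer(i, i) / (m + 1))
sx = 2.0 - 2.0 * np.cos(np.pi * i / (m + 1))  # stand-in spectra of T(Tx), T(Ty)
sy = sx.copy()
kx, ky, tau, lam = 0.01, 0.01, 0.05, 1.0 + 0.2j

# T(A) = -[(kx*T(Tx)) (x) I + I (x) (ky*T(Ty))], cf. (4.3)
TA = -(np.kron(kx * (Q @ np.diag(sx) @ Q), np.eye(m))
       + np.kron(np.eye(m), ky * (Q @ np.diag(sy) @ Q)))
b = np.random.default_rng(6).standard_normal(m * m)

# diagonal of the shifted operator in the 2D sine basis, cf. (4.5)
d = lam + tau * (kx * sx[:, None] + ky * sy[None, :])
B = b.reshape(m, m)
Z = Q @ ((Q @ B @ Q) / d) @ Q              # DST, pointwise divide, inverse DST
assert np.allclose((lam * np.eye(m * m) - tau * TA) @ Z.ravel(), b)
```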

Table 4: Numerical results of BiCGSTAB with two different preconditioners on Example 2 with (γ1,γ2)=(1.4,1.2)(\gamma_{1},\gamma_{2})=(1.4,1.2).
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hx=hyh_{x}=h_{y} DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/64 254,016 4.0 1.006 -9.195 1.2627e-4 12.0 2.544 -9.290 1.2628e-4
1/128 1,032,256 4.5 5.325 -10.066 8.0645e-5 12.0 12.312 -9.132 8.0646e-5
1/256 4,161,600 4.5 17.597 -9.219 7.8998e-5 12.0 42.376 -9.134 7.8999e-5
1/512 16,711,744 5.0 115.94 -9.814 7.8611e-5 12.5 262.03 -9.151 7.8612e-5
282^{8} 1/64 1,016,064 4.0 4.027 -9.544 7.4633e-5 12.0 10.355 -9.264 7.4633e-5
1/128 4,129,024 4.0 19.419 -9.134 2.1246e-5 12.0 49.827 -9.102 2.1246e-5
1/256 16,646,400 5.0 79.796 -9.446 7.8953e-6 12.0 173.68 -9.092 7.8954e-6
1/512 66,846,976 4.5 426.03 -9.174 5.0729e-6 12.5 1050.3 -9.204 5.0732e-6
2102^{10} 1/64 4,064,256 4.0 16.246 -10.084 7.1404e-5 12.0 41.638 -9.244 7.1404e-5
1/128 16,516,096 4.0 78.194 -9.861 1.8016e-5 12.0 203.61 -9.084 1.8017e-5
1/256 66,585,600 4.0 265.24 -9.567 4.6657e-6 12.0 753.81 -9.072 4.6661e-6
1/512 267,387,904 4.0 2520.7 -9.157 1.3276e-6 12.5 6989.3 -9.209 1.3282e-6
Table 5: Numerical results of BiCGSTAB with two different preconditioners on Example 2 with (γ1,γ2)=(1.5,1.5)(\gamma_{1},\gamma_{2})=(1.5,1.5).
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hx=hyh_{x}=h_{y} DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/64 254,016 4.0 1.004 -9.297 1.5758e-4 11.0 2.487 -9.255 1.5758e-4
1/128 1,032,256 4.5 5.322 -9.978 8.1963e-5 11.0 11.416 -9.227 8.1963e-5
1/256 4,161,600 5.0 19.756 -10.122 7.9075e-5 11.5 41.569 -9.379 7.9075e-5
1/512 16,711,744 5.0 115.82 -9.676 7.8490e-5 11.5 240.18 -9.345 7.8490e-5
$2^{8}$ 1/64 1,016,064 4.0 4.058 -9.585 1.0778e-4 11.0 9.745 -9.127 1.0778e-4
1/128 4,129,024 4.0 19.609 -9.106 2.9441e-5 11.0 46.275 -9.162 2.9441e-5
1/256 16,646,400 4.5 73.342 -9.454 9.8515e-6 11.5 168.30 -9.259 9.8515e-6
1/512 66,846,976 4.5 424.74 -9.300 5.1188e-6 11.5 965.63 -9.245 5.1183e-6
$2^{10}$ 1/64 4,064,256 4.0 16.343 -10.060 1.0466e-4 11.0 39.182 -9.100 1.0466e-4
1/128 16,516,096 4.0 78.225 -9.813 2.6328e-5 11.0 188.46 -9.126 2.6328e-5
1/256 66,585,600 4.0 265.75 -9.582 6.7380e-6 11.5 733.89 -9.256 6.7382e-6
1/512 267,387,904 4.0 2517.9 -9.282 1.8400e-6 11.5 6278.1 -9.224 1.8401e-6
Table 6: Numerical results of GMRES with two different preconditioners on Example 2 with $(\gamma_{1},\gamma_{2})=(1.7,1.9)$.
$N_t$ | $h_x=h_y$ | DoF | $\mathcal{P}_{\alpha}$: Iter CPU TRR Err | $\mathcal{P}_{1}$: Iter CPU TRR Err
$2^{6}$ 1/64 254,016 4.0 1.047 -9.492 2.3321e-4 11.5 2.513 -10.177 2.3321e-4
1/128 1,032,256 4.0 4.976 -9.058 9.5250e-5 11.5 12.175 -10.188 9.5251e-5
1/256 4,161,600 4.5 17.826 -9.887 7.9355e-5 11.5 41.627 -10.208 7.9355e-5
1/512 16,711,744 4.5 114.13 -9.473 7.8240e-5 11.0 231.76 -9.154 7.8240e-5
$2^{8}$ 1/64 1,016,064 4.0 4.035 -10.062 1.8715e-4 11.0 9.673 -9.339 1.8715e-4
1/128 4,129,024 4.0 19.567 -9.425 4.9102e-5 11.5 48.305 -9.977 4.9102e-5
1/256 16,646,400 4.0 66.497 -9.150 1.4579e-5 11.0 161.38 -9.064 1.4578e-5
1/512 66,846,976 4.0 380.78 -9.011 5.9522e-6 11.0 921.53 -9.115 5.9518e-6
$2^{10}$ 1/64 4,064,256 4.0 16.225 -10.092 1.8428e-4 11.0 38.684 -9.027 1.8428e-4
1/128 16,516,096 4.0 78.158 -9.905 4.6224e-5 11.5 198.25 -10.077 4.6224e-5
1/256 66,585,600 4.0 266.02 -9.760 1.1700e-5 11.5 734.78 -10.020 1.1701e-5
1/512 267,387,904 4.0 2519.5 -9.551 3.0690e-6 11.0 6117.6 -9.100 3.0691e-6

As in Example 1, Tables 4–6 show that GMRES with the preconditioner $\mathcal{P}_{\alpha}$ converges much faster, in terms of both CPU time and iteration counts, than with $\mathcal{P}_{1}$ on this example, for different values of the $\gamma$'s. The accuracy of the two preconditioned solvers is almost identical with respect to TRR and Err. Once again, the results indicate that introducing the adaptive parameter $\alpha\in(0,1)$ indeed improves on the performance of $\mathcal{P}_{1}$; the faster convergence of $\mathcal{P}_{\alpha}$ is computationally attractive, especially when the diffusion coefficients become smaller than $10^{-2}$; cf. Remark 4. The $\tau$-matrix approximation to the Jacobian matrix $A$ in (3.5) remains very effective for this two-dimensional model problem (4.1); for more details, one can experiment with our publicly accessible MATLAB codes. The iteration count of GMRES-$\mathcal{P}_{\alpha}$ varies only slightly as $N_t$ increases (or $h$ decreases), showing an almost matrix-size-independent convergence rate for this problem as well. This favourable numerical scalability confirms our theoretical analysis, since the eigenvalue distribution of the preconditioned matrices established in Section 3.1 is independent of the space-discretization matrix $A$.
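To make the $\tau$-matrix approximation concrete, the following sketch (Python rather than our MATLAB codes; the function names are hypothetical) obtains the eigenvalues of a $\tau$-algebra matrix directly from the first column of its symmetric Toeplitz generator, by sampling the cosine symbol at the DST-I grid points $\theta_k = k\pi/(m+1)$:

```python
import numpy as np

def dst1_matrix(m):
    # Orthonormal DST-I matrix Q (symmetric and orthogonal).
    j = np.arange(1, m + 1)
    return np.sqrt(2.0/(m + 1)) * np.sin(np.outer(j, j) * np.pi/(m + 1))

def tau_eigenvalues(t):
    """Eigenvalues of the tau matrix generated by the symbol
    f(theta) = t[0] + 2*sum_{j>=1} t[j]*cos(j*theta), sampled at
    theta_k = k*pi/(m+1), k = 1..m; t is the first column of the
    symmetric Toeplitz generator."""
    m = len(t)
    theta = np.arange(1, m + 1) * np.pi / (m + 1)
    jj = np.arange(1, m)
    return t[0] + 2.0 * np.cos(np.outer(theta, jj)) @ t[1:]
```

A convenient sanity check: the tridiagonal Laplacian $\mathrm{tridiag}(-1,2,-1)$ itself lies in the $\tau$ algebra, so $Q\,\mathrm{diag}(d)\,Q$ must reproduce it exactly; for a genuinely dense generator (such as the fractional centered difference coefficients), $Q\,\mathrm{diag}(d)\,Q$ is only an approximation of the Toeplitz matrix, but its application costs a single DST per solve.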

5 Conclusions

In this note, we revisit the all-at-once linear system arising from the BDF$p$ temporal discretization of evolutionary PDEs. In particular, we present the BDF2 scheme for the RFDE model as our case study, where the resulting all-at-once system is a BLTT linear system with a low-rank matrix perturbation. The conditioning of the all-at-once coefficient matrix is studied and tight bounds are provided. We then adapt the generalized BC preconditioner to such all-at-once systems, proving the invertibility of the preconditioning matrix, which had not been established in previous studies. Our analysis demonstrates the superiority of the generalized BC preconditioner over the BC preconditioner. Moreover, the spectral properties of the preconditioned system and the convergence behavior of the preconditioned Krylov subspace solver have been investigated. Using the $\tau$-matrix approximation of the dense Jacobian matrix $A$, we derive a memory-efficient implementation of the generalized BC preconditioner for solving one- and two-dimensional RFDE model problems. Numerical results have been reported to show the effectiveness of the generalized BC preconditioner.

On the other hand, in light of the analysis and implementation of the generalized BC preconditioner, the BDF2 scheme implemented via PinT preconditioned Krylov solvers can easily be extended to other spatial discretization schemes for RFDEs (and to other boundary conditions). Furthermore, our study may inspire the development of new positivity-preserving parallel numerical methods for certain evolutionary PDEs. As an outlook, the extension of the generalized BC preconditioner and of its parallel implementation to nonlinear RFDEs [57, 32] (even when non-uniform temporal steps are combined with the BDF2 scheme; see [34, 18] for a short discussion) remains an interesting topic for further research.

Acknowledgments

The authors would like to thank Dr. Xin Liang and Prof. Shu-Lin Wu for their insightful suggestions and encouragement. This research is supported by NSFC (11801463 and 61876203) and the Applied Basic Research Project of Sichuan Province (2020YJ0007). The first author is also grateful to Prof. Hai-Wei Sun for helpful discussions during his visit to the University of Macau.

References

  • [1] S.Ş. Bayın, Definition of the Riesz derivative and its application to space fractional quantum mechanics, J. Math. Phys., 57(12) (2016), Article No. 123501, 12 pages. DOI: 10.1063/1.4968819.
  • [2] R.L. Magin, Fractional Calculus in Bioengineering, Begell House Publishers, Danbury, CT (2006).
  • [3] M.M. Meerschaert, C. Tadjeran, Finite difference approximations for two-sided space-fractional partial differential equations, Appl. Numer. Math., 56(1) (2006), pp. 80-90.
  • [4] E. Scalas, R. Gorenflo, F. Mainardi, Fractional calculus and continuous-time finance, Physica A, 284(1-4) (2000), pp. 376-384.
  • [5] A.A. Kilbas, H.M. Srivastava, J.J. Trujillo, Theory and applications of fractional differential equations, Elsevier, Amsterdam, Netherlands (2006).
  • [6] R. Metzler, J. Klafter, The random walk’s guide to anomalous diffusion: a fractional dynamics approach, Phys. Rep., 339(1) (2000), pp. 1-77.
  • [7] M. Chen, Y. Wang, X. Cheng, W. Deng, Second-order LOD multigrid method for multidimensional Riesz fractional diffusion equation, BIT, 54(3) (2014), pp. 623-647.
  • [8] C. Çelik, M. Duman, Crank-Nicolson method for the fractional diffusion equation with the Riesz fractional derivative, J. Comput. Phys., 231(4) (2012), pp. 1743-1750.
  • [9] W.Y. Tian, H. Zhou, W. Deng, A class of second order difference approximations for solving space fractional diffusion equations, Math. Comp., 84(294) (2015), pp. 1703-1727.
  • [10] H.-L. Liao, P. Lyu, S. Vong, Second-order BDF time approximation for Riesz space-fractional diffusion equations, Int. J. Comput. Math., 95(1) (2018), pp. 144-158.
  • [11] D. Hu, X. Cao, A fourth-order compact ADI scheme for two-dimensional Riesz space fractional nonlinear reaction-diffusion equation, Int. J. Comput. Math., 97(9) (2020), pp. 1928-1948. DOI: 10.1080/00207160.2019.1671587.
  • [12] X. Yue, S. Shu, X. Xu, W. Bu, K. Pan, Parallel-in-time multigrid for space-time finite element approximations of two-dimensional space-fractional diffusion equations, Comput. Math. Appl., 78(11) (2019), pp. 3471-3484.
  • [13] Y. Xu, Y. Zhang, J. Zhao, Backward difference formulae and spectral Galerkin methods for the Riesz space fractional diffusion equation, Math. Comput. Simul., 166 (2019), pp. 494-507.
  • [14] Y. Xu, Y. Zhang, J. Zhao, General linear and spectral Galerkin methods for the Riesz space fractional diffusion equation, Appl. Math. Comput., 364 (2020), Article No. 124664, 12 pages. DOI: 10.1016/j.amc.2019.124664.
  • [15] J. Zhao, Y. Zhang, Y. Xu, Implicit Runge-Kutta and spectral Galerkin methods for Riesz space fractional/distributed-order diffusion equation, Comput. Appl. Math., 39(2) (2020), Article No. 47, 27 pages. DOI: 10.1007/s40314-020-1102-3.
  • [16] H.-K. Pang, H.-W. Sun, Fast numerical contour integral method for fractional diffusion equations, J. Sci. Comput., 66(1) (2016), pp. 41-66.
  • [17] M.J. Gander, M. Neumüller, Analysis of a new space-time parallel multigrid algorithm for parabolic problems, SIAM J. Sci. Comput., 38(4) (2016), pp. A2173-A2208.
  • [18] R.D. Falgout, M. Lecouvez, C.S. Woodward, A parallel-in-time algorithm for variable step multistep methods, J. Comput. Sci., 37 (2019), Article No. 101029, 12 pages. DOI: 10.1016/j.jocs.2019.101029.
  • [19] S.-L. Wu, T. Zhou, Acceleration of the MGRiT algorithm via the diagonalization technique, SIAM J. Sci. Comput., 41(5) (2019), pp. A3421-A3448.
  • [20] M.E. Farquhar, T.J. Moroney, Q. Yang, I.W. Turner, GPU accelerated algorithms for computing matrix function vector products with applications to exponential integrators and fractional diffusion, SIAM J. Sci. Comput., 38(3) (2016), pp. C127-C149.
  • [21] S.-L. Wu, T. Zhou, Fast parareal iterations for fractional diffusion equations, J. Comput. Phys., 329 (2017), pp. 210-226.
  • [22] Y. Maday, E.M. Rønquist, Parallelization in time through tensor-product space-time solvers, C. R. Math., 346(1-2) (2008), pp. 113-118.
  • [23] X.-M. Gu, T.-Z. Huang, X.-L. Zhao, H.-B. Li, L. Li, Strang-type preconditioners for solving fractional diffusion equations by boundary value methods, J. Comput. Appl. Math., 277 (2015), pp. 73-86.
  • [24] M.J. Gander, 50 Years of Time Parallel Time Integration, in Multiple Shooting and Time Domain Decomposition Methods (T. Carraro, M. Geiger, S. Körkel, et al., eds.), Contributions in Mathematical and Computational Sciences, vol 9. Springer, Cham, Switzerland (2017), pp. 69-113. DOI: 10.1007/978-3-319-23321-5_3.
  • [25] S.-L. Lei, Y.-C. Huang, Fast algorithms for high-order numerical methods for space-fractional diffusion equations, Int. J. Comput. Math., 94(5) (2017), pp. 1062-1078.
  • [26] T. Breiten, V. Simoncini, M. Stoll, Low-rank solvers for fractional differential equations, Electron. Trans. Numer. Anal., 45 (2016), pp. 107-132.
  • [27] A.T. Weaver, S. Gürol, J. Tshimanga, M. Chrust, A. Piacentini, “Time”-parallel diffusion-based correlation operators, Q. J. R. Meteorol. Soc., 144(716) (2018), pp. 2067-2088.
  • [28] D. Bertaccini, F. Durastante, Block structured preconditioners in tensor form for the all-at-once solution of a finite volume fractional diffusion equation, Appl. Math. Lett., 95 (2019), pp. 92-97.
  • [29] I. Podlubny, A. Chechkin, T. Skovranek, Y.Q. Chen, B.M.V. Jara, Matrix approach to discrete fractional calculus II: Partial fractional differential equations, J. Comput. Phys., 228(8) (2009), pp. 3137-3153.
  • [30] M. Ran, C. Zhang, A high-order accuracy method for solving the fractional diffusion equations, J. Comp. Math., 38(2) (2020), pp. 239-253.
  • [31] L. Banjai, D. Peterseim, Parallel multistep methods for linear evolution problems, IMA J. Numer. Anal., 32(3) (2012), pp. 1217-1240.
  • [32] Y.-L. Zhao, P.-Y. Zhu, X.-M. Gu, X.-L. Zhao, H.-Y. Jian, A preconditioning technique for all-at-once system from the nonlinear tempered fractional diffusion equation, J. Sci. Comput., 83(1) (2020), Article No. 10, 27 pages. DOI: 10.1007/s10915-020-01193-1.
  • [33] E. McDonald, J. Pestana, A. Wathen, Preconditioning and iterative solution of all-at-once systems for evolutionary partial differential equations, SIAM J. Sci. Comput., 40(2) (2018), pp. A1012-A1033.
  • [34] A. Goddard, A. Wathen, A note on parallel preconditioning for all-at-once evolutionary PDEs, Electron. Trans. Numer. Anal., 51 (2019), pp. 135-150.
  • [35] X.-L. Lin, M. Ng, An all-at-once preconditioner for evolutionary partial differential equations, arXiv preprint, arXiv:2002.01108, submitted to SIAM J. Sci. Comput., 4 Feb. 2020, 17 pages.
  • [36] S.-L. Wu, H. Zhang, T. Zhou, Solving time-periodic fractional diffusion equations via diagonalization technique and multigrid, Numer. Linear Algebra Appl., 25(5) (2018), Article No. e2178, 24 pages. DOI: 10.1002/nla.2178.
  • [37] O. Bokanowski, K. Debrabant, Backward differentiation formula finite difference schemes for diffusion equations with an obstacle term, IMA J. Numer. Anal., to appear, 6 Jul. 2020, 29 pages. DOI: 10.1093/imanum/draa014.
  • [38] M. Donatelli, M. Mazza, S. Serra-Capizzano, Spectral analysis and structure preserving preconditioners for fractional diffusion equations, J. Comput. Phys., 307 (2016), pp. 262-279.
  • [39] K. Burrage, Parallel and Sequential Methods for Ordinary Differential Equations, Oxford University Press, New York, NY (1995).
  • [40] U.M. Ascher, L.R. Petzold, Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, SIAM, Philadelphia, PA (1998).
  • [41] C. Lubich, D. Mansour, C. Venkataraman, Backward difference time discretization of parabolic differential equations on evolving surfaces, IMA J. Numer. Anal., 33(4) (2013), pp. 1365-1385.
  • [42] E. Emmrich, Error of the two-step BDF for the incompressible Navier-Stokes problem, ESAIM-Math. Model. Numer. Anal., 38(5) (2004), pp. 757-764.
  • [43] V. Kučera, M. Vlasák, A priori diffusion-uniform error estimates for nonlinear singularly perturbed problems: BDF2, midpoint and time DG, ESAIM-Math. Model. Numer. Anal., 51(2) (2017), pp. 537-563.
  • [44] H. Nishikawa, On large start-up error of BDF2, J. Comput. Phys., 392 (2019), pp. 456-461.
  • [45] Q. Zhang, Y. Ren, X. Lin, Y. Xu, Uniform convergence of compact and BDF methods for the space fractional semilinear delay reaction-diffusion equations. Appl. Math. Comput., 358 (2019), pp. 91-110.
  • [46] C. Li, M.-M. Li, X. Sun, Finite difference/Galerkin finite element methods for a fractional heat conduction-transfer equation, Math. Meth. Appl. Sci., to appear, 30 Oct. 2019, 20 pages. DOI: 10.1002/mma.5984.
  • [47] D. Bini, F. Di Benedetto, A new preconditioner for the parallel solution of positive definite Toeplitz systems, Proceedings of the Second Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’90), ACM, New York, NY (1990), pp. 220-223. DOI: 10.1145/97444.97688.
  • [48] F. Di Benedetto, Preconditioning of block Toeplitz matrices by sine transforms, SIAM J. Sci. Comput., 18(2) (1997), pp. 499-515.
  • [49] S. Serra, Superlinear PCG methods for symmetric Toeplitz systems, Math. Comp., 68(226) (1999), pp. 793-803.
  • [50] Z. Hao, Z. Zhang, R. Du, Fractional centered difference scheme for high-dimensional integral fractional Laplacian, J. Comput. Phys., 424 (2021), Article No. 109851, 17 pages. DOI: 10.1016/j.jcp.2020.109851.
  • [51] Q. Yang, F. Liu, I. Turner, Numerical methods for fractional partial differential equations with Riesz space fractional derivatives, Appl. Math. Model., 34(1) (2010), pp. 200-218.
  • [52] Z.-P. Hao, Z.-Z. Sun, W.-R. Cao, A fourth-order approximation of fractional derivatives with its applications, J. Comput. Phys., 281 (2015), pp. 787-805.
  • [53] W.W. Hager, Updating the inverse of a matrix, SIAM Rev. 31(2) (1989), pp. 221-239.
  • [54] X.-M. Gu, S.-L. Wu, A parallel-in-time iterative algorithm for Volterra partial integro-differential problems with weakly singular kernel, J. Comput. Phys., 417 (2020), Article No. 109576, 17 pages. DOI: 10.1016/j.jcp.2020.109576.
  • [55] P. Jiránek, M. Rozložník, Adaptive version of Simpler GMRES, Numer. Algorithms, 53(1) (2010), pp. 93-112.
  • [56] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM, Philadelphia, PA (2003).
  • [57] M.J. Gander, L. Halpern, Time parallelization for nonlinear problems based on diagonalization, in Domain Decomposition Methods in Science and Engineering XXIII (C.-O. Lee, X.-C. Cai, D.E. Keyes, et al., eds.), Lecture Notes in Computational Science and Engineering, vol. 116, Springer, Cham, Switzerland (2017), pp. 163-170. DOI: 10.1007/978-3-319-52389-7_15.