
A note on parallel preconditioning for the all-at-once solution of Riesz fractional diffusion equations

Xian-Ming Gu [email protected] Yong-Liang Zhao [email protected] Xi-Le Zhao [email protected] Bruno Carpentieri [email protected] Yu-Yun Huang [email protected] School of Economic Mathematics/Institute of Mathematics,
Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, P.R. China
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence,
University of Groningen, Nijenborgh 9, P.O. Box 407, 9700 AK Groningen, The Netherlands
School of Mathematical Sciences,
University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, P.R. China
Facoltà di Scienze e Tecnologie informatiche,
Libera Università di Bolzano, Dominikanerplatz 3 - piazza Domenicani, 3 Italy - 39100, Bozen-Bolzano.
Abstract

The p-step backward difference formula (BDF) for solving systems of ODEs can be formulated as all-at-once linear systems that are solved by parallel-in-time preconditioned Krylov subspace solvers (see McDonald, Pestana, and Wathen [SIAM J. Sci. Comput., 40(2) (2018): A1012-A1033] and Lin and Ng [arXiv:2002.01108, 2020, 17 pages]). However, when the BDFp (2\leq p\leq 6) method is used to solve time-dependent PDEs, the generalization of these studies is not straightforward, as the p-step BDF is not self-starting for p\geq 2. In this note, we focus on the 2-step BDF, which is often superior to the trapezoidal rule for solving Riesz fractional diffusion equations, and show that it results in an all-at-once discretized system that is a low-rank perturbation of a block triangular Toeplitz system. We first give an estimate of the condition number of the all-at-once systems and then, capitalizing on previous work, we propose two block circulant (BC) preconditioners. Both the invertibility of these two BC preconditioners and the eigenvalue distributions of the preconditioned matrices are discussed in detail. An efficient implementation of these BC preconditioners is also presented, including the fast computation of dense structured Jacobi matrices. Finally, numerical experiments involving both one- and two-dimensional Riesz fractional diffusion equations are reported to support our theoretical findings.

keywords:
Backward difference formula, all-at-once discretization, parallel-in-time preconditioning, Krylov subspace solver, fractional diffusion equation.
journal: Numerical Mathematics: Theory, Methods and Applications

1 Introduction

In this paper, we are particularly interested in the efficient numerical solution of evolutionary partial differential equations (PDEs) with a first-order temporal derivative and space fractional-order derivative(s). These models arise in various scientific applications in different fields, including physics [1], bioengineering [2], hydrology [3], and finance [4], owing to the potential of fractional calculus to describe rather accurately natural processes that maintain long-memory and hereditary properties in complex systems [5, 2]. In particular, fractional diffusion equations can provide an adequate and accurate description of transport processes that exhibit anomalous diffusion, for example subdiffusive phenomena and Lévy flights [6], which cannot be modelled properly by second-order diffusion equations. As most fractional diffusion equations cannot be solved analytically, approximate numerical solutions are sought by using efficient numerical methods such as, e.g., (compact) finite difference [3, 8, 7, 9, 10, 11], finite element [12] and spectral (Galerkin) methods [13, 14, 15].

Most numerical techniques proposed in the literature for solving this class of problems are conventional time-stepping schemes. They solve the underlying evolutionary PDEs with space fractional derivative(s) by marching in time sequentially, one level after the other. As many time steps are usually necessary to balance the numerical errors arising from the spatial discretization, these conventional time-stepping schemes can be very time-consuming. This concern motivates the recent development of parallel-in-time (PinT) numerical solutions for evolutionary PDEs (especially with space fractional derivative(s)) including, e.g., the inverse Laplace transform method [16], the MGRIT method [17, 18, 12, 19], the exponential integrator [20] and the parareal method [21]. One class of PinT methods, the space-time methods, solves the evolutionary PDEs at all time levels simultaneously by performing an all-at-once discretization that results in a large-scale linear system, typically solved by preconditioned Krylov subspace methods; refer, e.g., to [22, 24, 29, 27, 23, 25, 28, 26, 30, 31] for details. However, most of these studies only focus on the numerical solution of one-dimensional space fractional diffusion equations [23, 25, 30, 32], due to the huge computational cost required for high-dimensional problems.

Recently, McDonald, Pestana and Wathen proposed in [33] a block circulant (BC) preconditioner to accelerate the convergence of Krylov subspace methods for solving the all-at-once linear system arising from the p-step BDF temporal discretization of evolutionary PDEs. Parallel experiments with the BC preconditioner in [33] are reported by Goddard and Wathen in [34]. In [35], a generalized version of the BC preconditioner has been proposed by Lin and Ng, who introduced a parameter \alpha\in(0,1) into the top-right block of the BC preconditioner that can be fine-tuned to handle efficiently the case of very small diffusion coefficients. Both the BC and the generalized BC preconditioners use a modified diagonalization technique originally proposed by Maday and Rønquist [22, 24]. The investigations in [33, 34, 35, 36] mainly focus on the 1-step BDF (i.e., the backward Euler method) for solving the underlying evolutionary PDEs, which results in an exact block lower triangular Toeplitz (BLTT) all-at-once system. On the other hand, when the BDFp (2\leq p\leq 6) method is used to discretize the evolutionary PDEs, a complete BLTT all-at-once system cannot be obtained, as implicit schemes based on BDFp (2\leq p\leq 6) for evolutionary PDEs are not self-starting [13, 39, 40, 37, 41]. For example, when we establish a fully discretized scheme based on the popular BDF2 for evolutionary PDEs, we often need to use the backward Euler method to compute the solution at the first time level [42, 43, 44, 10].

In this study, we particularly consider the second-order accurate implicit difference BDF2 scheme for solving the one- and two-dimensional Riesz fractional diffusion equations (RFDEs). Although the Crank-Nicolson (C-N) method [8] is a very popular option for solving such RFDEs, it is only A-stable rather than L-stable [40, 45]. By contrast, the BDF2 scheme, which has stiff decay, can be more “stable” and slightly cheaper: it is always unconditionally stable, and the numerical solution is often guaranteed to be positive and physically more reliable near the initial time for evolutionary PDEs with either integer- or fractional-order spatial derivatives; see [10, 37, 42, 46] for details. After the spatial discretization of the Riesz fractional derivative(s), we reformulate the BDF2 scheme for the semi-discretized system of ODEs into an all-at-once system, whose coefficient matrix is a BLTT matrix with a low-rank perturbation. Then, we tightly estimate the condition number of the all-at-once systems and adapt the generalized BC (also including the standard BC) preconditioner to such an all-at-once system. Meanwhile, the invertibility of the generalized BC preconditioner is discussed, complementing the work in [33, 35], and the eigenvalue distributions of the preconditioned matrices, which dictate the convergence rate of Krylov subspace methods, are preliminarily investigated. From these discussions, we derive clear arguments explaining the better performance of the generalized BC preconditioner over the BC preconditioner, especially for very small diffusion coefficients.

The practical implementation of the generalized BC preconditioner requires solving a sequence of dense complex-shifted linear systems with real symmetric negative definite (block) Toeplitz coefficient matrices, which arise from the numerical discretization of Riesz fractional derivatives. Because solving such Toeplitz systems is often very time-consuming, classical circulant preconditioners have been adapted to accelerate the convergence of iterative solvers, especially for level-1 Toeplitz systems. However, circulant preconditioners are often less efficient when extended to high-dimensional model problems (i.e., block Toeplitz discretized systems) [28, 48]. In this work, by approximating the real symmetric (block) Toeplitz matrix by a (block) \tau-matrix [47, 48], which can be efficiently diagonalized using discrete sine transforms (DSTs), we present an efficient implementation of the generalized BC preconditioner that avoids any dense matrix storage and only needs to solve the first half of the sequence of complex-shifted systems. Estimates of the computational cost and of the memory requirements of the associated generalized BC preconditioner are given. Our numerical experiments suggest that the BDF2 all-at-once system, solved by preconditioned Krylov subspace methods, can be a competitive solution method for RFDEs.

The rest of this paper is organized as follows. In the next section, the all-at-once linear system derived from the BDF2 scheme for solving the RFDEs is presented. Meanwhile, the invertibility of the pertinent all-at-once system is proved and its condition number is estimated. In Section 3, the generalized BC preconditioner is adapted and discussed; both the properties and the efficient implementation of the preconditioned system are analyzed. In Section 4, numerical results are reported to support our findings and the effectiveness of the proposed preconditioner. Finally, the paper closes with conclusions in Section 5.

2 The all-at-once system of Riesz fractional diffusion equations

In this section, we present the development of a numerical scheme for the initial-boundary value problem of Riesz fractional diffusion equations that preserves the positivity of the solutions. Then, in the next section, we discuss its efficient parallel implementation.

2.1 The all-at-once discretization of Riesz fractional diffusion equation

The governing Riesz fractional diffusion equation of the anomalous diffusion process can be written as follows (it is worth noting that if the Riesz fractional derivative is replaced with the fractional Laplacian, one can follow the work in [50] for the spatial discretization of the one- or two-dimensional model problem, and the numerical techniques proposed in the next sections are easily adapted to such an extension):

\begin{cases}\frac{\partial u(x,t)}{\partial t}=\kappa_{\gamma}\frac{\partial^{\gamma}u(x,t)}{\partial|x|^{\gamma}}+f(x,t),&(x,t)\in(a,b)\times(0,T],\\ u(x,0)=\phi(x),&x\in[a,b],\\ u(a,t)=u(b,t)=0,&t\in[0,T],\end{cases} (2.1)

where u(x,t) may represent, for example, a solute concentration, and the constant \kappa_{\gamma}>0 is the diffusion coefficient. This equation is a superdiffusive model largely used in fluid flow analysis, financial modelling and other applications. The Riesz fractional derivative \partial^{\gamma}u(x,t)/\partial|x|^{\gamma} is defined by [51]

\begin{split}\frac{\partial^{\gamma}u(x,t)}{\partial|x|^{\gamma}}&=-\frac{1}{2\cos\left(\frac{\pi\gamma}{2}\right)}\cdot\frac{1}{\Gamma(2-\gamma)}\cdot\frac{\partial^{2}}{\partial x^{2}}\int^{b}_{a}\frac{u(\xi,t)}{|x-\xi|^{\gamma-1}}d\xi\\ &=-\frac{1}{2\cos\left(\frac{\pi\gamma}{2}\right)}[{}_{a}D^{\gamma}_{x}u(x,t)+{}_{x}D^{\gamma}_{b}u(x,t)],\quad\gamma\in(1,2),\end{split} (2.2)

in which {}_{a}D^{\gamma}_{x} and {}_{x}D^{\gamma}_{b} are the left and right Riemann-Liouville fractional derivatives of order \gamma\in(1,2) given, respectively, by [5]

{}_{a}D^{\gamma}_{x}u(x,t)=\frac{1}{\Gamma(2-\gamma)}\frac{\partial^{2}}{\partial x^{2}}\int^{x}_{a}\frac{u(s,t)}{(x-s)^{\gamma-1}}ds,

and

{}_{x}D^{\gamma}_{b}u(x,t)=\frac{1}{\Gamma(2-\gamma)}\frac{\partial^{2}}{\partial x^{2}}\int^{b}_{x}\frac{u(s,t)}{(s-x)^{\gamma-1}}ds.

Note that, as \gamma\rightarrow 2, Eq. (2.1) degenerates into the classical diffusion equation [51].

Next, we focus on the numerical solution of Eq. (2.1). We consider a rectangle \bar{Q}_{T}=\{(x,t):~a\leq x\leq b,~0\leq t\leq T\} discretized on the mesh \varpi_{h\tau}=\varpi_{h}\times\varpi_{\tau}, where \varpi_{h}=\{x_{i}=a+ih,~i=0,1,\cdots,N;~h=(b-a)/N\} and \varpi_{\tau}=\{t_{k}=k\tau,~k=0,1,\cdots,M;~\tau=T/M\}. We denote by v=\{v_{i}~|~0\leq i\leq N\} any grid function.

A considerable amount of work has been devoted in the past years to the development of fast methods for the approximation of the Riesz and Riemann-Liouville (R-L) fractional derivatives, such as the first-order accurate shifted Grünwald approximation [3, 51] and the second-order accurate weighted-shifted Grünwald difference (WSGD) approximation [9, 52]. Due to the relationship between the two kinds of fractional derivatives, all the numerical schemes proposed for the R-L fractional derivatives can be easily adapted to approximate the Riesz fractional derivative. Although the solution approach described in this work can accommodate any spatial discretization method, we choose the so-called fractional centred difference formula [8] for the Riesz fractional derivatives for clarity.

For any function u(x)\in L^{1}(\mathbb{R}), we denote

\Delta^{\gamma}_{h}u(x)=-\frac{1}{h^{\gamma}}\sum^{[(x-a)/h]}_{\ell=-[(b-x)/h]}\omega^{(\gamma)}_{\ell}u(x-\ell h),\quad x\in\mathbb{R}, (2.3)

where the \gamma-dependent weight coefficients are defined as

\omega^{(\gamma)}_{\ell}=\frac{(-1)^{\ell}\Gamma(1+\gamma)}{\Gamma(1+\gamma/2-\ell)\Gamma(1+\gamma/2+\ell)},\quad\ell\in\mathbb{Z}. (2.4)

As noted in [8], \omega^{(\gamma)}_{\ell}=\mathcal{O}(\ell^{-1-\gamma}) and the fractional centred difference formula \Delta^{\gamma}_{h}u(x) exists for any u(x)\in L^{1}(\mathbb{R}). Some properties of the coefficients \omega^{(\gamma)}_{\ell} and the operator \Delta^{\gamma}_{h}u(x) are presented in the following lemmas.

Lemma 2.1.

([8]) For \gamma\in(1,2), the coefficients \omega^{(\gamma)}_{\ell}, \ell\in\mathbb{Z}, defined in (2.4) fulfil

\begin{cases}\omega^{(\gamma)}_{0}\geq 0,\quad\omega^{(\gamma)}_{-\ell}=\omega^{(\gamma)}_{\ell},\quad|\ell|\geq 1,\\ \sum\limits^{\infty}_{\ell=-\infty}\omega^{(\gamma)}_{\ell}=0,\quad\omega^{(\gamma)}_{0}=\sum\limits^{-1}_{\ell=-\infty}|\omega^{(\gamma)}_{\ell}|+\sum\limits^{\infty}_{\ell=1}|\omega^{(\gamma)}_{\ell}|.\end{cases} (2.5)
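As a quick illustration (our own sketch, not part of the original presentation), the weights (2.4) can be generated by the stable recurrence \omega^{(\gamma)}_{\ell+1}=\big(1-\frac{1+\gamma}{1+\gamma/2+\ell}\big)\omega^{(\gamma)}_{\ell}, which follows from \Gamma(z+1)=z\Gamma(z) and avoids evaluating the Gamma function at negative arguments; the properties in Lemma 2.1 can then be checked numerically:

```python
import math
import numpy as np

def fcd_weights(gamma, L):
    """Weights omega_l^(gamma), l = 0..L, of (2.4), via the recurrence
    omega_{l+1} = (1 - (1+gamma)/(1 + gamma/2 + l)) * omega_l."""
    w = np.empty(L + 1)
    w[0] = math.gamma(1.0 + gamma) / math.gamma(1.0 + gamma / 2.0) ** 2
    for l in range(L):
        w[l + 1] = (1.0 - (1.0 + gamma) / (1.0 + gamma / 2.0 + l)) * w[l]
    return w

w = fcd_weights(1.5, 200000)
print(w[0] > 0.0, bool(np.all(w[1:] < 0.0)))  # omega_0 >= 0 and omega_l < 0 for l >= 1
print(abs(w[0] + 2.0 * w[1:].sum()))          # the symmetric sum tends to 0
```

Because \omega^{(\gamma)}_{\ell}=\mathcal{O}(\ell^{-1-\gamma}), the truncated symmetric sum above approaches zero only at the slow rate \mathcal{O}(L^{-\gamma}), which is why a large truncation length is used.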
Lemma 2.2.

([8, 50]) Suppose that u\in L^{1}(\mathbb{R}) and

u(x)\in\mathscr{C}^{2+\gamma}(\mathbb{R}):=\bigg{\{}u\bigg{|}\int\nolimits_{-\infty}^{+\infty}(1+|\xi|)^{2+\gamma}|\hat{u}(\xi)|d\xi<\infty\bigg{\}},

where \hat{u}(\xi) is the Fourier transform of u(x). Then, for a fixed h, the fractional centred difference operator in (2.3) satisfies

\frac{\partial^{\gamma}u(x)}{\partial|x|^{\gamma}}=\Delta^{\gamma}_{h}u(x)+\mathcal{O}(h^{2})

uniformly for x\in\mathbb{R}, provided that u(x)\equiv 0 for x\in\mathbb{R}\backslash[a,b]. In particular, if \gamma=2, the formula coincides with the standard second-order central difference approximation of the second derivative.

At this stage, let u(x,t)\in\mathcal{C}^{4,3}_{x,t}([a,b]\times[0,T]) be a solution to problem (2.1), and consider Eq. (2.1) at the set of grid points (x,t)=(x_{i},t_{k})\in\bar{Q}_{T} with i=1,2,\cdots,N-1 and k=1,\cdots,M. The first-order time derivative at the point t=t_{k} is approximated by the second-order backward difference formula, i.e.,

\frac{\partial u(x,t)}{\partial t}\Big{|}_{t=t_{k}}=\frac{3u(x,t_{k})-4u(x,t_{k-1})+u(x,t_{k-2})}{2\tau}+\mathcal{O}(\tau^{2}),\quad k\geq 2, (2.6)

Then, defining U^{k}_{i}=u(x_{i},t_{k}) and f^{k}_{i}=f(x_{i},t_{k}), we obtain

\begin{cases}\frac{3U^{k}_{i}-4U^{k-1}_{i}+U^{k-2}_{i}}{2\tau}=\kappa_{\gamma}\Delta^{\gamma}_{h}U^{k}_{i}+f^{k}_{i}+R^{k}_{i},&1\leq i\leq N-1,~{}2\leq k\leq M,\\ U^{0}_{i}=\phi(x_{i}),&1\leq i\leq N-1,\\ U^{k}_{0}=U^{k}_{N}=0,&0\leq k\leq M,\end{cases} (2.7)

where the truncation errors \{R^{k}_{i}\} are small and satisfy the inequality

|R^{k}_{i}|\leq c(\tau^{2}+h^{2}),\quad 1\leq i\leq N-1,~{}2\leq k\leq M.

Note that, even after omitting the small terms R^{k}_{i} in Eq. (2.7), the resulting three-level difference scheme is not self-starting, because the solution u(x_{i},t_{1}) at the first time level is unknown. One of the most popular strategies to compute the first time step solution u^{1}_{i} is to use the first-order backward Euler scheme (which is only used once). This yields the following implicit difference scheme [37, 10] for Eq. (2.1):

\begin{cases}D^{2}_{t}u^{k}_{i}=\kappa_{\gamma}\Delta^{\gamma}_{h}u^{k}_{i}+f^{k}_{i},&1\leq i\leq N-1,~{}1\leq k\leq M,\\ u^{0}_{i}=\phi(x_{i}),&1\leq i\leq N-1,\\ u^{k}_{0}=u^{k}_{N}=0,&0\leq k\leq M,\end{cases} (2.8)

where

D^{2}_{t}u^{k}_{i}=\begin{cases}\frac{3u^{k}_{i}-4u^{k-1}_{i}+u^{k-2}_{i}}{2\tau},&2\leq k\leq M,\\ \frac{u^{1}_{i}-u^{0}_{i}}{\tau},&k=1.\end{cases}

In order to implement the proposed scheme, we define {\bm{u}}^{k}=[u(x_{1},t_{k}),u(x_{2},t_{k}),\cdots,u(x_{N-1},t_{k})]^{\top} and {\bm{f}}^{k}=[f(x_{1},t_{k}),f(x_{2},t_{k}),\cdots,f(x_{N-1},t_{k})]^{\top}; then \kappa_{\gamma}\Delta^{\gamma}_{h}u^{k}_{i} can be written in the matrix-vector product form A{\bm{u}}^{k}, where

A=-\kappa_{\gamma}T_{x}=-\frac{\kappa_{\gamma}}{h^{\gamma}}\begin{bmatrix}\omega^{(\gamma)}_{0}&\omega^{(\gamma)}_{-1}&\omega^{(\gamma)}_{-2}&\cdots&\omega^{(\gamma)}_{3-N}&\omega^{(\gamma)}_{2-N}\\ \omega^{(\gamma)}_{1}&\omega^{(\gamma)}_{0}&\omega^{(\gamma)}_{-1}&\cdots&\omega^{(\gamma)}_{4-N}&\omega^{(\gamma)}_{3-N}\\ \omega^{(\gamma)}_{2}&\omega^{(\gamma)}_{1}&\omega^{(\gamma)}_{0}&\cdots&\omega^{(\gamma)}_{5-N}&\omega^{(\gamma)}_{4-N}\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ \omega^{(\gamma)}_{N-3}&\omega^{(\gamma)}_{N-4}&\omega^{(\gamma)}_{N-5}&\cdots&\omega^{(\gamma)}_{0}&\omega^{(\gamma)}_{-1}\\ \omega^{(\gamma)}_{N-2}&\omega^{(\gamma)}_{N-3}&\omega^{(\gamma)}_{N-4}&\cdots&\omega^{(\gamma)}_{1}&\omega^{(\gamma)}_{0}\end{bmatrix}\in\mathbb{R}^{(N-1)\times(N-1)}. (2.9)

It is easy to prove that T_{x} is a symmetric positive definite (SPD) Toeplitz matrix (see [8]). Therefore, it can be stored with only N-1 entries, and fast Fourier transforms (FFTs) can be applied to carry out the matrix-vector product in \mathcal{O}(N\log N) operations. The matrix-vector form of the 2-step BDF method with the start-up backward Euler step for solving the model problem (2.1) can be formulated as follows:

\frac{{\bm{u}}^{1}-{\bm{u}}^{0}}{\tau}-A{\bm{u}}^{1}={\bm{f}}^{1}, (2.10a)
\frac{3{\bm{u}}^{k}-4{\bm{u}}^{k-1}+{\bm{u}}^{k-2}}{2\tau}-A{\bm{u}}^{k}={\bm{f}}^{k},\quad 2\leq k\leq N_{t}. (2.10b)

We refer to the matrix A as the Jacobian matrix. In particular, we note that the numerical scheme (2.10) has been proved to be unconditionally stable and second-order convergent in the discrete L^{2} norm [10]. Note that it is computationally more efficient than the C-N scheme presented in [8], as it requires one less matrix-vector multiplication per time step.
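The \mathcal{O}(N\log N) Toeplitz matrix-vector product mentioned above can be sketched with the standard circulant-embedding trick (a minimal illustration of ours, written for a generic symmetric Toeplitz matrix; in our setting the first column would be \frac{1}{h^{\gamma}}[\omega^{(\gamma)}_{0},\omega^{(\gamma)}_{1},\cdots,\omega^{(\gamma)}_{N-2}]):

```python
import numpy as np

def sym_toeplitz_matvec(c, v):
    """y = T v for the symmetric Toeplitz matrix T with first column c,
    in O(N log N) operations: embed T into a circulant matrix of order 2N
    whose eigenvalues are obtained by one FFT of its first column."""
    n = len(v)
    col = np.concatenate([c, [0.0], c[:0:-1]])   # first column of the embedding
    eig = np.fft.fft(col)                        # circulant eigenvalues
    y = np.fft.ifft(eig * np.fft.fft(v, 2 * n))
    return y[:n].real                            # top block of the circulant product

rng = np.random.default_rng(0)
c = rng.standard_normal(6)
v = rng.standard_normal(6)
T = np.array([[c[abs(i - j)] for j in range(6)] for i in range(6)])
print(np.allclose(sym_toeplitz_matvec(c, v), T @ v))
```

Only the first column c is ever stored, which is exactly the \mathcal{O}(N) memory requirement stated above.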

Instead of computing the solution of (2.10) step by step, we obtain an all-at-once approximation by solving the following linear system:

\left(\begin{bmatrix}1&\\ -2&\frac{3}{2}\\ \frac{1}{2}&-2&\frac{3}{2}\\ &\ddots&\ddots&\ddots\\ &&\frac{1}{2}&-2&\frac{3}{2}&\\ &&&\frac{1}{2}&-2&\frac{3}{2}\end{bmatrix}\otimes I_{s}-\tau I_{t}\otimes A\right)\begin{bmatrix}{\bm{u}}^{1}\\ {\bm{u}}^{2}\\ \vdots\\ {\bm{u}}^{N_{t}-1}\\ {\bm{u}}^{N_{t}}\\ \end{bmatrix}=\begin{bmatrix}\tau{\bm{f}}^{1}+{\bm{u}}^{0}\\ \tau{\bm{f}}^{2}-\frac{{\bm{u}}^{0}}{2}\\ \vdots\\ \tau{\bm{f}}^{N_{t}-1}\\ \tau{\bm{f}}^{N_{t}}\\ \end{bmatrix}, (2.11)

where I_{s} and I_{t} are two identity matrices of order N_{s}~(=N-1) and N_{t}~(=M), respectively. We denote the above linear system as follows:

\mathcal{A}{\bm{U}}={\bm{F}},\quad{\bm{U}}=\begin{bmatrix}{\bm{u}}^{1}\\ {\bm{u}}^{2}\\ \vdots\\ {\bm{u}}^{N_{t}-1}\\ {\bm{u}}^{N_{t}}\\ \end{bmatrix},\quad{\bm{F}}=\begin{bmatrix}\tau{\bm{f}}^{1}+{\bm{u}}^{0}\\ \tau{\bm{f}}^{2}-\frac{{\bm{u}}^{0}}{2}\\ \vdots\\ \tau{\bm{f}}^{N_{t}-1}\\ \tau{\bm{f}}^{N_{t}}\\ \end{bmatrix}, (2.12)

where \mathcal{A}=C\otimes I_{s}-\tau I_{t}\otimes A with

C=\begin{bmatrix}1&\\ -2&\frac{3}{2}\\ \frac{1}{2}&-2&\frac{3}{2}\\ &\ddots&\ddots&\ddots\\ &&\frac{1}{2}&-2&\frac{3}{2}&\\ &&&\frac{1}{2}&-2&\frac{3}{2}\end{bmatrix}\in\mathbb{R}^{N_{t}\times N_{t}}, (2.13)

and it is clear that \mathcal{A} is invertible, because all its diagonal blocks (i.e., either I_{s}-\tau A or \frac{3}{2}I_{s}-\tau A) are invertible [33, 34].
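To make the construction concrete, the following sketch (our own, with a small random symmetric negative definite matrix standing in for A=-\kappa_{\gamma}T_{x}) assembles \mathcal{A}=C\otimes I_{s}-\tau I_{t}\otimes A and checks that solving the all-at-once system (2.11) reproduces the time-stepping scheme (2.10):

```python
import numpy as np

Ns, Nt, tau = 5, 6, 0.1
rng = np.random.default_rng(1)
B = rng.standard_normal((Ns, Ns))
A = -(B @ B.T + Ns * np.eye(Ns))       # stand-in for A = -kappa_gamma * T_x
f = rng.standard_normal((Nt, Ns))      # right-hand sides f^1, ..., f^{Nt}
u0 = rng.standard_normal(Ns)           # initial vector u^0
Is = np.eye(Ns)

# Time-stepping form (2.10): one backward Euler start-up step, then BDF2.
u = [u0, np.linalg.solve(Is - tau * A, u0 + tau * f[0])]
for k in range(2, Nt + 1):
    rhs = 2.0 * u[k - 1] - 0.5 * u[k - 2] + tau * f[k - 1]
    u.append(np.linalg.solve(1.5 * Is - tau * A, rhs))

# All-at-once form (2.11)-(2.12): Amat = C (x) I_s - tau * I_t (x) A.
C = np.zeros((Nt, Nt))
for k in range(Nt):
    C[k, k] = 1.0 if k == 0 else 1.5
    if k >= 1:
        C[k, k - 1] = -2.0
    if k >= 2:
        C[k, k - 2] = 0.5
Amat = np.kron(C, Is) - tau * np.kron(np.eye(Nt), A)
F = tau * f.copy()
F[0] += u0                              # tau f^1 + u^0
F[1] -= 0.5 * u0                        # tau f^2 - u^0 / 2
U = np.linalg.solve(Amat, F.ravel()).reshape(Nt, Ns)
print(np.allclose(U, np.array(u[1:])))  # both approaches give the same solution
```

Of course, the point of the all-at-once formulation is not to form the Kronecker products explicitly, but to expose all N_{t} time levels to a single preconditioned Krylov solve.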

2.2 Properties of the all-at-once system

In this subsection, we investigate the properties of the discrete all-at-once formulation (2.11). These will guide the design of an efficient solver for such a large linear system.

Lemma 2.3.

For the matrix C in (2.13), we have the following estimates:

  • 1)

    \|C^{-1}\|_{1}\leq\frac{3N_{t}}{2};

  • 2)

    \|C^{-1}\|_{\infty}=N_{t}.

Proof. Consider the following matrix splitting and then perform the matrix factorization,

\begin{split}C&=\begin{bmatrix}\frac{3}{2}&\\ -2&\frac{3}{2}\\ \frac{1}{2}&-2&\frac{3}{2}\\ &\ddots&\ddots&\ddots\\ &&\frac{1}{2}&-2&\frac{3}{2}&\\ &&&\frac{1}{2}&-2&\frac{3}{2}\end{bmatrix}-\begin{bmatrix}\frac{1}{2}&\\ 0&0\\ 0&0&0\\ &\ddots&\ddots&\ddots\\ &&0&0&0&\\ &&&0&0&0\end{bmatrix}\triangleq\hat{C}-\frac{1}{2}{\bm{e}}_{1}{\bm{e}}^{T}_{1}\\ &=\frac{3}{2}\begin{bmatrix}1&\\ -\frac{1}{3}&1\\ &-\frac{1}{3}&1\\ &&\ddots&\ddots\\ &&&-\frac{1}{3}&1&\\ &&&&-\frac{1}{3}&1\end{bmatrix}\cdot\begin{bmatrix}1&\\ -1&1\\ &-1&1\\ &&\ddots&\ddots\\ &&&-1&1&\\ &&&&-1&1\end{bmatrix}-\frac{1}{2}{\bm{e}}_{1}{\bm{e}}^{T}_{1}\end{split} (2.14)

where the vector {\bm{e}}_{1}=[1,0,\cdots,0]^{T}\in\mathbb{R}^{N_{t}}. According to the Sherman-Morrison formula [53], we can write

C^{-1}=\hat{C}^{-1}+\frac{\hat{C}^{-1}\frac{1}{2}{\bm{e}}_{1}{\bm{e}}^{T}_{1}\hat{C}^{-1}}{1-\frac{1}{2}{\bm{e}}^{T}_{1}\hat{C}^{-1}{\bm{e}}_{1}}=\hat{C}^{-1}+\frac{\hat{C}^{-1}{\bm{e}}_{1}{\bm{e}}^{T}_{1}\hat{C}^{-1}}{2-{\bm{e}}^{T}_{1}\hat{C}^{-1}{\bm{e}}_{1}}. (2.15)

On the other hand, we can compute the inverse of C^\hat{C} as follows,

\hat{C}^{-1}=\frac{2}{3}\begin{bmatrix}1&\\ 1&1&\\ 1&1&1&\\ \vdots&\ddots&\ddots&\ddots&\\ 1&1&\cdots&1&1&\\ 1&1&\cdots&1&1&1\end{bmatrix}\cdot\begin{bmatrix}1&\\ \frac{1}{3}&1\\ \frac{1}{3^{2}}&\frac{1}{3}&1\\ \vdots&\ddots&\ddots&\ddots\\ \frac{1}{3^{N_{t}-2}}&\frac{1}{3^{N_{t}-3}}&\cdots&\frac{1}{3}&1&\\ \frac{1}{3^{N_{t}-1}}&\frac{1}{3^{N_{t}-2}}&\cdots&\frac{1}{3^{2}}&\frac{1}{3}&1\\ \end{bmatrix}. (2.16)

Therefore, we know that

\hat{C}^{-1}{\bm{e}}_{1}=\frac{2}{3}\begin{bmatrix}1&\\ 1&1&\\ 1&1&1&\\ \vdots&\ddots&\ddots&\ddots&\\ 1&1&\cdots&1&1&\\ 1&1&\cdots&1&1&1\end{bmatrix}\cdot\begin{bmatrix}1&\\ \frac{1}{3}\\ \frac{1}{3^{2}}\\ \vdots\\ \frac{1}{3^{N_{t}-2}}\\ \frac{1}{3^{N_{t}-1}}\\ \end{bmatrix}=\begin{bmatrix}\frac{2}{3}&\\ 1-\frac{1}{3^{2}}\\ 1-\frac{1}{3^{3}}\\ \vdots\\ 1-\frac{1}{3^{N_{t}-1}}\\ 1-\frac{1}{3^{N_{t}}}\\ \end{bmatrix} (2.17)

and

{\bm{e}}^{T}_{1}\hat{C}^{-1}=\frac{2}{3}{\bm{e}}^{T}_{1},\quad{\bm{e}}^{T}_{1}\hat{C}^{-1}{\bm{e}}_{1}=\frac{2}{3}. (2.18)

The explicit expression of C1C^{-1} has the following form:

C^{-1}=\hat{C}^{-1}+\frac{1}{2}\begin{bmatrix}\frac{2}{3}&\\ 1-\frac{1}{3^{2}}\\ 1-\frac{1}{3^{3}}\\ \vdots\\ 1-\frac{1}{3^{N_{t}-1}}\\ 1-\frac{1}{3^{N_{t}}}\\ \end{bmatrix}{\bm{e}}^{T}_{1}=\hat{C}^{-1}+\frac{1}{2}\begin{bmatrix}\frac{2}{3}&0&\cdots&0\\ 1-\frac{1}{3^{2}}&0&\cdots&0\\ 1-\frac{1}{3^{3}}&0&\cdots&0\\ \vdots&\vdots&\cdots&\vdots\\ 1-\frac{1}{3^{N_{t}-1}}&0&\cdots&0\\ 1-\frac{1}{3^{N_{t}}}&0&\cdots&0\\ \end{bmatrix}. (2.19)

Since \hat{C}^{-1} is a lower triangular Toeplitz matrix with positive entries, \|C^{-1}\|_{\infty} is the absolute sum of the elements of the last row of C^{-1}, i.e.,

\begin{split}\|C^{-1}\|_{\infty}&=\frac{2}{3}\left[\left(1+\cdots+\frac{1}{3^{N_{t}-1}}\right)+\left(1+\cdots+\frac{1}{3^{N_{t}-2}}\right)+\cdots+\left(1+\frac{1}{3}\right)+1\right]+\frac{1}{2}\left(1-\frac{1}{3^{N_{t}}}\right)\\ &=N_{t}-\left(\frac{1}{3}+\frac{1}{3^{2}}+\cdots+\frac{1}{3^{N_{t}}}\right)+\frac{1}{2}\left(1-\frac{1}{3^{N_{t}}}\right)\\ &=N_{t}.\end{split} (2.20)

Analogously, \|C^{-1}\|_{1} is the absolute sum of the elements of its first column, i.e.,

\begin{split}\|C^{-1}\|_{1}&=\frac{2}{3}\left[1+\left(1+\frac{1}{3}\right)+\cdots+\left(1+\cdots+\frac{1}{3^{N_{t}-2}}\right)+\left(1+\cdots+\frac{1}{3^{N_{t}-1}}\right)\right]\\ &\quad+\frac{1}{2}\left(1-\frac{1}{3}+1-\frac{1}{3^{2}}+\cdots+1-\frac{1}{3^{N_{t}}}\right)\\ &=\frac{2}{3}\left(\frac{1-\frac{1}{3^{1}}}{1-\frac{1}{3}}+\frac{1-\frac{1}{3^{2}}}{1-\frac{1}{3}}+\cdots+\frac{1-\frac{1}{3^{N_{t}-1}}}{1-\frac{1}{3}}+\frac{1-\frac{1}{3^{N_{t}}}}{1-\frac{1}{3}}\right)+\frac{1}{2}\left(N_{t}-\frac{1-\frac{1}{3^{N_{t}}}}{1-\frac{1}{3}}\cdot\frac{1}{3}\right)\\ &=\frac{3N_{t}}{2}-\frac{3}{4}\left(1-\frac{1}{3^{N_{t}}}\right)\\ &\leq\frac{3N_{t}-1}{2}\leq\frac{3N_{t}}{2}.\end{split} (2.21)

Therefore, the above estimates are proved. \Box
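The two estimates of Lemma 2.3 are easy to confirm numerically; recall that \|\cdot\|_{1} is the maximum absolute column sum and \|\cdot\|_{\infty} the maximum absolute row sum (the following check is our own sketch):

```python
import numpy as np

def bdf2_C(Nt):
    """The all-at-once time-stepping matrix C of (2.13)."""
    C = np.zeros((Nt, Nt))
    for k in range(Nt):
        C[k, k] = 1.0 if k == 0 else 1.5
        if k >= 1:
            C[k, k - 1] = -2.0
        if k >= 2:
            C[k, k - 2] = 0.5
    return C

for Nt in (4, 16, 64):
    Cinv = np.linalg.inv(bdf2_C(Nt))
    # the maximum absolute row sum equals Nt exactly, while the maximum
    # absolute column sum is 3Nt/2 - 3(1 - 3^{-Nt})/4 <= 3Nt/2
    print(np.isclose(np.linalg.norm(Cinv, np.inf), Nt),
          np.linalg.norm(Cinv, 1) <= 1.5 * Nt)
```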

Theorem 2.1.

Suppose that {\rm Re}(\lambda(A))\leq 0 for every eigenvalue \lambda(A) of A. Then, the following bounds hold for the extreme singular values of \mathcal{A} in (2.11):

\sigma_{max}(\mathcal{A})\leq 4+\tau\|A\|_{2}\quad{\rm and}\quad\sigma_{min}(\mathcal{A})\geq\frac{\sqrt{6}}{3N_{t}}, (2.22)

so that {\rm cond}(\mathcal{A})\leq 2\sqrt{6}N_{t}+\frac{\sqrt{6}T\|A\|_{2}}{2}.

Proof. Since {\rm Re}(\lambda(A))\leq 0, we may claim that \|\mathcal{A}{\bm{z}}\|\geq\|(C\otimes I_{s}){\bm{z}}\| for any vector {\bm{z}}. In particular, using the properties of the Kronecker product, we may estimate \sigma_{min}(\mathcal{A})\geq\sigma_{min}(C). The latter is computed using the spectral norm of C^{-1}, which can be bounded by the following inequality:

\|C^{-1}\|_{2}\leq\sqrt{\|C^{-1}\|_{1}\|C^{-1}\|_{\infty}}. (2.23)

According to Lemma 2.3, we have

\sigma_{min}(\mathcal{A})\geq\sigma_{min}(C)=\frac{1}{\|C^{-1}\|_{2}}\geq\frac{1}{\sqrt{\|C^{-1}\|_{1}\|C^{-1}\|_{\infty}}}\geq\frac{\sqrt{6}}{3N_{t}}, (2.24)

which proves the second bound.

For the first estimate, applying the triangle inequality, we can write

\sigma_{max}(\mathcal{A})=\|\mathcal{A}\|_{2}\leq\|C\|_{2}+\tau\|A\|_{2}, (2.25)

whereas

\|C\|_{1}=\|C\|_{\infty}=4, (2.26)

so that \|C\|_{2}\leq\sqrt{\|C\|_{1}\|C\|_{\infty}}=4. Finally, the proof is completed by recalling that \tau=T/N_{t}. \Box
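A small numerical check of Theorem 2.1 (our own sketch; a random symmetric negative definite matrix is used for A so that the hypothesis {\rm Re}(\lambda(A))\leq 0 holds):

```python
import numpy as np

Ns, Nt, T = 6, 8, 1.0
tau = T / Nt
rng = np.random.default_rng(2)
B = rng.standard_normal((Ns, Ns))
A = -(B @ B.T + np.eye(Ns))            # symmetric negative definite: Re(lambda(A)) <= 0

C = np.zeros((Nt, Nt))                 # the matrix C of (2.13)
for k in range(Nt):
    C[k, k] = 1.0 if k == 0 else 1.5
    if k >= 1:
        C[k, k - 1] = -2.0
    if k >= 2:
        C[k, k - 2] = 0.5
Amat = np.kron(C, np.eye(Ns)) - tau * np.kron(np.eye(Nt), A)

s = np.linalg.svd(Amat, compute_uv=False)
normA = np.linalg.norm(A, 2)
print(s[0] <= 4.0 + tau * normA)                     # upper bound on sigma_max
print(s[-1] >= np.sqrt(6.0) / (3.0 * Nt))            # lower bound on sigma_min
print(s[0] / s[-1] <= 2.0 * np.sqrt(6.0) * Nt + np.sqrt(6.0) * T * normA / 2.0)
```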

The condition number can thus be expected to vary linearly with the main parameters of the system, such as the number of time steps, the length of the time interval, and the norm of the Jacobian matrix [39, 40, 54]. More refined estimates could be provided by taking into account particular properties of A. In Theorem 2.1, the time interval [0,T] need not be equal to the whole observation range of an application. The global time scheme (2.11) can be applied by splitting the desired interval [0,\hat{T}] into a sequence of subintervals [0,T],[T,2T],\cdots,[\hat{T}-T,\hat{T}] of “large” time steps of size T each, solving (2.11) on [(q-1)T,qT],~q=1,\cdots,\hat{T}/T, extracting the last snapshot {\bm{u}}^{N_{t}} and restarting the method using {\bm{u}}^{N_{t}} as the initial state for the next interval. The optimal value of T is the one providing the fastest computation [39]. Another way to accelerate the solution is to reduce the condition number of the linear system by preconditioning. This standard computational aspect will be considered in the next section.

3 Parallel-in-time (PinT) preconditioners

According to Theorem 2.1, an iterative method, such as a Krylov subspace method, applied to the all-at-once system (2.11) can converge very slowly or even stagnate. Therefore, in this section we look at preconditioners that can be efficiently implemented in the PinT framework.

3.1 The structuring-circulant preconditioners

Since the matrix C defined in (2.13) can be viewed as the sum of a Toeplitz matrix and a rank-1 matrix, it is natural to define our first structuring-circulant preconditioner as

\mathcal{P}_{\alpha}=C_{\alpha}\otimes I_{s}-\tau I_{t}\otimes A (3.1)

where

C_{\alpha}=\begin{bmatrix}\frac{3}{2}&&&&\frac{\alpha}{2}&-2\alpha\\ -2&\frac{3}{2}&&&&\frac{\alpha}{2}\\ \frac{1}{2}&-2&\frac{3}{2}\\ &\ddots&\ddots&\ddots\\ &&\frac{1}{2}&-2&\frac{3}{2}&\\ &&&\frac{1}{2}&-2&\frac{3}{2}\end{bmatrix},\quad\alpha\in(0,1], (3.2)

is an \alpha-circulant matrix of Strang type that can be diagonalized as

C_{\alpha}=V_{\alpha}D_{\alpha}V^{-1}_{\alpha} (3.3)

with

V_{\alpha}=\Lambda_{\alpha}\mathbb{F}_{N_{t}}^{*},\quad D_{\alpha}={\rm diag}\big{(}\sqrt{N_{t}}\mathbb{F}_{N_{t}}\Lambda_{\alpha}^{-1}C_{\alpha}(:,1)\big{)}={\rm diag}\big{(}\lambda^{(\alpha)}_{1},\lambda^{(\alpha)}_{2},\cdots,\lambda^{(\alpha)}_{N_{t}}\big{)}, (3.4a)

where \lambda^{(\alpha)}_{n}=\sum^{2}_{j=0}r_{j}\alpha^{j/N_{t}}\theta^{(n-1)j} (with r_{0}=3/2, r_{1}=-2, r_{2}=1/2), ‘*’ denotes the conjugate transpose of a matrix, C_{\alpha}(:,1) is the first column of C_{\alpha}, \Lambda_{\alpha}={\rm diag}\big{(}1,\alpha^{-\frac{1}{N_{t}}},\cdots,\alpha^{-\frac{N_{t}-1}{N_{t}}}\big{)} and

\mathbb{F}_{N_{t}}=\frac{1}{\sqrt{N_{t}}}\begin{bmatrix}1&1&\dots&1\\ 1&\theta&\dots&\theta^{N_{t}-1}\\ \vdots&\vdots&\dots&\vdots\\ 1&\theta^{N_{t}-1}&\dots&\theta^{(N_{t}-1)(N_{t}-1)}\end{bmatrix},\quad\theta=e^{\frac{2\pi{\rm i}}{N_{t}}}. (3.4b)

The matrix 𝔽Nt\mathbb{F}_{N_{t}} in (3.4b) is the discrete Fourier matrix, which is unitary. In addition, it is worth noting that if α\alpha is chosen very small, then the condition number of VαV_{\alpha} grows quickly and large roundoff errors may therefore be introduced in our proposed method; see [36, 19] for a discussion of these issues. Using the properties of the Kronecker product, we can factorize 𝒫α\mathcal{P}_{\alpha} as
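The diagonalization (3.3)–(3.4a) is easy to verify numerically. The following Python snippet (a sanity check we add for illustration; it is not part of the solver) assembles a small C_α with the BDF2 stencil and confirms the factorization:

```python
import numpy as np

nt, alpha = 8, 0.5
r = np.array([1.5, -2.0, 0.5])                  # BDF2 stencil r0, r1, r2

# alpha-circulant matrix C_alpha of Strang type, cf. (3.2)
C = np.zeros((nt, nt))
for j, rj in enumerate(r):
    C += rj * np.eye(nt, k=-j)                  # lower-triangular Toeplitz part
    if j > 0:
        C += alpha * rj * np.eye(nt, k=nt - j)  # alpha-weighted corner entries

theta = np.exp(2j * np.pi / nt)                 # as in (3.4b)
F = np.array([[theta ** (m * n) for n in range(nt)]
              for m in range(nt)]) / np.sqrt(nt)
Lam = np.diag(alpha ** (-np.arange(nt) / nt))   # Lambda_alpha
V = Lam @ F.conj().T                            # V_alpha = Lambda_alpha F*
lam = np.array([sum(rj * alpha ** (j / nt) * theta ** (m * j)
                    for j, rj in enumerate(r)) for m in range(nt)])

# check the factorization C_alpha = V_alpha D_alpha V_alpha^{-1}
assert np.allclose(V @ np.diag(lam) @ np.linalg.inv(V), C)
```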

𝒫α=(VαIs)(DαIsτItA)(Vα1Is),\mathcal{P}_{\alpha}=(V_{\alpha}\otimes I_{s})\left(D_{\alpha}\otimes I_{s}-\tau I_{t}\otimes A\right)(V^{-1}_{\alpha}\otimes I_{s}), (3.5)

and this implies that we can compute 𝒛=𝒫α1𝒗{\bm{z}}=\mathcal{P}^{-1}_{\alpha}{\bm{v}} via the following three steps:

{𝒛1=(Vα1Is)𝒗=(𝔽NtIs)[(Λα1Is)𝒗],Step-(a),(λn(α)IsτA)𝒛2,n=𝒛1,n,n=1,2,,Nt,Step-(b),𝒛=(VαIs)𝒛2=(ΛαIs)[(𝔽NtIs)𝒛2],Step-(c),\begin{cases}{\bm{z}}_{1}=(V^{-1}_{\alpha}\otimes I_{s}){\bm{v}}=(\mathbb{F}_{N_{t}}\otimes I_{s})\left[(\Lambda^{-1}_{\alpha}\otimes I_{s}){\bm{v}}\right],&\text{Step-(a)},\\ \big{(}\lambda^{(\alpha)}_{n}I_{s}-\tau A\big{)}{\bm{z}}_{2,n}={\bm{z}}_{1,n},\quad~{}n=1,2,\dots,{N_{t}},&\text{Step-(b)},\\ {\bm{z}}=(V_{\alpha}\otimes I_{s}){\bm{z}}_{2}=(\Lambda_{\alpha}\otimes I_{s})\left[(\mathbb{F}^{*}_{N_{t}}\otimes I_{s}){\bm{z}}_{2}\right],&\text{Step-(c)},\\ \end{cases}

where 𝒛j=(𝒛j,1,𝒛j,2,,𝒛j,Nt){\bm{z}}_{j}=({\bm{z}}_{j,1}^{\top},{\bm{z}}_{j,2}^{\top},\dots,{\bm{z}}_{j,N_{t}}^{\top})^{\top} (with j=1,2j=1,2) and λn(α)\lambda^{(\alpha)}_{n} is the nn-th eigenvalue of CαC_{\alpha}. The first and third steps only involve matrix-vector multiplications with NtN_{t} subvectors, and these can be computed simultaneously by FFTs222Note that parallel FFT implementations are also available at http://www.fftw.org/parallel/parallel-fftw.html. on NtN_{t} CPUs [54]. The major computational cost is spent in the second step; however, this step is naturally parallel across all the NtN_{t} time steps.
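Steps (a)–(c) can be sketched in a few lines of Python. In this toy illustration (our own, with a small tridiagonal stand-in for A and dense solves in Step-(b)), the FFT plays the role of the products with F and F*, and the result is checked against an explicitly assembled P_α:

```python
import numpy as np

def apply_Palpha_inv(v, A, tau, alpha):
    """Apply P_alpha^{-1} via Steps (a)-(c): a scaled FFT in time, one
    shifted solve per time step (parallel across n), a scaled inverse FFT."""
    nt, ns = v.shape                 # rows of v are the time blocks v_1..v_Nt
    k = np.arange(nt)
    eps = alpha ** (1.0 / nt) * np.exp(2j * np.pi * k / nt)
    lam = 1.5 - 2.0 * eps + 0.5 * eps ** 2       # eigenvalues of C_alpha
    # Step (a): z1 = (F (x) I)(Lambda_alpha^{-1} (x) I) v
    z1 = np.sqrt(nt) * np.fft.ifft(v * alpha ** (k / nt)[:, None], axis=0)
    # Step (b): independent shifted solves (dense here, DST-based in the paper)
    z2 = np.stack([np.linalg.solve(lam[n] * np.eye(ns) - tau * A, z1[n])
                   for n in range(nt)])
    # Step (c): z = (Lambda_alpha (x) I)(F* (x) I) z2
    return (alpha ** (-k / nt)[:, None] * np.fft.fft(z2, axis=0)
            / np.sqrt(nt)).real

nt, ns, tau, alpha = 8, 5, 0.1, 0.5
A = -2.0 * np.eye(ns) + np.eye(ns, k=1) + np.eye(ns, k=-1)   # toy SND matrix
v = np.random.default_rng(0).standard_normal((nt, ns))

# assemble P_alpha explicitly to verify the factorized solve
Ca = 1.5 * np.eye(nt) - 2.0 * np.eye(nt, k=-1) + 0.5 * np.eye(nt, k=-2)
Ca[0, -1] = -2.0 * alpha                     # alpha-weighted corner entries
Ca[0, -2] = Ca[1, -1] = 0.5 * alpha
P = np.kron(Ca, np.eye(ns)) - tau * np.kron(np.eye(nt), A)
z = apply_Palpha_inv(v, A, tau, alpha)
assert np.allclose(P @ z.ravel(), v.ravel())
```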

Next, we study the invertibility of the matrix 𝒫α\mathcal{P}_{\alpha}. According to Eq. (2.13) and because the matrix AA is negative definite, the following result ensures the invertibility of 𝒫α\mathcal{P}_{\alpha}333It is unnecessary to require that r0|r1|+|r2|r_{0}\geq|r_{1}|+|r_{2}|, a condition introduced in [35, Remark 4]..

Proposition 3.1.

For α(0,1]\alpha\in(0,1] and λn(α)\lambda^{(\alpha)}_{n}\in\mathbb{C}, it holds that e(λn(α))0\mathcal{R}e(\lambda^{(\alpha)}_{n})\geq 0, where equality can occur only if α=1\alpha=1.

Proof. Set ε=α1/Ntθ(n1)x+iy\varepsilon=\alpha^{1/N_{t}}\theta^{(n-1)}\triangleq x+{\rm i}y; then x2+y2=α2/Nt1x^{2}+y^{2}=\alpha^{2/N_{t}}\leq 1 and we have

e(λn(α))=e(322ε+ε22)=e(12(1xiy)(3xiy))=12[(1x)(3x)y2]x22x+10,\begin{split}\mathcal{R}e(\lambda^{(\alpha)}_{n})&=\mathcal{R}e\left(\frac{3}{2}-2\varepsilon+\frac{\varepsilon^{2}}{2}\right)\\ &=\mathcal{R}e\left(\frac{1}{2}(1-x-{\rm i}y)(3-x-{\rm i}y)\right)\\ &=\frac{1}{2}[(1-x)(3-x)-y^{2}]\\ &\geq x^{2}-2x+1\\ &\geq 0,\end{split} (3.6)

where the penultimate inequality holds because x21y2x^{2}-1\leq-y^{2}. Moreover, if α1\alpha\neq 1, then x2+y2<1x^{2}+y^{2}<1, i.e., 1<x<1-1<x<1 and x22x+1>0x^{2}-2x+1>0, which completes the proof. \Box

In practice, we often choose α(0,1)\alpha\in(0,1) [35]; then e(λn(α))>0\mathcal{R}e(\lambda^{(\alpha)}_{n})>0 according to Proposition 3.1. In Fig. 1, we plot the complex quantities {λn(α)}n=1Nt\{\lambda^{(\alpha)}_{n}\}^{N_{t}}_{n=1} on the complex plane. We see that for α(0,1)\alpha\in(0,1) it holds that e(λn(α))>0\mathcal{R}e(\lambda^{(\alpha)}_{n})>0, and consequently all the linear systems involved in Step-(b) are positive definite and thus not difficult to solve [31]. As shown in [36], a multigrid method using Richardson iterations as a smoother, with an optimized choice of the damping parameter, is very effective. We also note that a detailed convergence analysis of multigrid preconditioning in the context of the τ\tau algebra considered in this paper is presented in [38].
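The inequality chain in the proof of Proposition 3.1, Re(λ_n^{(α)}) ≥ (x−1)² ≥ 0, together with strict positivity for α ∈ (0,1), can be checked numerically over a range of α and N_t (a quick Python verification we add for illustration):

```python
import numpy as np

for alpha in (0.1, 0.5, 0.9, 1.0):
    for nt in (4, 16, 64):
        n = np.arange(nt)
        eps = alpha ** (1.0 / nt) * np.exp(2j * np.pi * n / nt)
        lam = 1.5 - 2.0 * eps + 0.5 * eps ** 2   # lambda_n^{(alpha)}
        x = eps.real
        # the bound from the proof of Proposition 3.1 ...
        assert np.all(lam.real >= (x - 1.0) ** 2 - 1e-14)
        # ... and strict positivity for alpha in (0, 1)
        if alpha < 1.0:
            assert np.all(lam.real > 0.0)
```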

Fig. 1: The plot of complex quantities {λn(α)}n=1Nt\{\lambda^{(\alpha)}_{n}\}^{N_{t}}_{n=1} with different α\alpha’s.

In addition, it is not hard to derive the following result.

Theorem 3.1.

Let α(0,1]\alpha\in(0,1] and 𝒫α1𝒜=+\mathcal{P}^{-1}_{\alpha}\mathcal{A}=\mathcal{I}+\mathcal{L}, where \mathcal{I} is the identity matrix of order NtNsN_{t}N_{s}. Then it follows that rank()=2Ns{\rm rank}(\mathcal{L})=2N_{s}.

Proof. We consider that

𝒫α1𝒜=𝒫α1[𝒫α+(𝒜𝒫α)]=+𝒫α1[(CCα)Is]+,\begin{split}\mathcal{P}^{-1}_{\alpha}\mathcal{A}&=\mathcal{P}^{-1}_{\alpha}[\mathcal{P}_{\alpha}+(\mathcal{A}-\mathcal{P}_{\alpha})]\\ &=\mathcal{I}+\mathcal{P}^{-1}_{\alpha}[(C-C_{\alpha})\otimes I_{s}]\\ &\triangleq\mathcal{I}+\mathcal{L},\end{split} (3.7)

where rank()=rank(𝒫α1[(CCα)Is])=rank((CCα)Is)=rank(CCα)Ns=2Ns{\rm rank}(\mathcal{L})={\rm rank}\big{(}\mathcal{P}^{-1}_{\alpha}[(C-C_{\alpha})\otimes I_{s}]\big{)}={\rm rank}\big{(}(C-C_{\alpha})\otimes I_{s}\big{)}={\rm rank}(C-C_{\alpha})\cdot N_{s}=2N_{s}, since only the first two rows of CCαC-C_{\alpha} are nonzero. This completes the proof. \Box

Theorem 3.1 implies that the generalized minimal residual (GMRES) method by Saad and Schultz [56] computes the exact solution of the preconditioned system 𝒫α1𝒜\mathcal{P}^{-1}_{\alpha}\mathcal{A} (if it is diagonalizable) in at most 2Ns+12N_{s}+1 iterations [56, 23, 33]. Although such a bound on the convergence of preconditioned GMRES is not sharp when NsN_{s} is not small, it gives useful clues about the effectiveness of the preconditioner 𝒫α\mathcal{P}_{\alpha} as an approximation of the all-at-once matrix 𝒜\mathcal{A}. Moreover, in Section 4 we will show numerically that most of the eigenvalues of the preconditioned matrix 𝒫α1𝒜\mathcal{P}^{-1}_{\alpha}\mathcal{A} actually cluster around the point 1, supporting the theoretical findings of this section on the potential of the proposed preconditioner 𝒫α\mathcal{P}_{\alpha} to accelerate the iterative solution of the all-at-once system.
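The mechanism behind this bound is generic: perturbing the identity by any rank-r matrix leaves all but at most r eigenvalues equal to 1, which is what limits the GMRES iteration count. A small Python illustration (with a random low-rank perturbation, not the actual matrices of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 60, 2                      # matrix size and rank of the perturbation
L = rng.standard_normal((N, r)) @ rng.standard_normal((r, N))
M = np.eye(N) + L                 # mimics P_alpha^{-1} A = I + L

eig = np.linalg.eigvals(M)
# at most r eigenvalues can move away from 1
assert np.sum(np.abs(eig - 1.0) > 1e-6) <= r
```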

Remark 3.1.

If we set α=1\alpha=1, the preconditioner 𝒫α\mathcal{P}_{\alpha} reduces to the BC preconditioner 𝒫1=C1IsτItA\mathcal{P}_{1}=C_{1}\otimes I_{s}-\tau I_{t}\otimes A, where C1C_{1} is the matrix CαC_{\alpha} in (3.2) with α=1\alpha=1. Moreover, the matrix decomposition (3.5), Proposition 3.1, and Theorem 3.1 carry over to the BC preconditioner 𝒫1\mathcal{P}_{1}; refer, e.g., to [33, 35] for details.

Remark 3.2.

We remark that the invertibility of 𝒫α\mathcal{P}_{\alpha} (or 𝒫1\mathcal{P}_{1}) depends entirely on the matrices λn(α)IsτA\lambda^{(\alpha)}_{n}I_{s}-\tau A. However, as α1\alpha\rightarrow 1, the real parts of {λn(α)}n=1Nt\{\lambda^{(\alpha)}_{n}\}^{N_{t}}_{n=1} become very close to zero, cf. Proposition 3.1. Meanwhile, if the diffusion coefficient κγ\kappa_{\gamma} is very small, the eigenvalues of AA will be increasingly close to zero. These facts can make the matrices λn(α)IsτA\lambda^{(\alpha)}_{n}I_{s}-\tau A very ill-conditioned (even singular444If the model problem (2.1) incorporates a Neumann boundary condition and we set α=1\alpha=1, the matrices λn(α)IsτA\lambda^{(\alpha)}_{n}I_{s}-\tau A will be singular because the matrix AA is singular.). Under these circumstances, the BC preconditioner 𝒫1\mathcal{P}_{1} can perform poorly (see [35] and Section 4), whereas the preconditioner 𝒫α\mathcal{P}_{\alpha} with α(0,1)\alpha\in(0,1) should be preferred in a practical implementation.

Remark 3.3.

If one wants to improve the proposed preconditioner further, a polynomial preconditioner, which has the potential to reduce communication costs in Krylov subspace solvers [56], can be derived from the Neumann series:

\begin{split}\mathcal{A}^{-1}&=\left[\mathcal{I}-\mathcal{P}^{-1}_{\alpha}\mathcal{S}_{\alpha}\right]^{-1}\mathcal{P}^{-1}_{\alpha}\\ &=\sum^{\infty}_{k=0}(\mathcal{P}^{-1}_{\alpha}\mathcal{S}_{\alpha})^{k}\mathcal{P}^{-1}_{\alpha},\end{split} (3.8)

where 𝒮α=(CαC)Is\mathcal{S}_{\alpha}=(C_{\alpha}-C)\otimes I_{s}. If 𝒫α1𝒮α<1\|\mathcal{P}^{-1}_{\alpha}\mathcal{S}_{\alpha}\|<1, then the mm-step polynomial preconditioner 𝒫α(m)=k=0m(𝒫α1𝒮α)k𝒫α1\mathcal{P}_{\alpha}(m)=\sum^{m}_{k=0}(\mathcal{P}^{-1}_{\alpha}\mathcal{S}_{\alpha})^{k}\mathcal{P}^{-1}_{\alpha} is well-defined and inherently parallel. Moreover, such a preconditioner can easily be adapted to the all-at-once systems arising from non-uniform temporal discretizations; refer to [34] for a short discussion.
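Applying the m-step polynomial preconditioner amounts to m+1 solves with 𝒫_α interleaved with multiplications by 𝒮_α, accumulated Horner-style. The following Python sketch (our illustration: random dense matrices stand in for 𝒫_α and 𝒮_α, and dense solves replace the factorized application) shows the truncation error decreasing with m:

```python
import numpy as np

def poly_prec_apply(v, P, S, m):
    """m-step polynomial preconditioner sum_{k=0}^m (P^{-1} S)^k P^{-1}
    applied to v, using m+1 solves with P."""
    z = np.linalg.solve(P, v)
    acc = z.copy()
    for _ in range(m):
        z = np.linalg.solve(P, S @ z)
        acc += z
    return acc

rng = np.random.default_rng(2)
n = 30
P = 4.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
S = 0.1 * rng.standard_normal((n, n))   # small enough that ||P^{-1}S|| < 1
A = P - S                                # mimics A = P_alpha - S_alpha
v = rng.standard_normal(n)

errs = [np.linalg.norm(A @ poly_prec_apply(v, P, S, m) - v) for m in (0, 2, 6)]
assert errs[0] > errs[1] > errs[2]       # larger m approximates A^{-1} better
```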

3.2 Efficient implementation of the PinT preconditioner

In this subsection, we discuss how to implement a Krylov subspace method efficiently for solving the left (or right) preconditioned system 𝒫α1𝒜𝑼=𝒫α1𝑭\mathcal{P}^{-1}_{\alpha}\mathcal{A}{\bm{U}}=\mathcal{P}^{-1}_{\alpha}{\bm{F}} (or 𝒜𝒫α1(𝒫α𝑼)=𝑭\mathcal{A}\mathcal{P}^{-1}_{\alpha}(\mathcal{P}_{\alpha}{\bm{U}})={\bm{F}}), α(0,1]\alpha\in(0,1]. We recall that the kernel operations of a Krylov subspace algorithm are the matrix-vector products 𝒫α1𝒗\mathcal{P}^{-1}_{\alpha}{\bm{v}} and 𝒜𝒗\mathcal{A}{\bm{v}} for a given vector 𝒗NtNs×1{\bm{v}}\in\mathbb{R}^{N_{t}N_{s}\times 1}. First, we present a fast implementation for computing the second matrix-vector product:

𝒜𝒗=(CIs)𝒗(τItA)𝒗=vec(VCT)τvec(AV),\begin{split}\mathcal{A}{\bm{v}}&=(C\otimes I_{s}){\bm{v}}-(\tau I_{t}\otimes A){\bm{v}}\\ &={\rm vec}(VC^{T})-\tau\cdot{\rm vec}(AV),\end{split} (3.9)

where VV is the matrix such that 𝒗=vec(V){\bm{v}}={\rm vec}(V), and vec(X){\rm vec}(X) denotes the vectorization of the matrix XX obtained by stacking the columns of XX into a single column vector. The first term can be calculated directly in 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) storage and operations because the matrix CTC^{T} is sparse. Owing to the Toeplitz structure of the matrix AA, the second term can be evaluated in 𝒪(NtNslogNs)\mathcal{O}(N_{t}N_{s}\log N_{s}) operations and 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) storage. This means that the computation of 𝒜𝒗\mathcal{A}{\bm{v}} requires the same order of operations and storage as (ItA)𝒗(I_{t}\otimes A){\bm{v}}. Clearly, the cost of computing 𝒜𝒗\mathcal{A}{\bm{v}} can be further reduced by a parallel implementation.
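The O(N_s log N_s) Toeplitz product is the classical circulant-embedding trick: embed the Toeplitz matrix into a circulant of twice the size and multiply via one FFT pair. A self-contained Python sketch (with generic Toeplitz data, not the actual fractional-difference coefficients):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply the Toeplitz matrix with first column c and first row r
    by x in O(n log n), via a circulant embedding of size 2n."""
    n = len(x)
    col = np.concatenate([c, [0.0], r[:0:-1]])   # first column of the circulant
    y = np.fft.ifft(np.fft.fft(col) *
                    np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n].real

rng = np.random.default_rng(3)
n = 64
c = rng.standard_normal(n)                                # first column
r = np.concatenate([[c[0]], rng.standard_normal(n - 1)])  # first row
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
x = rng.standard_normal(n)
assert np.allclose(toeplitz_matvec(c, r, x), T @ x)
```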

Next, we focus on the fast computation of 𝒫α1𝒗\mathcal{P}^{-1}_{\alpha}{\bm{v}} for a given vector 𝒗{\bm{v}}. According to Step-(a)–Step-(c) described in Section 3.1, the first and third steps of the implementation of 𝒫α\mathcal{P}_{\alpha} require 𝒪(NtNslogNs)\mathcal{O}(N_{t}N_{s}\log N_{s}) operations and 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) storage. If the matrix AA can be diagonalized (or approximated) via fast DSTs, Step-(b) can be directly solved with 𝒪(NtNslogNs)\mathcal{O}(N_{t}N_{s}\log N_{s}) operations and 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) storage, except for the different complex shifts.

Similar to [35, pp. 10-11], it is worth noting that only about half of the NtN_{t} shifted linear systems in Step-(b) need to be solved. From Step-(a), we know that the right-hand sides in Step-(b) can be expressed as

{\bm{z}}_{1,n}=\frac{1}{\sqrt{N_{t}}}\sum^{N_{t}-1}_{j=0}\alpha^{\frac{j}{N_{t}}}\theta^{(n-1)j}{\bm{v}}_{j+1},~{}~{}{\bm{v}}=[{\bm{v}}^{\top}_{1},{\bm{v}}^{\top}_{2},\cdots,{\bm{v}}^{\top}_{N_{t}}]^{\top},~{}~{}n=1,2,\cdots,N_{t}. (3.10)

We recall the expression of coefficient matrices in Step-(b):

Bn1:=λn(α)IsτA=(j=02rjαjNtθ(n1)j)IsτA,n=1,2,,Nt.B_{n-1}:=\lambda^{(\alpha)}_{n}I_{s}-\tau A=\left(\sum^{2}_{j=0}r_{j}\alpha^{\frac{j}{N_{t}}}\theta^{(n-1)j}\right)I_{s}-\tau A,\quad n=1,2,\cdots,N_{t}. (3.11)

Let “conj(){\rm conj}(\cdot)” denote conjugate of a matrix or a vector. Then,

\begin{cases}{\rm conj}({\bm{z}}_{1,n})=\frac{1}{\sqrt{N_{t}}}\sum\limits^{N_{t}-1}_{j=0}\alpha^{\frac{j}{N_{t}}}\theta^{[N_{t}-(n-1)]j}{\bm{v}}_{j+1}={\bm{z}}_{1,N_{t}-n+2},\\ {\rm conj}(B_{n-1})=\left(\sum\limits^{2}_{j=0}r_{j}\alpha^{\frac{j}{N_{t}}}\theta^{-(n-1)j}\right)I_{s}-\tau A=\left(\sum\limits^{2}_{j=0}r_{j}\alpha^{\frac{j}{N_{t}}}\theta^{[N_{t}-(n-1)]j}\right)I_{s}-\tau A=B_{N_{t}-n+1},&n=2,3,\cdots,N_{t}.\end{cases} (3.12)

This means that the unknowns in Step-(b) satisfy 𝒛2,n=conj(𝒛2,Ntn+2){\bm{z}}_{2,n}={\rm conj}({\bm{z}}_{2,N_{t}-n+2}) for n=2,3,,Ntn=2,3,\cdots,N_{t}. Therefore, only the first Nt/2+1\lfloor N_{t}/2\rfloor+1 shifted linear systems in Step-(b) need to be solved, and thus the number of processor cores required in a practical parallel implementation is significantly reduced.
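The conjugate pairing can be verified directly; in the following Python fragment (same kind of toy setting as in Section 3.1: a tridiagonal stand-in for A and dense shifted solves; Python index n corresponds to the paper's index n+1) the Step-(b) solutions indeed come in conjugate pairs:

```python
import numpy as np

nt, ns, tau, alpha = 8, 4, 0.1, 0.5
v = np.random.default_rng(4).standard_normal((nt, ns))   # real right-hand side
k = np.arange(nt)
eps = alpha ** (1.0 / nt) * np.exp(2j * np.pi * k / nt)
lam = 1.5 - 2.0 * eps + 0.5 * eps ** 2

# Step (a), as in Section 3.1
z1 = np.sqrt(nt) * np.fft.ifft(v * alpha ** (k / nt)[:, None], axis=0)
# Step (b) with a toy symmetric negative definite A
A = -2.0 * np.eye(ns) + np.eye(ns, k=1) + np.eye(ns, k=-1)
z2 = np.stack([np.linalg.solve(lam[n] * np.eye(ns) - tau * A, z1[n])
               for n in range(nt)])

for n in range(1, nt):
    assert np.allclose(np.conj(lam[n]), lam[nt - n])     # shifts pair up
    assert np.allclose(np.conj(z2[n]), z2[nt - n])       # solutions pair up
```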

Finally, we discuss the efficient solution of the sequence of shifted linear systems in Step-(b). Since AA is a real negative definite Toeplitz matrix, it can be approximated efficiently by a τ\tau-matrix 𝒯(A)\mathscr{T}(A) [47], which is a real symmetric matrix that can be diagonalized as follows:

\mathscr{T}(A)=Q^{\top}_{N_{s}}\Lambda_{N_{s}}Q_{N_{s}},\quad Q_{N_{s}}=\left[\sqrt{\frac{2}{N_{s}+1}}\sin\left(\frac{\pi ij}{N_{s}+1}\right)\right]_{1\leq i,j\leq N_{s}},\quad\Lambda_{N_{s}}={\rm diag}(\sigma_{1},\sigma_{2},\cdots,\sigma_{N_{s}}). (3.13)

From the relations QNsQNs=IsQ^{\top}_{N_{s}}Q_{N_{s}}=I_{s} and QNs𝒯(A)=ΛNsQNsQ_{N_{s}}\mathscr{T}(A)=\Lambda_{N_{s}}Q_{N_{s}}, we have

\sigma_{i}=\frac{\sum^{N_{s}}_{\ell=1}a_{\ell}\sin\left(\frac{\pi i\ell}{N_{s}+1}\right)}{\sin\left(\frac{\pi i}{N_{s}+1}\right)}, (3.14)

where the vector [a1,a2,,aNs][a_{1},~{}a_{2},\cdots,a_{N_{s}}]^{\top} is the first column of 𝒯(A)\mathscr{T}(A). The set of shifted linear systems in Step-(b) is then reformulated as the following sequence:

\big{[}\lambda^{(\alpha)}_{n}I_{s}-\tau\mathscr{T}(A)\big{]}{\bm{z}}_{2,n}=Q^{\top}_{N_{s}}\big{[}\lambda^{(\alpha)}_{n}I_{s}-\tau\Lambda_{N_{s}}\big{]}Q_{N_{s}}{\bm{z}}_{2,n}={\bm{z}}_{1,n},\quad~{}n=1,2,\dots,N_{t}, (3.15)

which can be solved by fast DSTs without storing any dense matrix. In addition, since the DST only involves real operations, the above matrix diagonalization of the preconditioner does not affect Eqs. (3.10)–(3.12) at all. Overall, the fast computation of 𝒫α1𝒗\mathcal{P}^{-1}_{\alpha}{\bm{v}} using Eq. (3.15) requires 𝒪(NtNslogNs)\mathcal{O}(N_{t}N_{s}\log N_{s}) operations and 𝒪(NtNs)\mathcal{O}(N_{t}N_{s}) memory units.
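The τ-algebra machinery can be illustrated in a few lines of Python (the DST matrix Q is formed densely here for clarity, whereas a fast DST would be used in practice, and the Laplacian-like spectrum below is just a stand-in for the true eigenvalues of 𝒯(A)):

```python
import numpy as np

ns = 16
i = np.arange(1, ns + 1)
Q = np.sqrt(2.0 / (ns + 1)) * np.sin(np.pi * np.outer(i, i) / (ns + 1))
sigma = -(2.0 - 2.0 * np.cos(np.pi * i / (ns + 1)))  # stand-in spectrum
T = Q.T @ np.diag(sigma) @ Q                          # a tau-algebra matrix

# eigenvalues recovered from the first column of T, cf. (3.14)
a = T[:, 0]
sigma_rec = (np.sin(np.pi * np.outer(i, i) / (ns + 1)) @ a) \
            / np.sin(np.pi * i / (ns + 1))
assert np.allclose(sigma_rec, sigma)

# shifted solve (lam*I - tau*T) z = b performed entirely in the DST basis
lam, tau = 1.2 + 0.3j, 0.1
b = np.random.default_rng(5).standard_normal(ns)
z = Q.T @ ((Q @ b) / (lam - tau * sigma))
assert np.allclose((lam * np.eye(ns) - tau * T) @ z, b)
```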

Remark 3.4.

It is worth noting that this implementation is also suitable for solving the all-at-once systems arising from spatial discretizations based on compact finite difference, finite element and spectral methods, by simply replacing the identity matrix IsI_{s} in Eq. (3.15) with the mass matrix. Fortunately, for one-dimensional problems the mass matrix is an SPD tridiagonal Toeplitz matrix [25, 30, 13], which can be diagonalized by fast DSTs. For high-dimensional models, the mass matrix is an SPD block tridiagonal Toeplitz matrix in Kronecker product form, and thus it can still be diagonalized via fast DSTs; refer, e.g., to [11].

Remark 3.5.

Finally, it is worth mentioning that although the algorithm and theoretical analysis are presented for the one-dimensional model (2.1), they extend naturally to two-dimensional models (refer to the next section for a discussion), because both are mainly based on the temporal discretization.

4 Numerical experiments

In this section, we assess the performance of the proposed α\alpha-circulant-based preconditioners for solving some examples of one- and two-dimensional model problems (2.1). In all the numerical experiments reported in this section, following the guidelines given in [35], we set α=min{0.5,0.5τ}\alpha=\min\{0.5,0.5\tau\} for the preconditioner 𝒫α\mathcal{P}_{\alpha}. All experiments are performed in MATLAB R2017a on a Gigabyte workstation equipped with an Intel(R) Core(TM) i9-7900X CPU @3.3GHz and 32GB of RAM, running the Windows 10 Education operating system (64-bit version), using double precision floating point arithmetic (with machine epsilon equal to 101610^{-16}). The adaptive Simpler GMRES method (referred to as Ad-SGMRES for short) [55] is employed to solve the right-preconditioned systems in Example 1, while the BiCGSTAB method [56, pp. 244-247] is applied to the left-preconditioned systems in Example 2 using the built-in function available in MATLAB555Since Ad-SGMRES still requires large amounts of storage due to the orthogonalization process, we also use the BiCGSTAB method as an alternative iterative method for solving non-symmetric systems.. The stopping criterion in both algorithms is 𝒓k2/𝒓02<tol=109\|{\bm{r}}_{k}\|_{2}/\|{\bm{r}}_{0}\|_{2}<{\rm tol}=10^{-9}, where 𝒓k{\bm{r}}_{k} is the residual vector at the kk-th Ad-SGMRES or BiCGSTAB iteration. The iterations are always started from the zero vector. All timings shown in the tables are averaged over 20 runs666The MATLAB code will be available at https://github.com/Hsien-Ming-Ku/Group-of-FDEs..

In the tables, the quantity ‘Iter’ denotes the number of Ad-SGMRES or BiCGSTAB iterations, ‘DoF’ is the number of degrees of freedom (unknowns), and ‘CPU’ is the computational time expressed in seconds. The 2-norm of the true relative residual (denoted TRR in the tables, and reported there as log10\log_{10} values) is defined as 𝚃𝚁𝚁=𝑭𝒜𝑼k2/𝑭2\mathtt{TRR}=\|{\bm{F}}-\mathcal{A}{\bm{U}}_{k}\|_{2}/\|{\bm{F}}\|_{2}, and the numerical error (Err) between the approximate and exact solutions at the final time level reads 𝒖𝒖Nt\|{\bm{u}}^{*}-{\bm{u}}^{N_{t}}\|_{\infty}, where 𝑼k{\bm{U}}_{k} is the approximate solution when the preconditioned iterative solver terminates and 𝒖{\bm{u}}^{*} is the exact solution on the mesh. These notations are adopted throughout this section.

Example 1. The first example is the Riesz fractional diffusion equation (2.1) with (a,b)=(0,1)(a,b)=(0,1), T=1T=1, κγ=0.01\kappa_{\gamma}=0.01 and ϕ(x)=15(1+γ/4)x3(1x)3\phi(x)=15(1+\gamma/4)x^{3}(1-x)^{3}. The source term is given by

f(x,t)=15(1+γ/4)etx3(1x)3+15(1+γ/4)κγet2cos(γπ/2)[Γ(4)Γ(4γ)(x3γ+(1x)3γ)3Γ(5)Γ(5γ)(x4γ+(1x)4γ)+3Γ(6)Γ(6γ)(x5γ+(1x)5γ)Γ(7)Γ(7γ)(x6γ+(1x)6γ)].\begin{split}f(x,t)&=15(1+\gamma/4)e^{t}x^{3}(1-x)^{3}+\frac{15(1+\gamma/4)\kappa_{\gamma}e^{t}}{2\cos(\gamma\pi/2)}\Bigg{[}\frac{\Gamma(4)}{\Gamma(4-\gamma)}\big{(}x^{3-\gamma}+(1-x)^{3-\gamma}\big{)}-\frac{3\Gamma(5)}{\Gamma(5-\gamma)}\big{(}x^{4-\gamma}~{}+\\ &\quad~{}(1-x)^{4-\gamma}\big{)}+\frac{3\Gamma(6)}{\Gamma(6-\gamma)}\big{(}x^{5-\gamma}+(1-x)^{5-\gamma}\big{)}-\frac{\Gamma(7)}{\Gamma(7-\gamma)}\big{(}x^{6-\gamma}+(1-x)^{6-\gamma}\big{)}\Bigg{]}.\end{split}

The exact solution is known and reads u(x,t)=15(1+γ/4)etx3(1x)3u(x,t)=15(1+\gamma/4)e^{t}x^{3}(1-x)^{3}. The results of our numerical experiments with the Ad-SGMRES method preconditioned by 𝒫α\mathcal{P}_{\alpha} and 𝒫1\mathcal{P}_{1} for solving the all-at-once discretized systems (2.12) are reported in Tables 1–3.

Table 1: Numerical results of Ad-SGMRES with two different preconditioners on Example 1 with γ=1.2\gamma=1.2.
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hh DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/128 8,128 7 0.033 -9.794 9.7599e-5 19 0.085 -9.412 9.7599e-5
1/256 16,320 7 0.054 -9.257 9.4838e-5 19 0.117 -9.347 9.4838e-5
1/512 32,704 8 0.116 -10.017 9.4147e-5 19 0.247 -9.535 9.4147e-5
1/1024 65,472 8 0.156 -9.537 9.3974e-5 19 0.359 -9.163 9.3975e-5
282^{8} 1/128 32,512 7 0.107 -10.366 9.5721e-6 19 0.258 -9.378 9.5722e-6
1/256 65,280 7 0.158 -9.580 6.8110e-6 19 0.385 -9.318 6.8111e-6
1/512 130,816 7 0.445 -9.020 6.1205e-6 19 1.236 -9.248 6.1208e-6
1/1024 261,888 8 0.787 -9.763 5.9481e-6 19 1.958 -9.156 5.9482e-6
2102^{10} 1/128 130,048 6 0.323 -9.020 5.0121e-6 19 0.964 -9.368 5.0138e-6
1/256 261,120 7 0.756 -9.895 1.2888e-6 19 2.123 -9.309 1.2890e-6
1/512 523,264 7 1.715 -9.326 5.9821e-7 19 4.749 -9.242 5.9870e-7
1/1024 1,047,552 8 3.245 -10.056 4.2607e-7 19 8.103 -9.153 4.2613e-7
Table 2: Numerical results of Ad-SGMRES with two different preconditioners on Example 1 with γ=1.5\gamma=1.5.
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hh DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/128 8,128 8 0.038 -9.892 1.0514e-4 15 0.085 -9.044 1.0515e-4
1/256 16,320 8 0.054 -9.755 9.8789e-5 15 0.095 -9.176 9.8789e-5
1/512 32,704 8 0.112 -9.697 9.7199e-5 15 0.198 -9.513 9.7199e-5
1/1024 65,472 8 0.157 -9.349 9.6802e-5 16 0.308 -9.563 9.6802e-5
282^{8} 1/128 32,512 7 0.112 -9.545 1.4536e-5 16 0.225 -9.831 1.4536e-5
1/256 65,280 7 0.158 -9.103 8.1809e-6 15 0.319 -9.323 8.1810e-6
1/512 130,816 8 0.515 -9.925 6.5922e-6 15 0.979 -9.265 6.5922e-6
1/1024 261,888 8 0.829 -9.587 6.1950e-6 16 1.649 -9.605 6.1950e-6
2102^{10} 1/128 130,048 7 0.392 -9.982 1.3161e-5 15 0.827 -9.008 1.3162e-5
1/256 261,120 7 0.779 -9.423 3.2696e-6 15 1.637 -9.184 3.2692e-6
1/512 523,264 7 1.744 -9.049 9.0813e-7 15 3.769 -9.381 9.0882e-7
1/1024 1,047,552 8 3.221 -9.878 5.1171e-7 16 6.658 -9.631 5.1160e-7
Table 3: Numerical results of Ad-SGMRES with two different preconditioners on Example 1 with γ=1.9\gamma=1.9.
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hh DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/128 8,128 7 0.034 -9.255 1.2052e-4 11 0.052 -9.305 1.2052e-4
1/256 16,320 7 0.047 -9.101 1.0303e-4 11 0.070 -9.195 1.0303e-4
1/512 32,704 8 0.113 -10.012 9.8653e-5 11 0.151 -9.123 9.8653e-5
1/1024 65,472 8 0.159 -10.024 9.7559e-5 11 0.225 -9.058 9.7559e-5
282^{8} 1/128 32,512 7 0.112 -9.831 3.8671e-5 11 0.153 -9.408 3.8671e-5
1/256 65,280 7 0.155 -9.500 1.1924e-5 11 0.235 -9.313 1.1924e-5
1/512 130,816 7 0.456 -9.287 7.5514e-6 11 0.690 -9.255 7.5518e-6
1/1024 261,888 7 0.696 -9.261 6.4585e-6 11 1.093 -9.212 6.4587e-6
2102^{10} 1/128 130,048 6 0.331 -9.416 3.9387e-5 11 0.588 -9.433 3.9387e-5
1/256 261,120 6 0.666 -9.216 9.8111e-6 11 1.187 -9.368 9.8118e-6
1/512 523,264 7 1.736 -9.936 2.4178e-6 11 2.669 -9.317 2.4178e-6
1/1024 1,047,552 7 2.850 -9.941 7.4549e-7 11 4.396 -9.300 7.4557e-7

According to Tables 1–3, we note that Ad-SGMRES with the preconditioner 𝒫α\mathcal{P}_{\alpha} converges much faster than with 𝒫1\mathcal{P}_{1} in terms of both CPU and Iter on Example 1, for the different values of γ\gamma. In terms of accuracy (the values TRR and Err), the two preconditioned Ad-SGMRES methods are almost comparable. The results indicate that introducing the adaptive parameter α(0,1)\alpha\in(0,1) indeed helps improve the performance over 𝒫1\mathcal{P}_{1}. Moreover, the τ\tau-matrix approximation of the Jacobian matrix AA in (3.5) is numerically effective. The iteration count of Ad-SGMRES-𝒫α\mathcal{P}_{\alpha} varies only slightly when NtN_{t} increases (or hh decreases), showing an almost matrix-size independent convergence rate. This favourable numerical scalability of Ad-SGMRES-𝒫α\mathcal{P}_{\alpha} is fully in line with our theoretical analysis presented in Section 3.1, where the eigenvalue distribution of the preconditioned matrices (also partly shown in Fig. 2) is independent of the space-discretization matrix AA. On the other hand, if the diffusion coefficient κγ\kappa_{\gamma} becomes smaller than 10210^{-2}, the performance gap between the preconditioners 𝒫α\mathcal{P}_{\alpha} and 𝒫1\mathcal{P}_{1} becomes even more remarkable; cf. Remark 3.2. However, we do not investigate these circumstances in the present study.

Fig. 2: The eigenvalue distributions of the matrix 𝒜\mathcal{A} and preconditioned matrices 𝒫α1𝒜,𝒫11𝒜\mathcal{P}^{-1}_{\alpha}\mathcal{A},~{}\mathcal{P}^{-1}_{1}\mathcal{A} with γ=1.2,1.5,1.9\gamma=1.2,~{}1.5,~{}1.9 and h=τ=1/64h=\tau=1/64 in Example 1.

Example 2. (the 2D problem [52, 13]) Consider the following Riesz fractional diffusion equation

\begin{cases}\frac{\partial u(x,y,t)}{\partial t}=\kappa_{\gamma_{1}}\frac{\partial^{\gamma_{1}}u(x,y,t)}{\partial|x|^{\gamma_{1}}}+\kappa_{\gamma_{2}}\frac{\partial^{\gamma_{2}}u(x,y,t)}{\partial|y|^{\gamma_{2}}}+f(x,y,t),&(x,y,t)\in\Omega\times(0,T],\\ u(x,y,0)=\phi(x,y),&(x,y)\in\Omega,\\ u(x,y,t)=0,&(x,y,t)\in\partial\Omega\times(0,T),\end{cases} (4.1)

where κγ1=κγ2=0.01\kappa_{\gamma_{1}}=\kappa_{\gamma_{2}}=0.01, T=2T=2, Ω=(0,2)×(0,2)\Omega=(0,2)\times(0,2), and ϕ(x,y)=x4(2x)4y4(2y)4\phi(x,y)=x^{4}(2-x)^{4}y^{4}(2-y)^{4}, so that the source term is exactly given by

\begin{split}f(x,y,t)&=-\frac{1}{3}e^{-t/3}x^{4}(2-x)^{4}y^{4}(2-y)^{4}+\frac{\kappa_{\gamma_{1}}e^{-t/3}}{2\cos(\gamma_{1}\pi/2)}y^{4}(2-y)^{4}\sum^{9}_{\ell=5}\frac{q_{\ell}\Gamma(\ell)\big{[}x^{\ell-1-\gamma_{1}}+(2-x)^{\ell-1-\gamma_{1}}\big{]}}{\Gamma(\ell-\gamma_{1})}\\ &\quad+\frac{\kappa_{\gamma_{2}}e^{-t/3}}{2\cos(\gamma_{2}\pi/2)}x^{4}(2-x)^{4}\sum^{9}_{\ell=5}\frac{q_{\ell}\Gamma(\ell)\big{[}y^{\ell-1-\gamma_{2}}+(2-y)^{\ell-1-\gamma_{2}}\big{]}}{\Gamma(\ell-\gamma_{2})}\end{split}

with q5=16,q6=32,q7=24,q8=8q_{5}=16,~{}q_{6}=-32,~{}q_{7}=24,~{}q_{8}=-8 and q9=1q_{9}=1. The exact solution is u(x,y,t)=et/3x4(2x)4y4(2y)4u(x,y,t)=e^{-t/3}x^{4}(2-x)^{4}y^{4}(2-y)^{4}.

By a derivation similar to the technique described in Section 2.1, we can establish an implicit difference scheme for solving this model problem. For simplicity, we set hx=hy=(ba)/Nh_{x}=h_{y}=(b-a)/N; then it is easy to derive the Jacobian matrix in the following Kronecker product form

A=[(κγ1hγ1Tx)IN1+IN1(κγ2hγ2Ty)],A=-\left[\left(\frac{\kappa_{\gamma_{1}}}{h^{\gamma_{1}}}T_{x}\right)\otimes I_{N-1}+I_{N-1}\otimes\left(\frac{\kappa_{\gamma_{2}}}{h^{\gamma_{2}}}T_{y}\right)\right], (4.2)

where the two SPD Toeplitz matrices are defined by

T_{x}=\begin{bmatrix}\omega^{(\gamma_{1})}_{0}&\omega^{(\gamma_{1})}_{-1}&\omega^{(\gamma_{1})}_{-2}&\cdots&\omega^{(\gamma_{1})}_{3-N}&\omega^{(\gamma_{1})}_{2-N}\\ \omega^{(\gamma_{1})}_{1}&\omega^{(\gamma_{1})}_{0}&\omega^{(\gamma_{1})}_{-1}&\cdots&\omega^{(\gamma_{1})}_{4-N}&\omega^{(\gamma_{1})}_{3-N}\\ \omega^{(\gamma_{1})}_{2}&\omega^{(\gamma_{1})}_{1}&\omega^{(\gamma_{1})}_{0}&\cdots&\omega^{(\gamma_{1})}_{5-N}&\omega^{(\gamma_{1})}_{4-N}\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ \omega^{(\gamma_{1})}_{N-3}&\omega^{(\gamma_{1})}_{N-4}&\omega^{(\gamma_{1})}_{N-5}&\cdots&\omega^{(\gamma_{1})}_{0}&\omega^{(\gamma_{1})}_{-1}\\ \omega^{(\gamma_{1})}_{N-2}&\omega^{(\gamma_{1})}_{N-3}&\omega^{(\gamma_{1})}_{N-4}&\cdots&\omega^{(\gamma_{1})}_{1}&\omega^{(\gamma_{1})}_{0}\end{bmatrix}~{}~{}{\rm and}~{}~{}T_{y}=\begin{bmatrix}\omega^{(\gamma_{2})}_{0}&\omega^{(\gamma_{2})}_{-1}&\omega^{(\gamma_{2})}_{-2}&\cdots&\omega^{(\gamma_{2})}_{3-N}&\omega^{(\gamma_{2})}_{2-N}\\ \omega^{(\gamma_{2})}_{1}&\omega^{(\gamma_{2})}_{0}&\omega^{(\gamma_{2})}_{-1}&\cdots&\omega^{(\gamma_{2})}_{4-N}&\omega^{(\gamma_{2})}_{3-N}\\ \omega^{(\gamma_{2})}_{2}&\omega^{(\gamma_{2})}_{1}&\omega^{(\gamma_{2})}_{0}&\cdots&\omega^{(\gamma_{2})}_{5-N}&\omega^{(\gamma_{2})}_{4-N}\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ \omega^{(\gamma_{2})}_{N-3}&\omega^{(\gamma_{2})}_{N-4}&\omega^{(\gamma_{2})}_{N-5}&\cdots&\omega^{(\gamma_{2})}_{0}&\omega^{(\gamma_{2})}_{-1}\\ \omega^{(\gamma_{2})}_{N-2}&\omega^{(\gamma_{2})}_{N-3}&\omega^{(\gamma_{2})}_{N-4}&\cdots&\omega^{(\gamma_{2})}_{1}&\omega^{(\gamma_{2})}_{0}\end{bmatrix}.

Then AA is a symmetric negative definite matrix (since TxT_{x} and TyT_{y} are SPD), which has no impact on the algorithm and theoretical analyses described in Sections 2–3. The only difference is that the direct application of 𝒫α\mathcal{P}_{\alpha} within Krylov subspace solvers should be avoided, because solving the sequence of dense shifted linear systems in Step-(b) directly would require a huge computational cost and memory storage.

In order to reduce the computational expense, we can follow Eq. (3.13) to approximate the Jacobian matrix AA in (4.2) by the following τ\tau-matrix

\begin{split}\mathscr{T}(A)&=-\left[\left(\frac{\kappa_{\gamma_{1}}}{h^{\gamma_{1}}}\mathscr{T}(T_{x})\right)\otimes I_{N-1}+I_{N-1}\otimes\left(\frac{\kappa_{\gamma_{2}}}{h^{\gamma_{2}}}\mathscr{T}(T_{y})\right)\right]\\ &=-(Q^{\top}_{N-1}\otimes Q^{\top}_{N-1})\left[\left(\frac{\kappa_{\gamma_{1}}}{h^{\gamma_{1}}}\Lambda^{(x)}_{N-1}\right)\otimes I_{N-1}+I_{N-1}\otimes\left(\frac{\kappa_{\gamma_{2}}}{h^{\gamma_{2}}}\Lambda^{(y)}_{N-1}\right)\right](Q_{N-1}\otimes Q_{N-1})\end{split} (4.3)

where 𝒯(Tx)=QN1ΛN1(x)QN1\mathscr{T}(T_{x})=Q^{\top}_{N-1}\Lambda^{(x)}_{N-1}Q_{N-1} and 𝒯(Ty)=QN1ΛN1(y)QN1\mathscr{T}(T_{y})=Q^{\top}_{N-1}\Lambda^{(y)}_{N-1}Q_{N-1} [47, 48, 49]. Then, instead of solving the shifted linear systems in Step-(b), we solve the following sequence of shifted linear systems,

\big{[}\lambda^{(\alpha)}_{n}I_{s}-\tau\mathscr{T}(A)\big{]}{\bm{z}}_{2,n}={\bm{z}}_{1,n},\quad n=1,2,\cdots,N_{t}, (4.4)
\Leftrightarrow(Q^{\top}_{N-1}\otimes Q^{\top}_{N-1})\left[\lambda^{(\alpha)}_{n}I_{s}+\tau\left(\frac{\kappa_{\gamma_{1}}}{h^{\gamma_{1}}}\Lambda^{(x)}_{N-1}\right)\otimes I_{N-1}+\tau I_{N-1}\otimes\left(\frac{\kappa_{\gamma_{2}}}{h^{\gamma_{2}}}\Lambda^{(y)}_{N-1}\right)\right](Q_{N-1}\otimes Q_{N-1}){\bm{z}}_{2,n}={\bm{z}}_{1,n}, (4.5)

where s=(N1)2s=(N-1)^{2}, efficiently by fast discrete sine transforms, avoiding the storage of any dense matrices. The numerical results reported in Tables 4–6 with h=hx=hyh=h_{x}=h_{y} illustrate the effectiveness and robustness of the PinT preconditioner 𝒫α\mathcal{P}_{\alpha} for solving this problem.
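The two-dimensional shifted solves reduce to two-dimensional sine transforms plus a pointwise division. A Python sketch (our illustration, with stand-in spectra for 𝒯(T_x), 𝒯(T_y) and the DST matrix formed explicitly instead of a fast transform):

```python
import numpy as np

m = 12                                     # N - 1 grid points per direction
i = np.arange(1, m + 1)
Q = np.sqrt(2.0 / (m + 1)) * np.sin(np.pi * np.outer(i, i) / (m + 1))
sx = 2.0 - 2.0 * np.cos(np.pi * i / (m + 1))  # stand-in spectra of T(Tx), T(Ty)
sy = sx.copy()
kx, ky, tau, lam = 0.01, 0.01, 0.05, 1.0 + 0.2j

# T(A) = -[(kx*T(Tx)) (x) I + I (x) (ky*T(Ty))], cf. (4.3)
TA = -(np.kron(kx * (Q @ np.diag(sx) @ Q), np.eye(m))
       + np.kron(np.eye(m), ky * (Q @ np.diag(sy) @ Q)))
b = np.random.default_rng(6).standard_normal(m * m)

# diagonal of the shifted operator in the 2D sine basis, cf. (4.5)
d = lam + tau * (kx * sx[:, None] + ky * sy[None, :])
B = b.reshape(m, m)
Z = Q @ ((Q @ B @ Q) / d) @ Q              # DST, pointwise divide, inverse DST
assert np.allclose((lam * np.eye(m * m) - tau * TA) @ Z.ravel(), b)
```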

Table 4: Numerical results of BiCGSTAB with two different preconditioners on Example 2 with (γ1,γ2)=(1.4,1.2)(\gamma_{1},\gamma_{2})=(1.4,1.2).
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hx=hyh_{x}=h_{y} DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/64 254,016 4.0 1.006 -9.195 1.2627e-4 12.0 2.544 -9.290 1.2628e-4
1/128 1,032,256 4.5 5.325 -10.066 8.0645e-5 12.0 12.312 -9.132 8.0646e-5
1/256 4,161,600 4.5 17.597 -9.219 7.8998e-5 12.0 42.376 -9.134 7.8999e-5
1/512 16,711,744 5.0 115.94 -9.814 7.8611e-5 12.5 262.03 -9.151 7.8612e-5
282^{8} 1/64 1,016,064 4.0 4.027 -9.544 7.4633e-5 12.0 10.355 -9.264 7.4633e-5
1/128 4,129,024 4.0 19.419 -9.134 2.1246e-5 12.0 49.827 -9.102 2.1246e-5
1/256 16,646,400 5.0 79.796 -9.446 7.8953e-6 12.0 173.68 -9.092 7.8954e-6
1/512 66,846,976 4.5 426.03 -9.174 5.0729e-6 12.5 1050.3 -9.204 5.0732e-6
2102^{10} 1/64 4,064,256 4.0 16.246 -10.084 7.1404e-5 12.0 41.638 -9.244 7.1404e-5
1/128 16,516,096 4.0 78.194 -9.861 1.8016e-5 12.0 203.61 -9.084 1.8017e-5
1/256 66,585,600 4.0 265.24 -9.567 4.6657e-6 12.0 753.81 -9.072 4.6661e-6
1/512 267,387,904 4.0 2520.7 -9.157 1.3276e-6 12.5 6989.3 -9.209 1.3282e-6
Table 5: Numerical results of BiCGSTAB with two different preconditioners on Example 2 with (γ1,γ2)=(1.5,1.5)(\gamma_{1},\gamma_{2})=(1.5,1.5).
𝒫α\mathcal{P}_{\alpha} 𝒫1\mathcal{P}_{1}
NtN_{t} hx=hyh_{x}=h_{y} DoF Iter CPU TRR Err Iter CPU TRR Err
262^{6} 1/64 254,016 4.0 1.004 -9.297 1.5758e-4 11.0 2.487 -9.255 1.5758e-4
1/128 1,032,256 4.5 5.322 -9.978 8.1963e-5 11.0 11.416 -9.227 8.1963e-5
1/256 4,161,600 5.0 19.756 -10.122 7.9075e-5 11.5 41.569 -9.379 7.9075e-5
1/512 16,711,744 5.0 115.82 -9.676 7.8490e-5 11.5 240.18 -9.345 7.8490e-5
$2^{8}$ 1/64 1,016,064 4.0 4.058 -9.585 1.0778e-4 11.0 9.745 -9.127 1.0778e-4
1/128 4,129,024 4.0 19.609 -9.106 2.9441e-5 11.0 46.275 -9.162 2.9441e-5
1/256 16,646,400 4.5 73.342 -9.454 9.8515e-6 11.5 168.30 -9.259 9.8515e-6
1/512 66,846,976 4.5 424.74 -9.300 5.1188e-6 11.5 965.63 -9.245 5.1183e-6
$2^{10}$ 1/64 4,064,256 4.0 16.343 -10.060 1.0466e-4 11.0 39.182 -9.100 1.0466e-4
1/128 16,516,096 4.0 78.225 -9.813 2.6328e-5 11.0 188.46 -9.126 2.6328e-5
1/256 66,585,600 4.0 265.75 -9.582 6.7380e-6 11.5 733.89 -9.256 6.7382e-6
1/512 267,387,904 4.0 2517.9 -9.282 1.8400e-6 11.5 6278.1 -9.224 1.8401e-6
Table 6: Numerical results of GMRES with two different preconditioners on Example 2 with $(\gamma_{1},\gamma_{2})=(1.7,1.9)$.
$N_t$ | $h_x=h_y$ | DoF | $\mathcal{P}_{\alpha}$: Iter CPU TRR Err | $\mathcal{P}_{1}$: Iter CPU TRR Err
$2^{6}$ 1/64 254,016 4.0 1.047 -9.492 2.3321e-4 11.5 2.513 -10.177 2.3321e-4
1/128 1,032,256 4.0 4.976 -9.058 9.5250e-5 11.5 12.175 -10.188 9.5251e-5
1/256 4,161,600 4.5 17.826 -9.887 7.9355e-5 11.5 41.627 -10.208 7.9355e-5
1/512 16,711,744 4.5 114.13 -9.473 7.8240e-5 11.0 231.76 -9.154 7.8240e-5
$2^{8}$ 1/64 1,016,064 4.0 4.035 -10.062 1.8715e-4 11.0 9.673 -9.339 1.8715e-4
1/128 4,129,024 4.0 19.567 -9.425 4.9102e-5 11.5 48.305 -9.977 4.9102e-5
1/256 16,646,400 4.0 66.497 -9.150 1.4579e-5 11.0 161.38 -9.064 1.4578e-5
1/512 66,846,976 4.0 380.78 -9.011 5.9522e-6 11.0 921.53 -9.115 5.9518e-6
$2^{10}$ 1/64 4,064,256 4.0 16.225 -10.092 1.8428e-4 11.0 38.684 -9.027 1.8428e-4
1/128 16,516,096 4.0 78.158 -9.905 4.6224e-5 11.5 198.25 -10.077 4.6224e-5
1/256 66,585,600 4.0 266.02 -9.760 1.1700e-5 11.5 734.78 -10.020 1.1701e-5
1/512 267,387,904 4.0 2519.5 -9.551 3.0690e-6 11.0 6117.6 -9.100 3.0691e-6

As in Example 1, Tables 4–6 show that GMRES with the preconditioner $\mathcal{P}_{\alpha}$ converges much faster, in terms of both CPU time and iteration counts, than with $\mathcal{P}_{1}$ on this example, for different values of the $\gamma$'s. The accuracy of the two preconditioned solvers is almost identical with respect to TRR and Err. Once again, the results indicate that introducing the adaptive parameter $\alpha\in(0,1)$ indeed improves on the performance of $\mathcal{P}_{1}$; the faster convergence of $\mathcal{P}_{\alpha}$ is computationally attractive, especially when the diffusion coefficients become smaller than $10^{-2}$; cf. Remark 4. The $\tau$-matrix approximation to the Jacobian matrix $A$ in (3.5) remains very effective for this two-dimensional model problem (4.1); for more details, one can experiment with our publicly accessible MATLAB codes. The iteration count of GMRES-$\mathcal{P}_{\alpha}$ varies only slightly as $N_t$ increases (or $h$ decreases), showing an almost matrix-size-independent convergence rate for this problem as well. This favourable numerical scalability confirms our theoretical analysis, since the eigenvalue distribution of the preconditioned matrices established in Section 3.1 is independent of the space-discretization matrix $A$.
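To make the $\tau$-matrix approximation concrete, the following sketch (Python rather than our MATLAB codes; the function names are hypothetical) obtains the eigenvalues of a $\tau$-algebra matrix directly from the first column of its symmetric Toeplitz generator, by sampling the cosine symbol at the DST-I grid points $\theta_k = k\pi/(m+1)$:

```python
import numpy as np

def dst1_matrix(m):
    # Orthonormal DST-I matrix Q (symmetric and orthogonal).
    j = np.arange(1, m + 1)
    return np.sqrt(2.0/(m + 1)) * np.sin(np.outer(j, j) * np.pi/(m + 1))

def tau_eigenvalues(t):
    """Eigenvalues of the tau matrix generated by the symbol
    f(theta) = t[0] + 2*sum_{j>=1} t[j]*cos(j*theta), sampled at
    theta_k = k*pi/(m+1), k = 1..m; t is the first column of the
    symmetric Toeplitz generator."""
    m = len(t)
    theta = np.arange(1, m + 1) * np.pi / (m + 1)
    jj = np.arange(1, m)
    return t[0] + 2.0 * np.cos(np.outer(theta, jj)) @ t[1:]
```

A convenient sanity check: the tridiagonal Laplacian $\mathrm{tridiag}(-1,2,-1)$ itself lies in the $\tau$ algebra, so $Q\,\mathrm{diag}(d)\,Q$ must reproduce it exactly; for a genuinely dense generator (such as the fractional centered difference coefficients), $Q\,\mathrm{diag}(d)\,Q$ is only an approximation of the Toeplitz matrix, but its application costs a single DST per solve.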

5 Conclusions

In this note, we revisit the all-at-once linear system arising from the BDF$p$ temporal discretization of evolutionary PDEs. In particular, we present the BDF2 scheme for the RFDE model as our case study, where the resulting all-at-once system is a BLTT linear system with a low-rank matrix perturbation. The conditioning of the all-at-once coefficient matrix is studied and tight bounds are provided. We then adapt the generalized BC preconditioner to such all-at-once systems, proving the invertibility of the preconditioning matrix, which had not been established in previous studies. Our analysis demonstrates the superiority of the generalized BC preconditioner over the BC preconditioner. Moreover, the spectral properties of the preconditioned system and the convergence behavior of the preconditioned Krylov subspace solver have been investigated. Using the $\tau$-matrix approximation of the dense Jacobian matrix $A$, we derive a memory-efficient implementation of the generalized BC preconditioner for solving one- and two-dimensional RFDE model problems. Numerical results have been reported to show the effectiveness of the generalized BC preconditioner.

On the other hand, in light of the analysis and implementation of the generalized BC preconditioner, the BDF2 scheme implemented via PinT preconditioned Krylov solvers can easily be extended to other spatial discretization schemes for RFDEs (and to other boundary conditions). Furthermore, our study may inspire the development of new positivity-preserving parallel numerical methods for certain evolutionary PDEs. As an outlook, the extension of the generalized BC preconditioner and of its parallel implementation to nonlinear RFDEs [57, 32] (even when non-uniform temporal steps are combined with the BDF2 scheme; see [34, 18] for a short discussion) remains an interesting topic for further research.

Acknowledgments

The authors would like to thank Dr. Xin Liang and Prof. Shu-Lin Wu for their insightful suggestions and encouragement. This research is supported by NSFC (11801463 and 61876203) and the Applied Basic Research Project of Sichuan Province (2020YJ0007). The first author is also grateful to Prof. Hai-Wei Sun for helpful discussions during his visit to the University of Macau.

References

  • [1] S.Ş. Bayın, Definition of the Riesz derivative and its application to space fractional quantum mechanics, J. Math. Phys., 57(12) (2016), Article No. 123501, 12 pages. DOI: 10.1063/1.4968819.
  • [2] R.L. Magin, Fractional Calculus in Bioengineering, Begell House Publishers, Danbury, CT (2006).
  • [3] M.M. Meerschaert, C. Tadjeran, Finite difference approximations for two-sided space-fractional partial differential equations, Appl. Numer. Math., 56(1) (2006), pp. 80-90.
  • [4] E. Scalas, R. Gorenflo, F. Mainardi, Fractional calculus and continuous-time finance, Physica A, 284(1-4) (2000), pp. 376-384.
  • [5] A.A. Kilbas, H.M. Srivastava, J.J. Trujillo, Theory and applications of fractional differential equations, Elsevier, Amsterdam, Netherlands (2006).
  • [6] R. Metzler, J. Klafter, The random walk’s guide to anomalous diffusion: a fractional dynamics approach, Phys. Rep., 339(1) (2000), pp. 1-77.
  • [7] M. Chen, Y. Wang, X. Cheng, W. Deng, Second-order LOD multigrid method for multidimensional Riesz fractional diffusion equation, BIT, 54(3) (2014), pp. 623-647.
  • [8] C. Çelik, M. Duman, Crank-Nicolson method for the fractional diffusion equation with the Riesz fractional derivative, J. Comput. Phys., 231(4) (2012), pp. 1743-1750.
  • [9] W.Y. Tian, H. Zhou, W. Deng, A class of second order difference approximations for solving space fractional diffusion equations, Math. Comp., 84(294) (2015), pp. 1703-1727.
  • [10] H.-L. Liao, P. Lyu, S. Vong, Second-order BDF time approximation for Riesz space-fractional diffusion equations, Int. J. Comput. Math., 95(1) (2018), pp. 144-158.
  • [11] D. Hu, X. Cao, A fourth-order compact ADI scheme for two-dimensional Riesz space fractional nonlinear reaction-diffusion equation, Int. J. Comput. Math., 97(9) (2020), pp. 1928-1948. DOI: 10.1080/00207160.2019.1671587.
  • [12] X. Yue, S. Shu, X. Xu, W. Bu, K. Pan, Parallel-in-time multigrid for space-time finite element approximations of two-dimensional space-fractional diffusion equations, Comput. Math. Appl., 78(11) (2019), pp. 3471-3484.
  • [13] Y. Xu, Y. Zhang, J. Zhao, Backward difference formulae and spectral Galerkin methods for the Riesz space fractional diffusion equation, Math. Comput. Simul., 166 (2019), pp. 494-507.
  • [14] Y. Xu, Y. Zhang, J. Zhao, General linear and spectral Galerkin methods for the Riesz space fractional diffusion equation, Appl. Math. Comput., 364 (2020), Article No. 124664, 12 pages. DOI: 10.1016/j.amc.2019.124664.
  • [15] J. Zhao, Y. Zhang, Y. Xu, Implicit Runge-Kutta and spectral Galerkin methods for Riesz space fractional/distributed-order diffusion equation, Comput. Appl. Math., 39(2) (2020), Article No. 47, 27 pages. DOI: 10.1007/s40314-020-1102-3.
  • [16] H.-K. Pang, H.-W. Sun, Fast numerical contour integral method for fractional diffusion equations, J. Sci. Comput., 66(1) (2016), pp. 41-66.
  • [17] M.J. Gander, M. Neumüller, Analysis of a new space-time parallel multigrid algorithm for parabolic problems, SIAM J. Sci. Comput., 38(4) (2016), pp. A2173-A2208.
  • [18] R.D. Falgout, M. Lecouvez, C.S. Woodward, A parallel-in-time algorithm for variable step multistep methods, J. Comput. Sci., 37 (2019), Article No. 101029, 12 pages. DOI: 10.1016/j.jocs.2019.101029.
  • [19] S.-L. Wu, T. Zhou, Acceleration of the MGRiT algorithm via the diagonalization technique, SIAM J. Sci. Comput., 41(5) (2019), pp. A3421-A3448.
  • [20] M.E. Farquhar, T.J. Moroney, Q. Yang, I.W. Turner, GPU accelerated algorithms for computing matrix function vector products with applications to exponential integrators and fractional diffusion, SIAM J. Sci. Comput., 38(3) (2016), pp. C127-C149.
  • [21] S.-L. Wu, T. Zhou, Fast parareal iterations for fractional diffusion equations, J. Comput. Phys., 329 (2017), pp. 210-226.
  • [22] Y. Maday, E.M. Rønquist, Parallelization in time through tensor-product space-time solvers, C. R. Math., 346(1-2) (2008), pp. 113-118.
  • [23] X.-M. Gu, T.-Z. Huang, X.-L. Zhao, H.-B. Li, L. Li, Strang-type preconditioners for solving fractional diffusion equations by boundary value methods, J. Comput. Appl. Math., 277 (2015), pp. 73-86.
  • [24] M.J. Gander, 50 Years of Time Parallel Time Integration, in Multiple Shooting and Time Domain Decomposition Methods (T. Carraro, M. Geiger, S. Körkel, et al., eds.), Contributions in Mathematical and Computational Sciences, vol 9. Springer, Cham, Switzerland (2017), pp. 69-113. DOI: 10.1007/978-3-319-23321-5_3.
  • [25] S.-L. Lei, Y.-C. Huang, Fast algorithms for high-order numerical methods for space-fractional diffusion equations, Int. J. Comput. Math., 94(5) (2017), pp. 1062-1078.
  • [26] T. Breiten, V. Simoncini, M. Stoll, Low-rank solvers for fractional differential equations, Electron. Trans. Numer. Anal., 45 (2016), pp. 107-132.
  • [27] A.T. Weaver, S. Gürol, J. Tshimanga, M. Chrust, A. Piacentini, “Time”-parallel diffusion-based correlation operators, Q. J. R. Meteorol. Soc., 144(716) (2018), pp. 2067-2088.
  • [28] D. Bertaccini, F. Durastante, Block structured preconditioners in tensor form for the all-at-once solution of a finite volume fractional diffusion equation, Appl. Math. Lett., 95 (2019), pp. 92-97.
  • [29] I. Podlubny, A. Chechkin, T. Skovranek, Y.Q. Chen, B.M.V. Jara, Matrix approach to discrete fractional calculus II: Partial fractional differential equations, J. Comput. Phys., 228(8) (2009), pp. 3137-3153.
  • [30] M. Ran, C. Zhang, A high-order accuracy method for solving the fractional diffusion equations, J. Comp. Math., 38(2) (2020), pp. 239-253.
  • [31] L. Banjai, D. Peterseim, Parallel multistep methods for linear evolution problems, IMA J. Numer. Anal., 32(3) (2012), pp. 1217-1240.
  • [32] Y.-L. Zhao, P.-Y. Zhu, X.-M. Gu, X.-L. Zhao, H.-Y. Jian, A preconditioning technique for all-at-once system from the nonlinear tempered fractional diffusion equation, J. Sci. Comput., 83(1) (2020), Article No. 10, 27 pages. DOI: 10.1007/s10915-020-01193-1.
  • [33] E. McDonald, J. Pestana, A. Wathen, Preconditioning and iterative solution of all-at-once systems for evolutionary partial differential equations, SIAM J. Sci. Comput., 40(2) (2018), pp. A1012-A1033.
  • [34] A. Goddard, A. Wathen, A note on parallel preconditioning for all-at-once evolutionary PDEs, Electron. Trans. Numer. Anal., 51 (2019), pp. 135-150.
  • [35] X.-L. Lin, M. Ng, An all-at-once preconditioner for evolutionary partial differential equations, arXiv preprint, arXiv:2002.01108, submitted to SIAM J. Sci. Comput., 4 Feb. 2020, 17 pages.
  • [36] S.-L. Wu, H. Zhang, T. Zhou, Solving time-periodic fractional diffusion equations via diagonalization technique and multigrid, Numer. Linear Algebra Appl., 25(5) (2018), Article No. e2178, 24 pages. DOI: 10.1002/nla.2178.
  • [37] O. Bokanowski, K. Debrabant, Backward differentiation formula finite difference schemes for diffusion equations with an obstacle term, IMA J. Numer. Anal., to appear, 6 Jul. 2020, 29 pages. DOI: 10.1093/imanum/draa014.
  • [38] M. Donatelli, M. Mazza, S. Serra-Capizzano, Spectral analysis and structure preserving preconditioners for fractional diffusion equations, J. Comput. Phys., 307 (2016), pp. 262-279.
  • [39] K. Burrage, Parallel and Sequential Methods for Ordinary Differential Equations, Oxford University Press, New York, NY (1995).
  • [40] U.M. Ascher, L.R. Petzold, Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, SIAM, Philadelphia, PA (1998).
  • [41] C. Lubich, D. Mansour, C. Venkataraman, Backward difference time discretization of parabolic differential equations on evolving surfaces, IMA J. Numer. Anal., 33(4) (2013), pp. 1365-1385.
  • [42] E. Emmrich, Error of the two-step BDF for the incompressible Navier-Stokes problem, ESAIM-Math. Model. Numer. Anal., 38(5) (2004), pp. 757-764.
  • [43] V. Kučera, M. Vlasák, A priori diffusion-uniform error estimates for nonlinear singularly perturbed problems: BDF2, midpoint and time DG, ESAIM-Math. Model. Numer. Anal., 51(2) (2017), pp. 537-563.
  • [44] H. Nishikawa, On large start-up error of BDF2, J. Comput. Phys., 392 (2019), pp. 456-461.
  • [45] Q. Zhang, Y. Ren, X. Lin, Y. Xu, Uniform convergence of compact and BDF methods for the space fractional semilinear delay reaction-diffusion equations. Appl. Math. Comput., 358 (2019), pp. 91-110.
  • [46] C. Li, M.-M. Li, X. Sun, Finite difference/Galerkin finite element methods for a fractional heat conduction-transfer equation, Math. Meth. Appl. Sci., to appear, 30 Oct. 2019, 20 pages. DOI: 10.1002/mma.5984.
  • [47] D. Bini, F. Di Benedetto, A new preconditioner for the parallel solution of positive definite Toeplitz systems, Proceedings of the Second Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’90), ACM, New York, NY (1990), pp. 220-223. DOI: 10.1145/97444.97688.
  • [48] F. Di Benedetto, Preconditioning of block Toeplitz matrices by sine transforms, SIAM J. Sci. Comput., 18(2) (1997), pp. 499-515.
  • [49] S. Serra, Superlinear PCG methods for symmetric Toeplitz systems, Math. Comp., 68(226) (1999), pp. 793-803.
  • [50] Z. Hao, Z. Zhang, R. Du, Fractional centered difference scheme for high-dimensional integral fractional Laplacian, J. Comput. Phys., 424 (2021), Article No. 109851, 17 pages. DOI: 10.1016/j.jcp.2020.109851.
  • [51] Q. Yang, F. Liu, I. Turner, Numerical methods for fractional partial differential equations with Riesz space fractional derivatives, Appl. Math. Model., 34(1) (2010), pp. 200-218.
  • [52] Z.-P. Hao, Z.-Z. Sun, W.-R. Cao, A fourth-order approximation of fractional derivatives with its applications, J. Comput. Phys., 281 (2015), pp. 787-805.
  • [53] W.W. Hager, Updating the inverse of a matrix, SIAM Rev. 31(2) (1989), pp. 221-239.
  • [54] X.-M. Gu, S.-L. Wu, A parallel-in-time iterative algorithm for Volterra partial integro-differential problems with weakly singular kernel, J. Comput. Phys., 417 (2020), Article No. 109576, 17 pages. DOI: 10.1016/j.jcp.2020.109576.
  • [55] P. Jiránek, M. Rozložník, Adaptive version of Simpler GMRES, Numer. Algorithms, 53(1) (2010), pp. 93-112.
  • [56] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM, Philadelphia, PA (2003).
  • [57] M.J. Gander, L. Halpern, Time parallelization for nonlinear problems based on diagonalization, in Domain Decomposition Methods in Science and Engineering XXIII (C.-O. Lee, X.-C. Cai, D.E. Keyes, et al., eds.), Lecture Notes in Computational Science and Engineering, vol. 116, Springer, Cham, Switzerland (2017), pp. 163-170. DOI: 10.1007/978-3-319-52389-7_15.