This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Fast algebraic multigrid for block-structured dense and Toeplitz-like-plus-Cross systems arising from nonlocal diffusion problems thanks: This work was supported by NSFC 11601206.

Minghua Chen Corresponding author. School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou 730000, P.R. China (Email: [email protected])    Rongjun Cao School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou 730000, P.R. China (Email: [email protected])    Stefano Serra-Capizzano Department of Science and High Technology, University of Insubria, Via Valleggio 11, 22100 Como, Italy &\& Department of Information Technology, Division of Scientific Computing, Uppsala University - ITC, L ägerhyddsv. 2, hus 2, P.O. Box 337, SE-751 05, Uppsala, Sweden (Email: [email protected], [email protected])
Abstract

Algebraic multigrid (AMG) is one of the most efficient iterative methods for solving large sparse system of equations. However, how to build/check restriction and prolongation operators in practical of AMG methods for nonsymmetric sparse systems is still an interesting open question [Brezina, Manteuffel, McCormick, Runge, and Sanders, SIAM J. Sci. Comput. (2010); Manteuffel and Southworth, SIAM J. Sci. Comput. (2019)]. This paper deals with the block-structured dense and Toeplitz-like-plus-Cross systems, including nonsymmetric indefinite, symmetric positive definite (SPD), arising from nonlocal diffusion problem and peridynamic problem. The simple (traditional) restriction operator and prolongation operator are employed in order to handle such block-structured dense and Toeplitz-like-plus-Cross systems, which is convenient and efficient when employing a fast AMG. We focus our efforts on providing the detailed proof of the convergence of the two-grid method for such SPD situations. The numerical experiments are performed in order to verify the convergence with a computational cost of only 𝒪(NlogN)\mathcal{O}(N\mbox{log}N) arithmetic operations, by using few fast Fourier transforms, where NN is the number of the grid points. To the best of our knowledge, this is the first contribution regarding Toeplitz-like-plus-Cross linear systems solved by means of a fast AMG.

keywords:
Algebraic multigrid, nonlocal diffusion problem, peridynamic problem, block-structured dense system, Toeplitz-like-plus-Cross system, fast Fourier transform
AMS:
65M55,74A70, 65T50

1 Introduction

Large, sparse, block-structured linear systems arise in a wide variety of applications throughout computational science and engineering involving advection-diffusion flow [42], image process [32], Markov chains [43], Biot’s consolidation model [36], Navier-Stocks equations and saddle point problems [7]. In this paper we study the fast algebraic multigrid for solving the block-structured dense linear systems, stemming from nonlocal problems [2, 11, 19, 26] by the piecewise quadratic polynomial collocation approximations, whose associated matrix can be expressed as a 2×22\times 2 block structure

(1) 𝒜u=(ABCD)(vw)=(bfbg),\mathcal{A}u=\left(\begin{array}[]{cc}A&B\\ C&D\end{array}\right)\left(\begin{array}[]{c}v\\ w\end{array}\right)=\left(\begin{array}[]{c}b_{f}\\ b_{g}\end{array}\right),

with the coefficient matrices AM×MA\in\mathbb{R}^{M\times M}, BM×NB\in\mathbb{R}^{M\times N}, CN×MC\in\mathbb{R}^{N\times M} and DN×ND\in\mathbb{R}^{N\times N}.

Algebraic multigrid (AMG) is one of the most efficient iterative methods for solving large-scale system of equations [37, 48]. In the past decades, AMG methods for linear systems having Toeplitz coefficient matrices with scalar entries have been widely studied [14] including elliptic PDEs [37, 39, 47, 48], fractional PDEs [22, 25, 35] and nonlocal PDEs [15, 17]. Some papers have investigated the case of block entries, where the entries are small generic matrices of fixed size instead of scalars [18, 24, 28]. Only few papers have investigated block-structured sparse linear systems (1). For example, setting up partial prolongations operators with a Galerkin coarse grid matrix, a new AMG approach for Stokes problem are designed in [33]. Constructing the corresponding tentative transfer operators, a fully aggregation-based AMG is developed for nonlinear contact problems of saddle point [46]. Using approximate ideal restriction (AIR) operators, AIR AMG method for space-time hybridizable discontinuous Galerkin discretization of advection-dominated flows are investigates [42]. Defining the interpolation matrix with the coarse coefficient vector, multilevel Markov chain Monte Carlo AMG algorithms are performed. Involving sparse integral transfer operators, towards adaptive smoothed aggregation AMG for nonsymmetric problems are considered [10]. A transfer operator based on the fractional approximation property, two-grid methods convergence in norm of nonsymmetric algebraic multigrid are presented [10]. However, how to build/check restriction and prolongation operators in practical of AMG methods for nonsymmetric sparse systems is still an interesting open question [10, 31]. In particular, how to develop/design fast AMG for block-structured dense linear systems (1) is still an interesting problems, since the above special prolongation/transfer operators are not easy to be employed in connection with the fast Fourier transform.

In this work, the simple (traditional) restriction operator and prolongation operator are used in order to handle such block-structured dense systems (1), including nonsymmetric indefinite system, symmetric positive definite (SPD), Toeplitz-plus-diagonal systems, which derive from the nonlocal problems discussed in [2, 11, 19, 26]. In general, it is still not at all easy for dense stiffness matrices [3, 4, 8, 17], unless we can reduce the problem to the Toeplitz setting and we know the symbol, its zeros, and their orders [39]. Instead we will focus our attention on first answering such a question for a two-level setting, since it is useful from a theoretical point of view as first step: in fact the study of the MGM convergence usually begins from the convergence analysis of the two-grid method (TGM) [35, 37, 48]. We focus our attention in providing a detailed proof of the convergence of the two-grid method (TGM) for the considered SPD linear systems. To the best of our knowledge, this is the first time that a fast AMG is studied for the block-structured dense linear systems as those reported in (1).

The outline of this paper is as follows. In the next section, we introduce block-structured dense systems including applications in nonlocal diffusion problems and peridynamic problem by the piecewise quadratic polynomial collocation. In Section 3, block-structured V-cycle AMG algorithm by Fast fourier transform for Toeplitz-like-plus-Cross systems are designed. Convergence rate of the two-grid method is analyzed in Section 4. To show the effectiveness of the presented schemes, results of numerical experiments are reported in section 5. Finally, we conclude the paper with some remarks and open problems.

2 Block-structured dense systems applications

Nonlocal diffusion problems have been used to model very different scientific phenomena occurring in various applied fields, for example in biology, particle systems, image processing, coagulation models, mathematical finance, etc. [2, 26]. Recently, the nonlocal volume-constrained diffusion problems, the so-called nonlocal model for distinguishing the nonlocal diffusion problems, attracted the wide interest of scientists [26], where the linear scalar peridynamic model can be considered as a special case [26, 40]. For example, the nonlocal peridynamic (PD) model is becoming an attractive emerging tool for the multiscale material simulations of crack nucleation and growth, fracture, and failure of composites [40].

Let the piecewise quadratic base functions be defined by [5, p. 37]

ϕm(x)={xxm1h2x(xm1+xm)h,x[xm1,xm],xm+1xh(xm+1+xm)2xh,x[xm,xm+1],0,otherwise;\phi_{m}(x)=\left\{\begin{array}[]{rl}\frac{x-x_{m-1}}{h}\frac{2x-(x_{m-1}+x_{m})}{h},&x\in\left[x_{m-1},x_{m}\right],\\ \frac{x_{m+1}-x}{h}\frac{(x_{m+1}+x_{m})-2x}{h},&x\in\left[x_{m},x_{m+1}\right],\\ 0,&{\rm otherwise};\end{array}\right.

and

ϕm12(x)={4(xxm1)(xmx)h2,x[xm1,xm],0,otherwise.\phi_{m-\frac{1}{2}}(x)=\left\{\begin{array}[]{rl}\frac{4(x-x_{m-1})(x_{m}-x)}{h^{2}},&x\in\left[x_{m-1},x_{m}\right],\\ 0,&{\rm otherwise}.\end{array}\right.

Then the piecewise Lagrange quadratic interpolant of u(x)u(x) is

(2) uQ(x)=j=rN+ru(xj)ϕj(x)+j=rN+r1u(xj+12)ϕj+12(x).u_{Q}(x)=\sum_{j=-r}^{N+r}u(x_{j})\phi_{j}(x)+\sum_{j=-r}^{N+r-1}u(x_{j+\frac{1}{2}})\phi_{j+\frac{1}{2}}(x).

In this work, we mainly focus on two types of nonlocal problems approximated by the piecewise quadratic polynomial collocation, which give raise to the block-structured dense systems expressed in (1).

2.1 Application in nonlocal model

Consider the time-dependent nonlocal diffusion problem, whose prototype is [2, 6, 11, 19]

(3) {utu=fonΩ,t>0,u(x,0)=u0onΩ,\left\{\begin{split}u_{t}-\mathcal{L}u&=f&~{}~{}{\rm on}&~{}~{}\Omega,\,t>0,\\ u(x,0)&=u_{0}&~{}~{}{\rm on}&~{}~{}\Omega,\end{split}\right.

with Dirichlet boundary conditions. The nonlocal operator \mathcal{L} is defined by

u(x)=ΩJ(|xy|)[u(x)u(y)]𝑑yxΩ.\begin{split}\mathcal{L}u(x)=\int_{\Omega}J(|x-y|)\left[u(x)-u(y)\right]dy~{}~{}\forall x\in\Omega.\end{split}

Here J(x)1x1+2s,s[0.5,1)J(x)\sim\frac{1}{x^{1+2s}},~{}~{}s\in[-0.5,1) is a radial probability density with a nonnegative symmetric dispersal kernel.

We briefly review some basic relevant notions concerning the piecewise quadratic polynomial collocation approximations for the corresponding stationary problem

abu(x)u(y)|xy|γ𝑑y=f(x),0<γ<1,\int^{b}_{a}\frac{u(x)-u(y)}{|x-y|^{\gamma}}dy=f(x),\quad 0<\gamma<1,

which leads to the nonsymmetric and indefinite system [11, 19]

(4) 𝒜hUh=ηh,γ(Fh+Kh)with𝒜h=(𝒟100𝒟2)(𝒬𝒫𝒩)\begin{split}\mathcal{A}_{h}U_{h}=\eta_{h,\gamma}\cdot(F_{h}+K_{h})~{}~{}{\rm with}~{}~{}\mathcal{A}_{h}=\left(\begin{matrix}\mathcal{D}_{1}&0\\ 0&\mathcal{D}_{2}\end{matrix}\right)-\left(\begin{matrix}\mathcal{M}&\mathcal{Q}\\ \mathcal{P}&\mathcal{N}\end{matrix}\right)\end{split}

and ηh,γ=(3γ)(2γ)(1γ)h1γ\eta_{h,\gamma}=\frac{(3-\gamma)(2-\gamma)(1-\gamma)}{h^{1-\gamma}}. Here the diagonal matrices 𝒟1\mathcal{D}_{1} and 𝒟2\mathcal{D}_{2} are given by

𝒟1=diag(d1,d2,,dN1),𝒟2=diag(d12,d32,,dN12)\mathcal{D}_{1}={\rm diag}\left(d_{1},d_{2},\ldots,d_{N-1}\right),~{}\mathcal{D}_{2}={\rm diag}\left(d_{\frac{1}{2}},d_{\frac{3}{2}},\ldots,d_{N-\frac{1}{2}}\right)

with

di2=(3γ)(2γ)((i2)1γ+(Ni2)1γ),i=1,2,,2N1.\begin{split}d_{\frac{i}{2}}=(3-\gamma)(2-\gamma)\left(\left(\frac{i}{2}\right)^{1-\gamma}+\left(N-\frac{i}{2}\right)^{1-\gamma}\right),\quad i=1,2,\cdots,2N-1.\end{split}

The square matrices \mathcal{M}, 𝒩\mathcal{N} and rectangular matrices 𝒫\mathcal{P}, 𝒬\mathcal{Q} are, respectively, defined by

=toeplitz(m0,m1,,mN2),𝒩=toeplitz(n0,n1,,nN1),𝒫=toeplitz([p0,p0,p1,,pN2],[p0,p1,,pN2]),𝒬=toeplitz([q0,q1,,qN2],[q0,q0,q1,,qN2]).\begin{split}\mathcal{M}&={\rm toeplitz}\left(m_{0},m_{1},\ldots,m_{N-2}\right),~{}~{}\mathcal{N}={\rm toeplitz}\left(n_{0},n_{1},\ldots,n_{N-1}\right),\\ \mathcal{P}&={\rm toeplitz}([p_{0},p_{0},p_{1},\dots,p_{N-2}],[p_{0},p_{1},\dots,p_{N-2}]),\\ \mathcal{Q}&={\rm toeplitz}([q_{0},q_{1},\dots,q_{N-2}],[q_{0},q_{0},q_{1},\dots,q_{N-2}]).\end{split}

The corresponding coefficients are computed by m0=2(1+γ)m_{0}=2(1+\gamma), n0=(2γ)2γ+1n_{0}={(2-\gamma)2^{\gamma+1}} and

p0=4[(32)3γ(12)3γ](3γ)[(32)2γ+3(12)2γ],qk=8((k+1)3γk3γ)+4(3γ)((k+1)2γ+k2γ),k0,mk=4[(k+1)3γ(k1)3γ](3γ)[(k+1)2γ+6k2γ+(k1)2γ],k1\begin{split}p_{0}&=4\left[\left(\frac{3}{2}\right)^{3-\gamma}-\left(\frac{1}{2}\right)^{3-\gamma}\right]-(3-\gamma)\left[\left(\frac{3}{2}\right)^{2-\gamma}+3\left(\frac{1}{2}\right)^{2-\gamma}\right],\\ q_{k}&=-8\left((k+1)^{3-\gamma}-k^{3-\gamma}\right)+4(3-\gamma)\left((k+1)^{2-\gamma}+k^{2-\gamma}\right),\,k\geq 0,\\ m_{k}&=4\left[(k+1)^{3-\gamma}\!-\!(k-1)^{3-\gamma}\right]-(3-\gamma)\left[(k+1)^{2-\gamma}+6k^{2-\gamma}+(k-1)^{2-\gamma}\right],~{}k\geq 1\end{split}

with pk=mk+12p_{k}=m_{k+\frac{1}{2}}, nk=qk12n_{k}=q_{k-\frac{1}{2}} for k1k\geq 1.

The boundary data KK is given by

K=(η1,η2,,ηN1,η12,η32,,ηN12)Tu0+(ηN1,ηN2,,η1,ηN12,ηN32,,η12)TuN\begin{split}K&=\left(\eta_{1},\eta_{2},\cdots,\eta_{N-1},\eta_{\frac{1}{2}},\eta_{\frac{3}{2}},\cdots,\eta_{N-\frac{1}{2}}\right)^{T}u_{0}\\ &\quad+\left(\eta_{N-1},\eta_{N-2},\cdots,\eta_{1},\eta_{N-\frac{1}{2}},\eta_{N-\frac{3}{2}},\cdots,\eta_{\frac{1}{2}}\right)^{T}u_{N}\end{split}

with η12=(2γ)(1γ)2γ1\eta_{\frac{1}{2}}={{(2-\gamma)(1-\gamma)}2^{\gamma-1}} and

ηi/2=4[i3γ(i1)3γ](3γ)[3i2γ+(i1)2γ(2γ)i1γ],2i2N1.\eta_{i/2}=4\left[i^{3-\gamma}\!-\!(i-1)^{3-\gamma}\right]-(3-\gamma)\left[3i^{2-\gamma}+\left(i-1\right)^{2-\gamma}-(2-\gamma)i^{1-\gamma}\right],~{}~{}2\leq i\leq 2N-1.

2.2 Application in Peridynamic model

Let us consider the following time-dependent peridynamic/nonlocal volume-constrained diffusion problem [17, 26, 40]

(5) {utδu=fonΩ,t>0,u(x,0)=u0onΩΩ,u=gonΩ,t>0.\left\{\begin{split}u_{t}-\mathcal{L}_{\delta}u&=f&~{}~{}{\rm on}&~{}~{}\Omega,\,t>0,\\ u(x,0)&=u_{0}&~{}~{}{\rm on}&~{}~{}\Omega\cup\Omega_{\mathcal{I}},\\ u&=g&~{}~{}{\rm on}&~{}~{}\Omega_{\mathcal{I}},t>0.\end{split}\right.

The nonlocal operator δ\mathcal{L}_{\delta} is defined by [26]

δu(x)=Bδ(x)γδ(|xy|)[u(y)u(x)]𝑑yxΩ\begin{split}\mathcal{L}_{\delta}u(x)=\int_{B_{\delta}(x)}\gamma_{\delta}(|x-y|)\left[u(y)-u(x)\right]dy\ \ ~{}~{}\forall x\in\Omega\end{split}

with Bδ(x)={y:|yx|<δ}B_{\delta}(x)=\{y\in\mathbb{R}:|y-x|<\delta\} denoting a neighborhood centered at xx of radius δ\delta, which is the horizon parameter and represents the size of nonlocality; the symmetric nonlocal kernel is defined as γδ(|xy|)=0\gamma_{\delta}(|x-y|)=0 if yBδ(x)y\not\in B_{\delta}(x).

Before we start to discuss this problem we shall briefly review few preliminary notions regarding the piecewise quadratic polynomial collocation approximations for the corresponding stationary problem

(6) {δu=fonΩ,u=gonΩ,\left\{\begin{split}-\mathcal{L}_{\delta}u&=f~{}~{}{\rm on}~{}~{}\Omega,\\ u&=g~{}~{}\,{\rm on}~{}~{}\Omega_{\mathcal{I}},\end{split}\right.

where u=gu=g denotes a volumetric constraint imposed on a volume Ω\Omega_{\mathcal{I}} that has a nonzero volume and is made to be disjoint from Ω\Omega. In order to keep the expression simple, below we assume the unit interval Ω\Omega with the volumetric constraint domain Ω=[δ,0][1,1+δ]\Omega_{\mathcal{I}}=[-\delta,0]\cup[1,1+\delta], but everything can be shifted to an arbitrary interval [a,b][a,b]. For convenience, we focus on the special case where the kernel γδ(s)\gamma_{\delta}(s) is taken to be a constant, i.e., γδ(s)=3δ3\gamma_{\delta}(s)=3\delta^{-3} [17, 26, 44]. More general kernel types [26, 44] can be studied in a similar manner.

Now, we introduce and discuss the discretization scheme of (6). Let the ratio r=δ/h1r=\big{\lfloor}\delta/h\big{\rfloor}\geq 1 if δh\delta\geq h and r=δ/h=1r=\big{\lceil}\delta/h\big{\rceil}=1 if δh\delta\leq h. Let the mesh points xi=ihx_{i}=ih, h=1/Nh=1/N, i𝒩i\in\mathcal{N} and

𝒩={r,r+12,,0,12,,N12,N,,N+r12,N+r},𝒩in={12,1,32,,N12},𝒩out=𝒩𝒩in.\begin{split}&\mathcal{N}=\left\{-r,{-r+\frac{1}{2}},\ldots,0,\frac{1}{2},\ldots,{N-\frac{1}{2}},{N},\ldots,{N+r-\frac{1}{2}},N+r\right\},\\ &\mathcal{N}_{in}=\left\{{\frac{1}{2}},1,\frac{3}{2},\ldots,N-\frac{1}{2}\right\},~{}~{}\mathcal{N}_{out}=\mathcal{N}\setminus\mathcal{N}_{in}.\end{split}

Let uiu_{i} be the approximated value of u(xi)u(x_{i}) and fi=f(xi)f_{i}=f(x_{i}). Denote Im=[(m1)h,mh]I_{m}=\left[(m-1)h,mh\right], 1mr1\leq m\leq r, and Ir+1=[rh,δ]I_{r+1}=[rh,\delta].

2.2.1 Nonsymmetric indefinite block-structured dense systems

By the piecewise quadratic polynomial collocation (2), it is easy to check that the standard collocation method of stationary problem (6) has the following form [21]

(7) {δui=fi,i𝒩in,ui=gi,i𝒩out.\left\{\begin{split}-\mathcal{L}_{\delta}u_{i}&=f_{i},~{}~{}i\in\mathcal{N}_{in},\\ u_{i}&=g_{i},~{}~{}i\in\mathcal{N}_{out}.\end{split}\right.

For convenience of implementation, we use the matrix form of the grid functions

Uh=[u1,u2,,uN1,u12,u32,,uN12]TandFh=[f1,f2,,fN1,f12,f32,,fN12]T.\begin{split}U_{h}=[u_{1},u_{2},\ldots,u_{N-1},u_{\frac{1}{2}},u_{\frac{3}{2}},\ldots,u_{N-\frac{1}{2}}]^{T}~{}~{}~{}{\rm and}~{}~{}~{}F_{h}=[f_{1},f_{2},\ldots,f_{N-1},f_{\frac{1}{2}},f_{\frac{3}{2}},\ldots,f_{N-\frac{1}{2}}]^{T}.\end{split}

We next deal with the boundary conditions to ensure that the proposed fast AMG will have a 𝒪(NlogN)\mathcal{O}(N\mbox{log}N) complexity. Let the components of the left boundary gvL{}^{L}g^{v} (or right boundary gvR{}^{R}g^{v}) at integer nodes and the left boundary gwL{}^{L}g^{w} (or right boundary gwR{}^{R}g^{w}) at semi-integer nodes be

gvL=(gr,gr+1,,g1,g0)T,gvR=(gN,gN+1,,gN+r)T,gwL=(gr+12,gr+32,,g12)T,gwR=(gN+12,gN+32,,gN+r12)T.\begin{split}{}^{L}g^{v}=&\left(g_{-r},g_{-r+1},\ldots,g_{-1},g_{0}\right)^{T},~{}~{}{}^{R}g^{v}=\left(g_{N},g_{N+1},\ldots,g_{N+r}\right)^{T},\\ {}^{L}g^{w}=&\left(g_{-r+\frac{1}{2}},g_{-r+\frac{3}{2}},\ldots,g_{-\frac{1}{2}}\right)^{T},~{}~{}{}^{R}g^{w}=\left(g_{N+\frac{1}{2}},g_{N+\frac{3}{2}},\ldots,g_{N+r-\frac{1}{2}}\right)^{T}.\end{split}

Let Fv=[f1,f2,,fN1]TF^{v}=[f_{1},f_{2},\ldots,f_{N-1}]^{T}, Fw=[f12,f32,,fN12]TF^{w}=[f_{\frac{1}{2}},f_{\frac{3}{2}},\ldots,f_{N-\frac{1}{2}}]^{T} and

wA=(a1,a2,,ar,0)T,wB=(a32,a52,,ar12,0)T,wC=(c0,c1,,cr)T,wD=(d1,d2,,dr)T.\begin{split}w^{A}=&\left(a_{1},a_{2},\ldots,a_{r},0\right)^{T},~{}~{}w^{B}=\left(a_{\frac{3}{2}},a_{\frac{5}{2}},\ldots,a_{r-\frac{1}{2}},0\right)^{T},\\ w^{C}=&\left(c_{0},c_{1},\ldots,c_{r}\right)^{T},~{}~{}w^{D}=\left(d_{1},d_{2},\ldots,d_{r}\right)^{T}.\end{split}

Denote

FjvL=i=jr+1givLwr+1+jiA+i=jrgiwLwr+jiB,FjvR=i=1r+2jgivRwj1+iA+i=1r+1jgiwRwj1+iB{}^{L}F^{v}_{j}=\sum^{r+1}_{i=j}{}^{L}g^{v}_{i}w^{A}_{r+1+j-i}+\sum^{r}_{i=j}{}^{L}g^{w}_{i}w^{B}_{r+j-i},~{}~{}~{}{}^{R}F^{v}_{j}=\sum^{r+2-j}_{i=1}{}^{R}g^{v}_{i}w^{A}_{j-1+i}+\sum^{r+1-j}_{i=1}{}^{R}g^{w}_{i}w^{B}_{j-1+i}

and

(8) Fjv=FjvFjvL,FNjv=FNjvFjvR,j=1,,r.F^{v}_{j}=F^{v}_{j}-{}^{L}F^{v}_{j},~{}~{}~{}F^{v}_{N-j}=F^{v}_{N-j}-{}^{R}F^{v}_{j},~{}~{}j=1,\dots,r.

Let

FjwL=i=jr+1givLwr+1+jiC+i=jrgiwLwr+jiD,Fr+1wL=gr+1vLwr+1C,FjwR=i=1r+2jgivRwj1+iC+i=1r+1jgiwRwj1+iD,Fr+1wR=g1vRwr+1C\begin{split}{}^{L}F^{w}_{j}&=\sum^{r+1}_{i=j}{}^{L}g^{v}_{i}w^{C}_{r+1+j-i}+\sum^{r}_{i=j}{}^{L}g^{w}_{i}w^{D}_{r+j-i},~{}~{}~{}{}^{L}F^{w}_{r+1}={}^{L}g^{v}_{r+1}w^{C}_{r+1},\\ {}^{R}F^{w}_{j}&=\sum^{r+2-j}_{i=1}{}^{R}g^{v}_{i}w^{C}_{j-1+i}+\sum^{r+1-j}_{i=1}{}^{R}g^{w}_{i}w^{D}_{j-1+i},~{}~{}~{}{}^{R}F^{w}_{r+1}={}^{R}g^{v}_{1}w^{C}_{r+1}\end{split}

and

(9) Fjw=FjwFjwL,FNjw=FNjwFjwR,j=1,,r+1.F^{w}_{j}=F^{w}_{j}-{}^{L}F^{w}_{j},~{}~{}~{}F^{w}_{N-j}=F^{w}_{N-j}-{}^{R}F^{w}_{j},~{}~{}j=1,\dots,r+1.

From (8) and (9), the numerical scheme (7) can be recast as [20, 21]

(10) 𝒜hNUh=ηhFhN=[FvFw]with𝒜hN=[ABCD],ηh=2δ3/h.\mathcal{A}^{N}_{h}U_{h}=\eta_{h}\cdot F^{N}_{h}=\left[\begin{array}[]{c}F^{v}\\ F^{w}\end{array}\right]~{}~{}{\rm with}~{}~{}\mathcal{A}^{N}_{h}=\left[\begin{array}[]{cc}A&B\\ C&D\end{array}\right],\eta_{h}=2\delta^{3}/h.

Note that the above Toeplitz matrices A(N1)×(N1)A\in\mathbb{R}^{{(N-1)}\times(N-1)}, B(N1)×NB\in\mathbb{R}^{{(N-1)}\times N}, CN×(N1)C\in\mathbb{R}^{{N}\times(N-1)} and DN×ND\in\mathbb{R}^{{N}\times N} are defined by

A=toeplitz(a0,a1,a2,,ar,01×(Nr2)),B=toeplitz([a12,a32,,ar12,01×(N1r)],[a12,a12,a32,,ar12,01×(Nr1)]),C=toeplitz([c0,c0,c1,c2,,cr,01×(Nr2)],[c0,c1,c2,,cr,01×(Nr2)]),D=toeplitz([d0,d1,d2,,dr,01×(Nr1)])\begin{split}A&={\rm toeplitz}\left(a_{0},a_{1},a_{2},\cdots,a_{r},\textbf{0}_{1\times(N-r-2)}\right),\\ B&={\rm toeplitz}\left(\left[a_{\frac{1}{2}},a_{\frac{3}{2}},\cdots,a_{r-\frac{1}{2}},\textbf{0}_{1\times(N-1-r)}\right],\left[a_{\frac{1}{2}},a_{\frac{1}{2}},a_{\frac{3}{2}},\cdots,a_{r-\frac{1}{2}},\textbf{0}_{1\times(N-r-1)}\right]\right),\\ C&={\rm toeplitz}\left([c_{0},c_{0},c_{1},c_{2},\cdots,c_{r},\textbf{0}_{1\times(N-r-2)}],[c_{0},c_{1},c_{2},\cdots,c_{r},\textbf{0}_{1\times(N-r-2)}]\right),\\ D&={\rm toeplitz}\left([d_{0},d_{1},d_{2},\cdots,d_{r},\textbf{0}_{1\times(N-r-1)}]\right)\end{split}

with the coefficients

a0=12r2,am=2,ar=1,1mr1,am+12=4,0mr1,\begin{split}&a_{0}=12r-2,\quad a_{m}=-2,\quad a_{r}=-1,~{}~{}1\leq m\leq r-1,\\ &a_{m+\frac{1}{2}}=-4,~{}~{}0\leq m\leq r-1,\end{split}

and

cm=2,cr1=94,cr=14,0mr2,d0=12r4,dm=4,dr=2,1mr1.\begin{split}&c_{m}=-2,\quad c_{r-1}=-\frac{9}{4},\quad c_{r}=\frac{1}{4},~{}~{}0\leq m\leq r-2,\\ &d_{0}=12r-4,\quad d_{m}=-4,\quad d_{r}=-2,~{}~{}1\leq m\leq r-1.\\ \end{split}

2.2.2 Symmetric positive definite block-structured dense systems

We observe that the discrete maximum principle is not satisfied for the above nonsymmetric indefinite system (10), which might be trickier for the stability analysis of the high-order numerical schemes [23, 30]. As a consequence, the shifted-symmetric piecewise quadratic polynomial collocation method for peridynamic /nonlocal model (6) has been considered in [20, 21], which satisfies the discrete maximum principle and has the symmetric positive definite block-structured dense systems. Namely, the shifted-symmetric system of (7) can be recast as

(11) 𝒜hSUh=ηhFhSwith𝒜hS=[ABBTA^],ηh=2δ3/h.\mathcal{A}^{S}_{h}U_{h}=\eta_{h}\cdot F_{h}^{S}~{}~{}{\rm with}~{}~{}\mathcal{A}^{S}_{h}=\left[\begin{array}[]{cc}A&B\\ B^{T}&\widehat{A}\end{array}\right],\eta_{h}=2\delta^{3}/h.

Here the function FhSF^{S}_{h} is computed as done in (8) and (9), the Toeplitz matrices A(N1)×(N1)A\in\mathbb{R}^{{(N-1)}\times(N-1)}, B(N1)×NB\in\mathbb{R}^{{(N-1)}\times N}, and A^N×N\widehat{A}\in\mathbb{R}^{{N}\times N} are defined by

A=toeplitz(a0,a1,a2,,ar,01×(Nr2)),B=toeplitz([a12,a32,,ar12,01×(N1r)],[a12,a12,a32,,ar12,01×(Nr1)]),A^=toeplitz([a0,a1,a2,,ar,01×(Nr1)])\begin{split}A&={\rm toeplitz}(a_{0},a_{1},a_{2},\cdots,a_{r},\textbf{0}_{1\times(N-r-2)}),\\ B&={\rm toeplitz}\left([a_{\frac{1}{2}},a_{\frac{3}{2}},\cdots,a_{r-\frac{1}{2}},\textbf{0}_{1\times(N-1-r)}],[a_{\frac{1}{2}},a_{\frac{1}{2}},a_{\frac{3}{2}},\cdots,a_{r-\frac{1}{2}},\textbf{0}_{1\times(N-r-1)}]\right),\\ \widehat{A}&={\rm toeplitz}([a_{0},a_{1},a_{2},\cdots,a_{r},\textbf{0}_{1\times(N-r-1)}])\end{split}

with

(12) a0=12r2,am=2,ar=1,1mr1,am+12=4,0mr1.\begin{split}&a_{0}=12r-2,\quad a_{m}=-2,\quad a_{r}=-1,~{}~{}1\leq m\leq r-1,\\ &a_{m+\frac{1}{2}}=-4,~{}~{}0\leq m\leq r-1.\end{split}

3 Fast AMG for block-structured dense systems

Multigrid methods are among the most efficient iterative methods for solving large scale systems of equations [37, 48]. However, it is not easy to build restriction and prolongation operators when using AMG methods for nonsymmetric sparse systems [10, 31]. To the best of our knowledge, there is no fast AMG for block-structured dense linear systems of the type in (1), since the special prolongation/transfer operators are not easy to be employed with fast Fourier transform. Here, the simple (traditional) transfer operator are employed to handle such block-structured dense systems to ensure a fast AMG showing a 𝒪(NlogN)\mathcal{O}(N\mbox{log}N) complexity.

3.1 Multigrid methods

Let us firstly review the basic multigrid technique when applied to a scalar algebraic linear system, having in mind that our target is the efficient solution of the block-structured dense linear systems as those reported in (1). Let the finest mesh points xi=a+ihx_{i}=a+ih, h=(ba)/Nh=(b-a)/N, N=2KN=2^{K} with Ω=(a,b)\Omega=(a,b). Define the multiple level of grids

𝔅k={xik=a+i2k(ba),i=1:Nk}withNk=2k1,k=1:K,\mathfrak{B}^{k}=\left\{x^{k}_{i}=a+\frac{i}{2^{k}}(b-a),i=1:N_{k}\right\}~{}{\rm with}~{}N_{k}=2^{k}-1,k=1:K,

where 𝔅k\mathfrak{B}^{k} represents not only the grid with grid spacing hk=2Kkhh_{k}=2^{K-k}h, but also the space of vectors defined on that grid.

Given a algebraic system

(13) Ahuh=bh.A_{h}u^{h}=b^{h}.

We define a sequence of subsystems on different levels

Akuk=bk,uk𝔅k,k=1,,K.A_{k}u^{k}=b^{k},u^{k}\in\mathfrak{B}^{k},\quad k=1{,\ldots,}K.

Here KK is the total number of levels, with k=Kk=K being the finest level, i.e., AK=AhA_{K}=A_{h}.

The traditional restriction operator Ikk1:NkNk1I^{k-1}_{k}:\mathds{R}^{N_{k}}\rightarrow\mathds{R}^{N_{k-1}} and prolongation operator Ik1k:Nk1NkI^{k}_{k-1}:\mathds{R}^{N_{k-1}}\rightarrow\mathds{R}^{N_{k}} are defined by [38, pp. 438-454]

Ikk1=14[121121121]Nk1×NkandIk1k=2(Ikk1)T,I_{k}^{k-1}=\frac{1}{4}\left[\begin{array}[]{cccccccc}1&2&1&&&&&\\ &&1&2&1&&&\\ &&&&\cdots&\cdots&&\\ &&&&&1&2&1\end{array}\right]_{N_{k-1}\times N_{k}}\quad{\rm and}\quad I^{k}_{k-1}=2\left(I_{k}^{k-1}\right)^{T},

which should be convenient and efficient for block-structured dense linear systems as those in (1), using AMG and fast Fourier transforms. Let

νk1=Ikk1νkwithνk1=14(ν2i1k+2ν2ik+ν2i+1k),i=1:Nk,\nu^{k-1}=I_{k}^{k-1}\nu^{k}~{}~{}{\rm with}~{}~{}\nu^{k-1}=\frac{1}{4}\left(\nu^{k}_{2i-1}+2\nu^{k}_{2i}+\nu^{k}_{2i+1}\right),~{}i=1:N_{k},

and

νk=Ik1kνk1.\nu^{k}=I^{k}_{k-1}\nu^{k-1}.

It may be more useful to define the linear system by Galerkin projection in the AMG method, where the coarse grid problem is defined by

(14) Ak1=Ikk1AkIk1k,A_{k-1}=I_{k}^{k-1}A_{k}I^{k}_{k-1},

and the intermediate (k,k1)\left(k,k-1\right) coarse grid correction operator is

Tk=IkIk1kAk11Ikk1Ak.T_{k}=I_{k}-I^{k}_{k-1}A_{k-1}^{-1}I^{k-1}_{k}A_{k}.

We choose the damped Jacobi iterative operator

(15) Sk=IωDk1AkS_{k}=I-\omega D_{k}^{-1}A_{k}\

with a weighting factor ω\omega, and DkD_{k} being the diagonal of AkA_{k}.

A multigrid process is intrinsically to define a sequence of operators Bk:𝔅k𝔅kB_{k}:\mathfrak{B}^{k}\rightarrow\mathfrak{B}^{k}, which is an approximate inverse of AkA_{k} in the sense that IBkAk\left|\left|I-B_{k}A_{k}\right|\right| is small, bounded away from one. The V-cycle Multigrid Algorithm 1 can be seen in [47]. If k=2k=2, the resulting Algorithm 1 is two-grid method (TGM).

Algorithm 1 V-cycle Multigrid Algorithm: Define B1=A11B_{1}=A_{1}^{-1}. Assume that Bk1:𝔅k1𝔅k1B_{k-1}:\mathfrak{B}^{k-1}\mapsto\mathfrak{B}^{k-1} is defined. We shall now define Bk:𝔅k𝔅kB_{k}:\mathfrak{B}^{k}\mapsto\mathfrak{B}^{k} as an approximate iterative solver for the equation associated with Akuk=bkA_{k}u^{k}=b^{k}.
1:  Pre-smooth: Let Sk,ωS_{k,\omega} be defined by (15) and u0k=𝟎u^{k}_{0}=\mathbf{0}, for l=1:m1l=1:m_{1},
ulk=ul1k+Sk,ωpre(bkAkul1k)u^{k}_{l}=u^{k}_{l-1}+S_{k,\omega_{pre}}(b^{k}-A_{k}u^{k}_{l-1})
2:  Coarse grid solution: Denote ek1𝔅k1e^{k-1}\in\mathfrak{B}^{k-1} as the approximate solution of the residual equation Ak1e=Ikk1(bkAkum1k)A_{k-1}e=I^{k-1}_{k}(b^{k}-A_{k}u^{k}_{m_{1}}) with the iterator Bk1B_{k-1},
ek1=Bk1Ikk1(bkAkum1k)e^{k-1}=B_{k-1}I^{k-1}_{k}(b^{k}-A_{k}u^{k}_{m_{1}})
3:  Correction: um1+1k=um1k+Ik1kek1u^{k}_{m_{1}+1}=u^{k}_{m_{1}}+I^{k}_{k-1}e^{k-1},
4:  Post-smooth: for l=m1+2:m1+m2l=m_{1}+2:m_{1}+m_{2},
ulk=ul1k+Sk,ωpost(bkAkul1k)u^{k}_{l}=u^{k}_{l-1}+S_{k,\omega_{post}}(b^{k}-A_{k}u^{k}_{l-1})
5:  Define: Bkbk=um1+m2kB_{k}b^{k}=u^{k}_{m_{1}+m_{2}}.

The basic AMG idea for solving the block-structured dense linear systems in (1) is the same as in the scalar case (13). Define a sequence of block-structured subsystems on different levels

𝒜kuk=bk,uk𝔐k,k=1,,K\mathcal{A}_{k}u^{k}=b^{k},~{}~{}u^{k}\in\mathfrak{M}^{k},\quad k=1{,\ldots,}K

with the multiple level of grids

𝔐k={xi/2k=a+i/22k(ba),i=1:2Nk+1}withNk=2k1,k=1:K.\mathfrak{M}^{k}=\left\{x^{k}_{i/2}=a+\frac{i/2}{2^{k}}(b-a),i=1:2N_{k}+1\right\}~{}{\rm with}~{}N_{k}=2^{k}-1,k=1:K.

Here

𝒜k=[A(k)B(k)C(k)D(k)],uk=[vkwk],bk=[bfkbgk],\mathcal{A}_{k}=\left[\begin{array}[]{cc}A^{(k)}&B^{(k)}\\ C^{(k)}&D^{(k)}\end{array}\right],~{}u^{k}=\left[\begin{array}[]{c}v^{k}\\ w^{k}\end{array}\right],~{}b^{k}=\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right],

and 𝒟k\mathcal{D}_{k} is the diagonal matrix of 𝒜k\mathcal{A}_{k}. The block-structured dense V-cycle Multigrid method is developed in Algorithm 2.

Algorithm 2 Block-structured dense V-cycle Multigrid method: Define 1=𝒜11\mathcal{B}_{1}=\mathcal{A}_{1}^{-1}. Assume that k1:𝔐k1𝔐k1\mathcal{B}_{k-1}:\mathfrak{M}^{k-1}\mapsto\mathfrak{M}^{k-1} is defined. We shall now define k:𝔐k𝔐k\mathcal{B}_{k}:\mathfrak{M}^{k}\mapsto\mathfrak{M}^{k} as an approximate iterative solver for the equation associated with 𝒜kuk=bk\mathcal{A}_{k}u^{k}=b^{k}.
1:  Pre-smooth: Let 𝒮k,ω\mathcal{S}_{k,\omega} be defined by 𝒮k=Iω𝒟k1𝒜k\mathcal{S}_{k}=I-\omega\mathcal{D}_{k}^{-1}\mathcal{A}_{k} and [v0kw0k]=𝟎\left[\begin{array}[]{c}v^{k}_{0}\\ w^{k}_{0}\end{array}\right]=\mathbf{0}, for l=1:m1l=1:m_{1},
[vlkwlk]=[vl1kwl1k]+𝒮k,ωpre([bfkbgk]𝒜k[vl1kwl1k])\left[\begin{array}[]{c}v^{k}_{l}\\ w^{k}_{l}\end{array}\right]=\left[\begin{array}[]{c}v^{k}_{l-1}\\ w^{k}_{l-1}\end{array}\right]+\mathcal{S}_{k,\omega_{pre}}\left(\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right]-\mathcal{A}_{k}\left[\begin{array}[]{c}v^{k}_{l-1}\\ w^{k}_{l-1}\end{array}\right]\right)
2:  Coarse grid solution: Denote 𝐞k1=[evk1ewk1]𝔐k1\mathbf{e}^{k-1}=\left[\begin{array}[]{c}e^{k-1}_{v}\\ e^{k-1}_{w}\end{array}\right]\in\mathfrak{M}_{k-1} as the approximate solution of the residual equation 𝒜k1𝐞=Ikk1([bfkbgk]𝒜k[vm1kwm1k])\mathcal{A}_{k-1}\mathbf{e}=I^{k-1}_{k}\left(\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right]-\mathcal{A}_{k}\left[\begin{array}[]{c}v^{k}_{m_{1}}\\ w^{k}_{m_{1}}\end{array}\right]\right) with the iterator k1\mathcal{B}_{k-1} an approximate inverse of 𝒜k1\mathcal{A}_{k-1},
[euk1evk1]=k1Ikk1([fkgk]𝒜k[um1kvm1k])\left[\begin{array}[]{c}e^{k-1}_{u}\\ e^{k-1}_{v}\end{array}\right]=\mathcal{B}_{k-1}I^{k-1}_{k}\left(\left[\begin{array}[]{c}f_{k}\\ g_{k}\end{array}\right]-\mathcal{A}_{k}\left[\begin{array}[]{c}u^{k}_{m_{1}}\\ v^{k}_{m_{1}}\end{array}\right]\right)
3:  Correction: [vm1+1kwm1+1k]=[vm1kwm1k]+Ik1k[evk1ewk1]\left[\begin{array}[]{c}v^{k}_{m_{1}+1}\\ w^{k}_{m_{1}+1}\end{array}\right]=\left[\begin{array}[]{c}v^{k}_{m_{1}}\\ w^{k}_{m_{1}}\end{array}\right]+I^{k}_{k-1}\left[\begin{array}[]{c}e^{k-1}_{v}\\ e^{k-1}_{w}\end{array}\right],
4:  Post-smooth: for l=m1+2:m1+m2l=m_{1}+2:m_{1}+m_{2},
[vlkwlk]=[vl1kwl1k]+𝒮k,ωpost([bfkbgk]𝒜k[vl1kwl1k])\left[\begin{array}[]{c}v^{k}_{l}\\ w^{k}_{l}\end{array}\right]=\left[\begin{array}[]{c}v^{k}_{l-1}\\ w^{k}_{l-1}\end{array}\right]+\mathcal{S}_{k,\omega_{post}}\left(\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right]-\mathcal{A}_{k}\left[\begin{array}[]{c}v^{k}_{l-1}\\ w^{k}_{l-1}\end{array}\right]\right)
5:  Define: k[bfkbgk]=[vm1+m2kwm1+m2k]\mathcal{B}_{k}\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right]=\left[\begin{array}[]{c}v^{k}_{m_{1}+m_{2}}\\ w^{k}_{m_{1}+m_{2}}\end{array}\right].

3.2 Fast Fourier transform for block-structured dense systems

We first review the Toeplitz matrix and the circulant matrix by fast Fourier transform with 𝒪(nlogn)\mathcal{O}(n\mbox{log}n) complexity, which will be used later. Let n×nn\times n Toeplitz matrix Tn(c)T_{n}(c) be defined by [9, 12]

Tn(c):=[cjk]j,k=1n=[c0c1c(n1)c1c0c(n2)cn1cn2c0],T_{n}(c):=[c_{j-k}]_{j,k=1}^{n}=\left[\begin{matrix}c_{0}&c_{-1}&\cdots&c_{-(n-1)}\\ c_{1}&c_{0}&\cdots&c_{-(n-2)}\\ \vdots&\vdots&\ddots&\vdots\\ c_{n-1}&c_{n-2}&\cdots&c_{0}\end{matrix}\right],

and the circulant matrix be the periodic cousin of Toeplitz matrix. Denote the circulant matrix CnC_{n} whose first column is c~=(c0,c1,,cn1)T\widetilde{c}=(c_{0},c_{1},\dots,c_{n-1})^{\rm T}, namely,

Cn:=[c0cn1cn2c2c1c1c0cn1c3c2c2c1c0c3cn2cn3c0cn1cn1cn2cn3c1c0].C_{n}:=\left[\begin{matrix}c_{0}&c_{n-1}&c_{n-2}&\cdots&c_{2}&c_{1}\\ c_{1}&c_{0}&c_{n-1}&\cdots&c_{3}&c_{2}\\ c_{2}&c_{1}&c_{0}&\ddots&\ddots&c_{3}\\ \vdots&\vdots&\ddots&\ddots&\ddots&\vdots\\ c_{n-2}&c_{n-3}&\ddots&\ddots&c_{0}&c_{n-1}\\ c_{n-1}&c_{n-2}&c_{n-3}&\cdots&c_{1}&c_{0}\end{matrix}\right].

Moreover, we set ωn=exp(2πi/n)\omega_{n}=\mbox{exp}(2\pi i/n) and define the unitary Fourier matrix

Fn=1n[11111ωnωn2ωnn11ωn2ωn4ωn2(n1)1ωnn1ωn2(n1)ωn(n1)(n1)]F_{n}=\frac{1}{\sqrt{n}}\left[\begin{array}[]{llllr}1&1&1&\cdots&1\\ 1&\omega_{n}&\omega_{n}^{2}&\cdots&\omega_{n}^{n-1}\\ 1&\omega_{n}^{2}&\omega_{n}^{4}&\cdots&\omega_{n}^{2(n-1)}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&\omega_{n}^{n-1}&\omega_{n}^{2(n-1)}&\cdots&\omega_{n}^{(n-1)(n-1)}\end{array}\right]

with ii being the imaginary unit. Therefore, a circulant matrix can be diagonalized by the Fourier matrix FnF_{n}, i.e.,

Cn=Fn1diag(nFnc~)Fn=Fndiag(nFnc~)Fn,C_{n}=F_{n}^{-1}\mbox{diag}(\sqrt{n}F_{n}\widetilde{c})F_{n}=F_{n}^{*}\mbox{diag}(\sqrt{n}F_{n}\widetilde{c})F_{n},

where diag(nFnc~)\mbox{diag}(\sqrt{n}F_{n}\widetilde{c}) is a diagonal matrix holding the eigenvalues of CnC_{n}. Moreover, for any given vector vv, we can determine [13]

Cnv=ifft(fft(c~).fft(v))C_{n}v={\rm ifft}\left({\rm fft}(\widetilde{c}).*{\rm fft}(v)\right)

with 𝒪(nlogn)\mathcal{O}(n\mbox{log}n) operations by the fast Fourier transform (FFT) of the first column c~\widetilde{c} of CnC_{n}.

Then, for any nn-by-11 vector 𝐱\bf x, the multiplication Tn𝐱T_{n}\bf x can also be computed by FFTs with the computational count of 𝒪(nlogn)\mathcal{O}\left(n\log n\right) arithmetic operations [12, p. 12]. More concretely, we take a 2n2n-by-2n2n circulant matrix CnC_{n} with TnT_{n} embedded inside as follows:

(16) [TnTn][𝐱𝟎]=[Tn𝐱]withCn=[TnTn].\left[\begin{array}[]{cc}T_{n}&\ast\\ \ast&T_{n}\end{array}\right]\left[\begin{array}[]{c}\bf x\\ \bf 0\end{array}\right]=\left[\begin{array}[]{c}T_{n}\bf x\\ {\ddagger}\end{array}\right]~{}~{}~{}~{}{\rm with}~{}~{}C_{n}=\left[\begin{array}[]{cc}T_{n}&\ast\\ \ast&T_{n}\end{array}\right].

3.2.1 Fast Fourier transform for block-structured dense systems at the finest level

Let us consider the fast Fourier transform algorithm for block-structured dense systems (1) at the finest level, namely,

(17) 𝒜hu=(ABCD)(vw)=(Av+BwCv+Dw)\mathcal{A}_{h}u=\left(\begin{array}[]{cc}A&B\\ C&D\end{array}\right)\left(\begin{array}[]{c}v\\ w\end{array}\right)=\left(\begin{array}[]{c}Av+Bw\\ Cv+Dw\end{array}\right)

with Toeplitz matrices AM×MA\in\mathbb{R}^{M\times M}, BM×NB\in\mathbb{R}^{M\times N}, CN×MC\in\mathbb{R}^{N\times M} and DN×ND\in\mathbb{R}^{N\times N}, M<NM<N.

It is well known that most of the early works on matrix-vector multiplication with Toeplitz algebraic system were focused on squared matrices by Fast fourier transform (FFT) [12, 13]. However, the considered technique can not be directly applied to block-structured dense systems of the form (17), since the related structures contain rectangular the matrices BB and CC.

We next design/develop the FFT algorithm for the rectangular matrices BB and CC in (17) based on the idea of [11, 12], which realizes the computational count of 𝒪(NlogN)\mathcal{O}(N\log N) arithmetic operations and the required storage 𝒪(N)\mathcal{O}(N). Let the rectangular matrix BM×NB\in\mathbb{R}^{M\times N} be given by

B=[b0b1bJbN1b1b0b1b2b1b0bM+1b2b1b0b1bJ]M×NwithJ=NM.\begin{split}B=\left[\begin{matrix}b_{0}&b_{1}&\cdots&b_{J}&\cdots&\cdots&\cdots&b_{N-1}\\ b_{-1}&b_{0}&b_{1}&\ddots&\ddots&&&\vdots\\ b_{-2}&b_{-1}&b_{0}&\ddots&\ddots&\ddots&&\vdots\\ \vdots&\ddots&\ddots&\ddots&\ddots&\ddots&\ddots&\vdots\\ b_{-M+1}&\cdots&b_{-2}&b_{-1}&b_{0}&b_{1}&\cdots&b_{J}\end{matrix}\right]_{M\times N}~{}~{}{\rm with}~{}~{}J=N-M.\end{split}

We embed the matrix BM×NB\in\mathbb{R}^{M\times N} into the following NN-by-NN Toeplitz matrix BTB_{T},

BT=[b0b1bJbN1b1b0b1b2b1b0bM+1b2b1b0b1bJ0b100bM+1b2b1b0]N×N.\begin{split}B_{T}=\left[\begin{matrix}b_{0}&b_{1}&\cdots&b_{J}&\cdots&\cdots&\cdots&b_{N-1}\\ b_{-1}&b_{0}&b_{1}&\ddots&\ddots&&&\vdots\\ b_{-2}&b_{-1}&b_{0}&\ddots&\ddots&\ddots&&\vdots\\ \vdots&\ddots&\ddots&\ddots&\ddots&\ddots&\ddots&\vdots\\ b_{-M+1}&\cdots&b_{-2}&b_{-1}&b_{0}&b_{1}&\cdots&b_{J}\\ \hline\cr 0&\ddots&&\ddots&\ddots&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&&\ddots&\ddots&\ddots&b_{1}\\ 0&\cdots&0&b_{-M+1}&\cdots&b_{-2}&b_{-1}&b_{0}\\ \end{matrix}\right]_{N\times N}\end{split}.

Using (16) and fast Fourier transform, it implies that BwBw is the first MM-rows of BTwB_{T}w with

[BTBT][w𝟎]=[BTw].\left[\begin{array}[]{cc}B_{T}&\ast\\ \ast&B_{T}\end{array}\right]\left[\begin{array}[]{c}w\\ \bf 0\end{array}\right]=\left[\begin{array}[]{c}B_{T}w\\ {\ddagger}\end{array}\right].

Let the rectangular matrix CN×MC\in\mathbb{R}^{N\times M} be defined by

C=[c0c1cM1c1c1cJc1c0c1cN+1cJ]N×MwithJ=NM.\begin{split}C=\left[\begin{matrix}c_{0}&c_{1}&\cdots&c_{M-1}\\ c_{-1}&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&c_{1}\\ c_{-J}&\cdots&c_{-1}&c_{0}\\ \vdots&\ddots&\ddots&c_{-1}\\ \vdots&&\ddots&\vdots\\ c_{-N+1}&\cdots&\cdots&c_{-J}\end{matrix}\right]_{N\times M}~{}~{}{\rm with}~{}~{}J=N-M.\end{split}

Similar, we embed matrix CN×MC\in\mathbb{R}^{N\times M} into the following NN-by-NN Toeplitz matrix CTC_{T},

CT=[c0c1cM1c1c1cJc1c0c1cN+1cJ|00cM10c1cM1c0c1c1c0]N×N.\begin{split}C_{T}=\left[\begin{matrix}c_{0}&c_{1}&\cdots&c_{M-1}\\ c_{-1}&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&c_{1}\\ c_{-J}&\cdots&c_{-1}&c_{0}\\ \vdots&\ddots&\ddots&c_{-1}\\ \vdots&&\ddots&\vdots\\ c_{-N+1}&\cdots&\cdots&c_{-J}\end{matrix}\right|\left.\begin{matrix}0&\cdots&0\\ c_{M-1}&\ddots&\vdots\\ \vdots&\ddots&0\\ c_{1}&\cdots&c_{M-1}\\ c_{0}&\ddots&\vdots\\ \ddots&\ddots&c_{1}\\ \cdots&c_{-1}&c_{0}\end{matrix}\right]_{N\times N}.\end{split}

From (16) and fast Fourier transform, we have

Cv=CT𝐯with[CTCT][𝐯𝟎]=[CT𝐯],𝐯=[v𝟎J×1].Cv=C_{T}{\bf v}~{}~{}{\rm with}~{}~{}\left[\begin{array}[]{cc}C_{T}&\ast\\ \ast&C_{T}\end{array}\right]\left[\begin{array}[]{c}{\bf v}\\ \bf 0\end{array}\right]=\left[\begin{array}[]{c}C_{T}{\bf v}\\ {\ddagger}\end{array}\right],{\bf v}=\left[\begin{array}[]{c}v\\ {\bf 0}_{J\times 1}\end{array}\right].

3.2.2 Fast Fourier transform for block-structured dense systems at the coarser level

In fact, embedding rectangular matrices BB and CC in (17) into the square Toeplitz matrices in subsubsection 3.2.1 is invalid, when using the Galerkin projection (14) in the AMG method, although this technique can be applied to compute the finest level for a fast AMG, since it does not keep block-structured Toeplitz properties. Next, we shall design the FFT for the block-structured matrix 𝒜k\mathcal{A}_{k} multiplies vector at the coarser level, i.e.,

𝒜k[ukvk]=[A(k)uk+B(k)vkC(k)uk+D(k)vk].{\mathcal{A}}_{k}\left[\begin{array}[]{c}u^{k}\\ v^{k}\end{array}\right]=\left[\begin{array}[]{c}A^{(k)}u^{k}+B^{(k)}v^{k}\\[5.69054pt] C^{(k)}u^{k}+D^{(k)}v^{k}\end{array}\right].

In AMG, the coarse problem at the level k<Kk<K is typically defined using the Galerkin approach, i.e., the coefficient matrix on the coarser grid can be computed by

(18) 𝒜k1=Ikk1𝒜kIk1k.\mathcal{A}_{k-1}=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1}.

More concretely,

𝒜k1=[A(k1)B(k1)C(k1)D(k1)](2Nk1+1)×(2Nk1+1)=Ikk1[A(k)B(k)C(k)D(k)](2Nk+1)×(2Nk+1)Ik1k\mathcal{A}_{k-1}=\left[\begin{array}[]{cc}A^{(k-1)}&B^{(k-1)}\\[5.69054pt] C^{(k-1)}&D^{(k-1)}\end{array}\right]_{(2N_{k-1}+1)\times(2N_{k-1}+1)}=I_{k}^{k-1}\left[\begin{array}[]{cc}A^{(k)}&B^{(k)}\\[5.69054pt] C^{(k)}&D^{(k)}\end{array}\right]_{(2N_{k}+1)\times(2N_{k}+1)}I^{k}_{k-1}

with Ikk1Nk×(2Nk+1)I_{k}^{k-1}\in\mathbb{R}^{N_{k}\times(2N_{k}+1)} in (14).

It should be noted that A(k),B(k),C(k),D(k)A^{(k)},B^{(k)},C^{(k)},D^{(k)} are Toeplitz matrices if k=Kk=K, which corresponds to the block-structured dense system (17) at the fineast level with the computational count of 𝒪(NlogN)\mathcal{O}(N\log N) arithmetic operations. However, there is major difference for 𝒜k1\mathcal{A}_{k-1} at the coarser level, since it does not keep block-structured Toeplitz properties, see Example 3.1.

Example 3.1.

Choose the unit matrices A(k)7×7A^{(k)}\in\mathbb{R}^{7\times 7}, D(k)8×8D^{(k)}\in\mathbb{R}^{8\times 8} and the rectangular ones matrices B(k)7×8B^{(k)}\in\mathbb{R}^{7\times 8}, C(k)8×7C^{(k)}\in\mathbb{R}^{8\times 7} (all the elements are 11). Using the Galerkin approximation (18), we deduce

[A(k1)B(k1)C(k1)D(k1)]=Ikk1𝒜kIk1k=18[610161016121213161616161616161616|12121312544|161616161616161616544610161016].\left[\begin{array}[]{cc}A^{(k-1)}&B^{(k-1)}\\[5.69054pt] C^{(k-1)}&D^{(k-1)}\end{array}\right]=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1}=\frac{1}{8}\left[\begin{array}[]{ccc }6&1&0\\ 1&6&1\\ 0&1&6\\ \hline\cr 12&12&13\\ \hline\cr 16&16&16\\ 16&16&16\\ 16&16&16\\ \end{array}\right|\left.\begin{array}[]{c }12\\ 12\\ 13\\ \hline\cr 12\\ \hline\cr 5\\ 4\\ 4\\ \end{array}\right|\left.\begin{array}[]{ccc }16&16&16\\ 16&16&16\\ 16&16&16\\ \hline\cr 5&4&4\\ \hline\cr 6&1&0\\ 1&6&1\\ 0&1&6\\ \end{array}\right].

We next design fast Fourier transform for block-structured dense systems at the coarser level in AMG. Let

(19) 𝒜k=[A(k)0B¯(k)0C¯(k)0D¯(k)](2Nk+1)×(2Nk+1)+[0p(k)0q(k)o(k)ζ(k)0ξ(k)0](2Nk+1)×(2Nk+1),\mathcal{A}_{k}=\left[\begin{array}[]{ccc}A^{(k)}&\textbf{0}&\bar{B}^{(k)}\\ \textbf{0 }^{\prime}&0&\textbf{0 }^{\prime}\\ {\bar{C}^{(k)}}&\textbf{0}&\bar{D}^{(k)}\end{array}\right]_{(2N_{k}+1)\times(2N_{k}+1)}+\left[\begin{array}[]{ccc}\textbf{0}&p^{(k)}&\textbf{0}\\ {q^{(k)}}&o^{(k)}&{\zeta^{(k)}}\\ \textbf{0}&\xi^{(k)}&\textbf{0}\end{array}\right]_{(2N_{k}+1)\times(2N_{k}+1)},

where A(k),B¯(k),C¯(k),D¯(k)A^{(k)},\bar{B}^{(k)},{\bar{C}^{(k)}},\bar{D}^{(k)} are the square matrices with

B(k)=[p(k)B¯(k)],C(k)=[q(k)C¯(k)],D(k)=[o(k)ζ(k)ξ(k)D¯(k)].B^{(k)}=\left[\begin{array}[]{cc}p^{(k)}&\bar{B}^{(k)}\end{array}\right],~{}C^{(k)}=\left[\begin{array}[]{c}q^{(k)}\\[2.84526pt] \bar{C}^{(k)}\end{array}\right],~{}D^{(k)}=\left[\begin{array}[]{cc}o^{(k)}&\zeta^{(k)}\\[2.84526pt] \xi^{(k)}&\bar{D}^{(k)}\end{array}\right].

The symbol o(k)o^{(k)} is a real number, 0 denotes a zero number, and the bold 0 denotes a zero matrix/vector with the corresponding size. The coefficients p(k),ξ(k)p^{(k)},\xi^{(k)} denote the column vectors and q(k),ζ(k){q^{(k)}},{\zeta^{(k)}} denote the row vectors. We may call it the Cross-splitting technique, since we main focus on the fast Fourier transform for Cross-type matrix in (19).

From (19), it yields

𝒜k[vkwk]=([A(k)0B¯(k)0C¯(k)0D¯(k)]+[0p(k)0q(k)o(k)ζ(k)0ξ(k)0])[vkwokw¯k]=[A(k)vk+B¯(k)w¯k0C¯(k)vk+D¯(k)w¯k]+[p(k)wokq(k)vk+o(k)wok+ζ(k)w¯kξ(k)wok]withwk=[wokw¯k].\begin{split}{\mathcal{A}}_{k}\left[\begin{array}[]{c}v^{k}\\ w^{k}\end{array}\right]=&\left(\left[\begin{array}[]{ccc}A^{(k)}&\textbf{0}&\bar{B}^{(k)}\\ \textbf{0 }^{\prime}&0&\textbf{0 }^{\prime}\\ {\bar{C}^{(k)}}&\textbf{0}&\bar{D}^{(k)}\end{array}\right]+\left[\begin{array}[]{ccc}\textbf{0}&p^{(k)}&\textbf{0}\\ {q^{(k)}}&o^{(k)}&{\zeta^{(k)}}\\ \textbf{0}&\xi^{(k)}&\textbf{0}\end{array}\right]\right)\left[\begin{array}[]{c}v^{k}\\ w^{k}_{o}\\ \bar{w}^{k}\end{array}\right]\\ =&\left[\begin{array}[]{c}A^{(k)}v^{k}+\bar{B}^{(k)}\bar{w}^{k}\\ 0\\ {\bar{C}^{(k)}}v^{k}+\bar{D}^{(k)}\bar{w}^{k}\end{array}\right]+\left[\begin{array}[]{ccc}p^{(k)}w^{k}_{o}\\ q^{(k)}v^{k}+o^{(k)}w^{k}_{o}+\zeta^{(k)}\bar{w}^{k}\\ \xi^{(k)}w^{k}_{o}\end{array}\right]~{}~{}{\rm with}~{}~{}w^{k}=\left[\begin{array}[]{c}w^{k}_{o}\\ \bar{w}^{k}\end{array}\right].\end{split}

Obviously, since A(k)A^{(k)}, B¯(k)\bar{B}^{(k)}, C¯(k){\bar{C}^{(k)}}, D¯(k)\bar{D}^{(k)} are Toeplitz matrices, the computation of A(k)vkA^{(k)}v^{k}, B¯(k)w¯k\bar{B}^{(k)}\bar{w}^{k}, C¯(k)vk{\bar{C}^{(k)}}v^{k}, and D¯(k)w¯k\bar{D}^{(k)}\bar{w}^{k} by FFT needs 𝒪(NklogNk)\mathcal{O}(N_{k}{\rm log}N_{k}), and with required storage 𝒪(Nk)\mathcal{O}(N_{k}). For the Cross matrix, see p(k)p^{(k)}, q(k)q^{(k)}, ζ(k)\zeta^{(k)}, ξ(k)\xi^{(k)} and o(k)o^{(k)}, results in 𝒪(Nk)\mathcal{O}(N_{k}) complexity and storage operations.

Let the stiffness matrix of the coarser level be

(20) 𝒜k1=[A(k1)0B¯(k1)0C¯(k1)0D¯(k1)]Nk×Nk+[0p(k1)0q(k1)o(k1)ζ(k1)0ξ(k1)0]Nk×Nk.\mathcal{A}_{k-1}=\left[\begin{array}[]{ccc}A^{(k-1)}&\textbf{0}&\bar{B}^{(k-1)}\\ \textbf{0 }^{\prime}&0&\textbf{0 }^{\prime}\\ {\bar{C}^{(k-1)}}&\textbf{0}&\bar{D}^{(k-1)}\end{array}\right]_{N_{k}\times N_{k}}+\left[\begin{array}[]{ccc}\textbf{0}&p^{(k-1)}&\textbf{0}\\ {q^{(k-1)}}&o^{(k-1)}&{\zeta^{(k-1)}}\\ \textbf{0}&\xi^{(k-1)}&\textbf{0}\end{array}\right]_{N_{k}\times N_{k}}.

Then using (19), (20), and a Galerkin approach in (18), it is easy to obtain the following results.

Lemma 1.

Let A(k)={ai,j(k)}i,j=1NkA^{(k)}=\{a_{i,j}^{(k)}\}_{i,j=1}^{N_{k}} with ai,j(k)=aji(k)a_{i,j}^{(k)}=a_{j-i}^{(k)} be a Toeplitz matrix in (19). Then the elements of A(k1)A^{(k-1)} in (20) can be computed by

8a0(k1)=a2(k)+4a1(k)+6a0(k)+4a1(k)+a2(k),8a_{0}^{(k-1)}=a_{-2}^{(k)}+4a_{-1}^{(k)}+6a_{0}^{(k)}+4a_{1}^{(k)}+a_{2}^{(k)},

and

8ai(k1)=a2i2(k)+4a2i1(k)+6a2i(k)+4a2i+1(k)+a2i+2(k),8ai(k1)=a2i2(k)+4a2i1(k)+6a2i(k)+4a2i+1(k)+a2i+2(k),i1.\begin{split}8a_{i}^{(k-1)}&=a_{2i-2}^{(k)}+4a_{2i-1}^{(k)}+6a_{2i}^{(k)}+4a_{2i+1}^{(k)}+a_{2i+2}^{(k)},\\ 8a_{-i}^{(k-1)}&=a_{-2i-2}^{(k)}+4a_{-2i-1}^{(k)}+6a_{-2i}^{(k)}+4a_{-2i+1}^{(k)}+a_{-2i+2}^{(k)},~{}~{}~{}i\geq 1.\end{split}

Moreover, for the Cross matrix, the following terms can be defined

8pi(k1)=(aNk2i(k)+2aNk2i1(k)+aNk2i2(k))+2(p2i1(k)+2p2i(k)+p2i+1(k))+(b2i+2(k)+2b2i+1(k)+b2i(k)),8qi(k1)=(aNk+2i(k)+2aNk+2i+1(k)+aNk+2i+2(k))+2(q2i1(k)+2q2i(k)+q2i+1(k))+(c2i2(k)+2c2i1(k)+c2i(k)),8o(k1)=(a0(k)+2pNk1(k)+bNk+2(k))+2(qNk1(k)+2o(k)+ζ1(k))+(cNk2(k)+2ξ1(k)+d0(k)),\begin{split}8p^{(k-1)}_{i}&=\left(a^{(k)}_{N_{k}-2i}+2a^{(k)}_{N_{k}-2i-1}+a^{(k)}_{N_{k}-2i-2}\right)+2\left(p^{(k)}_{2i-1}+2p^{(k)}_{2i}+p^{(k)}_{2i+1}\right)\\ &\quad+\left(b^{(k)}_{-2i+2}+2b^{(k)}_{-2i+1}+b^{(k)}_{-2i}\right),\\ 8q^{(k-1)}_{i}&=\left(a^{(k)}_{-N_{k}+2i}+2a^{(k)}_{-N_{k}+2i+1}+a^{(k)}_{-N_{k}+2i+2}\right)+2\left(q^{(k)}_{2i-1}+2q^{(k)}_{2i}+q^{(k)}_{2i+1}\right)\\ &\quad+\left(c^{(k)}_{2i-2}+2c^{(k)}_{2i-1}+c^{(k)}_{2i}\right),\\ 8o^{(k-1)}&=\left(a^{(k)}_{0}+2p^{(k)}_{N_{k}-1}+b^{(k)}_{-N_{k}+2}\right)+2\left(q^{(k)}_{N_{k}-1}+2o^{(k)}+\zeta^{(k)}_{1}\right)+\left(c^{(k)}_{N_{k}-2}+2\xi^{(k)}_{1}+d^{(k)}_{0}\right),\end{split}

and

8ξi(k1)=(cNk2i(k)+2cNk2i1(k)+cNk2i2(k))+2(ξ2i1(k)+2ξ2i(k)+ξ2i+1(k))+(d2i+2(k)+2d2i+1(k)+d2i(k)),8ζi(k1)=(bNk+2i(k)+2bNk+2i+1(k)+bNk+2i+2(k))+2(ζ2i1(k)+2ζ2i(k)+ζ2i+1(k))+(d2i2(k)+2d2i1(k)+d2i(k)).\begin{split}8\xi^{(k-1)}_{i}&=\left(c^{(k)}_{N_{k}-2i}+2c^{(k)}_{N_{k}-2i-1}+c^{(k)}_{N_{k}-2i-2}\right)+2\left(\xi^{(k)}_{2i-1}+2\xi^{(k)}_{2i}+\xi^{(k)}_{2i+1}\right)\\ &\quad+\left(d^{(k)}_{-2i+2}+2d^{(k)}_{-2i+1}+d^{(k)}_{-2i}\right),\\ 8\zeta^{(k-1)}_{i}&=\left(b^{(k)}_{-N_{k}+2i}+2b^{(k)}_{-N_{k}+2i+1}+b^{(k)}_{-N_{k}+2i+2}\right)+2\left(\zeta^{(k)}_{2i-1}+2\zeta^{(k)}_{2i}+\zeta^{(k)}_{2i+1}\right)\\ &\quad+\left(d^{(k)}_{2i-2}+2d^{(k)}_{2i-1}+d^{(k)}_{2i}\right).\end{split}

3.3 The operation count and storage requirement

We now study the computation complexity by the fast Fourier transform and the required storage for the block-structured dense system (17) in AMG, arising from the nonlocal problems in Section 2.

From (17), we know that the matrix 𝒜h\mathcal{A}_{h} is a block-structured Toeplitz-like system. Then, we only need to store the first (last) column, first (last) row and principal diagonal 𝒜h\mathcal{A}_{h}, which have 𝒪(N)\mathcal{O}(N) parameters, instead of the full matrix 𝒜h\mathcal{A}_{h} with N2N^{2} entries. From Lemmas 1, we know that {Ak}\{A_{k}\} is a Toeplitz-like-plus-Cross matrix with the sizes 2kK𝒪(N)2^{k-K}\mathcal{O}(N) storage. Adding these terms together, we have

Storage=𝒪(N)(1+12+122++12K1)=𝒪(N).\mbox{Storage}=\mathcal{O}(N)\cdot\left(1+\frac{1}{2}+\frac{1}{2^{2}}+\ldots+\frac{1}{2^{K-1}}\right)=\mathcal{O}(N).

Regarding the computational complexity, the matrix-vector product associated with the matrix 𝒜h\mathcal{A}_{h} is a discrete convolution. On the other hand the cost of a direct product is O(N)O(N) for the Cross matrix, while the cost of using the FFT would lead to O(Nlog(N)O(N\log(N) for computations involving a dense Toeplitz matrix as the one in (16). Thus, the total per V-cycle AMG operation count is

Operation count=𝒪(NlogN)(1+12+122++12K1)=𝒪(NlogN).\mbox{Operation count}=\mathcal{O}(N\log N)\cdot\left(1+\frac{1}{2}+\frac{1}{2^{2}}+\ldots+\frac{1}{2^{K-1}}\right)=\mathcal{O}(N\log N).

4 Convergence of TGM for block-structured dense system (11)

The two-grid method (TGM) is rarely used in practice since the coarse grid operator may still be too large to be solved exactly. However, from a theoretical point of view, its study is useful as first step for evaluating the MGM convergence speed, whose analysis usually begins from that of the TGM [4, 3, 35, 37, 38, 39]. In the following we consider the convergence of the TGM for symmetric block-structured dense system (11). Since the matrix 𝒜:=𝒜h:=𝒜K\mathcal{A}:=\mathcal{A}_{h}:=\mathcal{A}_{K} is symmetric positive definite, we can define

(u,v)𝒟=(𝒟u,v),(u,v)𝒜=(𝒜u,v),(u,v)𝒜𝒟1𝒜=(𝒜u,𝒜v)𝒟1(u,v)_{\mathcal{D}}=(\mathcal{D}u,v),\ (u,v)_{\mathcal{A}}=(\mathcal{A}u,v),\ (u,v)_{\mathcal{A}\mathcal{D}^{-1}\mathcal{A}}=(\mathcal{A}u,\mathcal{A}v)_{\mathcal{D}^{-1}}

with 𝒟\mathcal{D} is the diagonal matrix of 𝒜\mathcal{A}.

Lemma 2.

[47] Let 𝒜K\mathcal{A}_{K} be a symmetric positive definite. Then

(1) the Jacobi method converges if and only if 2𝒟K𝒜K2\mathcal{D}_{K}-\mathcal{A}_{K} is symmetric positive definite;

(2) the Damped Jacobi method converges if and only if 0<ω<2/λmax(𝒟K1𝒜K)0<\omega<2/\lambda_{\max}(\mathcal{D}_{K}^{-1}\mathcal{A}_{K}).

Lemma 3.

[37, p. 84] Let 𝒜K\mathcal{A}_{K} be a symmetric positive definite. If ηω(2ωη0)\eta\leq\omega(2-\omega\eta_{0}) with η0λmax(𝒟K1𝒜K)\eta_{0}\geq\lambda_{\max}(\mathcal{D}_{K}^{-1}\mathcal{A}_{K}), then the damped Jacobi iteration with relaxation parameter 0<ω<2/η00<\omega<2/\eta_{0} satisfies

(21) 𝒮KνK𝒜K2νK𝒜K2η𝒜KνK𝒟K12νK𝔐K.||\mathcal{S}_{K}\nu^{K}||_{\mathcal{A}_{K}}^{2}\leq||\nu^{K}||_{\mathcal{A}_{K}}^{2}-\eta||\mathcal{A}_{K}\nu^{K}||_{\mathcal{D}_{K}^{-1}}^{2}\quad\forall\nu^{K}\in\mathfrak{M}^{K}.
Lemma 4.

[37, p. 89] Let 𝒜K\mathcal{A}_{K} be a symmetric positive definite matrix and SKS_{K} satisfies (21) and

(22) minνK1𝔐K1νKIK1KνK1𝒟K2μνK𝒜K2νK𝔐K\min_{\nu^{K-1}\in\mathfrak{M}^{K-1}}||\nu^{K}-I_{K-1}^{K}\nu^{K-1}||_{\mathcal{D}_{K}}^{2}\leq\mu||\nu^{K}||_{\mathcal{A}_{K}}^{2}\quad\forall\nu^{K}\in\mathfrak{M}^{K}

with μ>0\mu>0 independent of νK\nu^{K}. Then, μη>0\mu\geq\eta>0 and the convergence factor of TGM satisfies

𝒮K𝒯K𝒜K1η/μνK𝔐K.||\mathcal{S}_{K}\mathcal{T}_{K}||_{\mathcal{A}_{K}}\leq\sqrt{1-\eta/\mu}\quad\forall\nu^{K}\in\mathfrak{M}^{K}.

Next we need to check the smoothing condition (21) and approximation property (22), respectively. We will use notion called weakly diagonal dominant, if the diagonal element of a matrix is at least as large as the sum of absolute value of off-diagonal elements in the same row or column.

Lemma 5.

[45, p. 23] Let 𝒜KN×N\mathcal{A}_{K}\in\mathbb{R}^{N\times N} be a symmetric matrix. If 𝒜k\mathcal{A}_{k} is a strictly diagonally dominant or irreducibly weakly diagonally dominant matrix with positive real diagonal entries, then 𝒜K\mathcal{A}_{K} is positive definite.

Lemma 6.

Let 𝒜K:=𝒜hS\mathcal{A}_{K}:=\mathcal{A}^{S}_{h} be defined by (11). Then 𝒜K\mathcal{A}_{K} is a weakly diagonally dominant symmetric matrix with positive entries on the diagonal and nonpositive off-diagonal entries.

Proof.

From (12), we have

a0>0,am<0,1mrandam+12<0,0mr1,a_{0}>0,\quad a_{m}<0,~{}~{}1\leq m\leq r\quad{\rm and}\quad a_{m+\frac{1}{2}}<0,~{}~{}0\leq m\leq r-1,

and

a0+2m=1ram+2m=0r1am+12=0.a_{0}+2\sum_{m=1}^{r}a_{m}+2\sum_{m=0}^{r-1}a_{m+\frac{1}{2}}=0.

The proof is completed. ∎

We known that the matrix 𝒜K\mathcal{A}_{K} in (11) is reducible, since its graph is not connected [38, p. 91], which may lead to the matrix 𝒜K\mathcal{A}_{K} is singular or semi-positive definite, see Lemmas 5 and 6.

Lemma 7.

Let 𝒜K:=𝒜hS\mathcal{A}_{K}:=\mathcal{A}^{S}_{h} be defined by (11). Then 𝒜K\mathcal{A}_{K} is a symmetric positive definite matrix with positive entries on the diagonal and nonpositive off-diagonal entries.

Proof.

Let LN=tridiag(1,2,1)N×NL_{N}={\rm tridiag}(-1,2,-1)\in\mathbb{R}^{N\times N} be the discrete Laplacian operator. Let

(23) 𝒜K=𝒜resa1𝒜mainwith𝒜main=[LN1OOLN].\mathcal{A}_{K}=\mathcal{A}_{\rm res}-a_{1}\mathcal{A}_{\rm main}~{}~{}{\rm with}~{}~{}\mathcal{A}_{\rm main}=\left[\begin{array}[]{cc}L_{N-1}&O\\ O&L_{N}\end{array}\right].

We can check the matrix a1𝒜main-a_{1}\mathcal{A}_{\rm main} with a1<0a_{1}<0 is positive definite. We next prove 𝒜res\mathcal{A}_{\rm res} is a semi-positive definite matrix. From (23), it implies that the principal diagonal elements are positive and the off-diagonal are non-positive of the matrix 𝒜res\mathcal{A}_{\rm res}. Then 𝒜res\mathcal{A}_{\rm res} is still a diagonally dominant symmetric matrix. Thus, 𝒜res\mathcal{A}_{\rm res} is a semi-positive matrix by the Gerschgorin circle theorem [29, p. 388]. The proof is completed. ∎

Lemma 8.

Let the discrete Laplacian-like operators {LjM}j=1M1M×M\{L^{M}_{j}\}_{j=1}^{M-1}\in\mathds{R}^{M\times M} and discrete block-structured Laplacian operators be, respectively, defined by

LjM=[2j1 zeros 11112]M×Mandj=[LjM𝟎𝟎LjN].L^{M}_{j}=\left[\begin{matrix}2&\overset{j-1\text{ zeros }}{\overbrace{\cdots}}&-1&&\\ \ddots&\ddots&\ddots&\ddots&\\ -1&\ddots&\ddots&\ddots&-1\\ &\ddots&\ddots&\ddots&\ddots\\ &&-1&\ddots&2\end{matrix}\right]_{M\times M}~{}~{}{\rm and}~{}~{}\mathcal{L}_{j}=\left[\begin{array}[]{cc}{L}^{M}_{j}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{j}\end{array}\right].

Then the matrices

(2lj=1lLjM)L1M,(2lj=1lj)1and21L1M+N\left(\frac{2}{l}\sum\limits_{j=1}^{l}L^{M}_{j}\right)-L^{M}_{1},~{}~{}~{}~{}\left(\frac{2}{l}\sum\limits_{j=1}^{l}\mathcal{L}_{j}\right)-\mathcal{L}_{1}~{}~{}{\rm and}~{}~{}2\mathcal{L}_{1}-L^{M+N}_{1}

are positive definite.

Proof.

The first results of this lemma can be seen in Lemma 3.10 of [17], which implies that the second one is also satisfied, since

(2lj=1lj)1=2lj=1l[LjM𝟎𝟎LjN][L1M𝟎𝟎L1N]=[(2lj=1lLjM)L1M𝟎𝟎(2lj=1lLjN)L1N].\begin{split}\left(\frac{2}{l}\sum\limits_{j=1}^{l}\mathcal{L}_{j}\right)-\mathcal{L}_{1}&=\frac{2}{l}\sum^{l}_{j=1}\left[\begin{array}[]{cc}{L}^{M}_{j}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{j}\end{array}\right]-\left[\begin{array}[]{cc}{L}^{M}_{1}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{1}\end{array}\right]\\ &=\left[\begin{array}[]{cc}\left(\frac{2}{l}\sum^{l}_{j=1}{L}^{M}_{j}\right)-{L}^{M}_{1}&~{}\mathbf{0}\\ \mathbf{0}&~{}\left(\frac{2}{l}\sum^{l}_{j=1}L^{N}_{j}\right)-L^{N}_{1}\end{array}\right].\end{split}

On the other hand, we can check that 21L1M+N2\mathcal{L}_{1}-L^{M+N}_{1} is an irreducible and weakly diagonally dominant symmetric matrix, which means that it is positive definite by Lemma 5. The proof is completed. ∎

Remark 4.1.

In order to understand the spectral features of the matrices considered in Lemma 8, we can adopt the analysis via the related generating functions, since all the matrices in Lemma 8 are of real symmetric banded Toeplitz type, so the admit real-valued trigonometric polynomials as generating functions (see [39] and references therein): on the other hand the matrices j\mathcal{L}_{j} are block diagonal and hence their spectral analysis reduces to the Toeplitz setting. For instance, according to the notation in [39], LjM=TM(22cos(jθ))L^{M}_{j}=T_{M}(2-2\cos(j\theta)) that is the function 22cos(jθ)2-2\cos(j\theta) is the generating function of LjML^{M}_{j}. From classical results, we know that TM(f)T_{M}(f) is positive definite for any matrix-size MM if ff is essentially bounded and nonnegative, with positive essential supremum. In the present setting, the maximum of 22cos(jθ)2-2\cos(j\theta) is 4 and its minimum is zero and hence LjM=TM(22cos(jθ))L^{M}_{j}=T_{M}(2-2\cos(j\theta)) is positive definite. Not only this: if the nonnegative generating function ff has a unique zero of order α>0\alpha>0 then the minimal eigenvalue of TM(f)T_{M}(f) is positive converges monotonically to zero as c/Mαc/M^{\alpha} with cc depending on the second derivative of ff at the zero if it is positive. Based on these tools, we can deduce that LjML^{M}_{j} is positive definite, has minimal eigenvalue positive converging monotonically to zero as cj/M2c_{j}/M^{2} with positive cjc_{j} independent of MM, and cjc_{j} related to the second derivative of 22cos(jθ)2-2\cos(j\theta) at θ=0\theta=0.

Of course, by linearity, the Toeplitz matrix (2lj=1lLjM)L1M\left(\frac{2}{l}\sum\limits_{j=1}^{l}L^{M}_{j}\right)-L^{M}_{1} has generating function given by fl(θ)(2lj=1l[22cos(jθ)])(22cos(θ))f_{l}(\theta)\equiv\left(\frac{2}{l}\sum\limits_{j=1}^{l}{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\left[2-2\cos(j\theta)\right]}\right)-(2-2\cos(\theta)) and hence by studying this generating function, we deduce that

(2lj=1lLjM)L1M\left(\frac{2}{l}\sum\limits_{j=1}^{l}L^{M}_{j}\right)-L^{M}_{1}

is positive definite, has minimal eigenvalue positive converging monotonically to zero as c/M2c/M^{2} with positive cc independent of MM. Hence its condition number grow exactly as M2M^{2} and since the related generating function has a unique zero of order 22 at θ=0\theta=0, there is a formal justification in using standard projectors and restriction operators, like those employed in the standard AMG, for the classical discrete Laplacian (see [28, 3, 39] and references therein).

Lemma 9.

Let 𝒜K:=𝒜hS\mathcal{A}_{K}:=\mathcal{A}^{S}_{h} be defined by (11). Then the damped Jacobi iteration with relaxation parameter 0<ω10<\omega\leq 1 satisfies

SKνK𝒜K2νK𝒜K212𝒜KνK𝒟K12νK𝔐K.||S_{K}\nu^{K}||_{\mathcal{A}_{K}}^{2}\leq||\nu^{K}||_{\mathcal{A}_{K}}^{2}-\frac{1}{2}||\mathcal{A}_{K}\nu^{K}||_{\mathcal{D}_{K}^{-1}}^{2}\quad\forall\nu^{K}\in\mathfrak{M}^{K}.
Proof.

According to (11) and (12), we know that the matrix 𝒜K\mathcal{A}_{K} is a weakly diagonal dominant, since

a0+2m=1ram+2m=0r1am+12=0.a_{0}+2\sum_{m=1}^{r}a_{m}+2\sum_{m=0}^{r-1}a_{m+\frac{1}{2}}=0.

Taking 𝒜K={ai,j(K)}i,j=1\mathcal{A}_{K}=\left\{a_{i,j}^{(K)}\right\}_{i,j=1}^{\infty} with ai,j(K)=a|ji|(K)a_{i,j}^{(K)}=a_{|j-i|}^{(K)}, it yields

ri(K):=ji|ai,j(K)|ai,i(K)=a0.r_{i}^{(K)}:=\sum\limits_{j\neq i}|a_{i,j}^{(K)}|\leq a_{i,i}^{(K)}=a_{0}.

Using the Gerschgorin circle theorem [29, p. 388], the eigenvalues of 𝒜K\mathcal{A}_{K} are in the disks centered at ai,i(K)a_{i,i}^{(K)} with radius ri(K)r_{i}^{(K)}. Namely, the eigenvalues λ\lambda of the matrix 𝒜K\mathcal{A}_{K} satisfy

|λai,i(K)|ri(K),|\lambda-a_{i,i}^{(K)}|\leq r_{i}^{(K)},

which leads to λmax(𝒜K)ai,i(K)+ri(K)<2ai,i(K)=2a1,1(K)=2a0\lambda_{\max}(\mathcal{A}_{K})\leq a_{i,i}^{(K)}+r_{i}^{(K)}<2a_{i,i}^{(K)}=2a_{1,1}^{(K)}=2a_{0}.

From Rayleigh theorem [29, p. 235], we deduce

λmax(𝒜K)=maxx0xT𝒜KxxTxxn.\lambda_{\max}(\mathcal{A}_{K})=\max_{x\neq 0}\frac{x^{T}\mathcal{A}_{K}x}{x^{T}x}\quad\forall x\in\mathbb{{R}}^{n}.

Take x=[1,0,,0]Tx=[1,0,\ldots,0]^{T}. Then

λmax(𝒜K)xT𝒜KxxTx=a1,1(K)=a0,\lambda_{\max}(\mathcal{A}_{K})\geq\frac{x^{T}\mathcal{A}_{K}x}{x^{T}x}=a_{1,1}^{(K)}=a_{0},

and

λmax((𝒟K)1𝒜K)=λmax(𝒜K)a1,1(K)=λmax(𝒜K)a0[1,2],\lambda_{\max}\left(\left(\mathcal{D}_{K}\right)^{-1}\mathcal{A}_{K}\right)=\frac{\lambda_{\max}(\mathcal{A}_{K})}{a_{1,1}^{(K)}}=\frac{\lambda_{\max}(\mathcal{A}_{K})}{a_{0}}\in[1,2],

where 𝒟K\mathcal{D}_{K} is the diagonal of 𝒜K\mathcal{A}_{K}.

From Lemma 2, the Damped Jacobi method converges with 0<ω<10<\omega<1. By the similar proof process of Lemma 7, we can check that 2𝒟K𝒜K2\mathcal{D}_{K}-\mathcal{A}_{K} is symmetric positive definite. Then using Lemma 2 again, the Jacobi method converges. Hence, the damped Jacobi iteration with relaxation parameter 0<ω10<\omega\leq 1 converges. The proof is completed. ∎

Lemma 10.

Let 𝒜K=𝒜h\mathcal{A}_{K}=\mathcal{A}_{h} be defined by (11). Then

minνK1𝔐K1νKIK1KνK1𝒟K2124νK𝒜K2νK𝔐K.\min_{\nu^{K-1}\in\mathfrak{M}^{K-1}}||\nu^{K}-I_{K-1}^{K}\nu^{K-1}||_{\mathcal{D}_{K}}^{2}\leq\frac{1}{24}||\nu^{K}||_{\mathcal{A}_{K}}^{2}\quad\forall\nu^{K}\in\mathfrak{M}^{K}.
Proof.

Let discrete block-structured Laplacian-like operators be defined by

(24) j=[LjN1𝟎𝟎LjN]=[LjN1𝟎𝟎𝟎]+[𝟎𝟎𝟎LjN],j=1,2,,r,\mathcal{L}_{j}=\left[\begin{array}[]{cc}{L}^{N-1}_{j}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{j}\end{array}\right]=\left[\begin{array}[]{cc}{L}^{N-1}_{j}&~{}\mathbf{0}\\ \mathbf{0}&~{}\mathbf{0}\end{array}\right]+\left[\begin{array}[]{cc}\mathbf{0}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{j}\end{array}\right],~{}~{}j=1,2,\dots,r,

where the discrete Laplacian-like operators {LjN1}j=1r(N1)×(N1)\{L^{N-1}_{j}\}^{r}_{j=1}\in\mathds{R}^{(N-1)\times(N-1)}, {LjN}j=1rN×N\{L^{N}_{j}\}^{r}_{j=1}\in\mathds{R}^{N\times N} are given in Lemma 8.

From (11), the block-structured dense matrix 𝒜K\mathcal{A}_{K} can be denoted as the series sum with Laplacian-like operators j\mathcal{L}_{j}, i.e.,

(25) 𝒜K=[ABBTA^]=j=1rajj+and=(a0+2j=1raj)I+[𝟎BBT𝟎]\mathcal{A}_{K}=\left[\begin{array}[]{cc}A&B\\ B^{T}&\widehat{A}\end{array}\right]=-\sum^{r}_{j=1}a_{j}\mathcal{L}_{j}+\mathcal{B}~{}~{}{\rm and}~{}~{}\mathcal{B}=\left(a_{0}+2\sum^{r}_{j=1}a_{j}\right)I+\left[\begin{array}[]{cc}\mathbf{0}&~{}B\\ B^{T}&~{}\mathbf{0}\end{array}\right]

with II an identity matrix.

Using (12), it yields a0+2m=1ram+2m=0r1am+12=0,a_{0}+2\sum_{m=1}^{r}a_{m}+2\sum_{m=0}^{r-1}a_{m+\frac{1}{2}}=0, which implies that \mathcal{B} is a weakly diagonally dominant symmetric matrix with positive entries on the diagonal and nonpositive off-diagonal entries. Then \mathcal{B} is symmetric and semi-positive definite by applying the Gerschgorin circle theorem.

From (25), (12) and Lemma 8, we obtain

νK𝒜K=(𝒜KνK,νK)a12(j=1rjνK,νK)ra14(1νK,νK)ra18(L12N1νK,νK)a048(L12N1νK,νK)a024νKIK1KνK2=124νKIK1KνJ𝒟K2.\begin{split}\left|\left|\nu^{K}\right|\right|_{\mathcal{A}_{K}}&=\left(\mathcal{A}_{K}\nu^{K},\nu^{K}\right)\geq-\frac{a_{1}}{2}\left(\sum^{r}_{j=1}\mathcal{L}_{j}\nu^{K},\nu^{K}\right)\geq-\frac{ra_{1}}{4}\left(\mathcal{L}_{1}\nu^{K},\nu^{K}\right)\\ &\geq-\frac{ra_{1}}{8}\left(L^{2N-1}_{1}\nu^{K},\nu^{K}\right)\geq\frac{a_{0}}{48}\left(L^{2N-1}_{1}\nu^{K},\nu^{K}\right)\geq\frac{a_{0}}{24}\left|\left|\nu^{K}-I^{K}_{K-1}\nu^{K}\right|\right|^{2}\\ &=\frac{1}{24}\left|\left|\nu^{K}-I^{K}_{K-1}\nu^{J}\right|\right|^{2}_{\mathcal{D}_{K}}.\end{split}

The proof is completed. ∎

Theorem 11.

Let 𝒜K=𝒜h\mathcal{A}_{K}=\mathcal{A}_{h} be defined by (11). Then the convergence factor of the TGM satisfies

SKTK𝒜K47/48<1.\left|\left|S_{K}T_{K}\right|\right|_{\mathcal{A}_{K}}\leq\sqrt{47/48}<1.
Proof.

According to Lemma 3, 4, 9 and 10, the desired result is obtained. ∎

5 Numerical results

We employ the V-cycle block-structured AMG described in Algorithm 2 to solve the time-dependent nonlocal problems in Section 2. The stopping criterion is taken as

r(i)r0<1015,\frac{||r^{(i)}||}{||r^{0}||}<10^{-15},

where r(i)r^{(i)} is the residual vector after ii iterations, and the number of iterations (m1,m2)=(1,1)(m_{1},m_{2})=(1,1) and (wpre,wpost)=(1,1/2)(w_{pre},w_{post})=(1,1/2). In all tables, NN denotes the number of spatial grid points, and the numerical errors are measured by the ll_{\infty} (maximum) norm, ”rate” denotes the convergence orders, ”CPU” denotes the total CPU time in seconds (s) for solving the discretized systems, ”Iter” denotes the average number of iterations required to solve algebraic systems at each time level.

All numerical experiments are programmed in Matlab, and the computations are carried out on a desktop with the configuration: Intel(R) Core(TM) i7-7700 3.60 GHZ and 8 GB RAM and a 64 bit Windows 10 operating system.

First we consider the time-dependent nonlocal/peridynamic models (3) and (5) in Section 2. The initial value and the forcing term are chosen such that the exact solution of the considered equations is

u(x,t)=et(1+x)6,0x1, 0t1.u(x,t)=e^{t}\left(1+x\right)^{6},\quad 0\leqslant x\leqslant 1,\ 0\leqslant t\leqslant 1.

We apply the following BDF4 method to such nonlocal models

(26) BDF4scheme:(2512Iτ𝒜)Uk=4Uk13Uk2+43Uk314Uk4+τFk.\begin{split}{\rm BDF4~{}scheme}:&\left(\frac{25}{12}I-\tau\mathcal{A}\right)U^{k}=4U^{k-1}-3U^{k-2}+\frac{4}{3}U^{k-3}-\frac{1}{4}U^{k-4}+\tau F^{k}.\end{split}

Here the operator 𝒜\mathcal{A} denotes the block-structured systems (4), (10) and (11), respectively. It should be noted that the stability and convergence analysis of time-dependent nonlocal problem (3) and (5) can be seen in [1, 20, 19, 11]. The convergence rate of the two-grid method for time-dependent block-structured systems (11) can be directly obtained by Theorem 11 and [22]: here we omit the related derivation.

5.1 Application in nonlocal model

Table 1 shows that the BDF4 scheme (26) for time-dependent nonlocal model (3) has the global truncation error 𝒪(τ4+h4γ)\mathcal{O}\left(\tau^{4}+h^{4-\gamma}\right) and the computation cost is almost 𝒪(NlogN)\mathcal{O}(N\mbox{log}N) operations.

Table 1: Using Galerkin approach 𝒜k1=Ikk1𝒜kIk1k\mathcal{A}_{k-1}=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1} in (18) computed by Lemma 1 to solve the time-dependent nonlocal model (3) with τ=h=1/N\tau=h=1/N.
NN γ=0\gamma=0 γ=0.5\gamma=0.5 γ=0.9\gamma=0.9
Error Rate CPU Iter Error Rate CPU Iter Error Rate CPU Iter
252^{5} 1.4460e-05 0.11s 3 1.9587e-05 0.1167s 3 3.4809e-05 0.14s 4
262^{6} 9.7263e-07 3.89 0.39s 3 1.5160e-06 3.6916 0.4236s 3 3.5907e-06 3.28 0.47s 4
272^{7} 6.2993e-08 3.95 0.77s 2 1.1678e-07 3.6984 0.9123s 3 3.7667e-07 3.25 1.10s 3
282^{8} 3.9959e-09 3.98 2.26s 2 9.1285e-09 3.6772 2.1589s 2 4.0423e-08 3.22 2.94s 3

5.2 Application in Peridynamic model

We further extend the V-cycle block-structured AMG Algorithm 2 to simulate the peridynamic model with nonsymmetric indefinite block-structured dense systems and symmetric positive definite block-structured dense systems, respectively.

5.2.1 Nonsymmetric indefinite block-structured dense systems

Table 2 shows that the BDF4 scheme (26) for time-dependent peridynamic model (3) with nonsymmetric indefinite block-structured dense systems has the global truncation error 𝒪(τ4+hmax{2,42β})\mathcal{O}\left(\tau^{4}+h^{\max\left\{2,4-2\beta\right\}}\right) with δ=𝒪(hβ)\delta=\mathcal{O}\left(h^{\beta}\right), β0\beta\geq 0 and a computational cost of 𝒪(NlogN)\mathcal{O}(N\mbox{log}N) arithmetic operations.

Table 2: Nonsymmetric indefinite block-structured dense systems (10): Using Galerkin approach 𝒜k1=Ikk1𝒜kIk1k\mathcal{A}_{k-1}=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1} in (18) computed by Lemma 1 to solve the time-dependent peridynamic model (3) with and τ=h=1/N\tau=h=1/N.
NN δ=1/4\delta=1/4 δ=h\delta=\sqrt{h}
Error Rate CPU Iter Error Rate CPU Iter
BDF4 252^{5} 4.3254e-05 0.1460s 7 9.9042e-05 0.2276s 10
262^{6} 2.7166e-06 3.9930 0.3628s 6 9.3791e-06 3.4005 0.4737s 8
272^{7} 1.7022e-07 3.9963 0.6684s 5 1.1886e-06 2.9802 1.5029s 8
282^{8} 1.0652e-08 3.9981 2.0338s 4 1.3694e-07 3.1177 3.5816s 7

5.2.2 Symmetric positive definite block-structured dense systems

Table 3 shows that the BDF4 scheme (26) for time-dependent peridynamic model (3) with symmetric block-structured dense systems has the global truncation error 𝒪(τ4+hmax{2,42β})\mathcal{O}\left(\tau^{4}+h^{\max\left\{2,4-2\beta\right\}}\right) with δ=𝒪(hβ)\delta=\mathcal{O}\left(h^{\beta}\right), β0\beta\geq 0 and a computational cost of 𝒪(NlogN)\mathcal{O}(N\mbox{log}N) arithmetic operations.

Table 3: Symmetric positive definite block-structured dense systems (11): Using Galerkin approach 𝒜k1=Ikk1𝒜kIk1k\mathcal{A}_{k-1}=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1} in (18) computed by Lemma 1 to solve the time-dependent peridynamic model (3) with and τ=h=1/N\tau=h=1/N.
NN δ=1/4\delta=1/4 δ=h\delta=\sqrt{h}
Error Rate CPU Iter Error Rate CPU Iter
BDF4 252^{5} 1.1628e-05 0.2341s 9 2.5257e-05 0.2987s 12
262^{6} 7.3840e-07 3.9771 0.4819s 7 2.3810e-06 3.4070 0.5999s 10
272^{7} 4.6514e-08 3.9887 1.0638s 6 2.9930e-07 2.9919 2.0004s 10
282^{8} 2.9182e-09 3.9945 2.6782s 5 3.4366e-08 3.1226 4.6659s 9

6 Conclusions

In this paper, we considered the solutions of block-structured dense and Toeplitz-like-plus-Cross systems arising from nonlocal diffusion problem and peridynamic problem. We designed a AMG for block-structured dense and Toeplitz-like-plus-Cross systems, by making also use of fast Fourier transforms, and we provided an estimate of the TGM convergence rate for the peridynamic problem with symmetric positive definite block-structured dense linear systems. In this specific context, we answered the question on how to define coarsening and interpolation operators, when the stiffness matrix leads to nonsymmetric systems [10, 31]. The simple (traditional) restriction operator and prolongation operator are employed for such Toeplitz-like-plus-Cross systems, so that the entries of the sequence of subsystems are explicitly determined on different levels.

For the future, at least two questions arise and we plan to investigate them: more precisely we would like to consider the study of the TGM convergence analysis for nonsymmetric block-structured dense systems and the analysis of the full MGM for symmetric block-structured dense systems, based on the ideas presented in [15, 17].

References

  • [1] G. Akrivis, M. H. Chen, F. Yu, and Z. Zhou, The energy technique for the six-step BDF method, SIAM J. Numer. Anal., 59 (2021), pp. 2449–2472.
  • [2] F. Andreu-Vaillo, J. M. Mazón, J. D. Rossi, and J. J. Toledo-Melero, Nonlocal Diffusion Problems, Math. Surveys Monogr. 165, AMS, Providence, RI, 2010.
  • [3] A. Aricò and M. Donatelli, A V-cycle multigrid for multilevel matrix algebras: proof of optimality, Numer. Math., 105 (2007), pp. 511–547.
  • [4] A. Aricò, M. Donatelli, and S. Serra-Capizzano, V-cycle optimal convergence for certain (multilevel) structured linear systems, SIAM J. Matrix Anal. Appl., 26 (2004), pp. 186–214.
  • [5] K. E. Atkinson, The Numerical Solution of Integral Equations of the Second Kind, Cambridge University Press, New York, 2009.
  • [6] P. Bates, On some nonlocal evolution equations arising in materials science In: H. Brunner, X. Zhao and X. Zou (eds.) Nonlinear Dynamics and Evolution Equations. in Fields Inst. Commun. AMS, Providence, RI, (2006), pp. 13–52.
  • [7] M. Benzi, G. H. Golub, and J. Liesen, Numerical solution of saddle point problems, Acta Numer., 14 (2005), pp. 1–137.
  • [8] M. Bolten, M. Donatelli, T. Huckle, and C. Kravvaritis, Generalized grid transfer operators for multigrid methods applied on Toeplitz matrices, BIT., 55 (2015), pp. 341–366.
  • [9] A. Böttcher and S. M. Grudsky, Spectral Properties of Banded Toeplitz Matrices, SIAM, Philadelphia, PA, 2005.
  • [10] M. Brezina, T. Manteuffel, S. Mccormick, J. Ruge, and G. Sanders, Towards adaptive smoothed aggregation (α\alphaSA) for nonsymmetric problems, SIAM J. Sci. Comput., 32 (2010), pp. 14–39.
  • [11] R. J. Cao, M. H. Chen, M. K. Ng, and Y. J. Wu, Fast and high-order accuracy numerical methods for time-dependent nonlocal problems in 2\mathbb{R}^{2}, J. Sci. Comput., 84 (2020), Paper No.8.
  • [12] R. H. Chan and X. Q. Jin, An Introduction to Interative Toeplitz Solvers, SIAM, Phildelphia, 2007.
  • [13] R. H. Chan and M. K. Ng, Conjugate gradient methods for Toeplitz systems, SIAM Rev. 38 (1996), pp. 427–482.
  • [14] R. H. Chan, Q. S. Chang, and H. W. Sun, Multigrid method for ill-conditioned symmetric Toeplitz systems, SIAM J. Sci. Comput., 19 (1998), pp. 516–529.
  • [15] M. H. Chen, S. E. Ekström, S. Serra-Capizzano, A Multigrid method for nonlocal problems: non-diagonally dominant or Toeplitz-plus-tridiagonal systems, SIAM J. Matrix Anal. Appl., 41 (2020), pp. 1546–1570.
  • [16] M. H. Chen and W. H. Deng, Fourth order accurate scheme for the space fractional diffusion equations, SIAM J. Numer. Anal., 52 (2014), pp. 1418–1438.
  • [17] M. H. Chen, and W. H. Deng, Convergence analysis of a Multigrid method for a nonlocal model, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 869–890.
  • [18] M. H. Chen, and W. H. Deng, and S. Serra-Capizzano, Uniform convergwnce of V-cycle Multigrid algorithms for two-dimensional fractional Feynman-Kac equation, J. Sci. Comput., 74 (2018), pp. 1034–1059.
  • [19] M. H. Chen, W. Y. Qi, J. K. Shi and J. M. Wu, A sharp error estimate of piecewise polynomial collocation for nonlocal problems with weakly singular kernels, IMA J. Numer. Anal., 41 (2021), pp. 3145–3174.
  • [20] M. H. Chen, Y. F. Qi, and J. K. Shi, Asymptotically compatible piecewise quadratic polynomial collocation for nonlocal model, arXiv:2010.09215
  • [21] M. H. Chen, J. K. Shi, and X. B. Yin, Analysis of (shifted) piecewise quadratic polynomial collocation for nonlocal diffusion model, Under Review
  • [22] M. H. Chen, Y. T. Wang, X. Cheng, and W. H. Deng, Second-order LOD multigrid method for multidimensional Riesz fractional diffusion equation, BIT, 54 (2014), pp. 623–647.
  • [23] M. D’Elia, Q. Du, C. Glusa, M. Gunzburger, X. C. Tian, and Z. Zhou, Numerical methods for nonlocal and fractional models, Acta Numer., 29 (2020), pp. 1–124.
  • [24] M. Donatelli, P. Ferrari, I. Furci, S. Serra-Capizzano, and D. Sesana, Multigrid methods for block-Toeplitz linear systems: convergence analysis andapplications, Numer. Linear Algebra. Appl., 28 (2021), No. e2356
  • [25] M. Donatelli, M. Mazza, and S. Serra-Capizzano, Spectral analysis and multigrid methods for finite volume approximations of space-fractional diffusion equations, SIAM J. Sci. Comput., 40 (2018), pp. A4007–A4039.
  • [26] Q. Du, Nonlocal Modeling, Analysis, and Computation, SIAM, Philadelphia, 2019.
  • [27] Q. Du, M. Gunzburger, R. Lehoucq, and K. Zhou, Analysis and approximation of nonlocal diffusion problems with volume constraints, SIAM Rev., 56 (2012), pp. 676–696.
  • [28] G. Fiorentino and S. Serra, Multigrid methods for symmetric positive definite block Toeplitz matrices with nonnegative generating functions, SIAM J. Sci. Comput., 17 (1996), pp. 1068–1081.
  • [29] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, New York, 2013.
  • [30] Y. Leng, X. Tian, N. Trask, and J. T. Foster, Asymptotically compatible reproducing kernel collocation and meshfree integration for nonlocal diffusion, SIAM J. Numer. Anal., 59 (2021), pp. 88–118.
  • [31] T. Manteuffel and S. Southworth, Convergence in norm of nonsymmetric algebraic multigrid, SIAM J. Sci. Comput., 41 (2019), pp. S269-S296.
  • [32] M. K. Ng, J. Y. Pan, Weighted Toeplit regularized least squares computation for image restoration, SIAM J. Sci.Comput., 36 (2014), pp. B94-B121.
  • [33] Y. Notay, A new algebraic multigrid approach for Stokes problems, Numer. Math., 132 (2016), pp. 51-84.
  • [34] D. Palitta and V. Simoncini, Matrix-equation-based strategies for convection-diffusion equations, BIT Numer. Math., 56 (2016), pp. 751–776.
  • [35] H. Pang and H. Sun, Multigrid method for fractional diffusion equations, J. Comput. Phys., 231 (2012), pp. 693–703.
  • [36] W. Y. Qi, P. Seshaiyer, J. P. Wang,, Finite element method with the total stress variable for Biot’s consolidation model, Numer. Methods Partial Differential Eq., 37 (2021), pp. 2409-2428.
  • [37] J. Ruge and K. Stüben, Algebraic multigrid, in Multigrid Methods, Ed: S. McCormick, SIAM, 1987, pp. 73–130.
  • [38] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM , Philadelphia, 2003.
  • [39] S. Serra-Capizzano, Convergence analysis of two-grid methods for elliptic Toeplitz and PDEs matrix-sequences, Numer. Math., 92 (2002), pp. 433–465.
  • [40] S. A. Silling, Reformulation of elasticity theory for discontinuities and long-range forces, J. Mech. Phys. Solids, 48 (2000), pp. 175–209.
  • [41] V. Simoncini, On the numerical solution of a class of systems of linear matrix equations, IMA J. Numer. Anal., 40 (2020), pp. 207-225.
  • [42] A. A. Sivas, B. S. Southworth, and S. Rhebergen, Air algebraic multigrid for a space-time hybridizable discontinuous Garlerkin discretization of advection(-diffusion), SIAM J. Sci. Comput., 43 (2021), pp. A3393-A3416.
  • [43] W. J. Stewart, Introduction to the Numerical Soluton of Markov Chains, Princeton Univ. Press, 1994.
  • [44] X. C. Tian and Q. Du, Analysis and comparison of different approximations to nonlocal diffusion and linear peridynamic equations, SIAM J. Numer. Anal., 51 (2013), pp. 3458–3482.
  • [45] R. V. Varga, Matrix Iterative Analysis, Springer, Berlin Heidelberg, 2000.
  • [46] MT. A. Wiesner, M. Mayr, A. Popp, M. W. Gee, and W. A. Wall, Algebraic multigrid methods for saddle point systems arising from mortar contact formulations, Int. J. Numer. Methods Eng., 122 (2021), pp. 3749–3779.
  • [47] J. Xu, An introduction to multilevel methods, in wavelets, Iterative Methods for Sparse Linear Systems, Multilevel Methods and Elliptic PDEs, Leicester, 1996, M.Ainsworth, J. Levesley, W. A. Light, and M. Marletta, eds., Oxford Universty Press, New York, 1997, pp. 213-302.
  • [48] J. Xu and L. Zikatanov, Algebraic multigrid methods, Acta. Numerica., 26 (2017), pp. 591–721.