Fast algebraic multigrid for block-structured dense and Toeplitz-like-plus-Cross systems arising from nonlocal diffusion problems ^†^†thanks: This work was supported by NSFC 11601206.

Minghua Chen Corresponding author. School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou 730000, P.R. China (Email: [email protected]) Rongjun Cao School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou 730000, P.R. China (Email: [email protected]) Stefano Serra-Capizzano Department of Science and High Technology, University of Insubria, Via Valleggio 11, 22100 Como, Italy

\&

Department of Information Technology, Division of Scientific Computing, Uppsala University - ITC, L ägerhyddsv. 2, hus 2, P.O. Box 337, SE-751 05, Uppsala, Sweden (Email: [email protected], [email protected])

Abstract

Algebraic multigrid (AMG) is one of the most efficient iterative methods for solving large sparse system of equations. However, how to build/check restriction and prolongation operators in practical of AMG methods for nonsymmetric sparse systems is still an interesting open question [Brezina, Manteuffel, McCormick, Runge, and Sanders, SIAM J. Sci. Comput. (2010); Manteuffel and Southworth, SIAM J. Sci. Comput. (2019)]. This paper deals with the block-structured dense and Toeplitz-like-plus-Cross systems, including nonsymmetric indefinite, symmetric positive definite (SPD), arising from nonlocal diffusion problem and peridynamic problem. The simple (traditional) restriction operator and prolongation operator are employed in order to handle such block-structured dense and Toeplitz-like-plus-Cross systems, which is convenient and efficient when employing a fast AMG. We focus our efforts on providing the detailed proof of the convergence of the two-grid method for such SPD situations. The numerical experiments are performed in order to verify the convergence with a computational cost of only $\mathcal{O}(N\mbox{log}N)$ arithmetic operations, by using few fast Fourier transforms, where $N$ is the number of the grid points. To the best of our knowledge, this is the first contribution regarding Toeplitz-like-plus-Cross linear systems solved by means of a fast AMG.

keywords:

Algebraic multigrid, nonlocal diffusion problem, peridynamic problem, block-structured dense system, Toeplitz-like-plus-Cross system, fast Fourier transform

AMS:

65M55,74A70, 65T50

1 Introduction

Large, sparse, block-structured linear systems arise in a wide variety of applications throughout computational science and engineering involving advection-diffusion flow [42], image process [32], Markov chains [43], Biot’s consolidation model [36], Navier-Stocks equations and saddle point problems [7]. In this paper we study the fast algebraic multigrid for solving the block-structured dense linear systems, stemming from nonlocal problems [2, 11, 19, 26] by the piecewise quadratic polynomial collocation approximations, whose associated matrix can be expressed as a $2\times 2$ block structure

(1)

\mathcal{A}u=\left(\begin{array}[]{cc}A&B\\ C&D\end{array}\right)\left(\begin{array}[]{c}v\\ w\end{array}\right)=\left(\begin{array}[]{c}b_{f}\\ b_{g}\end{array}\right),

with the coefficient matrices $A\in\mathbb{R}^{M\times M}$ , $B\in\mathbb{R}^{M\times N}$ , $C\in\mathbb{R}^{N\times M}$ and $D\in\mathbb{R}^{N\times N}$ .

Algebraic multigrid (AMG) is one of the most efficient iterative methods for solving large-scale system of equations [37, 48]. In the past decades, AMG methods for linear systems having Toeplitz coefficient matrices with scalar entries have been widely studied [14] including elliptic PDEs [37, 39, 47, 48], fractional PDEs [22, 25, 35] and nonlocal PDEs [15, 17]. Some papers have investigated the case of block entries, where the entries are small generic matrices of fixed size instead of scalars [18, 24, 28]. Only few papers have investigated block-structured sparse linear systems (1). For example, setting up partial prolongations operators with a Galerkin coarse grid matrix, a new AMG approach for Stokes problem are designed in [33]. Constructing the corresponding tentative transfer operators, a fully aggregation-based AMG is developed for nonlinear contact problems of saddle point [46]. Using approximate ideal restriction (AIR) operators, AIR AMG method for space-time hybridizable discontinuous Galerkin discretization of advection-dominated flows are investigates [42]. Defining the interpolation matrix with the coarse coefficient vector, multilevel Markov chain Monte Carlo AMG algorithms are performed. Involving sparse integral transfer operators, towards adaptive smoothed aggregation AMG for nonsymmetric problems are considered [10]. A transfer operator based on the fractional approximation property, two-grid methods convergence in norm of nonsymmetric algebraic multigrid are presented [10]. However, how to build/check restriction and prolongation operators in practical of AMG methods for nonsymmetric sparse systems is still an interesting open question [10, 31]. In particular, how to develop/design fast AMG for block-structured dense linear systems (1) is still an interesting problems, since the above special prolongation/transfer operators are not easy to be employed in connection with the fast Fourier transform.

In this work, the simple (traditional) restriction operator and prolongation operator are used in order to handle such block-structured dense systems (1), including nonsymmetric indefinite system, symmetric positive definite (SPD), Toeplitz-plus-diagonal systems, which derive from the nonlocal problems discussed in [2, 11, 19, 26]. In general, it is still not at all easy for dense stiffness matrices [3, 4, 8, 17], unless we can reduce the problem to the Toeplitz setting and we know the symbol, its zeros, and their orders [39]. Instead we will focus our attention on first answering such a question for a two-level setting, since it is useful from a theoretical point of view as first step: in fact the study of the MGM convergence usually begins from the convergence analysis of the two-grid method (TGM) [35, 37, 48]. We focus our attention in providing a detailed proof of the convergence of the two-grid method (TGM) for the considered SPD linear systems. To the best of our knowledge, this is the first time that a fast AMG is studied for the block-structured dense linear systems as those reported in (1).

The outline of this paper is as follows. In the next section, we introduce block-structured dense systems including applications in nonlocal diffusion problems and peridynamic problem by the piecewise quadratic polynomial collocation. In Section 3, block-structured V-cycle AMG algorithm by Fast fourier transform for Toeplitz-like-plus-Cross systems are designed. Convergence rate of the two-grid method is analyzed in Section 4. To show the effectiveness of the presented schemes, results of numerical experiments are reported in section 5. Finally, we conclude the paper with some remarks and open problems.

2 Block-structured dense systems applications

Nonlocal diffusion problems have been used to model very different scientific phenomena occurring in various applied fields, for example in biology, particle systems, image processing, coagulation models, mathematical finance, etc. [2, 26]. Recently, the nonlocal volume-constrained diffusion problems, the so-called nonlocal model for distinguishing the nonlocal diffusion problems, attracted the wide interest of scientists [26], where the linear scalar peridynamic model can be considered as a special case [26, 40]. For example, the nonlocal peridynamic (PD) model is becoming an attractive emerging tool for the multiscale material simulations of crack nucleation and growth, fracture, and failure of composites [40].

Let the piecewise quadratic base functions be defined by [5, p. 37]

\phi_{m}(x)=\left\{\begin{array}[]{rl}\frac{x-x_{m-1}}{h}\frac{2x-(x_{m-1}+x_{m})}{h},&x\in\left[x_{m-1},x_{m}\right],\\ \frac{x_{m+1}-x}{h}\frac{(x_{m+1}+x_{m})-2x}{h},&x\in\left[x_{m},x_{m+1}\right],\\ 0,&{\rm otherwise};\end{array}\right.

and

\phi_{m-\frac{1}{2}}(x)=\left\{\begin{array}[]{rl}\frac{4(x-x_{m-1})(x_{m}-x)}{h^{2}},&x\in\left[x_{m-1},x_{m}\right],\\ 0,&{\rm otherwise}.\end{array}\right.

Then the piecewise Lagrange quadratic interpolant of $u(x)$ is

(2)

u_{Q}(x)=\sum_{j=-r}^{N+r}u(x_{j})\phi_{j}(x)+\sum_{j=-r}^{N+r-1}u(x_{j+\frac{1}{2}})\phi_{j+\frac{1}{2}}(x).

In this work, we mainly focus on two types of nonlocal problems approximated by the piecewise quadratic polynomial collocation, which give raise to the block-structured dense systems expressed in (1).

2.1 Application in nonlocal model

Consider the time-dependent nonlocal diffusion problem, whose prototype is [2, 6, 11, 19]

(3)

\left\{\begin{split}u_{t}-\mathcal{L}u&=f&~{}~{}{\rm on}&~{}~{}\Omega,\,t>0,\\ u(x,0)&=u_{0}&~{}~{}{\rm on}&~{}~{}\Omega,\end{split}\right.

with Dirichlet boundary conditions. The nonlocal operator $\mathcal{L}$ is defined by

\begin{split}\mathcal{L}u(x)=\int_{\Omega}J(|x-y|)\left[u(x)-u(y)\right]dy~{}~{}\forall x\in\Omega.\end{split}

Here $J(x)\sim\frac{1}{x^{1+2s}},~{}~{}s\in[-0.5,1)$ is a radial probability density with a nonnegative symmetric dispersal kernel.

We briefly review some basic relevant notions concerning the piecewise quadratic polynomial collocation approximations for the corresponding stationary problem

\int^{b}_{a}\frac{u(x)-u(y)}{|x-y|^{\gamma}}dy=f(x),\quad 0<\gamma<1,

which leads to the nonsymmetric and indefinite system [11, 19]

(4)

\begin{split}\mathcal{A}_{h}U_{h}=\eta_{h,\gamma}\cdot(F_{h}+K_{h})~{}~{}{\rm with}~{}~{}\mathcal{A}_{h}=\left(\begin{matrix}\mathcal{D}_{1}&0\\ 0&\mathcal{D}_{2}\end{matrix}\right)-\left(\begin{matrix}\mathcal{M}&\mathcal{Q}\\ \mathcal{P}&\mathcal{N}\end{matrix}\right)\end{split}

and $\eta_{h,\gamma}=\frac{(3-\gamma)(2-\gamma)(1-\gamma)}{h^{1-\gamma}}$ . Here the diagonal matrices $\mathcal{D}_{1}$ and $\mathcal{D}_{2}$ are given by

\mathcal{D}_{1}={\rm diag}\left(d_{1},d_{2},\ldots,d_{N-1}\right),~{}\mathcal{D}_{2}={\rm diag}\left(d_{\frac{1}{2}},d_{\frac{3}{2}},\ldots,d_{N-\frac{1}{2}}\right)

with

\begin{split}d_{\frac{i}{2}}=(3-\gamma)(2-\gamma)\left(\left(\frac{i}{2}\right)^{1-\gamma}+\left(N-\frac{i}{2}\right)^{1-\gamma}\right),\quad i=1,2,\cdots,2N-1.\end{split}

The square matrices $\mathcal{M}$ , $\mathcal{N}$ and rectangular matrices $\mathcal{P}$ , $\mathcal{Q}$ are, respectively, defined by

\begin{split}\mathcal{M}&={\rm toeplitz}\left(m_{0},m_{1},\ldots,m_{N-2}\right),~{}~{}\mathcal{N}={\rm toeplitz}\left(n_{0},n_{1},\ldots,n_{N-1}\right),\\ \mathcal{P}&={\rm toeplitz}([p_{0},p_{0},p_{1},\dots,p_{N-2}],[p_{0},p_{1},\dots,p_{N-2}]),\\ \mathcal{Q}&={\rm toeplitz}([q_{0},q_{1},\dots,q_{N-2}],[q_{0},q_{0},q_{1},\dots,q_{N-2}]).\end{split}

The corresponding coefficients are computed by $m_{0}=2(1+\gamma)$ , $n_{0}={(2-\gamma)2^{\gamma+1}}$ and

\begin{split}p_{0}&=4\left[\left(\frac{3}{2}\right)^{3-\gamma}-\left(\frac{1}{2}\right)^{3-\gamma}\right]-(3-\gamma)\left[\left(\frac{3}{2}\right)^{2-\gamma}+3\left(\frac{1}{2}\right)^{2-\gamma}\right],\\ q_{k}&=-8\left((k+1)^{3-\gamma}-k^{3-\gamma}\right)+4(3-\gamma)\left((k+1)^{2-\gamma}+k^{2-\gamma}\right),\,k\geq 0,\\ m_{k}&=4\left[(k+1)^{3-\gamma}\!-\!(k-1)^{3-\gamma}\right]-(3-\gamma)\left[(k+1)^{2-\gamma}+6k^{2-\gamma}+(k-1)^{2-\gamma}\right],~{}k\geq 1\end{split}

with $p_{k}=m_{k+\frac{1}{2}}$ , $n_{k}=q_{k-\frac{1}{2}}$ for $k\geq 1$ .

The boundary data $K$ is given by

\begin{split}K&=\left(\eta_{1},\eta_{2},\cdots,\eta_{N-1},\eta_{\frac{1}{2}},\eta_{\frac{3}{2}},\cdots,\eta_{N-\frac{1}{2}}\right)^{T}u_{0}\\ &\quad+\left(\eta_{N-1},\eta_{N-2},\cdots,\eta_{1},\eta_{N-\frac{1}{2}},\eta_{N-\frac{3}{2}},\cdots,\eta_{\frac{1}{2}}\right)^{T}u_{N}\end{split}

with $\eta_{\frac{1}{2}}={{(2-\gamma)(1-\gamma)}2^{\gamma-1}}$ and

\eta_{i/2}=4\left[i^{3-\gamma}\!-\!(i-1)^{3-\gamma}\right]-(3-\gamma)\left[3i^{2-\gamma}+\left(i-1\right)^{2-\gamma}-(2-\gamma)i^{1-\gamma}\right],~{}~{}2\leq i\leq 2N-1.

2.2 Application in Peridynamic model

Let us consider the following time-dependent peridynamic/nonlocal volume-constrained diffusion problem [17, 26, 40]

(5)

\left\{\begin{split}u_{t}-\mathcal{L}_{\delta}u&=f&~{}~{}{\rm on}&~{}~{}\Omega,\,t>0,\\ u(x,0)&=u_{0}&~{}~{}{\rm on}&~{}~{}\Omega\cup\Omega_{\mathcal{I}},\\ u&=g&~{}~{}{\rm on}&~{}~{}\Omega_{\mathcal{I}},t>0.\end{split}\right.

The nonlocal operator $\mathcal{L}_{\delta}$ is defined by [26]

\begin{split}\mathcal{L}_{\delta}u(x)=\int_{B_{\delta}(x)}\gamma_{\delta}(|x-y|)\left[u(y)-u(x)\right]dy\ \ ~{}~{}\forall x\in\Omega\end{split}

with $B_{\delta}(x)=\{y\in\mathbb{R}:|y-x|<\delta\}$ denoting a neighborhood centered at $x$ of radius $\delta$ , which is the horizon parameter and represents the size of nonlocality; the symmetric nonlocal kernel is defined as $\gamma_{\delta}(|x-y|)=0$ if $y\not\in B_{\delta}(x)$ .

Before we start to discuss this problem we shall briefly review few preliminary notions regarding the piecewise quadratic polynomial collocation approximations for the corresponding stationary problem

(6)

\left\{\begin{split}-\mathcal{L}_{\delta}u&=f~{}~{}{\rm on}~{}~{}\Omega,\\ u&=g~{}~{}\,{\rm on}~{}~{}\Omega_{\mathcal{I}},\end{split}\right.

where $u=g$ denotes a volumetric constraint imposed on a volume $\Omega_{\mathcal{I}}$ that has a nonzero volume and is made to be disjoint from $\Omega$ . In order to keep the expression simple, below we assume the unit interval $\Omega$ with the volumetric constraint domain $\Omega_{\mathcal{I}}=[-\delta,0]\cup[1,1+\delta]$ , but everything can be shifted to an arbitrary interval $[a,b]$ . For convenience, we focus on the special case where the kernel $\gamma_{\delta}(s)$ is taken to be a constant, i.e., $\gamma_{\delta}(s)=3\delta^{-3}$ [17, 26, 44]. More general kernel types [26, 44] can be studied in a similar manner.

Now, we introduce and discuss the discretization scheme of (6). Let the ratio $r=\big{\lfloor}\delta/h\big{\rfloor}\geq 1$ if $\delta\geq h$ and $r=\big{\lceil}\delta/h\big{\rceil}=1$ if $\delta\leq h$ . Let the mesh points $x_{i}=ih$ , $h=1/N$ , $i\in\mathcal{N}$ and

\begin{split}&\mathcal{N}=\left\{-r,{-r+\frac{1}{2}},\ldots,0,\frac{1}{2},\ldots,{N-\frac{1}{2}},{N},\ldots,{N+r-\frac{1}{2}},N+r\right\},\\ &\mathcal{N}_{in}=\left\{{\frac{1}{2}},1,\frac{3}{2},\ldots,N-\frac{1}{2}\right\},~{}~{}\mathcal{N}_{out}=\mathcal{N}\setminus\mathcal{N}_{in}.\end{split}

Let $u_{i}$ be the approximated value of $u(x_{i})$ and $f_{i}=f(x_{i})$ . Denote $I_{m}=\left[(m-1)h,mh\right]$ , $1\leq m\leq r$ , and $I_{r+1}=[rh,\delta]$ .

2.2.1 Nonsymmetric indefinite block-structured dense systems

By the piecewise quadratic polynomial collocation (2), it is easy to check that the standard collocation method of stationary problem (6) has the following form [21]

(7)

\left\{\begin{split}-\mathcal{L}_{\delta}u_{i}&=f_{i},~{}~{}i\in\mathcal{N}_{in},\\ u_{i}&=g_{i},~{}~{}i\in\mathcal{N}_{out}.\end{split}\right.

For convenience of implementation, we use the matrix form of the grid functions

\begin{split}U_{h}=[u_{1},u_{2},\ldots,u_{N-1},u_{\frac{1}{2}},u_{\frac{3}{2}},\ldots,u_{N-\frac{1}{2}}]^{T}~{}~{}~{}{\rm and}~{}~{}~{}F_{h}=[f_{1},f_{2},\ldots,f_{N-1},f_{\frac{1}{2}},f_{\frac{3}{2}},\ldots,f_{N-\frac{1}{2}}]^{T}.\end{split}

We next deal with the boundary conditions to ensure that the proposed fast AMG will have a $\mathcal{O}(N\mbox{log}N)$ complexity. Let the components of the left boundary ${}^{L}g^{v}$ (or right boundary ${}^{R}g^{v}$ ) at integer nodes and the left boundary ${}^{L}g^{w}$ (or right boundary ${}^{R}g^{w}$ ) at semi-integer nodes be

\begin{split}{}^{L}g^{v}=&\left(g_{-r},g_{-r+1},\ldots,g_{-1},g_{0}\right)^{T},~{}~{}{}^{R}g^{v}=\left(g_{N},g_{N+1},\ldots,g_{N+r}\right)^{T},\\ {}^{L}g^{w}=&\left(g_{-r+\frac{1}{2}},g_{-r+\frac{3}{2}},\ldots,g_{-\frac{1}{2}}\right)^{T},~{}~{}{}^{R}g^{w}=\left(g_{N+\frac{1}{2}},g_{N+\frac{3}{2}},\ldots,g_{N+r-\frac{1}{2}}\right)^{T}.\end{split}

Let $F^{v}=[f_{1},f_{2},\ldots,f_{N-1}]^{T}$ , $F^{w}=[f_{\frac{1}{2}},f_{\frac{3}{2}},\ldots,f_{N-\frac{1}{2}}]^{T}$ and

\begin{split}w^{A}=&\left(a_{1},a_{2},\ldots,a_{r},0\right)^{T},~{}~{}w^{B}=\left(a_{\frac{3}{2}},a_{\frac{5}{2}},\ldots,a_{r-\frac{1}{2}},0\right)^{T},\\ w^{C}=&\left(c_{0},c_{1},\ldots,c_{r}\right)^{T},~{}~{}w^{D}=\left(d_{1},d_{2},\ldots,d_{r}\right)^{T}.\end{split}

Denote

{}^{L}F^{v}_{j}=\sum^{r+1}_{i=j}{}^{L}g^{v}_{i}w^{A}_{r+1+j-i}+\sum^{r}_{i=j}{}^{L}g^{w}_{i}w^{B}_{r+j-i},~{}~{}~{}{}^{R}F^{v}_{j}=\sum^{r+2-j}_{i=1}{}^{R}g^{v}_{i}w^{A}_{j-1+i}+\sum^{r+1-j}_{i=1}{}^{R}g^{w}_{i}w^{B}_{j-1+i}

and

(8)

F^{v}_{j}=F^{v}_{j}-{}^{L}F^{v}_{j},~{}~{}~{}F^{v}_{N-j}=F^{v}_{N-j}-{}^{R}F^{v}_{j},~{}~{}j=1,\dots,r.

Let

\begin{split}{}^{L}F^{w}_{j}&=\sum^{r+1}_{i=j}{}^{L}g^{v}_{i}w^{C}_{r+1+j-i}+\sum^{r}_{i=j}{}^{L}g^{w}_{i}w^{D}_{r+j-i},~{}~{}~{}{}^{L}F^{w}_{r+1}={}^{L}g^{v}_{r+1}w^{C}_{r+1},\\ {}^{R}F^{w}_{j}&=\sum^{r+2-j}_{i=1}{}^{R}g^{v}_{i}w^{C}_{j-1+i}+\sum^{r+1-j}_{i=1}{}^{R}g^{w}_{i}w^{D}_{j-1+i},~{}~{}~{}{}^{R}F^{w}_{r+1}={}^{R}g^{v}_{1}w^{C}_{r+1}\end{split}

and

(9)

F^{w}_{j}=F^{w}_{j}-{}^{L}F^{w}_{j},~{}~{}~{}F^{w}_{N-j}=F^{w}_{N-j}-{}^{R}F^{w}_{j},~{}~{}j=1,\dots,r+1.

From (8) and (9), the numerical scheme (7) can be recast as [20, 21]

(10)

\mathcal{A}^{N}_{h}U_{h}=\eta_{h}\cdot F^{N}_{h}=\left[\begin{array}[]{c}F^{v}\\ F^{w}\end{array}\right]~{}~{}{\rm with}~{}~{}\mathcal{A}^{N}_{h}=\left[\begin{array}[]{cc}A&B\\ C&D\end{array}\right],\eta_{h}=2\delta^{3}/h.

Note that the above Toeplitz matrices $A\in\mathbb{R}^{{(N-1)}\times(N-1)}$ , $B\in\mathbb{R}^{{(N-1)}\times N}$ , $C\in\mathbb{R}^{{N}\times(N-1)}$ and $D\in\mathbb{R}^{{N}\times N}$ are defined by

\begin{split}A&={\rm toeplitz}\left(a_{0},a_{1},a_{2},\cdots,a_{r},\textbf{0}_{1\times(N-r-2)}\right),\\ B&={\rm toeplitz}\left(\left[a_{\frac{1}{2}},a_{\frac{3}{2}},\cdots,a_{r-\frac{1}{2}},\textbf{0}_{1\times(N-1-r)}\right],\left[a_{\frac{1}{2}},a_{\frac{1}{2}},a_{\frac{3}{2}},\cdots,a_{r-\frac{1}{2}},\textbf{0}_{1\times(N-r-1)}\right]\right),\\ C&={\rm toeplitz}\left([c_{0},c_{0},c_{1},c_{2},\cdots,c_{r},\textbf{0}_{1\times(N-r-2)}],[c_{0},c_{1},c_{2},\cdots,c_{r},\textbf{0}_{1\times(N-r-2)}]\right),\\ D&={\rm toeplitz}\left([d_{0},d_{1},d_{2},\cdots,d_{r},\textbf{0}_{1\times(N-r-1)}]\right)\end{split}

with the coefficients

\begin{split}&a_{0}=12r-2,\quad a_{m}=-2,\quad a_{r}=-1,~{}~{}1\leq m\leq r-1,\\ &a_{m+\frac{1}{2}}=-4,~{}~{}0\leq m\leq r-1,\end{split}

and

\begin{split}&c_{m}=-2,\quad c_{r-1}=-\frac{9}{4},\quad c_{r}=\frac{1}{4},~{}~{}0\leq m\leq r-2,\\ &d_{0}=12r-4,\quad d_{m}=-4,\quad d_{r}=-2,~{}~{}1\leq m\leq r-1.\\ \end{split}

2.2.2 Symmetric positive definite block-structured dense systems

We observe that the discrete maximum principle is not satisfied for the above nonsymmetric indefinite system (10), which might be trickier for the stability analysis of the high-order numerical schemes [23, 30]. As a consequence, the shifted-symmetric piecewise quadratic polynomial collocation method for peridynamic /nonlocal model (6) has been considered in [20, 21], which satisfies the discrete maximum principle and has the symmetric positive definite block-structured dense systems. Namely, the shifted-symmetric system of (7) can be recast as

(11)

\mathcal{A}^{S}_{h}U_{h}=\eta_{h}\cdot F_{h}^{S}~{}~{}{\rm with}~{}~{}\mathcal{A}^{S}_{h}=\left[\begin{array}[]{cc}A&B\\ B^{T}&\widehat{A}\end{array}\right],\eta_{h}=2\delta^{3}/h.

Here the function $F^{S}_{h}$ is computed as done in (8) and (9), the Toeplitz matrices $A\in\mathbb{R}^{{(N-1)}\times(N-1)}$ , $B\in\mathbb{R}^{{(N-1)}\times N}$ , and $\widehat{A}\in\mathbb{R}^{{N}\times N}$ are defined by

\begin{split}A&={\rm toeplitz}(a_{0},a_{1},a_{2},\cdots,a_{r},\textbf{0}_{1\times(N-r-2)}),\\ B&={\rm toeplitz}\left([a_{\frac{1}{2}},a_{\frac{3}{2}},\cdots,a_{r-\frac{1}{2}},\textbf{0}_{1\times(N-1-r)}],[a_{\frac{1}{2}},a_{\frac{1}{2}},a_{\frac{3}{2}},\cdots,a_{r-\frac{1}{2}},\textbf{0}_{1\times(N-r-1)}]\right),\\ \widehat{A}&={\rm toeplitz}([a_{0},a_{1},a_{2},\cdots,a_{r},\textbf{0}_{1\times(N-r-1)}])\end{split}

with

(12)

\begin{split}&a_{0}=12r-2,\quad a_{m}=-2,\quad a_{r}=-1,~{}~{}1\leq m\leq r-1,\\ &a_{m+\frac{1}{2}}=-4,~{}~{}0\leq m\leq r-1.\end{split}

3 Fast AMG for block-structured dense systems

Multigrid methods are among the most efficient iterative methods for solving large scale systems of equations [37, 48]. However, it is not easy to build restriction and prolongation operators when using AMG methods for nonsymmetric sparse systems [10, 31]. To the best of our knowledge, there is no fast AMG for block-structured dense linear systems of the type in (1), since the special prolongation/transfer operators are not easy to be employed with fast Fourier transform. Here, the simple (traditional) transfer operator are employed to handle such block-structured dense systems to ensure a fast AMG showing a $\mathcal{O}(N\mbox{log}N)$ complexity.

3.1 Multigrid methods

Let us firstly review the basic multigrid technique when applied to a scalar algebraic linear system, having in mind that our target is the efficient solution of the block-structured dense linear systems as those reported in (1). Let the finest mesh points $x_{i}=a+ih$ , $h=(b-a)/N$ , $N=2^{K}$ with $\Omega=(a,b)$ . Define the multiple level of grids

\mathfrak{B}^{k}=\left\{x^{k}_{i}=a+\frac{i}{2^{k}}(b-a),i=1:N_{k}\right\}~{}{\rm with}~{}N_{k}=2^{k}-1,k=1:K,

where $\mathfrak{B}^{k}$ represents not only the grid with grid spacing $h_{k}=2^{K-k}h$ , but also the space of vectors defined on that grid.

Given a algebraic system

(13)

A_{h}u^{h}=b^{h}.

We define a sequence of subsystems on different levels

A_{k}u^{k}=b^{k},u^{k}\in\mathfrak{B}^{k},\quad k=1{,\ldots,}K.

Here $K$ is the total number of levels, with $k=K$ being the finest level, i.e., $A_{K}=A_{h}$ .

The traditional restriction operator $I^{k-1}_{k}:\mathds{R}^{N_{k}}\rightarrow\mathds{R}^{N_{k-1}}$ and prolongation operator $I^{k}_{k-1}:\mathds{R}^{N_{k-1}}\rightarrow\mathds{R}^{N_{k}}$ are defined by [38, pp. 438-454]

I_{k}^{k-1}=\frac{1}{4}\left[\begin{array}[]{cccccccc}1&2&1&&&&&\\ &&1&2&1&&&\\ &&&&\cdots&\cdots&&\\ &&&&&1&2&1\end{array}\right]_{N_{k-1}\times N_{k}}\quad{\rm and}\quad I^{k}_{k-1}=2\left(I_{k}^{k-1}\right)^{T},

which should be convenient and efficient for block-structured dense linear systems as those in (1), using AMG and fast Fourier transforms. Let

\nu^{k-1}=I_{k}^{k-1}\nu^{k}~{}~{}{\rm with}~{}~{}\nu^{k-1}=\frac{1}{4}\left(\nu^{k}_{2i-1}+2\nu^{k}_{2i}+\nu^{k}_{2i+1}\right),~{}i=1:N_{k},

and

\nu^{k}=I^{k}_{k-1}\nu^{k-1}.

It may be more useful to define the linear system by Galerkin projection in the AMG method, where the coarse grid problem is defined by

(14)

A_{k-1}=I_{k}^{k-1}A_{k}I^{k}_{k-1},

and the intermediate $\left(k,k-1\right)$ coarse grid correction operator is

T_{k}=I_{k}-I^{k}_{k-1}A_{k-1}^{-1}I^{k-1}_{k}A_{k}.

We choose the damped Jacobi iterative operator

(15)

S_{k}=I-\omega D_{k}^{-1}A_{k}\

with a weighting factor $\omega$ , and $D_{k}$ being the diagonal of $A_{k}$ .

A multigrid process is intrinsically to define a sequence of operators $B_{k}:\mathfrak{B}^{k}\rightarrow\mathfrak{B}^{k}$ , which is an approximate inverse of $A_{k}$ in the sense that $\left|\left|I-B_{k}A_{k}\right|\right|$ is small, bounded away from one. The V-cycle Multigrid Algorithm 1 can be seen in [47]. If $k=2$ , the resulting Algorithm 1 is two-grid method (TGM).

Algorithm 1 V-cycle Multigrid Algorithm: Define

B_{1}=A_{1}^{-1}

. Assume that

B_{k-1}:\mathfrak{B}^{k-1}\mapsto\mathfrak{B}^{k-1}

is defined. We shall now define

B_{k}:\mathfrak{B}^{k}\mapsto\mathfrak{B}^{k}

as an approximate iterative solver for the equation associated with

A_{k}u^{k}=b^{k}

1: Pre-smooth: Let

S_{k,\omega}

be defined by (15) and

u^{k}_{0}=\mathbf{0}

, for

l=1:m_{1}

u^{k}_{l}=u^{k}_{l-1}+S_{k,\omega_{pre}}(b^{k}-A_{k}u^{k}_{l-1})

2: Coarse grid solution: Denote

e^{k-1}\in\mathfrak{B}^{k-1}

as the approximate solution of the residual equation

A_{k-1}e=I^{k-1}_{k}(b^{k}-A_{k}u^{k}_{m_{1}})

with the iterator

B_{k-1}

e^{k-1}=B_{k-1}I^{k-1}_{k}(b^{k}-A_{k}u^{k}_{m_{1}})

3: Correction:

u^{k}_{m_{1}+1}=u^{k}_{m_{1}}+I^{k}_{k-1}e^{k-1}

4: Post-smooth: for

l=m_{1}+2:m_{1}+m_{2}

u^{k}_{l}=u^{k}_{l-1}+S_{k,\omega_{post}}(b^{k}-A_{k}u^{k}_{l-1})

5: Define:

B_{k}b^{k}=u^{k}_{m_{1}+m_{2}}

The basic AMG idea for solving the block-structured dense linear systems in (1) is the same as in the scalar case (13). Define a sequence of block-structured subsystems on different levels

\mathcal{A}_{k}u^{k}=b^{k},~{}~{}u^{k}\in\mathfrak{M}^{k},\quad k=1{,\ldots,}K

with the multiple level of grids

\mathfrak{M}^{k}=\left\{x^{k}_{i/2}=a+\frac{i/2}{2^{k}}(b-a),i=1:2N_{k}+1\right\}~{}{\rm with}~{}N_{k}=2^{k}-1,k=1:K.

Here

\mathcal{A}_{k}=\left[\begin{array}[]{cc}A^{(k)}&B^{(k)}\\ C^{(k)}&D^{(k)}\end{array}\right],~{}u^{k}=\left[\begin{array}[]{c}v^{k}\\ w^{k}\end{array}\right],~{}b^{k}=\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right],

and $\mathcal{D}_{k}$ is the diagonal matrix of $\mathcal{A}_{k}$ . The block-structured dense V-cycle Multigrid method is developed in Algorithm 2.

Algorithm 2 Block-structured dense V-cycle Multigrid method: Define

\mathcal{B}_{1}=\mathcal{A}_{1}^{-1}

. Assume that

\mathcal{B}_{k-1}:\mathfrak{M}^{k-1}\mapsto\mathfrak{M}^{k-1}

is defined. We shall now define

\mathcal{B}_{k}:\mathfrak{M}^{k}\mapsto\mathfrak{M}^{k}

as an approximate iterative solver for the equation associated with

\mathcal{A}_{k}u^{k}=b^{k}

1: Pre-smooth: Let

\mathcal{S}_{k,\omega}

be defined by

\mathcal{S}_{k}=I-\omega\mathcal{D}_{k}^{-1}\mathcal{A}_{k}

and

\left[\begin{array}[]{c}v^{k}_{0}\\ w^{k}_{0}\end{array}\right]=\mathbf{0}

, for

l=1:m_{1}

\left[\begin{array}[]{c}v^{k}_{l}\\ w^{k}_{l}\end{array}\right]=\left[\begin{array}[]{c}v^{k}_{l-1}\\ w^{k}_{l-1}\end{array}\right]+\mathcal{S}_{k,\omega_{pre}}\left(\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right]-\mathcal{A}_{k}\left[\begin{array}[]{c}v^{k}_{l-1}\\ w^{k}_{l-1}\end{array}\right]\right)

2: Coarse grid solution: Denote

\mathbf{e}^{k-1}=\left[\begin{array}[]{c}e^{k-1}_{v}\\ e^{k-1}_{w}\end{array}\right]\in\mathfrak{M}_{k-1}

as the approximate solution of the residual equation

\mathcal{A}_{k-1}\mathbf{e}=I^{k-1}_{k}\left(\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right]-\mathcal{A}_{k}\left[\begin{array}[]{c}v^{k}_{m_{1}}\\ w^{k}_{m_{1}}\end{array}\right]\right)

with the iterator

\mathcal{B}_{k-1}

an approximate inverse of

\mathcal{A}_{k-1}

\left[\begin{array}[]{c}e^{k-1}_{u}\\ e^{k-1}_{v}\end{array}\right]=\mathcal{B}_{k-1}I^{k-1}_{k}\left(\left[\begin{array}[]{c}f_{k}\\ g_{k}\end{array}\right]-\mathcal{A}_{k}\left[\begin{array}[]{c}u^{k}_{m_{1}}\\ v^{k}_{m_{1}}\end{array}\right]\right)

3: Correction:

\left[\begin{array}[]{c}v^{k}_{m_{1}+1}\\ w^{k}_{m_{1}+1}\end{array}\right]=\left[\begin{array}[]{c}v^{k}_{m_{1}}\\ w^{k}_{m_{1}}\end{array}\right]+I^{k}_{k-1}\left[\begin{array}[]{c}e^{k-1}_{v}\\ e^{k-1}_{w}\end{array}\right]

4: Post-smooth: for

l=m_{1}+2:m_{1}+m_{2}

\left[\begin{array}[]{c}v^{k}_{l}\\ w^{k}_{l}\end{array}\right]=\left[\begin{array}[]{c}v^{k}_{l-1}\\ w^{k}_{l-1}\end{array}\right]+\mathcal{S}_{k,\omega_{post}}\left(\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right]-\mathcal{A}_{k}\left[\begin{array}[]{c}v^{k}_{l-1}\\ w^{k}_{l-1}\end{array}\right]\right)

5: Define:

\mathcal{B}_{k}\left[\begin{array}[]{c}b_{f}^{k}\\ b_{g}^{k}\end{array}\right]=\left[\begin{array}[]{c}v^{k}_{m_{1}+m_{2}}\\ w^{k}_{m_{1}+m_{2}}\end{array}\right]

3.2 Fast Fourier transform for block-structured dense systems

We first review the Toeplitz matrix and the circulant matrix by fast Fourier transform with $\mathcal{O}(n\mbox{log}n)$ complexity, which will be used later. Let $n\times n$ Toeplitz matrix $T_{n}(c)$ be defined by [9, 12]

T_{n}(c):=[c_{j-k}]_{j,k=1}^{n}=\left[\begin{matrix}c_{0}&c_{-1}&\cdots&c_{-(n-1)}\\ c_{1}&c_{0}&\cdots&c_{-(n-2)}\\ \vdots&\vdots&\ddots&\vdots\\ c_{n-1}&c_{n-2}&\cdots&c_{0}\end{matrix}\right],

and the circulant matrix be the periodic cousin of Toeplitz matrix. Denote the circulant matrix $C_{n}$ whose first column is $\widetilde{c}=(c_{0},c_{1},\dots,c_{n-1})^{\rm T}$ , namely,

C_{n}:=\left[\begin{matrix}c_{0}&c_{n-1}&c_{n-2}&\cdots&c_{2}&c_{1}\\ c_{1}&c_{0}&c_{n-1}&\cdots&c_{3}&c_{2}\\ c_{2}&c_{1}&c_{0}&\ddots&\ddots&c_{3}\\ \vdots&\vdots&\ddots&\ddots&\ddots&\vdots\\ c_{n-2}&c_{n-3}&\ddots&\ddots&c_{0}&c_{n-1}\\ c_{n-1}&c_{n-2}&c_{n-3}&\cdots&c_{1}&c_{0}\end{matrix}\right].

Moreover, we set $\omega_{n}=\mbox{exp}(2\pi i/n)$ and define the unitary Fourier matrix

F_{n}=\frac{1}{\sqrt{n}}\left[\begin{array}[]{llllr}1&1&1&\cdots&1\\ 1&\omega_{n}&\omega_{n}^{2}&\cdots&\omega_{n}^{n-1}\\ 1&\omega_{n}^{2}&\omega_{n}^{4}&\cdots&\omega_{n}^{2(n-1)}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&\omega_{n}^{n-1}&\omega_{n}^{2(n-1)}&\cdots&\omega_{n}^{(n-1)(n-1)}\end{array}\right]

with $i$ being the imaginary unit. Therefore, a circulant matrix can be diagonalized by the Fourier matrix $F_{n}$ , i.e.,

C_{n}=F_{n}^{-1}\mbox{diag}(\sqrt{n}F_{n}\widetilde{c})F_{n}=F_{n}^{*}\mbox{diag}(\sqrt{n}F_{n}\widetilde{c})F_{n},

where $\mbox{diag}(\sqrt{n}F_{n}\widetilde{c})$ is a diagonal matrix holding the eigenvalues of $C_{n}$ . Moreover, for any given vector $v$ , we can determine [13]

C_{n}v={\rm ifft}\left({\rm fft}(\widetilde{c}).*{\rm fft}(v)\right)

with $\mathcal{O}(n\mbox{log}n)$ operations by the fast Fourier transform (FFT) of the first column $\widetilde{c}$ of $C_{n}$ .

Then, for any $n$ -by- $1$ vector $\bf x$ , the multiplication $T_{n}\bf x$ can also be computed by FFTs with the computational count of $\mathcal{O}\left(n\log n\right)$ arithmetic operations [12, p. 12]. More concretely, we take a $2n$ -by- $2n$ circulant matrix $C_{n}$ with $T_{n}$ embedded inside as follows:

(16)

\left[\begin{array}[]{cc}T_{n}&\ast\\ \ast&T_{n}\end{array}\right]\left[\begin{array}[]{c}\bf x\\ \bf 0\end{array}\right]=\left[\begin{array}[]{c}T_{n}\bf x\\ {\ddagger}\end{array}\right]~{}~{}~{}~{}{\rm with}~{}~{}C_{n}=\left[\begin{array}[]{cc}T_{n}&\ast\\ \ast&T_{n}\end{array}\right].

3.2.1 Fast Fourier transform for block-structured dense systems at the finest level

Let us consider the fast Fourier transform algorithm for block-structured dense systems (1) at the finest level, namely,

(17)

\mathcal{A}_{h}u=\left(\begin{array}[]{cc}A&B\\ C&D\end{array}\right)\left(\begin{array}[]{c}v\\ w\end{array}\right)=\left(\begin{array}[]{c}Av+Bw\\ Cv+Dw\end{array}\right)

with Toeplitz matrices $A\in\mathbb{R}^{M\times M}$ , $B\in\mathbb{R}^{M\times N}$ , $C\in\mathbb{R}^{N\times M}$ and $D\in\mathbb{R}^{N\times N}$ , $M<N$ .

It is well known that most of the early works on matrix-vector multiplication with Toeplitz algebraic system were focused on squared matrices by Fast fourier transform (FFT) [12, 13]. However, the considered technique can not be directly applied to block-structured dense systems of the form (17), since the related structures contain rectangular the matrices $B$ and $C$ .

We next design/develop the FFT algorithm for the rectangular matrices $B$ and $C$ in (17) based on the idea of [11, 12], which realizes the computational count of $\mathcal{O}(N\log N)$ arithmetic operations and the required storage $\mathcal{O}(N)$ . Let the rectangular matrix $B\in\mathbb{R}^{M\times N}$ be given by

\begin{split}B=\left[\begin{matrix}b_{0}&b_{1}&\cdots&b_{J}&\cdots&\cdots&\cdots&b_{N-1}\\ b_{-1}&b_{0}&b_{1}&\ddots&\ddots&&&\vdots\\ b_{-2}&b_{-1}&b_{0}&\ddots&\ddots&\ddots&&\vdots\\ \vdots&\ddots&\ddots&\ddots&\ddots&\ddots&\ddots&\vdots\\ b_{-M+1}&\cdots&b_{-2}&b_{-1}&b_{0}&b_{1}&\cdots&b_{J}\end{matrix}\right]_{M\times N}~{}~{}{\rm with}~{}~{}J=N-M.\end{split}

We embed the matrix $B\in\mathbb{R}^{M\times N}$ into the following $N$ -by- $N$ Toeplitz matrix $B_{T}$ ,

\begin{split}B_{T}=\left[\begin{matrix}b_{0}&b_{1}&\cdots&b_{J}&\cdots&\cdots&\cdots&b_{N-1}\\ b_{-1}&b_{0}&b_{1}&\ddots&\ddots&&&\vdots\\ b_{-2}&b_{-1}&b_{0}&\ddots&\ddots&\ddots&&\vdots\\ \vdots&\ddots&\ddots&\ddots&\ddots&\ddots&\ddots&\vdots\\ b_{-M+1}&\cdots&b_{-2}&b_{-1}&b_{0}&b_{1}&\cdots&b_{J}\\ \hline\cr 0&\ddots&&\ddots&\ddots&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&&\ddots&\ddots&\ddots&b_{1}\\ 0&\cdots&0&b_{-M+1}&\cdots&b_{-2}&b_{-1}&b_{0}\\ \end{matrix}\right]_{N\times N}\end{split}.

Using (16) and fast Fourier transform, it implies that $Bw$ is the first $M$ -rows of $B_{T}w$ with

\left[\begin{array}[]{cc}B_{T}&\ast\\ \ast&B_{T}\end{array}\right]\left[\begin{array}[]{c}w\\ \bf 0\end{array}\right]=\left[\begin{array}[]{c}B_{T}w\\ {\ddagger}\end{array}\right].

Let the rectangular matrix $C\in\mathbb{R}^{N\times M}$ be defined by

\begin{split}C=\left[\begin{matrix}c_{0}&c_{1}&\cdots&c_{M-1}\\ c_{-1}&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&c_{1}\\ c_{-J}&\cdots&c_{-1}&c_{0}\\ \vdots&\ddots&\ddots&c_{-1}\\ \vdots&&\ddots&\vdots\\ c_{-N+1}&\cdots&\cdots&c_{-J}\end{matrix}\right]_{N\times M}~{}~{}{\rm with}~{}~{}J=N-M.\end{split}

Similar, we embed matrix $C\in\mathbb{R}^{N\times M}$ into the following $N$ -by- $N$ Toeplitz matrix $C_{T}$ ,

\begin{split}C_{T}=\left[\begin{matrix}c_{0}&c_{1}&\cdots&c_{M-1}\\ c_{-1}&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&c_{1}\\ c_{-J}&\cdots&c_{-1}&c_{0}\\ \vdots&\ddots&\ddots&c_{-1}\\ \vdots&&\ddots&\vdots\\ c_{-N+1}&\cdots&\cdots&c_{-J}\end{matrix}\right|\left.\begin{matrix}0&\cdots&0\\ c_{M-1}&\ddots&\vdots\\ \vdots&\ddots&0\\ c_{1}&\cdots&c_{M-1}\\ c_{0}&\ddots&\vdots\\ \ddots&\ddots&c_{1}\\ \cdots&c_{-1}&c_{0}\end{matrix}\right]_{N\times N}.\end{split}

From (16) and fast Fourier transform, we have

Cv=C_{T}{\bf v}~{}~{}{\rm with}~{}~{}\left[\begin{array}[]{cc}C_{T}&\ast\\ \ast&C_{T}\end{array}\right]\left[\begin{array}[]{c}{\bf v}\\ \bf 0\end{array}\right]=\left[\begin{array}[]{c}C_{T}{\bf v}\\ {\ddagger}\end{array}\right],{\bf v}=\left[\begin{array}[]{c}v\\ {\bf 0}_{J\times 1}\end{array}\right].

3.2.2 Fast Fourier transform for block-structured dense systems at the coarser level

In fact, embedding rectangular matrices $B$ and $C$ in (17) into the square Toeplitz matrices in subsubsection 3.2.1 is invalid, when using the Galerkin projection (14) in the AMG method, although this technique can be applied to compute the finest level for a fast AMG, since it does not keep block-structured Toeplitz properties. Next, we shall design the FFT for the block-structured matrix $\mathcal{A}_{k}$ multiplies vector at the coarser level, i.e.,

{\mathcal{A}}_{k}\left[\begin{array}[]{c}u^{k}\\ v^{k}\end{array}\right]=\left[\begin{array}[]{c}A^{(k)}u^{k}+B^{(k)}v^{k}\\[5.69054pt] C^{(k)}u^{k}+D^{(k)}v^{k}\end{array}\right].

In AMG, the coarse problem at the level $k<K$ is typically defined using the Galerkin approach, i.e., the coefficient matrix on the coarser grid can be computed by

(18)

\mathcal{A}_{k-1}=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1}.

More concretely,

\mathcal{A}_{k-1}=\left[\begin{array}[]{cc}A^{(k-1)}&B^{(k-1)}\\[5.69054pt] C^{(k-1)}&D^{(k-1)}\end{array}\right]_{(2N_{k-1}+1)\times(2N_{k-1}+1)}=I_{k}^{k-1}\left[\begin{array}[]{cc}A^{(k)}&B^{(k)}\\[5.69054pt] C^{(k)}&D^{(k)}\end{array}\right]_{(2N_{k}+1)\times(2N_{k}+1)}I^{k}_{k-1}

with $I_{k}^{k-1}\in\mathbb{R}^{N_{k}\times(2N_{k}+1)}$ in (14).

It should be noted that $A^{(k)},B^{(k)},C^{(k)},D^{(k)}$ are Toeplitz matrices if $k=K$ , which corresponds to the block-structured dense system (17) at the fineast level with the computational count of $\mathcal{O}(N\log N)$ arithmetic operations. However, there is major difference for $\mathcal{A}_{k-1}$ at the coarser level, since it does not keep block-structured Toeplitz properties, see Example 3.1.

Example 3.1.

Choose the unit matrices $A^{(k)}\in\mathbb{R}^{7\times 7}$ , $D^{(k)}\in\mathbb{R}^{8\times 8}$ and the rectangular ones matrices $B^{(k)}\in\mathbb{R}^{7\times 8}$ , $C^{(k)}\in\mathbb{R}^{8\times 7}$ (all the elements are $1$ ). Using the Galerkin approximation (18), we deduce

\left[\begin{array}[]{cc}A^{(k-1)}&B^{(k-1)}\\[5.69054pt] C^{(k-1)}&D^{(k-1)}\end{array}\right]=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1}=\frac{1}{8}\left[\begin{array}[]{ccc }6&1&0\\ 1&6&1\\ 0&1&6\\ \hline\cr 12&12&13\\ \hline\cr 16&16&16\\ 16&16&16\\ 16&16&16\\ \end{array}\right|\left.\begin{array}[]{c }12\\ 12\\ 13\\ \hline\cr 12\\ \hline\cr 5\\ 4\\ 4\\ \end{array}\right|\left.\begin{array}[]{ccc }16&16&16\\ 16&16&16\\ 16&16&16\\ \hline\cr 5&4&4\\ \hline\cr 6&1&0\\ 1&6&1\\ 0&1&6\\ \end{array}\right].

We next design fast Fourier transform for block-structured dense systems at the coarser level in AMG. Let

(19)

\mathcal{A}_{k}=\left[\begin{array}[]{ccc}A^{(k)}&\textbf{0}&\bar{B}^{(k)}\\ \textbf{0 }^{\prime}&0&\textbf{0 }^{\prime}\\ {\bar{C}^{(k)}}&\textbf{0}&\bar{D}^{(k)}\end{array}\right]_{(2N_{k}+1)\times(2N_{k}+1)}+\left[\begin{array}[]{ccc}\textbf{0}&p^{(k)}&\textbf{0}\\ {q^{(k)}}&o^{(k)}&{\zeta^{(k)}}\\ \textbf{0}&\xi^{(k)}&\textbf{0}\end{array}\right]_{(2N_{k}+1)\times(2N_{k}+1)},

where $A^{(k)},\bar{B}^{(k)},{\bar{C}^{(k)}},\bar{D}^{(k)}$ are the square matrices with

B^{(k)}=\left[\begin{array}[]{cc}p^{(k)}&\bar{B}^{(k)}\end{array}\right],~{}C^{(k)}=\left[\begin{array}[]{c}q^{(k)}\\[2.84526pt] \bar{C}^{(k)}\end{array}\right],~{}D^{(k)}=\left[\begin{array}[]{cc}o^{(k)}&\zeta^{(k)}\\[2.84526pt] \xi^{(k)}&\bar{D}^{(k)}\end{array}\right].

The symbol $o^{(k)}$ is a real number, $0$ denotes a zero number, and the bold 0 denotes a zero matrix/vector with the corresponding size. The coefficients $p^{(k)},\xi^{(k)}$ denote the column vectors and ${q^{(k)}},{\zeta^{(k)}}$ denote the row vectors. We may call it the Cross-splitting technique, since we main focus on the fast Fourier transform for Cross-type matrix in (19).

From (19), it yields

\begin{split}{\mathcal{A}}_{k}\left[\begin{array}[]{c}v^{k}\\ w^{k}\end{array}\right]=&\left(\left[\begin{array}[]{ccc}A^{(k)}&\textbf{0}&\bar{B}^{(k)}\\ \textbf{0 }^{\prime}&0&\textbf{0 }^{\prime}\\ {\bar{C}^{(k)}}&\textbf{0}&\bar{D}^{(k)}\end{array}\right]+\left[\begin{array}[]{ccc}\textbf{0}&p^{(k)}&\textbf{0}\\ {q^{(k)}}&o^{(k)}&{\zeta^{(k)}}\\ \textbf{0}&\xi^{(k)}&\textbf{0}\end{array}\right]\right)\left[\begin{array}[]{c}v^{k}\\ w^{k}_{o}\\ \bar{w}^{k}\end{array}\right]\\ =&\left[\begin{array}[]{c}A^{(k)}v^{k}+\bar{B}^{(k)}\bar{w}^{k}\\ 0\\ {\bar{C}^{(k)}}v^{k}+\bar{D}^{(k)}\bar{w}^{k}\end{array}\right]+\left[\begin{array}[]{ccc}p^{(k)}w^{k}_{o}\\ q^{(k)}v^{k}+o^{(k)}w^{k}_{o}+\zeta^{(k)}\bar{w}^{k}\\ \xi^{(k)}w^{k}_{o}\end{array}\right]~{}~{}{\rm with}~{}~{}w^{k}=\left[\begin{array}[]{c}w^{k}_{o}\\ \bar{w}^{k}\end{array}\right].\end{split}

Obviously, since $A^{(k)}$ , $\bar{B}^{(k)}$ , ${\bar{C}^{(k)}}$ , $\bar{D}^{(k)}$ are Toeplitz matrices, the computation of $A^{(k)}v^{k}$ , $\bar{B}^{(k)}\bar{w}^{k}$ , ${\bar{C}^{(k)}}v^{k}$ , and $\bar{D}^{(k)}\bar{w}^{k}$ by FFT needs $\mathcal{O}(N_{k}{\rm log}N_{k})$ , and with required storage $\mathcal{O}(N_{k})$ . For the Cross matrix, see $p^{(k)}$ , $q^{(k)}$ , $\zeta^{(k)}$ , $\xi^{(k)}$ and $o^{(k)}$ , results in $\mathcal{O}(N_{k})$ complexity and storage operations.

Let the stiffness matrix of the coarser level be

(20)

\mathcal{A}_{k-1}=\left[\begin{array}[]{ccc}A^{(k-1)}&\textbf{0}&\bar{B}^{(k-1)}\\ \textbf{0 }^{\prime}&0&\textbf{0 }^{\prime}\\ {\bar{C}^{(k-1)}}&\textbf{0}&\bar{D}^{(k-1)}\end{array}\right]_{N_{k}\times N_{k}}+\left[\begin{array}[]{ccc}\textbf{0}&p^{(k-1)}&\textbf{0}\\ {q^{(k-1)}}&o^{(k-1)}&{\zeta^{(k-1)}}\\ \textbf{0}&\xi^{(k-1)}&\textbf{0}\end{array}\right]_{N_{k}\times N_{k}}.

Then using (19), (20), and a Galerkin approach in (18), it is easy to obtain the following results.

Lemma 1.

Let $A^{(k)}=\{a_{i,j}^{(k)}\}_{i,j=1}^{N_{k}}$ with $a_{i,j}^{(k)}=a_{j-i}^{(k)}$ be a Toeplitz matrix in (19). Then the elements of $A^{(k-1)}$ in (20) can be computed by

8a_{0}^{(k-1)}=a_{-2}^{(k)}+4a_{-1}^{(k)}+6a_{0}^{(k)}+4a_{1}^{(k)}+a_{2}^{(k)},

and

\begin{split}8a_{i}^{(k-1)}&=a_{2i-2}^{(k)}+4a_{2i-1}^{(k)}+6a_{2i}^{(k)}+4a_{2i+1}^{(k)}+a_{2i+2}^{(k)},\\ 8a_{-i}^{(k-1)}&=a_{-2i-2}^{(k)}+4a_{-2i-1}^{(k)}+6a_{-2i}^{(k)}+4a_{-2i+1}^{(k)}+a_{-2i+2}^{(k)},~{}~{}~{}i\geq 1.\end{split}

Moreover, for the Cross matrix, the following terms can be defined

\begin{split}8p^{(k-1)}_{i}&=\left(a^{(k)}_{N_{k}-2i}+2a^{(k)}_{N_{k}-2i-1}+a^{(k)}_{N_{k}-2i-2}\right)+2\left(p^{(k)}_{2i-1}+2p^{(k)}_{2i}+p^{(k)}_{2i+1}\right)\\ &\quad+\left(b^{(k)}_{-2i+2}+2b^{(k)}_{-2i+1}+b^{(k)}_{-2i}\right),\\ 8q^{(k-1)}_{i}&=\left(a^{(k)}_{-N_{k}+2i}+2a^{(k)}_{-N_{k}+2i+1}+a^{(k)}_{-N_{k}+2i+2}\right)+2\left(q^{(k)}_{2i-1}+2q^{(k)}_{2i}+q^{(k)}_{2i+1}\right)\\ &\quad+\left(c^{(k)}_{2i-2}+2c^{(k)}_{2i-1}+c^{(k)}_{2i}\right),\\ 8o^{(k-1)}&=\left(a^{(k)}_{0}+2p^{(k)}_{N_{k}-1}+b^{(k)}_{-N_{k}+2}\right)+2\left(q^{(k)}_{N_{k}-1}+2o^{(k)}+\zeta^{(k)}_{1}\right)+\left(c^{(k)}_{N_{k}-2}+2\xi^{(k)}_{1}+d^{(k)}_{0}\right),\end{split}

and

\begin{split}8\xi^{(k-1)}_{i}&=\left(c^{(k)}_{N_{k}-2i}+2c^{(k)}_{N_{k}-2i-1}+c^{(k)}_{N_{k}-2i-2}\right)+2\left(\xi^{(k)}_{2i-1}+2\xi^{(k)}_{2i}+\xi^{(k)}_{2i+1}\right)\\ &\quad+\left(d^{(k)}_{-2i+2}+2d^{(k)}_{-2i+1}+d^{(k)}_{-2i}\right),\\ 8\zeta^{(k-1)}_{i}&=\left(b^{(k)}_{-N_{k}+2i}+2b^{(k)}_{-N_{k}+2i+1}+b^{(k)}_{-N_{k}+2i+2}\right)+2\left(\zeta^{(k)}_{2i-1}+2\zeta^{(k)}_{2i}+\zeta^{(k)}_{2i+1}\right)\\ &\quad+\left(d^{(k)}_{2i-2}+2d^{(k)}_{2i-1}+d^{(k)}_{2i}\right).\end{split}

3.3 The operation count and storage requirement

We now study the computation complexity by the fast Fourier transform and the required storage for the block-structured dense system (17) in AMG, arising from the nonlocal problems in Section 2.

From (17), we know that the matrix $\mathcal{A}_{h}$ is a block-structured Toeplitz-like system. Then, we only need to store the first (last) column, first (last) row and principal diagonal $\mathcal{A}_{h}$ , which have $\mathcal{O}(N)$ parameters, instead of the full matrix $\mathcal{A}_{h}$ with $N^{2}$ entries. From Lemmas 1, we know that $\{A_{k}\}$ is a Toeplitz-like-plus-Cross matrix with the sizes $2^{k-K}\mathcal{O}(N)$ storage. Adding these terms together, we have

\mbox{Storage}=\mathcal{O}(N)\cdot\left(1+\frac{1}{2}+\frac{1}{2^{2}}+\ldots+\frac{1}{2^{K-1}}\right)=\mathcal{O}(N).

Regarding the computational complexity, the matrix-vector product associated with the matrix $\mathcal{A}_{h}$ is a discrete convolution. On the other hand the cost of a direct product is $O(N)$ for the Cross matrix, while the cost of using the FFT would lead to $O(N\log(N)$ for computations involving a dense Toeplitz matrix as the one in (16). Thus, the total per V-cycle AMG operation count is

\mbox{Operation count}=\mathcal{O}(N\log N)\cdot\left(1+\frac{1}{2}+\frac{1}{2^{2}}+\ldots+\frac{1}{2^{K-1}}\right)=\mathcal{O}(N\log N).

4 Convergence of TGM for block-structured dense system (11)

The two-grid method (TGM) is rarely used in practice since the coarse grid operator may still be too large to be solved exactly. However, from a theoretical point of view, its study is useful as first step for evaluating the MGM convergence speed, whose analysis usually begins from that of the TGM [4, 3, 35, 37, 38, 39]. In the following we consider the convergence of the TGM for symmetric block-structured dense system (11). Since the matrix $\mathcal{A}:=\mathcal{A}_{h}:=\mathcal{A}_{K}$ is symmetric positive definite, we can define

(u,v)_{\mathcal{D}}=(\mathcal{D}u,v),\ (u,v)_{\mathcal{A}}=(\mathcal{A}u,v),\ (u,v)_{\mathcal{A}\mathcal{D}^{-1}\mathcal{A}}=(\mathcal{A}u,\mathcal{A}v)_{\mathcal{D}^{-1}}

with $\mathcal{D}$ is the diagonal matrix of $\mathcal{A}$ .

Lemma 2.

[47] Let $\mathcal{A}_{K}$ be a symmetric positive definite. Then

(1) the Jacobi method converges if and only if $2\mathcal{D}_{K}-\mathcal{A}_{K}$ is symmetric positive definite;

(2) the Damped Jacobi method converges if and only if $0<\omega<2/\lambda_{\max}(\mathcal{D}_{K}^{-1}\mathcal{A}_{K})$ .

Lemma 3.

[37, p. 84] Let $\mathcal{A}_{K}$ be a symmetric positive definite. If $\eta\leq\omega(2-\omega\eta_{0})$ with $\eta_{0}\geq\lambda_{\max}(\mathcal{D}_{K}^{-1}\mathcal{A}_{K})$ , then the damped Jacobi iteration with relaxation parameter $0<\omega<2/\eta_{0}$ satisfies

(21)

||\mathcal{S}_{K}\nu^{K}||_{\mathcal{A}_{K}}^{2}\leq||\nu^{K}||_{\mathcal{A}_{K}}^{2}-\eta||\mathcal{A}_{K}\nu^{K}||_{\mathcal{D}_{K}^{-1}}^{2}\quad\forall\nu^{K}\in\mathfrak{M}^{K}.

Lemma 4.

[37, p. 89] Let $\mathcal{A}_{K}$ be a symmetric positive definite matrix and $S_{K}$ satisfies (21) and

(22)

\min_{\nu^{K-1}\in\mathfrak{M}^{K-1}}||\nu^{K}-I_{K-1}^{K}\nu^{K-1}||_{\mathcal{D}_{K}}^{2}\leq\mu||\nu^{K}||_{\mathcal{A}_{K}}^{2}\quad\forall\nu^{K}\in\mathfrak{M}^{K}

with $\mu>0$ independent of $\nu^{K}$ . Then, $\mu\geq\eta>0$ and the convergence factor of TGM satisfies

||\mathcal{S}_{K}\mathcal{T}_{K}||_{\mathcal{A}_{K}}\leq\sqrt{1-\eta/\mu}\quad\forall\nu^{K}\in\mathfrak{M}^{K}.

Next we need to check the smoothing condition (21) and approximation property (22), respectively. We will use notion called weakly diagonal dominant, if the diagonal element of a matrix is at least as large as the sum of absolute value of off-diagonal elements in the same row or column.

Lemma 5.

[45, p. 23] Let $\mathcal{A}_{K}\in\mathbb{R}^{N\times N}$ be a symmetric matrix. If $\mathcal{A}_{k}$ is a strictly diagonally dominant or irreducibly weakly diagonally dominant matrix with positive real diagonal entries, then $\mathcal{A}_{K}$ is positive definite.

Lemma 6.

Let $\mathcal{A}_{K}:=\mathcal{A}^{S}_{h}$ be defined by (11). Then $\mathcal{A}_{K}$ is a weakly diagonally dominant symmetric matrix with positive entries on the diagonal and nonpositive off-diagonal entries.

Proof.

From (12), we have

a_{0}>0,\quad a_{m}<0,~{}~{}1\leq m\leq r\quad{\rm and}\quad a_{m+\frac{1}{2}}<0,~{}~{}0\leq m\leq r-1,

and

a_{0}+2\sum_{m=1}^{r}a_{m}+2\sum_{m=0}^{r-1}a_{m+\frac{1}{2}}=0.

The proof is completed. ∎

We known that the matrix $\mathcal{A}_{K}$ in (11) is reducible, since its graph is not connected [38, p. 91], which may lead to the matrix $\mathcal{A}_{K}$ is singular or semi-positive definite, see Lemmas 5 and 6.

Lemma 7.

Let $\mathcal{A}_{K}:=\mathcal{A}^{S}_{h}$ be defined by (11). Then $\mathcal{A}_{K}$ is a symmetric positive definite matrix with positive entries on the diagonal and nonpositive off-diagonal entries.

Proof.

Let $L_{N}={\rm tridiag}(-1,2,-1)\in\mathbb{R}^{N\times N}$ be the discrete Laplacian operator. Let

(23)

\mathcal{A}_{K}=\mathcal{A}_{\rm res}-a_{1}\mathcal{A}_{\rm main}~{}~{}{\rm with}~{}~{}\mathcal{A}_{\rm main}=\left[\begin{array}[]{cc}L_{N-1}&O\\ O&L_{N}\end{array}\right].

We can check the matrix $-a_{1}\mathcal{A}_{\rm main}$ with $a_{1}<0$ is positive definite. We next prove $\mathcal{A}_{\rm res}$ is a semi-positive definite matrix. From (23), it implies that the principal diagonal elements are positive and the off-diagonal are non-positive of the matrix $\mathcal{A}_{\rm res}$ . Then $\mathcal{A}_{\rm res}$ is still a diagonally dominant symmetric matrix. Thus, $\mathcal{A}_{\rm res}$ is a semi-positive matrix by the Gerschgorin circle theorem [29, p. 388]. The proof is completed. ∎

Lemma 8.

Let the discrete Laplacian-like operators $\{L^{M}_{j}\}_{j=1}^{M-1}\in\mathds{R}^{M\times M}$ and discrete block-structured Laplacian operators be, respectively, defined by

L^{M}_{j}=\left[\begin{matrix}2&\overset{j-1\text{ zeros }}{\overbrace{\cdots}}&-1&&\\ \ddots&\ddots&\ddots&\ddots&\\ -1&\ddots&\ddots&\ddots&-1\\ &\ddots&\ddots&\ddots&\ddots\\ &&-1&\ddots&2\end{matrix}\right]_{M\times M}~{}~{}{\rm and}~{}~{}\mathcal{L}_{j}=\left[\begin{array}[]{cc}{L}^{M}_{j}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{j}\end{array}\right].

Then the matrices

\left(\frac{2}{l}\sum\limits_{j=1}^{l}L^{M}_{j}\right)-L^{M}_{1},~{}~{}~{}~{}\left(\frac{2}{l}\sum\limits_{j=1}^{l}\mathcal{L}_{j}\right)-\mathcal{L}_{1}~{}~{}{\rm and}~{}~{}2\mathcal{L}_{1}-L^{M+N}_{1}

are positive definite.

Proof.

The first results of this lemma can be seen in Lemma 3.10 of [17], which implies that the second one is also satisfied, since

\begin{split}\left(\frac{2}{l}\sum\limits_{j=1}^{l}\mathcal{L}_{j}\right)-\mathcal{L}_{1}&=\frac{2}{l}\sum^{l}_{j=1}\left[\begin{array}[]{cc}{L}^{M}_{j}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{j}\end{array}\right]-\left[\begin{array}[]{cc}{L}^{M}_{1}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{1}\end{array}\right]\\ &=\left[\begin{array}[]{cc}\left(\frac{2}{l}\sum^{l}_{j=1}{L}^{M}_{j}\right)-{L}^{M}_{1}&~{}\mathbf{0}\\ \mathbf{0}&~{}\left(\frac{2}{l}\sum^{l}_{j=1}L^{N}_{j}\right)-L^{N}_{1}\end{array}\right].\end{split}

On the other hand, we can check that $2\mathcal{L}_{1}-L^{M+N}_{1}$ is an irreducible and weakly diagonally dominant symmetric matrix, which means that it is positive definite by Lemma 5. The proof is completed. ∎

Remark 4.1.

In order to understand the spectral features of the matrices considered in Lemma 8, we can adopt the analysis via the related generating functions, since all the matrices in Lemma 8 are of real symmetric banded Toeplitz type, so the admit real-valued trigonometric polynomials as generating functions (see [39] and references therein): on the other hand the matrices $\mathcal{L}_{j}$ are block diagonal and hence their spectral analysis reduces to the Toeplitz setting. For instance, according to the notation in [39], $L^{M}_{j}=T_{M}(2-2\cos(j\theta))$ that is the function $2-2\cos(j\theta)$ is the generating function of $L^{M}_{j}$ . From classical results, we know that $T_{M}(f)$ is positive definite for any matrix-size $M$ if $f$ is essentially bounded and nonnegative, with positive essential supremum. In the present setting, the maximum of $2-2\cos(j\theta)$ is 4 and its minimum is zero and hence $L^{M}_{j}=T_{M}(2-2\cos(j\theta))$ is positive definite. Not only this: if the nonnegative generating function $f$ has a unique zero of order $\alpha>0$ then the minimal eigenvalue of $T_{M}(f)$ is positive converges monotonically to zero as $c/M^{\alpha}$ with $c$ depending on the second derivative of $f$ at the zero if it is positive. Based on these tools, we can deduce that $L^{M}_{j}$ is positive definite, has minimal eigenvalue positive converging monotonically to zero as $c_{j}/M^{2}$ with positive $c_{j}$ independent of $M$ , and $c_{j}$ related to the second derivative of $2-2\cos(j\theta)$ at $\theta=0$ .

Of course, by linearity, the Toeplitz matrix $\left(\frac{2}{l}\sum\limits_{j=1}^{l}L^{M}_{j}\right)-L^{M}_{1}$ has generating function given by $f_{l}(\theta)\equiv\left(\frac{2}{l}\sum\limits_{j=1}^{l}{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\left[2-2\cos(j\theta)\right]}\right)-(2-2\cos(\theta))$ and hence by studying this generating function, we deduce that

\left(\frac{2}{l}\sum\limits_{j=1}^{l}L^{M}_{j}\right)-L^{M}_{1}

is positive definite, has minimal eigenvalue positive converging monotonically to zero as $c/M^{2}$ with positive $c$ independent of $M$ . Hence its condition number grow exactly as $M^{2}$ and since the related generating function has a unique zero of order $2$ at $\theta=0$ , there is a formal justification in using standard projectors and restriction operators, like those employed in the standard AMG, for the classical discrete Laplacian (see [28, 3, 39] and references therein).

Lemma 9.

Let $\mathcal{A}_{K}:=\mathcal{A}^{S}_{h}$ be defined by (11). Then the damped Jacobi iteration with relaxation parameter $0<\omega\leq 1$ satisfies

||S_{K}\nu^{K}||_{\mathcal{A}_{K}}^{2}\leq||\nu^{K}||_{\mathcal{A}_{K}}^{2}-\frac{1}{2}||\mathcal{A}_{K}\nu^{K}||_{\mathcal{D}_{K}^{-1}}^{2}\quad\forall\nu^{K}\in\mathfrak{M}^{K}.

Proof.

According to (11) and (12), we know that the matrix $\mathcal{A}_{K}$ is a weakly diagonal dominant, since

a_{0}+2\sum_{m=1}^{r}a_{m}+2\sum_{m=0}^{r-1}a_{m+\frac{1}{2}}=0.

Taking $\mathcal{A}_{K}=\left\{a_{i,j}^{(K)}\right\}_{i,j=1}^{\infty}$ with $a_{i,j}^{(K)}=a_{|j-i|}^{(K)}$ , it yields

r_{i}^{(K)}:=\sum\limits_{j\neq i}|a_{i,j}^{(K)}|\leq a_{i,i}^{(K)}=a_{0}.

Using the Gerschgorin circle theorem [29, p. 388], the eigenvalues of $\mathcal{A}_{K}$ are in the disks centered at $a_{i,i}^{(K)}$ with radius $r_{i}^{(K)}$ . Namely, the eigenvalues $\lambda$ of the matrix $\mathcal{A}_{K}$ satisfy

|\lambda-a_{i,i}^{(K)}|\leq r_{i}^{(K)},

which leads to $\lambda_{\max}(\mathcal{A}_{K})\leq a_{i,i}^{(K)}+r_{i}^{(K)}<2a_{i,i}^{(K)}=2a_{1,1}^{(K)}=2a_{0}$ .

From Rayleigh theorem [29, p. 235], we deduce

\lambda_{\max}(\mathcal{A}_{K})=\max_{x\neq 0}\frac{x^{T}\mathcal{A}_{K}x}{x^{T}x}\quad\forall x\in\mathbb{{R}}^{n}.

Take $x=[1,0,\ldots,0]^{T}$ . Then

\lambda_{\max}(\mathcal{A}_{K})\geq\frac{x^{T}\mathcal{A}_{K}x}{x^{T}x}=a_{1,1}^{(K)}=a_{0},

and

\lambda_{\max}\left(\left(\mathcal{D}_{K}\right)^{-1}\mathcal{A}_{K}\right)=\frac{\lambda_{\max}(\mathcal{A}_{K})}{a_{1,1}^{(K)}}=\frac{\lambda_{\max}(\mathcal{A}_{K})}{a_{0}}\in[1,2],

where $\mathcal{D}_{K}$ is the diagonal of $\mathcal{A}_{K}$ .

From Lemma 2, the Damped Jacobi method converges with $0<\omega<1$ . By the similar proof process of Lemma 7, we can check that $2\mathcal{D}_{K}-\mathcal{A}_{K}$ is symmetric positive definite. Then using Lemma 2 again, the Jacobi method converges. Hence, the damped Jacobi iteration with relaxation parameter $0<\omega\leq 1$ converges. The proof is completed. ∎

Lemma 10.

Let $\mathcal{A}_{K}=\mathcal{A}_{h}$ be defined by (11). Then

\min_{\nu^{K-1}\in\mathfrak{M}^{K-1}}||\nu^{K}-I_{K-1}^{K}\nu^{K-1}||_{\mathcal{D}_{K}}^{2}\leq\frac{1}{24}||\nu^{K}||_{\mathcal{A}_{K}}^{2}\quad\forall\nu^{K}\in\mathfrak{M}^{K}.

Proof.

Let discrete block-structured Laplacian-like operators be defined by

(24)

\mathcal{L}_{j}=\left[\begin{array}[]{cc}{L}^{N-1}_{j}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{j}\end{array}\right]=\left[\begin{array}[]{cc}{L}^{N-1}_{j}&~{}\mathbf{0}\\ \mathbf{0}&~{}\mathbf{0}\end{array}\right]+\left[\begin{array}[]{cc}\mathbf{0}&~{}\mathbf{0}\\ \mathbf{0}&~{}L^{N}_{j}\end{array}\right],~{}~{}j=1,2,\dots,r,

where the discrete Laplacian-like operators $\{L^{N-1}_{j}\}^{r}_{j=1}\in\mathds{R}^{(N-1)\times(N-1)}$ , $\{L^{N}_{j}\}^{r}_{j=1}\in\mathds{R}^{N\times N}$ are given in Lemma 8.

From (11), the block-structured dense matrix $\mathcal{A}_{K}$ can be denoted as the series sum with Laplacian-like operators $\mathcal{L}_{j}$ , i.e.,

(25)

\mathcal{A}_{K}=\left[\begin{array}[]{cc}A&B\\ B^{T}&\widehat{A}\end{array}\right]=-\sum^{r}_{j=1}a_{j}\mathcal{L}_{j}+\mathcal{B}~{}~{}{\rm and}~{}~{}\mathcal{B}=\left(a_{0}+2\sum^{r}_{j=1}a_{j}\right)I+\left[\begin{array}[]{cc}\mathbf{0}&~{}B\\ B^{T}&~{}\mathbf{0}\end{array}\right]

with $I$ an identity matrix.

Using (12), it yields $a_{0}+2\sum_{m=1}^{r}a_{m}+2\sum_{m=0}^{r-1}a_{m+\frac{1}{2}}=0,$ which implies that $\mathcal{B}$ is a weakly diagonally dominant symmetric matrix with positive entries on the diagonal and nonpositive off-diagonal entries. Then $\mathcal{B}$ is symmetric and semi-positive definite by applying the Gerschgorin circle theorem.

From (25), (12) and Lemma 8, we obtain

\begin{split}\left|\left|\nu^{K}\right|\right|_{\mathcal{A}_{K}}&=\left(\mathcal{A}_{K}\nu^{K},\nu^{K}\right)\geq-\frac{a_{1}}{2}\left(\sum^{r}_{j=1}\mathcal{L}_{j}\nu^{K},\nu^{K}\right)\geq-\frac{ra_{1}}{4}\left(\mathcal{L}_{1}\nu^{K},\nu^{K}\right)\\ &\geq-\frac{ra_{1}}{8}\left(L^{2N-1}_{1}\nu^{K},\nu^{K}\right)\geq\frac{a_{0}}{48}\left(L^{2N-1}_{1}\nu^{K},\nu^{K}\right)\geq\frac{a_{0}}{24}\left|\left|\nu^{K}-I^{K}_{K-1}\nu^{K}\right|\right|^{2}\\ &=\frac{1}{24}\left|\left|\nu^{K}-I^{K}_{K-1}\nu^{J}\right|\right|^{2}_{\mathcal{D}_{K}}.\end{split}

The proof is completed. ∎

Theorem 11.

Let $\mathcal{A}_{K}=\mathcal{A}_{h}$ be defined by (11). Then the convergence factor of the TGM satisfies

\left|\left|S_{K}T_{K}\right|\right|_{\mathcal{A}_{K}}\leq\sqrt{47/48}<1.

Proof.

According to Lemma 3, 4, 9 and 10, the desired result is obtained. ∎

5 Numerical results

We employ the V-cycle block-structured AMG described in Algorithm 2 to solve the time-dependent nonlocal problems in Section 2. The stopping criterion is taken as

\frac{||r^{(i)}||}{||r^{0}||}<10^{-15},

where $r^{(i)}$ is the residual vector after $i$ iterations, and the number of iterations $(m_{1},m_{2})=(1,1)$ and $(w_{pre},w_{post})=(1,1/2)$ . In all tables, $N$ denotes the number of spatial grid points, and the numerical errors are measured by the $l_{\infty}$ (maximum) norm, ”rate” denotes the convergence orders, ”CPU” denotes the total CPU time in seconds (s) for solving the discretized systems, ”Iter” denotes the average number of iterations required to solve algebraic systems at each time level.

All numerical experiments are programmed in Matlab, and the computations are carried out on a desktop with the configuration: Intel(R) Core(TM) i7-7700 3.60 GHZ and 8 GB RAM and a 64 bit Windows 10 operating system.

First we consider the time-dependent nonlocal/peridynamic models (3) and (5) in Section 2. The initial value and the forcing term are chosen such that the exact solution of the considered equations is

u(x,t)=e^{t}\left(1+x\right)^{6},\quad 0\leqslant x\leqslant 1,\ 0\leqslant t\leqslant 1.

We apply the following BDF4 method to such nonlocal models

(26)

\begin{split}{\rm BDF4~{}scheme}:&\left(\frac{25}{12}I-\tau\mathcal{A}\right)U^{k}=4U^{k-1}-3U^{k-2}+\frac{4}{3}U^{k-3}-\frac{1}{4}U^{k-4}+\tau F^{k}.\end{split}

Here the operator $\mathcal{A}$ denotes the block-structured systems (4), (10) and (11), respectively. It should be noted that the stability and convergence analysis of time-dependent nonlocal problem (3) and (5) can be seen in [1, 20, 19, 11]. The convergence rate of the two-grid method for time-dependent block-structured systems (11) can be directly obtained by Theorem 11 and [22]: here we omit the related derivation.

5.1 Application in nonlocal model

Table 1 shows that the BDF4 scheme (26) for time-dependent nonlocal model (3) has the global truncation error $\mathcal{O}\left(\tau^{4}+h^{4-\gamma}\right)$ and the computation cost is almost $\mathcal{O}(N\mbox{log}N)$ operations.

Table 1: Using Galerkin approach

\mathcal{A}_{k-1}=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1}

in (18) computed by Lemma 1 to solve the time-dependent nonlocal model (3) with

\tau=h=1/N

$N$	$\gamma=0$				$\gamma=0.5$				$\gamma=0.9$
$N$	Error	Rate	CPU	Iter	Error	Rate	CPU	Iter	Error	Rate	CPU	Iter
$2^{5}$	1.4460e-05		0.11s	3	1.9587e-05		0.1167s	3	3.4809e-05		0.14s	4
$2^{6}$	9.7263e-07	3.89	0.39s	3	1.5160e-06	3.6916	0.4236s	3	3.5907e-06	3.28	0.47s	4
$2^{7}$	6.2993e-08	3.95	0.77s	2	1.1678e-07	3.6984	0.9123s	3	3.7667e-07	3.25	1.10s	3
$2^{8}$	3.9959e-09	3.98	2.26s	2	9.1285e-09	3.6772	2.1589s	2	4.0423e-08	3.22	2.94s	3

5.2 Application in Peridynamic model

We further extend the V-cycle block-structured AMG Algorithm 2 to simulate the peridynamic model with nonsymmetric indefinite block-structured dense systems and symmetric positive definite block-structured dense systems, respectively.

5.2.1 Nonsymmetric indefinite block-structured dense systems

Table 2 shows that the BDF4 scheme (26) for time-dependent peridynamic model (3) with nonsymmetric indefinite block-structured dense systems has the global truncation error $\mathcal{O}\left(\tau^{4}+h^{\max\left\{2,4-2\beta\right\}}\right)$ with $\delta=\mathcal{O}\left(h^{\beta}\right)$ , $\beta\geq 0$ and a computational cost of $\mathcal{O}(N\mbox{log}N)$ arithmetic operations.

Table 2: Nonsymmetric indefinite block-structured dense systems (10): Using Galerkin approach

\mathcal{A}_{k-1}=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1}

in (18) computed by Lemma 1 to solve the time-dependent peridynamic model (3) with and

\tau=h=1/N

	$N$	$\delta=1/4$				$\delta=\sqrt{h}$
	$N$	Error	Rate	CPU	Iter	Error	Rate	CPU	Iter
BDF4	$2^{5}$	4.3254e-05		0.1460s	7	9.9042e-05		0.2276s	10
	$2^{6}$	2.7166e-06	3.9930	0.3628s	6	9.3791e-06	3.4005	0.4737s	8
	$2^{7}$	1.7022e-07	3.9963	0.6684s	5	1.1886e-06	2.9802	1.5029s	8
	$2^{8}$	1.0652e-08	3.9981	2.0338s	4	1.3694e-07	3.1177	3.5816s	7

5.2.2 Symmetric positive definite block-structured dense systems

Table 3 shows that the BDF4 scheme (26) for time-dependent peridynamic model (3) with symmetric block-structured dense systems has the global truncation error $\mathcal{O}\left(\tau^{4}+h^{\max\left\{2,4-2\beta\right\}}\right)$ with $\delta=\mathcal{O}\left(h^{\beta}\right)$ , $\beta\geq 0$ and a computational cost of $\mathcal{O}(N\mbox{log}N)$ arithmetic operations.

Table 3: Symmetric positive definite block-structured dense systems (11): Using Galerkin approach

\mathcal{A}_{k-1}=I_{k}^{k-1}\mathcal{A}_{k}I^{k}_{k-1}

in (18) computed by Lemma 1 to solve the time-dependent peridynamic model (3) with and

\tau=h=1/N

	$N$	$\delta=1/4$				$\delta=\sqrt{h}$
	$N$	Error	Rate	CPU	Iter	Error	Rate	CPU	Iter
BDF4	$2^{5}$	1.1628e-05		0.2341s	9	2.5257e-05		0.2987s	12
	$2^{6}$	7.3840e-07	3.9771	0.4819s	7	2.3810e-06	3.4070	0.5999s	10
	$2^{7}$	4.6514e-08	3.9887	1.0638s	6	2.9930e-07	2.9919	2.0004s	10
	$2^{8}$	2.9182e-09	3.9945	2.6782s	5	3.4366e-08	3.1226	4.6659s	9

6 Conclusions

In this paper, we considered the solutions of block-structured dense and Toeplitz-like-plus-Cross systems arising from nonlocal diffusion problem and peridynamic problem. We designed a AMG for block-structured dense and Toeplitz-like-plus-Cross systems, by making also use of fast Fourier transforms, and we provided an estimate of the TGM convergence rate for the peridynamic problem with symmetric positive definite block-structured dense linear systems. In this specific context, we answered the question on how to define coarsening and interpolation operators, when the stiffness matrix leads to nonsymmetric systems [10, 31]. The simple (traditional) restriction operator and prolongation operator are employed for such Toeplitz-like-plus-Cross systems, so that the entries of the sequence of subsystems are explicitly determined on different levels.

For the future, at least two questions arise and we plan to investigate them: more precisely we would like to consider the study of the TGM convergence analysis for nonsymmetric block-structured dense systems and the analysis of the full MGM for symmetric block-structured dense systems, based on the ideas presented in [15, 17].

References

[1] G. Akrivis, M. H. Chen, F. Yu, and Z. Zhou, The energy technique for the six-step BDF method, SIAM J. Numer. Anal., 59 (2021), pp. 2449–2472.
[2] F. Andreu-Vaillo, J. M. Mazón, J. D. Rossi, and J. J. Toledo-Melero, Nonlocal Diffusion Problems, Math. Surveys Monogr. 165, AMS, Providence, RI, 2010.
[3] A. Aricò and M. Donatelli, A V-cycle multigrid for multilevel matrix algebras: proof of optimality, Numer. Math., 105 (2007), pp. 511–547.
[4] A. Aricò, M. Donatelli, and S. Serra-Capizzano, V-cycle optimal convergence for certain (multilevel) structured linear systems, SIAM J. Matrix Anal. Appl., 26 (2004), pp. 186–214.
[5] K. E. Atkinson, The Numerical Solution of Integral Equations of the Second Kind, Cambridge University Press, New York, 2009.
[6] P. Bates, On some nonlocal evolution equations arising in materials science In: H. Brunner, X. Zhao and X. Zou (eds.) Nonlinear Dynamics and Evolution Equations. in Fields Inst. Commun. AMS, Providence, RI, (2006), pp. 13–52.
[7] M. Benzi, G. H. Golub, and J. Liesen, Numerical solution of saddle point problems, Acta Numer., 14 (2005), pp. 1–137.
[8] M. Bolten, M. Donatelli, T. Huckle, and C. Kravvaritis, Generalized grid transfer operators for multigrid methods applied on Toeplitz matrices, BIT., 55 (2015), pp. 341–366.
[9] A. Böttcher and S. M. Grudsky, Spectral Properties of Banded Toeplitz Matrices, SIAM, Philadelphia, PA, 2005.
[10] M. Brezina, T. Manteuffel, S. Mccormick, J. Ruge, and G. Sanders, Towards adaptive smoothed aggregation ( $\alpha$ SA) for nonsymmetric problems, SIAM J. Sci. Comput., 32 (2010), pp. 14–39.
[11] R. J. Cao, M. H. Chen, M. K. Ng, and Y. J. Wu, Fast and high-order accuracy numerical methods for time-dependent nonlocal problems in $\mathbb{R}^{2}$ , J. Sci. Comput., 84 (2020), Paper No.8.
[12] R. H. Chan and X. Q. Jin, An Introduction to Interative Toeplitz Solvers, SIAM, Phildelphia, 2007.
[13] R. H. Chan and M. K. Ng, Conjugate gradient methods for Toeplitz systems, SIAM Rev. 38 (1996), pp. 427–482.
[14] R. H. Chan, Q. S. Chang, and H. W. Sun, Multigrid method for ill-conditioned symmetric Toeplitz systems, SIAM J. Sci. Comput., 19 (1998), pp. 516–529.
[15] M. H. Chen, S. E. Ekström, S. Serra-Capizzano, A Multigrid method for nonlocal problems: non-diagonally dominant or Toeplitz-plus-tridiagonal systems, SIAM J. Matrix Anal. Appl., 41 (2020), pp. 1546–1570.
[16] M. H. Chen and W. H. Deng, Fourth order accurate scheme for the space fractional diffusion equations, SIAM J. Numer. Anal., 52 (2014), pp. 1418–1438.
[17] M. H. Chen, and W. H. Deng, Convergence analysis of a Multigrid method for a nonlocal model, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 869–890.
[18] M. H. Chen, and W. H. Deng, and S. Serra-Capizzano, Uniform convergwnce of V-cycle Multigrid algorithms for two-dimensional fractional Feynman-Kac equation, J. Sci. Comput., 74 (2018), pp. 1034–1059.
[19] M. H. Chen, W. Y. Qi, J. K. Shi and J. M. Wu, A sharp error estimate of piecewise polynomial collocation for nonlocal problems with weakly singular kernels, IMA J. Numer. Anal., 41 (2021), pp. 3145–3174.
[20] M. H. Chen, Y. F. Qi, and J. K. Shi, Asymptotically compatible piecewise quadratic polynomial collocation for nonlocal model, arXiv:2010.09215
[21] M. H. Chen, J. K. Shi, and X. B. Yin, Analysis of (shifted) piecewise quadratic polynomial collocation for nonlocal diffusion model, Under Review
[22] M. H. Chen, Y. T. Wang, X. Cheng, and W. H. Deng, Second-order LOD multigrid method for multidimensional Riesz fractional diffusion equation, BIT, 54 (2014), pp. 623–647.
[23] M. D’Elia, Q. Du, C. Glusa, M. Gunzburger, X. C. Tian, and Z. Zhou, Numerical methods for nonlocal and fractional models, Acta Numer., 29 (2020), pp. 1–124.
[24] M. Donatelli, P. Ferrari, I. Furci, S. Serra-Capizzano, and D. Sesana, Multigrid methods for block-Toeplitz linear systems: convergence analysis andapplications, Numer. Linear Algebra. Appl., 28 (2021), No. e2356
[25] M. Donatelli, M. Mazza, and S. Serra-Capizzano, Spectral analysis and multigrid methods for finite volume approximations of space-fractional diffusion equations, SIAM J. Sci. Comput., 40 (2018), pp. A4007–A4039.
[26] Q. Du, Nonlocal Modeling, Analysis, and Computation, SIAM, Philadelphia, 2019.
[27] Q. Du, M. Gunzburger, R. Lehoucq, and K. Zhou, Analysis and approximation of nonlocal diffusion problems with volume constraints, SIAM Rev., 56 (2012), pp. 676–696.
[28] G. Fiorentino and S. Serra, Multigrid methods for symmetric positive definite block Toeplitz matrices with nonnegative generating functions, SIAM J. Sci. Comput., 17 (1996), pp. 1068–1081.
[29] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, New York, 2013.
[30] Y. Leng, X. Tian, N. Trask, and J. T. Foster, Asymptotically compatible reproducing kernel collocation and meshfree integration for nonlocal diffusion, SIAM J. Numer. Anal., 59 (2021), pp. 88–118.
[31] T. Manteuffel and S. Southworth, Convergence in norm of nonsymmetric algebraic multigrid, SIAM J. Sci. Comput., 41 (2019), pp. S269-S296.
[32] M. K. Ng, J. Y. Pan, Weighted Toeplit regularized least squares computation for image restoration, SIAM J. Sci.Comput., 36 (2014), pp. B94-B121.
[33] Y. Notay, A new algebraic multigrid approach for Stokes problems, Numer. Math., 132 (2016), pp. 51-84.
[34] D. Palitta and V. Simoncini, Matrix-equation-based strategies for convection-diffusion equations, BIT Numer. Math., 56 (2016), pp. 751–776.
[35] H. Pang and H. Sun, Multigrid method for fractional diffusion equations, J. Comput. Phys., 231 (2012), pp. 693–703.
[36] W. Y. Qi, P. Seshaiyer, J. P. Wang,, Finite element method with the total stress variable for Biot’s consolidation model, Numer. Methods Partial Differential Eq., 37 (2021), pp. 2409-2428.
[37] J. Ruge and K. Stüben, Algebraic multigrid, in Multigrid Methods, Ed: S. McCormick, SIAM, 1987, pp. 73–130.
[38] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM , Philadelphia, 2003.
[39] S. Serra-Capizzano, Convergence analysis of two-grid methods for elliptic Toeplitz and PDEs matrix-sequences, Numer. Math., 92 (2002), pp. 433–465.
[40] S. A. Silling, Reformulation of elasticity theory for discontinuities and long-range forces, J. Mech. Phys. Solids, 48 (2000), pp. 175–209.
[41] V. Simoncini, On the numerical solution of a class of systems of linear matrix equations, IMA J. Numer. Anal., 40 (2020), pp. 207-225.
[42] A. A. Sivas, B. S. Southworth, and S. Rhebergen, Air algebraic multigrid for a space-time hybridizable discontinuous Garlerkin discretization of advection(-diffusion), SIAM J. Sci. Comput., 43 (2021), pp. A3393-A3416.
[43] W. J. Stewart, Introduction to the Numerical Soluton of Markov Chains, Princeton Univ. Press, 1994.
[44] X. C. Tian and Q. Du, Analysis and comparison of different approximations to nonlocal diffusion and linear peridynamic equations, SIAM J. Numer. Anal., 51 (2013), pp. 3458–3482.
[45] R. V. Varga, Matrix Iterative Analysis, Springer, Berlin Heidelberg, 2000.
[46] MT. A. Wiesner, M. Mayr, A. Popp, M. W. Gee, and W. A. Wall, Algebraic multigrid methods for saddle point systems arising from mortar contact formulations, Int. J. Numer. Methods Eng., 122 (2021), pp. 3749–3779.
[47] J. Xu, An introduction to multilevel methods, in wavelets, Iterative Methods for Sparse Linear Systems, Multilevel Methods and Elliptic PDEs, Leicester, 1996, M.Ainsworth, J. Levesley, W. A. Light, and M. Marletta, eds., Oxford Universty Press, New York, 1997, pp. 213-302.
[48] J. Xu and L. Zikatanov, Algebraic multigrid methods, Acta. Numerica., 26 (2017), pp. 591–721.

Fast algebraic multigrid for block-structured dense and Toeplitz-like-plus-Cross systems arising from nonlocal diffusion problems ††thanks: This work was supported by NSFC 11601206.

Abstract

keywords:

AMS:

1 Introduction

2 Block-structured dense systems applications

2.1 Application in nonlocal model

2.2 Application in Peridynamic model

2.2.1 Nonsymmetric indefinite block-structured dense systems

2.2.2 Symmetric positive definite block-structured dense systems

3 Fast AMG for block-structured dense systems

3.1 Multigrid methods

3.2 Fast Fourier transform for block-structured dense systems

3.2.1 Fast Fourier transform for block-structured dense systems at the finest level

3.2.2 Fast Fourier transform for block-structured dense systems at the coarser level

Example 3.1.

Lemma 1.

3.3 The operation count and storage requirement

4 Convergence of TGM for block-structured dense system (11)

Lemma 2.

Lemma 3.

Lemma 4.

Lemma 5.

Lemma 6.

Proof.

Lemma 7.

Proof.

Lemma 8.

Proof.

Remark 4.1.

Lemma 9.

Proof.

Lemma 10.

Proof.

Theorem 11.

Proof.

5 Numerical results

5.1 Application in nonlocal model

5.2 Application in Peridynamic model

5.2.1 Nonsymmetric indefinite block-structured dense systems

5.2.2 Symmetric positive definite block-structured dense systems

6 Conclusions

References

Fast algebraic multigrid for block-structured dense and Toeplitz-like-plus-Cross systems arising from nonlocal diffusion problems ^†^†thanks: This work was supported by NSFC 11601206.