
Parallel multilevel restricted Schwarz preconditioners for implicit simulation of subsurface flows with Peng-Robinson equation of state

Rui Li, Haijian Yang (School of Mathematics, Hunan University, Changsha, Hunan 410082, PR China), Chao Yang (School of Mathematical Sciences, Peking University, Beijing 100871, PR China)
Abstract

Parallel algorithms and simulators with good scalability are particularly important for large-scale reservoir simulations on modern supercomputers with a large number of processors. In this paper, we introduce and study a family of highly scalable multilevel restricted additive Schwarz (RAS) methods for the fully implicit solution of subsurface flows with the Peng-Robinson equation of state in two and three dimensions. With the use of a second-order fully implicit scheme, the proposed simulator is unconditionally stable and allows the time step size to be chosen without a stability restriction. The investigation then focuses on the development of several types of multilevel overlapping additive Schwarz methods for preconditioning the linear systems arising from the inexact Newton iteration, and some fast solver technologies are presented to ensure the efficiency and scalability of the multilevel approach. We numerically show that the proposed fully implicit framework is highly efficient for solving both standard benchmarks and realistic problems with several hundred million unknowns, and that it scales to 8192 processors on the Tianhe-2 supercomputer.

keywords:
Reservoir simulation , Fully implicit method , Multilevel method , Restricted Schwarz preconditioners , Parallel computing
journal: J. Comput. Phys.

1 Introduction

Simulation of subsurface fluid flows in porous media is currently an important topic in many applications, such as hydrology and groundwater flow, oil and gas reservoir simulation, CO2 sequestration, and waste management [8, 13]. The growing demand for accurate modeling of subsurface systems imposes persistent requirements on the understanding of many different physical processes, including multiphase fluid flow and transport, fluid phase behavior governed by a sophisticated equation of state, and geomechanical deformation. To simulate these processes, one needs to solve a set of coupled, nonlinear, time-dependent partial differential equations (PDEs) with complicated nonlinear behaviors arising from the complexity of the geological media and reservoir properties. An additional computational challenge comes from the Peng-Robinson equation of state, which governs the equilibrium distribution of the fluid and has a remarkable influence on the high variation of the Darcy velocity. Because of this high computational complexity, approximating subsurface phenomena numerically at high resolution is critical to reservoir engineering for accurate predictions of costly projects. Hence, parallel reservoir simulators equipped with robust and scalable solvers on high performance computing platforms are crucial for efficient simulation of this intricate problem [9].

In terms of the numerical solution of partial differential equations, the fully implicit method [2, 3, 17, 20, 24, 26, 32, 36] is a widely preferred approach for solving various problems arising from the discretization of subsurface fluid flows. The fully implicit scheme is unconditionally stable and relaxes the stability requirement of the Courant-Friedrichs-Lewy (CFL) condition. It therefore provides a consistent and robust way to simulate subsurface fluid flow problems in long-term and extreme-scale computations. In particular, a great advantage of the fully implicit algorithm is that the corresponding nonlinear equations are solved implicitly in a monolithic way. This characteristic feature makes it possible to add more physics and introduce more equations without changing much of the simulator framework, which greatly expands the scope of application of the fully implicit approach. In spite of being stable for arbitrarily large time steps, a fully implicit scheme requires the solution of a large, sparse, and ill-conditioned linear system at each nonlinear iteration. It therefore remains challenging and important to design robust and scalable solvers for large-scale simulations on high performance computing platforms. In this study, we employ the framework of Newton-Krylov algorithms [5, 7, 11, 15, 21, 34] to guarantee the nonlinear consistency, and mainly focus our efforts on designing an efficient preconditioning strategy to substantially reduce the condition number of the linear system.

In reservoir simulation, several effective preconditioning techniques, such as block preconditioners [14, 18, 33], the constrained pressure residual (CPR) approach [30, 31, 32], domain decomposition methods [20, 26, 36, 37], and algebraic or geometric multigrid algorithms [2, 3, 24, 32], are flexible and capable of addressing the inherent ill-conditioning of a complicated physical system, and hence have received increasing attention in recent years. The focus of this paper is on the domain decomposition approach, through which we propose a family of one-level and multilevel restricted additive Schwarz preconditioners. The original overlapping additive Schwarz (AS) method was introduced for the solution of symmetric positive definite elliptic finite element problems, and was later extended to many other nonlinear or linear systems [6, 29]. The AS method follows the divide-and-conquer principle by recursively breaking a problem down into subproblems of the same or related type, and its communication occurs only between neighboring subdomains during the restriction and extension processes. Moreover, the additive Schwarz preconditioner does not require any splitting of the nonlinear system. Hence, it can serve as the basis of efficient algorithms that are naturally suitable for massively parallel computing. In particular, the approach can be combined with certain variants of restriction operators in a robust and efficient way, such as the so-called restricted additive Schwarz (RAS) method proposed by Xiao-Chuan Cai et al. in [4, 6]. This can significantly improve the physical feasibility of the solver and substantially reduce the total computing time, and the method has proven to be very efficient in a variety of applications [16, 23, 26, 32, 36, 38]. Recently, the overlapping restricted additive Schwarz method, in conjunction with a variant of inexact Newton methods, has been applied successfully to the two-phase flow problem [36] and the black oil model [32, 37]. It was demonstrated that the one-level Schwarz preconditioner is scalable in terms of the numbers of nonlinear and linear iterations, but not in terms of the total compute time when the number of processors becomes large, which motivates the family of multilevel methods [16, 23, 28, 29].

In this work, to take advantage of modern supercomputers for large-scale reservoir simulations, we develop the overlapping restricted additive Schwarz preconditioner into a general multilevel framework for solving the discrete systems arising from fully implicit discretizations. We show that, with this new feature, the proposed approach is more robust and efficient for problems with highly heterogeneous media, and scales well to a large number of processors on the Tianhe-2 supercomputer. We would like to point out that designing a good strategy for the multilevel approach is both time-consuming and challenging, as it requires extensive knowledge of the general-purpose framework of interest, such as the selection of coarse-to-fine mesh ratios, the restriction and interpolation operators, and the solvers for the smoothing and coarse-grid correction. To the best of our knowledge, very limited research has been conducted on applying the multilevel restricted additive Schwarz preconditioning technique to the fully implicit solution of petroleum reservoir problems.

The outline of the paper is as follows. In Section 2, we introduce the governing equations of subsurface flows with Peng-Robinson equation of state, followed by the corresponding fully implicit discretization. In Section 3, a family of multilevel restricted additive Schwarz preconditioners, as the most important part of the fully implicit solver, is presented in detail. We show numerical results for some 2D and 3D realistic benchmark problems in Section 4 to demonstrate the robustness and scalability of the proposed preconditioner. The paper is concluded in Section 5.

2 Mathematical model and discretizations

The problem of interest in this study is compressible Darcy flow in a porous medium for the description of gas reservoir simulations [8, 9, 13]. Let $\Omega\subset\mathbb{R}^{d}$ $(d=1,2,3)$ be the computational domain with boundary $\Gamma=\partial\Omega$, and let $t_{f}$ denote the final simulated time. The mass balance equation with the Darcy law for the real gas fluid is defined by

\left\{\begin{array}{ll}\displaystyle\frac{\partial(\phi\rho)}{\partial t}+\nabla\cdot(\rho\mathbf{v})=q,&\mbox{in }\Omega_{T}=\Omega\times(0,t_{f}],\\ \displaystyle\mathbf{v}=-\frac{\mathbf{K}}{\mu}(\nabla p-\rho g\nabla h),&\end{array}\right.  (1)

where $\phi$ is the porosity of the porous medium, $\mathbf{K}$ is the permeability tensor, $\mathbf{v}$ is the Darcy velocity, $\rho$ denotes the density, $\mu$ is the viscosity of the flow, $p$ is the pressure, and $q$ is the external mass flow rate. The Peng-Robinson (PR) equation of state (EOS) [13] is used to describe the density as a function of the composition, temperature, and pressure:

\left\{\begin{array}{ll}\rho=cW,\\ c=\displaystyle\frac{p}{ZRT},\end{array}\right.  (2)

where $W$ is the molecular weight, $T$ is the temperature, $R$ is the gas constant, and $Z$ is the compressibility factor of the gas defined by

Z^{3}-(1-B)Z^{2}+(A-3B^{2}-2B)Z-(AB-B^{2}-B^{3})=0.  (3)

Moreover, $A$ and $B$ are the PR-EOS parameters defined as follows:

A=\frac{ap}{R^{2}T^{2}},\quad B=\frac{bp}{RT},

where the parameters $a$ and $b$ are modeled by imposing the criticality conditions

\left\{\begin{array}{ll}\displaystyle a=a(T)=0.45724\cdot\frac{R^{2}T_{c}^{2}}{p_{c}}\left(1+m\left(1-\sqrt{T_{r}}\right)\right)^{2},\\ \displaystyle b=0.0778\cdot\frac{RT_{c}}{p_{c}},\end{array}\right.  (4)

with $T_{c}$ and $p_{c}$ being the critical temperature and critical pressure of the gas, respectively, and $T_{r}$ being the reduced temperature defined as $T_{r}=T/T_{c}$. In addition, the parameter $m$ in (4) is given by a fitting formula in terms of the acentric factor $w$ of the substance:

m=\left\{\begin{aligned} &0.37464+1.54226w-0.26992w^{2}, &\quad 0\leq w\leq 0.491,\\ &0.3796+1.485w-0.1644w^{2}+0.01667w^{3}, &\quad 0.491\leq w\leq 2.0.\end{aligned}\right.  (5)

Here, the acentric factor $w$ is calculated by the following formula:

w=\frac{-\ln\left(\frac{p_{c}}{1\,\mathrm{atm}}\right)-f^{(0)}(T_{br})}{f^{(1)}(T_{br})},

where $f^{(0)}(T_{br})$ and $f^{(1)}(T_{br})$ are given by

\left\{\begin{aligned} f^{(0)}(T_{br})&=\frac{-5.97616(1-T_{br})+1.29874(1-T_{br})^{\frac{3}{2}}-0.60394(1-T_{br})^{\frac{5}{2}}-1.06841(1-T_{br})^{5}}{T_{br}},\\ f^{(1)}(T_{br})&=\frac{-5.03365(1-T_{br})+1.11505(1-T_{br})^{\frac{3}{2}}-5.41217(1-T_{br})^{\frac{5}{2}}-7.46628(1-T_{br})^{5}}{T_{br}},\end{aligned}\right.

with $T_{b}$ the normal boiling point temperature and $T_{br}=T_{b}/T_{c}$ the reduced normal boiling point temperature.
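To make the role of these formulas concrete, the following minimal Python sketch evaluates the PR-EOS parameters $A$ and $B$ from (4) and (5); the function name, the argument list, and the assumption of SI units are illustrative only and not taken from the paper.

```python
import numpy as np

def pr_eos_parameters(p, T, p_c, T_c, w, R=8.3147295):
    """Evaluate the PR-EOS parameters A and B of (3) via (4) and (5).

    All quantities are assumed to be in SI units; p_c and T_c are the
    critical pressure and temperature, and w is the acentric factor."""
    # Fitting formula (5) for m in terms of the acentric factor w.
    if w <= 0.491:
        m = 0.37464 + 1.54226 * w - 0.26992 * w**2
    else:
        m = 0.3796 + 1.485 * w - 0.1644 * w**2 + 0.01667 * w**3
    T_r = T / T_c                                          # reduced temperature
    a = 0.45724 * R**2 * T_c**2 / p_c * (1.0 + m * (1.0 - np.sqrt(T_r)))**2
    b = 0.0778 * R * T_c / p_c
    return a * p / (R**2 * T**2), b * p / (R * T)          # A and B
```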

In this study, we take the pressure $p$ as the primary variable. Suppose that the porosity $\phi$ does not change with time; then the equations of the mathematical model can be rewritten by substituting (2) and (3) into (1) as follows:

\frac{\phi p}{Z}\left(\frac{1}{p}-\frac{1}{Z}\frac{\partial Z}{\partial p}\right)\frac{\partial p}{\partial t}-\nabla\cdot\left(\frac{\mathbf{K}p}{\mu Z}\left(\nabla p-\frac{pW}{ZRT}g\nabla h\right)\right)-\frac{qRT}{W}=0,  (6)

where the relations are given by

\left\{\begin{aligned} &\frac{\partial Z}{\partial p}=-\frac{b}{RTC}Z^{2}-\left(\frac{a}{R^{2}T^{2}C}-\frac{2b+6Bb}{RTC}\right)Z+\left(\frac{2abp}{R^{3}T^{3}C}-\frac{2Bb+3B^{2}b}{RTC}\right),\\ &C=3Z^{2}-2(1-B)Z+(A-2B-3B^{2}),\\ &Z^{3}-(1-B)Z^{2}+(A-2B-3B^{2})Z-(AB-B^{2}-B^{3})=0,\end{aligned}\right.

with the initial condition $p=p_{0}$. Suppose the boundary of the computational domain $\Omega$ is composed of two parts, $\partial\Omega=\Gamma^{D}\cup\Gamma^{N}$ with $\Gamma^{D}\cap\Gamma^{N}=\emptyset$. The boundary conditions associated with the model problem (6) are

\left\{\begin{array}{ll}p=p^{D}&\mbox{on }\Gamma^{D},\\ \mathbf{v}\cdot\mathbf{n}=q^{N}&\mbox{on }\Gamma^{N},\end{array}\right.

where $\mathbf{n}$ is the outward normal of the boundary $\partial\Omega$. We remark that the compressibility factor $Z$, as an intermediate variable, is obtained by solving the algebraic cubic equation (3) for the given primary variable $p$; see [8, 9] for the details of the computation of $Z$.
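As an illustration of this remark, a minimal Python sketch of the cubic solve for $Z$ is given below; the rule of selecting the largest real root as the gas-phase root is a common convention and an assumption made here, not a statement from the paper.

```python
import numpy as np

def compressibility_factor(A, B):
    """Solve the PR-EOS cubic (3) for Z and return the gas-phase root.

    The largest real root is assumed to correspond to the gas phase."""
    coeffs = [1.0, -(1.0 - B), A - 3.0 * B**2 - 2.0 * B, -(A * B - B**2 - B**3)]
    roots = np.roots(coeffs)
    real_roots = roots[np.abs(roots.imag) < 1e-12].real
    return real_roots.max()
```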

We employ a cell-centered finite difference (CCFD) method for the spatial discretization, for which the details can be found in [8, 9, 22, 39], and then a fully implicit scheme is applied for the time integration [2, 16, 32, 36]. For a given time-stepping sequence $0=t^{(1)}<t^{(2)}<\cdots$, define the time step size $\Delta t^{(l)}=t^{(l+1)}-t^{(l)}$ and use the superscript $(l)$ to denote the discretized evaluation at time level $t=t^{(l)}$. After spatially discretizing (6) by the CCFD scheme, we obtain the semi-discrete system

\frac{\phi p}{Z}\left(\frac{1}{p}-\frac{1}{Z}\frac{\partial Z}{\partial p}\right)\frac{\partial p}{\partial t}+\mathcal{F}\left(p^{(l+1)}\right)=0,  (7)

where $\mathcal{F}(p^{(l+1)})$ is the operator of the spatial discretization at time level $t=t^{(l+1)}$ obtained by the CCFD method. For the purpose of comparison, we implement both the first-order backward Euler scheme (BDF-1) and the second-order backward differentiation formula (BDF-2) for the temporal integration of (7). For the BDF-1 scheme, the fully discretized system reads

\left[\frac{\phi p}{Z}\left(\frac{1}{p}-\frac{1}{Z}\frac{\partial Z}{\partial p}\right)\right]^{(l+1)}\frac{p^{(l+1)}-p^{(l)}}{\Delta t^{(l)}}+\mathcal{F}\left(p^{(l+1)}\right)=0,  (8)

where $p^{(l)}$ is the evaluation of $p$ at the $l$-th time step with time step size $\Delta t^{(l)}$. The fully discretized system for the BDF-2 method is

\left[\frac{\phi p}{Z}\left(\frac{1}{p}-\frac{1}{Z}\frac{\partial Z}{\partial p}\right)\right]^{(l+1)}\frac{3p^{(l+1)}-4p^{(l)}+p^{(l-1)}}{2\Delta t^{(l)}}+\mathcal{F}\left(p^{(l+1)}\right)=0.  (9)

Note that, for the BDF-2 scheme, the BDF-1 method is used at the initial time step.
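For clarity, the following Python sketch shows how the BDF-1 and BDF-2 residuals (8) and (9) could be assembled once the CCFD spatial operator and the accumulation coefficient are available; accum_coeff and F_spatial are hypothetical user-supplied callables, not functions from the paper's code.

```python
def bdf1_residual(p_new, p_old, dt, accum_coeff, F_spatial):
    """Residual of the BDF-1 system (8); used at the first time step."""
    return accum_coeff(p_new) * (p_new - p_old) / dt + F_spatial(p_new)

def bdf2_residual(p_new, p_old, p_older, dt, accum_coeff, F_spatial):
    """Residual of the BDF-2 system (9) for a candidate solution p_new.

    accum_coeff(p) evaluates [phi*p/Z*(1/p - (1/Z)*dZ/dp)] at p_new and
    F_spatial(p) applies the CCFD spatial operator; both are assumed given."""
    time_term = accum_coeff(p_new) * (3.0 * p_new - 4.0 * p_old + p_older) / (2.0 * dt)
    return time_term + F_spatial(p_new)
```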

Remark 1

The employed CCFD discretization of (6) on rectangular meshes can be viewed as a mixed finite element method with the lowest-order Raviart-Thomas basis functions equipped with the trapezoidal quadrature rule [8, 9, 22]. We also would like to point out that the solver technologies introduced in the following section can be extended to systems arising from other spatial discretization schemes.

3 Multilevel restricted Schwarz preconditioners

When the fully implicit method is applied, a system of nonlinear algebraic equations after the temporal and spatial discretizations,

F(X)=0,  (10)

is constructed and solved at each time step. In (10), $F(X):\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ is a given nonlinear vector-valued function arising from the residual function in (8) or (9), given by $F(X)=(F_{1},F_{2},\cdots,F_{n})^{T}$ with $F_{i}=F_{i}(X_{1},X_{2},\cdots,X_{n})$ and $X=(X_{1},X_{2},\cdots,X_{n})^{T}$. In this study, the nonlinear system (10) is solved by a parallel Newton-Krylov method with the family of domain decomposition type preconditioners [5, 7, 11, 15].

Let the initial guess $X^{0}\in\mathbb{R}^{n}$ for the Newton iteration at the current time step be the solution of the previous time step; then the new solution $X^{k+1}$ is updated from the current solution $X^{k}$ and a Newton correction vector $s^{k}$ as follows:

X^{k+1}=X^{k}+\lambda^{k}s^{k},\quad k=0,1,2,\cdots,  (11)

where $\lambda^{k}$ is the step length calculated by a cubic line search method [12] to satisfy

\left\|F(X^{k}+\lambda^{k}s^{k})\right\|\leq(1-\alpha\lambda^{k})\left\|F(X^{k})\right\|,

with the parameter $\alpha$ fixed to $10^{-4}$ in practice. The Newton correction $s^{k}$ is obtained by solving the following Jacobian linear system

J_{k}s^{k}=-F(X^{k}),  (12)

with a Krylov subspace iteration method [25]. Here, $J_{k}=\nabla F(X^{k})$ is the Jacobian matrix evaluated at the current solution $X^{k}$. Since the linear system arising from the Newton iteration is large, sparse, and ill-conditioned, the family of one-level or multilevel restricted Schwarz preconditioners is employed to accelerate the convergence of the linear solver, i.e., we solve the following right-preconditioned Jacobian system

J_{k}M_{k}^{-1}M_{k}s^{k}=-F(X^{k}),  (13)

with an overlapping Schwarz preconditioned Krylov subspace method. In practice, the Generalized Minimal RESidual (GMRES) algorithm is used as the Krylov subspace method for the linear solver.

The accuracy of the approximate solution of (13) is controlled by the relative and absolute tolerances $\eta^{r}_{k}$ and $\eta^{a}_{k}$, i.e., the linear iteration is continued until the following convergence criterion is satisfied:

\|J_{k}s^{k}+F(X^{k})\|\leq\min\{\eta^{r}_{k}\|F(X^{k})\|,\eta^{a}_{k}\}.  (14)

We set the stopping condition for the Newton iteration as

\|F(X^{k+1})\|\leq\max\{\varepsilon_{r}\|F(X^{0})\|,\varepsilon_{a}\},  (15)

with the relative and absolute solver tolerances $\varepsilon_{r}$ and $\varepsilon_{a}$, respectively.
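A compact Python sketch of this inexact Newton loop is given below for orientation; it uses a simple step-halving backtracking search in place of the cubic line search [12], and J_solve is a hypothetical callable standing in for the preconditioned GMRES solve of (12).

```python
import numpy as np

def inexact_newton(F, J_solve, X0, eps_r=1e-6, eps_a=1e-10, alpha=1e-4, max_it=50):
    """Inexact Newton method (11)-(15) with a backtracking line search.

    F(X) returns the nonlinear residual and J_solve(X, rhs) returns an
    (approximate) solution of the Jacobian system (12); both are assumed
    to be supplied by the caller."""
    X = X0.copy()
    f0 = np.linalg.norm(F(X))
    for _ in range(max_it):
        fk = np.linalg.norm(F(X))
        if fk <= max(eps_r * f0, eps_a):              # stopping test (15)
            break
        s = J_solve(X, -F(X))                         # Newton correction (12)
        lam = 1.0                                     # step length in (11)
        while np.linalg.norm(F(X + lam * s)) > (1.0 - alpha * lam) * fk and lam > 1e-8:
            lam *= 0.5                                # backtrack until sufficient decrease
        X = X + lam * s
    return X
```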

The most important component of a robust and scalable solver for the nonlinear or linear system is the choice of a suitable preconditioner. In large-scale parallel computing, the additive Schwarz (AS) preconditioner, as a member of the family of domain decomposition methods, helps to improve the convergence and is at the same time beneficial to the scalability of the linear solver, which is the main focus of this paper.

3.1 One-level Schwarz preconditioners

Let $\Omega\subset\mathbb{R}^{d}$, $d=1,2,3$, be the computational domain on which the PDE system (6) is defined. A spatial discretization is then performed on a mesh $\Omega_{h}$ of characteristic size $h>0$. To define the one-level Schwarz preconditioner, we first divide $\Omega_{h}$ into nonoverlapping subdomains $\Omega_{i}$, $i=1,\cdots,N_{p}$, and then expand each $\Omega_{i}$ to $\Omega_{i}^{\delta}$, i.e., $\Omega_{i}\subset\Omega_{i}^{\delta}$, to obtain an overlapping partition. Here, the parameter $\delta>0$ is the size of the overlap, defined as the minimum distance between $\partial\Omega_{i}^{\delta}$ and $\partial\Omega_{i}$ in the interior of $\Omega$. For boundary subdomains we simply cut off the part outside $\Omega$. We also denote by $H>0$ the characteristic diameter of $\Omega_{i}$, as shown in the left panel of Figure 1.

Figure 1: A demonstration of the domain decomposition partition with geometry-preserving coarse meshes. In the figure, the red solid lines indicate the partition of the 3D domain into $3\times 3\times 3$ non-overlapping subdomains of size $H_{x}\times H_{y}\times H_{z}$, and the dotted lines show the extended boundary of an overlapping subdomain with $\delta=2$. The left (right) panel of the figure shows the partition with a fine (coarse) mesh of size $h$ ($2h$), respectively.

Let $N$ and $N_{i}$ denote the numbers of degrees of freedom associated with $\Omega$ and $\Omega_{i}^{\delta}$, respectively. Let $A\in\mathbb{R}^{N\times N}$ be the Jacobian matrix of the linear system defined on the mesh $\Omega_{h}$,

AX=b.  (16)

Then we define the matrices $R^{\delta}_{i}$ and $R^{0}_{i}$ as the restriction operators from $\Omega_{h}$ to its overlapping and non-overlapping subdomains, respectively, as follows: the element $(R^{\delta}_{i})_{l_{1},l_{2}}$ is either (a) an identity block, if the integer indices $1\leq l_{1}\leq N_{i}$ and $1\leq l_{2}\leq N$ are related to the same mesh point and this mesh point belongs to $\Omega_{i}^{\delta}$, or (b) a zero block, otherwise. The multiplication of $R_{i}^{\delta}$ with an $N\times 1$ vector generates a smaller $N_{i}\times 1$ vector by discarding all elements corresponding to mesh points outside $\Omega_{i}^{\delta}$. The matrix $R_{i}^{0}\in\mathbb{R}^{N_{i}\times N}$ is defined in a similar way; the only difference from the operator $R^{\delta}_{i}$ is that its application to an $N\times 1$ vector also zeros out all of the elements corresponding to mesh points on $\Omega_{i}^{\delta}\backslash\Omega_{i}$. Then the classical one-level additive Schwarz (AS, [29]) preconditioner is defined as

M^{-1}_{\delta,\delta}=\sum\limits_{i=1}^{N_{p}}(R^{\delta}_{i})^{T}A_{i}^{-1}R^{\delta}_{i},  (17)

with $A_{i}=R^{\delta}_{i}A(R^{\delta}_{i})^{T}$, where $N_{p}$ is the number of subdomains, which is the same as the number of processors. In addition, there are two modified versions of the one-level additive Schwarz preconditioner that may have some potential advantages for parallel computing. The first version is the left restricted additive Schwarz (left-RAS, [6]) method defined by

M^{-1}_{0,\delta}=\sum\limits_{i=1}^{N_{p}}(R^{0}_{i})^{T}A_{i}^{-1}R^{\delta}_{i},  (18)

and the other modification of the original method is the right restricted additive Schwarz (right-RAS, [4]) preconditioner as follows:

M^{-1}_{\delta,0}=\sum\limits_{i=1}^{N_{p}}(R^{\delta}_{i})^{T}A_{i}^{-1}R^{0}_{i}.  (19)

In the above preconditioners, we use a sparse LU factorization or an incomplete LU (ILU) factorization to solve the subdomain linear system corresponding to $A_{i}^{-1}$. In the following, when the distinction is not important, we denote a Schwarz preconditioner defined on the mesh $\Omega_{h}$ with characteristic size $h$ simply by $M^{-1}_{h}$.
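To fix ideas, the following dense numpy sketch applies the left-RAS preconditioner (18) to a residual vector; the index sets, the dense subdomain solve, and the function names are illustrative assumptions, whereas a production implementation would operate on distributed sparse matrices.

```python
import numpy as np

def ras_apply(A, r, subdomains, overlap_subdomains):
    """Left-RAS preconditioner (18) applied to a residual vector r.

    subdomains / overlap_subdomains are lists of index arrays describing the
    non-overlapping and overlapping partitions; a dense LU solve stands in
    for the subdomain solver A_i^{-1}."""
    z = np.zeros_like(r)
    for idx0, idx_d in zip(subdomains, overlap_subdomains):
        A_i = A[np.ix_(idx_d, idx_d)]            # A_i = R_i^delta A (R_i^delta)^T
        e_i = np.linalg.solve(A_i, r[idx_d])     # subdomain solve on overlapping block
        keep = np.isin(idx_d, idx0)              # restricted extension (R_i^0)^T
        z[idx_d[keep]] += e_i[keep]
    return z
```

Replacing the restricted mask by the full overlapping index set would give the classical AS form (17), and applying the mask on the restriction side instead would give the right-RAS form (19).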

3.2 Multilevel additive Schwarz preconditioners

To improve the scalability and robustness of the one-level additive Schwarz preconditioners, especially when a large number of processors is used, we employ the family of multilevel Schwarz preconditioners. Multilevel Schwarz preconditioners are obtained by combining the single-level preconditioners $M^{-1}_{h}$ assigned to each level, as shown in Figure 1. For the description of the multilevel Schwarz preconditioners [16, 23, 28], we use the index $j=0,1,\ldots,L-1$ to designate any of the $L\geq 2$ levels. The meshes from coarse to fine are denoted by $\Omega_{h_{j}}$, and the corresponding matrices and vectors are denoted by $A_{h_{j}}$ and $X_{h_{j}}$. Let $I_{h_{j}}$ denote the identity operator on level $j$, and let the restriction operator from level $j$ to level $j-1$ be defined by

\mathcal{I}_{j}^{j-1}:\mathbb{R}^{N_{j}}\rightarrow\mathbb{R}^{N_{j-1}},  (20)

where $N_{j}$ and $N_{j-1}$ denote the numbers of degrees of freedom associated with $\Omega_{h_{j}}$ and $\Omega_{h_{j-1}}$, respectively. Moreover,

\mathcal{I}_{j-1}^{j}:\mathbb{R}^{N_{j-1}}\rightarrow\mathbb{R}^{N_{j}}  (21)

is the interpolation operator from level $j-1$ to level $j$.

For convenience, we present the construction of the proposed multilevel Schwarz preconditioners from the viewpoint of the multigrid (MG) V-cycle algorithm [16, 23, 28, 29]. More precisely, at each level $j>0$, the Schwarz preconditioned Richardson method acts as the pre-smoother and post-smoother, i.e., $M^{-1}_{h_{j}}$ preconditions the $\mu_{j}\geq 0$ pre-smoothing iterations and the $\nu_{j}\geq 0$ post-smoothing iterations. In the general multigrid V-cycle framework, let $X^{k}$ be the current solution of the linear system (16); the new solution is computed by the iteration $X^{k+1}=MG(b,L-1,X^{k})$, as described in Algorithm 1.

Algorithm 1 Multigrid (MG) V-cycle algorithm.
1: procedure $X_{h_{j}}=MG(b_{h_{j}},j,X_{h_{j}})$
2:     if $j=0$ then
3:         Solve $A_{h_{0}}X_{h_{0}}=b_{h_{0}}$    ▷ Coarsest correction
4:     else
5:         Smooth $\mu_{j}$ times on $A_{h_{j}}X_{h_{j}}=b_{h_{j}}$:    ▷ Presmoothing
6:         $(b_{h_{j}}-A_{h_{j}}X_{h_{j}})=(I_{h_{j}}-A_{h_{j}}M^{-1}_{h_{j}})^{\mu_{j}}(b_{h_{j}}-A_{h_{j}}X_{h_{j}})$;
7:         $b_{h_{j-1}}=\mathcal{I}_{j}^{j-1}(b_{h_{j}}-A_{h_{j}}X_{h_{j}})$;    ▷ Residual restriction
8:         $X_{h_{j-1}}=MG(b_{h_{j-1}},j-1,0)$;    ▷ Recursion
9:         $X_{h_{j}}=X_{h_{j}}+\mathcal{I}_{j-1}^{j}X_{h_{j-1}}$;    ▷ Correction interpolation
10:         Smooth $\nu_{j}$ times on $A_{h_{j}}X_{h_{j}}=b_{h_{j}}$:    ▷ Postsmoothing
11:         $(b_{h_{j}}-A_{h_{j}}X_{h_{j}})=(I_{h_{j}}-A_{h_{j}}M^{-1}_{h_{j}})^{\nu_{j}}(b_{h_{j}}-A_{h_{j}}X_{h_{j}})$;
12:     end if
13:     return $X_{h_{j}}$
14: end procedure

3.2.1 With application to the two-level case

In the following, we restrict ourselves to the two-level case and use a geometry-preserving coarse mesh that shares the same boundary geometry with the fine mesh, as shown in Figure 1. The V-cycle two-level Schwarz preconditioner $M_{V}^{-1}$, i.e., $L=2$ in Algorithm 1, is constructed by combining the fine-level preconditioner $M^{-1}_{h_{1}}$ and the coarse-level preconditioner $M^{-1}_{h_{0}}$ as follows:

M_{V}^{-1}=A_{h_{1}}^{-1}\left[I_{h_{1}}-(I_{h_{1}}-A_{h_{1}}M^{-1}_{h_{1}})^{\nu_{1}}(I_{h_{1}}-A_{h_{1}}\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0})(I_{h_{1}}-A_{h_{1}}M^{-1}_{h_{1}})^{\mu_{1}}\right].  (22)

More precisely, if the pre- and post-smoothing parameters are fixed to $\mu_{1}=1$ and $\nu_{1}=1$, then the matrix-vector product of the two-level Schwarz preconditioner in (22) with any given vector, $e_{h_{1}}=M_{V}^{-1}r_{h_{1}}$, is obtained by the following three steps:

\left\{\begin{array}{l}\displaystyle e_{h_{1}}^{1/3}=M^{-1}_{h_{1}}r_{h_{1}},\\[2mm] \displaystyle e_{h_{1}}^{2/3}=e_{h_{1}}^{1/3}+\mathcal{I}^{1}_{0}M^{-1}_{h_{0}}\mathcal{I}^{0}_{1}(r_{h_{1}}-A_{h_{1}}e_{h_{1}}^{1/3}),\\[2mm] \displaystyle e_{h_{1}}=e_{h_{1}}^{2/3}+M^{-1}_{h_{1}}(r_{h_{1}}-A_{h_{1}}e_{h_{1}}^{2/3}).\end{array}\right.  (23)
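A minimal Python sketch of these three steps is shown below; M_f_apply, M_c_apply, restrict, and prolong are hypothetical callables standing for $M^{-1}_{h_{1}}$, $M^{-1}_{h_{0}}$, $\mathcal{I}^{0}_{1}$, and $\mathcal{I}^{1}_{0}$, respectively.

```python
def two_level_vcycle_apply(r_f, A_f, M_f_apply, M_c_apply, restrict, prolong):
    """Apply the V-cycle two-level Schwarz preconditioner (23) to r_f.

    A_f is the fine-level matrix (anything supporting the @ operator);
    the smoothers and transfer operators are assumed to be supplied."""
    e = M_f_apply(r_f)                                   # pre-smoothing step
    e = e + prolong(M_c_apply(restrict(r_f - A_f @ e)))  # coarse-grid correction
    e = e + M_f_apply(r_f - A_f @ e)                     # post-smoothing step
    return e
```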

There are some modified versions of the V-cycle two-level Schwarz preconditioner that can be used for parallel computing. If we set $\mu_{1}=0$ and $\nu_{1}=1$ in (22), we obtain the left-Kaskade Schwarz method defined by

M_{left-Kas}^{-1}=\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}+M^{-1}_{h_{1}}-M^{-1}_{h_{1}}A_{h_{1}}\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0},  (24)

and the other modification of the original method is the right-Kaskade Schwarz preconditioner, with $\mu_{1}=1$ and $\nu_{1}=0$ in (22), as follows:

M_{right-Kas}^{-1}=M^{-1}_{h_{1}}+\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}-\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}A_{h_{1}}M^{-1}_{h_{1}}.  (25)

Moreover, we can define two other hybrid versions of two-level Schwarz algorithms. The first is the purely additive two-level Schwarz preconditioner

M_{additive}^{-1}=\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}+M^{-1}_{h_{1}},  (26)

and the second is the coarse-level-only two-level Schwarz preconditioner

M_{coarse}^{-1}=\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}.  (27)

The motivation of the above two-level preconditioners, which add a coarse preconditioner, additively or multiplicatively, to an existing fine-mesh preconditioner, is to make the overall solver scalable with respect to the number of processors or the number of subdomains. Numerical results are shown later to compare the performance of these two-level preconditioners.

Remark 2

In the multilevel Schwarz preconditioners, the use of a coarse level helps the global exchange of information, and the following linear problem defined on the coarsest mesh needs to be solved:

A_{h_{0}}X_{h_{0}}=b_{h_{0}},  (28)

which is done with the one-level additive Schwarz preconditioned GMRES method, owing to the fact that the coarse problem is too large and expensive for direct methods in large-scale simulations. When an iterative method is used for solving the linear system (28) on the coarsest level, the overall preconditioner becomes an iterative procedure, and the preconditioner changes from iteration to iteration. Hence, when the multilevel Schwarz method is applied, a flexible version of GMRES (fGMRES, [25]) is used for the solution of the outer linear system (16).

Remark 3

When the classical additive Schwarz preconditioner (17) is applied to symmetric positive definite systems arising from the discretization of elliptic problems, the condition number $\kappa$ of the preconditioned linear system satisfies $\kappa\leq C(1+H/\delta)/H^{2}$ for the one-level method and $\kappa\leq C(1+H/\delta)$ for the two-level method, where the constant $C$ is independent of $\delta$, $H$, and $h$; see [28, 29]. However, these condition number estimates can neither be applied to the restricted additive Schwarz preconditioners nor adapted to the family of non-elliptic systems like our model problem, and hence there is very little theoretical literature on the convergence of multilevel restricted Schwarz preconditioners for this system. The scalability tests in this paper provide more understanding of restricted-type domain decomposition methods for the hyperbolic problem.

3.2.2 Selection of interpolation operators

Algorithm 1 provides a general framework in which one must choose the coarse-to-fine mesh ratio, the restriction and interpolation operators, and the solvers for the smoothing and coarse-grid correction. Among these choices, selecting the right restriction and interpolation operators at each level is very important for the overall performance of the preconditioner in terms of the trade-off between the rate of convergence and the cost of each iteration. Generally speaking, in most cases the overall solver profits from the addition of coarse preconditioners through a decrease in the total number of linear iterations. However, for some classes of important problems such as reservoir simulation, we found that the number of iterations does not decrease as expected if the operators are not chosen properly. After some investigation, we traced the source of this phenomenon to the cell-centered spatial discretization: when the cell-centered scheme is used, a coarse mesh point does not necessarily coincide with any fine mesh point, as shown in Figure 2. Hence, in the rest of this paper, we focus on several strategies for the selection of interpolation operators, i.e., the first-order, second-order, and third-order interpolation schemes.

Figure 2: Partitions for a 2D spatial domain with the coarse-to-fine mesh ratio 1:2.

Let $\Omega$ be the computational domain covered with $N_{x}\times N_{y}$ mesh cells. We consider a point $(x,y)$ located in the rectangular subdivision $[x_{i},x_{i+1}]\times[y_{j},y_{j+1}]$ with $i=1,\dots,N_{x}$, $j=1,\dots,N_{y}$, and let $f(x,y)$ be the interpolated value at this point. The nearest-neighbor interpolation operator, which assigns to the interpolation point the value of the nearest node according to the shortest distance between the interpolation point and the sample points, is a first-order method defined by

f(x,y)=\sum_{l=0}^{1}\sum_{m=0}^{1}f\left(x_{i+l},y_{j+m}\right)W_{i+l,j+m},  (29)

where $f(x_{i+l},y_{j+m})$ is the sample value located at the point $(x_{i+l},y_{j+m})$, and $W_{i+l,j+m}$ is a weight defined as

W_{i+l,j+m}=\left\{\begin{aligned} 1,\quad&\mbox{if }\|(x,y)-(x_{i+l},y_{j+m})\|\mbox{ is minimal},\\ 0,\quad&\mbox{otherwise}.\end{aligned}\right.

The bilinear interpolation operator, which uses a weighted average of the nearest neighboring values $f(x_{i+l},y_{j+m})$ to generate the interpolation point value, with the point $(x,y)$ lying in the subdivision $[x_{i},x_{i+1}]\times[y_{j},y_{j+1}]$, is a second-order interpolation technique defined as follows:

f(x,y)=\sum_{l=0}^{1}\sum_{m=0}^{1}f\left(x_{i+l},y_{j+m}\right)W_{i+l}W_{j+m},  (30)

where the weights for the bilinear interpolation scheme are defined as

\left\{\begin{aligned} W_{i+l}&=\left(\frac{2x-x_{i}-x_{i+1}}{x_{i+1}-x_{i}}\right)l+\frac{x_{i+1}-x}{x_{i+1}-x_{i}},\\ W_{j+m}&=\left(\frac{2y-y_{j}-y_{j+1}}{y_{j+1}-y_{j}}\right)m+\frac{y_{j+1}-y}{y_{j+1}-y_{j}}.\end{aligned}\right.

In contrast to the bilinear interpolation, which only takes 4 points ($2\times 2$) into account, the bicubic interpolation uses 16 points ($4\times 4$) and is a third-order interpolation scheme defined as follows:

f(x,y)=\sum_{l=-1}^{2}\sum_{m=-1}^{2}f\left(x_{i+l},y_{j+m}\right)W\left(\frac{x-x_{i+l}}{x_{i+1}-x_{i}}\right)W\left(\frac{y-y_{j+m}}{y_{j+1}-y_{j}}\right),  (31)

where the weight for the cubic convolution interpolation is defined as

W(s)=\left\{\begin{aligned} &\frac{3}{2}|s|^{3}-\frac{5}{2}|s|^{2}+1,\quad&0<|s|<1,\\ &-\frac{1}{2}|s|^{3}+\frac{5}{2}|s|^{2}-4|s|+2,\quad&1<|s|<2,\\ &\;0,\quad&2<|s|.\end{aligned}\right.

We remark that the two-dimensional computational domain is used in the above description only for ease of demonstration.
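The following Python sketch collects the bilinear weights of (30) and the cubic convolution kernel of (31) for a uniform mesh; the normalization of the offsets and the 4x4 patch layout are assumptions made for the illustration.

```python
import numpy as np

def bilinear_weights(x, x_i, x_ip1):
    """1D weights of the bilinear scheme (30) for the two nodes x_i, x_{i+1}."""
    t = (x - x_i) / (x_ip1 - x_i)
    return np.array([1.0 - t, t])              # weights for l = 0 and l = 1

def cubic_kernel(s):
    """Cubic convolution weight W(s) used by the bicubic scheme (31)."""
    s = abs(s)
    if s < 1.0:
        return 1.5 * s**3 - 2.5 * s**2 + 1.0
    if s < 2.0:
        return -0.5 * s**3 + 2.5 * s**2 - 4.0 * s + 2.0
    return 0.0

def bicubic_interpolate(f_patch, tx, ty):
    """Bicubic interpolation (31) on a 4x4 patch of nodal values.

    f_patch[l, m] holds f(x_{i+l-1}, y_{j+m-1}) for l, m = 0,...,3, and
    (tx, ty) are the offsets of the point from node (i, j), normalized by
    the (uniform) mesh spacing in each direction."""
    wx = np.array([cubic_kernel(tx - l) for l in (-1, 0, 1, 2)])
    wy = np.array([cubic_kernel(ty - m) for m in (-1, 0, 1, 2)])
    return wx @ f_patch @ wy
```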

4 Numerical experiments

In this section, we implement the proposed algorithm described in the previous sections using the open-source Portable, Extensible Toolkit for Scientific computation (PETSc) [1], which is built on top of the Message Passing Interface (MPI), and investigate the numerical behavior and parallel performance of the newly proposed fully implicit solver with a variety of test cases.

4.1 Robustness and efficiency of the solver

The focus of this subsection is on the robustness and efficiency of the proposed algorithm for both standard benchmarks and realistic problems in highly heterogeneous media. Unless otherwise specified, the values of the physical parameters used in the test cases are set as follows: $\phi=0.2$, $\mu=11.067\times 10^{-3}$ cp, $R=8.3147295$ J/(mol$\cdot$K), $W=16\times 10^{-3}$ kg/mol, $T=298$ K, $p_{c}=4.604\times 10^{6}$ Pa, $T_{c}=190.58$ K, and $T_{b}=111.67$ K.

We first present results for a 2D test case (denoted as Case-1) in a horizontal layer with the permeability tensor $\mathbf{K}$ in (1) given by

\mathbf{K}=\left[\begin{array}{cc}k_{xx}&0\\ 0&k_{yy}\end{array}\right].

With the media being horizontal, we neglect the effect of gravity. In this configuration, the distribution of permeabilities includes two subregions with different isotropy, as shown in Figure 3. The computational domain is 100 meters long and 100 meters wide. In the simulation, we assume that the left and right boundaries of the domain are impermeable. We then flood the system with gas from the top to the bottom, i.e., we set $p_{in}=10$ atm at the top boundary and $p_{out}=1$ atm at the bottom boundary. There is no injection/extraction inside the domain. Compared with the previous example, in this test case there is an $H$-shaped zone and the value of the permeability has a huge jump inside and outside the zone, which poses an even greater challenge to the fully implicit solver. In the test, the simulation is performed on a $512\times 512$ mesh, the time step size is fixed to $\Delta t=0.1$ day, and the simulation is stopped at $10$ days. Figure 3 also illustrates the contour plots of the pressure. Table 1 shows the performance of the proposed fully implicit solver with respect to the different permeability configurations. It is clearly seen that the simulation spends more computing time when the isotropic medium is used, which is attributed to the increase in the nonlinearity of the problem that affects the number of linear iterations.

Figure 3: Contour plots of the solution under the heterogeneous isotropic and anisotropic media for Case-1: (a) isotropic case; (b) anisotropic case; (c) isotropic case; (d) anisotropic case.
Table 1: A comparison of the fully implicit solver with different permeability configurations for Case-1.
Medium type Isotropic case Anisotropic case
Average nonlinear iterations 2.2 2.2
Average linear iterations 54.7 41.5
Execution time (second) 172.0 151.1

In the following, experiments are conducted with some 3D test cases. The focus is on the flow model with the medium being highly heterogeneous and isotropic, which significantly increases the nonlinearity of the system and imposes a severe challenge on the fully implicit solver. We first consider a 3D domain $\Omega=(0,100\;\mbox{m})^{3}$, in which the permeabilities of the porous medium are random, denoted as Case-2. The random distribution of the permeability in the range [3.1, 14426.2] is generated by a geostatistical model using the open source toolbox MRST [19], as shown in Figure 4. In the test, two different flow configurations are used for the injection boundary, i.e., a fixed injection pressure and a fixed injection rate. In the fixed pressure configuration, the flow with a pressure of 10 atm is injected at the part of the left-hand side $x=0$ m, $y\in[0,10\;\mbox{m}]$, $z\in[0,10\;\mbox{m}]$; for the case of the fixed injection rate, the flow is injected from the same sub-boundary with the Darcy velocity $\mathbf{v}=5\times 10^{-4}$ m/s. The fluid flows out of the right-hand side of the domain with a fixed pressure $p_{out}=1$ atm, and no-flow boundary conditions are imposed on the other boundaries of the domain. The initial pressure $p_{0}=1$ atm is specified in the whole domain, and the parameters required for this example are consistent with the previous examples.

Figure 4: Permeability field for Case-2. In the figure, we use the logarithmic scale for the contour plot of the permeability.

In this case, the domain is highly heterogeneous, leading to an increase in the nonlinearity of the problem that is challenging for the numerical techniques. Figure 5 shows the plots of the pressure profiles for Case-2 at different times under the two flow configurations. The figures demonstrate that the proposed approach successfully resolves the different stages of the simulation, and the flow is close to breakthrough at time $t=5$ days. In Table 2, we present the numbers of nonlinear and linear iterations and the total execution time of the proposed solver at different times. The simulation is carried out on a $128\times 128\times 128$ mesh and the time step size is $\Delta t=0.1$ day. It can be observed from Table 2 that the numbers of nonlinear and linear iterations barely change, while the total execution time increases steadily as the simulation advances, as expected, which shows the robustness and effectiveness of the implicit solver in handling the large variation of the physical parameters in the model problem.

Figure 5: A comparison of the pressure profiles at different times for Case-2 with the fixed pressure (first row) or velocity (second row) boundary: (a)/(d) t=1 day, (b)/(e) t=2 days, (c)/(f) t=5 days.
Table 2: A comparison of the fully implicit solver at different times for Case-2 with the fixed pressure or velocity boundary.
Boundary type End of the simulation (day) t=1 t=2 t=3 t=5
Fixed pressure Average nonlinear iterations 3.0 2.9 2.9 2.9
Average linear iterations 22.6 23.8 24.3 24.5
Execution time (second) 645.3 1095.9 1545.2 2455.6
Fixed velocity Average nonlinear iterations 2.8 2.7 2.7 2.6
Average linear iterations 24.8 26.3 27.1 27.9
Execution time (second) 519.5 965.4 1325.1 2142.9

In all of the above test cases, the porosity $\phi$ in the model problem (1) is assumed to be constant. In the next 3D test case (Case-3) of this subsection, we import the porosity and the permeability from the Tenth SPE Comparative Solution Project (SPE10) [10] as an example of a realistic realization with geological and petrophysical properties, in which the porosity and the permeability vary with position. It is a classic and challenging benchmark problem for reservoir simulation due to its highly heterogeneous permeabilities and porosities. As shown in Figure 6, the permeability is characterized by variations of more than six orders of magnitude, ranging from $6.65\times 10^{-4}$ to $2\times 10^{4}$, and the porosity ranges from 0 to 0.5. The 3D domain is 1200 ft long $\times$ 2200 ft wide $\times$ 170 ft thick. In the test, the boundary condition for the pressure on the left-hand side of the domain is uniformly imposed as $p_{in}=10$ atm and on the opposite side as $p_{out}=1$ atm. No-flow boundary conditions are set on the other boundaries of the domain. The other parameters are the same as in the previous case.

Figure 6: Permeability and porosity fields for the SPE10 benchmark: (a) $\log_{10}$-permeability field; (b) porosity field.

In Figure 7, we display the distributions of the pressure at different time points $t=1,3,6,10$ days. In the simulation, the mesh is $60\times 220\times 85$ and the time step size is fixed to $0.1$ day. The results shown in Figure 7 demonstrate that the proposed approach successfully resolves the rapid and abrupt evolution of the simulation at different stages, while keeping the solver robust and efficient. Finally, we analyze the behavior of the proposed fully implicit method when the time step size $\Delta t$ is varied. In the test, we again run the SPE10 model on a fixed $60\times 220\times 85$ mesh. The simulation is stopped at $t=3$ days. The average numbers of nonlinear and linear iterations as well as the total compute times are summarized in Table 3. The results in the table clearly indicate that the combination of nonlinear and linear iterations works well even for very large time steps. The implicit approach converges for all time steps and is unconditionally stable. In addition, we also notice that, as the time step size $\Delta t$ decreases, the average numbers of nonlinear and linear iterations become smaller, whereas the total computing time increases. This behavior is expected for the fully implicit approach in a variety of applications [2, 16, 32, 35, 36].

Table 3: The effect of different time step sizes in the fully implicit solver for SPE10.
Time step size $\Delta t$ 0.06 0.075 0.1 0.2 0.3
Number of time steps 50 40 30 15 10
Average nonlinear iterations 3.5 3.6 3.8 4.4 4.5
Average linear iterations 45.4 49.3 52.1 67.5 80.0
Execution time (second) 1424.2 1191.7 951.4 579.8 408.8
Figure 7: A comparison of the pressure profiles at different times for the SPE10 benchmark: (a) t=1 day; (b) t=3 days; (c) t=6 days; (d) t=10 days.

4.2 Performance of Schwarz preconditioners

In this subsection, we focus on the parallel performance of the proposed fully implicit solver with respect to the one-level and multilevel Schwarz preconditioners using Case-1, Case-2, and a new 3D test case. In the new 3D case, denoted as Case-4, the computational domain is $\Omega=(0,100\;\mbox{m})^{3}$ with the medium being heterogeneous and isotropic, and the distribution of the permeability in each layer $z\in[25\;\mbox{m},75\;\mbox{m}]$ is the same as the distribution of Case-1. We flood the system with gas from the back face ($x\in[0,100\;\mbox{m}]$, $y=100\;\mbox{m}$, $z\in[0,100\;\mbox{m}]$) to the front face, i.e., we set $p_{in}=10$ atm at the back face and $p_{out}=1$ atm at the front face. There is no injection/extraction inside the domain.

In the tests, we use the following stopping parameters and notation, unless specified otherwise. The relative and absolute tolerances in (15) for the Newton iteration are set to $10^{-6}$ and $10^{-10}$, respectively. The linear systems are solved by the one-level or two-level Schwarz preconditioned GMRES method with absolute and relative tolerances of $10^{-8}$ and $10^{-5}$ in (14), except for the coarse solve of the two-level preconditioner, for which we use $2\times 10^{-1}$ as the relative stopping tolerance. The restart value of the GMRES method is fixed at 30. In the tables of the following tests, the symbol "$N_{p}$" denotes the number of processor cores, which is the same as the number of subdomains, "N. It" stands for the average number of Newton iterations per time step, "L. It" for the average number of one-level or two-level Schwarz preconditioned GMRES iterations per Newton iteration, and "Time" for the total computing time in seconds.
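For orientation, the configuration above roughly corresponds to the following PETSc runtime options, shown here through a petsc4py sketch; the option names are standard PETSc ones, but this is an illustrative guess at the setup rather than the authors' actual run script, and the two-level preconditioner requires additional custom components not shown.

```python
from petsc4py import PETSc

opts = PETSc.Options()
opts["snes_rtol"] = 1e-6             # relative Newton tolerance in (15)
opts["snes_atol"] = 1e-10            # absolute Newton tolerance in (15)
opts["ksp_type"] = "fgmres"          # flexible GMRES for the outer linear solve
opts["ksp_rtol"] = 1e-5              # relative linear tolerance in (14)
opts["ksp_atol"] = 1e-8              # absolute linear tolerance in (14)
opts["ksp_gmres_restart"] = 30       # GMRES restart value
opts["pc_type"] = "asm"              # one-level additive Schwarz preconditioner
opts["pc_asm_overlap"] = 1           # overlap size delta = 1
opts["pc_asm_type"] = "restrict"     # restricted variant of the Schwarz method
opts["sub_pc_type"] = "ilu"          # ILU factorization as the subdomain solver
opts["sub_pc_factor_levels"] = 2     # ILU(2) level of fill
```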

4.2.1 One-level Schwarz preconditioners

Under the framework of one-level Schwarz preconditioners, we study the performance of the fully implicit solver, when different types of the additive Schwarz preconditioners are employed and several important parameters such as the subdomain solver and the overlapping size are taken into consideration.

We first look at the influence of the subdomain solvers. In this test, we focus on the classical AS preconditioner (17) and fix the overlap to $\delta=1$. The ILU factorization with different levels of fill-in and the full LU factorization are considered as the subdomain solvers. The simulation of Case-1 is carried out on a fixed $512\times 512$ mesh using $N_{p}=16$ processors. The time step size is 0.1 day, and we run the 2D case for the first 20 time steps. The simulation of Case-4 is carried out on a fixed $128\times 128\times 128$ mesh using $N_{p}=64$ processors. The time step size is 0.1 day, and we run this case for the first 10 time steps. The numerical results are summarized in Table 4. We observe from the table that the number of nonlinear iterations is not sensitive to the choice of the subdomain solver. The linear systems converge in fewer iterations when the LU factorization is used as the subdomain solver, while the ILU approach is beneficial to the AS preconditioner in terms of the computing time, especially for the 3D test case.

Table 4: The impact of different subdomain solvers in the one-level Schwarz preconditioner.
Case Case-1 Case-4
Solver N. It L. It Time N. It L. It Time
LU 3.0 52.9 37.1 4.2 32.9 954.8
ILU(0) 3.0 99.1 34.3 4.3 161.7 282.4
ILU(1) 3.0 75.7 29.9 4.3 98.0 253.0
ILU(2) 3.0 73.9 32.1 4.3 76.1 239.6
ILU(4) 3.0 66.2 33.8 4.3 51.0 245.8

Moreover, in Table 5, we compare the impact of different subdomain solvers for Case-2, when the fixed pressure and fixed velocity boundaries are imposed for the flow configurations. In the test, the simulation is carried out on a $128\times 128\times 128$ mesh with a fixed time step size $\Delta t=0.1$ day using $N_{p}=64$ processors. The results clearly illustrate that the choice of ILU(2) is still optimal for the 3D flow problem with the random permeability field and different boundary conditions, as the flow undergoes the different stages of the complex simulation shown in Figure 5.

Table 5: The impact of different subdomain solvers in the one-level Schwarz preconditioner for Case-2 with the fixed pressure boundary (denoted as “Model-A”) and the fixed velocity boundary (denoted as “Model-B”).
Solver t=1 t=2 t=3 t=5
Model-A N. It L. It Time N. It L. It Time N. It L. It Time N. It L. It Time
LU 3.0 33.0 685.2 2.9 36.6 1167.0 2.9 38.7 1652.7 2.9 41.2 2625.9
ILU(0) 3.0 161.5 435.3 2.9 183.0 760.1 2.9 195.3 1086.8 2.9 208.3 1742.7
ILU(1) 3.0 100.6 415.5 2.9 110.3 715.9 2.9 116.4 1015.4 2.9 123.6 1616.7
ILU(2) 3.0 80.0 403.4 2.9 88.9 695.0 2.9 93.3 991.1 2.9 98.4 1572.5
ILU(4) 3.0 52.4 417.8 2.9 58.8 725.1 2.9 62.1 1030.5 2.9 65.7 1638.9
Model-B N. It L. It Time N. It L. It Time N. It L. It Time N. It L. It Time
LU 2.8 39.3 563.7 2.7 42.9 1046.9 2.7 45.2 1438.8 2.6 47.7 2335.0
ILU(0) 2.8 178.5 366.0 2.7 190.4 679.2 2.7 201.8 934.4 2.6 214.9 1518.6
ILU(1) 2.8 113.9 349.6 2.7 124.0 641.3 2.7 131.6 873.7 2.6 138.4 1408.2
ILU(2) 2.8 87.8 340.2 2.7 95.7 626.8 2.7 101.2 859.0 2.6 105.0 1363.8
ILU(4) 2.8 67.3 351.8 2.7 74.1 650.5 2.7 78.5 895.2 2.6 81.6 1455.1

We then perform tests with the 2D and 3D test cases by varying the combination of the overlap size $\delta$ and the Schwarz preconditioner type, i.e., the classical-AS (17), the left-RAS (18), and the right-RAS (19) preconditioners. Based on the above observations, we choose ILU(1) as the subdomain solver for the 2D test case and ILU(2) for the 3D test cases. The numbers of nonlinear and linear iterations together with the execution times are reported in Table 6. The results in the table suggest that the most robust combination is the left-RAS preconditioner with the overlap size $\delta=1$, as a compromise between the number of linear iterations and the total computing time. We remark that, when the overlap size is $\delta=0$, these preconditioners degenerate into the block-Jacobi preconditioner.

Table 6: The impact of different one-level Schwarz preconditioners with several overlapping sizes.
Case Case-1 Case-2 Case-4
Preconditioner $\delta$ N. It L. It Time N. It L. It Time N. It L. It Time
Classical-AS 0 3.0 80.4 31.5 2.8 73.9 321.6 4.2 53.0 211.9
1 3.0 75.7 29.9 2.8 87.8 340.2 4.3 76.1 239.6
2 3.0 76.9 31.8 2.8 99.1 352.7 4.3 92.6 286.7
3 3.0 76.8 32.5 2.8 103.4 368.2 4.3 93.8 326.6
Left-RAS 1 3.0 55.8 28.3 2.8 43.7 306.2 4.2 37.5 205.0
2 3.0 51.9 28.8 2.8 41.6 312.8 4.3 34.0 218.0
3 3.0 49.4 28.9 2.8 40.8 322.5 4.3 33.2 232.3
Right-RAS 1 3.0 55.6 28.2 2.8 44.5 307.5 4.3 37.1 207.3
2 3.0 51.4 28.0 2.8 42.0 311.4 4.3 34.3 215.8
3 3.0 48.7 28.0 2.8 41.2 326.2 4.3 33.4 223.4

4.2.2 Two-level Schwarz preconditioners

For reservoir simulation with high accuracy, a supercomputer with a large number of processors is a must, and therefore the scalability of the algorithm with respect to the number of processors is critically important. As introduced in Section 3, in the one-level Schwarz preconditioner the average number of linear iterations per Newton iteration grows with the number of processors $N_{p}$, resulting in the deterioration of the implicit solver. It is clear that some stabilization is needed, which is achieved by the two-level method with the use of a coarse mesh and the interpolation operators explained in Section 3. The performance of the two-level method depends heavily on the two linear solvers defined on the coarse and fine meshes. Here, we restrict ourselves to the framework of Schwarz preconditioned GMRES methods, i.e., we refer to these iterative methods as the smoothers on each level.

There are several assembly techniques available to construct a hybrid two-level Schwarz preconditioner by composing the one-level additive Schwarz preconditioner with a coarse-level preconditioner in a multiplicative or additive manner. Choosing the right type of two-level Schwarz preconditioner is very important for the overall performance, and a large number of numerical experiments is often necessary to identify the right selection. As introduced in Section 3, in this study we investigate the pure-coarse (27), the additive (26), the left-Kaskade (24), the right-Kaskade (25), and the V-cycle (23) two-level Schwarz preconditioners. For each numerical case, the overlap sizes on the fine and coarse levels are fixed to $\delta_{f}=1$ and $\delta_{c}=1$, respectively. The subdomain problems on the fine and coarse levels are solved by the LU factorization. In the test, we again run the 2D model with a fixed time step size of $0.1$ day on a $512\times 512$ mesh and run Case-4 with a fixed time step size of $0.2$ day on a $128\times 128\times 128$ mesh; the simulations are finished at $t=2$ and $1$ days, respectively. For the simulation of Case-2, the flow problem with the fixed velocity model is solved on a $128\times 128\times 128$ mesh, and the computation is ended at $t=1$ day with $\Delta t=0.1$ day. The coarse-to-fine mesh ratio is set to $2$ in each direction. In Table 7, we report the performance of the proposed two-level Schwarz preconditioners with respect to different interpolation operators, including the first-order scheme (29), the second-order scheme (30), and the third-order scheme (31). Below we list the observations made from the results.

  • (a)

    We know that the effectiveness of the Schwarz preconditioner relies on its ability to mimic the spectrum of the linear operator while remaining relatively cheap to apply. We see that using the high-order schemes in the construction of the Schwarz preconditioner leads to lower linear iteration counts. Moreover, compared with the Schwarz method with high-order schemes, the low-order approach is more attractive in terms of the execution time, owing to its lower bandwidth and smaller number of nonzeros in the sparse matrix. The results in the table suggest that the second-order scheme is a suitable compromise between the number of iterations and the total computing time.

  • (b)

    The best choices for some of the options in the multilevel Schwarz preconditioner are problem dependent. For the implicit solution of subsurface flow problems, we can see from Table 7 that the additive and Kaskade type two-level Schwarz preconditioners suffer from a larger number of outer linear iterations compared with the V-cycle approach. The pure coarse version of the two-level approach performs considerably worse than the hybrid methods. Hence, our experiments suggest that there is a benefit to including both pre- and post-sweeps of the one-level preconditioner in the simulation of the model problems, especially for the 3D test cases.

Table 7: The impact of different two-level Schwarz preconditioners with three interpolation operators. In the table, the symbol “–” denotes the divergence of the solver caused by the failure of linear iterations.
Case Type First-order Second-order Third-order
N. It L. It Time N. It L. It Time N. It L. It Time
Case-1 Coarse
Additive 3.0 114.6 233.9 3.0 71.3 207.3 3.0 45.9 734.7
Left-Kaskade 3.0 71.1 204.2 3.0 34.1 163.9 3.0 8.05 565.2
Right-Kaskade 3.0 75.0 194.8 3.0 37.7 157.8 3.0 9.03 565.5
V-Cycle 3.0 10.0 129.6 3.0 9.4 129.1 3.0 5.4 579.3
Case-2 Coarse
Additive 2.8 87.4 371.0 2.8 58.3 349.4 2.8 42.0 5186.7
Left-Kaskade 2.8 31.3 325.7 2.8 20.6 312.4 2.8 14.7 4921.4
Right-Kaskade 2.8 34.5 326.6 2.8 21.8 315.9 2.8 15.1 4940.3
V-Cycle 2.8 14.9 314.2 2.8 12.6 308.8 2.8 5.8 4770.9
Case-4 Coarse
Additive 4.0 79.9 453.2 4.0 44.5 402.1 4.0 36.6 6114.6
Left-Kaskade 4.0 13.4 5763.8
Right-Kaskade 4.0 14.8 5308.8
V-Cycle 4.0 11.4 374.2 4.0 10.3 372.8 4.0 5.1 5156.5

For the two-level preconditioner, the size of the coarse mesh has a strong impact on the efficiency and robustness of the method. It is clear that using a relatively fine coarse mesh yields a stronger two-level Schwarz preconditioner and therefore helps reduce the total number of linear iterations. On the other hand, a finer coarse mesh requires substantially more memory and a more expensive coarse solve, which increases the total compute time. An important implementation issue in designing the two-level method is therefore to balance the quality of the preconditioner against the computing time of the coarse solve. To understand the impact of different coarse meshes on the convergence of the algorithm, we report results for different coarse-to-fine mesh ratios and different interpolation operators for the 2D and 3D test cases. In the test, the fine meshes for the 2D and 3D problems are $512\times512$ and $128\times128\times128$, respectively. From Table 8 we observe that: (a) for the first-order scheme, the poor quality of the coarse-level correction on a very coarse mesh leads to a large increase in the number of iterations; (b) with the help of higher-order schemes, the number of linear iterations increases only slowly with the coarse-to-fine mesh ratio. This implies that a higher-order scheme does a good job of preconditioning the fine mesh problem, since even a rather coarse second-level mesh keeps the whole algorithm efficient in terms of the number of iterations.
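As a concrete, simplified illustration of why the interpolation order matters, the sketch below builds a one-dimensional cell-centered interpolation operator for a coarse-to-fine ratio of 2: the first-order variant has one nonzero per row, while the (linear) second-order variant has up to two, which is the bandwidth effect discussed in observation (a). The construction is a generic assumption for illustration and is not the exact schemes (29)-(31); a 2D operator can be assembled from such 1D factors, e.g., via a Kronecker product.

```python
import numpy as np

# Illustrative cell-centered interpolation for a ratio-2 coarse-to-fine refinement
# in 1D; not the exact operators (29)-(31) used in the paper.
def interpolation_1d(nc, order=2):
    nf = 2 * nc                              # fine cells for a ratio-2 refinement
    P = np.zeros((nf, nc))
    for i in range(nf):
        x = (i + 0.5) / nf                   # fine cell center in [0, 1]
        j = min(int(x * nc), nc - 1)         # owning coarse cell
        if order == 1:
            P[i, j] = 1.0                    # piecewise-constant injection
        else:
            xc = (j + 0.5) / nc              # owning coarse cell center
            k = j + (1 if x > xc else -1)    # nearest neighboring coarse cell
            if 0 <= k < nc:
                w = abs(x - xc) * nc         # linear interpolation weight
                P[i, j], P[i, k] = 1.0 - w, w
            else:
                P[i, j] = 1.0                # fall back to injection at the boundary
    return P

P2d = np.kron(interpolation_1d(8), interpolation_1d(8))   # tensor-product 2D operator
```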

Table 8: The impact of coarse-to-fine mesh ratios for the V-Cycle two-level Schwarz preconditioner.
Case Mesh ratio First-order Second-order Third-order
N. It L. It Time N. It L. It Time N. It L. It Time
Case-1 2 3.0 10.0 129.6 3.0 9.4 129.1 3.0 5.4 579.3
4 3.0 18.8 123.0 3.0 16.9 120.9 3.0 13.3 553.1
8 3.0 26.8 130.1 3.0 19.2 120.0 3.0 14.1 535.5
16 3.0 36.0 140.5 3.0 19.5 118.5 3.0 14.2 532.8
Case-2 2 2.8 14.9 314.2 2.8 12.6 308.8 2.8 5.8 4770.9
4 2.8 16.2 310.7 2.8 13.1 303.1 2.8 8.9 4741.6
8 2.8 18.9 308.5 2.8 13.7 299.6 2.8 9.4 4725.3
16 2.8 30.4 326.2 2.8 15.8 302.4 2.8 11.3 4709.2
Case-4 2 4.0 11.4 374.2 4.0 10.3 372.8 4.0 5.1 5156.5
4 4.0 17.7 372.1 4.0 10.6 368.4 4.0 9.5 5106.9
8 4.0 18.8 370.2 4.0 11.0 354.4 4.0 10.2 5091.3
16 4.0 39.8 415.7 4.0 12.7 356.3 4.0 11.6 5051.4

4.3 Parallel scalability

Achieving good parallel scalability is important in parallel computing, especially when solving large-scale problems with many processors. Hence, in the following we focus on the parallel performance of the proposed fully implicit method with the one-level and two-level restricted additive Schwarz preconditioners. Again, the 2D and 3D test problems described in Section 4.2 are used for the scalability simulations. The numerical tests are carried out on the Tianhe-2 supercomputer, whose computing nodes are interconnected via a proprietary high-performance network; each node has two 12-core Intel Ivy Bridge Xeon CPUs and 24 GB of local memory. In the numerical experiments, we use all 24 CPU cores in each node and assign one subdomain to each core.

Table 9: Strong scalability with different numbers of processors $N_p$.
Case Mesh $N_p$ One-level Two-level
N. It L. It Time N. It L. It Time
Case-1 $8192\times8192$ 256 7.2 65.4 564.3 7.2 18.3 460.7
512 7.2 95.0 291.5 7.2 18.9 187.6
1024 7.2 96.3 136.6 7.2 19.6 90.0
2048 7.2 142.1 82.1 7.2 20.4 46.1
4096 7.2 144.4 44.9 7.2 21.1 26.3
8192 7.2 214.2 34.6 7.2 24.0 18.8
Case-4 $256\times256\times256$ 512 7.8 82.7 775.7 7.8 31.0 735.1
1024 7.8 97.7 250.1 7.8 35.2 231.5
2048 7.8 109.8 97.6 7.8 37.9 76.2
4096 7.8 122.9 68.7 7.8 43.2 37.6
8192 7.8 147.0 29.9 7.8 35.0 21.8

The strong scalability (denoted by $Speedup$) and the parallel efficiency (denoted by $E_f$) are respectively defined as follows:

Speedup=T(Nmin)T(Np),Ef=T(Nmin)×NminT(Np)×Np,Speedup=\displaystyle\frac{T(N_{min})}{T(N_{p})},~{}E_{f}=\displaystyle\frac{T(N_{min})\times N_{min}}{T(N_{p})\times N_{p}},

where $N_{min}$ denotes the smallest number of processors in the comparison and $T(N_p)$ denotes the computing time with $N_p$ processors. In the strong scalability test of the 2D problem, we use a fixed $8192\times8192$ mesh and fixed time step sizes $\Delta t=0.05$ and $0.01$ days, in which the largest simulation consists of $8192\times8192=67,108,864$ degrees of freedom. Moreover, we investigate the scalability of the proposed fully implicit solver for Case-4 by using a fixed $256\times256\times256$ mesh and a fixed time step size $\Delta t=0.5$ day. The number of processors is varied from $256$ to $8192$. Table 9 shows the numbers of nonlinear and linear iterations as well as the computing time with respect to the number of processors, and Figure 8 provides the compute time and $Speedup$ curves for the strong scalability. The table clearly indicates that the number of Newton iterations remains independent of the number of processors, while the number of linear iterations depends on the preconditioner employed in the solver. For the one-level solver, the number of linear iterations deteriorates as the number of processors increases, whereas, with the help of the coarse mesh, the number of linear iterations for the two-level preconditioner is kept at a low level as the number of processors increases. Moreover, to investigate the strong scalability of the proposed method with respect to the heterogeneity of the reservoir, we again use the Case-2 benchmark and compare the numbers of nonlinear and linear iterations and the total computing time at $t=1, 2, 3,$ and $5$ days, respectively. As shown in Table 10, good strong scalability is also achieved for this heterogeneous case under longer simulations.
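As a small worked example of the two formulas above, the following sketch evaluates $Speedup$ and $E_f$ from the two-level timings of Case-1 in Table 9; it is only a convenience for reproducing the reported quantities and assumes nothing beyond the tabulated data.

```python
# Speedup and parallel efficiency from the two-level timings of Case-1 (Table 9).
procs = [256, 512, 1024, 2048, 4096, 8192]
times = [460.7, 187.6, 90.0, 46.1, 26.3, 18.8]    # two-level computing times from Table 9

n_min, t_min = procs[0], times[0]
for n_p, t in zip(procs, times):
    speedup = t_min / t                           # Speedup = T(N_min) / T(N_p)
    efficiency = (t_min * n_min) / (t * n_p)      # E_f = T(N_min) N_min / (T(N_p) N_p)
    print(f"Np = {n_p:5d}  Speedup = {speedup:6.2f}  Ef = {efficiency:5.2f}")
```

For instance, at $N_p=8192$ this gives a speedup of about $24.5$ over $256$ processors, i.e., a parallel efficiency of about $0.77$.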

Figure 8: Strong scalability and parallel efficiency results with different numbers of processors: (a) Case-1; (b) Case-4.
Table 10: Strong scalability with different numbers of processors $N_p$ for Case-2.
Level $N_p$ $t=1$ $t=2$ $t=3$ $t=5$
N. It L. It Time N. It L. It Time N. It L. It Time N. It L. It Time
One 64 2.8 44.5 307.5 2.7 52.1 588.6 2.7 58.7 801.4 2.6 63.5 1275.7
128 2.8 53.7 125.2 2.7 64.8 245.2 2.7 69.3 353.2 2.6 75.2 583.1
256 2.8 63.1 57.8 2.7 71.6 110.9 2.7 80.1 169.6 2.6 84.9 252.7
512 2.8 72.6 28.3 2.7 81.2 52.6 2.7 87.7 80.2 2.6 92.5 134.9
1024 2.8 84.9 18.9 2.7 94.5 37.2 2.7 99.4 58.5 2.6 104.7 96.3
2048 2.8 92.5 12.4 2.7 106.7 28.5 2.7 112.8 39.1 2.6 118.5 62.5
Two 64 2.8 18.8 283.6 2.7 20.2 554.7 2.7 22.6 772.0 2.6 24.3 1236.8
128 2.8 21.5 104.5 2.7 24.5 218.1 2.7 27.3 312.4 2.6 29.7 545.9
256 2.8 25.8 42.6 2.7 28.9 91.5 2.7 30.6 137.9 2.6 33.5 227.5
512 2.8 29.4 22.8 2.7 31.7 46.5 2.7 34.3 74.1 2.6 36.4 118.4
1024 2.8 32.3 15.7 2.7 35.4 32.9 2.7 36.8 51.8 2.6 38.2 89.2
2048 2.8 37.6 11.0 2.7 39.2 25.3 2.7 40.2 35.7 2.6 42.5 58.3

The weak scalability examines how the execution time varies with the number of processors when the problem size per processor is fixed. In the weak scaling test for the 2D case, we start with a small $2048\times2048$ mesh on $N_p=64$ processors and end up with a large $16384\times16384$ mesh (268,435,456 degrees of freedom) on up to $4096$ processor cores. We further test the weak scalability of our algorithms for Case-4, starting with a $96\times96\times96$ mesh and $N_p=216$. We refine the mesh and increase the number of processors simultaneously to keep the number of unknowns per processor constant. Table 11 presents the results of the weak scaling tests, which are run with fixed time step sizes $\Delta t=0.01$ and $0.5$ days, respectively, and are stopped after $5$ implicit time steps, i.e., the simulations are terminated at $t=0.05$ and $2.5$ days. We observe that, as the number of processors increases from $64$ to $4096$ for the 2D problem and from $216$ to $6859$ for the 3D problem, a reasonably good weak scaling performance is obtained, especially for the two-level restricted Schwarz approach; a small sketch of the implied weak-scaling efficiency is given after Table 11. This indicates that the proposed solver has good weak scalability for this range of processor counts.

Table 11: Weak scalability with different numbers of processors.
Case Mesh $N_p$ One-level Two-level
N. It L. It Time N. It L. It Time
Case-1 $2048\times2048$ 64 3.8 26.4 37.4 3.8 12.7 35.2
$4096\times4096$ 256 4.8 46.4 61.1 4.8 18.0 54.3
$6144\times6144$ 576 6.2 69.5 95.1 6.2 14.4 67.2
$8192\times8192$ 1024 7.2 96.3 136.6 7.2 19.6 90.0
$12288\times12288$ 2304 9.8 173.9 316.8 9.8 17.8 131.8
$16384\times16384$ 4096 12.4 294.7 607.8 12.4 17.0 197.1
Case-4 $96\times96\times96$ 216 3.8 40.6 12.5 3.8 16.9 10.7
$128\times128\times128$ 512 4.2 56.3 15.0 4.2 15.9 12.2
$192\times192\times192$ 1728 5.8 90.3 27.6 5.8 35.9 22.5
$224\times224\times224$ 2744 6.8 107.7 35.2 6.8 38.2 28.1
$256\times256\times256$ 4096 7.8 122.9 68.7 7.8 43.2 37.6
$304\times304\times304$ 6859 9.4 147.4 85.9 9.4 44.7 54.6
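As a brief check on the weak-scaling behavior, the sketch below evaluates the ratio $T(N_{min})/T(N_p)$ for the two-level results of Case-1 in Table 11, assuming the customary definition of weak-scaling efficiency for a fixed per-processor problem size; this particular metric is our assumption and is not defined in the text.

```python
# Weak-scaling efficiency T(N_min)/T(N_p) for the two-level results of Case-1 (Table 11).
procs = [64, 256, 576, 1024, 2304, 4096]
times = [35.2, 54.3, 67.2, 90.0, 131.8, 197.1]   # two-level computing times from Table 11

t_ref = times[0]
for n_p, t in zip(procs, times):
    print(f"Np = {n_p:5d}  weak-scaling efficiency = {t_ref / t:4.2f}")
```

Note that the nonlinear iteration count also grows with the problem size in this test (from 3.8 to 12.4 Newton iterations per time step for Case-1), so part of the increase in compute time reflects harder nonlinear problems rather than a loss of parallel efficiency of the preconditioner.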

In summary, the above tables and figures show that, compared with the one-level method, the two-level restricted additive Schwarz method results in a very sharp reduction in the number of linear iterations and therefore a substantial reduction in compute time. Hence, the two-level method is much more effective and scalable than the one-level approach in terms of both strong and weak scalability.

5 Conclusions

The simulation of subsurface flows with high-resolution solutions is of paramount importance in reservoir simulation. In this work, we have presented a parallel fully implicit framework based on multilevel restricted Schwarz preconditioners for subsurface flow simulations with Peng-Robinson equation of state. The proposed framework removes the restriction on the time step size, and it is flexible enough to allow the construction of different types of multilevel preconditioners from a rich set of additive or multiplicative strategies and interpolation and restriction operators. After experimenting with many different overlapping Schwarz type preconditioners, we found that the class of V-cycle-type restricted Schwarz methods based on the second-order scheme is extremely beneficial for the problems under investigation. Numerical experiments also showed that the proposed algorithms and simulators are robust and scalable for the large-scale solution of standard benchmarks as well as realistic problems with highly heterogeneous permeabilities in petroleum reservoirs.

Acknowledgment

The authors would like to express their appreciation to the anonymous reviewers for the invaluable comments that have greatly improved the quality of the manuscript.

References

  • [1] S. Balay, J. Brown, K. Buschelman, V. Eijkhout, W. Gropp, D. Kaushik, M. Knepley, L.C. McInnes, B.F. Smith, H. Zhang, PETSc Users Manual, Argonne National Laboratory, 2019.
  • [2] Q.M. Bui, H.C. Elman, J.D. Moulton, Algebraic multigrid preconditioners for multiphase flow in porous media, SIAM J. Sci. Comput. 39 (2017) S662–S680.
  • [3] Q.M. Bui, L. Wang, D. Osei-Kuffuor, Algebraic multigrid preconditioners for two-phase flow in porous media with phase transitions, Adv. Water Resour. 114 (2018) 19–28.
  • [4] X.-C. Cai, M. Dryja, M. Sarkis, Restricted additive Schwarz preconditioners with harmonic overlap for symmetric positive definite linear systems, SIAM J. Numer. Anal. 41 (2003) 1209–1231.
  • [5] X.-C. Cai, W.D. Gropp, D. E. Keyes, R. G. Melvin, D. P. Young, Parallel Newton–Krylov–Schwarz algorithms for the transonic full potential equation, SIAM J. Sci. Comput. 19 (1998) 246–265.
  • [6] X.-C. Cai, M. Sarkis, A restricted additive Schwarz preconditioner for general sparse linear systems, SIAM J. Sci. Comput. 21 (1999) 792–797.
  • [7] X.-C. Cai, D.E. Keyes, Nonlinearly preconditioned inexact Newton algorithms, SIAM J. Sci. Comput. 24 (2002) 183–200.
  • [8] Z. Chen, G. Huan, Y. Ma, Computational Methods for Multiphase Flows in Porous Media, SIAM, Philadelphia, PA, 2006.
  • [9] Z. Chen, Reservoir Simulation: Mathematical Techniques in Oil Recovery, SIAM, Philadelphia, PA, 2007.
  • [10] M. Christie, M. Blunt, Tenth SPE comparative solution project: A comparison of upscaling techniques, in SPE Reservoir Simulation Symposium, Society of Petroleum Engineers, 2001, pp. 308–317.
  • [11] C.N. Dawson, H. Klíe, M.F. Wheeler, C.S. Woodward, A parallel, implicit, cell-centered method for two-phase flow with a preconditioned Newton–Krylov solver, Computat. Geosci. 1 (1997) 215–249.
  • [12] J.E. Dennis, R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, 1996.
  • [13] A. Firoozabadi, Thermodynamics of Hydrocarbon Reservoirs, McGraw-Hill, New York, 1999.
  • [14] J.B. Haga, H. Osnes, H.P. Langtangen, A parallel block preconditioner for large-scale poroelasticity with highly heterogeneous material parameters, Comput. Geosci. 16 (2012) 723–734.
  • [15] D.A. Knoll, D.E. Keyes, Jacobian-free Newton–Krylov methods: a survey of approaches and applications, J. Comput. Phys. 193 (2004) 357–397.
  • [16] F. Kong, X.-C. Cai, A highly scalable multilevel Schwarz method with boundary geometry preserving coarse spaces for 3D elasticity problems on domains with complex geometry, SIAM J. Sci. Comput. 38 (2016) C73–C95.
  • [17] S. Lacroix, Y.V. Vassilevski, J.A. Wheeler, M.F. Wheeler, Iterative solution methods for modeling multiphase flow in porous media fully implicitly, SIAM J. Sci. Comput. 25 (2003) 905–926.
  • [18] J.J. Lee, K.-A. Mardal, R. Winther, Parameter-robust discretization and preconditioning of Biot's consolidation model, SIAM J. Sci. Comput. 39 (2017) A1–A24.
  • [19] K. Lie, S. Krogstad, I. Ligaarden, J. Natvig, H. Nilsen, B. Skaflestad, Open-source MATLAB implementation of consistent discretisations on complex grids, Comput. Geosci. 16 (2012) 297–322.
  • [20] L. Liu, D.E. Keyes, R. Krause, A note on adaptive nonlinear preconditioning techniques, SIAM J. Sci. Comput. 40 (2018) A1171–A1186.
  • [21] Y. Liu, H. Yang, Z. Xie, P. Qin, R. Li, Parallel simulation of variably saturated soil water flows by fully implicit domain decomposition methods, J. Hydrol. 582 (2020) 124481.
  • [22] J.E.P. Monteagudo, A. Firoozabadi, Comparison of fully implicit and IMPES formulations for simulation of water injection in fractured and unfractured media, Int. J. Numer. Meth. Engng. 69 (2007) 698–728.
  • [23] E.E. Prudencio, X.-C. Cai, Parallel multilevel restricted Schwarz preconditioners with pollution removing for PDE-constrained optimization, SIAM J. Sci. Comput. 29 (2007) 964–985.
  • [24] C. Qiao, S. Wu, J. Xu, C.S. Zhang, Analytical decoupling techniques for fully implicit reservoir simulation. J. Comput. Phys. 336 (2017) 664–681.
  • [25] Y. Saad, Iterative Methods for Sparse Linear Systems. SIAM, Second ed., 2003.
  • [26] J.O. Skogestad, E. Keilegavlen, J.M. Nordbotten, Domain decomposition strategies for nonlinear flow problems in porous media, J. Comput. Phys. 234 (2013) 439–451.
  • [27] G. Singh, M.F. Wheeler, A space-time domain decomposition approach using enhanced velocity mixed finite element method, J. Comput. Phys. 374 (2018) 893–911.
  • [28] B. Smith, P. Bjørstad, W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, 1996.
  • [29] A. Toselli, O. Widlund, Domain Decomposition Methods-Algorithms and Theory, Springer, Berlin, 2005.
  • [30] J.R. Wallis, Incomplete Gaussian elimination as a preconditioning for generalized conjugate gradient acceleration, in SPE Reservoir Simulation Symposium, Society of Petroleum Engineers, 1983.
  • [31] J.R. Wallis, R.P. Kendall, L.E. Little, Constrained residual acceleration of conjugate residual methods, in SPE Reservoir Simulation Symposium, Society of Petroleum Engineers, 1985.
  • [32] K. Wang, H. Liu, Z. Chen, A scalable parallel black oil simulator on distributed memory parallel computers, J. Comput. Phys. 301 (2015) 19–34.
  • [33] J.A. White, N. Castelletto, H.A. Tchelepi, Block-partitioned solvers for coupled poromechanics: A unified framework, Comput. Methods Appl. Mech. Engrg. 303 (2016) 55–74.
  • [34] H. Yang, F.-N. Hwang, X.-C. Cai, Nonlinear preconditioning techniques for full-space Lagrange-Newton solution of PDE-constrained optimization problems, SIAM J. Sci. Comput. 38 (2016) A2756–A2778.
  • [35] H. Yang, Y. Li, S. Sun, Nonlinearly preconditioned constraint-preserving algorithms for subsurface three-phase flow with capillarity, Comput. Methods Appl. Mech. Engrg. 367 (2020) 113140.
  • [36] H. Yang, S. Sun, Y. Li, C. Yang, A scalable fully implicit framework for reservoir simulation on parallel computers, Comput. Methods Appl. Mech. Engrg. 330 (2018) 334–350.
  • [37] H. Yang, S. Sun, Y. Li, C. Yang, A fully implicit constraint-preserving simulator for the black oil model of petroleum reservoirs, J. Comput. Phys. 396 (2019) 347–363.
  • [38] H. Yang, S. Sun, C. Yang, Nonlinearly preconditioned semismooth Newton methods for variational inequality solution of two-phase flow in porous media, J. Comput. Phys. 332 (2017) 1–20.
  • [39] H. Yang, C. Yang, S. Sun, Active-set reduced-space methods with nonlinear elimination for two-phase flow problems in porous media, SIAM J. Sci. Comput. 38 (2016) B593–B618.