
Parallel multilevel restricted Schwarz preconditioners for implicit simulation of subsurface flows with Peng-Robinson equation of state

Rui Li, Haijian Yang (School of Mathematics, Hunan University, Changsha, Hunan 410082, PR China), Chao Yang (School of Mathematical Sciences, Peking University, Beijing 100871, PR China)
Abstract

Parallel algorithms and simulators with good scalability are particularly important for large-scale reservoir simulations on modern supercomputers with a large number of processors. In this paper, we introduce and study a family of highly scalable multilevel restricted additive Schwarz (RAS) methods for the fully implicit solution of subsurface flows with the Peng-Robinson equation of state in two and three dimensions. With the use of a second-order fully implicit scheme, the proposed simulator is unconditionally stable and allows the time step size to be chosen without a stability restriction. The investigation then focuses on the development of several types of multilevel overlapping additive Schwarz methods for preconditioning the linear systems arising from the inexact Newton iteration, and some fast solver technologies are presented to ensure the efficiency and scalability of the multilevel approach. We numerically show that the proposed fully implicit framework is highly efficient for solving both standard benchmarks and realistic problems with several hundred million unknowns, and that it scales to 8192 processors on the Tianhe-2 supercomputer.

keywords:
Reservoir simulation , Fully implicit method , Multilevel method , Restricted Schwarz preconditioners , Parallel computing
journal: J. Comput. Phys.

1 Introduction

Simulation of subsurface fluid flows in porous media is currently an important topic in many applications, such as hydrology and groundwater flow, oil and gas reservoir simulation, CO2 sequestration, and waste management [8, 13]. The growing demand for accurate modeling of subsurface systems imposes persistent requirements on the understanding of many different physical processes, including multiphase fluid flow and transport, fluid phase behavior governed by a sophisticated equation of state, and geomechanical deformation. To simulate these processes, one needs to solve a set of coupled, nonlinear, time-dependent partial differential equations (PDEs) with complicated nonlinear behaviors arising from the complexity of the geological media and reservoir properties. An additional computational challenge comes from the Peng-Robinson equation of state, which governs the equilibrium distribution of the fluid and has a remarkable influence on the high variation of the Darcy velocity. Because of this high computational complexity, approximating subsurface phenomena numerically at high resolution is critical to reservoir engineering for accurate predictions of costly projects. Hence, parallel reservoir simulators equipped with robust and scalable solvers on high performance computing platforms are crucial for efficient simulation of this intricate problem [9].

In terms of the numerical solution of partial differential equations, the fully implicit method [2, 3, 17, 20, 24, 26, 32, 36] is a widely preferred approach for solving various problems arising from the discretization of subsurface fluid flows. The fully implicit scheme is unconditionally stable and relaxes the stability requirement of the Courant-Friedrichs-Lewy (CFL) condition. It therefore provides a consistent and robust way to simulate subsurface fluid flow problems in long-term and extreme-scale computations. In particular, a great advantage of the fully implicit algorithm is that the corresponding nonlinear equations are solved implicitly in a monolithic way. This characteristic feature makes it possible to add more physics and introduce more equations without changing much of the simulator framework, which greatly expands the scope of application of the fully implicit approach. In spite of being stable for arbitrarily large time steps, a fully implicit scheme requires the solution of a large, sparse, and ill-conditioned linear system at each nonlinear iteration. It therefore remains challenging and important to design robust and scalable solvers for large-scale simulations on high performance computing platforms. In this study, we employ the framework of Newton-Krylov algorithms [5, 7, 11, 15, 21, 34] to guarantee the nonlinear consistency, and mainly focus our efforts on designing an efficient preconditioning strategy to substantially reduce the condition number of the linear system.

In reservoir simulation, several effective preconditioning techniques, such as block preconditioners [14, 18, 33], the constrained pressure residual (CPR) approach [30, 31, 32], domain decomposition methods [20, 26, 36, 37], and algebraic or geometric multigrid algorithms [2, 3, 24, 32], are flexible and capable of addressing the inherent ill-conditioning of a complicated physical system, and hence have received increasing attention in recent years. The focus of this paper is on the domain decomposition approach, through which we propose a family of one-level and multilevel restricted additive Schwarz preconditioners. The original overlapping additive Schwarz (AS) method was introduced for the solution of symmetric positive definite elliptic finite element problems, and was later extended to many other nonlinear or linear systems [6, 29]. The AS method follows the divide-and-conquer principle by recursively breaking a problem down into subproblems of the same or related type, and its communication occurs only between neighboring subdomains during the restriction and extension processes. Moreover, the additive Schwarz preconditioner does not require any splitting of the nonlinear system. Hence, it can serve as the basis of efficient algorithms that are naturally suitable for massively parallel computing. In particular, the approach can be combined with certain variants of restriction operators in a robust and efficient way, such as the so-called restricted additive Schwarz (RAS) method proposed by Xiao-Chuan Cai et al. in [4, 6]. This can significantly improve the physical feasibility of the solver and substantially reduce the total computing time, and the method has proven to be very efficient in a variety of applications [16, 23, 26, 32, 36, 38]. Recently, the overlapping restricted additive Schwarz method, in conjunction with a variant of inexact Newton methods, has been applied successfully to the two-phase flow problem [36] and the black oil model [32, 37]. It was demonstrated that the one-level Schwarz preconditioner is scalable in terms of the numbers of nonlinear and linear iterations, but not in terms of the total compute time when the number of processors becomes large, which motivates the family of multilevel methods [16, 23, 28, 29].

In this work, to take advantage of modern supercomputers for large-scale reservoir simulations, we develop the overlapping restricted additive Schwarz preconditioner into a general multilevel framework for solving the discrete systems arising from fully implicit discretizations. We show that, with this new feature, the proposed approach is more robust and efficient for problems with highly heterogeneous media, and scales well to a large number of processors on the Tianhe-2 supercomputer. We would like to point out that designing a good strategy for the multilevel approach is both time-consuming and challenging, as it requires extensive knowledge of the general-purpose framework of interest, such as the selection of coarse-to-fine mesh ratios, the restriction and interpolation operators, and the solvers for the smoothing and coarse-grid correction. To the best of our knowledge, very limited research has been conducted on applying the multilevel restricted additive Schwarz preconditioning technique to the fully implicit solution of petroleum reservoir problems.

The outline of the paper is as follows. In Section 2, we introduce the governing equations of subsurface flows with Peng-Robinson equation of state, followed by the corresponding fully implicit discretization. In Section 3, a family of multilevel restricted additive Schwarz preconditioners, as the most important part of the fully implicit solver, is presented in detail. We show numerical results for some 2D and 3D realistic benchmark problems in Section 4 to demonstrate the robustness and scalability of the proposed preconditioner. The paper is concluded in Section 5.

2 Mathematical model and discretizations

The problem of interest in this study is compressible Darcy flow in a porous medium for the description of gas reservoir simulations [8, 9, 13]. Let $\Omega\subset\mathbb{R}^{d}$ $(d=1,2,3)$ be the computational domain with boundary $\Gamma=\partial\Omega$, and let $t_{f}$ denote the final simulated time. The mass balance equation with the Darcy law for the real gas fluid is defined by

\left\{\begin{array}{ll}\displaystyle\frac{\partial(\phi\rho)}{\partial t}+\nabla\cdot(\rho\mathbf{v})=q,&\mbox{in }\Omega_{T}=\Omega\times(0,t_{f}],\\ \displaystyle\mathbf{v}=-\frac{\mathbf{K}}{\mu}(\nabla p-\rho g\nabla h),&\end{array}\right.  (1)

where $\phi$ is the porosity of the porous medium, $\mathbf{K}$ is the permeability tensor, $\mathbf{v}$ is the Darcy velocity, $\rho$ denotes the density, $\mu$ is the viscosity of the flow, $p$ is the pressure, and $q$ is the external mass flow rate. The Peng-Robinson (PR) equation of state (EOS) [13] is used to describe the density as a function of the composition, temperature, and pressure:

\left\{\begin{array}{ll}\rho=cW,\\ c=\displaystyle\frac{p}{ZRT},\end{array}\right.  (2)

where $W$ is the molecular weight, $T$ is the temperature, $R$ is the gas constant, and $Z$ is the compressibility factor of the gas defined by

Z^{3}-(1-B)Z^{2}+(A-3B^{2}-2B)Z-(AB-B^{2}-B^{3})=0.  (3)

Moreover, $A$ and $B$ are the PR-EOS parameters defined as follows:

A=\frac{ap}{R^{2}T^{2}},\quad B=\frac{bp}{RT},

where the parameters $a$ and $b$ are modeled by imposing the criticality conditions

\left\{\begin{array}{ll}\displaystyle a=a(T)=0.45724\cdot\frac{R^{2}T_{c}^{2}}{p_{c}}\left(1+m\left(1-\sqrt{T_{r}}\right)\right)^{2},\\ \displaystyle b=0.0778\cdot\frac{RT_{c}}{p_{c}},\end{array}\right.  (4)

with $T_{c}$ and $p_{c}$ being the critical temperature and critical pressure of the gas, respectively, and $T_{r}$ being the reduced temperature defined as $T_{r}=T/T_{c}$. In addition, the parameter $m$ in (4) is given by a fitting formula in terms of the acentric factor $w$ of the substance:

m=\left\{\begin{aligned} &0.37464+1.54226w-0.26992w^{2}, &\quad 0\leq w\leq 0.491,\\ &0.3796+1.485w-0.1644w^{2}+0.01667w^{3}, &\quad 0.491\leq w\leq 2.0.\end{aligned}\right.  (5)

Here, the acentric factor $w$ is calculated by the following formula:

w=\frac{-\ln\left(\frac{p_{c}}{1\,\mathrm{atm}}\right)-f^{(0)}(T_{br})}{f^{(1)}(T_{br})},

where $f^{(0)}(T_{br})$ and $f^{(1)}(T_{br})$ are given by

\left\{\begin{aligned} f^{(0)}(T_{br})&=\frac{-5.97616(1-T_{br})+1.29874(1-T_{br})^{\frac{3}{2}}-0.60394(1-T_{br})^{\frac{5}{2}}-1.06841(1-T_{br})^{5}}{T_{br}},\\ f^{(1)}(T_{br})&=\frac{-5.03365(1-T_{br})+1.11505(1-T_{br})^{\frac{3}{2}}-5.41217(1-T_{br})^{\frac{5}{2}}-7.46628(1-T_{br})^{5}}{T_{br}},\end{aligned}\right.

with $T_{b}$ the normal boiling point temperature and $T_{br}=T_{b}/T_{c}$ the reduced normal boiling point temperature.
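To make the role of these formulas concrete, the following minimal Python sketch evaluates the PR-EOS parameters $A$ and $B$ from (4) and (5); the function name, the argument list, and the assumption of SI units are illustrative only and not taken from the paper.

```python
import numpy as np

def pr_eos_parameters(p, T, p_c, T_c, w, R=8.3147295):
    """Evaluate the PR-EOS parameters A and B of (3) via (4) and (5).

    All quantities are assumed to be in SI units; p_c and T_c are the
    critical pressure and temperature, and w is the acentric factor."""
    # Fitting formula (5) for m in terms of the acentric factor w.
    if w <= 0.491:
        m = 0.37464 + 1.54226 * w - 0.26992 * w**2
    else:
        m = 0.3796 + 1.485 * w - 0.1644 * w**2 + 0.01667 * w**3
    T_r = T / T_c                                          # reduced temperature
    a = 0.45724 * R**2 * T_c**2 / p_c * (1.0 + m * (1.0 - np.sqrt(T_r)))**2
    b = 0.0778 * R * T_c / p_c
    return a * p / (R**2 * T**2), b * p / (R * T)          # A and B
```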

In this study, we take the pressure $p$ as the primary variable. Suppose that the porosity $\phi$ does not change with time; then the equations of the mathematical model can be rewritten by substituting (2) and (3) into (1) as follows:

\frac{\phi p}{Z}\left(\frac{1}{p}-\frac{1}{Z}\frac{\partial Z}{\partial p}\right)\frac{\partial p}{\partial t}-\nabla\cdot\left(\frac{\mathbf{K}p}{\mu Z}\left(\nabla p-\frac{pW}{ZRT}g\nabla h\right)\right)-\frac{qRT}{W}=0,  (6)

where the relations are given by

\left\{\begin{aligned} &\frac{\partial Z}{\partial p}=-\frac{b}{RTC}Z^{2}-\left(\frac{a}{R^{2}T^{2}C}-\frac{2b+6Bb}{RTC}\right)Z+\left(\frac{2abp}{R^{3}T^{3}C}-\frac{2Bb+3B^{2}b}{RTC}\right),\\ &C=3Z^{2}-2(1-B)Z+(A-2B-3B^{2}),\\ &Z^{3}-(1-B)Z^{2}+(A-2B-3B^{2})Z-(AB-B^{2}-B^{3})=0,\end{aligned}\right.

with the initial condition $p=p_{0}$. Suppose the boundary of the computational domain $\Omega$ is composed of two parts, $\partial\Omega=\Gamma^{D}\cup\Gamma^{N}$ with $\Gamma^{D}\cap\Gamma^{N}=\emptyset$. The boundary conditions associated with the model problem (6) are

\left\{\begin{array}{ll}p=p^{D}&\mbox{on }\Gamma^{D},\\ \mathbf{v}\cdot\mathbf{n}=q^{N}&\mbox{on }\Gamma^{N},\end{array}\right.

where $\mathbf{n}$ is the outward normal of the boundary $\partial\Omega$. We remark that the compressibility factor $Z$, as an intermediate variable, is obtained by solving the algebraic cubic equation (3) for the given primary variable $p$; see [8, 9] for the details of the computation of $Z$.
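As an illustration of this remark, a minimal Python sketch of the cubic solve for $Z$ is given below; the rule of selecting the largest real root as the gas-phase root is a common convention and an assumption made here, not a statement from the paper.

```python
import numpy as np

def compressibility_factor(A, B):
    """Solve the PR-EOS cubic (3) for Z and return the gas-phase root.

    The largest real root is assumed to correspond to the gas phase."""
    coeffs = [1.0, -(1.0 - B), A - 3.0 * B**2 - 2.0 * B, -(A * B - B**2 - B**3)]
    roots = np.roots(coeffs)
    real_roots = roots[np.abs(roots.imag) < 1e-12].real
    return real_roots.max()
```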

We employ a cell-centered finite difference (CCFD) method for the spatial discretization, for which the details can be found in [8, 9, 22, 39], and then a fully implicit scheme is applied for the time integration [2, 16, 32, 36]. For a given time-stepping sequence $0=t^{(1)}<t^{(2)}<\cdots$, define the time step size $\Delta t^{(l)}=t^{(l+1)}-t^{(l)}$ and use the superscript $(l)$ to denote the discretized evaluation at time level $t=t^{(l)}$. After spatially discretizing (6) by the CCFD scheme, we obtain the semi-discrete system

\frac{\phi p}{Z}\left(\frac{1}{p}-\frac{1}{Z}\frac{\partial Z}{\partial p}\right)\frac{\partial p}{\partial t}+\mathcal{F}\left(p^{(l+1)}\right)=0,  (7)

where $\mathcal{F}(p^{(l+1)})$ is the operator of the spatial discretization at time level $t=t^{(l+1)}$ obtained by the CCFD method. For the purpose of comparison, we implement both the first-order backward Euler scheme (BDF-1) and the second-order backward differentiation formula (BDF-2) for the temporal integration of (7). For the BDF-1 scheme, the fully discretized system reads

\left[\frac{\phi p}{Z}\left(\frac{1}{p}-\frac{1}{Z}\frac{\partial Z}{\partial p}\right)\right]^{(l+1)}\frac{p^{(l+1)}-p^{(l)}}{\Delta t^{(l)}}+\mathcal{F}\left(p^{(l+1)}\right)=0,  (8)

where $p^{(l)}$ is the evaluation of $p$ at the $l$-th time step with time step size $\Delta t^{(l)}$. The fully discretized system for the BDF-2 method is

\left[\frac{\phi p}{Z}\left(\frac{1}{p}-\frac{1}{Z}\frac{\partial Z}{\partial p}\right)\right]^{(l+1)}\frac{3p^{(l+1)}-4p^{(l)}+p^{(l-1)}}{2\Delta t^{(l)}}+\mathcal{F}\left(p^{(l+1)}\right)=0.  (9)

Note that, for the BDF-2 scheme, the BDF-1 method is used at the initial time step.
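For clarity, the following Python sketch shows how the BDF-1 and BDF-2 residuals (8) and (9) could be assembled once the CCFD spatial operator and the accumulation coefficient are available; accum_coeff and F_spatial are hypothetical user-supplied callables, not functions from the paper's code.

```python
def bdf1_residual(p_new, p_old, dt, accum_coeff, F_spatial):
    """Residual of the BDF-1 system (8); used at the first time step."""
    return accum_coeff(p_new) * (p_new - p_old) / dt + F_spatial(p_new)

def bdf2_residual(p_new, p_old, p_older, dt, accum_coeff, F_spatial):
    """Residual of the BDF-2 system (9) for a candidate solution p_new.

    accum_coeff(p) evaluates [phi*p/Z*(1/p - (1/Z)*dZ/dp)] at p_new and
    F_spatial(p) applies the CCFD spatial operator; both are assumed given."""
    time_term = accum_coeff(p_new) * (3.0 * p_new - 4.0 * p_old + p_older) / (2.0 * dt)
    return time_term + F_spatial(p_new)
```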

Remark 1

The employed CCFD discretization of (6) on rectangular meshes can be viewed as a mixed finite element method with the lowest-order Raviart-Thomas basis functions equipped with the trapezoidal quadrature rule [8, 9, 22]. We also would like to point out that the solver technologies introduced in the following section can be extended to systems arising from other spatial discretization schemes.

3 Multilevel restricted Schwarz preconditioners

When the fully implicit method is applied, a system of nonlinear algebraic equations after the temporal and spatial discretizations,

F(X)=0,  (10)

is constructed and solved at each time step. In (10), $F(X):\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ is a given nonlinear vector-valued function arising from the residual function in (8) or (9), given by $F(X)=(F_{1},F_{2},\cdots,F_{n})^{T}$ with $F_{i}=F_{i}(X_{1},X_{2},\cdots,X_{n})$ and $X=(X_{1},X_{2},\cdots,X_{n})^{T}$. In this study, the nonlinear system (10) is solved by a parallel Newton-Krylov method with the family of domain decomposition type preconditioners [5, 7, 11, 15].

Let the initial guess $X^{0}\in\mathbb{R}^{n}$ for the Newton iteration at the current time step be the solution of the previous time step; then the new solution $X^{k+1}$ is updated from the current solution $X^{k}$ and a Newton correction vector $s^{k}$ as follows:

X^{k+1}=X^{k}+\lambda^{k}s^{k},\quad k=0,1,2,\cdots,  (11)

where $\lambda^{k}$ is the step length calculated by a cubic line search method [12] to satisfy

\left\|F(X^{k}+\lambda^{k}s^{k})\right\|\leq(1-\alpha\lambda^{k})\left\|F(X^{k})\right\|,

with the parameter $\alpha$ fixed to $10^{-4}$ in practice. The Newton correction $s^{k}$ is obtained by solving the following Jacobian linear system

J_{k}s^{k}=-F(X^{k}),  (12)

with a Krylov subspace iteration method [25]. Here, $J_{k}=\nabla F(X^{k})$ is the Jacobian matrix evaluated at the current solution $X^{k}$. Since the linear system arising from the Newton iteration is large, sparse, and ill-conditioned, the family of one-level or multilevel restricted Schwarz preconditioners is employed to accelerate the convergence of the linear solver, i.e., we solve the following right-preconditioned Jacobian system

J_{k}M_{k}^{-1}M_{k}s^{k}=-F(X^{k}),  (13)

with an overlapping Schwarz preconditioned Krylov subspace method. In practice, the Generalized Minimal RESidual (GMRES) algorithm is used as the Krylov subspace method for the linear solver.

The accuracy of the approximate solution of (13) is controlled by the relative and absolute tolerances $\eta^{r}_{k}$ and $\eta^{a}_{k}$, i.e., the linear iteration is continued until the following convergence criterion is satisfied:

\|J_{k}s^{k}+F(X^{k})\|\leq\min\{\eta^{r}_{k}\|F(X^{k})\|,\eta^{a}_{k}\}.  (14)

We set the stopping condition for the Newton iteration as

\|F(X^{k+1})\|\leq\max\{\varepsilon_{r}\|F(X^{0})\|,\varepsilon_{a}\},  (15)

with the relative and absolute solver tolerances $\varepsilon_{r}$ and $\varepsilon_{a}$, respectively.
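A compact Python sketch of this inexact Newton loop is given below for orientation; it uses a simple step-halving backtracking search in place of the cubic line search [12], and J_solve is a hypothetical callable standing in for the preconditioned GMRES solve of (12).

```python
import numpy as np

def inexact_newton(F, J_solve, X0, eps_r=1e-6, eps_a=1e-10, alpha=1e-4, max_it=50):
    """Inexact Newton method (11)-(15) with a backtracking line search.

    F(X) returns the nonlinear residual and J_solve(X, rhs) returns an
    (approximate) solution of the Jacobian system (12); both are assumed
    to be supplied by the caller."""
    X = X0.copy()
    f0 = np.linalg.norm(F(X))
    for _ in range(max_it):
        fk = np.linalg.norm(F(X))
        if fk <= max(eps_r * f0, eps_a):              # stopping test (15)
            break
        s = J_solve(X, -F(X))                         # Newton correction (12)
        lam = 1.0                                     # step length in (11)
        while np.linalg.norm(F(X + lam * s)) > (1.0 - alpha * lam) * fk and lam > 1e-8:
            lam *= 0.5                                # backtrack until sufficient decrease
        X = X + lam * s
    return X
```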

The most important component of a robust and scalable solver for the nonlinear or linear system is the choice of a suitable preconditioner. In large-scale parallel computing, the additive Schwarz (AS) preconditioner, as a member of the family of domain decomposition methods, helps to improve the convergence and is at the same time beneficial to the scalability of the linear solver, which is the main focus of this paper.

3.1 One-level Schwarz preconditioners

Let $\Omega\subset\mathbb{R}^{d}$, $d=1,2,3$, be the computational domain on which the PDE system (6) is defined. A spatial discretization is then performed on a mesh $\Omega_{h}$ of characteristic size $h>0$. To define the one-level Schwarz preconditioner, we first divide $\Omega_{h}$ into nonoverlapping subdomains $\Omega_{i}$, $i=1,\cdots,N_{p}$, and then expand each $\Omega_{i}$ to $\Omega_{i}^{\delta}$, i.e., $\Omega_{i}\subset\Omega_{i}^{\delta}$, to obtain an overlapping partition. Here, the parameter $\delta>0$ is the size of the overlap, defined as the minimum distance between $\partial\Omega_{i}^{\delta}$ and $\partial\Omega_{i}$ in the interior of $\Omega$. For boundary subdomains we simply cut off the part outside $\Omega$. We also denote by $H>0$ the characteristic diameter of $\Omega_{i}$, as shown in the left panel of Figure 1.

Figure 1: A demonstration of the domain decomposition partition with geometry-preserving coarse meshes. In the figure, the red solid lines indicate the partition of the 3D domain into $3\times 3\times 3$ non-overlapping subdomains of size $H_{x}\times H_{y}\times H_{z}$, and the dotted lines show the extended boundary of an overlapping subdomain with $\delta=2$. The left (right) panel of the figure shows the partition with a fine (coarse) mesh of size $h$ ($2h$), respectively.

Let $N$ and $N_{i}$ denote the numbers of degrees of freedom associated with $\Omega$ and $\Omega_{i}^{\delta}$, respectively. Let $A\in\mathbb{R}^{N\times N}$ be the Jacobian matrix of the linear system defined on the mesh $\Omega_{h}$,

AX=b.  (16)

Then we define the matrices $R^{\delta}_{i}$ and $R^{0}_{i}$ as the restriction operators from $\Omega_{h}$ to its overlapping and non-overlapping subdomains, respectively, as follows: the element $(R^{\delta}_{i})_{l_{1},l_{2}}$ is either (a) an identity block, if the integer indices $1\leq l_{1}\leq N_{i}$ and $1\leq l_{2}\leq N$ are related to the same mesh point and this mesh point belongs to $\Omega_{i}^{\delta}$, or (b) a zero block, otherwise. The multiplication of $R_{i}^{\delta}$ with an $N\times 1$ vector generates a smaller $N_{i}\times 1$ vector by discarding all elements corresponding to mesh points outside $\Omega_{i}^{\delta}$. The matrix $R_{i}^{0}\in\mathbb{R}^{N_{i}\times N}$ is defined in a similar way; the only difference from the operator $R^{\delta}_{i}$ is that its application to an $N\times 1$ vector also zeros out all of the elements corresponding to mesh points on $\Omega_{i}^{\delta}\backslash\Omega_{i}$. Then the classical one-level additive Schwarz (AS, [29]) preconditioner is defined as

M^{-1}_{\delta,\delta}=\sum\limits_{i=1}^{N_{p}}(R^{\delta}_{i})^{T}A_{i}^{-1}R^{\delta}_{i},  (17)

with $A_{i}=R^{\delta}_{i}A(R^{\delta}_{i})^{T}$, where $N_{p}$ is the number of subdomains, which is the same as the number of processors. In addition, there are two modified versions of the one-level additive Schwarz preconditioner that may have some potential advantages for parallel computing. The first version is the left restricted additive Schwarz (left-RAS, [6]) method defined by

M^{-1}_{0,\delta}=\sum\limits_{i=1}^{N_{p}}(R^{0}_{i})^{T}A_{i}^{-1}R^{\delta}_{i},  (18)

and the other modification of the original method is the right restricted additive Schwarz (right-RAS, [4]) preconditioner as follows:

M^{-1}_{\delta,0}=\sum\limits_{i=1}^{N_{p}}(R^{\delta}_{i})^{T}A_{i}^{-1}R^{0}_{i}.  (19)

In the above preconditioners, we use a sparse LU factorization or an incomplete LU (ILU) factorization to solve the subdomain linear system corresponding to $A_{i}^{-1}$. In the following, when the distinction is not important, we denote a Schwarz preconditioner defined on the mesh $\Omega_{h}$ with characteristic size $h$ simply by $M^{-1}_{h}$.
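To fix ideas, the following dense numpy sketch applies the left-RAS preconditioner (18) to a residual vector; the index sets, the dense subdomain solve, and the function names are illustrative assumptions, whereas a production implementation would operate on distributed sparse matrices.

```python
import numpy as np

def ras_apply(A, r, subdomains, overlap_subdomains):
    """Left-RAS preconditioner (18) applied to a residual vector r.

    subdomains / overlap_subdomains are lists of index arrays describing the
    non-overlapping and overlapping partitions; a dense LU solve stands in
    for the subdomain solver A_i^{-1}."""
    z = np.zeros_like(r)
    for idx0, idx_d in zip(subdomains, overlap_subdomains):
        A_i = A[np.ix_(idx_d, idx_d)]            # A_i = R_i^delta A (R_i^delta)^T
        e_i = np.linalg.solve(A_i, r[idx_d])     # subdomain solve on overlapping block
        keep = np.isin(idx_d, idx0)              # restricted extension (R_i^0)^T
        z[idx_d[keep]] += e_i[keep]
    return z
```

Replacing the restricted mask by the full overlapping index set would give the classical AS form (17), and applying the mask on the restriction side instead would give the right-RAS form (19).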

3.2 Multilevel additive Schwarz preconditioners

To improve the scalability and robustness of the one-level additive Schwarz preconditioners, especially when a large number of processors is used, we employ the family of multilevel Schwarz preconditioners. Multilevel Schwarz preconditioners are obtained by combining the single-level preconditioners $M^{-1}_{h}$ assigned to each level, as shown in Figure 1. For the description of the multilevel Schwarz preconditioners [16, 23, 28], we use the index $j=0,1,\ldots,L-1$ to designate any of the $L\geq 2$ levels. The meshes from coarse to fine are denoted by $\Omega_{h_{j}}$, and the corresponding matrices and vectors are denoted by $A_{h_{j}}$ and $X_{h_{j}}$. Let $I_{h_{j}}$ denote the identity operator on level $j$, and let the restriction operator from level $j$ to level $j-1$ be defined by

\mathcal{I}_{j}^{j-1}:\mathbb{R}^{N_{j}}\rightarrow\mathbb{R}^{N_{j-1}},  (20)

where $N_{j}$ and $N_{j-1}$ denote the numbers of degrees of freedom associated with $\Omega_{h_{j}}$ and $\Omega_{h_{j-1}}$, respectively. Moreover,

\mathcal{I}_{j-1}^{j}:\mathbb{R}^{N_{j-1}}\rightarrow\mathbb{R}^{N_{j}}  (21)

is the interpolation operator from level $j-1$ to level $j$.

For convenience, we present the construction of the proposed multilevel Schwarz preconditioners from the viewpoint of the multigrid (MG) V-cycle algorithm [16, 23, 28, 29]. More precisely, at each level $j>0$, the Schwarz preconditioned Richardson method acts as the pre-smoother and post-smoother, i.e., $M^{-1}_{h_{j}}$ preconditions the $\mu_{j}\geq 0$ pre-smoothing iterations and the $\nu_{j}\geq 0$ post-smoothing iterations. In the general multigrid V-cycle framework, let $X^{k}$ be the current solution of the linear system (16); the new solution is computed by the iteration $X^{k+1}=MG(b,L-1,X^{k})$, as described in Algorithm 1.

Algorithm 1 Multigrid (MG) V-cycle algorithm.
1: procedure $X_{h_{j}}=MG(b_{h_{j}},j,X_{h_{j}})$
2:     if $j=0$ then
3:         Solve $A_{h_{0}}X_{h_{0}}=b_{h_{0}}$    ▷ Coarsest correction
4:     else
5:         Smooth $\mu_{j}$ times on $A_{h_{j}}X_{h_{j}}=b_{h_{j}}$:    ▷ Presmoothing
6:         $(b_{h_{j}}-A_{h_{j}}X_{h_{j}})=(I_{h_{j}}-A_{h_{j}}M^{-1}_{h_{j}})^{\mu_{j}}(b_{h_{j}}-A_{h_{j}}X_{h_{j}})$;
7:         $b_{h_{j-1}}=\mathcal{I}_{j}^{j-1}(b_{h_{j}}-A_{h_{j}}X_{h_{j}})$;    ▷ Residual restriction
8:         $X_{h_{j-1}}=MG(b_{h_{j-1}},j-1,0)$;    ▷ Recursion
9:         $X_{h_{j}}=X_{h_{j}}+\mathcal{I}_{j-1}^{j}X_{h_{j-1}}$;    ▷ Correction interpolation
10:         Smooth $\nu_{j}$ times on $A_{h_{j}}X_{h_{j}}=b_{h_{j}}$:    ▷ Postsmoothing
11:         $(b_{h_{j}}-A_{h_{j}}X_{h_{j}})=(I_{h_{j}}-A_{h_{j}}M^{-1}_{h_{j}})^{\nu_{j}}(b_{h_{j}}-A_{h_{j}}X_{h_{j}})$;
12:     end if
13:     return $X_{h_{j}}$
14: end procedure

3.2.1 With application to the two-level case

In the following, we restrict ourselves to the two-level case and use a geometry-preserving coarse mesh that shares the same boundary geometry with the fine mesh, as shown in Figure 1. The V-cycle two-level Schwarz preconditioner $M_{V}^{-1}$, i.e., $L=2$ in Algorithm 1, is constructed by combining the fine-level preconditioner $M^{-1}_{h_{1}}$ and the coarse-level preconditioner $M^{-1}_{h_{0}}$ as follows:

M_{V}^{-1}=A_{h_{1}}^{-1}\left[I_{h_{1}}-(I_{h_{1}}-A_{h_{1}}M^{-1}_{h_{1}})^{\nu_{1}}(I_{h_{1}}-A_{h_{1}}\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0})(I_{h_{1}}-A_{h_{1}}M^{-1}_{h_{1}})^{\mu_{1}}\right].  (22)

More precisely, if the pre- and post-smoothing parameters are fixed to $\mu_{1}=1$ and $\nu_{1}=1$, then the matrix-vector product of the two-level Schwarz preconditioner in (22) with any given vector, $e_{h_{1}}=M_{V}^{-1}r_{h_{1}}$, is obtained by the following three steps:

\left\{\begin{array}{l}\displaystyle e_{h_{1}}^{1/3}=M^{-1}_{h_{1}}r_{h_{1}},\\[2mm] \displaystyle e_{h_{1}}^{2/3}=e_{h_{1}}^{1/3}+\mathcal{I}^{1}_{0}M^{-1}_{h_{0}}\mathcal{I}^{0}_{1}(r_{h_{1}}-A_{h_{1}}e_{h_{1}}^{1/3}),\\[2mm] \displaystyle e_{h_{1}}=e_{h_{1}}^{2/3}+M^{-1}_{h_{1}}(r_{h_{1}}-A_{h_{1}}e_{h_{1}}^{2/3}).\end{array}\right.  (23)
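A minimal Python sketch of these three steps is shown below; M_f_apply, M_c_apply, restrict, and prolong are hypothetical callables standing for $M^{-1}_{h_{1}}$, $M^{-1}_{h_{0}}$, $\mathcal{I}^{0}_{1}$, and $\mathcal{I}^{1}_{0}$, respectively.

```python
def two_level_vcycle_apply(r_f, A_f, M_f_apply, M_c_apply, restrict, prolong):
    """Apply the V-cycle two-level Schwarz preconditioner (23) to r_f.

    A_f is the fine-level matrix (anything supporting the @ operator);
    the smoothers and transfer operators are assumed to be supplied."""
    e = M_f_apply(r_f)                                   # pre-smoothing step
    e = e + prolong(M_c_apply(restrict(r_f - A_f @ e)))  # coarse-grid correction
    e = e + M_f_apply(r_f - A_f @ e)                     # post-smoothing step
    return e
```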

There are some modified versions of the V-cycle two-level Schwarz preconditioner that can be used for parallel computing. If we set $\mu_{1}=0$ and $\nu_{1}=1$ in (22), we obtain the left-Kaskade Schwarz method defined by

M_{left-Kas}^{-1}=\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}+M^{-1}_{h_{1}}-M^{-1}_{h_{1}}A_{h_{1}}\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0},  (24)

and the other modification of the original method is the right-Kaskade Schwarz preconditioner, with $\mu_{1}=1$ and $\nu_{1}=0$ in (22), as follows:

M_{right-Kas}^{-1}=M^{-1}_{h_{1}}+\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}-\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}A_{h_{1}}M^{-1}_{h_{1}}.  (25)

Moreover, we can define two other hybrid versions of two-level Schwarz algorithms. The first is the purely additive two-level Schwarz preconditioner

M_{additive}^{-1}=\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}+M^{-1}_{h_{1}},  (26)

and the second is the coarse-level-only two-level Schwarz preconditioner

M_{coarse}^{-1}=\mathcal{I}_{0}^{1}M^{-1}_{h_{0}}\mathcal{I}_{1}^{0}.  (27)

The motivation of the above two-level preconditioners, which add a coarse preconditioner, additively or multiplicatively, to an existing fine-mesh preconditioner, is to make the overall solver scalable with respect to the number of processors or the number of subdomains. Numerical results are shown later to compare the performance of these two-level preconditioners.

Remark 2

In the multilevel Schwarz preconditioners, the use of a coarse level helps the global exchange of information, and the following linear problem defined on the coarsest mesh needs to be solved:

A_{h_{0}}X_{h_{0}}=b_{h_{0}},  (28)

which is done with the one-level additive Schwarz preconditioned GMRES method, owing to the fact that the coarse problem is too large and expensive for direct methods in large-scale simulations. When an iterative method is used for solving the linear system (28) on the coarsest level, the overall preconditioner becomes an iterative procedure, and the preconditioner changes from iteration to iteration. Hence, when the multilevel Schwarz method is applied, a flexible version of GMRES (fGMRES, [25]) is used for the solution of the outer linear system (16).

Remark 3

When the classical additive Schwarz preconditioner (17) is applied to symmetric positive definite systems arising from the discretization of elliptic problems, the condition number $\kappa$ of the preconditioned linear system satisfies $\kappa\leq C(1+H/\delta)/H^{2}$ for the one-level method and $\kappa\leq C(1+H/\delta)$ for the two-level method, where the constant $C$ is independent of $\delta$, $H$, and $h$; see [28, 29]. However, these condition number estimates can neither be applied to the restricted additive Schwarz preconditioners nor adapted to the family of non-elliptic systems like our model problem, and hence there is very little theoretical literature on the convergence of multilevel restricted Schwarz preconditioners for this system. The scalability tests in this paper provide more understanding of restricted-type domain decomposition methods for the hyperbolic problem.

3.2.2 Selection of interpolation operators

Algorithm 1 provides a general framework in which one must choose the coarse-to-fine mesh ratio, the restriction and interpolation operators, and the solvers for the smoothing and coarse-grid correction. Among these choices, selecting the right restriction and interpolation operators at each level is very important for the overall performance of the preconditioner in terms of the trade-off between the rate of convergence and the cost of each iteration. Generally speaking, in most cases the overall solver profits from the addition of coarse preconditioners through a decrease in the total number of linear iterations. However, for some classes of important problems such as reservoir simulation, we found that the number of iterations does not decrease as expected if the operators are not chosen properly. After some investigation, we traced the source of this phenomenon to the cell-centered spatial discretization: when the cell-centered scheme is used, a coarse mesh point does not necessarily coincide with any fine mesh point, as shown in Figure 2. Hence, in the rest of this paper, we focus on several strategies for the selection of interpolation operators, i.e., the first-order, second-order, and third-order interpolation schemes.

Figure 2: Partitions for a 2D spatial domain with the coarse-to-fine mesh ratio 1:2.

Let $\Omega$ be the computational domain covered with $N_{x}\times N_{y}$ mesh cells. We consider a point $(x,y)$ located in the rectangular subdivision $[x_{i},x_{i+1}]\times[y_{j},y_{j+1}]$ with $i=1,\dots,N_{x}$, $j=1,\dots,N_{y}$, and let $f(x,y)$ be the interpolated value at this point. The nearest-neighbor interpolation operator, which assigns to the interpolation point the value of the nearest node according to the shortest distance between the interpolation point and the sample points, is a first-order method defined by

f(x,y)=\sum_{l=0}^{1}\sum_{m=0}^{1}f\left(x_{i+l},y_{j+m}\right)W_{i+l,j+m},  (29)

where $f(x_{i+l},y_{j+m})$ is the sample value located at the point $(x_{i+l},y_{j+m})$, and $W_{i+l,j+m}$ is a weight defined as

W_{i+l,j+m}=\left\{\begin{aligned} 1,\quad&\mbox{if }\|(x,y)-(x_{i+l},y_{j+m})\|\mbox{ is minimal},\\ 0,\quad&\mbox{otherwise}.\end{aligned}\right.

The bilinear interpolation operator, which uses a weighted average of the nearest neighboring values $f(x_{i+l},y_{j+m})$ to generate the interpolation point value, with the point $(x,y)$ lying in the subdivision $[x_{i},x_{i+1}]\times[y_{j},y_{j+1}]$, is a second-order interpolation technique defined as follows:

f(x,y)=\sum_{l=0}^{1}\sum_{m=0}^{1}f\left(x_{i+l},y_{j+m}\right)W_{i+l}W_{j+m},  (30)

where the weights for the bilinear interpolation scheme are defined as

\left\{\begin{aligned} W_{i+l}&=\left(\frac{2x-x_{i}-x_{i+1}}{x_{i+1}-x_{i}}\right)l+\frac{x_{i+1}-x}{x_{i+1}-x_{i}},\\ W_{j+m}&=\left(\frac{2y-y_{j}-y_{j+1}}{y_{j+1}-y_{j}}\right)m+\frac{y_{j+1}-y}{y_{j+1}-y_{j}}.\end{aligned}\right.

In contrast to the bilinear interpolation, which only takes 4 points ($2\times 2$) into account, the bicubic interpolation uses 16 points ($4\times 4$) and is a third-order interpolation scheme defined as follows:

f(x,y)=\sum_{l=-1}^{2}\sum_{m=-1}^{2}f\left(x_{i+l},y_{j+m}\right)W\left(\frac{x-x_{i+l}}{x_{i+1}-x_{i}}\right)W\left(\frac{y-y_{j+m}}{y_{j+1}-y_{j}}\right),  (31)

where the weight for the cubic convolution interpolation is defined as

W(s)=\left\{\begin{aligned} &\frac{3}{2}|s|^{3}-\frac{5}{2}|s|^{2}+1,\quad&0<|s|<1,\\ &-\frac{1}{2}|s|^{3}+\frac{5}{2}|s|^{2}-4|s|+2,\quad&1<|s|<2,\\ &\;0,\quad&2<|s|.\end{aligned}\right.

We remark that the two-dimensional computational domain is used in the above description only for ease of demonstration.
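The following Python sketch collects the bilinear weights of (30) and the cubic convolution kernel of (31) for a uniform mesh; the normalization of the offsets and the 4x4 patch layout are assumptions made for the illustration.

```python
import numpy as np

def bilinear_weights(x, x_i, x_ip1):
    """1D weights of the bilinear scheme (30) for the two nodes x_i, x_{i+1}."""
    t = (x - x_i) / (x_ip1 - x_i)
    return np.array([1.0 - t, t])              # weights for l = 0 and l = 1

def cubic_kernel(s):
    """Cubic convolution weight W(s) used by the bicubic scheme (31)."""
    s = abs(s)
    if s < 1.0:
        return 1.5 * s**3 - 2.5 * s**2 + 1.0
    if s < 2.0:
        return -0.5 * s**3 + 2.5 * s**2 - 4.0 * s + 2.0
    return 0.0

def bicubic_interpolate(f_patch, tx, ty):
    """Bicubic interpolation (31) on a 4x4 patch of nodal values.

    f_patch[l, m] holds f(x_{i+l-1}, y_{j+m-1}) for l, m = 0,...,3, and
    (tx, ty) are the offsets of the point from node (i, j), normalized by
    the (uniform) mesh spacing in each direction."""
    wx = np.array([cubic_kernel(tx - l) for l in (-1, 0, 1, 2)])
    wy = np.array([cubic_kernel(ty - m) for m in (-1, 0, 1, 2)])
    return wx @ f_patch @ wy
```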

4 Numerical experiments

In this section, we implement the proposed algorithm described in the previous sections using the open-source Portable, Extensible Toolkit for Scientific computation (PETSc) [1], which is built on top of the Message Passing Interface (MPI), and investigate the numerical behavior and parallel performance of the newly proposed fully implicit solver with a variety of test cases.

4.1 Robustness and efficiency of the solver

The focus of this subsection is on the robustness and efficiency of the proposed algorithm for both standard benchmarks and realistic problems in highly heterogeneous media. Unless otherwise specified, the values of the physical parameters used in the test cases are set as follows: $\phi=0.2$, $\mu=11.067\times 10^{-3}$ cp, $R=8.3147295$ J/(mol$\cdot$K), $W=16\times 10^{-3}$ kg/mol, $T=298$ K, $p_{c}=4.604\times 10^{6}$ Pa, $T_{c}=190.58$ K, and $T_{b}=111.67$ K.

We first present results for a 2D test case (denoted as Case-1) in a horizontal layer with the permeability tensor $\mathbf{K}$ in (1) given by

\mathbf{K}=\left[\begin{array}{cc}k_{xx}&0\\ 0&k_{yy}\end{array}\right].

With the media being horizontal, we neglect the effect of gravity. In this configuration, the distribution of permeabilities includes two subregions with different isotropy, as shown in Figure 3. The computational domain is 100 meters long and 100 meters wide. In the simulation, we assume that the left and right boundaries of the domain are impermeable. We then flood the system with gas from the top to the bottom, i.e., we set $p_{in}=10$ atm at the top boundary and $p_{out}=1$ atm at the bottom boundary. There is no injection/extraction inside the domain. Compared with the previous example, in this test case there is an $H$-shaped zone and the value of the permeability has a huge jump inside and outside the zone, which poses an even greater challenge to the fully implicit solver. In the test, the simulation is performed on a $512\times 512$ mesh, the time step size is fixed to $\Delta t=0.1$ day, and the simulation is stopped at $10$ days. Figure 3 also illustrates the contour plots of the pressure. Table 1 shows the performance of the proposed fully implicit solver with respect to the different permeability configurations. It is clearly seen that the simulation spends more computing time when the isotropic medium is used, which is attributed to the increase in the nonlinearity of the problem that affects the number of linear iterations.

Figure 3: Contour plots of the solution under the heterogeneous isotropic and anisotropic media for Case-1: (a) isotropic case; (b) anisotropic case; (c) isotropic case; (d) anisotropic case.
Table 1: A comparison of the fully implicit solver with different permeability configurations for Case-1.
Medium type Isotropic case Anisotropic case
Average nonlinear iterations 2.2 2.2
Average linear iterations 54.7 41.5
Execution time (second) 172.0 151.1

In the following, experiments are conducted with some 3D test cases. The focus is on the flow model with the medium being highly heterogeneous and isotropic, which significantly increases the nonlinearity of the system and imposes a severe challenge on the fully implicit solver. We first consider a 3D domain $\Omega=(0,100\;\mbox{m})^{3}$, in which the permeabilities of the porous medium are random, denoted as Case-2. The random distribution of the permeability in the range [3.1, 14426.2] is generated by a geostatistical model using the open source toolbox MRST [19], as shown in Figure 4. In the test, two different flow configurations are used for the injection boundary, i.e., a fixed injection pressure and a fixed injection rate. In the fixed pressure configuration, the flow with a pressure of 10 atm is injected at the part of the left-hand side $x=0$ m, $y\in[0,10\;\mbox{m}]$, $z\in[0,10\;\mbox{m}]$; for the case of the fixed injection rate, the flow is injected from the same sub-boundary with the Darcy velocity $\mathbf{v}=5\times 10^{-4}$ m/s. The fluid flows out of the right-hand side of the domain with a fixed pressure $p_{out}=1$ atm, and no-flow boundary conditions are imposed on the other boundaries of the domain. The initial pressure $p_{0}=1$ atm is specified in the whole domain, and the parameters required for this example are consistent with the previous examples.

Figure 4: Permeability field for Case-2. In the figure, we use the logarithmic scale for the contour plot of the permeability.

In this case, the domain is highly heterogeneous, leading to an increase in the nonlinearity of the problem that is challenging for the numerical techniques. Figure 5 shows the plots of the pressure profiles for Case-2 at different times under the two flow configurations. The figures demonstrate that the proposed approach successfully resolves the different stages of the simulation, and the flow is close to breakthrough at time $t=5$ days. In Table 2, we present the numbers of nonlinear and linear iterations and the total execution time of the proposed solver at different times. The simulation is carried out on a $128\times 128\times 128$ mesh and the time step size is $\Delta t=0.1$ day. It can be observed from Table 2 that the numbers of nonlinear and linear iterations barely change, while the total execution time increases steadily as the simulation advances, as expected, which shows the robustness and effectiveness of the implicit solver in handling the large variation of the physical parameters in the model problem.

Figure 5: A comparison of the pressure profiles at different times for Case-2 with the fixed pressure (first row) or velocity (second row) boundary: (a)/(d) t=1 day, (b)/(e) t=2 days, (c)/(f) t=5 days.
Table 2: A comparison of the fully implicit solver at different times for Case-2 with the fixed pressure or velocity boundary.
Boundary type End of the simulation (day) t=1 t=2 t=3 t=5
Fixed pressure Average nonlinear iterations 3.0 2.9 2.9 2.9
Average linear iterations 22.6 23.8 24.3 24.5
Execution time (second) 645.3 1095.9 1545.2 2455.6
Fixed velocity Average nonlinear iterations 2.8 2.7 2.7 2.6
Average linear iterations 24.8 26.3 27.1 27.9
Execution time (second) 519.5 965.4 1325.1 2142.9

In all of the above test cases, the porosity $\phi$ in the model problem (1) is assumed to be constant. In the next 3D test case (Case-3) of this subsection, we import the porosity and the permeability from the Tenth SPE Comparative Solution Project (SPE10) [10] as an example of a realistic realization with geological and petrophysical properties, in which the porosity and the permeability vary with position. It is a classic and challenging benchmark problem for reservoir simulation due to its highly heterogeneous permeabilities and porosities. As shown in Figure 6, the permeability is characterized by variations of more than six orders of magnitude, ranging from $6.65\times 10^{-4}$ to $2\times 10^{4}$, and the porosity ranges from 0 to 0.5. The 3D domain is 1200 ft long $\times$ 2200 ft wide $\times$ 170 ft thick. In the test, the boundary condition for the pressure on the left-hand side of the domain is uniformly imposed as $p_{in}=10$ atm and on the opposite side as $p_{out}=1$ atm. No-flow boundary conditions are set on the other boundaries of the domain. The other parameters are the same as in the previous case.

Figure 6: Permeability and porosity fields for the SPE10 benchmark: (a) $\log_{10}$-permeability field; (b) porosity field.

In Figure 7, we display the distributions of the pressure at different time points $t=1,3,6,10$ days. In the simulation, the mesh is $60\times 220\times 85$ and the time step size is fixed to $0.1$ day. The results shown in Figure 7 demonstrate that the proposed approach successfully resolves the rapid and abrupt evolution of the simulation at different stages, while keeping the solver robust and efficient. Finally, we analyze the behavior of the proposed fully implicit method when the time step size $\Delta t$ is varied. In the test, we again run the SPE10 model on a fixed $60\times 220\times 85$ mesh. The simulation is stopped at $t=3$ days. The average numbers of nonlinear and linear iterations as well as the total compute times are summarized in Table 3. The results in the table clearly indicate that the combination of nonlinear and linear iterations works well even for very large time steps. The implicit approach converges for all time steps and is unconditionally stable. In addition, we also notice that, as the time step size $\Delta t$ decreases, the average numbers of nonlinear and linear iterations become smaller, whereas the total computing time increases. This behavior is expected for the fully implicit approach in a variety of applications [2, 16, 32, 35, 36].

Table 3: The effect of different time step sizes in the fully implicit solver for SPE10.
Time step size $\Delta t$ 0.06 0.075 0.1 0.2 0.3
Number of time steps 50 40 30 15 10
Average nonlinear iterations 3.5 3.6 3.8 4.4 4.5
Average linear iterations 45.4 49.3 52.1 67.5 80.0
Execution time (second) 1424.2 1191.7 951.4 579.8 408.8
Figure 7: A comparison of the pressure profiles at different times for the SPE10 benchmark: (a) t=1 day; (b) t=3 days; (c) t=6 days; (d) t=10 days.

4.2 Performance of Schwarz preconditioners

In this subsection, we focus on the parallel performance of the proposed fully implicit solver with respect to the one-level and multilevel Schwarz preconditioners using Case-1, Case-2, and a new 3D test case. In the new 3D case, denoted as Case-4, the computational domain is $\Omega=(0,100\;\mbox{m})^{3}$ with the medium being heterogeneous and isotropic, and the distribution of the permeability in each layer $z\in[25\;\mbox{m},75\;\mbox{m}]$ is the same as the distribution of Case-1. We flood the system with gas from the back face ($x\in[0,100\;\mbox{m}]$, $y=100\;\mbox{m}$, $z\in[0,100\;\mbox{m}]$) to the front face, i.e., we set $p_{in}=10$ atm at the back face and $p_{out}=1$ atm at the front face. There is no injection/extraction inside the domain.

In the tests, we use the following stopping parameters and notation, unless specified otherwise. The relative and absolute tolerances in (15) for the Newton iteration are set to $10^{-6}$ and $10^{-10}$, respectively. The linear systems are solved by the one-level or two-level Schwarz preconditioned GMRES method with absolute and relative tolerances of $10^{-8}$ and $10^{-5}$ in (14), except for the coarse solve of the two-level preconditioner, for which we use $2\times 10^{-1}$ as the relative stopping tolerance. The restart value of the GMRES method is fixed at 30. In the tables of the following tests, the symbol "$N_{p}$" denotes the number of processor cores, which is the same as the number of subdomains, "N. It" stands for the average number of Newton iterations per time step, "L. It" for the average number of one-level or two-level Schwarz preconditioned GMRES iterations per Newton iteration, and "Time" for the total computing time in seconds.
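For orientation, the configuration above roughly corresponds to the following PETSc runtime options, shown here through a petsc4py sketch; the option names are standard PETSc ones, but this is an illustrative guess at the setup rather than the authors' actual run script, and the two-level preconditioner requires additional custom components not shown.

```python
from petsc4py import PETSc

opts = PETSc.Options()
opts["snes_rtol"] = 1e-6             # relative Newton tolerance in (15)
opts["snes_atol"] = 1e-10            # absolute Newton tolerance in (15)
opts["ksp_type"] = "fgmres"          # flexible GMRES for the outer linear solve
opts["ksp_rtol"] = 1e-5              # relative linear tolerance in (14)
opts["ksp_atol"] = 1e-8              # absolute linear tolerance in (14)
opts["ksp_gmres_restart"] = 30       # GMRES restart value
opts["pc_type"] = "asm"              # one-level additive Schwarz preconditioner
opts["pc_asm_overlap"] = 1           # overlap size delta = 1
opts["pc_asm_type"] = "restrict"     # restricted variant of the Schwarz method
opts["sub_pc_type"] = "ilu"          # ILU factorization as the subdomain solver
opts["sub_pc_factor_levels"] = 2     # ILU(2) level of fill
```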

4.2.1 One-level Schwarz preconditioners

Under the framework of one-level Schwarz preconditioners, we study the performance of the fully implicit solver, when different types of the additive Schwarz preconditioners are employed and several important parameters such as the subdomain solver and the overlapping size are taken into consideration.

We first look at the influence of the subdomain solvers. In this test, we focus on the classical AS preconditioner (17) and fix the overlap to $\delta=1$. The ILU factorization with different levels of fill-in and the full LU factorization are considered as the subdomain solvers. The simulation of Case-1 is carried out on a fixed $512\times 512$ mesh using $N_{p}=16$ processors. The time step size is 0.1 day, and we run the 2D case for the first 20 time steps. The simulation of Case-4 is carried out on a fixed $128\times 128\times 128$ mesh using $N_{p}=64$ processors. The time step size is 0.1 day, and we run this case for the first 10 time steps. The numerical results are summarized in Table 4. We observe from the table that the number of nonlinear iterations is not sensitive to the choice of the subdomain solver. The linear systems converge in fewer iterations when the LU factorization is used as the subdomain solver, while the ILU approach is beneficial to the AS preconditioner in terms of the computing time, especially for the 3D test case.

Table 4: The impact of different subdomain solvers in the one-level Schwarz preconditioner.
Case Case-1 Case-4
Solver N. It L. It Time N. It L. It Time
LU 3.0 52.9 37.1 4.2 32.9 954.8
ILU(0) 3.0 99.1 34.3 4.3 161.7 282.4
ILU(1) 3.0 75.7 29.9 4.3 98.0 253.0
ILU(2) 3.0 73.9 32.1 4.3 76.1 239.6
ILU(4) 3.0 66.2 33.8 4.3 51.0 245.8

Moreover, in Table 5, we compare the impact of different subdomain solvers for Case-2, when the fixed pressure and fixed velocity boundaries are imposed for the flow configurations. In the test, the simulation is carried out on a $128\times 128\times 128$ mesh with a fixed time step size $\Delta t=0.1$ day using $N_{p}=64$ processors. The results clearly illustrate that the choice of ILU(2) is still optimal for the 3D flow problem with the random permeability field and different boundary conditions, as the flow undergoes the different stages of the complex simulation shown in Figure 5.

Table 5: The impact of different subdomain solvers in the one-level Schwarz preconditioner for Case-2 with the fixed pressure boundary (denoted as “Model-A”) and the fixed velocity boundary (denoted as “Model-B”).
Solver t=1 t=2 t=3 t=5
Model-A N. It L. It Time N. It L. It Time N. It L. It Time N. It L. It Time
LU 3.0 33.0 685.2 2.9 36.6 1167.0 2.9 38.7 1652.7 2.9 41.2 2625.9
ILU(0) 3.0 161.5 435.3 2.9 183.0 760.1 2.9 195.3 1086.8 2.9 208.3 1742.7
ILU(1) 3.0 100.6 415.5 2.9 110.3 715.9 2.9 116.4 1015.4 2.9 123.6 1616.7
ILU(2) 3.0 80.0 403.4 2.9 88.9 695.0 2.9 93.3 991.1 2.9 98.4 1572.5
ILU(4) 3.0 52.4 417.8 2.9 58.8 725.1 2.9 62.1 1030.5 2.9 65.7 1638.9
Model-B N. It L. It Time N. It L. It Time N. It L. It Time N. It L. It Time
LU 2.8 39.3 563.7 2.7 42.9 1046.9 2.7 45.2 1438.8 2.6 47.7 2335.0
ILU(0) 2.8 178.5 366.0 2.7 190.4 679.2 2.7 201.8 934.4 2.6 214.9 1518.6
ILU(1) 2.8 113.9 349.6 2.7 124.0 641.3 2.7 131.6 873.7 2.6 138.4 1408.2
ILU(2) 2.8 87.8 340.2 2.7 95.7 626.8 2.7 101.2 859.0 2.6 105.0 1363.8
ILU(4) 2.8 67.3 351.8 2.7 74.1 650.5 2.7 78.5 895.2 2.6 81.6 1455.1

We then perform tests with the 2D and 3D test cases by varying the combination of the overlap size $\delta$ and the Schwarz preconditioner type, i.e., the classical-AS (17), the left-RAS (18), and the right-RAS (19) preconditioners. Based on the above observations, we choose ILU(1) as the subdomain solver for the 2D test case and ILU(2) for the 3D test cases. The numbers of nonlinear and linear iterations together with the execution times are reported in Table 6. The results in the table suggest that the most robust combination is the left-RAS preconditioner with the overlap size $\delta=1$, as a compromise between the number of linear iterations and the total computing time. We remark that, when the overlap size is $\delta=0$, these preconditioners degenerate into the block-Jacobi preconditioner.

Table 6: The impact of different one-level Schwarz preconditioners with several overlapping sizes.
Case Case-1 Case-2 Case-4
Preconditioner $\delta$ N. It L. It Time N. It L. It Time N. It L. It Time
Classical-AS 0 3.0 80.4 31.5 2.8 73.9 321.6 4.2 53.0 211.9
1 3.0 75.7 29.9 2.8 87.8 340.2 4.3 76.1 239.6
2 3.0 76.9 31.8 2.8 99.1 352.7 4.3 92.6 286.7
3 3.0 76.8 32.5 2.8 103.4 368.2 4.3 93.8 326.6
Left-RAS 1 3.0 55.8 28.3 2.8 43.7 306.2 4.2 37.5 205.0
2 3.0 51.9 28.8 2.8 41.6 312.8 4.3 34.0 218.0
3 3.0 49.4 28.9 2.8 40.8 322.5 4.3 33.2 232.3
Right-RAS 1 3.0 55.6 28.2 2.8 44.5 307.5 4.3 37.1 207.3
2 3.0 51.4 28.0 2.8 42.0 311.4 4.3 34.3 215.8
3 3.0 48.7 28.0 2.8 41.2 326.2 4.3 33.4 223.4

4.2.2 Two-level Schwarz preconditioners

For reservoir simulation with high accuracy, a supercomputer with a large number of processors is a must, and therefore the scalability of the algorithm with respect to the number of processors is critically important. As introduced in Section 3, in the one-level Schwarz preconditioner the average number of linear iterations per Newton iteration grows with the number of processors $N_{p}$, resulting in the deterioration of the implicit solver. It is clear that some stabilization is needed, which is achieved by the two-level method with the use of a coarse mesh and the interpolation operators explained in Section 3. The performance of the two-level method depends heavily on the two linear solvers defined on the coarse and fine meshes. Here, we restrict ourselves to the framework of Schwarz preconditioned GMRES methods, i.e., we refer to these iterative methods as the smoothers on each level.

There are several assembly techniques available to construct a hybrid two-level Schwarz preconditioner by composing the one-level additive Schwarz preconditioner with a coarse-level preconditioner in a multiplicative or additive manner. Choosing the right type of two-level Schwarz preconditioner is very important for the overall performance, and a large number of numerical experiments is often necessary to identify the right selection. As introduced in Section 3, in this study we investigate the pure-coarse (27), the additive (26), the left-Kaskade (24), the right-Kaskade (25), and the V-cycle (23) two-level Schwarz preconditioners. For each numerical case, the overlap sizes on the fine and coarse levels are fixed to $\delta_{f}=1$ and $\delta_{c}=1$, respectively. The subdomain problems on the fine and coarse levels are solved by the LU factorization. In the test, we again run the 2D model with a fixed time step size of $0.1$ day on a $512\times 512$ mesh and run Case-4 with a fixed time step size of $0.2$ day on a $128\times 128\times 128$ mesh; the simulations are finished at $t=2$ and $1$ days, respectively. For the simulation of Case-2, the flow problem with the fixed velocity model is solved on a $128\times 128\times 128$ mesh, and the computation is ended at $t=1$ day with $\Delta t=0.1$ day. The coarse-to-fine mesh ratio is set to $2$ in each direction. In Table 7, we report the performance of the proposed two-level Schwarz preconditioners with respect to different interpolation operators, including the first-order scheme (29), the second-order scheme (30), and the third-order scheme (31). Below we list the observations made from the results.

  • (a)

    We know that the effectiveness of the Schwarz preconditioner relies on its ability to mimic the spectrum of the linear operator while remaining relatively cheap to apply. We see that using the high-order schemes in the construction of the Schwarz preconditioner leads to lower linear iteration counts. Moreover, compared with the Schwarz method with high-order schemes, the low-order approach is more attractive in terms of the execution time, owing to its lower bandwidth and smaller number of nonzeros in the sparse matrix. The results in the table suggest that the second-order scheme is a suitable compromise between the number of iterations and the total computing time.

  • (b)

    The best choices for some of the options in the multilevel Schwarz preconditioner are problem dependent. For the implicit solution of subsurface flow problems, we can see from Table 7 that the additive and Kaskade type two-level Schwarz preconditioners suffer from a larger number of outer linear iterations compared with the V-cycle approach. The pure coarse version of the two-level approach performs considerably worse than the hybrid methods. Hence, our experiments suggest that there is a benefit to including both pre- and post-sweeps of the one-level preconditioner in the simulation of the model problems, especially for the 3D test cases.

Table 7: The impact of different two-level Schwarz preconditioners with three interpolation operators. In the table, the symbol “–” denotes the divergence of the solver caused by the failure of linear iterations.
Case Type First-order Second-order Third-order
N. It L. It Time N. It L. It Time N. It L. It Time
Case-1 Coarse
Additive 3.0 114.6 233.9 3.0 71.3 207.3 3.0 45.9 734.7
Left-Kaskade 3.0 71.1 204.2 3.0 34.1 163.9 3.0 8.05 565.2
Right-Kaskade 3.0 75.0 194.8 3.0 37.7 157.8 3.0 9.03 565.5
V-Cycle 3.0 10.0 129.6 3.0 9.4 129.1 3.0 5.4 579.3
Case-2 Coarse
Additive 2.8 87.4 371.0 2.8 58.3 349.4 2.8 42.0 5186.7
Left-Kaskade 2.8 31.3 325.7 2.8 20.6 312.4 2.8 14.7 4921.4
Right-Kaskade 2.8 34.5 326.6 2.8 21.8 315.9 2.8 15.1 4940.3
V-Cycle 2.8 14.9 314.2 2.8 12.6 308.8 2.8 5.8 4770.9
Case-4 Coarse
Additive 4.0 79.9 453.2 4.0 44.5 402.1 4.0 36.6 6114.6
Left-Kaskade 4.0 13.4 5763.8
Right-Kaskade 4.0 14.8 5308.8
V-Cycle 4.0 11.4 374.2 4.0 10.3 372.8 4.0 5.1 5156.5

For the two-level preconditioner, the size of the coarse mesh has a strong impact on the efficiency and robustness of the method. It is clear that using a relatively fine coarse mesh yields a stronger two-level Schwarz preconditioner and therefore helps reduce the total number of linear iterations. On the other hand, a finer coarse mesh requires substantially more memory and a more expensive coarse solve, which increases the total compute time. An important implementation issue in designing the two-level method is therefore to balance the quality of the preconditioner against the computing time of the coarse solve. To understand the impact of different coarse meshes on the convergence of the algorithm, we report results for different coarse-to-fine mesh ratios and different interpolation operators for the 2D and 3D test cases. In the test, the fine meshes for the 2D and 3D problems are $512\times512$ and $128\times128\times128$, respectively. From Table 8 we observe that: (a) for the first-order scheme, the poor quality of the coarse-level correction on a very coarse mesh leads to a large increase in the number of iterations; (b) with the help of higher-order schemes, the number of linear iterations increases only slowly with the coarse-to-fine mesh ratio. This implies that a higher-order scheme does a good job of preconditioning the fine mesh problem, since even a rather coarse second-level mesh keeps the whole algorithm efficient in terms of the number of iterations.
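As a concrete, simplified illustration of why the interpolation order matters, the sketch below builds a one-dimensional cell-centered interpolation operator for a coarse-to-fine ratio of 2: the first-order variant has one nonzero per row, while the (linear) second-order variant has up to two, which is the bandwidth effect discussed in observation (a). The construction is a generic assumption for illustration and is not the exact schemes (29)-(31); a 2D operator can be assembled from such 1D factors, e.g., via a Kronecker product.

```python
import numpy as np

# Illustrative cell-centered interpolation for a ratio-2 coarse-to-fine refinement
# in 1D; not the exact operators (29)-(31) used in the paper.
def interpolation_1d(nc, order=2):
    nf = 2 * nc                              # fine cells for a ratio-2 refinement
    P = np.zeros((nf, nc))
    for i in range(nf):
        x = (i + 0.5) / nf                   # fine cell center in [0, 1]
        j = min(int(x * nc), nc - 1)         # owning coarse cell
        if order == 1:
            P[i, j] = 1.0                    # piecewise-constant injection
        else:
            xc = (j + 0.5) / nc              # owning coarse cell center
            k = j + (1 if x > xc else -1)    # nearest neighboring coarse cell
            if 0 <= k < nc:
                w = abs(x - xc) * nc         # linear interpolation weight
                P[i, j], P[i, k] = 1.0 - w, w
            else:
                P[i, j] = 1.0                # fall back to injection at the boundary
    return P

P2d = np.kron(interpolation_1d(8), interpolation_1d(8))   # tensor-product 2D operator
```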

Table 8: The impact of coarse-to-fine mesh ratios for the V-Cycle two-level Schwarz preconditioner.
Case Mesh ratio First-order Second-order Third-order
N. It L. It Time N. It L. It Time N. It L. It Time
Case-1 2 3.0 10.0 129.6 3.0 9.4 129.1 3.0 5.4 579.3
4 3.0 18.8 123.0 3.0 16.9 120.9 3.0 13.3 553.1
8 3.0 26.8 130.1 3.0 19.2 120.0 3.0 14.1 535.5
16 3.0 36.0 140.5 3.0 19.5 118.5 3.0 14.2 532.8
Case-2 2 2.8 14.9 314.2 2.8 12.6 308.8 2.8 5.8 4770.9
4 2.8 16.2 310.7 2.8 13.1 303.1 2.8 8.9 4741.6
8 2.8 18.9 308.5 2.8 13.7 299.6 2.8 9.4 4725.3
16 2.8 30.4 326.2 2.8 15.8 302.4 2.8 11.3 4709.2
Case-4 2 4.0 11.4 374.2 4.0 10.3 372.8 4.0 5.1 5156.5
4 4.0 17.7 372.1 4.0 10.6 368.4 4.0 9.5 5106.9
8 4.0 18.8 370.2 4.0 11.0 354.4 4.0 10.2 5091.3
16 4.0 39.8 415.7 4.0 12.7 356.3 4.0 11.6 5051.4

4.3 Parallel scalability

Achieving good parallel scalability is important in parallel computing, especially when solving large-scale problems with many processors. Hence, in the following we focus on the parallel performance of the proposed fully implicit method with the one-level and two-level restricted additive Schwarz preconditioners. Again, the 2D and 3D test problems described in Section 4.2 are used for the scalability simulations. The numerical tests are carried out on the Tianhe-2 supercomputer, whose computing nodes are interconnected via a proprietary high-performance network; each node has two 12-core Intel Ivy Bridge Xeon CPUs and 24 GB of local memory. In the numerical experiments, we use all 24 CPU cores in each node and assign one subdomain to each core.

Table 9: Strong scalability with different numbers of processors $N_p$.
Case Mesh $N_p$ One-level Two-level
N. It L. It Time N. It L. It Time
Case-1 $8192\times8192$ 256 7.2 65.4 564.3 7.2 18.3 460.7
512 7.2 95.0 291.5 7.2 18.9 187.6
1024 7.2 96.3 136.6 7.2 19.6 90.0
2048 7.2 142.1 82.1 7.2 20.4 46.1
4096 7.2 144.4 44.9 7.2 21.1 26.3
8192 7.2 214.2 34.6 7.2 24.0 18.8
Case-4 $256\times256\times256$ 512 7.8 82.7 775.7 7.8 31.0 735.1
1024 7.8 97.7 250.1 7.8 35.2 231.5
2048 7.8 109.8 97.6 7.8 37.9 76.2
4096 7.8 122.9 68.7 7.8 43.2 37.6
8192 7.8 147.0 29.9 7.8 35.0 21.8

The strong scalability (denoted by $Speedup$) and the parallel efficiency (denoted by $E_f$) are respectively defined as follows:

Speedup=T(Nmin)T(Np),Ef=T(Nmin)×NminT(Np)×Np,Speedup=\displaystyle\frac{T(N_{min})}{T(N_{p})},~{}E_{f}=\displaystyle\frac{T(N_{min})\times N_{min}}{T(N_{p})\times N_{p}},

where $N_{min}$ denotes the smallest number of processors in the comparison and $T(N_p)$ denotes the computing time with $N_p$ processors. In the strong scalability test of the 2D problem, we use a fixed $8192\times8192$ mesh and fixed time step sizes $\Delta t=0.05$ and $0.01$ days, in which the largest simulation consists of $8192\times8192=67,108,864$ degrees of freedom. Moreover, we investigate the scalability of the proposed fully implicit solver for Case-4 by using a fixed $256\times256\times256$ mesh and a fixed time step size $\Delta t=0.5$ day. The number of processors is varied from $256$ to $8192$. Table 9 shows the numbers of nonlinear and linear iterations as well as the computing time with respect to the number of processors, and Figure 8 provides the compute time and $Speedup$ curves for the strong scalability. The table clearly indicates that the number of Newton iterations remains independent of the number of processors, while the number of linear iterations depends on the preconditioner employed in the solver. For the one-level solver, the number of linear iterations deteriorates as the number of processors increases, whereas, with the help of the coarse mesh, the number of linear iterations for the two-level preconditioner is kept at a low level as the number of processors increases. Moreover, to investigate the strong scalability of the proposed method with respect to the heterogeneity of the reservoir, we again use the Case-2 benchmark and compare the numbers of nonlinear and linear iterations and the total computing time at $t=1, 2, 3,$ and $5$ days, respectively. As shown in Table 10, good strong scalability is also achieved for this heterogeneous case under longer simulations.
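As a small worked example of the two formulas above, the following sketch evaluates $Speedup$ and $E_f$ from the two-level timings of Case-1 in Table 9; it is only a convenience for reproducing the reported quantities and assumes nothing beyond the tabulated data.

```python
# Speedup and parallel efficiency from the two-level timings of Case-1 (Table 9).
procs = [256, 512, 1024, 2048, 4096, 8192]
times = [460.7, 187.6, 90.0, 46.1, 26.3, 18.8]    # two-level computing times from Table 9

n_min, t_min = procs[0], times[0]
for n_p, t in zip(procs, times):
    speedup = t_min / t                           # Speedup = T(N_min) / T(N_p)
    efficiency = (t_min * n_min) / (t * n_p)      # E_f = T(N_min) N_min / (T(N_p) N_p)
    print(f"Np = {n_p:5d}  Speedup = {speedup:6.2f}  Ef = {efficiency:5.2f}")
```

For instance, at $N_p=8192$ this gives a speedup of about $24.5$ over $256$ processors, i.e., a parallel efficiency of about $0.77$.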

Figure 8: Strong scalability and parallel efficiency results with different numbers of processors: (a) Case-1; (b) Case-4.
Table 10: Strong scalability with different numbers of processors $N_p$ for Case-2.
Level $N_p$ $t=1$ $t=2$ $t=3$ $t=5$
N. It L. It Time N. It L. It Time N. It L. It Time N. It L. It Time
One 64 2.8 44.5 307.5 2.7 52.1 588.6 2.7 58.7 801.4 2.6 63.5 1275.7
128 2.8 53.7 125.2 2.7 64.8 245.2 2.7 69.3 353.2 2.6 75.2 583.1
256 2.8 63.1 57.8 2.7 71.6 110.9 2.7 80.1 169.6 2.6 84.9 252.7
512 2.8 72.6 28.3 2.7 81.2 52.6 2.7 87.7 80.2 2.6 92.5 134.9
1024 2.8 84.9 18.9 2.7 94.5 37.2 2.7 99.4 58.5 2.6 104.7 96.3
2048 2.8 92.5 12.4 2.7 106.7 28.5 2.7 112.8 39.1 2.6 118.5 62.5
Two 64 2.8 18.8 283.6 2.7 20.2 554.7 2.7 22.6 772.0 2.6 24.3 1236.8
128 2.8 21.5 104.5 2.7 24.5 218.1 2.7 27.3 312.4 2.6 29.7 545.9
256 2.8 25.8 42.6 2.7 28.9 91.5 2.7 30.6 137.9 2.6 33.5 227.5
512 2.8 29.4 22.8 2.7 31.7 46.5 2.7 34.3 74.1 2.6 36.4 118.4
1024 2.8 32.3 15.7 2.7 35.4 32.9 2.7 36.8 51.8 2.6 38.2 89.2
2048 2.8 37.6 11.0 2.7 39.2 25.3 2.7 40.2 35.7 2.6 42.5 58.3

The weak scalability examines how the execution time varies with the number of processors when the problem size per processor is fixed. In the weak scaling test for the 2D case, we start with a small $2048\times2048$ mesh on $N_p=64$ processors and end up with a large $16384\times16384$ mesh (268,435,456 degrees of freedom) on up to $4096$ processor cores. We further test the weak scalability of our algorithms for Case-4, starting with a $96\times96\times96$ mesh and $N_p=216$. We refine the mesh and increase the number of processors simultaneously to keep the number of unknowns per processor constant. Table 11 presents the results of the weak scaling tests, which are run with fixed time step sizes $\Delta t=0.01$ and $0.5$ days, respectively, and are stopped after $5$ implicit time steps, i.e., the simulations are terminated at $t=0.05$ and $2.5$ days. We observe that, as the number of processors increases from $64$ to $4096$ for the 2D problem and from $216$ to $6859$ for the 3D problem, a reasonably good weak scaling performance is obtained, especially for the two-level restricted Schwarz approach; a small sketch of the implied weak-scaling efficiency is given after Table 11. This indicates that the proposed solver has good weak scalability for this range of processor counts.

Table 11: Weak scalability with different numbers of processors.
Case Mesh $N_p$ One-level Two-level
N. It L. It Time N. It L. It Time
Case-1 $2048\times2048$ 64 3.8 26.4 37.4 3.8 12.7 35.2
$4096\times4096$ 256 4.8 46.4 61.1 4.8 18.0 54.3
$6144\times6144$ 576 6.2 69.5 95.1 6.2 14.4 67.2
$8192\times8192$ 1024 7.2 96.3 136.6 7.2 19.6 90.0
$12288\times12288$ 2304 9.8 173.9 316.8 9.8 17.8 131.8
$16384\times16384$ 4096 12.4 294.7 607.8 12.4 17.0 197.1
Case-4 $96\times96\times96$ 216 3.8 40.6 12.5 3.8 16.9 10.7
$128\times128\times128$ 512 4.2 56.3 15.0 4.2 15.9 12.2
$192\times192\times192$ 1728 5.8 90.3 27.6 5.8 35.9 22.5
$224\times224\times224$ 2744 6.8 107.7 35.2 6.8 38.2 28.1
$256\times256\times256$ 4096 7.8 122.9 68.7 7.8 43.2 37.6
$304\times304\times304$ 6859 9.4 147.4 85.9 9.4 44.7 54.6
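As a brief check on the weak-scaling behavior, the sketch below evaluates the ratio $T(N_{min})/T(N_p)$ for the two-level results of Case-1 in Table 11, assuming the customary definition of weak-scaling efficiency for a fixed per-processor problem size; this particular metric is our assumption and is not defined in the text.

```python
# Weak-scaling efficiency T(N_min)/T(N_p) for the two-level results of Case-1 (Table 11).
procs = [64, 256, 576, 1024, 2304, 4096]
times = [35.2, 54.3, 67.2, 90.0, 131.8, 197.1]   # two-level computing times from Table 11

t_ref = times[0]
for n_p, t in zip(procs, times):
    print(f"Np = {n_p:5d}  weak-scaling efficiency = {t_ref / t:4.2f}")
```

Note that the nonlinear iteration count also grows with the problem size in this test (from 3.8 to 12.4 Newton iterations per time step for Case-1), so part of the increase in compute time reflects harder nonlinear problems rather than a loss of parallel efficiency of the preconditioner.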

In summary, the above tables and figures show that, compared with the one-level method, the two-level restricted additive Schwarz method results in a very sharp reduction in the number of linear iterations and therefore a substantial reduction in compute time. Hence, the two-level method is much more effective and scalable than the one-level approach in terms of both strong and weak scalability.

5 Conclusions

The simulation of subsurface flows with high-resolution solutions is of paramount importance in reservoir simulation. In this work, we have presented a parallel fully implicit framework based on multilevel restricted Schwarz preconditioners for subsurface flow simulations with Peng-Robinson equation of state. The proposed framework removes the restriction on the time step size, and it is flexible enough to allow the construction of different types of multilevel preconditioners from a rich set of additive or multiplicative strategies and interpolation and restriction operators. After experimenting with many different overlapping Schwarz type preconditioners, we found that the class of V-cycle-type restricted Schwarz methods based on the second-order scheme is extremely beneficial for the problems under investigation. Numerical experiments also showed that the proposed algorithms and simulators are robust and scalable for the large-scale solution of standard benchmarks as well as realistic problems with highly heterogeneous permeabilities in petroleum reservoirs.

Acknowledgment

The authors would like to express their appreciation to the anonymous reviewers for the invaluable comments that have greatly improved the quality of the manuscript.

References

  • [1] S. Balay, J. Brown, K. Buschelman, V. Eijkhout, W. Gropp, D. Kaushik, M. Knepley, L.C. McInnes, B.F. Smith, H. Zhang, PETSc Users Manual, Argonne National Laboratory, 2019.
  • [2] Q.M. Bui, H.C. Elman, J.D. Moulton, Algebraic multigrid preconditioners for multiphase flow in porous media, SIAM J. Sci. Comput. 39 (2017) S662–S680.
  • [3] Q.M. Bui, L. Wang, D. Osei-Kuffuor, Algebraic multigrid preconditioners for two-phase flow in porous media with phase transitions, Adv. Water Resour. 114 (2018) 19–28.
  • [4] X.-C. Cai, M. Dryja, M. Sarkis, Restricted additive Schwarz preconditioners with harmonic overlap for symmetric positive definite linear systems, SIAM J. Numer. Anal. 41 (2003) 1209–1231.
  • [5] X.-C. Cai, W.D. Gropp, D. E. Keyes, R. G. Melvin, D. P. Young, Parallel Newton–Krylov–Schwarz algorithms for the transonic full potential equation, SIAM J. Sci. Comput. 19 (1998) 246–265.
  • [6] X.-C. Cai, M. Sarkis, A restricted additive Schwarz preconditioner for general sparse linear systems, SIAM J. Sci. Comput. 21 (1999) 792–797.
  • [7] X.-C. Cai, D.E. Keyes, Nonlinearly preconditioned inexact Newton algorithms, SIAM J. Sci. Comput. 24 (2002) 183–200.
  • [8] Z. Chen, G. Huan, Y. Ma, Computational Methods for Multiphase Flows in Porous Media, SIAM, Philadelphia, PA, 2006.
  • [9] Z. Chen, Reservoir Simulation: Mathematical Techniques in Oil Recovery, SIAM, Philadelphia, PA, 2007.
  • [10] M. Christie, M. Blunt, Tenth SPE comparative solution project: A comparison of upscaling techniques, in SPE Reservoir Simulation Symposium, Society of Petroleum Engineers, 2001, pp. 308–317.
  • [11] C.N. Dawson, H. Klíe, M.F. Wheeler, C.S. Woodward, A parallel, implicit, cell-centered method for two-phase flow with a preconditioned Newton–Krylov solver, Computat. Geosci. 1 (1997) 215–249.
  • [12] J.E. Dennis, R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, 1996.
  • [13] A. Firoozabadi, Thermodynamics of Hydrocarbon Reservoirs, McGraw-Hill, New York, 1999.
  • [14] J.B. Haga, H. Osnes, H.P. Langtangen, A parallel block preconditioner for large-scale poroelasticity with highly heterogeneous material parameters, Comput. Geosci. 16 (2012) 723–734.
  • [15] D.A. Knoll, D.E. Keyes, Jacobian-free Newton–Krylov methods: a survey of approaches and applications, J. Comput. Phys. 193 (2004) 357–397.
  • [16] F. Kong, X.-C. Cai, A highly scalable multilevel Schwarz method with boundary geometry preserving coarse spaces for 3D elasticity problems on domains with complex geometry, SIAM J. Sci. Comput. 38 (2016) C73–C95.
  • [17] S. Lacroix, Y.V. Vassilevski, J.A. Wheeler, M.F. Wheeler, Iterative solution methods for modeling multiphase flow in porous media fully implicitly, SIAM J. Sci. Comput. 25 (2003) 905–926.
  • [18] J.J. Lee, K.-A. Mardal, R. Winther, Parameter-robust discretization and preconditioning of Biot's consolidation model, SIAM J. Sci. Comput. 39 (2017) A1–A24.
  • [19] K. Lie, S. Krogstad, I. Ligaarden, J. Natvig, H. Nilsen, B. Skaflestad, Open-source MATLAB implementation of consistent discretisations on complex grids, Comput. Geosci. 16 (2012) 297–322.
  • [20] L. Liu, D.E. Keyes, R. Krause, A note on adaptive nonlinear preconditioning techniques, SIAM J. Sci. Comput. 40 (2018) A1171–A1186.
  • [21] Y. Liu, H. Yang, Z. Xie, P. Qin, R. Li, Parallel simulation of variably saturated soil water flows by fully implicit domain decomposition methods, J. Hydrol. 582 (2020) 124481.
  • [22] J.E.P. Monteagudo, A. Firoozabadi, Comparison of fully implicit and IMPES formulations for simulation of water injection in fractured and unfractured media, Int. J. Numer. Meth. Engng. 69 (2007) 698–728.
  • [23] E.E. Prudencio, X.-C. Cai, Parallel multilevel restricted Schwarz preconditioners with pollution removing for PDE-constrained optimization, SIAM J. Sci. Comput. 29 (2007) 964–985.
  • [24] C. Qiao, S. Wu, J. Xu, C.S. Zhang, Analytical decoupling techniques for fully implicit reservoir simulation. J. Comput. Phys. 336 (2017) 664–681.
  • [25] Y. Saad, Iterative Methods for Sparse Linear Systems. SIAM, Second ed., 2003.
  • [26] J.O. Skogestad, E. Keilegavlen, J.M. Nordbotten, Domain decomposition strategies for nonlinear flow problems in porous media, J. Comput. Phys. 234 (2013) 439–451.
  • [27] G. Singh, M.F. Wheeler, A space-time domain decomposition approach using enhanced velocity mixed finite element method, J. Comput. Phys. 374 (2018) 893–911.
  • [28] B. Smith, P. Bjørstad, W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, 1996.
  • [29] A. Toselli, O. Widlund, Domain Decomposition Methods-Algorithms and Theory, Springer, Berlin, 2005.
  • [30] J.R. Wallis, Incomplete Gaussian elimination as a preconditioning for generalized conjugate gradient acceleration, in SPE Reservoir Simulation Symposium, Society of Petroleum Engineers, 1983.
  • [31] J.R. Wallis, R.P. Kendall, L.E. Little, Constrained residual acceleration of conjugate residual methods, in SPE Reservoir Simulation Symposium, Society of Petroleum Engineers, 1985.
  • [32] K. Wang, H. Liu, Z. Chen, A scalable parallel black oil simulator on distributed memory parallel computers, J. Comput. Phys. 301 (2015) 19–34.
  • [33] J.A. White, N. Castelletto, H.A. Tchelepi, Block-partitioned solvers for coupled poromechanics: A unified framework, Comput. Methods Appl. Mech. Engrg. 303 (2016) 55–74.
  • [34] H. Yang, F.-N. Hwang, X.-C. Cai, Nonlinear preconditioning techniques for full-space Lagrange-Newton solution of PDE-constrained optimization problems, SIAM J. Sci. Comput. 38 (2016) A2756–A2778.
  • [35] H. Yang, Y. Li, S. Sun, Nonlinearly preconditioned constraint-preserving algorithms for subsurface three-phase flow with capillarity, Comput. Methods Appl. Mech. Engrg. 367 (2020) 113140.
  • [36] H. Yang, S. Sun, Y. Li, C. Yang, A scalable fully implicit framework for reservoir simulation on parallel computers, Comput. Methods Appl. Mech. Engrg. 330 (2018) 334–350.
  • [37] H. Yang, S. Sun, Y. Li, C. Yang, A fully implicit constraint-preserving simulator for the black oil model of petroleum reservoirs, J. Comput. Phys. 396 (2019) 347–363.
  • [38] H. Yang, S. Sun, C. Yang, Nonlinearly preconditioned semismooth Newton methods for variational inequality solution of two-phase flow in porous media, J. Comput. Phys. 332 (2017) 1–20.
  • [39] H. Yang, C. Yang, S. Sun, Active-set reduced-space methods with nonlinear elimination for two-phase flow problems in porous media, SIAM J. Sci. Comput. 38 (2016) B593–B618.