
libEMM: A fictitious wave domain 3D CSEM modelling library bridging sequential and parallel GPU implementations

Pengliang Yang1
1 School of Mathematics, Harbin Institute of Technology, 150001, Harbin, China
E-mail: [email protected]
Abstract

This paper delivers a software package, libEMM, for 3D controlled-source electromagnetics (CSEM) modelling in the fictitious wave domain, based on a newly developed high-order finite-difference time-domain (FDTD) method on non-uniform grids. The numerical simulation can be carried out over a number of parallel processes using an MPI-based high performance computing architecture. The FDTD kernel coded in C has been parallelized with OpenMP for speedup using local shared memory. In addition, the software features a GPU implementation of the same algorithm based on the CUDA programming language, which can be cross-validated and compared in terms of efficiency. A perspective of libEMM on the horizon is its application to 3D CSEM inversion in land and marine environments.

1 Introduction

Controlled-source electromagnetics (CSEM) has become a well established technology for detecting hydrocarbon bearing formations in geophysical exploration (Chave and Cox, 1982; Constable et al., 1986). CSEM has been applied in airborne (Chang-Chun et al., 2015) and cross-well (Alumbaugh and Newman, 1997) geometries, in land (Ward and Hohmann, 1988; Zhdanov and Keller, 1994; Grayver et al., 2014) and/or marine environments (Eidesmo et al., 2002; Ellingsrud et al., 2002; Constable, 2010; MacGregor and Tomlinson, 2014). These applications are based on the high sensitivity of low-frequency electromagnetic signals to high resistivity contrasts in the subsurface. This makes CSEM an ideal tool for prospect de-risking (MacGregor et al., 2007) when a decision on well placement has to be made prior to drilling (Meju, 2019).

CSEM inversion of electromagnetic data is a powerful imaging approach for delineating strong resistive anomalies, thus helping to decipher the potential hydrocarbon distribution in the Earth. CSEM inversion relies on the repeated application of a 3D numerical modelling engine based on the diffusive Maxwell equation. To this end, different methods have been developed, e.g., the frequency-domain finite-difference method (Newman and Alumbaugh, 1995; Smith, 1996; Mulder, 2006; Streich, 2009), the frequency-domain finite-element method (Li and Key, 2007; da Silva et al., 2012; Key, 2016; Rochlitz et al., 2019) and the time-domain finite-difference method (Oristaglio and Hohmann, 1984; Wang and Hohmann, 1993; Taflove and Hagness, 2005).

A number of open source codes have been released for CSEM modelling in the public domain, such as SimPEG (Cockett et al., 2015), MARE2DEM (Key, 2016), PETGEM (Castillo-Reyes et al., 2018), custEM (Rochlitz et al., 2019) and emg3d (Werthmüller et al., 2019). Among them, MARE2DEM, written in Fortran, is dedicated to 2D modelling and inversion, while the remaining four are built in Python and work in 3D. Both SimPEG and emg3d allow forward simulation and estimation of the gradient in inversion mode. The forward simulation of SimPEG (Cockett et al., 2015) and emg3d (Werthmüller et al., 2019) relies on finite-volume meshes, while PETGEM (Castillo-Reyes et al., 2018) and custEM (Rochlitz et al., 2019) are designed for accurate modelling of complicated models with finite elements. All these software packages are based on a frequency domain approach, solving the linear system arising from the discretization of the frequency domain Maxwell equation. Some of them support the output of the EM field as time series, by first modelling at judiciously selected frequencies and then Fourier transforming the frequency domain field into the time domain (Rochlitz et al., 2021).

The main contribution of this paper is to deliver a time-domain 3D CSEM modelling code, libEMM. The code implements our newly developed high-order FDTD scheme running on both uniform and nonuniform (NU) staggered grids (Yang and Mittet, 2023), using the efficient fictitious wave domain approach to 3D CSEM modelling (Mittet, 2010). The key difference between libEMM and other software packages is the computation of the EM field by a direct discretization of the converted Maxwell equation in the fictitious time domain. Significant improvements in accuracy and computational efficiency have been achieved with this method compared to the commonly used 2nd-order scheme on a nonuniform grid (Yang and Mittet, 2023).

This paper focuses on the computer implementation in a software framework rather than the numerical justification of the method in terms of accuracy. Besides the implementation using the message passing interface (MPI) over a number of distributed CPU cores, the paper also presents a number of key techniques to achieve multithreaded GPU parallelization. This is also the first time that the GPU implementation of this method is extended from a uniform grid (Yang, 2021) to a nonuniform grid and released as open source software, in the hope of wide adoption. Consequently, the coding of libEMM involves programming in C and the Nvidia compute unified device architecture (CUDA), parallelized with MPI. The modelling jobs can then be launched using shell scripts on a high performance computing cluster.

The paper is organized as follows. First we briefly review the methodology of CSEM modelling in the fictitious wave domain. Then we present the key ingredients of an effective implementation, including the boundary conditions (the top air-water boundary condition and the absorbing boundary condition on the other sides), the time stepping, the medium homogenization and the optimal grid generation. To facilitate CSEM modelling on GPU-enabled hardware, we also list several strategies to optimize the code using the CUDA programming language. After that, we summarize the main features of libEMM from a software usage perspective. Then, application examples are given together with a performance analysis. A final remark is made towards potential applications in real 3D CSEM inversion.

2 CSEM modelling in fictitious wave domain

2.1 The diffusive Maxwell equation

The CSEM physics is governed by the diffusive Maxwell equation (consisting of Faraday’s law and Ampère’s law), which reads in the frequency domain

\begin{cases}\nabla\times\mathbf{E}-\mathrm{i}\omega\mu\mathbf{H}=-\mathbf{M}\\ -\nabla\times\mathbf{H}+\sigma\mathbf{E}=-\mathbf{J}\end{cases} (1)

where $\mathbf{E}=(E_{x},E_{y},E_{z})^{T}$ and $\mathbf{H}=(H_{x},H_{y},H_{z})^{T}$ are the electrical and magnetic fields; $\mathbf{J}=(J_{x},J_{y},J_{z})^{T}$ and $\mathbf{M}=(M_{x},M_{y},M_{z})^{T}$ are the electrical and magnetic sources. Here, we slightly abuse notation by using the same symbol for the same quantity when switching between the time domain and the frequency domain, where the Fourier transform convention $\partial_{t}\leftrightarrow-\mathrm{i}\omega$ has been adopted. The magnetic permeability is $\mu$. The conductivity is a symmetric $3\times 3$ tensor:

\sigma=\left[\begin{array}{lll}\sigma_{xx}&\sigma_{xy}&\sigma_{xz}\\ \sigma_{yx}&\sigma_{yy}&\sigma_{yz}\\ \sigma_{zx}&\sigma_{zy}&\sigma_{zz}\end{array}\right] (2)

in which $\sigma_{ij}=\sigma_{ji}$, $i,j\in\{x,y,z\}$. In an isotropic medium only the diagonal elements of the conductivity tensor are non-zero and they are the same in all directions: $\sigma_{xx}=\sigma_{yy}=\sigma_{zz}$; $\sigma_{ij}=0,\;i\neq j$. A vertical transverse isotropic (VTI) medium also has only diagonal elements, that is, $\sigma_{h}:=\sigma_{xx}=\sigma_{yy},\;\sigma_{v}:=\sigma_{zz}$, where $\sigma_{h}$ and $\sigma_{v}$ stand for the horizontal and vertical conductivity, respectively. The resistivity, which is the inverse of the conductivity ($\rho_{ij}=1/\sigma_{ij}$), is also frequently used in CSEM.

2.2 Fictitious wave domain modelling

By defining a fictitious dielectric permittivity $\varepsilon$ such that $\sigma=2\omega_{0}\varepsilon$, and multiplying the second equation by $\sqrt{-\mathrm{i}\omega/2\omega_{0}}$, Mittet (2010) transformed the equation into

\begin{cases}\nabla\times\mathbf{E}^{\prime}-\mathrm{i}\omega^{\prime}\mu\mathbf{H}^{\prime}=-\mathbf{M}^{\prime}\\ -\nabla\times\mathbf{H}^{\prime}-\mathrm{i}\omega^{\prime}\varepsilon\mathbf{E}^{\prime}=-\mathbf{J}^{\prime}\end{cases} (3)

where we introduce a prime to define the resulting wave domain fields via

\mathbf{E}^{\prime}=\mathbf{E},\quad\mathbf{H}^{\prime}=\sqrt{\frac{-\mathrm{i}\omega}{2\omega_{0}}}\mathbf{H},\quad\mathbf{M}^{\prime}=\mathbf{M},\quad\mathbf{J}^{\prime}=\sqrt{\frac{-\mathrm{i}\omega}{2\omega_{0}}}\mathbf{J},\quad\omega^{\prime}=(1+\mathrm{i})\sqrt{\omega\omega_{0}}. (4)

Equation (3) may be transformed into time domain as

\begin{cases}\nabla\times\mathbf{E}^{\prime}+\mu\partial_{t}\mathbf{H}^{\prime}=-\mathbf{M}^{\prime}\\ -\nabla\times\mathbf{H}^{\prime}+\varepsilon\partial_{t}\mathbf{E}^{\prime}=-\mathbf{J}^{\prime}\end{cases} (5)

From the electromagnetic fields in the wave domain, the frequency-domain fields can be computed on the fly during modelling using the fictitious wave transformation

E_{i}^{\prime}(\mathbf{x},\omega^{\prime})=\int_{0}^{T_{\max}}E_{i}^{\prime}(\mathbf{x},t)e^{\mathrm{i}\omega^{\prime}t}\mathrm{d}t, (6a)
H_{i}^{\prime}(\mathbf{x},\omega^{\prime})=\int_{0}^{T_{\max}}H_{i}^{\prime}(\mathbf{x},t)e^{\mathrm{i}\omega^{\prime}t}\mathrm{d}t, (6b)

where $i\in\{x,y,z\}$ is the component index of the electromagnetic fields; $T_{\max}$ is the final time, at which the electromagnetic fields have reached their steady state.

The Green's function for a point source is then obtained by normalizing the electromagnetic field with the source current. In the absence of a magnetic source ($\mathbf{M}=0$), the Green's functions due to the electrical current $J_{j}(\mathbf{x}_{s},\omega)$ are

G_{ij}^{EE}(\mathbf{x},\omega;\mathbf{x}_{s})=\frac{E_{i}(\mathbf{x},\omega;\mathbf{x}_{s})}{J_{j}(\mathbf{x}_{s},\omega)}=\sqrt{\frac{-\mathrm{i}\omega}{2\omega_{0}}}\frac{E_{i}^{\prime}(\mathbf{x},\omega;\mathbf{x}_{s})}{J_{j}^{\prime}(\mathbf{x}_{s},\omega)}, (7a)
G_{ij}^{HE}(\mathbf{x},\omega;\mathbf{x}_{s})=\frac{H_{i}(\mathbf{x},\omega;\mathbf{x}_{s})}{J_{j}(\mathbf{x}_{s},\omega)}=\frac{H_{i}^{\prime}(\mathbf{x},\omega;\mathbf{x}_{s})}{J_{j}^{\prime}(\mathbf{x}_{s},\omega)}, (7b)

where $G_{ij}^{EE}(\mathbf{x},\omega;\mathbf{x}_{s})$ and $G_{ij}^{HE}(\mathbf{x},\omega;\mathbf{x}_{s})$ stand for the $i$th components of the electrical and magnetic Green's functions for angular frequency $\omega$ at the spatial location $\mathbf{x}$.

3 Computer implementation

In this section we present the key ingredients to efficiently implement fictitious wave domain CSEM modelling based on our newly developed high-order FDTD method on a non-uniform grid (Yang and Mittet, 2023). For the convenience of code implementation, we define a data structure for the electromagnetic fields and use the pointer emf to access all relevant parameters.
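For orientation, a minimal sketch of such a data structure is given below. The member list is illustrative and not identical to the one shipped with libEMM; only members quoted elsewhere in this paper (e.g. rd, nb, ne, n1pad, v3s, expfactor, rho11) are taken from the text.

/* A pared-down sketch of the electromagnetic-field data structure accessed
 * through the pointer emf; member names are illustrative. */
typedef struct {
  int n1, n2, n3;                      /* grid dimensions in x, y, z          */
  int nb, ne, nbe;                     /* CPML layers, buffers, nbe = nb + ne */
  int n1pad, n2pad, n3pad;             /* padded grid dimensions              */
  int rd;                              /* half length of the FD stencil       */
  int nfreq;                           /* number of output frequencies        */
  float ***rho11, ***rho22, ***rho33;  /* input resistivities                 */
  float ***E1, ***E2, ***E3;           /* electrical field components         */
  float ***H1, ***H2, ***H3;           /* magnetic field components           */
  float **v3s;                         /* tabulated FD coefficients along z   */
  float _Complex **expfactor;          /* DTFT factors [time step][frequency] */
} emf_t;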

3.1 High order FDTD over nonuniform grid

Different from the attenuative formulation of Maaø (2007), equation (3) is a pure wave-domain equation, which can be discretized with a leap-frog finite-difference scheme on the staggered (Yee) grid:

\begin{cases}H_{i}^{\prime\,n+1/2}=H_{i}^{\prime\,n-1/2}-\Delta t\,\mu^{-1}\left(\xi_{ijk}\partial_{j}E_{k}^{\prime\,n}+M_{i}^{\prime\,n}\right)\\ E_{i}^{\prime\,n+1}=E_{i}^{\prime\,n}+\Delta t\,(\varepsilon^{-1})_{ij}\left(\xi_{jkl}\partial_{k}H_{l}^{\prime\,n+1/2}-J_{j}^{\prime\,n+1/2}\right)\end{cases} (8)

where $\Delta t$ is the temporal sampling and $\xi_{ijk}$ is the Levi-Civita symbol. The positioning of each field on the nonuniform staggered grid is the same as for staggered FDTD on a uniform grid. The major difference between FDTD implementations on uniform and nonuniform grids lies in the discretization of the spatial derivatives ($\partial_{j}E^{\prime}_{k}$ and $\partial_{k}H^{\prime}_{l}$).

In order to compute the electromagnetic field as well as its derivatives with arbitrary grid spacing, we have to perform polynomial interpolation using a number of knots $x_{0},x_{1},\cdots,x_{n}$. According to the Taylor expansion, we have

f(x_{i})=f(x)+f^{\prime}(x)(x_{i}-x)+\frac{1}{2}f^{\prime\prime}(x)(x_{i}-x)^{2}+\cdots+\frac{1}{n!}f^{(n)}(x)(x_{i}-x)^{n}+\cdots (9)

where $i=0,1,\cdots,n$. Let us define $a_{i}(x):=f^{(i)}(x)/i!$ and use the first $n+1$ distinct nodes $x_{0},x_{1},\cdots,x_{n}$ to build a matrix system

\underbrace{\begin{bmatrix}f(x_{0})\\ f(x_{1})\\ \vdots\\ f(x_{n})\end{bmatrix}}_{\mathbf{f}}=\underbrace{\begin{bmatrix}1&x_{0}-x&(x_{0}-x)^{2}&\cdots&(x_{0}-x)^{n}\\ 1&x_{1}-x&(x_{1}-x)^{2}&\cdots&(x_{1}-x)^{n}\\ \vdots&&\ddots&&\vdots\\ 1&x_{n}-x&(x_{n}-x)^{2}&\cdots&(x_{n}-x)^{n}\end{bmatrix}}_{\mathbf{V}^{T}(x_{0}-x,\cdots,x_{n}-x)}\underbrace{\begin{bmatrix}a_{0}(x)\\ a_{1}(x)\\ \vdots\\ a_{n}(x)\end{bmatrix}}_{\mathbf{a}} (10)

where $\mathbf{V}^{T}(x_{0}-x,\cdots,x_{n}-x)$ is the transpose of the Vandermonde matrix determined by $x_{0}-x,\cdots,x_{n}-x$. The above expression implies that the function $f(x)$ and its derivatives up to $n$-th order at an arbitrary location $x$ can be found by inverting the Vandermonde matrix, $\mathbf{a}=[\mathbf{V}^{T}]^{-1}\mathbf{f}$. When the order of the Vandermonde matrix increases, its inversion becomes highly ill-conditioned. Fortunately, there exists an accurate and efficient algorithm (Björck and Pereyra, 1970) to achieve this challenging job, which exploits the equivalence between Vandermonde matrix inversion and Lagrange polynomial interpolation. The following code snippet implements it following (Golub, 1996, Algorithm 4.6.1).

/* solve the vandermonde system V^T(x0,x1,...,xn) a = f
 * where the n+1 points are prescribed by vector x=[x0,x1,...,xn];
 * the solution is stored in vector a.
 */
void vandermonde(int n, float *x, float *a, float *f)
{
  int i,k;

  /* calculate Newton representation of the interpolating polynomial */
  for(i=0; i<=n; ++i) a[i] = f[i];

  for(k=0; k<n; k++){
    for(i=n; i>k; i--){
      a[i] = (a[i]-a[i-1])/(x[i]-x[i-k-1]);
    }
  }

  /* convert the Newton form into monomial coefficients by synthetic division */
  for(k=n-1; k>=0; k--){
    for(i=k; i<n; i++){
      a[i] -= a[i+1]*x[k];
    }
  }
}

Let the $i$-th row, $j$-th column element of the inverse matrix $[\mathbf{V}^{T}]^{-1}$ be $w_{ij}$, i.e. $([\mathbf{V}^{T}]^{-1})_{ij}=w_{ij},\;i,j=0,\cdots,n$. It follows that

\underbrace{\begin{bmatrix}a_{0}(x)\\ a_{1}(x)\\ \vdots\\ a_{n}(x)\end{bmatrix}}_{\mathbf{a}}:=\underbrace{\begin{bmatrix}w_{00}&w_{01}&\cdots&w_{0n}\\ w_{10}&w_{11}&\cdots&w_{1n}\\ \vdots&&&\vdots\\ w_{n0}&w_{n1}&\cdots&w_{nn}\end{bmatrix}}_{[\mathbf{V}^{T}]^{-1}}\underbrace{\begin{bmatrix}f(x_{0})\\ f(x_{1})\\ \vdots\\ f(x_{n})\end{bmatrix}}_{\mathbf{f}} (11)

There are a number of situations that require the same interpolation weights repeatedly at every time step:

  1. The same finite difference coefficients are used at the same grid locations to carry out the numerical modelling;

  2. The same interpolation weights may be used to record the fields at the same receiver location at different time steps;

  3. The injection of the source time function requires the same interpolation weights to be applied.

The last two cases are due to the fact that the sources and the receivers may reside at arbitrary locations, not necessarily on the grid. As a consequence, it is necessary to tabulate these weights instead of repeating the computation of the inversion algorithm. Since applying the algorithm in operator form does not require the matrix coefficients explicitly, it is not obvious how to obtain the interpolation weights $w_{ij}$. We present an effective way to obtain them here. The idea is the following: feeding the algorithm with a vector $\mathbf{f}$ whose $j$th element is 1 and all other elements are 0, the output of the Vandermonde inversion algorithm is the $j$th column of the matrix $[\mathbf{V}^{T}]^{-1}$:

\mathbf{f}=[0,\cdots,1,\cdots,0]^{T},\quad\mathbf{a}=[w_{0j},\cdots,w_{nj}]^{T}. (12)

By looping over the index $j$ from 0 to $n$, we obtain all columns of $[\mathbf{V}^{T}]^{-1}$. Over the rectilinear non-uniform grid, we construct the multi-dimensional finite difference stencil as the tensor product of three 1D interpolation operators.
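The following sketch illustrates this column-by-column tabulation using the vandermonde() routine listed above; the wrapper name and the storage of the weights are illustrative.

#include <stdlib.h> /* malloc, free */

/* Tabulate the full inverse [V^T]^{-1} column by column by feeding unit
 * vectors to the Vandermonde solver; w[i][j] stores the weight w_ij. */
void tabulate_weights(int n, float *x, float **w)
{
  int i, j;
  float *f = malloc((n+1)*sizeof(float));
  float *a = malloc((n+1)*sizeof(float));

  for(j=0; j<=n; j++){
    for(i=0; i<=n; i++) f[i] = (i==j)? 1.0f : 0.0f; /* j-th unit vector */
    vandermonde(n, x, a, f);                        /* a = j-th column  */
    for(i=0; i<=n; i++) w[i][j] = a[i];
  }
  free(f);
  free(a);
}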

Consider the staggered finite difference approximation of the first derivative in the $x$ direction using $2r$ non-equidistant nodes, centered at $x=x_{i+1/2}$ for the magnetic components and at $x=x_{i}$ for the electric components. The forward operator $D_{x}^{+}$ discretizing the 1st derivative of the electrical field is given by

\begin{split}D_{x}^{+}u|_{x=x_{i+1/2}}=&\Big(c^{+}_{1}(x_{i+1/2})u(x_{i+1})+c^{+}_{0}(x_{i+1/2})u(x_{i})\Big)\\ +&\Big(c^{+}_{2}(x_{i+1/2})u(x_{i+2})+c^{+}_{-1}(x_{i+1/2})u(x_{i-1})\Big)\\ +&\cdots\\ +&\Big(c^{+}_{r}(x_{i+1/2})u(x_{i+r})+c^{+}_{-r+1}(x_{i+1/2})u(x_{i-r+1})\Big)\end{split} (13)

and the backward operator $D_{x}^{-}$ discretizing the 1st derivative of the magnetic field is

\begin{split}D_{x}^{-}u|_{x=x_{i}}=&\Big(c^{-}_{1}(x_{i})u(x_{i+1/2})+c^{-}_{0}(x_{i})u(x_{i-1/2})\Big)\\ +&\Big(c^{-}_{2}(x_{i})u(x_{i+3/2})+c^{-}_{-1}(x_{i})u(x_{i-3/2})\Big)\\ +&\cdots\\ +&\Big(c^{-}_{r}(x_{i})u(x_{i+r-1/2})+c^{-}_{-r+1}(x_{i})u(x_{i-r+1/2})\Big),\end{split} (14)

where we have paired the differencing terms to highlight the similarity to staggered finite differencing on a uniform grid. The coefficients $c_{j}^{+}$ and $c_{j}^{-}$, $j=-r+1,\cdots,r$, are the 2nd rows of the inverses of $\mathbf{V}^{T}(x_{i+r}-x_{i+1/2},\cdots,x_{i-r+1}-x_{i+1/2})$ and $\mathbf{V}^{T}(x_{i+r-1/2}-x_{i},\cdots,x_{i-r+1/2}-x_{i})$ in (11), respectively. Using $2r$ nodes, we achieve accuracy up to $2r$-th order in space.
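As a quick sanity check, take $r=1$ on a uniform grid with spacing $\Delta x$: the two nodes $x_{i}$ and $x_{i+1}$ are offset from $x_{i+1/2}$ by $\mp\Delta x/2$, so that

\mathbf{V}^{T}=\begin{bmatrix}1&-\Delta x/2\\ 1&\Delta x/2\end{bmatrix},\qquad[\mathbf{V}^{T}]^{-1}=\begin{bmatrix}1/2&1/2\\ -1/\Delta x&1/\Delta x\end{bmatrix},

and the 2nd row reproduces the familiar 2nd-order staggered difference $D_{x}^{+}u|_{x_{i+1/2}}=\big(u(x_{i+1})-u(x_{i})\big)/\Delta x$.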

Denote the radius $r$ by emf->rd. Using Vandermonde matrix inversion, we compute the finite difference coefficients of $2r$-th order and store them in emf->v3s (here 3 denotes the z direction and s means grid staggering) to compute $\partial_{z}H_{x}$ and $\partial_{z}H_{y}$, yielding

//i1min,i1max=lower and upper bounds for i1 - x direction
//i2min,i2max=lower and upper bounds for i2 - y direction
//i3min,i3max=lower and upper bounds for i3 - z direction
for(i3=i3min; i3<=i3max; ++i3){
  for(i2=i2min; i2<=i2max; ++i2){
    for(i1=i1min; i1<=i1max; ++i1){
     ...
     if(emf->rd==1){
       D3H2 = emf->v3s[i3][0]*emf->H2[i3-1][i2][i1]
         + emf->v3s[i3][1]*emf->H2[i3][i2][i1];
       D3H1 = emf->v3s[i3][0]*emf->H1[i3-1][i2][i1]
         + emf->v3s[i3][1]*emf->H1[i3][i2][i1];
     }else if(emf->rd==2){
       D3H2 = emf->v3s[i3][0]*emf->H2[i3-2][i2][i1]
         + emf->v3s[i3][1]*emf->H2[i3-1][i2][i1]
         + emf->v3s[i3][2]*emf->H2[i3][i2][i1]
         + emf->v3s[i3][3]*emf->H2[i3+1][i2][i1];
       D3H1 = emf->v3s[i3][0]*emf->H1[i3-2][i2][i1]
         + emf->v3s[i3][1]*emf->H1[i3-1][i2][i1]
         + emf->v3s[i3][2]*emf->H1[i3][i2][i1]
         + emf->v3s[i3][3]*emf->H1[i3+1][i2][i1];
     }else if(emf->rd==3){
       D3H2 = emf->v3s[i3][0]*emf->H2[i3-3][i2][i1]
         + emf->v3s[i3][1]*emf->H2[i3-2][i2][i1]
         + emf->v3s[i3][2]*emf->H2[i3-1][i2][i1]
         + emf->v3s[i3][3]*emf->H2[i3][i2][i1]
         + emf->v3s[i3][4]*emf->H2[i3+1][i2][i1]
         + emf->v3s[i3][5]*emf->H2[i3+2][i2][i1];
       D3H1 = emf->v3s[i3][0]*emf->H1[i3-3][i2][i1]
         + emf->v3s[i3][1]*emf->H1[i3-2][i2][i1]
         + emf->v3s[i3][2]*emf->H1[i3-1][i2][i1]
         + emf->v3s[i3][3]*emf->H1[i3][i2][i1]
         + emf->v3s[i3][4]*emf->H1[i3+1][i2][i1]
         + emf->v3s[i3][5]*emf->H1[i3+2][i2][i1];
     }
     ...
    }
  }
}

The above procedure allows us to use a high order finite difference scheme to accurately compute the electromagnetic fields and their derivatives with arbitrary grid spacing on the rectilinear grid. This opens the door for CSEM modelling using high order FDTD on a nonuniform grid in a consistent framework.

3.2 Stability condition and dispersion error

Following the standard von Neumann analysis, we have proved (Yang and Mittet, 2023) that the following condition must be satisfied on a generic rectilinear nonuniform grid for stable numerical modelling

\eta\Delta t\leq 1\quad\text{with}\quad\eta=\frac{1}{2}v_{\max}\sqrt{(D_{x}^{\max})^{2}+(D_{y}^{\max})^{2}+(D_{z}^{\max})^{2}}, (15)

where $v_{\max}=\max\{v\}$, in which $v:=1/\sqrt{\mu\varepsilon}$ is the velocity of the propagating EM field; $D_{x}^{\max}$, $D_{y}^{\max}$ and $D_{z}^{\max}$ are the maximum values of the discretized first derivative operators along the $x$, $y$ and $z$ directions. In particular, we have

D_{x}^{\max}=\max\left(\sum_{i=-r+1}^{r}|c_{i}^{+}|,\;\sum_{i=-r+1}^{r}|c_{i}^{-}|\right) (16)

and similar estimates for $D_{y}^{\max}$ and $D_{z}^{\max}$. To minimize the number of time steps $N_{t}$, we would like to use the largest possible $\Delta t$ without violating the above stability condition. The temporal sampling $\Delta t$ is therefore determined automatically from the pre-determined grid spacing, i.e., $\Delta t=0.99/\eta$.
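A minimal sketch of this automatic choice is given below; the function and variable names are illustrative, with $D_{x}^{\max}$, $D_{y}^{\max}$ and $D_{z}^{\max}$ assumed to be computed from the tabulated coefficients as in equation (16).

#include <math.h>

/* Choose the time step from the stability bound (15); vmax = max(1/sqrt(mu*eps)),
 * and Dxmax, Dymax, Dzmax follow equation (16). Sketch only. */
float choose_dt(float vmax, float Dxmax, float Dymax, float Dzmax)
{
  float eta = 0.5f*vmax*sqrtf(Dxmax*Dxmax + Dymax*Dymax + Dzmax*Dzmax);
  return 0.99f/eta; /* stay slightly below the stability limit */
}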

As shown in the Appendix, the dispersion error on the uniform grid can easily be measured. Over a non-uniform grid, however, the discretized spatial derivative operators in (13) and (14) immediately make the dispersion analysis difficult (the wavenumbers are also space dependent). In view of the difficulty of practically measuring the dispersion error on a nonuniform grid, we do not control the dispersion error in the wave domain. Another solid reason for not doing so is that the final electromagnetic field we compute lives in the diffusion domain instead of the wave domain. Physically speaking, the diffusive EM field is dispersive in itself, thus the physical dispersion will dominate the computed field after conversion from the wave domain into the diffusion domain. This means the numerical dispersion error in the wave domain will eventually be annihilated by the physical dispersion. Note that in our method the physical dispersion is handled analytically according to equation (4), and is thus free of numerical error.

3.3 Top boundary condition

At each time step, the top boundary (the air-water interface in marine CSEM or the air-formation interface in land CSEM) must be properly implemented for high order FDTD (Wang and Hohmann, 1993; Oristaglio and Hohmann, 1984). In the air, the conductivity is zero. Without an electrical source, Ampère's law then reduces to

\nabla\times\mathbf{H}=0, (17)

which leads to the wave-number domain relation

H_{x}=\frac{k_{x}}{k_{z}}H_{z},\quad H_{y}=\frac{k_{y}}{k_{z}}H_{z}, (18)

where $k_{x}$, $k_{y}$ and $k_{z}$ are the wavenumbers in the x-, y- and z-directions. Since $\nabla\cdot\mathbf{H}=0$, from the relation $\nabla\times\nabla\times\mathbf{H}=\nabla(\nabla\cdot\mathbf{H})-\Delta\mathbf{H}$ we obtain

\Delta\mathbf{H}=0, (19)

which implies $k_{x}^{2}+k_{y}^{2}+k_{z}^{2}=0$, hence $k_{z}=\pm\mathrm{i}\sqrt{k_{x}^{2}+k_{y}^{2}}$. It is critical to choose the correct sign here such that the field vanishes at infinitely far distance ($\mathbf{H}\rightarrow 0$ as $z\rightarrow-\infty$ in the air), yielding $k_{z}=-\mathrm{i}\sqrt{k_{x}^{2}+k_{y}^{2}}$. This leads to the following implementation

H_{x}=\frac{\mathrm{i}k_{x}}{\sqrt{k_{x}^{2}+k_{y}^{2}}}H_{z},\quad H_{y}=\frac{\mathrm{i}k_{y}}{\sqrt{k_{x}^{2}+k_{y}^{2}}}H_{z}. (20)

Note that we have made a sign correction to (Mittet, 2010, equation 46). In order to use the high-order FDTD scheme, we also need to extrapolate the electrical fields by forcing (Oristaglio and Hohmann, 1984)

\Delta\mathbf{E}=0 (21)

at the air interface.

The above boundary conditions result in the extrapolation of the fields to ghost points using the Fourier transform. Because the use of the fast Fourier transform (FFT) assumes equal grid spacing, the air-water boundary condition requires a uniform grid in both the x- and y-directions. The use of a non-uniform grid in the x- and y-directions would be feasible by interpolating between the non-uniform and uniform grids, but our numerical results (Yang and Mittet, 2023) show that it leads to degraded modelling accuracy; it is therefore not implemented in libEMM.
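To make the wavenumber-domain relation (20) concrete, the sketch below extrapolates $H_{x}$ from $H_{z}$ on the uniform horizontal grid with FFTW. It is a simplified illustration (a single horizontal slice, with the ghost-point bookkeeping omitted), not the exact libEMM routine.

#include <complex.h>
#include <math.h>
#include <fftw3.h>

/* Sketch: given Hz on the n1 x n2 horizontal grid at the air interface,
 * compute Hx = i*kx/sqrt(kx^2+ky^2)*Hz according to equation (20). The
 * division by (n1*n2) accounts for FFTW's unnormalized transforms. */
void airwave_Hx_from_Hz(int n1, int n2, float d1, float d2,
                        const float *Hz, float *Hx)
{
  fftwf_complex *tmp = fftwf_malloc(sizeof(fftwf_complex)*n1*n2);
  fftwf_plan fwd = fftwf_plan_dft_2d(n2, n1, tmp, tmp, FFTW_FORWARD,  FFTW_ESTIMATE);
  fftwf_plan inv = fftwf_plan_dft_2d(n2, n1, tmp, tmp, FFTW_BACKWARD, FFTW_ESTIMATE);
  int i1, i2;

  for(i2=0; i2<n2; i2++)
    for(i1=0; i1<n1; i1++) tmp[i1+n1*i2] = Hz[i1+n1*i2];
  fftwf_execute(fwd);

  for(i2=0; i2<n2; i2++){
    float ky = 2.*M_PI*((i2<=n2/2)? i2 : i2-n2)/(n2*d2);
    for(i1=0; i1<n1; i1++){
      float kx = 2.*M_PI*((i1<=n1/2)? i1 : i1-n1)/(n1*d1);
      float k = sqrtf(kx*kx + ky*ky);
      tmp[i1+n1*i2] *= (k>0.f)? I*kx/k : 0.f; /* equation (20); zero at k=0 */
    }
  }
  fftwf_execute(inv);

  for(i2=0; i2<n2; i2++)
    for(i1=0; i1<n1; i1++) Hx[i1+n1*i2] = crealf(tmp[i1+n1*i2])/(n1*n2);

  fftwf_destroy_plan(fwd);
  fftwf_destroy_plan(inv);
  fftwf_free(tmp);
}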

3.4 Absorbing boundary condition

In order to mimic wave propagation to infinity, we apply the perfectly matched layer (PML) boundary condition (Chew and Weedon, 1994) in all directions except at the top of the model. It attenuates the artificial reflections caused by the truncated computational mesh. The application of the PML is equivalent to performing the coordinate stretching $\partial_{\tilde{x}}=s_{x}^{-1}\partial_{x}$ with a complex factor $s_{x}$ ($|s_{x}|>1$). The convolutional PML (CPML) (Roden and Gedney, 2000) uses $s_{x}=\kappa_{x}+\frac{d_{x}(x)}{\alpha_{x}+\mathrm{i}\omega}$ ($\kappa_{x}\geq 1$) such that

\partial_{\tilde{x}}=\frac{1}{\kappa_{x}}\partial_{x}+\psi_{x}, (22)

where the memory variable $\psi_{x}$ can be computed in a recursive manner during time stepping

\psi_{x}^{n+1}=b_{x}\psi_{x}^{n}+a_{x}\,(\partial_{x}u)^{n+1/2} (23)

where $u$ denotes the field component being differentiated, and

b_{x}=e^{-(d_{x}/\kappa_{x}+\alpha_{x})\Delta t},\quad a_{x}=\frac{d_{x}}{\kappa_{x}(d_{x}+\kappa_{x}\alpha_{x})}(b_{x}-1). (24)

The damping profile $d_{x}(x)$ and the constants $\kappa_{x}$ and $\alpha_{x}$ are chosen according to Komatitsch and Martin (2007). We implement the CPML in the same way for the $x$, $y$ and $z$ directions in the FDTD. Assume the resistivity/conductivity model is a cube of size n1*n2*n3. The model will be padded with nb CPML layers and ne extra buffer layers on each side:

    emf->nbe = emf->nb + emf->ne;
    emf->n1pad = emf->n1 + 2*emf->nbe;//total number of gridpoints in x
    emf->n2pad = emf->n2 + 2*emf->nbe;//total number of gridpoints in y
    emf->n3pad = emf->n3 + 2*emf->nbe;//total number of gridpoints in z

An example of the memory variables associated with $\partial_{z}H_{x}$ and $\partial_{z}H_{y}$ is given as follows:

    ...
    if(i3<emf->nb){
      emf->memD3H2[i3][i2][i1] = emf->b3[i3]*emf->memD3H2[i3][i2][i1]
                               + emf->a3[i3]*D3H2;
      emf->memD3H1[i3][i2][i1] = emf->b3[i3]*emf->memD3H1[i3][i2][i1]
                               + emf->a3[i3]*D3H1;
      D3H2 += emf->memD3H2[i3][i2][i1];
      D3H1 += emf->memD3H1[i3][i2][i1];
    }else if(i3>emf->n3pad-1-emf->nb){
      j3 = emf->n3pad-1-i3;
      k3 = j3+emf->nb;
      emf->memD3H2[k3][i2][i1] = emf->b3[j3]*emf->memD3H2[k3][i2][i1]
                               + emf->a3[j3]*D3H2;
      emf->memD3H1[k3][i2][i1] = emf->b3[j3]*emf->memD3H1[k3][i2][i1]
                               + emf->a3[j3]*D3H1;
      D3H2 += emf->memD3H2[k3][i2][i1];
      D3H1 += emf->memD3H1[k3][i2][i1];
    }
    ...

After this step, the curl operators of the fields can be obtained. For the curl of the magnetic field, the formula

\nabla\times\mathbf{H}=\begin{bmatrix}\partial_{y}H_{z}-\partial_{z}H_{y},\;\partial_{z}H_{x}-\partial_{x}H_{z},\;\partial_{x}H_{y}-\partial_{y}H_{x}\end{bmatrix}^{T}

will translate into

    ...
    emf->curlH1[i3][i2][i1] = D2H3-D3H2;
    emf->curlH2[i3][i2][i1] = D3H1-D1H3;
    emf->curlH3[i3][i2][i1] = D1H2-D2H1;
    ...

The above code handles the CPML regions and the interior region in an elegant and consistent framework.
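For completeness, the CPML coefficients in equation (24) depend only on the position within the absorbing layer and can therefore be tabulated once before time stepping. The sketch below assumes a quadratic damping profile in the spirit of Komatitsch and Martin (2007); the function and array names are illustrative.

#include <math.h>

/* Tabulate the CPML coefficients b and a of equation (24) for nb layers;
 * d0 is the maximum damping, kappa>=1 and alpha are constants. Sketch only. */
void cpml_coefficients(int nb, float dt, float d0, float kappa, float alpha,
                       float *a, float *b)
{
  int i;
  for(i=0; i<nb; i++){
    float x = (nb-i)/(float)nb;  /* 1 at the outer edge, ~0 at the inner edge */
    float d = d0*x*x;            /* illustrative damping profile d(x)         */
    b[i] = expf(-(d/kappa + alpha)*dt);
    a[i] = d*(b[i]-1.f)/(kappa*(d + kappa*alpha));
  }
}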

3.5 Time integration

Equation (6) shows that many time steps may be required to compute the frequency domain electromagnetic fields. The frequency domain data are accumulated on the fly during the forward modelling process using the discrete time Fourier transform (DTFT)

u(\mathbf{x},\omega)=\sum_{n=0}^{N_{t}-1}u(\mathbf{x},t_{n})\exp(\mathrm{i}\omega^{\prime}t_{n}), (25)

where the time has been discretized as $t_{n}=n\Delta t$; $u\in\{E_{x},E_{y},E_{z},H_{x},H_{y},H_{z}\}$ is one component of the electromagnetic fields; $N_{t}$ is the total number of time steps needed. The following code snippet exemplifies the DTFT for the $E_{x}$ component at time step it:

void dtft_emf(emf_t *emf, int it, float ***E1, float _Complex ****fwd_E1)
/*<DTFT of the electromagnetic fields (emf) >*/
{
  int i1,i2,i3,ifreq;
  float _Complex factor;

  int i1min=emf->nb;//lower bound for index i1
  int i2min=emf->nb;//lower bound for index i2
  int i3min=emf->nb;//lower bound for index i3
  int i1max=emf->n1pad-1-emf->nb;//upper bound for index i1
  int i2max=emf->n2pad-1-emf->nb;//upper bound for index i2
  int i3max=emf->n3pad-1-emf->nb;//upper bound for index i3

  for(ifreq=0; ifreq<emf->nfreq; ++ifreq){
    factor = emf->expfactor[it][ifreq];

    for(i3=i3min; i3<i3max; ++i3){
      for(i2=i2min; i2<i2max; ++i2){
        for(i1=i1min; i1<i1max; ++i1){
          fwd_E1[ifreq][i3][i2][i1] += E1[i3][i2][i1]*factor;
        }
      }
    }
  }
}

where E1 and fwd_E1 stand for the time-domain and frequency-domain $E_{x}$ field, respectively. Note that we only perform the computation in the region of interest, excluding the absorbing boundaries, by specifying the index bounds. We do the same for all other field components.
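The per-step factors emf->expfactor[it][ifreq] used above follow equation (25) with the wave-domain frequency $\omega^{\prime}$ of equation (4). A minimal sketch of how such a table could be precomputed is given below; apart from expfactor, the names are illustrative.

#include <complex.h>
#include <math.h>

/* Precompute exp(i*omega'*t_n) of equation (25), with omega'=(1+i)*sqrt(omega*omega0)
 * from equation (4), for all time steps and output frequencies. Sketch only. */
void dtft_factors(int nt, int nfreq, float dt, const float *freqs,
                  float omega0, float _Complex **expfactor)
{
  int it, ifreq;
  for(ifreq=0; ifreq<nfreq; ifreq++){
    float _Complex omegap = (1.f + I)*sqrtf(2.f*M_PI*freqs[ifreq]*omega0);
    for(it=0; it<nt; it++)
      expfactor[it][ifreq] = cexpf(I*omegap*(it*dt));
  }
}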

There has been a significant amount of effort in the literature to estimate the total modelling time $T_{\max}$ (Wang and Hohmann, 1993; Mittet, 2010). The key to this estimation is to ensure that the frequency domain EM field reaches its steady state. In practice, the final number of time steps is usually overestimated for safety. In our implementation, we therefore regularly check the convergence of the frequency domain field until no further contribution is added by additional time steps. This can be computationally expensive if the convergence check is very frequent, particularly if one wishes to output the Green's function at every grid point for every frequency of interest. Here, we take a shortcut based on physical intuition. Physically, the lowest frequency component of the CSEM signal has the largest penetration depth, according to the relation between frequency and skin depth, $\delta=\sqrt{2/(\omega\mu\sigma)}$. Thus, monitoring the lowest frequency at the boundaries of the computing volume is sufficient to track the evolution of the fields. The simulation is automatically terminated once the field has converged, thus avoiding extra computation.

3.6 Medium homogenization

When an interface is present in the medium, additional effort is required to handle high medium contrasts in order to achieve precise modelling. This can be achieved by averaging the medium (Davydycheva et al., 2003). libEMM adapts this approach for VTI media, even though the method is capable of handling full anisotropy. The key idea is to compute the effective medium parameters for the horizontal conductivity and the vertical resistivity by averaging over the cell size

\bar{\sigma}_{xx}(x_{i+0.5})=\frac{1}{x_{i+1}-x_{i}}\int_{x_{i}}^{x_{i+1}}\sigma_{xx}(x)\mathrm{d}x, (26a)
\bar{\sigma}_{yy}(y_{j+0.5})=\frac{1}{y_{j+1}-y_{j}}\int_{y_{j}}^{y_{j+1}}\sigma_{yy}(y)\mathrm{d}y, (26b)
\bar{\sigma}_{zz}(z_{k+0.5})=\left(\frac{1}{z_{k+1}-z_{k}}\int_{z_{k}}^{z_{k+1}}\sigma_{zz}^{-1}(z)\mathrm{d}z\right)^{-1}. (26c)

The above procedure is based on the fact that the tangential field components see the model as a combination of resistors in parallel, while the normal field component sees the model as a combination of resistors in series. In the convention of the Soviet literature, the method is often termed homogenization, as the same formula holds for all grid points, regardless of whether a point lies in a homogeneous region or in the neighbourhood of an interface.
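On a discrete input model, equation (26) amounts to thickness-weighted arithmetic averaging of the conductivities for the horizontal components and harmonic averaging (i.e. arithmetic averaging of the resistivities) for the vertical component. A minimal 1D sketch along z is given below; the routine name and interface are illustrative.

/* Effective parameters over one cell containing m sub-layers of thickness h[j]
 * with horizontal/vertical resistivities rh[j]/rv[j] (cf. equation 26):
 * conductivities in parallel horizontally, resistivities in series vertically. */
void homogenize_cell(int m, const float *h, const float *rh, const float *rv,
                     float *rho_h_eff, float *rho_v_eff)
{
  float H = 0.f, sig_h = 0.f, res_v = 0.f;
  int j;
  for(j=0; j<m; j++){
    H     += h[j];
    sig_h += h[j]/rh[j];   /* accumulate horizontal conductivity */
    res_v += h[j]*rv[j];   /* accumulate vertical resistivity    */
  }
  *rho_h_eff = H/sig_h;    /* inverse of the averaged horizontal conductivity */
  *rho_v_eff = res_v/H;    /* averaged vertical resistivity                   */
}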

3.7 Optimal nonuniform grid generation

Our finite difference modelling is carried out on a rectilinear mesh, which is simply the tensor product of 1D non-equispaced meshes. To generate these 1D meshes, the rule of geometric progression is used (Mulder, 2006, Appendix C). Assume the total grid length $L$ is divided into $n$ intervals ($n+1$ nodes) with a common ratio $q>1$. Denote the smallest interval $\Delta x=x_{1}-x_{0}$. The relation between $L$ and $\Delta x$ is then

L=\Delta x(1+q+\cdots+q^{n-1})=\Delta x\frac{q^{n}-1}{q-1}. (27)

Due to the stability requirement and the resulting computational cost of the modelling, we are restricted to the smallest interval $\Delta x$ and a given number of nodes to discretize a certain distance. Since equation (27) does not yield an explicit expression for the stretching factor $q$, the question boils down to finding the optimal $q$ from fixed $n$, $\Delta x$ and $L$. The relation in (27) is equivalent to

q=\underbrace{\left(\frac{L}{\Delta x}(q-1)+1\right)^{\frac{1}{n}}}_{g(q)} (28)

which inspires us to carry out a number of fixed point iterations:

q^{k+1}=g(q^{k}),\quad k=0,1,\cdots. (29)

Since $|g^{\prime}(q)|<1$ holds for all $q>1$, the scheme proposed here is guaranteed to converge. This idea has been implemented in the following code snippet.

float create_nugrid(int n, float len, float dx, float *x)
/*< generate power-law nonuniform grid by geometric progression >*/
{
  int i;
  float q, qq;

  if(fabs(n*dx-len)<1e-15) {
    for(i=0; i<=n; i++) x[i] = i*dx;
    return 1;
  }

  q = 1.1;//initial guess for the solution
  while(1){
    qq = pow(len*(q-1.)/dx + 1., 1./n);
    if(fabs(qq-q)<1e-15) break;
    q = qq;
  }
  for(x[0]=0,i=1; i<=n; i++) x[i] = (pow(q,i) - 1.)*dx/(q-1.);

  return q;
}
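For example, a vertical grid with 60 intervals spanning 4000 m in depth and a smallest spacing of 25 m near the top of the stretched region could be generated as follows (the numbers are purely illustrative):

  /* illustrative usage: fills z[0..60] and returns the common ratio q */
  float z[61];
  float q = create_nugrid(60, 4000., 25., z);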

3.8 Consistent naming and memory-safe programming

Since programming 3D CSEM modelling involves sophisticated memory allocation and initialization using pointers, libEMM implements every computational module following the same naming convention to ensure memory-safe code. Throughout the software, all routines with names xxxx_init() and xxxx_close() serve as the constructor and destructor, respectively. This allows the C source file xxxx.c to resemble a C++ class or a Fortran module. The following details the major steps of FDTD-based CSEM modelling in the fictitious wave domain, as sketched in Algorithm 1:

  • The pointer emf pointing to the data structure of electromagnetic fields will be initialized by emf_init(emf). The relevant parameters requiring computing memory (for example, emf->rho11, emf->rho22 and emf->rho33) will also be allocated for reading input model in emf_init(emf). Another routine named emf_close(emf) for destroying the variables allocated before will be placed in the same source file emf_init_close.c.

  • Following the same convention, the pointer acqui pointing to the data structure of the survey acquisition will be initialized by acqui_init(). This initialization allocates variables and reads the input files on each MPI process. The routine acqui_close() will deallocate the variables after the modelling. These are enclosed in the source file acqui_init_close.c.

  • The interpolation operator will be initialized by interpolation_init() prior to modelling and destroyed by interpolation_close() following the completion of the modelling jobs.

  • The computing model must be extended with CPML layers and some extra buffer layers. This is achieved in extend_model.c, with initialization by extend_model_init() and variable deallocation by extend_model_close().

  • The frequency domain EM fields are initialized by dtft_emf_init() and deallocated by dtft_emf_close(). The DTFT of EM fields will be performed at every time step by dtft_emf() in the same source file dtft_emf.c.

  • Air-wave boundary condition are implemented in airwave_bc.c, initialized by airwave_bc_init() and destroyed in airwave_bc_close(). At each time step, the airwave treatment for electrical and magnetic fields will be carried out via airwave_bc_update_E() and airwave_bc_update_H(), respectively.

  • The time-domain electromagnetic fields for FDTD modelling are initialized in fdtd_init() and destroyed in fdtd_close(). The leap-frog time stepping at each time step will call fdtd_curlH() to compute $\nabla\times\mathbf{H}$, fdtd_update_E() to update the electrical field $\mathbf{E}$, fdtd_curlE() to compute $\nabla\times\mathbf{E}$, and fdtd_update_H() to update the magnetic field $\mathbf{H}$. These routines are bundled in fdtd.c.

  • The injection of the electromagnetic source for forward simulation, done by inject_electric_src_fwd() and inject_magnetic_src_fwd(), resides in inject_src_fwd.c.

  • The convergence is checked every 100 time steps by check_convergence().

  • After time domain modelling, the Green’s function is computed via compute_green_function().

  • The CSEM data is finally extracted via extract_emf() and written out by write_data().

Algorithm 1 FDTD-based CSEM modelling in fictitious wave domain
1:emf_init(emf);
2:acqui_init(acqui,emf);
3:sanity_check(emf);
4:interpolation_init();
5:extend_model_init(emf);
6:dtft_emf_init(emf);
7:airwave_bc_init(emf);
8:fdtd_init(emf);
9:for  $\mathtt{it=0,\cdots,N_{t}-1}$  do
10:     fdtd_curlH(emf, it);
11:     inject_electric_src_fwd();
12:     fdtd_update_E(emf, it);
13:     airwave_bc_update_E(emf);
14:     dtft_emf(emf, it);
15:     fdtd_curlE(emf, it); 
16:     inject_magnetic_src_fwd();
17:     fdtd_update_H(emf, it);
18:     airwave_bc_update_H(emf);
19:     if $\mathtt{it\%100==0}$ then
20:         check_convergence(emf);
21:         if converged, break;  
22:     end if
23:end for
24:compute_green_function(emf);
25:extract_emf();
26:write_data(acqui, emf);
27:fdtd_close(emf); 
28:airwave_bc_close(emf);
29:dtft_emf_close(emf); 
30:extend_model_close(emf); 
31:interpolation_close(); 
32:emf_close(emf); 
33:acqui_close(acqui); 

To allocate memory for the variables used in CSEM modelling, we dynamically allocate arrays of pointers to arrays of pointers … backed by a contiguous memory chunk, to construct multidimensional arrays that can be referenced in the same way as static arrays in C. That is, the $(i,j,k)$-th element can be referenced simply by array[k][j][i]. The rightmost index i is the fastest index, while the leftmost index k is the slowest one. An array of size n1*n2*n3 starts at the first element array[0][0][0] and ends with the last one array[n3-1][n2-1][n1-1].
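A minimal sketch of such a contiguous 3D allocation is given below; the helper name alloc3float is illustrative and the actual allocator in libEMM may differ in detail.

#include <stdlib.h>

/* Allocate an n3 x n2 x n1 float array backed by one contiguous chunk, so that
 * p[k][j][i] addresses the data with i the fastest index. Sketch only. */
float ***alloc3float(int n1, int n2, int n3)
{
  int j, k;
  float ***p   = malloc(n3*sizeof(float**));
  float **rows = malloc((size_t)n3*n2*sizeof(float*));
  float *data  = malloc((size_t)n1*n2*n3*sizeof(float));

  for(k=0; k<n3; k++){
    p[k] = rows + k*n2;
    for(j=0; j<n2; j++) p[k][j] = data + ((size_t)k*n2 + j)*n1;
  }
  return p;
}

/* free in reverse order: free(p[0][0]); free(p[0]); free(p); */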

4 Multithread parallelization on GPU

4.1 Strategies for GPU modelling

GPU acceleration technology has matured after more than a decade of development effort. Besides successful applications in the seismic community such as reverse time migration and full waveform inversion (Yang et al., 2014, 2015), the use of GPUs for 3D CSEM is scarcely reported. Here we highlight some key strategies for implementing GPU-based 3D CSEM modelling.

  1. Perform the computation-intensive part on GPU without frequent CPU-GPU data traffic. Note that CPUs emphasize low latency, while GPUs emphasize high throughput. In order to avoid a substantial amount of communication latency, we design the code to perform as much of the necessary pre-computation as possible before porting to GPU-based time stepping. The pre-computation includes validation of the input parameters for sanity checking, preparation of the medium on the extended domain, and tabulating the interpolation weights at the source and receiver locations in a lookup table. The air-water boundary condition is computed directly on the device with the CUFFT library.

  2. Use shared memory with the tiled algorithm (Micikevicius, 2009). The key idea is to exploit the fast L1 cache of the shared memory, which is small in size but roughly two orders of magnitude faster to access than global memory. Tiling forces multiple threads to jointly focus on a subset of the input data at each phase of execution. Here we map the global memory onto 2D shared memory blocks and slide them along the third dimension, as illustrated in Figure 1. The reading and writing of the data on GPU is then performed in a block-by-block manner rather than element by element, and is thus much more efficient. Since we have to repeatedly reuse the same quantity at different locations in the finite difference stencil, the tiling technique with shared memory enhances the locality of data access.

  3. Use a linearly increasing index with consecutive memory access. Each CUDA warp contains 32 processing cores executed in a single-instruction multiple-thread fashion. The CUDA kernel will serialize the operations when the accessed elements reside in different warps, which leads to a dramatic performance penalty. A linearly increasing index with consecutive memory access is therefore crucial for designing efficient CUDA kernels.

  4. Use an improved scheme for the convergence check of the fields. The modelling can be terminated only if the frequency domain EM fields converge at every grid point. This naturally leads to a reduction operation at every convergence check. To do this efficiently on GPU, the author implemented a parallel reduction scheme in (Yang, 2021). In this paper, we do it in a more judicious manner by only checking the 8 corners of the interior computing cube without absorbing boundaries, as sketched in Figure 2. This eliminates the need for parallel reduction and dramatically reduces the required number of floating point operations. The scheme can equally be applied in the sequential implementation (a sequential sketch is given after this list).

  5. Use mapped pinned memory. Due to the convergence check, time-stepping modelling involves communication between the device (GPU) and the host (CPU), which conflicts with the first principle above. To avoid this, we invoke mapped pinned memory, which allows the CPU and GPU to share a buffer without explicit buffer allocations and copies. It is well known that extensive use of mapped pinned memory may hurt performance. Here, only the 8 corners of the complex-valued EM field have to be backed up using zero-copy pinned memory, which minimizes the potential performance deterioration.
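A sequential sketch of this corner-based check is shown below. The routine name echoes check_convergence() in Algorithm 1, but the body is an illustration rather than the exact libEMM code; it compares the lowest-frequency $E_{x}$ field at the eight interior corners against its value at the previous check.

#include <complex.h>
#include <math.h>

/* Count how many of the 8 interior corners carry a converged lowest-frequency
 * field (frequency index 0 assumed to be the lowest). old[] caches the values
 * from the previous check and is updated in place. Sketch only. */
int count_converged_corners(float _Complex ****fwd_E1, float _Complex old[8],
                            int i1min, int i1max, int i2min, int i2max,
                            int i3min, int i3max, float tol)
{
  int c1[2] = {i1min, i1max}, c2[2] = {i2min, i2max}, c3[2] = {i3min, i3max};
  int i, j, k, n = 0, ncorner = 0;

  for(k=0; k<2; k++) for(j=0; j<2; j++) for(i=0; i<2; i++){
    float _Complex v = fwd_E1[0][c3[k]][c2[j]][c1[i]];
    if(cabsf(v - old[n]) <= tol*cabsf(v)) ncorner++; /* relative change small */
    old[n++] = v;
  }
  return ncorner; /* the modelling may stop once all 8 corners have converged */
}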

Figure 1: The tiling blocks on the x-y plane sliding along the z dimension
Figure 2: Schematic plot of the convergence check using the eight corners of the interior computing cube without absorbing boundaries

4.2 CUDA implementation

The above techniques imply that the parallel GPU implementation requires a significant adaptation of each subroutine in Algorithm 1 into separate CUDA kernel functions, while the computational workflow remains the same but runs on the GPU device instead of the CPU host. In GPU mode, the device must be initialized first; the dedicated operations are then performed by these CUDA kernel functions using appropriate multithreaded block and grid sizes. This has been done in cuda_modelling.cu, as a replacement of the sequential modelling conducted by do_modeling.c. Some simple calculations may be awkward to implement on GPU but can easily be performed on the CPU host, and only once. These operations, including the extension of the model with absorbing boundaries, the setup of the CPML layers and the DTFT coefficients, and the computation of the interpolation weights to inject the source and extract data at the receiver locations, are carried out prior to cuda_modeling.cu: the GPU directly copies them into device memory for utilization. This highlights the importance of exploiting the distinct advantages of CPU and GPU computation.

All relevant CUDA kernels are placed in cuda_fdtd.cuh. In particular, the spatial derivatives in the FDTD are $\nabla\times\mathbf{E}$ and $\nabla\times\mathbf{H}$, which have been parallelized by the GPU kernels cuda_fdtd_curlE_nugrid and cuda_fdtd_curlH_nugrid, as the alternatives to the CPU routines fdtd_curlE and fdtd_curlH. Following the tiled algorithm, the first two dimensions are parallelized using shared memory, while the 3rd dimension strides sequentially, yielding the following code

__global__ void cuda_fdtd_curlH_nugrid(...)
{
  const int i1 = threadIdx.x + blockDim.x*blockIdx.x;
  const int i2 = threadIdx.y + blockDim.y*blockIdx.y;
  const int t1 = threadIdx.x + 2;//t1>=2 && t1<BlockSize1+2
  const int t2 = threadIdx.y + 2;//t2>=2 && t2<BlockSize2+2

  __shared__ float sH1[BlockSize2+3][BlockSize1+3];
  __shared__ float sH2[BlockSize2+3][BlockSize1+3];
  __shared__ float sH3[BlockSize2+3][BlockSize1+3];

  int in_idx = i1+n1pad*i2;
  int out_idx = 0;
  int stride = n1pad*n2pad;

  for(i3=i3min; i3<=i3max; i3++){//i3-2>=0 && i3+1<n3pad
    ...
    __syncthreads();

    if(validrw){
      ...
      curlH1[out_idx] = D2H3 - D3H2;
      curlH2[out_idx] = D3H1 - D1H3;
      curlH3[out_idx] = D1H2 - D2H1;
    }
  }//end for i3
}

Note that it is important to synchronize the GPU threads before proceeding to the next stage. For efficiency, the time stepping of the $\mathbf{E}$ and $\mathbf{H}$ fields also conforms to the tiled mapping, i.e.,

__global__ void cuda_fdtd_update_E(...)
{
  const int i1 = threadIdx.x + blockDim.x*blockIdx.x;
  const int i2 = threadIdx.y + blockDim.y*blockIdx.y;
  int i3, id;

  for(i3=0; i3<n3pad; i3++){
    if(i1<n1pad && i2<n2pad){
      id = i1+n1pad*(i2+ n2pad*i3);
      E1[id] += dt*inveps11[id]*curlH1[id];
      E2[id] += dt*inveps22[id]*curlH2[id];
      E3[id] += dt*inveps33[id]*curlH3[id];
    }
  }
}

The convergence of the field is checked directly on GPU by cuda_check_convergence. Since the result of the convergence check must be visible on the host, we use cudaHostAlloc instead of the commonly used routine cudaMalloc to allocate the memory recording the number of converged corners of the 3D cube. Correspondingly, the pinned memory must be freed using cudaFreeHost rather than cudaFree. To be concrete, these correspond to the following lines of code:

  cudaHostAlloc(&h_ncorner, sizeof(int), cudaHostAllocMapped);

and

  cudaFreeHost(h_ncorner);

5 Software features

5.1 Dependencies for building executable

libEMM is a light-weight yet powerful 3D CSEM modelling toolkit which can run on a single CPU processor, on multiple parallel MPI processes, and in CUDA-enabled GPU mode. This implies that the software has the following dependencies:

  • FFTW for fast Fourier transform;

  • MPICH (or other types of MPI) for multi-processor parallelization over distributed memory architecture;

  • The Nvidia CUDA package if one wishes to benefit from GPU parallelization on device shared memory.

The installation of the CUDA package is optional while the first two are mandatory. libEMM uses the C programming language to achieve high performance computing. To compile the code, one runs the command make following the given Makefile, which may be edited according to the paths of the user's installations of the above packages. The compiled executable will be named fdtd and placed in the folder bin.

5.2 Parameter parsing

libEMM automatically parses the argument list based on the parameters specified as key=value tokens. The value for each keyword is picked up after an equal sign =. Multiple values for the same keyword can be specified as comma-separated parameters. Different keywords are separated by spaces.

5.3 Survey acquisitions

To specify the survey layout, three acquisition files in ASCII format must be given to prescribe the source and receiver locations as well as their connection table. These files are assigned to the arguments as

fsrc=sources.txt frec=receivers.txt fsrcrec=src_rec_table.txt

where the file sources.txt gives the source locations; the file receivers.txt gives the receiver locations; the file src_rec_table.txt establishes the connection between sources and receivers. When the code runs with MPI, each process reads these files and then carries out an independent modelling job according to its process index (which uniquely determines a source). The source/receiver file is of the following form:

      x     y      z       azimuth    dip        iTx/iRx
-8000.0   0.0    275.0        0        0           1
-7975.0   0.0    275.0        0        0           2
-7950.0   0.0    275.0        0        0           3
-7925.0   0.0    275.0        0        0           4
    ...

The above 6 columns correspond to the $(x,y,z)$ coordinates (in meters), the azimuth and dip angles (in rad), and the index of the source/receiver. Thanks to the interpolation operator computed by inverting the Vandermonde matrix, the sources and receivers can be accurately injected/extracted at arbitrary locations, without the need to sit on the grid. The connection table reads

 iTx iRx
 1    1
 1    2
 1    3
 ...

 2    1
 2    2
 2    3
 ...

which associates each source/transmitter indexed by iTx with a number of receivers indexed by iRx.

5.4 Physical domain, grid dimensions and coordinates

libEMM carries out CSEM modelling over a 3D cube with physical dimensions specified by the bounds. Modelling over a domain $X=X_{1}\times X_{2}\times X_{3}$ with $X_{1}=X_{2}=[-10000,10000]$ m, $X_{3}=[0,3000]$ m reads

x1min=-10000 x1max=10000 x2min=-10000 x2max=10000 x3min=0 x3max=3000

The parameters n1, n2 and n3 specify the dimensions of the model in the x, y and z directions, while d1, d2 and d3 specify the smallest grid spacing (in meters) along each coordinate. To do modelling on a resistivity cube of $101^{3}$ grid points with $\Delta x=\Delta y=200$ m and $\Delta z=50$ m on a uniform grid, one writes in the argument list

n1=101 n2=101 n3=101 d1=200 d2=200 d3=50

These parameters will be checked by libEMM in order to match the bounds of the domain.

Because libEMM allows CSEM modelling on both uniform and non-uniform grids, the grid coordinates in binary format must be provided through the argument fx3nu (corresponding to the gridding in the z direction) in order to model on a non-uniform grid along z. Note that the file must store exactly n3 floating point numbers. Non-uniform gridding in the x- and y-directions is not allowed due to degraded modelling accuracy (Yang and Mittet, 2023). When the modelling is performed on a uniform grid, there is no need to provide input for fx3nu.

5.5 Resistivity files

Three resistivity files in binary format are expected to feed the argument list for numerical modelling, as specified in the following

frho11=rho11 frho22=rho22 frho33=rho33

where the files rho11, rho22 and rho33 are the resistivities of size n1*n2*n3 after homogenization, implying a half grid shift in x, y and z directions, respectively. These files should be built using model building tools outside the modelling kernel.
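For reference, such a binary resistivity file is simply a stream of n1*n2*n3 single-precision floats with x the fastest dimension. A minimal sketch writing a homogeneous cube is given below; the file name and the resistivity value are illustrative.

#include <stdio.h>
#include <stdlib.h>

/* Write a homogeneous resistivity cube of n1*n2*n3 floats (x fastest) into a
 * binary file that can be fed to frho11/frho22/frho33. Values are illustrative. */
int main(void)
{
  int n1 = 101, n2 = 101, n3 = 101, i;
  float *rho = malloc((size_t)n1*n2*n3*sizeof(float));
  FILE *fp = fopen("rho11", "wb");

  for(i=0; i<n1*n2*n3; i++) rho[i] = 1.0f; /* 1 Ohm-m everywhere */
  fwrite(rho, sizeof(float), (size_t)n1*n2*n3, fp);

  fclose(fp);
  free(rho);
  return 0;
}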

5.6 Source and receiver channels, frequencies

libEMM specifies the active source channel and receiver channels via the keywords chsrc and chrec. For example, recording CSEM data of components $E_{x}$ and $E_{y}$ from an electrical source $J_{x}$ reads

chsrc=Ex chrec=Ex,Ey

Similarly, one can specify the simulation frequencies as

freqs=0.25,0.75,1.25

in which 0.25 Hz, 0.75 Hz and 1.25 Hz will be the frequencies extracted from time-domain CSEM modelling engine.

5.7 Operator length, CPML layers and buffers

The half length of the finite-difference operator is determined by parameter rd. It is possible to choose rd=1, 2 or 3 in CPU mode, but only rd=2 is supported on GPU. This is because dramatic accuracy improvement has been observed in modelling when switching from the 2nd to the 4th order scheme, but increasing from 4th order to even higher order requires more computational effort without significant accuracy gain (Yang and Mittet,, 2023).

We pad the model with the same number of CPML layers on each side of the cube, specified by the parameter nb. In addition, we add ne buffer layers as a smooth transition zone between the CPML and the interior domain. A general principle for choosing ne is to ensure that the number of buffer layers is larger than the half length of the finite-difference stencil, for example, ne=rd+5.

5.8 CSEM modelling output

The outcome of a 3D CSEM modelling run will be output as ASCII files named emf_XXXX.txt, where XXXX is the index of the source/transmitter. Running the simulation using 4 MPI processes leads to

emf_0001.txt emf_0002.txt emf_0003.txt emf_0004.txt

Each file follows the same convention, printing out the transmitter index iTx, the receiver index iRx, the receiver channel Ex/Ey/Ez/Hx/Hy/Hz, the frequency index ifreq, and the real and imaginary parts of the complex-valued EM field in the frequency domain:

iTx  iRx  chrec  ifreq    emf_real        emf_imag
1    1    Ex     1      -1.086630e-14    2.026699e-16
1    2    Ex     1      -1.148639e-14    3.976523e-16
1    3    Ex     1      -1.213745e-14    6.254443e-16
1    4    Ex     1      -1.282039e-14    8.894743e-16
1    5    Ex     1      -1.353619e-14    1.192894e-15
1    6    Ex     1      -1.428555e-14    1.539961e-15
1    7    Ex     1      -1.506933e-14    1.934350e-15
1    8    Ex     1      -1.588794e-14    2.380965e-15
1    9    Ex     1      -1.674208e-14    2.884079e-15
1    10   Ex     1      -1.763176e-14    3.449250e-15
1    11   Ex     1      -1.855739e-14    4.081327e-15
...

5.9 Running in different modes

To run the 3D CSEM modelling, one may write the parameters in a shell script run.sh and launch the code using bash run.sh. The following is an example shell script running with 25 MPI processes, each parallelized by 4 OpenMP threads:

#!/usr/bin/bash

n1=101
n2=101
n3=101
d1=200
d2=200
d3=50

export OMP_NUM_THREADS=4
mpirun -n 25 ../bin/fdtd \
       mode=0 \
       fsrc=sources.txt \
       frec=receivers.txt \
       fsrcrec=src_rec_table.txt \
       frho11=rho11 \
       frho22=rho22 \
       frho33=rho33 \
       chsrc=Ex \
       chrec=Ex \
       x1min=-10000 x1max=10000 \
       x2min=-10000 x2max=10000 \
       x3min=0 x3max=5000 \
       n1=$n1 n2=$n2 n3=$n3 \
       d1=$d1 d2=$d2 d3=$d3 \
       nb=12 ne=6 \
       freqs=0.25,0.75,1.25 \
       rd=2

where there are 25 sources in the acquisition. It is also possible to do modelling for some specific sources, for example,

mpirun -n 2 ../bin/fdtd  shots=12,14 ...

will only simulate CSEM data for transmitter 12 and transmitter 14. The output CSEM responses will be named emf_0012.txt and emf_0014.txt. The code will terminate if the number of processes launched for mpirun exceeds the total number of sources given in the file sources.txt.

The code can be compiled with nvcc compiler for GPU modelling using command ‘make GPU=1’. Assume there is only one GPU card available on the laptop. Once the executable is created after compilation, one can directly run the code by executing the command with proper parameters:

../bin/fdtd ...

Adding nvprof to the script provides a convenient way to profile the computing time spent in different CUDA kernels

nvprof --log-file profiling.txt ../bin/fdtd ...

where the profiling log will be recorded in profiling.txt.

6 Application examples

The software allows modelling on both a uniform grid (using the default flag emf->nugrid=0) and a non-uniform grid (emf->nugrid=1). We start with a layered model possessing a 1D structure, for which the use of a uniform grid is sufficient. The high-order FDTD over a non-uniform grid has been developed for practical applications to model the 3D CSEM response where part of the computational domain requires dense sampling. The second example, which includes varying seafloor bathymetry, therefore serves as an illustration. Three frequencies, i.e., 0.25 Hz, 0.75 Hz and 1.25 Hz, which are representative of real applications, are examined in this experiment. The temporal interval and the number of time steps are automatically chosen (Mittet, 2010) without breaking the stability condition. The relative amplitude error is measured by $|E_{x}^{FD}|/|E_{x}^{ref}|-1$. It should be close to 0 if the modelling is precise, positive if $|E_{x}^{FD}|>|E_{x}^{ref}|$ and negative if $|E_{x}^{FD}|<|E_{x}^{ref}|$. The phase error is measured by $\angle E_{x}^{FD}-\angle E_{x}^{ref}$ in degrees.

6.1 Layered model

The first test is a layered resistivity model shown in Figure 3: the model has 5 layers, the air of $10^{12}\;\Omega\cdot\mbox{m}$, the water of $0.3125\;\Omega\cdot\mbox{m}$, and two layers of formations. Between the two formation layers there is a resistive layer of $100\;\Omega\cdot\mbox{m}$ mimicking a hydrocarbon-bearing formation. The 3D FDTD code is cross-validated against the semi-analytic solution calculated by the Dipole1D program (Key, 2009). We discretize the model of dimension $10\;\mbox{km}\times 10\;\mbox{km}\times 5\;\mbox{km}$, including a resistor at 1250-1350 m depth, with grid spacings $\Delta x=\Delta y=100\;\mbox{m}$ and $\Delta z=50\;\mbox{m}$. This results in a large 3D grid of size $101^{3}$. A point source is placed 50 m above the seabed. Figures 4a and 4b show very good agreement between the FDTD method and the semi-analytic solution, in terms of the amplitude and the phase of the $E_{x}$ component. Figures 4c and 4d show less than $1.5\%$ amplitude error and less than $1^{\circ}$ phase error at most offsets. The errors at near offsets are very large; fortunately, the near-offset data are not relevant for practical CSEM imaging. This example confirms the good accuracy of the FDTD method.
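The input resistivity volumes frho11, frho22 and frho33 can be generated with a short program. The sketch below assumes they are raw 32-bit float files of size n1*n2*n3 with the first axis fastest (an assumption for illustration); the sediment resistivity and the seabed depth used here are also illustrative values, not those of Figure 3:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  int n1 = 101, n2 = 101, n3 = 101;            /* grid size of the example */
  float d3 = 50.f;                             /* vertical spacing in m */
  /* illustrative layer tops (m) and resistivities (Ohm-m): water, sediment,
     resistor at 1250-1350 m, sediment below; the 300 m seabed is assumed */
  float ztop[4] = {0.f, 300.f, 1250.f, 1350.f};
  float rval[4] = {0.3125f, 1.f, 100.f, 1.f};
  float *rho = malloc(sizeof(float)*(size_t)n1*n2*n3);

  for (int i3 = 0; i3 < n3; i3++) {
    float z = i3*d3, r = rval[0];
    for (int l = 0; l < 4; l++) if (z >= ztop[l]) r = rval[l];
    for (int i2 = 0; i2 < n2; i2++)
      for (int i1 = 0; i1 < n1; i1++)
        rho[i1 + (size_t)n1*(i2 + (size_t)n2*i3)] = r;  /* assumed x-fastest ordering */
  }
  FILE *fp = fopen("rho11", "wb");
  if (fp) { fwrite(rho, sizeof(float), (size_t)n1*n2*n3, fp); fclose(fp); }
  free(rho);   /* for an isotropic model, write the same data to rho22 and rho33 */
  return 0;
}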

Figure 3: The cross-section of the resistivity model: the dashed line indicates the depth of the sources.
Figure 4: Comparison between FDTD and analytic solution for 3D CSEM simulation in the shallow water scenario. (a) Amplitude; (b) Phase; (c) Amplitude error; (d) Phase error.

6.2 Model with seafloor bathymetry

To mimic a practical marine CSEM environment, we perform CSEM modelling using a 3D resistivity model with a 2D structure including seafloor bathymetry, as shown in Figure 5. To capture the impact of the seafloor bathymetry, we mesh the model densely around the seabed, while gradually stretching the grid with depth below it. The x-z section of the mesh is shown in Figure 6. For this model, we compute a reference solution with MARE2DEM (Key, 2016), which uses the finite-element method. MARE2DEM extends the model by tens of kilometres in each direction, so that the EM fields can propagate to very large distances while possible edge/boundary effects are avoided. The region of interest is densely gridded using Delaunay triangulation, while large cells are employed far away from the interior part of the model. In our finite-difference modelling, the PML boundary condition attenuates the artificial reflections within tens of grid cells of the computational domain to achieve the same behavior. Both the amplitude and the phase match very well between the FDTD modelling and the MARE2DEM solution, as can be seen in Figure 7. Figure 7c shows that the amplitude errors are bounded within 3% for all frequencies at the relevant offsets; the amplitude error at far offsets and higher frequencies becomes larger when approaching the noise level ($10^{-15}\;\mbox{V/m}^{2}$).
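The vertical gridding strategy can be sketched in a few lines: a fine, constant spacing is kept in a window around the seabed, and the spacing is then grown geometrically with depth below it. The spacing, stretch factor and cell counts below are illustrative values, not those used for Figure 6:

#include <stdio.h>

int main(void)
{
  float zseabed = 800.f;   /* assumed seabed depth (m) */
  float dz = 25.f;         /* fine vertical spacing around the seabed */
  float stretch = 1.08f;   /* grow the spacing by 8% per cell below the window */
  int n_fine = 16, n_coarse = 40;
  float z = zseabed - 0.5f*n_fine*dz;   /* top of the finely sampled window */

  /* print the depths of the vertical grid nodes (the part above the window,
     up to the sea surface, is omitted in this sketch) */
  for (int i = 0; i < n_fine; i++)   { printf("%8.2f\n", z); z += dz; }
  for (int i = 0; i < n_coarse; i++) { dz *= stretch; printf("%8.2f\n", z); z += dz; }
  return 0;
}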

When libEMM is launched with multiple MPI processes, we can simultaneously model the CSEM data associated with different transmitters while all receivers are active. This mimics the practical situation in real marine CSEM acquisition. For the survey layout in Figure 8, libEMM yields both inline and broadside data, as shown in Figure 9, following the one-transmitter-per-process pattern sketched below.
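A minimal sketch of this pattern with plain MPI is given below; it is not libEMM's actual source code, and the transmitter count and output naming are taken from the earlier example for illustration:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank, size, ntx = 25;   /* total number of transmitters in the example */
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  if (rank < ntx) {
    int itx = rank + 1;       /* transmitter handled by this MPI process */
    printf("process %d of %d models transmitter %d\n", rank, size, itx);
    /* ... run the FDTD simulation for transmitter itx with all receivers
       active, then write the response to emf_%04d.txt ... */
  }
  MPI_Finalize();
  return 0;
}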

Figure 5: Marine CSEM conductivity model with seafloor bathymetry
Figure 6: The non-uniform grid used to mesh the resistivity model including bathymetry
Figure 7: Comparison of simulated CSEM response by FDTD and MARE2DEM
Figure 8: A survey layout sheet
Figure 9: The amplitude and phase of the CSEM data (including inline and broadside data) modelled with a transmitter deployed at (x,y,z)=(-4000 m, -4000 m, 550 m).

7 Performance analysis: OpenMP versus GPU

Using the previous layered resistivity model of size $101^{3}$, we analyze the computational efficiency by comparing the runtime of the same simulation on GPU and CPU. Our GPU card is an Nvidia GeForce GTX 860M of compute capability 5.0, paired with 2048 MB of GDDR5 memory operating at a frequency of 797 MHz. We compare it against an Intel(R) Core(TM) i7-4710HQ CPU @ 2.50 GHz. The C code has been highly optimized and parallelized using different numbers of OpenMP threads.

When running in GPU mode, the timing information recorded by the CPU clock is misleading because kernel launches are asynchronous with respect to the host. The nvprof command is then useful to profile the runtime of each GPU kernel with good time resolution, while CUDA events are used as synchronization markers to accurately measure the overall timing.
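The following is a minimal sketch of timing a GPU section with CUDA events, compiled with nvcc; the wrapper function and its function-pointer argument are illustrative, not libEMM's API:

#include <cuda_runtime.h>

/* time a section of GPU work (in milliseconds) using CUDA events */
float time_gpu_section(void (*launch)(void))
{
  cudaEvent_t start, stop;
  float ms = 0.f;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start, 0);    /* mark the beginning of the section */
  launch();                     /* launch the kernel(s) to be timed */
  cudaEventRecord(stop, 0);     /* mark the end of the section */
  cudaEventSynchronize(stop);   /* wait until all preceding GPU work completes */
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}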

Figure 10 is a screenshot of the profiling log for the different GPU kernels. Table 1 summarizes the CPU runtime of the computation at all phases during the different tests. Due to the existence of the highly resistive layer, the time-domain modelling was forced to use a very small temporal sampling; the 3D modelling completes after more than 3000 time steps. It takes only approximately 53 seconds on the GPU. To better quantify the speedup of the GPU over the multithreaded CPU, we display the total runtime in Figure 11a. The GPU code achieves a speedup factor of 13.7 over the single-threaded CPU, and a factor of 8.3 compared with the CPU parallelized by 8 OpenMP threads. We see that the performance of the code on the CPU does not scale linearly with an increasing number of OpenMP threads. Figure 11b gives a pie plot showing the proportion of each computational phase in the global GPU modelling process. The most computationally intensive operations are the calculation of $\nabla\times\mathbf{E}$ and $\nabla\times\mathbf{H}$, the update of the fields $\mathbf{E}$ and $\mathbf{H}$, the application of the air-wave boundary condition and the DTFT. The time spent on injecting the source and checking convergence is negligible.
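The DTFT stage accumulates the frequency-domain field on the fly during time stepping, so that no time series needs to be stored. A minimal sketch of such an accumulation for one field component and one frequency is shown below; the function and array names are illustrative assumptions, not libEMM's kernels:

#include <complex.h>
#include <stddef.h>

/* accumulate  E_w(x) += E(x,t) * exp(-i*omega*t) * dt  at the current time
   step; called once per time step for each output frequency */
void dtft_accumulate(const float *e, float complex *e_w, size_t n,
                     float omega, float t, float dt)
{
  float complex w = cexpf(-I*omega*t)*dt;
  for (size_t i = 0; i < n; i++) e_w[i] += e[i]*w;
}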

Figure 10: Screenshot of profiling log on GPU activities
Table 1: Runtime (in seconds) using the CPU with different numbers of OpenMP threads

Computation                   omp=1     omp=2     omp=4     omp=8
$\nabla\times\mathbf{E}$      2.07e+02  1.39e+02  1.05e+02  8.56e+01
inject $\mathbf{H}$           1.12e-03  1.73e-03  1.48e-03  2.10e-03
update $\mathbf{H}$           4.18e+01  3.36e+01  2.91e+01  2.66e+01
$\nabla\times\mathbf{H}$      1.88e+02  1.22e+02  9.21e+01  7.65e+01
inject $\mathbf{E}$           3.94e-03  5.28e-03  4.72e-03  6.07e-03
update $\mathbf{E}$           5.49e+01  4.23e+01  3.66e+01  3.41e+01
airwave                       2.06e+02  1.91e+02  2.03e+02  1.93e+02
DTFT                          3.07e+01  2.71e+01  2.58e+01  2.62e+01
convergence                   5.57e-04  8.01e-04  8.97e-04  2.35e-03
Total                         7.30e+02  5.56e+02  4.93e+02  4.42e+02
Figure 11: (a) Total runtime comparison between the CPU with different numbers of OpenMP threads and the GPU; (b) The percentage of each computing stage on the GPU

8 Discussions

We highlight that different medium homogenization methods can result in different levels of accuracy. The results obtained in this paper use the homogenization method of Davydycheva et al. (2003). The interface may also be handled using an anti-aliasing band-limited step function (Mittet, 2021), which can also be very accurate (Yang and Mittet, 2023). The step-function approach is not adopted in libEMM, because it requires a continuous description of the 2D interface (usually difficult to obtain in practice) for modelling in a 3D inhomogeneous medium.

Besides accuracy, efficiency is another important concern. The author has reported a speedup factor of 40 (Yang, 2021) using the GPU compared to a single-threaded CPU. After applying many effective optimizations in the C code, in particular eliminating the point-by-point convergence check on the grid, the speedup factor becomes smaller but remains significant. Note that we rely on a GPU card purchased in 2014, which is rather old-fashioned; a more impressive speedup should be expected on the latest hardware. However, releasing the software is more worthwhile than focusing on a specific number for a specific hardware, as it opens the door for users to further improve the performance.

There are some limitations in the current version of libEMM. Modelling with an extended source of finite length has been implemented by uniformly distributing the source current over the length of the antenna; however, the accuracy of this implementation remains unknown. The FDTD approach is capable of handling a fully anisotropic medium, which has important practical applications, but this capability has not yet been implemented in libEMM. A simple tool to build 3D models with 1D and 2D structures is provided, but it is split from the modelling code as an independent part. Users may employ other, more advanced tools to build sophisticated 3D models with geological structures, and then perform medium homogenization over the non-uniform grid with the provided tools before starting the modelling jobs.

9 Conclusions

A lightweight yet powerful software package - libEMM - has been presented for 3D CSEM modelling in the fictitious wave domain. It includes both a CPU implementation and a GPU parallelization. Thus, libEMM allows users to benefit from multi-core parallel clusters, while also promising efficiency improvements through graphics cards when GPU resources are available. A detailed description of the software features is given for ease of use. Application examples are presented together with a performance analysis. The perspective of this work is to deploy the implementation on clusters for 3D CSEM inversion, in order to handle large-scale applications effectively.

Acknowledgements

Pengliang Yang is supported by the Chinese Fundamental Research Funds for the Central Universities (AUGA5710010121) and the National Natural Science Foundation of China (42274156). The author thanks Rune Mittet and Daniel Shantsev for many fruitful discussions. The author also appreciates the thoughtful question on the analysis of the dispersion error raised by an anonymous reviewer. libEMM is available via the github repository: https://github.com/yangpl/libEMM. Many user-friendly features and development ideas of libEMM are inspired by the well-known open-source computational geophysics packages Seismic Unix and Madagascar.

Appendix A Dispersion analysis

Let us present some theoretical analysis of the numerical dispersion in the wave domain without source terms. Applying the curl operator to the first subequation of equation (3) and then plugging it into the second one cancels out the electric field, such that

(\mu\epsilon\partial_{tt}+\nabla\times\nabla\times)\mathbf{H}^{\prime}=(\mu\epsilon\partial_{tt}-\Delta)\mathbf{H}^{\prime}=0,   (30)

where $\Delta=\nabla\cdot\nabla$; the second equality is due to $\nabla\cdot\mathbf{H}^{\prime}=0$ and the fact that

\nabla\times\nabla\times\mathbf{H}^{\prime}=\nabla\nabla\cdot\mathbf{H}^{\prime}-\nabla\cdot\nabla\mathbf{H}^{\prime}.   (31)

Consider the time harmonic plane wave representation of the EM fields on the grid

\mathbf{H}^{\prime}\propto e^{-\mathrm{i}(\omega t-k_{x}x-k_{y}y-k_{z}z)},   (32)

where $k_{x}$, $k_{y}$ and $k_{z}$ are the wavenumbers in the x, y and z directions. In the case of uniform grid spacings $\Delta x$, $\Delta y$ and $\Delta z$, the grid nodes are directly prescribed by the index: $x_{i}=i\Delta x$, $y_{i}=i\Delta y$ and $z_{i}=i\Delta z$. The $2r$ coefficients of the finite-difference stencil are independent of the grid position, satisfying

c_{i}^{+}=-c_{-i+1}^{+}=c_{i}^{-}=-c_{-i+1}^{-}:=\frac{c_{i}}{\Delta x},\quad i=1,\cdots,r   (33)

where $c_{i}$ are the standard finite-difference coefficients, which can easily be found using the method of Fornberg (1998). The discretized spatial derivatives become

D_{x}^{+}=D_{x}^{-}=\frac{1}{\Delta x}\sum_{i=1}^{r}c_{i}\left(e^{\mathrm{i}\frac{(2i-1)k_{x}\Delta x}{2}}-e^{-\mathrm{i}\frac{(2i-1)k_{x}\Delta x}{2}}\right)=\frac{2\mathrm{i}}{\Delta x}\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{x}\Delta x\right),   (34)

leading to the discretized Laplacian operator

\begin{split}\Delta\approx{}&D_{x}^{+}D_{x}^{-}+D_{y}^{+}D_{y}^{-}+D_{z}^{+}D_{z}^{-}\\ ={}&-\frac{\left(2\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{x}\Delta x\right)\right)^{2}}{\Delta x^{2}}-\frac{\left(2\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{y}\Delta y\right)\right)^{2}}{\Delta y^{2}}-\frac{\left(2\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{z}\Delta z\right)\right)^{2}}{\Delta z^{2}}\end{split}   (35)

The time derivative is approximated using the 2nd-order leap-frog scheme, so that

\partial_{tt}\approx D_{t}^{+}D_{t}^{-}=-\left(\frac{2\sin(\frac{\omega\Delta t}{2})}{\Delta t}\right)^{2}.   (36)

Equation (30) results in

\frac{\sin^{2}(\frac{\omega\Delta t}{2})}{v^{2}\Delta t^{2}}=\frac{\left(\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{x}\Delta x\right)\right)^{2}}{\Delta x^{2}}+\frac{\left(\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{y}\Delta y\right)\right)^{2}}{\Delta y^{2}}+\frac{\left(\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{z}\Delta z\right)\right)^{2}}{\Delta z^{2}}.   (37)

Denote

S:=v\Delta t\sqrt{\frac{\left(\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{x}\Delta x\right)\right)^{2}}{\Delta x^{2}}+\frac{\left(\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{y}\Delta y\right)\right)^{2}}{\Delta y^{2}}+\frac{\left(\sum_{i=1}^{r}c_{i}\sin\left(\left(i-\frac{1}{2}\right)k_{z}\Delta z\right)\right)^{2}}{\Delta z^{2}}}   (38)

and the magnitude of the wavenumber $k:=\sqrt{k_{x}^{2}+k_{y}^{2}+k_{z}^{2}}$. According to $k=\omega/v$, equation (37) implies that the numerical phase velocity $v_{phase}$ in the FDTD simulation satisfies

\sin\left(\frac{k\Delta t\,v_{phase}}{2}\right)=S.   (39)

The dispersion error can then be defined as the deviation of the relative phase velocity (the ratio between the numerical phase velocity and the true velocity) from 1,

\delta:=\frac{v_{phase}}{v}-1=\frac{2}{kv\Delta t}\arcsin S-1.   (40)

The 1D analysis for the wave equation on a uniform grid can be found in Taflove and Hagness (2005, chapter 2). Indeed, Dablain (1986) shows that the use of a higher-order finite-difference scheme reduces the dispersion error, allowing simulations with fewer points per wavelength. For the 3D Maxwell equation, the mathematical derivation above gives a qualitative way to perform such an analysis; a numerical evaluation of equation (40) is sketched below. Unfortunately, this analysis becomes challenging on a non-uniform grid.
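As an illustration on a uniform grid, the 1D restriction of equations (38) and (40) can be evaluated numerically. The sketch below uses the standard 4th-order staggered-grid coefficients $c_{1}=9/8$, $c_{2}=-1/24$ ($r=2$); the velocity, grid spacing and time step are illustrative values only:

#include <stdio.h>
#include <math.h>

int main(void)
{
  const double c[2] = {9.0/8.0, -1.0/24.0};      /* standard staggered FD weights */
  const double pi = acos(-1.0);
  double v = 3000.0, dx = 100.0, dt = 0.5*dx/v;  /* illustrative, stability-safe values */

  for (double kdx = 0.1; kdx < pi; kdx += 0.1) { /* scan k*dx up to Nyquist */
    double k = kdx/dx, sum = 0.0;
    for (int i = 1; i <= 2; i++) sum += c[i-1]*sin((i - 0.5)*kdx);
    double S = v*dt*fabs(sum)/dx;                /* 1D version of equation (38) */
    double delta = 2.0/(k*v*dt)*asin(S) - 1.0;   /* dispersion error, equation (40) */
    printf("k*dx = %4.2f  dispersion error = %+e\n", kdx, delta);
  }
  return 0;
}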

References

  • Alumbaugh and Newman, (1997) Alumbaugh, D. and Newman, G. (1997). Three-dimensional massively parallel electromagnetic inversion-II. Analysis of a crosswell electromagnetic experiment. Geophysical Journal International, 128(2):355–363.
  • Björck and Pereyra, (1970) Björck, A. and Pereyra, V. (1970). Solution of Vandermonde systems of equations. Mathematics of computation, 24(112):893–903.
  • Castillo-Reyes et al., (2018) Castillo-Reyes, O., de la Puente, J., and Cela, J. M. (2018). PETGEM: A parallel code for 3D CSEM forward modeling using edge finite elements. Computers & Geosciences, 119:123–136.
  • Chang-Chun et al., (2015) Chang-Chun, Y., Xiu-Yan, R., Yun-He, L., Yan-Fu, Q., Chang-Kai, Q., and Jing, C. (2015). Review on airborne electromagnetic inverse theory and applications. Geophysics, 80(4):W17–W31.
  • Chave and Cox, (1982) Chave, A. D. and Cox, C. S. (1982). Controlled Electromagnetic Sources for Measuring Electrical Conductivity Beneath the Oceans 1. Forward Problem and Model Study. Journal of Geophysical Research, 87:5327–5338.
  • Chew and Weedon, (1994) Chew, W. C. and Weedon, W. H. (1994). A 3-D perfectly matched medium from modified Maxwell’s equations with stretched coordinates. Microwave and Optical Technology Letters, 7:599–604.
  • Cockett et al., (2015) Cockett, R., Kang, S., Heagy, L. J., Pidlisecky, A., and Oldenburg, D. W. (2015). SimPEG: An open source framework for simulation and gradient based parameter estimation in geophysical applications. Computers & Geosciences, 85:142–154.
  • Constable, (2010) Constable, S. (2010). Ten years of marine CSEM for hydrocarbon exploration. Geophysics, 75(5):75A67–75A81.
  • Constable et al., (1986) Constable, S. C., Cox, C. S., and Chave, A. D. (1986). Offshore electro-magnetic surveying techniques. In 56th Annual International Meeting, SEG, Expanded Abstracts, pages 81–82. Society of Exploration Geophysicists.
  • da Silva et al., (2012) da Silva, N. V., Morgan, J. V., MacGregor, L., and Warner, M. (2012). A finite element multifrontal method for 3D CSEM modeling in the frequency domain. Geophysics, 77(2):E101–E115.
  • Dablain, (1986) Dablain, M. (1986). The application of high order differencing for the scalar wave equation. Geophysics, 51:54–66.
  • Davydycheva et al., (2003) Davydycheva, S., Druskin, V., and Habashy, T. (2003). An efficient finite-difference scheme for electromagnetic logging in 3D anisotropic inhomogeneous media. Geophysics, 68(5):1525–1536.
  • Eidesmo et al., (2002) Eidesmo, T., Ellingsrud, S., MacGregor, L., Constable, S., Sinha, M., Johansen, S., Kong, F., and Westerdahl, H. (2002). Sea bed logging (SBL), a new method for remote and direct identification of hydrocarbon filled layers in deepwater areas. First break, 20(3).
  • Ellingsrud et al., (2002) Ellingsrud, S., Eidesmo, T., Johansen, S., Sinha, M., MacGregor, L., and Constable, S. (2002). Remote sensing of hydrocarbon layers by seabed logging (SBL): Results from a cruise offshore Angola. The Leading Edge, 21(10):972–982.
  • Fornberg, (1998) Fornberg, B. (1998). Classroom note: Calculation of weights in finite difference formulas. SIAM review, 40(3):685–691.
  • Golub, (1996) Golub, G. H. (1996). Matrix Computation, third edition. Johns Hopkins Studies in Mathematical Sciences.
  • Grayver et al., (2014) Grayver, A. V., Streich, R., and Ritter, O. (2014). 3D inversion and resolution analysis of land-based CSEM data from the Ketzin CO2 storage formation. Geophysics, 79(2):E101–E114.
  • Key, (2009) Key, K. (2009). 1D inversion of multicomponent, multifrequency marine csem data: Methodology and synthetic studies for resolving thin resistive layers. Geophysics, 74(2):F9–F20.
  • Key, (2016) Key, K. (2016). MARE2DEM: a 2-D inversion code for controlled-source electromagnetic and magnetotelluric data. Geophysical Journal International, 207(1):571–588.
  • Komatitsch and Martin, (2007) Komatitsch, D. and Martin, R. (2007). An unsplit convolutional perfectly matched layer improved at grazing incidence for the seismic wave equation. Geophysics, 72(5):SM155–SM167.
  • Li and Key, (2007) Li, Y. and Key, K. (2007). 2D marine controlled-source electromagnetic modeling: Part 1—an adaptive finite-element algorithm. Geophysics, 72(2):WA51–WA62.
  • Maaø, (2007) Maaø, F. (2007). Fast finite-difference time-domain modeling for marine subsurface electromagnetic problems. Geophysics, 72:A19–A23.
  • MacGregor et al., (2007) MacGregor, L., Barker, N., Overton, A., Moody, S., and Bodecott, D. (2007). Derisking exploration prospects using integrated seismic and electromagnetic data-A Falkland Islands case study. The Leading Edge, 26(3):356–359.
  • MacGregor and Tomlinson, (2014) MacGregor, L. and Tomlinson, J. (2014). Marine controlled-source electromagnetic methods in the hydrocarbon industry: A tutorial on method and practice. Interpretation, 2(3):SH13–SH32.
  • Meju, (2019) Meju, M. A. (2019). A simple geologic risk-tailored 3D controlled-source electromagnetic multiattribute analysis and quantitative interpretation approach. Geophysics, 84(3):E155–E171.
  • Micikevicius, (2009) Micikevicius, P. (2009). 3D finite difference computation on GPUs using CUDA. In Proceedings of 2nd workshop on general purpose processing on graphics processing units, pages 79–84.
  • Mittet, (2010) Mittet, R. (2010). High-order finite-difference simulations of marine CSEM surveys using a correspondence principle for wave and diffusion fields. Geophysics, 75(1):F33–F50.
  • Mittet, (2021) Mittet, R. (2021). Small-scale medium variations with high-order finite-difference and pseudospectral schemes. Geophysics, 86(5):T387–T399.
  • Mulder, (2006) Mulder, W. (2006). A multigrid solver for 3D electromagnetic diffusion. Geophysical prospecting, 54(5):633–649.
  • Newman and Alumbaugh, (1995) Newman, G. A. and Alumbaugh, D. L. (1995). Frequency-domain modelling of airborne electromagnetic responses using staggered finite differences. Geophysical Prospecting, 43(8):1021–1042.
  • Oristaglio and Hohmann, (1984) Oristaglio, M. L. and Hohmann, G. W. (1984). Diffusion of electromagnetic fields into a two-dimensional earth: A finite-difference approach. Geophysics, 49(7):870–894.
  • Rochlitz et al., (2021) Rochlitz, R., Seidel, M., and Börner, R.-U. (2021). Evaluation of three approaches for simulating 3-D time-domain electromagnetic data. Geophysical Journal International, 227(3):1980–1995.
  • Rochlitz et al., (2019) Rochlitz, R., Skibbe, N., and Günther, T. (2019). custEM: Customizable finite-element simulation of complex controlled-source electromagnetic data. Geophysics, 84(2):F17–F33.
  • Roden and Gedney, (2000) Roden, J. A. and Gedney, S. D. (2000). Convolution PML (CPML): An efficient FDTD implementation of the CFS-PML for arbitrary media. Microwave and Optical Technology Letters, 27(5):334–339.
  • Smith, (1996) Smith, J. T. (1996). Conservative modeling of 3-D electromagnetic fields, Part i: Properties and error analysis. Geophysics, 61(5):1308–1318.
  • Streich, (2009) Streich, R. (2009). 3D finite-difference frequency-domain modeling of controlled-source electromagnetic data: Direct solution and optimization for high accuracy. Geophysics, 74(5):F95–F105.
  • Taflove and Hagness, (2005) Taflove, A. and Hagness, S. C. (2005). Computational Electrodynamics: The Finite-Difference Time-Domain Method. Artech House, 3rd edition.
  • Wang and Hohmann, (1993) Wang, T. and Hohmann, G. W. (1993). A finite-difference, time-domain solution for three-dimensional electromagnetic modeling. Geophysics, 58(6):797–809.
  • Ward and Hohmann, (1988) Ward, S. H. and Hohmann, G. W. (1988). Electromagnetic theory for geophysical applications. In Electromagnetic Methods in Applied Geophysics: Volume 1, Theory, pages 130–311. Society of Exploration Geophysicists.
  • Werthmüller et al., (2019) Werthmüller, D., Mulder, W., and Slob, E. (2019). emg3d: A multigrid solver for 3D electromagnetic diffusion. Journal of Open Source Software, 4:1463.
  • Yang, (2021) Yang, P. (2021). Boost the efficiency of 3D CSEM modelling using graphics processing units. In 82nd EAGE Annual Conference & Exhibition, number 1, pages 1–5. European Association of Geoscientists & Engineers.
  • Yang et al., (2014) Yang, P., Gao, J., and Wang, B. (2014). RTM using effective boundary saving: A staggered grid GPU implementation. Computers & Geosciences, 68:64–72.
  • Yang et al., (2015) Yang, P., Gao, J., and Wang, B. (2015). A graphics processing unit implementation of time-domain full-waveform inversion. Geophysics, 80(3):F31–F39.
  • Yang and Mittet, (2023) Yang, P. and Mittet, R. (2023). Controlled-source electromagnetics modelling using high order finite-difference time-domain method on a nonuniform grid. Geophysics, 88(2):E56–E67.
  • Zhdanov and Keller, (1994) Zhdanov, M. S. and Keller, G. V. (1994). The geoelectrical methods in geophysical exploration, volume 31.