
MDA GAN: Adversarial-Learning-based 3-D Seismic Data Interpolation and Reconstruction for Complex Missing

Yimin Dou, Kewen Li, Hongjie Duan, Timing Li, Lin Dong, Zongchao Huang. The corresponding author is Kewen Li ([email protected]). Yimin Dou, Kewen Li, and Zongchao Huang are with the College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China. Timing Li is with the College of Intelligence and Computing, Tianjin University, Tianjin, China. Hongjie Duan is with Shengli Oilfield Company, SINOPEC, Dongying, China. Lin Dong is with the Center on Frontiers of Computing Studies, Peking University, Beijing, China. This work was supported by grants from the National Natural Science Foundation of China (Major Program, No. 51991365) and the Natural Science Foundation of Shandong Province, China (ZR2021MF082).
Abstract

The interpolation and reconstruction of missing traces is a crucial step in seismic data processing, and it is also a highly ill-posed problem, especially for complex cases such as high-ratio random discrete missing, continuous missing, and missing in fault-rich or salt-body surveys. These complex cases are rarely addressed in current works. To cope with them, we propose Multi-Dimensional Adversarial GAN (MDA GAN), a novel 3-D GAN framework. It preserves the anisotropy and spatial continuity of the data after reconstruction of complex 3-D missing regions by using three discriminators. A feature splicing module (FSM) is designed and embedded in the generator to retain more information from the input data. The Tanh cross entropy (TCE) loss is derived, which provides the generator with the optimal reconstruction gradient to make the generated data smoother and more continuous. We experimentally verified the effectiveness of the individual components of the study and then tested the method on multiple publicly available datasets. The method achieves reasonable reconstructions for up to 95% random discrete missing and 100 consecutive missing traces. In fault- and salt-body-enriched surveys, MDA GAN still yields promising results for complex cases. Experiments demonstrate that our method achieves better performance than other methods in both simple and complex cases. Moreover, our network does not require training separate weights for each survey; the same weights are applied to multiple surveys, significantly reducing time and computational costs. We make the model publicly available at https://github.com/douyimin/MDA_GAN.

Index Terms:
GAN, adversarial learning, seismic data interpolation, seismic data reconstruction, seismic complex missing

I Introduction

Complete seismic data are often difficult to acquire due to various constraints such as economic and physical factors. Reconstructing missing seismic data is a critical and challenging task. There are currently two major types of methods for the interpolation and reconstruction of seismic data: theory-driven and data-driven.

There are four classes of theory-driven methods. The first is prediction-filter-based methods [1, 2, 3, 4], which extend interpolation to the frequency-space (f-x) domain and exploit the predictability of linear events in that domain. Such methods must use a local windowing strategy when handling nonlinear events. The choice of window affects the performance of interpolation and reconstruction, and choosing the optimal window is currently unrealistic [5]. The second is the wave-equation-based approach [6, 7], which requires the subsurface velocity as a prior; however, it is difficult to obtain accurate velocity models in practical engineering, which limits its extension. The third is methods based on sparse constraints [8, 9, 10, 11], which assume that the seismic data are linear or quasi-linear; even though they can handle nonlinear data through parameter tuning, it is impractical to obtain optimal parameters, which affects their performance [12]. Finally, there are low-rank constraint methods [13, 14, 15, 16] based on compressed sensing; however, most of them are only applicable to random discrete missing.

Data-driven methods employ machine learning or deep learning and generally fall into two types: Auto-Encoder (AE) based and Generative Adversarial Network (GAN) based. AE-based methods include those using AE [5], CAE [17], UNet [18], ResNet [19], etc. AE methods use encoders to extract hidden variables from missing data, and the complete data are used to supervise the seismic data generated by the decoder. This leads to the data being learned point-to-point, which results in less comprehensive global information acquisition: global information is weak while local information is strong. Due to the lack of global information, performance is weak for large-scale missing, so most such methods are suitable for handling random discrete missing [20, 21, 22, 23, 24, 25]. Wu et al. introduced the attention mechanism to address the weak global information of the AE approach and achieved promising results in the case of continuous missing [21, 20]. He et al. used a multi-stage UNet to interpolate consecutive missing traces step by step [22].

In addition, there are GAN-based methods, but they are less studied. GAN adds a discriminator to the AE (generator), which introduces regional or global information, so it should have better performance, especially for continuous and large-scale missing [26]. Oliveira et al. achieved interpolation and reconstruction of the Netherlands Offshore F3 seismic data via cGANs [27]. These methods all employ L_{1} as the reconstruction loss of the generator, but GANs generally use Tanh as the activation function of the output layer to keep the value domain of the generator in [-1,1], and the gradient of L_{1} under this activation function is non-positively correlated with the loss (equations (8, 9, 10)), which may lead to point-to-point learning that cannot be optimized. Since natural images are mutually constrained by their three RGB channels and allow for diverse inpainting results, L_{1} is feasible as the reconstruction loss for natural image inpainting [28, 29, 30, 31]. However, seismic data have only a single channel, so these GAN-based interpolation results for seismic data show some obvious splicing traces; moreover, the interpolation and reconstruction of seismic data require restoring the original seismic data as faithfully as possible, and L_{1} causes the results to deviate from the optimal solution. Wei et al. changed the adversarial loss to the Wasserstein loss based on cGAN and achieved 2-D interpolation of up to 35 consecutive missing seismic traces [32]. Kaur et al. used GAN and CycleGAN to interpolate 2-D synthetic seismic data [33, 34]. Existing methods are limited applications of GAN to seismic interpolation, so there is no significant performance difference between current GAN-based and AE-based methods.
Moreover, most existing AE- and GAN-based methods focus only on repairing 2-D slices, but reconstructing 2-D slices one by one leads to discontinuities in the directly stacked section [12] when there are complex cases such as large-scale or continuous missing in shot gathers. It is therefore crucial to research 3-D interpolation and reconstruction approaches.

We suggest that introducing GANs into seismic interpolation requires addressing two major challenges. The first is how to design a reasonable GAN benchmark framework for 3-D seismic data interpolation and reconstruction that prevents mode collapse during training while ensuring that the anisotropy and spatial continuity of 3-D seismic data are preserved even in cases of complex missingness. The second is how to ensure that the generator performs accurate point-to-point learning without distortion of any pixel, even when the discriminator introduces global information and the Tanh activation function is used.

There is currently no GAN-based method for 3-D seismic interpolation [27, 32, 33, 34, 25]. The interpolation of 3-D seismic data needs to ensure the spatial continuity and reliability of the seismic data in three directions (timeline, inline, and crossline) while maintaining the anisotropy of the seismic data, especially when handling complex problems such as large-scale discrete missing or continuous missing. Moreover, compared with the reconstruction of 2-D data, the computational resources and parameters required for 3-D data increase significantly [35, 36]; in particular, the alternating training of generator and discriminator places harsh requirements on the hardware. The increase in parameters also makes mode collapse of the GAN more likely [37, 38], so it is challenging to design a GAN-based interpolation and reconstruction framework for 3-D seismic data.

We propose a Multi-Dimensional Adversarial GAN (MDA GAN) that uses three discriminators to extract regional or global features from the 3-D volume and from the timeline, inline, and crossline slices of the seismic data (inline and crossline share one discriminator, and the framework applies to both pre-stack and post-stack data) to maintain the anisotropy and spatial continuity of the reconstructed seismic data.

In response to the demanding hardware requirements imposed by 3-D data and the mode collapse they may cause, we design a 3-D data generator for the GAN that propagates high- and low-resolution features in parallel to prevent information loss, so excessive width is not required to retain image information; the low-resolution (large-scale) and high-resolution branches interact during propagation, increasing the receptive field and fully extracting global information, so the network does not need to be deeper [39]. Meanwhile, we propose the FSM and embed it in this generator to ensure the integrity of the non-missing parts during inference. The generator reduces the parameters and the load on the hardware while maintaining high-quality reconstruction.

To enable accurate point-to-point learning of seismic data in the GAN framework, we analyzed the gradients of the reconstruction losses commonly used today and derived the Tanh Cross Entropy (TCE) loss, which enables accurate pixel-level learning under the Tanh activation function for seismic data reconstruction by MDA GAN.

In general, our contributions include the following three points. First, a novel GAN framework, MDA GAN, is proposed to cope with more complex 3-D seismic missing cases and to ensure the anisotropy as well as the spatial continuity of the reconstructed 3-D data. Second, to cope with the hardware burden caused by the parameter proliferation of a 3-D GAN, we carefully designed the generator and proposed and embedded the FSM to accomplish end-to-end data generation and splicing. Finally, the TCE reconstruction loss is proposed so that the training process provides smoother gradients, enables accurate point-to-point learning, and ensures that the reconstructed data are distortion-free.

II Approach

In this section, we first introduce the baseline framework of MDA GAN, then illustrate the structure of the FSM and how it is embedded and functions in the generator, and finally discuss the seismic voxel (pixel)-level reconstruction loss, Tanh Cross Entropy (TCE), and analyze its training gradient to illustrate the advantages of the method.

II-A Baseline Framework

Figure 1: In (a), the framework consists of one 3-D generator, one 3-D discriminator and two 2-D discriminators. For training, the input to the 3-D network is data of size 128\times 128\times 128, and the batch size is b. To conserve RAM, each 2-D discriminator randomly draws 8 slices of 128\times 128 from the 3-D data along the corresponding direction as input, and its batch size is 8\times b. For inference, the input to the generator can be any size allowed by the hardware. (b) is the detailed structure of the generator in the framework; the discriminator follows the standard encoder structure. The CONV block consists of a 3\times 3 convolution, a normalization layer and a LeakyReLU activation function. Resblock was proposed by He et al. [40].

Our approach is based on GAN; in Appendix V we provide a basic explanation of GANs and of the current underlying frameworks for GAN-based image inpainting in computer vision.

The major disadvantage of current 2-D seismic interpolation is that reconstructing 2-D slices one by one leads to discontinuities in the directly stacked section [12] when there are complex cases such as large-scale or continuous missing in shot gathers. To solve this problem, we extend seismic interpolation and reconstruction to three dimensions.

The generator is a 3-D CNN; its input and output are 3-D data of the same size. The structure of the generator and its components is detailed in II-B.

The 3-D seismic data have obvious anisotropy: the seismic reflection axes are approximately perpendicular to the inline and crossline slices, and a clear layered seismic texture can be observed in these two directions, while the timeline direction shows irregular texture. Therefore, we use two 2-D discriminators \mathcal{D}^{m}_{\theta_{D}^{\text{xi}}} and \mathcal{D}^{m}_{\theta_{D}^{\text{t}}}: inline and crossline share discriminator \mathcal{D}^{m}_{\theta_{D}^{\text{xi}}}, and the timeline direction uses discriminator \mathcal{D}^{m}_{\theta_{D}^{\text{t}}}. 2-D discriminators are employed for each directional slice to maintain the anisotropy of the restored seismic data, and we also use a 3-D discriminator \mathcal{D}^{t}_{\theta_{D}} to ensure the continuity of the results in 3-D space.
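As a concrete illustration of how direction-specific batches could be fed to the 2-D discriminators, the following sketch (our own; function and variable names are illustrative, not from the paper's code) draws random 128\times 128 sections from a 3-D cube along a chosen axis:

```python
import numpy as np

def draw_slices(volume, axis, n_slices=8, rng=None):
    # Randomly draw 2-D sections from a 3-D volume along one axis
    # (axis 0/1/2 ~ timeline/inline/crossline for a (t, i, x) cube).
    rng = np.random.default_rng(rng)
    idx = rng.integers(0, volume.shape[axis], size=n_slices)
    return np.stack([np.take(volume, i, axis=axis) for i in idx])

volume = np.random.rand(128, 128, 128).astype(np.float32)
timeline_batch = draw_slices(volume, axis=0)  # batch for the timeline discriminator
inline_batch = draw_slices(volume, axis=1)    # batch for the inline/crossline discriminator
```

With a training batch of b volumes, stacking the 8 slices per volume yields the 8\times b batch described in Fig. 1-(a).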

The framework consists of a 3-D generator, two 2-D discriminators and a 3-D discriminator. Accordingly, we use three adversarial losses \mathcal{L}_{\mathcal{G}^{t}}, \mathcal{L}_{\mathcal{G}^{m}_{\text{xi}}} and \mathcal{L}_{\mathcal{G}^{m}_{\text{t}}}. In combination with the reconstruction loss (V-B), the loss of the supervised generator can be expressed as equation (1).

\mathcal{L}_{\mathcal{G}}=\mathcal{L}_{\mathcal{G}^{t}}+\mathcal{L}_{\mathcal{G}^{m}_{\text{xi}}}+\mathcal{L}_{\mathcal{G}^{m}_{\text{t}}}+\lambda\mathcal{L}_{\text{rec}} (1)

where \mathcal{L}_{\text{rec}} is the reconstruction loss, which we describe in detail in II-D, and \lambda is a scaling factor that adjusts the reconstruction loss to the same order of magnitude as the adversarial losses. Each adversarial loss is expressed as equation (2).

\mathcal{L}_{\text{adv}}=\text{log}(1-\mathcal{D}_{\theta_{D}}(\mathcal{G}_{\theta_{G}}(\textbf{I}_{m}))) (2)

where \textbf{I}_{m} is the missing input data. This loss drives the generator to fool the discriminator so that the generated data approximate the real data.

Each discriminator determines the authenticity of the generated tensor or matrix and is supervised with the cross-entropy loss, equation (3).

\mathcal{L}_{\mathcal{D}}=-\frac{1}{2}\text{log}(\mathcal{D}_{\theta_{D}}(\textbf{I}_{g}))-\frac{1}{2}\text{log}(1-\mathcal{D}_{\theta_{D}}(\mathcal{G}_{\theta_{G}}(\textbf{I}_{m}))) (3)
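A scalar sketch of these loss terms, assuming the standard non-saturating cross-entropy form for the discriminator; the helper names are ours, not the paper's:

```python
import math

def g_adv_loss(d_fake):
    # Eq. (2): minimizing log(1 - D(G(I_m))) pushes D(G(I_m)) toward 1,
    # i.e., the generator tries to fool the discriminator.
    return math.log(1.0 - d_fake)

def d_loss(d_real, d_fake):
    # Standard cross-entropy discriminator loss (cf. Eq. (3)):
    # real samples are pushed toward 1, generated samples toward 0.
    return -0.5 * math.log(d_real) - 0.5 * math.log(1.0 - d_fake)

def generator_total(adv_3d, adv_xi, adv_t, rec, lam):
    # Eq. (1): the three adversarial terms plus the scaled reconstruction loss.
    return adv_3d + adv_xi + adv_t + lam * rec
```

An undecided discriminator (outputs 0.5 everywhere) incurs a loss of log 2 per pair, the usual equilibrium value of the two-player game.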

For reconstructing high-quality seismic data, our approach employs adversarial learning in multiple dimensions, hence the term Multi-Dimensional Adversarial (MDA) GAN. The overall framework of MDA GAN is shown in Fig.1-(a).

II-B Generator

II-B1 Backbone Network

Traditional generators normally take the form of concatenating high-resolution features with low-resolution features, i.e., encoder-decoder structures, and related models are available for various vision tasks [28, 29, 30, 31, 41]. However, such methods downsample the features several times and restore the original resolution via the decoder, which causes information loss, while the process of restoring the resolution requires plenty of parameters and consumes considerable GPU memory. The idea of propagating high-resolution and low-resolution features in parallel was proposed by Wang and Sun et al. [42, 43], and our previous work demonstrated that networks of this structure can achieve higher performance with fewer hardware resources [39]. We designed the generator based on this structure, using fewer computational resources while ensuring the generation of high-quality data.

Resblock [40] is selected as the base block of the generator. The network is downsampled only twice, and then a low-resolution branch is added for parallel propagation, as shown in Fig. 1-(b). Because the network keeps the high-resolution features, we do not need excessive width and depth to recover from the low resolution, so we use fewer parameters and save considerable memory.
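A minimal 2-D numpy sketch of the parallel high/low-resolution propagation idea (our own simplification: plain pooling and nearest-neighbour upsampling stand in for the learned transition convolutions of the actual generator):

```python
import numpy as np

def avg_pool2(x):
    # 2x2 average pooling (H and W must be even).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    # Nearest-neighbour 2x upsampling.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def exchange(high, low):
    # Cross-resolution fusion: each branch receives the other's features, so
    # fine detail and large-receptive-field context propagate in parallel.
    return high + upsample2(low), low + avg_pool2(high)

high = np.random.rand(128, 128)   # high-resolution feature map (2-D for brevity)
low = avg_pool2(high)             # low-resolution branch created by downsampling
high, low = exchange(high, low)
```

Because the high-resolution path is never discarded, no deep decoder is needed to recover it, which is the parameter saving described above.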

II-B2 Feature Splicing Module

To adequately preserve the non-missing information of the input data \textbf{I}_{m}, many works splice \textbf{I}_{m} with the output \textbf{I}_{c} via a mask; the splicing process is expressed as equation (4).

\textbf{I}_{g}\approx\textbf{I}_{g}^{\prime}=(1-\mathcal{M})\times\textbf{I}_{c}+\mathcal{M}\times\textbf{I}_{m} (4)

where \textbf{I}_{g} is the ground truth and \textbf{I}_{g}^{\prime} is the spliced data, and \mathcal{M} is the mask, a matrix of the same size as the inputs in which 0 indicates missing data and 1 the opposite. However, for seismic data it is difficult to obtain masks in a single step (by thresholding), and some manual intervention or other algorithms are needed for splicing. Without splicing, the resulting (1-\mathcal{M})\times\textbf{I}_{c} differs from the original data (1-\mathcal{M})\times\textbf{I}_{g}, as shown in Fig. 9 (a-5), (b-5), (c-5). We want to find a solution that replaces the role of \mathcal{M} and forms end-to-end training and inference.
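Equation (4) can be sketched in a few lines of numpy (toy sizes; in practice \mathcal{M} must first be estimated, which is exactly the difficulty that motivates the FSM below):

```python
import numpy as np

def splice(completed, observed, mask):
    # Eq. (4): keep observed samples where mask == 1,
    # take the generator output where mask == 0 (missing).
    return (1 - mask) * completed + mask * observed

mask = np.ones((4, 4), dtype=np.float32)
mask[:, 1:3] = 0                          # two missing traces (columns 1 and 2)
observed = np.where(mask == 1, 5.0, 0.0)  # missing traces read as zeros
completed = np.full((4, 4), 7.0)          # stand-in for the generator output I_c
spliced = splice(completed, observed, mask)
```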

For this purpose, we present the FSM. Our approach is to splice the low-level features of the network (representing the original non-missing part) with the high-level features (representing the generated part) so that the network completes this process automatically, and then to reconstruct the spliced features to output the result. Existing work such as UNet [30] concatenates the high-level and bottom-level features and then fuses them with convolution, which does not express the splicing process explicitly; in the fusion, the missing and non-missing parts of the bottom-level features have the same weight, which causes the generated non-missing part (1-\mathcal{M})\times\textbf{I}_{c} to be blurred and to differ from the original data, making it necessary to splice the generated data with the original data manually.

The module can be represented by the following equations (5, 6, 7).

\mathcal{F}^{\text{branch}}=\textbf{F}^{\text{cat}}(\mathcal{F}^{l},\mathcal{F}^{h}) (5)
\mathcal{W}_{m}=\sigma(\textbf{F}^{\text{conv}}(\textbf{F}^{\text{conv}}(\mathcal{F}^{\text{branch}},3,1,2C_{1},C_{1}),1,1,C_{1},2C_{1})) (6)
\mathcal{F}^{\text{spl}}=\textbf{F}^{\text{conv}}(\mathcal{W}_{m}\odot\mathcal{F}^{\text{branch}},1,1,2C_{1},C_{1}) (7)

In equation (5), the high-level feature \mathcal{F}^{h} and the low-level feature \mathcal{F}^{l} are concatenated to obtain \mathcal{F}^{\text{branch}}, and the number of channels after concatenation changes from C_{1} to 2\times C_{1}. \mathcal{F}^{l} uses the features after the first convolution and \mathcal{F}^{h} uses the features before the last residual block, so both keep the original feature resolution. \mathcal{F}^{l} has not been downsampled or convolved multiple times, so it retains all the information of the original data, while in \mathcal{F}^{h} the missing parts have been filled in by the generator network.

We want to express the splicing process of the features explicitly so that the network automatically expresses equation (4). We compress and weight \mathcal{F}^{\text{branch}} by a convolution, then recover it to the original number of channels and restrict its value domain to [0,1] by the sigmoid activation function (\sigma(x_{i})=\frac{1}{1+\text{exp}(-x_{i})}) to obtain \mathcal{W}_{m}; this process is represented by equation (6).

After equation (7), the FSM can be updated via standard backpropagation, and our experiments show that the module converges \mathcal{W}_{m} so that it generates a feature mask resembling \mathcal{M} and applies it to the splicing of features. The entire process is shown in Fig. 2.
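A shape-level numpy sketch of equations (5)-(7), with every convolution simplified to a pointwise (1\times 1) channel mix implemented as a matrix multiply (the paper's FSM uses a 3\times 3 convolution for the first compression; the weights here are random stand-ins for learned parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pointwise_conv(x, weight):
    # A 1x1 convolution is a matrix multiply over the channel axis.
    # x: (C_in, N) with N flattened voxel positions; weight: (C_out, C_in).
    return weight @ x

def fsm(f_low, f_high, w_compress, w_expand, w_out):
    branch = np.concatenate([f_low, f_high], axis=0)  # Eq. (5): 2*C1 channels
    w_m = sigmoid(pointwise_conv(                     # Eq. (6): weights in [0, 1]
        pointwise_conv(branch, w_compress), w_expand))
    return pointwise_conv(w_m * branch, w_out)        # Eq. (7): back to C1 channels

rng = np.random.default_rng(0)
c1, n = 8, 64                             # C1 channels, 64 flattened voxels
f_low, f_high = rng.normal(size=(2, c1, n))
out = fsm(f_low, f_high,
          rng.normal(size=(c1, 2 * c1)),  # 2*C1 -> C1 (paper: 3x3 conv)
          rng.normal(size=(2 * c1, c1)),  # C1 -> 2*C1 (1x1 conv)
          rng.normal(size=(c1, 2 * c1)))  # 2*C1 -> C1 (1x1 conv)
```

The sigmoid bound on \mathcal{W}_{m} is what lets the learned weights behave like a soft mask over the concatenated branch.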

Figure 2: The FSM workflow. The FSM selects the spatial response of the splicing by obtaining the high-dimensional mapping of \mathcal{F}^{\text{branch}} to automate the splicing process.

We visualized \mathcal{W}_{m}, which has 2C_{1} channels and two components, \mathcal{W}_{l} (C_{1} channels) and \mathcal{W}_{h} (C_{1} channels), with \mathcal{W}_{l} corresponding to the \mathcal{F}^{l} part of \mathcal{F}^{\text{branch}} and \mathcal{W}_{h} corresponding to the \mathcal{F}^{h} part. Fig. 3 shows the weights of \mathcal{W}_{m} for inputs with different missing patterns.

Figure 3: The figure shows 2-D slices of size 128^{2} from 3-D volumes of size 128^{3}, displaying the five missing modes. The FSM generates mask-like heatmaps without any mask supervision.

The approach does not input any supervision about the mask (\mathcal{M}), but the network still adaptively generates a mask-like heatmap. The supervision of the reconstruction loss and the FSM force \mathcal{W}_{m} to form semantic weights automatically, and during training the network gradually learns which parts of the features should be extracted and retained.

II-C Discriminator

The discriminator network uses the structure of a standard encoder. It consists of five convolutional layers; each layer employs the LeakyReLU [37] activation function and incorporates spectral normalization [44] to stabilize GAN training.

II-D Tanh Cross Entropy Loss

Most current reconstruction losses are based on L_{1} and L_{2} [45, 46, 47, 48], where L_{1} provides stable gradients and L_{2} has a more stable solution, but both loss functions share the disadvantage of unsmooth gradients in GANs (equations (10) and (13)). Generator networks generally use Tanh as the final activation layer to keep the network's value domain at [-1,1] while ensuring training stability [37]. Next we derive the gradient expressions for L_{1} and L_{2} under the Tanh activation function.

First, we derive the gradient of the L_{1} loss backpropagated to the backbone.

\mathcal{L}_{L_{1}}=\dfrac{1}{N_{w}\times N_{h}\times N_{d}}\sum_{i}^{N_{w}}\sum_{j}^{N_{h}}\sum_{k}^{N_{d}}\mid y_{i,j,k}-\hat{y}_{i,j,k}\mid_{1} (8)
\hat{y}_{i}=\text{Tanh}(\hat{x})=\dfrac{e^{\hat{x}}-e^{-\hat{x}}}{e^{\hat{x}}+e^{-\hat{x}}},\quad\dfrac{d\hat{y}}{d\hat{x}}=1-\hat{y}^{2} (9)
\dfrac{\partial\mathcal{L}_{L_{1}}(y,\hat{y})}{\partial\hat{x}_{t}}=\dfrac{\partial\mathcal{L}_{L_{1}}(y,\hat{y})}{\partial\hat{y}_{t}}\cdot\dfrac{\partial\hat{y}_{t}}{\partial\hat{x}_{t}}=\begin{cases}1-\hat{y}_{t}^{2}&\text{if}\ \ y_{t}-\hat{y}_{t}\geq 0\\ \hat{y}_{t}^{2}-1&\text{if}\ \ y_{t}-\hat{y}_{t}<0\end{cases} (10)

The most reasonable relationship between \partial\mathcal{L}_{L_{1}}/\partial\hat{x}_{t} and \hat{y}_{t} is one that yields the minimum gradient (minimum absolute value) when \hat{y}_{t}=y_{t}. The farther apart \hat{y}_{t} and y_{t} are, the larger the gradient magnitude should be, and this increasing relationship should be linear, i.e., smooth: its slope should not change with \hat{y}_{t}. We define this optimal assumption as equation (11).

\dfrac{\partial\mathcal{L}(y,\hat{y})}{\partial\hat{x}_{t}}=y_{t}-\hat{y}_{t}=\varepsilon (11)

Fig. 4 shows a sharp fluctuation of the L_{1} gradient in the neighborhood of \hat{y}_{t}=y_{t}. Although it provides the correct gradient direction, which makes it trainable, its values are coarse and the increasing relationship is nonlinear, so it clearly does not satisfy the above optimality assumption.

Figure 4: Trend plot of \partial\mathcal{L}_{L_{1}}/\partial\hat{x}_{t} versus \hat{y}_{t} for the L_{1} loss. This figure is from equation (10).

Equations (12) and (13) give the gradient of L_{2} backpropagated to the backbone.

\mathcal{L}_{L_{2}}=\dfrac{1}{N_{w}\times N_{h}\times N_{d}}\sum_{i}^{N_{w}}\sum_{j}^{N_{h}}\sum_{k}^{N_{d}}\|y_{i,j,k}-\hat{y}_{i,j,k}\|^{2}_{2} (12)
\dfrac{\partial\mathcal{L}_{L_{2}}(y,\hat{y})}{\partial\hat{x}_{t}}=\dfrac{\partial\mathcal{L}_{L_{2}}(y,\hat{y})}{\partial\hat{y}_{t}}\cdot\dfrac{\partial\hat{y}_{t}}{\partial\hat{x}_{t}}=2(y_{t}-\hat{y}_{t})(1-\hat{y}_{t}^{2}) (13)
Figure 5: Trend plot of \partial\mathcal{L}_{L_{2}}/\partial\hat{x}_{t} versus \hat{y}_{t} for the L_{2} loss. This figure is from equation (13).

In Fig. 5, the three curves follow a relatively reasonable trend before reaching the extreme point: although not smooth, their absolute values are proportional to the distance from the point \hat{y}_{t}=y_{t}. However, beyond the extreme point, i.e., in the region marked by the red line segment, they show an unreasonable trend, and absolute values that should be large instead shrink. The L_{2} loss forms a parabola between \hat{y}_{t}=y_{t} and \hat{y}_{t}=1. Although this can still be trained and does not prevent gradient propagation, we hope to find a more reasonable loss-induced gradient.

Both the L_{1} and L_{2} losses are deficient in the form of the gradient passed to the backbone. We give the optimal gradient form assumption in equation (11). We want \varepsilon to be used directly as the gradient of the loss with respect to the backbone, and the resulting loss will satisfy the optimal gradient assumption, so we integrate it. The loss function we expect can then be expressed by equation (14).

\mathcal{L}_{\text{TCE}}(y_{t},\hat{y}_{t})=\int\varepsilon\cdot\dfrac{\partial\hat{x}_{t}}{\partial\hat{y}_{t}}d\hat{y}_{t}=\int\dfrac{y_{t}-\hat{y}_{t}}{1-\hat{y}_{t}^{2}}d\hat{y}_{t} (14)
=-y_{t}\left(\frac{\text{log}|\hat{y}_{t}+1|}{2}-\frac{\text{log}|\hat{y}_{t}-1|}{2}\right)-\frac{1}{2}\text{log}|1-\hat{y}_{t}^{2}|+\text{log}2+C^{\prime}
=-\left[\left(\frac{1+y_{t}}{2}\right)\text{log}\left(\frac{1+\hat{y}_{t}}{2}\right)+\left(\frac{1-y_{t}}{2}\right)\text{log}\left(\frac{1-\hat{y}_{t}}{2}\right)\right]

The loss should be averaged over all voxels during training, giving expression (15).

\mathcal{L}_{\text{TCE}}(y,\hat{y})=-\dfrac{1}{N_{w}\times N_{h}\times N_{d}}\sum_{i}^{N_{w}}\sum_{j}^{N_{h}}\sum_{k}^{N_{d}}\left[\left(\frac{1+y_{i,j,k}}{2}\right)\text{log}\left(\frac{1+\hat{y}_{i,j,k}}{2}\right)+\left(\frac{1-y_{i,j,k}}{2}\right)\text{log}\left(\frac{1-\hat{y}_{i,j,k}}{2}\right)\right] (15)

Its form is similar to cross-entropy, so we name it Tanh Cross Entropy (TCE). This loss yields the training gradient in equation (16).

\dfrac{\partial\mathcal{L}_{\text{TCE}}(y,\hat{y})}{\partial\hat{x}_{t}}=y_{t}-\hat{y}_{t}=\varepsilon (16)

In Fig. 6, TCE shows a smooth gradient fed back to the backbone. When \hat{y}_{t}=y_{t}, the gradient is 0, and the gradient magnitude is linear in the distance between \hat{y}_{t} and y_{t}, which is the optimal loss we want to obtain.
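The gradient behaviour derived in equations (10), (13) and (16) can be checked numerically with finite differences (a check of our own, not from the paper; note that with the loss written as in (14), the analytic gradient with respect to \hat{x} equals the residual \hat{y}-y, so gradient descent moves \hat{y} linearly toward y):

```python
import numpy as np

def tce(y, y_hat):
    # TCE loss for a single voxel (cf. Eq. (15)); minimized at y_hat == y.
    return -((1 + y) / 2 * np.log((1 + y_hat) / 2)
             + (1 - y) / 2 * np.log((1 - y_hat) / 2))

def grad_wrt_x(loss, y, x, eps=1e-6):
    # Central finite difference of loss(y, tanh(x)) w.r.t. the pre-activation x.
    return (loss(y, np.tanh(x + eps)) - loss(y, np.tanh(x - eps))) / (2 * eps)

y, x = 0.3, 0.8
y_hat = np.tanh(x)

g_tce = grad_wrt_x(tce, y, x)
residual = y_hat - y  # |g_tce| == |y - y_hat|: linear in the residual (Eq. (16))

g_l1 = grad_wrt_x(lambda y, t: abs(y - t), y, x)    # |g_l1| == 1 - y_hat**2, Eq. (10)
g_l2 = grad_wrt_x(lambda y, t: (y - t) ** 2, y, x)  # |g_l2| == 2|y - y_hat|(1 - y_hat**2), Eq. (13)
```

The L_{1} gradient magnitude depends only on \hat{y}, not on the residual, and the L_{2} magnitude is damped by the factor 1-\hat{y}^{2}, which is exactly the behaviour criticized above.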

Figure 6: Trend plot of \partial\mathcal{L}_{\text{TCE}}/\partial\hat{x}_{t} versus \hat{y}_{t} for the TCE loss. This figure is from equation (16).

Although L_{1} and L_{2} feed back the correct gradient direction to the generator, their gradient values are coarse and do not reasonably reflect the distance between the ground truth and the predicted value, while the gradient passed from TCE to the generator backbone is the residual between the ground truth and the prediction, which exactly captures the change in distance between the output and the target.

II-E Training

The overall training framework follows Fig. 1. The optimizer is Adam [49]. In general, the training of the discriminator should be ahead of the generator, which is more conducive to producing high-quality data. We use the TTUR strategy [50], setting the learning rate of the discriminators to \eta_{d}=0.0004 and that of the generator to \eta_{g}=0.0001, and alternate the training of the generator and discriminators.

III Experiments

III-A Experimental Grouping

We designed three control groups and one experimental group, using the currently most common UNet as the baseline model (Group 1); UNet has excellent performance in interpolating seismic data, as demonstrated in several works ([18, 20, 22], etc.). Group 2 replaces the backbone of Group 1 with the generator we propose in Fig. 1. Group 2 does not apply adversarial learning and therefore requires neither the Tanh activation function nor the TCE loss; compared with Group 1, this group verifies the validity of the generator and the FSM. Group 3 incorporates multi-dimensional adversarial learning and differs from the experimental group in that it uses L_{1} as the reconstruction loss; comparison with Group 2 verifies the effectiveness of multi-dimensional adversarial learning, and comparison with the experimental group verifies the effectiveness of the TCE loss. Group 4 is the experimental group, which uses exactly the method proposed in this paper. The settings of the four experimental groups are shown in Table I.

TABLE I: Experimental grouping
Method Explain
Group 1 UNet UNet-based regression, baseline model
Group 2 MDA Generator Regression by generators only
Group 3 MDA GAN (L_{1}) without TCE, with L_{1} loss
Group 4 MDA GAN (TCE) Our method

III-B Dataset

The experimental approach of most current data-driven seismic interpolation studies is to train a new set of weights for each survey [5, 17, 18, 19, 20, 21, 22, 25, 27, 33, 34], but retraining the neural network is extremely time-consuming and resource-intensive, resulting in a much higher time and resource cost than theory-driven approaches. Furthermore, the traces of the gathers are often highly similar to each other, so neural network weights trained for a specific survey may be overfitted; this overfitting tends to show extremely high performance on a test set drawn from the same survey, so this way of dividing the dataset may not objectively reflect the performance of the model.

In this study, we want to train a general model that can be applied to the reconstruction of most seismic data without retraining new weights for different surveys. This is the current trend in AI development: CV and NLP scientists are working on building general models for various tasks [51, 52]. We used multiple surveys (both pre-stack and post-stack) to jointly train one model in the expectation that it will generalize to more data, and we have made the final model publicly available on GitHub (https://github.com/douyimin/MDA_GAN).

The usage of the data is shown in Table II.

TABLE II: Data Usage
Survey                             Purpose
SEG C3                             3/5 train, 1/5 val, 1/5 test
Mobil Avo Viking Graben Line 12    3/5 train, 1/5 val, 1/5 test
F3 Netherlands                     3/5 train, 1/5 val, 1/5 test
New Zealand Kerry                  3/5 train, 1/5 val, 1/5 test
New Zealand Parihaka               3/5 train, 1/5 val, 1/5 test
New Zealand Opunake                All train
Sinopec surveys                    All train

All training data are cropped to 128×128×128; because the generator is fully convolutional, the validation and test sets can be of arbitrary size. During training, 30%–99% of the slices in each training cuboid are randomly removed to form the input data, with the original data used as supervision, so the whole process requires no manual annotation. The training batch size is 4, and the model is trained on two RTX 3090Ti GPUs for 500,000 steps. The model with the highest SSIM (details in III-C) on the validation set was selected as the final model and tested.

Next, we present and analyze the qualitative and quantitative results on the test set of each survey separately, and then report the combined quantitative metrics over the test sets of all surveys. The trace deletions in the test data were all performed with Python's random library, and all random seeds were set to '2022'.
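As a concrete illustration of this protocol, the sketch below zeroes out a random subset of traces in a 3-D cube using Python's random library with a fixed seed. The function and variable names are hypothetical, written only for illustration; they are not the authors' released code.

```python
import random
import numpy as np

def remove_random_traces(cube, missing_ratio, seed=2022):
    """Zero out a random subset of traces in a 3-D seismic cube.

    `cube` has shape (inline, crossline, time); a trace is the full time
    series at one (inline, crossline) position.  Returns the corrupted
    cube and the binary mask (1 = kept, 0 = removed).
    """
    rng = random.Random(seed)  # the paper fixes all random seeds to 2022
    n_il, n_xl, _ = cube.shape
    positions = [(i, j) for i in range(n_il) for j in range(n_xl)]
    removed = rng.sample(positions, int(missing_ratio * len(positions)))

    mask = np.ones_like(cube)
    for i, j in removed:
        mask[i, j, :] = 0.0  # a missing trace is zeroed for all time samples
    return cube * mask, mask

# Example: 95% discrete missing on a toy 16x16x32 cube
cube = np.random.rand(16, 16, 32).astype(np.float32)
corrupted, mask = remove_random_traces(cube, 0.95)
```

Because the mask zeroes whole traces, the missing pattern is constant along the time axis, matching the discrete trace-missing cases in the experiments.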

III-C Evaluation Metrics

All experiments in this paper were evaluated quantitatively using the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) metrics [53], with all data normalized to [0,1] prior to evaluation. To facilitate comparison and reproduction, the quantitative metrics are calculated directly with the public library scikit-image's implementations of PSNR and SSIM (available at https://scikit-image.org), with all SSIM hyperparameters left at their defaults.

III-C1 PSNR

PSNR is one of the most common methods for image quality assessment and is expressed as equation (17).

\textbf{\text{F}}_{\text{PSNR}}(\textbf{I}_{g}^{\prime},\textbf{I}_{g})=10\cdot\text{log}_{10}\left(\frac{\textbf{\text{F}}_{\text{max}}(\textbf{I}_{g})^{2}}{\textbf{\text{F}}_{\text{mse}}(\textbf{I}_{g}^{\prime},\textbf{I}_{g})}\right) \quad (17)

where \textbf{\text{F}}_{\text{max}}(\cdot) returns the maximum value of the data, and \textbf{\text{F}}_{\text{mse}}(\cdot) is the Mean Square Error (MSE) between the two images.
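Equation (17) maps directly to a few lines of NumPy. The sketch below (with hypothetical names) follows the formula term by term, taking the peak from the reference data as F_max does; since the paper normalizes all data to [0, 1], the peak is effectively 1.

```python
import numpy as np

def psnr(recon, ref):
    """Evaluate equation (17): 10 * log10(peak^2 / MSE).

    `ref` is the ground truth I_g, `recon` is the reconstruction I_g';
    the peak F_max is taken from the reference data.
    """
    mse = np.mean((recon - ref) ** 2)          # F_mse(I_g', I_g)
    return 10.0 * np.log10(ref.max() ** 2 / mse)

ref = np.zeros((4, 4))
ref[0, 0] = 1.0                     # peak value 1
recon = ref + 0.1                   # uniform 0.1 error -> MSE = 0.01
print(round(psnr(recon, ref), 2))   # -> 20.0
```

With data normalized to [0, 1], this matches scikit-image's `peak_signal_noise_ratio` when its `data_range` equals the reference maximum.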

III-C2 SSIM

Unlike PSNR's point-to-point pixel comparison, SSIM takes inter-pixel dependence into account and tends to evaluate the spatial similarity of two images. It is expressed as follows.

\textbf{\text{F}}_{\text{SSIM}}(\textbf{I}_{g}^{\prime},\textbf{I}_{g})=\frac{(2\mu_{\textbf{I}_{g}^{\prime}}\mu_{\textbf{I}_{g}}+c_{1})(2\sigma_{\textbf{I}_{g}^{\prime}\textbf{I}_{g}}+c_{2})}{(\mu_{\textbf{I}_{g}^{\prime}}^{2}+\mu_{\textbf{I}_{g}}^{2}+c_{1})(\sigma_{\textbf{I}_{g}^{\prime}}^{2}+\sigma_{\textbf{I}_{g}}^{2}+c_{2})} \quad (18)

where \mu_{\textbf{I}_{g}^{\prime}} and \mu_{\textbf{I}_{g}} are the means of \textbf{I}_{g}^{\prime} and \textbf{I}_{g}, \sigma_{\textbf{I}_{g}^{\prime}}^{2} and \sigma_{\textbf{I}_{g}}^{2} are their variances, \sigma_{\textbf{I}_{g}^{\prime}\textbf{I}_{g}} is the covariance of \textbf{I}_{g}^{\prime} and \textbf{I}_{g}, and c_{1} and c_{2} are two variables that stabilize the division when the denominator is weak; see [53] for details.

In scikit-image, the equation is implemented with a sliding window of size 7×7×7, i.e., the local SSIM is calculated within each window to obtain an SSIM map. Averaging the SSIM map gives the final SSIM value, while visualizing the SSIM map reveals the spatial distribution of differences between the two data volumes. In our experiments, we use SSIM to quantify the results and visualize the SSIM map to clearly show the spatial distribution of the reconstruction differences.
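As a sketch of equation (18), the function below evaluates the SSIM formula once over an entire array. This global version is illustrative only: scikit-image instead applies the same formula inside 7×7×7 windows and averages the resulting SSIM map. The constants c1 and c2 use the common (0.01L)² and (0.03L)² defaults from [53]; all names here are hypothetical.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Evaluate equation (18) once over the whole array (no sliding window)."""
    c1 = (0.01 * data_range) ** 2   # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizer for the contrast term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()  # covariance sigma_xy
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

x = np.linspace(0.0, 1.0, 64).reshape(4, 4, 4)
print(ssim_global(x, x))   # identical arrays give SSIM = 1.0
```

Replacing the global means and variances with windowed ones, and averaging, recovers the scikit-image behaviour described above.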

III-D Test on SEG C3

In SEG C3, we stack the gathers into a 3-D volume; as shown in Table II, 3/5 of the data were used for training, 1/5 for validation, and the last 1/5 for testing. Other surveys were, of course, also involved in the training.

We present results on a sample of 192×240×64 grid points (Fig. 7), obtained by interpolating (resizing) the original 168×205×31 data.

Refer to caption
Figure 7: SEG C3 test data.

III-D1 Discrete missing

Most current studies cap the missing ratio at 50%. Since this work aims to address more complex missing cases, we use 50% as the baseline and gradually expand the missing percentage to 95% (50%, 75%, 90%, 95%). Fig. 8 shows the qualitative results of each experimental group under random discrete missing, along with visualizations of the quantitative results as SSIM maps. Table III reports the SSIM and PSNR values on this test data.

In the case of 50% random missing, all four groups demonstrated promising results. When the missing percentage was expanded to 75%, UNet's interpolation results began to show discontinuities, and its SSIM map showed a clear region of discrepancy, which also appeared at the same position in Group 2. When the missing percentage grew to 90%, only Groups 3 and 4, which use multidimensional adversarial learning, completed reliable interpolation; the regression-based approach struggles at such high missing levels.

With an extremely high missing percentage of 95%, the regression methods of Groups 1 and 2 did not work at all, while interpolation was accomplished by the multidimensional adversarial Groups 3 and 4, which differ only in their reconstruction losses. Qualitatively, the results using TCE are smoother. Quantitatively, the L1 method has a higher SSIM value, while TCE shows a higher PSNR. As Fig. 8 (a-3), (b-3), (c-3), and (d-3) show, L1 also produces discontinuities and distortions at the other missing ratios, a phenomenon also present in many GAN-based works [34, 27]. As analyzed in Section II-D, the gradient that L1 provides to the GAN is not optimal, which leads to some distortion in the interpolation results. TCE compensates for this well and outperforms the currently most common L1 loss in most qualitative and quantitative experiments (except the SSIM metric under 95% missing).

Refer to caption
Figure 8: (a)–(d) show the 50%, 75%, 90%, and 95% discrete missing cases, respectively. Within each case, panels 1–4 are the interpolation results of UNet, MDA Generator, MDA GAN (L1), and MDA GAN, respectively, and panels 5–8 are the corresponding SSIM maps.
TABLE III: SSIM and PSNR on SEG C3, Discrete Missing
Metric  Method           50%     75%     90%     95%
SSIM    UNet             0.9739  0.9356  0.8599  0.8189
        MDA Generator    0.9711  0.9490  0.8803  0.8377
        MDA GAN (L1)     0.9772  0.9612  0.9474  0.8901
        MDA GAN (TCE)    0.9847  0.9651  0.9465  0.8854
PSNR    UNet             36.88   31.20   26.05   22.22
        MDA Generator    36.69   31.79   26.19   22.66
        MDA GAN (L1)     37.02   32.55   30.21   25.96
        MDA GAN (TCE)    37.14   32.81   31.00   26.29
Refer to caption
Figure 9: (a)–(c) show the cases of 40, 70, and 100 consecutive missing traces, respectively. Within each case, panels 1–4 are the interpolation results of UNet, MDA Generator, MDA GAN (L1), and MDA GAN, respectively, and panels 5–8 are the corresponding SSIM maps.

III-D2 Continuous missing

Next, we compare the performance of the four methods in the continuous missing case. It should be noted that the best published record for reconstructing consecutively missing seismic traces is held by Yu et al. [20], who interpolated 30% consecutively missing traces in a 128×128 2-D seismic image, i.e., an interpolated area of 128×38, or 38 traces.

Fig. 9 illustrates three continuous missing cases of 40, 70, and 100 traces. Under continuous missing, Groups 1 and 2 show unsatisfactory reconstructions with significant voids; it can be observed that the regression method merely extends the data in the neighbourhood of the missing region and is limited in scope. Table IV likewise shows that the quantitative metrics of the MDA-based methods are significantly higher than those of the other methods.

MDA GAN completes the reconstruction of the continuous missing region. This experiment also shows the effectiveness of TCE: the L1 interpolation result has obvious splicing traces and discontinuities at the red arrow, while the TCE result is smoother and more continuous.

It is worth noting that the reconstruction results of UNet in the red box of Fig. 9 differ significantly from the ground truth, while the models using the FSM show little or no difference. This is precisely because the FSM in the generator comes into play: as described in Section II-B and Fig. 3, the FSM retains more information about the unmissing part (1-\mathcal{M})\times\textbf{I}_{c}.

TABLE IV: SSIM and PSNR on SEG C3, Continuous Missing
Metric  Method           40 traces  70 traces  100 traces
SSIM    UNet             0.9419     0.9122     0.8833
        MDA Generator    0.9522     0.9248     0.9007
        MDA GAN (L1)     0.9799     0.9552     0.9225
        MDA GAN (TCE)    0.9789     0.9596     0.9188
PSNR    UNet             30.18      25.94      22.35
        MDA Generator    32.00      27.77      23.63
        MDA GAN (L1)     34.60      30.81      26.90
        MDA GAN (TCE)    34.78      31.11      26.85

III-E Test on Mobil Avo Viking Graben Line 12

We fold the prestack gathers into a 3-D volume and intercept 512×256×128 grid points from the test data as a sample for qualitative analysis, as shown in Fig. 10.

Refer to caption
Figure 10: Mobil Avo Viking Graben Line 12 test data

Fig. 11 shows the qualitative interpolation results of MDA GAN and UNet, and Table V shows the corresponding quantitative results.

At 50% discrete missing, both MDA GAN and UNet perform the interpolation task well. When the ratio reaches 75%, UNet starts to be overwhelmed, while our method remains highly reliable. With the more complex 90% and 95% missing, UNet fails, while our method still yields promising results. In the continuous missing cases, all of UNet's interpolation results are failures. Fig. 11 shows that the qualitative results of our method in complex cases are significantly better than those of UNet, currently the most widely used method, and the quantitative results in Table V support the same conclusion.

Refer to caption
Figure 11: (a-1)-(a-4) correspond to 50%, 75%, 90% and 95% discrete missing, respectively, (a-5)-(a-7) correspond to 40, 70 and 100 consecutive missing, respectively. (b-1)-(b-7) are the interpolation results of UNet for various cases. (c-1)-(c-7) are the interpolation results of MDA GAN for various cases.
TABLE V: SSIM and PSNR on Mobil Avo Viking Graben Line 12
Discrete missing ratio        50%     75%     90%     95%
SSIM  UNet                    0.9677  0.9531  0.9060  0.8583
      MDA GAN (TCE)           0.9885  0.9766  0.9419  0.9088
PSNR  UNet                    35.09   30.14   23.76   22.00
      MDA GAN (TCE)           36.11   31.59   27.44   25.93
Continuous missing traces     40      70      100
SSIM  UNet                    0.9627  0.9541  0.9349
      MDA GAN (TCE)           0.9845  0.9682  0.9510
PSNR  UNet                    35.02   28.82   28.43
      MDA GAN (TCE)           35.19   30.33   29.24

III-F Test on New Zealand Kerry and F3 Netherlands

This work focuses on the interpolation and reconstruction of complex cases. The experiments above validate the reconstruction performance of our method under complex missing on sets of prestack gathers. This subsection verifies the performance of our method on complex surveys rich in anomalies such as faults and salt bodies.

In engineering practice, missing traces often occur in shot gathers, and anomalous bodies also appear in the brute stack sections [12] formed from shot gathers. However, post-stack data containing anomalous bodies are more abundant among the currently public data, so we use post-stack data as the test data in this subsection.

III-F1 New Zealand Kerry (fault-rich)

Kerry contains abundant faults perpendicular to the timeline, which poses a great challenge for the interpolation task. Fig. 12 (a) shows the original data, sampled at a total of 224×512×192 grid points. Because the data available for Kerry are limited, part of the data used for this presentation was involved in training (marked in Fig. 12 (a)).

In Fig. 12 (b), we removed 80% of the slices in both the inline and crossline directions, accumulating 95.62% missing voxels. Because of the large proportion of missing traces, some areas form continuous gaps, and most of the faults are covered by the missing traces.

UNet fails to reconstruct most of the missing faults, leaving voids at the yellow arrows in (c), while at the corresponding positions MDA GAN fills these voids as faults. In the region marked by the red arrow in (d), MDA GAN restores the amplitude but loses the fault information, because the continuous missing there removes the fault completely; wherever a portion of the fault information is retained, MDA GAN shows very high interpolation performance.

Refer to caption
Figure 12: (a) New Zealand Kerry original data, (b) 80% of traces missing in both the inline and crossline directions, (c) UNet interpolation results, (d) MDA GAN interpolation results.

III-F2 F3 Netherlands (salt bodies)

Salt bodies are common in marine seismic data; they have closed shapes and distinct boundaries. We used the salt body at the bottom of the F3 survey, sampled at a total of 128×384×384 grid points, as qualitative data; all of it served as validation and test data and was not involved in training.

Fig. 13 (a) shows the original data. In (b), we removed 80% of the slices in both the inline and crossline directions, accumulating 95.76% missing voxels. In (c), the internal structure of the salt body is lost at (1) of the UNet reconstruction, and the boundary information of the salt body is not reconstructed at (2), while the corresponding positions in (d) show high-quality reconstructions. MDA GAN demonstrates excellent reconstruction performance for anomalies such as faults and salt bodies.

Refer to caption
Figure 13: (a) F3 Netherlands original data, (b) 80% of traces missing in both the inline and crossline directions, (c) UNet interpolation results, (d) MDA GAN interpolation results.

III-G Quantitative metrics for the total test set

Finally, we calculate the SSIM and PSNR metrics jointly over all the test sets listed in Table II. Because the sizes of the test portions differ across surveys, we calculate only the proportional discrete missing cases; the results are shown in Table VI. For a more visual analysis of the performance of each experimental group, we plot Table VI as Fig. 14.

In Fig. 14 (a), the MDA-based metrics are remarkably higher than those of the other two groups, indicating the effectiveness of multidimensional adversarial learning. The L1 and TCE losses do not differ significantly in SSIM, with L1 slightly higher than TCE, which is one reason L1 is currently so widely used as a reconstruction loss; however, the experiments above show that the qualitative results of the TCE loss are better (Fig. 8 and 9), providing smoother and more continuous reconstructions. In (b), the PSNR of MDA GAN (TCE) is higher than that of the other groups. Similar to (a), there is no significant difference between TCE and L1, with TCE slightly higher in PSNR. This may be because PSNR is more sensitive to pixel-wise differences, while SSIM tends to reflect structural differences within regions.

TABLE VI: SSIM and PSNR on the total test set
Metric  Method           50%     75%     90%     95%
SSIM    UNet             0.9588  0.9217  0.8801  0.8701
        MDA Generator    0.9620  0.9355  0.8980  0.8789
        MDA GAN (L1)     0.9673  0.9501  0.9289  0.9179
        MDA GAN (TCE)    0.9700  0.9528  0.9277  0.9137
PSNR    UNet             34.61   29.36   23.50   22.29
        MDA Generator    34.72   30.45   24.38   23.01
        MDA GAN (L1)     34.88   31.93   26.11   25.84
        MDA GAN (TCE)    35.37   32.49   27.25   26.12
Refer to caption
Figure 14: (a) SSIM on test set (b) PSNR on test set

IV Conclusion

This work focuses on the interpolation of 3-D seismic data in complex cases and proposes a multidimensional adversarial (MDA) seismic reconstruction framework together with the TCE reconstruction loss. The MDA framework emphasizes maintaining the anisotropy and spatial continuity of the reconstructed seismic data in all three directions, and thus obtains promising results in multiple complex cases. The feature splicing module (FSM) embedded in the generator enables the reconstructed data to retain more information about the unmissing parts. The TCE loss is mathematically derived to provide the generator with the optimal reconstruction gradient under the tanh activation function, ensuring that the reconstructed pixels are free of distortion. Qualitative experiments demonstrate the effectiveness of the MDA framework: MDA GAN accomplishes promising reconstructions under large proportions of discrete and continuous missing, as well as missing in fault- and salt-enriched data. SSIM visualizations show that the FSM indeed retains more of the original input information, and multiple qualitative experiments demonstrate that the Tanh cross entropy (TCE) loss generates smoother and more continuous reconstructions. Quantitative experiments show that MDA GAN reconstructs significantly better across multiple surveys than the baseline model and the ablation comparison groups. On the test set comprising multiple surveys, MDA GAN outperformed the baseline by 0.03, 0.05, and 0.04 in SSIM and by 3.13, 3.75, and 3.83 in PSNR under large missing proportions of 75%, 90%, and 95%, respectively. In the simpler 50% missing case, our method still holds the advantage.

V Appendix

V-A Generative Adversarial Network

Generative Adversarial Networks (GANs) were proposed by Goodfellow et al. [54]. A GAN consists of a generator network \mathcal{G}_{\theta_{G}} and a discriminator network \mathcal{D}_{\theta_{D}}. The task of the generator is to yield images x\in\mathcal{R}^{N_{w}\times N_{h}} from a latent noise vector z\in\mathcal{R}^{d} sampled from a known prior distribution, e.g., z\sim\mathcal{U}[-1,1]^{d} [54]. The task of the discriminator is to distinguish real images from generated ones. The generator and the discriminator play a zero-sum game in which the two networks learn from each other; the data produced by the generator become increasingly close to the real data, so that the desired data can be generated. The objective function of a GAN can be expressed as equation (19).

\mathop{\text{min}}\limits_{\mathcal{G}_{\theta_{G}}}\mathop{\text{max}}\limits_{\mathcal{D}_{\theta_{D}}}V(\mathcal{G}_{\theta_{G}},\mathcal{D}_{\theta_{D}})=\mathbb{E}_{x\sim p_{data}(x)}[\text{log}\mathcal{D}_{\theta_{D}}(x)]+\mathbb{E}_{z\sim p_{z}(z)}[\text{log}(1-\mathcal{D}_{\theta_{D}}(\mathcal{G}_{\theta_{G}}(z)))] \quad (19)

where p_{data}(x) represents the distribution of the training data, p_{z}(z) represents the prior distribution of the noise, and \mathcal{D}_{\theta_{D}}(\mathcal{G}_{\theta_{G}}(z)) is the discriminator's output on generated data. \mathcal{G}_{\theta_{G}} expects \mathcal{D}_{\theta_{D}}(\mathcal{G}_{\theta_{G}}(z)) to be as large as possible, which makes V(\mathcal{G}_{\theta_{G}},\mathcal{D}_{\theta_{D}}) smaller, so the equation is minimized with respect to \mathcal{G}_{\theta_{G}}. As the discriminator improves, \mathcal{D}_{\theta_{D}}(x) grows and \mathcal{D}_{\theta_{D}}(\mathcal{G}_{\theta_{G}}(z)) shrinks, making V(\mathcal{G}_{\theta_{G}},\mathcal{D}_{\theta_{D}}) larger, so the equation is maximized with respect to \mathcal{D}_{\theta_{D}}.
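The value function in equation (19) can be made concrete with a small NumPy sketch; all names are hypothetical, and the batch expectations are replaced by sample means.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Value V(G, D) of equation (19) for batches of discriminator outputs.

    `d_real` holds D(x) on real samples and `d_fake` holds D(G(z)) on
    generated samples, both in (0, 1).  D maximizes this value; G minimizes
    it (in practice G often uses the non-saturating surrogate
    -log D(G(z)), also from [54]).
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator attains a higher value than one the
# generator has fooled:
confident = gan_value(np.array([0.9]), np.array([0.1]))  # 2*log(0.9)
fooled = gan_value(np.array([0.5]), np.array([0.5]))     # 2*log(0.5)
```

The ordering of the two values illustrates why the discriminator's updates push V up while the generator's push it down.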

V-B Image Inpainting via GAN

The key to applying GANs to image inpainting is to add a discriminator and an adversarial loss on top of encoder-based regression [28, 29, 30, 31]. Existing image inpainting work targets 2-D images; its loss can be expressed as equation (20), and the basic framework is shown in Fig. 15.

\mathcal{L}_{\mathcal{G}}=\mathcal{L}_{\text{adv}}+\lambda\mathcal{L}_{\text{rec}} \quad (20)

where \mathcal{L}_{\text{adv}} is the adversarial loss, commonly that of CE-GAN [54], LS-GAN [55], or W-GAN [56], and \mathcal{L}_{\text{rec}} is the reconstruction (regression) loss, weighted by \lambda, for which L1 and L2 are generally used [28, 29, 30, 31].
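Equation (20) can be sketched as follows, assuming a non-saturating adversarial term and an L1 reconstruction term; the weight value and all names here are illustrative, not the settings of any cited work.

```python
import numpy as np

def generator_loss(d_fake, recon, target, lam=10.0):
    """Equation (20): adversarial term plus a weighted reconstruction term.

    The adversarial part here is the non-saturating -log D(G(z)); the
    reconstruction part is an L1 loss; `lam` balances the two terms.
    """
    l_adv = -np.mean(np.log(d_fake))         # adversarial loss L_adv
    l_rec = np.mean(np.abs(recon - target))  # L1 reconstruction loss L_rec
    return l_adv + lam * l_rec

d_fake = np.array([0.5])        # discriminator output on the inpainting
recon = np.full((8, 8), 0.6)    # reconstruction with constant 0.1 error
target = np.full((8, 8), 0.5)
loss = generator_loss(d_fake, recon, target)
```

Increasing `lam` biases training toward pixel fidelity, while decreasing it biases training toward fooling the discriminator, which is the trade-off this framework balances.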

Refer to caption
Figure 15: The baseline framework for GAN-based image inpainting. The network consists of an autoencoder and a discriminator; the loss consists of a reconstruction loss and an adversarial loss.

Most of the current GAN-based image inpainting follows this framework.

Acknowledgment

The authors are very indebted to the anonymous referees for their critical comments and suggestions for the improvement of this paper.

References

  • [1] S. Spitz, “Seismic trace interpolation in the fx domain,” Geophysics, vol. 56, no. 6, pp. 785–794, 1991.
  • [2] M. J. Porsani, “Seismic trace interpolation using half-step prediction filters,” Geophysics, vol. 64, no. 5, pp. 1461–1467, 1999.
  • [3] N. Gülünay, “Seismic trace interpolation in the fourier transform domain,” Geophysics, vol. 68, no. 1, pp. 355–369, 2003.
  • [4] M. Naghizadeh and M. D. Sacchi, “Multistep autoregressive reconstruction of seismic records,” Geophysics, vol. 72, no. 6, pp. V111–V118, 2007.
  • [5] B. Wang, N. Zhang, W. Lu, and J. Wang, “Deep-learning-based seismic data interpolation: A preliminary result,” Geophysics, vol. 84, no. 1, pp. V11–V20, 2019.
  • [6] J. Ronen, “Wave-equation trace interpolation,” Geophysics, vol. 52, no. 7, pp. 973–984, 1987.
  • [7] S. Fomel, “Seismic reflection data interpolation with differential offset and shot continuation,” Geophysics, vol. 68, no. 2, pp. 733–744, 2003.
  • [8] M. D. Sacchi, T. J. Ulrych, and C. J. Walker, “Interpolation and extrapolation using a high-resolution discrete fourier transform,” IEEE Transactions on Signal Processing, vol. 46, no. 1, pp. 31–38, 1998.
  • [9] J. Wang, M. Ng, and M. Perz, “Seismic data interpolation by greedy local radon transform,” Geophysics, vol. 75, no. 6, pp. WB225–WB234, 2010.
  • [10] A. Latif and W. A. Mousa, “An efficient undersampled high-resolution radon transform for exploration seismic data processing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 1010–1024, 2016.
  • [11] W. Liu, S. Cao, and Y. Chen, “Seismic time–frequency analysis via empirical wavelet transform,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 1, pp. 28–32, 2015.
  • [12] X. Niu, L. Fu, W. Zhang, and Y. Li, “Seismic data interpolation based on simultaneously sparse and low-rank matrix recovery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2021.
  • [13] V. Oropeza and M. Sacchi, “Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis,” Geophysics, vol. 76, no. 3, pp. V25–V32, 2011.
  • [14] W. Huang, R. Wang, Y. Chen, H. Li, and S. Gan, “Damped multichannel singular spectrum analysis for 3d random noise attenuation,” Geophysics, vol. 81, no. 4, pp. V261–V270, 2016.
  • [15] Y. A. S. Innocent Oboué, W. Chen, H. Wang, and Y. Chen, “Robust damped rank-reduction method for simultaneous denoising and reconstruction of 5d seismic data,” Geophysics, vol. 86, no. 1, pp. V71–V89, 2021.
  • [16] W. Zhang, L. Fu, and Q. Liu, “Nonconvex log-sum function-based majorization–minimization framework for seismic data reconstruction,” IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 11, pp. 1776–1780, 2019.
  • [17] Y. Wang, B. Wang, N. Tu, and J. Geng, “Seismic trace interpolation for irregularly spatial sampled data using convolutional autoencoder,” Geophysics, vol. 85, no. 2, pp. V119–V130, 2020.
  • [18] J. Park, D. Yoon, S. J. Seol, and J. Byun, “Reconstruction of seismic field data with convolutional u-net considering the optimal training input data,” in SEG International Exposition and Annual Meeting.   OnePetro, 2019.
  • [19] B. Wang, N. Zhang, W. Lu, P. Zhang, and J. Geng, “Seismic data interpolation using deep learning based residual networks,” in 80th EAGE Conference and Exhibition 2018, vol. 2018, no. 1.   European Association of Geoscientists & Engineers, 2018, pp. 1–5.
  • [20] J. Yu and B. Wu, “Attention and hybrid loss guided deep learning for consecutively missing seismic data reconstruction,” IEEE Transactions on Geoscience and Remote Sensing, 2021.
  • [21] X. Li, B. Wu, X. Zhu, and H. Yang, “Consecutively missing seismic data interpolation based on coordinate attention unet,” IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2021.
  • [22] T. He, B. Wu, and X. Zhu, “Seismic data consecutively missing trace interpolation based on multistage neural network training process,” IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2021.
  • [23] H. Li and S. Misra, “Prediction of subsurface nmr t2 distributions in a shale petroleum system using variational autoencoder-based neural networks,” IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 12, pp. 2395–2397, 2017.
  • [24] ——, “Long short-term memory and variational autoencoder with convolutional neural networks for generating nmr t2 distributions,” IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 2, pp. 192–195, 2018.
  • [25] F. Qian, Z. Liu, Y. Wang, S. Liao, S. Pan, and G. Hu, “Dtae: Deep tensor autoencoder for 3-d seismic data interpolation,” IEEE Transactions on Geoscience and Remote Sensing, 2021.
  • [26] S. Misra and H. Li, “Deep neural network architectures to approximate the fluid-filled pore size distributions of subsurface geological formations,” Machine Learning for Subsurface Characterization, vol. 183, 2019.
  • [27] D. A. Oliveira, R. S. Ferreira, R. Silva, and E. V. Brazil, “Interpolating seismic data with conditional generative adversarial networks,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 12, pp. 1952–1956, 2018.
  • [28] A. Newell, K. Yang, and J. Deng, “Stacked hourglass networks for human pose estimation,” in European conference on computer vision.   Springer, 2016, pp. 483–499.
  • [29] V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
  • [30] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention.   Springer, 2015, pp. 234–241.
  • [31] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1520–1528.
  • [32] Q. Wei, X. Li, and M. Song, “De-aliased seismic data interpolation using conditional wasserstein generative adversarial networks,” Computers & Geosciences, vol. 154, p. 104801, 2021.
  • [33] H. Kaur, N. Pham, and S. Fomel, “Seismic data interpolation using cyclegan,” in SEG Technical Program Expanded Abstracts 2019.   Society of Exploration Geophysicists, 2019, pp. 2202–2206.
  • [34] ——, “Seismic data interpolation using deep learning with generative adversarial networks,” Geophysical Prospecting, vol. 69, no. 2, pp. 307–326, 2021.
  • [35] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum, “Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling,” Advances in neural information processing systems, vol. 29, 2016.
  • [36] M. D. Cirillo, D. Abramian, and A. Eklund, “Vox2vox: 3d-gan for brain tumour segmentation,” in International MICCAI Brainlesion Workshop.   Springer, 2020, pp. 274–284.
  • [37] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
  • [38] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of wasserstein gans,” Advances in neural information processing systems, vol. 30, 2017.
  • [39] Y. Dou, K. Li, J. Zhu, T. Li, S. Tan, and Z. Huang, “Md loss: Efficient training of 3d seismic fault segmentation network under sparse labels by weakening anomaly annotation,” IEEE Transactions on Geoscience and Remote Sensing, 2022.
  • [40] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [41] Y. Dou, K. Li, J. Zhu, X. Li, and Y. Xi, “Attention-based 3-d seismic fault segmentation training by a few 2-d slice labels,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–15, 2021.
  • [42] K. Sun, B. Xiao, D. Liu, and J. Wang, “Deep high-resolution representation learning for human pose estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 5693–5703.
  • [43] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang et al., “Deep high-resolution representation learning for visual recognition,” IEEE transactions on pattern analysis and machine intelligence, 2020.
  • [44] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” arXiv preprint arXiv:1802.05957, 2018.
  • [45] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, “Context encoders: Feature learning by inpainting,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2536–2544.
  • [46] C. Zheng, T.-J. Cham, and J. Cai, “Pluralistic image completion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1438–1447.
  • [47] Z. Yi, Q. Tang, S. Azizi, D. Jang, and Z. Xu, “Contextual residual aggregation for ultra high-resolution image inpainting,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 7508–7517.
  • [48] Y. Zeng, Z. Lin, H. Lu, and V. M. Patel, “Cr-fill: Generative image inpainting with auxiliary contextual reconstruction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 14 164–14 173.
  • [49] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [50] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” Advances in neural information processing systems, vol. 30, 2017.
  • [51] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
  • [52] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  • [53] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.
  • [54] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing systems, vol. 27, 2014.
  • [55] G.-J. Qi, “Loss-sensitive generative adversarial networks on lipschitz densities,” International Journal of Computer Vision, vol. 128, no. 5, pp. 1118–1140, 2020.
  • [56] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein gan. arxiv 2017,” arXiv preprint arXiv:1701.07875, vol. 30, no. 4, 2017.