
ODE-DPS: ODE-based Diffusion Posterior Sampling for Inverse Problems in Partial Differential Equation

Enze Jiang1,*, Jishen Peng1,*, Zheng Ma1,2,3,4, Xiong-Bin Yan1,4,†
1 School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
2 Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai, China
3 Qing Yuan Research Institute, Shanghai Jiao Tong University, Shanghai, China
4 CMA-Shanghai, Shanghai Jiao Tong University, Shanghai, China
* The first and second authors contributed equally to this work. † Corresponding author. [email protected]
Abstract

In recent years we have witnessed a growing use of deep learning to solve inverse problems of partial differential equations (PDEs). However, most deep learning-based inversion methods either require paired data or must be retrained whenever the conditions of the inverse problem change, which significantly reduces the efficiency of inversion and limits its applicability. To overcome this challenge, in this paper we leverage the score-based generative diffusion model to introduce a novel unsupervised inversion methodology tailored to inverse problems arising from PDEs. Our approach operates within the Bayesian inversion framework, treating the task of solving the posterior distribution as a conditional generation process achieved through solving a reverse-time stochastic differential equation. Furthermore, to enhance the accuracy of the inversion results, we propose an ODE-based Diffusion Posterior Sampling inversion algorithm. The algorithm stems from the fact that the marginal probability density functions of two distinct forward generation processes satisfy the same Fokker-Planck equation. Through a series of experiments involving various PDEs, we showcase the efficiency and robustness of our proposed method.

1 Introduction

Partial differential equations (PDEs) are fundamental concepts in various scientific and engineering fields, serving as the cornerstone of many disciplines. They are characterized by their dependence on parameters that govern their behavior, which allows them to describe a wide range of physical and operational scenarios. Solving partial differential equations under different parameter settings is important because it enables prediction of the physical behavior under varying conditions. Various solution techniques, such as the finite element method (FEM), finite volume methods, finite difference methods, and spectral methods, have been developed to tackle this problem [1, 2]. This process establishes a mapping from parameters to solutions, known as a parameter-to-solution map.

Conversely, when given experimental data, there is often a need to infer the unknown parameters associated with the observed phenomenon, thereby determining the parameter regimes of the observed system. This situation is known as the inverse problem. Inverse problems are generally more challenging than forward problems due to their ill-posed nature, where minor errors in observations can lead to significant inaccuracies in model parameters. This challenge is further exacerbated in scenarios involving observation noise, incomplete physics, and high-dimensional parameter spaces [3, 4].

So far, several regularization methods have been employed to address ill-posed inverse problems, including Tikhonov-type regularization methods, iterative regularization methods, and Bayesian inversion methods [3, 4]. The Tikhonov regularization method stabilizes the solution of the inverse problem by incorporating regularization terms and carefully selecting regularization parameters [5]. The iterative inversion method tackles the ill-posed nature of the inverse problem by implementing an iterative stopping criterion [5]. The Bayesian inversion method transforms ill-posed inverse problems into well-posed posterior distribution problems through the introduction of suitable prior distributions [3]. Although these methods have achieved significant progress on various inverse problems, they are hindered by their inability to effectively utilize data from past records. This limitation leads to inaccurate inversions, making it challenging for the resulting parameters to meet practical requirements.

Recently, deep learning has emerged as a potent tool for addressing inverse problems related to partial differential equations (PDEs). For example, Raissi et al. [6] introduced physics-informed neural networks (PINNs) to tackle such problems. In this methodology, neural networks represent unknown parameters, leveraging physical equations to construct a loss function. Subsequently, stochastic gradient descent optimizes the neural network parameters, yielding numerical inversion results. In recent years, research in this field has flourished, incorporating various methodologies like weak adversarial networks for inverse problems [7], PDE-aware deep learning methods for inverse problems [8], implicit neural representations for seismic inversion [9], and inverse obstacle scattering [10]. These approaches operate on a case-by-case basis, necessitating complete retraining for each new set of unknown parameters, measured data, or modifications to other conditions in the inverse problem.

To address the aforementioned case-by-case scenario, researchers have proposed inverse neural operators as a solution for PDE inverse problems. Fan et al. [11] introduce compact neural network architectures for regularized inverse operators to solve the EIT inverse problem. Molinaro et al. [12] propose a novel architecture called Neural Inverse Operators (NIOs) for solving PDE inverse problems. NIOs utilize a composition of DeepONets and FNOs to approximate mappings from operators to functions. Ovadia et al. [13] combine a U-Net-based architecture with a vision transformer to design an inverse neural operator for solving various PDE inverse problems. While these inverse neural operators have demonstrated effective numerical inversion results for PDE inverse problems, acquiring paired data remains a challenging issue.

Presently, generative models [14, 15, 16, 17, 18, 19] have emerged as an alternative approach for solving inverse problems, offering the advantage of not necessitating paired unknown parameters and measurement data. This method only requires partial prior data about the unknown parameters and does not rely on their corresponding measurement data. Generative models constitute a category of machine learning models designed to learn the underlying data distribution and produce new data samples resembling the original. Typical examples of generative models encompass Autoencoders [16], Variational Autoencoders (VAEs) [17], and Generative Adversarial Networks (GANs) [18], among others. In recent years, diffusion models have gained prominence as mainstream generative models. Simultaneously, they have demonstrated effectiveness in addressing inverse problems within the visual field, as evidenced by studies such as [20, 21, 14, 22].

We present in this work an ODE-based Diffusion Posterior Sampling (ODE-DPS) algorithm for solving inverse problems in partial differential equations (PDEs). Specifically, by leveraging partial prior data of the unknown parameters, combined with diffusion generative models and deep neural networks, we learn the prior distribution of the data, which is then applied in solving the Bayesian posterior distribution of the PDE inverse problem. Then, we solve the posterior distribution using a conditional generation process achieved by solving a reverse-time stochastic differential equation. To enhance the inversion accuracy, we propose the ODE-DPS regularization inversion algorithm. This algorithm originates from an equivalent form of the noising process of the score-based diffusion model, with its equivalence guaranteed by the marginal probability density functions satisfying the same Fokker-Planck equation as in the forward noising process. Furthermore, based on experimental observations, we propose modifying the original algorithm by replacing the $L_{2}$-norm data residual term with an adaptive norm residual term. The purpose is to reduce the errors near the boundaries in the inversion results obtained by the score-based diffusion sampling algorithm. We summarize the main novelties of this paper as follows:

  • We propose an ODE-based diffusion sampling algorithm to solve inverse problems of PDEs. This algorithm utilizes a limited amount of prior data and integrates score-based generative models to learn the prior distribution of unknown parameters. Subsequently, it is applied in a conditional distribution-based backward generation process induced by Bayesian inversion.

  • By leveraging the Fokker-Planck equation satisfied by the marginal probability density function of the noising process in the generative model, we derive a new reverse ordinary differential equation. Subsequently, we discretize it numerically to develop our ODE-based DPS inversion algorithm.

  • Based on experimental observations, we propose modifying the diffusion posterior sampling algorithm by replacing the $L_{2}$-norm data residual term with an adaptive norm residual term to reduce the errors near the boundaries in the inversion results.

  • The ODE-DPS algorithm requires only a small amount of prior data for the unknown parameters and does not require observation data, also referred to as labeled data, during the training phase. Once trained, the generative model can be applied to various inverse problem-solving tasks. Compared to traditional inversion algorithms, it enhances the inversion accuracy with minimal reduction in efficiency.

  • Numerically, we compare our algorithm with traditional inversion methods across various inverse problems. It clearly shows that the ODE-DPS inversion algorithm can significantly improve the inversion accuracy.

The rest of this paper is organized as follows: In Section 2, we give an introduction to inverse problems. Then we elucidate the score-based diffusion model in Section 3. After that, we explain our method to solve the inverse problems in Section 4, apply it to some examples, and compare it with some traditional methods in Section 5. Finally, we end this paper with conclusions in Section 6.

2 Preliminaries

In this section, we consider the inverse problem of a general form, aiming to estimate $\bar{x}_{0}\in D$ from the data represented by the operator equation:

\bar{y}_{\delta}=F(\bar{x}_{0})+\xi_{\delta}, (1)

where $F:D\subset X\rightarrow Y$ is an operator between reflexive Banach spaces $(X,\|\cdot\|_{X})$ and $(Y,\|\cdot\|_{Y})$ with domain $D$. Here, $\bar{y}_{\delta}$ denotes the observed data with noise $\xi_{\delta}$.

Generally speaking, the inverse problem (1) is ill-posed in the sense that the solution may not exist, may not be unique, or may not depend continuously on the data, owing to the small noise introduced by the measurement and recording process. Consequently, directly solving the operator equation (1) for ill-posed inverse problems is not feasible. To tackle these challenges, regularization techniques are commonly employed. The Tikhonov regularization method [23, 5] stands out as a well-established and effective approach for addressing inverse problems across various fields. This method replaces solving the operator equation (1) by minimizing the following problem:

\bar{x}_{0}:=\arg\min\limits_{\bar{x}}\|\bar{y}^{\delta}-F(\bar{x})\|_{Y}^{2}+\alpha\|\bar{x}\|_{X}^{2}, (2)

where $\|\bar{x}\|_{X}$ denotes the regularization term, and $\alpha$ is a regularization parameter. The efficacy of the Tikhonov regularization method in solving inverse problems is significantly influenced by the choice of the regularization parameter and the regularization term norm $\|\cdot\|_{X}$. Optimal selection of the regularization parameter and regularization term can substantially enhance the accuracy of the inversion results. Nevertheless, determining the optimal parameter and norm is typically a challenging task.
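To make the role of the regularization parameter concrete, the following is a minimal NumPy sketch of (2) for a discretized linear forward operator, where the minimizer is given by the normal equations; the operator `A`, the data `y_delta`, and the value of `alpha` are illustrative placeholders rather than the settings used later in this paper.

```python
import numpy as np

def tikhonov_solve(A, y_delta, alpha):
    """Minimizer of ||y_delta - A x||_2^2 + alpha * ||x||_2^2 for a linear operator A,
    obtained from the normal equations (A^T A + alpha I) x = A^T y_delta."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)

# Toy usage on a mildly ill-conditioned operator with noisy data.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80)) @ np.diag(np.linspace(1.0, 1e-3, 80))
x_true = rng.standard_normal(80)
y_delta = A @ x_true + 0.01 * rng.standard_normal(100)
x_rec = tikhonov_solve(A, y_delta, alpha=0.005)
```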

Additionally, another method widely employed in solving inverse problems is the iterative method [24, 25]. However, due to the ill-posed nature of the inverse problem, the iterative inversion method often exhibits semi-convergence. This phenomenon manifests as a decrease followed by an increase in the relative error of the numerical inversion result as the iteration proceeds. Therefore, the success of this inversion method hinges on choosing a suitable stopping criterion. Morozov's discrepancy principle [26] is typically utilized for this purpose. Currently, integrating the discrepancy principle into iterative methods has emerged as another mainstream approach for addressing ill-posed inverse problems.
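For a linear discretized operator, the Landweber iteration stopped by Morozov's discrepancy principle can be sketched as follows; this is a schematic NumPy example, and the step size `zeta` (which should satisfy `zeta <= 1/||A||^2` for stability), the stopping constant `tau`, and the noise level `delta` are illustrative assumptions.

```python
import numpy as np

def landweber(A, y_delta, delta, tau=1.01, zeta=1.0, max_iter=2000):
    """Landweber iteration x_{k+1} = x_k + zeta * A^T (y_delta - A x_k),
    stopped once the discrepancy ||y_delta - A x_k|| drops below tau * delta."""
    x = np.zeros(A.shape[1])
    for k in range(max_iter):
        residual = y_delta - A @ x
        if np.linalg.norm(residual) <= tau * delta:
            break  # Morozov's discrepancy principle
        x = x + zeta * (A.T @ residual)
    return x, k
```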

Although the inversion methods mentioned above have been applied to different inverse problems in the past, due to the ill-posedness of the inverse problem, we usually can only obtain low-precision inversion results. For example, let us consider the following equation:

\left\{\begin{aligned} u_{t}(x,y,t)&=a\Delta u(x,y,t)+f(x,y),\quad(x,y,t)\in\Omega\times(0,T),\\ u(x,y,t)&=\psi(x,y,t),\quad(x,y,t)\in\partial\Omega\times(0,T),\\ u(x,y,0)&=\phi(x,y),\quad(x,y)\in\Omega,\end{aligned}\right. (3)

where $a$ is a positive constant. We assume that the initial value $\phi$, the boundary data $\psi$, and the constant $a$ are known (here we set $a=1$, $\psi(x,y,t)=0$, $\phi(x,y)=20\sin 2\pi x\sin 2\pi y$, $\Omega=[0,1]\times[0,1]$, $T=1$), and use the measurement data $u(x,y,T)$ at the terminal time $T=1$ to recover the unknown source term $f$. We set the time step size to 0.01 and the grid size to 1/63 to discretize $\Omega$, and denote the noisy measurement data by $u^{\delta}(x,y,T)$. We apply the Landweber iteration regularization [27] and the Tikhonov regularization method [5] to solve the ill-posed inverse source problem, and Morozov's discrepancy principle [26] to stop the Landweber iteration. In the Tikhonov regularization method, we choose the $l_{2}$ norm for the regularization term, with a regularization parameter of $\alpha=0.005$. The inversion results are shown in Figure 1. Compared with the exact solution in Figure 1a, the numerical inversion results in Figures 1b, 1c only capture large-scale features, while small-scale features are difficult to recover. In addition, the errors in Figures 1d, 1e also show that the Landweber iteration regularization method and the Tikhonov regularization method struggle to give accurate inversion results. The main reasons why these inversion methods struggle to achieve high-precision results are twofold:

  1. Although we only consider a linear inverse problem in this numerical example, its inherent ill-posed nature persists. Hence, it remains challenging to obtain an accurate solution to the ill-posed problem.

  2. While we have utilized mainstream regularization methods to solve the aforementioned inverse problem, we have not incorporated any prior information about the inverse problem into the inversion process, apart from selecting the initial value and the norm of the regularization term. In reality, leveraging past-recorded prior information about inverse problems can significantly enhance the inversion accuracy.

Therefore, it is necessary for us to design a new regularized inversion algorithm to solve the ill-posed inverse problem. In recent years, deep learning-based inversion methods, as another form of regularization, have been applied across various fields of inverse problems, showing a trend of outperforming traditional inversion methods. Their advantage primarily stems from their ability to learn prior information about inverse problem solutions from previously recorded data, which can then be applied to the inversion process. Building upon this notion, this paper proposes a novel regularization inversion method. This method leverages partial prior information of the unknown parameters, deep neural networks, the score-based diffusion generative model, and the traditional Bayesian inversion framework to introduce a completely new regularization inversion approach. The following sections present our inversion method in detail.

Figure 1: The inversion results of problem (3). (a) the ground truth of the source $f$; (b), (c) the inversion results given by the Landweber iteration regularization and the Tikhonov regularization method, respectively; (d), (e) the difference between the true $f$ and the inverted source obtained by the Landweber iteration regularization and the Tikhonov regularization method, respectively. The relative $l_{2}$ errors of the sources inverted by the Landweber iteration and Tikhonov regularization are 34.4% and 37.9%, respectively. Here, we set the regularization parameter $\alpha=0.005$ for the Tikhonov regularization.

3 Score-Based Diffusion Model

In this article, our main focus is on constructing new regularization algorithms using generative models and Bayesian inversion methods. Deep generative models are widely used in many subfields of AI and machine learning. These models, when combined with stochastic optimization methods, excel at capturing complex, high-dimensional data distributions such as images, text, and speech. Today, popular deep generative models include variational autoencoders (VAEs), generative adversarial networks (GANs), auto-regressive models, and so on. In particular, diffusion models, also known as diffusion probabilistic models or score-based generative models, represent a class of latent variable generative models that have found applications in diverse inverse problem tasks, including image denoising [28], inpainting [29], super-resolution [30], and medical imaging [15]. These models have demonstrated state-of-the-art performance in these fields. Consequently, our objective in this study is to devise a novel regularization inversion method for addressing PDE inverse problems by integrating score-based diffusion models with traditional Bayesian inversion methods.

Score-based generative models represent a continuum of distributions evolving over time through a diffusion process. This process gradually transforms a data point into random noise, as described by a predetermined stochastic differential equation (SDE) independent of the data and devoid of trainable parameters. Reversing this process enables us to smoothly convert random noise into data for sample generation. Importantly, this reverse process adheres to a reverse-time SDE, derivable from the forward SDE based on the score of marginal probability densities over time. Consequently, we can approximate the reverse-time SDE by training a time-dependent neural network to estimate the scores, facilitating sample production using numerical SDE solvers. Below we will explain the score-based generative model in detail.

3.1 Perturbing Data with SDE

The score-based diffusion model [31] defines the generative process as the reverse of a noising process. Specifically, we construct a diffusion process $\mathbf{x}(t)$ indexed by a continuous time variable $t\in[0,\hat{T}]$, such that $\mathbf{x}(0)\sim p(x(0))=p_{0}$, for which we have a dataset of i.i.d. samples, and $\mathbf{x}(\hat{T})\sim p(x(\hat{T}))=p_{\hat{T}}$, for which we have a tractable form from which to generate samples efficiently. In other words, $p_{0}$ is the data distribution and $p_{\hat{T}}$ is the prior distribution. The diffusion process can be modeled as the solution to an Itô stochastic differential equation (SDE):

d\mathbf{x}=-\frac{\beta(t)}{2}\mathbf{x}\,dt+\sqrt{\beta(t)}\,d\mathbf{\omega} (4)

where $\beta(t)>0$ is the noise schedule of the process, typically taken to be a monotonically increasing linear function of $t$ [32], and $\mathbf{\omega}$ is the standard Wiener process (a.k.a. Brownian motion). The SDE has a unique strong solution as long as the coefficients are globally Lipschitz in both state and time [33]. We hereafter denote by $p(\mathbf{x}(t))$ the probability density of $\mathbf{x}(t)$, and use $p(\mathbf{x}(t)|\mathbf{x}(s))$ to denote the transition kernel from $\mathbf{x}(s)$ to $\mathbf{x}(t)$, where $0\leq s<t\leq\hat{T}$.
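For the SDE (4), the transition kernel is Gaussian, $p(\mathbf{x}(t)|\mathbf{x}(0))=\mathcal{N}(\sqrt{\bar{\alpha}(t)}\,\mathbf{x}(0),(1-\bar{\alpha}(t))I)$ with $\bar{\alpha}(t)=\exp(-\int_{0}^{t}\beta(s)ds)$, so noised samples can be drawn in closed form. The PyTorch sketch below assumes $\hat{T}=1$ and a linear schedule with illustrative endpoints `beta_min`, `beta_max`.

```python
import torch

def alpha_bar(t, beta_min=0.1, beta_max=20.0):
    # bar_alpha(t) = exp(-int_0^t beta(s) ds) for the linear schedule
    # beta(s) = beta_min + s * (beta_max - beta_min), with t in [0, 1].
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    return torch.exp(-integral)

def perturb(x0, t):
    """Draw x(t) ~ p(x(t) | x(0)) for the forward SDE (4) in one shot."""
    a_bar = alpha_bar(t).view(-1, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    xt = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    return xt, noise
```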

3.2 Generative Samples by Reversing the SDE

The generative process is to recover the data distribution, i.e., to obtain samples $\mathbf{x}(0)\sim p(x(0))$, by starting from samples of $\mathbf{x}(\hat{T})\sim p(x(\hat{T}))$ and reversing the diffusion process (4). A remarkable result from Anderson [34] states that the reverse of a diffusion process is also a diffusion process, running backwards in time and given by the reverse-time SDE:

d\mathbf{x}=\left[-\frac{\beta(t)}{2}\mathbf{x}-\beta(t)\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\right]dt+\sqrt{\beta(t)}\,d\bar{\mathbf{\omega}}, (5)

where $\bar{\mathbf{\omega}}$ is a standard Wiener process when time flows backwards from $\hat{T}$ to $0$, and $dt$ is an infinitesimal negative timestep. Once the score of each marginal distribution, $\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))$, is known for all $t$, we can derive the reverse diffusion process from equation (5) and simulate it to obtain samples from the distribution $p_{0}$.

3.3 Estimating Score for the SDE

The score $\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))$ can be estimated by training a score-based model on samples, i.e., we can train a time-dependent score-based model $s_{\theta}(\mathbf{x},t)$ to approximate the score function $\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))$ with denoising score matching (DSM) [35]. The optimization objective can be shown to be equivalent to the following:

\theta^{*}=\mathop{\arg\min}\limits_{\theta}\mathbb{E}_{t\sim U(0,\hat{T}),\,\mathbf{x}(t)\sim p(\mathbf{x}(t)|\mathbf{x}(0)),\,\mathbf{x}(0)\sim p(\mathbf{x}(0))}\left[\|s_{\theta}(\mathbf{x}(t),t)-\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))\|_{2}^{2}\right], (6)

where $\epsilon\simeq 0$ is a small positive constant (in practice, $t$ is sampled from $U(\epsilon,\hat{T})$ to avoid numerical instability near $t=0$). With sufficient data and model capacity, score matching ensures that the optimal solution to equation (6), denoted by $s_{\theta^{*}}(\mathbf{x}(t),t)$, approximates the true score. Then, one can use the approximation $\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\simeq s_{\theta^{*}}(\mathbf{x}(t),t)$ as a plug-in estimate to replace the score function in (5). Discretizing (5) and solving it using, e.g., the Euler-Maruyama scheme, amounts to sampling from the data distribution $p_{0}$, the goal of generative modeling.
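Because the transition kernel of (4) is Gaussian, the conditional score in (6) is available in closed form, $\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))=-\text{noise}/\sqrt{1-\bar{\alpha}(t)}$, which yields the simple Monte-Carlo training loss sketched below. This schematic PyTorch snippet reuses `alpha_bar` and `perturb` from the previous sketch; the network interface `score_model(x, t)` is an assumption.

```python
import torch

def dsm_loss(score_model, x0, eps=1e-3):
    """One Monte-Carlo estimate of the denoising score matching objective (6)."""
    t = eps + (1.0 - eps) * torch.rand(x0.shape[0], device=x0.device)
    xt, noise = perturb(x0, t)
    a_bar = alpha_bar(t).view(-1, *([1] * (x0.dim() - 1)))
    target = -noise / torch.sqrt(1.0 - a_bar)   # conditional score of the Gaussian kernel
    return ((score_model(xt, t) - target) ** 2).flatten(1).sum(dim=1).mean()
```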

4 Solving inverse problem by Score-based generative model

Consider the inverse problem mentioned in (1), and assume that the unknown variable $\bar{x}_{0}$ and the observed data $\bar{y}_{\delta}$ are random variables, with $\xi_{\delta}$ the measurement noise. In the case of white Gaussian noise, $\xi_{\delta}\sim\mathcal{N}(0,\sigma^{2}I)$. In the Bayesian framework, one utilizes $p(\bar{x}_{0})$ as the prior, and the likelihood is $p(\bar{y}_{\delta}|\bar{x}_{0})\sim\mathcal{N}(\bar{y}_{\delta}|F(\bar{x}_{0}),\sigma^{2}I)$. By Bayes' rule, the posterior distribution $p(\bar{x}_{0}|\bar{y}_{\delta})$ is given by

p(\bar{x}_{0}|\bar{y}_{\delta})=\frac{1}{p(\bar{y}_{\delta})}p(\bar{y}_{\delta}|\bar{x}_{0})p(\bar{x}_{0}).

Then the inverse problem (1) is transformed into the problem of solving the posterior distribution $p(\bar{x}_{0}|\bar{y}_{\delta})$. To this end, we use the forward diffusion process (4) to obtain a family of diffused distributions $p(\bar{x}(t)|\bar{y}_{\delta})$ with the initial distribution $p(\bar{x}_{0}|\bar{y}_{\delta})$, and apply Anderson's theorem to derive the conditional reverse-time SDE

d\bar{x}(t)=\left[-\frac{\beta(t)}{2}\bar{x}(t)-\beta(t)\nabla_{\bar{x}(t)}\log p(\bar{x}(t)|\bar{y}_{\delta})\right]dt+\sqrt{\beta(t)}\,d\bar{\mathbf{\omega}}. (7)

Then the solution to the inverse problem, that is, solving the posterior distribution $p(\bar{x}_{0}|\bar{y}_{\delta})$, is transformed into sampling from the conditional reverse-time SDE (7). Since $p(\bar{x}(t)|\bar{y}_{\delta})\propto p(\bar{x}(t))p(\bar{y}_{\delta}|\bar{x}(t))$, the score $\nabla_{\bar{x}(t)}\log p(\bar{x}(t)|\bar{y}_{\delta})$ in (7) can be computed easily by

\nabla_{\bar{x}(t)}\log p(\bar{x}(t)|\bar{y}_{\delta})=\nabla_{\bar{x}(t)}\log p(\bar{x}(t))+\nabla_{\bar{x}(t)}\log p(\bar{y}_{\delta}|\bar{x}(t)). (8)

Combining (7) with (8), we have

d\bar{x}(t)=\left[-\frac{\beta(t)}{2}\bar{x}(t)-\beta(t)\big(\nabla_{\bar{x}(t)}\log p(\bar{x}(t))+\nabla_{\bar{x}(t)}\log p(\bar{y}_{\delta}|\bar{x}(t))\big)\right]dt+\sqrt{\beta(t)}\,d\bar{\mathbf{\omega}}. (9)

Leveraging (9), score-based generative models provide a way to solve the Bayesian inverse problem, i.e., to solve the posterior distribution $p(\bar{x}_{0}|\bar{y}_{\delta})$. In (9), two terms must be computed: the prior score $\nabla_{\bar{x}(t)}\log p(\bar{x}(t))$ and the likelihood score $\nabla_{\bar{x}(t)}\log p(\bar{y}_{\delta}|\bar{x}(t))$. For the former, we can simply use the pre-trained score function $s_{\theta^{*}}(\bar{x}(t),t)$. However, the latter term is hard to acquire in closed form due to its dependence on the time $t$, as there only exists an explicit dependence between $\bar{y}_{\delta}$ and $\bar{x}_{0}$.

4.1 Approximate the Likelihood

In this part, we approximate $\log p(\bar{y}_{\delta}|\bar{x}(t))$. Since $\bar{x}(t)$ and $\bar{y}_{\delta}$ are conditionally independent given $\bar{x}_{0}$, we have

\begin{aligned} p(\bar{y}_{\delta}|\bar{x}(t)) &=\int p(\bar{y}_{\delta}|\bar{x}_{0},\bar{x}(t))\,p(\bar{x}_{0}|\bar{x}(t))\,d\bar{x}_{0} \\ &=\int p(\bar{y}_{\delta}|\bar{x}_{0})\,p(\bar{x}_{0}|\bar{x}(t))\,d\bar{x}_{0} \\ &=\mathbb{E}_{\bar{x}_{0}\sim p(\bar{x}_{0}|\bar{x}(t))}[p(\bar{y}_{\delta}|\bar{x}_{0})]. \end{aligned} (10)

Chung et al. [36] suggest the following approximation:

p(\bar{y}_{\delta}|\bar{x}(t))\simeq p(\bar{y}_{\delta}|\hat{\bar{x}}_{0}),

where

\hat{\bar{x}}_{0}:=\mathbb{E}[\bar{x}_{0}|\bar{x}(t)]=\frac{1}{\sqrt{\bar{\alpha}(t)}}\left(\bar{x}(t)+(1-\bar{\alpha}(t))\nabla_{\bar{x}(t)}\log p(\bar{x}(t))\right).

Replacing $\nabla_{\bar{x}(t)}\log p(\bar{x}(t))$ with the trained neural network $s_{\theta^{*}}(\bar{x}(t),t)$, we have the following approximation:

\hat{\bar{x}}_{0}\simeq\frac{1}{\sqrt{\bar{\alpha}(t)}}\left(\bar{x}(t)+(1-\bar{\alpha}(t))s_{\theta^{*}}(\bar{x}(t),t)\right).

Now, we turn to the calculation of $p(\bar{y}_{\delta}|\hat{\bar{x}}_{0})$. The likelihood function takes the form

p(\bar{y}_{\delta}|\hat{\bar{x}}_{0})=\frac{1}{\sqrt{(2\pi)^{n}\sigma^{2n}}}\exp\left[-\frac{\|\bar{y}_{\delta}-F(\hat{\bar{x}}_{0})\|_{2}^{2}}{2\sigma^{2}}\right],

where $n$ is the dimension of $\bar{y}_{\delta}$. We can then calculate an approximate value of $\nabla_{\bar{x}(t)}\log p(\bar{y}_{\delta}|\bar{x}(t))$ in (9) by

\nabla_{\bar{x}(t)}\log p(\bar{y}_{\delta}|\bar{x}(t))\simeq\nabla_{\bar{x}(t)}\log p(\bar{y}_{\delta}|\hat{\bar{x}}_{0})\simeq-\frac{1}{\sigma^{2}}\nabla_{\bar{x}(t)}\|\bar{y}_{\delta}-F(\hat{\bar{x}}_{0}(\bar{x}(t)))\|_{2}^{2}. (11)

Combining equations (11) and (8), we obtain the approximation

\nabla_{\bar{x}(t)}\log p(\bar{x}(t)|\bar{y}_{\delta})\simeq s_{\theta^{*}}(\bar{x}(t),t)-\zeta\nabla_{\bar{x}(t)}\|\bar{y}_{\delta}-F(\hat{\bar{x}}_{0}(\bar{x}(t)))\|_{2}^{2}, (12)

where $\zeta:=\frac{1}{\sigma^{2}}$ is treated as a step size, and we have again replaced $\nabla_{\bar{x}(t)}\log p(\bar{x}(t))$ with the trained neural network $s_{\theta^{*}}(\bar{x}(t),t)$.

Through the above discussion, incorporating (12) into (9) and using the usual ancestral sampling, we obtain the Diffusion Posterior Sampling (DPS) algorithm, summarized in Algorithm 1.

1 Require: $N,\ \bar{y}_{\delta},\ s_{\theta^{*}},\ \zeta,\ \{\beta_{i}\}_{i=1}^{N}$;
2 Result: $\bar{x}_{0}$;
3 $\bar{x}_{N}\sim\mathcal{N}(0,\mathbf{I})$;
4 $\alpha_{i}:=1-\beta_{i},\ \bar{\alpha}_{i}:=\prod_{j=1}^{i}\alpha_{j}$
5 for $i=N$ to $1$ do
6       $\hat{s}=s_{\theta^{*}}(\bar{x}_{i},i)$
7       $\hat{\bar{x}}_{0}=\frac{1}{\sqrt{\bar{\alpha}_{i}}}(\bar{x}_{i}+(1-\bar{\alpha}_{i})\hat{s})$
8       $z\sim\mathcal{N}(0,I)$
9       $\bar{x}_{i-1}^{\prime}=\frac{\sqrt{\alpha_{i}}(1-\bar{\alpha}_{i-1})}{1-\bar{\alpha}_{i}}\bar{x}_{i}+\frac{\sqrt{\bar{\alpha}_{i-1}}\beta_{i}}{1-\bar{\alpha}_{i}}\hat{\bar{x}}_{0}+\tilde{\sigma}_{i}z$
10      $\bar{x}_{i-1}=\bar{x}_{i-1}^{\prime}-\zeta\nabla_{\bar{x}_{i}}\|\bar{y}_{\delta}-F(\hat{\bar{x}}_{0})\|_{2}^{2}$
11 end
Algorithm 1: Diffusion Posterior Sampling (DPS) for the inverse problem.
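A schematic PyTorch rendering of Algorithm 1 is given below; the gradient of the data residual with respect to $\bar{x}_{i}$ is obtained by automatic differentiation through $\hat{\bar{x}}_{0}$. The array shape, the score-model interface `score_model(x, i)`, and the differentiable forward operator `F` are assumptions, and indexing is zero-based.

```python
import torch

def dps_sample(score_model, F, y_delta, betas, zeta):
    """Sketch of Algorithm 1 (Diffusion Posterior Sampling)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    N = len(betas)
    x = torch.randn(1, 1, 64, 64)                       # x_N ~ N(0, I); shape is illustrative
    for i in reversed(range(1, N)):
        x = x.detach().requires_grad_(True)
        s_hat = score_model(x, i)
        a_bar, a_bar_prev = alpha_bars[i], alpha_bars[i - 1]
        x0_hat = (x + (1.0 - a_bar) * s_hat) / a_bar.sqrt()        # posterior-mean estimate of x_0
        sigma_i = ((1.0 - a_bar_prev) / (1.0 - a_bar) * betas[i]).sqrt()
        x_prev = (alphas[i].sqrt() * (1.0 - a_bar_prev) * x
                  + a_bar_prev.sqrt() * betas[i] * x0_hat) / (1.0 - a_bar) \
                 + sigma_i * torch.randn_like(x)                   # ancestral (DDPM) step
        residual = ((y_delta - F(x0_hat)) ** 2).sum()              # data-consistency term
        grad = torch.autograd.grad(residual, x)[0]
        x = x_prev - zeta * grad
    return x.detach()
```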
Figure 2: The inversion results of problem (3). (a) the ground truth of the source $f$; (b) the inversion result given by the Diffusion Posterior Sampling algorithm 1; (c) the difference between the true $f$ and the inverted source. The relative $l_{2}$ error between the true and inverted source is 18.4%. In algorithm 1, we set $N=1000$.

In Figure 2, we present the numerical results obtained using the DPS inversion algorithm to solve the ill-posed inverse problem described in Section 2. The network architecture, parameters, and training details of the neural network $s_{\theta^{*}}(\bar{x}(t),t)$ are deferred and presented later in the paper. It is evident from Figure 2b that the DPS inversion algorithm significantly improves the inversion performance compared to the Landweber iteration and Tikhonov regularization methods. The relative $l_{2}$ error of the inversion result obtained by the DPS algorithm is 18.4%, as opposed to the relative $l_{2}$ errors of 34.4% and 37.9% produced by the Landweber iteration and Tikhonov regularization, respectively, indicating a substantial enhancement in inversion accuracy. However, in Figure 2c, we observe larger errors near the boundaries of the region. Therefore, reducing the errors near the boundaries of the inversion results can improve the solution accuracy. Moreover, in the experiments we find that the original DPS algorithm is unstable, meaning that the inversion results exhibit some fluctuations due to the randomness of the algorithm. Therefore, to achieve higher accuracy and better stability, we need to modify the original DPS inversion algorithm. Based on this, we provide an improved version of the DPS algorithm in the following.

4.2 ODE-based Diffusion Posterior Sampling for inverse problems

We reconsider a stochastic differential equation in the following form:

d\mathbf{x}(t)=f(\mathbf{x}(t),t)dt+g(t)dw,

which is more general than the SDE (4). Suppose that $p(\mathbf{x}(t))$ is the marginal probability density function; then the corresponding Fokker-Planck equation [31] is:

\frac{\partial p(\mathbf{x}(t))}{\partial t}=-\nabla_{\mathbf{x}(t)}\cdot[f(\mathbf{x}(t),t)p(\mathbf{x}(t))]+\frac{1}{2}g^{2}(t)\nabla_{\mathbf{x}(t)}\cdot\nabla_{\mathbf{x}(t)}p(\mathbf{x}(t)). (13)

Using the fact that $\nabla_{\mathbf{x}(t)}(\log p(\mathbf{x}(t)))\,p(\mathbf{x}(t))=\nabla_{\mathbf{x}(t)}p(\mathbf{x}(t))$, equation (13) can be rewritten in the following form:

\begin{aligned}\frac{\partial p(\mathbf{x}(t))}{\partial t}=&-\nabla_{\mathbf{x}(t)}\cdot\Big[f(\mathbf{x}(t),t)p(\mathbf{x}(t))-\frac{1}{2}(g^{2}(t)-\mu^{2}(t))\nabla_{\mathbf{x}(t)}p(\mathbf{x}(t))\Big]+\frac{1}{2}\mu^{2}(t)\nabla_{\mathbf{x}(t)}\cdot\nabla_{\mathbf{x}(t)}p(\mathbf{x}(t))\\=&-\nabla_{\mathbf{x}(t)}\cdot\Big[\big(f(\mathbf{x}(t),t)-\frac{1}{2}(g^{2}(t)-\mu^{2}(t))\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\big)p(\mathbf{x}(t))\Big]+\frac{1}{2}\mu^{2}(t)\nabla_{\mathbf{x}(t)}\cdot\nabla_{\mathbf{x}(t)}p(\mathbf{x}(t)),\end{aligned} (14)

where $\mu(t)$ is an arbitrary smooth function. By comparing (13) and (14), the SDE corresponding to equation (14) is

d\mathbf{x}(t)=\left[f(\mathbf{x}(t),t)-\frac{1}{2}(g^{2}(t)-\mu^{2}(t))\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\right]dt+\mu(t)dw.

Again, by Anderson [34], we obtain the reverse SDE of the above SDE as follows:

d\mathbf{x}(t)=\left[f(\mathbf{x}(t),t)-\frac{1}{2}(g^{2}(t)+\mu^{2}(t))\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\right]dt+\mu(t)dw.

Letting $\mu(t)=0$, $f(\mathbf{x}(t),t)=-\frac{\beta(t)}{2}\mathbf{x}$, and $g(t)=\sqrt{\beta(t)}$, we obtain the following ordinary differential equation,

d\mathbf{x}(t)=\left[-\frac{\beta(t)}{2}\mathbf{x}(t)-\frac{\beta(t)}{2}\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\right]dt. (15)

By taking a negative time step, the differential equation (15) defines a new generative process starting from $x(\hat{T})\sim p(x(\hat{T}))$. Similar to the previous discussion, for solving the inverse problem (1), we only need to solve the following reverse ordinary differential equation:

d\bar{x}(t)=\left[-\frac{\beta(t)}{2}\bar{x}(t)-\frac{\beta(t)}{2}\big(\nabla_{\bar{x}(t)}\log p(\bar{x}(t))+\nabla_{\bar{x}(t)}\log p(\bar{y}_{\delta}|\bar{x}(t))\big)\right]dt. (16)

Additionally, we observed in experiments that the gradient values near the boundaries in the gradient descent steps of the DPS inversion algorithm are relatively small. This suggests that the values near the boundaries remain nearly unchanged when updating the parameters during inversion, leading to significant errors near the boundaries of the inversion results. To mitigate this issue, we propose modifying the $L_{2}$ norm of the data residual term to an adaptive norm. Specifically, we introduce a new weighted norm for a function (or matrix) $M$, defined as follows:

\|M(x)\|_{\rho}=\int_{\Omega}\rho(x)M(x)^{2}dx,\qquad\|M\|_{\rho}=\sum_{i}\sum_{j}\rho_{ij}M_{ij}^{2},

where $\Omega$ is the domain of the inversion parameters, and $\rho(x)$ (resp. the weighting matrix $\{\rho_{ij}\}$) is a weighting function, set as:

\rho(x)=\frac{\big(\max_{x\in\Omega}\frac{\partial\|F(\bar{x})\|_{2}}{\partial\bar{x}}\big|_{\bar{x}=\hat{\bar{x}}_{0}}(x)\big)^{2}+c}{\big(\frac{\partial\|F(\bar{x})\|_{2}}{\partial\bar{x}}\big|_{\bar{x}=\hat{\bar{x}}_{0}}(x)\big)^{2}+c},\qquad\rho_{ij}=\frac{\big(\max_{x_{ij}\in\Omega}\frac{\partial\|F(\bar{x})\|_{2}}{\partial\bar{x}_{ij}}\big|_{\bar{x}=\hat{\bar{x}}_{0}}\big)^{2}+c}{\big(\frac{\partial\|F(\bar{x})\|_{2}}{\partial\bar{x}_{ij}}\big|_{\bar{x}=\hat{\bar{x}}_{0}}\big)^{2}+c}, (17)

where $c$ is a small constant to guarantee the stability of the gradients. Therefore, we modify equation (12) to the following form:

\nabla_{\bar{x}(t)}\log p(\bar{x}(t)|\bar{y}_{\delta})\simeq s_{\theta^{*}}(\bar{x}(t),t)-\zeta_{i}\nabla_{\bar{x}(t)}\|\bar{y}_{\delta}-F(\hat{\bar{x}}_{0}(\bar{x}(t)))\|_{\rho}^{2}. (18)

Based on the above discussion, combining formulas (16) with (18), and using ancestral sampling, we obtain the ODE-based DPS (ODE-DPS) algorithm summarized in Algorithm 2.

1 Require: $N,\ \bar{y}_{\delta},\ s_{\theta^{*}},\ \zeta,\ \gamma,\ c,\ \{\beta_{i}\}_{i=1}^{N}$;
2 Result: $\bar{x}_{0}$;
3 $\alpha_{i}:=1-\beta_{i},\ \bar{\alpha}_{i}:=\prod_{j=1}^{i}\alpha_{j}$
4 $\bar{x}_{N}\sim\mathcal{N}(0,\mathbf{I})$;
5 for $i=N$ to $1$ do
6       $\hat{s}=s_{\theta^{*}}(\bar{x}_{i},i)$
7       $\hat{\bar{x}}_{0}=\frac{1}{\sqrt{\bar{\alpha}_{i}}}(\bar{x}_{i}+\frac{1}{2}(1-\bar{\alpha}_{i})\hat{s})$
8       $\bar{x}_{i-1}^{\prime}=\frac{\sqrt{\alpha_{i}}(1-\bar{\alpha}_{i-1})}{1-\bar{\alpha}_{i}}\bar{x}_{i}+\frac{\sqrt{\bar{\alpha}_{i-1}}\beta_{i}}{1-\bar{\alpha}_{i}}\hat{\bar{x}}_{0}$
9       $\zeta_{i}=\zeta\gamma^{[\frac{N-1}{100}]}$
10      $\rho(x)=\frac{(\max_{x\in\Omega}\frac{\partial\|F(\bar{x})\|_{2}}{\partial\bar{x}}|_{\bar{x}=\hat{\bar{x}}_{0}}(x))^{2}+c}{(\frac{\partial\|F(\bar{x})\|_{2}}{\partial\bar{x}}|_{\bar{x}=\hat{\bar{x}}_{0}}(x))^{2}+c}$
11      $\bar{x}_{i-1}=\bar{x}_{i-1}^{\prime}-\zeta_{i}\nabla_{\bar{x}_{i}}\|\bar{y}_{\delta}-F(\hat{\bar{x}}_{0})\|_{\rho}^{2}$
12 end
Algorithm 2: ODE-DPS for the inverse problem
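The sketch below mirrors Algorithm 2 in PyTorch and highlights its differences from Algorithm 1: the factor $\frac{1}{2}(1-\bar{\alpha}_{i})$ coming from the ODE (16), the fully deterministic update without injected noise, the adaptive weight $\rho$ of (17) computed by automatic differentiation, and a decaying step size. The shapes, the interfaces of `score_model` and `F`, and the reading of the step-size decay as $\zeta_{i}=\zeta\gamma^{\lfloor(N-i)/100\rfloor}$ are my assumptions.

```python
import torch

def adaptive_weight(F, x0_hat, c=1e-8):
    """Adaptive weight rho of (17): locations where the forward map is insensitive
    (small gradient of ||F||_2, e.g. near the boundary) receive a larger weight."""
    x = x0_hat.detach().requires_grad_(True)
    g = torch.autograd.grad(torch.linalg.vector_norm(F(x)), x)[0]
    g2 = g ** 2
    return (g2.max() + c) / (g2 + c)

def ode_dps_sample(score_model, F, y_delta, betas, zeta, gamma, c=1e-8):
    """Sketch of Algorithm 2 (ODE-DPS)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    N = len(betas)
    x = torch.randn(1, 1, 64, 64)
    for i in reversed(range(1, N)):
        x = x.detach().requires_grad_(True)
        s_hat = score_model(x, i)
        a_bar, a_bar_prev = alpha_bars[i], alpha_bars[i - 1]
        x0_hat = (x + 0.5 * (1.0 - a_bar) * s_hat) / a_bar.sqrt()   # note the extra factor 1/2
        x_prev = (alphas[i].sqrt() * (1.0 - a_bar_prev) * x
                  + a_bar_prev.sqrt() * betas[i] * x0_hat) / (1.0 - a_bar)  # deterministic step
        zeta_i = zeta * gamma ** ((N - i) // 100)                   # assumed step-size decay
        rho = adaptive_weight(F, x0_hat, c)
        residual = (rho * (y_delta - F(x0_hat)) ** 2).sum()         # weighted data residual
        grad = torch.autograd.grad(residual, x)[0]
        x = x_prev - zeta_i * grad
    return x.detach()
```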
Remark 1.

It is worth noting that, in Algorithm 2, there is no need to retrain the neural network $s_{\theta}(\bar{x},t)$. This approach requires training the neural network only once on the dataset, making it applicable to various inverse problem-solving tasks. This significantly reduces the inversion time compared to inverse methods based on Physics-Informed Neural Networks (PINNs) [6], thus enhancing the inversion efficiency. Moreover, our inversion algorithm, akin to traditional regularization methods, incorporates the model information of the inverse problem. Compared to neural operator-based inversion methods [11, 13], it improves generalization and enhances the algorithm's interpretability.

4.3 U-Net Architecture

Figure 3: The simplified U-Net model structure.

For diffusion models, the U-Net architecture [37] is commonly used. The simplified U-Net model structure is shown in Figure 3, and we specifically illustrate the architecture of the U-Net model along with the parameters we utilized in Appendix C.

The U-Net model consists of a downsampling path and an upsampling path. These two paths are connected with skip connections that do not change the spatial size. The downsampling path employs a sequence of convolutional layers, decreasing the spatial size and increasing the number of feature channels. The upsampling path increases the spatial size and decreases the number of feature channels. For the skip connections, padding in the convolutional layers of the downsampling path ensures that the tensors we concatenate have the same spatial size. Multi-head attention layers [38] are also used in this work, which we find improves the behavior of the U-Net model. In addition, a projection of the timestep embedding is added into each residual block.
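The following is a minimal PyTorch sketch of one such residual block with the timestep-embedding projection described above; the channel counts, activation, and layer layout are illustrative and do not reproduce the exact architecture given in Appendix C.

```python
import torch
import torch.nn as nn

class TimeResBlock(nn.Module):
    """Residual block that injects a projection of the timestep embedding."""
    def __init__(self, channels, time_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.time_proj = nn.Linear(time_dim, channels)
        self.act = nn.SiLU()

    def forward(self, x, t_emb):
        h = self.act(self.conv1(x))
        # Broadcast the projected timestep embedding over the spatial dimensions.
        h = h + self.time_proj(t_emb)[:, :, None, None]
        h = self.act(self.conv2(h))
        return x + h
```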

5 Experiment

In this section, we show the performance of our method on three different inverse problems. Experimental performance is evaluated using both qualitative analysis and a quantitative metric. We define the relative $l_{2}$ error as follows:

\textrm{relative}~l_{2}~\textrm{error}=\frac{\|\bar{x}_{ref}-\bar{x}_{rec}\|}{\|\bar{x}_{ref}\|},

where $\bar{x}_{ref}$ is the ground truth and $\bar{x}_{rec}$ is the reconstructed parameter. The relative $l_{2}$ error measures the degree of accuracy in reconstructing the unknown parameters.

To show the advantages of our proposed inversion algorithm, we contrast it with traditional regularization inversion methods, namely the Landweber iteration regularization method [27, 5] and the Tikhonov regularization method [5]. We describe the methods as follows:

  1. Method 1: Landweber iteration regularization. This is a widely used regularization method for solving ill-posed inverse problems, and various extensions of it have been developed for different inverse problems. In this paper, we only focus on the most basic version [27]: given an initial guess $\bar{x}_{0}$, which may incorporate a priori knowledge of an exact solution of the inverse problem, the iteration is defined via the adjoint $F^{\prime}(\cdot)^{*}$ of $F^{\prime}(\cdot)$ as follows:

    \bar{x}_{k+1}^{\delta}=\bar{x}_{k}^{\delta}+\zeta_{k}F^{\prime}(\bar{x}_{k}^{\delta})^{*}(\bar{y}^{\delta}-F(\bar{x}_{k}^{\delta})),\quad k=0,1,2,\cdots,

    where $F^{\prime}(\cdot)$ is the Fréchet derivative of the forward operator $F(\cdot)$ and $\zeta_{k}$ is the step size of the $k$th step. To obtain a stable approximation, the iteration needs to be stopped after $k_{*}=k_{*}(\delta)$ steps according to a generalized discrepancy principle, i.e.,

    \|\bar{y}^{\delta}-F(\bar{x}_{k_{*}}^{\delta})\|\leq\tau\delta<\|\bar{y}^{\delta}-F(\bar{x}_{k}^{\delta})\|,\quad 0\leq k<k_{*},

    where $\tau>1$ is a constant. In general, we set $\tau=1.01$. Additionally, by Lemma 2.4 of reference [5], the Landweber iteration step is a steepest descent step with step size $\zeta_{k}$ for the quadratic functional $\Psi(\bar{x}^{\delta})=\frac{1}{2}\|\bar{y}^{\delta}-F(\bar{x}^{\delta})\|_{2}^{2}$. We denote the descent direction by $\nabla_{\bar{x}^{\delta}}\Psi(\bar{x}^{\delta})$.

  2. Method 2: An improved Landweber iteration regularization. Following the principles of our algorithm, we modify the descent direction to $\nabla_{\bar{x}^{\delta}}\Psi_{\rho}(\bar{x}^{\delta})$, where $\Psi_{\rho}(\bar{x}^{\delta})=\frac{1}{2}\|\bar{y}^{\delta}-F(\bar{x}^{\delta})\|_{\rho}^{2}$, with $\|\cdot\|_{\rho}$ defined in (17).

  3. Method 3: Tikhonov regularization method. The objective is modified to solve the following optimization problem:

    \bar{x}^{\delta}:=\arg\min\limits_{\bar{x}}\|\bar{y}^{\delta}-F(\bar{x})\|_{2}^{2}+\alpha\|\bar{x}\|_{2}^{2}. (19)

    Here we set $\alpha=0.005$ or $\alpha=0.1$ in different experiments; we find that these choices yield the best inversion results for this regularization method.

Unless otherwise specified, in all numerical examples, for the three traditional regularization methods mentioned above, we fix the maximum number of iterations at 2000 and set the step size $\zeta_{k}$ in the iterations equal to that of our ODE-DPS method.

5.1 Inverse problems of heat equation

5.1.1 Problem settings

In this part, we consider the following heat equation:

\left\{\begin{aligned} u_{t}(x,y,t)&=a\Delta u(x,y,t)+f(x,y),\quad(x,y,t)\in\Omega\times(0,T),\\ u(x,y,t)&=\psi(x,y,t),\quad(x,y,t)\in\partial\Omega\times(0,T),\\ u(x,y,0)&=\phi(x,y),\quad(x,y)\in\bar{\Omega},\end{aligned}\right. (20)

where $\Omega=[0,1]\times[0,1]$ and the coefficient $a$ is a positive constant. Next, we address two inverse problems: the inverse source problem and the inverse initial value problem. Both are linear and ill-posed, necessitating the use of regularization methods for their solution. We employ our proposed ODE-DPS inversion algorithm along with the three traditional regularization inversion algorithms to tackle these two inverse problems.

In this example, we consider the inverse source problem and the inverse initial value problem with the measurements

\bar{y}(x,y)=u(x,y,T).

The noisy measurements are generated by adding a random perturbation, i.e.,

\bar{y}^{\delta}:=u^{\delta}(x,y,T)=u(x,y,T)+\varepsilon\,\eta\,\max|u(x,y,T)|, (21)

where $\eta\sim\mathcal{N}(0,I)$ and $\varepsilon$ represents the noise percentage. The corresponding noise level is calculated as $\delta=\|\bar{y}^{\delta}-\bar{y}\|$.
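A small NumPy sketch of the perturbation (21) and the resulting noise level $\delta$ used by the discrepancy principle (the random seed is arbitrary):

```python
import numpy as np

def add_measurement_noise(u_T, eps=0.05, seed=0):
    """Generate noisy measurements following (21) and return the noise level delta."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(u_T.shape)
    u_delta = u_T + eps * eta * np.abs(u_T).max()
    delta = np.linalg.norm(u_delta - u_T)   # noise level delta = ||y_delta - y||
    return u_delta, delta
```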

5.1.2 Implementation details

To solve the inverse problem, the forward problem has to be solved multiple times. For equation (20), we utilize a five-point implicit difference scheme. In the inverse source problem, we set $T=1$, the time step size $\Delta t=0.01$, the spatial grid step size $\Delta x=\Delta y=\frac{1}{63}$, and the regularization parameter of the Tikhonov regularization method $\alpha=0.005$. In the inverse initial value problem, we set $T=0.01$, $\Delta t=0.0001$, $\Delta x=\Delta y=\frac{1}{63}$, and $\alpha=0.1$. For the noisy measurement data, we set $\varepsilon=0.05$.
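As a reference for how the forward map can be evaluated, the following is a minimal backward-Euler sketch of (20) with the five-point Laplacian on a uniform grid, under the homogeneous Dirichlet boundary condition used in these experiments; the grid handling and the specific implicit scheme are my simplifying assumptions rather than the exact solver used in the paper.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_heat(phi, f, a=1.0, T=1.0, dt=0.01, n=64):
    """Backward-Euler solve of u_t = a*Laplace(u) + f with u = 0 on the boundary.
    `phi` (initial value) and `f` (source) are (n, n) arrays on a uniform grid;
    returns u(., ., T) on the same grid."""
    h = 1.0 / (n - 1)
    m = n - 2                                             # interior nodes per direction
    D = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(m, m)) / h ** 2
    I = sp.identity(m)
    L = sp.kron(I, D) + sp.kron(D, I)                     # five-point Laplacian (interior nodes)
    solve = spla.factorized((sp.identity(m * m) - dt * a * L).tocsc())
    u = phi[1:-1, 1:-1].reshape(-1).copy()
    f_int = f[1:-1, 1:-1].reshape(-1)
    for _ in range(int(round(T / dt))):
        u = solve(u + dt * f_int)                         # (I - dt*a*L) u^{k+1} = u^k + dt*f
    out = np.zeros((n, n))
    out[1:-1, 1:-1] = u.reshape(m, m)
    return out
```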

Before applying our method to solve the inverse problems, we trained a new diffusion model using the method in [39] on the new dataset described below. The training algorithm is provided in Appendix A. The training dataset is composed of functions of the form

\sum_{k=1}^{n_{x}}\sum_{m=1}^{n_{y}}\alpha_{k,m}^{1}\sin k\pi x\sin m\pi y+\alpha_{k,m}^{2}\sin k\pi x\cos m\pi y+\alpha_{k,m}^{3}\cos k\pi x\sin m\pi y+\alpha_{k,m}^{4}\cos k\pi x\cos m\pi y,

where $\alpha_{k,m}^{1},\alpha_{k,m}^{2},\alpha_{k,m}^{3},\alpha_{k,m}^{4}$ are uniform random samples in $[-\frac{1}{k^{2}m^{2}},\frac{1}{k^{2}m^{2}}]$ and $n_{x}=n_{y}=15$. In all experiments of this paper, we select 5000 samples to train the neural network of the score-based generative model.
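A NumPy sketch of drawing one training sample of this form (the 64×64 grid resolution matches the discretization used above; the random-seed handling is illustrative):

```python
import numpy as np

def sample_training_function(nx=15, ny=15, n_grid=64, rng=None):
    """Draw one random function of the stated truncated trigonometric form."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(0.0, 1.0, n_grid)
    X, Y = np.meshgrid(x, x, indexing="ij")
    out = np.zeros_like(X)
    for k in range(1, nx + 1):
        for m in range(1, ny + 1):
            bound = 1.0 / (k ** 2 * m ** 2)
            a1, a2, a3, a4 = rng.uniform(-bound, bound, size=4)
            out += (a1 * np.sin(k * np.pi * X) * np.sin(m * np.pi * Y)
                    + a2 * np.sin(k * np.pi * X) * np.cos(m * np.pi * Y)
                    + a3 * np.cos(k * np.pi * X) * np.sin(m * np.pi * Y)
                    + a4 * np.cos(k * np.pi * X) * np.cos(m * np.pi * Y))
    return out

# e.g. training_set = np.stack([sample_training_function() for _ in range(5000)])
```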

5.1.3 Inverse heat source problem

Figure 4: The inversion results of source $f_{1}$. (a): the true $f_{1}$ and the measurement data $u^{\delta}(x,y,T;f_{1})$. (b)(c)(d)(e): the inversion $f_{1}$ and the errors using ODE-DPS, Method 1 (Landweber iteration), Method 2 (improved Landweber iteration), and Method 3 (Tikhonov regularization), respectively. (f): the error $e_{f}=\log\frac{\|f_{1}-f_{1,rec}\|_{2}}{\|f_{1}\|_{2}}$. (g): the error $e_{u}=\log\frac{\|u^{\delta}(x,y,T;f_{1})-u(x,y,T;f_{1,rec})\|_{2}}{\|u^{\delta}(x,y,T;f_{1})\|_{2}}$. Here, we set the noise level $\varepsilon=0.05$ in (21), the stopping parameter $\tau=1.01$ in the discrepancy principle, and the regularization parameter $\alpha=0.005$ in the Tikhonov regularization.
Figure 5: The inversion results of source $f_{2}$. (a): the true $f_{2}$ and the measurement data $u^{\delta}(x,y,T;f_{2})$. (b)(c)(d)(e): the inversion $f_{2}$ and the errors using ODE-DPS, Method 1 (Landweber iteration), Method 2 (improved Landweber iteration), and Method 3 (Tikhonov regularization), respectively. (f): the error $e_{f}=\log\frac{\|f_{2}-f_{2,rec}\|_{2}}{\|f_{2}\|_{2}}$. (g): the error $e_{u}=\log\frac{\|u^{\delta}(x,y,T;f_{2})-u(x,y,T;f_{2,rec})\|_{2}}{\|u^{\delta}(x,y,T;f_{2})\|_{2}}$. Here, we set the noise level $\varepsilon=0.05$ in (21), the stopping parameter $\tau=1.01$ in the discrepancy principle, and the regularization parameter $\alpha=0.005$ in the Tikhonov regularization.
Figure 6: The inversion results of source $f_{3}$. (a): the true $f_{3}$ and the measurement data $u^{\delta}(x,y,T;f_{3})$. (b)(c)(d)(e): the inversion $f_{3}$ and the errors using ODE-DPS, Method 1 (Landweber iteration), Method 2 (improved Landweber iteration), and Method 3 (Tikhonov regularization), respectively. (f): the error $e_{f}=\log\frac{\|f_{3}-f_{3,rec}\|_{2}}{\|f_{3}\|_{2}}$. (g): the error $e_{u}=\log\frac{\|u^{\delta}(x,y,T;f_{3})-u(x,y,T;f_{3,rec})\|_{2}}{\|u^{\delta}(x,y,T;f_{3})\|_{2}}$. Here, we set the noise level $\varepsilon=0.05$ in (21), the stopping parameter $\tau=1.01$ in the discrepancy principle, and the regularization parameter $\alpha=0.005$ in the Tikhonov regularization.

In this example, we take the functions of problem (20) as follows: the diffusion coefficient $a=1$, the boundary data $\psi(x,y,t)=0$ for $(x,y,t)\in\partial\Omega\times(0,T)$, and the initial value $\phi(x,y)=20\sin 2\pi x\sin 2\pi y$ for $(x,y)\in\Omega$. The parameters of our ODE-DPS algorithm are set as $\gamma=1,\ \zeta=9,\ c=10^{-8},\ N=1000$.

Next, we explore the inversion of three distinct source terms to evaluate the performance of our inversion technique. To ascertain the effectiveness of our proposed method, we select one source term that mirrors those present in the training dataset but is absent from it. Furthermore, we choose two additional source functions that deviate significantly from those featured in the training dataset. In the following, we consider three representative examples:

  • Case 1: We select a source $f_{1}$ that shares the same form as the training dataset, but $f_{1}$ is not included in the training set. The ground truth source $f_{1}$ is shown in Figure 4.

  • Case 2: We choose $f_{2}(x,y)=(1-2x)\sin\pi y$, which represents a polynomial function multiplied by a trigonometric function. The ground truth source $f_{2}$ is shown in Figure 5.

  • Case 3: We take $f_{3}(x,y)=\frac{7}{5}(x^{2}+(y-\frac{1}{3})^{2})-1$. The ground truth source $f_{3}$ is shown in Figure 6.

Results and discussion: We compare the performance of the ODE-DPS regularization inversion method with that of the Landweber iteration regularization (Method 1), the improved Landweber iteration regularization (Method 2), and the Tikhonov regularization (Method 3). All hyperparameters in the tested methods are manually tuned for optimal performance. Table 1 reports the relative $l_{2}$ error for the different heat sources $f$ using the various regularization methods. Visual comparisons can be found in Figures 4-6. We summarize our findings as follows:

  1. For all three numerical examples, the ODE-DPS inversion algorithm yields the best numerical inversion results. Furthermore, our proposed inversion algorithm requires fewer iterations than the Landweber iteration regularization method (Method 1) and the Tikhonov regularization method (Method 3).

  2. The improved Landweber iteration regularization method (Method 2) produces inversion outcomes with smaller relative $l_{2}$ error than the original Landweber iteration regularization method (Method 1). This suggests that adjusting the gradient descent direction to $\nabla_{\bar{x}^{\delta}}\Psi_{\rho}(\bar{x}^{\delta})$ throughout the iterations enhances the performance of the inversion algorithm.

  3. Figures 4f, 5f, and 6f indicate that the traditional regularization methods without a regularization term lead to semi-convergent numerical inversion results as the iterations increase, while the Tikhonov regularization method and our proposed algorithm show a consistent decrease in the relative $l_{2}$ error after a certain number of iterations. Furthermore, Figures 4g, 5g, and 6g demonstrate that although the inversion methods produce source terms $f$ of varying accuracy, the discrepancies between their respective numerical solutions $u(x,y,T;f_{i,rec})$ and the measurement data $u^{\delta}(x,y,T;f_{i})$, $i=1,2,3$, are nearly identical.

method          | $f_{1}$ steps | $f_{1}$ relative $l_{2}$ error | $f_{2}$ steps | $f_{2}$ relative $l_{2}$ error | $f_{3}$ steps | $f_{3}$ relative $l_{2}$ error
Method 1        | 1471          | 34.4%                          | 1704          | 34.1%                          | 1732          | 33.9%
Method 2        | 523           | 24.1%                          | 817           | 29.4%                          | 637           | 27.4%
Method 3        | 2000          | 37.9%                          | 2000          | 39.5%                          | 2000          | 37.7%
ODE-DPS (ours)  | 1000          | 9.4%                           | 1000          | 7.3%                           | 1000          | 10.4%
Table 1: The relative $l_{2}$ error and number of iteration steps for the inverse heat source problem using different inversion methods. Methods 1, 2, and 3 correspond to Landweber iteration, improved Landweber iteration, and Tikhonov regularization, respectively. We set the noise level $\varepsilon=0.05$ in (21), the stopping parameter $\tau=1.01$ in the discrepancy principle, and the regularization parameter $\alpha=0.005$ in the Tikhonov regularization.

5.1.4 Inverse initial value problem

In this experiment, we take the functions of problem (20) as follows: the diffusion coefficient $a=1$ and the heat source $f(x,y)=0$ for $(x,y)\in\Omega$. The parameters of our ODE-DPS algorithm are set as $\gamma=0.75,\ \zeta=2,\ c=10^{-5},\ N=1000$.

Similar to the previous inverse heat source problem, we invert the following three initial values from the measurement data $u^{\delta}(x,y,T)$ to validate the effectiveness of our method on the inverse initial value problem for the heat equation:

  • Case 1: We select an initial value $\phi_{1}$ that shares the same form as the training data, but $\phi_{1}(x,y)$ is not included in the training set and is not equal to $f_{1}$. The ground truth initial value $\phi_{1}$ is shown in Figure 7.

  • Case 2: We select $\phi_{2}(x,y)=4(x-x^{2})\cos\pi y$, which is entirely different from the training data. The true initial value $\phi_{2}$ is depicted in Figure 8.

  • Case 3: We take $\phi_{3}(x,y)=\frac{7}{5}(x^{2}+(y-\frac{1}{3})^{2})-1$, and the actual initial value $\phi_{3}$ is depicted in Figure 9.

Figure 7: The inversion results of initial value $\phi_{1}$. (a): the true $\phi_{1}$ and the measurement data $u^{\delta}(x,y,T;\phi_{1})$. (b)(c)(d)(e): the inversion $\phi_{1}$ and the errors using ODE-DPS, Method 1 (Landweber iteration), Method 2 (improved Landweber iteration), and Method 3 (Tikhonov regularization), respectively. (f): the error $e_{f}=\log\frac{\|\phi_{1}-\phi_{1,rec}\|_{2}}{\|\phi_{1}\|_{2}}$. (g): the error $e_{u}=\log\frac{\|u^{\delta}(x,y,T;\phi_{1})-u(x,y,T;\phi_{1,rec})\|_{2}}{\|u^{\delta}(x,y,T;\phi_{1})\|_{2}}$. Here, we set the noise level $\varepsilon=0.05$ in (21), the stopping parameter $\tau=1.01$ in the discrepancy principle, and the regularization parameter $\alpha=0.1$ in the Tikhonov regularization.
[Figure 8 image panels omitted. Panel titles: (a) the true initial value ϕ_2 (left) and measurement data (right); (b) the inversion of ϕ_2 (left) and errors (right) by ODE-DPS; (c) by Method 1 (Landweber iteration); (d) by Method 2 (improved Landweber iteration); (e) by Method 3 (Tikhonov regularization); (f) e_f = log(‖ϕ_2 − ϕ_{2,rec}‖_2 / ‖ϕ_2‖_2); (g) e_u = log(‖u^δ(x,y,T;ϕ_2) − u(x,y,T;ϕ_{2,rec})‖_2 / ‖u^δ(x,y,T;ϕ_2)‖_2).]
Figure 8: The inversion results for the initial value ϕ_2. (a): the true ϕ_2 and measurement data u^δ(x,y,T;ϕ_2). (b)(c)(d)(e): the inversion of ϕ_2 and errors using different regularization methods. (f): the error between the true ϕ_2 and the inversion ϕ_{2,rec}. (g): the error between the measurement data u^δ(x,y,T;ϕ_2) and u(x,y,T;ϕ_{2,rec}). Here, we set the noise level ε=0.05 in (21), the stopping parameter τ=1.01 in the discrepancy principle, and the regularization parameter α=0.1 in Tikhonov regularization.
[Figure 9 image panels omitted. Panel titles: (a) the true initial value ϕ_3 (left) and measurement data (right); (b) the inversion of ϕ_3 (left) and errors (right) by ODE-DPS; (c) by Method 1 (Landweber iteration); (d) by Method 2 (improved Landweber iteration); (e) by Method 3 (Tikhonov regularization); (f) e_f = log(‖ϕ_3 − ϕ_{3,rec}‖_2 / ‖ϕ_3‖_2); (g) e_u = log(‖u^δ(x,y,T;ϕ_3) − u(x,y,T;ϕ_{3,rec})‖_2 / ‖u^δ(x,y,T;ϕ_3)‖_2).]
Figure 9: The inversion results for the initial value ϕ_3. (a): the true ϕ_3 and measurement data u^δ(x,y,T;ϕ_3). (b)(c)(d)(e): the inversion of ϕ_3 and errors using different regularization methods. (f): the error between the true ϕ_3 and the inversion ϕ_{3,rec}. (g): the error between the measurement data u^δ(x,y,T;ϕ_3) and u(x,y,T;ϕ_{3,rec}). Here, we set the noise level ε=0.05 in (21), the stopping parameter τ=1.01 in the discrepancy principle, and the regularization parameter α=0.1 in Tikhonov regularization.
                    ϕ_1                       ϕ_2                       ϕ_3
method              steps   rel. l_2 error    steps   rel. l_2 error    steps   rel. l_2 error
Method 1             138    39.8%               86    36.4%               96    37.7%
Method 2              86    38.4%               76    36.0%               63    36.3%
Method 3            2000    40.5%             2000    36.4%             2000    37.3%
ODE-DPS (ours)      1000    10.2%             1000     6.9%             1000    10.6%
Table 2: The relative l_2 error and iteration steps for the inverse initial value problem using different inversion methods. Methods 1, 2, and 3 correspond to Landweber iteration, improved Landweber iteration, and Tikhonov regularization, respectively. We set the noise level ε=0.05 in (21), the stopping parameter τ=1.01 in the discrepancy principle, and the regularization parameter α=0.1 in Tikhonov regularization.

Results and discussion: Table 2 presents the relative l_2 error for the different initial values ϕ using various regularization methods. Visual comparisons can be found in Figures 7-9. Our conclusions are summarized as follows:

  1. Similar to the experimental findings for the inverse heat source problem, the ODE-DPS inversion algorithm produced the most accurate numerical inversion results for all three numerical examples. Moreover, the relative l_2 errors obtained with the three traditional regularization methods are higher than the corresponding errors in the inverse source problem. This discrepancy arises because the influence of the initial value decays exponentially in the heat equation, so perturbations in the initial value have only a minimal impact on the solution at time T. Consequently, the inverse problem of recovering the initial value is more challenging, which further emphasizes the effectiveness of our algorithm.

  2. From Figures 7f, 8f, and 9f, we observe that the relative l_2 error of the numerical inversion results produced by the traditional regularization methods decreases rapidly at first and then levels off, whereas the relative l_2 error produced by ODE-DPS decreases consistently, and the resulting inversion is significantly more accurate than those obtained with the traditional regularization methods.

5.2 Inverse Problem of Wave Equation

5.2.1 Problem settings

To further demonstrate the effectiveness of our proposed inversion algorithm, we consider the following wave equation:

\left\{\begin{aligned} u_{tt}(x,y,t)&=a\Delta u(x,y,t)+f(x,y),\quad(x,y,t)\in\Omega\times(0,T),\\ u(x,y,t)&=g(x,y,t),\quad(x,y,t)\in\partial\Omega\times(0,T),\\ u(x,y,0)&=\phi(x,y),\quad(x,y)\in\bar{\Omega},\\ u_{t}(x,y,0)&=\psi(x,y),\quad(x,y)\in\bar{\Omega},\\ \end{aligned}\right. \qquad (22)

where Ω = [0,1]×[0,1] and the coefficient a is a positive constant. Similar to the experimental settings discussed earlier for inverse problems in the heat equation, in this part we focus solely on an inverse source problem utilizing the measurement data u^δ(x,y,T), to demonstrate the effectiveness of our method on wave equation inverse problems.

5.2.2 Implementation details

To solve the wave equation (22), we employ a five-point explicit difference scheme for the forward problem simulation. We fix T=1 and set the time grid step size to Δt=1/100 and the spatial grid step sizes to Δx=Δy=1/63. For the noisy measurement data, we set ε=0.05. The functions in problem (22) are set as follows: the positive coefficient a=0.05, the boundary condition g(x,y,t)=0 for (x,y,t)∈∂Ω×(0,T), and the initial values ϕ(x,y)=0 and ψ(x,y)=0 for (x,y)∈Ω. The parameters of our ODE-DPS algorithm are set as γ=0.65, ζ=1.1, c=1e-5, and N=1000. It is important to emphasize that, for this inverse source problem for the wave equation, there is no need to retrain the neural network: we can reuse the network s_{θ*}(x(t),t) previously trained for the inverse problems of the heat equation in the ODE-DPS inversion algorithm to identify the wave source.
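The forward simulation described above can be sketched as follows. This is a minimal illustration of an explicit five-point finite-difference scheme for (22) under the stated settings (a = 0.05, zero boundary and initial data, Δt = 1/100, Δx = Δy = 1/63); the implementation actually used in the paper may differ in details such as the treatment of the first time step.

import numpy as np

def solve_wave_forward(f, a=0.05, T=1.0, nt=100, n=64):
    # Explicit five-point finite-difference solver for
    # u_tt = a * Laplace(u) + f(x, y) on the unit square with
    # zero Dirichlet boundary data and zero initial data.
    dt = T / nt
    h = 1.0 / (n - 1)                      # dx = dy = 1/63 for n = 64
    lam = a * (dt / h) ** 2                # a * dt^2 / h^2 (stable for these settings)

    u_prev = np.zeros((n, n))              # u^0 = phi = 0
    u_curr = np.zeros((n, n))              # u^1 from a Taylor expansion with psi = 0
    u_curr[1:-1, 1:-1] = 0.5 * dt**2 * f[1:-1, 1:-1]

    for _ in range(1, nt):                 # march to u^{nt} = u(., ., T)
        lap = (u_curr[2:, 1:-1] + u_curr[:-2, 1:-1]
               + u_curr[1:-1, 2:] + u_curr[1:-1, :-2]
               - 4.0 * u_curr[1:-1, 1:-1])
        u_next = np.zeros((n, n))          # boundary values stay at g = 0
        u_next[1:-1, 1:-1] = (2.0 * u_curr[1:-1, 1:-1] - u_prev[1:-1, 1:-1]
                              + lam * lap + dt**2 * f[1:-1, 1:-1])
        u_prev, u_curr = u_curr, u_next
    return u_curr

# The source used in this experiment: f(x, y) = sin(pi x) cos(pi y).
xs = np.linspace(0.0, 1.0, 64)
X, Y = np.meshgrid(xs, xs, indexing="ij")
u_T = solve_wave_forward(np.sin(np.pi * X) * np.cos(np.pi * Y))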

We choose f(x,y) = sin(πx)cos(πy) as a representative example to validate the effectiveness of our method in solving the inverse source problem for the wave equation. The ground truth source f is shown in Figure 10. We apply our method to this inverse source problem and compare it with the traditional methods (Method 1, Method 2, and Method 3).

Results and discussion: Table 3 reports the relative l_2 error of the inverted source f using the different regularization methods. The visual comparison can be found in Figure 10. We have the following findings:

  1. Similar to the experiments conducted for the heat equation, all of the regularization methods are capable of producing reasonable numerical results. However, our proposed inversion algorithm outperforms the three traditional regularization methods, both in terms of lower relative l_2 errors in Table 3 and visually more accurate numerical reconstructions in Figure 10. On the other hand, our proposed inversion algorithm required more iterations than the first two methods, which is mainly determined by the number of sampling steps of the score-based diffusion model.

  2. From Figures 10f and 10g, it can be observed that the experimental results for the wave equation are remarkably similar to those for the heat equation, except that the numerical inversion results generated by the Landweber iterative regularization method (Method 1) converge to a stable solution rather than exhibiting a relative l_2 error that first decreases and then increases as the iterations progress. However, this stable solution is clearly inferior to the solution obtained by our method.

[Figure 10 image panels omitted. Panel titles: (a) the true source f (left) and measurement data (right); (b) the inversion of f (left) and the error (right) by ODE-DPS; (c) by Method 1 (Landweber iteration); (d) by Method 2 (improved Landweber iteration); (e) by Method 3 (Tikhonov regularization); (f) e_f = log(‖f − f_rec‖_2 / ‖f‖_2); (g) e_u = log(‖u^δ(x,y,T;f) − u(x,y,T;f_rec)‖_2 / ‖u^δ(x,y,T;f)‖_2).]
Figure 10: The inversion results for the source f in the wave equation. (a): the true f and measurement data u^δ(x,y,T;f). (b)(c)(d)(e): the inversion of f and errors using different regularization methods. (f): the error between the true f and the inversion f_rec. (g): the error between the measurement data u^δ(x,y,T;f) and u(x,y,T;f_rec). Here, we set the noise level ε=0.05 in (21), the stopping parameter τ=1.01 in the discrepancy principle, and the regularization parameter α=0.1 in Tikhonov regularization.
method              steps   relative l_2 error
Method 1             468    30.5%
Method 2             142    29.1%
Method 3            2000    31.8%
ODE-DPS (ours)      1000     3.5%
Table 3: The relative l_2 error and iteration steps for the inverse source problem of the wave equation (22) using different inversion methods. Methods 1, 2, and 3 correspond to Landweber iteration, improved Landweber iteration, and Tikhonov regularization, respectively. We fix the maximum number of iterations for Methods 1, 2, and 3 at 2000. We set the noise level ε=0.05 in (21), the stopping parameter τ=1.01 in the discrepancy principle, and the regularization parameter α=0.1 in Tikhonov regularization.
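Method 1 used throughout the comparisons is Landweber iteration stopped by Morozov's discrepancy principle with τ = 1.01. The sketch below illustrates this iteration and stopping rule for a generic linear forward operator A; it is not the PDE-specific operator of the experiments, nor the improved variant (Method 2).

import numpy as np

def landweber(A, y_delta, delta, tau=1.01, omega=None, max_iter=2000):
    # Landweber iteration x_{k+1} = x_k + omega * A^T (y_delta - A x_k),
    # stopped at the first k with ||A x_k - y_delta|| <= tau * delta
    # (Morozov's discrepancy principle). Generic linear illustration only.
    if omega is None:
        omega = 1.0 / np.linalg.norm(A, 2) ** 2   # step size ensuring convergence
    x = np.zeros(A.shape[1])
    for k in range(max_iter):
        residual = A @ x - y_delta
        if np.linalg.norm(residual) <= tau * delta:
            return x, k
        x = x - omega * (A.T @ residual)
    return x, max_iter

# Example with a small random system and 5% relative noise:
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 60))
x_true = rng.standard_normal(60)
y = A @ x_true
noise = rng.standard_normal(80)
delta = 0.05 * np.linalg.norm(y)
y_delta = y + delta * noise / np.linalg.norm(noise)
x_rec, stop_k = landweber(A, y_delta, delta)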

5.3 Algorithm Investigations

5.3.1 Hyperparameters Analysis

In this part, we discuss the impact of different hyperparameter choices on the experimental results. We take the inverse source problem of the wave equation considered above as an example and vary the step size ζ and its decay rate γ in the gradient descent step of our method.

From Table 4, we observe that for gradient descent step sizes ranging from 0.7 to 1.5 and decay rates ranging from 0.45 to 0.85, ODE-DPS consistently achieves low-error solutions; the gap between the best and worst solutions is only about 3 percentage points. Moreover, restricting attention to decay rates γ between 0.55 and 0.85, the relative l_2 errors obtained by ODE-DPS are all below 4.7%, with fluctuations of at most 1.1 percentage points. This indicates that the results of ODE-DPS do not depend strongly on the choice of hyperparameters and remain accurate within a reasonable range of values.

ζ \ γ     0.45    0.55    0.65    0.75    0.85
0.7       6.6%    4.7%    3.6%    3.4%    3.7%
0.9       6.1%    4.3%    3.5%    3.5%    4.0%
1.1       5.7%    4.1%    3.5%    3.6%    4.3%
1.3       5.5%    4.0%    3.4%    3.8%    4.5%
1.5       5.2%    3.9%    3.6%    3.8%    4.6%
Table 4: The relative l_2 error of the inverted source f for the wave equation (22) under different hyperparameters (rows: step size ζ; columns: decay rate γ). In Algorithm 2, we set N=1000 and c=1e-5. For the noisy measurement, we take ε=0.05 in (21).

5.3.2 Robustness to Noise

Next, we assess the algorithm's performance under varying levels of noise. We continue to use the inverse source problem of the wave equation considered above as an example, setting ε in (21) to 0, 0.001, 0.005, 0.01, 0.05, and 0.1. For all comparison methods, we fix the maximum number of iterations at 1000 and record the relative l_2 error under the different noise levels, as presented in Table 5. We observe that our method consistently outperforms the other three methods at every noise level. Moreover, as the noise level increases, the error of the solutions generated by our method remains relatively stable, whereas the errors of the solutions obtained with the other three methods grow significantly. The data in Table 5 underscore the robustness of our proposed method to noise, surpassing all three traditional methods.

Methods             ε = 0    0.001    0.005    0.01     0.05     0.1
ODE-DPS (ours)      2.0%     2.2%     2.1%     2.3%     3.5%     4.6%
Method 1            25.2%    25.4%    26.8%    27.8%    30.5%    33.9%
Method 2            21.6%    21.8%    23.3%    24.7%    29.1%    30.9%
Method 3            27.0%    27.1%    28.0%    28.8%    31.8%    34.4%
Table 5: The relative l_2 error of the inverted source f for the wave equation (22) under different noise levels. Methods 1, 2, and 3 correspond to Landweber iteration, improved Landweber iteration, and Tikhonov regularization, respectively. We fix the maximum number of iterations for all algorithms to 1000 and set the stopping parameter τ=1.01 in the discrepancy principle and the regularization parameter α=0.1 in Tikhonov regularization.
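For reference, noisy measurements at a prescribed noise level ε can be generated roughly as in the sketch below. It assumes (21) perturbs the clean data with zero-mean Gaussian noise scaled by ε and the local magnitude of the data; the exact form of (21) in the paper may differ.

import numpy as np

def add_noise(u_clean, eps, rng=np.random.default_rng(0)):
    # Perturb the clean measurement u(x, y, T) with Gaussian noise at level eps
    # (a generic construction; see (21) for the exact form used in the paper).
    return u_clean + eps * np.abs(u_clean) * rng.standard_normal(u_clean.shape)

# for eps in [0.0, 0.001, 0.005, 0.01, 0.05, 0.1]:
#     u_delta = add_noise(u_T, eps)   # u_T from the forward solver sketched earlier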

5.4 Advantages and Limitations

Advantages: In the experimental section, we employ the ODE-DPS inversion algorithm to solve three PDE inverse problems: the inverse source problem and inverse initial value problem in the heat equation, as well as the inverse source problem in the wave equation. We compare our proposed inversion method with traditional approaches in these experiments. Through extensive numerical experiments, we summarize the advantages of our algorithm as follows:

  1. Compared to traditional inversion methods, our proposed ODE-DPS inversion algorithm significantly enhances the inversion accuracy, as is evident from Tables 1, 2, and 3. For instance, in the inverse heat source problem, the lowest relative l_2 error obtained by the traditional regularization methods across the three examples is 24.1%, while the highest relative l_2 error achieved by our ODE-DPS inversion algorithm is only 10.4%.

  2. Our inversion algorithm does not operate on a case-by-case basis. That is, our proposed inversion method requires training the neural network s_θ(x̄,t) only once, after which it can be applied to various inverse problem-solving tasks. This demonstrates that our deep learning-based inversion algorithm achieves nearly the same inversion efficiency as traditional inversion methods.

  3. Our inversion method demonstrates robustness to both the parameters within the algorithm and the level of noise present in the measurement data. This implies that, in experiments, we can achieve high-precision inversion results without the need for exhaustive parameter tuning. Moreover, the algorithm's robustness to measurement noise suggests that even in scenarios with significant noise in the measurement data, our proposed inversion method consistently produces accurate numerical results.

  4. Compared to the Landweber iteration regularization method, our ODE-DPS inversion method does not exhibit the property of semi-convergence. This fact is evident from the relative l_2 error plots of the experimental results.

Limitations: Because the number of steps is dictated by the discretization of the diffusion model's reverse-time differential equation, our inversion method requires N=1000 iterations to produce the final inversion outcome. This slightly increases the computational burden for solving some inverse problems. In the future, we aim to improve this aspect.

6 Conclusion

In this paper, we introduce an ODE-based Diffusion Posterior Sampling (ODE-DPS) algorithm aimed at solving inverse problems associated with partial differential equations (PDEs). The method operates within a Bayesian inversion framework, transforming the problem of solving the posterior distribution into a process of generating samples from conditional distributions, which is then carried out with a score-based diffusion model. To improve the inversion accuracy of this approach, we propose the ODE-DPS algorithm and provide a detailed mathematical exposition of it. Additionally, we conduct numerical experiments to illustrate the efficacy of our method. We anticipate that our proposed approach will find application in a wide range of PDE inverse problems.


Acknowledgments
Zheng Ma is supported by NSFC Grant No. 92270120 and No. 12201401. This work is partially supported by NSFC Grant No. 12031013 and the Strategic Priority Research Program of Chinese Academy of Sciences, XDA25010404. Additionally, it is also partially supported by Institute of Modern Analysis – A Shanghai Frontier Research Center.

References

  • [1] Alfio Quarteroni and Alberto Valli. Numerical approximation of partial differential equations, volume 23. Springer Science & Business Media, 2008.
  • [2] Alexandre Ern and Jean-Luc Guermond. Theory and practice of finite elements, volume 159. Springer, 2004.
  • [3] Andrew M Stuart. Inverse problems: A Bayesian perspective. Acta Numerica, 19:451–559, 2010.
  • [4] Curtis R Vogel. Computational methods for inverse problems. SIAM, 2002.
  • [5] Andreas Kirsch. An introduction to the mathematical theory of inverse problems, volume 120. Springer, 2011.
  • [6] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
  • [7] Gang Bao, Xiaojing Ye, Yaohua Zang, and Haomin Zhou. Numerical solution of inverse problems by weak adversarial networks. Inverse Problems, 36(11):115003, 2020.
  • [8] Riccardo Tenderini, Stefano Pagani, Alfio Quarteroni, and Simone Deparis. PDE-aware deep learning for inverse problems in cardiac electrophysiology. SIAM Journal on Scientific Computing, 44(3):B605–B639, 2022.
  • [9] Jian Sun, Kristopher Innanen, Tianze Zhang, and Daniel Trad. Implicit seismic full waveform inversion with deep neural representation. Journal of Geophysical Research: Solid Earth, 128(3):e2022JB025964, 2023.
  • [10] Tin Vlašić, Hieu Nguyen, AmirEhsan Khorashadizadeh, and Ivan Dokmanić. Implicit neural representation for mesh-free inverse obstacle scattering. In 2022 56th Asilomar Conference on Signals, Systems, and Computers, pages 947–952. IEEE, 2022.
  • [11] Yuwei Fan and Lexing Ying. Solving electrical impedance tomography with deep learning. Journal of Computational Physics, 404:109119, 2020.
  • [12] Roberto Molinaro, Yunan Yang, Björn Engquist, and Siddhartha Mishra. Neural inverse operators for solving PDE inverse problems. arXiv preprint arXiv:2301.11167, 2023.
  • [13] Oded Ovadia, Adar Kahana, Panos Stinis, Eli Turkel, and George Em Karniadakis. Vito: Vision transformer-operator. arXiv preprint arXiv:2303.08891, 2023.
  • [14] Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models. Advances in Neural Information Processing Systems, 35:23593–23606, 2022.
  • [15] Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imaging with score-based generative models. arXiv preprint arXiv:2111.08005, 2021.
  • [16] Pierre Baldi. Autoencoders, unsupervised learning, and deep architectures. In Proceedings of ICML workshop on unsupervised and transfer learning, pages 37–49. JMLR Workshop and Conference Proceedings, 2012.
  • [17] Diederik P Kingma, Max Welling, et al. An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4):307–392, 2019.
  • [18] Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and Anil A Bharath. Generative adversarial networks: An overview. IEEE signal processing magazine, 35(1):53–65, 2018.
  • [19] Jay Whang, Erik Lindgren, and Alex Dimakis. Composing normalizing flows for inverse problems. In International Conference on Machine Learning, pages 11158–11169. PMLR, 2021.
  • [20] Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. In International Conference on Learning Representations, 2022.
  • [21] Eloi Moliner, Jaakko Lehtinen, and Vesa Välimäki. Solving audio inverse problems with a diffusion model. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
  • [22] Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, and Mauricio Delbracio. Prompt-tuning latent diffusion models for inverse problems. arXiv preprint arXiv:2310.01110, 2023.
  • [23] Andrey N. Tikhonov and Vasiliy Y. Arsenin. Solutions of ill-posed problems. V. H. Winston & Sons, 1977.
  • [24] Kazufumi Ito and Bangti Jin. Inverse problems: Tikhonov theory and algorithms, volume 22. World Scientific, 2014.
  • [25] Anatolii Borisovich Bakushinsky and M Yu Kokurin. Iterative methods for approximate solution of inverse problems, volume 577. Springer Science & Business Media, 2005.
  • [26] MT Nair. On Morozov's discrepancy principle for nonlinear ill-posed equations. Bulletin of the Australian Mathematical Society, 79(2):337–342, 2009.
  • [27] Martin Hanke, Andreas Neubauer, and Otmar Scherzer. A convergence analysis of the Landweber iteration for nonlinear ill-posed problems. Numerische Mathematik, 72(1):21–37, 1995.
  • [28] Yutong Xie, Minne Yuan, Bin Dong, and Quanzheng Li. Diffusion model for generative image denoising. arXiv preprint arXiv:2302.02398, 2023.
  • [29] Ciprian Corneanu, Raghudeep Gadde, and Aleix M Martinez. Latentpaint: Image inpainting in latent space with diffusion models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 4334–4343, 2024.
  • [30] Zongsheng Yue, Jianyi Wang, and Chen Change Loy. Resshift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems, 36, 2024.
  • [31] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
  • [32] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  • [33] Bernt Øksendal. Stochastic differential equations. Springer, 2003.
  • [34] Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
  • [35] Pascal Vincent. A connection between score matching and denoising autoencoders. Neural computation, 23(7):1661–1674, 2011.
  • [36] Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. arXiv preprint arXiv:2209.14687, 2022.
  • [37] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, pages 234–241. Springer, 2015.
  • [38] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  • [39] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International conference on machine learning, pages 8162–8171. PMLR, 2021.

Appendix A Training Algorithm

Below, we provide some details of the neural network training algorithm. To set up the noising process (4), we fix the ending time T̂=1 and set the time interval Δt=1/N. The interval [0, T̂] is thus uniformly divided into N parts, with grid points t_i = iΔt, i=0,1,…,N. At each time point t_i we evaluate β(t) and write β(t_i)Δt and 𝐱(t_i) as β_i and 𝐱_i, respectively. These values satisfy 0 < β_1 < β_2 < … < β_N < 1. It is worth noting that the β_i can either be fixed constants or learned through reparameterization; for simplicity, we assume they are fixed. Then, for every data point 𝐱_0 in the dataset, we add Gaussian noise based on the discretization of the stochastic differential equation (4) as outlined below:

\mathbf{x}_{i}-\mathbf{x}_{i-1}=-\frac{1}{2}\beta(t_{i})\Delta t\,\mathbf{x}_{i-1}+\sqrt{\beta(t_{i})\Delta t}\,z_{i}.

Since the interval Δt ≪ 1, we can approximate 1 − (1/2)β_i by \sqrt{1-\beta_{i}} (using a Taylor expansion) and obtain:

\mathbf{x}_{i}\approx\sqrt{1-\beta_{i}}\,\mathbf{x}_{i-1}+\sqrt{\beta_{i}}\,z_{i},

where z_{i}\sim\mathcal{N}(0,I).

From the above discussion, we obtain the probability distribution of 𝐱_i given 𝐱_{i−1} as:

p(\mathbf{x}_{i}|\mathbf{x}_{i-1})=\mathcal{N}(\mathbf{x}_{i};\sqrt{1-\beta_{i}}\,\mathbf{x}_{i-1},\beta_{i}I).

Using the notation \alpha_{i}:=1-\beta_{i} and \bar{\alpha}_{i}:=\prod_{j=1}^{i}\alpha_{j}, and the same derivations as described in reference [32], we have:

p(\mathbf{x}_{i}|\mathbf{x}_{0})=\mathcal{N}(\mathbf{x}_{i};\sqrt{\bar{\alpha}_{i}}\,\mathbf{x}_{0},(1-\bar{\alpha}_{i})I).
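In implementation, this closed form for p(𝐱_i | 𝐱_0) means a noisy sample at any discrete time step can be drawn directly from a clean data point, without simulating the chain step by step. A minimal sketch, with an illustrative linear β schedule (the schedule actually used in the paper may differ):

import numpy as np

N = 1000
betas = np.linspace(1e-4, 0.02, N)        # illustrative schedule, 0 < beta_1 < ... < beta_N < 1
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # bar(alpha)_i = prod_{j<=i} alpha_j

def noise_sample(x0, i, rng=np.random.default_rng(0)):
    # Draw x_i ~ p(x_i | x_0) = N(sqrt(bar(alpha)_i) x_0, (1 - bar(alpha)_i) I);
    # also return the noise z used, which defines the score-matching target.
    z = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[i]) * x0 + np.sqrt(1.0 - alpha_bars[i]) * z, z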

As mentioned earlier, our aim is to train a neural network s_θ(𝐱(t),t), where θ represents the network parameters, to approximate \nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)). This enables us to reverse the forward process and generate the data. Naturally, we have the optimization objective:

J_{1}=\mathbb{E}_{t\sim U(0,1),\,\mathbf{x}(t)\sim p(\mathbf{x}(t))}\big[\|s_{\theta}(\mathbf{x}(t),t)-\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\|_{2}^{2}\big].

However, the precise distribution p(𝐱(t)) is unknown, so we have to construct another objective that is equivalent to J_1 and can be computed. We use denoising score matching [35] to establish the following objective:

J_{2}=\mathbb{E}_{t\sim U(0,1),\,\mathbf{x}(t)\sim p(\mathbf{x}(t)|\mathbf{x}(0)),\,\mathbf{x}(0)\sim p(\mathbf{x}(0))}\big[\|s_{\theta}(\mathbf{x}(t),t)-\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))\|_{2}^{2}\big].

After discretization, p(𝐱(t)|𝐱(0)) can be written as p(𝐱_i|𝐱_0), i=1,2,…,N. The equivalence of J_1 and J_2 is shown in Appendix B.

We can then use standard optimization techniques to update the parameter θ so as to minimize J_2 until convergence, thereby obtaining the optimal solution θ* and the neural network s_{θ*}(𝐱(t),t) approximating the score \nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)). We summarize the training process of the neural network s_θ(𝐱(t),t) in Algorithm 3.

Require: 𝐗, N, {β_i}_{i=1}^{N}, where 𝐗 is a dataset
Result: θ
α_i := 1 − β_i,  ᾱ_i := ∏_{j=1}^{i} α_j
for 𝐱_0 ∈ 𝐗 do
    for i = 1 to N do
        𝐱_i ∼ 𝒩(𝐱_i; √(α_i) 𝐱_{i−1}, β_i I)
        p(𝐱_i | 𝐱_0) = 𝒩(𝐱_i; √(ᾱ_i) 𝐱_0, (1 − ᾱ_i) I)
Repeat
    J_2 = 𝔼_{t∼U(ϵ,1), 𝐱(t)∼p(𝐱(t)|𝐱(0)), 𝐱(0)∼p(𝐱(0))}[‖s_θ(𝐱(t),t) − ∇_{𝐱(t)} log p(𝐱(t)|𝐱(0))‖²_2]
    Update θ with the Adam optimization algorithm.
Until converged
Algorithm 3: Training algorithm.
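A compact PyTorch-style sketch of Algorithm 3 is given below. The small fully connected ScoreNet is only a stand-in for the U-Net-type network used in the paper, and the β schedule, learning rate, and batch handling are illustrative; the essential part is the denoising score matching target ∇_{𝐱_i} log p(𝐱_i | 𝐱_0) = −z / √(1 − ᾱ_i), which follows from the closed form derived above.

import torch
import torch.nn as nn

N = 1000
betas = torch.linspace(1e-4, 0.02, N)              # illustrative beta schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)     # bar(alpha)_i

class ScoreNet(nn.Module):
    # Tiny stand-in for s_theta(x, t); the paper uses a U-Net-type network.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(),
                                 nn.Linear(128, 128), nn.SiLU(),
                                 nn.Linear(128, dim))
    def forward(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=1))

def train(dataset, dim, epochs=100):
    # dataset: an iterable yielding batches x0 of shape (batch, dim).
    model = ScoreNet(dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for x0 in dataset:
            i = torch.randint(0, N, (x0.shape[0],))
            abar = alpha_bars[i][:, None]
            z = torch.randn_like(x0)
            xi = abar.sqrt() * x0 + (1.0 - abar).sqrt() * z
            target = -z / (1.0 - abar).sqrt()      # grad_x log p(x_i | x_0)
            t = (i.float() + 1.0) / N
            loss = ((model(xi, t) - target) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model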

Appendix B Proof of equivalence between J_1 and J_2

As shown in Appendix A:

J_{1} =\mathbb{E}_{t\sim U(0,1),\,\mathbf{x}(t)\sim p(\mathbf{x}(t))}[\|s_{\theta}(\mathbf{x}(t),t)-\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\|_{2}^{2}],
J_{2} =\mathbb{E}_{t\sim U(0,1),\,\mathbf{x}(t)\sim p(\mathbf{x}(t)|\mathbf{x}(0)),\,\mathbf{x}(0)\sim p(\mathbf{x}(0))}[\|s_{\theta}(\mathbf{x}(t),t)-\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))\|_{2}^{2}].

Since t follows the same distribution in both J_1 and J_2, we only need to consider the equivalence of minimizing the following quantities:

\hat{J}_{1} =\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t))}[\|s_{\theta}(\mathbf{x}(t),t)-\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\|_{2}^{2}],
\hat{J}_{2} =\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t)|\mathbf{x}(0)),\,\mathbf{x}(0)\sim p(\mathbf{x}(0))}[\|s_{\theta}(\mathbf{x}(t),t)-\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))\|_{2}^{2}].

To this end, we rewrite them as:

\hat{J}_{1} =\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t))}[\|s_{\theta}(\mathbf{x}(t))\|_{2}^{2}]-S_{1}(\theta)+C_{1},
\hat{J}_{2} =\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t)|\mathbf{x}(0)),\,\mathbf{x}(0)\sim p(\mathbf{x}(0))}[\|s_{\theta}(\mathbf{x}(t))\|_{2}^{2}]-S_{2}(\theta)+C_{2},

where C_{1}=\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t))}[\|\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\|_{2}^{2}] and C_{2}=\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t)|\mathbf{x}(0)),\,\mathbf{x}(0)\sim p(\mathbf{x}(0))}[\|\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))\|_{2}^{2}]. Both C_{1} and C_{2} are constants that do not depend on the training parameter θ. Besides,

S_{1}(\theta) =2\,\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t))}[\langle s_{\theta}(\mathbf{x}(t),t),\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\rangle],
S_{2}(\theta) =2\,\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t)|\mathbf{x}(0)),\,\mathbf{x}(0)\sim p(\mathbf{x}(0))}[\langle s_{\theta}(\mathbf{x}(t),t),\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))\rangle].

We show that S_1(θ) = S_2(θ) as follows:

\frac{1}{2}S_{1}(\theta)
=\int_{\mathbf{x}(t)}p(\mathbf{x}(t))\langle s_{\theta}(\mathbf{x}(t),t),\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t))\rangle\,d\mathbf{x}(t)
=\int_{\mathbf{x}(t)}\langle s_{\theta}(\mathbf{x}(t),t),\nabla_{\mathbf{x}(t)}p(\mathbf{x}(t))\rangle\,d\mathbf{x}(t)
=\int_{\mathbf{x}(t)}\Big\langle s_{\theta}(\mathbf{x}(t),t),\int_{\mathbf{x}(0)}p(\mathbf{x}(0))\nabla_{\mathbf{x}(t)}p(\mathbf{x}(t)|\mathbf{x}(0))\,d\mathbf{x}(0)\Big\rangle\,d\mathbf{x}(t)
=\int_{\mathbf{x}(t)}\Big\langle s_{\theta}(\mathbf{x}(t),t),\int_{\mathbf{x}(0)}p(\mathbf{x}(0))p(\mathbf{x}(t)|\mathbf{x}(0))\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))\,d\mathbf{x}(0)\Big\rangle\,d\mathbf{x}(t)
=\int_{\mathbf{x}(t)}\int_{\mathbf{x}(0)}p(\mathbf{x}(0))p(\mathbf{x}(t)|\mathbf{x}(0))\langle s_{\theta}(\mathbf{x}(t),t),\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))\rangle\,d\mathbf{x}(0)\,d\mathbf{x}(t)
=\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t)|\mathbf{x}(0)),\,\mathbf{x}(0)\sim p(\mathbf{x}(0))}[\langle s_{\theta}(\mathbf{x}(t),t),\nabla_{\mathbf{x}(t)}\log p(\mathbf{x}(t)|\mathbf{x}(0))\rangle]
=\frac{1}{2}S_{2}(\theta).

Besides, we note that

\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t))}[\|s_{\theta}(\mathbf{x}(t))\|_{2}^{2}]=\mathbb{E}_{\mathbf{x}(t)\sim p(\mathbf{x}(t)|\mathbf{x}(0)),\,\mathbf{x}(0)\sim p(\mathbf{x}(0))}[\|s_{\theta}(\mathbf{x}(t))\|_{2}^{2}].

Based on the above discussion, we have J_2 = J_1 − C_1 + C_2. Since C_1 and C_2 do not depend on θ, minimizing J_1 and minimizing J_2 lead to the same optimum; the two optimization objectives J_1 and J_2 are therefore equivalent.
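The equivalence can also be sanity-checked numerically in a toy setting where the marginal score is known in closed form. Below, x(0) ∼ N(0,1) and x(t) = √ᾱ x(0) + √(1−ᾱ) z, so p(x(t)) = N(0,1); for a simple linear "score network" s_θ(x) = θx, the Monte Carlo estimates of Ĵ_2 − Ĵ_1 at this fixed noise level should be (approximately) the same constant for every θ. This check is only illustrative and is not part of the paper's experiments.

import numpy as np

rng = np.random.default_rng(0)
alpha_bar = 0.5                      # a fixed noise level for the check
n = 200_000

x0 = rng.standard_normal(n)          # toy data distribution: x0 ~ N(0, 1)
z = rng.standard_normal(n)
xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * z

score_marginal = -xt                                              # grad log p(x_t), since p(x_t) = N(0, 1)
score_cond = -(xt - np.sqrt(alpha_bar) * x0) / (1.0 - alpha_bar)  # grad log p(x_t | x_0)

for theta in [-1.0, 0.0, 0.5, 2.0]:
    s = theta * xt                   # a simple linear "score network"
    J1 = np.mean((s - score_marginal) ** 2)
    J2 = np.mean((s - score_cond) ** 2)
    print(f"theta={theta:+.1f}  J2 - J1 = {J2 - J1:.4f}")  # roughly constant in theta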

Appendix C The Detailed Structure of the Neural Network

[Uncaptioned image]