
Riemannian Diffusion Schrödinger Bridge

James Thornton, Michael Hutchinson, Emile Mathieu, Valentin De Bortoli, Yee Whye Teh, Arnaud Doucet
Abstract

Score-based generative models exhibit state-of-the-art performance on density estimation and generative modeling tasks. These models typically assume that the data geometry is flat, yet recent extensions have been developed to synthesize data living on Riemannian manifolds. Existing methods for accelerating the sampling of diffusion models are typically not applicable in the Riemannian setting, and Riemannian score-based methods have not yet been adapted to the important task of interpolation between datasets. To overcome these issues, we introduce the Riemannian Diffusion Schrödinger Bridge. Our proposed method generalizes the Diffusion Schrödinger Bridge of De Bortoli et al. (2021) to the non-Euclidean setting and extends Riemannian score-based models beyond the first time reversal. We validate our proposed method on synthetic data and real Earth and climate data.

Keywords: diffusion processes, generative modeling, Riemannian manifolds, score-based generative models, Schrödinger bridge

1 Background

1.1 Score Based Generative Modeling

In Euclidean spaces, Score-based Generative Modeling (SGM) (Song & Ermon, 2019; Song et al., 2021b) consists of two main components. The first is a forward noising process $(\mathbf{X}_t)_{t\geq 0}$, defined via the stochastic differential equation (SDE) in (1, left) with initial distribution $\mathbf{X}_0\sim p_{\textup{data}}$, targeting an easy-to-sample prior $p_{\textup{prior}}$. The second is a backward denoising process $(\mathbf{Y}_t)_{t\in[0,T]}=(\mathbf{X}_{T-t})_{t\in[0,T]}$, defined by the time-reversal (1, right) of the noising SDE (Cattiaux et al., 2021; Haussmann & Pardoux, 1986), running from $p_{\textup{prior}}=p_T$ to $p_{\textup{data}}=p_0$. Here $f$ is the drift, $g$ is the time-dependent volatility, $(\mathbf{B}_t)_{t\geq 0}$ is a $d$-dimensional Brownian motion, and $\mathbb{P}_t$ is the distribution of $\mathbf{X}_t$ with corresponding density $p_t$. The denoising process defines a generative model by sampling $\mathbf{Y}_0\sim p_{\textup{prior}}$ and simulating (1, right):

\mathrm{d}\mathbf{X}_t = f(t,\mathbf{X}_t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{B}_t, \qquad \mathrm{d}\mathbf{Y}_t = \{-f(t,\mathbf{Y}_t) + g^2(t)\nabla\log p_{T-t}(\mathbf{Y}_t)\}\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{B}_t. (1)

The intractable score term $\nabla\log p_{T-t}(\mathbf{Y}_t)$ may be approximated using the score matching identity $\nabla_{x_t}\log p_t(x_t)=\int_{\mathbb{R}^d}\nabla_{x_t}\log p_{t|0}(x_t|x_0)\,p_{0|t}(x_0|x_t)\,\mathrm{d}x_0$, where the transition density $p_{t|0}$ is tractable. This identity is used to train a neural network $\bm{s}_\theta:\mathbb{R}\times\mathbb{R}^d\to\mathbb{R}^d$ with the regression objective $\bm{s}_{\theta^\star}=\operatorname{arg\,min}_\theta\mathbb{E}_{\mathbf{X}_t,\mathbf{X}_0}\|\nabla_x\log p_{t|0}(\mathbf{X}_t|\mathbf{X}_0)-\bm{s}_\theta(t,\mathbf{X}_t)\|^2$. Once trained, one can generate samples approximately distributed according to $p_{\textup{data}}$ via (1, right) (e.g. using its Euler–Maruyama discretization) by substituting $\bm{s}_{\theta^\star}(t,x_t)\approx\nabla_{x_t}\log p_t(x_t)$.
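To make the regression objective concrete, the following sketch (Python/JAX; all function and variable names are ours, not from the paper) implements the loss in the simplest Euclidean case $f=0$ and $g(t)=\sigma$ constant, where $p_{t|0}(x_t|x_0)=\mathcal{N}(x_0,\sigma^2 t\,I)$, so $\nabla\log p_{t|0}(x_t|x_0)=-(x_t-x_0)/(\sigma^2 t)$:

import jax
import jax.numpy as jnp

def dsm_loss(params, score_net, x0, key, sigma=1.0, T=1.0, eps=1e-3):
    # score_net(params, t, x) is a hypothetical network approximating grad log p_t(x).
    key_t, key_z = jax.random.split(key)
    t = jax.random.uniform(key_t, (x0.shape[0], 1), minval=eps, maxval=T)
    z = jax.random.normal(key_z, x0.shape)
    xt = x0 + sigma * jnp.sqrt(t) * z           # sample x_t ~ p_{t|0}(. | x_0)
    target = -(xt - x0) / (sigma ** 2 * t)      # closed-form conditional score
    return jnp.mean(jnp.sum((score_net(params, t, xt) - target) ** 2, axis=-1))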

1.2 Riemannian Score Based Generative Modeling

De Bortoli et al. (2022) extended SGM to compact Riemannian manifolds, denoted $\mathcal{M}$; we refer to this approach as Riemannian Score-based Generative Modeling (RSGM). Given an $\mathcal{M}$-valued diffusion process (2, left), where $\mathbf{B}_t^{\mathcal{M}}$ denotes Brownian motion on $\mathcal{M}$, the time-reversal process may be written as (2, right) (De Bortoli et al., 2022, Theorem 1):

\mathrm{d}\mathbf{X}_t = f(t,\mathbf{X}_t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{B}_t^{\mathcal{M}}, \qquad \mathrm{d}\mathbf{Y}_t = \{-f(t,\mathbf{Y}_t) + g^2(t)\nabla\log p_{T-t}(\mathbf{Y}_t)\}\,\mathrm{d}t + g(t)\,\mathrm{d}\tilde{\mathbf{B}}_t^{\mathcal{M}}. (2)

For a more thorough background on Riemannian geometry and time-reversal on manifolds, see De Bortoli et al. (2022, App. B and G). Simulating a diffusion on a manifold is crucial to this approach. In contrast to Euclidean SGMs, closed-form sampling schemes are generally not available for diffusions on manifolds, so approximation schemes such as the Geodesic Random Walk of Algorithm 1 are required; in particular, $\mathbf{B}_t^{\mathcal{M}}$ may be simulated using Algorithm 1 with $g=1$, $f=0$. Each step applies an Euler–Maruyama update in the tangent space, with the Gaussian noise projected to the tangent space by the projection operator $\mathrm{P}$, and then maps back to the manifold via the exponential map.

Figure 1: Earthquake data: empirical data in red, generated samples in green using $N=10$ diffusion steps, shown along the forward and backward processes from $t=0$ to $t=T$. Top forward/backward pair: RSGM. Bottom pair: RDSB, with 5 IPF steps.
1:  Input: step size $\gamma$, initial state $\mathbf{X}_0$
2:  for $k\in\{0,\dots,N-1\}$ do
3:     $\bar{\mathbf{Z}}_{k+1}\sim\mathcal{N}(0,I_p)$,  $\mathbf{Z}_{k+1}=\mathrm{P}(\mathbf{X}_k)\bar{\mathbf{Z}}_{k+1}$
4:     $\mathbf{W}_{k+1}=\gamma f(k\gamma,\mathbf{X}_k)+\sqrt{\gamma}\,g(k\gamma)\mathbf{Z}_{k+1}$
5:     $\mathbf{X}_{k+1}=\exp_{\mathbf{X}_k}(\mathbf{W}_{k+1})$
6:  end for
7:  return $\{\mathbf{X}_k\}_{k=0}^{N-1}$
Algorithm 1 Simulating diffusions on a manifold
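As an illustration, Algorithm 1 is straightforward to implement on the unit sphere $S^{d-1}\subset\mathbb{R}^d$, where the tangent projection and exponential map are available in closed form: $\mathrm{P}(x)v=v-\langle x,v\rangle x$ and $\exp_x(v)=\cos(\|v\|)x+\sin(\|v\|)\,v/\|v\|$. A minimal Python/JAX sketch (our notation, not the paper's code):

import jax
import jax.numpy as jnp

def proj_tangent(x, v):
    return v - jnp.dot(x, v) * x

def exp_sphere(x, v):
    norm = jnp.linalg.norm(v)
    safe = jnp.where(norm > 0.0, norm, 1.0)   # guard: exp_x(0) = x
    return jnp.cos(norm) * x + jnp.sin(norm) * (v / safe)

def geodesic_random_walk(key, x0, f, g, gamma, n_steps):
    # Simulates dX = f dt + g dB^M on the sphere, one Euler step per iteration.
    def step(x, inputs):
        k, key_k = inputs
        z = proj_tangent(x, jax.random.normal(key_k, x.shape))  # tangent noise
        w = gamma * f(k * gamma, x) + jnp.sqrt(gamma) * g(k * gamma) * z
        x_new = exp_sphere(x, w)  # map the tangent step back to the manifold
        return x_new, x_new
    keys = jax.random.split(key, n_steps)
    _, traj = jax.lax.scan(step, x0, (jnp.arange(n_steps), keys))
    return traj  # states X_1, ..., X_N; prepend x0 for the full path

Brownian motion on the sphere is recovered with f = lambda t, x: 0.0 * x and g = lambda t: 1.0, as used for the noising process.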

Score matching may be used to approximate the time-reversal (2) in the Riemannian setting (De Bortoli et al., 2022). Unlike in Euclidean SGM, $\nabla\log p_{t|0}(x_t|x_0)$ is not available in closed form. Instead, one may use the score identity (where all gradients are taken w.r.t. the Riemannian metric of $\mathcal{M}$) $\nabla_{x_t}\log p_t(x_t)=\int_{\mathcal{M}}\nabla_{x_t}\log p_{t|s}(x_t|x_s)\,\mathbb{P}_{s|t}(x_t,\mathrm{d}x_s)$ for $s\approx t$, where again $\mathbb{P}$ is the distribution of $\mathbf{X}$ and $p_t$ is the density of $\mathbb{P}_t$ with respect to the uniform distribution on $\mathcal{M}$. Then $\nabla\log p_t(x_t)=\operatorname*{arg\,min}_{\mathbf{s}_t}\int_{\mathcal{M}^2}\|\nabla_x\log p_{t|s}(x_t|x_s)-\mathbf{s}_t(x_t)\|^2\,\mathrm{d}\mathbb{P}_{s,t}(x_s,x_t)$.
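This identity can be checked directly: writing $p_t(x_t)=\int_{\mathcal{M}}p_{t|s}(x_t|x_s)\,\mathbb{P}_s(\mathrm{d}x_s)$ and differentiating under the integral,

\nabla_{x_t}\log p_t(x_t)=\frac{\int_{\mathcal{M}}\nabla_{x_t}p_{t|s}(x_t|x_s)\,\mathbb{P}_s(\mathrm{d}x_s)}{p_t(x_t)}=\int_{\mathcal{M}}\nabla_{x_t}\log p_{t|s}(x_t|x_s)\,\frac{p_{t|s}(x_t|x_s)\,\mathbb{P}_s(\mathrm{d}x_s)}{p_t(x_t)},

and by Bayes' rule the reweighted measure in the last integral is precisely $\mathbb{P}_{s|t}(x_t,\mathrm{d}x_s)$.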

1.3 The Schrödinger Bridge Problem

A Schrödinger bridge extension of SGM has been introduced to reduce the number of diffusion steps by learning both the forward and backward diffusions (De Bortoli et al., 2021). We briefly recall the notion of the dynamical Schrödinger bridge (Léonard, 2012; Chen et al., 2016; Vargas et al., 2021; De Bortoli et al., 2021; Chen et al., 2022). We consider a reference path probability measure $\mathbb{P}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M})$, where $\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M})$ is the space of measures on continuous paths $(\mathbf{X}_t)_{t\in[0,T]}$ in $\mathcal{M}$. In practice, we set $\mathbb{P}$ to be the distribution of the Brownian motion $(\mathbf{B}_t^{\mathcal{M}})_{t\in[0,T]}$ initialized from $p_{\textup{data}}$, i.e. $\mathbf{B}_0^{\mathcal{M}}$ has distribution $p_{\textup{data}}$. We consider the dynamical Schrödinger bridge problem

\mathbb{Q}^{\star}=\operatorname*{arg\,min}\{\operatorname{KL}(\mathbb{Q}|\mathbb{P})\,:\,\mathbb{Q}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M}),\ \mathbb{Q}_0=p_{\textup{data}},\ \mathbb{Q}_T=p_{\textup{prior}}\}. (3)

The idealised solution $\mathbb{Q}^\star$ is called the Schrödinger bridge (SB). Given a backward process $(\mathbf{Y}_t^\star)_{t\in[0,T]}$ associated with $\mathbb{Q}^\star$, one can obtain a generative model as follows: first sample $\mathbf{Y}_T^\star\sim p_{\textup{prior}}$, then follow the (backward) dynamics of $(\mathbf{Y}_t^\star)_{t\in[0,T]}$. By definition, $\mathbf{Y}_0^\star\sim p_{\textup{data}}$, the data distribution.

In practice, the solution of the SB problem is approximated using the Iterative Proportional Fitting (IPF) algorithm, which coincides with the Sinkhorn algorithm in discrete spaces (Sinkhorn, 1967; Peyré & Cuturi, 2019). IPF defines a sequence of path probability measures $(\mathbb{Q}^n)_{n\in\mathbb{N}}\in(\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M}))^{\mathbb{N}}$ such that $\mathbb{Q}^0=\mathbb{P}$ and, for any $n\in\mathbb{N}$,

\mathbb{Q}^{2n+1}=\operatorname*{arg\,min}\{\operatorname{KL}(\mathbb{Q}|\mathbb{Q}^{2n})\,:\,\mathbb{Q}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M}),\ \mathbb{Q}_T=p_{\textup{prior}}\}, (4)
\mathbb{Q}^{2n+2}=\operatorname*{arg\,min}\{\operatorname{KL}(\mathbb{Q}|\mathbb{Q}^{2n+1})\,:\,\mathbb{Q}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M}),\ \mathbb{Q}_0=p_{\textup{data}}\}. (5)

Under mild assumptions on $\mathbb{P}$, $p_{\textup{data}}$ and $p_{\textup{prior}}$, the sequence $(\mathbb{Q}^n)_{n\in\mathbb{N}}$ converges towards $\mathbb{Q}^\star$ (Nutz & Wiesel, 2022). In discrete space, the two half-steps (4)-(5) reduce to the familiar row and column rescalings of Sinkhorn's algorithm, as illustrated below.
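Given a reference coupling $K$ (e.g. $K=\exp(-C/\varepsilon)$ for a cost matrix $C$) and marginals $\mu,\nu$, each IPF half-step rescales $K$ to match one marginal; a toy Python/JAX illustration (ours, for intuition only):

import jax.numpy as jnp

def sinkhorn(K, mu, nu, n_iters=100):
    # IPF in discrete space: alternately rescale the coupling so that one
    # marginal is matched, cf. (4) (fit nu) and (5) (fit mu).
    u = jnp.ones_like(mu)
    v = jnp.ones_like(nu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)   # half-step (4): fit the p_prior marginal
        u = mu / (K @ v)     # half-step (5): fit the p_data marginal
    return u[:, None] * K * v[None, :]   # coupling with (approx.) marginals mu, nu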

2 Riemannian Diffusion Schrödinger Bridge

In Euclidean state spaces, De Bortoli et al. (2021) proposed the Diffusion Schrödinger Bridge (DSB), an algorithm that approximates the solution of the SB problem by using score matching to approximate the time-reversals defining the IPF iterates. We propose the Riemannian Diffusion Schrödinger Bridge (RDSB), an extension of DSB that approximates solutions of the SB problem on compact Riemannian manifolds.

We first connect the IPF iterates $(\mathbb{Q}^n)_{n\in\mathbb{N}}$ with time-reversals of diffusion processes on $\mathcal{M}$. To simplify notation, we rewrite (2) in terms of drift functions $f^n$ and $b^n$ in (6) and (7). Here $\mathbf{Y}^n$ is the time-reversal of $\mathbf{X}^n$, initialized at $\mathbf{Y}^n_T\sim p_{\textup{prior}}$; $\mathbf{X}^0$ is the Brownian motion on $\mathcal{M}$, with $f^0=0$; and $\mathbf{X}^{n+1}$ is the time-reversal of the diffusion $\mathbf{Y}^n$, initialized at $\mathbf{X}^{n+1}_0\sim p_{\textup{data}}$.

\mathrm{d}\mathbf{X}^n_t=f^n(t,\mathbf{X}^n_t)\,\mathrm{d}t+g(t)\,\mathrm{d}\mathbf{B}_t^{\mathcal{M}}, (6)
\mathrm{d}\mathbf{Y}^n_t=b^n(T-t,\mathbf{Y}^n_t)\,\mathrm{d}t+g(t)\,\mathrm{d}\tilde{\mathbf{B}}_t^{\mathcal{M}}. (7)
1:  for $n\in\{0,\dots,L\}$ do
2:     while not converged do
3:        Sample $t_i\sim\mathrm{Uniform}([0,T])$
4:        Simulate $\{\mathbf{X}^i_{t_i}\}_{i=1}^B$, where $\mathbf{X}^i_0\sim p_{\textup{data}}$
5:        Compute $\hat{\ell}^b_n(\phi^n)$ using (8)
6:        $\phi^n\leftarrow\mathrm{Gradient\ Step}(\hat{\ell}^b_n(\phi^n))$
7:     end while
8:     while not converged do
9:        Sample $t_i\sim\mathrm{Uniform}([0,T])$
10:       Simulate $\{\mathbf{Y}^i_{t_i}\}_{i=1}^B$, where $\mathbf{Y}^i_T\sim p_{\textup{prior}}$
11:       Compute $\hat{\ell}^f_{n+1}(\theta^{n+1})$ using (9)
12:       $\theta^{n+1}\leftarrow\mathrm{Gradient\ Step}(\hat{\ell}^f_{n+1}(\theta^{n+1}))$
13:    end while
14: end for
15: Output: $(\theta^{L+1},\phi^L)$
Algorithm 2 RDSB
Proposition 2.1.

Let $\mathbb{P}$ be the path measure of the Brownian motion initialized at $p_{\textup{prior}}$ and let the IPF iterates $(\mathbb{Q}^n)_n$ be as defined in Section 1.3. Assume that for any $n\in\mathbb{N}$, $\operatorname{KL}(\mathbb{Q}^n|\mathbb{P})<+\infty$, and that for any $t\in[0,T]$ and $n\in\mathbb{N}$, $\mathbb{Q}^n_t$ admits a smooth positive density w.r.t. $p_{\textup{prior}}$. Then, for any $n\in\mathbb{N}$, $\mathbb{Q}^{2n}$ and $R(\mathbb{Q}^{2n+1})$ solve the time-reversals (6) and (7) respectively, where $R(\mathbb{P})_t=\mathbb{P}_{T-t}$ denotes time reversal and, for any $n\in\mathbb{N}$, $t\in[0,T]$ and $x\in\mathcal{M}$, $b^n(t,x)=-f^n(t,x)+g(t)^2\nabla\log p^n_t(x)$ and $f^{n+1}(t,x)=-b^n(t,x)+g(t)^2\nabla\log q^n_t(x)$, with $f^0(t,x)=0$ and $p^n_t$, $q^n_t$ the densities of $\mathbb{Q}^{2n}_t$ and $\mathbb{Q}^{2n+1}_t$.

Proof. The proof follows De Bortoli et al. (2021, Proposition 6), using De Bortoli et al. (2022, Theorem 1) instead of Cattiaux et al. (2021, Theorem 4.19).

In particular, $\mathbb{Q}^1$ is the diffusion process associated with RSGM, i.e. the time-reversal of the Brownian motion initialized at $p_{\textup{prior}}$. Hence, $\mathbb{Q}^{2n+1}$ for $n\in\mathbb{N}$ with $n\geq 1$ can be seen as a refinement of $\mathbb{Q}^1$. In the next proposition, we show that the drift terms of the diffusion processes associated with $(\mathbb{Q}^n)_{n\in\mathbb{N}}$ can be approximated by leveraging score-based techniques.

Proposition 2.2.

Let $(\mathbf{X}_t)_{t\in[0,T]}$ be an $\mathcal{M}$-valued process with distribution $\mathbb{P}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M})$ such that for any $t\in[0,T]$, $\mathbf{X}_t$ admits a positive density $p_t\in\mathrm{C}^\infty(\mathcal{M})$ w.r.t. $p_{\textup{prior}}$. For any $t\in[0,T]$ and $x\in\mathcal{M}$, let $b(t,x)=-f(t,x)+g(t)^2\nabla\log p_t(x)$. Then, for any $t\in[0,T]$, we have

b(t,\cdot)=\operatorname*{arg\,min}_{r\in\mathrm{L}^2(\mathbb{P}_t)}\mathbb{E}[\tfrac{1}{2}\|f(t,\mathbf{X}_t)+r(\mathbf{X}_t)\|_2^2+g(t)^2\operatorname{div}(r)(\mathbf{X}_t)].

Proof. For any $t\in[0,T]$, $b(t,\cdot)=\operatorname*{arg\,min}_{r\in\mathcal{X}(\mathcal{M})}\mathbb{E}[\|r(\mathbf{X}_t)-\{-f(t,\mathbf{X}_t)+g(t)^2\nabla\log p_t(\mathbf{X}_t)\}\|_2^2]$. Expanding the quadratic, dropping terms that do not depend on $r$, and using the divergence theorem (see Lee, 2018, p. 51), so that $\mathbb{E}[\langle r(\mathbf{X}_t),\nabla\log p_t(\mathbf{X}_t)\rangle_{\mathcal{M}}]=-\mathbb{E}[\operatorname{div}(r)(\mathbf{X}_t)]$, concludes the proof.
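Explicitly, expanding the quadratic gives

\mathbb{E}[\|r(\mathbf{X}_t)+f(t,\mathbf{X}_t)-g(t)^2\nabla\log p_t(\mathbf{X}_t)\|_2^2]=\mathbb{E}[\|f(t,\mathbf{X}_t)+r(\mathbf{X}_t)\|_2^2]-2g(t)^2\,\mathbb{E}[\langle r(\mathbf{X}_t),\nabla\log p_t(\mathbf{X}_t)\rangle_{\mathcal{M}}]+C,

where $C$ collects the terms independent of $r$; substituting $\mathbb{E}[\langle r,\nabla\log p_t\rangle_{\mathcal{M}}]=-\mathbb{E}[\operatorname{div}(r)]$ and halving yields the objective of Proposition 2.2.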

\hat{\ell}^b_n(\phi)=\sum_{i=1}^B\tfrac{1}{2}\|f^n_\theta(t_i,x^i_{t_i})+b^n_\phi(t_i,x^i_{t_i})\|_2^2+g^2(t_i)\operatorname{div}(b^n_\phi)(t_i,x^i_{t_i}) (8)
\hat{\ell}^f_n(\theta)=\sum_{i=1}^B\tfrac{1}{2}\|b^{n-1}_\phi(t_i,x^i_{t_i})+f^n_\theta(t_i,x^i_{t_i})\|_2^2+g^2(t_i)\operatorname{div}(f^n_\theta)(t_i,x^i_{t_i}) (9)

In practice, for $L$ IPF steps, $(f^n)_{n=1}^L$ and $(b^n)_{n=0}^L$ are approximated by neural networks $(f^n_{\theta^n})_{n=1}^L$, $(b^n_{\phi^n})_{n=1}^L$ with parameters $(\theta^n,\phi^n)_n$. Approximating the drifts (or means) of the SDEs is computationally cheaper than storing and evaluating $2L$ separate score networks. Proposition 2.2 provides the loss functions. The divergence terms may be estimated using automatic differentiation or Hutchinson's trace estimator (Hutchinson, 1989), as sketched below.
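For instance, the divergence in (8) can be estimated with a single Jacobian-vector product. A Python/JAX sketch with hypothetical network signatures (note that on an embedded manifold the divergence should be taken w.r.t. the Riemannian metric; we show the ambient Euclidean estimator only for brevity):

import jax
import jax.numpy as jnp

def hutchinson_div(r, x, key):
    # Unbiased estimate of div(r)(x) = E_z[z^T (dr/dx)(x) z] with Rademacher z.
    z = jax.random.rademacher(key, x.shape, dtype=x.dtype)
    _, jvp_val = jax.jvp(r, (x,), (z,))   # (dr/dx)(x) z in one forward pass
    return jnp.dot(z, jvp_val)

def loss_b_single(phi, b_net, f_frozen, t, x, g2t, key):
    # One-sample version of (8); average over the batch in practice.
    # b_net(phi, t, x) is the trainable backward drift, f_frozen the fixed
    # forward drift from the previous IPF step, g2t the value of g^2(t).
    residual = f_frozen(t, x) + b_net(phi, t, x)
    div_b = hutchinson_div(lambda y: b_net(phi, t, y), x, key)
    return 0.5 * jnp.sum(residual ** 2) + g2t * div_b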

Figure 2: Climate data, (a) Earthquake, (b) Fire, (c) Volcano, (d) Flood: $N=10$ steps, 5 IPF iterations. Red points are training data and green are generated points.

Likelihood Computation. The ordinary differential equation (ODE) $\mathrm{d}\hat{\mathbf{X}}_t=\{f(t,\hat{\mathbf{X}}_t)-\tfrac{1}{2}g(t)^2\nabla\log p_t(\hat{\mathbf{X}}_t)\}\,\mathrm{d}t$ has the same marginal distributions as the SDE (1), in particular $p_{\textup{data}}$, and hence may be used for likelihood computation with an adaptive ODE solver and the trained score (Song et al., 2021b). Similar results hold for the Schrödinger bridge (De Bortoli et al., 2021; Chen et al., 2022) and in the Riemannian setting (De Bortoli et al., 2022). We construct the corresponding probability-flow ODE from the drift approximations as $\mathrm{d}\hat{\mathbf{X}}_t=\frac{1}{2}\{f_\theta(t,\hat{\mathbf{X}}_t)-b_\phi(t,\hat{\mathbf{X}}_t)\}\,\mathrm{d}t$; a sketch of the resulting integrator is given below. The computed likelihood is valid only if the forward noising process has converged, i.e. $p_T=p_{\textup{prior}}$; RDSB enforces this convergence.
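A minimal sketch of such an integrator (assuming an exponential map exp_map as in the sphere example above; f_theta and b_phi stand for the trained drift networks, and plain Euler steps are our simplification rather than the adaptive solver mentioned above):

def probability_flow(x0, f_theta, b_phi, exp_map, n_steps=10, T=1.0):
    # Euler integration of dX = 0.5 * (f_theta - b_phi) dt, with each
    # tangent-space step mapped back to the manifold.
    dt = T / n_steps
    x = x0
    for k in range(n_steps):
        t = k * dt
        v = 0.5 * (f_theta(t, x) - b_phi(t, x))   # probability-flow drift
        x = exp_map(x, dt * v)
    return x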

Other acceleration methods. Leading SGM acceleration techniques (Xiao et al., 2022; Song et al., 2021a) use an implicit approach to denoising: estimate $x_0$ from $x_t$, then sample $\mathbf{X}_s|x_t,x_0$ for $s<t$. These techniques are not applicable in the Riemannian setting, as it is typically not possible to sample $\mathbf{X}_s|x_t,x_0$ for $t>s$, or $\mathbf{X}_0|x_t$, over large jumps, i.e. $t\not\approx s$ or $t\not\approx 0$.

3 Experiments

In each of the experiments, 4-layer fully connected networks of hidden width 512 are used for both $f_\theta$ and $b_\phi$. $N=10$ diffusion steps of size $1/N$ were used. A triangular schedule for $g^2(t)$ was selected, linearly interpolating between a low of 0.001 at $t=0$, a peak of 0.05 at $t=T/2$, and a low of 0.001 at $t=T$; a sketch of this schedule follows. Other experimental details follow De Bortoli et al. (2022).
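Concretely, the schedule can be written as follows (a small sketch; the constants are those quoted above):

import jax.numpy as jnp

def g2(t, T=1.0, low=1e-3, peak=5e-2):
    # Triangular schedule: low at t = 0 and t = T, peak at t = T/2.
    return jnp.interp(t, jnp.array([0.0, T / 2.0, T]), jnp.array([low, peak, low]))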

3.1 Earth and Climate Data

We validate RDSB on empirical distributions of occurrences of real Earth and climate science events, including earthquakes (NGDC/WDS, 2022a), wildfires (EOSDIS, 2020), volcanic eruptions (NGDC/WDS, 2022b), and floods (Brakenridge, 2017). Figure 2 shows that generated samples are visually similar to the empirical datasets.

Close inspection of Figure 1 shows that RDSB with 5 IPF iterations exhibits better convergence than RSGM (equivalent to RDSB with 1 IPF iteration) for $N=10$ diffusion steps on the earthquake data. In particular, RDSB generates fewer samples lying far from the true data.

Figure 3: Interpolation between spherical harmonic datasets $(l=2,m=4)$ and $(l=2,m=6)$ after 2 IPF iterations.

3.2 Dataset Interpolation on a Manifold

RDSB enables interpolation between datasets by choosing $p_{\textup{prior}}$ to be another dataset, rather than necessarily $p_T$, the terminal distribution of the initial noising process. Figure 3 illustrates RDSB applied to samples on the sphere drawn with probability proportional to the real component of the spherical harmonic functions with parameters $(l=2,m=4)$ and $(l=2,m=6)$.

4 Future Directions

We introduce novel methodology for generative modeling and interpolation on compact Riemannian manifolds, which accelerates and generalizes RSGM. In future work, we will apply RDSB to more challenging settings, such as interpolation in robotics and protein modeling, see Jing et al. (2022). From a theoretical point of view, the restriction to compact spaces allows us to leverage tools from optimal transport (Peyré & Cuturi, 2019) to prove geometric convergence of RDSB to an approximation of the Schrödinger bridge and to quantify this bias.

References

  • Brakenridge (2017) Brakenridge, G. Global active archive of large flood events. http://floodobservatory.colorado.edu/Archives/index.html, 2017.
  • Cattiaux et al. (2021) Cattiaux, P., Conforti, G., Gentil, I., and Léonard, C. Time reversal of diffusion processes under a finite entropy condition. arXiv preprint arXiv:2104.07708, 2021.
  • Chen et al. (2022) Chen, T., Liu, G.-H., and Theodorou, E. A. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. In International Conference on Learning Representations, 2022.
  • Chen et al. (2016) Chen, Y., Georgiou, T., and Pavon, M. Entropic and displacement interpolation: a computational approach using the Hilbert metric. SIAM Journal on Applied Mathematics, 76(6):2375–2396, 2016.
  • De Bortoli et al. (2021) De Bortoli, V., Thornton, J., Heng, J., and Doucet, A. Diffusion Schrödinger bridge with applications to score-based generative modeling. In Advances in Neural Information Processing Systems, 2021.
  • De Bortoli et al. (2022) De Bortoli, V., Mathieu, E., Hutchinson, M., Thornton, J., Teh, Y. W., and Doucet, A. Riemannian score-based generative modeling. arXiv preprint arXiv:2202.02763, 2022.
  • EOSDIS (2020) EOSDIS. Land, Atmosphere Near real-time Capability for EOS (LANCE) system operated by NASA's Earth Science Data and Information System (ESDIS). https://earthdata.nasa.gov/earth-observation-data/near-real-time/firms/active-fire-data, 2020.
  • Haussmann & Pardoux (1986) Haussmann, U. G. and Pardoux, E. Time reversal of diffusions. The Annals of Probability, 14(4):1188–1205, 1986.
  • Hutchinson (1989) Hutchinson, M. F. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 18(3):1059–1076, 1989.
  • Jing et al. (2022) Jing, B., Corso, G., Barzilay, R., and Jaakkola, T. S. Torsional diffusion for molecular conformer generation. In ICLR Workshop on Machine Learning for Drug Discovery, 2022.
  • Lee (2018) Lee, J. M. Introduction to Riemannian manifolds. Springer, 2018.
  • Léonard (2012) Léonard, C. From the Schrödinger problem to the Monge–Kantorovich problem. Journal of Functional Analysis, 262(4):1879–1920, 2012.
  • NGDC/WDS (2022a) NGDC/WDS. NCEI/WDS Global Significant Earthquake Database. https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ngdc.mgg.hazards:G012153, 2022a.
  • NGDC/WDS (2022b) NGDC/WDS. NCEI/WDS Global Significant Volcanic Eruptions Database. https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ngdc.mgg.hazards:G10147, 2022b.
  • Nutz & Wiesel (2022) Nutz, M. and Wiesel, J. Stability of Schrödinger potentials and convergence of Sinkhorn’s algorithm. arXiv preprint arXiv:2201.10059, 2022.
  • Peyré & Cuturi (2019) Peyré, G. and Cuturi, M. Computational optimal transport. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
  • Sinkhorn (1967) Sinkhorn, R. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4):402–405, 1967.
  • Song et al. (2021a) Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a.
  • Song & Ermon (2019) Song, Y. and Ermon, S. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, 2019.
  • Song et al. (2021b) Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021b.
  • Vargas et al. (2021) Vargas, F., Thodoroff, P., Lamacraft, A., and Lawrence, N. Solving Schrödinger bridges via maximum likelihood. Entropy, 23(9):1134, 2021.
  • Xiao et al. (2022) Xiao, Z., Kreis, K., and Vahdat, A. Tackling the generative learning trilemma with denoising diffusion GANs. In International Conference on Learning Representations, 2022.