
Riemannian Diffusion Schrödinger Bridge

James Thornton, Michael Hutchinson, Emile Mathieu, Valentin De Bortoli, Yee Whye Teh, Arnaud Doucet
Abstract

Score-based generative models exhibit state-of-the-art performance on density estimation and generative modeling tasks. These models typically assume that the data geometry is flat, yet recent extensions have been developed to synthesize data living on Riemannian manifolds. Existing methods for accelerating the sampling of diffusion models are typically not applicable in the Riemannian setting, and Riemannian score-based methods have not yet been adapted to the important task of interpolation between datasets. To overcome these issues, we introduce the Riemannian Diffusion Schrödinger Bridge. Our proposed method generalizes the Diffusion Schrödinger Bridge of De Bortoli et al. (2021) to the non-Euclidean setting and extends Riemannian score-based models beyond the first time reversal. We validate our proposed method on synthetic data and real Earth and climate data.

Keywords: diffusion processes, generative modeling, Riemannian manifolds, score-based generative models, Schrödinger bridge

1 Background

1.1 Score Based Generative Modeling

In Euclidean spaces, Score-based Generative Modeling (SGM) (Song & Ermon, 2019; Song et al., 2021b) consists of two main components. The first is a forward noising process $(\mathbf{X}_t)_{t\geq 0}$, defined via the stochastic differential equation (SDE) in (1, left) with initial distribution $\mathbf{X}_0\sim p_{\textup{data}}$, targeting an easy-to-sample prior $p_{\textup{prior}}$. The second is a backward denoising process $(\mathbf{Y}_t)_{t\in[0,T]}=(\mathbf{X}_{T-t})_{t\in[0,T]}$, defined by the time-reversal (1, right) of the noising SDE (Cattiaux et al., 2021; Haussmann & Pardoux, 1986), running from $p_{\textup{prior}}=p_T$ to $p_{\textup{data}}=p_0$. Here $f$ is the drift, $g$ is the time-dependent volatility, $(\mathbf{B}_t)_{t\geq 0}$ is a $d$-dimensional Brownian motion, and $\mathbb{P}_t$ is the distribution of $\mathbf{X}_t$ with corresponding density $p_t$. The denoising process defines a generative model by sampling $\mathbf{Y}_0\sim p_{\textup{prior}}$ and simulating (1, right):

\mathrm{d}\mathbf{X}_t = f(t,\mathbf{X}_t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{B}_t, \qquad \mathrm{d}\mathbf{Y}_t = \{-f(t,\mathbf{Y}_t) + g^2(t)\nabla\log p_{T-t}(\mathbf{Y}_t)\}\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{B}_t. (1)

The intractable score term $\nabla\log p_{T-t}(\mathbf{Y}_t)$ may be approximated using the score matching identity $\nabla_{x_t}\log p_t(x_t)=\int_{\mathbb{R}^d}\nabla_{x_t}\log p_{t|0}(x_t|x_0)\,p_{0|t}(x_0|x_t)\,\mathrm{d}x_0$, where the transition density $p_{t|0}$ is tractable. This identity is used to train a neural network $\bm{s}_\theta:\mathbb{R}\times\mathbb{R}^d\to\mathbb{R}^d$ with the regression objective $\bm{s}_{\theta^\star}=\operatorname{arg\,min}_\theta\mathbb{E}_{\mathbf{X}_t,\mathbf{X}_0}\|\nabla_x\log p_{t|0}(\mathbf{X}_t|\mathbf{X}_0)-\bm{s}_\theta(t,\mathbf{X}_t)\|^2$. Once trained, one can generate samples approximately distributed according to $p_{\textup{data}}$ via (1, right) (e.g. using its Euler–Maruyama discretization) by substituting $\bm{s}_{\theta^\star}(t,x_t)\approx\nabla_{x_t}\log p_t(x_t)$.
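To make the regression objective concrete, the following sketch (Python/JAX; all function and variable names are ours, not from the paper) implements the loss in the simplest Euclidean case $f=0$ and $g(t)=\sigma$ constant, where $p_{t|0}(x_t|x_0)=\mathcal{N}(x_0,\sigma^2 t\,I)$, so $\nabla\log p_{t|0}(x_t|x_0)=-(x_t-x_0)/(\sigma^2 t)$:

import jax
import jax.numpy as jnp

def dsm_loss(params, score_net, x0, key, sigma=1.0, T=1.0, eps=1e-3):
    # score_net(params, t, x) is a hypothetical network approximating grad log p_t(x).
    key_t, key_z = jax.random.split(key)
    t = jax.random.uniform(key_t, (x0.shape[0], 1), minval=eps, maxval=T)
    z = jax.random.normal(key_z, x0.shape)
    xt = x0 + sigma * jnp.sqrt(t) * z           # sample x_t ~ p_{t|0}(. | x_0)
    target = -(xt - x0) / (sigma ** 2 * t)      # closed-form conditional score
    return jnp.mean(jnp.sum((score_net(params, t, xt) - target) ** 2, axis=-1))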

1.2 Riemannian Score Based Generative Modeling

De Bortoli et al. (2022) extended SGM to compact Riemannian manifolds, denoted $\mathcal{M}$; we refer to this approach as Riemannian Score-based Generative Modeling (RSGM). Given an $\mathcal{M}$-valued diffusion process (2, left), where $\mathbf{B}_t^{\mathcal{M}}$ denotes Brownian motion on $\mathcal{M}$, the time-reversal process may be written as (2, right) (De Bortoli et al., 2022, Theorem 1):

\mathrm{d}\mathbf{X}_t = f(t,\mathbf{X}_t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{B}_t^{\mathcal{M}}, \qquad \mathrm{d}\mathbf{Y}_t = \{-f(t,\mathbf{Y}_t) + g^2(t)\nabla\log p_{T-t}(\mathbf{Y}_t)\}\,\mathrm{d}t + g(t)\,\mathrm{d}\tilde{\mathbf{B}}_t^{\mathcal{M}}. (2)

For a more thorough background on Riemannian geometry and time-reversal on manifolds, see De Bortoli et al. (2022, App. B and G). Simulating a diffusion on a manifold is crucial to this approach. In contrast to Euclidean SGMs, closed-form sampling schemes are generally not available for diffusions on manifolds, so approximation schemes such as the Geodesic Random Walk of Algorithm 1 are required; in particular, $\mathbf{B}_t^{\mathcal{M}}$ may be simulated using Algorithm 1 with $g=1$, $f=0$. Each step applies an Euler–Maruyama update in the tangent space, with the Gaussian noise projected to the tangent space by the projection operator $\mathrm{P}$, and then maps back to the manifold via the exponential map.

Figure 1: Earthquake data: empirical data in red, generated samples in green using $N=10$ diffusion steps, shown along the forward and backward processes from $t=0$ to $t=T$. Top forward/backward pair: RSGM. Bottom pair: RDSB, with 5 IPF steps.
1:  Input: step size $\gamma$, initial state $\mathbf{X}_0$
2:  for $k\in\{0,\dots,N-1\}$ do
3:     $\bar{\mathbf{Z}}_{k+1}\sim\mathcal{N}(0,I_p)$,  $\mathbf{Z}_{k+1}=\mathrm{P}(\mathbf{X}_k)\bar{\mathbf{Z}}_{k+1}$
4:     $\mathbf{W}_{k+1}=\gamma f(k\gamma,\mathbf{X}_k)+\sqrt{\gamma}\,g(k\gamma)\mathbf{Z}_{k+1}$
5:     $\mathbf{X}_{k+1}=\exp_{\mathbf{X}_k}(\mathbf{W}_{k+1})$
6:  end for
7:  return $\{\mathbf{X}_k\}_{k=0}^{N-1}$
Algorithm 1 Simulating diffusions on a manifold
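As an illustration, Algorithm 1 is straightforward to implement on the unit sphere $S^{d-1}\subset\mathbb{R}^d$, where the tangent projection and exponential map are available in closed form: $\mathrm{P}(x)v=v-\langle x,v\rangle x$ and $\exp_x(v)=\cos(\|v\|)x+\sin(\|v\|)\,v/\|v\|$. A minimal Python/JAX sketch (our notation, not the paper's code):

import jax
import jax.numpy as jnp

def proj_tangent(x, v):
    return v - jnp.dot(x, v) * x

def exp_sphere(x, v):
    norm = jnp.linalg.norm(v)
    safe = jnp.where(norm > 0.0, norm, 1.0)   # guard: exp_x(0) = x
    return jnp.cos(norm) * x + jnp.sin(norm) * (v / safe)

def geodesic_random_walk(key, x0, f, g, gamma, n_steps):
    # Simulates dX = f dt + g dB^M on the sphere, one Euler step per iteration.
    def step(x, inputs):
        k, key_k = inputs
        z = proj_tangent(x, jax.random.normal(key_k, x.shape))  # tangent noise
        w = gamma * f(k * gamma, x) + jnp.sqrt(gamma) * g(k * gamma) * z
        x_new = exp_sphere(x, w)  # map the tangent step back to the manifold
        return x_new, x_new
    keys = jax.random.split(key, n_steps)
    _, traj = jax.lax.scan(step, x0, (jnp.arange(n_steps), keys))
    return traj  # states X_1, ..., X_N; prepend x0 for the full path

Brownian motion on the sphere is recovered with f = lambda t, x: 0.0 * x and g = lambda t: 1.0, as used for the noising process.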

Score matching may be used to approximate the time-reversal (2) in the Riemannian setting (De Bortoli et al., 2022). Unlike in Euclidean SGM, $\nabla\log p_{t|0}(x_t|x_0)$ is not available in closed form. Instead, one may use the score identity (where all gradients are taken w.r.t. the Riemannian metric of $\mathcal{M}$) $\nabla_{x_t}\log p_t(x_t)=\int_{\mathcal{M}}\nabla_{x_t}\log p_{t|s}(x_t|x_s)\,\mathbb{P}_{s|t}(x_t,\mathrm{d}x_s)$ for $s\approx t$, where again $\mathbb{P}$ is the distribution of $\mathbf{X}$ and $p_t$ is the density of $\mathbb{P}_t$ with respect to the uniform distribution on $\mathcal{M}$. Then $\nabla\log p_t(x_t)=\operatorname*{arg\,min}_{\mathbf{s}_t}\int_{\mathcal{M}^2}\|\nabla_x\log p_{t|s}(x_t|x_s)-\mathbf{s}_t(x_t)\|^2\,\mathrm{d}\mathbb{P}_{s,t}(x_s,x_t)$.
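This identity can be checked directly: writing $p_t(x_t)=\int_{\mathcal{M}}p_{t|s}(x_t|x_s)\,\mathbb{P}_s(\mathrm{d}x_s)$ and differentiating under the integral,

\nabla_{x_t}\log p_t(x_t)=\frac{\int_{\mathcal{M}}\nabla_{x_t}p_{t|s}(x_t|x_s)\,\mathbb{P}_s(\mathrm{d}x_s)}{p_t(x_t)}=\int_{\mathcal{M}}\nabla_{x_t}\log p_{t|s}(x_t|x_s)\,\frac{p_{t|s}(x_t|x_s)\,\mathbb{P}_s(\mathrm{d}x_s)}{p_t(x_t)},

and by Bayes' rule the reweighted measure in the last integral is precisely $\mathbb{P}_{s|t}(x_t,\mathrm{d}x_s)$.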

1.3 The Schrödinger Bridge Problem

A Schrödinger bridge extension of SGM has been introduced to reduce the number of diffusion steps by learning both the forward and backward diffusions (De Bortoli et al., 2021). We briefly recall the notion of the dynamical Schrödinger bridge (Léonard, 2012; Chen et al., 2016; Vargas et al., 2021; De Bortoli et al., 2021; Chen et al., 2022). We consider a reference path probability measure $\mathbb{P}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M})$, where $\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M})$ is the space of measures on continuous paths $(\mathbf{X}_t)_{t\in[0,T]}$ in $\mathcal{M}$. In practice, we set $\mathbb{P}$ to be the distribution of the Brownian motion $(\mathbf{B}_t^{\mathcal{M}})_{t\in[0,T]}$ initialized from $p_{\textup{data}}$, i.e. $\mathbf{B}_0^{\mathcal{M}}$ has distribution $p_{\textup{data}}$. We consider the dynamical Schrödinger bridge problem

\mathbb{Q}^{\star}=\operatorname*{arg\,min}\{\operatorname{KL}(\mathbb{Q}|\mathbb{P})\,:\,\mathbb{Q}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M}),\ \mathbb{Q}_0=p_{\textup{data}},\ \mathbb{Q}_T=p_{\textup{prior}}\}. (3)

The idealised solution $\mathbb{Q}^\star$ is called the Schrödinger bridge (SB). Given a backward process $(\mathbf{Y}_t^\star)_{t\in[0,T]}$ associated with $\mathbb{Q}^\star$, one can obtain a generative model as follows: first sample $\mathbf{Y}_T^\star\sim p_{\textup{prior}}$, then follow the (backward) dynamics of $(\mathbf{Y}_t^\star)_{t\in[0,T]}$. By definition, $\mathbf{Y}_0^\star\sim p_{\textup{data}}$, the data distribution.

In practice, the solution of the SB problem is approximated using the Iterative Proportional Fitting (IPF) algorithm, which coincides with the Sinkhorn algorithm in discrete spaces (Sinkhorn, 1967; Peyré & Cuturi, 2019). IPF defines a sequence of path probability measures $(\mathbb{Q}^n)_{n\in\mathbb{N}}\in(\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M}))^{\mathbb{N}}$ such that $\mathbb{Q}^0=\mathbb{P}$ and, for any $n\in\mathbb{N}$,

\mathbb{Q}^{2n+1}=\operatorname*{arg\,min}\{\operatorname{KL}(\mathbb{Q}|\mathbb{Q}^{2n})\,:\,\mathbb{Q}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M}),\ \mathbb{Q}_T=p_{\textup{prior}}\}, (4)
\mathbb{Q}^{2n+2}=\operatorname*{arg\,min}\{\operatorname{KL}(\mathbb{Q}|\mathbb{Q}^{2n+1})\,:\,\mathbb{Q}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M}),\ \mathbb{Q}_0=p_{\textup{data}}\}. (5)

Under mild assumptions on $\mathbb{P}$, $p_{\textup{data}}$ and $p_{\textup{prior}}$, the sequence $(\mathbb{Q}^n)_{n\in\mathbb{N}}$ converges towards $\mathbb{Q}^\star$ (Nutz & Wiesel, 2022). In discrete space, the two half-steps (4)-(5) reduce to the familiar row and column rescalings of Sinkhorn's algorithm, as illustrated below.
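Given a reference coupling $K$ (e.g. $K=\exp(-C/\varepsilon)$ for a cost matrix $C$) and marginals $\mu,\nu$, each IPF half-step rescales $K$ to match one marginal; a toy Python/JAX illustration (ours, for intuition only):

import jax.numpy as jnp

def sinkhorn(K, mu, nu, n_iters=100):
    # IPF in discrete space: alternately rescale the coupling so that one
    # marginal is matched, cf. (4) (fit nu) and (5) (fit mu).
    u = jnp.ones_like(mu)
    v = jnp.ones_like(nu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)   # half-step (4): fit the p_prior marginal
        u = mu / (K @ v)     # half-step (5): fit the p_data marginal
    return u[:, None] * K * v[None, :]   # coupling with (approx.) marginals mu, nu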

2 Riemannian Diffusion Schrödinger Bridge

In Euclidean state spaces, De Bortoli et al. (2021) proposed the Diffusion Schrödinger Bridge (DSB), an algorithm that approximates the solution of the SB problem by using score matching to approximate the time-reversals defining the IPF iterates. We propose the Riemannian Diffusion Schrödinger Bridge (RDSB), an extension of DSB that approximates solutions of the SB problem on compact Riemannian manifolds.

We first connect the IPF iterates $(\mathbb{Q}^n)_{n\in\mathbb{N}}$ with time-reversals of diffusion processes on $\mathcal{M}$. To simplify notation, we rewrite (2) in terms of drift functions $f^n$ and $b^n$ in (6) and (7). Here $\mathbf{Y}^n$ is the time-reversal of $\mathbf{X}^n$, initialized at $\mathbf{Y}^n_T\sim p_{\textup{prior}}$; $\mathbf{X}^0$ is the Brownian motion on $\mathcal{M}$, with $f^0=0$; and $\mathbf{X}^{n+1}$ is the time-reversal of the diffusion $\mathbf{Y}^n$, initialized at $\mathbf{X}^{n+1}_0\sim p_{\textup{data}}$.

\mathrm{d}\mathbf{X}^n_t=f^n(t,\mathbf{X}^n_t)\,\mathrm{d}t+g(t)\,\mathrm{d}\mathbf{B}_t^{\mathcal{M}}, (6)
\mathrm{d}\mathbf{Y}^n_t=b^n(T-t,\mathbf{Y}^n_t)\,\mathrm{d}t+g(t)\,\mathrm{d}\tilde{\mathbf{B}}_t^{\mathcal{M}}. (7)
1:  for $n\in\{0,\dots,L\}$ do
2:     while not converged do
3:        Sample $t_i\sim\mathrm{Uniform}([0,T])$
4:        Simulate $\{\mathbf{X}^i_{t_i}\}_{i=1}^B$, where $\mathbf{X}^i_0\sim p_{\textup{data}}$
5:        Compute $\hat{\ell}^b_n(\phi^n)$ using (8)
6:        $\phi^n\leftarrow\mathrm{Gradient\ Step}(\hat{\ell}^b_n(\phi^n))$
7:     end while
8:     while not converged do
9:        Sample $t_i\sim\mathrm{Uniform}([0,T])$
10:       Simulate $\{\mathbf{Y}^i_{t_i}\}_{i=1}^B$, where $\mathbf{Y}^i_T\sim p_{\textup{prior}}$
11:       Compute $\hat{\ell}^f_{n+1}(\theta^{n+1})$ using (9)
12:       $\theta^{n+1}\leftarrow\mathrm{Gradient\ Step}(\hat{\ell}^f_{n+1}(\theta^{n+1}))$
13:    end while
14: end for
15: Output: $(\theta^{L+1},\phi^L)$
Algorithm 2 RDSB
Proposition 2.1.

Let $\mathbb{P}$ be the path measure of the Brownian motion initialized at $p_{\textup{prior}}$ and let the IPF iterates $(\mathbb{Q}^n)_n$ be as defined in Section 1.3. Assume that for any $n\in\mathbb{N}$, $\operatorname{KL}(\mathbb{Q}^n|\mathbb{P})<+\infty$, and that for any $t\in[0,T]$ and $n\in\mathbb{N}$, $\mathbb{Q}^n_t$ admits a smooth positive density w.r.t. $p_{\textup{prior}}$. Then, for any $n\in\mathbb{N}$, $\mathbb{Q}^{2n}$ and $R(\mathbb{Q}^{2n+1})$ solve the time-reversals (6) and (7) respectively, where $R(\mathbb{P})_t=\mathbb{P}_{T-t}$ denotes time reversal and, for any $n\in\mathbb{N}$, $t\in[0,T]$ and $x\in\mathcal{M}$, $b^n(t,x)=-f^n(t,x)+g(t)^2\nabla\log p^n_t(x)$ and $f^{n+1}(t,x)=-b^n(t,x)+g(t)^2\nabla\log q^n_t(x)$, with $f^0(t,x)=0$ and $p^n_t$, $q^n_t$ the densities of $\mathbb{Q}^{2n}_t$ and $\mathbb{Q}^{2n+1}_t$.

Proof. The proof follows De Bortoli et al. (2021, Proposition 6), using De Bortoli et al. (2022, Theorem 1) instead of Cattiaux et al. (2021, Theorem 4.19).

In particular, $\mathbb{Q}^1$ is the diffusion process associated with RSGM, i.e. the time-reversal of the Brownian motion initialized at $p_{\textup{prior}}$. Hence, $\mathbb{Q}^{2n+1}$ for $n\in\mathbb{N}$ with $n\geq 1$ can be seen as a refinement of $\mathbb{Q}^1$. In the next proposition, we show that the drift terms of the diffusion processes associated with $(\mathbb{Q}^n)_{n\in\mathbb{N}}$ can be approximated by leveraging score-based techniques.

Proposition 2.2.

Let $(\mathbf{X}_t)_{t\in[0,T]}$ be an $\mathcal{M}$-valued process with distribution $\mathbb{P}\in\mathcal{P}(\mathrm{C}([0,T]),\mathcal{M})$ such that for any $t\in[0,T]$, $\mathbf{X}_t$ admits a positive density $p_t\in\mathrm{C}^\infty(\mathcal{M})$ w.r.t. $p_{\textup{prior}}$. For any $t\in[0,T]$ and $x\in\mathcal{M}$, let $b(t,x)=-f(t,x)+g(t)^2\nabla\log p_t(x)$. Then, for any $t\in[0,T]$, we have

b(t,\cdot)=\operatorname*{arg\,min}_{r\in\mathrm{L}^2(\mathbb{P}_t)}\mathbb{E}[\tfrac{1}{2}\|f(t,\mathbf{X}_t)+r(\mathbf{X}_t)\|_2^2+g(t)^2\operatorname{div}(r)(\mathbf{X}_t)].

Proof. For any $t\in[0,T]$, $b(t,\cdot)=\operatorname*{arg\,min}_{r\in\mathcal{X}(\mathcal{M})}\mathbb{E}[\|r(\mathbf{X}_t)-\{-f(t,\mathbf{X}_t)+g(t)^2\nabla\log p_t(\mathbf{X}_t)\}\|_2^2]$. Expanding the quadratic, dropping terms that do not depend on $r$, and using the divergence theorem (see Lee, 2018, p. 51), so that $\mathbb{E}[\langle r(\mathbf{X}_t),\nabla\log p_t(\mathbf{X}_t)\rangle_{\mathcal{M}}]=-\mathbb{E}[\operatorname{div}(r)(\mathbf{X}_t)]$, concludes the proof.
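Explicitly, expanding the quadratic gives

\mathbb{E}[\|r(\mathbf{X}_t)+f(t,\mathbf{X}_t)-g(t)^2\nabla\log p_t(\mathbf{X}_t)\|_2^2]=\mathbb{E}[\|f(t,\mathbf{X}_t)+r(\mathbf{X}_t)\|_2^2]-2g(t)^2\,\mathbb{E}[\langle r(\mathbf{X}_t),\nabla\log p_t(\mathbf{X}_t)\rangle_{\mathcal{M}}]+C,

where $C$ collects the terms independent of $r$; substituting $\mathbb{E}[\langle r,\nabla\log p_t\rangle_{\mathcal{M}}]=-\mathbb{E}[\operatorname{div}(r)]$ and halving yields the objective of Proposition 2.2.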

\hat{\ell}^b_n(\phi)=\sum_{i=1}^B\tfrac{1}{2}\|f^n_\theta(t_i,x^i_{t_i})+b^n_\phi(t_i,x^i_{t_i})\|_2^2+g^2(t_i)\operatorname{div}(b^n_\phi)(t_i,x^i_{t_i}) (8)
\hat{\ell}^f_n(\theta)=\sum_{i=1}^B\tfrac{1}{2}\|b^{n-1}_\phi(t_i,x^i_{t_i})+f^n_\theta(t_i,x^i_{t_i})\|_2^2+g^2(t_i)\operatorname{div}(f^n_\theta)(t_i,x^i_{t_i}) (9)

In practice, for $L$ IPF steps, $(f^n)_{n=1}^L$ and $(b^n)_{n=0}^L$ are approximated by neural networks $(f^n_{\theta^n})_{n=1}^L$, $(b^n_{\phi^n})_{n=1}^L$ with parameters $(\theta^n,\phi^n)_n$. Approximating the drifts (or means) of the SDEs is computationally cheaper than storing and evaluating $2L$ separate score networks. Proposition 2.2 provides the loss functions. The divergence terms may be estimated using automatic differentiation or Hutchinson's trace estimator (Hutchinson, 1989), as sketched below.
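For instance, the divergence in (8) can be estimated with a single Jacobian-vector product. A Python/JAX sketch with hypothetical network signatures (note that on an embedded manifold the divergence should be taken w.r.t. the Riemannian metric; we show the ambient Euclidean estimator only for brevity):

import jax
import jax.numpy as jnp

def hutchinson_div(r, x, key):
    # Unbiased estimate of div(r)(x) = E_z[z^T (dr/dx)(x) z] with Rademacher z.
    z = jax.random.rademacher(key, x.shape, dtype=x.dtype)
    _, jvp_val = jax.jvp(r, (x,), (z,))   # (dr/dx)(x) z in one forward pass
    return jnp.dot(z, jvp_val)

def loss_b_single(phi, b_net, f_frozen, t, x, g2t, key):
    # One-sample version of (8); average over the batch in practice.
    # b_net(phi, t, x) is the trainable backward drift, f_frozen the fixed
    # forward drift from the previous IPF step, g2t the value of g^2(t).
    residual = f_frozen(t, x) + b_net(phi, t, x)
    div_b = hutchinson_div(lambda y: b_net(phi, t, y), x, key)
    return 0.5 * jnp.sum(residual ** 2) + g2t * div_b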

Figure 2: Climate data, (a) Earthquake, (b) Fire, (c) Volcano, (d) Flood: $N=10$ steps, 5 IPF iterations. Red points are training data and green are generated points.

Likelihood Computation. The ordinary differential equation (ODE) $\mathrm{d}\hat{\mathbf{X}}_t=\{f(t,\hat{\mathbf{X}}_t)-\tfrac{1}{2}g(t)^2\nabla\log p_t(\hat{\mathbf{X}}_t)\}\,\mathrm{d}t$ has the same marginal distributions as the SDE (1), in particular $p_{\textup{data}}$, and hence may be used for likelihood computation with an adaptive ODE solver and the trained score (Song et al., 2021b). Similar results hold for the Schrödinger bridge (De Bortoli et al., 2021; Chen et al., 2022) and in the Riemannian setting (De Bortoli et al., 2022). We construct the corresponding probability-flow ODE from the drift approximations as $\mathrm{d}\hat{\mathbf{X}}_t=\frac{1}{2}\{f_\theta(t,\hat{\mathbf{X}}_t)-b_\phi(t,\hat{\mathbf{X}}_t)\}\,\mathrm{d}t$; a sketch of the resulting integrator is given below. The computed likelihood is valid only if the forward noising process has converged, i.e. $p_T=p_{\textup{prior}}$; RDSB enforces this convergence.
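A minimal sketch of such an integrator (assuming an exponential map exp_map as in the sphere example above; f_theta and b_phi stand for the trained drift networks, and plain Euler steps are our simplification rather than the adaptive solver mentioned above):

def probability_flow(x0, f_theta, b_phi, exp_map, n_steps=10, T=1.0):
    # Euler integration of dX = 0.5 * (f_theta - b_phi) dt, with each
    # tangent-space step mapped back to the manifold.
    dt = T / n_steps
    x = x0
    for k in range(n_steps):
        t = k * dt
        v = 0.5 * (f_theta(t, x) - b_phi(t, x))   # probability-flow drift
        x = exp_map(x, dt * v)
    return x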

Other acceleration methods. Leading SGM acceleration techniques (Xiao et al., 2022; Song et al., 2021a) use an implicit approach to denoising: estimate $x_0$ from $x_t$, then sample $\mathbf{X}_s|x_t,x_0$ for $s<t$. These techniques are not applicable in the Riemannian setting, as it is typically not possible to sample $\mathbf{X}_s|x_t,x_0$ for $t>s$, or $\mathbf{X}_0|x_t$, over large jumps, i.e. $t\not\approx s$ or $t\not\approx 0$.

3 Experiments

In each of the experiments, 4-layer fully connected networks of hidden width 512 are used for both $f_\theta$ and $b_\phi$. $N=10$ diffusion steps of size $1/N$ were used. A triangular schedule for $g^2(t)$ was selected, linearly interpolating between a low of 0.001 at $t=0$, a peak of 0.05 at $t=T/2$, and a low of 0.001 at $t=T$; a sketch of this schedule follows. Other experimental details follow De Bortoli et al. (2022).
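Concretely, the schedule can be written as follows (a small sketch; the constants are those quoted above):

import jax.numpy as jnp

def g2(t, T=1.0, low=1e-3, peak=5e-2):
    # Triangular schedule: low at t = 0 and t = T, peak at t = T/2.
    return jnp.interp(t, jnp.array([0.0, T / 2.0, T]), jnp.array([low, peak, low]))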

3.1 Earth and Climate Data

We validate RDSB on empirical distributions of occurrences of real Earth and climate science events, including earthquakes (NGDC/WDS, 2022a), wildfires (EOSDIS, 2020), volcanic eruptions (NGDC/WDS, 2022b), and floods (Brakenridge, 2017). Figure 2 shows that generated samples are visually similar to the empirical datasets.

Close inspection of Figure 1 shows that RDSB with 5 IPF iterations exhibits better convergence than RSGM (equivalent to RDSB with 1 IPF iteration) for $N=10$ diffusion steps on the earthquake data. In particular, RDSB generates fewer samples lying far from the true data.

Figure 3: Interpolation between spherical harmonic datasets $(l=2,m=4)$ and $(l=2,m=6)$ after 2 IPF iterations.

3.2 Dataset Interpolation on a Manifold

RDSB enables interpolation between datasets by choosing $p_{\textup{prior}}$ to be another dataset, rather than necessarily $p_T$, the terminal distribution of the initial noising process. Figure 3 illustrates RDSB applied to samples on the sphere drawn with probability proportional to the real component of the spherical harmonic functions with parameters $(l=2,m=4)$ and $(l=2,m=6)$.

4 Future Directions

We introduce novel methodology for generative modeling and interpolation on compact Riemannian manifolds, which accelerates and generalizes RSGM. In future work, we will apply RDSB to more challenging settings, such as interpolation in robotics and protein modeling, see Jing et al. (2022). From a theoretical point of view, the restriction to compact spaces allows us to leverage tools from optimal transport (Peyré & Cuturi, 2019) to prove geometric convergence of RDSB to an approximation of the Schrödinger bridge and to quantify this bias.

References

  • Brakenridge (2017) Brakenridge, G. Global active archive of large flood events. http://floodobservatory.colorado.edu/Archives/index.html, 2017.
  • Cattiaux et al. (2021) Cattiaux, P., Conforti, G., Gentil, I., and Léonard, C. Time reversal of diffusion processes under a finite entropy condition. arXiv preprint arXiv:2104.07708, 2021.
  • Chen et al. (2022) Chen, T., Liu, G.-H., and Theodorou, E. A. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. In International Conference on Learning Representations, 2022.
  • Chen et al. (2016) Chen, Y., Georgiou, T., and Pavon, M. Entropic and displacement interpolation: a computational approach using the Hilbert metric. SIAM Journal on Applied Mathematics, 76(6):2375–2396, 2016.
  • De Bortoli et al. (2021) De Bortoli, V., Thornton, J., Heng, J., and Doucet, A. Diffusion Schrödinger bridge with applications to score-based generative modeling. In Advances in Neural Information Processing Systems, 2021.
  • De Bortoli et al. (2022) De Bortoli, V., Mathieu, E., Hutchinson, M., Thornton, J., Teh, Y. W., and Doucet, A. Riemannian score-based generative modeling. arXiv preprint arXiv:2202.02763, 2022.
  • EOSDIS (2020) EOSDIS. Land, Atmosphere Near real-time Capability for EOS (LANCE) system operated by NASA's Earth Science Data and Information System (ESDIS). https://earthdata.nasa.gov/earth-observation-data/near-real-time/firms/active-fire-data, 2020.
  • Haussmann & Pardoux (1986) Haussmann, U. G. and Pardoux, E. Time reversal of diffusions. The Annals of Probability, 14(4):1188–1205, 1986.
  • Hutchinson (1989) Hutchinson, M. F. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 18(3):1059–1076, 1989.
  • Jing et al. (2022) Jing, B., Corso, G., Barzilay, R., and Jaakkola, T. S. Torsional diffusion for molecular conformer generation. In ICLR Workshop on Machine Learning for Drug Discovery, 2022.
  • Lee (2018) Lee, J. M. Introduction to Riemannian manifolds. Springer, 2018.
  • Léonard (2012) Léonard, C. From the Schrödinger problem to the Monge–Kantorovich problem. Journal of Functional Analysis, 262(4):1879–1920, 2012.
  • NGDC/WDS (2022a) NGDC/WDS. NCEI/WDS Global Significant Earthquake Database. https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ngdc.mgg.hazards:G012153, 2022a.
  • NGDC/WDS (2022b) NGDC/WDS. NCEI/WDS Global Significant Volcanic Eruptions Database. https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ngdc.mgg.hazards:G10147, 2022b.
  • Nutz & Wiesel (2022) Nutz, M. and Wiesel, J. Stability of Schrödinger potentials and convergence of Sinkhorn’s algorithm. arXiv preprint arXiv:2201.10059, 2022.
  • Peyré & Cuturi (2019) Peyré, G. and Cuturi, M. Computational optimal transport. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
  • Sinkhorn (1967) Sinkhorn, R. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4):402–405, 1967.
  • Song et al. (2021a) Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a.
  • Song & Ermon (2019) Song, Y. and Ermon, S. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, 2019.
  • Song et al. (2021b) Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021b.
  • Vargas et al. (2021) Vargas, F., Thodoroff, P., Lamacraft, A., and Lawrence, N. Solving Schrödinger bridges via maximum likelihood. Entropy, 23(9):1134, 2021.
  • Xiao et al. (2022) Xiao, Z., Kreis, K., and Vahdat, A. Tackling the generative learning trilemma with denoising diffusion GANs. In International Conference on Learning Representations, 2022.