Riemannian Diffusion Schrödinger Bridge
Abstract
Score-based generative models exhibit state of the art performance on density estimation and generative modeling tasks. These models typically assume that the data geometry is flat, yet recent extensions have been developed to synthesize data living on Riemannian manifolds. Existing methods to accelerate sampling of diffusion models are typically not applicable in the Riemannian setting and Riemannian score-based methods have not yet been adapted to the important task of interpolation of datasets. To overcome these issues, we introduce Riemannian Diffusion Schrödinger Bridge. Our proposed method generalizes Diffusion Schrödinger Bridge introduced in (De Bortoli et al., 2021) to the non-Euclidean setting and extends Riemannian score-based models beyond the first time reversal. We validate our proposed method on synthetic data and real Earth and climate data.
1 Background
1.1 Score Based Generative Modeling
In Euclidean spaces, Score-based Generative Modeling (SGM) (Song & Ermon, 2019; Song et al., 2021b) consists of two main components. The first is a forward noising process defined via the stochastic differential equation (SDE) (1) and the initial distribution , targeting an easy-to-sample prior . The second component is a backward denoising process defined by the time-reversal of the noising SDE (1) (Cattiaux et al., 2021; Haussmann & Pardoux, 1986) from to . Here is the drift, is the time-dependent volatility, while is a d-dimensional Brownian motion and is the distribution of with corresponding density . The denoising process defines a generative model by sampling
(1) |
The intractable score term, , may be approximated by using the following score matching identity , with tractable transition density . This is then used to train a neural network with regression objective . Once trained, one can generate samples which are approximately distributed according to via (1) (e.g. considering the Euler–Maruyama discretization of this SDE) by substituting .
1.2 Riemannian Score Based Generative Modeling
De Bortoli et al. (2022) extended SGM to compact Riemannian manifolds, denoted , henceforth abbreviated RSGM for Riemannian Score-based Generative Modeling. Given a -valued diffusion process (2, left), where denotes Brownian motion on ; the time reversal process may be written as (2, right) (De Bortoli et al., 2022, Theorem 1)
(2) |
For a more thorough background on Riemannian geometry and time-reversal on manifolds see De Bortoli et al. (2022, App. B and G). Simulating a diffusion on a manifold is crucial to this approach. In contrast to Euclidean SGMs, closed-form sampling schemes are generally not available for diffusions on manifolds, hence approximation schemes such as the Geodesic Random Walk from Algo. 1 are required. Note that may be simulated using Algo 1 for . This consists of applying an Euler–Maruyama step in the tangent space, whereby Gaussian noise has also been projected to the tangent space using the projection operator , then projecting back to the manifold using the exponential mapping.
Score matching may be used to approximate the time-reversal (2) in the Riemannian setting (De Bortoli et al., 2022). Unlike for Euclidean SGM, is not available in closed-form. Instead, one may use the score identity 111Here all gradients are considered w.r.t. the Riemannian metric of . for , where again is the distribution of and corresponds to the density of with respect to the uniform distribution on , then .
1.3 The Schrödinger Bridge Problem
A Schrödinger bridge extension of SGM has been introduced to reduce the number of diffusion steps for SGM by learning both forward and backward diffusions (De Bortoli et al., 2021). We briefly recall the notion of dynamical Schrödinger bridge (Léonard, 2012; Chen et al., 2016; Vargas et al., 2021; De Bortoli et al., 2021; Chen et al., 2022). We consider a reference path probability measure where is the space measures on continuous paths in . In practice, we set to be the distribution of the Brownian motion initialized from , i.e. has distribution . We consider the dynamical Schrödinger bridge problem
(3) |
The idealised solution is called the Schrödinger Bridge (SB). Given a backward process associated to , one can obtain a generative model as follows. First sample from and then follow the (backward) dynamics of . By definition, we obtain that , the data distribution.
In practice, the solution of the SB problem is approximated using the Iterative Proportional Fitting (IPF) algorithm, which coincides with the Sinhkorn algorithm in discrete space (Sinkhorn, 1967; Peyré & Cuturi, 2019). IPF defines a sequence of path probability measures , such that and for any
(4) | |||
(5) |
Under mild assumptions on , and , we have that converges towards (Nutz & Wiesel, 2022).
2 Riemannian Diffusion Schrödinger Bridge
In Euclidean state spaces, De Bortoli et al. (2021) proposed Diffusion Schrödinger Bridge (DSB), an algorithm to approximate the solution to the SB problem based on time-reversal, using score matching to approximate the IPF iterates. We propose Riemannian Diffusion Schrödinger Bridge (RDSB), an extension of DSB to approximate solutions of the SB problem for compact Riemannian manifolds.
We first connect the IPF iterates with time reversal of diffusion processes on . To simplify notation, we rewrite Equation 2 in terms of drift functions and in Equations 6 and 7. Here corresponds to the time-reversal of , initialized at . is the Brownian motion on , , and denotes the time-reversal of the diffusion , initialized at .
(6) |
(7) |
Proposition 2.1.
Let be the path measure of the Brownian motion initialized at and IPF iterates be as defined in Section 1.3. Assume that for any , and that for any and , admits a smooth positive density w.r.t. . Then, for any : and solve the time-reversal for (6) and (7) respectively. denotes the reverse time and for any , and , , , with , and , the densities of and .
Proof The proof follows De Bortoli et al. (2021, Proposition 6) using De Bortoli et al. (2022, Theorem 1) instead of Cattiaux et al. (2021, Theorem 4.19)
In particular, we have that is the diffusion process associated with RSGM, i.e. the time-reversal of the Brownian motion initialized at . Hence, for with can be seen as a refinement of . In the next proposition, we show that the drift term of the diffusion processes associated with can be approximated leveraging score-based techniques.
Proposition 2.2.
Let be a -valued process with distribution such that for any , admits a positive density w.r.t. . For any and , let . Then, for any , we have that
Proof. For any , , . Expanding the quadratic, dropping terms not dependent on and using the divergence theorem, (see Lee, 2018, p.51), concludes the proof.
(8) | ||||
(9) |
In practice, for IPF steps, and are approximated using neural networks, , with parameters . Approximating drifts or means of the SDE is computationally cheaper than storing an evaluating separate score networks. Proposition 2.2 provides loss functions. The divergence terms may be estimated using automatic differentiation or Hutchinson’s trace estimator (Hutchinson, 1989).




Likelihood Computation. The Ordinary Differential Equation (ODE), , has the same marginal probabilities as SDE (1), in particular and hence may be used for likelihood computation using an adaptive ODE solver and trained score (Song et al., 2021b). Similar results hold for the Schrödinger Bridge (De Bortoli et al., 2021; Chen et al., 2022) and in the Riemannian setting (De Bortoli et al., 2022). We construct the appropriate probability flow ODE using drift approximations as . The computed likelihood assumes convergence of the forward noising process to be valid, in particular that . RDSB enforces this convergence.
Other acceleration methods. Leading SGM acceleration techniques (Xiao et al., 2022; Song et al., 2021a) use an implicit approach to denoising by estimating from , then sampling for . These techniques are not applicable in the Riemannian setting as it is not typically possible to sample for or for large jumps , .
3 Experiments
In each of the experiments, -layer, fully connected networks with hidden width are used for both , . diffusion steps of size was used. Triangular schedule was selected, linearly interpolated from a peak of at and low of at . Other experimental details follow De Bortoli et al. (2022).
3.1 Earth and Climate Data
We validate RDSB on empirical distributions of occurrences of real Earth and climate science events including earthquakes (NGDC/WDS, 2022a); wild fires (EOSDIS, 2020); volcanic eruptions (NGDC/WDS, 2022b), and floods (Brakenridge, 2017). Figure 2 illustrates that generated samples are visually similar to the empirical datasets.
Close inspection of Figure 1 shows RDSB with IPF iterations exhibits better convergence than RSGM (equivalent to RDSB with IPF iteration) for diffusion steps on the earthquake data. In particular, RDSB generates fewer outlying samples to true data samples.
3.2 Dataset Interpolation on a Manifold
RDSB enables interpolation between datasets by choosing to be another dataset, and not necessary , the terminal distribution of the inital noising process. Figure 3 illustrates RDSB applied to samples on the sphere taken with probability proportional to the real component of spherical harmonic function with parameters and .
4 Future Directions
We introduce novel methodology for generative modeling and interpolation on compact Riemannian manifolds, which accelerates and generalizes RSGM. In future work, we will apply RDSB to more challenging settings such as interpolation in robotics and for protein modeling, see (Jing et al., 2022). From a theoretical point of view, the restriction to compact spaces allows us to leverage tools from Optimal Transport (Peyré & Cuturi, 2019) to prove the geometric convergence of RDSB to an approximation of the Schrödinger Bridge and quantify this bias.
References
- Brakenridge (2017) Brakenridge, G. Global active archive of large flood events. http://floodobservatory.colorado.edu/Archives/index.html, 2017.
- Cattiaux et al. (2021) Cattiaux, P., Conforti, G., Gentil, I., and Léonard, C. Time reversal of diffusion processes under a finite entropy condition. arXiv preprint arXiv:2104.07708, 2021.
- Chen et al. (2022) Chen, T., Liu, G.-H., and Theodorou, E. A. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. In International Conference on Learning Representations, 2022.
- Chen et al. (2016) Chen, Y., Georgiou, T., and Pavon, M. Entropic and displacement interpolation: a computational approach using the Hilbert metric. SIAM Journal on Applied Mathematics, 76(6):2375–2396, 2016.
- De Bortoli et al. (2021) De Bortoli, V., Thornton, J., Heng, J., and Doucet, A. Diffusion Schrödinger bridge with applications to score-based generative modeling. In Advances in Neural Information Processing Systems, 2021.
- De Bortoli et al. (2022) De Bortoli, V., Mathieu, E., Hutchinson, M., Thornton, J., Teh, Y. W., and Doucet, A. Riemannian score-based generative modeling. arXiv preprint arXiv:2202.02763, 2022.
- EOSDIS (2020) EOSDIS. Land, atmosphere near real-time capability for eos (lance) system operated by NASA’s earth science data and information system (esdis). https://earthdata.nasa.gov/earth-observation-data/near-real-time/firms/active-fire-data, 2020.
- Haussmann & Pardoux (1986) Haussmann, U. G. and Pardoux, E. Time reversal of diffusions. The Annals of Probability, 14(4):1188–1205, 1986.
- Hutchinson (1989) Hutchinson, M. F. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 18(3):1059–1076, 1989.
- Jing et al. (2022) Jing, B., Corso, G., Barzilay, R., and Jaakkola, T. S. Torsional diffusion for molecular conformer generation. In ICLR Workshop on Machine Learning for Drug Discovery, 2022.
- Lee (2018) Lee, J. M. Introduction to Riemannian manifolds. Springer, 2018.
- Léonard (2012) Léonard, C. From the Schrödinger problem to the Monge–Kantorovich problem. Journal of Functional Analysis, 262(4):1879–1920, 2012.
- NGDC/WDS (2022a) NGDC/WDS. Ncei/wds global significant earthquake database. https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ngdc.mgg.hazards:G012153, 2022a.
- NGDC/WDS (2022b) NGDC/WDS. Ncei/wds global significant volcanic eruptions database. https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ngdc.mgg.hazards:G10147, 2022b.
- Nutz & Wiesel (2022) Nutz, M. and Wiesel, J. Stability of Schrödinger potentials and convergence of Sinkhorn’s algorithm. arXiv preprint arXiv:2201.10059, 2022.
- Peyré & Cuturi (2019) Peyré, G. and Cuturi, M. Computational optimal transport. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
- Sinkhorn (1967) Sinkhorn, R. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4):402–405, 1967.
- Song et al. (2021a) Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a.
- Song & Ermon (2019) Song, Y. and Ermon, S. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, 2019.
- Song et al. (2021b) Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021b.
- Vargas et al. (2021) Vargas, F., Thodoroff, P., Lamacraft, A., and Lawrence, N. Solving Schrödinger bridges via maximum likelihood. Entropy, 23(9):1134, 2021.
- Xiao et al. (2022) Xiao, Z., Kreis, K., and Vahdat, A. Tackling the generative learning trilemma with denoising diffusion GANs. In International Conference on Learning Representations, 2022.