This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Verlet Flows: Exact-Likelihood Integrators for Flow-Based Generative Models

Ezra Erives, Bowen Jing, Tommi Jaakkola
CSAIL, Massachusetts Institute of Technology
{erives,bjing}@mit.edu, [email protected]
Abstract

Approximations in computing model likelihoods with continuous normalizing flows (CNFs) hinder the use of these models for importance sampling of Boltzmann distributions, where exact likelihoods are required. In this work, we present Verlet flows, a class of CNFs on an augmented state-space inspired by symplectic integrators from Hamiltonian dynamics. When used with carefully constructed Taylor-Verlet integrators, Verlet flows provide exact-likelihood generative models which generalize coupled flow architectures from a non-continuous setting while imposing minimal expressivity constraints. On experiments over toy densities, we demonstrate that the variance of the commonly used Hutchinson trace estimator is unsuitable for importance sampling, whereas Verlet flows perform comparably to full autograd trace computations while being significantly faster.

1 Introduction

Flow-based generative models—also called normalizing flows—parameterize maps from prior to data distributions via invertible transformations. An exciting application of normalizing flows is in learning the Boltzmann distributions of physical systems (Noé et al., 2019; Midgley et al., 2023; Kim et al., 2024). At inference time, these Boltzmann generators provide model likelihoods which can be used to reweigh samples towards the target energy with importance sampling. While nearly all existing Boltzmann generators are built from composing invertible layers such as coupling layers or splines, experiments on image domains suggest that continuous normalizing flows (CNFs)—which can parameterize arbitrary vector fields mapping noise to data—are far more expressive than their discrete counterparts (Chen et al., 2018; Grathwohl et al., 2018). Unfortunately, the exact model likelihood of CNFs can only be accessed through expensive trace computations and numerical integration, preventing their adoption in Boltzmann generators.

In this work, we propose Verlet flows, a flexible class of CNFs on an augmented state-space inspired by symplectic integrators from Hamiltonian dynamics. Instead of parameterizing the flow γ\gamma with a single neural network, Verlet flows instead parameterize the coefficients of the multivariate Taylor expansions of γ\gamma in both the state-space and the augmenting space. We then introduce Taylor-Verlet integrators, which exploit the splitting approximation from which many symplectic integrators are derived to approximate the intractable time evolution of γ\gamma as the composition of the tractable time evolutions of the Taylor expansion terms. At training time, Verlet flows are a subclass of CNFs, and can be trained accordingly. At inference time, Taylor-Verlet integration enables theoretically-sound importance sampling with exact likelihoods.

2 Background

Discrete Normalizing Flows

Given a source distribution π0\pi_{0} and target distribution π1\pi_{1}, we wish to learn an invertible, bijective transformation fθf_{\theta} which maps π0\pi_{0} to π1\pi_{1}. Discrete normalizing flows parameterize fθf_{\theta} as the composition fθ=fθNfθif_{\theta}=f^{N}_{\theta}\circ\dots\circ f^{i}_{\theta}, from which logπ1(fθ(x))\log\pi_{1}(f_{\theta}(x)) can be computed using the change of variables formula and the log-determinants of the Jacobians of the individual transformations fθif^{i}_{\theta}. Thus, significant effort has been dedicated to developing expressive, invertible building blocks fθif_{\theta}^{i} whose Jacobians have tractable log-determinant. Successful approaches include coupling-based flows, in which the dimensions of the state variable xx are partitioned in two, and the each half is used in turn to update the other half (Dinh et al., 2016; 2014; Müller et al., 2019; Durkan et al., 2019), and autoregressive flows (Kingma et al., 2017; Papamakarios et al., 2018). Despite these efforts, discrete normalizing flows have been shown to suffer from a lack of expressivity in practice.

Continuous Normalizing Flows

Continuous normalizing flows (CNFs) dispense with the discrete layers of normalizing flows and instead learn a time-dependent vector field γ(x,t;θ)\gamma(x,t;\theta), parameterized by a neural network, which maps the source π0\pi_{0} to a target distribution π1\pi_{1} (Chen et al., 2018; Grathwohl et al., 2018). Model densities can be accessed by the continuous-time change of variables formula given by

logπ1(x1)=logπ0(x0)01TrJγ(xt,t;θ)𝑑t,\log\pi_{1}(x_{1})=\log\pi_{0}(x_{0})-\int_{0}^{1}\operatorname{Tr}J_{\gamma}(x_{t},t;\theta)\,dt, (1)

where xt=x0+0tγ(xt,t;θ)𝑑tx_{t}=x_{0}+\int_{0}^{t}\gamma(x_{t},t;\theta)\,dt, Tr\operatorname{Tr} denotes trace, and Jγ(xt,t;θ)=γ(x,t;θ)x|xt,tJ_{\gamma}(x_{t},t;\theta)=\frac{\partial\gamma(x,t;\theta)}{\partial x}|_{x_{t},t} denotes the Jacobian. Compared to discrete normalizing flows, CNFs are not constrained by invertibility or the need for a tractable Jacobian, and therefore enjoy significantly greater expressivity.

While the trace TrJγ(xt,t;θ)\operatorname{Tr}J_{\gamma}(x_{t},t;\theta) appearing in the integrand of Equation 1 can be evaluated exactly with automatic differentiation, this grows prohibitively expensive as the dimensionality of the data grows large, as a linear number of backward-passes are required. In practice, the Hutchinson trace estimator (Grathwohl et al., 2018) is used to provide a linear-time, unbiased estimator of the trace. While cheaper, the variance of the Hutchinson estimator makes it unsuitable for importance sampling.

Symplectic Integrators and the Splitting Approximation

Leap-frog integration is a numeric method for integrating Newton’s equations of motion which involves alternatively updating qq (position) and pp (velocity) in an invertible manner not unlike augmented, coupled normalizing flows.111Closely related to leap-frog integration is Verlet integration, from which our method derives its name. Leap-frog integration is a special case of the more general family of symplectic integrators, designed for the Hamiltonian flow γH\gamma_{H} (of which the equations of motion are a special case). Oftentimes the Hamiltonian flow decomposes as γH=γq+γp\gamma_{H}=\gamma_{q}+\gamma_{p}, enabling the splitting approximation

φ(γH,τ)φ(γq,τ)φ(γp,τ)\varphi(\gamma_{H},\tau)\approx\varphi(\gamma_{q},\tau)\circ\varphi(\gamma_{p},\tau) (2)

where φ(γ,τ)\varphi(\gamma,\tau) denotes the time evolution operator along the flow γ\gamma for a duration τ\tau, and where the terms on the right-hand side of Equation 2 are possibly tractable in a way that the left-hand side is not. For example, the leap-frog integrator corresponds to analytic, invertible, and volume-preserving φ(γ{q,p},t)\varphi(\gamma_{\{q,p\}},t), whereas the original evolution may satisfy none of these properties. While Verlet flows, to be introduced in the next section, are not in general Hamiltonian, they similarly exploit the splitting approximation. A more detailed exposition of symplectic integrators and the splitting approximation can be found in Appendix A.

3 Methods

3.1 Verlet Flows

We consider the problem of mapping a source distribution π~0(q)\tilde{\pi}_{0}(q) on dq\mathbb{R}^{d_{q}} at time t=0t=0 to a target distribution π~1(q)\tilde{\pi}_{1}(q) on (dq\mathbb{R}^{d_{q}}) at time t=1t=1 by means of a time-dependent flow γ(x,t)\gamma(x,t). We will now augment this problem on the configuration-space dq\mathbb{R}^{d_{q}} by extending the distribution π~0(q)\tilde{\pi}_{0}(q) to π0(q,p)=π0(p|q)π~0(q)\pi_{0}(q,p)=\pi_{0}(p|q)\tilde{\pi}_{0}(q) and π~1(q)\tilde{\pi}_{1}(q) to π1(q,p)=π1(p|q)π~1(q)\pi_{1}(q,p)=\pi_{1}(p|q)\tilde{\pi}_{1}(q) where both πi(p|q)\pi_{i}(p|q) are given by 𝒩(p;0,Idp)\mathcal{N}(p;0,I_{d_{p}}). In analogy with Hamiltonian dynamics, we will refer to the space M=dq+dpM=\mathbb{R}^{d_{q}+d_{p}} as phase space.222Note that we do not require that dq=dpd_{q}=d_{p}.

Observe that any analytic flow γ\gamma is given (at least locally) by a multivariate Taylor expansion of the form

γ(x,t)=ddt[qp]=[γq(q,p,t)γp(q,p,t)]=[s0q(p,t)+s1q(p,t)Tq+s0p(q,t)+s1p(q,t)Tp+]=[k=0skq(p,t)(qk)k=0skp(q,t)(pk)]\gamma(x,t)=\frac{d}{dt}\begin{bmatrix}q\\ p\end{bmatrix}=\begin{bmatrix}\gamma^{q}(q,p,t)\\ \gamma^{p}(q,p,t)\end{bmatrix}=\begin{bmatrix}s_{0}^{q}(p,t)+s_{1}^{q}(p,t)^{T}q+\cdots\\ s_{0}^{p}(q,t)+s_{1}^{p}(q,t)^{T}p+\cdots\end{bmatrix}=\begin{bmatrix}\sum_{k=0}^{\infty}s_{k}^{q}(p,t)(q^{\otimes k})\\ \sum_{k=0}^{\infty}s_{k}^{p}(q,t)(p^{\otimes k})\end{bmatrix} (3)

for appropriate choices of functions siqs_{i}^{q} and sips_{i}^{p}, which we have identified in the last equality as (i,1)(i,1)-tensors: multilinear maps which take in ii copies of qTqnq\in T_{q}\mathbb{R}^{n} and return a tangent vector. While s0{q,p}s_{0}^{\{q,p\}} and s1{q,p}s_{1}^{\{q,p\}} can be thought of as vectors and matrices respectively, higher order terms do not admit particularly intuitive interpretations. Whereas traditional CNFs commonly parameterize γθ\gamma_{\theta} directly via a neural network, Verlet flows instead parameterize the coefficients sk{q,p};θs_{k}^{\{q,p\};\theta} with neural networks, allowing for Verlet integration via the splitting approximation. By parameterizing all the terms in the Taylor expansion, Verlet flows are in theory as expressive as CNFs parameterized as γ(q,p,t;θ)\gamma(q,p,t;\theta). However, in practice,we must truncate the series after some finite number of terms, yielding the order NN Verlet flow

γN(x,t;θ)[k=0Nskq(p,t;θ)(qk)k=0Nskp(q,t;θ)(pk)].\gamma_{N}(x,t;\theta)\coloneqq\begin{bmatrix}\sum_{k=0}^{N}s_{k}^{q}(p,t;\theta)(q^{\otimes k})\\ \sum_{k=0}^{N}s_{k}^{p}(q,t;\theta)(p^{\otimes k})\end{bmatrix}. (4)

In the next section, we examine how to obtain exact likelihoods from these truncated Verlet flows.

3.2 Taylor-Verlet Integrators

Denote by γkq\gamma_{k}^{q} the flow given by

γkq(x,t;θ)=[skq(p,t;θ)(qk)0]TxM,\gamma_{k}^{q}(x,t;\theta)=\begin{bmatrix}s_{k}^{q}(p,t;\theta)(q^{\otimes k})\\ 0\end{bmatrix}\in T_{x}M,

and define γkp\gamma_{k}^{p} similarly.333When there is no risk of ambiguity, we drop the subscript and refer to γN\gamma_{N} simply by γ\gamma. For any such flow γ\gamma^{\prime} on MM, denote by φ(γ,τ)\varphi^{\ddagger}(\gamma^{\prime},\tau) the time evolution operator, transporting a point xMx\in M along the flow γ\gamma^{\prime} for time τ\tau. We denote by just φ\varphi the pseudo time evolution operator given by φ(γ,τ):xtxt+tt+τγ(xs,t)𝑑s\varphi(\gamma^{\prime},\tau):x_{t}\to x_{t}+\int_{t}^{t+\tau}\gamma^{\prime}(x_{s},t)\,ds.444Justification for use of the pseudo time evolution operator φ\varphi can be found in Appendix B. Note that tt is kept constant throughout integration, an intentional choice which we shall see allows for a tractable closed form. Although our Verlet flows are not Hamiltonian, the splitting approximation from Equation 11 can be applied to Verlet flows to decompose the desired time evolution into simpler, analytic terms, yielding

φ(γ,τ)φ(γt,τ)φ(γNp,τ)φ(γNq,τ)φ(γN1p,τ)φ(γN1q,τ)φ(γ0p,τ)φ(γ0q,τ).\varphi^{\ddagger}(\gamma,\tau)\approx\varphi(\gamma_{t},\tau)\circ\varphi(\gamma^{p}_{N},\tau)\circ\varphi(\gamma^{q}_{N},\tau)\circ\varphi(\gamma^{p}_{N-1},\tau)\circ\varphi(\gamma^{q}_{N-1},\tau)\cdots\varphi(\gamma^{p}_{0},\tau)\circ\varphi(\gamma^{q}_{0},\tau). (5)

Note here that the leftmost term of the right hand side is the time-update term φ(γt,τ)\varphi(\gamma_{t},\tau). The key idea is that Equation 5 approximates the generally intractable φ(γ,τ)\varphi^{\ddagger}(\gamma,\tau) as a composition of simpler, tractable updates allowing for a closed-form, exact-likelihood integrator for Verlet flows.

The splitting approximation from Equation 5, together with closed-form expressions for the time evolution operators and their log density updates (see Figure 1), yields an integration scheme specifically tailored for Verlet flows, and which we shall refer to as a Taylor-Verlet integrator. Explicit integrators for first order and higher order Verlet flows are presented in Appendix D. One important element of the design space of Taylor-Verlet integration is the order of the terms within the splitting approximation of Equation 5, and consequently, the order of updates performed during Verlet integration. We will refer to Taylor-Verlet integrators which follow the order of Equation 5 as standard Taylor-Verlet integrators, and others as non-standard. While the remainder of this work focuses on standard Taylor-Verlet integrators, the space of non-standard Taylor-Verlet integrators is rich and requires further exploration. Certain coupling-based normalizing flow architectures, such as RealNVP (Dinh et al., 2016) can be realized as the update steps of non-standard Taylor-Verlet integrators, as is discussed in Appendix E.

3.3 Closed Form and Density Updates for Time Evolution Operators

Table 1: A summary of closed-forms for the time evolution operators φ(γkq;τ)\varphi(\gamma_{k}^{q};\tau), and their corresponding log density updates. Analogous results hold for for φ(γkp;τ)\varphi(\gamma_{k}^{p};\tau) as well.
Flow γ\gamma Operator φ(γ,τ)\varphi(\gamma,\tau) Density Update logdet|Jφ(γ,τ)|\log\det|J\varphi(\gamma,\tau)|
γ0q\gamma_{0}^{q} [qp][q+τs0q(p,t)p]\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}q+\tau s_{0}^{q}(p,t)\\ p\end{bmatrix} 0
γ1q\gamma_{1}^{q} [qp][exp(τs1q(p,t))qp]\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}\exp(\tau s_{1}^{q}(p,t))q\\ p\end{bmatrix} Tr(τs1q(p,t))\operatorname{Tr}(\tau s_{1}^{q}(p,t))
γ¯kq,k>1\overline{\gamma}_{k}^{q},k>1 [qp][(q(1k)+τ(s¯kq)i(1k))(11k)p]\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}(q^{\circ(1-k)}+\tau(\overline{s}_{k}^{q})_{i}(1-k))^{\circ\left(\frac{1}{1-k}\right)}\\ p\end{bmatrix} ik1klog|qi1k+τ(1k)(s¯kq)i|klog|qi|\sum_{i}\frac{k}{1-k}\log\left|q_{i}^{1-k}+\tau(1-k)(\overline{s}_{k}^{q})_{i}\right|-k\log|q_{i}|

For each pseudo time evolution operator φ(γ{q,p}k,τ)\varphi(\gamma_{\{q,p\}}^{k},\tau), we compute its closed-form and the log-determinant of its Jacobian. Together, these allow us to implement the integrator given by Equation 5. Results are summarized in the Table 1 for γkq\gamma_{k}^{q} only, but analogous results hold for for γkp\gamma_{k}^{p} as well. Note that for terms of order k2k\geq 2, and for the sake of tractability, we restrict our attention to sparse tensors, denoted sk¯{q,p}\overline{s_{k}}^{\{q,p\}}, for which only “on-diagonal” terms are non-zero so that sk¯{q,p}(qk)\overline{s_{k}}^{\{q,p\}}(q^{\otimes k}) collapses to a simple dot product. We similarly use γ¯k{q,p}\overline{\gamma}_{k}^{\{q,p\}} to denote the corresponding flows for sparse, higher order terms. Full details and derivations can be found in Appendix C.

4 Experiments

Across all experiments in this section, and unless stated otherwise, we train an order-one Verlet flow γθ\gamma_{\theta}, with coefficients s0,1{q,p};θs_{0,1}^{\{q,p\};\theta} parameterized as a three-layer architecture with 6464 hidden units each, as a continuous normalizing flow using likelihood-based loss. Non-Verlet integration is performed numerically using a fourth-order Runge-Kutta solver for 100100 steps.

Estimation of logZ\log Z

Refer to caption
Refer to caption
Figure 1: The left graph shows estimates of the natural logarithm logZ\log Z (mean ±\pm S.D.) as a function of the number of samples. The right graph shown the time needed to make the computations in the left graph. Both graphs use 100100 integration steps.

Given an unnormalized density π^\widehat{\pi}, a common application of importance sampling is to estimate the partition function Z=π^(x)𝑑xZ=\int\widehat{\pi}(x)\,dx. Given a distribution πθ\pi_{\theta} (hopefully close to the unknown, normalized density π=π^Z\pi=\frac{\widehat{\pi}}{Z}), we obtain an unbiased estimate of ZZ via

𝔼xπθ[π^(x)πθ(x)]=d[π^(x)πθ(x)]πθ(x)𝑑x=dπ^(x)𝑑x=Z.\mathbb{E}_{x\sim\pi_{\theta}}\left[\frac{\widehat{\pi}(x)}{\pi_{\theta}(x)}\right]=\int_{\mathbb{R}^{d}}\left[\frac{\widehat{\pi}(x)}{\pi_{\theta}(x)}\right]\pi_{\theta}(x)\,dx=\int_{\mathbb{R}^{d}}\widehat{\pi}(x)\,dx=Z. (6)

We train an order-one Verlet flow γθ\gamma_{\theta} targeting a trimodal Gaussian mixture in two-dimensional qq-space, and an isotropic Gaussian 𝒩(p1;0,I2)\mathcal{N}(p_{1};0,I_{2}) in a two-dimensional pp-space. We then perform and time importance sampling using Equation 6 to estimate the natural logarithm logZ\log Z in two ways: first numerically integrating γθ\gamma_{\theta} with a fourth-order Runge-Kutta solver and using automatic differentiation to exactly compute the trace, and secondly using Taylor-Verlet integration. We find that integrating γθ\gamma_{\theta} using a Taylor-Verlet integrator performs comparably to integrating numerically while being significantly faster. Results are summarized in Figure 1.

The poor performance of the Hutchinson trace estimator can be seen in Figure 2, where we plot a histogram of the logarithm log[π^(x)πθ(x)]\log\left[\frac{\widehat{\pi}(x)}{\pi_{\theta}(x)}\right] of the importance weights for xπθ(x)x\sim\pi_{\theta}(x). The presence of just a few positive outliers (to be expected given the variance of the trace estimator) skews the resulting estimate of ZZ to be on the order of 102010^{20} or larger.

Refer to caption
Figure 2: This histogram shows log importance weights for a trimodal GMM obtained by numerically integrating the Verlet flow γθ\gamma_{\theta} using the Hutchinson trace estimator for 100100 integration steps. Positive outliers render the Hutchinson trace estimator unusable for importance sampling.

5 Conclusion

In this work, we have presented Verlet flows, a class of CNFs in an augmented state space whose flow γθ\gamma_{\theta} is parameterized via the coefficients of a multivariate Taylor expansion. The splitting approximation used by many symplectic integrators is adapted to construct exact-likelihood Taylor-Verlet integrators, which enable comparable but faster performance to numeric integration using expensive, autograd-based trace computation on tasks such as importance sampling.

6 Acknowledgements

We thank Gabriele Corso, Xiang Fu, Peter Holderrieth, Hannes Stärk, and Andrew Campbell for helpful feedback and discussion over the course of the project. We also thank the anonymous reviewers for their helpful feedback and suggestions.

References

  • Chen et al. (2018) Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.
  • Dinh et al. (2014) Laurent Dinh, David Krueger, and Yoshua Bengio. Nice: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014.
  • Dinh et al. (2016) Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real nvp. arXiv preprint arXiv:1605.08803, 2016.
  • Durkan et al. (2019) Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural spline flows. Advances in neural information processing systems, 32, 2019.
  • Grathwohl et al. (2018) Will Grathwohl, Ricky TQ Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. Ffjord: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367, 2018.
  • Kim et al. (2024) Joseph C Kim, David Bloore, Karan Kapoor, Jun Feng, Ming-Hong Hao, and Mengdi Wang. Scalable normalizing flows enable boltzmann generators for macromolecules. arXiv preprint arXiv:2401.04246, 2024.
  • Kingma et al. (2017) Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improving variational inference with inverse autoregressive flow, 2017.
  • Midgley et al. (2023) Laurence I Midgley, Vincent Stimper, Javier Antorán, Emile Mathieu, Bernhard Schölkopf, and José Miguel Hernández-Lobato. Se (3) equivariant augmented coupling flows. arXiv preprint arXiv:2308.10364, 2023.
  • Müller et al. (2019) Thomas Müller, Brian McWilliams, Fabrice Rousselle, Markus Gross, and Jan Novák. Neural importance sampling, 2019.
  • Noé et al. (2019) Frank Noé, Simon Olsson, Jonas Köhler, and Hao Wu. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science, 365(6457):eaaw1147, 2019.
  • Papamakarios et al. (2018) George Papamakarios, Theo Pavlakou, and Iain Murray. Masked autoregressive flow for density estimation, 2018.
  • Yoshida (1993) Haruo Yoshida. Recent progress in the theory and application of symplectic integrators. In Qualitative and Quantitative Behaviour of Planetary Systems: Proceedings of the Third Alexander von Humboldt Colloquium on Celestial Mechanics, pp.  27–43. Springer, 1993.

Appendix A Hamiltonian Mechanics and Symplectic Integrators on Euclidean Space

Given a mechanical system with configuration space d\mathbb{R}^{d}, we may define the phase space of the system to be the cotangent bundle M=Td2dM=T^{\ast}\mathbb{R}^{d}\simeq\mathbb{R}^{2d}. Intuitively, phase space captures the intuitive notion that understanding the state of MM at a point in time requires knowledge of both the position qdq\in\mathbb{R}^{d} and the velocity, or momentum (assuming unit mass), pTdp\in T^{\ast}\mathbb{R}^{d}.

A.1 Hamiltonian Mechanics

Hamiltonian mechanics is a formulation of classical mechanics in which the equations of motion are given by differential equations describing the flow along level curves of an energy function, or Hamiltonian, (q,p)\mathcal{H}(q,p). Denote by 𝒳(M)\mathcal{X}(M) the space of smooth vector fields on MM. Then at the point (q,p)M(q,p)\in M, the Hamiltonian flow γ𝒳(M)\gamma_{\mathcal{H}}\in\mathcal{X}(M) is defined to be the unique vector field which satisfies

γTΩγ=γ\gamma_{\mathcal{H}}^{T}\Omega\gamma^{\prime}=\nabla\mathcal{H}\cdot\gamma^{\prime} (7)

for all γ𝒳(M)\gamma^{\prime}\in\mathcal{X}(M), and where

Ω=[0IdId0]\Omega=\begin{bmatrix}0&I_{d}\\ -I_{d}&0\end{bmatrix}

is the symplectic form555In our Euclidean context, a symplectic form is more generally any non-degenerate skew-symmetric bilinear form Ω\Omega^{\prime} on phase space. However, it can be shown that there always exists a change of basis which satisfies ΛΩΛ1=Ω\Lambda\Omega^{\prime}\Lambda^{-1}=\Omega, where Λ\Lambda denotes the change of basis matrix. Thus, we will only consider Ω\Omega.. Equation 7 implies γTΩ=\gamma_{\mathcal{H}}^{T}\Omega=\nabla\mathcal{H}, which yields

γ=[pq]T.\gamma_{\mathcal{H}}=\begin{bmatrix}\frac{\partial\mathcal{H}}{\partial p}&-\frac{\partial\mathcal{H}}{\partial q}\end{bmatrix}^{T}. (8)

In other words, our state (q,p)(q,p) evolves according to dqdt=p\frac{dq}{dt}=\frac{\partial\mathcal{H}}{\partial p} and dpdt=q\frac{dp}{dt}=-\frac{\partial\mathcal{H}}{\partial q}.

A.2 Properties of the Hamiltonian Flow γ\gamma_{\mathcal{H}}

The time evolution φ(γ,τ)\varphi^{\ddagger}(\gamma_{\mathcal{H}},\tau) of γ\gamma_{\mathcal{H}} satisfies two important properties: it conserves the Hamiltonian \mathcal{H}, and it conserves the symplectic form Ω\Omega.

Proposition A.1.

The flow γ\gamma_{\mathcal{H}} conserves the Hamiltonian \mathcal{H}.

Proof.

This amounts to showing that ddτφ(γ,τ)|τ=0=0\frac{d}{d\tau}\varphi^{\ddagger}(\gamma_{\mathcal{H}},\tau)|_{\tau=0}=0, which follows immediately from γ=0\nabla\mathcal{H}\cdot\gamma_{\mathcal{H}}=0. ∎

Proposition A.2.

The flow γ\gamma_{\mathcal{H}} preserves the symplectic form Ω\Omega.

Proof.

Realizing Ω\Omega as the (equivalent) two-form idqidpi\sum_{i}dq_{i}\wedge dp_{i}, the desired result amounts to showing that the Lie derivative γΩ=0\mathcal{L}_{\gamma_{\mathcal{H}}}\Omega=0. With Cartan’s formula, we find that

γΩ=d(ιγΩ)+ιγdΩ=d(ιγΩ)\displaystyle\mathcal{L}_{\gamma_{\mathcal{H}}}\Omega=d(\iota_{\gamma_{\mathcal{H}}}\Omega)+\iota_{\gamma_{\mathcal{H}}}d\Omega=d(\iota_{\gamma_{\mathcal{H}}}\Omega)

where dd denotes the exterior derivative, and ι\iota denotes the interior product. Here, we have used that dΩ=id(dqidpi)=0d\Omega=\sum_{i}d(dq_{i}\wedge dp_{i})=0. Then we compute that

d(ιγΩ)\displaystyle d(\iota_{\gamma_{\mathcal{H}}}\Omega) =d(ιγidqidpi)\displaystyle=d(\iota_{\gamma_{\mathcal{H}}}\sum_{i}dq_{i}\wedge dp_{i})
=d(ipidpi+qidqi)\displaystyle=d\left(\sum_{i}\frac{\partial\mathcal{H}}{\partial p_{i}}dp_{i}+\frac{\partial\mathcal{H}}{\partial q_{i}}dq_{i}\right)
=d(d).\displaystyle=d(d\mathcal{H}).

Since d2=0d^{2}=0, γ=d(d)=0\mathcal{L}_{\gamma_{\mathcal{H}}}=d(d\mathcal{H})=0, as desired. ∎

Flows which preserve the symplectic form Ω\Omega are known as symplectomorphisms. Proposition A.2 implies that the time evolution of γH\gamma_{H} is a symplectomorphism.

A.3 Symplectic Integrators and the Splitting Approximation

We have seen that the time-evolution of γ\gamma_{\mathcal{H}} is a symplectomorphism, and therefore preserves the symplectic structure on the phase space MM. In constructing numeric integrators for γ\gamma_{\mathcal{H}}, it is therefore desirable that our integrators are, if possible, themselves symplectomorphisms. In many cases, the Hamiltonian \mathcal{H} decomposes as the sum (q,p)=T(q)+V(p)\mathcal{H}(q,p)=T(q)+V(p). Then, at the point z=(q,p)Mz=(q,p)\in M, we find that

γT=[TpTq]=[0Tq]Tz(2)\gamma_{T}=\begin{bmatrix}\frac{\partial T}{\partial p}\\ -\frac{\partial T}{\partial q}\end{bmatrix}=\begin{bmatrix}0\\ -\frac{\partial T}{\partial{q}}\end{bmatrix}\in T_{z}(\mathbb{R}^{2})

and

γV=[VpVq]=[Vp0]Tz(2).\gamma_{V}=\begin{bmatrix}\frac{\partial V}{\partial p}\\ -\frac{\partial V}{\partial q}\end{bmatrix}=\begin{bmatrix}\frac{\partial V}{\partial{p}}\\ 0\end{bmatrix}\in T_{z}(\mathbb{R}^{2}).

Thus, the flow decomposes as well to

γ=[pq]=[VpTq]=[0Tq]+[p0]=γT+γV.\displaystyle\gamma_{\mathcal{H}}=\begin{bmatrix}\frac{\partial\mathcal{H}}{\partial p}\\ -\frac{\partial\mathcal{H}}{\partial q}\end{bmatrix}=\begin{bmatrix}\frac{\partial V}{\partial p}\\ -\frac{\partial T}{\partial q}\end{bmatrix}=\begin{bmatrix}0\\ -\frac{\partial T}{\partial q}\end{bmatrix}+\begin{bmatrix}\frac{\partial\mathcal{H}}{\partial p}\\ 0\end{bmatrix}=\gamma_{T}+\gamma_{V}.

Observe now that the respective time evolution operators are tractable and are given by

φ(γT,τ):[qp][q+τTpp]\varphi^{\ddagger}(\gamma_{T},\tau):\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}q+\tau\frac{\partial T}{\partial p}\\ p\end{bmatrix}

and

φ(γV,τ):[qp][qpτTq].\varphi^{\ddagger}(\gamma_{V},\tau):\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}q\\ p-\tau\frac{\partial T}{\partial q}\end{bmatrix}.

Since γT\gamma_{T} and γV\gamma_{V} are Hamiltonian flows their time evolutions φ(γT,τ)\varphi^{\ddagger}(\gamma_{T},\tau) and φ(γT,τ)\varphi^{\ddagger}(\gamma_{T},\tau) are both symplectomorphisms. As symplectomorphisms are closed under composition, it follows that that φ(γT,τ)φ(γV,τ)\varphi^{\ddagger}(\gamma_{T},\tau)\circ\varphi^{\ddagger}(\gamma_{V},\tau) is itself a symplectomorphism. We have thus arrived at the splitting approximation

φ(γ,τ)φ(γT,τ)φ(γV,τ).\varphi^{\ddagger}(\gamma_{\mathcal{H}},\tau)\approx\varphi^{\ddagger}(\gamma_{T},\tau)\circ\varphi^{\ddagger}(\gamma_{V},\tau). (9)

Equation 9 allows us to approximate the generally intractable, symplectic time evolution φ(γ,τ)\varphi^{\ddagger}(\gamma_{\mathcal{H}},\tau) as the symplectic composition of two simpler, tractable time evolution operators. The integration scheme given by Equation 9 is generally known as the symplectic Euler method.

So-called splitting methods make use of more general versions of the splitting approximation to derive higher order, symplectic integrators. Using the same decomposition (q,p)=T(q)+V(p)\mathcal{H}(q,p)=T(q)+V(p), and instead of considering the two-term approximation given by Equation 9, we may choose coefficients {ci}i=0N\{c_{i}\}_{i=0}^{N} and {di}i=0N\{d_{i}\}_{i=0}^{N} with ci=di=1\sum c_{i}=\sum d_{i}=1 and consider the more general splitting approximation

φ(γ,τ)φ(cNγT)φ(dNγV)φ(c0γT)φ(d0γV).\varphi^{\ddagger}(\gamma_{\mathcal{H}},\tau)\approx\varphi^{\ddagger}(c_{N}\gamma_{T})\circ\varphi^{\ddagger}(d_{N}\gamma_{V})\circ\dots\circ\varphi^{\ddagger}(c_{0}\gamma_{T})\circ\varphi^{\ddagger}(d_{0}\gamma_{V}). (10)

A more detailed exposition of higher order symplectic integrators can be found in (Yoshida, 1993).

Appendix B Justification for Treating φ(γ,τ)\varphi(\gamma,\tau)’s as Time Evolution Operators

In the following discussion, we will use xt=(qt,pt)x_{t}=(q_{t},p_{t}) for brevity. The splitting approximation from Equation 5, which we recall below as

φ(γ,τ)φ(γt,τ)φ(γNp,τ)φ(γNq,τ)φ(γ0p,τ)φ(γ0q,τ).\varphi^{\ddagger}(\gamma,\tau)\approx\varphi(\gamma_{t},\tau)\circ\varphi(\gamma^{p}_{N},\tau)\circ\varphi(\gamma^{q}_{N},\tau)\cdots\varphi(\gamma^{p}_{0},\tau)\circ\varphi(\gamma^{q}_{0},\tau). (11)

requires some clarification. Recall that while the true time evolution operator φ(γ,τ)\varphi^{\ddagger}(\gamma,\tau) is given by

φ(γ,τ):[xtt][xt+tt+τγ(xu,u)𝑑ut+τ],\varphi^{\ddagger}(\gamma,\tau):\begin{bmatrix}x_{t}\\ t\end{bmatrix}\to\begin{bmatrix}x_{t}+\int_{t}^{t+\tau}\gamma(x_{u},u)\,du\\ t+\tau\end{bmatrix}, (12)

the pseudo time operator φ(γ,τ)\varphi(\gamma,\tau) is given by

φ(γ,τ):[xtt][xt+tt+τγ(xu,t)𝑑ut],\varphi(\gamma,\tau):\begin{bmatrix}x_{t}\\ t\end{bmatrix}\to\begin{bmatrix}x_{t}+\int_{t}^{t+\tau}\gamma(x_{u},t)\,du\\ t\end{bmatrix}, (13)

where tt is kept-constant throughout the integration.

To make sense of the connection between φ\varphi^{\ddagger} and φ\varphi, we will augment our phase-time space 𝒮=dp+dq×0\mathcal{S}=\mathbb{R}^{d_{p}+d_{q}}\times\mathbb{R}_{\geq 0} (within which our points (xt,t)(x_{t},t) live), with a new ss-dimension, to obtain the space 𝒮=𝒮×0\mathcal{S}^{\prime}=\mathcal{S}\times\mathbb{R}_{\geq 0}. Treating xtx_{t} and tt as the state variables xsx_{s} and tst_{s} which evolve with ss, the flow γkq\gamma^{q}_{k} (as a representative example) on dp+dq\mathbb{R}^{d_{p}+d_{q}} can be extended to a flow γ^kq\widehat{\gamma}^{q}_{k} on 𝒮\mathcal{S} given by

γ^kq(xs,ts)=[xsstss]=[γkq(xs,ts)0]\widehat{\gamma}^{q}_{k}(x_{s},t_{s})=\begin{bmatrix}\frac{\partial x_{s}}{\partial s}\\ \frac{\partial t_{s}}{\partial s}\end{bmatrix}=\begin{bmatrix}\gamma^{q}_{k}(x_{s},t_{s})\\ 0\end{bmatrix} (14)

where the zero tst_{s}-component encodes the fact that the pseudo-time evolution φ(γkq,τ)\varphi(\gamma^{q}_{k},\tau) from Equation 13 does not change tt. The big idea is then that this pseudo time evolution φ(γkq,τ)\varphi(\gamma^{q}_{k},\tau) can be viewed as the projection of the (non-pseudo) ss-evolution φ(γ^kq,τ)\varphi^{\ddagger}(\widehat{\gamma}^{q}_{k},\tau), given by

φ(γ^kq,τ):[xstss][xs+ss+τγkq(xu,tu)𝑑uts+τs+τ],\varphi^{\ddagger}(\widehat{\gamma}^{q}_{k},\tau):\begin{bmatrix}x_{s}\\ t_{s}\\ s\end{bmatrix}\to\begin{bmatrix}x_{s}+\int_{s}^{s+\tau}\gamma^{q}_{k}(x_{u},t_{u})\,du\\ t_{s+\tau}\\ s+\tau\end{bmatrix}, (15)

onto 𝒮\mathcal{S}. The equivalency follows from the fact that for γ^kq\widehat{\gamma}^{q}_{k}, ts+τ=tst_{s+\tau^{\prime}}=t_{s} for τ[0,τ]\tau^{\prime}\in[0,\tau]. A similar statement can be made about the tt-update γt\gamma_{t} from Equation 11.

Denoting by Proj:𝒮𝒮\operatorname{Proj}:\mathcal{S^{\prime}}\to\mathcal{S} the projection onto 𝒮\mathcal{S}, we see that the splitting approximating using pseudo-time operators from Equation 11 can be rewritten as the projection onto SS of an analogous splitting approximation using non-pseudo ss-evolution operators, viz.,

Projφ(γ^,τ)Proj[φ(γ^t,τ)φ(γ^Np,τ)φ(γ^Nq,τ)φ(γ^0p,τ)φ(γ^0q,τ)].\operatorname{Proj}\varphi^{\ddagger}(\widehat{\gamma},\tau)\approx\operatorname{Proj}\left[\varphi^{\ddagger}(\widehat{\gamma}_{t},\tau)\circ\varphi^{\ddagger}(\widehat{\gamma}^{p}_{N},\tau)\circ\varphi^{\ddagger}(\widehat{\gamma}^{q}_{N},\tau)\cdots\varphi^{\ddagger}(\widehat{\gamma}^{p}_{0},\tau)\circ\varphi^{\ddagger}(\widehat{\gamma}^{q}_{0},\tau)\right]. (16)

Appendix C Derivation of Time Evolution Operators and Their Jacobians

Order Zero Terms.

For order k=0k=0, recall that

γ0q(x)=[s0q(p,t)(q0)0]=[s0q(p,t)0],\gamma^{q}_{0}(x)=\begin{bmatrix}s_{0}^{q}(p,t)(q^{\otimes 0})\\ 0\end{bmatrix}=\begin{bmatrix}s_{0}^{q}(p,t)\\ 0\end{bmatrix},

so that the operator φ(γq0,τ)\varphi(\gamma_{q}^{0},\tau) is given by

φ(γ0q,τ):[qpt][q+τs0q(p,t)pt]\varphi(\gamma^{q}_{0},\tau):\begin{bmatrix}q\\ p\\ t\end{bmatrix}\to\begin{bmatrix}q+\tau s_{0}^{q}(p,t)\\ p\\ t\end{bmatrix} (17)

with Jacobian J0qJ_{0}^{q} given by

J0q=[Idqτ(s0qp)Tτ(s0qt)T0Idp0001].J_{0}^{q}=\begin{bmatrix}I_{d_{q}}&\tau(\frac{\partial s_{0}^{q}}{\partial p})^{T}&\tau(\frac{\partial s_{0}^{q}}{\partial t})^{T}\\ 0&I_{d_{p}}&0\\ 0&0&1\end{bmatrix}. (18)

The analysis for s0ps_{0}^{p} is nearly identical, and we omit it.

Order One Terms.

For k=1k=1, we recall that

γ1q(x)=[s1q(p,t)(q1)00]=[s1q(p,t)Tq00].\gamma^{q}_{1}(x)=\begin{bmatrix}s_{1}^{q}(p,t)(q^{\otimes 1})\\ 0\\ 0\end{bmatrix}=\begin{bmatrix}s_{1}^{q}(p,t)^{T}q\\ 0\\ 0\end{bmatrix}. (19)

Then the time evolution operator φ(γ1q,τ)\varphi(\gamma^{q}_{1},\tau) is given by

φ(γ1q,τ):[qpt][exp(τs1q(p,t))qpt]\varphi(\gamma^{q}_{1},\tau):\begin{bmatrix}q\\ p\\ t\end{bmatrix}\to\begin{bmatrix}\exp(\tau s_{1}^{q}(p,t))q\\ p\\ t\end{bmatrix} (20)

and the Jacobian J1qJ_{1}^{q} is simply given by

J1q=[exp(τs1q(p,t))0Idp0001]J^{q}_{1}=\begin{bmatrix}\exp(\tau s_{1}^{q}(p,t))&\cdots&\cdots\\ 0&I_{d_{p}}&0\\ 0&0&1\end{bmatrix} (21)

Then logdet(Jq1)=logdet(exp(τa1(p,t)))=logexp(Tr(τa1(p,t)))=Tr(τa1(p,t))\log\det(J_{q}^{1})=\log\det(\exp(\tau a_{1}(p,t)))=\log\exp(\operatorname{Tr}(\tau a_{1}(p,t)))=\operatorname{Tr}(\tau a_{1}(p,t)).

Sparse Higher Order Terms.

For k>1k>1, we consider only sparse tensors given by the simple dot product

s¯kq(qk)=i(s¯kq)iqik=(s¯kq(qk))Tqk\overline{s}_{k}^{q}(q^{\otimes k})=\sum_{i}\left(\overline{s}_{k}^{q}\right)_{i}q_{i}^{k}=\left(\overline{s}_{k}^{q}(q^{\otimes k})\right)^{T}q^{\circ k}

where qkq^{\circ k} denotes the element-wise kk-th power of qq. Then the qq-component of time evolution operator γ¯kq\overline{\gamma}_{k}^{q} is given component-wise by an ODE of the form dqdt=skq(p,t)qk\frac{dq}{dt}=s_{k}^{q}(p,t)q^{k}, whose solution is obtained in closed form via rearranging to the equivalent form

qtqt+τ1s¯kq(p,t)qk𝑑q=tt+τ𝑑t=τ.\int_{q_{t}}^{q_{t+\tau}}\frac{1}{\overline{s}_{k}^{q}(p,t)}q^{-k}\,dq=\int_{t}^{t+\tau}\,dt=\tau.

Then it follows that qt+τq_{t+\tau} is given component-wise by (qt,i1k+τs¯kq(p,t)i(1k))11k(q_{t,i}^{1-k}+\tau\overline{s}_{k}^{q}(p,t)_{i}(1-k))^{\frac{1}{1-k}}. Thus, the operator φ(γ¯kq,τ)\varphi(\overline{\gamma}_{k}^{q},\tau) is given by

φ(γ¯kq,τ):[qpt][(q(1k)+τs¯kq(p,t)(1k))(11k)pt].\varphi(\overline{\gamma}_{k}^{q},\tau):\begin{bmatrix}q\\ p\\ t\end{bmatrix}\to\begin{bmatrix}\left(q^{\circ(1-k)}+\tau\overline{s}_{k}^{q}(p,t)(1-k)\right)^{\circ\left(\frac{1}{1-k}\right)}\\ p\\ t\end{bmatrix}. (22)

The Jacobian is then given by

Jkq=[diag(qk(q(1k)+τs¯kq(p,t)(1k))(11k1))0Idp0001]J^{q}_{k}=\begin{bmatrix}\operatorname{diag}\left(q^{-k}\left(q^{\circ(1-k)}+\tau\overline{s}_{k}^{q}(p,t)(1-k)\right)^{\circ\left(\frac{1}{1-k}-1\right)}\right)&\cdots&\cdots\\ 0&I_{d_{p}}&0\\ 0&0&1\end{bmatrix} (23)

with logdet|Jkq|\log\det|J_{k}^{q}| given by

logdetdiag|qk(q(1k)+τs¯kq(p,t)(1k))(k1k)|=ik1klog|qi1kτskq(p,t)i(1k)|klog|qi|.\log\det\operatorname{diag}\left|q^{\circ-k}\left(q^{\circ(1-k)}+\tau\overline{s}_{k}^{q}(p,t)(1-k)\right)^{\circ\left(\frac{k}{1-k}\right)}\right|=\sum_{i}\frac{k}{1-k}\log|q_{i}^{1-k}-\tau s_{k}^{q}(p,t)_{i}(1-k)|-k\log|q_{i}|.

Appendix D Explicit Descriptions of Taylor-Verlet Integrators

Taylor-Verlet integrators are constructed using the splitting approximation given in Equation 5 of an order NN Verlet flow γθ\gamma_{\theta}, which we recall below as

φ(γ,τ)φ(γt,τ)φ(γNp,τ)φ(γNq,τ)φ(γ0p,τ)φ(γ0q,τ).\varphi^{\ddagger}(\gamma,\tau)\approx\varphi(\gamma_{t},\tau)\circ{\color[rgb]{.5,0,.5}\definecolor[named]{pgfstrokecolor}{rgb}{.5,0,.5}\varphi(\gamma^{p}_{N},\tau)}\circ{\color[rgb]{0,.5,.5}\definecolor[named]{pgfstrokecolor}{rgb}{0,.5,.5}\varphi(\gamma^{q}_{N},\tau)}\cdots{\color[rgb]{.5,0,.5}\definecolor[named]{pgfstrokecolor}{rgb}{.5,0,.5}\varphi(\gamma^{p}_{0},\tau)}\circ{\color[rgb]{0,.5,.5}\definecolor[named]{pgfstrokecolor}{rgb}{0,.5,.5}\varphi(\gamma^{q}_{0},\tau)}. (24)

The standard Taylor-Verlet integrator of an order NN Verlet flow γθ\gamma_{\theta} is given explicitly in Algorithm 1 below.

Algorithm 1 Integration of order NN Verlet flow
1:procedure OrderNVerletIntegrate(q,p,t0,t1,steps,γθq,p,t_{0},t_{1},\text{steps},\gamma_{\theta}, NN)
2:     τt1t0steps\tau\leftarrow\frac{t_{1}-t_{0}}{\text{steps}}, tt0t\leftarrow t_{0}
3:     Δlogp=0\Delta\log p=0 \triangleright Change in log density.
4:     s0q,s0p,sNq,sNpγθs_{0}^{q},s_{0}^{p},\dots s_{N}^{q},s_{N}^{p}\leftarrow\gamma_{\theta}
5:     while t<t1t<t_{1} do
6:         k0k\leftarrow 0
7:         while kNk\leq N do
8:              qφ(γkq;θ,τ)q\leftarrow\varphi(\gamma_{k}^{q;\theta},\tau) \triangleright qq-update.
9:              ΔlogpΔlogplogdetJφ(γkq;θ,τ)\Delta\log p\leftarrow\Delta\log p-\log\det J\varphi(\gamma_{k}^{q;\theta},\tau)
10:              pφ(γkp;θ,τ)p\leftarrow\varphi(\gamma_{k}^{p;\theta},\tau)\triangleright pp-update.
11:              ΔlogpΔlogplogdetJφ(γkp;θ,τ)\Delta\log p\leftarrow\Delta\log p-\log\det J\varphi(\gamma_{k}^{p;\theta},\tau)
12:              kk+1k\leftarrow k+1          
13:         tt+τt\leftarrow t+\tau      
14:     returnq,p,Δlogp\textbf{return}\,\,q,p,\Delta\log p

Closed-form expressions for the time evolution operators γkq;θ,τ)\gamma_{k}^{q;\theta},\tau) and log density updates logdetJφ(γkq;θ,τ)\log\det J\varphi(\gamma_{k}^{q;\theta},\tau) can be found in Table 1. Algorithm 2 details explicitly standard Taylor-Verlet integration of an order one Verlet flow.

Algorithm 2 Integration of order one Verlet flow
1:procedure OrderOneVerletIntegrate(q,p,t0,t1,steps,γθq,p,t_{0},t_{1},\text{steps},\gamma_{\theta})
2:     τt1t0steps\tau\leftarrow\frac{t_{1}-t_{0}}{\text{steps}}, tt0t\leftarrow t_{0}
3:     Δlogp=0\Delta\log p=0 \triangleright Change in log density.
4:     s0q,s0p,s1q,s1pγθs_{0}^{q},s_{0}^{p},s_{1}^{q},s_{1}^{p}\leftarrow\gamma_{\theta}
5:     while t<t1t<t_{1} do
6:         qq+τs0q(p,t;θ),q\leftarrow q+\tau s_{0}^{q}(p,t;\theta), \triangleright Apply equation 17
7:         pp+τs0p(q,t;θ)p\leftarrow p+\tau s_{0}^{p}(q,t;\theta) \triangleright Apply equation 17
8:         qexp(τs1q(p,t;θ))qq\leftarrow\exp(\tau s_{1}^{q}(p,t;\theta))q\triangleright Apply equation 20
9:         ΔlogpΔlogpTr(τs1q(p,t;θ))\Delta\log p\leftarrow\Delta\log p-\operatorname{Tr}(\tau s_{1}^{q}(p,t;\theta)) \triangleright Apply equation 23
10:         pexp(τs1p(q,t;θ))pp\leftarrow\exp(\tau s_{1}^{p}(q,t;\theta))p\triangleright Apply equation 20
11:         ΔlogpΔlogpTr(τs1p(q,t;θ))\Delta\log p\leftarrow\Delta\log p-\operatorname{Tr}(\tau s_{1}^{p}(q,t;\theta)) \triangleright Apply equation 23
12:         tt+τt\leftarrow t+\tau      
13:     returnq,p,Δlogp\textbf{return}\,\,q,p,\Delta\log p

Appendix E Realizing Coupling Architectures as Verlet Integrators

In this section, we will show that two coupling-based normalizing flow architectures - NICE (Dinh et al. (2014)) and RealNVP (Dinh et al. (2016)) - can be realized as the Taylor-Verlet integrators for zero and first order Verlet flows respectively. Specifically, for each such coupling layer architecture fθf_{\theta}, we may construct a Verlet flow γθ\gamma_{\theta} whose Taylor-Verlet integrator is given by successive applications of fθf_{\theta}.

Additive Coupling Layers

The additive coupling layers of NICE involve updates of the form

fθq(q,p)\displaystyle f_{\theta}^{q}(q,p) =concat(q+tθq(p),p),\displaystyle=\operatorname{concat}(q+t^{q}_{\theta}(p),p),
fθp(q,p)\displaystyle f_{\theta}^{p}(q,p) =concat(q,p+tθp(q)).\displaystyle=\operatorname{concat}(q,p+t^{p}_{\theta}(q)).

Now consider the order zero Verlet flow γθ\gamma_{\theta} given by

yθ=1τ[t~θq(p,t)t~θp(q,t)],y_{\theta}=\frac{1}{\tau}\begin{bmatrix}\tilde{t}_{\theta}^{q}(p,t)\\ \tilde{t}_{\theta}^{p}(q,t)\end{bmatrix},

where t~θq(x,t)tθq(x)\tilde{t}_{\theta}^{q}(x,t)\triangleq t_{\theta}^{q}(x) and t~θp(x,t)tθp(x)\tilde{t}_{\theta}^{p}(x,t)\triangleq t_{\theta}^{p}(x). Then the standard Taylor-Verlet integrator with step size τ\tau is given by the splitting approximation

φ(γθ,τ)φ(γt,τ)φ(γp0;θ,τ)φ(γq0;θ,τ)\varphi^{\ddagger}(\gamma_{\theta},\tau)\approx\varphi(\gamma_{t},\tau)\circ\varphi(\gamma_{p}^{0;\theta},\tau)\circ\varphi(\gamma_{q}^{0;\theta},\tau)

with updates given by

φ(γq0;θ,τ):[qp][q+(τ)(1τt~θq(p,t))p]=[q+tθ(p)p]\varphi(\gamma_{q}^{0;\theta},\tau):\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}q+(\tau)\left(\frac{1}{\tau}\tilde{t}_{\theta}^{q}(p,t)\right)\\ p\end{bmatrix}=\begin{bmatrix}q+t_{\theta}(p)\\ p\end{bmatrix}

and

φ(γp0;θ,τ):[qp][qp+(τ)(1τt~θp(q,t))]=[qp+tθ(q)].\varphi(\gamma_{p}^{0;\theta},\tau):\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}q\\ p+(\tau)\left(\frac{1}{\tau}\tilde{t}_{\theta}^{p}(q,t)\right)\end{bmatrix}=\begin{bmatrix}q\\ p+t_{\theta}(q)\\ \end{bmatrix}.

Thus, fθq=φ(γq0;θ,τ)f_{\theta}^{q}=\varphi(\gamma_{q}^{0;\theta},\tau) and fθq=φ(γq0;θ,τ)f_{\theta}^{q}=\varphi(\gamma_{q}^{0;\theta},\tau).

RealNVP

The coupling layers of RealNVP are of the form

fθq(q,p)\displaystyle f_{\theta}^{q}(q,p) =concat(qexp(sθq(p))+tθq(p),p),\displaystyle=\operatorname{concat}(q\odot\exp(s^{q}_{\theta}(p))+t_{\theta}^{q}(p),p),
fθp(q,p)\displaystyle f_{\theta}^{p}(q,p) =concat(q,pexp(sθp(q))+tθp(q).\displaystyle=\operatorname{concat}(q,p\odot\exp(s_{\theta}^{p}(q))+t_{\theta}^{p}(q).

Now consider the first order Verlet flow γθ\gamma_{\theta} given by

γθ=[t~θq+(s~θq)Tqt~θp+(s~θp)Tp],\gamma_{\theta}=\begin{bmatrix}\tilde{t}_{\theta}^{q}+\left(\tilde{s}_{\theta}^{q}\right)^{T}q\\ \tilde{t}_{\theta}^{p}+\left(\tilde{s}_{\theta}^{p}\right)^{T}p\end{bmatrix},

where s~θq(p,t)1τdiag(sθq(p))\tilde{s}_{\theta}^{q}(p,t)\coloneqq\tfrac{1}{\tau}\operatorname{diag}(s_{\theta}^{q}(p)),

t~θq(p,t)tθq(p)τexp(τs~θq(p)),\tilde{t}_{\theta}^{q}(p,t)\coloneqq\frac{t_{\theta}^{q}(p)}{\tau\exp(\tau\tilde{s}_{\theta}^{q}(p))},

and s~θp\tilde{s}_{\theta}^{p} and t~θp\tilde{t}_{\theta}^{p} are defined analogously. Then a non-standard Taylor-Verlet integrator is obtained from the splitting approximation

φ(γθ,τ)φ(γt,τ)φ(γp1;θ,τ)φ(γp0;θ,τ)φ(γq1;θ,τ)φ(γq0;θ,τ)\varphi^{\ddagger}(\gamma_{\theta},\tau)\approx\varphi(\gamma_{t},\tau)\circ\varphi(\gamma_{p}^{1;\theta},\tau)\circ\varphi(\gamma_{p}^{0;\theta},\tau)\circ\varphi(\gamma_{q}^{1;\theta},\tau)\circ\varphi(\gamma_{q}^{0;\theta},\tau)

where the order has been rearranged from that of Equation 5 to group together the γq\gamma^{q} and γp\gamma^{p} terms. The time evolution operators φ(γq0;θ,τ)\varphi(\gamma_{q}^{0;\theta},\tau) and φ(γq1;θ,τ)\varphi(\gamma_{q}^{1;\theta},\tau) are given by

φ(γq0;θ,τ):[qp][q+τt~θq(p,t)p]=[q+tθq(p)exp(τs~θq(p,t))p]\varphi(\gamma_{q}^{0;\theta},\tau):\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}q+\tau\tilde{t}_{\theta}^{q}(p,t)\\ p\end{bmatrix}=\begin{bmatrix}q+\frac{t_{\theta}^{q}(p)}{\exp(\tau\tilde{s}_{\theta}^{q}(p,t))}\\ p\end{bmatrix}

and

φ(γq1;θ,τ):[qp][exp(τs~θq(p,t))Tqp].\varphi(\gamma_{q}^{1;\theta},\tau):\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}\exp(\tau\tilde{s}_{\theta}^{q}(p,t))^{T}q\\ p\end{bmatrix}.

So that the combined qq-update φ(γq1;θ,τ)φ(γq0;θ,τ)\varphi(\gamma_{q}^{1;\theta},\tau)\circ\varphi(\gamma_{q}^{0;\theta},\tau) is given by

φ(γq1;θ,τ)φ(γq0;θ,τ):[qp][exp(τs~θq(p,t))Tq+tθq(p)p]=[exp(diag(sθq(p))Tq+tθq(p)p]\varphi(\gamma_{q}^{1;\theta},\tau)\circ\varphi(\gamma_{q}^{0;\theta},\tau):\begin{bmatrix}q\\ p\end{bmatrix}\to\begin{bmatrix}\exp(\tau\tilde{s}_{\theta}^{q}(p,t))^{T}q+t_{\theta}^{q}(p)\\ p\end{bmatrix}=\begin{bmatrix}\exp(\operatorname{diag}(s_{\theta}^{q}(p))^{T}q+t_{\theta}^{q}(p)\\ p\end{bmatrix}

which reduces to

[qexp(sθq(p))+tθq(p)p]=concat(qexp(sθq(p))+tθq(p),p)=fθq(q,p).\begin{bmatrix}q\odot\exp(s_{\theta}^{q}(p))+t_{\theta}^{q}(p)\\ p\end{bmatrix}=\operatorname{concat}(q\odot\exp(s^{q}_{\theta}(p))+t_{\theta}^{q}(p),p)=f_{\theta}^{q}(q,p).

Thus, fθq(q,p)=φ(γq1;θ,τ)φ(γq0;θ,τ)f_{\theta}^{q}(q,p)=\varphi(\gamma_{q}^{1;\theta},\tau)\circ\varphi(\gamma_{q}^{0;\theta},\tau), and similarly, fθp(q,p)=φ(γp1;θ,τ)φ(γp0;θ,τ)f_{\theta}^{p}(q,p)=\varphi(\gamma_{p}^{1;\theta},\tau)\circ\varphi(\gamma_{p}^{0;\theta},\tau).

Strictly speaking, Taylor-Verlet integrators cannot be said to completely generalize these coupling-based architectures because Verlet flows operate on a fixed, canonical partition of dimensions, whereas coupling-based architectures commonly rely on different dimensional partitions in each layer.