Verlet Flows: Exact-Likelihood Integrators for Flow-Based Generative Models
Abstract
Approximations in computing model likelihoods with continuous normalizing flows (CNFs) hinder the use of these models for importance sampling of Boltzmann distributions, where exact likelihoods are required. In this work, we present Verlet flows, a class of CNFs on an augmented state-space inspired by symplectic integrators from Hamiltonian dynamics. When used with carefully constructed Taylor-Verlet integrators, Verlet flows provide exact-likelihood generative models which generalize coupled flow architectures from a non-continuous setting while imposing minimal expressivity constraints. On experiments over toy densities, we demonstrate that the variance of the commonly used Hutchinson trace estimator is unsuitable for importance sampling, whereas Verlet flows perform comparably to full autograd trace computations while being significantly faster.
1 Introduction
Flow-based generative models—also called normalizing flows—parameterize maps from prior to data distributions via invertible transformations. An exciting application of normalizing flows is in learning the Boltzmann distributions of physical systems (Noé et al., 2019; Midgley et al., 2023; Kim et al., 2024). At inference time, these Boltzmann generators provide model likelihoods which can be used to reweigh samples towards the target energy with importance sampling. While nearly all existing Boltzmann generators are built from composing invertible layers such as coupling layers or splines, experiments on image domains suggest that continuous normalizing flows (CNFs)—which can parameterize arbitrary vector fields mapping noise to data—are far more expressive than their discrete counterparts (Chen et al., 2018; Grathwohl et al., 2018). Unfortunately, the exact model likelihood of CNFs can only be accessed through expensive trace computations and numerical integration, preventing their adoption in Boltzmann generators.
In this work, we propose Verlet flows, a flexible class of CNFs on an augmented state-space inspired by symplectic integrators from Hamiltonian dynamics. Instead of parameterizing the flow with a single neural network, Verlet flows instead parameterize the coefficients of the multivariate Taylor expansions of in both the state-space and the augmenting space. We then introduce Taylor-Verlet integrators, which exploit the splitting approximation from which many symplectic integrators are derived to approximate the intractable time evolution of as the composition of the tractable time evolutions of the Taylor expansion terms. At training time, Verlet flows are a subclass of CNFs, and can be trained accordingly. At inference time, Taylor-Verlet integration enables theoretically-sound importance sampling with exact likelihoods.
2 Background
Discrete Normalizing Flows
Given a source distribution and target distribution , we wish to learn an invertible, bijective transformation which maps to . Discrete normalizing flows parameterize as the composition , from which can be computed using the change of variables formula and the log-determinants of the Jacobians of the individual transformations . Thus, significant effort has been dedicated to developing expressive, invertible building blocks whose Jacobians have tractable log-determinant. Successful approaches include coupling-based flows, in which the dimensions of the state variable are partitioned in two, and the each half is used in turn to update the other half (Dinh et al., 2016; 2014; Müller et al., 2019; Durkan et al., 2019), and autoregressive flows (Kingma et al., 2017; Papamakarios et al., 2018). Despite these efforts, discrete normalizing flows have been shown to suffer from a lack of expressivity in practice.
Continuous Normalizing Flows
Continuous normalizing flows (CNFs) dispense with the discrete layers of normalizing flows and instead learn a time-dependent vector field , parameterized by a neural network, which maps the source to a target distribution (Chen et al., 2018; Grathwohl et al., 2018). Model densities can be accessed by the continuous-time change of variables formula given by
(1) |
where , denotes trace, and denotes the Jacobian. Compared to discrete normalizing flows, CNFs are not constrained by invertibility or the need for a tractable Jacobian, and therefore enjoy significantly greater expressivity.
While the trace appearing in the integrand of Equation 1 can be evaluated exactly with automatic differentiation, this grows prohibitively expensive as the dimensionality of the data grows large, as a linear number of backward-passes are required. In practice, the Hutchinson trace estimator (Grathwohl et al., 2018) is used to provide a linear-time, unbiased estimator of the trace. While cheaper, the variance of the Hutchinson estimator makes it unsuitable for importance sampling.
Symplectic Integrators and the Splitting Approximation
Leap-frog integration is a numeric method for integrating Newton’s equations of motion which involves alternatively updating (position) and (velocity) in an invertible manner not unlike augmented, coupled normalizing flows.111Closely related to leap-frog integration is Verlet integration, from which our method derives its name. Leap-frog integration is a special case of the more general family of symplectic integrators, designed for the Hamiltonian flow (of which the equations of motion are a special case). Oftentimes the Hamiltonian flow decomposes as , enabling the splitting approximation
(2) |
where denotes the time evolution operator along the flow for a duration , and where the terms on the right-hand side of Equation 2 are possibly tractable in a way that the left-hand side is not. For example, the leap-frog integrator corresponds to analytic, invertible, and volume-preserving , whereas the original evolution may satisfy none of these properties. While Verlet flows, to be introduced in the next section, are not in general Hamiltonian, they similarly exploit the splitting approximation. A more detailed exposition of symplectic integrators and the splitting approximation can be found in Appendix A.
3 Methods
3.1 Verlet Flows
We consider the problem of mapping a source distribution on at time to a target distribution on () at time by means of a time-dependent flow . We will now augment this problem on the configuration-space by extending the distribution to and to where both are given by . In analogy with Hamiltonian dynamics, we will refer to the space as phase space.222Note that we do not require that .
Observe that any analytic flow is given (at least locally) by a multivariate Taylor expansion of the form
(3) |
for appropriate choices of functions and , which we have identified in the last equality as -tensors: multilinear maps which take in copies of and return a tangent vector. While and can be thought of as vectors and matrices respectively, higher order terms do not admit particularly intuitive interpretations. Whereas traditional CNFs commonly parameterize directly via a neural network, Verlet flows instead parameterize the coefficients with neural networks, allowing for Verlet integration via the splitting approximation. By parameterizing all the terms in the Taylor expansion, Verlet flows are in theory as expressive as CNFs parameterized as . However, in practice,we must truncate the series after some finite number of terms, yielding the order Verlet flow
(4) |
In the next section, we examine how to obtain exact likelihoods from these truncated Verlet flows.
3.2 Taylor-Verlet Integrators
Denote by the flow given by
and define similarly.333When there is no risk of ambiguity, we drop the subscript and refer to simply by . For any such flow on , denote by the time evolution operator, transporting a point along the flow for time . We denote by just the pseudo time evolution operator given by .444Justification for use of the pseudo time evolution operator can be found in Appendix B. Note that is kept constant throughout integration, an intentional choice which we shall see allows for a tractable closed form. Although our Verlet flows are not Hamiltonian, the splitting approximation from Equation 11 can be applied to Verlet flows to decompose the desired time evolution into simpler, analytic terms, yielding
(5) |
Note here that the leftmost term of the right hand side is the time-update term . The key idea is that Equation 5 approximates the generally intractable as a composition of simpler, tractable updates allowing for a closed-form, exact-likelihood integrator for Verlet flows.
The splitting approximation from Equation 5, together with closed-form expressions for the time evolution operators and their log density updates (see Figure 1), yields an integration scheme specifically tailored for Verlet flows, and which we shall refer to as a Taylor-Verlet integrator. Explicit integrators for first order and higher order Verlet flows are presented in Appendix D. One important element of the design space of Taylor-Verlet integration is the order of the terms within the splitting approximation of Equation 5, and consequently, the order of updates performed during Verlet integration. We will refer to Taylor-Verlet integrators which follow the order of Equation 5 as standard Taylor-Verlet integrators, and others as non-standard. While the remainder of this work focuses on standard Taylor-Verlet integrators, the space of non-standard Taylor-Verlet integrators is rich and requires further exploration. Certain coupling-based normalizing flow architectures, such as RealNVP (Dinh et al., 2016) can be realized as the update steps of non-standard Taylor-Verlet integrators, as is discussed in Appendix E.
3.3 Closed Form and Density Updates for Time Evolution Operators
Flow | Operator | Density Update |
---|---|---|
For each pseudo time evolution operator , we compute its closed-form and the log-determinant of its Jacobian. Together, these allow us to implement the integrator given by Equation 5. Results are summarized in the Table 1 for only, but analogous results hold for for as well. Note that for terms of order , and for the sake of tractability, we restrict our attention to sparse tensors, denoted , for which only “on-diagonal” terms are non-zero so that collapses to a simple dot product. We similarly use to denote the corresponding flows for sparse, higher order terms. Full details and derivations can be found in Appendix C.
4 Experiments
Across all experiments in this section, and unless stated otherwise, we train an order-one Verlet flow , with coefficients parameterized as a three-layer architecture with hidden units each, as a continuous normalizing flow using likelihood-based loss. Non-Verlet integration is performed numerically using a fourth-order Runge-Kutta solver for steps.
Estimation of


Given an unnormalized density , a common application of importance sampling is to estimate the partition function . Given a distribution (hopefully close to the unknown, normalized density ), we obtain an unbiased estimate of via
(6) |
We train an order-one Verlet flow targeting a trimodal Gaussian mixture in two-dimensional -space, and an isotropic Gaussian in a two-dimensional -space. We then perform and time importance sampling using Equation 6 to estimate the natural logarithm in two ways: first numerically integrating with a fourth-order Runge-Kutta solver and using automatic differentiation to exactly compute the trace, and secondly using Taylor-Verlet integration. We find that integrating using a Taylor-Verlet integrator performs comparably to integrating numerically while being significantly faster. Results are summarized in Figure 1.
The poor performance of the Hutchinson trace estimator can be seen in Figure 2, where we plot a histogram of the logarithm of the importance weights for . The presence of just a few positive outliers (to be expected given the variance of the trace estimator) skews the resulting estimate of to be on the order of or larger.

5 Conclusion
In this work, we have presented Verlet flows, a class of CNFs in an augmented state space whose flow is parameterized via the coefficients of a multivariate Taylor expansion. The splitting approximation used by many symplectic integrators is adapted to construct exact-likelihood Taylor-Verlet integrators, which enable comparable but faster performance to numeric integration using expensive, autograd-based trace computation on tasks such as importance sampling.
6 Acknowledgements
We thank Gabriele Corso, Xiang Fu, Peter Holderrieth, Hannes Stärk, and Andrew Campbell for helpful feedback and discussion over the course of the project. We also thank the anonymous reviewers for their helpful feedback and suggestions.
References
- Chen et al. (2018) Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.
- Dinh et al. (2014) Laurent Dinh, David Krueger, and Yoshua Bengio. Nice: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014.
- Dinh et al. (2016) Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real nvp. arXiv preprint arXiv:1605.08803, 2016.
- Durkan et al. (2019) Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural spline flows. Advances in neural information processing systems, 32, 2019.
- Grathwohl et al. (2018) Will Grathwohl, Ricky TQ Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. Ffjord: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367, 2018.
- Kim et al. (2024) Joseph C Kim, David Bloore, Karan Kapoor, Jun Feng, Ming-Hong Hao, and Mengdi Wang. Scalable normalizing flows enable boltzmann generators for macromolecules. arXiv preprint arXiv:2401.04246, 2024.
- Kingma et al. (2017) Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improving variational inference with inverse autoregressive flow, 2017.
- Midgley et al. (2023) Laurence I Midgley, Vincent Stimper, Javier Antorán, Emile Mathieu, Bernhard Schölkopf, and José Miguel Hernández-Lobato. Se (3) equivariant augmented coupling flows. arXiv preprint arXiv:2308.10364, 2023.
- Müller et al. (2019) Thomas Müller, Brian McWilliams, Fabrice Rousselle, Markus Gross, and Jan Novák. Neural importance sampling, 2019.
- Noé et al. (2019) Frank Noé, Simon Olsson, Jonas Köhler, and Hao Wu. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science, 365(6457):eaaw1147, 2019.
- Papamakarios et al. (2018) George Papamakarios, Theo Pavlakou, and Iain Murray. Masked autoregressive flow for density estimation, 2018.
- Yoshida (1993) Haruo Yoshida. Recent progress in the theory and application of symplectic integrators. In Qualitative and Quantitative Behaviour of Planetary Systems: Proceedings of the Third Alexander von Humboldt Colloquium on Celestial Mechanics, pp. 27–43. Springer, 1993.
Appendix A Hamiltonian Mechanics and Symplectic Integrators on Euclidean Space
Given a mechanical system with configuration space , we may define the phase space of the system to be the cotangent bundle . Intuitively, phase space captures the intuitive notion that understanding the state of at a point in time requires knowledge of both the position and the velocity, or momentum (assuming unit mass), .
A.1 Hamiltonian Mechanics
Hamiltonian mechanics is a formulation of classical mechanics in which the equations of motion are given by differential equations describing the flow along level curves of an energy function, or Hamiltonian, . Denote by the space of smooth vector fields on . Then at the point , the Hamiltonian flow is defined to be the unique vector field which satisfies
(7) |
for all , and where
is the symplectic form555In our Euclidean context, a symplectic form is more generally any non-degenerate skew-symmetric bilinear form on phase space. However, it can be shown that there always exists a change of basis which satisfies , where denotes the change of basis matrix. Thus, we will only consider .. Equation 7 implies , which yields
(8) |
In other words, our state evolves according to and .
A.2 Properties of the Hamiltonian Flow
The time evolution of satisfies two important properties: it conserves the Hamiltonian , and it conserves the symplectic form .
Proposition A.1.
The flow conserves the Hamiltonian .
Proof.
This amounts to showing that , which follows immediately from . ∎
Proposition A.2.
The flow preserves the symplectic form .
Proof.
Realizing as the (equivalent) two-form , the desired result amounts to showing that the Lie derivative . With Cartan’s formula, we find that
where denotes the exterior derivative, and denotes the interior product. Here, we have used that . Then we compute that
Since , , as desired. ∎
Flows which preserve the symplectic form are known as symplectomorphisms. Proposition A.2 implies that the time evolution of is a symplectomorphism.
A.3 Symplectic Integrators and the Splitting Approximation
We have seen that the time-evolution of is a symplectomorphism, and therefore preserves the symplectic structure on the phase space . In constructing numeric integrators for , it is therefore desirable that our integrators are, if possible, themselves symplectomorphisms. In many cases, the Hamiltonian decomposes as the sum . Then, at the point , we find that
and
Thus, the flow decomposes as well to
Observe now that the respective time evolution operators are tractable and are given by
and
Since and are Hamiltonian flows their time evolutions and are both symplectomorphisms. As symplectomorphisms are closed under composition, it follows that that is itself a symplectomorphism. We have thus arrived at the splitting approximation
(9) |
Equation 9 allows us to approximate the generally intractable, symplectic time evolution as the symplectic composition of two simpler, tractable time evolution operators. The integration scheme given by Equation 9 is generally known as the symplectic Euler method.
So-called splitting methods make use of more general versions of the splitting approximation to derive higher order, symplectic integrators. Using the same decomposition , and instead of considering the two-term approximation given by Equation 9, we may choose coefficients and with and consider the more general splitting approximation
(10) |
A more detailed exposition of higher order symplectic integrators can be found in (Yoshida, 1993).
Appendix B Justification for Treating ’s as Time Evolution Operators
In the following discussion, we will use for brevity. The splitting approximation from Equation 5, which we recall below as
(11) |
requires some clarification. Recall that while the true time evolution operator is given by
(12) |
the pseudo time operator is given by
(13) |
where is kept-constant throughout the integration.
To make sense of the connection between and , we will augment our phase-time space (within which our points live), with a new -dimension, to obtain the space . Treating and as the state variables and which evolve with , the flow (as a representative example) on can be extended to a flow on given by
(14) |
where the zero -component encodes the fact that the pseudo-time evolution from Equation 13 does not change . The big idea is then that this pseudo time evolution can be viewed as the projection of the (non-pseudo) -evolution , given by
(15) |
onto . The equivalency follows from the fact that for , for . A similar statement can be made about the -update from Equation 11.
Denoting by the projection onto , we see that the splitting approximating using pseudo-time operators from Equation 11 can be rewritten as the projection onto of an analogous splitting approximation using non-pseudo -evolution operators, viz.,
(16) |
Appendix C Derivation of Time Evolution Operators and Their Jacobians
Order Zero Terms.
For order , recall that
so that the operator is given by
(17) |
with Jacobian given by
(18) |
The analysis for is nearly identical, and we omit it.
Order One Terms.
For , we recall that
(19) |
Then the time evolution operator is given by
(20) |
and the Jacobian is simply given by
(21) |
Then .
Sparse Higher Order Terms.
For , we consider only sparse tensors given by the simple dot product
where denotes the element-wise -th power of . Then the -component of time evolution operator is given component-wise by an ODE of the form , whose solution is obtained in closed form via rearranging to the equivalent form
Then it follows that is given component-wise by . Thus, the operator is given by
(22) |
The Jacobian is then given by
(23) |
with given by
Appendix D Explicit Descriptions of Taylor-Verlet Integrators
Taylor-Verlet integrators are constructed using the splitting approximation given in Equation 5 of an order Verlet flow , which we recall below as
(24) |
The standard Taylor-Verlet integrator of an order Verlet flow is given explicitly in Algorithm 1 below.
Closed-form expressions for the time evolution operators and log density updates can be found in Table 1. Algorithm 2 details explicitly standard Taylor-Verlet integration of an order one Verlet flow.
Appendix E Realizing Coupling Architectures as Verlet Integrators
In this section, we will show that two coupling-based normalizing flow architectures - NICE (Dinh et al. (2014)) and RealNVP (Dinh et al. (2016)) - can be realized as the Taylor-Verlet integrators for zero and first order Verlet flows respectively. Specifically, for each such coupling layer architecture , we may construct a Verlet flow whose Taylor-Verlet integrator is given by successive applications of .
Additive Coupling Layers
The additive coupling layers of NICE involve updates of the form
Now consider the order zero Verlet flow given by
where and . Then the standard Taylor-Verlet integrator with step size is given by the splitting approximation
with updates given by
and
Thus, and .
RealNVP
The coupling layers of RealNVP are of the form
Now consider the first order Verlet flow given by
where ,
and and are defined analogously. Then a non-standard Taylor-Verlet integrator is obtained from the splitting approximation
where the order has been rearranged from that of Equation 5 to group together the and terms. The time evolution operators and are given by
and
So that the combined -update is given by
which reduces to
Thus, , and similarly, .
Strictly speaking, Taylor-Verlet integrators cannot be said to completely generalize these coupling-based architectures because Verlet flows operate on a fixed, canonical partition of dimensions, whereas coupling-based architectures commonly rely on different dimensional partitions in each layer.