Deep Neural Networks with Symplectic Preservation Properties
Abstract
We propose a deep neural network architecture designed such that its output forms an invertible symplectomorphism of the input. This design draws an analogy to the real-valued non-volume-preserving (real NVP) method used in normalizing flow techniques. Utilizing this neural network type allows for learning tasks on unknown Hamiltonian systems without breaking the inherent symplectic structure of the phase space.
Key Words: Deep learning, Symplectomorphism, Structure-Preserving
AMS Classifications: 37J11, 70H15, 68T07
1 Introduction
For an unknown Hamiltonian system, our objective is to learn the flow map $\phi_T$ over a fixed time period $T$. Specifically, we seek to determine the map that computes $(q(t_0+T), p(t_0+T)) = \phi_T(q(t_0), p(t_0))$ given an initial condition $(q(t_0), p(t_0))$. Such problems arise, for instance, when analyzing a sequence of system snapshots at times $t_0, t_0+T, t_0+2T, \dots$. The key information we possess about this mapping is its property as a symplectomorphism (or canonical transformation), implying that the Jacobian of $\phi_T$ belongs to the symplectic group $\mathrm{Sp}(2n, \mathbb{R})$, where $n$ is the dimensionality of the system's configuration space [2, 4].
In this study, we propose a neural network structure designed to ensure that its output is precisely a symplectomorphism of the input. "Precisely" here means that the Jacobian of the mapping defined by the neural network is exactly a symplectic matrix, up to the minimal rounding errors inherent to floating-point arithmetic. Importantly, this framework eliminates the need to introduce an additional "deviation-from-symplecticity" penalty term in the learning objective, because the inherent structure of the network guarantees that the symplectomorphism condition cannot be violated.
The approach draws inspiration from the real NVP method [3], which is primarily used for density estimation of probability measures and differs significantly in purpose from our intended application. Nonetheless, this work leverages real NVP’s elegant methodology for constructing explicitly invertible neural networks. The method we propose represents a ”symplectic adaptation” of this technique, employing building blocks akin to those in real NVP while ensuring the preservation of symplecticity throughout. This adaptation involves replacing components that could potentially compromise the symplectic property of the mapping.
2 Preliminaries
2.1 Symplectic Structures and Symplectomorphism
On $\mathbb{R}^{2n}$, we denote the standard Cartesian coordinates as $(q, p) = (q_1, \dots, q_n, p_1, \dots, p_n)$, corresponding to the "position" and "momentum" coordinates in Hamiltonian mechanics. The standard symplectic form on $\mathbb{R}^{2n}$ is the differential 2-form
\[
\omega = \sum_{i=1}^{n} \mathrm{d}q_i \wedge \mathrm{d}p_i, \tag{1}
\]
and a transformation $\Phi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ is called a symplectomorphism if $\Phi^*\omega = \omega$. This means
\[
\sum_{i=1}^{n} \mathrm{d}\hat{q}_i \wedge \mathrm{d}\hat{p}_i = \sum_{i=1}^{n} \mathrm{d}q_i \wedge \mathrm{d}p_i, \tag{2}
\]
where
\[
(\hat{q}, \hat{p}) = \Phi(q, p), \tag{3}
\]
or equivalently,
\[
J_\Phi^{T}\, \mathbb{J}\, J_\Phi = \mathbb{J}, \tag{4}
\]
where
\[
J_\Phi = \frac{\partial (\hat{q}, \hat{p})}{\partial (q, p)} \tag{5}
\]
is the Jacobian matrix of $\Phi$, and
\[
\mathbb{J} = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix} \tag{6}
\]
is the matrix of the standard symplectic form $\omega$.
The most essential property of a Hamiltonian system
\[
\dot{q}_i = \frac{\partial H}{\partial p_i}, \qquad \dot{p}_i = -\frac{\partial H}{\partial q_i}, \qquad i = 1, \dots, n, \tag{7}
\]
where $H = H(q, p)$ is the Hamiltonian function, is that its flow map defines a family of symplectomorphisms. This means that if we solve (7) from time $t_0$ to time $t_0 + T$, then the mapping defined by $(q(t_0), p(t_0)) \mapsto (q(t_0+T), p(t_0+T))$ is a symplectomorphism. The converse is also true: if a differential equation system on $\mathbb{R}^{2n}$ is such that its flow maps are symplectomorphisms, then there exists a function $H(q, p, t)$ such that the system can be written as a Hamiltonian system (7).
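As a concrete numerical illustration of condition (4), the following sketch (assuming JAX; the harmonic-oscillator flow and all names here are illustrative, not part of the proposed method) checks that the Jacobian of an exact Hamiltonian flow satisfies $J_\Phi^T \mathbb{J} J_\Phi = \mathbb{J}$ up to rounding error.

```python
# A minimal numerical check of the symplectic condition (4), assuming JAX.
# The exact time-T flow of the harmonic oscillator H = (q^2 + p^2)/2 is a
# rotation of the (q, p) plane, hence a symplectomorphism; we verify that its
# Jacobian J satisfies J^T @ Jmat @ J = Jmat up to floating-point error.
import jax
import jax.numpy as jnp

n = 1                                     # one degree of freedom
Jmat = jnp.block([[jnp.zeros((n, n)), jnp.eye(n)],
                  [-jnp.eye(n), jnp.zeros((n, n))]])

def flow(z, T=0.7):
    """Exact flow map of H(q, p) = (q**2 + p**2) / 2 over time T."""
    q, p = z[:n], z[n:]
    qT = q * jnp.cos(T) + p * jnp.sin(T)
    pT = -q * jnp.sin(T) + p * jnp.cos(T)
    return jnp.concatenate([qT, pT])

z0 = jnp.array([0.3, -1.2])
J = jax.jacfwd(flow)(z0)                  # Jacobian of the flow map at z0
residual = J.T @ Jmat @ J - Jmat
print(jnp.max(jnp.abs(residual)))         # ~1e-7 in single precision
```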
2.1.1 Example: Shearing
The simplest examples of symplectomorphisms come from the symplectic Euler method for separable Hamiltonians. Suppose $F : \mathbb{R}^n \to \mathbb{R}$ is a smooth function; then
\[
\begin{cases} \hat{q} = q, \\ \hat{p} = p + \nabla F(q) \end{cases} \tag{8}
\]
is a symplectic transformation, because
\[
\sum_{i=1}^{n} \mathrm{d}\hat{q}_i \wedge \mathrm{d}\hat{p}_i
= \sum_{i=1}^{n} \mathrm{d}q_i \wedge \left( \mathrm{d}p_i + \sum_{j=1}^{n} \frac{\partial^2 F}{\partial q_i \partial q_j}\, \mathrm{d}q_j \right)
= \sum_{i=1}^{n} \mathrm{d}q_i \wedge \mathrm{d}p_i + \sum_{i,j=1}^{n} \frac{\partial^2 F}{\partial q_i \partial q_j}\, \mathrm{d}q_i \wedge \mathrm{d}q_j,
\]
and the result comes from the identity $\frac{\partial^2 F}{\partial q_i \partial q_j} = \frac{\partial^2 F}{\partial q_j \partial q_i}$, which, together with $\mathrm{d}q_i \wedge \mathrm{d}q_j = -\mathrm{d}q_j \wedge \mathrm{d}q_i$, forces the second sum to vanish. Similarly,
\[
\begin{cases} \hat{q} = q + \nabla G(p), \\ \hat{p} = p \end{cases} \tag{9}
\]
is also a symplectomorphism, where $G : \mathbb{R}^n \to \mathbb{R}$ is a smooth function. We call a symplectomorphism of the form (8) or (9) a symplectic shearing.
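As an illustration, here is a minimal sketch of the two shearing maps (8) and (9), assuming JAX; the helper names `shear_p` and `shear_q` and the particular choice of $F$ are ours, for illustration only.

```python
# Minimal sketch of the symplectic shearings (8) and (9), assuming JAX.
# F and G are smooth scalar-valued functions of n variables; in the network
# they will later be small MLPs, but any differentiable callable works here.
import jax
import jax.numpy as jnp

def shear_p(F):
    """(q, p) -> (q, p + grad F(q)); cf. (8)."""
    gradF = jax.grad(F)
    return lambda q, p: (q, p + gradF(q))

def shear_q(G):
    """(q, p) -> (q + grad G(p), p); cf. (9)."""
    gradG = jax.grad(G)
    return lambda q, p: (q + gradG(p), p)

# Hand-picked F for illustration; any smooth F gives a symplectic map.
F = lambda q: jnp.sum(jnp.sin(q)) + 0.5 * jnp.sum(q ** 2)
q0, p0 = jnp.array([0.1, 0.4]), jnp.array([-0.3, 0.2])
q1, p1 = shear_p(F)(q0, p0)
```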
2.1.2 Example: Stretching
Another example is the "coordinate stretching" transformation. A diagonal linear transformation on $\mathbb{R}^{2n}$ is symplectic if and only if it has the form
\[
\hat{q}_i = \sigma_i q_i, \qquad \hat{p}_i = \frac{p_i}{\sigma_i}, \qquad i = 1, \dots, n, \tag{10}
\]
where $\sigma_1, \dots, \sigma_n$ are nonzero constants. Now we make it more general, supposing that each $\sigma_i$ is a nonzero smooth function of the coordinates $(q, p)$. Then
\[
\sum_{i=1}^{n} \mathrm{d}\hat{q}_i \wedge \mathrm{d}\hat{p}_i
= \sum_{i=1}^{n} \mathrm{d}(\sigma_i q_i) \wedge \mathrm{d}\!\left(\frac{p_i}{\sigma_i}\right)
= \sum_{i=1}^{n} \mathrm{d}q_i \wedge \mathrm{d}p_i + \sum_{i=1}^{n} \mathrm{d}(\ln \sigma_i) \wedge \mathrm{d}(q_i p_i); \tag{11}
\]
therefore, a transformation given as (10) is symplectic if and only if the condition
\[
\sum_{i=1}^{n} \mathrm{d}(\ln \sigma_i) \wedge \mathrm{d}(q_i p_i) = 0 \tag{12}
\]
is satisfied. Note that (12) can be written as
\[
\mathrm{d}\!\left( \sum_{i=1}^{n} \ln \sigma_i \, \mathrm{d}(q_i p_i) \right) = 0,
\]
and according to Poincaré's Lemma, (12) is satisfied if
\[
\sum_{i=1}^{n} \ln \sigma_i \, \mathrm{d}(q_i p_i) = \mathrm{d}V \tag{13}
\]
for some smooth function $V$. The condition (13) is satisfied when there is a smooth function $V : \mathbb{R}^n \to \mathbb{R}$ such that
\[
\sigma_i = e^{\partial_i V(q_1 p_1, \dots, q_n p_n)}, \qquad i = 1, \dots, n, \tag{14}
\]
holds, where $\partial_i V$ is the partial derivative of $V$ with respect to its $i$-th argument:
\[
\partial_i V(w_1, \dots, w_n) = \frac{\partial V(w_1, \dots, w_n)}{\partial w_i}. \tag{15}
\]
Indeed, in this case $\sum_{i=1}^{n} \ln \sigma_i \, \mathrm{d}(q_i p_i) = \mathrm{d}\bigl(V(q_1 p_1, \dots, q_n p_n)\bigr)$. We call the symplectomorphism given by (10) and (14) a symplectic stretching.
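Concretely, a minimal JAX sketch of this stretching map is given below (the helper name `stretch_map` and the choice of $V$ are ours); note that the map preserves each product $q_i p_i$, a fact used again when inverting the block in Section 3.2.

```python
# Minimal sketch of the symplectic stretching (10) with sigma_i given by
# (14)-(15), assuming JAX.  V is a scalar function of the n products q_i p_i,
# and its gradient supplies the exponents of the stretching factors.
import jax
import jax.numpy as jnp

def stretch_map(V):
    """(q, p) -> (exp(grad V(q*p)) * q, exp(-grad V(q*p)) * p); cf. (10), (14)."""
    gradV = jax.grad(V)
    def apply(q, p):
        s = gradV(q * p)              # s_i = partial_i V(q_1 p_1, ..., q_n p_n)
        return jnp.exp(s) * q, jnp.exp(-s) * p
    return apply

# Illustrative choice of V; any smooth scalar function of w = q * p works.
V = lambda w: jnp.sum(jnp.tanh(w))
q0, p0 = jnp.array([0.5, -0.7]), jnp.array([1.0, 0.3])
q1, p1 = stretch_map(V)(q0, p0)
print(q1 * p1 - q0 * p0)              # zero: the products q_i p_i are preserved
```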
2.2 Real NVP
Real NVP (Real-valued Non-Volume Preserving) [3, 1] is a generative model used for density estimation. Real NVP networks use invertible transformations, allowing us to go back and forth between the original and transformed spaces. The structure of real NVP is as follows: the input and output of the network are both $N$-dimensional vectors. An $N$-dimensional vector
\[
x = (x_1, \dots, x_N)
\]
received as the input is partitioned into two parts,
\[
u = (x_1, \dots, x_m), \qquad v = (x_{m+1}, \dots, x_N).
\]
A real NVP transformation keeps one of the parts unchanged and performs an entry-wise scale-and-shift transformation on the other part, whose coefficients are determined by the unchanged part. Specifically, the input undergoes the following transformation:
\[
\begin{cases} \hat{u} = u, \\ \hat{v} = v \circ e^{s(u)} + t(u), \end{cases} \tag{16}
\]
where $s, t : \mathbb{R}^m \to \mathbb{R}^{N-m}$ are two functions which are given as neural networks in practice, and the symbol "$\circ$" denotes the Hadamard product (entry-wise product) operator:
\[
(a \circ b)_i = a_i b_i.
\]
The inverse of the mapping (16) is clear:
\[
\begin{cases} u = \hat{u}, \\ v = \bigl(\hat{v} - t(\hat{u})\bigr) \circ e^{-s(\hat{u})}. \end{cases} \tag{17}
\]
The transformation (16) is often exhibited as a diagram like Figure 1.
[Figure 1: Diagram of a single real NVP block (16).]
The apparent limitation of transformation (16) is that it does not change the part $u$. This can be quickly fixed by appending another real NVP block that keeps the part $\hat{v}$ unchanged:
\[
\begin{cases} \tilde{u} = \hat{u} \circ e^{s'(\hat{v})} + t'(\hat{v}), \\ \tilde{v} = \hat{v}, \end{cases} \tag{18}
\]
where $s', t' : \mathbb{R}^{N-m} \to \mathbb{R}^{m}$ are another two neural network functions, so the composed transformation from $(u, v)$ to $(\tilde{u}, \tilde{v})$ given by (16) and (18) does not keep any component unchanged. This can be exhibited as a diagram like Figure 2.
[Figure 2: Diagram of two stacked real NVP blocks (16) and (18).]
Of course, we can stack more layers like this to improve the expressivity of the network.
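For concreteness, here is a minimal sketch of one coupling block (16) and its inverse (17), assuming JAX; `s_net` and `t_net` stand in for the networks $s$ and $t$ and are only illustrative.

```python
# A single real NVP coupling block (16) and its inverse (17), assuming JAX.
# s_net and t_net play the roles of the networks s and t; here they are just
# fixed differentiable functions for illustration.
import jax.numpy as jnp

def coupling_forward(u, v, s_net, t_net):
    """(u, v) -> (u, v * exp(s(u)) + t(u)); cf. (16)."""
    return u, v * jnp.exp(s_net(u)) + t_net(u)

def coupling_inverse(u_hat, v_hat, s_net, t_net):
    """Exact inverse of coupling_forward; cf. (17)."""
    return u_hat, (v_hat - t_net(u_hat)) * jnp.exp(-s_net(u_hat))

# Illustrative stand-ins for s and t (here u and v have the same dimension).
s_net = lambda u: jnp.tanh(u)
t_net = lambda u: u ** 2

u, v = jnp.array([0.2, -0.5]), jnp.array([1.0, 0.7])
u_hat, v_hat = coupling_forward(u, v, s_net, t_net)
u_back, v_back = coupling_inverse(u_hat, v_hat, s_net, t_net)   # recovers (u, v)
```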
3 Symplectomorphism Neural Network (SymplectoNet, SpNN)
3.1 Structure
For our goal of building a symplectomorphism neural network, the problem with real NVP is directly exhibited in its name: "NVP" means "non-volume-preserving", while a symplectomorphism has to be volume-preserving. Indeed, there is a quick fix that makes real NVP volume-preserving (turning "real NVP" into "real VP"): one only needs to add an extra layer after the output of the scaling network $s$ that subtracts the average of its components, so that the entries of $s(u)$ sum to zero and the Jacobian determinant $e^{\sum_i s_i(u)}$ of the block equals one. Unfortunately, the mere volume-preserving property does not guarantee symplecticity. We need further adjustments.
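A sketch of this mean-subtraction fix, assuming JAX (the helper name `zero_mean` is ours):

```python
# Sketch of the volume-preserving fix described above, assuming JAX: subtract
# the mean of s(u) so its entries sum to zero and the block's Jacobian
# determinant exp(sum_i s_i(u)) becomes exactly one.
import jax.numpy as jnp

def zero_mean(s_out):
    """Post-process the scaling network output so its entries sum to zero."""
    return s_out - jnp.mean(s_out)

s_raw = jnp.array([0.3, -1.1, 0.5])
s_vp = zero_mean(s_raw)
print(jnp.exp(jnp.sum(s_vp)))   # ~1.0: the coupling block is volume-preserving
```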
Indeed, we can decompose (16) into two transformations: a "stretching"
\[
\begin{cases} \hat{u} = u, \\ \hat{v} = v \circ e^{s(u)}, \end{cases} \tag{19}
\]
and a "shearing"
\[
\begin{cases} \hat{u} = u, \\ \hat{v} = v + t(u). \end{cases} \tag{20}
\]
Neither of these two transformations is guaranteed to be symplectic. Nevertheless, we have introduced their symplectic counterparts in the last section: indeed, we can write (8), (9) and (10) (with (14), (15)) in a more compact form,
\[
S_q[F] : \begin{cases} \hat{q} = q + \nabla F(p), \\ \hat{p} = p, \end{cases} \tag{21}
\]
\[
S_p[G] : \begin{cases} \hat{q} = q, \\ \hat{p} = p + \nabla G(q), \end{cases} \tag{22}
\]
\[
St[V] : \begin{cases} \hat{q} = e^{\nabla V(q \circ p)} \circ q, \\ \hat{p} = e^{-\nabla V(q \circ p)} \circ p, \end{cases} \tag{23}
\]
where $q, p, \hat{q}, \hat{p} \in \mathbb{R}^n$, $F, G, V : \mathbb{R}^n \to \mathbb{R}$ are smooth functions, the exponential is applied entry-wise, and "$\circ$" is the Hadamard product as before. Their correspondence with (19) and (20) is now clear: (21) and (22) are exactly of the form (20) (with the roles of the two parts exchanged in (21)) when $u$ and $v$ have the same dimension and $t$ is the gradient of a function, while (23) is a symmetrized version of (19):
\[
\begin{cases} \hat{u} = u \circ e^{-s(u \circ v)}, \\ \hat{v} = v \circ e^{s(u \circ v)}, \end{cases}
\]
with $s$ being the gradient of a function. We denote the transformations defined by (21), (22), (23) as $S_q[F]$, $S_p[G]$ and $St[V]$, which are shorthands for "q-shearing", "p-shearing" and "stretching", respectively. These become the basic building blocks of the "symplectic version of real NVP" once we take $F$, $G$ and $V$ in these transformations to be trainable neural networks.
Now we have introduced all the basic symplectomorphism building blocks, and a symplectomorphism neural network (SymplectoNet, or even shorter, SpNN) is a neural network designed as an arbitrary finite composition of $S_q[F]$, $S_p[G]$ and $St[V]$ blocks, where $F$, $G$ and $V$ are arbitrary neural networks with $n$-dimensional input and one-dimensional output.
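A minimal sketch of this construction, assuming JAX (the MLP architecture, initialization, and block ordering below are illustrative assumptions, not a prescription): each block wraps the gradient of a small scalar-output MLP, so it is an exact symplectomorphism for any value of the weights, and a SymplectoNet is simply a composition of such blocks.

```python
# A minimal SpNN sketch, assuming JAX.  Each building block uses the gradient
# of a scalar-output MLP, so the block is an exact symplectomorphism (up to
# floating-point rounding) regardless of the MLP's weights.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Dense layers with scalar output; sizes = [n, hidden, ..., 1]."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, h)) / jnp.sqrt(m), jnp.zeros(h))
            for k, m, h in zip(keys, sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]                        # last layer is linear
    return (x @ W + b)[0]                    # scalar output

def q_shear(params):                         # S_q[F], cf. (21): q <- q + grad F(p)
    g = jax.grad(lambda p: mlp(params, p))
    return lambda q, p: (q + g(p), p)

def p_shear(params):                         # S_p[G], cf. (22): p <- p + grad G(q)
    g = jax.grad(lambda q: mlp(params, q))
    return lambda q, p: (q, p + g(q))

def stretch(params):                         # St[V], cf. (23)
    g = jax.grad(lambda w: mlp(params, w))
    return lambda q, p: (jnp.exp(g(q * p)) * q, jnp.exp(-g(q * p)) * p)

def spnn(blocks):
    """Compose blocks left to right into a single map (q, p) -> (q', p')."""
    def apply(q, p):
        for block in blocks:
            q, p = block(q, p)
        return q, p
    return apply

# Example: a q-shearing, p-shearing, stretching stack on a system with n = 2.
n, key = 2, jax.random.PRNGKey(0)
kF, kG, kV = jax.random.split(key, 3)
net = spnn([q_shear(init_mlp(kF, [n, 16, 1])),
            p_shear(init_mlp(kG, [n, 16, 1])),
            stretch(init_mlp(kV, [n, 16, 1]))])
q_out, p_out = net(jnp.ones(n), jnp.zeros(n))
```

Gradients of the blocks with respect to the MLP weights remain available through automatic differentiation, so the composed map can be trained like any other network.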
Of course, the expressivity of this network depends on the complexity of the underlying neural networks $F$, $G$ and $V$, and also on the number of building blocks we stack. Indeed, the latter can be even more essential: e.g. if we use fewer than four symplectic shearing blocks, we cannot even cover all the linear symplectomorphisms, no matter how complicated the underlying networks $F$ and $G$ are, because the Jacobian of a shearing transformation is of the form
\[
\begin{pmatrix} I_n & A \\ 0 & I_n \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} I_n & 0 \\ B & I_n \end{pmatrix},
\]
where $A$, $B$ are symmetric matrices. The degrees of freedom of these matrices are $n(n+1)/2$ each, while $\dim \mathrm{Sp}(2n, \mathbb{R}) = n(2n+1)$, which is greater than $3 \times n(n+1)/2$ for $n \geq 2$ (e.g. for $n = 2$, three shearing blocks provide at most $9$ degrees of freedom, while $\dim \mathrm{Sp}(4, \mathbb{R}) = 10$). This is why we also designed the symplectic stretching layer $St[V]$. A good practice is to include both the $p$-, $q$-shearing and the symplectic stretching layers in the network at least once. A simplest example is a network containing one $q$-shearing, one $p$-shearing and one stretching block (see Figure 3), which is similar to the structure of a real NVP.
3.2 SymplectoNet as Invertible Neural Network (INN)
One of the most important features of real NVP is that it is explicitly invertible: one can write out (or, in a more technical term, build the computation graph of) the explicit expression of the neural network function's inverse function [5]. Our SymplectoNet is inspired by real NVP, so a natural question is whether the SymplectoNet structure is explicitly invertible like real NVP. Next, we will show that the answer is yes.
Indeed, since the inverse of a composed function $f_1 \circ f_2 \circ \dots \circ f_k$ is $f_k^{-1} \circ \dots \circ f_2^{-1} \circ f_1^{-1}$, we only need to show that the basic building blocks $S_q[F]$, $S_p[G]$ and $St[V]$ are explicitly invertible. The inverses of $S_q[F]$ and $S_p[G]$ are obvious: (21) is equivalent to
\[
\begin{cases} q = \hat{q} - \nabla F(\hat{p}), \\ p = \hat{p}, \end{cases}
\]
and (22) is equivalent to
\[
\begin{cases} q = \hat{q}, \\ p = \hat{p} - \nabla G(\hat{q}), \end{cases}
\]
therefore the inverses of $S_q[F]$ and $S_p[G]$ are $S_q[-F]$ and $S_p[-G]$, respectively. Finally we look at $St[V]$. Notice that from (23), we have
\[
\hat{q} \circ \hat{p} = \left( e^{\nabla V(q \circ p)} \circ q \right) \circ \left( e^{-\nabla V(q \circ p)} \circ p \right) = q \circ p,
\]
therefore
\[
\begin{cases} q = e^{-\nabla V(\hat{q} \circ \hat{p})} \circ \hat{q}, \\ p = e^{\nabla V(\hat{q} \circ \hat{p})} \circ \hat{p}; \end{cases} \tag{24}
\]
this shows that the inverse of $St[V]$ is exactly $St[-V]$. In conclusion, we have
\[
\bigl( S_q[F] \bigr)^{-1} = S_q[-F], \qquad \bigl( S_p[G] \bigr)^{-1} = S_p[-G], \qquad \bigl( St[V] \bigr)^{-1} = St[-V]. \tag{25}
\]
These results give a neat expression for inverting a SymplectoNet. For example, the inverse of the SymplectoNet $St[V] \circ S_p[G] \circ S_q[F]$ is
\[
\bigl( St[V] \circ S_p[G] \circ S_q[F] \bigr)^{-1} = S_q[-F] \circ S_p[-G] \circ St[-V]. \tag{26}
\]
This shows that the inverse of a SymplectoNet is explicitly available.
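Continuing the sketch from Section 3.1 (and reusing its `init_mlp`, `q_shear`, `p_shear`, `stretch` and `spnn` helpers), the explicit inverse (25)-(26) can be realized by negating the output layer of each underlying scalar MLP and reversing the block order; the negation trick assumes, as in that sketch, that the MLP's last layer is linear.

```python
# Explicit inversion of an SpNN, cf. (25)-(26); assumes the helpers from the
# Section 3.1 sketch (init_mlp, q_shear, p_shear, stretch, spnn) are in scope.
import jax
import jax.numpy as jnp

def negate(params):
    """Parameters of the MLP computing -F, given params computing F
    (valid because the last layer of the sketched MLP is linear)."""
    W, b = params[-1]
    return params[:-1] + [(-W, -b)]

n, key = 2, jax.random.PRNGKey(1)
pF, pG, pV = (init_mlp(k, [n, 16, 1]) for k in jax.random.split(key, 3))

# Forward network St[V] . S_p[G] . S_q[F] and its explicit inverse.
blocks_fwd = [("q", pF), ("p", pG), ("st", pV)]
make = {"q": q_shear, "p": p_shear, "st": stretch}
net = spnn([make[kind](prm) for kind, prm in blocks_fwd])
net_inv = spnn([make[kind](negate(prm)) for kind, prm in reversed(blocks_fwd)])

q0, p0 = jnp.ones(n), jnp.full(n, 0.5)
q1, p1 = net(q0, p0)
q2, p2 = net_inv(q1, p1)      # recovers (q0, p0) up to rounding error
```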
4 Extension to Families of Symplectomorphisms
A natural extension of the symplectomorphism neural network is to include some parameters other than the canonical variables as inputs. This can be easily achieved by changing the functions $F$, $G$, $V$ in the basic building blocks $S_q[F]$, $S_p[G]$ and $St[V]$ into $(n+k)$-variable functions $F(p, \theta)$, $G(q, \theta)$, $V(w, \theta)$, where $\theta \in \mathbb{R}^k$, and modifying the blocks given by (21)~(23) into
\[
\begin{cases} \hat{q} = q + \nabla_p F(p, \theta), \\ \hat{p} = p, \end{cases} \tag{27}
\]
\[
\begin{cases} \hat{q} = q, \\ \hat{p} = p + \nabla_q G(q, \theta), \end{cases} \tag{28}
\]
\[
\begin{cases} \hat{q} = e^{\nabla_w V(q \circ p, \theta)} \circ q, \\ \hat{p} = e^{-\nabla_w V(q \circ p, \theta)} \circ p, \end{cases} \tag{29}
\]
where the gradients are taken with respect to the first $n$ arguments only.
With this modification, the network receives $(2n+k)$-dimensional vectors
\[
(q_1, \dots, q_n, p_1, \dots, p_n, \theta_1, \dots, \theta_k)
\]
as inputs, the output dimension is still $2n$, and for each fixed $\theta$ the output vector is a symplectomorphism of the canonical part $(q, p)$ of the input vector. Thus, each choice of the parameters $\theta$ defines a symplectomorphism, or we can say that the network defines a continuous family of symplectomorphisms parameterized by $\theta$. A particularly common situation is when $k = 1$ and $\theta$ represents the time variable $t$. In this case, the network function can represent the solution of some Hamiltonian equation: thanks to the symplectic property of the network, there exists a Hamiltonian function
\[
H = H(q, p, t)
\]
such that the network function represents exactly the solution of the corresponding Hamiltonian system (7). Nevertheless, it is not guaranteed that the symplectomorphism family parameterized by $t$ forms a one-parameter group of symplectomorphisms; that is, the corresponding Hamiltonian will in general depend explicitly on time, and we do not have a method to exactly remove this dependency.
By including more parameters (i.e. $k > 1$), it is also possible to apply this network to optimal control problems involving Hamiltonian dynamics.
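A sketch of the parameterized blocks (27)~(29) in JAX (the function names and the example $F$ are illustrative): the scalar functions take $(x, \theta)$, and the gradient is taken with respect to $x$ only, so every fixed $\theta$, e.g. a time value, yields an exact symplectomorphism in $(q, p)$.

```python
# Sketch of the parameterized blocks (27)-(29), assuming JAX: the scalar
# functions take (x, theta) and the gradient is taken w.r.t. x only, so each
# fixed theta (e.g. time) gives an exact symplectomorphism in (q, p).
import jax
import jax.numpy as jnp

def q_shear_family(F):                     # F: (p, theta) -> scalar
    g = jax.grad(F, argnums=0)
    return lambda q, p, theta: (q + g(p, theta), p)

def p_shear_family(G):                     # G: (q, theta) -> scalar
    g = jax.grad(G, argnums=0)
    return lambda q, p, theta: (q, p + g(q, theta))

def stretch_family(V):                     # V: (w, theta) -> scalar, w = q * p
    g = jax.grad(V, argnums=0)
    return lambda q, p, theta: (jnp.exp(g(q * p, theta)) * q,
                                jnp.exp(-g(q * p, theta)) * p)

# Illustrative scalar function of (x, theta); in practice these are MLPs
# taking the concatenation of x and theta as input.
F = lambda p, t: t * jnp.sum(p ** 2) / 2.0
block = q_shear_family(F)
q1, p1 = block(jnp.array([1.0, 0.0]), jnp.array([0.0, 1.0]), jnp.array(0.3))
```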
5 Some Preliminary Results
5.1 A Polar Nonlinear Mapping
This example is learning a symplectic map of polar type,
(30)
A network built from the blocks of Section 3.1 is used, where the underlying scalar functions $F$, $G$ and $V$ are dense neural networks. The loss is the ordinary MSE loss. The Adamax optimizer with learning rate 0.25 is used, with the learning rate decaying by a factor of 0.99 every 100 epochs.
Firstly, uniformly random points are sampled from the training domain. The training ran for 40,000 epochs; the learned map is shown in Figure 4(a), and the loss decay is shown in Figure 4(b).
[Figure 4: (a) learned map for the first domain; (b) training loss decay for the first experiment; (c) learned map for the second domain; (d) training loss decay for the second experiment.]
Another numerical experiment, also concerning (30) but with a different domain, is conducted. This time, the geometry of the transformation is more complicated. Note that the domain cannot be extended to a full period of the angular variable, because this would make the mapping (30) non-injective, while the model is invertible; thus the model has difficulty learning the data near the two boundary lines of the domain. The training again ran for 40,000 epochs; the learned map is shown in Figure 4(c), and the loss decay is shown in Figure 4(d). The majority of the error comes from the boundary, because the points there are close to points where injectivity is lost.
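For reference, here is a minimal sketch of the training setup described above, assuming JAX with optax and the `spnn`, `q_shear`, `p_shear`, `stretch`, `init_mlp` helpers from the Section 3.1 sketch; `target_map` is a placeholder for the map (30), and the data sizes and layer widths are illustrative.

```python
# Minimal training-loop sketch for fitting an SpNN to samples of a symplectic
# map, assuming JAX + optax and the helpers from the Section 3.1 sketch.
import jax
import jax.numpy as jnp
import optax

def target_map(q, p):            # placeholder for the ground-truth map (30)
    return q, p                  # replace with the actual map to be learned

def loss_fn(params, q, p, q_ref, p_ref):
    """Ordinary MSE loss between network output and reference data."""
    net = spnn([q_shear(params["F"]), p_shear(params["G"]), stretch(params["V"])])
    q_out, p_out = jax.vmap(net)(q, p)
    return jnp.mean((q_out - q_ref) ** 2 + (p_out - p_ref) ** 2)

# Adamax with learning rate 0.25, decayed by a factor of 0.99 every 100 epochs.
schedule = optax.exponential_decay(0.25, transition_steps=100,
                                   decay_rate=0.99, staircase=True)
optimizer = optax.adamax(schedule)

n, key = 2, jax.random.PRNGKey(0)
kF, kG, kV, kq, kp = jax.random.split(key, 5)
params = {"F": init_mlp(kF, [n, 32, 1]),
          "G": init_mlp(kG, [n, 32, 1]),
          "V": init_mlp(kV, [n, 32, 1])}
opt_state = optimizer.init(params)

q_data = jax.random.uniform(kq, (1024, n))            # training inputs
p_data = jax.random.uniform(kp, (1024, n))
q_ref, p_ref = jax.vmap(target_map)(q_data, p_data)   # training targets

@jax.jit
def step(params, opt_state, q, p, q_ref, p_ref):
    loss, grads = jax.value_and_grad(loss_fn)(params, q, p, q_ref, p_ref)
    updates, opt_state = optimizer.update(grads, opt_state)
    return optax.apply_updates(params, updates), opt_state, loss

for epoch in range(40_000):
    params, opt_state, loss = step(params, opt_state,
                                   q_data, p_data, q_ref, p_ref)
```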
References
- [1] C. Bishop and H. Bishop, Deep Learning: Foundations and Concepts, Springer International Publishing, 2023.
- [2] A. da Silva, Lectures on Symplectic Geometry, Lecture Notes in Mathematics, Springer Berlin Heidelberg, 2004.
- [3] L. Dinh, J. N. Sohl-Dickstein, and S. Bengio, Density estimation using Real NVP, arXiv preprint arXiv:1605.08803, 2016.
- [4] H. Goldstein, Classical Mechanics, Addison-Wesley series in physics, Addison-Wesley Publishing Company, 1980.
- [5] I. Ishikawa, T. Teshima, K. Tojo, K. Oono, M. Ikeda, and M. Sugiyama, Universal approximation property of invertible neural networks, Journal of Machine Learning Research, 24 (2023), pp. 1–68.