
Deep Neural Networks with Symplectic Preservation Properties

Qing He Wei Cai
Abstract

We propose a deep neural network architecture designed such that its output forms an invertible symplectomorphism of the input. This design draws an analogy to the real-valued non-volume-preserving (real NVP) method used in normalizing flow techniques. Utilizing this neural network type allows for learning tasks on unknown Hamiltonian systems without breaking the inherent symplectic structure of the phase space.

Key Words: Deep learning, Symplectomorphism, Structure-Preserving

AMS Classifications: 37J11, 70H15, 68T07

1 Introduction

For an unknown Hamiltonian system, our objective is to learn the flow map over a fixed time period $T$. Specifically, we seek to determine the map $\Phi_{T}$ that computes $(q,p)_{t=T}$ given an initial condition $(q,p)_{t=0}=(q_{0},p_{0})$. Such problems arise, for instance, when analyzing a sequence of system snapshots at times $0,T,2T,3T,\ldots$. The key information we possess about this mapping is its property as a symplectomorphism (or canonical transformation), implying that the Jacobian of $\Phi_{T}$ belongs to the symplectic group $Sp(2n)$, where $n$ is the dimensionality of the system's configuration space [2, 4].

In this study, we propose a neural network structure designed to ensure that its output is precisely a symplectomorphism of the input. ”Precisely” here means that the Jacobian of the mapping defined by the neural network is exactly a symplectic matrix, accounting only for minimal rounding errors inherent to floating-point arithmetic. Importantly, this framework eliminates the need to introduce an additional ”deviation-from-symplecticity penalty term” in our learning objective because the inherent structure of the network guarantees that the symplectomorphism condition cannot be violated.

The approach draws inspiration from the real NVP method [3], which is primarily used for density estimation of probability measures and differs significantly in purpose from our intended application. Nonetheless, this work leverages real NVP’s elegant methodology for constructing explicitly invertible neural networks. The method we propose represents a ”symplectic adaptation” of this technique, employing building blocks akin to those in real NVP while ensuring the preservation of symplecticity throughout. This adaptation involves replacing components that could potentially compromise the symplectic property of the mapping.

2 Preliminaries

2.1 Symplectic Structures and Symplectomorphism

On $\mathbb{R}^{2n}$, we denote the standard Cartesian coordinates by $q_{1},\cdots,q_{n},p_{1},\cdots,p_{n}$, corresponding to the "position" and "momentum" coordinates in Hamiltonian mechanics. The standard symplectic form on $\mathbb{R}^{2n}$ is the differential 2-form

\omega=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}, (1)

and a transformation $\varphi:\mathbb{R}^{2n}\rightarrow\mathbb{R}^{2n}$ is called a symplectomorphism if $\varphi^{*}\omega=\omega$. This means

\sum_{i=1}^{n}\mathrm{d}Q_{i}\land\mathrm{d}P_{i}=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}, (2)

where

(Q_{1},\cdots,Q_{n},P_{1},\cdots,P_{n})=\varphi(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n}), (3)

or equivalently,

J_{\varphi}^{\top}\Omega J_{\varphi}=\Omega, (4)

where

J_{\varphi}=\begin{pmatrix}\frac{\partial Q_{1}}{\partial q_{1}}&\cdots&\frac{\partial Q_{1}}{\partial q_{n}}&\frac{\partial Q_{1}}{\partial p_{1}}&\cdots&\frac{\partial Q_{1}}{\partial p_{n}}\\ \vdots&\ddots&\vdots&\vdots&\ddots&\vdots\\ \frac{\partial Q_{n}}{\partial q_{1}}&\cdots&\frac{\partial Q_{n}}{\partial q_{n}}&\frac{\partial Q_{n}}{\partial p_{1}}&\cdots&\frac{\partial Q_{n}}{\partial p_{n}}\\ \frac{\partial P_{1}}{\partial q_{1}}&\cdots&\frac{\partial P_{1}}{\partial q_{n}}&\frac{\partial P_{1}}{\partial p_{1}}&\cdots&\frac{\partial P_{1}}{\partial p_{n}}\\ \vdots&\ddots&\vdots&\vdots&\ddots&\vdots\\ \frac{\partial P_{n}}{\partial q_{1}}&\cdots&\frac{\partial P_{n}}{\partial q_{n}}&\frac{\partial P_{n}}{\partial p_{1}}&\cdots&\frac{\partial P_{n}}{\partial p_{n}}\end{pmatrix} (5)

is the Jacobian matrix of φ\varphi, and

\Omega=\begin{pmatrix}0_{n\times n}&I_{n\times n}\\ -I_{n\times n}&0_{n\times n}\end{pmatrix} (6)

is the matrix of the standard symplectic form $\omega$.
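For illustration, condition (4) can be checked numerically for any candidate map. The following is a minimal sketch, assuming PyTorch is available; the helper names symplectic_form and symplecticity_defect and the example map are illustrative choices, not part of the construction above.

```python
import torch

def symplectic_form(n):
    """The matrix Omega of the standard symplectic form, as in (6)."""
    O, I = torch.zeros(n, n), torch.eye(n)
    return torch.cat([torch.cat([O, I], 1), torch.cat([-I, O], 1)], 0)

def symplecticity_defect(phi, x, n):
    """Frobenius norm of J^T Omega J - Omega for the map phi at the point x."""
    J = torch.autograd.functional.jacobian(phi, x)
    Omega = symplectic_form(n)
    return torch.linalg.norm(J.T @ Omega @ J - Omega)

# Example: the shearing (q, p) -> (q, p + q) with n = 1 is symplectic.
phi = lambda z: torch.stack([z[0], z[1] + z[0]])
print(symplecticity_defect(phi, torch.randn(2), n=1))  # ~0 up to rounding
```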

The most essential property of a Hamiltonian system

\begin{cases}\displaystyle\frac{\mathrm{d}q_{i}}{\mathrm{d}t}=\frac{\partial H}{\partial p_{i}},&\\[10.0pt] \displaystyle\frac{\mathrm{d}p_{i}}{\mathrm{d}t}=-\frac{\partial H}{\partial q_{i}},&\end{cases}\quad i=1,2,\cdots,n, (7)

where

H=H(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n},t)\in C^{2}(\mathbb{R}^{2n+1})

is that its flow map defines a family of symplectomorphisms. This means that if we solve (7) from time $t_{0}$ to time $t_{1}$, then the mapping $(q(t_{0}),p(t_{0}))\to(q(t_{1}),p(t_{1}))$ is an $\mathbb{R}^{2n}\to\mathbb{R}^{2n}$ symplectomorphism. The converse is also true: if the flow maps of a differential equation system on $\mathbb{R}^{2n}$ are symplectomorphisms, then there exists a function $H\in C^{2}(\mathbb{R}^{2n+1})$ such that the system can be written as the Hamiltonian system (7).

2.1.1 Example: Shearing

One of the simplest examples of a symplectomorphism comes from the symplectic Euler method for separable Hamiltonians. Suppose $F:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a smooth function; then

\begin{cases}Q_{i}=q_{i}&\\ P_{i}=p_{i}+\frac{\partial F}{\partial q_{i}}(q_{1},\cdots,q_{n})&\end{cases} (8)

is a symplectic transformation, because

\begin{aligned}\sum_{i=1}^{n}\mathrm{d}Q_{i}\land\mathrm{d}P_{i}&=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}\left(p_{i}+\frac{\partial F}{\partial q_{i}}(q_{1},\cdots,q_{n})\right)\\ &=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}+\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}\frac{\partial F}{\partial q_{i}}(q_{1},\cdots,q_{n})\\ &=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}-\mathrm{d}(\mathrm{d}F(q_{1},\cdots,q_{n}))\end{aligned}

and the result comes from the identity $\mathrm{d}(\mathrm{d}F)=0$. Similarly,

\begin{cases}Q_{i}=q_{i}+\frac{\partial G}{\partial p_{i}}(p_{1},\cdots,p_{n})&\\ P_{i}=p_{i}&\end{cases} (9)

is also a symplectomorphism, where $G:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a smooth function. We call the symplectomorphism given by (8) or (9) a symplectic shearing.
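Anticipating Section 3, where $F$ becomes a trainable network, the following is a minimal PyTorch sketch of the q-shearing (8); the class name QShear, the architecture of $F$, and the batch convention (rows of q and p are points) are illustrative assumptions, not part of the construction above.

```python
import torch
import torch.nn as nn

class QShear(nn.Module):
    """The map (8): Q = q, P = p + grad F(q), with F a scalar-valued network."""
    def __init__(self, n, hidden=32):
        super().__init__()
        self.F = nn.Sequential(nn.Linear(n, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, q, p):
        # q, p: tensors of shape (batch, n)
        if not q.requires_grad:
            q = q.clone().requires_grad_(True)   # enable autograd w.r.t. q
        # Row-wise gradient of F: summing over the batch does not mix samples.
        grad_F = torch.autograd.grad(self.F(q).sum(), q, create_graph=True)[0]
        return q, p + grad_F                     # (Q, P)
```

Its symplecticity can be verified numerically with the check sketched after (6).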

2.1.2 Example: Stretching

Another example is the "coordinate stretching" transformation. A diagonal linear transformation on $\mathbb{R}^{2n}$ is symplectic if and only if it has the form

(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n})\mapsto\left(k_{1}q_{1},\cdots,k_{n}q_{n},\frac{p_{1}}{k_{1}},\cdots,\frac{p_{n}}{k_{n}}\right), (10)

where $k_{1},\cdots,k_{n}$ are nonzero constants. Now we generalize, supposing that each $k_{i}$ is a function of the coordinates $q_{1},\cdots,q_{n},p_{1},\cdots,p_{n}$. Then

\begin{aligned}\sum_{i=1}^{n}\mathrm{d}(k_{i}q_{i})\land\mathrm{d}\frac{p_{i}}{k_{i}}&=\sum_{i=1}^{n}(k_{i}\mathrm{d}q_{i}+q_{i}\mathrm{d}k_{i})\land\left(\frac{\mathrm{d}p_{i}}{k_{i}}-\frac{p_{i}\mathrm{d}k_{i}}{k_{i}^{2}}\right)\\ &=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}+\frac{q_{i}}{k_{i}}\mathrm{d}k_{i}\land\mathrm{d}p_{i}-\frac{p_{i}\mathrm{d}q_{i}\land\mathrm{d}k_{i}}{k_{i}}+0\\ &=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}-\frac{q_{i}\mathrm{d}p_{i}+p_{i}\mathrm{d}q_{i}}{k_{i}}\land\mathrm{d}k_{i}\\ &=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}-\frac{\mathrm{d}(p_{i}q_{i})}{k_{i}}\land\mathrm{d}k_{i},\end{aligned} (11)

therefore, a transformation given as (10) is symplectic if and only if the condition

\sum_{i=1}^{n}\frac{\mathrm{d}(p_{i}q_{i})}{k_{i}}\land\mathrm{d}k_{i}=0 (12)

is satisfied. Note that (12) can be written as

\sum_{i=1}^{n}\mathrm{d}(p_{i}q_{i})\land\mathrm{d}\ln|k_{i}|=0,

and according to Poincaré's Lemma, (12) is satisfied if

\sum_{i=1}^{n}\ln|k_{i}|\,\mathrm{d}(p_{i}q_{i})=\mathrm{d}\varphi (13)

for some smooth function $\varphi:\mathbb{R}^{2n}\rightarrow\mathbb{R}$. The condition (13) is satisfied when $\varphi$ can be expressed as

\varphi(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n})=\Phi(p_{1}q_{1},p_{2}q_{2},\cdots,p_{n}q_{n})

for some $\Phi:\mathbb{R}^{n}\rightarrow\mathbb{R}$, and

k_{i}=\pm\mathrm{e}^{\Phi_{i}(p_{1}q_{1},p_{2}q_{2},\cdots,p_{n}q_{n})} (14)

holds, where $\Phi_{i}$ is the partial derivative of $\Phi$ with respect to its $i$-th argument:

\Phi_{i}(x_{1},\cdots,x_{n})=\frac{\partial\Phi}{\partial x_{i}}(x_{1},\cdots,x_{n}). (15)

We call the symplectomorphism given by (10) and (14) a symplectic stretching.
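In the same spirit as the sketch above, the following is a minimal PyTorch sketch of the stretching (10) with $k_{i}$ as in (14), taking the $+$ sign; the class name Stretch and the architecture of $\Phi$ are again illustrative assumptions.

```python
import torch
import torch.nn as nn

class Stretch(nn.Module):
    """The map (10) with k_i = e^{Phi_i(q_1 p_1, ..., q_n p_n)}, cf. (14), + sign."""
    def __init__(self, n, hidden=32):
        super().__init__()
        self.Phi = nn.Sequential(nn.Linear(n, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, q, p):
        z = q * p                                 # entry-wise products q_i p_i
        if not z.requires_grad:
            z.requires_grad_(True)
        grad_Phi = torch.autograd.grad(self.Phi(z).sum(), z, create_graph=True)[0]
        k = torch.exp(grad_Phi)                   # k_i > 0
        return k * q, p / k                       # (k_i q_i, p_i / k_i)
```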

2.2 Real NVP

Real NVP (Real-valued Non-Volume Preserving) [3, 1] is a generative model used for density estimation. Real NVP networks use invertible transformations, allowing us to go back and forth between the original and transformed spaces. The structure of real NVP is as follows. The input and output of the network are both $N$-dimensional vectors. An $N$-dimensional vector

z=(z_{1},z_{2},\cdots,z_{N})

received as the input is partitioned into two parts

z=(\underbrace{z_{1},\cdots,z_{n}}_{A},\underbrace{z_{n+1},\cdots,z_{N}}_{B}):=(z_{A},z_{B}).

A real NVP transformation keeps one of the parts unchanged and performs an entry-wise affine transformation on the other part, whose coefficients are determined by the unchanged part. Specifically, the input $z$ undergoes the following transformation:

\begin{cases}x_{A}=z_{A}&\\ x_{B}=\mathrm{e}^{s(z_{A})}\odot z_{B}+b(z_{A})&\end{cases} (16)

where $s,b:\mathbb{R}^{n}\rightarrow\mathbb{R}^{N-n}$ are two functions, given as neural networks in practice, and the symbol "$\odot$" denotes the Hadamard product (entry-wise product) operator:

(x_{1},\cdots,x_{n})\odot(y_{1},\cdots,y_{n})=(x_{1}y_{1},\cdots,x_{n}y_{n}).

The inverse of this mapping (16) is clear:

\begin{cases}z_{A}=x_{A}&\\ z_{B}=\mathrm{e}^{-s(x_{A})}\odot(x_{B}-b(x_{A})).&\end{cases} (17)

The transformation (16) is often depicted as a diagram like Figure 1.

Figure 1: A diagram of the transformation (16)

The apparent limitation of transformation (16) is that it does not change the part $z_{A}$. This can be quickly fixed by appending another real NVP block that keeps the $x_{B}$ part unchanged:

\begin{cases}y_{A}\leftarrow\mathrm{e}^{\tilde{s}(x_{B})}\odot x_{A}+\tilde{b}(x_{B})&\\ y_{B}\leftarrow x_{B}&\end{cases} (18)

where $\tilde{s},\tilde{b}:\mathbb{R}^{N-n}\rightarrow\mathbb{R}^{n}$ are another two neural network functions, so the composed transformation from $z$ to $y$ given by (16) and (18) does not keep any component unchanged. This can be depicted as a diagram like Figure 2.

Figure 2: A diagram of the composition transformation

Of course, we can stack more layers like this to improve the expressivity of the network.
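For concreteness, the following is a minimal sketch of one coupling block (16) and its inverse (17), assuming PyTorch; the class name Coupling, the network sizes, and the dimensions in the usage example are illustrative choices, not taken from [3].

```python
import torch
import torch.nn as nn

class Coupling(nn.Module):
    """One real NVP block, as in (16): keep one part, rescale and shift the other."""
    def __init__(self, n_keep, n_change, hidden=32):
        super().__init__()
        self.s = nn.Sequential(nn.Linear(n_keep, hidden), nn.Tanh(), nn.Linear(hidden, n_change))
        self.b = nn.Sequential(nn.Linear(n_keep, hidden), nn.Tanh(), nn.Linear(hidden, n_change))

    def forward(self, z_keep, z_change):
        return z_keep, torch.exp(self.s(z_keep)) * z_change + self.b(z_keep)

    def inverse(self, x_keep, x_change):
        return x_keep, torch.exp(-self.s(x_keep)) * (x_change - self.b(x_keep))

# Stacking two blocks with the roles of A and B swapped realizes (16) followed by (18),
# so no component of z is left unchanged.
layer1 = Coupling(n_keep=3, n_change=5)   # keeps z_A (dim n), changes z_B (dim N - n)
layer2 = Coupling(n_keep=5, n_change=3)   # keeps x_B, changes x_A
```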

3 Symplectomorphism Neural Network (SymplectoNet, SpNN)

3.1 Structure

For our goal of building a symplectomorphism neural network, the problem with real NVP is exhibited directly in its name: "NVP" means "non-volume-preserving", while a symplectomorphism has to be volume-preserving. Indeed, to make real NVP volume-preserving (from "real NVP" to "real VP"), there is a quick fix: one only needs to add an extra layer

(s_{1},\cdots,s_{N})\rightarrow(s_{1},\cdots,s_{N})-\overline{s}\,(1,\cdots,1),\quad\overline{s}=\frac{1}{N}\sum_{i=1}^{N}s_{i}

after the output layer of the scaling network $s$, subtracting the average of its outputs so that the overall transformation becomes volume-preserving. Unfortunately, the volume-preserving property alone does not guarantee symplecticity. We need further adjustments.

Indeed, we can decompose (16) into two transformations: a ”stretching”

\begin{cases}\xi_{A}=z_{A}&\\ \xi_{B}=\mathrm{e}^{s(z_{A})}\odot z_{B},&\end{cases} (19)

and a ”shearing”

\begin{cases}x_{A}=\xi_{A}&\\ x_{B}=\xi_{B}+b(\xi_{A}).&\end{cases} (20)

Neither of these two transformations is guaranteed to be symplectic. Nevertheless, we have introduced their symplectic counterparts in the last section: indeed, we can write (8), (9) and (10) (with $k_i$ given by (14) and (15)) in the more compact form

\begin{cases}Q=q&\\ P=p+\nabla F(q),&\end{cases} (21)
\begin{cases}Q=q+\nabla G(p)&\\ P=p,&\end{cases} (22)
\begin{cases}Q=\mathrm{e}^{\nabla\Phi(q\odot p)}\odot q&\\ P=\mathrm{e}^{-\nabla\Phi(q\odot p)}\odot p,&\end{cases} (23)

where $q=(q_{1},\cdots,q_{n})$, $p=(p_{1},\cdots,p_{n})$, $Q=(Q_{1},\cdots,Q_{n})$, $P=(P_{1},\cdots,P_{n})$, and "$\odot$" is the Hadamard product as before. Their correspondence with (19) and (20) is now clear: (21) is exactly (20) when $x_{A}$ and $x_{B}$ have the same dimension and $b$ is the gradient of a function, while (23) is a symmetrized version of (19):

\begin{cases}\xi_{A}=\mathrm{e}^{-s(z_{A}\odot z_{B})}\odot z_{A}&\\ \xi_{B}=\mathrm{e}^{s(z_{A}\odot z_{B})}\odot z_{B},&\end{cases}

with $s$ being the gradient of a function. We denote the transformations defined by (21), (22), (23) by $\operatorname{qSh}_{F}$, $\operatorname{pSh}_{G}$ and $\operatorname{St}_{\Phi}$, shorthand for "q-shearing", "p-shearing" and "stretching", respectively. These become the basic building blocks of the "symplectic version of real NVP" once we take $F$, $G$ and $\Phi$ in these transformations to be trainable neural networks.

Now we have introduced all the basic symplectomorphism building blocks, and a symplectomorphism neural network (SymplectoNet, or even shorter, SpNN) is a neural network designed as an arbitrary finite composition of $\operatorname{qSh}_{F}$, $\operatorname{pSh}_{G}$ and $\operatorname{St}_{\Phi}$, where $F$, $G$ and $\Phi$ are arbitrary neural networks with $n$-dimensional input and one-dimensional output.
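As an illustration of such a composition, the following sketch reuses the hypothetical QShear and Stretch modules sketched in Section 2 and adds the analogous p-shearing; the class names and interface are again illustrative assumptions rather than a prescribed implementation.

```python
import torch.nn as nn

class PShear(QShear):
    """The map (22): Q = q + grad G(p), P = p (the stored potential plays the role of G)."""
    def forward(self, q, p):
        P, Q = super().forward(p, q)   # QShear computes (p, q + grad G(p))
        return Q, P

class SymplectoNet(nn.Module):
    """An arbitrary finite composition of qSh, pSh and St blocks."""
    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)

    def forward(self, q, p):
        for blk in self.blocks:        # apply the blocks in order
            q, p = blk(q, p)
        return q, p

# e.g. the structure pSh_G ∘ St_Phi ∘ qSh_F of Figure 3, for an n-dimensional system:
# net = SymplectoNet([QShear(n), Stretch(n), PShear(n)])
```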

Of course, the expressivity of this network depends on the complexity of the underlying neural networks $F$, $G$ and $\Phi$, and also on the number of building blocks we stack. Indeed, the latter can be even more essential: e.g. if we use fewer than four symplectic shearing blocks, we cannot even cover all the linear symplectomorphisms, no matter how complicated the underlying networks $F$ and $G$ are, because the Jacobian of a shearing transformation is of the form

\begin{pmatrix}I&\\ B&I\end{pmatrix}\quad\text{or}\quad\begin{pmatrix}I&C\\ &I\end{pmatrix},

where $B$, $C$ are symmetric $n\times n$ matrices. Each such matrix has $n(n+1)/2$ degrees of freedom, while $\dim Sp(2n)=n(2n+1)$, which is greater than $3n(n+1)/2$ for $n>1$. This is why we also designed the symplectic stretching layer $\operatorname{St}_{\Phi}$. A good practice is to include the q-shearing, the p-shearing and the symplectic stretching layers in the network at least once each. The simplest such example is a network with structure $\operatorname{pSh}_{G}\circ\operatorname{St}_{\Phi}\circ\operatorname{qSh}_{F}$ (see Figure 3), which is similar to the structure of a real NVP.

Figure 3: The diagram expression of $\operatorname{pSh}_{G}\circ\operatorname{St}_{\Phi}\circ\operatorname{qSh}_{F}$

3.2 SymplectoNet as Invertible Neural Network (INN)

One of the most important features of real NVP is that it is explicitly invertible: one can write out (or, in more technical terms, build the computation graph of) the explicit expression of the neural network function's inverse [5]. Our SymplectoNet is inspired by real NVP, so a natural question is whether the SymplectoNet structure is explicitly invertible like real NVP. Next, we will show that the answer is yes.

Indeed, since the inverse of a composed function $f_{1}\circ f_{2}\circ\cdots\circ f_{k}$ is $f_{k}^{-1}\circ\cdots\circ f_{2}^{-1}\circ f_{1}^{-1}$, we only need to show that the basic building blocks $\operatorname{qSh}_{F}$, $\operatorname{pSh}_{G}$ and $\operatorname{St}_{\Phi}$ are explicitly invertible. The inverses of $\operatorname{qSh}_{F}$ and $\operatorname{pSh}_{G}$ are obvious: (21) is equivalent to

\begin{cases}q=Q&\\ p=P-\nabla F(Q),&\end{cases}

(22) is equivalent to

\begin{cases}q=Q-\nabla G(P)&\\ p=P,&\end{cases}

therefore the inverses of $\operatorname{qSh}_{F}$ and $\operatorname{pSh}_{G}$ are $\operatorname{qSh}_{-F}$ and $\operatorname{pSh}_{-G}$, respectively. Finally, we look at $\operatorname{St}_{\Phi}$. Notice that from (23), we have

Q\odot P=\mathrm{e}^{\nabla\Phi(q\odot p)}\odot q\odot\mathrm{e}^{-\nabla\Phi(q\odot p)}\odot p=q\odot p,

therefore

\begin{cases}q=\mathrm{e}^{-\nabla\Phi(q\odot p)}\odot Q=\mathrm{e}^{-\nabla\Phi(Q\odot P)}\odot Q&\\ p=\mathrm{e}^{\nabla\Phi(q\odot p)}\odot P=\mathrm{e}^{\nabla\Phi(Q\odot P)}\odot P,&\end{cases} (24)

this shows that the inverse of $\operatorname{St}_{\Phi}$ is exactly $\operatorname{St}_{-\Phi}$. In conclusion, we have

\begin{cases}\left(\operatorname{pSh}_{G}\right)^{-1}=\operatorname{pSh}_{-G},&\\ \left(\operatorname{qSh}_{F}\right)^{-1}=\operatorname{qSh}_{-F},&\\ \left(\operatorname{St}_{\Phi}\right)^{-1}=\operatorname{St}_{-\Phi}.&\end{cases} (25)

These results give a neat way of inverting a SymplectoNet, e.g.

\left(\operatorname{pSh}_{G}\circ\operatorname{St}_{\Phi}\circ\operatorname{qSh}_{F}\right)^{-1}=\operatorname{qSh}_{-F}\circ\operatorname{St}_{-\Phi}\circ\operatorname{pSh}_{-G}. (26)

This shows that the inverse of a SymplectoNet is explicitly available.
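This inversion rule translates directly into code. The sketch below assumes that the hypothetical blocks above are each equipped with an inverse(Q, P) method that repeats their forward computation with the potential negated, as in (25) (e.g. returning Q, P - grad_F for QShear); the composed network is then inverted by applying the block inverses in reverse order, as in (26).

```python
class InvertibleSymplectoNet(SymplectoNet):
    def inverse(self, Q, P):
        # (26): invert the composition by inverting each block, in reverse order.
        for blk in reversed(self.blocks):
            Q, P = blk.inverse(Q, P)
        return Q, P

# Round-trip check: net.inverse(*net(q, p)) should reproduce (q, p)
# up to floating-point rounding, since every block is exactly invertible.
```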

4 Extension to a Family of Symplectomorphisms

A natural extension of the symplectomorphism neural network is to include some parameters $\tau_{1},\tau_{2},\cdots,\tau_{K}$ other than the canonical variables as inputs. This can easily be achieved by changing $F(q)$, $G(p)$, $\Phi(z)$ in the basic building blocks $\operatorname{qSh}_{F}$, $\operatorname{pSh}_{G}$ and $\operatorname{St}_{\Phi}$ into $(n+K)$-variable functions $F(q;\tau)$, $G(p;\tau)$, $\Phi(z;\tau)$, where $\tau=(\tau_{1},\cdots,\tau_{K})$, and modifying the blocks given by (21)-(23) into

\begin{cases}Q=q&\\ P=p+\nabla_{q}F(q,\tau),&\end{cases} (27)
\begin{cases}Q=q+\nabla_{p}G(p,\tau)&\\ P=p,&\end{cases} (28)
\begin{cases}Q=\mathrm{e}^{\nabla_{z}\Phi(q\odot p,\tau)}\odot q&\\ P=\mathrm{e}^{-\nabla_{z}\Phi(q\odot p,\tau)}\odot p.&\end{cases} (29)

With this modification, the network receives $(2n+K)$-dimensional vectors

(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n},\tau_{1},\cdots,\tau_{K})

as inputs, while the output dimension is still $2n$; for each fixed $\tau_{1},\cdots,\tau_{K}$, the output vector is a symplectomorphism applied to the canonical part of the input vector, i.e. $(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n})$. Thus, each choice of the parameters $\tau_{1},\cdots,\tau_{K}$ defines a symplectomorphism; in other words, the network defines a continuous family of symplectomorphisms parameterized by $\tau_{1},\cdots,\tau_{K}$. A particularly common situation is $K=1$ with $\tau_{1}=t$ representing the time variable. In this case, the network function can represent the solution of some Hamiltonian equation, and thanks to the symplectic property of the network, there exists a Hamiltonian function

H=H(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n},t)

such that the network function represents exactly the solution of the corresponding Hamiltonian system (7). Nevertheless, it is not guaranteed that the symplectomorphism family parameterized by $t$ forms a one-parameter symplectomorphism group; that is, the corresponding Hamiltonian $H$ may depend explicitly on time, and we do not have a method to exactly cancel this dependency.

By including more parameters (i.e. $K>1$), it is also possible to apply this network to optimal control problems involving Hamiltonian dynamics.
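For illustration, the following is a sketch of the parameterized q-shearing block (27), modifying the hypothetical QShear above so that $F$ also receives the parameters $\tau$; only the gradient with respect to $q$ enters the update, and the class name and architecture remain illustrative assumptions.

```python
import torch
import torch.nn as nn

class ParamQShear(nn.Module):
    """The map (27): Q = q, P = p + grad_q F(q; tau), with tau passed through."""
    def __init__(self, n, k, hidden=32):
        super().__init__()
        self.F = nn.Sequential(nn.Linear(n + k, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, q, p, tau):
        # q, p: (batch, n); tau: (batch, k), e.g. k = 1 with tau = t for time.
        if not q.requires_grad:
            q = q.clone().requires_grad_(True)
        F_val = self.F(torch.cat([q, tau], dim=-1)).sum()
        grad_q = torch.autograd.grad(F_val, q, create_graph=True)[0]  # nabla_q F(q; tau)
        return q, p + grad_q, tau      # tau is returned unchanged
```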

5 Some Preliminary Results

5.1 A Polar Nonlinear Mapping

In this example, we learn the symplectic map

(q,\ p)\rightarrow\left(\sqrt{2q}\cos p,\ \sqrt{2q}\sin p\right)=:(Q,P). (30)

A network with structure

\operatorname{qSh}_{F_{1}}\circ\operatorname{pSh}_{G_{1}}\circ\operatorname{St}_{\Phi}\circ\operatorname{qSh}_{F_{2}}\circ\operatorname{pSh}_{G_{2}},

where $F_{1},G_{1},F_{2},G_{2}$ are $(2,20,10,1)$ dense neural networks and $\Phi$ is a $(2,10,1)$ dense neural network, is used. The loss is the ordinary MSE loss. Adamax with a learning rate of 0.25 is applied, and the learning rate is decayed by a factor of 0.99 every 100 epochs.
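For reference, the following is a training-loop sketch consistent with this setup (MSE loss, Adamax with learning rate 0.25, decay by a factor of 0.99 every 100 epochs, 40,000 epochs); the model net and the data tensors q, p, Q_true, P_true are assumed to be defined (e.g. a SymplectoNet with the layer structure given above), and the use of StepLR is an assumption consistent with the stated decay schedule.

```python
import torch

optimizer = torch.optim.Adamax(net.parameters(), lr=0.25)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.99)
loss_fn = torch.nn.MSELoss()

for epoch in range(40_000):
    optimizer.zero_grad()
    Q_pred, P_pred = net(q, p)                            # forward pass through the SpNN
    loss = loss_fn(Q_pred, Q_true) + loss_fn(P_pred, P_true)
    loss.backward()
    optimizer.step()
    scheduler.step()                                      # lr *= 0.99 every 100 epochs
```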

First, some uniformly random points with

(q,p)\in[0,1]\times[0,1]

are sampled. The training ran for 40,000 epochs, and the loss dropped from $0.3$ to about $10^{-5}$. The fitting result is shown in Figure 4(a), and the loss decay is shown in Figure 4(b).

Figure 4: Numerical experiment results of the symplectomorphism neural network fitting the symplectomorphism (30). (a): The result of (30) with $(q,p)\in[0,1]\times[0,1]$. Blue dots: true data; orange stars: predicted results. Note that most of the error comes from data near $q=0$ because there is a singularity there. (b): The loss decay of (a). (c): The result of (30) with $(q,p)\in[1/2,3/2]\times[0,3\pi/2]$. Blue dots: true data; orange stars: predicted results. (d): The loss decay of (c).

Another numerical experiment, also concerning (30) but with the domain changed to

(q,p)\in\left[\frac{1}{2},\frac{3}{2}\right]\times\left[0,\frac{3\pi}{2}\right],

is conducted. This time, the geometry of the transformation is more complicated. Note that we cannot take $p\in[0,2\pi]$ because this would make the mapping (30) non-injective, while the model is invertible; the model would therefore have difficulty learning the data near the two lines $p=0$ and $p=2\pi$. The training ran for 40,000 epochs, and the loss dropped from $0.3$ to about $10^{-5}$. The fitting result is shown in Figure 4(c), and the loss decay is shown in Figure 4(d). The majority of the error comes from the $p=3\pi/2$ boundary, because the points here are close to the points with $p=0$.

References

  • [1] C. Bishop and H. Bishop, Deep Learning: Foundations and Concepts, Springer International Publishing, 2023.
  • [2] A. da Silva, Lectures on Symplectic Geometry, Lecture Notes in Mathematics, Springer Berlin Heidelberg, 2004.
  • [3] L. Dinh, J. N. Sohl-Dickstein, and S. Bengio, Density estimation using real NVP, arXiv:1605.08803 (2016).
  • [4] H. Goldstein, Classical Mechanics, Addison-Wesley series in physics, Addison-Wesley Publishing Company, 1980.
  • [5] I. Ishikawa, T. Teshima, K. Tojo, K. Oono, M. Ikeda, and M. Sugiyama, Universal approximation property of invertible neural networks, Journal of Machine Learning Research, 24 (2023), pp. 1–68.