Equivariant Manifold Flows
Abstract
Tractably modelling distributions over manifolds has long been an important goal in the natural sciences. Recent work has focused on developing general machine learning models to learn such distributions. However, for many applications these distributions must respect manifold symmetries—a trait which most previous models disregard. In this paper, we lay the theoretical foundations for learning symmetry-invariant distributions on arbitrary manifolds via equivariant manifold flows. We demonstrate the utility of our approach by learning quantum field theory-motivated invariant densities and by correcting meteor impact dataset bias.
1 Introduction

Learning probabilistic models for data has long been the focus of many problems in machine learning and statistics. Though much effort has gone into learning models over Euclidean space [20, 6, 21], less attention has been allocated to learning models over non-Euclidean spaces, despite the fact that many problems require a manifold structure. Density learning over non-Euclidean spaces has applications ranging from quantum field theory in physics [44] to motion estimation in robotics [16] to protein-structure prediction in computational biology [22].
Continuous normalizing flows (CNFs) [6, 21] are powerful generative models for learning structure in complex data due to their tractability and theoretical guarantees. Recent work [29, 30] has extended the framework of continuous normalizing flows to the setting of density learning on Riemannian manifolds. However, for many applications in the natural sciences, this construction is insufficient as it cannot properly model necessary symmetries. For example, such symmetry requirements arise when sampling coupled particle systems in physical chemistry [26] or sampling from $\mathrm{SU}(N)$ (the special unitary group) for use in lattice gauge theories in theoretical physics [3].
More precisely, these symmetries are invariances with respect to action by an isometry subgroup of the underlying manifold. For example, consider the task of learning a density on the sphere that is invariant to rotation around an axis; this is an example of learning an isometry subgroup invariant density (this specific isometry subgroup is known as the isotropy group at a point of the sphere intersecting the axis). For a less trivial example, note that when learning a flow-based sampler on $\mathrm{SU}(N)$ in the context of lattice QFT [3], the learned density must be invariant to conjugation by $\mathrm{SU}(N)$ (see Figure 1 for a density on $\mathrm{SU}(3)$ that exhibits the requisite symmetry).
One might naturally attempt to work with the quotient of the manifold by the relevant isometry subgroup in order to model the invariance. First, note that this structure is not always a manifold, and additional restrictions are needed on the action to ensure the quotient has a manifold structure (in particular, the isometry subgroup action needs to be smooth, free, and proper, so that the quotient is a manifold by the Quotient Manifold Theorem [28]). Assuming the quotient is in fact a manifold, one then asks whether an invariant density may be modelled by learning over this quotient with a general manifold density learning method such as NMODE [29]. Though this seems plausible, it is a problematic approach for several reasons:
1. First, it is often difficult to realize necessary constructs (charts, exponential maps, tangent spaces) on the quotient manifold [28].
2. Second, even if the above constructs can be realized, the quotient manifold often has a boundary, which precludes the use of a manifold CNF. To illustrate this point, consider the simple case of the sphere invariant to rotation about an axis; the quotient manifold is a closed interval, and a CNF would "flow out" on the boundary.
3. Third, even if the quotient is a manifold without boundary for which we have a clear characterization, it may have a discrete structure that induces artifacts in the learned distribution. This is the case for Boyda et al. [3]: the flow construction over the quotient induces abnormalities in the density.
Motivated by the above drawbacks, we design a manifold continuous normalizing flow on the original manifold that maintains the requisite symmetry invariance. Since vanilla manifold CNFs do not maintain said symmetries, we instead construct equivariant manifold flows and show they induce the desired invariance. To construct these flows, we present the first general way of designing equivariant vector fields on manifolds. A summary of our paper’s contributions is as follows:
- We present a general framework and the requisite theory for learning equivariant manifold flows: in our setup, the flows can be learned over arbitrary Riemannian manifolds while explicitly incorporating symmetries inherent to the problem. Moreover, we prove that the equivariant flows we construct can universally approximate distributions on closed manifolds.
- We demonstrate the efficacy of our approach by learning gauge invariant densities over $\mathrm{SU}(2)$ and $\mathrm{SU}(3)$ in the context of quantum field theory. In particular, when applied to the densities in Boyda et al. [3], we adhere more naturally to the target geometry and avoid the unnatural artifacts of the quotient construction.
- We highlight the benefit of incorporating symmetries into manifold flow models by comparing directly against previous general manifold density learning approaches. We show that when a general manifold learning model is not aware of symmetries inherent to the problem, the learned density is of considerably worse quality and violates said symmetries. Prior to our work, there did not exist literature that demonstrated the benefits of incorporating isometry group symmetries for learning flows on manifolds, yet we achieve these benefits, and do so through a novel equivariant vector field construction.
2 Related Work
Our work builds directly on pre-existing manifold normalizing flow models and enables them to leverage inherent symmetries through equivariance. In this section we cover important developments from the relevant fields: manifold normalizing flows and equivariant machine learning.
Normalizing Flows on Manifolds
Normalizing flows on Euclidean space have long been touted as powerful generative models [10, 6, 21]. Similar to GANs [20] and VAEs [24], normalizing flows learn to map samples from a tractable prior density to a target density. However, unlike the aforementioned models, normalizing flows account for changes in volume, enabling exact evaluation of the output probability density. In a rather concrete sense, this makes them theoretically principled. As such, they are ideal candidates for generalization beyond the Euclidean setting, where a careful, theoretically principled modelling approach is necessary.
Motivated by recent developments in geometric deep learning [4], many methods have extended normalizing flows to Riemannian manifolds. Rezende et al. [38] introduced constructions specific to tori and spheres, while Bose et al. [2] introduced constructions for hyperbolic space. Following this work, Lou et al. [29], Mathieu and Nickel [30], Falorsi and Forré [15] concurrently introduced a general construction by extending Neural ODEs [6] to the setting of Riemannian manifolds. Our work takes inspiration from the methods of Lou et al. [29], Mathieu and Nickel [30] and generalizes them further to enable learning that takes into account symmetries of the target density.
Equivariant Machine Learning
Motivated by the observation that many classic neural network architectures incorporate symmetry as an inductive bias, recent work has leveraged symmetries inherent in data through the concept of equivariance [7, 9, 8, 27, 18, 37]. Köhler et al. [26], in particular, used equivariant normalizing flows to enable learning symmetric densities over Euclidean space. The authors note their approach is better suited to density learning in some physical chemistry settings (when compared to general purpose normalizing flows), since they take into account the symmetries of the problem.
Symmetries also appear naturally in the context of learning densities over manifolds. While in many cases symmetry can be a good inductive bias for learning (for example, asteroid impacts on the sphere can be modelled as being approximately invariant to rotation about the Earth’s axis), for certain tasks it is a strict requirement. For example, Boyda et al. [3] introduced equivariant flows on $\mathrm{SU}(N)$ for use in lattice gauge theories, where the modelled distribution must be conjugation invariant. However, beyond conjugation invariant learning on $\mathrm{SU}(N)$ [3], not much other work has been done for learning invariant distributions over manifolds. Our work bridges this gap by introducing the first general equivariant manifold normalizing flow model for arbitrary manifolds and symmetries.
3 Background
In this section, we provide a terse overview of necessary concepts for understanding our paper. In particular, we address fundamental notions from Riemannian geometry as well as the basic set-up of normalizing flows on manifolds. For a more detailed introduction to Riemannian geometry, we refer the reader to textbooks such as Lee [28]; for a review of normalizing flows, see Kobyzev et al. [25].
3.1 Riemannian Geometry
A Riemannian manifold $M$ is an $n$-dimensional manifold equipped with a smooth collection of inner products $\langle \cdot, \cdot \rangle_x$, one for every tangent space $T_xM$. The Riemannian metric induces a distance on the manifold.
A diffeomorphism $\phi : M \to M$ is a differentiable bijection with differentiable inverse. A diffeomorphism $\phi$ is called an isometry if $\langle u, v \rangle_x = \langle D\phi_x u, D\phi_x v \rangle_{\phi(x)}$ for all tangent vectors $u, v \in T_xM$, where $D\phi_x$ is the differential of $\phi$ at $x$. Note that isometries preserve the manifold distance function. The collection of all isometries forms a group, which we call the isometry group $\mathrm{Isom}(M)$ of the manifold $M$.
Riemannian metrics also allow for a natural analogue of gradients on $M$. For a smooth function $f : M \to \mathbb{R}$, we define the Riemannian gradient $\nabla_M f(x)$ to be the unique vector in $T_xM$ such that $\langle \nabla_M f(x), v \rangle_x = D_v f(x)$, the directional derivative of $f$ along $v$, for all $v \in T_xM$.
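To make the gradient definition concrete, here is a small illustrative sketch (ours, not part of the paper's implementation) for the unit sphere embedded in $\mathbb{R}^3$: with the induced metric, the Riemannian gradient is simply the Euclidean gradient projected onto the tangent space.

```python
import torch

def riemannian_grad_sphere(f, x):
    """Riemannian gradient of a scalar function f at a point x on the unit
    sphere: the Euclidean gradient projected onto T_x S^2 = {v : <v, x> = 0}."""
    x = x.detach().requires_grad_(True)
    (egrad,) = torch.autograd.grad(f(x), x)
    return egrad - (egrad @ x) * x  # remove the component normal to the sphere

# Example: the height function f(x) = x_z.
x = torch.tensor([0.6, 0.0, 0.8])
print(riemannian_grad_sphere(lambda p: p[2], x))  # a tangent vector at x
```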
3.2 Normalizing Flows on Manifolds
Manifold Normalizing Flow
Let $M$ be a Riemannian manifold. A normalizing flow on $M$ is a diffeomorphism $f_\theta : M \to M$ (parametrized by $\theta$) that transforms a prior density $\rho$ into a model density $\rho_\theta$. The model density can be computed via the Riemannian change of variables
$$\rho_\theta(x) = \rho\!\left(f_\theta^{-1}(x)\right)\left|\det D f_\theta^{-1}(x)\right|,$$
where the determinant is taken with respect to the volume induced by the Riemannian metric.
Manifold Continuous Normalizing Flow
A manifold continuous normalizing flow with base point $z_0 \in M$ is a curve $\gamma_{z_0} : \mathbb{R} \to M$ that satisfies the manifold ODE
$$\frac{d\gamma_{z_0}(t)}{dt} = X\!\left(\gamma_{z_0}(t), t\right), \qquad \gamma_{z_0}(0) = z_0,$$
where $X(\cdot, t)$ is a time-dependent vector field on $M$.
We define $F_X(z_0, t) := \gamma_{z_0}(t)$ to map any base point $z_0$ to the value of the CNF starting at $z_0$, evaluated at time $t$. The function $F_X$ is known as the (vector field) flow of $X$.
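For intuition, the flow $F_X$ can be approximated numerically with a simple projected Euler scheme, sketched below for $S^2$ (this is only an illustration; the paper relies on the Riemannian CNF solvers of [29, 30] rather than this scheme).

```python
import torch

def flow_on_sphere(X, z0, t1=1.0, steps=200):
    """Approximate F_X(z0, t1) on S^2: integrate dz/dt = X(z, t) with Euler
    steps in the embedding space, retracting onto the sphere after each step."""
    z, dt = z0.clone(), t1 / steps
    for i in range(steps):
        v = X(z, i * dt)
        v = v - (v @ z) * z      # project the dynamics onto the tangent space at z
        z = z + dt * v           # Euler step in the embedding space
        z = z / z.norm()         # retraction back onto the sphere
    return z

# Example: rotation about the z-axis at unit angular speed.
X = lambda z, t: torch.stack([-z[1], z[0], torch.tensor(0.0)])
print(flow_on_sphere(X, torch.tensor([1.0, 0.0, 0.0])))  # ~ (cos 1, sin 1, 0)
```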
3.3 Equivariance and Invariance
Let $G$ be a subgroup of the isometry group of $M$ (an isometry subgroup). We notate the action of an element $g \in G$ on $M$ by the map $\tau_g : M \to M$.
Equivariant and Invariant Functions We say that a function $f : M \to M$ is $G$-equivariant if, for all isometries $\tau_g$, $g \in G$, and all $x \in M$, we have $f(\tau_g(x)) = \tau_g(f(x))$. We say a function $\Phi : M \to \mathbb{R}$ is $G$-invariant if $\Phi(\tau_g(x)) = \Phi(x)$.
Equivariant Vector Fields Let $X(\cdot, t)$ be a time-dependent vector field on a manifold $M$. $X$ is a $G$-equivariant vector field if $X(\tau_g(x), t) = D\tau_g\, X(x, t)$ for all $g \in G$, $x \in M$, and $t$.
Equivariant Flows A flow $F_X$ is $G$-equivariant if it commutes with actions from $G$, i.e. we have $F_X(\tau_g(z), t) = \tau_g\!\left(F_X(z, t)\right)$ for all $g \in G$, $z \in M$, and $t$.
Invariance of Density A density $\rho$ on a manifold $M$ is $G$-invariant if, for all $g \in G$ and $x \in M$, $\rho(\tau_g(x)) = \rho(x)$, where $\tau_g$ is the action of $g$ on $M$.
4 Invariant Densities from Equivariant Flows
Our goal in this section is to describe a tractable way to learn a density over a manifold $M$ that obeys a symmetry given by an isometry subgroup $G$. Since this cannot be done directly and it is not clear how a manifold continuous normalizing flow can be altered to preserve symmetry, we will derive the following implications to yield a tractable solution:
1. $G$-invariant potential $\Rightarrow$ $G$-equivariant vector field (Theorem 1). We show that given a $G$-invariant potential function $\Phi$, the vector field $\nabla_M \Phi$ is $G$-equivariant.
2. $G$-equivariant vector field $\Rightarrow$ $G$-equivariant flow (Theorem 2). We show that a $G$-equivariant vector field $X$ on $M$ uniquely induces a $G$-equivariant flow $F_X$.
3. $G$-equivariant flow $\Rightarrow$ $G$-invariant density (Theorem 3). We show that given a $G$-invariant prior density $\rho$ and a $G$-equivariant flow $F_X$, the resulting flow density $(F_X)_*\rho$ is $G$-invariant.
These are constructed in the same spirit as the theorems in Köhler et al. [26] (which also appeared in Papamakarios et al. [34]), although we note that our results are significantly more general. In addition to extending the domain to Riemannian manifolds, we consider arbitrary symmetry groups, while Köhler et al. [26] only consider linear Lie groups acting on Euclidean space. As a result, our proof techniques are based on heavy geometric machinery instead of straightforward linear algebra techniques.
If we have a prior distribution on the manifold that obeys the requisite invariance, then the above implications show that we can use a $G$-invariant potential to produce a flow that, in tandem with the CNF framework, learns an output density with the desired invariance. We claim that constructing a $G$-invariant potential function on a manifold is far simpler than directly parameterizing a $G$-invariant density or a $G$-equivariant flow. We shall give explicit examples of $G$-invariant potential constructions in Section 5.2 that induce a desired density invariance.
Moreover, we show in Theorem 4 that considering equivariant flows generated from invariant potential functions suffices to learn any smooth distribution over a closed manifold, as measured by Kullback-Leibler divergence.
We defer the proofs of all theorems to the appendix.
4.1 Equivariant Gradient of Potential Function
We start by showing how to construct $G$-equivariant vector fields from $G$-invariant potential functions.
To design an equivariant vector field $X$, it is sufficient to set the dynamics of $X$ to be the Riemannian gradient of some $G$-invariant potential function $\Phi$, i.e. $X = \nabla_M \Phi$. This is formalized in the following theorem.
Theorem 1.
Let $M$ be a Riemannian manifold and $G$ be its group of isometries (or an isometry subgroup). If $\Phi : M \to \mathbb{R}$ is a smooth $G$-invariant function, then for any $g \in G$ the Riemannian gradient commutes with the group action:
$$\nabla_M \Phi \circ \tau_g = D\tau_g \circ \nabla_M \Phi.$$
Hence $\nabla_M \Phi$ is a $G$-equivariant vector field. This condition is also tight in the sense that it only occurs if $G$ is an isometry subgroup.
Hence, as long as one can construct a $G$-invariant potential function, one can obtain the desired equivariant vector field. By this construction, a parameterization of $G$-invariant potential functions yields a parameterization of (some) $G$-equivariant vector fields.
4.2 Constructing Equivariant Manifold Flows from Equivariant Vector Fields
To construct equivariant manifold flows, we will use tools from the theory of manifold ODEs. In particular, there exists a natural correspondence between equivariant flows and equivariant vector fields. We formalize this in the following theorem:
Theorem 2.
Let $M$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). Let $X$ be any time-dependent vector field on $M$, and let $F_X$ be the flow of $X$. Then $X$ is a $G$-equivariant vector field if and only if $F_X$ is a $G$-equivariant flow.
Hence we can obtain an equivariant flow from an equivariant vector field, and vice versa.
4.3 Invariant Manifold Densities from Equivariant Flows
We now show that $G$-equivariant flows induce $G$-invariant densities. Note that we require the group $G$ to be an isometry subgroup in order to control the density of the pushforward, and the following theorem does not hold for general diffeomorphism subgroups.
Theorem 3.
Let $M$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). If $\rho$ is a $G$-invariant density on $M$, and $F : M \to M$ is a $G$-equivariant diffeomorphism, then the pushforward density $F_*\rho$ is also $G$-invariant.
In the context of manifold normalizing flows, Theorem 3 implies that if the prior density on $M$ is $G$-invariant and the flow is $G$-equivariant, the resulting output density will be $G$-invariant. In the context of the overall set-up, this reduces the problem of constructing a $G$-invariant density to the problem of constructing a $G$-invariant potential function.
4.4 Sufficiency of Flows Generated via Invariant Potentials
It is unclear whether equivariant flows induced by invariant potentials can learn arbitrary invariant distributions over manifolds. In particular, it is reasonable to have some concerns about limited expressivity, since it is unclear whether any equivariant flow can be generated in this way. We alleviate these concerns for our use cases by proving that equivariant flows obtained from invariant potential functions suffice to learn any smooth invariant distribution over a closed manifold, as measured by Kullback-Leibler (KL) divergence.
Theorem 4.
Let $M$ be a closed Riemannian manifold. Let $f_1$ be a smooth, non-vanishing distribution over $M$, which will act as our target distribution. Let $f_t$ be a distribution over said manifold parameterized by a real time variable $t \geq 0$, with $f_0$ acting as the initial distribution. Let $\mathrm{KL}(f_t \,\|\, f_1)$ denote the Kullback–Leibler divergence between the distributions $f_t$ and $f_1$. If we choose a potential $\Phi_t$ such that
$$\nabla_M \Phi_t = \nabla_M \log \frac{f_t}{f_1},$$
and if $f_t$ evolves with $t$ as the distribution of a flow according to $-\nabla_M \Phi_t$, it follows that
$$\frac{d}{dt}\,\mathrm{KL}(f_t \,\|\, f_1) = -\int_M f_t \left\| \nabla_M \log \frac{f_t}{f_1} \right\|^2 d\mathrm{vol} \;\leq\; 0,$$
implying convergence of $f_t$ to $f_1$ in KL divergence. Moreover, the exact diffeomorphism that takes us from $f_0$ to $f_1$ is as follows. Given some initial point $z_0 \in M$, let $z(t)$ be the solution to the initial value problem given by
$$\frac{dz(t)}{dt} = -\nabla_M \Phi_t\!\left(z(t)\right), \qquad z(0) = z_0.$$
The desired diffeomorphism maps $z_0$ to $\lim_{t \to \infty} z(t)$.
Hence if the target distribution is $f_1$, the current distribution is $f_t$, and $\Phi_t$ as defined above is the potential from which the flow controlling the evolution of $f_t$ is obtained, then $f_t$ converges to $f_1$ in KL divergence. This means that considering flows generated by invariant potential functions is sufficient to learn any smooth invariant target distribution on a closed manifold (as measured by KL divergence).
5 Learning Invariant Densities with Equivariant Flows
In this section, we discuss implementation details of the methodology given in Section 4. In particular, we describe the equivariant manifold flow model, provide two examples of invariant potential constructions on different manifolds, and discuss how training is performed depending on the target task.
5.1 Equivariant Manifold Flow Model
For our equivariant flow model, we first construct a $G$-invariant potential function $\Phi$ (we show how to construct these potentials in Section 5.2). The equivariant flow model works by using automatic differentiation [35] on $\Phi$ to obtain $\nabla_M \Phi$, using this as the vector field, and integrating in a step-wise fashion over the manifold. Specifically, forward integration and change-in-density (divergence) computations utilize the Riemannian Continuous Normalizing Flows [30] framework. This flow model is used in tandem with a specific training procedure (described in Section 5.3) to obtain a $G$-invariant model density that approximates some target.
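A minimal sketch of this pipeline is given below (our own illustration under the simplifying assumption of a manifold embedded in Euclidean space; the function names are hypothetical, and the resulting vector field would be handed to a Riemannian CNF integrator [30] rather than used directly).

```python
import torch

def vector_field_from_potential(potential, proj_tangent):
    """Turn a G-invariant potential into the vector field X(x, t) = grad_M potential(x, t).

    potential   : callable (x, t) -> scalar, assumed G-invariant
    proj_tangent: callable (x, v) -> projection of v onto the tangent space T_x M
    """
    def X(x, t):
        x = x.detach().requires_grad_(True)
        (egrad,) = torch.autograd.grad(potential(x, t), x)
        return proj_tangent(x, egrad)   # Riemannian gradient on an embedded manifold
    return X

# Example on S^2 with a toy isotropy-invariant potential depending only on z.
proj_sphere = lambda x, v: v - (v @ x) * x
phi = lambda x, t: (1.0 + t) * torch.sin(x[2])
X = vector_field_from_potential(phi, proj_sphere)
print(X(torch.tensor([0.6, 0.0, 0.8]), 0.5))
```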
5.2 Constructing $G$-invariant Potential Functions
In this subsection, we present two constructions of invariant potentials on manifolds. Note that a symmetry of a manifold (i.e. action by an isometry subgroup) will leave part of the manifold free. The core idea of our invariant potential construction is to parameterize a neural network on the free portion of the manifold. While the two constructions we give below are certainly not exhaustive, they illustrate the versatility of our method, which is applicable to general manifolds and symmetries.
5.2.1 Isotropy Invariance on $S^2$
Consider the sphere $S^2 = \{x \in \mathbb{R}^3 : \|x\| = 1\}$, which is the Riemannian manifold with the pullback metric induced from the Euclidean embedding. The isotropy group for a point $p \in S^2$ is defined as the subgroup of the isometry group which fixes $p$, i.e. the set of rotations around the axis that passes through $p$. In practice, we let $p$ be the north pole $(0, 0, 1)$, so the isotropy group is the group of rotations of the $xy$-plane. An isotropy invariant density would be invariant to such rotations, and hence would look like a horizontally-striped density on the sphere (see Figure 4(a)).
Invariant Potential Parameterization
We design an invariant potential by applying a neural network to the free parameter. In the case of our specific isotropy group listed above, the free parameter is the $z$-coordinate. The invariant potential is simply a two-input neural network, with the spatial input being the $z$-coordinate and the time input being the time during integration. As a result of this design, we see that the only variance in the learned distribution that uses this potential will be along the $z$-axis, as desired.
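A sketch of this parameterization (illustrative only; the hidden sizes and exact architecture here are our assumptions, not the reported configuration):

```python
import torch
import torch.nn as nn

class IsotropyInvariantPotential(nn.Module):
    """Potential on S^2 that depends only on the z-coordinate (and time),
    hence is invariant to rotations about the z-axis."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, t):
        # x: (batch, 3) points on the sphere; t: scalar integration time
        z = x[..., 2:3]
        t_col = torch.full_like(z, float(t))
        return self.net(torch.cat([z, t_col], dim=-1)).squeeze(-1)

phi = IsotropyInvariantPotential()
x = torch.randn(4, 3); x = x / x.norm(dim=-1, keepdim=True)
print(phi(x, 0.3).shape)  # torch.Size([4])
```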
Prior Distributions
For proper learning with a normalizing flow, we need a prior distribution on the sphere that respects the isotropy invariance. There are many isotropy invariant densities on the sphere. Natural choices include the uniform density (which is invariant to all rotations) and the wrapped normal distribution centered at $p$ [40, 33]. For our experiments, we use the uniform density.
5.2.2 Conjugation Invariance on $\mathrm{SU}(N)$
For many applications in physics (specifically gauge theory and lattice quantum field theory), one works with the Lie group $\mathrm{SU}(N)$ — the group of $N \times N$ unitary matrices with determinant $1$. In particular, when modelling probability distributions on $\mathrm{SU}(N)$ for lattice QFT, the desired distribution must be invariant under conjugation by $\mathrm{SU}(N)$ [3]. Conjugation is an isometry on $\mathrm{SU}(N)$ (see Appendix A.5), so we can model probability distributions invariant under this action with our developed theory.
Invariant Potential Parameterization
We want to construct a conjugation invariant potential function $\Phi : \mathrm{SU}(N) \to \mathbb{R}$. Note that matrix conjugation preserves eigenvalues. Thus, for a function to be invariant to matrix conjugation, it has to act on the eigenvalues of $U$ as a multi-set.
We can parameterize such potential functions by the DeepSet network from [45]. DeepSet is a permutation invariant neural network that acts on the eigenvalues, so the mapping is $U \mapsto s(\{\lambda_1(U), \dots, \lambda_N(U)\})$ for some set function $s$, where $\lambda_1(U), \dots, \lambda_N(U)$ are the eigenvalues of $U$. We append the integration time to the input of the standard neural network layers in the DeepSet network.
As a result of this design, we see that the only variance in the learned distribution will be amongst non-similar matrices, while all similar matrices will be assigned the same density value.
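The sketch below illustrates this construction (a simplified stand-in for the actual architecture: layer sizes are assumptions, and we use `torch.linalg.eigvals` for brevity, whereas the paper derives closed-form eigendecompositions for $\mathrm{SU}(2)$ and $\mathrm{SU}(3)$ in Appendix B.2).

```python
import torch
import torch.nn as nn

class ConjugationInvariantPotential(nn.Module):
    """DeepSet potential on the eigenvalues of U in SU(N).

    Conjugation preserves the eigenvalue multi-set, and sum-pooling makes the
    network permutation invariant, so the output is conjugation invariant.
    """
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(),
                                     nn.Linear(hidden, hidden))
        self.regressor = nn.Sequential(nn.Linear(hidden + 1, hidden), nn.Tanh(),
                                       nn.Linear(hidden, 1))

    def forward(self, U, t):
        lam = torch.linalg.eigvals(U)                      # (batch, N) complex eigenvalues
        feats = torch.stack([lam.real, lam.imag], dim=-1)  # (batch, N, 2)
        pooled = self.encoder(feats).sum(dim=1)            # permutation-invariant pooling
        t_col = torch.full_like(pooled[:, :1], float(t))   # append integration time
        return self.regressor(torch.cat([pooled, t_col], dim=-1)).squeeze(-1)
```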
Prior Distributions
For the prior distribution of the flow, we need a distribution that respects the matrix conjugation invariance. We use the Haar measure on $\mathrm{SU}(N)$, which is the uniform density over this manifold and is symmetric under the gauge symmetry [3]. The volume element of the Haar measure, for a $U \in \mathrm{SU}(N)$ with eigenvalues $e^{i\theta_1}, \dots, e^{i\theta_N}$, is proportional to $\prod_{i<j} \left|e^{i\theta_i} - e^{i\theta_j}\right|^2$ (see Appendix B.1). We can sample from and compute the log probabilities with respect to this distribution efficiently with standard matrix computations [32].
5.3 Training Paradigms for Equivariant Manifold Flows
There are two notable ways in which we can use the model described in Section 5.1. Namely, we can use it to learn to sample from a distribution for which we have a density function, or we can use it to learn the density given a way to sample from the distribution. These training paradigms are useful in different contexts, as we will see in Section 6.
Learning to sample given an exact density.
In certain settings, we are given an exact density and the task is to learn a tractable sampler for the distribution. For example, in Boyda et al. [3], we are given conjugation-invariant densities on $\mathrm{SU}(N)$ for which we know the exact density function (without knowledge of any normalizing constants). In contrast to procedures for normalizing flow training that use negative log-likelihood based losses, we do not have access to samples from the target distribution. Instead, we train our models by sampling from the Haar distribution on $\mathrm{SU}(N)$, pushing these samples through our flow, computing the KL divergence between the probabilities that our model assigns to the resulting samples and the probabilities of the target distribution evaluated at those samples, and backpropagating through this KL divergence loss. When this loss is minimized, we can sample from the target distribution by sampling the prior, then forwarding the prior samples through our model. In the context of Boyda et al. [3], such a flow-based sampler is important for modelling gauge theories.
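Schematically, one such training step looks as follows (a sketch with hypothetical `flow`, `haar_sample`, `haar_log_prob`, and `log_target` interfaces; the target log-density may be unnormalized, which only shifts the loss by a constant).

```python
def reverse_kl_step(flow, haar_sample, haar_log_prob, log_target, optimizer, batch=512):
    """One step of 'learning to sample given an exact density' (reverse KL).

    Draw prior samples, push them through the flow, and compare the model's
    log-probabilities at the pushed samples to the target log-density there.
    """
    U0 = haar_sample(batch)                 # samples from the Haar prior
    U1, log_det = flow(U0)                  # pushforward and log|det| of the map
    log_q = haar_log_prob(U0) - log_det     # model log-density at U1 (change of variables)
    loss = (log_q - log_target(U1)).mean()  # KL(model || target) up to an additive constant
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```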
Learning the density given a sampler.
In other settings, we are given a way to sample from a target distribution and want to learn the precise density for downstream tasks. For this setting, we sample the target distribution, use our flow to map it to a tractable prior, and use a negative log-likelihood-based loss. The flow will eventually learn to assign higher probabilities in sampled regions, and in doing so, will learn to approximate the target density.
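The corresponding step for this paradigm (again a sketch, with hypothetical `flow_inverse`, `prior_log_prob`, and `sample_target` interfaces):

```python
def nll_step(flow_inverse, prior_log_prob, sample_target, optimizer, batch=512):
    """One step of 'learning the density given a sampler' (maximum likelihood).

    Map target samples back to the prior and maximize their model likelihood.
    """
    x = sample_target(batch)                # samples from the target distribution
    z, log_det = flow_inverse(x)            # inverse flow and log|det| of the inverse map
    log_px = prior_log_prob(z) + log_det    # model log-density at x (change of variables)
    loss = -log_px.mean()                   # negative log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```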
6 Experiments


In this section, we utilize instantiations of equivariant manifold flows to learn densities over various manifolds of interest that are invariant to certain symmetries. First, we construct flows on $\mathrm{SU}(N)$ that are invariant to conjugation by $\mathrm{SU}(N)$; these are useful for lattice quantum field theory [3]. In this setting, our model outperforms the construction of Boyda et al. [3].
As a second application, we model asteroid impacts on Earth by constructing flow models on $S^2$ that are invariant to the isotropy group that fixes the north pole. Our approach is able to overcome dataset bias, as only land impacts are reported in the dataset.
Finally, to demonstrate the need for enforcing equivariance of flow models, we directly compare our flow construction with a general purpose flow while learning a density with an inherent symmetry. The densities we decided to use for this purpose are sphere densities that are invariant to action by the isotropy group. Our model is able to learn these densities much better than previous manifold ODE models that do not enforce equivariance of flows [29], thus showing the ability of our model to leverage the desired symmetries. In fact, even on simple isotropy-invariant densities, our model succeeds while the free model without equivariance fails.
6.1 Gauge Equivariant Neural Network Flows
Learning gauge equivariant neural network flows is important for obtaining good flow-based samplers of densities on $\mathrm{SU}(N)$ useful for lattice quantum field theory [3]. We compare our model for gauge equivariant flows (Section 5.2.2) with that of Boyda et al. [3]. For the sake of staying true to the application area, we follow the framework of Boyda et al. [3] in learning densities on $\mathrm{SU}(N)$ that are invariant to conjugation by $\mathrm{SU}(N)$. In particular, our goal is to learn a flow to model a target distribution so that we may efficiently sample from it.
As mentioned above in Section 5.3, this setting follows the first paradigm in which we are given exact density functions and learn how to sample.
For the actual architecture of our equivariant manifold flows, we parameterize our potentials as DeepSet networks on eigenvalues as detailed in Section 5.2.2. The prior distribution for our model is also the Haar (uniform) distribution on $\mathrm{SU}(N)$. Further training details are given in Appendix C.1.
6.1.1 $\mathrm{SU}(2)$
Figure 2(a) displays learned densities for our model and the model of Boyda et al. [3] in the case of three particular densities on $\mathrm{SU}(2)$ described in Appendix C.2.1. While both models match the target distributions well in high-density regions, we find that our model exhibits a considerable improvement in lower-density regions, where the tails of our learned distribution decay faster. By contrast, the model of Boyda et al. [3] seems unable to reduce mass in these regions, a possible consequence of their construction. Even in high-density regions, our model appears to vary smoothly, with fewer unnecessary bumps and curves when compared to the densities of the model in Boyda et al. [3].
6.1.2 $\mathrm{SU}(3)$
Figure 2(b) displays learned densities for our model and the model of Boyda et al. [3] in the case of three particular densities on $\mathrm{SU}(3)$ described in Appendix C.2.2. In this case, we see that our models fit the target densities more accurately and better respect the geometry of the target distribution. Indeed, while the learned densities of Boyda et al. [3] are often sharp and have pointed corners, our models learn densities that vary smoothly and curve in ways that are representative of the target distributions.

6.2 Asteroid Impact Dataset Bias Correction
We also showcase our model’s ability to correct for dataset bias. In particular, we consider the test case of modelling asteroid impacts on Earth. Towards this end, many preexisting works have compiled locations of previous asteroid impacts [31, 14], but modelling these datasets is challenging since they are inherently biased. In particular, all recorded impacts are found on land. However, ocean impacts are also dangerous [42] and should be properly modelled. To correct for this bias, we note that the distribution of asteroid impacts should be invariant with respect to the rotation of the Earth. We apply our isotropy invariant flow (described in Section 5.2.1) to model the asteroid impact locations given by the Meteorite Landings dataset [31] (released by NASA without a specified license). Training happens in the setting of the second paradigm described in Section 5.3, since we can easily sample the target distribution and aim to learn the density. We visualize our results in Figure 3.
6.3 Modelling Invariance Matters
We also show that our equivariance condition on the manifold flow matters for efficient and accurate training when the target distribution is invariant. In particular, we again consider the sphere under the action of the isotropy group. We try to learn the isotropy invariant density given in Figure 4(a) and compare the results of our equivariant flow against those of a general-purpose manifold flow that does not explicitly model the symmetry [29]. While other manifold flow models have been proposed for the sphere [38], NMODE outperforms them [29], so we use NMODE as a strong baseline. We train both models for 100 epochs with the same batch size and learning rate; our results are shown in Figure 4.
Despite our equivariant flow having fewer parameters (as both flows have the same width and the equivariant flow has a smaller input dimension), our model is able to capture the distribution much better than NMODE [29]. This is due to the inductive bias of our equivariant model, which explicitly leverages the underlying symmetry.
7 Conclusion
In this work, we introduce equivariant manifold flows in a fully general context and provide the necessary theory to ensure our construction is principled. We also demonstrate the efficacy of our approach in the context of learning conjugation invariant densities over $\mathrm{SU}(2)$ and $\mathrm{SU}(3)$, which is an important task for sampling lattice gauge theories in quantum field theory. In particular, we show that our method can more naturally adhere to the geometry of the target densities when compared to prior work while being more generally applicable. We also present an application to modelling asteroid impacts and demonstrate the necessity of modelling existing invariances by comparing against a regular manifold flow.
Further considerations. While our theory and implementations have utility in very general settings, there are still some limitations that could be addressed in future work. Further research may focus on finding other ways to generate equivariant manifold flows that do not rely on the construction of an invariant potential, and perhaps additionally on showing that such methods are sufficiently expressive to learn over open manifolds. Our models also require a fair bit of tuning to achieve results as strong as we demonstrate. Finally, we note that our theory and learning algorithm are too abstract for us to be sure of the future societal impacts. Still, we advance the field of deep generative models, which is known to have potential for negative impacts through malicious generation of fake images and text. Nevertheless, we do not expect this work to have negative effects in this area, as our applications are not in this domain.
Acknowledgements
We would like to thank Facebook AI for funding equipment that made this work possible. In addition, we thank the National Science Foundation for awarding Prof. Christopher de Sa a grant that helps fund this research effort (NSF IIS-2008102) and for supporting Aaron Lou with a graduate student fellowship. We would also like to acknowledge Jonas Köhler and Denis Boyda for their useful insights.
References
- Abel [1824] Niels Henrik Abel. Mémoire sur les équations algébriques, où on demontre l’impossibilité de la résolution de l’équation générale du cinquième dégré. 1824.
- Bose et al. [2020] Joey Bose, Ariella Smofsky, Renjie Liao, Prakash Panangaden, and Will Hamilton. Latent variable modelling with hyperbolic normalizing flows. In Proceedings of the 37th International Conference on Machine Learning, pages 1045–1055, 2020.
- Boyda et al. [2020] Denis Boyda, Gurtej Kanwar, Sébastien Racanière, Danilo Jimenez Rezende, Michael S Albergo, Kyle Cranmer, Daniel C Hackett, and Phiala E Shanahan. Sampling using gauge equivariant flows. arXiv preprint arXiv:2008.05456, 2020.
- Bronstein et al. [2021] Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
- Bump [2004] Daniel Bump. Lie groups. Springer, 2004.
- Chen et al. [2018] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, volume 31, pages 6571–6583, 2018.
- Cohen and Welling [2016] Taco Cohen and Max Welling. Group equivariant convolutional networks. In Proceedings of The 33rd International Conference on Machine Learning, pages 2990–2999, 2016.
- Cohen et al. [2019] Taco Cohen, Maurice Weiler, Berkay Kicanaoglu, and Max Welling. Gauge equivariant convolutional networks and the icosahedral CNN. In Proceedings of the 36th International Conference on Machine Learning, pages 1321–1330, 2019.
- Cohen et al. [2018] Taco S. Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. In International Conference on Learning Representations, 2018.
- Dinh et al. [2017] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. In International Conference on Learning Representations, 2017.
- do Carmo [1992] Manfredo Perdigão do Carmo. Riemannian Geometry. Translated by Francis Flaherty. Birkhäuser, Boston, 1992. ISBN 0817634908.
- Donnelly [2006] Harold G. Donnelly. Eigenfunctions of the Laplacian on compact Riemannian manifolds. Asian Journal of Mathematics, 10:115–126, 2006.
- Durkan et al. [2019] Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural spline flows. In Advances in Neural Information Processing Systems, volume 32, 2019.
- [14] Earth Impact Database. Earth impact database, 2011. Retrieved from http://passc.net/EarthImpactDatabase.
- Falorsi and Forré [2020] Luca Falorsi and Patrick Forré. Neural ordinary differential equations on manifolds. arXiv preprint arXiv:2006.06663, 2020.
- Feiten et al. [2013] Wendelin Feiten, Muriel Lang, and Sandra Hirche. Rigid motion estimation using mixtures of projected gaussians. Proceedings of the 16th International Conference on Information Fusion, pages 1465–1472, 2013.
- Field [1980] MJ Field. Equivariant dynamical systems. Transactions of the American Mathematical Society, 259(1):185–205, 1980.
- Finzi et al. [2020] Marc Finzi, Samuel Stanton, Pavel Izmailov, and Andrew Gordon Wilson. Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data. In International Conference on Machine Learning, pages 3165–3176. PMLR, 2020.
- Gallier and Quaintance [2020] Jean Gallier and Jocelyn Quaintance. Differential Geometry and Lie Groups: A Computational Perspective, volume 12. Springer, 2020.
- Goodfellow et al. [2014] Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, page 2672–2680, 2014.
- Grathwohl et al. [2019] Will Grathwohl, Ricky T Q Chen, Jesse Bettencourt, and David Duvenaud. Scalable reversible generative models with free-form continuous dynamics. In International Conference on Learning Representations, 2019.
- Hamelryck et al. [2006] Thomas Hamelryck, John T Kent, and Anders Krogh. Sampling realistic protein conformations using local structural bias. PLoS Computational Biology, 2(9), 2006.
- Kingma and Ba [2015] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
- Kingma and Welling [2014] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. In International Conference on Learning Representations, 2014.
- Kobyzev et al. [2020] Ivan Kobyzev, Simon Prince, and Marcus Brubaker. Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
- Köhler et al. [2020] Jonas Köhler, Leon Klein, and Frank Noe. Equivariant flows: Exact likelihood generative learning for symmetric densities. In Proceedings of the 37th International Conference on Machine Learning, pages 5361–5370, 2020.
- Kondor and Trivedi [2018] Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In Proceedings of the 35th International Conference on Machine Learning, pages 2747–2755, 2018.
- Lee [2013] John M Lee. Introduction to Smooth Manifolds. Graduate Texts in Mathematics. Springer New York, 2013.
- Lou et al. [2020] Aaron Lou, Derek Lim, Isay Katsman, Leo Huang, Qingxuan Jiang, Ser-Nam Lim, and Christopher De Sa. Neural manifold ordinary differential equations. In Advances in Neural Information Processing Systems, volume 33, pages 17548–17558, 2020.
- Mathieu and Nickel [2020] Emile Mathieu and Maximilian Nickel. Riemannian continuous normalizing flows. In Advances in Neural Information Processing Systems, volume 33, pages 2503–2515, 2020.
- [31] Meteorite Landings. Meteorite landings dataset, March 2017. Retrieved from https://data.world/nasa/meteorite-landings.
- Mezzadri [2007] Francesco Mezzadri. How to generate random matrices from the classical compact groups. Notices of the American Mathematical Society, 54:592–604, 2007.
- Nagano et al. [2019] Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, and Masanori Koyama. A wrapped normal distribution on hyperbolic space for gradient-based learning. In Proceedings of the 36th International Conference on Machine Learning, pages 4693–4702, 2019.
- Papamakarios et al. [2021] George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1–64, 2021.
- Paszke et al. [2017] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zach DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. In Neural Information Processing System Autodiff Workshop, 2017.
- Paszke et al. [2019] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32:8026–8037, 2019.
- Rezende et al. [2019] Danilo Jimenez Rezende, Sébastien Racanière, Irina Higgins, and Peter Toth. Equivariant hamiltonian flows. arXiv preprint arXiv:1909.13739, 2019.
- Rezende et al. [2020] Danilo Jimenez Rezende, George Papamakarios, Sebastien Racaniere, Michael Albergo, Gurtej Kanwar, Phiala Shanahan, and Kyle Cranmer. Normalizing flows on tori and spheres. In Proceedings of the 37th International Conference on Machine Learning, pages 8083–8092, 2020.
- Risken and Frank [1996] Hannes Risken and Till Frank. The Fokker-Planck Equation: Methods of Solution and Applications. Springer Series in Synergetics. Springer Berlin Heidelberg, 1996.
- Skopek et al. [2020] Ondrej Skopek, Octavian-Eugen Ganea, and Gary Bécigneul. Mixed-curvature variational autoencoders. In International Conference on Learning Representations, 2020.
- Wang et al. [2019] Wei Wang, Zheng Dang, Yinlin Hu, Pascal Fua, and Mathieu Salzmann. Backpropagation-friendly eigendecomposition. In Advances in Neural Information Processing Systems, volume 32, 2019.
- Ward and Asphaug [2003] Steven N Ward and Erik Asphaug. Asteroid impact tsunami of 2880 March 16. Geophysical Journal International, 153(3):F6–F10, 2003.
- Wasserman [1969] Arthur G Wasserman. Equivariant differential topology. Topology, 8(2):127–150, 1969.
- Wirnsberger et al. [2020] Peter Wirnsberger, Andrew J Ballard, George Papamakarios, Stuart Abercrombie, Sébastien Racanière, Alexander Pritzel, Danilo Jimenez Rezende, and Charles Blundell. Targeted free energy estimation via learned mappings. The Journal of Chemical Physics, 153(14):144112, 2020.
- Zaheer et al. [2017] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in Neural Information Processing Systems, volume 30, pages 3391–3401, 2017.
Appendix
Appendix A Proof of Theorems
In this section, we restate and prove the theorems in Section 4. These give the theoretical foundations that we use to build our models. Prior work [43, 17] addresses some of the results we formalize below.
A.1 Proof of Theorem 1
Theorem 1.
Let $M$ be a Riemannian manifold and $G$ be its group of isometries (or an isometry subgroup). If $\Phi : M \to \mathbb{R}$ is a smooth $G$-invariant function, then for any $g \in G$ the Riemannian gradient commutes with the group action:
$$\nabla_M \Phi \circ \tau_g = D\tau_g \circ \nabla_M \Phi.$$
This condition is also tight in the sense that it only occurs if $G$ is a group of isometries.
Proof.
We first recall the Riemannian gradient chain rule:
$$\nabla_M (\Phi \circ \tau_g)(x) = D\tau_g^*\, \nabla_M \Phi\!\left(\tau_g(x)\right),$$
where $D\tau_g^*$ is the "adjoint" given by
$$\left\langle D\tau_g^* u,\, v \right\rangle_x = \left\langle u,\, D\tau_g v \right\rangle_{\tau_g(x)} \quad \text{for all } u \in T_{\tau_g(x)}M,\ v \in T_xM.$$
Since $\tau_g$ is an isometry, we also have
$$\left\langle D\tau_g u,\, D\tau_g v \right\rangle_{\tau_g(x)} = \left\langle u,\, v \right\rangle_x.$$
Combining the above two equations gives
$$\left\langle D\tau_g^* D\tau_g u,\, v \right\rangle_x = \left\langle D\tau_g u,\, D\tau_g v \right\rangle_{\tau_g(x)} = \left\langle u,\, v \right\rangle_x,$$
which implies for all $u, v \in T_xM$,
$$\left\langle \left(D\tau_g^* D\tau_g - \mathrm{id}\right) u,\, v \right\rangle_x = 0.$$
Since $\langle \cdot, \cdot \rangle$ is a Riemannian metric (even a pseudo-metric works due to non-degeneracy), we must have that $D\tau_g^* = (D\tau_g)^{-1}$.
To complete the proof, we recall that $\Phi \circ \tau_g = \Phi$, and this combined with the chain rule gives
$$\nabla_M \Phi(x) = \nabla_M (\Phi \circ \tau_g)(x) = (D\tau_g)^{-1}\, \nabla_M \Phi\!\left(\tau_g(x)\right).$$
Now applying $D\tau_g$ on both sides gives
$$D\tau_g\, \nabla_M \Phi(x) = \nabla_M \Phi\!\left(\tau_g(x)\right),$$
which is exactly what we want to show.
We see that this is an "only if" condition because we must necessarily get that the adjoint is the inverse, which implies that $\tau_g$ is an isometry. ∎
A.2 Proof of Theorem 2
Theorem 2.
Let $M$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). Let $X$ be any time-dependent vector field on $M$, and let $F_X$ be the flow of $X$. Then $X$ is a $G$-equivariant vector field if and only if $F_X(\cdot, t)$ is $G$-equivariant for any $t$.
Proof.
$X$ is $G$-equivariant $\Rightarrow$ $F_X$ is $G$-equivariant. We invoke the following lemma from Lee [28, Corollary 9.14]:
Lemma 1.
Let $\phi : M \to N$ be a diffeomorphism. If $X$ is a smooth vector field over $M$ and $F_X$ is the flow of $X$, then the flow of $\phi_* X$ ($\phi_*$ is another notation for the differential of $\phi$, applied to the vector field) is $\phi \circ F_X(\cdot, t) \circ \phi^{-1}$, with domain $\phi(M_t)$ for each $t \in \mathbb{R}$, where $M_t$ is the domain of $F_X(\cdot, t)$.
Examine the pushforward vector field $\tau_{g*} X$ and its action on $M$. Since $X$ is $G$-equivariant, we have for any $x \in M$,
$$(\tau_{g*} X)(x, t) = D\tau_g\, X\!\left(\tau_g^{-1}(x), t\right) = X(x, t),$$
so it follows that $\tau_{g*} X = X$. Applying the lemma above, we get:
$$F_X(\cdot, t) = F_{\tau_{g*} X}(\cdot, t) = \tau_g \circ F_X(\cdot, t) \circ \tau_g^{-1},$$
and, by simplifying, we get that $F_X(\tau_g(x), t) = \tau_g\!\left(F_X(x, t)\right)$, as desired.
$F_X$ is $G$-equivariant $\Rightarrow$ $X$ is $G$-equivariant. This direction follows from the chain rule. If $F_X$ is $G$-equivariant, then at all times $t$ we have:
$$
\begin{aligned}
D\tau_g\, X\!\left(F_X(x, t), t\right) &= D\tau_g\, \frac{d}{dt} F_X(x, t) && \text{(definition)} \\
&= \frac{d}{dt}\, \tau_g\!\left(F_X(x, t)\right) && \text{(chain rule)} \\
&= \frac{d}{dt}\, F_X\!\left(\tau_g(x), t\right) && \text{(equivariance)} \\
&= X\!\left(F_X(\tau_g(x), t), t\right) = X\!\left(\tau_g(F_X(x, t)), t\right). && \text{(definition)}
\end{aligned}
$$
Since every point of $M$ can be written as $F_X(x, t)$ for some $x$, this is exactly the $G$-equivariance of $X$.
This concludes the proof of the backward direction. ∎
A.3 Proof of Theorem 3
Theorem 3.
Let $M$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). If $\rho$ is a $G$-invariant density on $M$, and $F : M \to M$ is a $G$-equivariant diffeomorphism, then the pushforward density $F_*\rho$ is also $G$-invariant.
Proof.
We wish to show that $F_*\rho$ is also $G$-invariant, i.e. $(F_*\rho)(\tau_g(x)) = (F_*\rho)(x)$ for all $g \in G$ and $x \in M$.
We first recall the definition of the pushforward density $F_*\rho$:
$$(F_*\rho)(x) = \rho\!\left(F^{-1}(x)\right)\left|\det D F^{-1}(x)\right|.$$
Since $F$ is $G$-equivariant, we have $F^{-1}(\tau_g(x)) = \tau_g\!\left(F^{-1}(x)\right)$ for any $x \in M$. Also, since $\rho$ is $G$-invariant, we have $\rho(\tau_g(x)) = \rho(x)$. Combining these properties, we see that:
$$
\begin{aligned}
(F_*\rho)(\tau_g(x)) &= \rho\!\left(F^{-1}(\tau_g(x))\right)\left|\det D F^{-1}(\tau_g(x))\right| && \text{(expanding definition of } F_*\rho\text{)} \\
&= \rho\!\left(\tau_g(F^{-1}(x))\right)\left|\det\!\left(D\tau_g\, D F^{-1}(x)\, D\tau_g^{-1}\right)\right| && \text{(expanding the Jacobian via equivariance)} \\
&= \rho\!\left(F^{-1}(x)\right)\left|\det D\tau_g\right|\left|\det D F^{-1}(x)\right|\left|\det D\tau_g^{-1}\right| && \text{(rearrangement and invariance of } \rho\text{)} \\
&= (F_*\rho)(x)\left|\det D\tau_g\right|\left|\det D\tau_g^{-1}\right|. && \text{(expanding definition of } F_*\rho\text{)}
\end{aligned}
$$
Now note that $\tau_g$ is contained in the isometry group, and thus is an isometry. This means $\left|\det D\tau_g\right| = \left|\det D\tau_g^{-1}\right| = 1$ for any $g \in G$, so the right-hand side above is simply $(F_*\rho)(x)$, which proves the theorem. ∎
A.4 Proof of Theorem 4
Theorem 4.
Let $M$ be a closed Riemannian manifold. Let $f_1$ be a smooth, non-vanishing distribution over $M$, which will act as our target distribution. Let $f_t$ be a distribution over said manifold parameterized by a real time variable $t \geq 0$, with $f_0$ acting as the initial distribution. Let $\mathrm{KL}(f_t \,\|\, f_1)$ denote the Kullback–Leibler divergence between the distributions $f_t$ and $f_1$. If we choose a potential $\Phi_t$ such that
$$\nabla_M \Phi_t = \nabla_M \log \frac{f_t}{f_1},$$
and if $f_t$ evolves with $t$ as the distribution of a flow according to $-\nabla_M \Phi_t$, it follows that
$$\frac{d}{dt}\,\mathrm{KL}(f_t \,\|\, f_1) = -\int_M f_t \left\| \nabla_M \log \frac{f_t}{f_1} \right\|^2 d\mathrm{vol} \;\leq\; 0,$$
implying convergence of $f_t$ to $f_1$ in KL divergence. Moreover, the exact diffeomorphism that takes us from $f_0$ to $f_1$ is as follows. Given some initial point $z_0 \in M$, let $z(t)$ be the solution to the initial value problem given by
$$\frac{dz(t)}{dt} = -\nabla_M \Phi_t\!\left(z(t)\right), \qquad z(0) = z_0.$$
The desired diffeomorphism maps $z_0$ to $\lim_{t \to \infty} z(t)$.
Proof.
1) Derivative of $\mathrm{KL}(f_t \,\|\, f_1)$. We start by noting the following: by the Fokker–Planck equation, evolving $f_t$ as the distribution of a flow according to $-\nabla_M \Phi_t$ is equivalent to
$$\frac{\partial f_t}{\partial t} = \mathrm{div}\!\left(f_t\, \nabla_M \Phi_t\right).$$
Please observe that since $f_t$ is defined as being a solution to the Fokker–Planck equation [39], $f_t$ will be a family of densities. In particular, the Fokker–Planck equation describes the time evolution of a probability density function.
Keeping Fokker–Planck in mind, we obtain the following expression for the time derivative of $\mathrm{KL}(f_t \,\|\, f_1)$:
$$\frac{d}{dt}\,\mathrm{KL}(f_t \,\|\, f_1) = \int_M \frac{\partial f_t}{\partial t}\left(\log\frac{f_t}{f_1} + 1\right) d\mathrm{vol} = \int_M \mathrm{div}\!\left(f_t\, \nabla_M \Phi_t\right)\left(\log\frac{f_t}{f_1} + 1\right) d\mathrm{vol} = -\int_M f_t \left\langle \nabla_M \Phi_t,\ \nabla_M \log\frac{f_t}{f_1}\right\rangle d\mathrm{vol},$$
where the final equality follows from the divergence theorem, since the integral of a divergence over a closed manifold is $0$ (this removes the constant term and allows integration by parts on the remaining term). Now if we choose $\Phi_t$ such that
$$\nabla_M \Phi_t = \nabla_M \log\frac{f_t}{f_1},$$
then we have
$$\frac{d}{dt}\,\mathrm{KL}(f_t \,\|\, f_1) = -\int_M f_t \left\|\nabla_M \log\frac{f_t}{f_1}\right\|^2 d\mathrm{vol} \;\leq\; 0.$$
2) Proof of convergence. Consider the initial value problem
$$\frac{dz(t)}{dt} = -\nabla_M \Phi_t\!\left(z(t)\right), \qquad z(0) = z_0,$$
where $\Phi_t$ is defined as above. Note by standard existence and uniqueness results for differential equations on manifolds (for example, see do Carmo [11]) we have the existence of a solution, for all time $t$, to this differential equation with initial value $z_0$.
Now note $\Phi_t$, expressed as a function of the point on $M$, is an invariant potential, the flow of which maps $f_0$ to $f_t$. By the result above, we know the right-hand side of the equation
$$\frac{d}{dt}\,\mathrm{KL}(f_t \,\|\, f_1) = -\int_M f_t \left\|\nabla_M \log\frac{f_t}{f_1}\right\|^2 d\mathrm{vol}$$
must approach $0$ (since the KL divergence cannot continue decreasing at any constant rate, as it must be non-negative). The only way the right-hand side can be $0$ is when $\nabla_M \log\frac{f_t}{f_1} = 0$ everywhere, which can occur only when $f_t = f_1$. This concludes the proof of convergence of $f_t$ to $f_1$ in KL divergence.
3) Showing the diffeomorphism is well-defined. The exact diffeomorphism from $f_0$ to $f_1$ is as follows. Given some initial point $z_0 \in M$, let $z(t)$ be the solution to the initial value problem given by
$$\frac{dz(t)}{dt} = -\nabla_M \Phi_t\!\left(z(t)\right), \qquad z(0) = z_0,$$
where $\Phi_t$ is defined as before. Note $z(t)$ exists and is unique by standard differential equation uniqueness and existence results [11]. We claim the desired diffeomorphism maps $z_0$ to $\lim_{t \to \infty} z(t)$. All that remains is to show (a) convergence to a smooth function at the limit and (b) that equivariance of the diffeomorphism does not break at the limit. We begin by showing this for $f_1$ uniform and finish the proof by extending to general $f_1$.
$f_1$ uniform. For simplicity, we first consider the case where $f_1$ is the uniform (Haar) measure. In this case, the differential equation that $f_t$ obeys reduces to the heat equation, namely:
$$\frac{\partial f_t}{\partial t} = \Delta_M f_t.$$
(a) Please note the following: an important fact that makes harmonic analysis on compact manifolds possible is that the spectrum of the Laplacian on any compact manifold must be discrete, i.e. its eigenvalues are countable and tend to infinity [12]. Also, its eigenvectors must be smooth (intuitively this says harmonic analysis is “nice" on manifolds in the same way that Fourier analysis is nice on the unit circle).
Note also that the Laplacian is Hermitian and negative semidefinite, and moreover that the only eigenfunctions for the eigenvalue $0$ are the constant functions.
Both facts above imply the solution to the above differential equation will just be the sum of several exponentially decaying (in $t$) terms and a constant term, given by the harmonic expansion of the initialization $f_0$.
From here, it follows that the $L^2$ distance between $f_t$ and the constant (uniform) density is just the sum of squares of the coefficients in front of those terms (this is simply the manifold analog of Parseval’s theorem). However, all of those terms are decaying exponentially, so it follows that $f_t$ converges in the $L^2$ norm to the constant density. (If we wanted some other type of convergence, e.g. pointwise convergence, we could get this as well using a similar argument, by analyzing the decay properties of the eigenvalues/eigenvectors of the Laplacian.)
(b) Additionally, note that if the initialization $f_0$ is $G$-invariant, then it is fairly easy to see that all the terms in its harmonic expansion must also be $G$-invariant. As a result, $f_t$ must be $G$-invariant at all times, and must remain $G$-invariant in the limit. Similarly, its flow must be $G$-equivariant.
$f_1$ general. We have shown the desired properties for the case of $f_1$ uniform. However, the general case is entirely analogous, as the modified operator (involving $f_1$) has all the same relevant properties as the Laplacian (it is just generally better known that the Laplacian has these properties).
∎
A.5 Conjugation by $\mathrm{SU}(N)$ is an Isometry
We now prove a lemma that shows that the group action of conjugation by $\mathrm{SU}(N)$ is an isometry subgroup. This implies that Theorems 1 through 3 above can be specialized to the setting of $\mathrm{SU}(N)$.
Lemma 2.
Let $G \cong \mathrm{SU}(N)$ be the group acting on $\mathrm{SU}(N)$ by matrix conjugation, and let each $\tau_V$ for $V \in \mathrm{SU}(N)$ represent the corresponding action of conjugation by $V$. Then $G$ is an isometry subgroup.
Proof.
We first show that the matrix conjugation action of $\mathrm{SU}(N)$ is unitary. For $V \in \mathrm{SU}(N)$, note that the action of conjugation on $U$ is given by $\tau_V(U) = V U V^\dagger$; as a linear map on matrices it is represented by $\bar{V} \otimes V$ acting on $\mathrm{vec}(U)$. We have that $\bar{V} \otimes V$ is unitary because:
$$
\begin{aligned}
\left(\bar{V} \otimes V\right)^\dagger \left(\bar{V} \otimes V\right) &= \left(\bar{V}^\dagger \otimes V^\dagger\right)\left(\bar{V} \otimes V\right) && \text{(conjugate transposes distribute over } \otimes\text{)} \\
&= \left(\bar{V}^\dagger \bar{V}\right) \otimes \left(V^\dagger V\right) && \text{(mixed-product property of } \otimes\text{)} \\
&= I \otimes I = I. && \text{(simplification)}
\end{aligned}
$$
Now choose an orthonormal frame $\{X_i\}$ of $T\,\mathrm{SU}(N)$. Note that $T\,\mathrm{SU}(N)$ locally consists of shifts of the Lie algebra $\mathfrak{su}(N)$, which itself consists of traceless skew-Hermitian matrices [19]. We show $G$ is an isometry subgroup by noting that when it acts on the frame, the resulting frame is orthonormal. Let $V \in \mathrm{SU}(N)$, and consider the result of the action of $\tau_V$ on the frame, namely $\{V X_i V^\dagger\}$. Then we have:
$$\left\langle V X_i V^\dagger,\ V X_j V^\dagger \right\rangle = \mathrm{tr}\!\left(\left(V X_i V^\dagger\right)^\dagger V X_j V^\dagger\right) = \mathrm{tr}\!\left(V X_i^\dagger X_j V^\dagger\right) = \mathrm{tr}\!\left(X_i^\dagger X_j\right) = \left\langle X_i,\ X_j \right\rangle.$$
Note for $i \neq j$, we have $\langle X_i, X_j \rangle = 0$, and for $i = j$ we see $\langle X_i, X_i \rangle = 1$. Hence the resulting frame is orthonormal and $G$ is an isometry subgroup. ∎
Appendix B Manifold Details for the Special Unitary Group
In this section, we give a basic introduction to the special unitary group and relevant properties.
Definition. The special unitary group $\mathrm{SU}(N)$ consists of all $N$-by-$N$ unitary matrices $U$ (i.e. $U^\dagger U = I$, where $U^\dagger$ is the conjugate transpose of $U$) that have determinant $1$.
Note that $\mathrm{SU}(N)$ is a smooth manifold; in particular, it has Lie group structure [19]. Moreover, the tangent space at the identity (i.e. the Lie algebra $\mathfrak{su}(N)$) consists of traceless skew-Hermitian matrices [19]. The Riemannian metric is given by $\langle X, Y \rangle = \mathrm{tr}\!\left(X^\dagger Y\right)$.
B.1 Haar Measure on $\mathrm{SU}(N)$
Haar Measure. Haar measures are generic constructs of measures on topological groups that are invariant under the group operation. For example, the Lie group $\mathrm{SU}(N)$ has Haar measure $\mu$, which is defined as the unique normalized measure such that
$$\mu(V S) = \mu(S V) = \mu(S)$$
for all $V \in \mathrm{SU}(N)$ and all measurable subsets $S \subseteq \mathrm{SU}(N)$.
A topological group together with its unique Haar measure defines a probability space on the group. This gives one natural way of defining probability distributions on the group, explaining its importance in our construction of probability distributions on Lie groups, specifically .
To make the above Haar measure definition more concrete, we note from Bump [5, Proposition 18.4] that we can transform an integral over $\mathrm{SU}(N)$ with respect to the Haar measure into an integral over the corresponding diagonal matrices under eigendecomposition: for a conjugation-invariant $f$,
$$\int_{\mathrm{SU}(N)} f(U)\, d\mu(U) \;\propto\; \int f\!\left(\mathrm{diag}\!\left(e^{i\theta_1}, \dots, e^{i\theta_N}\right)\right) \prod_{i < j} \left|e^{i\theta_i} - e^{i\theta_j}\right|^2 d\theta_1 \cdots d\theta_{N-1},$$
where $\theta_N = -(\theta_1 + \cdots + \theta_{N-1})$ so that the determinant is $1$.
Thus, we can think of the Haar measure as inducing the change of variables $U \mapsto (\theta_1, \dots, \theta_N)$ with volume element proportional to
$$\prod_{i < j} \left|e^{i\theta_i} - e^{i\theta_j}\right|^2.$$
To sample uniformly from the Haar measure, we just need to ensure that we are sampling each tuple of eigenvalue angles $(\theta_1, \dots, \theta_N)$ with probability proportional to $\prod_{i < j} \left|e^{i\theta_i} - e^{i\theta_j}\right|^2$.
Sampling from the Haar Prior. We use Algorithm 1 [32] for generating a sample uniformly from the Haar prior on $\mathrm{SU}(N)$:
1. Sample $Z \in \mathbb{C}^{N \times N}$ where each entry $Z_{jk} = (x_{jk} + i\, y_{jk})/\sqrt{2}$ for independent random variables $x_{jk}, y_{jk} \sim \mathcal{N}(0, 1)$.
2. Let $Z = QR$ be the QR factorization of $Z$.
3. Let $\Lambda = \mathrm{diag}\!\left(R_{11}/|R_{11}|, \dots, R_{NN}/|R_{NN}|\right)$.
4. Output $U = Q\Lambda$ as distributed with Haar measure.
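A numpy sketch of this sampler follows (our own illustration; the final determinant rescaling, which maps the Haar sample on the unitary group down to $\mathrm{SU}(N)$, is an assumption on our part and should be checked against the released code).

```python
import numpy as np

def sample_su_n(n, rng=None):
    """Haar sample: Mezzadri's U(n) recipe, then a determinant rescaling
    (our assumption) to land in SU(n)."""
    rng = np.random.default_rng() if rng is None else rng
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    q = q * (np.diag(r) / np.abs(np.diag(r)))   # fix the phase ambiguity of the QR factorization
    return q / np.linalg.det(q) ** (1.0 / n)    # rescale so the determinant is 1

U = sample_su_n(3)
print(np.allclose(U.conj().T @ U, np.eye(3)), np.isclose(np.linalg.det(U), 1.0))
```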
B.2 Eigendecomposition on $\mathrm{SU}(N)$
One main step in the invariant potential computation for $\mathrm{SU}(N)$ is to derive formulas for the eigendecomposition of $U \in \mathrm{SU}(N)$ as well as formulas for double differentiation through the eigendecomposition (recall that we must differentiate the $G$-invariant potential to get a $G$-equivariant vector field, and differentiate once more to produce gradients to optimize this). During the initial submission of our paper, a general implementation of this for complex matrices did not exist. Furthermore, while various specialized numerical techniques have been developed [41] to perform this computation, the implementation of these was unnecessary for our test cases of $\mathrm{SU}(2)$ and $\mathrm{SU}(3)$. Instead, we derived explicit formulas for the eigenvalues based on finding roots of the characteristic polynomials (given by root formulas for quadratic/cubic equations). Note that this procedure does not scale to higher dimensions, since there does not exist a closed-form root formula for general polynomials of degree five or higher [1]. However, concurrently released versions of PyTorch [36] introduced twice-differentiable complex eigendecomposition, allowing one to easily extend our methods to higher dimensions.
B.2.1 Explicit Formula for $\mathrm{SU}(2)$
We now derive an explicit eigenvalue formula for the $\mathrm{SU}(2)$ case. Let us denote $U = \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix}$ for $a, b \in \mathbb{C}$ such that $|a|^2 + |b|^2 = 1$ as an element of $\mathrm{SU}(2)$; then the characteristic polynomial of this matrix is given by
$$\lambda^2 - (a + \bar{a})\,\lambda + 1 = \lambda^2 - 2\,\mathrm{Re}(a)\,\lambda + 1,$$
and thus its eigenvalues are given by
$$\lambda = \mathrm{Re}(a) \pm i\sqrt{1 - \mathrm{Re}(a)^2}.$$
Remark. We note that there is a natural isomorphism $\mathrm{SU}(2) \cong S^3$, given by
$$\begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} \mapsto \left(\mathrm{Re}\,a,\ \mathrm{Im}\,a,\ \mathrm{Re}\,b,\ \mathrm{Im}\,b\right).$$
We can exploit this isomorphism by learning a flow over $S^3$ with a regular manifold flow like NMODE [29] and mapping it to a flow over $\mathrm{SU}(2)$. This is also an acceptable way to obtain stable density learning over $\mathrm{SU}(2)$.
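A quick numerical check of the closed form above (an illustration only, separate from the differentiable implementation used for training):

```python
import numpy as np

def su2_eigvals(U):
    """Closed-form eigenvalues of U in SU(2): with r = Re(a) = tr(U)/2, the
    characteristic polynomial gives lambda = r +/- i*sqrt(1 - r^2)."""
    r = np.real(np.trace(U)) / 2
    s = np.sqrt(max(1.0 - r**2, 0.0))
    return np.array([r + 1j * s, r - 1j * s])

# Compare against a generic eigensolver on a valid SU(2) element.
a, b = 0.6 + 0.48j, 0.64j
U = np.array([[a, b], [-np.conj(b), np.conj(a)]])
print(np.sort_complex(su2_eigvals(U)))
print(np.sort_complex(np.linalg.eigvals(U)))
```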
B.2.2 Explicit Formula for $\mathrm{SU}(3)$
We now derive an explicit eigenvalue formula for the $\mathrm{SU}(3)$ case. For the case of $U \in \mathrm{SU}(3)$, we can compute the characteristic polynomial as
$$\lambda^3 - c\,\lambda^2 + \bar{c}\,\lambda - 1,$$
where
$$c = \mathrm{tr}(U)$$
(the coefficient of $\lambda$ equals $\bar{c}$ because the eigenvalues lie on the unit circle and multiply to $1$).
Now to solve the equation
$$\lambda^3 - c\,\lambda^2 + \bar{c}\,\lambda - 1 = 0,$$
we first transform it into a depressed cubic
$$w^3 + p\,w + q = 0,$$
where we make the transformation
$$\lambda = w + \frac{c}{3}, \qquad p = \bar{c} - \frac{c^2}{3}, \qquad q = -1 + \frac{c\bar{c}}{3} - \frac{2c^3}{27}.$$
Now from Cardano’s formula, we have the cubic roots of the depressed cubic given by
$$w = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}},$$
where the two cube roots in the above equation are picked such that they multiply to $-p/3$.
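As a sanity check of the characteristic polynomial above (not the twice-differentiable Cardano implementation used in training), one can compare its roots against a generic eigensolver:

```python
import numpy as np

def su3_eigvals_from_charpoly(U):
    """Eigenvalues of U in SU(3) as roots of
    lambda^3 - c*lambda^2 + conj(c)*lambda - 1, where c = tr(U)."""
    c = np.trace(U)
    return np.roots([1.0, -c, np.conj(c), -1.0])

# Build a test element of SU(3) with known eigenvalue angles.
rng = np.random.default_rng(0)
t1, t2 = 0.4, -1.1
D = np.diag(np.exp(1j * np.array([t1, t2, -(t1 + t2)])))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
U = V @ D @ V.conj().T
print(np.sort_complex(su3_eigvals_from_charpoly(U)))
print(np.sort_complex(np.linalg.eigvals(U)))
```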
Appendix C Experimental Details for Learning Equivariant Flows on $\mathrm{SU}(N)$
This section presents some additional details regarding the experiments that learn invariant densities on $\mathrm{SU}(2)$ and $\mathrm{SU}(3)$ in Section 6.
For the evaluation, we found that ESS (effective sample size) was not a good metric to compare learned densities in this context. In particular, we noticed that several degenerate (mode collapsed) densities were able to attain near perfect ESS while completely failing on matching the target distribution geometry. Given that Boyda et al. [3] did not release code and reported ESS only for certain test cases, we decided to exclude ESS as a metric from our paper and instead relied directly on distribution geometry visualization.
C.1 Training Details
Our DeepSet network [45] consists of a feature extractor and a regressor. The feature extractor is a small tanh network; we concatenate the time component to the summed features from the feature extractor before feeding the resulting tensor into a tanh regressor network.
To train our flows, we minimize the KL divergence between our model distribution and the target distribution [34], as is done in Boyda et al. [3]. In a training iteration, we draw a batch of samples uniformly from $\mathrm{SU}(N)$, map them through our flow, and compute the gradients with respect to the batch KL divergence between our model probabilities and the target density probabilities. We use the Adam stochastic optimizer for gradient-based optimization [23]. The models shown in Figure 2 were trained with a multi-step learning rate schedule that decreased the learning rate by a constant factor at regular intervals; the exact iteration counts, batch size, weight decay, and learning rates are given in the accompanying code. We use PyTorch to implement our models and run experiments [36]. Experiments are run on one CPU and/or GPU at a time, where we use one NVIDIA RTX 2080Ti GPU with 11 GB of GPU RAM.
We note that during our implementation, there are specific parts of the code that involved careful tuning for effective training. Specifically, we perturbed the results of certain functions and gradients by small constants to ensure numerical stability of the training process. We also spent some time tuning the learning rate and some ODE settings. More details can be found in the accompanying Github code.
C.2 Conjugation-Invariant Target Distributions
Boyda et al. [3] defined a family of matrix-conjugation-invariant densities on $\mathrm{SU}(N)$ as:
$$p(U) = \frac{1}{Z} \exp\!\left[\frac{\beta}{N}\,\mathrm{Re}\,\mathrm{tr}\!\left(\sum_i c_i\, U^i\right)\right],$$
which is parameterized by the scalars $\beta$ and $c_i$. The normalizing constant $Z$ is chosen to ensure that $p$ is a valid probability density with respect to the Haar measure.
More specifically, the experiments of Boyda et al. [3] focus on learning to sample from the distribution with the above density with three components, in the following form:
$$p(U) \propto \exp\!\left[\frac{\beta}{N}\,\mathrm{Re}\,\mathrm{tr}\!\left(c_1 U + c_2 U^2 + c_3 U^3\right)\right].$$
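For reference, the unnormalized log-density of this three-component family can be evaluated as follows (our own illustrative snippet; the default coefficient values are the first set from the table below):

```python
import numpy as np

def log_target_density(U, coeffs=(0.98, -0.63, -0.21), beta=9.0):
    """Unnormalized log-density (beta / N) * Re tr(c1*U + c2*U^2 + c3*U^3).

    Conjugation invariance follows since tr(V U^i V^{-1}) = tr(U^i).
    """
    n = U.shape[-1]
    poly = sum(c * np.linalg.matrix_power(U, i + 1) for i, c in enumerate(coeffs))
    return beta / n * np.real(np.trace(poly, axis1=-2, axis2=-1))
```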
We tested on three instances of the density, also used in Boyda et al. [3]:
| set | $c_1$ | $c_2$ | $c_3$ | $\beta$ |
|---|---|---|---|---|
| 1 | 0.98 | -0.63 | -0.21 | 9 |
| 2 | 0.17 | -0.65 | 1.22 | 9 |
| 3 | 1 | 0 | 0 | 9 |

Note that the rows of Figure 2 correspond to coefficient sets 1–3, given in order from top to bottom.
C.2.1 Case for $\mathrm{SU}(2)$
In the case of $\mathrm{SU}(2)$, we can represent the eigenvalues of a matrix $U$ in the form $e^{\pm i\theta}$ for some angle $\theta$. We then have $\mathrm{tr}\!\left(U^n\right) = 2\cos(n\theta)$, so the above density takes the form:
$$p(U) \propto \exp\!\left[\beta\left(c_1 \cos\theta + c_2 \cos 2\theta + c_3 \cos 3\theta\right)\right].$$
C.2.2 Case for $\mathrm{SU}(3)$
In the case of $\mathrm{SU}(3)$, we can represent the eigenvalues of $U$ in the form $e^{i\theta_1}, e^{i\theta_2}, e^{-i(\theta_1 + \theta_2)}$. Thus, we have
$$\mathrm{tr}\!\left(U^n\right) = e^{in\theta_1} + e^{in\theta_2} + e^{-in(\theta_1 + \theta_2)},$$
and thus the density takes the form
$$p(U) \propto \exp\!\left[\frac{\beta}{3}\,\mathrm{Re}\sum_{n=1}^{3} c_n\left(e^{in\theta_1} + e^{in\theta_2} + e^{-in(\theta_1 + \theta_2)}\right)\right].$$
Appendix D Learning Continuous Normalizing Flows over Manifolds with Boundary
Motivation. Recall that learning a continuous normalizing flow over a manifold with boundary is not principled, and is rather numerically unstable, since probability mass can "flow out" on the boundary. In particular, we noted in Section 1 that this was a major problem for the quotient manifold approach to learning invariant densities, since the quotient frequently has a nonempty boundary.
Our Approach. Our method enables learning flows over manifolds with boundary. One need only represent the manifold with boundary as a quotient of a larger manifold without boundary and learn with an invariant potential function that ensures the density descends smoothly from the larger manifold without boundary to the manifold with boundary.
Example. For instance, one can use our method to construct a flow over an interval. Notice that we can view an interval as a manifold with boundary; the boundary consists of the two endpoints. To use our method to learn a flow over this interval, we need only represent the interval $[-1, 1]$ as the quotient of $S^2$ by the isotropy group at the north pole, then apply the flow construction described in Section 5.2.1. The learned density assigns the same value to all points at the same latitude: clearly, this descends to a density over $[-1, 1]$ by taking one representative point from each latitude circle. Notice that this works more generally: we can represent various manifolds with boundary as quotients of larger manifolds by isotropy groups. In particular, one can imagine using this method to replace neural spline flows [13], which carefully construct (non-continuous) normalizing flows over intervals.