A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning
Abstract
Symmetry is present throughout nature and continues to play an increasingly central role in physics and machine learning.
Fundamental symmetries, such as Poincaré invariance, allow physical laws discovered in laboratories on Earth to be extrapolated to the farthest reaches of the universe.
Symmetry is essential to achieving this extrapolatory power in machine learning applications.
For example, translation invariance in image classification allows models with fewer parameters, such as convolutional neural networks, to be trained on smaller data sets and achieve state-of-the-art performance.
In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in three ways:
1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates when there is sufficient evidence in the data.
We show that these tasks can be cast within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles.
We extend and unify several existing results by showing that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative.
We also propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation to penalize symmetry breaking during training of machine learning models.
We explain how these ideas can be applied to a wide range of machine learning models including basis function regression, dynamical systems discovery, neural networks, and neural operators acting on fields.
Keywords: Symmetries, machine learning, Lie groups, manifolds, invariance, equivariance, neural networks, deep learning
1 Introduction
Symmetry is present throughout nature, and according to David Gross (1996) the discovery of fundamental symmetries has played an increasingly central role in physics since the beginning of the 20th century. He asserts that
“Einstein’s great advance in 1905 was to put symmetry first, to regard the symmetry principle as the primary feature of nature that constrains the allowable dynamical laws.”
According to Einstein’s special theory of relativity, physical laws including those of electromagnetism and quantum mechanics are Poincaré-invariant, meaning that after predictable transformations (actions of the Poincaré group), these laws can be applied in any non-accelerating reference frame, anywhere in the universe, at all times. Specifically these transformations form a ten-parameter group including four translations of space-time, three rotations of space, and three shifts or “boosts” in velocity. For small boosts of velocity, these transformations become the Galilean symmetries of classical mechanics. Similarly, the theorems of Euclidean geometry are unchanged after arbitrary translations, rotations, and reflections, comprising the Euclidean group. In fluid mechanics, the conformal (angle-preserving) symmetry of Laplace’s equation is used to reduce the study of idealized flows in complicated geometries to canonical flows in simple domains. In dynamical systems, the celebrated theorem of Noether (1918) establishes a correspondence between symmetries and conservation laws, an idea which has become a central pillar of mechanics (Abraham and Marsden, 2008). These examples illustrate the diversity of symmetry groups and their physical applications. More importantly, they illustrate how symmetric models and theories in physics automatically extrapolate in explainable ways to environments beyond the available data.
In machine learning, models that exploit symmetry can be trained with less data and use fewer parameters compared to their asymmetric counterparts. Early examples include augmenting data with known transformations (see Shorten and Khoshgoftaar (2019); Van Dyk and Meng (2001)) or using convolutional neural networks (CNNs) to achieve translation invariance for image processing tasks (see Fukushima (1980); LeCun et al. (1989)). More recently, equivariant neural networks respecting Euclidean symmetries have achieved state-of-the-art performance for predicting potentials in molecular dynamics Batzner et al. (2022). As with physical laws, symmetries and invariances allow machine learning models to extrapolate beyond the training data, and achieve high performance with fewer modeling parameters.
However, many problems are only weakly symmetric. Gravity, friction, and other external forces can cause some or all of the Poincaré or Galilean symmetries to be broken. Interactions between particles can be viewed as breaking symmetries possessed by non-interacting particles. Written characters have translation and scaling symmetry, but not rotation (cf. d and p, N and Z) or reflection (cf. b and d, b and p). One of the main contributions of this work is to propose a method of enforcing a new principle of parsimony in machine learning. This principle of parsimony by maximal symmetry states that a model should break a symmetry within a set of physically reasonable transformations (such as Poincaré, Galilean, Euclidean, or conformal symmetry) only when there is sufficient evidence in the data.
In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in the following three ways:
- Task 1. Enforce. Train a model with known symmetry.
- Task 2. Discover. Identify the symmetries of a given model or data set.
- Task 3. Promote. Train a model with as many symmetries as possible (from among candidates), breaking symmetries only when there is sufficient evidence in the data.
While these tasks have been studied to varying extents separately, we show how they can be situated within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. As a special case, the Lie derivative recovers the linear constraints derived by Finzi et al. (2021) for weights in equivariant multilayer perceptrons. In full generality, we show that known symmetries can be enforced as linear constraints derived from Lie derivatives for a large class of problems including learning vector and tensor fields on manifolds as well as learning equivariant integral operators acting on such fields. For example, the kernels of “steerable” CNNs developed by Weiler et al. (2018); Weiler and Cesa (2019) are constructed to automatically satisfy these constraints for the groups $\mathrm{SO}(3)$ (rotations in three dimensions) and $\mathrm{SE}(2)$ (rotations and translations in two dimensions). We show how analogous steerable networks for other groups, such as subgroups of these, can be constructed by numerically enforcing linear constraints derived from the Lie derivative on integral kernels defining each layer. Symmetries, conservation laws, and symplectic structure can also be enforced when learning dynamical systems via linear constraints on the vector field. Again these constraints come from the Lie derivative and can be incorporated into neural network architectures and basis function regression models such as Sparse Identification of Nonlinear Dynamics (SINDy) (Brunton et al., 2016).
Moskalev et al. (2022) identifies the connected subgroup of symmetries of a trained neural network by computing the nullspace of a linear operator. Likewise, Kaiser et al. (2018, 2021) recovers the symmetries and conservation laws of a dynamical system by computing the nullspace of a different linear operator. We observe that these operators and others whose nullspaces encode the symmetries of more general models can be derived directly from the Lie derivative in a manner dual to the construction of operators used to enforce symmetry. Specifically, the nullspaces of the operators we construct reveal the largest connected subgroups of symmetries for enormous classes of models. This extends work by Gruver et al. (2022) using the Lie derivative to test whether a trained neural network is equivariant with respect to a given one-parameter group, e.g., rotation of images. Generalizing the ideas in Cahill et al. (2023), we also show that the symmetries of point clouds approximating underlying submanifolds can be recovered by computing the nullspaces of associated linear operators. This allows for the unsupervised mining of data for hidden symmetries.
The idea of relaxed symmetry has been introduced recently by Wang et al. (2022), along with architecture-specific symmetry-promoting regularization functions involving sums or integrals over the candidate group of transformations. The Augerino method introduced by Benton et al. (2020) uses regularization to promote equivariance with respect to a larger collection of candidate transformations. Promoting physical constraints through the loss function is also a core concept of Physics-Informed Neural Networks (PINNs) introduced by Raissi et al. (2019). Our approach to the third task (promoting symmetry) is to introduce a unified and broadly applicable class of convex regularization functions based on the Lie derivative to penalize symmetry breaking during training of machine learning models. As we describe above, the Lie derivative yields an operator whose nullspace corresponds to the symmetries of a given model. Hence, the lower the rank of this operator, the more symmetric the model is. The nuclear norm has been used extensively as a convex relaxation of the rank with favorable theoretical properties for compressed sensing and low-rank matrix recovery (Recht et al., 2010; Gross, 2011), as well as in robust PCA (Candès et al., 2011; Bouwmans et al., 2018). Penalizing the nuclear norm of our symmetry-encoding operator yields a convex regularization function that can be added to the loss when training machine learning models, including basis function regression and neural networks. Likewise, we use a nuclear norm penalty to promote conservation laws and Hamiltonicity with respect to candidate symplectic structures when fitting dynamical systems to data. This lets us train the model and enforce data-consistent symmetries simultaneously.
2 Executive summary
This paper provides a linear-algebraic framework to enforce, discover, and promote symmetry of machine learning models. To illustrate, consider a model defined by a function $f : \mathbb{R}^n \to \mathbb{R}^m$. By a symmetry, we mean an invertible transformation $g$ of the input space together with an invertible linear transformation $K_g$ of the output space so that
$f(g \cdot x) = K_g\, f(x) \quad \text{for every } x \in \mathbb{R}^n.$   (1)
Examples to keep in mind are rotations and translations. If $g$ is a symmetry, then so is its inverse $g^{-1}$, and if $h$ is another symmetry, then so is the composition $gh$. We work with collections of transformations $G$, called groups, that have an identity element and are closed under composition and inverses. Specifically, we consider Lie groups.
2.1 Enforcing symmetry
Given a group $G$ of transformations, the symmetry condition (1) imposes linear constraints on $f$ that can be enforced during the fitting process. However, there is one constraint per transformation, making direct enforcement impractical for continuous Lie groups such as rotations or translations. We observe that for smoothly-parametrized, connected groups, it suffices to consider a finite collection of infinitesimal linear constraints $\mathcal{L}_{\xi_1} f = 0, \ldots, \mathcal{L}_{\xi_k} f = 0$, where $\xi_1, \ldots, \xi_k$ are infinitesimal generators of the group and $\mathcal{L}_{\xi} f$ is the “Lie derivative” defined by
$\mathcal{L}_{\xi} f(x) = \frac{\mathrm{d}}{\mathrm{d}t}\Big|_{t=0}\, K_{\exp(t\xi)}^{-1}\, f\big(\exp(t\xi) \cdot x\big).$   (2)
Notice that this expression is linear with respect to $\xi$ and with respect to $f$.
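For readers who want to experiment, the following is a minimal numerical sketch of (2), assuming the group acts on inputs through a matrix exponential and on outputs through a matrix representation; the function names and the rotation-invariance example are illustrative assumptions, not part of the paper.

```python
import numpy as np
from scipy.linalg import expm

def lie_derivative_fd(f, x, xi_in, xi_out, t=1e-6):
    """Central-difference estimate of (2):
    (d/dt)|_{t=0} expm(t*xi_out)^{-1} f(expm(t*xi_in) @ x)."""
    def transformed(s):
        return expm(-s * xi_out) @ f(expm(s * xi_in) @ x)
    return (transformed(t) - transformed(-t)) / (2 * t)

# Example: f(x) = ||x||^2 is invariant under rotations (trivial output action),
# so its Lie derivative along the rotation generator vanishes.
rot_gen = np.array([[0.0, -1.0], [1.0, 0.0]])      # generator of SO(2)
f = lambda x: np.array([x @ x])                    # scalar output, 1x1 "representation"
x0 = np.array([0.7, -1.3])
print(lie_derivative_fd(f, x0, rot_gen, np.zeros((1, 1))))  # ~ [0.]
```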
2.2 Discovering symmetry
The symmetries of a given model $f$ form a subgroup that we seek to identify within a given group $G$ of candidates. For continuous Lie groups of transformations, the component of the subgroup containing the identity is revealed by the nullspace of the linear map
$L_f : \xi \mapsto \mathcal{L}_{\xi} f.$   (3)
More generally, the symmetries of a smooth surface in Euclidean space can be determined from data sampled from this surface by computing the nullspace of a positive-semidefinite operator. When the surface is the graph of the function $f$, this operator is $L_f^* L_f$, with $L_f^*$ being an adjoint of $L_f$.
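A minimal sketch of the discovery procedure under the same linear-action assumptions: sample the Lie derivative of a known function at random points, one column per candidate generator, and read the connected symmetries off the numerical nullspace. The function and generator basis below are illustrative choices.

```python
import numpy as np
from scipy.linalg import expm, null_space

def lie_derivative_fd(f, x, xi, t=1e-6):
    # invariance case: trivial output action, so L_xi f(x) = d/dt f(exp(t xi) x)|_{t=0}
    return (f(expm(t * xi) @ x) - f(expm(-t * xi) @ x)) / (2 * t)

f = lambda x: x @ x                                # rotation-invariant scalar function on R^2
basis = [np.array([[0., -1.], [1., 0.]]),          # candidate generators: so(2) rotation ...
         np.array([[1., 0.], [0., 0.]]),           # ... and two axis scalings in gl(2)
         np.array([[0., 0.], [0., 1.]])]
xs = np.random.default_rng(0).normal(size=(50, 2)) # sample points

# Column j holds samples of L_{xi_j} f; the nullspace of this matrix spans the
# generators of continuous symmetries of f within span(basis).
L = np.array([[lie_derivative_fd(f, x, xi) for xi in basis] for x in xs])
print(null_space(L, rcond=1e-6))   # ~ +/-[1, 0, 0]^T: only the rotation generator survives
```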
2.3 Promoting symmetry
Here, we seek to learn a model that both fits data and possesses as many symmetries as possible from a given candidate group of transformations. Since the nullspace of the operator $L_f$ defined in (3) corresponds with the symmetries of $f$, we seek to minimize the rank of $L_f$ during the training process. To do this, we regularize optimization problems for $f$ using a convex relaxation of the rank given by the nuclear norm (sum of singular values)
$R(f) = \big\| L_f \big\|_* = \sum_{i} \sigma_i(L_f).$   (4)
This is convex with respect to $f$ because $f \mapsto L_f$ is linear and the nuclear norm is convex.
3 Related work
3.1 Enforcing symmetry
Data-augmentation, as reviewed by Shorten and Khoshgoftaar (2019); Van Dyk and Meng (2001), is one of the simplest ways to incorporate known symmetry into machine learning models. Usually this entails training a neural network architecture on training data to which known transformations have been applied. The theoretical foundations of these methods are explored by Chen et al. (2020). Data-augmentation has also been used by Benton et al. (2020) to construct equivariant neural networks by averaging the network’s output over transformations applied to the data. Finally, Brandstetter et al. (2022) applied data-augmentation strategies with known Lie-point symmetries for improving neural PDE solvers.
Symmetry can also be enforced directly on the machine learning architecture. For example, Convolutional Neural Networks (CNNs), introduced by Fukushima (1980) and popularized by LeCun et al. (1989), achieve translational equivariance by employing convolutional filters with trainable kernels in each layer. CNNs have been generalized to provide equivariance with respect to symmetry groups other than translation. Group-Equivariant CNNs (G-CNNs) (Cohen and Welling, 2016) provide equivariance with respect to arbitrary discrete groups generated by translations, reflections, and rotations. Rotational equivariance can be enforced on three-dimensional scalar, vector, or tensor fields using the 3D Steerable CNNs developed by Weiler et al. (2018). Spherical CNNs (Cohen et al., 2018; Esteves et al., 2018) allow for rotation-equivariant maps to be learned for fields (such as projected images of 3D objects) on spheres. Essentially any group equivariant linear map (defining a layer of an equivariant neural network) acting on fields can be described by group convolution (Kondor and Trivedi, 2018; Cohen et al., 2019), with the spaces of appropriate convolution kernels characterized by Cohen et al. (2019). Finzi et al. (2020) provides a practical way to construct convolutional layers that are equivariant with respect to arbitrary Lie groups and for general data types. For dynamical systems, Marsden and Ratiu (1999); Rowley et al. (2003); Abraham and Marsden (2008) describe techniques for symmetry reduction of the original problem to a quotient space where the known symmetry group has been factored out. Related approaches have been used by Peitz et al. (2023); Steyert (2022) to approximate Koopman operators for symmetric dynamical systems (see Koopman (1931); Mezić (2005); Mauroy et al. (2020); Otto and Rowley (2021); Brunton et al. (2022)).
A general method for constructing equivariant neural networks is introduced by Finzi et al. (2021), and relies on the observation that equivariance can be enforced through a set of linear constraints. For graph neural networks, Maron et al. (2018) characterizes the subspaces of linear layers satisfying permutation equivariance. Similarly, Ahmadi and Khadir (2020) shows that discrete symmetries and other types of side information can be enforced via linear or convex constraints in learning problems for dynamical systems. Our work builds on the results of Finzi et al. (2021), Weiler et al. (2018), Cohen et al. (2019), and Ahmadi and Khadir (2020) by showing that equivariance can be enforced in a systematic and unified way via linear constraints for large classes of functions and neural networks. Concurrent work by Yang et al. (2024) shows how to enforce known or discovered Lie group symmetries on latent dynamics using hard linear constraints or soft penalties.
3.2 Discovering symmetry
Early work by Rao and Ruderman (1999); Miao and Rao (2007) used nonlinear optimization to learn infinitesimal generators describing transformations between images. Later, it was recognized by Cahill et al. (2023) that linear algebraic methods could be used to uncover the generators of continuous linear symmetries of arbitrary point clouds in Euclidean space. Similarly, Kaiser et al. (2018) and Moskalev et al. (2022) show how conserved quantities of dynamical systems and invariances of trained neural networks can be revealed by computing the nullspaces of associated linear operators. We connect these linear-algebraic methods to the Lie derivative, and provide generalizations to nonlinear group actions on manifolds. The Lie derivative has been used by Gruver et al. (2022) to quantify the extent to which a trained network is equivariant with respect to a given one-parameter subgroup of transformations. Our results show how the Lie derivative can reveal the entire connected subgroup of symmetries of a trained model via symmetric eigendecomposition.
More sophisticated nonlinear optimization techniques use Generative Adversarial Networks (GANs) to learn the transformations that leave a data distribution unchanged. These methods include SymmetryGAN developed by Desai et al. (2022) and LieGAN developed by Yang et al. (2023b). In contrast, our methods for detecting symmetry are entirely linear-algebraic.
While symmetries may exist in data, their representation may be difficult to describe. Yang et al. (2023a) develop Latent LieGAN (LaLieGAN) to extend LieGAN to find linear representations of symmetries in a latent space. Recently this has been applied to dynamics discovery (Yang et al., 2024). Likewise, Liu and Tegmark (2022) discover hidden symmetries by optimizing nonlinear transformations into spaces where candidate symmetries hold. Similar to our approach for promoting symmetry, they use a cost function to measure whether a given symmetry holds. In contrast, our regularization functions enable subgroups of candidate symmetry groups to be identified.
3.3 Promoting symmetry
Biasing a network towards increased symmetry can be accomplished through methods such as symmetry regularization. Analogous to the physics-informed loss developed in PINNs by Raissi et al. (2019) that penalizes a solution for violating known dynamics, one can penalize symmetry violation for a known group; for example, Akhound-Sadegh et al. (2024) extends the PINN framework to penalize deviations from known Lie-point symmetries of a PDE. More generally, however, one can consider a candidate group of symmetries and promote as much symmetry as possible that is consistent with the available data. Wang et al. (2022) discusses these approaches, along with architecture-specific methods, including regularization functions involving summations or integrals over the candidate group of symmetries. While our regularization functions resemble these for discrete groups, we use a radically different regularization for continuous Lie groups. By leveraging the Lie algebra, our regularization functions eliminate the need to numerically integrate complicated functions over the group—a task that is already prohibitive for the ten-dimensional non-compact group of Galilean symmetries in classical mechanics.
Automated data augmentation techniques introduced by Cubuk et al. (2019); Hataya et al. (2020); Benton et al. (2020) are another class of methods that arguably promote symmetry. These techniques optimize the distribution of transformations applied to augment the data during training. For example “Augerino” is an elegant method developed by Benton et al. (2020) which averages an arbitrary network’s output over the augmentation distribution and relies on regularization to prevent the distribution of transformations from becoming concentrated near the identity. In essence, the regularization biases the averaged network towards increased symmetry.
In contrast, our regularization functions promote symmetry on an architectural level for the original network. This eliminates the need to perform averaging, which grows more costly for larger collections of symmetries. While a distribution over symmetries can be useful for learning interesting partial symmetries (e.g., a 6 stays a 6 for small rotations, before turning into a 9), as is done by Benton et al. (2020) and Romero and Lohit (2022), it is not clear how to use a continuous distribution over transformations to identify lower-dimensional subgroups which have measure zero. On the other hand, our linear-algebraic approach easily identifies and promotes symmetries in lower-dimensional connected subgroups.
3.4 Additional approaches and applications
There are several other approaches that incorporate various aspects of enforcing, discovering, and promoting symmetries. For example, Baddoo et al. (2023) developed algorithms to enforce and promote known symmetries in dynamic mode decomposition, through manifold constrained learning and regularization, respectively. Baddoo et al. (2023) also showed that discovering unknown symmetries is a dual problem to enforcing symmetry. Exploiting symmetry has also been a central theme in the reduced-order modeling of fluids for decades (Holmes et al., 2012). As machine learning methods are becoming widely used to develop these models (Brunton et al., 2020), the themes of enforcing and discovering symmetries in machine learning models are increasingly relevant. Known fluid symmetries have been enforced in SINDy for fluid systems (Loiseau and Brunton, 2018) through linear equality constraints; this approach was generalized to enforce more complex constraints (Champion et al., 2020). Unknown symmetries were similarly uncovered for electroconvective flows (Guan et al., 2021). Symmetry breaking is also important in many turbulent flows (Callaham et al., 2022).
4 Elementary theory of Lie group actions
This section provides background and notation required to understand the main results of this paper in the less abstract, but still remarkably useful setting of Lie groups acting on vector spaces. In Section 5 we use this theory to study the symmetries of continuously differentiable functions between vector spaces. Such functions form the basic building blocks of many machine learning models such as basis function regression models, the layers of multilayer perceptrons, and the kernels of integral operators acting on spatial fields such as images. We emphasize that this is not the most general setting for our results, but we provide this section and simpler versions of our main theorems in Section 5 in order to make the presentation more accessible. We develop our main results in the more general setting of fiber-linear Lie group actions on sections of vector bundles in Section 11.
4.1 Lie groups and subgroups
Lie groups are ubiquitous in science and engineering. Some familiar examples include the general linear group $\mathrm{GL}(d)$ consisting of all real, invertible $d \times d$ matrices; the orthogonal group
$\mathrm{O}(d) = \big\{ Q \in \mathbb{R}^{d \times d} : Q^{\top} Q = I \big\};$   (5)
and the special Euclidean group
$\mathrm{SE}(d) = \big\{ (R, v) : R \in \mathrm{SO}(d),\ v \in \mathbb{R}^{d} \big\},$   (6)
representing rotations and translations $x \mapsto Rx + v$ in real $d$-dimensional space, $\mathbb{R}^d$, embedded in $\mathrm{GL}(d+1)$ via $(R, v) \mapsto \left[\begin{smallmatrix} R & v \\ 0 & 1 \end{smallmatrix}\right]$. Observe that the sets $\mathrm{GL}(d)$, $\mathrm{O}(d)$, and $\mathrm{SE}(d)$ contain the identity matrix and are closed under matrix multiplication and inversion, making them into (non-commutative) groups. They are also smooth manifolds, which makes them Lie groups (Lee, 2013). In general, a Lie group is a smooth manifold that is simultaneously an algebraic group whose composition and inversion operations are smooth maps. The identity element is usually denoted $e$ for “einselement”, which for a matrix Lie group is the identity matrix $I$. This section summarizes some basic results that can be found in references such as (Abraham et al., 1988; Lee, 2013; Varadarajan, 1984; Hall, 2015).
The most useful and profound property of a Lie group is the fact that the group is almost entirely characterized by an associated vector space called its Lie algebra. This allows global nonlinear questions about the group — such as which elements leave a function unchanged — to be answered using linear algebra. If $G$ is a Lie group, its Lie algebra, commonly denoted $\mathrm{Lie}(G)$ or $\mathfrak{g}$, is the vector space consisting of all smooth vector fields on $G$ that remain invariant when pushed forward by left translation $h \mapsto gh$. Translating back and forth from the identity element, the Lie algebra can be identified with the tangent space $T_e G$. For example, the Lie algebra of the orthogonal group consists of all skew-symmetric matrices, and is denoted
$\mathfrak{so}(d) = \big\{ \Omega \in \mathbb{R}^{d \times d} : \Omega^{\top} = -\Omega \big\}.$   (7)
A key fact is that the Lie algebra of $G$ is closed under the “Lie bracket” of vector fields¹
$[\xi, \eta]\, h = \xi(\eta\, h) - \eta(\xi\, h), \qquad h \in C^{\infty}(G).$   (8)
For matrix Lie groups, this corresponds to the commutator of matrices $[\xi, \eta] = \xi\eta - \eta\xi$, as shown by Theorem 3.20 in Hall (2015).
¹ In $\mathbb{R}^n$ a vector field $v$ is equivalent to a directional derivative operator $h \mapsto \nabla h \cdot v$. A vector field on a smooth manifold is defined as an analogous linear operator acting on the space of smooth functions. The Lie bracket is defined as the commutator of these operators. See Lee (2013).
The key tool relating global properties of a Lie group back to its Lie algebra is the exponential map $\exp : \mathfrak{g} \to G$. A vector field $\xi \in \mathfrak{g}$ has a unique integral curve $\gamma_{\xi} : \mathbb{R} \to G$ passing through the identity, satisfying $\gamma_{\xi}(0) = e$ and $\gamma_{\xi}'(t) = \xi\big(\gamma_{\xi}(t)\big)$. The exponential map defined by
$\exp(\xi) = \gamma_{\xi}(1)$   (9)
reproduces the entire integral curve via $\gamma_{\xi}(t) = \exp(t\xi)$ thanks to Proposition 20.5 in Lee (2013). Such an exponential curve is illustrated in Figure 1. For a matrix Lie group $G$ and $\xi \in \mathfrak{g}$, the exponential map is given by the matrix exponential
$\exp(\xi) = \sum_{k=0}^{\infty} \frac{1}{k!}\, \xi^{k}.$   (10)
Proposition 20.8 in (Lee, 2013) provides many of the basic properties of the exponential map, such as $\exp(0) = e$, $\exp(-\xi) = \exp(\xi)^{-1}$, and $\exp\big((s+t)\xi\big) = \exp(s\xi)\exp(t\xi)$. Perhaps the most important is that it provides a diffeomorphism between an open neighborhood of the origin in $\mathfrak{g}$ and an open neighborhood of the identity element in $G$.
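These properties are easy to verify numerically for a matrix Lie group; the snippet below is an illustrative check using SciPy's matrix exponential for a skew-symmetric generator of $\mathrm{SO}(2)$ (the specific generator and numbers are assumptions for the example).

```python
import numpy as np
from scipy.linalg import expm

xi = np.array([[0.0, -1.0], [1.0, 0.0]])   # skew-symmetric: an element of so(2)
R = expm(0.4 * xi)                          # exp maps the Lie algebra into the group

print(np.allclose(R.T @ R, np.eye(2)))                   # True: exp(xi) is orthogonal
print(np.allclose(expm(-0.4 * xi), np.linalg.inv(R)))    # True: exp(-xi) = exp(xi)^{-1}
print(np.allclose(expm(0.0 * xi), np.eye(2)))            # True: exp(0) = identity
```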
The connected component of $G$ containing the identity element is called the “identity component” of the Lie group and is denoted $G_0$. Any element $g$ in this component can then be expressed as a finite product of exponentials thanks to Proposition 7.14 in (Lee, 2013), that is
$g = \exp(\xi_1)\exp(\xi_2)\cdots\exp(\xi_k), \qquad \xi_1, \ldots, \xi_k \in \mathfrak{g}.$   (11)
Moreover, the identity component is a normal subgroup of $G$ and all of the other connected components of $G$ are diffeomorphic cosets of $G_0$ (Proposition 7.15 in Lee (2013)), as we illustrate in Figure 1. For example, the special Euclidean group $\mathrm{SE}(d)$ is connected, and thus equal to its identity component. On the other hand, the orthogonal group $\mathrm{O}(d)$ is compact and has two components consisting of orthogonal matrices whose determinants are $+1$ and $-1$. The identity component of the orthogonal group is called the special orthogonal group and is denoted $\mathrm{SO}(d)$. It is a general fact that when a Lie group such as $\mathrm{SO}(d)$ is connected and compact, it is equal to the image of the exponential map without the need to consider products of exponentials, see Tao (2011) and Appendix C.1 of Lezcano-Casado and Martínez-Rubio (2019).
A subgroup $H$ of a Lie group $G$ is called a “Lie subgroup” when $H$ is an immersed submanifold of $G$ and the group operations are smooth when restricted to $H$. An immersed submanifold does not necessarily inherit its topology as a subset of $G$, but rather has a topology and smooth structure such that the inclusion $H \hookrightarrow G$ is smooth and its derivative is injective (see Lee (2013)). The tangent space to a Lie subgroup at the identity, defined as $\mathfrak{h} = T_e H$, is closed under the Lie bracket and thus forms a “Lie subalgebra” of $\mathfrak{g}$, denoted $\mathfrak{h} \subseteq \mathfrak{g}$. Conversely, a remarkable result stated by Theorem 19.26 in Lee (2013) shows that any subalgebra $\mathfrak{h} \subseteq \mathfrak{g}$, that is, any subspace closed under the Lie bracket, corresponds to a unique connected Lie subgroup $H \subseteq G$ satisfying $T_e H = \mathfrak{h}$. Later on, we will use this fact to identify the connected subgroups of symmetries of machine learning models based on infinitesimal criteria. Another remarkable and useful fact is the “closed subgroup theorem” stated as Theorem 20.12 in Lee (2013). It says that if $H \subseteq G$ is a closed subset and is closed under the group operations of $G$, then $H$ is automatically an embedded Lie subgroup of $G$. Interestingly, while a Lie subgroup need not be a closed subset, it turns out that any Lie subgroup can always be embedded as a closed subgroup in a larger Lie group, thanks to Theorem 9 in Gotô (1950).
4.2 Group representations, actions, and infinitesimal generators
A Lie group homomorphism $\Phi : G \to H$ is a smooth map between Lie groups that respects the group product, that is,
$\Phi(g_1 g_2) = \Phi(g_1)\, \Phi(g_2).$   (12)
The tangent map $\phi = \mathrm{d}\Phi_e : \mathfrak{g} \to \mathfrak{h}$ is a Lie algebra homomorphism by Theorem 8.44 in Lee (2013), meaning that it is a linear map respecting the Lie bracket:
$\phi\big([\xi, \eta]\big) = \big[\phi(\xi), \phi(\eta)\big].$   (13)
Moreover, (see Proposition 20.8 in Lee (2013)) the Lie group homomorphism and its induced Lie algebra homomorphism are related by the exponential maps on $G$ and $H$ via the identity
$\Phi\big(\exp(\xi)\big) = \exp\big(\phi(\xi)\big).$   (14)
Another fundamental result (Theorem 20.19 in Lee (2013)) is that any Lie algebra homomorphism $\phi : \mathfrak{g} \to \mathfrak{h}$ corresponds to a unique Lie group homomorphism $\Phi : G \to H$ when $G$ is simply connected. When $H$ is the general linear group $\mathrm{GL}(V)$ of a vector space $V$, then the Lie group and Lie algebra homomorphisms are called Lie group and Lie algebra “representations”.
A Lie group $G$ can act on a vector space $V$ via a representation $\rho : G \to \mathrm{GL}(V)$ according to
$x \cdot g = \rho(g)^{-1}\, x,$   (15)
with $x \in V$ and $g \in G$. More generally, a nonlinear right action of a Lie group $G$ on a manifold $\mathcal{M}$ is any smooth map $(x, g) \mapsto x \cdot g = \theta_g(x)$ satisfying
$(x \cdot g_1) \cdot g_2 = x \cdot (g_1 g_2) \quad \text{and} \quad x \cdot e = x$   (16)
for every $x \in \mathcal{M}$ and $g_1, g_2 \in G$. Figure 1 depicts the action of a Lie group on a manifold. We make frequent use of the maps $\theta_g : x \mapsto x \cdot g$, which have smooth inverses $\theta_{g^{-1}}$, and the “orbit maps” $g \mapsto x \cdot g$ for fixed $x$. For example, using a representation of the special Euclidean group, the position and velocity of a particle in $\mathbb{R}^3$ can be rotated and translated via the action of the group on the position-velocity pair. The positions and velocities of many particles arranged as a single vector can be simultaneously rotated and translated via an analogous representation.
The key fact about a group action is that it is almost completely characterized by a linear map $\xi \mapsto \hat{\xi}$ called the infinitesimal generator. This map assigns to each element $\xi$ in the Lie algebra a vector field $\hat{\xi}$ on $\mathcal{M}$ defined by
$\hat{\xi}(x) = \frac{\mathrm{d}}{\mathrm{d}t}\Big|_{t=0}\, x \cdot \exp(t\xi).$   (17)
The infinitesimal generator and its relation to the group action are illustrated in Figure 1. For the linear action in (15), the infinitesimal generator is the linear vector field given by the matrix-vector product $\hat{\xi}(x) = -\,\mathrm{d}\rho(\xi)\, x$. Crucially, the flow of the generator recovers the group action along the exponential curve $t \mapsto \exp(t\xi)$, i.e.,
$F^{t}_{\hat{\xi}}(x) = x \cdot \exp(t\xi).$   (18)
For the linear right action in (15), this is easily verified by differentiation, applying (14), and the fact that solutions of smooth ordinary differential equations are unique. For a nonlinear right action this follows from Lemma 20.14 and Proposition 9.13 in Lee (2013).
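A quick numerical check of the correspondence in (18) for a linear action: integrating the generator vector field reproduces the exponential curve of group elements applied to a point. Here the generator field is taken to be $x \mapsto \xi x$ for skew-symmetric $\xi$ (the sign and left/right conventions of this toy example may differ from those above); this is an illustrative sketch, not the paper's code.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

xi = np.array([[0.0, -2.0], [2.0, 0.0]])       # Lie algebra element (scaled so(2))
x0 = np.array([1.0, 0.5])

# Infinitesimal generator of the linear action: the vector field x -> xi @ x.
sol = solve_ivp(lambda t, x: xi @ x, (0.0, 1.0), x0, rtol=1e-10, atol=1e-12)

# Its flow at time t = 1 matches the group element exp(xi) applied to x0.
print(np.allclose(sol.y[:, -1], expm(xi) @ x0, atol=1e-6))   # True
```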
Remark 1
In contrast to a “right” action satisfying $(x \cdot g_1) \cdot g_2 = x \cdot (g_1 g_2)$, a “left” action satisfies $g_2 \cdot (g_1 \cdot x) = (g_2 g_1) \cdot x$. While our main results work for left actions too, e.g. matrix-vector multiplication $x \mapsto \rho(g)\, x$, right actions are slightly more natural because the infinitesimal generator is a Lie algebra homomorphism, i.e.,
$\widehat{[\xi, \eta]} = \big[\hat{\xi}, \hat{\eta}\big],$   (19)
whereas this holds with a sign change for left actions. Every left action can be converted into an equivalent right action defined by $x \cdot g = g^{-1} \cdot x$, and vice versa.
5 Fundamental operators for studying symmetry
Here we introduce our main theoretical results for studying symmetries of machine learning models by focusing on a concrete and useful special case. The basic building blocks of the machine learning models we consider here are continuously differentiable functions between finite-dimensional vector spaces $\mathcal{X}$ and $\mathcal{Y}$. The space of functions with continuous derivatives up to order $r$ is denoted $C^r(\mathcal{X}; \mathcal{Y})$, with addition and scalar multiplication defined point-wise. These functions could be layers of a multilayer neural network, integral kernels to be applied to spatio-temporal fields, or simply linear combinations of user-specified basis functions in a regression task as in Brunton et al. (2016). General versions of our results for sections of vector bundles are developed later in Section 11. Our main results show that two families of fundamental linear operators encode the symmetries of these functions. The fundamental operators allow us to enforce, promote, and discover symmetry in machine learning models as we describe in Sections 6, 7, and 8.
We consider a general (perhaps nonlinear) right action $(x, g) \mapsto x \cdot g = \theta_g(x)$ of a Lie group $G$ on $\mathcal{X}$ and a representation $\rho_{\mathcal{Y}} : G \to \mathrm{GL}(\mathcal{Y})$. The definition of equivariance, the symmetry group of a function, and the first family of fundamental operators are introduced by the following:
Definition 2
We say that $f \in C^0(\mathcal{X}; \mathcal{Y})$ is equivariant with respect to a group element $g \in G$ if
$f(x \cdot g) = \rho_{\mathcal{Y}}(g)^{-1}\, f(x)$   (20)
for every $x \in \mathcal{X}$. These elements form a subgroup of $G$ denoted $\mathrm{Sym}_G(f)$.
Note that when the action on $\mathcal{X}$ is also defined by a representation $\rho_{\mathcal{X}}$ via (15), then (20) becomes
$f\big(\rho_{\mathcal{X}}(g)^{-1} x\big) = \rho_{\mathcal{Y}}(g)^{-1}\, f(x).$   (21)
The transformation operators $\mathcal{K}_g : f \mapsto \rho_{\mathcal{Y}}(g)\,(f \circ \theta_g)$ are linear maps sending functions in $C^r(\mathcal{X}; \mathcal{Y})$ to functions in $C^r(\mathcal{X}; \mathcal{Y})$, and $g$ is a symmetry of $f$ precisely when $\mathcal{K}_g f = f$. These fundamental operators form a group with composition $\mathcal{K}_{g_1}\mathcal{K}_{g_2} = \mathcal{K}_{g_1 g_2}$ and inversion $\mathcal{K}_g^{-1} = \mathcal{K}_{g^{-1}}$. Thus, $g \mapsto \mathcal{K}_g$ is an infinite-dimensional representation of $G$ in $C^r(\mathcal{X}; \mathcal{Y})$ for any $r \geq 0$. These operators are useful for studying discrete symmetries of functions. However, for a continuous group it is impractical to work directly with the uncountable family $\{\mathcal{K}_g\}_{g \in G}$.
The second family of fundamental operators are the key objects we use to study continuous symmetries of functions. These are the Lie derivatives $\mathcal{L}_\xi$ defined along each $\xi \in \mathfrak{g}$ by
$\mathcal{L}_{\xi} f(x) = \frac{\mathrm{d}}{\mathrm{d}t}\Big|_{t=0}\, \rho_{\mathcal{Y}}\big(\exp(t\xi)\big)\, f\big(x \cdot \exp(t\xi)\big) = \frac{\mathrm{d}}{\mathrm{d}t}\Big|_{t=0}\, \mathcal{K}_{\exp(t\xi)} f(x).$   (22)
Note that when the action on $\mathcal{X}$ is linear, $x \cdot g = \rho_{\mathcal{X}}(g)^{-1} x$, we have $\frac{\mathrm{d}}{\mathrm{d}t}\big|_{t=0}\, x \cdot \exp(t\xi) = -\,\mathrm{d}\rho_{\mathcal{X}}(\xi)\, x$ and (22) becomes
$\mathcal{L}_{\xi} f(x) = \mathrm{d}\rho_{\mathcal{Y}}(\xi)\, f(x) - \mathrm{D}f(x)\, \mathrm{d}\rho_{\mathcal{X}}(\xi)\, x,$   (23)
where $\mathrm{d}\rho_{\mathcal{X}}$ and $\mathrm{d}\rho_{\mathcal{Y}}$ are the Lie algebra representations corresponding to $\rho_{\mathcal{X}}$ and $\rho_{\mathcal{Y}}$. Evident from (22) is the fact that the Lie derivative is linear with respect to both $\xi$ and $f$, and sends functions in $C^r(\mathcal{X}; \mathcal{Y})$ to functions in $C^{r-1}(\mathcal{X}; \mathcal{Y})$ for every $r \geq 1$. The geometric construction of the fundamental operators $\mathcal{K}_g$ and $\mathcal{L}_\xi$ is depicted in Figure 2. It turns out (see Proposition 24) that $\xi \mapsto \mathcal{L}_\xi$ is the Lie algebra representation corresponding to $g \mapsto \mathcal{K}_g$ on $C^\infty(\mathcal{X}; \mathcal{Y})$, meaning that on this space we have the handy relations
$\mathcal{L}_{[\xi, \eta]} = \mathcal{L}_{\xi}\mathcal{L}_{\eta} - \mathcal{L}_{\eta}\mathcal{L}_{\xi} \quad \text{and} \quad \mathcal{K}_{\exp(\xi)} = e^{\mathcal{L}_{\xi}}.$   (24)
The results stated below are special cases of more general results developed later in Section 11.
Our first main result provides necessary and sufficient conditions for a continuously differentiable function to be equivariant with respect to the Lie group actions on and . This generalizes the constraints derived by Finzi et al. (2021) for the linear layers of equivariant multilayer perceptrons.
Theorem 3
Let $\xi_1, \ldots, \xi_k$ generate (via linear combinations and Lie brackets) the Lie algebra $\mathfrak{g}$ and let $H \subset G$ contain one element from each non-identity component of $G$. Then $f \in C^1(\mathcal{X}; \mathcal{Y})$ is $G$-equivariant if and only if
$\mathcal{L}_{\xi_i} f = 0 \quad \text{and} \quad \mathcal{K}_h f = f$   (25)
for every $i = 1, \ldots, k$ and every $h \in H$. This is a special case of Theorem 26.
Since the fundamental operators $\mathcal{K}_g$ and $\mathcal{L}_\xi$ are linear, Theorem 3 provides linear constraints for a continuously differentiable function $f$ to be $G$-equivariant.
Our second main result shows that the continuous symmetries of a given continuously differentiable function are encoded by its Lie derivatives.
Theorem 4
Given $f \in C^1(\mathcal{X}; \mathcal{Y})$, the symmetry group $\mathrm{Sym}_G(f)$ is a closed, embedded Lie subgroup of $G$ with Lie subalgebra
$\mathrm{Lie}\big(\mathrm{Sym}_G(f)\big) = \big\{ \xi \in \mathfrak{g} : \mathcal{L}_{\xi} f = 0 \big\}.$   (26)
This is a special case of Theorem 25.
This result completely characterizes the identity component of the symmetry group because the connected Lie subgroups of $G$ are in one-to-one correspondence with Lie subalgebras of $\mathfrak{g}$ by Theorem 19.26 in Lee (2013). The Lie subalgebra of symmetries of a function $f$ can be identified via linear algebra. In particular, it is the nullspace of the linear operator $L_f : \mathfrak{g} \to C^0(\mathcal{X}; \mathcal{Y})$ defined by
$L_f\, \xi = \mathcal{L}_{\xi} f.$   (27)
Discretization methods suitable for linear-algebraic computations with the fundamental operators will be discussed in Section 10. The key point is that when the functions $f$ lie in a finite-dimensional subspace $\mathcal{F} \subset C^1(\mathcal{X}; \mathcal{Y})$, the ranges of the restricted Lie derivatives $\mathcal{L}_\xi\big|_{\mathcal{F}}$, hence also the ranges of $L_f$ for $f \in \mathcal{F}$, are contained in a corresponding finite-dimensional subspace on which inner products can be defined using sampling or quadrature.
The preceding two theorems already show the duality between enforcing and discovering continuous symmetries with respect to the Lie derivative, viewed as a bilinear form $(\xi, f) \mapsto \mathcal{L}_\xi f$. To discover symmetries, we seek generators $\xi$ satisfying $\mathcal{L}_\xi f = 0$ for a known function $f$. On the other hand, to enforce a connected group of symmetries, we seek functions $f$ satisfying $\mathcal{L}_{\xi_i} f = 0$ with known generators $\xi_1, \ldots, \xi_k$ of the group.
6 Enforcing symmetry with linear constraints
Methods to enforce symmetry in neural networks and other machine learning models have been studied extensively, as we reviewed briefly in Section 3.1. A unifying theme in these techniques has been the use of linear constraints to enforce symmetry (Finzi et al., 2021; Loiseau and Brunton, 2018; Weiler et al., 2018; Cohen et al., 2019; Ahmadi and Khadir, 2020). The purpose of this section is to show how several of these methods can be understood in terms of the fundamental operators and linear constraints provided by Theorem 3.
6.1 Multilayer perceptrons
Enforcing symmetry in multilayer perceptrons was studied by Finzi et al. (2021). They provide a practical method based on enforcing linear constraints on the weights defining each layer of a neural network. The network uses specialized nonlinearities that are automatically equivariant, meaning that the constraints need only be enforced on the linear component of each layer. We show that the constraints derived by Finzi et al. (2021) are the same as those given by Theorem 3.
Specifically, each linear (affine) layer $\ell_i : \mathcal{X}_i \to \mathcal{X}_{i+1}$, for $i = 1, \ldots, L$, is defined by
$\ell_i(x) = W_i\, x + b_i,$   (28)
where $W_i$ are weight matrices and $b_i$ are bias vectors. Defining group representations $\rho_i : G \to \mathrm{GL}(\mathcal{X}_i)$ for each layer yields fundamental operators given by
$\big(\mathcal{K}_g\, \ell_i\big)(x) = \rho_{i+1}(g)\big(W_i\, \rho_i(g)^{-1} x + b_i\big),$   (29)
$\big(\mathcal{L}_{\xi}\, \ell_i\big)(x) = \big(\mathrm{d}\rho_{i+1}(\xi)\, W_i - W_i\, \mathrm{d}\rho_i(\xi)\big)\, x + \mathrm{d}\rho_{i+1}(\xi)\, b_i.$   (30)
Let $\xi_1, \ldots, \xi_k$ generate $\mathfrak{g}$ and let $H$ consist of an element from each non-identity component of $G$. Using the fundamental operators and Theorem 3, it follows that the layer is $G$-equivariant if and only if the weights and biases satisfy
$\mathrm{d}\rho_{i+1}(\xi)\, W_i - W_i\, \mathrm{d}\rho_i(\xi) = 0, \qquad \rho_{i+1}(h)\, W_i\, \rho_i(h)^{-1} = W_i,$   (31)
$\mathrm{d}\rho_{i+1}(\xi)\, b_i = 0, \qquad \rho_{i+1}(h)\, b_i = b_i,$   (32)
for every $\xi \in \{\xi_1, \ldots, \xi_k\}$ and $h \in H$. These are the same as the linear constraints one derives using the method by Finzi et al. (2021). The equivariant linear layers are then combined with specialized equivariant nonlinearities $\sigma_i$ to produce an equivariant network
$f = \sigma_L \circ \ell_L \circ \cdots \circ \sigma_1 \circ \ell_1.$   (33)
The composition of equivariant functions is equivariant, as one can easily check using Definition 2.
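As an illustration of how the weight constraints in (31) can be solved in practice, the sketch below vectorizes the constraint for a single generator and extracts a basis of equivariant weight matrices from the nullspace, in the spirit of Finzi et al. (2021); the two-dimensional rotation example and helper names are assumptions made for this example only.

```python
import numpy as np
from scipy.linalg import null_space

# Infinitesimal constraint for a linear layer W: drho_out(xi) @ W - W @ drho_in(xi) = 0.
# Vectorizing (column-major) gives a linear system on vec(W) for each generator xi.
def equivariance_constraint(drho_out_xi, drho_in_xi):
    m, n = drho_out_xi.shape[0], drho_in_xi.shape[0]
    return np.kron(np.eye(n), drho_out_xi) - np.kron(drho_in_xi.T, np.eye(m))

J = np.array([[0.0, -1.0], [1.0, 0.0]])       # so(2) generator acting on input and output
C = equivariance_constraint(J, J)
basis_vecs = null_space(C)                    # basis for the equivariant weight matrices

for v in basis_vecs.T:
    W = v.reshape(2, 2, order="F")            # undo the column-major vectorization
    print(np.allclose(J @ W, W @ J), "\n", np.round(W, 3))
# Both basis elements commute with J, i.e. W = a*I + b*J, as expected for SO(2).
```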
6.2 Neural operators acting on fields
Enforcing symmetry in neural networks acting on spatial fields has been studied extensively by Weiler et al. (2018); Cohen et al. (2018); Esteves et al. (2018); Kondor and Trivedi (2018); Cohen et al. (2019) among others. Many of these techniques use integral operators to define equivariant linear layers, which are coupled with equivariant nonlinearities, such as the gated nonlinearities proposed by Weiler et al. (2018). Networks built by composing integral operators with nonlinearities constitute a large class of “neural operators” described by Kovachki et al. (2023); Goswami et al. (2023); Boullé and Townsend (2023). The key task is to identify appropriate bases for equivariant kernels. For certain groups, such as the special Euclidean group $\mathrm{SE}(3)$, bases can be constructed explicitly using spherical harmonics, as in Weiler et al. (2018). We show that equivariance with respect to arbitrary group actions can be enforced via linear constraints on the integral kernels derived using the fundamental operators introduced in Section 5. Appropriate bases of kernel functions can then be constructed numerically by computing an appropriate nullspace, as is done by Finzi et al. (2021) for multilayer perceptrons.
For the sake of simplicity we consider integral operators acting on vector-valued functions $u : \mathbb{R}^n \to \mathcal{V}$, where $\mathcal{V}$ is a finite-dimensional vector space. Later on in Section 11.4 we study higher-order integral operators acting on sections of vector bundles. If $\mathcal{W}$ is another finite-dimensional vector space, an integral operator acting on $u$ to produce a new function $\mathcal{K}u : \mathbb{R}^n \to \mathcal{W}$ is defined by
$(\mathcal{K}u)(y) = \int_{\mathbb{R}^n} k(y, x)\, u(x)\, \mathrm{d}x,$   (34)
where the “kernel” function $k$ provides a linear map $k(y, x) : \mathcal{V} \to \mathcal{W}$ at each pair $(y, x)$. In other words, the kernel is a function on $\mathbb{R}^n \times \mathbb{R}^n$ taking values in the tensor product space $\mathcal{W} \otimes \mathcal{V}^*$, where $\mathcal{V}^*$ denotes the algebraic dual of $\mathcal{V}$. Many of the neural operator architectures described by Kovachki et al. (2023); Goswami et al. (2023); Boullé and Townsend (2023) are constructed by composing layers defined by integral operators (34) with nonlinear activation functions, usually acting pointwise. The kernel functions are optimized during training of the neural operator.
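A minimal sketch of how an integral operator of the form (34) can be discretized with a quadrature rule; the Gaussian kernel, one-dimensional grid, and helper names below are illustrative assumptions rather than a prescription from the paper.

```python
import numpy as np

# Quadrature discretization of (K u)(y) = \int k(y, x) u(x) dx, where the
# kernel k(y, x) is a (dim W) x (dim V) matrix for each pair (y, x).
def apply_integral_operator(kernel, u, xs, weights):
    # kernel: callable (y, x) -> (dw, dv) array;  u: callable x -> (dv,) array
    return lambda y: sum(w * kernel(y, x) @ u(x) for x, w in zip(xs, weights))

xs = np.linspace(-5.0, 5.0, 501)                        # 1D grid on the domain
weights = np.full_like(xs, xs[1] - xs[0])               # simple Riemann-sum weights
gauss_kernel = lambda y, x: np.array([[np.exp(-(y - x) ** 2)]])   # 1x1 matrix-valued kernel
u = lambda x: np.array([np.sin(x)])

Ku = apply_integral_operator(gauss_kernel, u, xs, weights)
print(Ku(0.3))   # smoothed field evaluated at one point
```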
With group actions defined by representations on , functions transform according to
(35) |
for . Likewise, functions transform via an analogous operator .
Definition 5
The integral operator in (34) is equivariant with respect to when
(36) |
The elements satisfying this equation form a subgroup of denoted .
By changing variables in the integral, the operator on the left is given by
(37) |
where
(38) |
The following result provides equivariance conditions in terms of the kernel, generalizing Lemma 1 in Weiler et al. (2018).
Proposition 6
Let be continuous and suppose that acts on a function space containing all smooth, compactly supported fields. Then
(39) |
We give a proof in Appendix A.
The Lie derivative of a continuously differentiable kernel function is given by
(40) |
The operators and are the fundamental operators from Section 5 because the transformation law for the kernel can be written as
(41) |
where
(42) | ||||
are representations of in and .
As an immediate consequence of Theorem 3, we have the following corollary establishing linear constraints for the kernel to produce an equivariant integral operator.
Corollary 7
These linear constraint equations must be satisfied to enforce equivariance with respect to a known symmetry in the machine learning process. By discretizing these operators, as discussed later in Section 10, one can solve these constraints numerically to construct a basis of kernel functions for equivariant integral operators.
As an immediate consequence of Theorem 4, the following result shows that the Lie derivative of the kernel encodes the continuous symmetries of a given integral operator.
Corollary 8
Under the same hypotheses as Proposition 6 and assuming the kernel $k$ is continuously differentiable, it follows that the symmetry group of the integral operator from Definition 5 is a closed, embedded Lie subgroup of $G$ with Lie subalgebra
$\big\{ \xi \in \mathfrak{g} : \mathcal{L}_{\xi} k = 0 \big\}.$   (44)
This result will be useful for methods that promote symmetry of the integral operator, as we describe later in Section 8.
7 Discovering symmetry by computing nullspaces
In this section we show that in a wide range of settings, the continuous symmetries of a manifold, point cloud, or map can be recovered by computing the nullspace of a linear operator. For functions, this is already covered by Theorem 4, which allows us to compute the connected subgroup of symmetries by identifying its Lie subalgebra
$\mathrm{Lie}\big(\mathrm{Sym}_G(f)\big) = \big\{ \xi \in \mathfrak{g} : \mathcal{L}_{\xi} f = 0 \big\} = \ker L_f,$   (45)
where $L_f : \xi \mapsto \mathcal{L}_\xi f$ is the linear operator defined by (27). Hence, if a machine learning model $f$ has a symmetry group $\mathrm{Sym}_G(f)$, then its Lie algebra is equal to the nullspace of $L_f$.
This section explains how this is actually a special case of a more general result allowing us to reveal the symmetries of submanifolds via the nullspace of a closely related operator. We begin with the more general case where we study the symmetries of a submanifold of Euclidean space, and we explain how to recover symmetries from point clouds approximating submanifolds. The Lie derivative described in Section 5 is then recovered when the submanifold is the graph of a function. We also briefly describe how the fundamental operators from Section 5 can be used to recover symmetries and conservation laws of dynamical systems.
7.1 Symmetries of submanifolds
We begin by studying the symmetries of smooth submanifolds of Euclidean space using an approach similar to Cahill et al. (2023). However, we use a different operator that generalizes more naturally to nonlinear group actions on arbitrary manifolds (see Section 12) and recovers the Lie derivative (see Section 7.2). With a right action of a Lie group $G$ on $\mathbb{R}^n$, we define invariance of a submanifold as follows:
Definition 9
A submanifold $\mathcal{M} \subseteq \mathbb{R}^n$ is invariant with respect to a group element $g \in G$ if
$x \cdot g \in \mathcal{M}$   (46)
for every $x \in \mathcal{M}$. These elements form a subgroup of $G$ denoted $\mathrm{Sym}_G(\mathcal{M})$.
The subgroup of symmetries of a submanifold is characterized by the following theorem.
Theorem 10
Let $\mathcal{M}$ be a smooth, closed, embedded submanifold of $\mathbb{R}^n$. Then $\mathrm{Sym}_G(\mathcal{M})$ is a closed, embedded Lie subgroup of $G$ whose Lie subalgebra is
$\mathrm{Lie}\big(\mathrm{Sym}_G(\mathcal{M})\big) = \big\{ \xi \in \mathfrak{g} : \hat{\xi}(x) \in T_x \mathcal{M} \ \text{for every } x \in \mathcal{M} \big\}.$   (47)
This is a special case of Theorem 35.
The meaning of this result and its practical use for detecting symmetry are illustrated in Figure 3.
To reveal the connected component of $\mathrm{Sym}_G(\mathcal{M})$, we let $P_x$ be a family of linear projections onto the tangent spaces $T_x \mathcal{M}$. These are assumed to vary continuously with respect to $x \in \mathcal{M}$. Then under the assumptions of the above theorem, $\mathrm{Lie}\big(\mathrm{Sym}_G(\mathcal{M})\big)$ is the nullspace of the symmetric, positive-semidefinite operator $\Omega : \mathfrak{g} \to \mathfrak{g}$ defined by
$\langle \xi, \Omega\, \eta \rangle = \int_{\mathcal{M}} \big\langle (I - P_x)\, \hat{\xi}(x),\ (I - P_x)\, \hat{\eta}(x) \big\rangle \, \mathrm{d}\mu(x)$   (48)
for every $\xi, \eta \in \mathfrak{g}$. We see in Figure 3 that $(I - P_x)\hat{\xi}(x)$ measures the component of the infinitesimal generator not tangent to the submanifold at $x$. Here, $\mu$ is any strictly positive measure on $\mathcal{M}$ that makes all of these integrals finite. The above formula is useful for computing the matrix of $\Omega$ in an orthonormal basis for $\mathfrak{g}$.
Alternatively, when the dimension of is large, one can compute the nullspace using a Krylov algorithm such as the one described in Finzi et al. (2021). Such algorithms rely solely on queries of acting on vectors . When and are given by a Lie group representation (see Section 4.2), then the operator defined in (48) is given explicitly by
(49) |
where is the adjoint of . If is a matrix Lie group and is the identity representation, then is the injection . When inherits its inner product from , then is the orthogonal projection of onto . For example, if is the identity representation of in with the inner product on given by the usual inner product of matrices , then it can be readily verified that
(50) |
In practice, one can use sample points on the manifold to obtain a Monte-Carlo estimate of $\Omega$ with approximate projections $P_{x_i}$ computed using local principal component analysis (PCA), as described in Cahill et al. (2023). More accurate estimates of the tangent spaces can be obtained using the methods in Berry and Giannakis (2020). Assuming the projections are accurate, the following proposition shows that the correct Lie subalgebra of symmetries is revealed using finitely many sample points $x_1, \ldots, x_N$. However, this result does not tell us how many samples to use, or even when to stop sampling.
Proposition 11
Let be a strictly positive probability measure on a smooth manifold such that for every . Let be drawn independently from the distribution and let be defined by
(51) |
Then there is almost surely an integer such that for every we have . We provide a proof in Appendix A.
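The following sketch illustrates the Monte-Carlo procedure described above on a toy point cloud: tangent projections are estimated by local PCA, the Gram matrix of the normal components of candidate generator fields is accumulated as in (48), and the near-null eigenvector recovers the rotation generator of the unit circle. The candidate basis (here, all of $\mathfrak{gl}(2)$), the neighborhood size, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
pts = np.array([[np.cos(t), np.sin(t)] for t in rng.uniform(0, 2 * np.pi, 400)])  # unit circle

def tangent_projection(x, pts, k=10):
    # Local PCA: top principal direction of the k nearest neighbors approximates T_x M.
    nbrs = pts[np.argsort(np.linalg.norm(pts - x, axis=1))[:k]]
    _, _, Vt = np.linalg.svd(nbrs - nbrs.mean(axis=0))
    U = Vt[:1].T                      # the circle is 1-dimensional, keep one direction
    return U @ U.T

# Candidate generators: a basis of gl(2) acting linearly on R^2.
E = [np.array(M, dtype=float) for M in
     ([[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]], [[0, 0], [0, 1]])]

# Gram matrix of the normal components of the generator fields (Monte-Carlo estimate).
gram = np.zeros((4, 4))
for x in pts:
    N = np.eye(2) - tangent_projection(x, pts)       # projection onto the normal space
    vecs = [N @ (M @ x) for M in E]
    gram += np.array([[u @ v for v in vecs] for u in vecs]) / len(pts)

w, V = np.linalg.eigh(gram)
print(np.round(w, 4))                        # smallest eigenvalue is near zero
print(np.round(V[:, 0].reshape(2, 2), 3))    # ~ rotation generator [[0,-1],[1,0]] up to scale
```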
7.2 Symmetries of functions as symmetries of submanifolds
The method described above for studying symmetries of submanifolds can be applied to reveal the symmetries of smooth maps between vector spaces by identifying the map with its graph
$\mathrm{graph}(f) = \big\{ (x, f(x)) : x \in \mathbb{R}^n \big\}.$   (52)
The graph is a smooth, closed, embedded submanifold of $\mathbb{R}^n \times \mathbb{R}^m$ by Proposition 5.7 in Lee (2013). We show that this approach recovers the Lie derivative and our result in Theorem 4. By choosing bases for the domain and codomain, it suffices to consider smooth functions $f : \mathbb{R}^n \to \mathbb{R}^m$.
Supposing that we have representations and of in the domain and codomain, we consider a combined representation
(53) |
Defining a smoothly-varying family of projections
(54) |
onto , it is easy to check that
(55) |
We note that this is a special case of Theorem 39 describing the Lie derivative in terms of a projection onto the tangent space of a function’s graph. The resulting operator defined by (48) is given by
(56) |
for and an appropriate positive measure on that makes the integrals finite. Therefore, Theorem 4 is recovered from our result about symmetries of submanifolds stated in Theorem 10.
Related quantities have been used to study the symmetries of trained neural networks, with $f$ being the network and its derivatives computed via back-propagation. The quantity $\mathcal{L}_\xi f(x)$ was used by Gruver et al. (2022) to construct the Local Equivariant Error (LEE), measuring the extent to which a trained neural network fails to respect symmetries in the one-parameter group $\{\exp(t\xi)\}_{t \in \mathbb{R}}$. The nullspace of $L_f$ in the special case where $G$ acts trivially on the output was used by Moskalev et al. (2022) to identify the connected subgroup with respect to which a given network is invariant.
By viewing a function as a submanifold, we obtain a simple data-driven technique for estimating the Lie derivative and subgroup of symmetries of the function. To approximate , , and using input-output pairs , one simply needs to approximate the projection in (54) using these data. To do this, we can obtain matrices with columns spanning by applying local PCA to the data , or by pruning the frames computed in Berry and Giannakis (2020). With the projection in (54) is given by
(57) |
because any projection is uniquely determined by its range and nullspace (see Section 5.9 of Meyer (2000)). This gives us a simple way to approximate , , and using the input-output pairs. However, many such pairs are needed since the tangent space to the graph of at is well-approximated by local PCA only when there are at least neighboring samples sufficiently close to . Even more samples are needed when they are noisy. The convergence properties of the spectral methods in Berry and Giannakis (2020) are better, but they still require enough samples to obtain accurate Monte-Carlo or quadrature-based estimates of integrals, in this case over .
7.3 Symmetries and conservation laws of dynamical systems
Here, we consider the case when $f : \mathbb{R}^n \to \mathbb{R}^n$ is a smooth function defining a dynamical system
$\frac{\mathrm{d}}{\mathrm{d}t}\, x(t) = f\big(x(t)\big)$   (58)
with state variables $x(t) \in \mathbb{R}^n$. The solution of this equation is described by the flow map $(t, x_0) \mapsto F^t(x_0)$, which is defined on a maximal connected open set containing $\{0\} \times \mathbb{R}^n$. In many cases we write $x(t) = F^t(x_0)$. Given a Lie group representation $\rho : G \to \mathrm{GL}(\mathbb{R}^n)$, equivariance for the dynamical system is defined as follows:
Definition 12
The dynamical system in (58) is equivariant with respect to a group element $g \in G$ if the flow map satisfies
$F^t(x \cdot g) = F^t(x) \cdot g$   (59)
for every $(t, x)$ in the domain of the flow.
Differentiating at $t = 0$ shows that equivariance of the dynamical system implies that $f$ is equivariant in the sense of Definition 2. The converse is also true thanks to Corollary 9.14 in Lee (2013), meaning that equivariance for the dynamical system is equivalent to equivariance of $f$. Therefore, we can study equivariance of the dynamical system in (58) by directly applying the tools developed in Section 5 to the function $f$. Thanks to Theorem 4, identifying the connected subgroup of symmetries for the dynamical system is a simple matter of computing the nullspace of the linear map $L_f : \xi \mapsto \mathcal{L}_\xi f$, that is,
$\mathrm{Lie}\big(\mathrm{Sym}_G(f)\big) = \big\{ \xi \in \mathfrak{g} : \mathcal{L}_{\xi} f = 0 \big\} = \ker L_f.$   (60)
Here, the Lie derivative is given by
$\mathcal{L}_{\xi} f = \big[\hat{\xi},\, f\big],$   (61)
where $[\cdot, \cdot]$ is the Lie bracket of the infinitesimal generator $\hat{\xi}$ defined by (17) and the vector field $f$. Symmetries can also be enforced as linear constraints on $f$ described by Theorem 3. This was done by Ahmadi and Khadir (2020) for polynomial dynamical systems with discrete symmetries. Later on in Section 11.1 we show that analogous results apply to dynamical systems defined by vector fields on manifolds and nonlinear Lie group actions.
A conserved quantity for the system in (58) is defined as follows:
Definition 13
A scalar-valued quantity $h : \mathbb{R}^n \to \mathbb{R}$ is said to be conserved when
$h \circ F^t = h \quad \text{for every } t \text{ for which the flow is defined}.$   (62)
In this setting, the composition operators $h \mapsto h \circ F^t$ are often referred to as Koopman operators (see Koopman (1931); Mezić (2005); Mauroy et al. (2020); Otto and Rowley (2021); Brunton et al. (2022)). It is easy to see that a smooth function $h$ is conserved if and only if
$\nabla h(x) \cdot f(x) = 0 \quad \text{for every } x \in \mathbb{R}^n.$   (63)
This relation is used by Kaiser et al. (2018, 2021) to identify conserved quantities by computing the nullspace of the operator $h \mapsto \nabla h \cdot f$ restricted to finite-dimensional spaces of candidate functions. When the flow is defined for all $t \in \mathbb{R}$, the operators $h \mapsto h \circ F^t$ and $h \mapsto \nabla h \cdot f$ are the fundamental operators from Section 5 for the right action of the Lie group $(\mathbb{R}, +)$ defined by the flow, together with the trivial representation on the codomain.
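A small sketch of this nullspace computation in the style of Kaiser et al. (2018): evaluate the directional derivatives of a dictionary of candidate functions along the vector field at sampled states, then compute the nullspace. The harmonic-oscillator example and monomial dictionary are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import null_space

# Harmonic oscillator: xdot = y, ydot = -x.  Conserved quantities h satisfy grad(h) . f = 0.
f = lambda x, y: np.array([y, -x])

# Monomial dictionary up to degree 2 and its gradients.
names = ["x", "y", "x^2", "x*y", "y^2"]
grads = [lambda x, y: np.array([1, 0]),
         lambda x, y: np.array([0, 1]),
         lambda x, y: np.array([2 * x, 0]),
         lambda x, y: np.array([y, x]),
         lambda x, y: np.array([0, 2 * y])]

# Sample grad(phi_j)(x_i) . f(x_i); conserved combinations span the nullspace.
samples = np.random.default_rng(0).normal(size=(100, 2))
A = np.array([[g(x, y) @ f(x, y) for g in grads] for x, y in samples])
coeffs = null_space(A, rcond=1e-10)
print(dict(zip(names, np.round(coeffs[:, 0] / coeffs[2, 0], 6))))  # ~ x^2 + y^2
```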
Remark 14
For Hamiltonian dynamical systems Noether’s theorem establishes a remarkable equivalence between the symmetries of the Hamiltonian and conserved quantities of the system. We study Hamiltonian systems later in Section 11.3.
8 Promoting symmetry with convex penalties
In this section we show how to design custom convex regularization functions to promote symmetries within a given candidate group during training of a machine learning model. This allows us to train a model with as many symmetries as possible from among the candidates, while breaking candidate symmetries only when the data provides sufficient evidence. We study both discrete and continuous groups of candidate symmetries. We quantify the extent to which symmetries within the candidate group are broken using the fundamental operators described in Section 5. For discrete groups we use the transformation operators $\mathcal{K}_g$ and for continuous groups we use the Lie derivatives $\mathcal{L}_\xi$. In the continuous case we penalize a convex relaxation of the codimension of the subgroup of symmetries given by a nuclear norm (Schatten 1-norm) of the operator $L_f$ defined by (27); minimizing this codimension via the proxy nuclear norm will promote the largest nullspace possible, and hence the largest admissible symmetry group. Once these regularization functions are developed abstractly in Sections 8.1 and 8.2, we show how the approach can be applied to basis function regression (Section 8.3), symmetric function recovery (Section 9), and neural networks (Section 8.4).
As in Section 5, the basic building blocks of the machine learning models we consider are continuously differentiable ($C^1$) functions between finite-dimensional vector spaces. While we consider this restricted setting here, our results readily generalize to sections of vector bundles, as we describe later in Section 11. These functions could be layers of a multilayer perceptron, integral kernels to be applied to spatio-temporal fields, or simply linear combinations of user-specified basis functions in a regression task. We consider parametric models where $f$ is constrained to lie in a given finite-dimensional subspace $\mathcal{F}$ of continuously differentiable functions. Working within a finite-dimensional subspace of functions will allow us to discretize the fundamental operators in Section 10.
We consider the same setting as Section 5, i.e., candidate symmetries are described by a Lie group $G$ acting on the domain and codomain of functions via a right action and a representation. Equivariance in this setting is described by Definition 2. When fitting the function $f$ to data, our regularization functions penalize the size of the set of candidate symmetries broken by $f$, that is, the complement of $\mathrm{Sym}_G(f)$ in $G$. For reasons that will become clear, we use different penalties corresponding to different notions of “size” when $G$ is a discrete group versus when $G$ is continuous. The main result describing the continuous symmetries of $f$ is Theorem 4.
8.1 Discrete symmetries
When the group $G$ has finitely many elements, one can measure the amount of broken symmetry simply by counting the candidate transformations that fail to be symmetries:
$\#\big\{ g \in G : \mathcal{K}_g f \neq f \big\}.$   (64)
However, this penalty is impractical for optimization owing to its discrete values and nonconvexity. Letting $\|\cdot\|$ be any norm on the space of functions containing $\{\mathcal{K}_g f - f : g \in G,\ f \in \mathcal{F}\}$ yields a convex relaxation of the above penalty given by
$R_G(f) = \sum_{g \in G} \big\| \mathcal{K}_g f - f \big\|.$   (65)
This is a convex function on $\mathcal{F}$ because $f \mapsto \mathcal{K}_g f - f$ is a linear operator and vector space norms are convex. For example, if $w$ are the coefficients of $f$ in a basis for $\mathcal{F}$ and $\widehat{K}_g$ is the matrix of $\mathcal{K}_g$ in this basis, then the Euclidean norm can be used to define
$R_G(f) = \sum_{g \in G} \big\| (\widehat{K}_g - I)\, w \big\|_2.$   (66)
This is directly analogous to the group sparsity penalty proposed in Yuan and Lin (2006).
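As a concrete illustration of the discrete penalty (65), the sketch below evaluates a sampled version of the symmetry-breaking sum for the four-element group of quarter-turn rotations acting on the plane (with a trivial output representation); the sampling-based norm and the example functions are assumptions made only for this illustration.

```python
import numpy as np

# Sampled version of sum_{g in G} || K_g f - f || for the cyclic group C4 of
# quarter-turn rotations acting on inputs (trivially on the scalar output).
def c4_symmetry_penalty(f, xs):
    R = np.array([[0.0, -1.0], [1.0, 0.0]])                # 90-degree rotation
    group = [np.linalg.matrix_power(R, k) for k in range(4)]
    fx = np.array([f(x) for x in xs])
    return sum(np.linalg.norm(np.array([f(g @ x) for x in xs]) - fx) for g in group[1:])

xs = np.random.default_rng(0).normal(size=(200, 2))
print(c4_symmetry_penalty(lambda x: x @ x, xs))            # ~ 0: ||x||^2 is C4-invariant
print(c4_symmetry_penalty(lambda x: x[0] ** 3, xs))        # > 0: x1^3 breaks the symmetry
```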
8.2 Continuous symmetries
We now consider the case where $G$ is a Lie group of dimension greater than zero. Here we use the dimension of $\mathrm{Sym}_G(f)$ to measure the symmetry of $f$, seeking to penalize the complementary dimension or “codimension”, given by
$\mathrm{codim}\, \mathrm{Sym}_G(f) = \dim G - \dim \mathrm{Sym}_G(f).$   (67)
We take this approach in the continuous case because it is no longer possible to simply count the number of broken symmetries. While it is possible in principle to replace the sum in (65) by an integral of $\|\mathcal{K}_g f - f\|$ over $G$, the numerical quadrature required to approximate it becomes prohibitive for higher-dimensional candidate groups. This difficulty is exacerbated by the fact that the integrand is not smooth. The space spanned by the transformed functions $\mathcal{K}_g f$ can also become infinite-dimensional when $G$ has positive dimension, making it challenging to compute the norm $\|\mathcal{K}_g f - f\|$.
In contrast, it is much easier to measure the “size” of a continuous symmetry group using its dimension because this can be computed via linear algebra. Specifically, the dimension of $\mathrm{Sym}_G(f)$ is equal to that of its Lie algebra. Thanks to Theorem 4, this is the nullspace of a linear operator $L_f$ defined by
$L_f\, \xi = \mathcal{L}_{\xi} f,$   (68)
where $\mathcal{L}_\xi f$ is the Lie derivative in (22). By the rank and nullity theorem, the codimension of $\mathrm{Sym}_G(f)$ is equal to the rank of this operator:
$\mathrm{codim}\, \mathrm{Sym}_G(f) = \mathrm{rank}\, L_f.$   (69)
Penalizing the rank of an operator is impractical for optimization owing to its discrete values and nonconvexity. A commonly used convex relaxation of the rank is provided by the Schatten 1-norm, also known as the “nuclear norm”, given by
$R_G(f) = \big\| L_f \big\|_* = \sum_{i} \sigma_i(L_f).$   (70)
Here $\sigma_i(L_f)$ denotes the $i$th singular value of $L_f$ with respect to inner products on $\mathfrak{g}$ and on the range space $\mathrm{span}\{\mathcal{L}_\xi f : \xi \in \mathfrak{g},\ f \in \mathcal{F}\}$. This space is finite-dimensional, being spanned by $\mathcal{L}_{\xi_i}\phi_j$, where $\xi_i$ and $\phi_j$ are basis elements for $\mathfrak{g}$ and $\mathcal{F}$. This enables computations with discrete inner products on it, as we describe in Section 10. For certain rank minimization problems, penalizing the nuclear norm is guaranteed to recover the true minimum rank solution (Candès and Recht, 2009; Recht et al., 2010; Gross, 2011).
The proposed regularization function (70) is convex on because is linear and the nuclear norm is convex. For example, if are the coefficients of in a basis for and are the matrices of in orthonormal bases for and , then
(71) |
With and being the orthonormal bases for and , one can compute and store the rank- tensor . Practical methods for constructing and computing with inner products on will be discussed in Section 10.
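The following sketch shows one way this penalty could be evaluated once such matrices are available; the layout of the precomputed Lie derivative matrices is an assumption made for illustration.

```python
import numpy as np

def symmetry_codimension_relaxation(w, lie_matrices):
    """Nuclear-norm relaxation (70)-(71) evaluated at a model with coefficient
    vector w.

    lie_matrices : sequence of arrays; lie_matrices[i] @ w gives the coefficients
                   of the Lie derivative of the model along the i-th Lie algebra
                   basis element in an orthonormal basis (assumed layout)
    """
    # Column i of M represents the Lie derivative along the i-th generator, so
    # rank(M) is the codimension of the symmetry subalgebra and the nuclear norm
    # is its convex surrogate.
    M = np.stack([L @ w for L in lie_matrices], axis=1)
    return np.linalg.norm(M, ord='nuc')
```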
8.3 Promoting symmetry in basis function regression
To demonstrate how the symmetry-promoting regularization functions proposed above can be used in practice, consider a regression problem for a function . It is common to parameterize this problem by expressing in a dictionary consisting of user-defined smooth functions with a matrix of weights to be fit during the training process. For example, the sparse identification of nonlinear dynamics (SINDy) algorithm (Brunton et al., 2016) is one instance of this type of learning, as are many other machine learning algorithms (Brunton and Kutz, 2022). The fundamental operators (Section 5) for this class of functions are given by
(72) | ||||
(73) |
These can be used directly in (65) and (70) to construct symmetry-promoting regularization functions that are convex with respect to the weight matrix . Given a training dataset consisting of input-output pairs we can seek a regularized least-squares fit by solving the convex optimization problem
(74) |
Here, is a parameter controlling the strength of the regularization that can be determined using cross-validation. To examine when this approach could be beneficial, we study a simplified problem — symmetric function recovery — in Section 9, below.
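A minimal sketch of solving (74) with CVXPY is given below. It assumes the model has been vectorized so that a fixed matrix maps the coefficient vector to predictions at the training inputs, and that the Lie derivative along each candidate generator is available as a matrix acting on the same coefficient vector; all names are illustrative.

```python
import cvxpy as cp
import numpy as np

def fit_with_symmetry_penalty(A, y, lie_matrices, gamma):
    """Sketch of the regularized regression (74) with a nuclear-norm symmetry penalty.

    A            : (num_samples, num_coeffs) matrix so that A @ w gives the model's
                   predictions at the training inputs (dictionary evaluations)
    y            : (num_samples,) vector of training outputs
    lie_matrices : list of arrays; lie_matrices[i] @ w gives the coefficients of
                   the Lie derivative of the model along the i-th Lie algebra
                   generator in an orthonormal basis (assumed layout)
    gamma        : regularization strength, e.g. chosen by cross-validation
    """
    w = cp.Variable(A.shape[1])
    # Row i of lie_block is the Lie derivative along generator i; since the
    # nuclear norm is invariant under transposition, penalizing this stacked
    # matrix is a convex surrogate for the codimension of the symmetry group.
    lie_block = cp.vstack([L @ w for L in lie_matrices])
    objective = cp.sum_squares(A @ w - y) + gamma * cp.normNuc(lie_block)
    cp.Problem(cp.Minimize(objective)).solve()
    return w.value
```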
Remark 15
The solutions of (74) do not depend on how the dictionary functions are normalized due to the fact that the function being minimized can be written entirely in terms of and the data . This is in contrast to other types of regularized regression problems that penalize the weights directly, and therefore depend on how the functions in are normalized.
8.4 Promoting symmetry in neural networks
In this section we describe a convex regularizing penalty to promote -equivariance in feed-forward neural networks
(75) |
composed of layers with group representations . Since the composition is -equivariant if every layer is -equivariant, the main idea is to measure the symmetries shared by all of the layers. Specifically, we aim to maximize the “size” of the subgroup
(76) |
where the notion of “size” we adopt depends on whether is discrete or continuous. The same ideas can be applied to neural networks acting on fields with layers defined by integral operators as described in Section 6.2. In this case we consider symmetries shared by all of the integral kernels.
We consider the case in which the trainable layers are elements of vector spaces , over which the optimization is carried out. For example, each layer may be given by as in Section 8.3, where is a trainable weight matrix and is a fixed dictionary of nonlinear functions. Alternatively, we could follow Finzi et al. (2021) and use trainable linear layers composed with fixed -equivariant nonlinearities. In contrast with Finzi et al. (2021), we do not force the linear layers to be -equivariant. Rather, we penalize the breaking of -symmetries in the linear layers as a means to regularize the neural network and to learn which symmetries are compatible with the data and which are not.
As in Section 8.1, when is a discrete group with finitely many elements, a convex relaxation of the cardinality of is
(77) |
Again, this is analogous to the group-LASSO penalty developed in Yuan and Lin (2006).
When is a Lie group with nonzero dimension, we follow the approach in Section 8.2 using the following observation:
Proposition 16
The Lie subalgebra in the proposition is equal to the nullspace of the linear operator defined by
(79) |
By the rank and nullity theorem, minimizing the rank of this operator is equivalent to maximizing the dimension of the subgroup of symmetries shared by all of the layers in the network. As in Section 8.2, a convex relaxation of the rank is provided by the nuclear norm
(80) |
where are the matrices of in orthonormal bases for and the associated spaces .
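The sketch below illustrates how this penalty could be evaluated numerically, assuming that the matrix of the Lie derivative of each trainable layer along each candidate generator has been precomputed; the data layout is an assumption made for illustration.

```python
import numpy as np

def shared_symmetry_penalty(layer_weights, layer_lie_matrices):
    """Nuclear-norm surrogate (80) for the codimension of the symmetries shared
    by all layers of a feed-forward network.

    layer_weights      : list of per-layer coefficient vectors
    layer_lie_matrices : layer_lie_matrices[l][i] is the matrix of the Lie
                         derivative of layer l along the i-th Lie algebra
                         generator (a hypothetical, precomputed layout)
    """
    num_generators = len(layer_lie_matrices[0])
    columns = []
    for i in range(num_generators):
        # Stack the Lie derivatives of every layer along generator i; the column
        # vanishes exactly when generator i is a symmetry shared by all layers.
        columns.append(np.concatenate([layer_lie_matrices[l][i] @ layer_weights[l]
                                       for l in range(len(layer_weights))]))
    return np.linalg.norm(np.stack(columns, axis=1), ord='nuc')
```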
9 Numerical study of sample complexity to recover symmetric functions
Can promoting symmetry help us learn an unknown symmetric function using less data? To begin answering this question, we perform numerical experiments studying the amount of sampled data needed to recover structured polynomial functions on of the form
(81) | ||||
(82) |
These possess various rotation and translation invariances when , as characterized in detail below by Proposition 17 and its corollaries.
We aim to recover the unknown function within the space of polynomial functions on with degrees up to based on the values at sample points . Our approximation of is computed by solving the convex optimization problem
(83) |
where is a candidate Lie group of symmetries acting on . This was done using the CVXPY software package developed by Diamond and Boyd (2016); Agrawal et al. (2018). The nuclear norm in (83) was defined with respect to inner products on the corresponding Lie algebras given by . As we describe later in Section 10, the inner product on the space containing the ranges of every was defined by (89) with unit weights and points drawn uniformly from the cube . Note that these discretization points were not the same as the sample points in (83). The validity of this inner product is guaranteed almost surely by Proposition 21.
To study the sample complexity for (83) to recover functions in the form of and , we perform multiple experiments using random realizations of these functions sampled at random points . In each experiment, the vectors were drawn uniformly from the cube and the vectors were formed from the columns of a random orthonormal matrix (specifically, the left singular vectors of an matrix with standard Gaussian entries). The coefficients of and in a basis of monomials up to a specified degree were sampled uniformly from the interval . This yielded random polynomial functions and with degrees and .
The sample points for each experiment were drawn uniformly from the cube . A total of with sample points were drawn, which is sufficient to recover the function almost surely regardless of regularization. For each experiment we determine the smallest so that recovery is achieved by (83) with using the sample points for every . To be precise, successful recovery is declared when all coefficients describing and in the monomial basis for agree to a tolerance of times the magnitude of the largest coefficient of . The range of values for across such random experiments provides an estimate of the sample complexity. In Figures 4, 5, and 6 we plot the range of values for as shaded regions with the average values displayed as solid lines.
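The recovery criterion can be implemented as a simple coefficient comparison, sketched below with a placeholder relative tolerance.

```python
import numpy as np

def recovery_successful(coeffs_est, coeffs_true, rel_tol=1e-4):
    """Declare recovery when every monomial coefficient of the fitted polynomial
    matches the truth to within rel_tol times the largest true coefficient.
    The tolerance value here is a placeholder, not the one used in the experiments."""
    scale = np.max(np.abs(coeffs_true))
    return bool(np.all(np.abs(coeffs_est - coeffs_true) <= rel_tol * scale))
```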
In Figure 4, we use the special Euclidean group as a candidate group to recover functions of the form with the degree of specified. The number of radial features is selected in accordance with Corollary 18 in order to ensure that has the known form and dimension stated in Proposition 17. The symmetry-promoting regularization significantly reduces the number of samples needed to recover compared to the number of samples needed to solve the linear system specifying this function within the space of polynomials with the same or lesser degree. As the number of radial features increases, so does the sample complexity to recover . This is likely due to the decreased dimension of .
In Figures 5 and 6, we use and the group of translations as candidate symmetry groups to recover functions of the form with the degree of specified. Obviously, has an -dimensional subgroup of translation symmetries orthogonal to . By Corollary 19, choosing is sufficient to ensure that has the known form and dimension stated in Proposition 17. The results in Figures 5 and 6 show that the symmetry-promoting regularization reduces the sample complexity to recover . Moreover, fewer samples are needed when depends on fewer linear features, as might be expected because the dimension of increases as decreases.
Proposition 17
Let and suppose that and are sets of linearly-independent vectors in . Then, contains the -dimensional subalgebra
(84) |
and contains the -dimensional subalgebra
(85) |
Either every polynomial with degree gives or the set of polynomials with degree satisfying is a set of measure zero. Likewise, for , , and . See Appendix A for a proof.
Corollary 18
10 Discretizing the operators
This section describes how to construct matrices for the operators and for continuously differentiable functions in a user-specified finite-dimensional subspace . By choosing bases for the finite-dimensional vector spaces and , it suffices without loss of generality to consider the case in which and . We assume that and are endowed with inner products and that and are orthonormal bases for these spaces, respectively. The key task is to endow the finite-dimensional subspace
(86) |
with a convenient inner product. Once this is done, an orthonormal basis for can be constructed by applying a Gram-Schmidt process to the functions , which span . Matrices for and are then easily obtained by computing
(87) |
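A minimal sketch of this Gram-Schmidt step is given below. It works entirely in coefficient space and assumes that the pairwise inner products of the spanning functions have been collected into a Gram matrix; how to compute such inner products in practice is the subject of the remainder of this section.

```python
import numpy as np

def orthonormal_basis_coefficients(gram, tol=1e-12):
    """Modified Gram-Schmidt in coefficient space: given the Gram matrix of
    pairwise inner products of a spanning set of functions, return the rows of
    coefficients expressing an orthonormal basis in that spanning set."""
    n = gram.shape[0]
    basis = []
    for i in range(n):
        v = np.zeros(n)
        v[i] = 1.0
        for b in basis:
            v = v - (b @ gram @ v) * b        # remove the component along b
        norm = np.sqrt(v @ gram @ v)
        if norm > tol:                        # skip (numerically) dependent directions
            basis.append(v / norm)
    return np.array(basis)
```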
The issue at hand is to choose the inner product on in a way that makes computing these matrices easy. A natural choice is to equip with an inner product where is a positive measure on (such as a Gaussian distribution) for which the norms of functions in are finite. The problem is that it is usually challenging or inconvenient to compute the required integrals
(88) |
analytically. In this section we discuss inner products that are easy to compute in practice.
10.1 Numerical quadrature and Monte-Carlo
When (88) cannot be computed analytically, one can resort to a numerical quadrature or Monte-Carlo approximation. In both cases the integral is approximated by a weighted sum, yielding a semi-inner product
(89) |
that converges to as . The following proposition means that we do not have to pass to the limit in order to obtain a valid inner product defined by (89) on .
Proposition 20
For example, in Monte-Carlo approximation, the samples are drawn independently from a distribution with the assumption that both and are σ-finite and is absolutely continuous with respect to . The weights are given by the Radon-Nikodym derivative . Then for every the approximate integral converges as almost surely thanks to the strong law of large numbers (see Theorem 7.7 in Koralov and Sinai (2012)). By the proposition, there is almost surely a finite such that (89) is an inner product on for every .
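A sketch of the semi-inner product (89) is shown below; the callables, sample points, and weights stand in for whichever quadrature or Monte-Carlo scheme is used.

```python
import numpy as np

def sample_inner_product(f, g, points, weights):
    """Sample-based semi-inner product (89): a weighted sum of pointwise products.

    f, g    : callables returning values in the finite-dimensional codomain
    points  : sample points in the domain (e.g. quadrature nodes or Monte-Carlo
              draws from a distribution nu)
    weights : positive weights (e.g. quadrature weights or the Radon-Nikodym
              derivative of the target measure with respect to nu at the samples)
    """
    return sum(w * np.dot(np.atleast_1d(f(z)), np.atleast_1d(g(z)))
               for z, w in zip(points, weights))
```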
10.2 Subspaces of polynomials
Here we consider the special case when is a finite-dimensional subspace consisting of polynomial functions . Examining the expression in (22), it is evident that is also a polynomial function with degree not greater than that of . Thus, is also a space of polynomial functions with degree not exceeding the maximum degree in . Since a polynomial that vanishes on an open set must be identically zero, we can take the integrals defining the inner product in (88) over a cube, such as . This is convenient because polynomial integrals over the cube can be calculated analytically.
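For instance, assuming the cube is taken to be [-1, 1]^n, every monomial integral factors into one-dimensional integrals that are available in closed form, as in the sketch below.

```python
import numpy as np

def monomial_cube_integral(exponents):
    """Exact integral of x_1**a_1 * ... * x_n**a_n over the cube [-1, 1]^n.
    Each one-dimensional factor integrates to 0 for odd powers and 2/(a+1)
    for even powers, so the product is available in closed form."""
    result = 1.0
    for a in exponents:
        result *= 0.0 if a % 2 else 2.0 / (a + 1)
    return result

# Example: the integral of x**2 * y**4 over [-1, 1]^2 is (2/3) * (2/5) = 4/15.
assert abs(monomial_cube_integral([2, 4]) - 4.0 / 15.0) < 1e-12
```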
We can also use the sample-based inner product in (89) with randomly chosen points and positive weights . The following proposition tells us exactly how many sample points we need.
Proposition 21
This means that we can draw sample points independently from any absolutely continuous measure (such as a Gaussian distribution or the uniform distribution on a cube), and with probability one, (89) will be an inner product on . When consists of polynomials with degree at most , then taking
(91) |
is sufficient.
11 Generalization to sections of vector bundles
The machinery for promoting, discovering, and enforcing symmetry of maps between finite-dimensional vector spaces is a special case of more general machinery for sections of vector bundles presented here. Applications of this more general framework include studying the symmetries of vector fields, tensor fields, dynamical systems, and integral operators on manifolds with respect to nonlinear group actions (Abraham et al., 1988). We rely heavily on background, definitions, and results that can be found in Lee (2013) and Kolář et al. (1993).
First, we provide some background on smooth vector bundles that can be found in Lee (2013, Chapter 10). A rank- smooth vector bundle is a collection of -dimensional vector spaces , called “fibers”, organized smoothly over a base manifold . These fibers are organized by the “bundle projection” , a surjective map whose preimages are the fibers . Smoothness means that is a smooth submersion where is a smooth manifold covered by smooth local trivializations
with being open subsets covering . The transition functions between local trivializations are -linear, meaning that there are smooth matrix-valued functions satisfying
(92) |
for every and . The bundle with this structure is often denoted .
A “section” of the rank- vector bundle is a map satisfying . The space of (possibly rough) sections, denoted , is a vector space with addition and scalar multiplication defined pointwise in each fiber . We equip with the topology of pointwise convergence, making it into a locally-convex space. The space of sections possessing continuous derivatives is denoted , with the space of merely continuous sections being and the space of smooth sections being . A vector bundle and a section are depicted in Figure 7, along with the fundamental operators for a group action that we introduce below.
We consider a smooth “fiber-linear” right -action , meaning that every is a smooth vector bundle homomorphism. In other words, descends under the bundle projection to a unique smooth right -action so that the diagram
(93) |
commutes and the restricted maps are linear. We define what it means for a section to be symmetric with respect to this action as follows:
Definition 22
A section is equivariant with respect to a transformation if
(94) |
These transformations form a subgroup of denoted .
The operators are depicted in Figure 7. Thanks to the vector bundle homomorphism properties of , the operators are well-defined and linear. Moreover, they form a group under composition , with inverses given by .
The “infinitesimal generator” of the group action is the linear map defined by
(95) |
It turns out that this vector field is -related to (see Lemma 5.13 in Kolář et al. (1993), Lemma 20.14 in Lee (2013)), meaning that the flow of is given by
(96) |
Likewise, is the flow of , which is -related to .
Differentiating the smooth curves lying in for each gives rise to the Lie derivative along defined by
(97) |
where if and only if the limit converges in , i.e., pointwise. Note that we implicitly identify . This construction is illustrated in Figure 7. We recover (22) from (97) in the special case where a smooth function is viewed as a section of the bundle and acted upon by group representations. Critically, the Lie derivative , as defined above, is a linear operator on sections of the vector bundle . This allows us to formulate convex symmetry-promoting regularization functions as in Section 8 using Lie derivatives in the broader setting of vector bundle sections.
Remark 23 (Lie derivatives using flows)
Thanks to (96), the Lie derivative defined in (97) only depends on the infinitesimal generator , and its flow for small time . Hence, any vector field in whose flow is fiber-linear, but not necessarily defined for all , gives rise to an analogously-defined Lie derivative acting linearly on . These are the so-called “linear vector fields” described by Kolář et al. (1993) in Section 47.9. In fact, more general versions of the Lie derivative based on flows for maps between manifolds are described by Kolář et al. (1993) in Chapter 11. However, these generalizations are nonlinear operators, destroying the convex properties of the symmetry-promoting regularization functions in Section 8.
In addition to linearity, the key properties of the operators and for studying symmetries of sections are:
Proposition 24
Taken together, these results mean that and are (infinite-dimensional) representations of and in .
The main results of this section are the following two theorems. The first completely characterizes the identity component of by correspondence with its Lie subalgebra (see Theorem 19.26 in Lee (2013)). The second gives necessary and sufficient conditions for a section to be -equivariant.
Theorem 25
If is a continuous section, then is a closed, embedded Lie subgroup of whose Lie subalgebra is
(101) |
We give a proof in Appendix D.
Theorem 26
Suppose that has connected components with being the component containing the identity element. Let generate and let be elements from each non-identity component of . A continuous section is -equivariant if and only if
(102) |
If, in addition, we have
(103) |
then is -equivariant. We give a proof in Appendix E.
These results allow us to promote, enforce, and discover symmetries for sections of vector bundles in fundamentally the same way we did for maps between finite-dimensional vector spaces in Sections 6, 7, and 8. In particular, symmetry can be enforced through analogous linear constraints, discovered through nullspaces of analogous operators, and promoted through analogous convex penalties based on the nuclear norm.
Remark 27 (Left actions)
Theorems 26 and 25 hold without any modification for left G-actions . This is because we can define a corresponding right -action by with associated operators related by
(104) |
The symmetry group does not depend on whether it is defined by the condition or by . It is slightly less natural to work with left actions because and are Lie group and Lie algebra anti-homomorphisms, that is,
(105) |
11.1 Vector fields
Here we study the symmetries of a vector field under a right -action . This allows us to extend the discussion in Section 7.3 to dynamical systems described by vector fields on smooth manifolds and acted upon nonlinearly by arbitrary Lie groups. The tangent map of the diffeomorphism transforms vector fields via the pushforward map defined by
(106) |
for every .
Definition 28
Given , we say that a vector field is -invariant if and only if , that is,
(107) |
Because , it is clear that a vector field is -invariant if and only if it is -invariant.
Recall that vector fields are smooth sections of the tangent bundle . The right -action on induces a right -action on the tangent bundle defined by
(108) |
It is easy to see that each is a vector bundle homomorphism descending to under the bundle projection . Crucially, we have
(109) |
meaning that a vector field is -invariant if and only if it is -equivariant as a section of with respect to the action . Recall that (by Lemma 20.14 in Lee (2013)) the left-invariant vector field and its infinitesimal generator are -related, where is the orbit map. This means that is the time- flow of by Proposition 9.13 in Lee (2013). As a result, the Lie derivative in (97) agrees with the standard Lie derivative of along (see Lee (2013, p.228)), that is,
(110) |
where the expression on the right is the Lie bracket of and .
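As a concrete special case, when the underlying manifold is Euclidean space and the vector fields are given in coordinates, the Lie bracket on the right-hand side of (110) can be evaluated directly from Jacobians; the finite-difference sketch below is purely illustrative.

```python
import numpy as np

def lie_bracket(X, V, eps=1e-6):
    """Lie bracket [X, V] of two vector fields on R^n (callables x -> R^n),
    computed in coordinates as DV(x) X(x) - DX(x) V(x) with central
    finite-difference Jacobians."""
    def jacobian(F, x):
        n = x.size
        J = np.zeros((n, n))
        for j in range(n):
            e = np.zeros(n)
            e[j] = eps
            J[:, j] = (F(x + e) - F(x - e)) / (2.0 * eps)
        return J
    return lambda x: jacobian(V, x) @ X(x) - jacobian(X, x) @ V(x)

# Example: the rotation generator X(x, y) = (-y, x) commutes with the radial
# field V(x, y) = (x, y), so the bracket is numerically zero.
X = lambda p: np.array([-p[1], p[0]])
V = lambda p: np.array([p[0], p[1]])
print(lie_bracket(X, V)(np.array([1.0, 2.0])))   # approximately [0. 0.]
```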
11.2 Tensor fields
Symmetries of a tensor field can also be revealed using our framework. This will be useful for our study of Hamiltonian dynamics in Section 11.3 and for our study of integral operators, whose kernels can be viewed as tensor fields, in Section 11.4. For simplicity, we consider a rank- covariant tensor field , although our results extend to contravariant and mixed tensor fields with minimal modification. We rely on the basic definitions and machinery found in Lee (2013, Chapter 12). Under a right -action on , the tensor field transforms via the pullback map defined by
(111) |
for every .
Definition 29
Given , a tensor field is -invariant if and only if , that is,
(112) |
To study the invariance of tensor fields in our framework, we recall that a tensor field is a section of the tensor bundle , a vector bundle over , where is the dual space of . The right -action on induces a right -action defined by
(113) |
It is clear that each is a homomorphism of the vector bundle descending to under the bundle projection. Crucially, we have
(114) |
meaning that is -invariant if and only if it is -equivariant as a section of with respect to the action . Since gives the time- flow of the vector field , the Lie derivative in (97) for this action agrees with the standard Lie derivative of along (see Lee (2013, p.321)), that is
(115) |
The Lie derivative for arbitrary covariant tensor fields can be computed by applying Proposition 12.32 in Lee (2013) and its corollaries. More generally, thanks to Sections 6.16-6.18 in Kolář et al. (1993), the Lie derivative for any tensor product of sections of natural vector bundles can be computed via the formula
(116) |
For example, this holds when are arbitrary smooth tensor fields of mixed types. The Lie derivative of a differential form on can be computed by Cartan’s magic formula
(117) |
where is the exterior derivative.
11.3 Hamiltonian dynamics
The dynamics of frictionless mechanical systems can be described by Hamiltonian vector fields on symplectic manifolds. Roughly speaking, these encompass systems that conserve energy, such as motion of rigid bodies and particles interacting via conservative forces. The celebrated theorem of Noether (1918) says that conserved quantities of Hamiltonian systems correspond with symmetries of the energy function (the system’s Hamiltonian). In this section, we briefly illustrate how to enforce Hamiltonicity constraints on learned dynamical systems and how to promote, discover, and enforce conservation laws. A thorough treatment of classical mechanics, symplectic manifolds, and Hamiltonian systems can be found in Abraham and Marsden (2008); Marsden and Ratiu (1999). This includes methods for reduction of systems with known symmetries and conservation laws. The following brief introduction follows Chapter 22 of Lee (2013).
Hamiltonian systems are defined on symplectic manifolds, that is, smooth even-dimensional manifolds equipped with a smooth, nondegenerate, closed differential 2-form , called the symplectic form. Nondegeneracy means that the map is a bijective linear map of onto its dual for every . Closure means that , where is the exterior derivative. Thanks to nondegeneracy, any smooth function gives rise to a smooth vector field
(118) |
called the “Hamiltonian vector field” of . A vector field is said to be Hamiltonian if for some function , called the Hamiltonian of . A vector field is locally Hamiltonian if it is Hamiltonian in a neighborhood of each point of .
The symplectic manifolds considered in classical mechanics usually consist of the cotangent bundle of an -dimensional manifold describing the “configuration” of the system, e.g., the positions of particles. The cotangent bundle has a canonical symplectic form given by
(119) |
where are any choice of natural coordinates on a patch of (see Proposition 22.11 in Lee (2013)). Here, each is a generalized coordinate describing the configuration and is its “conjugate” or “generalized” momentum. The Darboux theorem (Theorem 22.13 in Lee (2013)) says that any symplectic form on a manifold can be put into the form of (119) by a choice of local coordinates. In these “Darboux” coordinates, the dynamics of a Hamiltonian system are governed by the equations
(120) |
which should be familiar to anyone who has studied undergraduate mechanics.
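A coordinate-level sketch of evaluating the Hamiltonian vector field of a given Hamiltonian in Darboux coordinates is shown below; the finite-difference gradients are illustrative, and an automatic-differentiation tool could be used instead.

```python
import numpy as np

def hamiltonian_vector_field(H, eps=1e-6):
    """Hamiltonian vector field in Darboux coordinates (q, p): returns the map
    (q, p) -> (dH/dp, -dH/dq), with gradients evaluated by central differences."""
    def X_H(q, p):
        n = q.size
        dH_dq = np.array([(H(q + eps * e, p) - H(q - eps * e, p)) / (2.0 * eps)
                          for e in np.eye(n)])
        dH_dp = np.array([(H(q, p + eps * e) - H(q, p - eps * e)) / (2.0 * eps)
                          for e in np.eye(n)])
        return dH_dp, -dH_dq        # (q_dot, p_dot)
    return X_H

# Example: the harmonic oscillator H = (|q|^2 + |p|^2)/2 gives q_dot = p, p_dot = -q.
H = lambda q, p: 0.5 * (np.sum(q**2) + np.sum(p**2))
print(hamiltonian_vector_field(H)(np.array([1.0]), np.array([0.0])))  # (~[0.], ~[-1.])
```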
Enforcing local Hamiltonicity on a vector field is equivalent to the linear constraint
(121) |
thanks to Proposition 22.17 in Lee (2013). Here is the Lie derivative of the tensor field , i.e., (115) with being the flow of and its generator being the identity . Note that the Lie derivative still makes sense even when the orbits are only defined for small . In Darboux coordinates, this constraint is equivalent to the set of equations
(122) |
for all . When the first de Rham cohomology group satisfies , for example when is contractible, local Hamiltonicity implies the existence of a global Hamiltonian for , unique on each component of up to addition of a constant by Lee (2013, Proposition 22.17).
Of course our approach also makes it possible to promote Hamiltonicity with respect to candidate symplectic structures when learning a vector field . To do this, we can penalize the nuclear norm of restricted to a subspace of candidate closed -forms using the regularization function
(123) |
The strength of this penalty can be increased when solving a regression problem for until there is a nondegenerate 2-form in the nullspace . This gives a symplectic form with respect to which is locally Hamiltonian.
Another option is to learn a (globally-defined) Hamiltonian function directly by fitting to data. In this case, we can regularize the learning problem by penalizing the breaking of conservation laws. The time-derivative of a quantity, that is, a smooth function under the flow of is given by the Poisson bracket
(124) |
Hence, is a conserved quantity if and only if is invariant under the flow of — this is Noether’s theorem. It is also evident that the Poisson bracket is linear with respect to both of its arguments. In fact, the Poisson bracket turns into a Lie algebra with being a Lie algebra homomorphism, i.e., .
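In Darboux coordinates the bracket takes the familiar canonical form, and checking whether it vanishes gives a direct numerical test of a candidate conservation law; the sketch below assumes this canonical form and uses finite differences for the gradients.

```python
import numpy as np

def canonical_poisson_bracket(f, h, eps=1e-6):
    """Canonical Poisson bracket {f, h} = sum_i (df/dq_i dh/dp_i - df/dp_i dh/dq_i)
    in Darboux coordinates, with gradients evaluated by central differences.
    If the bracket of f with the Hamiltonian vanishes identically, f is conserved."""
    def grad(F, q, p):
        n = q.size
        dq = np.array([(F(q + eps * e, p) - F(q - eps * e, p)) / (2.0 * eps)
                       for e in np.eye(n)])
        dp = np.array([(F(q, p + eps * e) - F(q, p - eps * e)) / (2.0 * eps)
                       for e in np.eye(n)])
        return dq, dp
    def bracket(q, p):
        fq, fp = grad(f, q, p)
        hq, hp = grad(h, q, p)
        return float(fq @ hp - fp @ hq)
    return bracket

# Example: angular momentum q1*p2 - q2*p1 is conserved by the rotation-invariant
# Hamiltonian H = (|q|^2 + |p|^2)/2, so the bracket is numerically zero.
L_ang = lambda q, p: q[0] * p[1] - q[1] * p[0]
H = lambda q, p: 0.5 * (np.sum(q**2) + np.sum(p**2))
print(canonical_poisson_bracket(L_ang, H)(np.array([1.0, 2.0]), np.array([0.3, -0.4])))  # ~0
```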
As a result of these basic properties of the Poisson bracket, the quantities conserved by a given Hamiltonian vector field form a Lie subalgebra given by the nullspace of a linear operator defined by
(125) |
To promote conservation of quantities in a given subalgebra when learning a Hamiltonian , we can penalize the nuclear norm of restricted to , that is
(126) |
For example, we might expect a mechanical system to conserve angular momentum about some axes, but not others due to applied torques. In the absence of data to the contrary, it often makes sense to assume that various linear and angular momenta are conserved.
11.4 Multilinear integral operators
In this section we provide machinery to study the symmetries of linear and nonlinear integral operators acting on sections of vector bundles, yielding far-reaching generalizations of our results in Section 6.2. Such operators can form the layers of neural networks acting on various vector and tensor fields supported on manifolds.
Let and be vector bundles with being -dimensional orientable Riemannian manifolds with volume forms , . Note that here, does not denote the exterior derivative of a -form. A continuous section of the bundle
(127) |
can be viewed as a continuous family of -multilinear maps
(128) |
The section can serve as the kernel to define an -multilinear integral operator with action on given by
(129) |
This operator is linear when . When and , (129) can be used to define a nonlinear integral operator with action .
Given fiber-linear right -actions , there is an induced fiber-linear right -action on defined by
(130) |
for viewed as an -multilinear map and . Sections transform according to
(131) |
with the section defining the integral kernel transforming according to
(132) |
Using these transformation laws, we define equivariance for the integral operator as follows:
Definition 30
This definition is equivalent to the following condition on the integral kernel:
Lemma 31
We note that is the natural transformation of the differential form (a covariant tensor field) described in Section 11.2. The Lie derivative of this action on volume forms is given by
(135) |
thanks to Cartan’s magic formula and the definition of divergence (see Lee (2013)). Therefore, differentiating (134) along the curve yields the Lie derivative
(136) |
Here, is the Lie derivative associated with (132). For the integral operators discussed in Section 6.2, the formulas derived here recover Eqs. 38 and 40.
12 Invariant submanifolds and tangency
Studying the symmetries of smooth maps can be cast into a more general framework in which we study the symmetries of submanifolds. Specifically, the symmetries of a smooth map between manifolds correspond to symmetries of its graph, , and the symmetries of a smooth section of a vector bundle correspond to symmetries of its image, — both of which are properly embedded submanifolds of and , respectively. We show that symmetries of a large class of submanifolds, including the above, are revealed by checking whether the infinitesimal generators of the group action are tangent to the submanifold. In this setting, the Lie derivative of has a geometric interpretation as a projection of the infinitesimal generator onto the tangent space of the image , viewed as a submanifold of the bundle.
12.1 Symmetry of submanifolds
In this section we study the infinitesimal conditions for a submanifold to be invariant under the action of a Lie group. Suppose that is a manifold and is a right action of a Lie group on . Sometimes we denote this action by when there is no ambiguity. Though our results also hold for left actions, as we discuss later in Remark 37, working with right actions is standard in this context and allows us to leverage results from Lee (2013) more naturally in our proofs. Fixing , the orbit map of this action is denoted . Fixing , the map defined by is a diffeomorphism with inverse .
Definition 32
A subset is -invariant if and only if for every and .
Sometimes we will denote , in which case -invariance of can be stated as .
We study the group invariance of submanifolds of the following type:
Definition 33
Let be a weakly embedded -dimensional submanifold of an -dimensional manifold . We say that is arcwise-closed if any smooth curve satisfying must also satisfy .
Submanifolds of this type include all properly embedded submanifolds of because properly embedded submanifolds are closed subsets (Proposition 5.5 in Lee (2013)). More interestingly, we have the following:
Proposition 34
The leaves of any (nonsingular) foliation of are arcwise-closed. We provide a proof in Appendix A.
This means that the kinds of submanifolds we are considering include all possible Lie subgroups (Lee (2013, Theorem 19.25)) as well as their orbits under free and proper group actions (Lee (2013, Proposition 21.7)). The leaves of singular foliations associated with integrable distributions of nonconstant rank (see Kolář et al. (1993, Sections 3.18–25)) can fail to be arcwise-closed. For example, the distribution spanned by the vector field on has maximal integral manifolds , , and forming a singular foliation of . Obviously, the leaves and are not arcwise-closed.
Given a submanifold and a candidate group of transformations, the following theorem describes the largest connected Lie subgroup of symmetries of the submanifold. Specifically, these symmetries can be identified by checking tangency conditions between infinitesimal generators and the submanifold.
Theorem 35
Let be an immersed submanifold of and let be a right action of a Lie group on with infinitesimal generator . Then
(137) |
is the Lie subalgebra of a unique connected Lie subgroup . If is weakly-embedded and arcwise-closed in , then this subgroup has the following properties:
- (i)
- (ii) If is a connected Lie subgroup of such that , then .
If is properly embedded in then is the identity component of the closed, properly embedded Lie subgroup
(138) |
A proof is provided in Appendix F.
Since the infinitesimal generator is a linear map and is a subspace of , the tangency conditions defining the Lie subalgebra (137) can be viewed as a set of linear equality constraints on the elements . Hence, can be computed as the nullspace of a positive semidefinite operator on , defined analogously to the case described earlier in Section 7.1.
The following theorem provides necessary and sufficient conditions for arcwise-closed weakly-embedded submanifolds to be -invariant. These are generally nonlinear constraints on the submanifold, regarded as the zero section of its normal bundle under identification with a tubular neighborhood. However, we will show in Section 12.2 that these become linear constraints recovering the Lie derivative when the submanifold in question is the image of a section of a vector bundle and the group action is fiber-linear.
Theorem 36
Let be an arcwise-closed weakly-embedded submanifold of and let be a right action of a Lie group on with infinitesimal generator . Let generate and let be elements from each non-identity component of . Then is -invariant if and only if
(139) |
If, in addition, we have for every , then is -invariant. A proof is provided in Appendix G.
12.2 The Lie derivative as a projection
We provide a geometric interpretation of the Lie derivative in (97) by expressing it in terms of a projection of the infinitesimal generator of the group action onto the tangent space of for smooth sections . This allows us to connect the Lie derivative to the tangency conditions for symmetry of submanifolds presented in Section 12.1.
The Lie derivative lies in , while is a subspace of . To relate quantities in these different spaces, the following lemma introduces a lifting of each to a subspace of .
Lemma 38
For every smooth section there is a well-defined injective vector bundle homomorphism that is expressed in any local trivialization as
(140) | ||||
We give a proof in Appendix H.
This is a special case of the “vertical lift” of into the vertical bundle described by Kolář et al. (1993) in Section 6.11. The “vertical projection” provides a left-inverse satisfying .
The following result relates the Lie derivative to a projection via the vertical lifting.
Theorem 39
Given a smooth section and , the map is a linear projection onto and for every we have
(141) |
We give a proof in Appendix H.
For the special case of smooth maps viewed as sections of the bundle , this theorem reproduces (55). The following corollary provides a link between our main results for sections of vector bundles and our main results for symmetries of submanifolds.
Corollary 40
For every smooth section , , and we have
(142) |
13 Conclusion
This paper provides a unified theoretical approach to enforce, discover, and promote symmetries in machine learning models. In particular, we provide theoretical foundations for Lie group symmetry in machine learning from a linear-algebraic viewpoint. This perspective unifies and generalizes several leading approaches in the literature, including approaches for incorporating and uncovering symmetries in neural networks and more general machine learning models. The central objects in this work are linear operators describing the finite and infinitesimal transformations of smooth sections of vector bundles with fiber-linear Lie group actions. To make the paper accessible to a wide range of practitioners, Sections 4–10 deal with the special case where the machine learning models are built using smooth functions between vector spaces. Our main results establish that the infinitesimal operators — the Lie derivatives — fully encode the connected subgroup of symmetries for sections of vector bundles (resp. functions between vector spaces). In other words, the Lie derivatives encode symmetries that the machine learning models are equivariant with respect to.
We illustrate that promoting and enforcing continuous symmetries in large classes of machine learning models are dual problems with respect to the bilinear structure of the Lie derivative. Moreover, these ideas extend naturally to identify continuous symmetries of arbitrary submanifolds, recovering the Lie derivative when the submanifold is the image of a section of a vector bundle (resp., the graph of a function between vector spaces). Using the fundamental operators, we also describe how symmetries can be promoted as inductive biases during training of machine learning models using convex penalties. Our numerical results show that minimizing these convex penalties can be used to recover highly symmetric polynomial functions using fewer samples than are required to determine the polynomial coefficients directly as the solution of a linear system. This reduction in sample complexity becomes more pronounced in higher dimensions and with increasing symmetry of the function to be recovered. Finally, we provide rigorous data-driven methods for discretizing and approximating the fundamental operators to accomplish the tasks of enforcing, promoting, and discovering symmetry. Importantly, these theoretical concepts, while extremely general, admit efficient computational implementations via simple linear algebra.
The main limitations of our approach come from the need to make appropriate choices for key objects including the candidate group , the space of functions defining the machine learning model, and appropriate inner products for discretizing the fundamental operators. For example, it is possible that the only -symmetric functions in are trivial, meaning that enforcing symmetry results in learning only trivial models. One open question is whether our framework can be used in such cases to learn relaxed symmetries, as described by Wang et al. (2022). In other words, we may hope to find elements in that are nearly symmetric, and to bound the degree of asymmetry based on the quantities derived from the fundamental operators, such as their norms. Additionally, the choice of inner products associated with the discretization of the fundamental operators could affect the results of nuclear norm penalization. Our reliance on the Lie algebra to study continuous symmetries also limits the ability of our proposed methods to account for partial symmetries, such as the invariance of the classification of the characters “Z” and “N” under rotations by small angles, but not by large angles.
In follow-up work, we aim to apply the proposed methods to a wide range of examples, and to explain practical implementation details. A main goal will be to study the extent to which nuclear norm relaxation can recover underlying symmetry groups and reduce the amount of data required to train accurate machine learning models on realistic data sets. Additionally, we will examine how the proposed techniques perform in the presence of noisy data, with the goal of understanding the empirical effects of problem dimension, noise level, and the candidate symmetry group.
Other important avenues of future work include investigating computationally efficient approaches to discretize the fundamental operators and use them to enforce, discover, and promote symmetry within our framework. This could involve leveraging the sparse structure of the discretized operators in certain bases to enable the use of efficient Krylov subspace algorithms. It will also be useful to identify efficient optimization algorithms for training symmetry-constrained or symmetry-regularized machine learning models. Promising candidates include projected gradient descent, proximal splitting algorithms, and the Iteratively Reweighted Least Squares (IRLS) algorithms described by Mohan and Fazel (2012). Using IRLS could enable symmetry-promoting penalties to be based on non-convex Schatten p-norms with p < 1, potentially improving the recovery of underlying symmetry groups compared to the nuclear norm, where p = 1.
There are also several avenues we plan to explore in future theoretical work. These include extending the techniques presented here via jet bundle prolongation, as described by Olver (1986), to study symmetries in machine learning for Partial Differential Equations (PDEs). Combining analogues of our proposed methods in this setting with techniques using the weak formulation proposed by Messenger and Bortz (2021); Reinbold et al. (2020) could provide robust ways to identify symmetric PDEs in the presence of high noise and limited training data. We also aim to study the perturbative effects of noisy data in algorithms to discover and promote symmetry with the goal of understanding the effects of problem dimension, noise level, and number of data points on recovery of symmetry groups. Another important direction of theoretical study will be to build on the work of Peitz et al. (2023); Steyert (2022) by studying symmetry in the setting of Koopman operators for dynamical systems. To do this, one might follow the program set forth by Colbrook (2023), where the measure preserving property of certain dynamical systems is exploited to enhance the Extended Dynamic Mode Decomposition (EDMD) algorithm of Williams et al. (2015).
Acknowledgements
The authors acknowledge support from the National Science Foundation AI Institute in Dynamic Systems (grant number 2112085). SLB acknowledges support from the Army Research Office (ARO W911NF-19-1-0045) and the Boeing Company. The authors would also like to acknowledge valuable discussions with Tess Smidt and Matthew Colbrook.
References
- Abraham et al. (1988) R. Abraham, J. E. Marsden, and T. Ratiu. Manifolds, Tensor Analysis, and Applications, volume 75 of Applied Mathematical Sciences. Springer-Verlag, 1988.
- Abraham and Marsden (2008) Ralph Abraham and Jerrold E Marsden. Foundations of mechanics. AMS Chelsea Publishing, 2 edition, 2008.
- Agrawal et al. (2018) Akshay Agrawal, Robin Verschueren, Steven Diamond, and Stephen Boyd. A rewriting system for convex optimization problems. Journal of Control and Decision, 5(1):42–60, 2018.
- Ahmadi and Khadir (2020) Amir Ali Ahmadi and Bachir El Khadir. Learning dynamical systems with side information. In Proceedings of the 2nd Conference on Learning for Dynamics and Control, volume 120 of Proceedings of Machine Learning Research, pages 718–727. PMLR, 10–11 Jun 2020. URL https://proceedings.mlr.press/v120/ahmadi20a.html.
- Akhound-Sadegh et al. (2024) Tara Akhound-Sadegh, Laurence Perreault-Levasseur, Johannes Brandstetter, Max Welling, and Siamak Ravanbakhsh. Lie point symmetry and physics-informed networks. Advances in Neural Information Processing Systems, 36, 2024.
- Baddoo et al. (2023) Peter J Baddoo, Benjamin Herrmann, Beverley J McKeon, J Nathan Kutz, and Steven L Brunton. Physics-informed dynamic mode decomposition. Proceedings of the Royal Society A, 479(2271):20220576, 2023.
- Batzner et al. (2022) Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E Smidt, and Boris Kozinsky. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature communications, 13(1):2453, 2022.
- Benton et al. (2020) Gregory Benton, Marc Finzi, Pavel Izmailov, and Andrew G Wilson. Learning invariances in neural networks from training data. Advances in neural information processing systems, 33:17605–17616, 2020.
- Berry and Giannakis (2020) Tyrus Berry and Dimitrios Giannakis. Spectral exterior calculus. Communications on Pure and Applied Mathematics, 73(4):689–770, 2020.
- Boullé and Townsend (2023) Nicolas Boullé and Alex Townsend. A mathematical guide to operator learning. arXiv preprint arXiv:2312.14688, 2023.
- Bouwmans et al. (2018) Thierry Bouwmans, Sajid Javed, Hongyang Zhang, Zhouchen Lin, and Ricardo Otazo. On the applications of robust PCA in image and video processing. Proceedings of the IEEE, 106(8):1427–1457, 2018.
- Brandstetter et al. (2022) Johannes Brandstetter, Max Welling, and Daniel E Worrall. Lie point symmetry data augmentation for neural pde solvers. In International Conference on Machine Learning, pages 2241–2256. PMLR, 2022.
- Brunton and Kutz (2022) S. L. Brunton and J. N. Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2nd edition, 2022.
- Brunton et al. (2016) Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the national academy of sciences, 113(15):3932–3937, 2016.
- Brunton et al. (2020) Steven L. Brunton, Bernd R. Noack, and Petros Koumoutsakos. Machine learning for fluid mechanics. Annual Review of Fluid Mechanics, 52:477–508, 2020.
- Brunton et al. (2022) Steven L Brunton, Marko Budišić, Eurika Kaiser, and J Nathan Kutz. Modern Koopman theory for dynamical systems. SIAM Review, 64(2):229–340, 2022.
- Cahill et al. (2023) Jameson Cahill, Dustin G Mixon, and Hans Parshall. Lie PCA: Density estimation for symmetric manifolds. Applied and Computational Harmonic Analysis, 2023.
- Callaham et al. (2022) Jared L Callaham, Georgios Rigas, Jean-Christophe Loiseau, and Steven L Brunton. An empirical mean-field model of symmetry-breaking in a turbulent wake. Science Advances, 8(eabm4786), 2022.
- Candès and Recht (2009) Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9:717–772, 2009.
- Candès et al. (2011) Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):1–37, 2011.
- Caron and Traynor (2005) Richard Caron and Tim Traynor. The zero set of a polynomial. WSMR Report 05-02, 2005. URL https://www.researchgate.net/profile/Richard-Caron-3/publication/281285245_The_Zero_Set_of_a_Polynomial/links/55df56b608aecb1a7cc1a043/The-Zero-Set-of-a-Polynomial.pdf.
- Champion et al. (2020) Kathleen Champion, Peng Zheng, Aleksandr Y Aravkin, Steven L Brunton, and J Nathan Kutz. A unified sparse optimization framework to learn parsimonious physics-informed models from data. IEEE Access, 8:169259–169271, 2020.
- Chen et al. (2020) Shuxiao Chen, Edgar Dobriban, and Jane H. Lee. A group-theoretic framework for data augmentation. J. Mach. Learn. Res., 21(1), jan 2020. ISSN 1532-4435.
- Cohen and Welling (2016) Taco Cohen and Max Welling. Group equivariant convolutional networks. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2990–2999, New York, New York, USA, 20–22 Jun 2016. PMLR. URL https://proceedings.mlr.press/v48/cohenc16.html.
- Cohen et al. (2018) Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. arXiv preprint arXiv:1801.10130, 2018.
- Cohen et al. (2019) Taco S Cohen, Mario Geiger, and Maurice Weiler. A general theory of equivariant CNNs on homogeneous spaces. Advances in neural information processing systems, 32, 2019.
- Colbrook (2023) Matthew J Colbrook. The mpEDMD algorithm for data-driven computations of measure-preserving dynamical systems. SIAM Journal on Numerical Analysis, 61(3):1585–1608, 2023.
- Cubuk et al. (2019) Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 113–123, 2019.
- Desai et al. (2022) Krish Desai, Benjamin Nachman, and Jesse Thaler. Symmetry discovery with deep learning. Physical Review D, 105(9):096031, 2022.
- Diamond and Boyd (2016) Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
- Esteves et al. (2018) Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–68, 2018.
- Finzi et al. (2020) Marc Finzi, Samuel Stanton, Pavel Izmailov, and Andrew Gordon Wilson. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. In International Conference on Machine Learning, pages 3165–3176. PMLR, 2020.
- Finzi et al. (2021) Marc Finzi, Max Welling, and Andrew Gordon Wilson. A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups. In International Conference on Machine Learning, pages 3318–3328. PMLR, 2021.
- Fukushima (1980) Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological cybernetics, 36(4):193–202, 1980.
- Goswami et al. (2023) Somdatta Goswami, Aniruddha Bora, Yue Yu, and George Em Karniadakis. Physics-informed deep neural operator networks. In Machine Learning in Modeling and Simulation: Methods and Applications, pages 219–254. Springer, 2023.
- Gotô (1950) Morikuni Gotô. Faithful representations of Lie groups II. Nagoya mathematical journal, 1:91–107, 1950.
- Gross (2011) David Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548–1566, 2011.
- Gross (1996) David J Gross. The role of symmetry in fundamental physics. Proceedings of the National Academy of Sciences, 93(25):14256–14259, 1996.
- Gruver et al. (2022) Nate Gruver, Marc Anton Finzi, Micah Goldblum, and Andrew Gordon Wilson. The Lie derivative for measuring learned equivariance. In The Eleventh International Conference on Learning Representations, 2022.
- Guan et al. (2021) Yifei Guan, Steven L Brunton, and Igor Novosselov. Sparse nonlinear models of chaotic electroconvection. Royal Society Open Science, 8(8):202367, 2021.
- Hall (2015) Brian C. Hall. Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Springer, 2015.
- Hataya et al. (2020) Ryuichiro Hataya, Jan Zdenek, Kazuki Yoshizoe, and Hideki Nakayama. Faster autoaugment: Learning augmentation strategies using backpropagation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 1–16. Springer, 2020.
- Holmes et al. (2012) P. J. Holmes, J. L. Lumley, G. Berkooz, and C. W. Rowley. Turbulence, coherent structures, dynamical systems and symmetry. Cambridge Monographs in Mechanics. Cambridge University Press, Cambridge, England, 2nd edition, 2012.
- Horn and Johnson (2013) Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2 edition, 2013.
- Kaiser et al. (2018) Eurika Kaiser, J Nathan Kutz, and Steven L Brunton. Discovering conservation laws from data for control. In 2018 IEEE Conference on Decision and Control (CDC), pages 6415–6421. IEEE, 2018.
- Kaiser et al. (2021) Eurika Kaiser, J Nathan Kutz, and Steven L Brunton. Data-driven discovery of koopman eigenfunctions for control. Machine Learning: Science and Technology, 2(3):035023, 2021.
- Kolář et al. (1993) Ivan Kolář, Peter W. Michor, and Jan Slovák. Natural Operations in Differential Geometry. Springer-Verlag, 1993.
- Kondor and Trivedi (2018) Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In International Conference on Machine Learning, pages 2747–2755. PMLR, 2018.
- Koopman (1931) B. O. Koopman. Hamiltonian systems and transformations in Hilbert space. Proceedings of the National Academy of Sciences, 17:315–318, 1931.
- Koralov and Sinai (2012) Leonid B. Koralov and Yakov G. Sinai. Theory of Probability and Random Processes. Springer, 2 edition, 2012.
- Kovachki et al. (2023) Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89):1–97, 2023.
- LeCun et al. (1989) Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.
- Lee (2013) John M. Lee. Introduction to Smooth Manifolds: Second Edition. Springer, 2013.
- Lezcano-Casado and Martínez-Rubio (2019) Mario Lezcano-Casado and David Martínez-Rubio. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. In International Conference on Machine Learning, pages 3794–3803. PMLR, 2019.
- Liu and Tegmark (2022) Ziming Liu and Max Tegmark. Machine learning hidden symmetries. Phys. Rev. Lett., 128:180201, May 2022. doi: 10.1103/PhysRevLett.128.180201. URL https://link.aps.org/doi/10.1103/PhysRevLett.128.180201.
- Loiseau and Brunton (2018) J.-C. Loiseau and S. L. Brunton. Constrained sparse Galerkin regression. Journal of Fluid Mechanics, 838:42–67, 2018.
- Maron et al. (2018) Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902, 2018.
- Marsden and Ratiu (1999) J. E. Marsden and T. S. Ratiu. Introduction to mechanics and symmetry. Springer-Verlag, 2nd edition, 1999.
- Mauroy et al. (2020) Alexandre Mauroy, Y Susuki, and I Mezić. Koopman operator in systems and control. Springer, 2020.
- Messenger and Bortz (2021) Daniel A Messenger and David M Bortz. Weak SINDy: Galerkin-based data-driven model selection. Multiscale Modeling & Simulation, 19(3):1474–1497, 2021.
- Meyer (2000) Carl D Meyer. Matrix analysis and applied linear algebra, volume 71. SIAM, 2000.
- Mezić (2005) Igor Mezić. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41:309–325, 2005.
- Miao and Rao (2007) Xu Miao and Rajesh PN Rao. Learning the Lie groups of visual invariance. Neural computation, 19(10):2665–2693, 2007.
- Mohan and Fazel (2012) Karthik Mohan and Maryam Fazel. Iterative reweighted algorithms for matrix rank minimization. Journal of Machine Learning Research, 13(1):3441–3473, 2012.
- Moskalev et al. (2022) Artem Moskalev, Anna Sepliarskaia, Ivan Sosnovik, and Arnold Smeulders. LieGG: Studying learned Lie group generators. Advances in Neural Information Processing Systems, 35:25212–25223, 2022.
- Noether (1918) E. Noether. Invariante Variationsprobleme. Nachr. d. König. Gesellsch. d. Wiss. zu Göttingen, Math-Phys. Klasse, 1918:235–257, 1918. English reprint: physics/0503066, http://dx.doi.org/10.1080/00411457108231446.
- Olver (1986) Peter J. Olver. Applications of Lie Groups to Differential Equations. Springer, 1986.
- Otto and Rowley (2021) Samuel E Otto and Clarence W Rowley. Koopman operators for estimation and control of dynamical systems. Annual Review of Control, Robotics, and Autonomous Systems, 4:59–87, 2021.
- Peitz et al. (2023) Sebastian Peitz, Hans Harder, Feliks Nüske, Friedrich Philipp, Manuel Schaller, and Karl Worthmann. Partial observations, coarse graining and equivariance in Koopman operator theory for large-scale dynamical systems. arXiv preprint arXiv:2307.15325, 2023.
- Raissi et al. (2019) Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
- Rao and Ruderman (1999) Rajesh Rao and Daniel Ruderman. Learning Lie groups for invariant visual perception. Advances in neural information processing systems, 11, 1999.
- Recht et al. (2010) Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
- Reinbold et al. (2020) Patrick AK Reinbold, Daniel R Gurevich, and Roman O Grigoriev. Using noisy or incomplete data to discover models of spatiotemporal dynamics. Physical Review E, 101(010203), 2020.
- Romero and Lohit (2022) David W Romero and Suhas Lohit. Learning partial equivariances from data. Advances in Neural Information Processing Systems, 35:36466–36478, 2022.
- Rowley et al. (2003) Clarence W. Rowley, Ioannis G. Kevrekidis, Jerrold E. Marsden, and Kurt Lust. Reduction and reconstruction for self-similar dynamical systems. Nonlinearity, 16(4):1257, 2003.
- Shorten and Khoshgoftaar (2019) Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of big data, 6(1):1–48, 2019.
- Steyert (2022) Vivian T Steyert. Uncovering Structure with Data-driven Reduced-Order Modeling. PhD thesis, Princeton University, 2022.
- Tao (2011) Terence Tao. Two small facts about Lie groups. https://terrytao.wordpress.com/2011/06/25/two-small-facts-about-lie-groups/, 6 2011.
- Van Dyk and Meng (2001) David A Van Dyk and Xiao-Li Meng. The art of data augmentation. Journal of Computational and Graphical Statistics, 10(1):1–50, 2001.
- Varadarajan (1984) V. S. Varadarajan. Lie groups, Lie algebras, and their representations. Springer, 1984.
- Wang et al. (2022) Rui Wang, Robin Walters, and Rose Yu. Approximately equivariant networks for imperfectly symmetric dynamics. In International Conference on Machine Learning, pages 23078–23091. PMLR, 2022.
- Weiler and Cesa (2019) Maurice Weiler and Gabriele Cesa. General E(2)-equivariant steerable CNNs. Advances in neural information processing systems, 32, 2019.
- Weiler et al. (2018) Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco S Cohen. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. Advances in Neural Information Processing Systems, 31, 2018.
- Williams et al. (2015) Matthew O Williams, Ioannis G Kevrekidis, and Clarence W Rowley. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, 25(6):1307–1346, 2015.
- Yang et al. (2023a) Jianke Yang, Nima Dehmamy, Robin Walters, and Rose Yu. Latent space symmetry discovery. arXiv preprint arXiv:2310.00105, 2023a.
- Yang et al. (2023b) Jianke Yang, Robin Walters, Nima Dehmamy, and Rose Yu. Generative adversarial symmetry discovery. arXiv preprint arXiv:2302.00236, 2023b.
- Yang et al. (2024) Jianke Yang, Wang Rao, Nima Dehmamy, Robin Walters, and Rose Yu. Symmetry-informed governing equation discovery. arXiv preprint arXiv:2405.16756, 2024.
- Yuan and Lin (2006) Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67, 2006.
Appendix A Proofs of minor results
Proof [Proposition 6] Obviously, if then . On the other hand, suppose that for some . Hence, there are vectors and such that . This remains true for all in a neighborhood of by continuity of and . Letting where is a smooth, nonnegative function with and support in , we obtain
(143) |
meaning .
Therefore, if and only if .
We use the following lemma in the proof of Proposition 11.
Lemma 41
Suppose that is a convergent sequence of matrices and when . Then there is an integer such that for every we have .
Proof Since the sequence of matrices acts on a finite-dimensional space, is a monotone bounded sequence of integers. Therefore, there exists an integer such that for every . Since , we must have for every . Since by assumption, it remains to show the reverse containment. Suppose , then
(144) |
meaning that .
Proof [Proposition 11] By the Cauchy-Schwarz inequality our assumption means that for every . Let be a basis for . By the strong law of large numbers, specifically Theorem 7.7 in Koralov and Sinai (2012), we have
(145) |
for every almost surely.
Consequently, almost surely.
By nonnegativity of each term in the sum defining , it follows that when .
Moreover, if then it follows from the continuity of that for every .
Hence, for every .
Therefore, and obey the hypotheses of Lemma 41 almost surely and the conclusion follows.
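The almost-sure stabilization of the nullspace can be observed numerically. The following is a minimal sketch only: the invariant function f(x) = ‖x‖² on R³, the elementary basis of gl(3) as candidate generators, and the NumPy workflow are illustrative choices of ours, not objects taken from the proposition. The nullspace of the empirical Gram matrix of Lie-derivative features settles at the skew-symmetric generators once enough i.i.d. samples are drawn.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
# elementary basis E_{jk} of gl(3): the candidate generators
basis = []
for j in range(n):
    for k in range(n):
        E = np.zeros((n, n))
        E[j, k] = 1.0
        basis.append(E)

def lie_derivative(A, x):
    # L_A f(x) = grad f(x) . (A x) for the illustrative choice f(x) = ||x||^2
    return 2.0 * x @ (A @ x)

for N in (2, 6, 50, 500):
    X = rng.standard_normal((N, n))
    Phi = np.array([[lie_derivative(A, x) for A in basis] for x in X])
    K = Phi.T @ Phi / N                       # empirical Gram matrix of Lie-derivative features
    nullity = int(np.sum(np.linalg.eigvalsh(K) < 1e-10))
    print(N, nullity)                         # settles at dim so(3) = 3 once N >= 6
```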
Proof [Proposition 17] Consider the function defined by
(146) |
with standard action of on its domain and the trivial action on its codomain. The symmetries of are shared by . By Theorem 4, the Lie algebra of ’s symmetry group is characterized by
(147) |
This means the generators are characterized by the equations
(148) |
Since , we have , called “skew symmetry”, giving . The above is satisfied if and only if
(149) |
which automatically yields . Therefore, . To determine the dimension of the symmetry group, we observe that must satisfy
(150) |
and any such uniquely determines . Therefore, the dimension of equals the dimension of the space of skew-symmetric matrices satisfying (150). Let the columns of form an orthonormal basis for with the columns of being a basis for . The above constraints, together with skew-symmetry, mean that takes the form
(151) |
where is a skew-symmetric matrix. Therefore, the dimension of equals the dimension of the space of skew-symmetric matrices, which is .
The argument for is similar, with the symmetries of
(152) |
also being symmetries of . The condition is equivalent to
(153) |
which occurs if and only if and . This immediately yields . Per our earlier argument, the skew-symmetric matrices satisfying
(154) |
form a vector space with dimension . The subspace of vectors satisfying is -dimensional. Adding these gives the dimension of , which is .
Suppose there exists a polynomial with degree such that when . Let be a complementary subspace to in , that is, . We observe that if and only if there is a nonzero satisfying . The “if” part of this claim is obvious. The “only if” part follows from the fact that means that for some nonzero . Using the direct-sum decomposition of , there are unique and such that , yielding
(155) |
Moreover, because . Letting form a basis for , we consider the Gram matrix with entries
(156) |
This matrix is singular if and only if there is a nonzero satisfying for every in the cube . Since
(157) |
is a polynomial function of , it vanishes in the cube if and only if it vanishes everywhere. Hence is singular if and only if . Letting denote the vector of coefficients defining in a basis for the polynomials of degree on , we observe that
(158) |
is a polynomial function of .
The set of polynomials with degree for which corresponds to the zero level set of , i.e., those such that .
Obviously, , and taking the coefficients corresponding to gives , meaning is a nonconstant polynomial.
Since each level set of a nonconstant polynomial is a set of measure zero (Caron and Traynor (2005)), it follows that the zero level set of has measure zero.
Precisely the same argument works for , , and .
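The dimension count used above, namely that the skew-symmetric matrices form the nullspace of the symmetrization map and have dimension n(n-1)/2, can be checked directly. The sketch below is an illustration under our own conventions (vectorized matrices and NumPy), not a construction from the proof.

```python
import numpy as np

n = 5
# matrix of the symmetrization map A -> A + A^T acting on vec(A);
# its nullspace is exactly the space of skew-symmetric matrices
S = np.zeros((n * n, n * n))
for j in range(n):
    for k in range(n):
        S[j * n + k, j * n + k] += 1.0
        S[j * n + k, k * n + j] += 1.0

nullity = n * n - np.linalg.matrix_rank(S)
print(nullity, n * (n - 1) // 2)   # both equal 10 when n = 5
```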
Proof [Corollary 18] By Proposition 17, it suffices to find a degree- polynomial such that . We choose , giving
(159) |
If generates a symmetry of then
(160) |
The terms in this expression with highest degree in must vanish, yielding
(161) |
This implies that . Proceeding inductively, suppose that for every . Then, requiring the highest-degree term in (160) to vanish gives
(162) |
implying that . It follows that
(163) |
by induction, meaning that .
Hence, , which completes the proof.
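The argument above treats symmetry discovery as a linear-algebraic nullspace problem, which can be illustrated numerically. In the sketch below the polynomial f(x) = x1² + 2x2² and the sampling scheme are hypothetical choices of ours; the nullspace of the sampled Lie-derivative constraints recovers the one-parameter symmetry algebra of this anisotropic quadratic, up to scale.

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_f(x):
    # gradient of the illustrative polynomial f(x) = x1^2 + 2 x2^2
    return np.array([2.0 * x[0], 4.0 * x[1]])

# constraint rows: vec(A) |-> grad f(x)^T A x, evaluated at sampled points
X = rng.standard_normal((50, 2))
C = np.array([np.outer(grad_f(x), x).ravel() for x in X])

_, s, Vt = np.linalg.svd(C)
null_dim = 4 - int(np.sum(s > 1e-10))
A = Vt[-1].reshape(2, 2)           # the discovered generator, up to scale and sign
print(null_dim)                    # 1
print(A / A[1, 0])                 # proportional to [[0, -2], [1, 0]]
```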
Proof [Corollary 19] By Proposition 17, it suffices to find a quadratic polynomial such that . Letting and , consider the quadratic function , giving
(164) |
If generates a symmetry of then
(165) |
Differentiating the above with respect to at yields , which, because is injective, means that . The fact that for every means that , i.e.,
(166) |
Letting the columns of span the orthogonal complement to columns of and expressing
(167) |
the above commutation relation with gives
(168) |
Multiplying on the left and right by combinations of or and or extracts the relations
(169) |
Since is invertible, we must have and . Since , we must also have , meaning that its diagonal entries are identically zero. Considering the element of with , we have
(170) |
meaning that .
Therefore, only can be nonzero, which gives .
Combined with the fact that , we conclude that , completing the proof.
Proof [Proposition 16]
As an intersection of closed subgroups, is a closed subgroup of .
By the closed subgroup theorem (see Theorem 20.12 in Lee (2013)), is an embedded Lie subgroup, whose Lie subalgebra we denote by .
If then for all and every .
Differentiating at proves that , i.e., by Theorem 4.
Conversely, if for every , then by Theorem 4, .
Since is a Lie subgroup, differentiating at proves that .
Proof [Proposition 20] Let be a basis for . Consider the sequence of Gram matrices with entries
(171) |
It suffices to show that is positive-definite for sufficiently large . Since the inner product is positive-definite on , it follows that the Gram matrix with entries
(172) |
is positive-definite.
Hence, its smallest eigenvalue is positive.
Since the ordered eigenvalues of symmetric matrices are continuous with respect to their entries (see Corollary 4.3.15 in Horn and Johnson (2013)) and for all by assumption, we have as .
Therefore, there is an so that for every we have , i.e., is positive-definite.
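The eigenvalue-continuity argument in this proof can be visualized with a small Monte-Carlo experiment. The basis of monomials on [0, 1] and the uniform sampling below are illustrative assumptions of ours; the smallest eigenvalue of the empirical Gram matrix is positive for generic samples and converges to the smallest eigenvalue of the limiting (Hilbert) Gram matrix, which is positive.

```python
import numpy as np

rng = np.random.default_rng(2)
# an illustrative basis of functions on [0, 1] that is linearly independent in L^2
funcs = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2, lambda t: t**3]

for N in (4, 20, 200, 2000):
    t = rng.uniform(0.0, 1.0, size=N)
    F = np.column_stack([f(t) for f in funcs])
    G = F.T @ F / N                          # empirical Gram matrix from N samples
    print(N, np.linalg.eigvalsh(G)[0])       # smallest eigenvalue; positive for N >= 4 and
                                             # approaching that of the 4x4 Hilbert matrix (~1e-4)
```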
Proof [Lemma 31] Using the fact that the integral is invariant under pullbacks by diffeomorphisms, we can express the left-hand side of the equivariance condition in Definition 30 as
(173) |
Hence, by comparing the integrand to (129), it is clear that (134) implies that is equivariant in the sense of Definition 30. Conversely, if is -equivariant, then
(174) |
holds for every .
Since the domain contains all smooth, compactly-supported fields , it follows that (134) holds.
Proof [Proposition 34] Consider a leaf of an -dimensional foliation on the -dimensional manifold and let be a smooth curve satisfying . First, it is clear that is a weakly embedded submanifold of since is an integral manifold of an involutive distribution (Lee (2013, Proposition 19.19)) and the local structure theorem for integral manifolds (Lee (2013, Proposition 19.16)) shows that they are weakly embedded.
By continuity of , any neighborhood of in must have nonempty intersection with . By definition of a foliation (see Lee (2013)), there is a coordinate chart for with such that is a coordinate-aligned cube in and consists of countably many slices of the form for constants . Since is continuous, there is a so that , and in particular, . By continuity of , there are constants such that for every and . Hence, we have
(175) |
meaning that .
An analogous argument shows that , completing the proof that is arcwise-closed.
Appendix B Proof of Proposition 21
Our proof relies on the following lemma:
Lemma 42
Let denote a finite-dimensional vector space of polynomials . If then the evaluation map defined by
(176) |
is injective for almost every with respect to Lebesgue measure.
Proof Letting and choosing a basis for , injectivity of is equivalent to injectivity of the matrix
(177) |
Finally, this is equivalent to
(178) |
taking a nonzero value. We observe that is a polynomial on the Euclidean space .
Suppose there exists a set of points such that is injective. Then for this set . Obviously, , meaning that cannot be constant. Thanks to the main result in Caron and Traynor (2005), this means that each level set of has zero Lebesgue measure in . In particular, the level set , consisting of those for which fails to be injective, has zero Lebesgue measure. Therefore, it suffices to prove that there exists such that is injective. We do this by induction.
It is clear that there exists so that the matrix
(179) |
has full rank since cannot be the zero polynomial. Proceeding by induction, we assume that there exists so that
(180) |
has full rank. Suppose that the matrix
(181) |
has rank for every . Since the upper left block of is , we must always have . The nullspace of is contained in the nullspace of the upper block of . Since both nullspaces are one-dimensional, they are equal. The upper block of does not depend on , so there is a fixed nonzero vector so that for every . The last row of this expression reads
(182) |
contradicting the linear independence of .
Therefore there exists so that has full rank.
It follows by induction on that there exists so that has full rank.
Choosing any additional vectors yields an injective , which completes the proof.
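Lemma 42 can be illustrated with a Vandermonde-type computation. The sketch below assumes the one-dimensional space of polynomials of degree at most three and Gaussian sample points, both our own illustrative choices; for generic points the evaluation matrix has full column rank, so the evaluation map is injective.

```python
import numpy as np

rng = np.random.default_rng(3)
# illustrative space F: polynomials of degree <= 3 in one variable, dim F = 4
monomials = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2, lambda t: t**3]

M = 4                                        # number of evaluation points, M >= dim F
pts = rng.standard_normal(M)                 # "almost every" choice of points works
E = np.column_stack([m(pts) for m in monomials])   # evaluation map in the monomial basis
print(np.linalg.matrix_rank(E))              # 4, so evaluation at these points is injective
```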
Proof [Proposition 21] The sum in (89) clearly defines a symmetric, positive-semidefinite bilinear form on . It remains to show that this bilinear form is positive-definite. Suppose that there is a function such that . Thanks to Lemma 42, our assumption that means that the evaluation operator is injective on for almost every with respect to Lebesgue measure. Since a countable (in this case finite) union of sets of measure zero has measure zero, it follows that for almost every with respect to Lebesgue measure, is injective on every , . Defining the positive diagonal matrix
(183) |
and using (89) yields
(184) |
This implies that for .
Since is injective on each it follows that each , meaning that .
This completes the proof.
Appendix C Proof of Proposition 24
We begin by proving
(185) |
for every . To prove the first equality, we choose , let , and compute
(186) |
Here, we have used the composition law for the operators and the fact that is fiber-linear. Taking the limit at yields
(187) |
which is the first equality in (185).
Next, we prove
(189) |
when . To do this, we choose , and define the map by
(190) |
As a composition of maps, is , and its derivative at the identity is
(191) |
for every . Since the derivative is linear, it follows that is linear.
Finally, we prove that
(192) |
when . Recall that gives the flow of the left-invariant vector field (see Theorem 4.18(3) in Kolář et al. (1993)). By Theorem 3.16 in Kolář et al. (1993) the curve given by
(193) |
satisfies , , and
(194) |
in the sense that is a derivation on , hence an element of . Composing with the map in (190) yields
(195) |
Combining (194) and (191) (noting the definition of the tangent map acting on derivations, as in Kolář et al. (1993), Lee (2013)) gives
(196) |
This proves the first equality in (192) thanks to the composition law
(197) |
To differentiate the above expression, we use the following observations. If is such that is , then obviously with the usual identification . Moreover, we have
(198) |
because for all and is linear on . Using this, we obtain
(199) |
because lies in the vector space , allowing us to exchange the order of differentiation. Since lies in the vector space for all , we can apply the chain rule and (198) to obtain
(200) |
Using (185) gives
(201) |
Applying the same technique to differentiate a second time and using the linearity in (189) to cancel terms yields
(202) |
which completes the proof.
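The differentiation identities established in this appendix reduce, in the simplest scalar setting, to the statement that the derivative of t ↦ f(exp(tA)x) at t = 0 is the Lie derivative ∇f(x)·(Ax). The sketch below is only a numerical sanity check of that special case, with the function f, the generator A, and the use of SciPy's expm all being our own illustrative assumptions rather than objects from the proof.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
A = A - A.T                                  # an arbitrary generator in so(3)
x = rng.standard_normal(n)

f = lambda z: np.sin(z[0]) * z[1] + z[2] ** 2
grad_f = lambda z: np.array([np.cos(z[0]) * z[1], np.sin(z[0]), 2.0 * z[2]])

eps = 1e-6
# central finite difference of t |-> f(exp(tA) x) at t = 0
fd = (f(expm(eps * A) @ x) - f(expm(-eps * A) @ x)) / (2.0 * eps)
print(fd, grad_f(x) @ (A @ x))               # the two numbers agree to roughly 1e-9
```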
Appendix D Proof of Theorem 25
We begin by showing that is a closed subgroup of . It is obviously a subgroup, for if then
(203) |
meaning that . To show that is closed, we observe that for each , the map defined by
(204) |
is continuous, as it is a composition of continuous maps. As is a single point in , the preimage set is closed in . Since is an intersection,
(205) |
of closed sets, it follows that is closed in . By the closed subgroup theorem (Theorem 20.12 in Lee (2013)) it follows that is an embedded Lie subgroup of .
Let be the Lie algebra of . Choosing any we have for every , yielding
(206) |
Hence, and , meaning that , as defined by (101).
Appendix E Proof of Theorem 26
If is -equivariant, then for all and . Differentiating with respect to at gives .
Conversely, suppose that for a collection of generators of . By Theorem 25, is a closed Lie subgroup of whose Lie subalgebra contains . Since generate , it follows that . This means that due to the correspondence between connected Lie subgroups and their Lie subalgebras established by Theorem 19.26 in Lee (2013). Specifically, the identity component of must correspond to since both are connected Lie subgroups of with identical Lie subalgebras.
Now, let us suppose in addition that for an element from each non-identity component , of . By Proposition 7.15 in Lee (2013), is a normal subgroup of and every connected component of is diffeomorphic to . In fact, the proof of this result shows that every connected component of is a coset of , meaning that . Choosing any there is an element such that and we obtain
(208) |
This completes the proof because .
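The practical content of this theorem, namely that equivariance can be certified by checking the infinitesimal condition on generators plus one group element from each non-identity component, can be illustrated for O(2). The invariant function f(x) = ‖x‖², the rotation generator, and the chosen reflection below are our own illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm

f = lambda x: x @ x                          # illustrative O(2)-invariant function
grad_f = lambda x: 2.0 * x

A = np.array([[0.0, -1.0], [1.0, 0.0]])      # generator of the identity component SO(2)
R = np.array([[1.0, 0.0], [0.0, -1.0]])      # a reflection from the non-identity component

rng = np.random.default_rng(5)
x = rng.standard_normal(2)
print(grad_f(x) @ (A @ x))                   # 0: the Lie derivative vanishes
print(f(R @ x) - f(x))                       # 0: invariance on the other component
print(f(expm(0.7 * A) @ x) - f(x))           # ~0: invariance under a finite rotation follows
```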
Appendix F Proof of Theorem 35
Our proof of the theorem relies on the following technical lemma concerning the integral curves of vector fields tangent to weakly embedded, arcwise-closed submanifolds.
Lemma 43
Let be an arcwise-closed weakly embedded submanifold of a manifold . Let be a vector field tangent to , that is
(209) |
If is a maximal integral curve of that intersects , then lies in .
Proof By the translation lemma (Lemma 9.4 in Lee (2013)), we can assume without loss of generality that and . Let denote the inclusion map. Since is an immersed submanifold of and is tangent to , there is a unique smooth vector field that is -related to thanks to Proposition 8.23 in Lee (2013). Let be the maximal integral curve of with . By the naturality of integral curves (Proposition 9.6 in Lee (2013)) is an integral curve of with . Since integral curves of smooth vector fields starting at the same point are unique (Theorem 9.12, part (a) in Lee (2013)) we have and
(210) |
Therefore, it remains to show that .
By the local existence of integral curves (Proposition 9.2 in Lee (2013)), the domains and of the maximal integral curves and are open intervals in . Suppose, for the sake of producing a contradiction, that there exists with . Then it follows that the least upper bound is an element of . By (210) and continuity of we have
(211) |
Since is arcwise-closed, it follows that .
To complete the proof, we use the local existence of an integral curve for starting at to contradict the maximality of .
By the local existence of integral curves (Proposition 9.2 in Lee (2013)) and the translation lemma (Lemma 9.4 in Lee (2013)), there is an and an integral curve of such that .
Shrinking the interval, we take .
Again, by naturality and uniqueness of integral curves we must have for all .
Hence, by (210) and injectivity of it follows that for all .
Applying the gluing lemma (Corollary 2.8 in Lee (2013)) to and yields an extension of to the larger open interval .
Since this contradicts the maximality of , there is no with .
The same argument shows that there is no with , and so we must have .
We also use the following lemma describing the elements of a Lie group that can be constructed from products of exponentials.
Lemma 44
Let be the identity component of a Lie group . Then every element can be expressed as a finite product of elements for . Let be a connected component of and let . Then every element can be expressed as for some .
Proof By the inverse function theorem (more specifically by Proposition 20.8(f) in Lee (2013)), the range of the exponential map contains an open, connected neighborhood of the identity element . The inverses of the elements in also belong to the range of the exponential map thanks to Proposition 20.8(c) in Lee (2013). By Proposition 7.14(b) and Proposition 7.15 in Lee (2013), it follows that generates the identity component of . That is, any element can be written as a finite product of elements in and their inverses, which proves the first claim.
By Proposition 7.15 in Lee (2013), is a normal subgroup of and every connected component of is diffeomorphic to .
In fact, the proof of this result shows that every connected component of is a coset of .
Therefore, if is a non-identity connected component of and then , which proves the second claim.
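Lemma 44 can be made concrete for the orthogonal group O(3). In the sketch below the use of SciPy's logm/expm and the specific reflection are illustrative assumptions of ours; for a generic rotation a single exponential already realizes it as an element of the identity component, and multiplying by a fixed reflection reaches the other component.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(6)

# a generic element of SO(3), the identity component of O(3)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0                           # flip a column so that det(Q) = +1

S = logm(Q)                                   # for a generic rotation one exponential suffices
print(np.allclose(S, -S.T, atol=1e-8))        # True: S lies in so(3)
print(np.allclose(expm(S), Q))                # True: Q = exp(S)

# an element of the non-identity component is a fixed reflection times an element of SO(3)
P = np.diag([1.0, 1.0, -1.0])
print(np.isclose(np.linalg.det(P @ Q), -1.0)) # True
```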
Proof [Theorem 35] The set is a subspace of , for if and then
(212) |
thanks to linearity of the infinitesimal generator . To show that is a Lie subalgebra, it remains to show that it is also closed under the Lie bracket. Recall that is a Lie algebra homomorphism (see Theorem 20.15 in Lee (2013)), and so . Since the Lie bracket of two vector fields tangent to an immersed submanifold is also tangent to the submanifold (see Corollary 8.32 in Lee (2013)), it follows that is tangent to . Hence, is closed under the Lie bracket and is therefore a Lie subalgebra of . By Theorem 19.26 in Lee (2013), there is a unique connected Lie subgroup of whose Lie subalgebra is .
Now suppose that is weakly embedded and arcwise-closed in . We aim to show that . Choosing any , Lemma 20.14 in Lee (2013) shows that , regarded as a left-invariant vector field on , and are -related for every . By the naturality of integral curves (Proposition 9.6 in Lee (2013)) it follows that defined by
(213) |
is the unique maximal integral curve of passing through at . When , this integral curve lies in thanks to Lemma 43. This means that is invariant under the action of any group element in the range of the exponential map restricted to . Proceeding by induction, suppose that is invariant under the action of any product of such elements. If is a product of elements , then it follows from associativity and the induction hypothesis that
(214) |
Therefore, is invariant under the action of any finite product of group elements in by induction on . By Lemma 44, it follows that is -invariant, proving claim (i).
To prove claim (ii), suppose that is another connected Lie subgroup of such that . Choosing any and , we have
(215) |
Since is weakly embedded in , this defines a smooth curve such that , where is the inclusion map. Differentiating and using the definition of the infinitesimal generator gives
(216) |
Therefore, which implies that by Theorem 19.26 in Lee (2013), establishing claim (ii).
Now suppose that is properly embedded in and denote
(217) |
The equality of these expressions is a simple matter of unwinding their definitions. It is clear that is a subgroup of , for if then the composition law for the group action gives . Since is properly embedded, it is closed in (see Lee (2013, Proposition 5.5)), meaning that each preimage set is closed in by continuity of . As an intersection of closed subsets, it follows that is closed in . By the closed subgroup theorem (Lee (2013, Theorem 20.12)), is a properly embedded Lie subgroup of . The same holds for the identity component of since is closed in , which implies that is closed in .
Finally, we show that is the identity component of .
First, we observe that because is connected and contained in .
The reverse containment follows from the fact that is a connected Lie subgroup satisfying , which by our earlier result implies that .
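The Lie-subalgebra structure established in this theorem, in particular closure of the discovered generators under the matrix commutator, can be checked numerically in a simple linear setting. The quadratic model f(x) = x1² + 2x2² + 3x3², the sampling scheme, and the SVD-based nullspace computation below are our own illustrative assumptions, in keeping with the paper's view of symmetry discovery as a linear-algebraic task.

```python
import numpy as np

rng = np.random.default_rng(7)
D = np.diag([1.0, 2.0, 3.0])
grad_f = lambda x: 2.0 * D @ x               # illustrative model f(x) = x1^2 + 2 x2^2 + 3 x3^2

# linear constraints on a candidate generator A: grad f(x)^T A x = 0 at sampled points
X = rng.standard_normal((60, 3))
C = np.array([np.outer(grad_f(x), x).ravel() for x in X])
_, s, Vt = np.linalg.svd(C)
generators = [v.reshape(3, 3) for v in Vt[int(np.sum(s > 1e-10)):]]
print(len(generators))                        # 3: the symmetry algebra of the quadratic form

A1, A2 = generators[0], generators[1]
bracket = A1 @ A2 - A2 @ A1                   # matrix commutator = Lie bracket of generators
x = rng.standard_normal(3)
print(grad_f(x) @ (bracket @ x))              # ~0: the bracket is again a symmetry generator
```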
Appendix G Proof of Theorem 36
First, suppose that is -invariant. In particular, this means that for every and , the smooth curve defined by
(218) |
lies in . Since is weakly embedded in , is also smooth as a map into . Specifically, there is a smooth curve so that where is the inclusion map. Differentiating at yields
(219) |
which lies in . In particular, for every and .
Conversely, suppose that the tangency condition expressed in (139) holds. By Theorem 35, the elements belong to the Lie subalgebra of the largest connected Lie subgroup of symmetries of . Since generate , it follows that . Therefore, by Theorem 19.26 in Lee (2013), we obtain because both are connected Lie subgroups of with identical Lie subalgebras.
Finally, suppose, in addition, that for an element from each non-identity component of . By Lemma 44, if then there is an element such that . Therefore, we obtain
(220) |
which completes the proof because .
Appendix H Proof of Theorem 39
Proof [Lemma 38] The map defined in a local trivialization by (140) is injective. It is a vector bundle homomorphism because , , and are vector bundle homomorphisms and and are invertible. It remains to show that the definition of does not depend on the choice of local trivialization. Given two local trivializations and defined on where is an open subset of , it suffices to show that the following diagram commutes:
(221) |
Since is a bundle homomorphism descending to the identity, it can be written as
(222) |
for a matrix-valued function . Moreover, the matrices are invertible because the local trivializations are bundle isomorphisms. Differentiating, we obtain
(223) |
where . Composing this with , we obtain
(224) |
proving that the diagram commutes.
Proof [Theorem 39] We observe that is a smooth idempotent map whose image is . By differentiating the expression at a point , we obtain
(225) |
meaning that is a linear projection. Since
(226) |
we have . Differentiating yields
(227) |
meaning that . Since it follows that is a linear projection onto .
We observe that the generalized Lie derivative in (97) can be expressed as
(228) |
The first equality follows because is a vector bundle homomorphism, meaning that the restricted map is linear; here . The second equality follows because is continuous. Note that in the first expression the limit is taken in the vector space , whereas in the last expression the limit must be taken in .
We proceed by expressing everything in a local trivialization of an open neighborhood of . Since the maps , , and are vector bundle homomorphisms, there is a matrix-valued function such that
(229) |
Differentiating with respect to yields the generator
(230) |
where . We define the function by
(231) |
Using the above definitions, we can express the generalized Lie derivative in the local trivialization:
(232) |
Applying Lemma 38 allows us to express the left-hand side of (141) as
(233) |
We can also express the quantities on the right-hand side of (141) in the local trivialization. To do this, we compute
(234) |
and
(235) |
Subtracting these yields
(236) |
which, upon comparison with (233), completes the proof.