

A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning

Samuel E. Otto ([email protected])
AI Institute in Dynamic Systems, University of Washington, Seattle, WA 98195-4322, USA

Nicholas Zolman ([email protected])
AI Institute in Dynamic Systems, University of Washington, Seattle, WA 98195-4322, USA

J. Nathan Kutz ([email protected])
AI Institute in Dynamic Systems, University of Washington, Seattle, WA 98195-4322, USA

Steven L. Brunton ([email protected])
AI Institute in Dynamic Systems, University of Washington, Seattle, WA 98195-4322, USA

Present affiliation: Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY, USA
Abstract

Symmetry is present throughout nature and continues to play an increasingly central role in physics and machine learning. Fundamental symmetries, such as Poincaré invariance, allow physical laws discovered in laboratories on Earth to be extrapolated to the farthest reaches of the universe. Symmetry is essential to achieving this extrapolatory power in machine learning applications. For example, translation invariance in image classification allows models with fewer parameters, such as convolutional neural networks, to be trained on smaller data sets and achieve state-of-the-art performance. In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in three ways: 1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates when there is sufficient evidence in the data. We show that these tasks can be cast within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. We extend and unify several existing results by showing that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative. We also propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation to penalize symmetry breaking during training of machine learning models. We explain how these ideas can be applied to a wide range of machine learning models including basis function regression, dynamical systems discovery, neural networks, and neural operators acting on fields.

Keywords: Symmetries, machine learning, Lie groups, manifolds, invariance, equivariance, neural networks, deep learning

1 Introduction

Symmetry is present throughout nature, and according to David Gross (1996) the discovery of fundamental symmetries has played an increasingly central role in physics since the beginning of the 20th century. He asserts that

“Einstein’s great advance in 1905 was to put symmetry first, to regard the symmetry principle as the primary feature of nature that constrains the allowable dynamical laws.”

According to Einstein’s special theory of relativity, physical laws including those of electromagnetism and quantum mechanics are Poincaré-invariant, meaning that after predictable transformations (actions of the Poincaré group), these laws can be applied in any non-accelerating reference frame, anywhere in the universe, at all times. Specifically, these transformations form a ten-parameter group including four translations of space-time, three rotations of space, and three shifts or “boosts” in velocity. For small boosts of velocity, these transformations become the Galilean symmetries of classical mechanics. Similarly, the theorems of Euclidean geometry are unchanged after arbitrary translations, rotations, and reflections, comprising the Euclidean group. In fluid mechanics, the conformal (angle-preserving) symmetry of Laplace’s equation is used to reduce the study of idealized flows in complicated geometries to canonical flows in simple domains. In dynamical systems, the celebrated theorem of Noether (1918) establishes a correspondence between symmetries and conservation laws, an idea which has become a central pillar of mechanics (Abraham and Marsden, 2008). These examples illustrate the diversity of symmetry groups and their physical applications. More importantly, they illustrate how symmetric models and theories in physics automatically extrapolate in explainable ways to environments beyond the available data.

In machine learning, models that exploit symmetry can be trained with less data and use fewer parameters compared to their asymmetric counterparts. Early examples include augmenting data with known transformations (see Shorten and Khoshgoftaar (2019); Van Dyk and Meng (2001)) or using convolutional neural networks (CNNs) to achieve translation invariance for image processing tasks (see Fukushima (1980); LeCun et al. (1989)). More recently, equivariant neural networks respecting Euclidean symmetries have achieved state-of-the-art performance for predicting potentials in molecular dynamics (Batzner et al., 2022). As with physical laws, symmetries and invariances allow machine learning models to extrapolate beyond the training data and achieve high performance with fewer modeling parameters.

However, many problems are only weakly symmetric. Gravity, friction, and other external forces can cause some or all of the Poincaré or Galilean symmetries to be broken. Interactions between particles can be viewed as breaking symmetries possessed by non-interacting particles. Written characters have translation and scaling symmetry, but not rotation (cf. 6 and 9, d and p, N and Z) or reflection (cf. b and d, b and p). One of the main contributions of this work is to propose a method of enforcing a new principle of parsimony in machine learning. This principle of parsimony by maximal symmetry states that a model should break a symmetry within a set of physically reasonable transformations (such as Poincaré, Galilean, Euclidean, or conformal symmetry) only when there is sufficient evidence in the data.

In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in the following three ways:

  1. Enforce. Train a model with known symmetry.

  2. Discover. Identify the symmetries of a given model or data set.

  3. Promote. Train a model with as many symmetries as possible (from among candidates), breaking symmetries only when there is sufficient evidence in the data.

While these tasks have been studied to varying extents separately, we show how they can be situated within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. As a special case, the Lie derivative recovers the linear constraints derived by Finzi et al. (2021) for weights in equivariant multilayer perceptrons. In full generality, we show that known symmetries can be enforced as linear constraints derived from Lie derivatives for a large class of problems including learning vector and tensor fields on manifolds as well as learning equivariant integral operators acting on such fields. For example, the kernels of “steerable” CNNs developed by Weiler et al. (2018); Weiler and Cesa (2019) are constructed to automatically satisfy these constraints for the groups \operatorname{SO}(3) (rotations in three dimensions) and \operatorname{SE}(2) (rotations and translations in two dimensions). We show how analogous steerable networks for other groups, such as subgroups of \operatorname{SE}(n), can be constructed by numerically enforcing linear constraints derived from the Lie derivative on the integral kernels defining each layer. Symmetries, conservation laws, and symplectic structure can also be enforced when learning dynamical systems via linear constraints on the vector field. Again, these constraints come from the Lie derivative and can be incorporated into neural network architectures and basis function regression models such as Sparse Identification of Nonlinear Dynamics (SINDy) (Brunton et al., 2016).

Moskalev et al. (2022) identifies the connected subgroup of symmetries of a trained neural network by computing the nullspace of a linear operator. Likewise, Kaiser et al. (2018, 2021) recovers the symmetries and conservation laws of a dynamical system by computing the nullspace of a different linear operator. We observe that these operators and others whose nullspaces encode the symmetries of more general models can be derived directly from the Lie derivative in a manner dual to the construction of operators used to enforce symmetry. Specifically, the nullspaces of the operators we construct reveal the largest connected subgroups of symmetries for enormous classes of models. This extends work by Gruver et al. (2022) using the Lie derivative to test whether a trained neural network is equivariant with respect to a given one-parameter group, e.g., rotation of images. Generalizing the ideas in Cahill et al. (2023), we also show that the symmetries of point clouds approximating underlying submanifolds can be recovered by computing the nullspaces of associated linear operators. This allows for the unsupervised mining of data for hidden symmetries.

The idea of relaxed symmetry has been introduced recently by Wang et al. (2022), along with architecture-specific symmetry-promoting regularization functions involving sums or integrals over the candidate group of transformations. The Augerino method introduced by Benton et al. (2020) uses regularization to promote equivariance with respect to a larger collection of candidate transformations. Promoting physical constraints through the loss function is also a core concept of Physics-Informed Neural Networks (PINNs) introduced by Raissi et al. (2019). Our approach to the third task (promoting symmetry) is to introduce a unified and broadly applicable class of convex regularization functions based on the Lie derivative to penalize symmetry breaking during training of machine learning models. As we describe above, the Lie derivative yields an operator whose nullspace corresponds to the symmetries of a given model. Hence, the lower the rank of this operator, the more symmetric the model is. The nuclear norm has been used extensively as a convex relaxation of the rank with favorable theoretical properties for compressed sensing and low-rank matrix recovery (Recht et al., 2010; Gross, 2011), as well as in robust PCA (Candès et al., 2011; Bouwmans et al., 2018). Penalizing the nuclear norm of our symmetry-encoding operator yields a convex regularization function that can be added to the loss when training machine learning models, including basis function regression and neural networks. Likewise, we use a nuclear norm penalty to promote conservation laws and Hamiltonicity with respect to candidate symplectic structures when fitting dynamical systems to data. This lets us train the model and enforce data-consistent symmetries simultaneously.

2 Executive summary

This paper provides a linear-algebraic framework to enforce, discover, and promote symmetry of machine learning models. To illustrate, consider a model defined by a function F:\mathbb{R}^{m}\to\mathbb{R}^{n}. By a symmetry, we mean an invertible transformation T and an invertible linear transformation \tilde{T} so that

F(T(x))=\tilde{T}F(x). (1)

Examples to keep in mind are rotations and translations. If (T_{a},\tilde{T}_{a}) is a symmetry, then so is its inverse (T_{a}^{-1},\tilde{T}_{a}^{-1}), and if (T_{b},\tilde{T}_{b}) is another symmetry, then so is the composition (T_{b}\circ T_{a},\tilde{T}_{b}\circ\tilde{T}_{a}). We work with collections of transformations \{(T_{g},\tilde{T}_{g})\}_{g\in G}, called groups, that have an identity element and are closed under composition and inverses. Specifically, we consider Lie groups.

2.1 Enforcing symmetry

Given a group of transformations, the symmetry condition (1) imposes linear constraints on F that can be enforced during the fitting process. However, there is one constraint per transformation, making direct enforcement impractical for continuous Lie groups such as rotations or translations. We observe that for smoothly-parametrized, connected groups, it suffices to consider a finite collection of infinitesimal linear constraints {\mathcal{L}}_{\xi_{i}}F=0, where {\mathcal{L}}_{\xi} is the “Lie derivative” defined by

{\mathcal{L}}_{\xi}F(x)=\left(\left.\frac{\partial\tilde{T}_{g}F(x)}{\partial g}\right|_{g=\operatorname{Id}}-\frac{\partial F(x)}{\partial x}\left.\frac{\partial T_{g}(x)}{\partial g}\right|_{g=\operatorname{Id}}\right)\xi. (2)

Notice that this expression is linear with respect to \xi and with respect to F.
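For concreteness, the following is a minimal numerical sketch of (2), not code from this paper, for G=\operatorname{SO}(2) rotating inputs in \mathbb{R}^{2} and acting trivially on scalar outputs (so the first term of (2) vanishes). The function names and the finite-difference discretization are our own illustrative choices.

```python
import numpy as np

# Sketch of the Lie derivative (2): G = SO(2) rotates inputs in R^2 and
# acts trivially on scalar outputs, so only the transport term survives.
J = np.array([[0.0, -1.0], [1.0, 0.0]])  # generator xi of planar rotations

def lie_derivative(F, x, h=1e-6):
    # (L_xi F)(x) = -(dF/dx)(x) @ (J x), with dF/dx by central differences
    grad = np.array([(F(x + h * e) - F(x - h * e)) / (2 * h)
                     for e in np.eye(2)])
    return -grad @ (J @ x)

x = np.array([0.3, -1.2])
print(lie_derivative(lambda x: x @ x, x))  # ~0: ||x||^2 is rotation-invariant
print(lie_derivative(lambda x: x[0], x))   # nonzero: x_1 is not invariant
```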

2.2 Discovering symmetry

The symmetries of a given model F form a subgroup that we seek to identify within a given group of candidates. For continuous Lie groups of transformations, the component of the subgroup containing the identity is revealed by the nullspace of the linear map

L_{F}:\xi\mapsto{\mathcal{L}}_{\xi}F. (3)

More generally, the symmetries of a smooth surface in \mathbb{R}^{n} can be determined from data sampled from this surface by computing the nullspace of a positive semidefinite operator. When the surface is the graph of the function F in \mathbb{R}^{m}\times\mathbb{R}^{n}, this operator is L_{F}^{*}L_{F} with L_{F}^{*} being an adjoint operator.
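As an illustration, the following sketch (with our own illustrative names and a finite-difference Jacobian, not the paper's code) discretizes L_{F} by stacking evaluations of ({\mathcal{L}}_{\xi_{i}}F)(x_{k}) over sample points into a matrix with one column per candidate generator from a basis of {\mathfrak{gl}}(2); null directions of this matrix, equivalently of L_{F}^{*}L_{F}, then span the symmetry generators.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x):                      # rotation-invariant test model, R^2 -> R^1
    return np.array([x @ x])

def jacobian(F, x, h=1e-6):    # finite-difference Jacobian of F at x
    return np.array([(F(x + h * e) - F(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))]).T

# candidate generators: the standard basis E_ij of gl(2)
basis = [np.outer(ei, ej) for ei in np.eye(2) for ej in np.eye(2)]
samples = rng.normal(size=(100, 2))

# column i stacks the values of (L_{xi_i} F)(x_k) over the samples x_k
# (up to the sign convention in (2), which does not affect the nullspace)
A = np.column_stack([
    np.concatenate([jacobian(F, x) @ (xi @ x) for x in samples])
    for xi in basis
])
_, s, Vt = np.linalg.svd(A)
null = Vt[s < 1e-6 * s[0]]     # coefficient vectors spanning sym_G(F)
print(null @ np.array([xi.ravel() for xi in basis]))  # ~ multiples of J
```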

2.3 Promoting symmetry

Here, we seek to learn a model F that both fits data and possesses as many symmetries as possible from a given candidate group of transformations. Since the nullspace of the operator L_{F} defined in (3) corresponds with the symmetries of F, we seek to minimize the rank of L_{F} during the training process. To do this, we regularize optimization problems for F using a convex relaxation of the rank given by the nuclear norm (sum of singular values)

\|L_{F}\|_{*}=\sum_{i=1}^{\dim G}\sigma_{i}(L_{F}). (4)

This is convex with respect to F because F\mapsto L_{F} is linear and the nuclear norm is convex.
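As a small illustration (the names are ours, and the matrix A stands for any sampled discretization of L_{F}, such as the one built in the previous sketch), the penalty (4) is one line of linear algebra:

```python
import numpy as np

def nuclear_norm(A):
    # ||L_F||_*: sum of singular values of a sampled matrix representing L_F
    # (columns indexed by candidate generators, rows by sample points)
    return np.linalg.svd(A, compute_uv=False).sum()

# Since A depends linearly on the parameters of F, a training objective
#   loss(F) = data_misfit(F) + lam * nuclear_norm(A(F))
# adds a convex symmetry-promoting regularizer; lam and data_misfit are
# placeholders for a concrete problem.
```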

3 Related work

3.1 Enforcing symmetry

Data-augmentation, as reviewed by Shorten and Khoshgoftaar (2019); Van Dyk and Meng (2001), is one of the simplest ways to incorporate known symmetry into machine learning models. Usually this entails training a neural network architecture on training data to which known transformations have been applied. The theoretical foundations of these methods are explored by Chen et al. (2020). Data-augmentation has also been used by Benton et al. (2020) to construct equivariant neural networks by averaging the network’s output over transformations applied to the data. Finally, Brandstetter et al. (2022) applied data-augmentation strategies with known Lie-point symmetries for improving neural PDE solvers.

Symmetry can also be enforced directly on the machine learning architecture. For example, Convolutional Neural Networks (CNNs), introduced by Fukushima (1980) and popularized by LeCun et al. (1989), achieve translational equivariance by employing convolutional filters with trainable kernels in each layer. CNNs have been generalized to provide equivariance with respect to symmetry groups other than translation. Group-Equivariant CNNs (G-CNNs) (Cohen and Welling, 2016) provide equivariance with respect to arbitrary discrete groups generated by translations, reflections, and rotations. Rotational equivariance can be enforced on three-dimensional scalar, vector, or tensor fields using the 3D Steerable CNNs developed by Weiler et al. (2018). Spherical CNNs (Cohen et al., 2018; Esteves et al., 2018) allow for rotation-equivariant maps to be learned for fields (such as projected images of 3D objects) on spheres. Essentially any group-equivariant linear map (defining a layer of an equivariant neural network) acting on fields can be described by group convolution (Kondor and Trivedi, 2018; Cohen et al., 2019), with the spaces of admissible convolution kernels characterized by Cohen et al. (2019). Finzi et al. (2020) provides a practical way to construct convolutional layers that are equivariant with respect to arbitrary Lie groups and for general data types. For dynamical systems, Marsden and Ratiu (1999); Rowley et al. (2003); Abraham and Marsden (2008) describe techniques for symmetry reduction of the original problem to a quotient space where the known symmetry group has been factored out. Related approaches have been used by Peitz et al. (2023); Steyert (2022) to approximate Koopman operators for symmetric dynamical systems (see Koopman (1931); Mezić (2005); Mauroy et al. (2020); Otto and Rowley (2021); Brunton et al. (2022)).

A general method for constructing equivariant neural networks is introduced by Finzi et al. (2021), and relies on the observation that equivariance can be enforced through a set of linear constraints. For graph neural networks, Maron et al. (2018) characterizes the subspaces of linear layers satisfying permutation equivariance. Similarly, Ahmadi and Khadir (2020) shows that discrete symmetries and other types of side information can be enforced via linear or convex constraints in learning problems for dynamical systems. Our work builds on the results of Finzi et al. (2021), Weiler et al. (2018), Cohen et al. (2019), and Ahmadi and Khadir (2020) by showing that equivariance can be enforced in a systematic and unified way via linear constraints for large classes of functions and neural networks. Concurrent work by Yang et al. (2024) shows how to enforce known or discovered Lie group symmetries on latent dynamics using hard linear constraints or soft penalties.

3.2 Discovering symmetry

Early work by Rao and Ruderman (1999); Miao and Rao (2007) used nonlinear optimization to learn infinitesimal generators describing transformations between images. Later, it was recognized by Cahill et al. (2023) that linear algebraic methods could be used to uncover the generators of continuous linear symmetries of arbitrary point clouds in Euclidean space. Similarly, Kaiser et al. (2018) and Moskalev et al. (2022) show how conserved quantities of dynamical systems and invariances of trained neural networks can be revealed by computing the nullspaces of associated linear operators. We connect these linear-algebraic methods to the Lie derivative, and provide generalizations to nonlinear group actions on manifolds. The Lie derivative has been used by Gruver et al. (2022) to quantify the extent to which a trained network is equivariant with respect to a given one-parameter subgroup of transformations. Our results show how the Lie derivative can reveal the entire connected subgroup of symmetries of a trained model via symmetric eigendecomposition.

More sophisticated nonlinear optimization techniques use Generative Adversarial Networks (GANs) to learn the transformations that leave a data distribution unchanged. These methods include SymmetryGAN developed by Desai et al. (2022) and LieGAN developed by Yang et al. (2023b). In contrast, our methods for detecting symmetry are entirely linear-algebraic.

While symmetries may exist in data, their representation may be difficult to describe. Yang et al. (2023a) develop Latent LieGAN (LaLieGAN) to extend LieGAN to find linear representations of symmetries in a latent space. Recently this has been applied to dynamics discovery (Yang et al., 2024). Likewise, Liu and Tegmark (2022) discover hidden symmetries by optimizing nonlinear transformations into spaces where candidate symmetries hold. Similar to our approach for promoting symmetry, they use a cost function to measure whether a given symmetry holds. In contrast, our regularization functions enable subgroups of candidate symmetry groups to be identified.

3.3 Promoting symmetry

Biasing a network towards increased symmetry can be accomplished through methods such as symmetry regularization. Analogous to the physics-informed loss developed in PINNs by Raissi et al. (2019), which penalizes a solution for violating known dynamics, one can penalize symmetry violation for a known group; for example, Akhound-Sadegh et al. (2024) extends the PINN framework to penalize deviations from known Lie-point symmetries of a PDE. More generally, however, one can consider a candidate group of symmetries and promote as much symmetry as possible that is consistent with the available data. Wang et al. (2022) discusses these approaches, along with architecture-specific methods, including regularization functions involving summations or integrals over the candidate group of symmetries. While our regularization functions resemble these for discrete groups, we use a radically different regularization for continuous Lie groups. By leveraging the Lie algebra, our regularization functions eliminate the need to numerically integrate complicated functions over the group—a task that is already prohibitive for the 10-dimensional non-compact group of Galilean symmetries in classical mechanics.

Automated data augmentation techniques introduced by Cubuk et al. (2019); Hataya et al. (2020); Benton et al. (2020) are another class of methods that arguably promote symmetry. These techniques optimize the distribution of transformations applied to augment the data during training. For example “Augerino” is an elegant method developed by Benton et al. (2020) which averages an arbitrary network’s output over the augmentation distribution and relies on regularization to prevent the distribution of transformations from becoming concentrated near the identity. In essence, the regularization biases the averaged network towards increased symmetry.

In contrast, our regularization functions promote symmetry on an architectural level for the original network. This eliminates the need to perform averaging, which grows more costly for larger collections of symmetries. While a distribution over symmetries can be useful for learning interesting partial symmetries (e.g., 6 stays 6 for small rotations, before turning into 9), as is done by Benton et al. (2020) and Romero and Lohit (2022), it is not clear how to use a continuous distribution over transformations to identify lower-dimensional subgroups, which have measure zero. On the other hand, our linear-algebraic approach easily identifies and promotes symmetries in lower-dimensional connected subgroups.

3.4 Additional approaches and applications

There are several other approaches that incorporate various aspects of enforcing, discovering, and promoting symmetries. For example, Baddoo et al. (2023) developed algorithms to enforce and promote known symmetries in dynamic mode decomposition, through manifold constrained learning and regularization, respectively. Baddoo et al. (2023) also showed that discovering unknown symmetries is a dual problem to enforcing symmetry. Exploiting symmetry has also been a central theme in the reduced-order modeling of fluids for decades (Holmes et al., 2012). As machine learning methods are becoming widely used to develop these models (Brunton et al., 2020), the themes of enforcing and discovering symmetries in machine models are increasingly relevant. Known fluid symmetries have been enforced in SINDy for fluid systems (Loiseau and Brunton, 2018) through linear equality constraints; this approach was generalized to enforce more complex constraints (Champion et al., 2020). Unknown symmetries were similarly uncovered for electroconvective flows (Guan et al., 2021). Symmetry breaking is also important in many turbulent flows (Callaham et al., 2022).

4 Elementary theory of Lie group actions

This section provides background and notation required to understand the main results of this paper in the less abstract, but still remarkably useful, setting of Lie groups acting on vector spaces. In Section 5 we use this theory to study the symmetries of continuously differentiable functions between vector spaces. Such functions form the basic building blocks of many machine learning models such as basis function regression models, the layers of multilayer perceptrons, and the kernels of integral operators acting on spatial fields such as images. We emphasize that this is not the most general setting for our results, but we provide this section and simpler versions of our main theorems in Section 5 in order to make the presentation more accessible. We develop our main results in the more general setting of fiber-linear Lie group actions on sections of vector bundles in Section 11.

4.1 Lie groups and subgroups

Lie groups are ubiquitous in science and engineering. Some familiar examples include the general linear group \operatorname{GL}(n) consisting of all real, invertible, n\times n matrices; the orthogonal group

O(n)=\left\{Q\in\mathbb{R}^{n\times n}\ :\ Q^{T}Q=I\right\}; (5)

and the special Euclidean group

\operatorname{SE}(n)=\left\{\begin{bmatrix}Q&b\\ 0&1\end{bmatrix}\ :\ Q\in\mathbb{R}^{n\times n},\ b\in\mathbb{R}^{n},\ Q^{T}Q=I,\ \det(Q)=1\right\}, (6)

representing rotations and translations in real n-dimensional space, \mathbb{R}^{n}, embedded in \mathbb{R}^{n+1} via x\mapsto(x,1). Observe that the sets \operatorname{GL}(n), O(n), and \operatorname{SE}(n) contain the identity matrix and are closed under matrix multiplication and inversion, making them into (non-commutative) groups. They are also smooth manifolds, which makes them Lie groups (Lee, 2013). In general, a Lie group is a smooth manifold that is simultaneously an algebraic group whose composition and inversion operations are smooth maps. The identity element is usually denoted e for “Einselement”, which for a matrix Lie group is the identity matrix e=I. This section summarizes some basic results that can be found in references such as (Abraham et al., 1988; Lee, 2013; Varadarajan, 1984; Hall, 2015).

The most useful and profound property of a Lie group is the fact that the group is almost entirely characterized by an associated vector space called its Lie algebra. This allows global nonlinear questions about the group — such as which elements leave a function unchanged — to be answered using linear algebra. If G is a Lie group, its Lie algebra, commonly denoted \operatorname{Lie}(G) or {\mathfrak{g}}, is the vector space consisting of all smooth vector fields on G that remain invariant when pushed forward by left translation L_{g}:h\mapsto g\cdot h. Translating back and forth from the identity element, the Lie algebra can be identified with the tangent space \operatorname{Lie}(G)\cong T_{e}G. For example, the Lie algebra of the orthogonal group O(n) consists of all skew-symmetric matrices, and is denoted

\mathfrak{o}(n)=\left\{S\in\mathbb{R}^{n\times n}\ :\ S+S^{T}=0\right\}. (7)

A key fact is that the Lie algebra of G is closed under the “Lie bracket” of vector fields (in \mathbb{R}^{n}, a vector field V=(V^{1},\ldots,V^{n}) is equivalent to the directional derivative operator V^{1}\frac{\partial}{\partial x^{1}}+\cdots+V^{n}\frac{\partial}{\partial x^{n}}; a vector field on a smooth manifold is defined as an analogous linear operator acting on the space of smooth functions, and the Lie bracket is the commutator of these operators; see Lee (2013)):

[\xi,\eta]=\xi\eta-\eta\xi\in\operatorname{Lie}(G). (8)

For matrix Lie groups, this corresponds to the same commutator of matrices \xi,\eta\in T_{I}G, as shown by Theorem 3.20 in Hall (2015).

The key tool relating global properties of a Lie group back to its Lie algebra is the exponential map \exp:\operatorname{Lie}(G)\to G. A vector field \xi\in\operatorname{Lie}(G) has a unique integral curve \gamma:(-\infty,\infty)\to G passing through the identity \gamma(0)=e and satisfying \gamma^{\prime}(t)=\xi|_{\gamma(t)}. The exponential map defined by

\exp(\xi):=\gamma(1) (9)

reproduces the entire integral curve \exp(t\xi)=\gamma(t) thanks to Proposition 20.5 in Lee (2013). Such an exponential curve is illustrated in Figure 1. For a matrix Lie group and \xi\in T_{I}G, the exponential map is given by the matrix exponential

\exp(\xi)=e^{\xi}=\sum_{k=0}^{\infty}\frac{1}{k!}\xi^{k}. (10)

Proposition 20.8 in (Lee, 2013) provides many of the basic properties of the exponential map, such as \exp((s+t)\xi)=\exp(s\xi)\cdot\exp(t\xi), \exp(\xi)^{-1}=\exp(-\xi), and \operatorname{\mathrm{d}}\exp(0)=\operatorname{Id}_{T_{e}G}. Perhaps the most important is that it provides a diffeomorphism between an open neighborhood of the origin 0 in \operatorname{Lie}(G) and an open neighborhood of the identity element e in G.
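These properties are easy to verify numerically; the following sketch (using SciPy's matrix exponential, with values chosen by us for illustration) checks them for a random skew-symmetric generator in \mathfrak{o}(3):

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential (10)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
xi = A - A.T                               # skew-symmetric: an element of o(3)

R = expm(xi)
assert np.allclose(R.T @ R, np.eye(3))     # exp(xi) is orthogonal
assert np.allclose(np.linalg.det(R), 1.0)  # and lies in the identity
                                           # component SO(3)
s, t = 0.7, -0.3                           # one-parameter subgroup property
assert np.allclose(expm((s + t) * xi), expm(s * xi) @ expm(t * xi))
assert np.allclose(np.linalg.inv(R), expm(-xi))
```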

The connected component of G containing the identity element is called the “identity component” of the Lie group and is denoted G_{0}. Any element in this component can then be expressed as a finite product of exponentials thanks to Proposition 7.14 in (Lee, 2013), that is,

G_{0}=\big\{\exp(\xi_{1})\cdots\exp(\xi_{N})\ :\ \xi_{1},\ldots,\xi_{N}\in\operatorname{Lie}(G),\ N=1,2,3,\ldots\big\}. (11)

Moreover, the identity component is a normal subgroup of G and all of the other connected components of G are diffeomorphic cosets of G_{0} (Proposition 7.15 in Lee (2013)), as we illustrate in Figure 1. For example, the special Euclidean group \operatorname{SE}(n) is connected, and thus equal to its identity component. On the other hand, the orthogonal group O(n) is compact and has two components consisting of orthogonal matrices Q whose determinants are 1 and -1. The identity component of the orthogonal group is called the special orthogonal group and is denoted \operatorname{SO}(n). It is a general fact that when a Lie group such as \operatorname{SO}(n) is connected and compact, it is equal to the image of the exponential map without the need to consider products of exponentials; see Tao (2011) and Appendix C.1 of Lezcano-Casado and Martínez-Rubio (2019).

Figure 1: A Lie group G and its action \theta on a manifold {\mathcal{M}}. The Lie group G consists of three connected components with G_{0} being the one that contains the identity element e. Each non-identity component of G is a coset g_{i}G_{0} formed by translating the identity component by an arbitrary element g_{i} in the component. The Lie algebra \operatorname{Lie}(G) is identified with the tangent space T_{e}G and an exponential curve \exp(t\xi) generated by an element \xi\in\operatorname{Lie}(G) is shown. The infinitesimal generator \hat{\theta}(\xi) is the vector field on {\mathcal{M}} whose flow corresponds with the action \theta_{\exp(t\xi)} of group elements along \exp(t\xi).

A subgroup H of a Lie group G is called a “Lie subgroup” when H is an immersed submanifold of G and the group operations are smooth when restricted to H. An immersed submanifold does not necessarily inherit its topology as a subset of G, but rather H has a topology and smooth structure such that the inclusion \imath_{H}:H\hookrightarrow G is smooth and its derivative is injective (see Lee (2013)). The tangent space to a Lie subgroup H\subset G at the identity, defined as T_{e}H=\operatorname{Range}(\operatorname{\mathrm{d}}\imath_{H}(I))\subset\operatorname{Lie}(G), is closed under the Lie bracket and thus forms a “Lie subalgebra” of \operatorname{Lie}(G), denoted \operatorname{Lie}(H). Conversely, a remarkable result stated by Theorem 19.26 in Lee (2013) shows that any subalgebra {\mathfrak{h}}\subset\operatorname{Lie}(G), that is, any subspace closed under the Lie bracket, corresponds to a unique connected Lie subgroup H\subset G_{0} satisfying \operatorname{Lie}(H)={\mathfrak{h}}. Later on, we will use this fact to identify the connected subgroups of symmetries of machine learning models based on infinitesimal criteria. Another remarkable and useful fact is the “closed subgroup theorem” stated as Theorem 20.12 in Lee (2013). It says that if H\subset G is a closed subset and is closed under the group operations of G, then H is automatically an embedded Lie subgroup of G. Interestingly, while a Lie subgroup H\subset\operatorname{GL}(n) need not be a closed subset, it turns out that H can always be embedded as a closed subgroup in a larger \operatorname{GL}(n^{\prime}), n^{\prime}\geq n, thanks to Theorem 9 in Gotô (1950).

4.2 Group representations, actions, and infinitesimal generators

A Lie group homomorphism is a smooth map \Phi:G_{1}\to G_{2} between Lie groups that respects the group product, that is,

\Phi(g_{1}g_{2})=\Phi(g_{1})\cdot\Phi(g_{2}). (12)

The tangent map \phi:=\operatorname{\mathrm{d}}\Phi(e):T_{e}G_{1}\cong{\mathfrak{g}}_{1}\to T_{e}G_{2}\cong{\mathfrak{g}}_{2} is a Lie algebra homomorphism by Theorem 8.44 in Lee (2013), meaning that it is a linear map respecting the Lie bracket:

\phi\big([\xi_{1},\xi_{2}]\big)=\big[\phi(\xi_{1}),\phi(\xi_{2})\big]. (13)

Moreover (see Proposition 20.8 in Lee (2013)), the Lie group homomorphism and its induced Lie algebra homomorphism are related by the exponential maps on G_{1} and G_{2} via the identity

\Phi\big(\exp(\xi)\big)=\exp\big(\phi(\xi)\big). (14)

Another fundamental result (Theorem 20.19 in Lee (2013)) is that any Lie algebra homomorphism \operatorname{Lie}(G_{1})\to\operatorname{Lie}(G_{2}) corresponds to a unique Lie group homomorphism G_{1}\to G_{2} when G_{1} is simply connected. When G_{2} is the general linear group on a vector space, the Lie group and Lie algebra homomorphisms are called Lie group and Lie algebra “representations”.

A Lie group G can act on a vector space {\mathcal{V}} via a representation \Phi:G\to\operatorname{GL}({\mathcal{V}}) according to

\theta:(x,g)\mapsto\Phi(g^{-1})x, (15)

with x\in{\mathcal{V}} and g\in G. More generally, a nonlinear right action of a Lie group G on a manifold {\mathcal{M}} is any smooth map \theta:{\mathcal{M}}\times G\to{\mathcal{M}} satisfying

\theta(\theta(x,g_{1}),g_{2})=\theta(x,g_{1}g_{2})\qquad\mbox{and}\qquad\theta(x,e)=x (16)

for every x\in{\mathcal{M}} and g_{1},g_{2}\in G. Figure 1 depicts the action of a Lie group on a manifold. We make frequent use of the maps \theta_{g}=\theta(\cdot,g), which have smooth inverses \theta_{g^{-1}}, and the “orbit maps” \theta^{(x)}=\theta(x,\cdot). For example, using a representation \Phi:\operatorname{SE}(3)\to\operatorname{GL}(\mathbb{R}^{7}), the position q and velocity v of a particle in \mathbb{R}^{3} can be rotated and translated via the action

\theta\left(\begin{bmatrix}q\\ v\\ 1\end{bmatrix},\ \begin{bmatrix}Q&b\\ 0&1\end{bmatrix}\right)=\Phi\left(\begin{bmatrix}Q^{T}&-Q^{T}b\\ 0&1\end{bmatrix}\right)\begin{bmatrix}q\\ v\\ 1\end{bmatrix}=\begin{bmatrix}Q^{T}&0&-Q^{T}b\\ 0&Q^{T}&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}q\\ v\\ 1\end{bmatrix}=\begin{bmatrix}Q^{T}(q-b)\\ Q^{T}v\\ 1\end{bmatrix}.

The positions and velocities of n particles arranged as a vector (q_{1},\ldots,q_{n},v_{1},\ldots,v_{n},1) can be simultaneously rotated and translated via an analogous representation \Phi:\operatorname{SE}(3)\to\operatorname{GL}(\mathbb{R}^{6n+1}).
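The following is a small numerical check of this action (an illustrative sketch we wrote, with a rotation about the z-axis standing in for a general Q):

```python
import numpy as np

# Check of the SE(3) action above on a single particle's (q, v, 1) in R^7.
a = 0.7                                            # rotation angle about z
c, s = np.cos(a), np.sin(a)
Q = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
b = np.array([0.5, -1.0, 2.0])
q = np.array([1.0, 2.0, 3.0])
v = np.array([0.1, 0.0, -0.2])

Phi = np.zeros((7, 7))                             # Phi of the inverse element
Phi[:3, :3] = Q.T
Phi[:3, 6] = -Q.T @ b
Phi[3:6, 3:6] = Q.T
Phi[6, 6] = 1.0

out = Phi @ np.concatenate([q, v, [1.0]])
assert np.allclose(out[:3], Q.T @ (q - b))         # position: Q^T (q - b)
assert np.allclose(out[3:6], Q.T @ v)              # velocity: Q^T v
```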

The key fact about a group action is that it is almost completely characterized by a linear map called the infinitesimal generator. This map \hat{\theta} assigns to each element \xi\in\operatorname{Lie}(G) in the Lie algebra a vector field \hat{\theta}(\xi) on {\mathcal{M}} defined by

\hat{\theta}(\xi)_{x}=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}\right|_{t=0}\theta_{\exp(t\xi)}(x)=\operatorname{\mathrm{d}}\theta^{(x)}(e)\xi. (17)

The infinitesimal generator and its relation to the group action are illustrated in Figure 1. For the linear action in (15), the infinitesimal generator is the linear vector field given by the matrix-vector product \hat{\theta}(\xi)_{x}=-\phi(\xi)x. Crucially, the flow of the generator recovers the group action along the exponential curve \exp(t\xi), i.e.,

\operatorname{Fl}_{\hat{\theta}(\xi)}^{t}(x)=\theta_{\exp(t\xi)}(x). (18)

For the linear right action in (15), this is easily verified by differentiation, applying (14), and the fact that solutions of smooth ordinary differential equations are unique. For a nonlinear right action this follows from Lemma 20.14 and Proposition 9.13 in Lee (2013).

Remark 1

In contrast to a “right” action \theta:{\mathcal{M}}\times G\to{\mathcal{M}}, a “left” action \theta:G\times{\mathcal{M}}\to{\mathcal{M}} satisfies \theta(g_{2},\theta(g_{1},x))=\theta(g_{2}g_{1},x). While our main results work for left actions too, e.g., \theta(g,x)=\Phi(g)x, right actions are slightly more natural because the infinitesimal generator is a Lie algebra homomorphism, i.e.,

\hat{\theta}([\xi,\eta])=[\hat{\theta}(\xi),\hat{\theta}(\eta)], (19)

whereas this holds with a sign change for left actions. Every left action \theta^{L} can be converted into an equivalent right action defined by \theta^{R}(x,g)=\theta^{L}(g^{-1},x), and vice versa.

5 Fundamental operators for studying symmetry

Here we introduce our main theoretical results for studying symmetries of machine learning models by focusing on a concrete and useful special case. The basic building blocks of the machine learning models we consider here are continuously differentiable functions F:{\mathcal{V}}\to{\mathcal{W}} between finite-dimensional vector spaces. The space of functions {\mathcal{V}}\to{\mathcal{W}} with continuous derivatives up to order k\in\mathbb{N}\cup\{\infty\} is denoted C^{k}({\mathcal{V}};{\mathcal{W}}), with addition and scalar multiplication defined point-wise. These functions could be layers of a multilayer neural network, integral kernels to be applied to spatio-temporal fields, or simply linear combinations of user-specified basis functions in a regression task as in Brunton et al. (2016). General versions of our results for sections of vector bundles are developed later in Section 11. Our main results show that two families of fundamental linear operators encode the symmetries of these functions. The fundamental operators allow us to enforce, promote, and discover symmetry in machine learning models as we describe in Sections 6, 7, and 8.

We consider a general (perhaps nonlinear) right action \theta:{\mathcal{V}}\times G\to{\mathcal{V}} and a representation \Phi:G\to\operatorname{GL}({\mathcal{W}}). The definition of equivariance, the symmetry group of a function, and the first family of fundamental operators are introduced by the following:

Definition 2

We say that F is equivariant with respect to a group element g\in G if

({\mathcal{K}}_{g}F)(x):=\Phi(g)F(\theta_{g}(x))=F(x) (20)

for every x\in{\mathcal{V}}. These elements form a subgroup of G denoted \operatorname{Sym}_{G}(F).

Note that when the action \theta(x,g)=\Psi(g^{-1})x is also defined by a representation, then (20) becomes

({\mathcal{K}}_{g}F)(x):=\Phi(g)F(\Psi(g)^{-1}x)=F(x). (21)

The transformation operators {\mathcal{K}}_{g} are linear maps sending functions in C^{k}({\mathcal{V}};{\mathcal{W}}) to functions in C^{k}({\mathcal{V}};{\mathcal{W}}). These fundamental operators form a group with composition {\mathcal{K}}_{g}{\mathcal{K}}_{h}={\mathcal{K}}_{gh} and inversion {\mathcal{K}}_{g}^{-1}={\mathcal{K}}_{g^{-1}}. Thus, g\mapsto{\mathcal{K}}_{g} is an infinite-dimensional representation of G in C^{k}({\mathcal{V}};{\mathcal{W}}) for any k. These operators are useful for studying discrete symmetries of functions. However, for a continuous group G it is impractical to work directly with the uncountable family \{{\mathcal{K}}_{g}\}_{g\in G}.

The second family of fundamental operators are the key objects we use to study continuous symmetries of functions. These are the Lie derivatives {\mathcal{L}}_{\xi}:C^{1}({\mathcal{V}};{\mathcal{W}})\to C^{0}({\mathcal{V}};{\mathcal{W}}) defined along each \xi\in\operatorname{Lie}(G) by

({\mathcal{L}}_{\xi}F)(x)=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}\right|_{t=0}\big({\mathcal{K}}_{\exp(t\xi)}F\big)(x)=\phi(\xi)F(x)+\frac{\partial F(x)}{\partial x}\hat{\theta}(\xi)_{x}. (22)

Note that when the action is \theta(x,g)=\Psi(g^{-1})x, we have \hat{\theta}(\xi)_{x}=-\psi(\xi)x and (22) becomes

({\mathcal{L}}_{\xi}F)(x)=\phi(\xi)F(x)-\frac{\partial F(x)}{\partial x}\psi(\xi)x, (23)

where \phi,\psi are the Lie algebra representations corresponding to \Phi,\Psi. Evident from (22) is the fact that the Lie derivative is linear with respect to both \xi and F, and sends functions in C^{k+1} to functions in C^{k} for every k\geq 0. The geometric construction of the fundamental operators {\mathcal{K}}_{g} and {\mathcal{L}}_{\xi} is depicted in Figure 2. It turns out (see Proposition 24) that \xi\mapsto{\mathcal{L}}_{\xi} is the Lie algebra representation corresponding to g\mapsto{\mathcal{K}}_{g} on C^{\infty}({\mathcal{V}};{\mathcal{W}}), meaning that on this space we have the handy relations

\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}{\mathcal{K}}_{\exp(t\xi)}={\mathcal{L}}_{\xi}{\mathcal{K}}_{\exp(t\xi)}={\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\xi}\qquad\mbox{and}\qquad{\mathcal{L}}_{[\xi,\eta]}={\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}-{\mathcal{L}}_{\eta}{\mathcal{L}}_{\xi}. (24)

The results stated below are special cases of more general results developed later in Section 11.

Figure 2: The fundamental operators for functions between vector spaces and linear Lie group actions defined by representations. The finite transformation operators {\mathcal{K}}_{g} act on the function F:{\mathcal{V}}\to{\mathcal{W}} by composing it with the transformation \theta_{g} and then applying \Phi(g) to the values in {\mathcal{W}}. The function is g-equivariant when this process does not alter the function. The Lie derivative {\mathcal{L}}_{\xi} is formed by differentiating t\mapsto{\mathcal{K}}_{\exp(t\xi)} at t=0. Geometrically, {\mathcal{L}}_{\xi}F(x) is the vector in {\mathcal{W}} tangent to the curve t\mapsto{\mathcal{K}}_{\exp(t\xi)}F(x) in {\mathcal{W}} passing through F(x) at t=0.

Our first main result provides necessary and sufficient conditions for a continuously differentiable function F:{\mathcal{V}}\to{\mathcal{W}} to be equivariant with respect to the Lie group actions on {\mathcal{V}} and {\mathcal{W}}. This generalizes the constraints derived by Finzi et al. (2021) for the linear layers of equivariant multilayer perceptrons.

Theorem 3

Let \{\xi_{i}\}_{i=1}^{q} generate (via linear combinations and Lie brackets) the Lie algebra \operatorname{Lie}(G) and let \{g_{j}\}_{j=1}^{n_{G}-1} contain one element from each non-identity component of G. Then F\in C^{1}({\mathcal{V}};{\mathcal{W}}) is G-equivariant if and only if

{\mathcal{L}}_{\xi_{i}}F=0\qquad\mbox{and}\qquad{\mathcal{K}}_{g_{j}}F-F=0 (25)

for every i=1,\ldots,q and every j=1,\ldots,n_{G}-1. This is a special case of Theorem 26.

Since the fundamental operators {\mathcal{L}}_{\xi} and {\mathcal{K}}_{g} are linear, Theorem 3 provides linear constraints for a continuously differentiable function F to be G-equivariant.

Our second main result shows that the continuous symmetries of a given continuously differentiable function F:{\mathcal{V}}\to{\mathcal{W}} are encoded by its Lie derivatives.

Theorem 4

Given F\in C^{1}({\mathcal{V}};{\mathcal{W}}), the symmetry group \operatorname{Sym}_{G}(F) is a closed, embedded Lie subgroup of G with Lie subalgebra

\operatorname{\mathfrak{sym}}_{G}(F)=\left\{\xi\in\operatorname{Lie}(G)\ :\ {\mathcal{L}}_{\xi}F=0\right\}. (26)

This is a special case of Theorem 25.

This result completely characterizes the identity component of the symmetry group \operatorname{Sym}_{G}(F) because the connected Lie subgroups of G are in one-to-one correspondence with Lie subalgebras of \operatorname{Lie}(G) by Theorem 19.26 in Lee (2013). The Lie subalgebra of symmetries of a C^{1} function F can be identified via linear algebra. In particular, \operatorname{\mathfrak{sym}}_{G}(F) is the nullspace of the linear operator L_{F}:\operatorname{Lie}(G)\to C^{0}({\mathcal{V}};{\mathcal{W}}) defined by

L_{F}:\xi\mapsto{\mathcal{L}}_{\xi}F. (27)

Discretization methods suitable for linear-algebraic computations with the fundamental operators will be discussed in Section 10. The key point is that when the functions F lie in a finite-dimensional subspace {\mathcal{F}}\subset C^{1}({\mathcal{V}};{\mathcal{W}}), the ranges of the restricted Lie derivatives \{\left.{\mathcal{L}}_{\xi}\right|_{{\mathcal{F}}}\}_{\xi\in\operatorname{Lie}(G)}, hence also the ranges of \{L_{F}\}_{F\in{\mathcal{F}}}, are contained in a corresponding finite-dimensional subspace {\mathcal{F}}^{\prime}\subset C^{0}({\mathcal{V}};{\mathcal{W}}) on which inner products can be defined using sampling or quadrature.

The preceding two theorems already show the duality between enforcing and discovering continuous symmetries with respect to the Lie derivative, viewed as a bilinear form (\xi,F)\mapsto{\mathcal{L}}_{\xi}F. To discover symmetries, we seek generators \xi\in\operatorname{Lie}(G) satisfying {\mathcal{L}}_{\xi}F=0 for a known function F. On the other hand, to enforce a connected group of symmetries, we seek functions F satisfying {\mathcal{L}}_{\xi_{i}}F=0 with known generators \xi_{1},\ldots,\xi_{q} of \operatorname{Lie}(G).

6 Enforcing symmetry with linear constraints

Methods to enforce symmetry in neural networks and other machine learning models have been studied extensively, as we reviewed briefly in Section 3.1. A unifying theme in these techniques has been the use of linear constraints to enforce symmetry (Finzi et al., 2021; Loiseau and Brunton, 2018; Weiler et al., 2018; Cohen et al., 2019; Ahmadi and Khadir, 2020). The purpose of this section is to show how several of these methods can be understood in terms of the fundamental operators and linear constraints provided by Theorem 3.

6.1 Multilayer perceptrons

Enforcing symmetry in multilayer perceptrons was studied by Finzi et al. (2021). They provide a practical method based on enforcing linear constraints on the weights defining each layer of a neural network. The network uses specialized nonlinearities that are automatically equivariant, meaning that the constraints need only be enforced on the linear component of each layer. We show that the constraints derived by Finzi et al. (2021) are the same as those given by Theorem 3.

Specifically, each linear layer F^{(l)}:{\mathcal{V}}_{l-1}\to{\mathcal{V}}_{l}, for l=1,\ldots,L, is defined by

F^{(l)}(x)=W^{(l)}x+b^{(l)}, (28)

where W^{(l)} are weight matrices and b^{(l)} are bias vectors. Defining group representations \Phi_{l}:G\to\operatorname{GL}({\mathcal{V}}_{l}) for each layer yields fundamental operators given by

{\mathcal{K}}_{g}F^{(l)}(x)-F^{(l)}(x)=\big(\Phi_{l}(g)W^{(l)}\Phi_{l-1}(g)^{-1}-W^{(l)}\big)x+\Phi_{l}(g)b^{(l)}-b^{(l)}, (29)

{\mathcal{L}}_{\xi}F^{(l)}(x)=\big(\phi_{l}(\xi)W^{(l)}-W^{(l)}\phi_{l-1}(\xi)\big)x+\phi_{l}(\xi)b^{(l)}. (30)

Let \{\xi_{i}\}_{i=1}^{q} generate \operatorname{Lie}(G) and let \{g_{j}\}_{j=1}^{n_{G}-1} consist of an element from each non-identity component of G. Using the fundamental operators and Theorem 3, it follows that the layer F^{(l)} is G-equivariant if and only if the weights and biases satisfy

\phi_{l}(\xi_{i})W^{(l)}=W^{(l)}\phi_{l-1}(\xi_{i}),\quad\mbox{and}\quad\Phi_{l}(g_{j})W^{(l)}=W^{(l)}\Phi_{l-1}(g_{j}), (31)

\phi_{l}(\xi_{i})b^{(l)}=0,\quad\mbox{and}\quad\Phi_{l}(g_{j})b^{(l)}=b^{(l)} (32)

for every i=1,\ldots,q and j=1,\ldots,n_{G}-1. These are the same as the linear constraints one derives using the method of Finzi et al. (2021). The equivariant linear layers are then combined with specialized equivariant nonlinearities \sigma^{(l)}:{\mathcal{V}}_{l}\to{\mathcal{V}}_{l} to produce an equivariant network

F=\sigma^{(L)}\circ F^{(L)}\circ\cdots\circ\sigma^{(1)}\circ F^{(1)}:{\mathcal{V}}_{0}\to{\mathcal{V}}_{L}. (33)

The composition of equivariant functions is equivariant, as one can easily check using Definition 2.
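As a concrete illustration (a sketch of the general recipe, not the authors' released code), the constraint (31) for a single generator can be solved numerically by vectorizing it with Kronecker products and computing an SVD nullspace. Here we take G=\operatorname{SO}(2) with \phi_{l}=\phi_{l-1} equal to the 2D rotation generator; the recovered basis spans the weights commuting with rotations, W=aI+bJ.

```python
import numpy as np

phi_in = np.array([[0.0, -1.0], [1.0, 0.0]])   # phi_{l-1}(xi): SO(2) generator
phi_out = phi_in                               # phi_l(xi): same representation
n_out, n_in = phi_out.shape[0], phi_in.shape[0]

# Row-major vectorization: vec(phi_out @ W) = kron(phi_out, I) vec(W) and
# vec(W @ phi_in) = kron(I, phi_in.T) vec(W), so (31) becomes C vec(W) = 0.
C = np.kron(phi_out, np.eye(n_in)) - np.kron(np.eye(n_out), phi_in.T)

_, s, Vt = np.linalg.svd(C)
null = Vt[s < 1e-10]                           # rows: vec of equivariant W's
basis = [v.reshape(n_out, n_in) for v in null]
print(len(basis))                              # 2: the span of I and J

# The bias constraint phi_l(xi) b = 0 in (32) is handled the same way;
# here phi_out has full rank, so the only equivariant bias is b = 0.
print(np.linalg.matrix_rank(phi_out))
```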

6.2 Neural operators acting on fields

Enforcing symmetry in neural networks acting on spatial fields has been studied extensively by Weiler et al. (2018); Cohen et al. (2018); Esteves et al. (2018); Kondor and Trivedi (2018); Cohen et al. (2019) among others. Many of these techniques use integral operators to define equivariant linear layers, which are coupled with equivariant nonlinearities, such as the gated nonlinearities proposed by Weiler et al. (2018). Networks built by composing integral operators with nonlinearities constitute a large class of “neural operators” described by Kovachki et al. (2023); Goswami et al. (2023); Boullé and Townsend (2023). The key task is to identify appropriate bases for equivariant kernels. For certain groups, such as the special Euclidean group G=\operatorname{SE}(3), bases can be constructed explicitly using spherical harmonics, as in Weiler et al. (2018). We show that equivariance with respect to arbitrary group actions can be enforced via linear constraints on the integral kernels derived using the fundamental operators introduced in Section 5. Appropriate bases of kernel functions can then be constructed numerically by computing an appropriate nullspace, as is done by Finzi et al. (2021) for multilayer perceptrons.

For the sake of simplicity, we consider integral operators acting on vector-valued functions F:\mathbb{R}^{m}\to{\mathcal{V}}, where {\mathcal{V}} is a finite-dimensional vector space. Later on, in Section 11.4, we study higher-order integral operators acting on sections of vector bundles. If {\mathcal{W}} is another finite-dimensional vector space, an integral operator acting on F to produce a new function \mathbb{R}^{n}\to{\mathcal{W}} is defined by

{\mathcal{T}}_{K}F(x)=\int_{\mathbb{R}^{m}}K(x,y)F(y)\operatorname{\mathrm{d}}y, (34)

where the “kernel” function K provides a linear map K(x,y):{\mathcal{V}}\to{\mathcal{W}} at each (x,y)\in\mathbb{R}^{n}\times\mathbb{R}^{m}. In other words, the kernel is a function on \mathbb{R}^{n}\times\mathbb{R}^{m} taking values in the tensor product space {\mathcal{W}}\otimes{\mathcal{V}}^{*}, where {\mathcal{V}}^{*} denotes the algebraic dual of {\mathcal{V}}. Many of the neural operator architectures described by Kovachki et al. (2023); Goswami et al. (2023); Boullé and Townsend (2023) are constructed by composing layers defined by integral operators (34) with nonlinear activation functions, usually acting pointwise. The kernel functions K are optimized during training of the neural operator.
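For intuition, a layer of this type can be discretized by quadrature on a grid; the following minimal sketch (scalar fields, one spatial dimension, rectangle-rule quadrature, all names our own) applies (34) to a compactly supported field:

```python
import numpy as np

# Quadrature discretization of the integral operator (34) for scalar
# fields on R (m = n = 1); the Gaussian kernel is an illustrative choice.
x = np.linspace(-5.0, 5.0, 401)               # output evaluation grid
y = np.linspace(-5.0, 5.0, 401)               # input/quadrature grid
dy = y[1] - y[0]

K = np.exp(-(x[:, None] - y[None, :]) ** 2)   # K(x_i, y_j); depends only on
                                              # x - y, so T_K commutes with
                                              # translations
F = np.where(np.abs(y) < 1.0, 1.0, 0.0)       # compactly supported field
TKF = K @ F * dy                              # (T_K F)(x_i), rectangle rule
```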

With group actions defined by representations on \mathbb{R}^{m},\mathbb{R}^{n},{\mathcal{V}},{\mathcal{W}}, functions F:\mathbb{R}^{m}\to{\mathcal{V}} transform according to

{\mathcal{K}}_{g}^{(\mathbb{R}^{m},{\mathcal{V}})}F(x)=\Phi_{{\mathcal{V}}}(g)F(\Phi_{\mathbb{R}^{m}}(g)^{-1}x) (35)

for g\in G. Likewise, functions \mathbb{R}^{n}\to{\mathcal{W}} transform via an analogous operator {\mathcal{K}}_{g}^{(\mathbb{R}^{n},{\mathcal{W}})}.

Definition 5

The integral operator {\mathcal{T}}_{K} in (34) is equivariant with respect to g\in G when

{\mathcal{K}}_{g}^{(\mathbb{R}^{n},{\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{(\mathbb{R}^{m},{\mathcal{V}})}={\mathcal{T}}_{K}. (36)

The elements g satisfying this equation form a subgroup of G denoted \operatorname{Sym}_{G}({\mathcal{T}}_{K}).

By changing variables in the integral, the operator on the left is given by

{\mathcal{K}}_{g}^{(\mathbb{R}^{n},{\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{(\mathbb{R}^{m},{\mathcal{V}})}F(x)=\int_{\mathbb{R}^{m}}{\mathcal{K}}_{g}K(x,y)F(y)\operatorname{\mathrm{d}}y, (37)

where

{\mathcal{K}}_{g}K(x,y)=\Phi_{{\mathcal{W}}}(g)K\big(\Phi_{\mathbb{R}^{n}}(g)^{-1}x,\Phi_{\mathbb{R}^{m}}(g)^{-1}y\big)\Phi_{{\mathcal{V}}}(g)^{-1}\det\big[\Phi_{\mathbb{R}^{m}}(g)^{-1}\big]. (38)

The following result provides equivariance conditions in terms of the kernel, generalizing Lemma 1 in Weiler et al. (2018).

Proposition 6

Let K be continuous and suppose that {\mathcal{T}}_{K} acts on a function space containing all smooth, compactly supported fields. Then

\operatorname{Sym}_{G}({\mathcal{T}}_{K})=\left\{g\in G\ :\ {\mathcal{K}}_{g}K=K\right\}. (39)

We give a proof in Appendix A.

The Lie derivative of a continuously differentiable kernel function is given by

{\mathcal{L}}_{\xi}K(x,y)=\phi_{{\mathcal{W}}}(\xi)K(x,y)-K(x,y)\phi_{{\mathcal{V}}}(\xi)-K(x,y)\operatorname{Tr}[\phi_{\mathbb{R}^{m}}(\xi)]-\frac{\partial K(x,y)}{\partial x}\phi_{\mathbb{R}^{n}}(\xi)x-\frac{\partial K(x,y)}{\partial y}\phi_{\mathbb{R}^{m}}(\xi)y. (40)

The operators {\mathcal{K}}_{g} and {\mathcal{L}}_{\xi} are the fundamental operators from Section 5 because the transformation law for the kernel can be written as

{\mathcal{K}}_{g}K=\Phi_{{\mathcal{W}}\otimes{\mathcal{V}}^{*}}(g)K\circ\Phi_{\mathbb{R}^{n}\times\mathbb{R}^{m}}(g)^{-1}, (41)

where

\Phi_{\mathbb{R}^{n}\times\mathbb{R}^{m}}(g):(x,y)\mapsto\left(\Phi_{\mathbb{R}^{n}}(g)x,\Phi_{\mathbb{R}^{m}}(g)y\right)\qquad\mbox{and}\qquad\Phi_{{\mathcal{W}}\otimes{\mathcal{V}}^{*}}(g):T\mapsto\Phi_{{\mathcal{W}}}(g)T\Phi_{{\mathcal{V}}}(g)^{-1}\det\big[\Phi_{\mathbb{R}^{m}}(g)^{-1}\big] (42)

are representations of G in \mathbb{R}^{n}\times\mathbb{R}^{m} and {\mathcal{W}}\otimes{\mathcal{V}}^{*}.

As an immediate consequence of Theorem 3, we have the following corollary establishing linear constraints for the kernel to produce an equivariant integral operator.

Corollary 7

Let {ξi}i=1q\{\xi_{i}\}_{i=1}^{q} generate the Lie algebra Lie(G)\operatorname{Lie}(G) and let {gj}j=1nG1\{g_{j}\}_{j=1}^{n_{G}-1} contain one element from each non-identity component of GG. Under the same hypotheses as Proposition 6 and assuming KK is continuously differentiable, the integral operator 𝒯K{\mathcal{T}}_{K} in (34) is GG-equivariant in the sense of Definition 5 if and only if

ξiK=0and𝒦gjKK=0{\mathcal{L}}_{\xi_{i}}K=0\qquad\mbox{and}\qquad{\mathcal{K}}_{g_{j}}K-K=0 (43)

for every i=1,,qi=1,\ldots,q and every j=1,,nG1j=1,\ldots,n_{G}-1.

These linear constraint equations must be satisfied in order to enforce equivariance with respect to a known symmetry group GG in the machine learning process. By discretizing the operators 𝒦g{\mathcal{K}}_{g} and ξ{\mathcal{L}}_{\xi}, as discussed later in Section 10, one can solve these constraints numerically to construct a basis of kernel functions for equivariant integral operators.
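As a sketch of this numerical procedure, suppose the operators ℒ_{ξ_i} and 𝒦_{g_j} − I from (43) have already been discretized as matrices acting on the coefficient vector of a kernel in a chosen basis (Section 10 describes how such discretizations can be constructed). A basis for the admissible kernels is then a numerical nullspace computation; the helper below is our own illustration:

```python
import numpy as np

def equivariant_kernel_basis(constraint_mats, tol=1e-10):
    """Columns span the coefficient vectors of kernels satisfying (43).

    constraint_mats : list of matrices discretizing the operators
        L_{xi_i} and (K_{g_j} - I) on the kernel coefficient space.
    """
    A = np.vstack(constraint_mats)           # stack all linear constraints
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    return Vt[rank:].T                       # right singular vectors spanning Null(A)
```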

As an immediate consequence of Theorem 4, the following result shows that the Lie derivative of the kernel encodes the continuous symmetries of a given integral operator.

Corollary 8

Under the same hypotheses as Proposition 6 and assuming KK is continuously differentiable, it follows that SymG(𝒯K)\operatorname{Sym}_{G}({\mathcal{T}}_{K}) is a closed, embedded Lie subgroup of GG with Lie subalgebra

𝔰𝔶𝔪G(𝒯K)={ξLie(G):ξK=0}.\operatorname{\mathfrak{sym}}_{G}({\mathcal{T}}_{K})=\left\{\xi\in\operatorname{Lie}(G)\ :\ {\mathcal{L}}_{\xi}K=0\right\}. (44)

This result will be useful for methods that promote symmetry of the integral operator, as we describe later in Section 8.

7 Discovering symmetry by computing nullspaces

In this section we show that in a wide range of settings, the continuous symmetries of a manifold, point cloud, or map can be recovered by computing the nullspace of a linear operator. For functions, this is already covered by Theorem 4, which allows us to compute the connected subgroup of symmetries by identifying its Lie subalgebra

𝔰𝔶𝔪G(F)=Null(LF)\operatorname{\mathfrak{sym}}_{G}(F)=\operatorname{Null}(L_{F}) (45)

where LF:ξξL_{F}:\xi\mapsto{\mathcal{L}}_{\xi} is the linear operator defined by (27). Hence, if a machine learning model FF has a symmetry group SymG(F)\operatorname{Sym}_{G}(F), then its Lie algebra is equal to the nullspace of LFL_{F}.

This section explains how this is actually a special case of a more general result allowing us to reveal the symmetries of submanifolds via the nullspace of a closely related operator. We begin with the more general case where we study the symmetries of a submanifold of Euclidean space, and we explain how to recover symmetries from point clouds approximating submanifolds. The Lie derivative described in Section 5 is then recovered when the submanifold is the graph of a function. We also briefly describe how the fundamental operators from Section 5 can be used to recover symmetries and conservation laws of dynamical systems.

7.1 Symmetries of submanifolds

We begin by studying the symmetries of smooth submanifolds {\mathcal{M}} of Euclidean space d\mathbb{R}^{d} using an approach similar to Cahill et al. (2023). However, we use a different operator that generalizes more naturally to nonlinear group actions on arbitrary manifolds (see Section 12) and recovers the Lie derivative (see Section 7.2). With right action θ:d×Gd\theta:\mathbb{R}^{d}\times G\to\mathbb{R}^{d} of a Lie group, we define invariance of a submanifold as follows:

Definition 9

A submanifold d{\mathcal{M}}\subset\mathbb{R}^{d} is invariant with respect to a group element gGg\in G if

θg(z)\theta_{g}(z)\in{\mathcal{M}} (46)

for every zz\in{\mathcal{M}}. These elements form a subgroup of GG denoted SymG()\operatorname{Sym}_{G}({\mathcal{M}}).

The subgroup of symmetries of a submanifold is characterized by the following theorem.

Theorem 10

Let {\mathcal{M}} be a smooth, closed, embedded submanifold of d\mathbb{R}^{d}. Then SymG()\operatorname{Sym}_{G}({\mathcal{M}}) is a closed, embedded Lie subgroup of GG whose Lie subalgebra is

𝔰𝔶𝔪G()={ξLie(G):θ^(ξ)zTzz}.\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})=\{\xi\in\operatorname{Lie}(G)\ :\ \hat{\theta}(\xi)_{z}\in T_{z}{\mathcal{M}}\quad\forall z\in{\mathcal{M}}\}. (47)

This is a special case of Theorem 35.

The meaning of this result and its practical use for detecting symmetry are illustrated in Figure 3.

To reveal the connected component of SymG()\operatorname{Sym}_{G}({\mathcal{M}}), we let Pz:ddP_{z}:\mathbb{R}^{d}\to\mathbb{R}^{d} be a family of linear projections onto TzdT_{z}{\mathcal{M}}\subset\mathbb{R}^{d}. These are assumed to vary continuously with respect to zz\in{\mathcal{M}}. Then under the assumptions of the above theorem, 𝔰𝔶𝔪G()\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}}) is the nullspace of the symmetric, positive-semidefinite operator S:Lie(G)Lie(G)S_{{\mathcal{M}}}:\operatorname{Lie}(G)\to\operatorname{Lie}(G) defined by

η,SξLie(G)=θ^(η)zT(IPz)T(IPz)θ^(ξ)zdμ(z)\big{\langle}\eta,\ S_{{\mathcal{M}}}\xi\big{\rangle}_{\operatorname{Lie}(G)}=\int_{{\mathcal{M}}}\hat{\theta}(\eta)_{z}^{T}(I-P_{z})^{T}(I-P_{z})\hat{\theta}(\xi)_{z}\ \operatorname{\mathrm{d}}\mu(z) (48)

for every η,ξLie(G)\eta,\xi\in\operatorname{Lie}(G). We see in Figure 3 that (IPz)θ^(ξ)z(I-P_{z})\hat{\theta}(\xi)_{z} measures the component of the infinitesimal generator not tangent to the submanifold at zz. Here, μ\mu is any strictly positive measure on {\mathcal{M}} that makes all of these integrals finite. The above formula is useful for computing the matrix of SS_{{\mathcal{M}}} in an orthonormal basis for Lie(G)\operatorname{Lie}(G).

Figure 3: Tangency of infinitesimal generators and symmetries of submanifolds. The infinitesimal generator θ^(ξ)\hat{\theta}(\xi) is everywhere tangent to the submanifold {\mathcal{M}} if and only if the curves tθexp(tξ)(z)t\mapsto\theta_{\exp(t\xi)}(z), with zz\in{\mathcal{M}}, lie in {\mathcal{M}} for all tt. The Lie algebra elements ξ\xi satisfying this tangency condition form the Lie subalgebra of symmetries of {\mathcal{M}}. To test for tangency of the infinitesimal generator we use a family of projections PzP_{z} onto the tangent spaces TzT_{z}{\mathcal{M}} for every zz\in{\mathcal{M}}. Specifically, (IPz)θ^(ξ)z(I-P_{z})\hat{\theta}(\xi)_{z} is the component of the infinitesimal generator at zz that does not lie tangent to {\mathcal{M}}. Hence, ξ\xi generates a symmetry of {\mathcal{M}} if and only if (IPz)θ^(ξ)z=0(I-P_{z})\hat{\theta}(\xi)_{z}=0 for all zz\in{\mathcal{M}}.

Alternatively, when the dimension of GG is large, one can compute the nullspace using a Krylov algorithm such as the one described in Finzi et al. (2021). Such algorithms rely solely on queries of SS_{{\mathcal{M}}} acting on vectors ξLie(G)\xi\in\operatorname{Lie}(G). When θg(z)=Φ(g1)z\theta_{g}(z)=\Phi(g^{-1})z and θ^(ξ)z=ϕ(ξ)z\hat{\theta}(\xi)_{z}=-\phi(\xi)z are given by a Lie group representation (see Section 4.2), then the operator defined in (48) is given explicitly by

Sξ=dΦ(e)[(IPz)T(IPz)ϕ(ξ)zzT]dμ(z),S_{{\mathcal{M}}}\xi=\int_{{\mathcal{M}}}\operatorname{\mathrm{d}}\Phi(e)^{*}\left[(I-P_{z})^{T}(I-P_{z})\phi(\xi)zz^{T}\right]\ \operatorname{\mathrm{d}}\mu(z), (49)

where dΦ(e):d×dLie(G)\operatorname{\mathrm{d}}\Phi(e)^{*}:\mathbb{R}^{d\times d}\to\operatorname{Lie}(G) is the adjoint of dΦ(e):Lie(G)d×d\operatorname{\mathrm{d}}\Phi(e):\operatorname{Lie}(G)\to\mathbb{R}^{d\times d}. If Gd×dG\subset\mathbb{R}^{d\times d} is a matrix Lie group and Φ\Phi is the identity representation, then dΦ(e)\operatorname{\mathrm{d}}\Phi(e) is the injection Lie(G)d×d\operatorname{Lie}(G)\hookrightarrow\mathbb{R}^{d\times d}. When Lie(G)d×d\operatorname{Lie}(G)\subset\mathbb{R}^{d\times d} inherits its inner product from d×d\mathbb{R}^{d\times d}, then dΦ(e)\operatorname{\mathrm{d}}\Phi(e)^{*} is the orthogonal projection of d×d\mathbb{R}^{d\times d} onto Lie(G)\operatorname{Lie}(G). For example, if Φ\Phi is the identity representation of SE(d)\operatorname{SE}(d) in d+1\mathbb{R}^{d+1} with the inner product on 𝔰𝔢(d)(d+1)×(d+1)\operatorname{\mathfrak{se}}(d)\subset\mathbb{R}^{(d+1)\times(d+1)} given by the usual inner product of matrices M1,M2=Tr(M1TM2)\langle M_{1},M_{2}\rangle=\operatorname{Tr}(M_{1}^{T}M_{2}), then it can be readily verified that

dΦ(e)([AbcTa])=[12(AAT)b00],Ad×d,bd,cd,a.\operatorname{\mathrm{d}}\Phi(e)^{*}\left(\begin{bmatrix}A&b\\ c^{T}&a\end{bmatrix}\right)=\begin{bmatrix}\frac{1}{2}(A-A^{T})&b\\ 0&0\end{bmatrix},\qquad A\in\mathbb{R}^{d\times d},\ b\in\mathbb{R}^{d},\ c\in\mathbb{R}^{d},\ a\in\mathbb{R}. (50)
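For instance, the map (50) simply skew-symmetrizes the rotational block, retains the translation column, and zeros the remainder; a minimal NumPy sketch (the function name is ours) is:

```python
import numpy as np

def se_projection(M):
    """Orthogonal projection of a (d+1) x (d+1) matrix onto se(d), per (50)."""
    d = M.shape[0] - 1
    out = np.zeros_like(M)
    A = M[:d, :d]
    out[:d, :d] = 0.5 * (A - A.T)   # skew-symmetric part of the rotational block
    out[:d, d] = M[:d, d]           # translation component b is kept
    return out                      # last row (c^T, a) is projected to zero
```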

In practice, one can use sample points ziz_{i} on the manifold to obtain a Monte-Carlo estimate of SS_{{\mathcal{M}}} with approximate projections PziP_{z_{i}} computed using local principal component analysis (PCA), as described in Cahill et al. (2023). More accurate estimates of the tangent spaces can be obtained using the methods in Berry and Giannakis (2020). Assuming the PziP_{z_{i}} are accurate, the following proposition shows that the correct Lie subalgebra of symmetries is revealed using finitely many sample points ziz_{i}. However, this result does not tell us how many samples to use, or even when to stop sampling.
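The following self-contained sketch illustrates this pipeline, and the finite-sample estimator in Proposition 11 below, for a point cloud on the unit circle in ℝ² with the candidate algebra 𝔤𝔩(2) acting by θ̂(ξ)_z = −ξz; the sample size, neighborhood size, and tolerances are illustrative choices rather than recommendations. The single near-zero eigenvalue recovers the rotation generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Point cloud approximating the submanifold M: the unit circle in R^2.
angles = rng.uniform(0, 2 * np.pi, 400)
Z = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Candidate Lie algebra gl(2), with generators theta_hat(xi)_z = -xi z.
basis = [np.array([[1., 0.], [0., 0.]]), np.array([[0., 1.], [0., 0.]]),
         np.array([[0., 0.], [1., 0.]]), np.array([[0., 0.], [0., 1.]])]

def tangent_projection(Z, i, k=10):
    """Rank-1 projection onto the tangent line at Z[i] via local PCA."""
    d2 = np.sum((Z - Z[i]) ** 2, axis=1)
    nbrs = Z[np.argsort(d2)[1:k + 1]]              # k nearest neighbors
    U, _, _ = np.linalg.svd((nbrs - nbrs.mean(axis=0)).T)
    t = U[:, 0]                                    # leading principal direction
    return np.outer(t, t)

m = len(Z)
S = np.zeros((4, 4))
for i in range(m):
    Q = np.eye(2) - tangent_projection(Z, i)       # normal projection I - P_z
    gens = [-xi @ Z[i] for xi in basis]            # infinitesimal generators at z_i
    for a in range(4):
        for b in range(4):
            S[a, b] += gens[a] @ Q @ gens[b] / m   # Monte-Carlo estimate of (48)

evals, evecs = np.linalg.eigh(S)
print(np.round(evals, 6))        # exactly one eigenvalue near zero
print(np.round(evecs[:, 0], 3))  # approx. proportional to (0, -1, 1, 0): rotations
```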

Proposition 11

Let μ\mu be a strictly positive probability measure on a smooth manifold {\mathcal{M}} such that ξ,Sξ<\langle\xi,S_{{\mathcal{M}}}\xi\rangle<\infty for every ξLie(G)\xi\in\operatorname{Lie}(G). Let ziz_{i} be drawn independently from the distribution μ\mu and let Sm:Lie(G)Lie(G)S_{m}:\operatorname{Lie}(G)\to\operatorname{Lie}(G) be defined by

η,SmξLie(G)=1mi=1mθ^(η)ziT(IPzi)T(IPzi)θ^(ξ)zi.\big{\langle}\eta,\ S_{m}\xi\big{\rangle}_{\operatorname{Lie}(G)}=\frac{1}{m}\sum_{i=1}^{m}\hat{\theta}(\eta)_{z_{i}}^{T}(I-P_{z_{i}})^{T}(I-P_{z_{i}})\hat{\theta}(\xi)_{z_{i}}. (51)

Then there is almost surely an integer M0M_{0} such that for every mM0m\geq M_{0} we have Null(Sm)=𝔰𝔶𝔪G()\operatorname{Null}(S_{m})=\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}}). We provide a proof in Appendix A.

7.2 Symmetries of functions as symmetries of submanifolds

The method described above for studying symmetries of submanifolds can be applied to reveal the symmetries of smooth maps between vector spaces by identifying the map F:𝒱𝒲F:{\mathcal{V}}\to{\mathcal{W}} with its graph

gr(F)={(x,F(x))𝒱×𝒲:x𝒱}.\operatorname{gr}(F)=\{(x,F(x))\in{\mathcal{V}}\times{\mathcal{W}}\ :\ x\in{\mathcal{V}}\}. (52)

The graph is a smooth, closed, embedded submanifold of the space 𝒱×𝒲{\mathcal{V}}\times{\mathcal{W}} by Proposition 5.7 in Lee (2013). We show that this approach recovers the Lie derivative and our result in Theorem 4. By choosing bases for the domain and codomain, it suffices to consider smooth functions F:mnF:\mathbb{R}^{m}\to\mathbb{R}^{n}.

Supposing that we have representations Φm\Phi_{\mathbb{R}^{m}} and Φn\Phi_{\mathbb{R}^{n}} of GG in the domain and codomain, we consider a combined representation

Φ:g[Φm(g)00Φn(g)].\Phi:g\mapsto\begin{bmatrix}\Phi_{\mathbb{R}^{m}}(g)&0\\ 0&\Phi_{\mathbb{R}^{n}}(g)\end{bmatrix}. (53)

Defining a smoothly-varying family of projections

P(x,F(x))=[I0dF(x)0]P_{(x,F(x))}=\begin{bmatrix}I&0\\ \operatorname{\mathrm{d}}F(x)&0\end{bmatrix} (54)

onto T(x,F(x))gr(F)T_{(x,F(x))}\operatorname{gr}(F), it is easy to check that

[0ξF(x)]=([I00I][I0dF(x)0])IP(x,F(x))[ϕm(ξ)00ϕn(ξ)]ϕ(ξ)[xF(x)].\begin{bmatrix}0\\ {\mathcal{L}}_{\xi}F(x)\end{bmatrix}=\underbrace{\left(\begin{bmatrix}I&0\\ 0&I\end{bmatrix}-\begin{bmatrix}I&0\\ \operatorname{\mathrm{d}}F(x)&0\end{bmatrix}\right)}_{I-P_{(x,F(x))}}\underbrace{\begin{bmatrix}\phi_{\mathbb{R}^{m}}(\xi)&0\\ 0&\phi_{\mathbb{R}^{n}}(\xi)\end{bmatrix}}_{\phi(\xi)}\begin{bmatrix}x\\ F(x)\end{bmatrix}. (55)

We note that this is a special case of Theorem 39 describing the Lie derivative in terms of a projection onto the tangent space of a function’s graph. The resulting operator Sgr(F)S_{\operatorname{gr}(F)} defined by (48) is given by

η,Sgr(F)ξLie(G)=m(ηF(x))TξF(x)dμ(x),\left\langle\eta,S_{\operatorname{gr}(F)}\xi\right\rangle_{\operatorname{Lie}(G)}=\int_{\mathbb{R}^{m}}({\mathcal{L}}_{\eta}F(x))^{T}{\mathcal{L}}_{\xi}F(x)\ \operatorname{\mathrm{d}}\mu(x), (56)

for η,ξLie(G)\eta,\xi\in\operatorname{Lie}(G) and an appropriate positive measure μ\mu on m\mathbb{R}^{m} that makes the integrals finite. Therefore, Theorem 4 is recovered from our result about symmetries of submanifolds stated in Theorem 10.

Related quantities have been used to study the symmetries of trained neural networks, with FF being the network and its derivatives computed via back-propagation. The quantity ξ,Sgr(F)ξLie(G)=ξFL2(μ)2\left\langle\xi,S_{\operatorname{gr}(F)}\xi\right\rangle_{\operatorname{Lie}(G)}=\|{\mathcal{L}}_{\xi}F\|_{L^{2}(\mu)}^{2} was used by Gruver et al. (2022) to construct the Local Equivariant Error (LEE), measuring the extent to which a trained neural network FF fails to respect symmetries in the one-parameter group {exp(tξ)}t\{\exp(t\xi)\}_{t\in\mathbb{R}}. The nullspace of ξξF\xi\mapsto{\mathcal{L}}_{\xi}F in the special case where Φn(g)=I\Phi_{\mathbb{R}^{n}}(g)=I acts trivially was used by Moskalev et al. (2022) to identify the connected subgroup with respect to which a given network is invariant.

By viewing a function as a submanifold, we obtain a simple data-driven technique for estimating the Lie derivative and subgroup of symmetries of the function. To approximate ξF{\mathcal{L}}_{\xi}F, Sgr(F)S_{\operatorname{gr}(F)}, and 𝔰𝔶𝔪G(F)\operatorname{\mathfrak{sym}}_{G}(F) using input-output pairs (xi,yi=F(xi))(x_{i},y_{i}=F(x_{i})), one simply needs to approximate the projection in (54) using these data. To do this, we can obtain matrices UiU_{i} with mm columns spanning T(xi,yi)gr(F)T_{(x_{i},y_{i})}\operatorname{gr}(F) by applying local PCA to the data zi=(xi,yi)z_{i}=(x_{i},y_{i}), or by pruning the frames computed in Berry and Giannakis (2020). With E=[Im×m0m×n]E=\begin{bmatrix}I_{m\times m}&0_{m\times n}\end{bmatrix} the projection in (54) is given by

Pzi=Ui(EUi)1EP_{z_{i}}=U_{i}(EU_{i})^{-1}E (57)

because any projection is uniquely determined by its range and nullspace (see Section 5.9 of Meyer (2000)). This gives us a simple way to approximate (ξF)(zi)({\mathcal{L}}_{\xi}F)(z_{i}), Sgr(F)S_{\operatorname{gr}(F)}, and 𝔰𝔶𝔪G(F)\operatorname{\mathfrak{sym}}_{G}(F) using the input-output pairs. However, many such pairs are needed since the tangent space to the graph of FF at xix_{i} is well-approximated by local PCA only when there are at least mm neighboring samples sufficiently close to xix_{i}. Even more samples are needed when they are noisy. The convergence properties of the spectral methods in Berry and Giannakis (2020) are better, but they still require enough samples to obtain accurate Monte-Carlo or quadrature-based estimates of integrals, in this case over m\mathbb{R}^{m}.
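A minimal sketch of the projection (57), assuming a frame U for the estimated tangent space has already been computed from the data, is:

```python
import numpy as np

def graph_projection(U, m):
    """Projection (57) with range = span of the columns of U and
    nullspace {0} x R^n.

    U : ((m + n) x m) frame for the tangent space of gr(F) at (x_i, y_i),
        e.g. obtained from local PCA on nearby input-output pairs.
    """
    E = np.hstack([np.eye(m), np.zeros((m, U.shape[0] - m))])
    return U @ np.linalg.solve(E @ U, E)
```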

7.3 Symmetries and conservation laws of dynamical systems

Here, we consider the case when F:nnF:\mathbb{R}^{n}\to\mathbb{R}^{n} is a smooth function defining a dynamical system

ddtx(t)=F(x(t))\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}x(t)=F(x(t)) (58)

with state variables x(t)nx(t)\in\mathbb{R}^{n}. The solution of this equation is described by the flow map Fl:(t,x(τ))x(τ+t)\operatorname{Fl}:(t,x(\tau))\mapsto x(\tau+t), which is defined on a maximal connected open set DD containing {0}×n\{0\}\times\mathbb{R}^{n}. We often write Flt()=Fl(t,)\operatorname{Fl}^{t}(\cdot)=\operatorname{Fl}(t,\cdot). Given a Lie group representation Φ:GGL(n)\Phi:G\to\operatorname{GL}(\mathbb{R}^{n}), equivariance for the dynamical system is defined as follows:

Definition 12

The dynamical system in (58) is equivariant with respect to a group element gGg\in G if the flow map satisfies

𝒦gFlt(x):=Φ(g)Flt(Φ(g)1x)=Flt(x){\mathcal{K}}_{g}\operatorname{Fl}^{t}(x):=\Phi(g)\operatorname{Fl}^{t}(\Phi(g)^{-1}x)=\operatorname{Fl}^{t}(x) (59)

for every (t,x)D(t,x)\in D.

Differentiating at t=0t=0 shows that equivariance of the dynamical system implies that FF is equivariant in the sense of Definition 2. The converse is also true thanks to Corollary 9.14 in Lee (2013), meaning that equivariance for the dynamical system is equivalent to equivariance of FF. Therefore, we can study equivariance of the dynamical system in (58) by directly applying the tools developed in Section 5 to the function FF. Thanks to Theorem 4, identifying the connected subgroup of symmetries for the dynamical system is a simple matter of computing the nullspace of the linear map ξξF\xi\mapsto{\mathcal{L}}_{\xi}F, that is

𝔰𝔶𝔪G(F)={ξLie(G):ξF=0}.\operatorname{\mathfrak{sym}}_{G}(F)=\{\xi\in\operatorname{Lie}(G)\ :\ {\mathcal{L}}_{\xi}F=0\}. (60)

Here, the Lie derivative is given by

ξF(x)=ϕ(ξ)F(x)F(x)xϕ(ξ)x=[θ^(ξ),F](x),{\mathcal{L}}_{\xi}F(x)=\phi(\xi)F(x)-\frac{\partial F(x)}{\partial x}\phi(\xi)x=[\hat{\theta}(\xi),F](x), (61)

where [θ^(ξ),F][\hat{\theta}(\xi),F] is the Lie bracket of the infinitesimal generator defined by θ^(ξ)x=ϕ(ξ)x\hat{\theta}(\xi)_{x}=-\phi(\xi)x and the vector field FF. Symmetries can also be enforced as linear constraints on FF described by Theorem 3. This was done by Ahmadi and Khadir (2020) for polynomial dynamical systems with discrete symmetries. Later on in Section 11.1 we show that analogous results apply to dynamical systems defined by vector fields on manifolds and nonlinear Lie group actions.
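As a small numerical check of (61), consider the cubic vector field F(x) = ‖x‖²Jx on ℝ², where J generates rotations: the Lie derivative vanishes along J but not along the scaling direction. The example and the finite-difference Jacobian below are our own illustration:

```python
import numpy as np

J = np.array([[0., -1.], [1., 0.]])   # so(2) generator

def F(x):
    # Cubic vector field F(x) = ||x||^2 J x, equivariant under rotations.
    return (x @ x) * (J @ x)

def lie_derivative(F, xi, x, eps=1e-6):
    """(61): L_xi F(x) = phi(xi) F(x) - dF(x) phi(xi) x, with phi(xi) = xi
    and the Jacobian dF(x) estimated by central finite differences."""
    n = len(x)
    dF = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        dF[:, j] = (F(x + e) - F(x - e)) / (2 * eps)
    return xi @ F(x) - dF @ (xi @ x)

x = np.array([0.7, -0.3])
print(lie_derivative(F, J, x))          # ~0: rotations are symmetries
print(lie_derivative(F, np.eye(2), x))  # nonzero: scaling symmetry is broken
```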

A conserved quantity for the system in (58) is defined as follows:

Definition 13

A scalar-valued quantity f:nf:\mathbb{R}^{n}\to\mathbb{R} is said to be conserved when

𝒦tf(x):=f(Flt(x))=f(x)(t,x)D.{\mathcal{K}}_{t}f(x):=f(\operatorname{Fl}^{t}(x))=f(x)\qquad\forall(t,x)\in D. (62)

In this setting, the composition operators 𝒦t{\mathcal{K}}_{t} are often referred to as Koopman operators (see Koopman (1931); Mezić (2005); Mauroy et al. (2020); Otto and Rowley (2021); Brunton et al. (2022)). It is easy to see that a smooth function ff is conserved if and only if

Ff:=ddt|t=0𝒦tf=fxF=0.{\mathcal{L}}_{F}f:=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{t}f=\frac{\partial f}{\partial x}F=0. (63)

This relation is used by Kaiser et al. (2018, 2021) to identify conserved quantities by computing the nullspace of F{\mathcal{L}}_{F} restricted to finite-dimensional spaces of candidate functions. When the flow is defined for all tt\in\mathbb{R}, the operators 𝒦t{\mathcal{K}}_{t} and F{\mathcal{L}}_{F} are the fundamental operators from Section 5 for the right action θ(x,t)=Flt(x)\theta(x,t)=\operatorname{Fl}^{t}(x) and representation Φ(t)=I\Phi(t)=I of the Lie group G=(,+)G=(\mathbb{R},+).
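A minimal version of this computation, using an illustrative monomial dictionary for the harmonic oscillator (our own example), recovers the energy x₁² + x₂² as the nullspace direction of the discretized operator ℒ_F:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 2))   # sample points in the state space

def F(x):                              # harmonic oscillator: x1' = x2, x2' = -x1
    return np.array([x[1], -x[0]])

# Dictionary of candidate conserved quantities and their gradients.
names = ["x1", "x2", "x1^2", "x1*x2", "x2^2"]
grads = [lambda x: np.array([1., 0.]),
         lambda x: np.array([0., 1.]),
         lambda x: np.array([2 * x[0], 0.]),
         lambda x: np.array([x[1], x[0]]),
         lambda x: np.array([0., 2 * x[1]])]

# A[i, j] = (L_F D_j)(x_i) = grad D_j(x_i) . F(x_i), discretizing (63).
A = np.array([[g(x) @ F(x) for g in grads] for x in X])

_, s, Vt = np.linalg.svd(A)
c = Vt[-1]                             # nullspace direction = conserved quantity
print(dict(zip(names, np.round(c / np.abs(c).max(), 3))))
# expected: equal weights on x1^2 and x2^2, i.e., the energy x1^2 + x2^2
```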

Remark 14

For Hamiltonian dynamical systems Noether’s theorem establishes a remarkable equivalence between the symmetries of the Hamiltonian and conserved quantities of the system. We study Hamiltonian systems later in Section 11.3.

8 Promoting symmetry with convex penalties

In this section we show how to design custom convex regularization functions to promote symmetries within a given candidate group during training of a machine learning model. This allows us to train a model with as many symmetries as possible from among the candidates, while breaking candidate symmetries only when the data provides sufficient evidence. We study both discrete and continuous groups of candidate symmetries. We quantify the extent to which symmetries within the candidate group are broken using the fundamental operators described in Section 5. For discrete groups we use the transformation operators {𝒦g}gG\{{\mathcal{K}}_{g}\}_{g\in G} and for continuous groups we use the Lie derivatives {ξ}ξLie(G)\{{\mathcal{L}}_{\xi}\}_{\xi\in\operatorname{Lie}(G)}. In the continuous case we penalize a convex relaxation of the codimension of the subgroup of symmetries given by a nuclear norm (Schatten 11-norm) of the operator ξξF\xi\mapsto{\mathcal{L}}_{\xi}F defined by (27); minimizing this codimension via the proxy nuclear norm will promote the largest nullspace possible, and hence the largest admissible symmetry group. Once these regularization functions are developed abstractly in Sections 8.1 and 8.2, we show how the approach can be applied to basis function regression (Section 8.3), symmetric function recovery (Section 9), and neural networks (Section 8.4).

As in Section 5, the basic building blocks of the machine learning models we consider are continuously differentiable (C1C^{1}) functions F:𝒱𝒲F:{\mathcal{V}}\to{\mathcal{W}} between finite-dimensional vector spaces. While we consider this restricted setting here, our results readily generalize to sections of vector bundles, as we describe later in Section 11. These functions could be layers of a multilayer perceptron, integral kernels to be applied to spatio-temporal fields, or simply linear combinations of user-specified basis functions in a regression task. We consider parametric models where FF is constrained to lie in a given finite-dimensional subspace C1(𝒱;𝒲){\mathcal{F}}\subset C^{1}({\mathcal{V}};{\mathcal{W}}) of continuously differentiable functions. Working within a finite-dimensional subspace of functions will allow us to discretize the fundamental operators in Section 10.

We consider the same setting as Section 5, i.e., candidate symmetries are described by a Lie group GG acting on the domain and codomain of functions FF\in{\mathcal{F}} via a right action θ:𝒱×G𝒱\theta:{\mathcal{V}}\times G\to{\mathcal{V}} and a representation Φ:GGL(𝒲)\Phi:G\to\operatorname{GL}({\mathcal{W}}). Equivariance in this setting is described by Definition 2. When fitting the function FF to data, our regularization functions penalize the size of GSymG(F)G\setminus\operatorname{Sym}_{G}(F). For reasons that will become clear, we use different penalties corresponding to different notions of “size” when GG is a discrete group versus when GG is continuous. The main result describing the continuous symmetries of FF is Theorem 4.

8.1 Discrete symmetries

When the group GG has finitely many elements, one can measure the size of GSymG(F)G\setminus\operatorname{Sym}_{G}(F) simply by counting its elements:

RG,0(F)=|GSymG(F)|.R_{G,0}(F)=|G\setminus\operatorname{Sym}_{G}(F)|. (64)

However, this penalty is impractical for optimization owing to its discrete values and nonconvexity. Letting \|\cdot\| be any norm on the space ′′=span{𝒦gF:gG,F}{\mathcal{F}}^{\prime\prime}=\operatorname{span}\{{\mathcal{K}}_{g}F\ :\ g\in G,\ F\in{\mathcal{F}}\} yields a convex relaxation of the above penalty given by

RG,1(F)=gG𝒦gFF.R_{G,1}(F)=\sum_{g\in G}\|{\mathcal{K}}_{g}F-F\|. (65)

This is a convex function on {\mathcal{F}} because 𝒦g{\mathcal{K}}_{g} is a linear operator and vector space norms are convex. For example, if 𝒄=(c1,,cN){\boldsymbol{c}}=(c_{1},\ldots,c_{N}) are the coefficients of FF in a basis for ′′{\mathcal{F}}^{\prime\prime} and 𝑲g{\boldsymbol{K}}_{g} is the matrix of 𝒦g{\mathcal{K}}_{g} in this basis, then the Euclidean norm can be used to define

RG,1(F)=gG𝑲g𝒄𝒄2.R_{G,1}(F)=\sum_{g\in G}\|{\boldsymbol{K}}_{g}{\boldsymbol{c}}-{\boldsymbol{c}}\|_{2}. (66)

This is directly analogous to the group sparsity penalty proposed in Yuan and Lin (2006).
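A sketch of the penalty (66) in CVXPY, assuming the matrices 𝑲_g have been precomputed as in Section 10 (the helper function is ours):

```python
import cvxpy as cp

def discrete_symmetry_penalty(c, K_mats):
    """Convex penalty (66): sum over group elements of ||K_g c - c||_2.

    c      : cvxpy Variable of coefficients of F in a basis for F''
    K_mats : list of matrices of the operators K_g in that basis
    """
    return sum(cp.norm(Kg @ c - c, 2) for Kg in K_mats)
```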

8.2 Continuous symmetries

We now consider the case where GG is a Lie group of dimension greater than zero. Here we use the dimension of SymG(F)\operatorname{Sym}_{G}(F) to measure the symmetry of FF, seeking to penalize the complementary dimension or “codimension”, given by

RG,0(F)=codim(SymG(F))=dim(G)dim(SymG(F)).R_{G,0}(F)=\operatorname{codim}(\operatorname{Sym}_{G}(F))=\dim(G)-\dim(\operatorname{Sym}_{G}(F)). (67)

We take this approach in the continuous case because it is no longer possible to simply count the number of broken symmetries. While it is possible in principle to replace the sum in (65) by an integral of 𝒦gFF\|{\mathcal{K}}_{g}F-F\| over gGg\in G, the numerical quadrature required to approximate it becomes prohibitive for higher-dimensional candidate groups. This difficulty is exacerbated by the fact that the integrand is not smooth. The space ′′{\mathcal{F}}^{\prime\prime} can also become infinite-dimensional when GG has positive dimension, making it challenging to compute the norm \|\cdot\|.

In contrast, it is much easier to measure the “size” of a continuous symmetry group using its dimension because this can be computed via linear algebra. Specifically, the dimension of SymG(F)\operatorname{Sym}_{G}(F) is equal to that of its Lie algebra. Thanks to Theorem 4, this is the nullspace of a linear operator LF:Lie(G)C0(𝒱;𝒲)L_{F}:\operatorname{Lie}(G)\to C^{0}({\mathcal{V}};{\mathcal{W}}) defined by

LF:ξξF,L_{F}:\xi\mapsto{\mathcal{L}}_{\xi}F, (68)

where ξ{\mathcal{L}}_{\xi} is the Lie derivative in (22). By the rank and nullity theorem, the codimension of SymG(F)\operatorname{Sym}_{G}(F) is equal to the rank of this operator:

RG,0(F)=codim(SymG(F))=rank(LF).R_{G,0}(F)=\operatorname{codim}(\operatorname{Sym}_{G}(F))=\operatorname{rank}(L_{F}). (69)

Penalizing the rank of an operator is impractical for optimization owing to its discrete values and nonconvexity. A commonly used convex relaxation of the rank is provided by the Schatten 11-norm, also known as the “nuclear norm”, given by

RG,(F)=LF=i=1dim(G)σi(LF).R_{G,*}(F)=\|L_{F}\|_{*}=\sum_{i=1}^{\dim(G)}\sigma_{i}(L_{F}). (70)

Here σi(LF)\sigma_{i}(L_{F}) denotes the iith singular value of LFL_{F} with respect to inner products on Lie(G)\operatorname{Lie}(G) and =span{ξF:ξLie(G),F}{\mathcal{F}}^{\prime}=\operatorname{span}\{{\mathcal{L}}_{\xi}F\ :\ \xi\in\operatorname{Lie}(G),\ F\in{\mathcal{F}}\}. This space is finite-dimensional, being spanned by {ξiFj}i,j\{{\mathcal{L}}_{\xi_{i}}F_{j}\}_{i,j} where ξi\xi_{i} and FjF_{j} are basis elements for Lie(G)\operatorname{Lie}(G) and {\mathcal{F}}. This enables computations with discrete inner products on {\mathcal{F}}^{\prime}, as we describe in Section 10. For certain rank minimization problems, penalizing the nuclear norm is guaranteed to recover the true minimum rank solution (Candès and Recht, 2009; Recht et al., 2010; Gross, 2011).

The proposed regularization function (70) is convex on {\mathcal{F}} because FLFF\mapsto L_{F} is linear and the nuclear norm is convex. For example, if (c1,,cN)(c_{1},\ldots,c_{N}) are the coefficients of FF in a basis {F1,,FN}\{F_{1},\ldots,F_{N}\} for {\mathcal{F}} and 𝑳Fi{\boldsymbol{L}}_{F_{i}} are the matrices of LFiL_{F_{i}} in orthonormal bases for Lie(G)\operatorname{Lie}(G) and {\mathcal{F}}^{\prime}, then

RG,(F)=c1𝑳F1++cN𝑳FN.R_{G,*}(F)=\|c_{1}{\boldsymbol{L}}_{F_{1}}+\cdots+c_{N}{\boldsymbol{L}}_{F_{N}}\|_{*}. (71)

With {ξ1,,ξdim(G)}\{\xi_{1},\ldots,\xi_{\dim(G)}\} and {u1,,uN}\{u_{1},\ldots,u_{N^{\prime}}\} being the orthonormal bases for Lie(G)\operatorname{Lie}(G) and {\mathcal{F}}^{\prime}, one can compute and store the rank-33 tensor [𝑳Fi]j,k=uj,ξkFi[{\boldsymbol{L}}_{F_{i}}]_{j,k}=\left\langle u_{j},\ {\mathcal{L}}_{\xi_{k}}F_{i}\right\rangle_{{\mathcal{F}}^{\prime}}. Practical methods for constructing and computing with inner products on {\mathcal{F}}^{\prime} will be discussed in Section 10.
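For a fixed coefficient vector, evaluating (71) from the stored tensor is then a single contraction followed by a sum of singular values; a minimal NumPy sketch (names ours):

```python
import numpy as np

def nuclear_symmetry_penalty(c, L_tensor):
    """Evaluate (71): || c_1 L_{F_1} + ... + c_N L_{F_N} ||_*.

    L_tensor : array of shape (N, N', dim G) stacking the matrices
               [L_{F_i}]_{j,k} = <u_j, L_{xi_k} F_i>.
    """
    Lc = np.tensordot(c, L_tensor, axes=(0, 0))       # matrix of L_F, shape (N', dim G)
    return np.linalg.svd(Lc, compute_uv=False).sum()  # nuclear norm
```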

8.3 Promoting symmetry in basis function regression

To demonstrate how the symmetry-promoting regularization functions proposed above can be used in practice, consider a regression problem for a function F:mnF:\mathbb{R}^{m}\to\mathbb{R}^{n}. It is common to parameterize this problem by expressing F(x)=W𝒟(x)F(x)=W\mathcal{D}(x) in a dictionary 𝒟:mN\mathcal{D}:\mathbb{R}^{m}\to\mathbb{R}^{N} consisting of user-defined smooth functions with a matrix of weights Wn×NW\in\mathbb{R}^{n\times N} to be fit during the training process. For example, the sparse identification of nonlinear dynamics (SINDy) algorithm (Brunton et al., 2016) is one of many machine learning methods of this type (Brunton and Kutz, 2022). The fundamental operators (Section 5) for this class of functions are given by

(𝒦gF)(x)F(x)\displaystyle({\mathcal{K}}_{g}F)(x)-F(x) =Φ(g)W𝒟(θg(x))W𝒟(x),\displaystyle=\Phi(g)W\mathcal{D}(\theta_{g}(x))-W\mathcal{D}(x), (72)
(ξF)(x)\displaystyle({\mathcal{L}}_{\xi}F)(x) =ϕ(ξ)W𝒟(x)+W𝒟(x)xθ^(ξ)x.\displaystyle=\phi(\xi)W\mathcal{D}(x)+W\frac{\partial\mathcal{D}(x)}{\partial x}\hat{\theta}(\xi)_{x}. (73)

These can be used directly in (65) and (70) to construct symmetry-promoting regularization functions RG(W)R_{G}(W) that are convex with respect to the weight matrix WW. Given a training dataset consisting of input-output pairs {(xj,yj)}j=1M\{(x_{j},y_{j})\}_{j=1}^{M} we can seek a regularized least-squares fit by solving the convex optimization problem

minimizeWn×N1Mj=1MyjW𝒟(xj)2+γRG(W𝒟).\operatorname*{\min\!imize\enskip}_{W\in\mathbb{R}^{n\times N}}\frac{1}{M}\sum_{j=1}^{M}\|y_{j}-W\mathcal{D}(x_{j})\|^{2}+\gamma R_{G}(W{\mathcal{D}}). (74)

Here, γ0\gamma\geq 0 is a parameter controlling the strength of the regularization that can be determined using cross-validation. To examine when this approach could be beneficial, we study a simplified problem — symmetric function recovery — in Section 9, below.
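A sketch of the problem (74) in CVXPY for scalar-valued FF (n = 1), with the Lie-derivative matrices assumed precomputed as in Section 10; function and variable names are ours:

```python
import cvxpy as cp
import numpy as np

def fit_with_symmetry(D_X, Y, L_ops, gamma):
    """Regularized least squares (74) with the nuclear-norm penalty (70).

    D_X   : (M, N) matrix of dictionary evaluations D(x_j)
    Y     : (M,) vector of targets y_j
    L_ops : (N, N', dim G) tensor; L_ops[i] is the matrix of L_{F_i}
    gamma : regularization strength
    """
    N = D_X.shape[1]
    w = cp.Variable(N)
    L_w = sum(w[i] * L_ops[i] for i in range(N))   # matrix of the operator L_F
    loss = cp.sum_squares(D_X @ w - Y) / len(Y)
    prob = cp.Problem(cp.Minimize(loss + gamma * cp.normNuc(L_w)))
    prob.solve()
    return w.value
```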

Remark 15

The solutions F=W𝒟F=W{\mathcal{D}} of (74) do not depend on how the dictionary functions are normalized due to the fact that the function being minimized can be written entirely in terms of FF and the data (xj,yj)(x_{j},y_{j}). This is in contrast to other types of regularized regression problems that penalize the weights WW directly, and therefore depend on how the functions in 𝒟{\mathcal{D}} are normalized.

8.4 Promoting symmetry in neural networks

In this section we describe a convex regularizing penalty to promote GG-equivariance in feed-forward neural networks

F=F(L)F(1)F=F^{(L)}\circ\cdots\circ F^{(1)} (75)

composed of layers F(l):𝒱l1𝒱lF^{(l)}:{\mathcal{V}}_{l-1}\to{\mathcal{V}}_{l} with group representations Φl:GGL(𝒱l)\Phi_{l}:G\to\operatorname{GL}({\mathcal{V}}_{l}). Since the composition is gg-equivariant if every layer is gg-equivariant, the main idea is to measure the symmetries shared by all of the layers. Specifically, we aim to maximize the “size” of the subgroup

l=1LSymG(F(l))={gG:𝒦gF(l)=F(l),l=1,,L}SymG(F),\bigcap_{l=1}^{L}\operatorname{Sym}_{G}\big{(}F^{(l)}\big{)}=\{g\in G\ :\ {\mathcal{K}}_{g}F^{(l)}=F^{(l)},\ l=1,\ldots,L\}\subset\operatorname{Sym}_{G}(F), (76)

where the notion of “size” we adopt depends on whether GG is discrete or continuous. The same ideas can be applied to neural networks acting on fields with layers defined by integral operators as described in Section 6.2. In this case we consider symmetries shared by all of the integral kernels.

We consider the case in which the trainable layers are elements of vector spaces l{\mathcal{F}}_{l}, over which the optimization is carried out. For example, each layer may be given by F(l)=W(l)𝒟(l)F^{(l)}=W^{(l)}\mathcal{D}^{(l)} as in Section 8.3, where W(l)W^{(l)} is a trainable weight matrix and 𝒟(l)\mathcal{D}^{(l)} is a fixed dictionary of nonlinear functions. Alternatively, we could follow Finzi et al. (2021) and use trainable linear layers composed with fixed GG-equivariant nonlinearities. In contrast with Finzi et al. (2021), we do not force the linear layers to be GG-equivariant. Rather, we penalize the breaking of GG-symmetries in the linear layers as a means to regularize the neural network and to learn which symmetries are compatible with the data and which are not.

As in Section 8.1, when GG is a discrete group with finitely many elements, a convex relaxation of the cardinality of Gl=1LSymG(F(l))G\setminus\bigcap_{l=1}^{L}\operatorname{Sym}_{G}(F^{(l)}) is

RG,1(F(1),,F(L))=gGl=1L𝒦gF(l)F(l)2.R_{G,1}\big{(}F^{(1)},\ldots,F^{(L)}\big{)}=\sum_{g\in G}\sqrt{\sum_{l=1}^{L}\big{\|}{\mathcal{K}}_{g}F^{(l)}-F^{(l)}\big{\|}^{2}}. (77)

Again, this is analogous to the group-LASSO penalty developed in Yuan and Lin (2006).

When GG is a Lie group with nonzero dimension, we follow the approach in Section 8.2 using the following observation:

Proposition 16

The subgroup in (76) is closed and embedded in GG; its Lie subalgebra is

l=1L𝔰𝔶𝔪G(F(l))={ξLie(G):ξF(l)=0,l=1,,L}.\bigcap_{l=1}^{L}\operatorname{\mathfrak{sym}}_{G}\big{(}F^{(l)}\big{)}=\left\{\xi\in\operatorname{Lie}(G)\ :\ {\mathcal{L}}_{\xi}F^{(l)}=0,\ l=1,\ldots,L\right\}. (78)

We provide a proof in Appendix A.

The Lie subalgebra in the proposition is equal to the nullspace of the linear operator LF(1),,F(L):Lie(G)l=1LC(𝒱l1;𝒱l)L_{F^{(1)},\ldots,F^{(L)}}:\operatorname{Lie}(G)\to\bigoplus_{l=1}^{L}C^{\infty}({\mathcal{V}}_{l-1};{\mathcal{V}}_{l}) defined by

LF(1),,F(L):ξ(ξF(1),,ξF(L)).L_{F^{(1)},\ldots,F^{(L)}}:\xi\mapsto\big{(}{\mathcal{L}}_{\xi}F^{(1)},\ldots,{\mathcal{L}}_{\xi}F^{(L)}\big{)}. (79)

By the rank and nullity theorem, minimizing the rank of this operator is equivalent to maximizing the dimension of the subgroup of symmetries shared by all of the layers in the network. As in Section 8.2, a convex relaxation of the rank is provided by the nuclear norm

RG,(F(1),,F(L))=LF(1),,F(L)=[𝑳F(1)𝑳F(L)],R_{G,*}\big{(}F^{(1)},\ldots,F^{(L)}\big{)}=\big{\|}L_{F^{(1)},\ldots,F^{(L)}}\big{\|}_{*}=\left\|\begin{bmatrix}{\boldsymbol{L}}_{F^{(1)}}\\ \vdots\\ {\boldsymbol{L}}_{F^{(L)}}\end{bmatrix}\right\|_{*}, (80)

where 𝑳F(l){\boldsymbol{L}}_{F^{(l)}} are the matrices of LF(l)L_{F^{(l)}} in orthonormal bases for Lie(G)\operatorname{Lie}(G) and the associated spaces l{\mathcal{F}}_{l}^{\prime}.

9 Numerical study of sample complexity to recover symmetric functions

Can promoting symmetry help us learn an unknown symmetric function using less data? To begin answering this question, we perform numerical experiments studying the amount of sampled data needed to recover structured polynomial functions on n\mathbb{R}^{n} of the form

Frad(x)\displaystyle F_{\text{rad}}(x) =φrad(xc122,,xcr22)and\displaystyle=\varphi_{\text{rad}}\big{(}\|x-c_{1}\|_{2}^{2},\ \ldots,\ \|x-c_{r}\|_{2}^{2}\big{)}\quad\mbox{and} (81)
Flin(x)\displaystyle F_{\text{lin}}(x) =φlin(u1Tx,,urTx).\displaystyle=\varphi_{\text{lin}}\big{(}u_{1}^{T}x,\ \ldots,\ u_{r}^{T}x\big{)}. (82)

These possess various rotation and translation invariances when r<nr<n, as characterized in detail below by Proposition 17 and its corollaries.

We aim to recover the unknown function FF_{*} within the space 𝒫d(n){\mathcal{P}}_{d}(\mathbb{R}^{n}) of polynomial functions on n\mathbb{R}^{n} with degrees up to d=degFd=\deg F_{*} based on the values yj=F(xj)y_{j}=F_{*}(x_{j}) at sample points x1,,xNx_{1},\ldots,x_{N}. Our approximation F^\hat{F} of FF_{*} is computed by solving the convex optimization problem

minimizeF𝒫d(n)RG,(F)=LFs.t.F(xj)=yj,j=1,,N,\operatorname*{\min\!imize\enskip}_{F\in{\mathcal{P}}_{d}(\mathbb{R}^{n})}R_{G,*}(F)=\|L_{F}\|_{*}\qquad\text{s.t.}\qquad F(x_{j})=y_{j},\quad j=1,\ldots,N, (83)

where GG is a candidate Lie group of symmetries acting on n\mathbb{R}^{n}. This was done using the CVXPY software package developed by Diamond and Boyd (2016); Agrawal et al. (2018). The nuclear norm in (83) was defined with respect to inner products on the corresponding Lie algebras given by ξ,ηLie(G)=Tr(ξTη)\langle\xi,\ \eta\rangle_{\operatorname{Lie}(G)}=\operatorname{Tr}(\xi^{T}\eta). As we describe later in Section 10, the inner product on the space {\mathcal{F}}^{\prime} containing the ranges of every LFL_{F} was defined by (89) with unit weights wi=1w_{i}=1 and M=dim𝒫d(n)M=\dim{\mathcal{P}}_{d}(\mathbb{R}^{n}) points drawn uniformly from the cube [1,1]n[-1,1]^{n}. Note that these discretization points were not the same as the sample points xjx_{j} in (83). The validity of this inner product is guaranteed almost surely by Proposition 21.

To study the sample complexity for (83) to recover functions in the form of FradF_{\text{rad}} and FlinF_{\text{lin}}, we perform multiple experiments using random realizations of these functions sampled at random points xjx_{j}. In each experiment, the vectors cic_{i} were drawn uniformly from the cube [1,1]n[-1,1]^{n} and the vectors uiu_{i} were formed from the columns of a random n×rn\times r orthonormal matrix (specifically, the left singular vectors of an n×rn\times r matrix with standard Gaussian entries). The coefficients of φrad\varphi_{\text{rad}} and φlin\varphi_{\text{lin}} in a basis of monomials up to a specified degree were sampled uniformly from the interval [0,1][0,1]. This yielded random polynomial functions FradF_{\text{rad}} and FlinF_{\text{lin}} with degrees degFrad=2degφrad\deg F_{\text{rad}}=2\deg\varphi_{\text{rad}} and degFlin=degφlin\deg F_{\text{lin}}=\deg\varphi_{\text{lin}}.

The sample points xjx_{j} for each experiment were drawn uniformly from the cube [1,1]n[-1,1]^{n}. A total of dim𝒫d(n)\dim{\mathcal{P}}_{d}(\mathbb{R}^{n}) sample points, with d=degFd=\deg F_{*}, were drawn, which is sufficient to recover the function almost surely regardless of regularization. For each experiment we determine the smallest NN_{*} so that recovery is achieved by (83) with F^=F\hat{F}=F_{*} using the sample points x1,,xNx_{1},\ldots,x_{N} for every NNN\geq N_{*}. To be precise, successful recovery is declared when all coefficients describing F^\hat{F} and FF_{*} in the monomial basis for 𝒫d(n){\mathcal{P}}_{d}(\mathbb{R}^{n}) agree to a tolerance of 5×1035\times 10^{-3} times the magnitude of the largest coefficient of FF_{*}. The range of values for NN_{*} across 1010 such random experiments provides an estimate of the sample complexity. In Figures 4, 5, and 6 we plot the range of values for NN_{*} as shaded regions with the average values displayed as solid lines.

In Figure 4, we use the special Euclidean group G=SE(n)G=SE(n) as a candidate group to recover functions of the form FradF_{\text{rad}} with the degree of φrad\varphi_{\text{rad}} specified. The number of radial features rdegφradr\leq\deg\varphi_{\text{rad}} is selected in accordance with Corollary 18 in order to ensure that 𝔰𝔶𝔪G(Frad)=𝔤rad\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})={\mathfrak{g}}_{\text{rad}} has the known form and dimension stated in Proposition 17. The symmetry-promoting regularization significantly reduces the number of samples needed to recover FradF_{\text{rad}} compared to the number of samples needed to solve the linear system specifying this function within the space of polynomials with the same or lesser degree. As the number of radial features rr increases, so does the sample complexity to recover FradF_{\text{rad}}. This is likely due to the decreased dimension of 𝔰𝔶𝔪G(Frad)\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}}).

Figure 4: Sample complexity to recover polynomial functions FradF_{\text{rad}} of rr radial features, i.e., (81) with polynomial φrad\varphi_{\text{rad}} of the specified degree, by solving (83) using the special Euclidean group G=SE(n)G=SE(n). Black dots indicate the number of dictionary functions, dim𝒫d(n)\dim{\mathcal{P}}_{d}(\mathbb{R}^{n}), hence the number of samples needed to recover FradF_{\text{rad}} without regularization.

In Figures 5 and 6, we use G=SE(n)G=SE(n) and the group of translations G=(n,+)G=(\mathbb{R}^{n},+) as candidate symmetry groups to recover functions of the form FlinF_{\text{lin}} with the degree of φlin\varphi_{\text{lin}} specified. Obviously, FlinF_{\text{lin}} has an (nr)(n-r)-dimensional subgroup of translation symmetries orthogonal to span{u1,,ur}\operatorname{span}\{u_{1},\ldots,u_{r}\}. By Corollary 19, choosing degφlin2\deg\varphi_{\text{lin}}\geq 2 is sufficient to ensure that 𝔰𝔶𝔪SE(n)(Flin)=𝔤lin\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{lin}})={\mathfrak{g}}_{\text{lin}} has the known form and dimension stated in Proposition 17. The results in Figures 5 and 6 show that the symmetry-promoting regularization reduces the sample complexity to recover FlinF_{\text{lin}}. Moreover, fewer samples are needed when FlinF_{\text{lin}} depends on fewer linear features, as might be expected because the dimension of 𝔰𝔶𝔪G(Flin)\operatorname{\mathfrak{sym}}_{G}(F_{\text{lin}}) increases as rr decreases.

Figure 5: Sample complexity to recover polynomial functions FlinF_{\text{lin}} of rr linear features, i.e., (82) with polynomial φlin\varphi_{\text{lin}} of the specified degree, by solving (83) using the special Euclidean group G=SE(n)G=SE(n). Black dots indicate the number of dictionary functions, dim𝒫d(n)\dim{\mathcal{P}}_{d}(\mathbb{R}^{n}), hence the number of samples needed to recover FlinF_{\text{lin}} without regularization.
Figure 6: Analogue of Fig. 5 using the group of translations G=(n,+)G=(\mathbb{R}^{n},+).
Proposition 17

Let rnr\leq n and suppose that {ckc1}k=2r\{c_{k}-c_{1}\}_{k=2}^{r} and {uk}k=1r\{u_{k}\}_{k=1}^{r} are sets of linearly-independent vectors in n\mathbb{R}^{n}. Then, 𝔰𝔶𝔪SE(n)(Frad)\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}}) contains the 12(nr)(nr+1)\frac{1}{2}(n-r)(n-r+1)-dimensional subalgebra

𝔤rad={[Sv00]:ST=SandSc1==Scr=v}\mathfrak{g}_{\text{rad}}=\left\{\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\ :\ S^{T}=-S\ \mbox{and}\ Sc_{1}=\cdots=Sc_{r}=-v\right\} (84)

and 𝔰𝔶𝔪SE(n)(Flin)\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{lin}}) contains the 12(nr)(nr+1)\frac{1}{2}(n-r)(n-r+1)-dimensional subalgebra

𝔤lin={[Sv00]:ST=S,Su1==Sur=0,andu1Tv==urTv=0}.\mathfrak{g}_{\text{lin}}=\left\{\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\ :\ S^{T}=-S,\ Su_{1}=\cdots=Su_{r}=0,\ \mbox{and}\ u_{1}^{T}v=\cdots=u_{r}^{T}v=0\right\}. (85)

Either every polynomial φrad\varphi_{\text{rad}} with degree d\leq d gives 𝔰𝔶𝔪SE(n)(Frad)𝔤rad\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}})\neq\mathfrak{g}_{\text{rad}}, or the set of polynomials φrad\varphi_{\text{rad}} with degree d\leq d satisfying 𝔰𝔶𝔪SE(n)(Frad)𝔤rad\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}})\neq\mathfrak{g}_{\text{rad}} is a set of measure zero. The same dichotomy holds for φlin\varphi_{\text{lin}}, FlinF_{\text{lin}}, and 𝔤lin\mathfrak{g}_{\text{lin}}. See Appendix A for a proof.

Corollary 18

With the same hypotheses as Proposition 17, let drd\geq r. The set of polynomials φrad\varphi_{\text{rad}} with degree d\leq d satisfying 𝔰𝔶𝔪SE(n)(Frad)𝔤rad\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}})\neq\mathfrak{g}_{\text{rad}} is a set of measure zero. A proof is given in Appendix A.

Corollary 19

With the same hypotheses as Proposition 17, let d2d\geq 2. The set of polynomials φlin\varphi_{\text{lin}} with degree d\leq d satisfying 𝔰𝔶𝔪SE(n)(Flin)𝔤lin\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{lin}})\neq\mathfrak{g}_{\text{lin}} is a set of measure zero. A proof is given in Appendix A.

10 Discretizing the operators

This section describes how to construct matrices for the operators ξ{\mathcal{L}}_{\xi} and LFL_{F} for continuously differentiable functions FF in a user-specified finite-dimensional subspace C1(𝒱;𝒲){\mathcal{F}}\subset C^{1}({\mathcal{V}};{\mathcal{W}}). By choosing bases for the finite-dimensional vector spaces 𝒱{\mathcal{V}} and 𝒲{\mathcal{W}}, it suffices without loss of generality to consider the case in which 𝒱=m{\mathcal{V}}=\mathbb{R}^{m} and 𝒲=n{\mathcal{W}}=\mathbb{R}^{n}. We assume that Lie(G)\operatorname{Lie}(G) and {\mathcal{F}} are endowed with inner products and that {ξ1,ξdim(G)}\{\xi_{1},\ldots\xi_{\dim(G)}\} and {F1,Fdim()}\{F_{1},\ldots F_{\dim({\mathcal{F}})}\} are orthonormal bases for these spaces, respectively. The key task is to endow the finite-dimensional subspace

=span{ξF:ξLie(G),F}C0(m;n){\mathcal{F}}^{\prime}=\operatorname{span}\left\{{\mathcal{L}}_{\xi}F\ :\ \xi\in\operatorname{Lie}(G),\ F\in{\mathcal{F}}\right\}\subset C^{0}(\mathbb{R}^{m};\mathbb{R}^{n}) (86)

with a convenient inner product. Once this is done, an orthonormal basis {u1,,uN}\{u_{1},\ldots,u_{N}\} for {\mathcal{F}}^{\prime} can be constructed by applying a Gram-Schmidt process to the functions ξiFj{\mathcal{L}}_{\xi_{i}}F_{j}, which span {\mathcal{F}}^{\prime}. Matrices for ξ{\mathcal{L}}_{\xi} and LFL_{F} are then easily obtained by computing

[𝓛ξ]i,j=ui,ξFj,[𝑳F]i,k=ui,ξkF.\big{[}{\boldsymbol{{\mathcal{L}}}}_{\xi}\big{]}_{i,j}=\big{\langle}u_{i},\ {\mathcal{L}}_{\xi}F_{j}\big{\rangle}_{{\mathcal{F}}^{\prime}},\qquad\big{[}{\boldsymbol{L}}_{F}\big{]}_{i,k}=\big{\langle}u_{i},\ {\mathcal{L}}_{\xi_{k}}F\big{\rangle}_{{\mathcal{F}}^{\prime}}. (87)

The issue at hand is to choose the inner product on {\mathcal{F}}^{\prime} in a way that makes computing these matrices easy. A natural choice is to equip {\mathcal{F}}^{\prime} with an L2(m,μ;n)L^{2}(\mathbb{R}^{m},\mu;\mathbb{R}^{n}) inner product where μ\mu is a positive measure on m\mathbb{R}^{m} (such as a Gaussian distribution) for which the L2L^{2} norms of functions in {\mathcal{F}}^{\prime} are finite. The problem is that it is usually challenging or inconvenient to compute the required integrals

ξiFj,ξkFlL2(μ)=m(ξiFj)(x)T(ξkFl)(x)dμ(x)\big{\langle}{\mathcal{L}}_{\xi_{i}}F_{j},\ {\mathcal{L}}_{\xi_{k}}F_{l}\big{\rangle}_{L^{2}(\mu)}=\int_{\mathbb{R}^{m}}\big{(}{\mathcal{L}}_{\xi_{i}}F_{j}\big{)}(x)^{T}\big{(}{\mathcal{L}}_{\xi_{k}}F_{l}\big{)}(x)\operatorname{\mathrm{d}}\mu(x) (88)

analytically. In this section we discuss inner products that are easy to compute in practice.

10.1 Numerical quadrature and Monte-Carlo

When (88) cannot be computed analytically, one can resort to a numerical quadrature or Monte-Carlo approximation. In both cases the integral is approximated by a weighted sum, yielding a semi-inner product

f,gL2(μM)=1Mi=1Mwif(xi)Tg(xi)\langle f,\ g\rangle_{L^{2}(\mu_{M})}=\frac{1}{M}\sum_{i=1}^{M}w_{i}f(x_{i})^{T}g(x_{i}) (89)

that converges to f,gL2(μ)\langle f,\ g\rangle_{L^{2}(\mu)} as MM\to\infty. The following proposition shows that we do not have to pass to the limit MM\to\infty in order to obtain a valid inner product defined by (89) on {\mathcal{F}}^{\prime}.

Proposition 20

Suppose that {\mathcal{F}}^{\prime} is finite-dimensional and f,gL2(μM)f,gL2(μ)\langle f,\ g\rangle_{L^{2}(\mu_{M})}\to\langle f,\ g\rangle_{L^{2}(\mu)} as MM\to\infty for every f,gf,g\in{\mathcal{F}}^{\prime}. Then there is an M0M_{0} such that (89) is an inner product on {\mathcal{F}}^{\prime} for every MM0M\geq M_{0}. We give a proof in Appendix A.

For example, in Monte-Carlo approximation, the samples xix_{i} are drawn independently from a distribution ν\nu with the assumption that both μ\mu and ν\nu are σ\sigma-finite and μ\mu is absolutely continuous with respect to ν\nu. The weights are given by the Radon-Nikodym derivative wi=dμdν(xi)w_{i}=\frac{\operatorname{\mathrm{d}}\mu}{\operatorname{\mathrm{d}}\nu}(x_{i}). Then for every f,gL2(μ)f,g\in L^{2}(\mu) the approximate integral converges f,gL2(μM)f,gL2(μ)\langle f,\ g\rangle_{L^{2}(\mu_{M})}\to\langle f,\ g\rangle_{L^{2}(\mu)} as MM\to\infty almost surely thanks to the strong law of large numbers (see Theorem 7.7 in Koralov and Sinai (2012)). By the proposition, there is almost surely a finite M0M_{0} such that (89) is an inner product on {\mathcal{F}}^{\prime} for every MM0M\geq M_{0}.
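In practice, this suggests a simple recipe: evaluate the spanning functions ℒ_{ξ_i}F_j at the sample points, scale so that Euclidean dot products realize (89), and orthonormalize. The sketch below (our own helper, using an SVD in place of Gram-Schmidt for numerical stability) also returns the matrices (87):

```python
import numpy as np

def discrete_basis_and_matrices(evals, M, tol=1e-10):
    """evals : array (q, N_F, M*n) of values of L_{xi_i} F_j at the M
    sample points, flattened so that dot products of rows divided by M
    realize the inner product (89) with unit weights."""
    q, NF, S = evals.shape
    V = evals.reshape(q * NF, S) / np.sqrt(M)        # embed the spanning set
    _, s, Vt = np.linalg.svd(V, full_matrices=False)
    B = Vt[s > tol * max(s[0], 1.0)]                 # orthonormal basis of F'
    # Matrices (87): [L_{xi_k}]_{i,j} = <u_i, L_{xi_k} F_j>.
    L_mats = [B @ evals[k].T / np.sqrt(M) for k in range(q)]
    return B, L_mats
```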

10.2 Subspaces of polynomials

Here we consider the special case when {\mathcal{F}} is a finite-dimensional subspace consisting of polynomial functions mn\mathbb{R}^{m}\to\mathbb{R}^{n}. Examining the expression in (22), it is evident that ξF{\mathcal{L}}_{\xi}F is also a polynomial function mn\mathbb{R}^{m}\to\mathbb{R}^{n} with degree not greater than that of FF\in{\mathcal{F}}. Thus, {\mathcal{F}}^{\prime} is also a space of polynomial functions with degree not exceeding the maximum degree in {\mathcal{F}}. Since a polynomial that vanishes on an open set must be identically zero, we can take the integrals defining the inner product in (88) over a cube, such as [0,1]mm[0,1]^{m}\subset\mathbb{R}^{m}. This is convenient because polynomial integrals over the cube can be calculated analytically.

We can also use the sample-based inner product in (89) with randomly chosen points xix_{i} and positive weights wiw_{i}. The following proposition tells us exactly how many sample points we need.

Proposition 21

Let {\mathcal{F}}^{\prime} be a space of real polynomial functions mn\mathbb{R}^{m}\to\mathbb{R}^{n} and let πi:n\pi_{i}:\mathbb{R}^{n}\to\mathbb{R} be the iith coordinate projection πi(c1,,cn)=ci\pi_{i}(c_{1},\ldots,c_{n})=c_{i}. Let

MM0=max1indim(πi())M\geq M_{0}=\max_{1\leq i\leq n}\dim(\pi_{i}({\mathcal{F}}^{\prime})) (90)

and let w1,,wM>0w_{1},\ldots,w_{M}>0 be positive weights. Then for almost every set of points (x1,,xM)(m)M(x_{1},\ldots,x_{M})\in(\mathbb{R}^{m})^{M} with respect to Lebesgue measure, (89) is an inner product on {\mathcal{F}}^{\prime}. We give a proof in Appendix B.

This means that we can draw MM0M\geq M_{0} sample points independently from any absolutely continuous measure (such as a Gaussian distribution or the uniform distribution on a cube), and with probability one, (89) will be an inner product on {\mathcal{F}}^{\prime}. When {\mathcal{F}} consists of polynomials with degree at most dd, then taking

M|{(p0,,pm)0m+1:p0++pm=d}|=(d+mm)M\geq\left|\left\{(p_{0},\ldots,p_{m})\in\mathbb{N}_{0}^{m+1}\ :\ p_{0}+\cdots+p_{m}=d\right\}\right|=\binom{d+m}{m} (91)

is sufficient.

11 Generalization to sections of vector bundles

The machinery for promoting, discovering, and enforcing symmetry of maps F:𝒱𝒲F:{\mathcal{V}}\to{\mathcal{W}} between finite-dimensional vector spaces is a special case of more general machinery for sections of vector bundles presented here. Applications of this more general framework include studying the symmetries of vector fields, tensor fields, dynamical systems, and integral operators on manifolds with respect to nonlinear group actions (Abraham et al., 1988). We rely heavily on background, definitions, and results that can be found in Lee (2013) and Kolář et al. (1993).

First, we provide some background on smooth vector bundles that can be found in Lee (2013, Chapter 10). A rank-kk smooth vector bundle EE is a collection of kk-dimensional vector spaces EpE_{p}, called “fibers”, organized smoothly over a base manifold {\mathcal{M}}. These fibers are organized by the “bundle projection” π:E\pi:E\to{\mathcal{M}}, a surjective map whose preimages are the fibers Ep=π1(p)E_{p}=\pi^{-1}(p). Smoothness means that π\pi is a smooth submersion and that EE is a smooth manifold covered by smooth local trivializations

ψα:π1(𝒰α)E𝒰α×k\psi_{\alpha}:\pi^{-1}({\mathcal{U}}_{\alpha})\subset E\to{\mathcal{U}}_{\alpha}\times\mathbb{R}^{k}

with {𝒰α}α𝒜\{{\mathcal{U}}_{\alpha}\}_{\alpha\in{\mathcal{A}}} being open subsets covering {\mathcal{M}}. The transition functions between local trivializations are k\mathbb{R}^{k}-linear, meaning that there are smooth matrix-valued functions 𝑻α,β:𝒰α𝒰βk×k{\boldsymbol{T}}_{\alpha,\beta}:{\mathcal{U}}_{\alpha}\cap{\mathcal{U}}_{\beta}\to\mathbb{R}^{k\times k} satisfying

ψαψβ1(p,𝒗)=(p,𝑻α,β(p)𝒗)\psi_{\alpha}\circ\psi_{\beta}^{-1}(p,{\boldsymbol{v}})=(p,{\boldsymbol{T}}_{\alpha,\beta}(p){\boldsymbol{v}}) (92)

for every p𝒰α𝒰βp\in{\mathcal{U}}_{\alpha}\cap{\mathcal{U}}_{\beta} and 𝒗k{\boldsymbol{v}}\in\mathbb{R}^{k}. The bundle with this structure is often denoted π:E\pi:E\to{\mathcal{M}}.

A “section” of the rank-kk vector bundle π:E\pi:E\to{\mathcal{M}} is a map F:EF:{\mathcal{M}}\to E satisfying πF=Id\pi\circ F=\operatorname{Id}_{{\mathcal{M}}}. The space of (possibly rough) sections, denoted Σ(E)\operatorname{\Sigma}(E), is a vector space with addition and scalar multiplication defined pointwise in each fiber EpE_{p}. We equip Σ(E)\operatorname{\Sigma}(E) with the topology of pointwise convergence, making it into a locally-convex space. The space of sections possessing mm continuous derivatives is denoted Cm(,E)C^{m}({\mathcal{M}},E), with the space of merely continuous sections being C(,E)=C0(,E)C({\mathcal{M}},E)=C^{0}({\mathcal{M}},E) and the space of smooth sections being C(,E)C^{\infty}({\mathcal{M}},E). A vector bundle and a section are depicted in Figure 7, along with the fundamental operators for a group action that we introduce below.

Figure 7: Fundamental operators for sections of vector bundles equipped with fiber-linear Lie group actions. The action Θg\Theta_{g} is linear on each fiber EpE_{p} and descends under the bundle projection π:E\pi:E\to{\mathcal{M}} to an action θg\theta_{g} on {\mathcal{M}}. Given a section F:EF:{\mathcal{M}}\to E, the finite transformation operators 𝒦g{\mathcal{K}}_{g} produce a new section 𝒦gF{\mathcal{K}}_{g}F whose value at pp is given by evaluating FF at q=θg(p)q=\theta_{g}(p) and pulling the value in EqE_{q} back to EpE_{p} via the linear map Θg1\Theta_{g^{-1}} on EqE_{q}. The operators 𝒦g{\mathcal{K}}_{g} are linear thanks to linearity of Θg1\Theta_{g^{-1}} on every EqE_{q}. The Lie derivative ξ{\mathcal{L}}_{\xi} is the operator on sections formed by differentiating t𝒦exp(tξ)t\mapsto{\mathcal{K}}_{\exp(t\xi)} at t=0t=0. Geometrically, ξF(p){\mathcal{L}}_{\xi}F(p) is the vector in EpE_{p} lying tangent to the curve t𝒦exp(tξ)F(p)t\mapsto{\mathcal{K}}_{\exp(t\xi)}F(p) in EpE_{p} passing through F(p)F(p) at t=0t=0.

We consider a smooth “fiber-linear” right GG-action Θ:E×GE\Theta:E\times G\to E, meaning that every Θg=Θ(,g):EE\Theta_{g}=\Theta(\cdot,g):E\to E is a smooth vector bundle homomorphism. In other words, Θ\Theta descends under the bundle projection π\pi to a unique smooth right GG-action θ:×G\theta:{\mathcal{M}}\times G\to{\mathcal{M}} so that the diagram

\begin{array}{ccc}E&\xrightarrow{\;\Theta_{g}\;}&E\\ \pi\big\downarrow&&\big\downarrow\pi\\ {\mathcal{M}}&\xrightarrow{\;\theta_{g}\;}&{\mathcal{M}}\end{array} (93)

commutes and the restricted maps Θg|Ep:EpEθ(p,g)\Theta_{g}|_{E_{p}}:E_{p}\to E_{\theta(p,g)} are linear. We define what it means for a section to be symmetric with respect to this action as follows:

Definition 22

A section FΣ(E)F\in\Sigma(E) is equivariant with respect to a transformation gGg\in G if

𝒦gF:=Θg1Fθg=F.{\mathcal{K}}_{g}F:=\Theta_{g^{-1}}\circ F\circ\theta_{g}=F. (94)

These transformations form a subgroup of GG denoted SymG(F)\operatorname{Sym}_{G}(F).

The operators 𝒦g{\mathcal{K}}_{g} are depicted in Figure 7. Thanks to the vector bundle homomorphism properties of Θg1\Theta_{g^{-1}}, the operators 𝒦g:Σ(E)Σ(E){\mathcal{K}}_{g}:\operatorname{\Sigma}(E)\to\operatorname{\Sigma}(E) are well-defined and linear. Moreover, they form a group under composition 𝒦g1𝒦g2=𝒦g1g2{\mathcal{K}}_{g_{1}}{\mathcal{K}}_{g_{2}}={\mathcal{K}}_{g_{1}\cdot g_{2}}, with inverses given by 𝒦g1=𝒦g1{\mathcal{K}}_{g}^{-1}={\mathcal{K}}_{g^{-1}}.

The “infinitesimal generator” of the group action is the linear map Θ^:Lie(G)𝔛(E)\hat{\Theta}:\operatorname{Lie}(G)\to\mathfrak{X}(E) defined by

Θ^(ξ)=ddt|t=0Θexp(tξ).\hat{\Theta}(\xi)=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}\Theta_{\exp(t\xi)}. (95)

It turns out that this vector field is Θ\Theta-related to 0×ξ𝔛(E×G)0\times\xi\in\mathfrak{X}(E\times G) (see Lemma 5.13 in Kolář et al. (1993), Lemma 20.14 in Lee (2013)), meaning that the flow of Θ^(ξ)\hat{\Theta}(\xi) is given by

FlΘ^(ξ)t=Θexp(tξ).\operatorname{Fl}_{\hat{\Theta}(\xi)}^{t}=\Theta_{\exp(t\xi)}. (96)

Likewise, θexp(tξ)\theta_{\exp(t\xi)} is the flow of θ^(ξ)=ddt|t=0θexp(tξ)𝔛()\hat{\theta}(\xi)=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}\theta_{\exp(t\xi)}\in\mathfrak{X}({\mathcal{M}}), which is π\pi-related to Θ^(ξ)\hat{\Theta}(\xi).

Differentiating the smooth curves t𝒦exp(tξ)F(p)t\mapsto{\mathcal{K}}_{\exp(t\xi)}F(p) lying in EpE_{p} for each pp\in{\mathcal{M}} gives rise to the Lie derivative ξ:D(ξ)Σ(E)Σ(E){\mathcal{L}}_{\xi}:D({\mathcal{L}}_{\xi})\subset\operatorname{\Sigma}(E)\to\operatorname{\Sigma}(E) along ξLie(G)\xi\in\operatorname{Lie}(G) defined by

ξF=ddt|t=0𝒦exp(tξ)F=limt01t(Θexp(tξ)Fθexp(tξ)F),\boxed{{\mathcal{L}}_{\xi}F=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{\exp(t\xi)}F=\lim_{t\to 0}\frac{1}{t}\left(\Theta_{\exp(-t\xi)}\circ F\circ\theta_{\exp(t\xi)}-F\right),} (97)

where FD(ξ)F\in D({\mathcal{L}}_{\xi}) if and only if the limit converges in Σ(E)\operatorname{\Sigma}(E), i.e., pointwise. Note that we implicitly identify TF(p)EpEpT_{F(p)}E_{p}\cong E_{p}. This construction is illustrated in Figure 7. We recover (22) from (97) in the special case where a smooth function F:𝒱𝒲F:{\mathcal{V}}\to{\mathcal{W}} is viewed as a section x(x,F(x))x\mapsto(x,F(x)) of the bundle π:𝒱×𝒲𝒱\pi:{\mathcal{V}}\times{\mathcal{W}}\to{\mathcal{V}} and acted upon by group representations. Critically, the Lie derivative ξ{\mathcal{L}}_{\xi}, as defined above, is a linear operator on sections of the vector bundle EE. This allows us to formulate convex symmetry-promoting regularization functions as in Section 8 using Lie derivatives in the broader setting of vector bundle sections.

Remark 23 (Lie derivatives using flows)

Thanks to (96), the Lie derivative defined in (97) depends only on the infinitesimal generator Θ^(ξ)𝔛(E)\hat{\Theta}(\xi)\in\mathfrak{X}(E) and its flow for small times tt. Hence, any vector field in 𝔛(E)\mathfrak{X}(E) whose flow is fiber-linear, but not necessarily defined for all tt\in\mathbb{R}, gives rise to an analogously-defined Lie derivative acting linearly on Σ(E)\operatorname{\Sigma}(E). These are the so-called “linear vector fields” described by Kolář et al. (1993) in Section 47.9. In fact, more general versions of the Lie derivative based on flows for maps between manifolds are described by Kolář et al. (1993) in Chapter 11. However, these generalizations are nonlinear operators, destroying the convex properties of the symmetry-promoting regularization functions in Section 8.

In addition to linearity, the key properties of the operators 𝒦g{\mathcal{K}}_{g} and ξ{\mathcal{L}}_{\xi} for studying symmetries of sections are:

Proposition 24

For every ξ,ηLie(G)\xi,\eta\in\operatorname{Lie}(G), and α,β,t\alpha,\beta,t\in\mathbb{R}, we have

ddt𝒦exp(tξ)F=ξ𝒦exp(tξ)F=𝒦exp(tξ)ξFFD(ξ),\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\exp(t\xi)}F={\mathcal{L}}_{\xi}{\mathcal{K}}_{\exp(t\xi)}F={\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\xi}F\qquad\forall F\in D({\mathcal{L}}_{\xi}), (98)
αξ+βηF=(αξ+βη)FFC1(,E),{\mathcal{L}}_{\alpha\xi+\beta\eta}F=\big{(}\alpha{\mathcal{L}}_{\xi}+\beta{\mathcal{L}}_{\eta}\big{)}F\qquad\forall F\in C^{1}({\mathcal{M}},E), (99)

and

[ξ,η]F=(ξηηξ)FFC2(,E).{\mathcal{L}}_{[\xi,\eta]}F=\big{(}{\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}-{\mathcal{L}}_{\eta}{\mathcal{L}}_{\xi}\big{)}F\qquad\forall F\in C^{2}({\mathcal{M}},E). (100)

We give a proof in Appendix C.

Taken together, these results mean that Π:g𝒦g\Pi:g\mapsto{\mathcal{K}}_{g} and Π:ξξ\Pi_{*}:\xi\mapsto{\mathcal{L}}_{\xi} are (infinite-dimensional) representations of GG and Lie(G)\operatorname{Lie}(G) in C(,E)C^{\infty}({\mathcal{M}},E).

The main results of this section are the following two theorems. The first completely characterizes the identity component of SymG(F)\operatorname{Sym}_{G}(F) by correspondence with its Lie subalgebra (see Theorem 19.26 in Lee (2013)). The second gives necessary and sufficient conditions for a section to be GG-equivariant.

Theorem 25

If FC(,E)F\in C({\mathcal{M}},E) is a continuous section, then SymG(F)\operatorname{Sym}_{G}(F) is a closed, embedded Lie subgroup of GG whose Lie subalgebra is

𝔰𝔶𝔪G(F)={ξLie(G):FD(ξ)andξF=0}.\operatorname{\mathfrak{sym}}_{G}(F)=\left\{\xi\in\operatorname{Lie}(G)\ :\ F\in D({\mathcal{L}}_{\xi})\ \mbox{and}\ {\mathcal{L}}_{\xi}F=0\right\}. (101)

We give a proof in Appendix D.

Theorem 26

Suppose that GG has nGn_{G} connected components with G0G_{0} being the component containing the identity element. Let ξ1,,ξq\xi_{1},\ldots,\xi_{q} generate Lie(G)\operatorname{Lie}(G) and let g1,,gnG1g_{1},\ldots,g_{n_{G}-1} be elements from each non-identity component of GG. A continuous section FC(,E)F\in C({\mathcal{M}},E) is G0G_{0}-equivariant if and only if

FD(ξi)andξiF=0,i=1,,q.F\in D({\mathcal{L}}_{\xi_{i}})\quad\mbox{and}\quad{\mathcal{L}}_{\xi_{i}}F=0,\qquad i=1,\ldots,q. (102)

If, in addition, we have

𝒦gjF=F,j=1,,nG1,{\mathcal{K}}_{g_{j}}F=F,\qquad j=1,\ldots,n_{G}-1, (103)

then FF is GG-equivariant. We give a proof in Appendix E.

These results allow us to promote, enforce, and discover symmetries for sections of vector bundles in fundamentally the same way we did for maps between finite-dimensional vector spaces in Sections 6, 7, and 8. In particular, symmetry can be enforced through analogous linear constraints, discovered through nullspaces of analogous operators, and promoted through analogous convex penalties based on the nuclear norm.
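
To make the discovery task concrete in this setting, the following sketch (ours, a simplified illustration rather than an implementation from this paper) samples the linear map xi -> L_xi F for the action (K_g F)(x) = g^{-1} F(g x) of GL(2) on maps F: R^2 -> R^2, and reads off the symmetry algebra from the near-null right singular vectors of the sampled operator.

import numpy as np

rng = np.random.default_rng(0)

def jacobian(F, x, h=1e-5):
    # central-difference Jacobian of F at x
    return np.column_stack([(F(x + h * e) - F(x - h * e)) / (2 * h)
                            for e in np.eye(len(x))])

def lie_derivative(F, A, x):
    # L_A F(x) = dF(x) A x - A F(x) for the action (K_g F)(x) = g^{-1} F(g x)
    return jacobian(F, x) @ (A @ x) - A @ F(x)

F = lambda x: np.dot(x, x) * x        # rotation-equivariant test map, ||x||^2 x

# sample xi -> L_xi F over the basis E_ij of the candidate algebra gl(2)
basis = [np.outer(np.eye(2)[i], np.eye(2)[j]) for i in range(2) for j in range(2)]
rows = []
for _ in range(50):
    x = rng.standard_normal(2)
    rows.append(np.column_stack([lie_derivative(F, A, x) for A in basis]))
M = np.vstack(rows)                   # discretized Lie-derivative operator
_, s, Vt = np.linalg.svd(M)
print(s)                              # one near-zero singular value
print(Vt[-1].reshape(2, 2))           # ~ [[0, -1], [1, 0]] up to scale: rotations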

Remark 27 (Left actions)

Theorems 26 and 25 hold without any modification for left G-actions ΘL:G×EE\Theta^{L}:G\times E\to E. This is because we can define a corresponding right GG-action by ΘR(p,g)=ΘL(g1,p)\Theta^{R}(p,g)=\Theta^{L}(g^{-1},p) with associated operators related by

𝒦gR=𝒦g1LandξR=ξL.{\mathcal{K}}^{R}_{g}={\mathcal{K}}^{L}_{g^{-1}}\qquad\mbox{and}\qquad{\mathcal{L}}^{R}_{\xi}=-{\mathcal{L}}^{L}_{\xi}. (104)

The symmetry group SymG(F)\operatorname{Sym}_{G}(F) does not depend on whether it is defined by the condition 𝒦gRF=F{\mathcal{K}}^{R}_{g}F=F or by 𝒦gLF=F{\mathcal{K}}^{L}_{g}F=F. It is slightly less natural to work with left actions because ΠL:g𝒦gL\Pi^{L}:g\mapsto{\mathcal{K}}^{L}_{g} and ΠL:ξξL\Pi^{L}_{*}:\xi\mapsto{\mathcal{L}}^{L}_{\xi} are Lie group and Lie algebra anti-homomorphisms, that is,

ΠL(g1g2)=ΠL(g2)ΠL(g1)andΠL([ξ,η])=[ΠL(η),ΠL(ξ)].\Pi^{L}(g_{1}g_{2})=\Pi^{L}(g_{2})\Pi^{L}(g_{1})\qquad\mbox{and}\qquad\Pi^{L}_{*}\big{(}[\xi,\eta]\big{)}=\big{[}\Pi^{L}_{*}(\eta),\Pi^{L}_{*}(\xi)\big{]}. (105)

11.1 Vector fields

Here we study the symmetries of a vector field V𝔛()V\in\mathfrak{X}({\mathcal{M}}) under a right GG-action θ:×G\theta:{\mathcal{M}}\times G\to{\mathcal{M}}. This allows us to extend the discussion in Section 7.3 to dynamical systems described by vector fields on smooth manifolds and acted upon nonlinearly by arbitrary Lie groups. The tangent map of the diffeomorphism θg=θ(,g):\theta_{g}=\theta(\cdot,g):{\mathcal{M}}\to{\mathcal{M}} transforms vector fields via the pushforward map (θg):𝔛()𝔛()(\theta_{g})_{*}:\mathfrak{X}({\mathcal{M}})\to\mathfrak{X}({\mathcal{M}}) defined by

((θg)V)pg=dθg(p)Vp((\theta_{g})_{*}V)_{p\cdot g}=\operatorname{\mathrm{d}}\theta_{g}(p)V_{p} (106)

for every pp\in{\mathcal{M}}.

Definition 28

Given gGg\in G, we say that a vector field V𝔛()V\in\mathfrak{X}({\mathcal{M}}) is gg-invariant if and only if (θg)V=V(\theta_{g})_{*}V=V, that is,

Vpg=dθg(p)Vpp.V_{p\cdot g}=\operatorname{\mathrm{d}}\theta_{g}(p)V_{p}\qquad\forall p\in{\mathcal{M}}. (107)

Because (θg1)(θg)=(θg1θg)=Id𝔛()(\theta_{g^{-1}})_{*}(\theta_{g})_{*}=(\theta_{g^{-1}}\circ\theta_{g})_{*}=\operatorname{Id}_{\mathfrak{X}({\mathcal{M}})}, it is clear that a vector field is gg-invariant if and only if it is g1g^{-1}-invariant.

Recall that vector fields V𝔛()V\in\mathfrak{X}({\mathcal{M}}) are smooth sections of the tangent bundle E=TE=T{\mathcal{M}}. The right GG-action θ\theta on {\mathcal{M}} induces a right GG-action Θ:T×GT\Theta:T{\mathcal{M}}\times G\to T{\mathcal{M}} on the tangent bundle defined by

Θg(vp)=dθg(p)vp.\Theta_{g}(v_{p})=\operatorname{\mathrm{d}}\theta_{g}(p)v_{p}. (108)

It is easy to see that each Θg\Theta_{g} is a vector bundle homomorphism descending to θg\theta_{g} under the bundle projection π\pi. Crucially, we have

𝒦gV=Θg1Vθg=(θg1)V,{\mathcal{K}}_{g}V=\Theta_{g^{-1}}\circ V\circ\theta_{g}=(\theta_{g^{-1}})_{*}V, (109)

meaning that a vector field V𝔛()V\in\mathfrak{X}({\mathcal{M}}) is gg-invariant if and only if it is gg-equivariant as a section of TT{\mathcal{M}} with respect to the action Θ\Theta. Recall that (by Lemma 20.14 in Lee (2013)) the left-invariant vector field ξLie(G)𝔛(G)\xi\in\operatorname{Lie}(G)\subset\mathfrak{X}(G) and its infinitesimal generator θ^(ξ)𝔛()\hat{\theta}(\xi)\in\mathfrak{X}({\mathcal{M}}) are θ(p)\theta^{(p)}-related, where θ(p):gθ(p,g)\theta^{(p)}:g\mapsto\theta(p,g) is the orbit map. This means that θexp(tξ)\theta_{\exp(t\xi)} is the time-tt flow of θ^(ξ)\hat{\theta}(\xi) by Proposition 9.13 in Lee (2013). As a result, the Lie derivative in (97) agrees with the standard Lie derivative of VV along θ^(ξ)\hat{\theta}(\xi) (see Lee (2013, p.228)), that is,

ξV(p)=limt01t[dθexp(tξ)(θexp(tξ)(p))Vθexp(tξ)(p)Vp]=[θ^(ξ),V]p,\boxed{{\mathcal{L}}_{\xi}V(p)=\lim_{t\to 0}\frac{1}{t}\left[\operatorname{\mathrm{d}}\theta_{\exp(-t\xi)}(\theta_{\exp(t\xi)}(p))V_{\theta_{\exp(t\xi)}(p)}-V_{p}\right]=[\hat{\theta}(\xi),V]_{p},} (110)

where the expression on the right is the Lie bracket of θ^(ξ)\hat{\theta}(\xi) and VV.
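
Condition (110) can be checked pointwise once the generator theta_hat(xi) is known. The sketch below (ours, illustrative) evaluates the bracket [theta_hat(xi), V] by finite differences on M = R^2 with the rotation generator, confirming that a radial field is rotation-invariant while a non-symmetric field is not.

import numpy as np

def jacobian(V, p, h=1e-5):
    return np.column_stack([(V(p + h * e) - V(p - h * e)) / (2 * h)
                            for e in np.eye(len(p))])

def bracket(X, Y, p):
    # Lie bracket [X, Y]_p = dY(p) X(p) - dX(p) Y(p)
    return jacobian(Y, p) @ X(p) - jacobian(X, p) @ Y(p)

gen = lambda p: np.array([-p[1], p[0]])         # rotation generator theta_hat(xi)
V_invariant = lambda p: p                       # radial field, rotation-invariant
V_broken = lambda p: np.array([p[0]**2, 0.0])   # not rotation-invariant

p = np.array([0.7, 0.4])
print(bracket(gen, V_invariant, p))             # ~ [0, 0], so L_xi V = 0
print(bracket(gen, V_broken, p))                # nonzero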

11.2 Tensor fields

Symmetries of a tensor field can also be revealed using our framework. This will be useful for our study of Hamiltonian dynamics in Section 11.3 and for our study of integral operators, whose kernels can be viewed as tensor fields, in Section 11.4. For simplicity, we consider a rank-kk covariant tensor field A𝔗k()A\in\mathfrak{T}^{k}({\mathcal{M}}), although our results extend to contravariant and mixed tensor fields with minimal modification. We rely on the basic definitions and machinery found in Lee (2013, Chapter 12). Under a right GG-action θ\theta on {\mathcal{M}}, the tensor field transforms via the pullback map θg:𝔗k()𝔗k()\theta_{g}^{*}:\mathfrak{T}^{k}({\mathcal{M}})\to\mathfrak{T}^{k}({\mathcal{M}}) defined by

(θgA)p(v1,,vk)=(dθg(p)Apg)(v1,,vk)=Apg(dθg(p)v1,,dθg(p)vk)(\theta_{g}^{*}A)_{p}(v_{1},\ldots,v_{k})=(\operatorname{\mathrm{d}}\theta_{g}(p)^{*}A_{p\cdot g})(v_{1},\ldots,v_{k})=A_{p\cdot g}(\operatorname{\mathrm{d}}\theta_{g}(p)v_{1},\ldots,\operatorname{\mathrm{d}}\theta_{g}(p)v_{k}) (111)

for every v1,,vkTpv_{1},\ldots,v_{k}\in T_{p}{\mathcal{M}}.

Definition 29

Given gGg\in G, a tensor field A𝔗k()A\in\mathfrak{T}^{k}({\mathcal{M}}) is gg-invariant if and only if θgA=A\theta_{g}^{*}A=A, that is,

Apg(dθg(p)v1,,dθg(p)vk)=Ap(v1,,vk)p.A_{p\cdot g}(\operatorname{\mathrm{d}}\theta_{g}(p)v_{1},\ldots,\operatorname{\mathrm{d}}\theta_{g}(p)v_{k})=A_{p}(v_{1},\ldots,v_{k})\qquad\forall p\in{\mathcal{M}}. (112)

To study the invariance of tensor fields in our framework, we recall that a tensor field is a section of the tensor bundle E=TkT=p(Tp)kE=T^{k}T^{*}{\mathcal{M}}=\coprod_{p\in{\mathcal{M}}}(T_{p}^{*}{\mathcal{M}})^{\otimes k}, a vector bundle over {\mathcal{M}}, where TpT_{p}^{*}{\mathcal{M}} is the dual space of TpT_{p}{\mathcal{M}}. The right GG-action θ\theta on {\mathcal{M}} induces a right GG-action Θ:TkT*×GTkT*\Theta:T^{k}T^{*}{\mathcal{M}}\times G\to T^{k}T^{*}{\mathcal{M}} defined by

Θg(Ap)=dθg1(θg(p))Ap.\Theta_{g}(A_{p})=\operatorname{\mathrm{d}}\theta_{g^{-1}}(\theta_{g}(p))^{*}A_{p}. (113)

It is clear that each Θg\Theta_{g} is a homomorphism of the vector bundle TkTT^{k}T^{*}{\mathcal{M}} descending to θg\theta_{g} under the bundle projection. Crucially, we have

𝒦gA=Θg1Aθg=θgA,{\mathcal{K}}_{g}A=\Theta_{g^{-1}}\circ A\circ\theta_{g}=\theta_{g}^{*}A, (114)

meaning that A𝔗k()A\in\mathfrak{T}^{k}({\mathcal{M}}) is gg-invariant if and only if it is gg-equivariant as a section of TkTT^{k}T^{*}{\mathcal{M}} with respect to the action Θ\Theta. Since θexp(tξ)\theta_{\exp(t\xi)} gives the time-tt flow of the vector field θ^(ξ)𝔛()\hat{\theta}(\xi)\in\mathfrak{X}({\mathcal{M}}), the Lie derivative in (97) for this action agrees with the standard Lie derivative of A𝔗k()A\in\mathfrak{T}^{k}({\mathcal{M}}) along θ^(ξ)\hat{\theta}(\xi) (see Lee (2013, p.321)), that is

(ξA)p=limt01t[dθexp(tξ)(p)Aθexp(tξ)(p)Ap]=(θ^(ξ)A)p.\boxed{({\mathcal{L}}_{\xi}A)_{p}=\lim_{t\to 0}\frac{1}{t}\left[\operatorname{\mathrm{d}}\theta_{\exp(t\xi)}(p)^{*}A_{\theta_{\exp(t\xi)}(p)}-A_{p}\right]=({\mathcal{L}}_{\hat{\theta}(\xi)}A)_{p}.} (115)

The Lie derivative for arbitrary covariant tensor fields can be computed by applying Proposition 12.32 in Lee (2013) and its corollaries. More generally, thanks to Sections 6.16–6.18 of Kolář et al. (1993), the Lie derivative for any tensor product of sections of natural vector bundles can be computed via the formula

ξ(A1A2)=(ξA1)A2+A1(ξA2).{\mathcal{L}}_{\xi}(A_{1}\otimes A_{2})=({\mathcal{L}}_{\xi}A_{1})\otimes A_{2}+A_{1}\otimes({\mathcal{L}}_{\xi}A_{2}). (116)

For example, this holds when A1,A2A_{1},A_{2} are arbitrary smooth tensor fields of mixed types. The Lie derivative of a differential form ω\omega on {\mathcal{M}} can be computed by Cartan’s magic formula

ξω=θ^(ξ)(dω)+d(θ^(ξ)ω),{\mathcal{L}}_{\xi}\omega=\hat{\theta}(\xi)\operatorname{{\lrcorner}}(\operatorname{\mathrm{d}}\omega)+\operatorname{\mathrm{d}}(\hat{\theta}(\xi)\operatorname{{\lrcorner}}\omega), (117)

where d\operatorname{\mathrm{d}} is the exterior derivative.
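
As a sanity check of (117), the following SymPy sketch (ours; the interior product and exterior derivative are written out in components for a 1-form on R^2) confirms that the angular form (-y dx + x dy)/(x^2 + y^2) is invariant under rotations.

import sympy as sp

x, y = sp.symbols('x y')
X = sp.Matrix([-y, x])                          # rotation generator theta_hat(xi)
w = sp.Matrix([-y, x]) / (x**2 + y**2)          # angular 1-form, rotation-invariant

def cartan(X, w):
    # components of X _| (dw) + d(X _| w) for a 1-form w on R^2
    iXw = X[0] * w[0] + X[1] * w[1]             # interior product X _| w
    dw = sp.diff(w[1], x) - sp.diff(w[0], y)    # dw = (coefficient) dx ^ dy
    interior = sp.Matrix([-X[1] * dw, X[0] * dw])
    d_iXw = sp.Matrix([sp.diff(iXw, x), sp.diff(iXw, y)])
    return sp.simplify(interior + d_iXw)

print(cartan(X, w))                             # Matrix([[0], [0]])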

11.3 Hamiltonian dynamics

The dynamics of frictionless mechanical systems can be described by Hamiltonian vector fields on symplectic manifolds. Roughly speaking, these encompass systems that conserve energy, such as motion of rigid bodies and particles interacting via conservative forces. The celebrated theorem of Noether (1918) says that conserved quantities of Hamiltonian systems correspond with symmetries of the energy function (the system’s Hamiltonian). In this section, we briefly illustrate how to enforce Hamiltonicity constraints on learned dynamical systems and how to promote, discover, and enforce conservation laws. A thorough treatment of classical mechanics, symplectic manifolds, and Hamiltonian systems can be found in Abraham and Marsden (2008); Marsden and Ratiu (1999). This includes methods for reduction of systems with known symmetries and conservation laws. The following brief introduction follows Chapter 22 of Lee (2013).

Hamiltonian systems are defined on symplectic manifolds, that is, smooth even-dimensional manifolds 𝒮{\mathcal{S}} equipped with a smooth, nondegenerate, closed differential 22-form ω\omega, called the symplectic form. Nondegeneracy means that the map ω^p:vωp(v,)\hat{\omega}_{p}:v\mapsto\omega_{p}(v,\cdot) is a bijective linear map of Tp𝒮T_{p}{\mathcal{S}} onto its dual Tp𝒮T_{p}^{*}{\mathcal{S}} for every p𝒮p\in{\mathcal{S}}. Closure means that dω=0\operatorname{\mathrm{d}}\omega=0, where d\operatorname{\mathrm{d}} is the exterior derivative. Thanks to nondegeneracy, any smooth function HC(𝒮)H\in C^{\infty}({\mathcal{S}}) gives rise to a smooth vector field

VH=ω^1(dH)V_{H}=\hat{\omega}^{-1}(\operatorname{\mathrm{d}}H) (118)

called the “Hamiltonian vector field” of HH. A vector field V𝔛(𝒮)V\in\mathfrak{X}({\mathcal{S}}) is said to be Hamiltonian if V=VHV=V_{H} for some function HH, called the Hamiltonian of VV. A vector field is locally Hamiltonian if it is Hamiltonian in a neighborhood of each point of 𝒮{\mathcal{S}}.

The symplectic manifolds considered in classical mechanics usually consist of the cotangent bundle 𝒮=T{\mathcal{S}}=T^{*}{\mathcal{M}} of an mm-dimensional manifold {\mathcal{M}} describing the “configuration” of the system, e.g., the positions of particles. The cotangent bundle has a canonical symplectic form given by

ω=i=1mdxidξi,\omega=\sum_{i=1}^{m}\operatorname{\mathrm{d}}x^{i}\wedge\operatorname{\mathrm{d}}\xi_{i}, (119)

where (xi,ξi)(x^{i},\xi_{i}) are any choice of natural coordinates on a patch of TT^{*}{\mathcal{M}} (see Proposition 22.11 in Lee (2013)). Here, each xix^{i} is a generalized coordinate describing the configuration and ξi\xi_{i} is its “conjugate” or “generalized” momentum. The Darboux theorem (Theorem 22.13 in Lee (2013)) says that any symplectic form on a manifold can be put into the form of (119) by a choice of local coordinates. In these “Darboux” coordinates, the dynamics of a Hamiltonian system are governed by the equations

ddtxi=VH(xi)=Hξi,ddtξi=VH(ξi)=Hxi,\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}x^{i}=V_{H}(x^{i})=\frac{\partial H}{\partial\xi_{i}},\qquad\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\xi_{i}=V_{H}(\xi_{i})=-\frac{\partial H}{\partial x^{i}}, (120)

which should be familiar to anyone who has studied undergraduate mechanics.

Enforcing local Hamiltonicity on a vector field V𝔛(𝒮)V\in\mathfrak{X}({\mathcal{S}}) is equivalent to the linear constraint

Vω=0{\mathcal{L}}_{V}\omega=0 (121)

thanks to Proposition 22.17 in Lee (2013). Here V{\mathcal{L}}_{V} is the Lie derivative of the tensor field ω𝔗2(𝒮)\omega\in\mathfrak{T}^{2}({\mathcal{S}}), i.e., (115) with θ\theta being the flow of VV and its generator being the identity θ^(V)=V\hat{\theta}(V)=V. Note that the Lie derivative still makes sense even when the orbits tθt(p)=θexp(t1)(p)t\mapsto\theta_{t}(p)=\theta_{\exp(t1)}(p) are only defined for small t(ε,ε)t\in(-\varepsilon,\varepsilon). In Darboux coordinates, this constraint is equivalent to the set of equations

V(xi)xj+V(ξj)ξi=0,V(ξi)xjV(ξj)xi=0,V(xi)ξjV(xj)ξi=0\frac{\partial V(x^{i})}{\partial x^{j}}+\frac{\partial V(\xi_{j})}{\partial\xi_{i}}=0,\qquad\frac{\partial V(\xi_{i})}{\partial x^{j}}-\frac{\partial V(\xi_{j})}{\partial x^{i}}=0,\qquad\frac{\partial V(x^{i})}{\partial\xi_{j}}-\frac{\partial V(x^{j})}{\partial\xi_{i}}=0 (122)

for all 1i,jm1\leq i,j\leq m. When the first de Rham cohomology group satisfies HdR1(𝒮)=0H^{1}_{\text{dR}}({\mathcal{S}})=0, for example when 𝒮{\mathcal{S}} is contractible, local Hamiltonicity implies the existence of a global Hamiltonian for VV, unique on each component of 𝒮{\mathcal{S}} up to addition of a constant by Lee (2013, Proposition 22.17).
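
For a single degree of freedom (m = 1), the constraints (122) collapse to a single divergence-free condition on V. The following SymPy sketch (ours, illustrative) verifies the constraint for the harmonic oscillator and shows that adding damping violates it.

import sympy as sp

x, xi = sp.symbols('x xi')
H = (x**2 + xi**2) / 2                          # harmonic-oscillator Hamiltonian

# Hamiltonian vector field in Darboux coordinates, Eq. (120)
Vx, Vxi = sp.diff(H, xi), -sp.diff(H, x)
print(sp.diff(Vx, x) + sp.diff(Vxi, xi))        # 0: constraint (122) holds

# a damped (non-Hamiltonian) field violates the constraint
Vx_d, Vxi_d = xi, -x - xi / 10
print(sp.diff(Vx_d, x) + sp.diff(Vxi_d, xi))    # -1/10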

Of course, our approach also makes it possible to promote Hamiltonicity with respect to candidate symplectic structures when learning a vector field VV. To do this, we can penalize the nuclear norm of V{\mathcal{L}}_{V} restricted to a subspace Ω~\tilde{\Omega} of candidate closed 22-forms using the regularization function

RΩ~,(V)=V|Ω~.R_{\tilde{\Omega},*}(V)=\big{\|}\left.{\mathcal{L}}_{V}\right|_{\tilde{\Omega}}\big{\|}_{*}. (123)

The strength of this penalty can be increased when solving a regression problem for VV until there is a non-degenerate 22-form in the nullspace Null(V)Ω~\operatorname{Null}({\mathcal{L}}_{V})\cap\tilde{\Omega}. This gives a symplectic form with respect to which VV is locally Hamiltonian.

Another option is to learn a (globally-defined) Hamiltonian function HH directly by fitting VHV_{H} to data. In this case, we can regularize the learning problem by penalizing the breaking of conservation laws. The time-derivative of a quantity, that is, of a smooth function fC(𝒮)f\in C^{\infty}({\mathcal{S}}), under the flow of VHV_{H} is given by the Poisson bracket

{f,H}:=ω(Vf,VH)=df(VH)=VH(f)=Vf(H).\{f,H\}:=\omega(V_{f},V_{H})=\operatorname{\mathrm{d}}f(V_{H})=V_{H}(f)=-V_{f}(H). (124)

Hence, ff is a conserved quantity if and only if HH is invariant under the flow of VfV_{f} — this is Noether’s theorem. It is also evident that the Poisson bracket is linear with respect to both of its arguments. In fact, the Poisson bracket turns C(𝒮)C^{\infty}({\mathcal{S}}) into a Lie algebra with fVff\mapsto V_{f} being a Lie algebra homomorphism, i.e., V{f,g}=[Vf,Vg]V_{\{f,g\}}=[V_{f},V_{g}].

As a result of these basic properties of the Poisson bracket, the quantities conserved by a given Hamiltonian vector field VHV_{H} form a Lie subalgebra given by the nullspace of a linear operator LH:C(𝒮)C(𝒮)L_{H}:C^{\infty}({\mathcal{S}})\to C^{\infty}({\mathcal{S}}) defined by

LH:f{f,H}.L_{H}:f\mapsto\{f,H\}. (125)

To promote conservation of quantities in a given subalgebra 𝔤C(𝒮){\mathfrak{g}}\subset C^{\infty}({\mathcal{S}}) when learning a Hamiltonian HH, we can penalize the nuclear norm of LHL_{H} restricted to 𝔤{\mathfrak{g}}, that is

R𝔤,(H)=LH|𝔤.R_{{\mathfrak{g}},*}(H)=\big{\|}\left.L_{H}\right|_{{\mathfrak{g}}}\big{\|}_{*}. (126)

For example, we might expect a mechanical system to conserve angular momentum about some axes, but not others due to applied torques. In the absence of data to the contrary, it often makes sense to assume that various linear and angular momenta are conserved.
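
The following sketch (ours; a sample-based discretization under strong simplifying assumptions, using the harmonic-oscillator Hamiltonian and a three-dimensional candidate space spanned by x, xi, and H itself, which happens to be invariant under L_H) assembles the matrix of L_H restricted to this subspace; its nuclear norm is the penalty (126), and its nullspace recovers a conserved quantity.

import numpy as np

rng = np.random.default_rng(0)
pts = rng.standard_normal((200, 2))             # samples of (x, xi)
x, xi = pts[:, 0], pts[:, 1]

# candidate observables spanning the subspace, and their gradients at the samples
basis = [x, xi, 0.5 * (x**2 + xi**2)]
grads = [(np.ones(200), np.zeros(200)),
         (np.zeros(200), np.ones(200)),
         (x, xi)]

# Poisson bracket {f, H} = f_x H_xi - f_xi H_x with H = (x^2 + xi^2)/2
Phi = np.column_stack(basis)
Psi = np.column_stack([fx * xi - fxi * x for fx, fxi in grads])

L = np.linalg.lstsq(Phi, Psi, rcond=None)[0]    # matrix of L_H on the subspace
print(np.linalg.norm(L, 'nuc'))                 # the penalty R(H), here 2
print(np.linalg.svd(L)[2][-1])                  # ~ (0, 0, 1): H itself is conserved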

11.4 Multilinear integral operators

In this section we provide machinery to study the symmetries of linear and nonlinear integral operators acting on sections of vector bundles, yielding far-reaching generalizations of our results in Section 6.2. Such operators can form the layers of neural networks acting on various vector and tensor fields supported on manifolds.

Let π0:E00\pi_{0}:E_{0}\to{\mathcal{M}}_{0} and πj:Ejj\pi_{j}:E_{j}\to{\mathcal{M}}_{j} be vector bundles with j{\mathcal{M}}_{j} being djd_{j}-dimensional orientable Riemannian manifolds with volume forms dVjΩdj(Tj)\operatorname{dV}_{j}\in\Omega^{d_{j}}(T^{*}{\mathcal{M}}_{j}), j=1,,rj=1,\ldots,r. Note that here, dVj\operatorname{dV}_{j} does not denote the exterior derivative of a (dj1)(d_{j}-1)-form. A continuous section KK of the bundle

E=E0E1Er:=(p,q1,,qr)0××rE0,pE1,q1Er,qrE=E_{0}\otimes E_{1}^{*}\otimes\cdots\otimes E_{r}^{*}:=\coprod_{(p,q_{1},\ldots,q_{r})\in{\mathcal{M}}_{0}\times\cdots\times{\mathcal{M}}_{r}}E_{0,p}\otimes E_{1,q_{1}}^{*}\otimes\cdots\otimes E_{r,q_{r}}^{*} (127)

can be viewed as a continuous family of rr-multilinear maps

K(p,q1,,qr):j=1rEj,qjE0,p.K(p,q_{1},\ldots,q_{r}):\bigoplus_{j=1}^{r}E_{j,q_{j}}\to E_{0,p}. (128)

The section KK can serve as the kernel to define an rr-multilinear integral operator 𝒯K:D(𝒯K)j=1rΣ(Ej)Σ(E0){\mathcal{T}}_{K}:D({\mathcal{T}}_{K})\subset\bigoplus_{j=1}^{r}\operatorname{\Sigma}(E_{j})\to\operatorname{\Sigma}(E_{0}) with action on (F1,,Fr)D(𝒯K)(F_{1},\ldots,F_{r})\in D({\mathcal{T}}_{K}) given by

𝒯K[F1,,Fr](p)=1××rK(p,q1,,qr)[F1(q1),,Fr(qr)]dV1(q1)dVr(qr).{\mathcal{T}}_{K}[F_{1},\ldots,F_{r}](p)=\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}K(p,q_{1},\ldots,q_{r})\big{[}F_{1}(q_{1}),\ldots,F_{r}(q_{r})\big{]}\operatorname{dV}_{1}(q_{1})\wedge\cdots\wedge\operatorname{dV}_{r}(q_{r}). (129)

This operator is linear when r=1r=1. When r>1r>1 and E1==ErE_{1}=\cdots=E_{r}, (129) can be used to define a nonlinear integral operator Σ(E1)Σ(E0)\operatorname{\Sigma}(E_{1})\to\operatorname{\Sigma}(E_{0}) with action F𝒯K[F,,F]F\mapsto{\mathcal{T}}_{K}[F,\ldots,F].
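
In computations, the integral in (129) is replaced by quadrature. The sketch below (ours, illustrative) discretizes a bilinear (r = 2) operator acting on scalar functions over [0, 1], that is, sections of trivial line bundles, with uniform quadrature weights standing in for the volume forms.

import numpy as np

n = 64
q = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / n)                         # quadrature weights for dV_1, dV_2

# smooth kernel K(p, q1, q2) on the product of three copies of [0, 1]
K = np.exp(-(q[:, None, None] - q[None, :, None])**2
           - (q[:, None, None] - q[None, None, :])**2)

F1, F2 = np.sin(2 * np.pi * q), np.cos(2 * np.pi * q)

# T_K[F1, F2](p) = sum over (q1, q2) of K(p, q1, q2) F1(q1) F2(q2) w(q1) w(q2)
out = np.einsum('pab,a,b,a,b->p', K, F1, F2, w, w)
print(out.shape)                                # (64,)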

Given fiber-linear right GG-actions Θj:Ej×GEj\Theta_{j}:E_{j}\times G\to E_{j}, there is an induced fiber-linear right GG-action Θ:E×GE\Theta:E\times G\to E on EE defined by

Θg(Kp,q1,,qr)[v1,,vr]=Θ0,g(Kp,q1,,qr[Θ1,g1(v1),,Θr,g1(vr)])\Theta_{g}(K_{p,q_{1},\ldots,q_{r}})\big{[}v_{1},\ldots,v_{r}\big{]}=\Theta_{0,g}\left(K_{p,q_{1},\ldots,q_{r}}\big{[}\Theta_{1,g^{-1}}(v_{1}),\ldots,\Theta_{r,g^{-1}}(v_{r})\big{]}\right) (130)

for Kp,q1,,qrEK_{p,q_{1},\ldots,q_{r}}\in E viewed as an rr-multilinear map E1,q1Er,qrE0,pE_{1,q_{1}}\oplus\cdots\oplus E_{r,q_{r}}\to E_{0,p} and vjEj,θj,g(qj)v_{j}\in E_{j,\theta_{j,g}(q_{j})}. Sections FjΣ(Ej)F_{j}\in\operatorname{\Sigma}(E_{j}) transform according to

𝒦gEjFj=Θj,g1Fjθj,g,{\mathcal{K}}^{E_{j}}_{g}F_{j}=\Theta_{j,g^{-1}}\circ F_{j}\circ\theta_{j,g}, (131)

with the section defining the integral kernel transforming according to

𝒦gEK(p,q1,,qr)[vq1,,vqr]=Θ0,g1{K(θ0,g(p),θ1,g(q1),,θr,g(qr))[Θ1,g(vq1),,Θr,g(vqr)]}.{\mathcal{K}}^{E}_{g}K(p,q_{1},\ldots,q_{r})[v_{q_{1}},\ldots,v_{q_{r}}]=\\ \Theta_{0,g^{-1}}\Big{\{}K\big{(}\theta_{0,g}(p),\theta_{1,g}(q_{1}),\ldots,\theta_{r,g}(q_{r})\big{)}\big{[}\Theta_{1,g}(v_{q_{1}}),\ldots,\Theta_{r,g}(v_{q_{r}})\big{]}\Big{\}}. (132)

Using these transformation laws, we define equivariance for the integral operator as follows:

Definition 30

The integral operator in (129) is equivariant with respect to gGg\in G if

𝒦gE0𝒯K[𝒦g1E1F1,,𝒦g1ErFr]=𝒯K[F1,,Fr]{\mathcal{K}}^{E_{0}}_{g}{\mathcal{T}}_{K}\big{[}{\mathcal{K}}^{E_{1}}_{g^{-1}}F_{1},\ldots,{\mathcal{K}}^{E_{r}}_{g^{-1}}F_{r}\big{]}={\mathcal{T}}_{K}\big{[}F_{1},\ldots,F_{r}\big{]} (133)

for every (F1,,Fr)D(𝒯K)(F_{1},\ldots,F_{r})\in D({\mathcal{T}}_{K}).

This definition is equivalent to the following condition on the integral kernel:

Lemma 31

The integral operator in (129) is equivariant with respect to gGg\in G if and only if

𝒦g(KdV1dVr):=𝒦gE(K)θ1,gdV1θr,gdVr=KdV1dVr,{\mathcal{K}}_{g}\big{(}K\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}\big{)}:={\mathcal{K}}^{E}_{g}(K)\ \theta_{1,g}^{*}\operatorname{dV}_{1}\wedge\cdots\wedge\theta_{r,g}^{*}\operatorname{dV}_{r}=K\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}, (134)

where 𝒦gE{\mathcal{K}}_{g}^{E} is defined by (132). A proof is available in Appendix A.

We note that 𝒦gΩ(dVj)=θj,gdVj{\mathcal{K}}^{\Omega}_{g}(\operatorname{dV}_{j})=\theta_{j,g}^{*}\operatorname{dV}_{j} is the natural transformation of the differential form dVjΩdj(Tj)\operatorname{dV}_{j}\in\Omega^{d_{j}}(T^{*}{\mathcal{M}}_{j}) (a covariant tensor field) described in Section 11.2. The Lie derivative of this action on volume forms is given by

ξΩdV=d(θ^(ξ)dV)=divθ^(ξ)dV{\mathcal{L}}^{\Omega}_{\xi}\operatorname{dV}=\operatorname{\mathrm{d}}\big{(}\hat{\theta}(\xi)\operatorname{{\lrcorner}}\operatorname{dV}\big{)}=\operatorname{div}\hat{\theta}(\xi)\operatorname{dV} (135)

thanks to Cartan’s magic formula and the definition of divergence (see Lee (2013)). Therefore, differentiating (134) along the curve g(t)=exp(tξ)g(t)=\exp(t\xi) yields the Lie derivative

ξ(KdV1dVr)=[ξE(K)+Kj=1rdivθ^j(ξ)]dV1dVr.{\mathcal{L}}_{\xi}\big{(}K\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}\big{)}=\bigg{[}{\mathcal{L}}^{E}_{\xi}(K)+K\sum_{j=1}^{r}\operatorname{div}\hat{\theta}_{j}(\xi)\bigg{]}\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}. (136)

Here, ξE{\mathcal{L}}_{\xi}^{E} is the Lie derivative associated with (132). For the integral operators discussed in Section 6.2, the formulas derived here recover Eqs. 38 and 40.

12 Invariant submanifolds and tangency

Studying the symmetries of smooth maps can be cast into a more general framework in which we study the symmetries of submanifolds. Specifically, the symmetries of a smooth map F:01F:{\mathcal{M}}_{0}\to{\mathcal{M}}_{1} between manifolds correspond to symmetries of its graph, gr(F)\operatorname{gr}(F), and the symmetries of a smooth section of a vector bundle FC(,E)F\in C^{\infty}({\mathcal{M}},E) correspond to symmetries of its image, im(F)\operatorname{im}(F) — both of which are properly embedded submanifolds of 0×1{\mathcal{M}}_{0}\times{\mathcal{M}}_{1} and EE, respectively. We show that symmetries of a large class of submanifolds, including the above, are revealed by checking whether the infinitesimal generators of the group action are tangent to the submanifold. In this setting, the Lie derivative of FC(,E)F\in C^{\infty}({\mathcal{M}},E) has a geometric interpretation as a projection of the infinitesimal generator onto the tangent space of the image im(F)\operatorname{im}(F), viewed as a submanifold of the bundle.

12.1 Symmetry of submanifolds

In this section we study the infinitesimal conditions for a submanifold to be invariant under the action of a Lie group. Suppose that 𝒩{\mathcal{N}} is a manifold and θ:𝒩×G𝒩\theta:{\mathcal{N}}\times G\to{\mathcal{N}} is a right action of a Lie group GG on 𝒩{\mathcal{N}}. Sometimes we denote this action by pg=θ(p,g)p\cdot g=\theta(p,g) when there is no ambiguity. Though our results also hold for left actions, as we discuss later in Remark 37, working with right actions is standard in this context and allows us to leverage results from Lee (2013) more naturally in our proofs. Fixing p𝒩p\in{\mathcal{N}}, the orbit map of this action is denoted θ(p):G𝒩\theta^{(p)}:G\to{\mathcal{N}}. Fixing gGg\in G, the map θg:𝒩𝒩\theta_{g}:{\mathcal{N}}\to{\mathcal{N}} defined by θg:pθ(p,g)\theta_{g}:p\mapsto\theta(p,g) is a diffeomorphism with inverse θg1\theta_{g^{-1}}.

Definition 32

A subset 𝒩{\mathcal{M}}\subset{\mathcal{N}} is GG-invariant if and only if θ(p,g)\theta(p,g)\in{\mathcal{M}} for every gGg\in G and pp\in{\mathcal{M}}.

Sometimes we will denote G={θ(p,g):p,gG}{\mathcal{M}}\cdot G=\{\theta(p,g)\ :\ p\in{\mathcal{M}},\ g\in G\}, in which case GG-invariance of {\mathcal{M}} can be stated as G{\mathcal{M}}\cdot G\subset{\mathcal{M}}.

We study the group invariance of submanifolds of the following type:

Definition 33

Let {\mathcal{M}} be a weakly embedded mm-dimensional submanifold of an nn-dimensional manifold 𝒩{\mathcal{N}}. We say that {\mathcal{M}} is arcwise-closed if any smooth curve γ:[a,b]𝒩\gamma:[a,b]\to{\mathcal{N}} satisfying γ((a,b))\gamma((a,b))\subset{\mathcal{M}} must also satisfy γ([a,b])\gamma([a,b])\subset{\mathcal{M}}.

Submanifolds of this type include all properly embedded submanifolds of 𝒩{\mathcal{N}} because properly embedded submanifolds are closed subsets (Proposition 5.5 in Lee (2013)). More interestingly, we have the following:

Proposition 34

The leaves of any (nonsingular) foliation of 𝒩{\mathcal{N}} are arcwise-closed. We provide a proof in Appendix A.

This means that the kinds of submanifolds we are considering include all possible Lie subgroups (Lee (2013, Theorem 19.25)) as well as their orbits under free and proper group actions (Lee (2013, Proposition 21.7)). The leaves of singular foliations associated with integrable distributions of nonconstant rank (see Kolář et al. (1993, Sections 3.18–25)) can fail to be arcwise-closed. For example, the distribution spanned by the vector field xxx\frac{\partial}{\partial x} on \mathbb{R} has maximal integral manifolds (,0)(-\infty,0), {0}\{0\}, and (0,)(0,\infty) forming a singular foliation of \mathbb{R}. Obviously, the leaves (,0)(-\infty,0) and (0,)(0,\infty) are not arcwise-closed.

Given a submanifold and a candidate group of transformations, the following theorem describes the largest connected Lie subgroup of symmetries of the submanifold. Specifically, these symmetries can be identified by checking tangency conditions between infinitesimal generators and the submanifold.

Theorem 35

Let {\mathcal{M}} be an immersed submanifold of 𝒩{\mathcal{N}} and let θ:𝒩×G𝒩\theta:{\mathcal{N}}\times G\to{\mathcal{N}} be a right action of a Lie group GG on 𝒩{\mathcal{N}} with infinitesimal generator θ^:Lie(G)𝔛(𝒩)\hat{\theta}:\operatorname{Lie}(G)\to\mathfrak{X}({\mathcal{N}}). Then

𝔰𝔶𝔪G()={ξLie(G):θ^(ξ)pTpp}\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})=\big{\{}\xi\in\operatorname{Lie}(G)\ :\ \hat{\theta}(\xi)_{p}\in T_{p}{\mathcal{M}}\quad\forall p\in{\mathcal{M}}\big{\}} (137)

is the Lie subalgebra of a unique connected Lie subgroup HGH\subset G. If {\mathcal{M}} is weakly-embedded and arcwise-closed in 𝒩{\mathcal{N}}, then this subgroup has the following properties:

  1. (i)

    H{\mathcal{M}}\cdot H\subset{\mathcal{M}}

  2. (ii)

    If H~\tilde{H} is a connected Lie subgroup of GG such that H~{\mathcal{M}}\cdot\tilde{H}\subset{\mathcal{M}}, then H~H\tilde{H}\subset H.

If {\mathcal{M}} is properly embedded in 𝒩{\mathcal{N}} then HH is the identity component of the closed, properly embedded Lie subgroup

SymG()={gG:g}.\operatorname{Sym}_{G}({\mathcal{M}})=\{g\in G\ :\ {\mathcal{M}}\cdot g\subset{\mathcal{M}}\}. (138)

A proof is provided in Appendix F.

Since the infinitesimal generator θ^\hat{\theta} is a linear map and TpT_{p}{\mathcal{M}} is a subspace of Tp𝒩T_{p}{\mathcal{N}}, the tangency conditions defining the Lie subalgebra (137) can be viewed as a set of linear equality constraints on the elements ξLie(G)\xi\in\operatorname{Lie}(G). Hence, 𝔰𝔶𝔪G()\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}}) can be computed as the nullspace of a positive semidefinite operator on Lie(G)\operatorname{Lie}(G), defined analogously to the case described earlier in Section 7.1.
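
For instance, the following sketch (ours, illustrative) takes M to be the unit circle in N = R^2 with G = GL(2) acting linearly, so that theta_hat(A)_p = A p and tangency at p amounts to <p, A p> = 0; the nullspace of the sampled positive semidefinite Gram operator then recovers the rotation algebra.

import numpy as np

rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 100)
pts = np.stack([np.cos(t), np.sin(t)], axis=1)  # samples on the unit circle

# basis E_ij of the candidate algebra gl(2); generators act as p -> A p
basis = [np.outer(np.eye(2)[i], np.eye(2)[j]) for i in range(2) for j in range(2)]

# normal component <p, A p> of each generator: zero iff tangent to the circle
R = np.column_stack([np.einsum('ni,ij,nj->n', pts, A, pts) for A in basis])
Gram = R.T @ R                                  # positive semidefinite operator
evals, evecs = np.linalg.eigh(Gram)
print(evals[0])                                 # ~ 0: one symmetry direction
print(evecs[:, 0].reshape(2, 2))                # ~ rotation generator, up to scale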

The following theorem provides necessary and sufficient conditions for arcwise-closed weakly-embedded submanifolds to be GG-invariant. These are generally nonlinear constraints on the submanifold, regarded as the zero section of its normal bundle under identification with a tubular neighborhood. However, we will show in Section 12.2 that these become linear constraints recovering the Lie derivative when the submanifold in question is the image of a section of a vector bundle and the group action is fiber-linear.

Theorem 36

Let {\mathcal{M}} be an arcwise-closed weakly-embedded submanifold of 𝒩{\mathcal{N}} and let θ:𝒩×G𝒩\theta:{\mathcal{N}}\times G\to{\mathcal{N}} be a right action of a Lie group GG on 𝒩{\mathcal{N}} with infinitesimal generator θ^:Lie(G)𝔛(𝒩)\hat{\theta}:\operatorname{Lie}(G)\to\mathfrak{X}({\mathcal{N}}). Let ξ1,,ξq\xi_{1},\ldots,\xi_{q} generate Lie(G)\operatorname{Lie}(G) and let g1,,gnG1g_{1},\ldots,g_{n_{G}-1} be elements from each non-identity component of GG. Then {\mathcal{M}} is G0G_{0}-invariant if and only if

θ^(ξi)pTpp,i=1,,q.\hat{\theta}(\xi_{i})_{p}\in T_{p}{\mathcal{M}}\qquad\forall p\in{\mathcal{M}},\quad i=1,\ldots,q. (139)

If, in addition, we have gj{\mathcal{M}}\cdot g_{j}\subset{\mathcal{M}} for every j=1,,nG1j=1,\ldots,n_{G}-1, then {\mathcal{M}} is GG-invariant. A proof is provided in Appendix G.

Remark 37 (Left actions)

When the group GG acts on 𝒩{\mathcal{N}} from the left according to θL:G×𝒩𝒩\theta^{L}:G\times{\mathcal{N}}\to{\mathcal{N}}, we can always construct an equivalent right action θR:𝒩×G𝒩\theta^{R}:{\mathcal{N}}\times G\to{\mathcal{N}} by setting θR(p,g)=θL(g1,p)\theta^{R}(p,g)=\theta^{L}(g^{-1},p). The corresponding infinitesimal generators are related by θ^R=θ^L\hat{\theta}^{R}=-\hat{\theta}^{L}. Since θ^L(ξ)pTp\hat{\theta}^{L}(\xi)_{p}\in T_{p}{\mathcal{M}} if and only if θ^R(ξ)pTp\hat{\theta}^{R}(\xi)_{p}\in T_{p}{\mathcal{M}}, Theorems 36 and 35 hold without modification for left GG-actions.

12.2 The Lie derivative as a projection

We provide a geometric interpretation of the Lie derivative in (97) by expressing it in terms of a projection of the infinitesimal generator of the group action onto the tangent space of im(F)\operatorname{im}(F) for smooth sections FC(,E)F\in C^{\infty}({\mathcal{M}},E). This allows us to connect the Lie derivative to the tangency conditions for symmetry of submanifolds presented in Section 12.1.

The Lie derivative ξF(p){\mathcal{L}}_{\xi}F(p) lies in EpE_{p}, while TF(p)im(F)T_{F(p)}\operatorname{im}(F) is a subspace of TF(p)ET_{F(p)}E. To relate quantities in these different spaces, the following lemma introduces a lifting of each EpE_{p} to a subspace of TF(p)ET_{F(p)}E.

Lemma 38

For every smooth section FC(,E)F\in C^{\infty}({\mathcal{M}},E) there is a well-defined injective vector bundle homomorphism ıF:ETE\imath_{F}:E\to TE that is expressed in any local trivialization Φ:π1(𝒰)𝒰×k\Phi:\pi^{-1}({\mathcal{U}})\to{\mathcal{U}}\times\mathbb{R}^{k} as

dΦıFΦ1:𝒰×k\displaystyle\operatorname{\mathrm{d}}\Phi\circ\imath_{F}\circ\Phi^{-1}:{\mathcal{U}}\times\mathbb{R}^{k} T(𝒰×k)\displaystyle\to T({\mathcal{U}}\times\mathbb{R}^{k}) (140)
(p,𝒗)\displaystyle(p,{\boldsymbol{v}}) (0,𝒗)Φ(F(p)).\displaystyle\mapsto(0,{\boldsymbol{v}})_{\Phi(F(p))}.

We give a proof in Appendix H.

This is a special case of the “vertical lift” of EE into the vertical bundle VE={vTE:dπv=0}VE=\{v\in TE\ :\ \operatorname{\mathrm{d}}\pi v=0\} described by Kolář et al. (1993) in Section 6.11. The “vertical projection” vprE:VEE\operatorname{vpr}_{E}:VE\to E provides a left-inverse satisfying vprEıF=IdE\operatorname{vpr}_{E}\circ\imath_{F}=\operatorname{Id}_{E}.

The following result relates the Lie derivative to a projection via the vertical lifting.

Theorem 39

Given a smooth section FC(,E)F\in C^{\infty}({\mathcal{M}},E) and pp\in{\mathcal{M}}, the map PF(p):=d(Fπ)(F(p)):TF(p)ETF(p)EP_{F(p)}:=\operatorname{\mathrm{d}}(F\circ\pi)(F(p)):T_{F(p)}E\to T_{F(p)}E is a linear projection onto TF(p)im(F)T_{F(p)}\operatorname{im}(F) and for every ξLie(G)\xi\in\operatorname{Lie}(G) we have

ıF(ξF)(p)=[IPF(p)]Θ^(ξ)F(p).\imath_{F}\circ({\mathcal{L}}_{\xi}F)(p)=-\big{[}I-P_{F(p)}\big{]}\hat{\Theta}(\xi)_{F(p)}. (141)

We give a proof in Appendix H.
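
The identity (141) is easy to verify in coordinates. The sketch below (ours) does so for the trivial bundle R x R over R with G = R acting by translating the base, where K_t F = F(. + t), the generator at a point (x, v) is (1, 0), and L_xi F = F'.

import numpy as np

F, dF = np.sin, np.cos
x = 0.3

# P = d(F o pi) at (x, F(x)) for the trivial bundle R x R over R
P = np.array([[1.0, 0.0], [dF(x), 0.0]])
gen = np.array([1.0, 0.0])          # generator of base translations, Theta_hat(xi)

lhs = np.array([0.0, dF(x)])        # vertical lift of L_xi F(x) = F'(x)
rhs = -(np.eye(2) - P) @ gen
print(lhs, rhs)                     # both equal (0, cos(0.3)), verifying (141)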

For the special case of smooth maps F:mnF:\mathbb{R}^{m}\to\mathbb{R}^{n} viewed as sections x(x,F(x))x\mapsto(x,F(x)) of the bundle π:m×nm\pi:\mathbb{R}^{m}\times\mathbb{R}^{n}\to\mathbb{R}^{m}, this theorem reproduces (55). The following corollary provides a link between our main results for sections of vector bundles and our main results for symmetries of submanifolds.

Corollary 40

For every smooth section FC(,E)F\in C^{\infty}({\mathcal{M}},E), ξLie(G)\xi\in\operatorname{Lie}(G), and pp\in{\mathcal{M}} we have

(ξF)(p)=0Θ^(ξ)F(p)TF(p)im(F).({\mathcal{L}}_{\xi}F)(p)=0\qquad\Leftrightarrow\qquad\hat{\Theta}(\xi)_{F(p)}\in T_{F(p)}\operatorname{im}(F). (142)

In particular, this means that for smooth sections, Theorems 26 and 25 are special cases of Theorems 36 and 35.

13 Conclusion

This paper provides a unified theoretical approach to enforce, discover, and promote symmetries in machine learning models. In particular, we provide theoretical foundations for Lie group symmetry in machine learning from a linear-algebraic viewpoint. This perspective unifies and generalizes several leading approaches in the literature, including approaches for incorporating and uncovering symmetries in neural networks and more general machine learning models. The central objects in this work are linear operators describing the finite and infinitesimal transformations of smooth sections of vector bundles with fiber-linear Lie group actions. To make the paper accessible to a wide range of practitioners, Sections 4–10 deal with the special case where the machine learning models are built using smooth functions between vector spaces. Our main results establish that the infinitesimal operators — the Lie derivatives — fully encode the connected subgroup of symmetries for sections of vector bundles (resp. functions between vector spaces). In other words, the Lie derivatives encode the symmetries with respect to which the machine learning models are equivariant.

We illustrate that enforcing and discovering continuous symmetries in large classes of machine learning models are dual problems with respect to the bilinear structure of the Lie derivative. Moreover, these ideas extend naturally to identify continuous symmetries of arbitrary submanifolds, recovering the Lie derivative when the submanifold is the image of a section of a vector bundle (resp., the graph of a function between vector spaces). Using the fundamental operators, we also describe how symmetries can be promoted as inductive biases during training of machine learning models using convex penalties. Our numerical results show that minimizing these convex penalties can be used to recover highly symmetric polynomial functions using fewer samples than are required to determine the polynomial coefficients directly as the solution of a linear system. This reduction in sample complexity becomes more pronounced in higher dimensions and with increasing symmetry of the function to be recovered. Finally, we provide rigorous data-driven methods for discretizing and approximating the fundamental operators to accomplish the tasks of enforcing, promoting, and discovering symmetry. Importantly, these theoretical concepts, while extremely general, admit efficient computational implementations via simple linear algebra.

The main limitations of our approach come from the need to make appropriate choices for key objects including the candidate group GG, the space of functions {\mathcal{F}} defining the machine learning model, and appropriate inner products for discretizing the fundamental operators. For example, it is possible that the only GG-symmetric functions in {\mathcal{F}} are trivial, meaning that enforcing symmetry results in learning only trivial models. One open question is whether our framework can be used in such cases to learn relaxed symmetries, as described by Wang et al. (2022). In other words, we may hope to find elements in {\mathcal{F}} that are nearly symmetric, and to bound the degree of asymmetry based on quantities derived from the fundamental operators, such as their norms. Additionally, the choice of inner products associated with the discretization of the fundamental operators could affect the results of nuclear norm penalization. Our reliance on the Lie algebra to study continuous symmetries also limits the ability of our proposed methods to account for partial symmetries, such as the invariance of the classification of the characters “Z” and “N” under rotations by small, but not large, angles.

In follow-up work, we aim to apply the proposed methods to a wide range of examples, and to explain practical implementation details. A main goal will be to study the extent to which nuclear norm relaxation can recover underlying symmetry groups and reduce the amount of data required to train accurate machine learning models on realistic data sets. Additionally, we will examine how the proposed techniques perform in the presence of noisy data, with the goal of understanding the empirical effects of problem dimension, noise level, and the candidate symmetry group.

Other important avenues of future work include investigating computationally efficient approaches to discretize the fundamental operators and use them to enforce, discover, and promote symmetry within our framework. This could involve leveraging sparse structure of the discretized operators in certain bases to enable the use of efficient Krylov subspace algorithms. It will also be useful to identify efficient optimization algorithms for training symmetry-constrained or symmetry-regularized machine learning models. Promising candidates include projected gradient descent, proximal splitting algorithms, and the Iteratively Reweighted Least Squares (IRLS) algorithms described by Mohan and Fazel (2012). Using IRLS could enable symmetry-promoting penalties to be based on non-convex Schatten pp-norms with 0<p<10<p<1, potentially improving the recovery of underlying symmetry groups compared to the nuclear norm where p=1p=1.

There are also several avenues we plan to explore in future theoretical work. These include extending the techniques presented here via jet bundle prolongation, as described by Olver (1986), to study symmetries in machine learning for Partial Differential Equations (PDEs). Combining analogues of our proposed methods in this setting with techniques using the weak formulation proposed by Messenger and Bortz (2021); Reinbold et al. (2020) could provide robust ways to identify symmetric PDEs in the presence of high noise and limited training data. We also aim to study the perturbative effects of noisy data in algorithms to discover and promote symmetry with the goal of understanding the effects of problem dimension, noise level, and number of data points on recovery of symmetry groups. Another important direction of theoretical study will be to build on the work of Peitz et al. (2023); Steyert (2022) by studying symmetry in the setting of Koopman operators for dynamical systems. To do this, one might follow the program set forth by Colbrook (2023), where the measure preserving property of certain dynamical systems is exploited to enhance the Extended Dynamic Mode Decomposition (EDMD) algorithm of Williams et al. (2015).

Acknowledgements

The authors acknowledge support from the National Science Foundation AI Institute in Dynamic Systems (grant number 2112085). SLB acknowledges support from the Army Research Office (ARO W911NF-19-1-0045) and the Boeing Company. The authors would also like to acknowledge valuable discussions with Tess Smidt and Matthew Colbrook.

References

  • Abraham et al. (1988) R. Abraham, J. E. Marsden, and T. Ratiu. Manifolds, Tensor Analysis, and Applications, volume 75 of Applied Mathematical Sciences. Springer-Verlag, 1988.
  • Abraham and Marsden (2008) Ralph Abraham and Jerrold E Marsden. Foundations of mechanics. AMS Chelsea Publishing, 2 edition, 2008.
  • Agrawal et al. (2018) Akshay Agrawal, Robin Verschueren, Steven Diamond, and Stephen Boyd. A rewriting system for convex optimization problems. Journal of Control and Decision, 5(1):42–60, 2018.
  • Ahmadi and Khadir (2020) Amir Ali Ahmadi and Bachir El Khadir. Learning dynamical systems with side information. In Proceedings of the 2nd Conference on Learning for Dynamics and Control, volume 120 of Proceedings of Machine Learning Research, pages 718–727. PMLR, 10–11 Jun 2020. URL https://proceedings.mlr.press/v120/ahmadi20a.html.
  • Akhound-Sadegh et al. (2024) Tara Akhound-Sadegh, Laurence Perreault-Levasseur, Johannes Brandstetter, Max Welling, and Siamak Ravanbakhsh. Lie point symmetry and physics-informed networks. Advances in Neural Information Processing Systems, 36, 2024.
  • Baddoo et al. (2023) Peter J Baddoo, Benjamin Herrmann, Beverley J McKeon, J Nathan Kutz, and Steven L Brunton. Physics-informed dynamic mode decomposition. Proceedings of the Royal Society A, 479(2271):20220576, 2023.
  • Batzner et al. (2022) Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E Smidt, and Boris Kozinsky. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature communications, 13(1):2453, 2022.
  • Benton et al. (2020) Gregory Benton, Marc Finzi, Pavel Izmailov, and Andrew G Wilson. Learning invariances in neural networks from training data. Advances in neural information processing systems, 33:17605–17616, 2020.
  • Berry and Giannakis (2020) Tyrus Berry and Dimitrios Giannakis. Spectral exterior calculus. Communications on Pure and Applied Mathematics, 73(4):689–770, 2020.
  • Boullé and Townsend (2023) Nicolas Boullé and Alex Townsend. A mathematical guide to operator learning. arXiv preprint arXiv:2312.14688, 2023.
  • Bouwmans et al. (2018) Thierry Bouwmans, Sajid Javed, Hongyang Zhang, Zhouchen Lin, and Ricardo Otazo. On the applications of robust PCA in image and video processing. Proceedings of the IEEE, 106(8):1427–1457, 2018.
  • Brandstetter et al. (2022) Johannes Brandstetter, Max Welling, and Daniel E Worrall. Lie point symmetry data augmentation for neural pde solvers. In International Conference on Machine Learning, pages 2241–2256. PMLR, 2022.
  • Brunton and Kutz (2022) S. L. Brunton and J. N. Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2nd edition, 2022.
  • Brunton et al. (2016) Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the national academy of sciences, 113(15):3932–3937, 2016.
  • Brunton et al. (2020) Steven L. Brunton, Bernd R. Noack, and Petros Koumoutsakos. Machine learning for fluid mechanics. Annual Review of Fluid Mechanics, 52:477–508, 2020.
  • Brunton et al. (2022) Steven L Brunton, Marko Budišić, Eurika Kaiser, and J Nathan Kutz. Modern Koopman theory for dynamical systems. SIAM Review, 64(2):229–340, 2022.
  • Cahill et al. (2023) Jameson Cahill, Dustin G Mixon, and Hans Parshall. Lie PCA: Density estimation for symmetric manifolds. Applied and Computational Harmonic Analysis, 2023.
  • Callaham et al. (2022) Jared L Callaham, Georgios Rigas, Jean-Christophe Loiseau, and Steven L Brunton. An empirical mean-field model of symmetry-breaking in a turbulent wake. Science Advances, 8(eabm4786), 2022.
  • Candès and Recht (2009) Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9:717–772, 2009.
  • Candès et al. (2011) Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):1–37, 2011.
  • Caron and Traynor (2005) Richard Caron and Tim Traynor. The zero set of a polynomial. WSMR Report 05-02, 2005. URL https://www.researchgate.net/profile/Richard-Caron-3/publication/281285245_The_Zero_Set_of_a_Polynomial/links/55df56b608aecb1a7cc1a043/The-Zero-Set-of-a-Polynomial.pdf.
  • Champion et al. (2020) Kathleen Champion, Peng Zheng, Aleksandr Y Aravkin, Steven L Brunton, and J Nathan Kutz. A unified sparse optimization framework to learn parsimonious physics-informed models from data. IEEE Access, 8:169259–169271, 2020.
  • Chen et al. (2020) Shuxiao Chen, Edgar Dobriban, and Jane H. Lee. A group-theoretic framework for data augmentation. J. Mach. Learn. Res., 21(1), jan 2020. ISSN 1532-4435.
  • Cohen and Welling (2016) Taco Cohen and Max Welling. Group equivariant convolutional networks. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2990–2999, New York, New York, USA, 20–22 Jun 2016. PMLR. URL https://proceedings.mlr.press/v48/cohenc16.html.
  • Cohen et al. (2018) Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. arXiv preprint arXiv:1801.10130, 2018.
  • Cohen et al. (2019) Taco S Cohen, Mario Geiger, and Maurice Weiler. A general theory of equivariant CNNs on homogeneous spaces. Advances in neural information processing systems, 32, 2019.
  • Colbrook (2023) Matthew J Colbrook. The mpEDMD algorithm for data-driven computations of measure-preserving dynamical systems. SIAM Journal on Numerical Analysis, 61(3):1585–1608, 2023.
  • Cubuk et al. (2019) Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 113–123, 2019.
  • Desai et al. (2022) Krish Desai, Benjamin Nachman, and Jesse Thaler. Symmetry discovery with deep learning. Physical Review D, 105(9):096031, 2022.
  • Diamond and Boyd (2016) Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
  • Esteves et al. (2018) Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–68, 2018.
  • Finzi et al. (2020) Marc Finzi, Samuel Stanton, Pavel Izmailov, and Andrew Gordon Wilson. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. In International Conference on Machine Learning, pages 3165–3176. PMLR, 2020.
  • Finzi et al. (2021) Marc Finzi, Max Welling, and Andrew Gordon Wilson. A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups. In International Conference on Machine Learning, pages 3318–3328. PMLR, 2021.
  • Fukushima (1980) Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological cybernetics, 36(4):193–202, 1980.
  • Goswami et al. (2023) Somdatta Goswami, Aniruddha Bora, Yue Yu, and George Em Karniadakis. Physics-informed deep neural operator networks. In Machine Learning in Modeling and Simulation: Methods and Applications, pages 219–254. Springer, 2023.
  • Gotô (1950) Morikuni Gotô. Faithful representations of Lie groups II. Nagoya mathematical journal, 1:91–107, 1950.
  • Gross (2011) David Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548–1566, 2011.
  • Gross (1996) David J Gross. The role of symmetry in fundamental physics. Proceedings of the National Academy of Sciences, 93(25):14256–14259, 1996.
  • Gruver et al. (2022) Nate Gruver, Marc Anton Finzi, Micah Goldblum, and Andrew Gordon Wilson. The Lie derivative for measuring learned equivariance. In The Eleventh International Conference on Learning Representations, 2022.
  • Guan et al. (2021) Yifei Guan, Steven L Brunton, and Igor Novosselov. Sparse nonlinear models of chaotic electroconvection. Royal Society Open Science, 8(8):202367, 2021.
  • Hall (2015) Brian C. Hall. Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Springer, 2015.
  • Hataya et al. (2020) Ryuichiro Hataya, Jan Zdenek, Kazuki Yoshizoe, and Hideki Nakayama. Faster autoaugment: Learning augmentation strategies using backpropagation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 1–16. Springer, 2020.
  • Holmes et al. (2012) P. J. Holmes, J. L. Lumley, G. Berkooz, and C. W. Rowley. Turbulence, coherent structures, dynamical systems and symmetry. Cambridge Monographs in Mechanics. Cambridge University Press, Cambridge, England, 2nd edition, 2012.
  • Horn and Johnson (2013) Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2 edition, 2013.
  • Kaiser et al. (2018) Eurika Kaiser, J Nathan Kutz, and Steven L Brunton. Discovering conservation laws from data for control. In 2018 IEEE Conference on Decision and Control (CDC), pages 6415–6421. IEEE, 2018.
  • Kaiser et al. (2021) Eurika Kaiser, J Nathan Kutz, and Steven L Brunton. Data-driven discovery of koopman eigenfunctions for control. Machine Learning: Science and Technology, 2(3):035023, 2021.
  • Kolář et al. (1993) Ivan Kolář, Peter W. Michor, and Jan Slovák. Natural Operations in Differential Geometry. Springer-Verlag, 1993.
  • Kondor and Trivedi (2018) Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In International Conference on Machine Learning, pages 2747–2755. PMLR, 2018.
  • Koopman (1931) B. O. Koopman. Hamiltonian systems and transformations in Hilbert space. Proceedings of the National Academy of Sciences, 17:315–318, 1931.
  • Koralov and Sinai (2012) Leonid B. Koralov and Yakov G. Sinai. Theory of Probability and Random Processes. Springer, 2 edition, 2012.
  • Kovachki et al. (2023) Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89):1–97, 2023.
  • LeCun et al. (1989) Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.
  • Lee (2013) John M. Lee. Introduction to Smooth Manifolds: Second Edition. Springer, 2013.
  • Lezcano-Casado and Martınez-Rubio (2019) Mario Lezcano-Casado and David Martınez-Rubio. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. In International Conference on Machine Learning, pages 3794–3803. PMLR, 2019.
  • Liu and Tegmark (2022) Ziming Liu and Max Tegmark. Machine learning hidden symmetries. Phys. Rev. Lett., 128:180201, May 2022. doi: 10.1103/PhysRevLett.128.180201. URL https://link.aps.org/doi/10.1103/PhysRevLett.128.180201.
  • Loiseau and Brunton (2018) J.-C. Loiseau and S. L. Brunton. Constrained sparse Galerkin regression. Journal of Fluid Mechanics, 838:42–67, 2018.
  • Maron et al. (2018) Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902, 2018.
  • Marsden and Ratiu (1999) J. E. Marsden and T. S. Ratiu. Introduction to mechanics and symmetry. Springer-Verlag, 2nd edition, 1999.
  • Mauroy et al. (2020) Alexandre Mauroy, Y Susuki, and I Mezić. Koopman operator in systems and control. Springer, 2020.
  • Messenger and Bortz (2021) Daniel A Messenger and David M Bortz. Weak SINDy: Galerkin-based data-driven model selection. Multiscale Modeling & Simulation, 19(3):1474–1497, 2021.
  • Meyer (2000) Carl D Meyer. Matrix analysis and applied linear algebra, volume 71. Siam, 2000.
  • Mezić (2005) Igor Mezić. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41:309–325, 2005.
  • Miao and Rao (2007) Xu Miao and Rajesh PN Rao. Learning the Lie groups of visual invariance. Neural computation, 19(10):2665–2693, 2007.
  • Mohan and Fazel (2012) Karthik Mohan and Maryam Fazel. Iterative reweighted algorithms for matrix rank minimization. Journal of Machine Learning Research, 13(1):3441–3473, 2012.
  • Moskalev et al. (2022) Artem Moskalev, Anna Sepliarskaia, Ivan Sosnovik, and Arnold Smeulders. LieGG: Studying learned Lie group generators. Advances in Neural Information Processing Systems, 35:25212–25223, 2022.
  • Noether (1918) E. Noether. Invariante Variationsprobleme. Nachrichten von der Königlichen Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1918:235–257, 1918. English reprint: physics/0503066, http://dx.doi.org/10.1080/00411457108231446.
  • Olver (1986) Peter J. Olver. Applications of Lie Groups to Differential Equations. Springer, 1986.
  • Otto and Rowley (2021) Samuel E Otto and Clarence W Rowley. Koopman operators for estimation and control of dynamical systems. Annual Review of Control, Robotics, and Autonomous Systems, 4:59–87, 2021.
  • Peitz et al. (2023) Sebastian Peitz, Hans Harder, Feliks Nüske, Friedrich Philipp, Manuel Schaller, and Karl Worthmann. Partial observations, coarse graining and equivariance in Koopman operator theory for large-scale dynamical systems. arXiv preprint arXiv:2307.15325, 2023.
  • Raissi et al. (2019) Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
  • Rao and Ruderman (1999) Rajesh Rao and Daniel Ruderman. Learning Lie groups for invariant visual perception. Advances in neural information processing systems, 11, 1999.
  • Recht et al. (2010) Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
  • Reinbold et al. (2020) Patrick AK Reinbold, Daniel R Gurevich, and Roman O Grigoriev. Using noisy or incomplete data to discover models of spatiotemporal dynamics. Physical Review E, 101:010203, 2020.
  • Romero and Lohit (2022) David W Romero and Suhas Lohit. Learning partial equivariances from data. Advances in Neural Information Processing Systems, 35:36466–36478, 2022.
  • Rowley et al. (2003) Clarence W. Rowley, Ioannis G. Kevrekidis, Jerrold E. Marsden, and Kurt Lust. Reduction and reconstruction for self-similar dynamical systems. Nonlinearity, 16(4):1257, 2003.
  • Shorten and Khoshgoftaar (2019) Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of big data, 6(1):1–48, 2019.
  • Steyert (2022) Vivian T Steyert. Uncovering Structure with Data-driven Reduced-Order Modeling. PhD thesis, Princeton University, 2022.
  • Tao (2011) Terence Tao. Two small facts about Lie groups. https://terrytao.wordpress.com/2011/06/25/two-small-facts-about-lie-groups/, 6 2011.
  • Van Dyk and Meng (2001) David A Van Dyk and Xiao-Li Meng. The art of data augmentation. Journal of Computational and Graphical Statistics, 10(1):1–50, 2001.
  • Varadarajan (1984) V. S. Varadarajan. Lie groups, Lie algebras, and their representations. Springer, 1984.
  • Wang et al. (2022) Rui Wang, Robin Walters, and Rose Yu. Approximately equivariant networks for imperfectly symmetric dynamics. In International Conference on Machine Learning, pages 23078–23091. PMLR, 2022.
  • Weiler and Cesa (2019) Maurice Weiler and Gabriele Cesa. General E(2)-equivariant steerable CNNs. Advances in neural information processing systems, 32, 2019.
  • Weiler et al. (2018) Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco S Cohen. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. Advances in Neural Information Processing Systems, 31, 2018.
  • Williams et al. (2015) Matthew O Williams, Ioannis G Kevrekidis, and Clarence W Rowley. A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. Journal of Nonlinear Science, 25(6):1307–1346, 2015.
  • Yang et al. (2023a) Jianke Yang, Nima Dehmamy, Robin Walters, and Rose Yu. Latent space symmetry discovery. arXiv preprint arXiv:2310.00105, 2023a.
  • Yang et al. (2023b) Jianke Yang, Robin Walters, Nima Dehmamy, and Rose Yu. Generative adversarial symmetry discovery. arXiv preprint arXiv:2302.00236, 2023b.
  • Yang et al. (2024) Jianke Yang, Wang Rao, Nima Dehmamy, Robin Walters, and Rose Yu. Symmetry-informed governing equation discovery. arXiv preprint arXiv:2405.16756, 2024.
  • Yuan and Lin (2006) Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67, 2006.

Appendix A Proofs of minor results

Proof [Proposition 6] Obviously, if ${\mathcal{K}}_{g}K=K$ then ${\mathcal{K}}_{g}^{({\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{({\mathcal{V}})}={\mathcal{T}}_{K}$. On the other hand, suppose that ${\mathcal{K}}_{g}K(x_{0},y_{0})\neq K(x_{0},y_{0})$ for some $(x_{0},y_{0})\in\mathbb{R}^{n}\times\mathbb{R}^{m}$. Hence, there are vectors $v\in{\mathcal{V}}$ and $w\in{\mathcal{W}}^{*}$ such that $\langle w,\ {\mathcal{K}}_{g}K(x_{0},y_{0})v-K(x_{0},y_{0})v\rangle>0$. This remains true for all $y$ in a neighborhood ${\mathcal{U}}$ of $y_{0}$ by continuity of $K$ and ${\mathcal{K}}_{g}K$. Letting $F(x)=v\varphi(x)$, where $\varphi$ is a smooth, nonnegative function with $\varphi(y_{0})>0$ and support in ${\mathcal{U}}$, we obtain

$$\big\langle w,\ {\mathcal{K}}_{g}^{({\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{({\mathcal{V}})}F(x)-{\mathcal{T}}_{K}F(x)\big\rangle=\int_{\mathbb{R}^{m}}\big\langle w,\ {\mathcal{K}}_{g}K(x,y)v-K(x,y)v\big\rangle\varphi(y)\operatorname{\mathrm{d}}y>0,\qquad(143)$$

meaning ${\mathcal{K}}_{g}^{({\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{({\mathcal{V}})}\neq{\mathcal{T}}_{K}$. Therefore, ${\mathcal{K}}_{g}K=K$ if and only if ${\mathcal{K}}_{g}^{({\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{({\mathcal{V}})}={\mathcal{T}}_{K}$.
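As a sanity check, the following minimal sketch (ours, not part of the original development) verifies the discrete analogue of this equivalence: for scalar fields on a finite set with trivial bundle maps $\Theta$ and a cyclic-shift action, invariance of the kernel matrix under simultaneous permutation of both arguments coincides with equivariance of the induced matrix operator.

```python
import numpy as np

n = 6
sigma = (np.arange(n) + 1) % n                     # cyclic shift on {0, ..., n-1}
P = np.zeros((n, n))
P[np.arange(n), sigma] = 1.0                       # permutation matrix: (P F)(x) = F(sigma(x))

# circulant kernel: K(x, y) depends only on (x - y) mod n, so K(sigma(x), sigma(y)) = K(x, y)
c = np.random.default_rng(0).standard_normal(n)
K = np.array([[c[(x - y) % n] for y in range(n)] for x in range(n)])

# discrete analogue of K_g o T_K o K_{g^{-1}}: conjugation by the permutation matrix
print(np.allclose(P @ K @ P.T, K))                 # True exactly when the kernel is invariant
```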

We use the following lemma in the proof of Proposition 11.

Lemma 41

Suppose that $S_{m}\to S$ is a convergent sequence of matrices and $\operatorname{Null}(S)\subset\operatorname{Null}(S_{n})\subset\operatorname{Null}(S_{m})$ when $n\geq m$. Then there is an integer $M_{0}$ such that for every $m\geq M_{0}$ we have $\operatorname{Null}(S_{m})=\operatorname{Null}(S)$.

Proof Since the sequence of matrices acts on a finite-dimensional space, $\dim\operatorname{Null}(S_{m})$ is a monotone, bounded sequence of integers. Therefore, there exists an integer $M_{0}$ such that $\dim\operatorname{Null}(S_{m})=\dim\operatorname{Null}(S_{M_{0}})$ for every $m\geq M_{0}$. Since $\operatorname{Null}(S_{m})\subset\operatorname{Null}(S_{M_{0}})$, we must have $\operatorname{Null}(S_{m})=\operatorname{Null}(S_{M_{0}})$ for every $m\geq M_{0}$. Since $\operatorname{Null}(S)\subset\operatorname{Null}(S_{M_{0}})$ by assumption, it remains to show the reverse containment. Suppose $\xi\in\operatorname{Null}(S_{M_{0}})$; then

$$S\xi=\lim_{m\to\infty}S_{m}\xi=0,\qquad(144)$$

meaning that $\operatorname{Null}(S_{M_{0}})\subset\operatorname{Null}(S)$.
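A toy numerical illustration of Lemma 41 (our construction, chosen only to satisfy the hypotheses): the nullspaces of a convergent sequence with nested nullspaces stabilize at $\operatorname{Null}(S)$ after finitely many terms.

```python
import numpy as np

def null_dim(A, tol=1e-12):
    return A.shape[1] - np.linalg.matrix_rank(A, tol=tol)

S = np.diag([1.0, 1.0, 0.0])                            # limit matrix, dim Null(S) = 1
for m in range(1, 8):
    S_m = np.diag([1.0, 1.0 if m >= 3 else 0.0, 0.0])   # S_m -> S with nested nullspaces
    print(m, null_dim(S_m))                             # dims: 2, 2, 1, 1, ... so M_0 = 3
```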

Proof [Proposition 11] By the Cauchy–Schwarz inequality, our assumption means that $\langle\eta,S_{{\mathcal{M}}}\xi\rangle<\infty$ for every $\eta,\xi\in\operatorname{Lie}(G)$. Let $\xi_{1},\ldots,\xi_{\dim G}$ be a basis for $\operatorname{Lie}(G)$. By the strong law of large numbers, specifically Theorem 7.7 in Koralov and Sinai (2012), we have

$$\big\langle\xi_{j},\ S_{m}\xi_{k}\big\rangle_{\operatorname{Lie}(G)}\to\langle\xi_{j},S_{{\mathcal{M}}}\xi_{k}\rangle\qquad(145)$$

for every $j,k$ almost surely. Consequently, $S_{m}\to S_{{\mathcal{M}}}$ almost surely. By nonnegativity of each term in the sum defining $\big\langle\xi,\ S_{m}\xi\big\rangle_{\operatorname{Lie}(G)}$, it follows that $\operatorname{Null}(S_{n})\subset\operatorname{Null}(S_{m})$ when $n\geq m$. Moreover, if $\xi\in\operatorname{Null}(S_{{\mathcal{M}}})$ then it follows from the continuity of $z\mapsto(I-P_{z})\hat{\theta}(\xi)_{z}$ that $(I-P_{z})\hat{\theta}(\xi)_{z}=0$ for every $z\in{\mathcal{M}}$. Hence, $\xi\in\operatorname{Null}(S_{m})$ for every $m$. Therefore, $S_{m}$ and $S_{{\mathcal{M}}}$ obey the hypotheses of Lemma 41 almost surely, and the conclusion follows.

Proof [Proposition 17] Consider the function $\tilde{F}_{\text{rad}}:\mathbb{R}^{n}\to\mathbb{R}^{r}$ defined by

$$\tilde{F}_{\text{rad}}(x)=(\|x-c_{1}\|_{2}^{2},\ \ldots,\ \|x-c_{r}\|_{2}^{2})\qquad(146)$$

with the standard action of $\operatorname{SE}(n)$ on its domain and the trivial action on its codomain. The symmetries of $\tilde{F}_{\text{rad}}$ are shared by $F_{\text{rad}}$. By Theorem 4, the Lie algebra of $\tilde{F}_{\text{rad}}$'s symmetry group is characterized by

$$\xi=\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\in\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}\tilde{F}_{\text{rad}}\quad\Leftrightarrow\quad 0={\mathcal{L}}_{\xi}\tilde{F}_{\text{rad}}(x)=\frac{\partial\tilde{F}_{\text{rad}}(x)}{\partial x}(Sx+v)\quad\forall x\in\mathbb{R}^{n}.\qquad(147)$$

This means the generators $\xi$ are characterized by the equations

$$0=(x-c_{i})^{T}(Sx+v)=x^{T}Sx-c_{i}^{T}Sx+x^{T}v-c_{i}^{T}v\qquad\forall x\in\mathbb{R}^{n},\quad i=1,\ldots,r.\qquad(148)$$

Since $\xi\in\operatorname{\mathfrak{se}}(n)$, the matrix $S$ is skew-symmetric, $S^{T}=-S$, giving $x^{T}Sx=0$. The above is satisfied if and only if

$$Sc_{i}=-v,\qquad(149)$$

which automatically yields $c_{i}^{T}v=-c_{i}^{T}Sc_{i}=0$. Therefore, $\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}\tilde{F}_{\text{rad}}=\mathfrak{g}_{\text{rad}}\subset\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}F_{\text{rad}}$. To determine the dimension of the symmetry group, we observe that $S$ must satisfy

$$S(c_{2}-c_{1})=\cdots=S(c_{r}-c_{1})=0,\qquad(150)$$

and any such $S$ uniquely determines $v=-Sc_{1}$. Therefore, the dimension of $\mathfrak{g}_{\text{rad}}$ equals the dimension of the space of skew-symmetric matrices satisfying (150). Let the columns of $W=\begin{bmatrix}W_{1}&W_{2}\end{bmatrix}$ form an orthonormal basis for $\mathbb{R}^{n}$ with the $r-1$ columns of $W_{1}$ being a basis for $\operatorname{span}\{(c_{2}-c_{1}),\ldots,(c_{r}-c_{1})\}$. The above constraints, together with skew-symmetry, mean that $S$ takes the form

$$S=\begin{bmatrix}W_{1}&W_{2}\end{bmatrix}\begin{bmatrix}0&0\\ 0&\tilde{S}\end{bmatrix}\begin{bmatrix}W_{1}^{T}\\ W_{2}^{T}\end{bmatrix},\qquad(151)$$

where $\tilde{S}$ is an $(n-r+1)\times(n-r+1)$ skew-symmetric matrix. Therefore, the dimension of $\mathfrak{g}_{\text{rad}}$ equals the dimension of the space of $(n-r+1)\times(n-r+1)$ skew-symmetric matrices, which is $\frac{1}{2}(n-r)(n-r+1)$.
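This dimension count can be checked numerically. The sketch below (ours, using generic random centers $c_{1},\ldots,c_{r}$) imposes the linear constraints (150) on the space of skew-symmetric matrices and compares the resulting nullspace dimension against $\frac{1}{2}(n-r)(n-r+1)$.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, r = 7, 3
C = rng.standard_normal((n, r))                    # generic centers c_1, ..., c_r as columns

# basis of skew-symmetric matrices: B_{jk} = e_j e_k^T - e_k e_j^T
basis = []
for j, k in combinations(range(n), 2):
    B = np.zeros((n, n)); B[j, k] = 1.0; B[k, j] = -1.0
    basis.append(B)

# constraints (150): S (c_i - c_1) = 0 for i = 2, ..., r, as equations in skew coordinates
D = C[:, 1:] - C[:, :1]
A = np.array([(B @ D).ravel() for B in basis]).T
dim_g_rad = len(basis) - np.linalg.matrix_rank(A)
print(dim_g_rad, (n - r) * (n - r + 1) // 2)       # both equal 10 for n = 7, r = 3
```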

The argument for $F_{\text{lin}}$ is similar, with the symmetries of

$$\tilde{F}_{\text{lin}}(x)=\big(u_{1}^{T}x,\ \ldots,\ u_{r}^{T}x\big)=U^{T}x\qquad(152)$$

also being symmetries of $F_{\text{lin}}$. The condition $0={\mathcal{L}}_{\xi}\tilde{F}_{\text{lin}}$ is equivalent to

$$U^{T}Sx+U^{T}v=0\qquad\forall x\in\mathbb{R}^{n},\qquad(153)$$

which occurs if and only if $U^{T}S=0$ and $U^{T}v=0$. This immediately yields $\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}\tilde{F}_{\text{lin}}=\mathfrak{g}_{\text{lin}}\subset\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}F_{\text{lin}}$. Per our earlier argument, the skew-symmetric matrices $S$ satisfying

$$Su_{1}=\cdots=Su_{r}=0\qquad(154)$$

form a vector space with dimension $\frac{1}{2}(n-r)(n-r-1)$. The subspace of vectors $v\in\mathbb{R}^{n}$ satisfying $U^{T}v=0$ is $(n-r)$-dimensional. Adding these gives the dimension of $\mathfrak{g}_{\text{lin}}$, which is $\frac{1}{2}(n-r)(n-r+1)$.

Suppose there exists a polynomial $\varphi_{0}$ with degree $\leq d$ such that $\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}})=\mathfrak{g}_{\text{rad}}$ when $\varphi_{\text{rad}}=\varphi_{0}$. Let ${\mathfrak{g}}_{\perp}$ be a complementary subspace to ${\mathfrak{g}}_{\text{rad}}$ in $\operatorname{\mathfrak{se}}(n)$, that is, ${\mathfrak{g}}_{\perp}\oplus{\mathfrak{g}}_{\text{rad}}=\operatorname{\mathfrak{se}}(n)$. We observe that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\neq{\mathfrak{g}}_{\text{rad}}$ if and only if there is a nonzero $\xi_{\perp}\in{\mathfrak{g}}_{\perp}$ satisfying ${\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}=0$. The "if" part of this claim is obvious. The "only if" part follows from the fact that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\neq{\mathfrak{g}}_{\text{rad}}$ means that ${\mathcal{L}}_{\xi}F_{\text{rad}}=0$ for some nonzero $\xi\notin{\mathfrak{g}}_{\text{rad}}$. Using the direct-sum decomposition of $\operatorname{\mathfrak{se}}(n)$, there are unique $\xi_{\perp}\in{\mathfrak{g}}_{\perp}$ and $\xi_{\text{rad}}\in{\mathfrak{g}}_{\text{rad}}$ such that $\xi=\xi_{\perp}+\xi_{\text{rad}}$, yielding

$$0={\mathcal{L}}_{\xi}F_{\text{rad}}={\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}+{\mathcal{L}}_{\xi_{\text{rad}}}F_{\text{rad}}={\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}.\qquad(155)$$

Moreover, $\xi_{\perp}\neq 0$ because $\xi\notin{\mathfrak{g}}_{\text{rad}}$. Letting $\xi_{1},\ldots,\xi_{D}$ form a basis for ${\mathfrak{g}}_{\perp}$, we consider the $D\times D$ Gram matrix ${\boldsymbol{G}}(F_{\text{rad}})$ with entries

$$\big[{\boldsymbol{G}}(F_{\text{rad}})\big]_{i,j}=\int_{[0,1]^{n}}{\mathcal{L}}_{\xi_{i}}F_{\text{rad}}(x)\,{\mathcal{L}}_{\xi_{j}}F_{\text{rad}}(x)\ \operatorname{\mathrm{d}}x.\qquad(156)$$

This matrix is singular if and only if there is a nonzero $\xi_{\perp}\in{\mathfrak{g}}_{\perp}$ satisfying ${\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}(x)=0$ for every $x$ in the cube $[0,1]^{n}$. Since

$${\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}(x)=\frac{\partial F_{\text{rad}}(x)}{\partial x}(Sx+v),\qquad\xi_{\perp}=\begin{bmatrix}S&v\\ 0&0\end{bmatrix},\qquad(157)$$

is a polynomial function of $x$, it vanishes in the cube if and only if it vanishes everywhere. Hence ${\boldsymbol{G}}(F_{\text{rad}})$ is singular if and only if $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\neq{\mathfrak{g}}_{\text{rad}}$. Letting ${\boldsymbol{c}}$ denote the vector of coefficients defining $\varphi_{\text{rad}}$ in a basis for the polynomials of degree $\leq d$ on $\mathbb{R}^{r}$, we observe that

$$f:{\boldsymbol{c}}\mapsto\det\big({\boldsymbol{G}}(F_{\text{rad}})\big)\qquad(158)$$

is a polynomial function of ${\boldsymbol{c}}$. The set of polynomials $\varphi_{\text{rad}}$ with degree $\leq d$ for which $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\neq{\mathfrak{g}}_{\text{rad}}$ corresponds to the zero level set of $f$, i.e., those ${\boldsymbol{c}}$ such that $f({\boldsymbol{c}})=0$. Obviously, $f(0)=0$, and taking the coefficients ${\boldsymbol{c}}_{0}$ corresponding to $\varphi_{0}$ gives $f({\boldsymbol{c}}_{0})\neq 0$, meaning $f$ is a nonconstant polynomial. Since each level set of a nonconstant polynomial is a set of measure zero (Caron and Traynor, 2005), it follows that the zero level set of $f$ has measure zero. Precisely the same argument works for $\varphi_{\text{lin}}$, $F_{\text{lin}}$, and ${\mathfrak{g}}_{\text{lin}}$.

Proof [Corollary 18] By Proposition 17, it suffices to find a degree-$r$ polynomial $\varphi_{\text{rad}}$ such that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\subset{\mathfrak{g}}_{\text{rad}}$. We choose $\varphi_{\text{rad}}(z_{1},\ldots,z_{r})=z_{1}+z_{2}^{2}+\cdots+z_{r}^{r}$, giving

$$F_{\text{rad}}(x)=\sum_{k=1}^{r}\|x-c_{k}\|_{2}^{2k}.\qquad(159)$$

If $\xi=\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\in\operatorname{\mathfrak{se}}(n)$ generates a symmetry of $F_{\text{rad}}$ then

$$0={\mathcal{L}}_{\xi}F_{\text{rad}}(x)=\sum_{k=1}^{r}k\|x-c_{k}\|_{2}^{2(k-1)}\underbrace{(x-c_{k})^{T}(Sx+v)}_{x^{T}(Sc_{k}+v)-c_{k}^{T}v}\qquad\forall x\in\mathbb{R}^{n}.\qquad(160)$$

The terms in this expression with highest degree in $x$ must vanish, yielding

$$0=r\|x\|_{2}^{2(r-1)}x^{T}(Sc_{r}+v)\qquad\forall x\in\mathbb{R}^{n}.\qquad(161)$$

This implies that $Sc_{r}+v=0$. Proceeding inductively, suppose that $Sc_{l}+v=0$ for every $l>k$. Then, requiring the highest-degree remaining term in (160) to vanish gives

$$0=k\|x\|_{2}^{2(k-1)}x^{T}(Sc_{k}+v)\qquad\forall x\in\mathbb{R}^{n},\qquad(162)$$

implying that $Sc_{k}+v=0$. It follows by induction that

$$Sc_{k}+v=0\qquad\forall k=1,\ldots,r,\qquad(163)$$

meaning that $\xi\in{\mathfrak{g}}_{\text{rad}}$. Hence, $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\subset{\mathfrak{g}}_{\text{rad}}$, which completes the proof.
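The following sketch (ours) verifies this computation numerically for $r=2$: a skew-symmetric $S$ annihilating $c_{2}-c_{1}$, together with $v=-Sc_{1}$, satisfies $Sc_{k}+v=0$ and hence annihilates $F_{\text{rad}}$ in (159) at randomly sampled points.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 5, 2
C = rng.standard_normal((n, r))                    # centers c_1, c_2 as columns

# a skew-symmetric S with S (c_2 - c_1) = 0, built from two vectors orthogonal to c_2 - c_1
d = C[:, 1] - C[:, 0]
a, b = rng.standard_normal(n), rng.standard_normal(n)
a -= (a @ d) / (d @ d) * d
b -= (b @ d) / (d @ d) * d
S = np.outer(a, b) - np.outer(b, a)
v = -S @ C[:, 0]                                   # then S c_k + v = 0 for k = 1, 2

def grad_F(x):                                     # gradient of F_rad(x) = sum_k ||x - c_k||^{2k}
    return sum(2 * k * np.linalg.norm(x - C[:, k - 1]) ** (2 * (k - 1)) * (x - C[:, k - 1])
               for k in range(1, r + 1))

x = rng.standard_normal(n)
print(np.isclose(grad_F(x) @ (S @ x + v), 0.0))    # Lie derivative (160) vanishes
```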

Proof [Corollary 19] By Proposition 17, it suffices to find a quadratic polynomial $\varphi_{\text{lin}}$ such that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{lin}})\subset{\mathfrak{g}}_{\text{lin}}$. Letting $D=\operatorname{diag}[1,2,\ldots,r]$ and $U=\begin{bmatrix}u_{1}&\cdots&u_{r}\end{bmatrix}$, consider the quadratic function $\varphi_{\text{lin}}(z)=\frac{1}{2}z^{T}Dz$, giving

$$F_{\text{lin}}(x)=\frac{1}{2}x^{T}UDU^{T}x.\qquad(164)$$

If $\xi=\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\in\operatorname{\mathfrak{se}}(n)$ generates a symmetry of $F_{\text{lin}}$ then

$$0={\mathcal{L}}_{\xi}F_{\text{lin}}(x)=x^{T}UDU^{T}Sx+x^{T}UDU^{T}v\qquad\forall x\in\mathbb{R}^{n}.\qquad(165)$$

Differentiating the above with respect to $x$ at $x=0$ yields $UDU^{T}v=0$, which, because $UD$ is injective, means that $U^{T}v=0$. The fact that $x^{T}UDU^{T}Sx=0$ for every $x$ means that $UDU^{T}S+S^{T}UDU^{T}=0$, i.e.,

$$UDU^{T}S=SUDU^{T}.\qquad(166)$$

Letting the columns of $U_{\perp}$ span the orthogonal complement of the columns of $U$ and expressing

$$S=U\tilde{S}_{1,1}U^{T}+U\tilde{S}_{1,2}U_{\perp}^{T}+U_{\perp}\tilde{S}_{2,1}U^{T}+U_{\perp}\tilde{S}_{2,2}U_{\perp}^{T},\qquad(167)$$

the above commutation relation with $UDU^{T}$ gives

$$UD\tilde{S}_{1,1}U^{T}+UD\tilde{S}_{1,2}U_{\perp}^{T}=U\tilde{S}_{1,1}DU^{T}+U_{\perp}\tilde{S}_{2,1}DU^{T}.\qquad(168)$$

Multiplying on the left and right by combinations of $U^{T}$ or $U_{\perp}^{T}$ and $U$ or $U_{\perp}$ extracts the relations

$$D\tilde{S}_{1,1}=\tilde{S}_{1,1}D,\qquad D\tilde{S}_{1,2}=0,\qquad\tilde{S}_{2,1}D=0.\qquad(169)$$

Since $D$ is invertible, we must have $\tilde{S}_{1,2}=0$ and $\tilde{S}_{2,1}=0$. Since $S^{T}=-S$, we must also have $\tilde{S}_{1,1}^{T}=-\tilde{S}_{1,1}$, meaning that its diagonal entries are identically zero. Considering the $(j,k)$ element of $\tilde{S}_{1,1}$ with $j\neq k$, we have

$$j[\tilde{S}_{1,1}]_{j,k}=[D\tilde{S}_{1,1}]_{j,k}=[\tilde{S}_{1,1}D]_{j,k}=k[\tilde{S}_{1,1}]_{j,k},\qquad(170)$$

meaning that $[\tilde{S}_{1,1}]_{j,k}=0$. Therefore, only $\tilde{S}_{2,2}$ can be nonzero, which gives $SU=0$. Combined with the fact that $U^{T}v=0$, we conclude that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{lin}})\subset{\mathfrak{g}}_{\text{lin}}$, completing the proof.
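A quick numerical check of the converse direction (our sketch): any generator with $SU=0$ and $U^{T}v=0$ gives a vanishing Lie derivative of $F_{\text{lin}}$ in (164).

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 6, 2
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal columns u_1, ..., u_r
D = np.diag(np.arange(1.0, r + 1))

# a generator in g_lin: skew-symmetric S with S U = 0, and v orthogonal to the columns of U
P = np.eye(n) - U @ U.T                            # projector onto span(U)^perp
A = rng.standard_normal((n, n))
S = P @ (A - A.T) @ P                              # skew-symmetric and annihilates U
v = P @ rng.standard_normal(n)

x = rng.standard_normal(n)
lie_F = x @ U @ D @ U.T @ (S @ x + v)              # L_xi F_lin(x) from (165)
print(np.isclose(lie_F, 0.0))                      # True
```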

Proof [Proposition 16] As an intersection of closed subgroups, $H:=\bigcap_{l=1}^{L}\operatorname{Sym}_{G}\big(F^{(l)}\big)$ is a closed subgroup of $G$. By the closed subgroup theorem (see Theorem 20.12 in Lee (2013)), $H$ is an embedded Lie subgroup, whose Lie subalgebra we denote by $\mathfrak{h}$. If $\xi\in\mathfrak{h}$ then $\exp(t\xi)\in\operatorname{Sym}_{G}\big(F^{(l)}\big)$ for all $t\in\mathbb{R}$ and every $l=1,\ldots,L$. Differentiating ${\mathcal{K}}_{\exp(t\xi)}F^{(l)}=F^{(l)}$ at $t=0$ proves that ${\mathcal{L}}_{\xi}F^{(l)}=0$, i.e., $\xi\in\operatorname{\mathfrak{sym}}_{G}(F^{(l)})$ by Theorem 4. Conversely, if ${\mathcal{L}}_{\xi}F^{(l)}=0$ for every $l=1,\ldots,L$, then by Theorem 4, $\exp(t\xi)\in H$. Since $H$ is a Lie subgroup, differentiating $\exp(t\xi)$ at $t=0$ proves that $\xi\in\mathfrak{h}$.

Proof [Proposition 20] Let $f_{1},\ldots,f_{N}$ be a basis for ${\mathcal{F}}^{\prime}$. Consider the sequence of Gram matrices ${\boldsymbol{G}}_{M}$ with entries

$$[{\boldsymbol{G}}_{M}]_{i,j}=\left\langle f_{i},\ f_{j}\right\rangle_{L^{2}(\mu_{M})}.\qquad(171)$$

It suffices to show that ${\boldsymbol{G}}_{M}$ is positive-definite for sufficiently large $M$. Since the $L^{2}(\mu)$ inner product is positive-definite on ${\mathcal{F}}^{\prime}$, it follows that the Gram matrix ${\boldsymbol{G}}$ with entries

$$[{\boldsymbol{G}}]_{i,j}=\left\langle f_{i},\ f_{j}\right\rangle_{L^{2}(\mu)}\qquad(172)$$

is positive-definite. Hence, its smallest eigenvalue $\lambda_{\text{min}}({\boldsymbol{G}})$ is positive. Since the ordered eigenvalues of symmetric matrices are continuous with respect to their entries (see Corollary 4.3.15 in Horn and Johnson (2013)) and $[{\boldsymbol{G}}_{M}]_{i,j}\to[{\boldsymbol{G}}]_{i,j}$ for all $1\leq i,j\leq N$ by assumption, we have $\lambda_{\text{min}}({\boldsymbol{G}}_{M})\to\lambda_{\text{min}}({\boldsymbol{G}})$ as $M\to\infty$. Therefore, there is an $M_{0}$ so that for every $M\geq M_{0}$ we have $\lambda_{\text{min}}({\boldsymbol{G}}_{M})>0$, i.e., ${\boldsymbol{G}}_{M}$ is positive-definite.
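A numerical illustration of this convergence (our sketch, taking ${\mathcal{F}}^{\prime}=\operatorname{span}\{1,x,x^{2}\}$ on $[0,1]$ and $\mu$ the uniform measure, so that ${\boldsymbol{G}}$ is the $3\times 3$ Hilbert matrix): the smallest eigenvalue of the empirical Gram matrix approaches $\lambda_{\min}({\boldsymbol{G}})\approx 0.00269>0$.

```python
import numpy as np

rng = np.random.default_rng(4)
G = np.array([[1.0 / (i + j + 1) for j in range(3)] for i in range(3)])
print("exact:", np.linalg.eigvalsh(G)[0])          # lambda_min of the 3x3 Hilbert matrix

for M in [10, 100, 10000]:
    x = rng.random(M)                              # samples from the uniform measure on [0, 1]
    Phi = np.stack([x ** 0, x, x ** 2])            # basis functions f_i evaluated at the samples
    G_M = Phi @ Phi.T / M                          # empirical Gram matrix in L^2(mu_M)
    print(M, np.linalg.eigvalsh(G_M)[0])           # approaches the exact value, staying positive
```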

Proof [Lemma 31] Using the fact that the integral is invariant under pullbacks by diffeomorphisms, we can express the left-hand side of the equivariance condition in Definition 30 as

$$\begin{aligned}
{\mathcal{K}}_{0,g}{\mathcal{T}}_{K}\big[{\mathcal{K}}_{1,g^{-1}}F_{1},\ldots,{\mathcal{K}}_{r,g^{-1}}F_{r}\big](p)&=\Theta_{0,g^{-1}}\Bigg\{\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}K(\theta_{0,g}(p),q_{1},\ldots,q_{r})\big[\Theta_{1,g}\circ F_{1}\circ\theta_{1,g^{-1}}(q_{1}),\ldots,\Theta_{r,g}\circ F_{r}\circ\theta_{r,g^{-1}}(q_{r})\big]\operatorname{dV}_{1}(q_{1})\wedge\cdots\wedge\operatorname{dV}_{r}(q_{r})\Bigg\}\\
&=\Theta_{0,g^{-1}}\Bigg\{\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}K(\theta_{0,g}(p),\theta_{1,g}(q_{1}),\ldots,\theta_{r,g}(q_{r}))\big[\Theta_{1,g}\circ F_{1}(q_{1}),\ldots,\Theta_{r,g}\circ F_{r}(q_{r})\big]\,\theta_{1,g}^{*}\operatorname{dV}_{1}(q_{1})\wedge\cdots\wedge\theta_{r,g}^{*}\operatorname{dV}_{r}(q_{r})\Bigg\}\\
&=\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}{\mathcal{K}}^{E}_{g}K(p,q_{1},\ldots,q_{r})\big[F_{1}(q_{1}),\ldots,F_{r}(q_{r})\big]\,\theta_{1,g}^{*}\operatorname{dV}_{1}(q_{1})\wedge\cdots\wedge\theta_{r,g}^{*}\operatorname{dV}_{r}(q_{r}).
\end{aligned}\qquad(173)$$

Hence, by comparing the integrand to (129), it is clear that (134) implies that ${\mathcal{T}}_{K}$ is equivariant in the sense of Definition 30. Conversely, if ${\mathcal{T}}_{K}$ is $g$-equivariant, then

$$\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}{\mathcal{K}}^{E}_{g}K\big[F_{1},\ldots,F_{r}\big]\,\theta_{1,g}^{*}\operatorname{dV}_{1}\wedge\cdots\wedge\theta_{r,g}^{*}\operatorname{dV}_{r}=\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}K\big[F_{1},\ldots,F_{r}\big]\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}\qquad(174)$$

holds for every $(F_{1},\ldots,F_{r})\in D({\mathcal{T}}_{K})$. Since the domain contains all smooth, compactly-supported fields $(F_{1},\ldots,F_{r})$, it follows that (134) holds.

Proof [Proposition 34] Consider a leaf ${\mathcal{M}}$ of an $m$-dimensional foliation on the $n$-dimensional manifold ${\mathcal{N}}$ and let $\gamma:[a,b]\to{\mathcal{N}}$ be a smooth curve satisfying $\gamma((a,b))\subset{\mathcal{M}}$. First, it is clear that ${\mathcal{M}}$ is a weakly embedded submanifold of ${\mathcal{N}}$ since ${\mathcal{M}}$ is an integral manifold of an involutive distribution (Lee (2013, Proposition 19.19)) and the local structure theorem for integral manifolds (Lee (2013, Proposition 19.16)) shows that such manifolds are weakly embedded.

By continuity of $\gamma$, any neighborhood of $\gamma(b)$ in ${\mathcal{N}}$ must have nonempty intersection with ${\mathcal{M}}$. By definition of a foliation (see Lee (2013)), there is a coordinate chart $({\mathcal{U}},{\boldsymbol{x}})$ for ${\mathcal{N}}$ with $\gamma(b)\in{\mathcal{U}}$ such that ${\boldsymbol{x}}({\mathcal{U}})$ is a coordinate-aligned cube in $\mathbb{R}^{n}$ and ${\mathcal{M}}\cap{\mathcal{U}}$ consists of countably many slices of the form $x^{m+1}=c^{m+1},\ldots,x^{n}=c^{n}$ for constants $c^{m+1},\ldots,c^{n}$. Since $\gamma$ is continuous, there is a $\delta>0$ so that $\gamma((b-\delta,b])\subset{\mathcal{U}}$, and in particular, $\gamma((b-\delta,b))\subset{\mathcal{M}}\cap{\mathcal{U}}$. By continuity of $\gamma$, there are constants $c^{m+1},\ldots,c^{n}$ such that $x^{i}(\gamma(t))=c^{i}$ for every $i=m+1,\ldots,n$ and $t\in(b-\delta,b)$. Hence, we have

$$x^{i}(\gamma(b))=\lim_{t\to b}x^{i}(\gamma(t))=c^{i},\qquad i=m+1,\ldots,n,\qquad(175)$$

meaning that $\gamma(b)\in{\mathcal{M}}$. An analogous argument shows that $\gamma(a)\in{\mathcal{M}}$, completing the proof that ${\mathcal{M}}$ is arcwise-closed.

Appendix B Proof of Proposition 21

Our proof relies on the following lemma:

Lemma 42

Let ${\mathcal{P}}$ denote a finite-dimensional vector space of polynomials $\mathbb{R}^{m}\to\mathbb{R}$. If $M\geq\dim({\mathcal{P}})$ then the evaluation map $T_{(x_{1},\ldots,x_{M})}:{\mathcal{P}}\to\mathbb{R}^{M}$ defined by

$$T_{(x_{1},\ldots,x_{M})}:P\mapsto(P(x_{1}),\ldots,P(x_{M}))\qquad(176)$$

is injective for almost every $(x_{1},\ldots,x_{M})\in(\mathbb{R}^{m})^{M}$ with respect to Lebesgue measure.

Proof Letting $M_{0}=\dim({\mathcal{P}})$ and choosing a basis $P_{1},\ldots,P_{M_{0}}$ for ${\mathcal{P}}$, injectivity of $T_{(x_{1},\ldots,x_{M})}$ is equivalent to injectivity of the $M\times M_{0}$ matrix

$${\boldsymbol{T}}_{(x_{1},\ldots,x_{M})}=\begin{bmatrix}P_{1}(x_{1})&\cdots&P_{M_{0}}(x_{1})\\ \vdots&\ddots&\vdots\\ P_{1}(x_{M})&\cdots&P_{M_{0}}(x_{M})\end{bmatrix}.\qquad(177)$$

Finally, this is equivalent to

$$\varphi(x_{1},\ldots,x_{M})=\det\big(({\boldsymbol{T}}_{(x_{1},\ldots,x_{M})})^{T}{\boldsymbol{T}}_{(x_{1},\ldots,x_{M})}\big)\qquad(178)$$

taking a nonzero value. We observe that $\varphi$ is a polynomial on the Euclidean space $(\mathbb{R}^{m})^{M}$.

Suppose there exists a set of points $(\bar{x}_{1},\ldots,\bar{x}_{M})\in(\mathbb{R}^{m})^{M}$ such that $T_{(\bar{x}_{1},\ldots,\bar{x}_{M})}$ is injective. Then for this set $\varphi(\bar{x}_{1},\ldots,\bar{x}_{M})\neq 0$. Obviously, $\varphi(0,\ldots,0)=0$, meaning that $\varphi$ cannot be constant. Thanks to the main result in Caron and Traynor (2005), this means that each level set of $\varphi$ has zero Lebesgue measure in $(\mathbb{R}^{m})^{M}$. In particular, the level set $\varphi^{-1}(0)$, consisting of those $x_{1},\ldots,x_{M}$ for which $T_{(x_{1},\ldots,x_{M})}$ fails to be injective, has zero Lebesgue measure. Therefore, it suffices to prove that there exists $(\bar{x}_{1},\ldots,\bar{x}_{M})\in(\mathbb{R}^{m})^{M}$ such that $T_{(\bar{x}_{1},\ldots,\bar{x}_{M})}$ is injective. We do this by induction.

It is clear that there exists $\bar{x}_{1}$ so that the $1\times 1$ matrix

$${\boldsymbol{T}}_{1}=\begin{bmatrix}P_{1}(\bar{x}_{1})\end{bmatrix}\qquad(179)$$

has full rank, since $P_{1}$ cannot be the zero polynomial. Proceeding by induction, we assume that there exist $\bar{x}_{1},\ldots,\bar{x}_{s}$ so that

$${\boldsymbol{T}}_{s}=\begin{bmatrix}P_{1}(\bar{x}_{1})&\cdots&P_{s}(\bar{x}_{1})\\ \vdots&\ddots&\vdots\\ P_{1}(\bar{x}_{s})&\cdots&P_{s}(\bar{x}_{s})\end{bmatrix}\qquad(180)$$

has full rank. Suppose that the matrix

$${\boldsymbol{\tilde{T}}}_{s+1}(x)=\begin{bmatrix}P_{1}(\bar{x}_{1})&\cdots&P_{s}(\bar{x}_{1})&P_{s+1}(\bar{x}_{1})\\ \vdots&\ddots&\vdots&\vdots\\ P_{1}(\bar{x}_{s})&\cdots&P_{s}(\bar{x}_{s})&P_{s+1}(\bar{x}_{s})\\ P_{1}(x)&\cdots&P_{s}(x)&P_{s+1}(x)\end{bmatrix}\qquad(181)$$

has rank $<s+1$ for every $x\in\mathbb{R}^{m}$. Since the upper-left $s\times s$ block of ${\boldsymbol{\tilde{T}}}_{s+1}(x)$ is ${\boldsymbol{T}}_{s}$, we must always have $\operatorname{rank}({\boldsymbol{\tilde{T}}}_{s+1}(x))=s$. The nullspace of ${\boldsymbol{\tilde{T}}}_{s+1}(x)$ is contained in the nullspace of the upper $s\times(s+1)$ block of ${\boldsymbol{\tilde{T}}}_{s+1}(x)$. Since both nullspaces are one-dimensional, they are equal. The upper $s\times(s+1)$ block of ${\boldsymbol{\tilde{T}}}_{s+1}(x)$ does not depend on $x$, so there is a fixed nonzero vector ${\boldsymbol{v}}\in\mathbb{R}^{s+1}$ so that ${\boldsymbol{\tilde{T}}}_{s+1}(x){\boldsymbol{v}}={\boldsymbol{0}}$ for every $x\in\mathbb{R}^{m}$. The last row of this expression reads

$$v_{1}P_{1}(x)+\cdots+v_{s+1}P_{s+1}(x)=0\qquad\forall x\in\mathbb{R}^{m},\qquad(182)$$

contradicting the linear independence of $P_{1},\ldots,P_{s+1}$. Therefore, there exists $\bar{x}_{s+1}$ so that ${\boldsymbol{T}}_{s+1}={\boldsymbol{\tilde{T}}}_{s+1}(\bar{x}_{s+1})$ has full rank. It follows by induction on $s$ that there exist $\bar{x}_{1},\ldots,\bar{x}_{M_{0}}\in\mathbb{R}^{m}$ so that ${\boldsymbol{T}}_{(\bar{x}_{1},\ldots,\bar{x}_{M_{0}})}={\boldsymbol{T}}_{M_{0}}$ has full rank. Choosing any $M-M_{0}$ additional points yields an injective ${\boldsymbol{T}}_{(\bar{x}_{1},\ldots,\bar{x}_{M})}$, which completes the proof.
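A quick numerical check of Lemma 42 (our sketch, using the monomial basis of polynomials $\mathbb{R}^{2}\to\mathbb{R}$ of degree $\leq 3$): the evaluation matrix (177) at $M=M_{0}$ random points has full column rank, in agreement with the almost-everywhere injectivity claim.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(5)
m, d = 2, 3                                        # polynomials R^2 -> R of degree <= 3
exps = [e for k in range(d + 1) for e in combinations_with_replacement(range(m), k)]

def monomials(x):                                  # the basis P_1, ..., P_{M_0} evaluated at x
    return np.array([np.prod([x[i] for i in e]) for e in exps])

M0 = len(exps)                                     # dim(P) = C(m + d, d) = 10
X = rng.standard_normal((M0, m))                   # M = M_0 random evaluation points
T = np.stack([monomials(x) for x in X])            # the matrix (177)
print(np.linalg.matrix_rank(T) == M0)              # True for almost every choice of points
```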

Proof [Proposition 21] The sum in (89) clearly defines a symmetric, positive-semidefinite bilinear form on ${\mathcal{F}}^{\prime}$. It remains to show that this bilinear form is positive-definite. Suppose that there is a function $f\in{\mathcal{F}}^{\prime}$ such that $\langle f,f\rangle_{L^{2}(\mu_{M})}=0$. Thanks to Lemma 42, our assumption that $M\geq\dim(\pi_{i}({\mathcal{F}}^{\prime}))$ means that the evaluation operator $T_{(x_{1},\ldots,x_{M})}$ is injective on $\pi_{i}({\mathcal{F}}^{\prime})$ for almost every $(x_{1},\ldots,x_{M})\in(\mathbb{R}^{m})^{M}$ with respect to Lebesgue measure. Since a finite union of sets of measure zero has measure zero, it follows that for almost every $(x_{1},\ldots,x_{M})\in(\mathbb{R}^{m})^{M}$ with respect to Lebesgue measure, $T_{(x_{1},\ldots,x_{M})}$ is injective on every $\pi_{i}({\mathcal{F}}^{\prime})$, $i=1,\ldots,n$. Defining the positive diagonal matrix

$${\boldsymbol{D}}=\frac{1}{\sqrt{N}}\begin{bmatrix}\sqrt{w_{1}}&&\\ &\ddots&\\ &&\sqrt{w_{M}}\end{bmatrix},\qquad(183)$$

and using (89) yields

$$0=\langle f,f\rangle_{L^{2}(\mu_{M})}=\sum_{j=1}^{n}\big({\boldsymbol{D}}T_{(x_{1},\ldots,x_{M})}\pi_{j}f\big)^{T}{\boldsymbol{D}}T_{(x_{1},\ldots,x_{M})}\pi_{j}f.\qquad(184)$$

This implies that $T_{(x_{1},\ldots,x_{M})}\pi_{j}f={\boldsymbol{0}}$ for $j=1,\ldots,n$. Since $T_{(x_{1},\ldots,x_{M})}$ is injective on each $\pi_{j}({\mathcal{F}}^{\prime})$, it follows that each $\pi_{j}f=0$, meaning that $f=0$. This completes the proof.

Appendix C Proof of Proposition 24

We begin by proving

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\exp(t\xi)}F={\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\xi}F={\mathcal{L}}_{\xi}{\mathcal{K}}_{\exp(t\xi)}F\qquad(185)$$

for every $F\in D({\mathcal{L}}_{\xi})$. To prove the first equality, we choose $p\in{\mathcal{M}}$, let $p^{\prime}=\theta_{\exp(t_{0}\xi)}(p)$, and compute

$$\frac{1}{t}\left[\big({\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{K}}_{\exp(t\xi)}F\big)(p)-\big({\mathcal{K}}_{\exp(t_{0}\xi)}F\big)(p)\right]=\frac{1}{t}\Theta_{\exp(-t_{0}\xi)}\circ\left({\mathcal{K}}_{\exp(t\xi)}F-F\right)\circ\theta_{\exp(t_{0}\xi)}(p)=\Theta_{\exp(-t_{0}\xi)}\left(\frac{1}{t}\left[\big({\mathcal{K}}_{\exp(t\xi)}F\big)(p^{\prime})-F(p^{\prime})\right]\right).\qquad(186)$$

Here, we have used the composition law for the operators, ${\mathcal{K}}_{gh}={\mathcal{K}}_{g}{\mathcal{K}}_{h}$, and the fact that $\Theta_{\exp(-t_{0}\xi)}$ is fiber-linear. Taking the limit as $t\to 0$ yields

$$\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=t_{0}}\big({\mathcal{K}}_{\exp(t\xi)}F\big)(p)=\Theta_{\exp(-t_{0}\xi)}\left({\mathcal{L}}_{\xi}F(p^{\prime})\right)=\big({\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{L}}_{\xi}F\big)(p),\qquad(187)$$

which is the first equality in (185).

The second equality in (185) follows from

$$\lim_{t\to 0}\frac{1}{t}\left[{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t_{0}\xi)}F-{\mathcal{K}}_{\exp(t_{0}\xi)}F\right]=\lim_{t\to 0}\frac{1}{t}\left[{\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{K}}_{\exp(t\xi)}F-{\mathcal{K}}_{\exp(t_{0}\xi)}F\right]={\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{L}}_{\xi}F.\qquad(188)$$

This shows that ${\mathcal{K}}_{\exp(t_{0}\xi)}F\in D({\mathcal{L}}_{\xi})$ and ${\mathcal{L}}_{\xi}{\mathcal{K}}_{\exp(t_{0}\xi)}F={\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{L}}_{\xi}F$.
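The identity (185) can be checked numerically in a minimal setting (our sketch, assuming a trivial bundle map $\Theta$ so that ${\mathcal{K}}_{g}F=F\circ\theta_{g}$, with scalar functions on $\mathbb{R}^{2}$ and rotations; since $\operatorname{SO}(2)$ is abelian, the composition-law conventions coincide): a finite-difference derivative of $t\mapsto{\mathcal{K}}_{\exp(t\xi)}F$ at $t=t_{0}$ matches ${\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{L}}_{\xi}F$.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])            # generator xi of rotations in so(2)
R = lambda t: np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])  # exp(t xi)

F      = lambda x: x[0] ** 3 + np.sin(x[1])        # a smooth scalar field on R^2
grad_F = lambda x: np.array([3 * x[0] ** 2, np.cos(x[1])])
lie_F  = lambda x: grad_F(x) @ (J @ x)             # L_xi F(x) = d/dt F(exp(t xi) x) at t = 0

x, t0, h = np.array([0.3, -1.2]), 0.7, 1e-6
K = lambda t: F(R(t) @ x)                          # (K_{exp(t xi)} F)(x) with trivial Theta
dKdt = (K(t0 + h) - K(t0 - h)) / (2 * h)           # finite-difference d/dt at t = t0
print(np.isclose(dKdt, lie_F(R(t0) @ x)))          # matches (K_{exp(t0 xi)} L_xi F)(x)
```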

Next, we prove

$${\mathcal{L}}_{\alpha\xi+\beta\eta}F=\alpha{\mathcal{L}}_{\xi}F+\beta{\mathcal{L}}_{\eta}F\qquad(189)$$

when $F\in C^{1}({\mathcal{M}},E)$. To do this, we choose $p\in{\mathcal{M}}$ and define the map $h:G\to E_{p}$ by

$$h:g\mapsto{\mathcal{K}}_{g}F(p)=\Theta\big(F(\theta(p,g)),g^{-1}\big).\qquad(190)$$

As a composition of $C^{1}$ maps, $h$ is $C^{1}$, and its derivative at the identity is

$$\operatorname{\mathrm{d}}h(e)\xi_{e}=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}h(\exp(t\xi))={\mathcal{L}}_{\xi}F(p)\qquad(191)$$

for every $\xi_{e}\in T_{e}G\cong\operatorname{Lie}(G)$. Since the derivative is linear, it follows that $\xi\mapsto{\mathcal{L}}_{\xi}F(p)$ is linear.

Finally, we prove that

$${\mathcal{L}}_{[\xi,\eta]}F=\frac{1}{2}\left.\frac{\operatorname{\mathrm{d}}^{2}}{\operatorname{\mathrm{d}}t^{2}}\right|_{t=0}{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F={\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}F-{\mathcal{L}}_{\eta}{\mathcal{L}}_{\xi}F\qquad(192)$$

when $F\in C^{2}({\mathcal{M}},E)$. Recall that $\operatorname{Fl}_{\xi}^{t}:g\mapsto g\cdot\exp(t\xi)$ gives the flow of the left-invariant vector field $\xi\in\operatorname{Lie}(G)$ (see Theorem 4.18(3) in Kolář et al. (1993)). By Theorem 3.16 in Kolář et al. (1993), the curve $\gamma:\mathbb{R}\to G$ given by

$$\gamma(t)=\operatorname{Fl}_{-\eta}^{t}\circ\operatorname{Fl}_{-\xi}^{t}\circ\operatorname{Fl}_{\eta}^{t}\circ\operatorname{Fl}_{\xi}^{t}(e)=\exp(t\xi)\exp(t\eta)\exp(-t\xi)\exp(-t\eta)\qquad(193)$$

satisfies $\gamma(0)=e$, $\gamma^{\prime}(0)=0$, and

$$\frac{1}{2}\gamma^{\prime\prime}(0)=[\xi,\eta]_{e}\in T_{e}G\qquad(194)$$

in the sense that $\gamma^{\prime\prime}(0):f\mapsto(f\circ\gamma)^{\prime\prime}(0)$ is a derivation on $C^{\infty}(G)$, hence an element of $T_{e}G$. Composing with the map in (190) yields

$$0=\operatorname{\mathrm{d}}h(e)\gamma^{\prime}(0)=(h\circ\gamma)^{\prime}(0)=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}F(p).\qquad(195)$$

Combining (194) and (191) (noting the definition of the tangent map $\operatorname{\mathrm{d}}h(e)$ acting on derivations, as in Kolář et al. (1993) and Lee (2013)) gives

$${\mathcal{L}}_{[\xi,\eta]}F(p)=\frac{1}{2}\operatorname{\mathrm{d}}h(e)\gamma^{\prime\prime}(0)=\frac{1}{2}(h\circ\gamma)^{\prime\prime}(0)=\frac{1}{2}\left.\frac{\operatorname{\mathrm{d}}^{2}}{\operatorname{\mathrm{d}}t^{2}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}F(p).\qquad(196)$$

This proves the first equality in (192) thanks to the composition law

$${\mathcal{K}}_{\gamma(t)}={\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}.\qquad(197)$$

To differentiate the above expression, we use the following observations. If $F_{t}\in\operatorname{\Sigma}(E)$ is such that $(t,p)\mapsto F_{t}(p)$ is $C^{2}(\mathbb{R}\times{\mathcal{M}},E)$, then clearly $\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}\in C^{1}({\mathcal{M}},E)$ with the usual identification $TE_{p}\cong E_{p}$. Moreover, we have

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{g}F_{t}(p)=\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\Theta_{g^{-1}}\big(F_{t}(\theta_{g}(p))\big)=\Theta_{g^{-1}}\Big(\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}(\theta_{g}(p))\Big)={\mathcal{K}}_{g}\Big(\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}\Big)(p)\qquad(198)$$

because $F_{t}(\theta_{g}(p))\in E_{\theta_{g}(p)}$ for all $t\in\mathbb{R}$ and $\Theta_{g^{-1}}$ is linear on $E_{\theta_{g}(p)}$. Using this, we obtain

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{L}}_{\xi}F_{t}(p)=\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}\tau}\right|_{\tau=0}{\mathcal{K}}_{\exp(\tau\xi)}F_{t}(p)=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}\tau}\right|_{\tau=0}\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\exp(\tau\xi)}F_{t}(p)=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}\tau}\right|_{\tau=0}{\mathcal{K}}_{\exp(\tau\xi)}\Big(\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}\Big)(p)={\mathcal{L}}_{\xi}\Big(\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}\Big)(p)\qquad(199)$$

because $(t,\tau)\mapsto{\mathcal{K}}_{\exp(\tau\xi)}F_{t}(p)$ lies in the vector space $E_{p}$, allowing us to exchange the order of differentiation. Since $(t_{1},t_{2},t_{3},t_{4})\mapsto{\mathcal{K}}_{\exp(t_{1}\xi)}{\mathcal{K}}_{\exp(t_{2}\eta)}{\mathcal{K}}_{\exp(-t_{3}\xi)}{\mathcal{K}}_{\exp(-t_{4}\eta)}F(p)$ lies in the vector space $E_{p}$ for all $(t_{1},t_{2},t_{3},t_{4})\in\mathbb{R}^{4}$, we can apply the chain rule and (198) to obtain

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\gamma(t)}F=\left.\frac{\partial}{\partial t_{1}}\right|_{t_{1}=t}{\mathcal{K}}_{\exp(t_{1}\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+{\mathcal{K}}_{\exp(t\xi)}\left.\frac{\partial}{\partial t_{2}}\right|_{t_{2}=t}{\mathcal{K}}_{\exp(t_{2}\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}\left.\frac{\partial}{\partial t_{3}}\right|_{t_{3}=t}{\mathcal{K}}_{\exp(-t_{3}\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}\left.\frac{\partial}{\partial t_{4}}\right|_{t_{4}=t}{\mathcal{K}}_{\exp(-t_{4}\eta)}F.\qquad(200)$$

Using (185) gives

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\gamma(t)}F={\mathcal{L}}_{\xi}\overbrace{{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}}^{{\mathcal{K}}_{\gamma(t)}}F+{\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\eta}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{L}}_{-\xi}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+\underbrace{{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}}_{{\mathcal{K}}_{\gamma(t)}}{\mathcal{L}}_{-\eta}F.\qquad(201)$$

Applying the same technique to differentiate a second time and using the linearity in (189) to cancel terms yields

$$\left.\frac{\operatorname{\mathrm{d}}^{2}}{\operatorname{\mathrm{d}}t^{2}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}F={\mathcal{L}}_{\xi}\underbrace{\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}F}_{0}+\underbrace{{\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}F+{\mathcal{L}}_{\eta}{\mathcal{L}}_{-\xi}F+{\mathcal{L}}_{\eta}{\mathcal{L}}_{-\xi}F+{\mathcal{L}}_{-\xi}{\mathcal{L}}_{-\eta}F}_{2\big({\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}F-{\mathcal{L}}_{\eta}{\mathcal{L}}_{\xi}F\big)}+\underbrace{\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}{\mathcal{L}}_{-\eta}F}_{0},\qquad(202)$$

which completes the proof. $\blacksquare$
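Both halves of (192) admit simple numerical checks (our sketch, not part of the proof). The first verifies (193)-(194) directly with matrix exponentials; the second verifies the commutator identity on linear observables $F(x)=w^{T}x$ under the right action $\theta_{g}(x)=g^{-1}x$ with trivial $\Theta$, in which case ${\mathcal{L}}_{\xi}$ acts on the coefficient vector as $w\mapsto-\xi^{T}w$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
xi, eta = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
comm = xi @ eta - eta @ xi

# (193)-(194): gamma(t) = exp(t xi) exp(t eta) exp(-t xi) exp(-t eta), gamma''(0)/2 = [xi, eta]
gamma = lambda t: expm(t * xi) @ expm(t * eta) @ expm(-t * xi) @ expm(-t * eta)
h = 1e-4
second = (gamma(h) - 2.0 * gamma(0.0) + gamma(-h)) / h ** 2
print(np.allclose(second / 2.0, comm, atol=1e-4))  # True

# (192) on linear observables F(x) = w^T x: L_xi sends the coefficient vector w to -xi^T w
L = lambda A: -A.T
print(np.allclose(L(comm), L(xi) @ L(eta) - L(eta) @ L(xi)))  # True
```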

Appendix D Proof of Theorem 25

We begin by showing that $\operatorname{Sym}_{G}(F)$ is a closed subgroup of $G$. It is obviously a subgroup, for if $g_{1},g_{2}\in\operatorname{Sym}_{G}(F)$ then

$${\mathcal{K}}_{g_{1}g_{2}}F={\mathcal{K}}_{g_{1}}{\mathcal{K}}_{g_{2}}F={\mathcal{K}}_{g_{1}}F=F,\qquad(203)$$

meaning that $g_{1}g_{2}\in\operatorname{Sym}_{G}(F)$. To show that $\operatorname{Sym}_{G}(F)$ is closed, we observe that for each $p\in{\mathcal{M}}$, the map $h_{p}:G\to E$ defined by

$$h_{p}:g\mapsto{\mathcal{K}}_{g}F(p)=\Theta\big(F(\theta(p,g)),g^{-1}\big)\qquad(204)$$

is continuous, as it is a composition of continuous maps. As $F(p)$ is a single point in $E$, the preimage set $h_{p}^{-1}\big(\{F(p)\}\big)$ is closed in $G$. Since $\operatorname{Sym}_{G}(F)$ is an intersection,

$$\operatorname{Sym}_{G}(F)=\bigcap_{p\in{\mathcal{M}}}h_{p}^{-1}\big(\{F(p)\}\big),\qquad(205)$$

of closed sets, it follows that $\operatorname{Sym}_{G}(F)$ is closed in $G$. By the closed subgroup theorem (Theorem 20.12 in Lee (2013)), it follows that $\operatorname{Sym}_{G}(F)$ is an embedded Lie subgroup of $G$.

Let ${\mathfrak{h}}=\operatorname{Lie}(\operatorname{Sym}_{G}(F))$ be the Lie algebra of $\operatorname{Sym}_{G}(F)$. Choosing any $\xi\in{\mathfrak{h}}$, we have $\exp(t\xi)\in\operatorname{Sym}_{G}(F)$ for every $t\in\mathbb{R}$, yielding

$$\lim_{t\to 0}\frac{1}{t}\left[{\mathcal{K}}_{\exp(t\xi)}F-F\right]=0.\qquad(206)$$

Hence, $F\in D({\mathcal{L}}_{\xi})$ and ${\mathcal{L}}_{\xi}F=0$, meaning that ${\mathfrak{h}}\subset\operatorname{\mathfrak{sym}}_{G}(F)$, as defined by (101).

To show the reverse containment, choose $\xi\in\operatorname{\mathfrak{sym}}_{G}(F)$, meaning that $F\in D({\mathcal{L}}_{\xi})$ and ${\mathcal{L}}_{\xi}F=0$. We observe that (98) in Proposition 24 yields

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\exp(t\xi)}F={\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\xi}F=0\qquad\forall t\in\mathbb{R}.\qquad(207)$$

It follows that ${\mathcal{K}}_{\exp(t\xi)}F=F$, that is, $\exp(t\xi)\in\operatorname{Sym}_{G}(F)$ for all $t\in\mathbb{R}$. Differentiating at $t=0$ proves that $\xi\in{\mathfrak{h}}$. Therefore, ${\mathfrak{h}}=\operatorname{\mathfrak{sym}}_{G}(F)$, which completes the proof. $\blacksquare$

Appendix E Proof of Theorem 26

If $F\in C({\mathcal{M}},E)$ is $G_{0}$-equivariant, then ${\mathcal{K}}_{\exp(t\xi)}F=F$ for all $\xi\in\operatorname{Lie}(G)$ and $t\in\mathbb{R}$. Differentiating with respect to $t$ at $t=0$ gives ${\mathcal{L}}_{\xi}F=0$.

Conversely, suppose that ${\mathcal{L}}_{\xi_{i}}F=0$ for a collection of generators $\xi_{1},\ldots,\xi_{q}$ of $\operatorname{Lie}(G)$. By Theorem 25, $\operatorname{Sym}_{G}(F)$ is a closed Lie subgroup of $G$ whose Lie subalgebra $\operatorname{\mathfrak{sym}}_{G}(F)$ contains $\xi_{1},\ldots,\xi_{q}$. Since $\xi_{1},\ldots,\xi_{q}$ generate $\operatorname{Lie}(G)$, it follows that $\operatorname{\mathfrak{sym}}_{G}(F)=\operatorname{Lie}(G)$. This means that $G_{0}\subset\operatorname{Sym}_{G}(F)$ due to the correspondence between connected Lie subgroups and their Lie subalgebras established by Theorem 19.26 in Lee (2013). Specifically, the identity component of $\operatorname{Sym}_{G}(F)$ must coincide with $G_{0}$ since both are connected Lie subgroups of $G$ with identical Lie subalgebras.

Now, let us suppose in addition that ${\mathcal{K}}_{g_{j}}F=F$ for an element $g_{j}$ from each non-identity component $G_{j}$, $j=1,\ldots,n_{G}-1$, of $G$. By Proposition 7.15 in Lee (2013), $G_{0}$ is a normal subgroup of $G$ and every connected component $G_{j}$ of $G$ is diffeomorphic to $G_{0}$. In fact, in the proof of this result it is shown that every connected component of $G$ is a coset of $G_{0}$, meaning that $G_{j}=G_{0}\cdot g_{j}$. Choosing any $g\in G_{j}$, there is an element $g_{0}\in G_{0}$ such that $g=g_{0}\cdot g_{j}$, and we obtain

$${\mathcal{K}}_{g}F={\mathcal{K}}_{g_{0}}{\mathcal{K}}_{g_{j}}F={\mathcal{K}}_{g_{0}}F=F.\qquad(208)$$

This completes the proof because $G=\bigcup_{j=0}^{n_{G}-1}G_{j}$. $\blacksquare$

Appendix F Proof of Theorem 35

Our proof of the theorem relies on the following technical lemma concerning the integral curves of vector fields tangent to weakly embedded, arcwise-closed submanifolds.

Lemma 43

Let ${\mathcal{M}}$ be an arcwise-closed, weakly embedded submanifold of a manifold ${\mathcal{N}}$. Let $V\in\mathfrak{X}({\mathcal{N}})$ be a vector field tangent to ${\mathcal{M}}$, that is,

$$V_{p}\in T_{p}{\mathcal{M}}\qquad\forall p\in{\mathcal{M}}.\qquad(209)$$

If $\gamma:I\to{\mathcal{N}}$ is a maximal integral curve of $V$ that intersects ${\mathcal{M}}$, then $\gamma$ lies in ${\mathcal{M}}$.

Proof By the translation lemma (Lemma 9.4 in Lee (2013)), we can assume without loss of generality that $0\in I$ and $p_{0}=\gamma(0)\in{\mathcal{M}}$. Let $\imath_{{\mathcal{M}}}:{\mathcal{M}}\hookrightarrow{\mathcal{N}}$ denote the inclusion map. Since ${\mathcal{M}}$ is an immersed submanifold of ${\mathcal{N}}$ and $V$ is tangent to ${\mathcal{M}}$, there is a unique smooth vector field $V|_{{\mathcal{M}}}\in\mathfrak{X}({\mathcal{M}})$ that is $\imath_{{\mathcal{M}}}$-related to $V$ thanks to Proposition 8.23 in Lee (2013). Let $\tilde{\gamma}:\tilde{I}\to{\mathcal{M}}$ be the maximal integral curve of $V|_{{\mathcal{M}}}$ with $\tilde{\gamma}(0)=p_{0}$. By the naturality of integral curves (Proposition 9.6 in Lee (2013)), $\imath_{{\mathcal{M}}}\circ\tilde{\gamma}$ is an integral curve of $V$ with $\imath_{{\mathcal{M}}}\circ\tilde{\gamma}(0)=p_{0}$. Since integral curves of smooth vector fields starting at the same point are unique (Theorem 9.12, part (a) in Lee (2013)), we have $\tilde{I}\subset I$ and

$$\imath_{{\mathcal{M}}}\circ\tilde{\gamma}(t)=\gamma(t)\qquad\forall t\in\tilde{I}.\qquad(210)$$

Therefore, it remains to show that $\tilde{I}=I$.

By the local existence of integral curves (Proposition 9.2 in Lee (2013)), the domains $I$ and $\tilde{I}=(a,b)$ of the maximal integral curves $\gamma$ and $\tilde{\gamma}$ are open intervals in $\mathbb{R}$. Suppose, for the sake of producing a contradiction, that there exists $t\in I$ with $t>s$ for every $s\in\tilde{I}$. Then it follows that the least upper bound $b=\sup\tilde{I}$ is an element of $I$. By (210) and continuity of $\gamma$, we have

$$q_{0}=\gamma(b)=\lim_{t\to b}\imath_{{\mathcal{M}}}\circ\tilde{\gamma}(t).\qquad(211)$$

Since ${\mathcal{M}}$ is arcwise-closed, it follows that $q_{0}\in{\mathcal{M}}$.

To complete the proof, we use the local existence of an integral curve for $V|_{{\mathcal{M}}}$ starting at $q_{0}$ to contradict the maximality of $\tilde{\gamma}$. By the local existence of integral curves (Proposition 9.2 in Lee (2013)) and the translation lemma (Lemma 9.4 in Lee (2013)), there is an $\varepsilon>0$ and an integral curve $\hat{\gamma}:(b-\varepsilon,b+\varepsilon)\to{\mathcal{M}}$ of $V|_{{\mathcal{M}}}$ such that $\hat{\gamma}(b)=q_{0}=\gamma(b)$. Shrinking the interval, we take $0<\varepsilon<b-a$. Again, by naturality and uniqueness of integral curves, we must have $\imath_{{\mathcal{M}}}\circ\hat{\gamma}(t)=\gamma(t)$ for all $t\in(b-\varepsilon,b+\varepsilon)$. Hence, by (210) and injectivity of $\imath_{{\mathcal{M}}}$, it follows that $\hat{\gamma}(t)=\tilde{\gamma}(t)$ for all $t\in(b-\varepsilon,b)$. Applying the gluing lemma (Corollary 2.8 in Lee (2013)) to $\tilde{\gamma}$ and $\hat{\gamma}$ yields an extension of $\tilde{\gamma}$ to the larger open interval $\tilde{I}\cup(b-\varepsilon,b+\varepsilon)$. Since this contradicts the maximality of $\tilde{\gamma}$, there is no $t\in I$ exceeding all of $\tilde{I}$. The same argument shows that there is no $t\in I$ below all of $\tilde{I}$, and so we must have $\tilde{I}=I$.

We also use the following lemma describing the elements of a Lie group that can be constructed from products of exponentials.

Lemma 44

Let $G_{0}$ be the identity component of a Lie group $G$. Then every element $g\in G_{0}$ can be expressed as a finite product $g=h_{m}\cdots h_{1}$ of elements $h_{i}=\exp(\xi_{i})$ for $\xi_{i}\in\operatorname{Lie}(G)$. Let $G_{i}$ be a connected component of $G$ and let $g_{i}\in G_{i}$. Then every element $g\in G_{i}$ can be expressed as $g=g_{0}g_{i}$ for some $g_{0}\in G_{0}$.

Proof By the inverse function theorem (more specifically, by Proposition 20.8(f) in Lee (2013)), the range of the exponential map contains an open, connected neighborhood ${\mathcal{U}}$ of the identity element $e\in G$. The inverses of the elements in ${\mathcal{U}}$ also belong to the range of the exponential map thanks to Proposition 20.8(c) in Lee (2013). By Proposition 7.14(b) and Proposition 7.15 in Lee (2013), it follows that ${\mathcal{U}}$ generates the identity component $G_{0}$ of $G$. That is, any element $g\in G_{0}$ can be written as a finite product of elements in ${\mathcal{U}}$ and their inverses, which proves the first claim.

By Proposition 7.15 in Lee (2013), $G_{0}$ is a normal subgroup of $G$ and every connected component of $G$ is diffeomorphic to $G_{0}$. In fact, in the proof of this result it is shown that every connected component of $G$ is a coset of $G_{0}$. Therefore, if $G_{i}$ is a non-identity connected component of $G$ and $g_{i}\in G_{i}$, then $G_{i}=G_{0}\cdot g_{i}$, which proves the second claim.

Proof [Theorem 35] The set $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ is a subspace of $\operatorname{Lie}(G)$, for if $\xi_{1},\xi_{2}\in\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ and $a_{1},a_{2}\in\mathbb{R}$ then

$$\hat{\theta}(a_{1}\xi_{1}+a_{2}\xi_{2})_{p}=a_{1}\underbrace{\hat{\theta}(\xi_{1})_{p}}_{\in T_{p}{\mathcal{M}}}+a_{2}\underbrace{\hat{\theta}(\xi_{2})_{p}}_{\in T_{p}{\mathcal{M}}}\in T_{p}{\mathcal{M}}\qquad(212)$$

thanks to linearity of the infinitesimal generator $\hat{\theta}$. To show that $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ is a Lie subalgebra, we must show that it is also closed under the Lie bracket. Recall that $\hat{\theta}$ is a Lie algebra homomorphism (see Theorem 20.15 in Lee (2013)), and so $\hat{\theta}([\xi_{1},\xi_{2}])=[\hat{\theta}(\xi_{1}),\hat{\theta}(\xi_{2})]$. Since the Lie bracket of two vector fields tangent to an immersed submanifold is also tangent to the submanifold (see Corollary 8.32 in Lee (2013)), it follows that $[\hat{\theta}(\xi_{1}),\hat{\theta}(\xi_{2})]$ is tangent to ${\mathcal{M}}$. Hence, $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ is closed under the Lie bracket and is therefore a Lie subalgebra of $\operatorname{Lie}(G)$. By Theorem 19.26 in Lee (2013), there is a unique connected Lie subgroup $H\subset G$ whose Lie subalgebra is $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$.

Now suppose that ${\mathcal{M}}$ is weakly embedded and arcwise-closed in ${\mathcal{N}}$. We aim to show that ${\mathcal{M}}\cdot H\subset{\mathcal{M}}$. Choosing any $\xi\in\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$, Lemma 20.14 in Lee (2013) shows that $\xi$, regarded as a left-invariant vector field on $G$, and $\hat{\xi}=\hat{\theta}(\xi)$ are $\theta^{(p)}$-related for every $p\in{\mathcal{N}}$. By the naturality of integral curves (Proposition 9.6 in Lee (2013)), it follows that $\gamma_{\xi}^{(p)}:\mathbb{R}\to{\mathcal{N}}$ defined by

$$\gamma_{\xi}^{(p)}(t)=p\cdot\exp(t\xi)\qquad(213)$$

is the unique maximal integral curve of $\hat{\xi}$ passing through $p$ at $t=0$. When $p\in{\mathcal{M}}$, this integral curve lies in ${\mathcal{M}}$ thanks to Lemma 43. This means that ${\mathcal{M}}$ is invariant under the action of any group element in the range of the exponential map restricted to $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$. Proceeding by induction, suppose that ${\mathcal{M}}$ is invariant under the action of any product of $m$ such elements. If $g=h_{1}\cdots h_{m}\cdot h_{m+1}$ is a product of $m+1$ elements $h_{i}\in\exp\big(\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})\big)\subset H$, then it follows from associativity and the induction hypothesis that

$${\mathcal{M}}\cdot(h_{1}\cdots h_{m}\cdot h_{m+1})=({\mathcal{M}}\cdot h_{1}\cdots h_{m})\cdot h_{m+1}\subset{\mathcal{M}}\cdot h_{m+1}\subset{\mathcal{M}}.\qquad(214)$$

Therefore, ${\mathcal{M}}$ is invariant under the action of any finite product of group elements in $\exp\big(\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})\big)$ by induction on $m$. By Lemma 44, it follows that ${\mathcal{M}}$ is $H$-invariant, proving claim (i).

To prove claim (ii), suppose that $\tilde{H}$ is another connected Lie subgroup of $G$ such that ${\mathcal{M}}\cdot\tilde{H}\subset{\mathcal{M}}$. Choosing any $p\in{\mathcal{M}}$ and $\xi\in\operatorname{Lie}(\tilde{H})$, we have

$$p\cdot\exp(t\xi)\in{\mathcal{M}}\qquad\forall t\in\mathbb{R}.\qquad(215)$$

Since ${\mathcal{M}}$ is weakly embedded in ${\mathcal{N}}$, this defines a smooth curve $\gamma:\mathbb{R}\to{\mathcal{M}}$ such that $\imath_{{\mathcal{M}}}\circ\gamma(t)=p\cdot\exp(t\xi)$, where $\imath_{{\mathcal{M}}}:{\mathcal{M}}\hookrightarrow{\mathcal{N}}$ is the inclusion map. Differentiating and using the definition of the infinitesimal generator gives

$$\hat{\theta}(\xi)_{p}=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}\right|_{t=0}p\cdot\exp(t\xi)=\operatorname{\mathrm{d}}\imath_{{\mathcal{M}}}(p)\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}\right|_{t=0}\gamma(t)\in T_{p}{\mathcal{M}}.\qquad(216)$$

Therefore, $\operatorname{Lie}(\tilde{H})\subset\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$, which implies that $\tilde{H}\subset H$ by Theorem 19.26 in Lee (2013), establishing claim (ii).

Now suppose that {\mathcal{M}} is properly embedded in 𝒩{\mathcal{N}} and denote

SymG()={gG:g}=p(θ(p))1().\operatorname{Sym}_{G}({\mathcal{M}})=\{g\in G\ :{\mathcal{M}}\cdot g\subset{\mathcal{M}}\}=\bigcap_{p\in{\mathcal{M}}}\big{(}\theta^{(p)}\big{)}^{-1}({\mathcal{M}}). (217)

The equality of these expressions is a simple matter of unwinding their definitions. It is clear that $\operatorname{Sym}_{G}({\mathcal{M}})$ is a subgroup of $G$, for if $g_{1},g_{2}\in\operatorname{Sym}_{G}({\mathcal{M}})$ then the composition law for the group action gives ${\mathcal{M}}\cdot(g_{1}\cdot g_{2})=({\mathcal{M}}\cdot g_{1})\cdot g_{2}\subset{\mathcal{M}}\cdot g_{2}\subset{\mathcal{M}}$. Since ${\mathcal{M}}$ is properly embedded, it is closed in ${\mathcal{N}}$ (see Lee (2013, Proposition 5.5)), meaning that each preimage set $\big(\theta^{(p)}\big)^{-1}({\mathcal{M}})$ is closed in $G$ by continuity of $\theta^{(p)}$. As an intersection of closed subsets, it follows that $\operatorname{Sym}_{G}({\mathcal{M}})$ is closed in $G$. By the closed subgroup theorem (Lee (2013, Theorem 20.12)), $\operatorname{Sym}_{G}({\mathcal{M}})$ is a properly embedded Lie subgroup of $G$. The same holds for the identity component $\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ of $\operatorname{Sym}_{G}({\mathcal{M}})$, since $\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ is closed in $\operatorname{Sym}_{G}({\mathcal{M}})$, which implies that $\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ is closed in $G$.

Finally, we show that $H=\operatorname{Sym}_{G}({\mathcal{M}})_{0}$, the identity component of $\operatorname{Sym}_{G}({\mathcal{M}})$. First, we observe that $H\subset\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ because $H$ is connected and contained in $\operatorname{Sym}_{G}({\mathcal{M}})$. The reverse containment follows from the fact that $\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ is a connected Lie subgroup satisfying ${\mathcal{M}}\cdot\operatorname{Sym}_{G}({\mathcal{M}})_{0}\subset{\mathcal{M}}$, which by claim (ii) implies that $\operatorname{Sym}_{G}({\mathcal{M}})_{0}\subset H$. $\blacksquare$

Appendix G Proof of Theorem 36

First, suppose that ${\mathcal{M}}$ is $G_{0}$-invariant. In particular, this means that for every $p\in{\mathcal{M}}$ and $\xi\in\operatorname{Lie}(G)$, the smooth curve $\gamma_{\xi}^{(p)}:\mathbb{R}\to{\mathcal{N}}$ defined by

\gamma_{\xi}^{(p)}(t)=p\cdot\exp(t\xi) \qquad (218)

lies in ${\mathcal{M}}$. Since ${\mathcal{M}}$ is weakly embedded in ${\mathcal{N}}$, $\gamma_{\xi}^{(p)}$ is also smooth as a map into ${\mathcal{M}}$. Specifically, there is a smooth curve $\tilde{\gamma}_{\xi}^{(p)}:\mathbb{R}\to{\mathcal{M}}$ so that $\gamma_{\xi}^{(p)}=\imath_{{\mathcal{M}}}\circ\tilde{\gamma}_{\xi}^{(p)}$, where $\imath_{{\mathcal{M}}}:{\mathcal{M}}\hookrightarrow{\mathcal{N}}$ is the inclusion map. Differentiating at $t=0$ yields

\hat{\theta}(\xi)_{p}=\left.\frac{\mathrm{d}}{\mathrm{d}t}\gamma_{\xi}^{(p)}(t)\right|_{t=0}=\mathrm{d}\imath_{{\mathcal{M}}}(p)\left.\frac{\mathrm{d}}{\mathrm{d}t}\tilde{\gamma}_{\xi}^{(p)}(t)\right|_{t=0}, \qquad (219)

which lies in $T_{p}{\mathcal{M}}=\operatorname{Range}\left(\mathrm{d}\imath_{{\mathcal{M}}}(p)\right)$. In particular, $\hat{\theta}(\xi_{i})_{p}\in T_{p}{\mathcal{M}}$ for every $p\in{\mathcal{M}}$ and $i=1,\ldots,q$.

Conversely, suppose that the tangency condition expressed in (139) holds. By Theorem 35, the elements $\xi_{1},\ldots,\xi_{q}$ belong to the Lie subalgebra $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ of the largest connected Lie subgroup $H\subset G$ of symmetries of ${\mathcal{M}}$. Since $\xi_{1},\ldots,\xi_{q}$ generate $\operatorname{Lie}(G)$, it follows that $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})=\operatorname{Lie}(G)$. Therefore, by Theorem 19.26 in Lee (2013), we obtain $H=G_{0}$ because both are connected Lie subgroups of $G$ with identical Lie subalgebras.

Finally, suppose, in addition, that ${\mathcal{M}}\cdot g_{j}\subset{\mathcal{M}}$ for an element $g_{j}$ from each non-identity component $G_{j}$ of $G$. By Lemma 44, if $g\in G_{j}$ then there is an element $g_{0}\in G_{0}$ such that $g=g_{0}\cdot g_{j}$. Therefore, we obtain

{\mathcal{M}}\cdot g={\mathcal{M}}\cdot g_{0}\cdot g_{j}\subset{\mathcal{M}}\cdot g_{j}\subset{\mathcal{M}}, \qquad (220)

which completes the proof because $G=\bigcup_{j=0}^{n_{G}-1}G_{j}$. $\blacksquare$
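Theorem 36 thus reduces checking full $G$-invariance to a tangency test on the generators plus one group element per non-identity component. A small numerical sketch for $G=O(2)$ acting on the circle ${\mathcal{M}}=S^{1}$ (an illustrative example, not drawn from the paper; `scipy.linalg.expm` is assumed for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

J = np.array([[0.0, -1.0],
              [1.0, 0.0]])                 # generator of so(2) = Lie(O(2))
g1 = np.diag([1.0, -1.0])                  # a reflection from the non-identity component

p = np.array([np.cos(0.3), np.sin(0.3)])   # a point on the circle M = S^1

# Tangency of the generator certifies invariance under G_0 = SO(2) ...
assert abs(p @ (J @ p)) < 1e-12
# ... and one reflection certifies the remaining component of O(2):
assert abs(np.linalg.norm(g1 @ p) - 1.0) < 1e-12

# Spot-check: arbitrary elements of both components preserve the circle.
for s in np.linspace(0.0, 2.0 * np.pi, 7):
    for g in (expm(s * J), g1 @ expm(s * J)):
        assert abs(np.linalg.norm(g @ p) - 1.0) < 1e-12
```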

Appendix H Proof of Theorem 39

Proof [Lemma 38] The map $\imath_{F}$ defined in a local trivialization $\Phi$ by (140) is injective. It is a vector bundle homomorphism because $\mathrm{d}\Phi\circ\imath_{F}\circ\Phi^{-1}$, $\Phi$, and $\mathrm{d}\Phi$ are vector bundle homomorphisms and $\Phi$ and $\mathrm{d}\Phi$ are invertible. It remains to show that the definition of $\imath_{F}$ does not depend on the choice of local trivialization. Given two local trivializations $\Phi$ and $\tilde{\Phi}$ defined on $\pi^{-1}({\mathcal{U}})\subset E$, where ${\mathcal{U}}$ is an open subset of ${\mathcal{M}}$, it suffices to show that the following diagram commutes:

\begin{tikzcd}
{\mathcal{U}}\times\mathbb{R}^{k} \arrow[d, "\imath_{\Phi\circ F}"'] & \pi^{-1}({\mathcal{U}})\subset E \arrow[l, "\Phi"'] \arrow[r, "\tilde{\Phi}"] \arrow[d, "\imath_{F}"] & {\mathcal{U}}\times\mathbb{R}^{k} \arrow[d, "\imath_{\tilde{\Phi}\circ F}"] \\
T({\mathcal{U}}\times\mathbb{R}^{k}) & T(\pi^{-1}({\mathcal{U}}))\subset TE \arrow[l, "\mathrm{d}\Phi"] \arrow[r, "\mathrm{d}\tilde{\Phi}"'] & T({\mathcal{U}}\times\mathbb{R}^{k})
\end{tikzcd} \qquad (221)

Since $\tilde{\Phi}\circ\Phi^{-1}$ is a bundle homomorphism descending to the identity, it can be written as

\tilde{\Phi}\circ\Phi^{-1}:(p,{\boldsymbol{v}})\mapsto(p,{\boldsymbol{T}}(p){\boldsymbol{v}}) \qquad (222)

for a matrix-valued function ${\boldsymbol{T}}:{\mathcal{U}}\to\mathbb{R}^{k\times k}$. Moreover, the matrices are invertible because the local trivializations are bundle isomorphisms. Differentiating, we obtain

\mathrm{d}\tilde{\Phi}\circ\mathrm{d}\Phi^{-1}:(w_{p},{\boldsymbol{w}})_{(p,{\boldsymbol{v}})}\mapsto\big(w_{p},\ \big(\mathrm{d}{\boldsymbol{T}}(p)w_{p}\big){\boldsymbol{v}}+{\boldsymbol{T}}(p){\boldsymbol{w}}\big)_{\tilde{\Phi}\circ\Phi^{-1}(p,{\boldsymbol{v}})}, \qquad (223)

where $w_{p}\in T_{p}{\mathcal{U}}$. Composing this with $\imath_{\Phi\circ F}:(p,{\boldsymbol{v}})\mapsto(0,{\boldsymbol{v}})_{\Phi(F(p))}$, we obtain

\mathrm{d}\tilde{\Phi}\circ\mathrm{d}\Phi^{-1}\circ\imath_{\Phi\circ F}(p,{\boldsymbol{v}})=(0,{\boldsymbol{T}}(p){\boldsymbol{v}})_{\tilde{\Phi}(F(p))}=\imath_{\tilde{\Phi}\circ F}(p,{\boldsymbol{T}}(p){\boldsymbol{v}})=\imath_{\tilde{\Phi}\circ F}\circ\tilde{\Phi}\circ\Phi^{-1}(p,{\boldsymbol{v}}), \qquad (224)

proving that the diagram commutes. $\blacksquare$
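The fiber component of (223) is simply the product rule for $t\mapsto{\boldsymbol{T}}(p(t)){\boldsymbol{v}}(t)$. A quick finite-difference check, assuming for illustration a transition function ${\boldsymbol{T}}(p)$ given by rotation through angle $p$ over the base ${\mathcal{U}}=\mathbb{R}$:

```python
import numpy as np

# Transition matrices T(p): rotation by angle p, with derivative dT(p).
T  = lambda p: np.array([[np.cos(p), -np.sin(p)],
                         [np.sin(p),  np.cos(p)]])
dT = lambda p: np.array([[-np.sin(p), -np.cos(p)],
                         [ np.cos(p), -np.sin(p)]])

p, v  = 0.5, np.array([1.0, 2.0])      # a point (p, v) in the trivialization
wp, w = 0.7, np.array([-0.2, 0.4])     # a tangent vector (w_p, w) at (p, v)

# Push the curve t -> (p + t*wp, v + t*w) through (p, v) -> (p, T(p) v)
# and differentiate at t = 0; compare with the fiber component of (223).
h = 1e-6
fd = (T(p + h * wp) @ (v + h * w) - T(p - h * wp) @ (v - h * w)) / (2 * h)
assert np.allclose(fd, (dT(p) * wp) @ v + T(p) @ w, atol=1e-6)
```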

Proof [Theorem 39] We observe that $F\circ\pi:E\to E$ is a smooth idempotent map whose image is $\operatorname{im}(F)\subset E$. By differentiating the expression $(F\circ\pi)\circ(F\circ\pi)=F\circ\pi$ at a point $F(p)\in\operatorname{im}(F)$, we obtain

\mathrm{d}(F\circ\pi)(F(p))\,\mathrm{d}(F\circ\pi)(F(p))=\mathrm{d}(F\circ\pi)(F(p)), \qquad (225)

meaning that $\mathrm{d}(F\circ\pi)(F(p)):T_{F(p)}E\to T_{F(p)}E$ is a linear projection. Since

\mathrm{d}(F\circ\pi)(F(p))=\mathrm{d}F(p)\,\mathrm{d}\pi(F(p)), \qquad (226)

we have $\operatorname{Range}\big(\mathrm{d}(F\circ\pi)(F(p))\big)\subset\operatorname{Range}(\mathrm{d}F(p))=T_{F(p)}\operatorname{im}(F)$. Differentiating $F=(F\circ\pi)\circ F$ yields

\mathrm{d}F(p)=\mathrm{d}(F\circ\pi)(F(p))\,\mathrm{d}F(p), \qquad (227)

meaning that $T_{F(p)}\operatorname{im}(F)\subset\operatorname{Range}\big(\mathrm{d}(F\circ\pi)(F(p))\big)$. Combining the two containments gives $\operatorname{Range}\big(\mathrm{d}(F\circ\pi)(F(p))\big)=T_{F(p)}\operatorname{im}(F)$, so $\mathrm{d}(F\circ\pi)(F(p))$ is a linear projection onto $T_{F(p)}\operatorname{im}(F)$.
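For intuition, when $F$ is a section of the trivial bundle $\mathbb{R}\times\mathbb{R}$, i.e., the graph of a scalar function $f$, this projection can be written down explicitly. A minimal sketch with the arbitrary illustrative choice $f=\tanh$:

```python
import numpy as np

f  = np.tanh                          # F(p) = (p, f(p)) is a section of R x R
df = lambda p: 1.0 - np.tanh(p)**2

p = 0.7
# Jacobian of (F o pi)(p, v) = (p, f(p)) at the point F(p), in coordinates (p, v):
P = np.array([[1.0,   0.0],
              [df(p), 0.0]])
assert np.allclose(P @ P, P)          # idempotent: a linear projection
# Its range is spanned by (1, f'(p)), the tangent line to the graph im(F).
```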

We observe that the generalized Lie derivative in (97) can be expressed as

\begin{aligned}
({\mathcal{L}}_{\xi}F)(p) &=\lim_{t\to 0}\frac{1}{t}\left[\Theta_{\exp(-t\xi)}(F(\theta_{\exp(t\xi)}(p)))-F(p)\right] \\
&=\lim_{t\to 0}\Theta\left(\exp(-t\xi),\ \frac{1}{t}\left[F(\theta_{\exp(t\xi)}(p))-\Theta_{\exp(t\xi)}(F(p))\right]\right) \\
&=\lim_{t\to 0}\frac{1}{t}\left[F(\theta_{\exp(t\xi)}(p))-\Theta_{\exp(t\xi)}(F(p))\right].
\end{aligned} \qquad (228)

The first equality follows because $\Theta_{g^{-1}}$ is a vector bundle homomorphism, meaning that the restricted map $\Theta_{g^{-1}}|_{E_{p\cdot g}}:E_{p\cdot g}\to E_{p}$ is linear; here $g=\exp(t\xi)$. The second equality follows because $\Theta:E\times G\to E$ is continuous. Note that in the first expression the limit is taken in the vector space $E_{p}$, whereas in the last expression the limit must be taken in $E$.

We proceed by expressing everything in a local trivialization $\Phi:\pi^{-1}({\mathcal{U}})\to{\mathcal{U}}\times\mathbb{R}^{k}$ of an open neighborhood ${\mathcal{U}}\subset{\mathcal{M}}$ of $p\in{\mathcal{M}}$. Since the maps $\Theta_{g}$, $\Phi$, and $\Phi^{-1}$ are vector bundle homomorphisms, there is a matrix-valued function ${\boldsymbol{T}}_{g}:{\mathcal{U}}\to\mathbb{R}^{k\times k}$ such that

\tilde{\Theta}_{g}=\Phi\circ\Theta_{g}\circ\Phi^{-1}:(p,{\boldsymbol{v}})\mapsto(\theta_{g}(p),\ {\boldsymbol{T}}_{g}(p){\boldsymbol{v}}). \qquad (229)

Differentiating $\tilde{\Theta}_{\exp(t\xi)}(p,{\boldsymbol{v}})$ with respect to $t$ yields the generator

\hat{\tilde{\Theta}}(\xi)_{(p,{\boldsymbol{v}})}=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\tilde{\Theta}_{\exp(t\xi)}(p,{\boldsymbol{v}})=\left(\hat{\theta}(\xi)_{p},\ \hat{\boldsymbol{T}}(\xi)_{p}{\boldsymbol{v}}\right)_{(p,{\boldsymbol{v}})}, \qquad (230)

where $\hat{\boldsymbol{T}}(\xi)_{p}=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}{\boldsymbol{T}}_{\exp(t\xi)}(p)$. We define the function $\tilde{\boldsymbol{F}}:{\mathcal{U}}\to\mathbb{R}^{k}$ by

(p,\tilde{\boldsymbol{F}}(p))=\Phi\circ F(p)\qquad\forall p\in{\mathcal{U}}. \qquad (231)

Using the above definitions, we can express the generalized Lie derivative in the local trivialization:

\begin{aligned}
\Phi\circ({\mathcal{L}}_{\xi}F)(p) &=\lim_{t\to 0}\left(\theta_{\exp(t\xi)}(p),\ \frac{1}{t}\left[\tilde{\boldsymbol{F}}(\theta_{\exp(t\xi)}(p))-{\boldsymbol{T}}_{\exp(t\xi)}(p)\tilde{\boldsymbol{F}}(p)\right]\right) \\
&=\left(p,\ \mathrm{d}\tilde{\boldsymbol{F}}(p)\hat{\theta}(\xi)_{p}-\hat{\boldsymbol{T}}(\xi)_{p}\tilde{\boldsymbol{F}}(p)\right).
\end{aligned} \qquad (232)
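Formula (232) is straightforward to validate against the last limit in (228) by finite differences. A hedged sketch for the trivial bundle $\mathbb{R}^{2}\times\mathbb{R}^{2}$ with $SO(2)$ rotating both the base point and the fiber, and an arbitrary illustrative choice of $\tilde{\boldsymbol{F}}$:

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])               # so(2) generator: theta-hat(xi)_p = J p
R = lambda t: np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])     # theta_exp(t xi); here T_exp(t xi)(p) = R(t) as well

F  = lambda p: np.array([p[0]**2, p[0] * p[1]])       # F-tilde in the trivialization
dF = lambda p: np.array([[2 * p[0], 0.0],
                         [p[1],     p[0]]])           # its Jacobian

p = np.array([0.8, -0.3])
lie_formula = dF(p) @ (J @ p) - J @ F(p)              # (232): dF(p) theta-hat(xi)_p - T-hat(xi)_p F(p)

t = 1e-6                                              # the last limit in (228)
lie_fd = (F(R(t) @ p) - R(t) @ F(p)) / t
assert np.allclose(lie_formula, lie_fd, atol=1e-5)
```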

Applying Lemma 38 allows us to express the left-hand side of (141) as

\mathrm{d}\Phi\left[\imath_{F}\circ({\mathcal{L}}_{\xi}F)(p)\right]=\left(0,\ \mathrm{d}\tilde{\boldsymbol{F}}(p)\hat{\theta}(\xi)_{p}-\hat{\boldsymbol{T}}(\xi)_{p}\tilde{\boldsymbol{F}}(p)\right)_{\Phi(F(p))}. \qquad (233)

We can also express the quantities on the right-hand side of (141) in the local trivialization. To do this, we compute

\begin{aligned}
\mathrm{d}\Phi(F(p))\,\mathrm{d}(F\circ\pi)(F(p))\,\hat{\Theta}(\xi)_{F(p)} &=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\Phi\circ F\circ\pi\circ\Theta_{\exp(t\xi)}(F(p)) \\
&=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\Phi\circ F(\theta_{\exp(t\xi)}(p)) \\
&=\left(\hat{\theta}(\xi)_{p},\ \mathrm{d}\tilde{\boldsymbol{F}}(p)\hat{\theta}(\xi)_{p}\right)_{\Phi(F(p))}
\end{aligned} \qquad (234)

and

\begin{aligned}
\mathrm{d}\Phi(F(p))\,\hat{\Theta}(\xi)_{F(p)} &=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\Phi\circ\Theta_{\exp(t\xi)}\circ\Phi^{-1}\circ\Phi\circ F(p) \\
&=\hat{\tilde{\Theta}}(\xi)_{\Phi(F(p))}=\left(\hat{\theta}(\xi)_{p},\ \hat{\boldsymbol{T}}(\xi)_{p}\tilde{\boldsymbol{F}}(p)\right)_{\Phi(F(p))}.
\end{aligned} \qquad (235)

Subtracting these yields

\mathrm{d}\Phi\left[\left(\mathrm{d}(F\circ\pi)(F(p))-\operatorname{Id}_{T_{F(p)}E}\right)\hat{\Theta}(\xi)_{F(p)}\right]=\left(0,\ \mathrm{d}\tilde{\boldsymbol{F}}(p)\hat{\theta}(\xi)_{p}-\hat{\boldsymbol{T}}(\xi)_{p}\tilde{\boldsymbol{F}}(p)\right)_{\Phi(F(p))}, \qquad (236)

which, upon comparison with (233), completes the proof. $\blacksquare$
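In the same illustrative $SO(2)$ setting as above, the equality of (233) and (236), and hence identity (141), can be confirmed with block matrices. This is again a hedged numerical sketch, not the paper's implementation:

```python
import numpy as np

J  = np.array([[0.0, -1.0], [1.0, 0.0]])              # so(2) generator
F  = lambda p: np.array([p[0]**2, p[0] * p[1]])       # F-tilde for the section F(p) = (p, F-tilde(p))
dF = lambda p: np.array([[2 * p[0], 0.0],
                         [p[1],     p[0]]])

p = np.array([0.8, -0.3])
# d(F o pi) at F(p) in coordinates (w_p, w), and the bundle-action generator at F(p):
P  = np.block([[np.eye(2), np.zeros((2, 2))],
               [dF(p),     np.zeros((2, 2))]])
Xi = np.concatenate([J @ p, J @ F(p)])                # Theta-hat(xi)_{F(p)} = (J p, J F-tilde(p))

lhs = np.concatenate([np.zeros(2), dF(p) @ (J @ p) - J @ F(p)])   # (233)
rhs = (P - np.eye(4)) @ Xi                                        # (236)
assert np.allclose(lhs, rhs)
```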