

A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning

Samuel E. Otto ([email protected])
AI Institute in Dynamic Systems, University of Washington, Seattle, WA 98195-4322, USA

Nicholas Zolman ([email protected])
AI Institute in Dynamic Systems, University of Washington, Seattle, WA 98195-4322, USA

J. Nathan Kutz ([email protected])
AI Institute in Dynamic Systems, University of Washington, Seattle, WA 98195-4322, USA

Steven L. Brunton ([email protected])
AI Institute in Dynamic Systems, University of Washington, Seattle, WA 98195-4322, USA

Present affiliation: Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY, USA
Abstract

Symmetry is present throughout nature and continues to play an increasingly central role in physics and machine learning. Fundamental symmetries, such as Poincaré invariance, allow physical laws discovered in laboratories on Earth to be extrapolated to the farthest reaches of the universe. Symmetry is essential to achieving this extrapolatory power in machine learning applications. For example, translation invariance in image classification allows models with fewer parameters, such as convolutional neural networks, to be trained on smaller data sets and achieve state-of-the-art performance. In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in three ways: 1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates when there is sufficient evidence in the data. We show that these tasks can be cast within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. We extend and unify several existing results by showing that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative. We also propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation to penalize symmetry breaking during training of machine learning models. We explain how these ideas can be applied to a wide range of machine learning models including basis function regression, dynamical systems discovery, neural networks, and neural operators acting on fields.

Keywords: Symmetries, machine learning, Lie groups, manifolds, invariance, equivariance, neural networks, deep learning

1 Introduction

Symmetry is present throughout nature, and according to David Gross (1996) the discovery of fundamental symmetries has played an increasingly central role in physics since the beginning of the 20th century. He asserts that

“Einstein’s great advance in 1905 was to put symmetry first, to regard the symmetry principle as the primary feature of nature that constrains the allowable dynamical laws.”

According to Einstein’s special theory of relativity, physical laws including those of electromagnetism and quantum mechanics are Poincaré-invariant, meaning that after predictable transformations (actions of the Poincaré group), these laws can be applied in any non-accelerating reference frame, anywhere in the universe, at all times. Specifically, these transformations form a ten-parameter group including four translations of space-time, three rotations of space, and three shifts or “boosts” in velocity. For small boosts of velocity, these transformations become the Galilean symmetries of classical mechanics. Similarly, the theorems of Euclidean geometry are unchanged after arbitrary translations, rotations, and reflections, comprising the Euclidean group. In fluid mechanics, the conformal (angle-preserving) symmetry of Laplace’s equation is used to reduce the study of idealized flows in complicated geometries to canonical flows in simple domains. In dynamical systems, the celebrated theorem of Noether (1918) establishes a correspondence between symmetries and conservation laws, an idea which has become a central pillar of mechanics (Abraham and Marsden, 2008). These examples illustrate the diversity of symmetry groups and their physical applications. More importantly, they illustrate how symmetric models and theories in physics automatically extrapolate in explainable ways to environments beyond the available data.

In machine learning, models that exploit symmetry can be trained with less data and use fewer parameters compared to their asymmetric counterparts. Early examples include augmenting data with known transformations (see Shorten and Khoshgoftaar (2019); Van Dyk and Meng (2001)) or using convolutional neural networks (CNNs) to achieve translation invariance for image processing tasks (see Fukushima (1980); LeCun et al. (1989)). More recently, equivariant neural networks respecting Euclidean symmetries have achieved state-of-the-art performance for predicting potentials in molecular dynamics (Batzner et al., 2022). As with physical laws, symmetries and invariances allow machine learning models to extrapolate beyond the training data and achieve high performance with fewer modeling parameters.

However, many problems are only weakly symmetric. Gravity, friction, and other external forces can cause some or all of the Poincaré or Galilean symmetries to be broken. Interactions between particles can be viewed as breaking symmetries possessed by non-interacting particles. Written characters have translation and scaling symmetry, but not rotation (cf. 6 and 9, d and p, N and Z) or reflection (cf. b and d, b and p). One of the main contributions of this work is to propose a method of enforcing a new principle of parsimony in machine learning. This principle of parsimony by maximal symmetry states that a model should break a symmetry within a set of physically reasonable transformations (such as Poincaré, Galilean, Euclidean, or conformal symmetry) only when there is sufficient evidence in the data.

In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in the following three ways:

  1. Enforce. Train a model with known symmetry.

  2. Discover. Identify the symmetries of a given model or data set.

  3. Promote. Train a model with as many symmetries as possible (from among candidates), breaking symmetries only when there is sufficient evidence in the data.

While these tasks have been studied to varying extents separately, we show how they can be situated within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. As a special case, the Lie derivative recovers the linear constraints derived by Finzi et al. (2021) for weights in equivariant multilayer perceptrons. In full generality, we show that known symmetries can be enforced as linear constraints derived from Lie derivatives for a large class of problems including learning vector and tensor fields on manifolds as well as learning equivariant integral operators acting on such fields. For example, the kernels of “steerable” CNNs developed by Weiler et al. (2018); Weiler and Cesa (2019) are constructed to automatically satisfy these constraints for the groups \operatorname{SO}(3) (rotations in three dimensions) and \operatorname{SE}(2) (rotations and translations in two dimensions). We show how analogous steerable networks for other groups, such as subgroups of \operatorname{SE}(n), can be constructed by numerically enforcing linear constraints derived from the Lie derivative on the integral kernels defining each layer. Symmetries, conservation laws, and symplectic structure can also be enforced when learning dynamical systems via linear constraints on the vector field. Again, these constraints come from the Lie derivative and can be incorporated into neural network architectures and basis function regression models such as Sparse Identification of Nonlinear Dynamics (SINDy) (Brunton et al., 2016).

Moskalev et al. (2022) identifies the connected subgroup of symmetries of a trained neural network by computing the nullspace of a linear operator. Likewise, Kaiser et al. (2018, 2021) recovers the symmetries and conservation laws of a dynamical system by computing the nullspace of a different linear operator. We observe that these operators and others whose nullspaces encode the symmetries of more general models can be derived directly from the Lie derivative in a manner dual to the construction of operators used to enforce symmetry. Specifically, the nullspaces of the operators we construct reveal the largest connected subgroups of symmetries for enormous classes of models. This extends work by Gruver et al. (2022) using the Lie derivative to test whether a trained neural network is equivariant with respect to a given one-parameter group, e.g., rotation of images. Generalizing the ideas in Cahill et al. (2023), we also show that the symmetries of point clouds approximating underlying submanifolds can be recovered by computing the nullspaces of associated linear operators. This allows for the unsupervised mining of data for hidden symmetries.

The idea of relaxed symmetry has been introduced recently by Wang et al. (2022), along with architecture-specific symmetry-promoting regularization functions involving sums or integrals over the candidate group of transformations. The Augerino method introduced by Benton et al. (2020) uses regularization to promote equivariance with respect to a larger collection of candidate transformations. Promoting physical constraints through the loss function is also a core concept of Physics-Informed Neural Networks (PINNs) introduced by Raissi et al. (2019). Our approach to the third task (promoting symmetry) is to introduce a unified and broadly applicable class of convex regularization functions based on the Lie derivative to penalize symmetry breaking during training of machine learning models. As we describe above, the Lie derivative yields an operator whose nullspace corresponds to the symmetries of a given model. Hence, the lower the rank of this operator, the more symmetric the model is. The nuclear norm has been used extensively as a convex relaxation of the rank with favorable theoretical properties for compressed sensing and low-rank matrix recovery (Recht et al., 2010; Gross, 2011), as well as in robust PCA (Candès et al., 2011; Bouwmans et al., 2018). Penalizing the nuclear norm of our symmetry-encoding operator yields a convex regularization function that can be added to the loss when training machine learning models, including basis function regression and neural networks. Likewise, we use a nuclear norm penalty to promote conservation laws and Hamiltonicity with respect to candidate symplectic structures when fitting dynamical systems to data. This lets us train the model and enforce data-consistent symmetries simultaneously.

2 Executive summary

This paper provides a linear-algebraic framework to enforce, discover, and promote symmetry of machine learning models. To illustrate, consider a model defined by a function F:\mathbb{R}^{m}\to\mathbb{R}^{n}. By a symmetry, we mean an invertible transformation T and an invertible linear transformation \tilde{T} so that

F(T(x))=\tilde{T}F(x). (1)

Examples to keep in mind are rotations and translations. If (T_{a},\tilde{T}_{a}) is a symmetry, then so is its inverse (T_{a}^{-1},\tilde{T}_{a}^{-1}), and if (T_{b},\tilde{T}_{b}) is another symmetry, then so is the composition (T_{b}\circ T_{a},\tilde{T}_{b}\circ\tilde{T}_{a}). We work with collections of transformations \{(T_{g},\tilde{T}_{g})\}_{g\in G}, called groups, that have an identity element and are closed under composition and inverses. Specifically, we consider Lie groups.

2.1 Enforcing symmetry

Given a group of transformations, the symmetry condition (1) imposes linear constraints on F that can be enforced during the fitting process. However, there is one constraint per transformation, making direct enforcement impractical for continuous Lie groups such as rotations or translations. We observe that for smoothly-parametrized, connected groups, it suffices to consider a finite collection of infinitesimal linear constraints {\mathcal{L}}_{\xi_{i}}F=0, where {\mathcal{L}}_{\xi} is the “Lie derivative” defined by

{\mathcal{L}}_{\xi}F(x)=\left(\left.\frac{\partial\tilde{T}_{g}F(x)}{\partial g}\right|_{g=\operatorname{Id}}-\frac{\partial F(x)}{\partial x}\left.\frac{\partial T_{g}(x)}{\partial g}\right|_{g=\operatorname{Id}}\right)\xi. (2)

Notice that this expression is linear with respect to \xi and with respect to F.
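For concreteness, the following is a minimal numerical sketch of (2), not code from this paper, for G=\operatorname{SO}(2) rotating inputs in \mathbb{R}^{2} and acting trivially on scalar outputs (so the first term of (2) vanishes). The function names and the finite-difference discretization are our own illustrative choices.

```python
import numpy as np

# Sketch of the Lie derivative (2): G = SO(2) rotates inputs in R^2 and
# acts trivially on scalar outputs, so only the transport term survives.
J = np.array([[0.0, -1.0], [1.0, 0.0]])  # generator xi of planar rotations

def lie_derivative(F, x, h=1e-6):
    # (L_xi F)(x) = -(dF/dx)(x) @ (J x), with dF/dx by central differences
    grad = np.array([(F(x + h * e) - F(x - h * e)) / (2 * h)
                     for e in np.eye(2)])
    return -grad @ (J @ x)

x = np.array([0.3, -1.2])
print(lie_derivative(lambda x: x @ x, x))  # ~0: ||x||^2 is rotation-invariant
print(lie_derivative(lambda x: x[0], x))   # nonzero: x_1 is not invariant
```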

2.2 Discovering symmetry

The symmetries of a given model F form a subgroup that we seek to identify within a given group of candidates. For continuous Lie groups of transformations, the component of the subgroup containing the identity is revealed by the nullspace of the linear map

L_{F}:\xi\mapsto{\mathcal{L}}_{\xi}F. (3)

More generally, the symmetries of a smooth surface in \mathbb{R}^{n} can be determined from data sampled from this surface by computing the nullspace of a positive semidefinite operator. When the surface is the graph of the function F in \mathbb{R}^{m}\times\mathbb{R}^{n}, this operator is L_{F}^{*}L_{F} with L_{F}^{*} being an adjoint operator.
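As an illustration, the following sketch (with our own illustrative names and a finite-difference Jacobian, not the paper's code) discretizes L_{F} by stacking evaluations of ({\mathcal{L}}_{\xi_{i}}F)(x_{k}) over sample points into a matrix with one column per candidate generator from a basis of {\mathfrak{gl}}(2); null directions of this matrix, equivalently of L_{F}^{*}L_{F}, then span the symmetry generators.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x):                      # rotation-invariant test model, R^2 -> R^1
    return np.array([x @ x])

def jacobian(F, x, h=1e-6):    # finite-difference Jacobian of F at x
    return np.array([(F(x + h * e) - F(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))]).T

# candidate generators: the standard basis E_ij of gl(2)
basis = [np.outer(ei, ej) for ei in np.eye(2) for ej in np.eye(2)]
samples = rng.normal(size=(100, 2))

# column i stacks the values of (L_{xi_i} F)(x_k) over the samples x_k
# (up to the sign convention in (2), which does not affect the nullspace)
A = np.column_stack([
    np.concatenate([jacobian(F, x) @ (xi @ x) for x in samples])
    for xi in basis
])
_, s, Vt = np.linalg.svd(A)
null = Vt[s < 1e-6 * s[0]]     # coefficient vectors spanning sym_G(F)
print(null @ np.array([xi.ravel() for xi in basis]))  # ~ multiples of J
```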

2.3 Promoting symmetry

Here, we seek to learn a model F that both fits data and possesses as many symmetries as possible from a given candidate group of transformations. Since the nullspace of the operator L_{F} defined in (3) corresponds with the symmetries of F, we seek to minimize the rank of L_{F} during the training process. To do this, we regularize optimization problems for F using a convex relaxation of the rank given by the nuclear norm (sum of singular values)

\|L_{F}\|_{*}=\sum_{i=1}^{\dim G}\sigma_{i}(L_{F}). (4)

This is convex with respect to F because F\mapsto L_{F} is linear and the nuclear norm is convex.
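As a small illustration (the names are ours, and the matrix A stands for any sampled discretization of L_{F}, such as the one built in the previous sketch), the penalty (4) is one line of linear algebra:

```python
import numpy as np

def nuclear_norm(A):
    # ||L_F||_*: sum of singular values of a sampled matrix representing L_F
    # (columns indexed by candidate generators, rows by sample points)
    return np.linalg.svd(A, compute_uv=False).sum()

# Since A depends linearly on the parameters of F, a training objective
#   loss(F) = data_misfit(F) + lam * nuclear_norm(A(F))
# adds a convex symmetry-promoting regularizer; lam and data_misfit are
# placeholders for a concrete problem.
```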

3 Related work

3.1 Enforcing symmetry

Data-augmentation, as reviewed by Shorten and Khoshgoftaar (2019); Van Dyk and Meng (2001), is one of the simplest ways to incorporate known symmetry into machine learning models. Usually this entails training a neural network architecture on training data to which known transformations have been applied. The theoretical foundations of these methods are explored by Chen et al. (2020). Data-augmentation has also been used by Benton et al. (2020) to construct equivariant neural networks by averaging the network’s output over transformations applied to the data. Finally, Brandstetter et al. (2022) applied data-augmentation strategies with known Lie-point symmetries for improving neural PDE solvers.

Symmetry can also be enforced directly on the machine learning architecture. For example, Convolutional Neural Networks (CNNs), introduced by Fukushima (1980) and popularized by LeCun et al. (1989), achieve translational equivariance by employing convolutional filters with trainable kernels in each layer. CNNs have been generalized to provide equivariance with respect to symmetry groups other than translation. Group-Equivariant CNNs (G-CNNs) (Cohen and Welling, 2016) provide equivariance with respect to arbitrary discrete groups generated by translations, reflections, and rotations. Rotational equivariance can be enforced on three-dimensional scalar, vector, or tensor fields using the 3D Steerable CNNs developed by Weiler et al. (2018). Spherical CNNs (Cohen et al., 2018; Esteves et al., 2018) allow for rotation-equivariant maps to be learned for fields (such as projected images of 3D objects) on spheres. Essentially any group-equivariant linear map (defining a layer of an equivariant neural network) acting on fields can be described by group convolution (Kondor and Trivedi, 2018; Cohen et al., 2019), with the spaces of admissible convolution kernels characterized by Cohen et al. (2019). Finzi et al. (2020) provides a practical way to construct convolutional layers that are equivariant with respect to arbitrary Lie groups and for general data types. For dynamical systems, Marsden and Ratiu (1999); Rowley et al. (2003); Abraham and Marsden (2008) describe techniques for symmetry reduction of the original problem to a quotient space where the known symmetry group has been factored out. Related approaches have been used by Peitz et al. (2023); Steyert (2022) to approximate Koopman operators for symmetric dynamical systems (see Koopman (1931); Mezić (2005); Mauroy et al. (2020); Otto and Rowley (2021); Brunton et al. (2022)).

A general method for constructing equivariant neural networks is introduced by Finzi et al. (2021), and relies on the observation that equivariance can be enforced through a set of linear constraints. For graph neural networks, Maron et al. (2018) characterizes the subspaces of linear layers satisfying permutation equivariance. Similarly, Ahmadi and Khadir (2020) shows that discrete symmetries and other types of side information can be enforced via linear or convex constraints in learning problems for dynamical systems. Our work builds on the results of Finzi et al. (2021), Weiler et al. (2018), Cohen et al. (2019), and Ahmadi and Khadir (2020) by showing that equivariance can be enforced in a systematic and unified way via linear constraints for large classes of functions and neural networks. Concurrent work by Yang et al. (2024) shows how to enforce known or discovered Lie group symmetries on latent dynamics using hard linear constraints or soft penalties.

3.2 Discovering symmetry

Early work by Rao and Ruderman (1999); Miao and Rao (2007) used nonlinear optimization to learn infinitesimal generators describing transformations between images. Later, it was recognized by Cahill et al. (2023) that linear algebraic methods could be used to uncover the generators of continuous linear symmetries of arbitrary point clouds in Euclidean space. Similarly, Kaiser et al. (2018) and Moskalev et al. (2022) show how conserved quantities of dynamical systems and invariances of trained neural networks can be revealed by computing the nullspaces of associated linear operators. We connect these linear-algebraic methods to the Lie derivative, and provide generalizations to nonlinear group actions on manifolds. The Lie derivative has been used by Gruver et al. (2022) to quantify the extent to which a trained network is equivariant with respect to a given one-parameter subgroup of transformations. Our results show how the Lie derivative can reveal the entire connected subgroup of symmetries of a trained model via symmetric eigendecomposition.

More sophisticated nonlinear optimization techniques use Generative Adversarial Networks (GANs) to learn the transformations that leave a data distribution unchanged. These methods include SymmetryGAN developed by Desai et al. (2022) and LieGAN developed by Yang et al. (2023b). In contrast, our methods for detecting symmetry are entirely linear-algebraic.

While symmetries may exist in data, their representation may be difficult to describe. Yang et al. (2023a) develop Latent LieGAN (LaLieGAN) to extend LieGAN to find linear representations of symmetries in a latent space. Recently this has been applied to dynamics discovery (Yang et al., 2024). Likewise, Liu and Tegmark (2022) discover hidden symmetries by optimizing nonlinear transformations into spaces where candidate symmetries hold. Similar to our approach for promoting symmetry, they use a cost function to measure whether a given symmetry holds. In contrast, our regularization functions enable subgroups of candidate symmetry groups to be identified.

3.3 Promoting symmetry

Biasing a network towards increased symmetry can be accomplished through methods such as symmetry regularization. Analogous to the physics-informed loss developed in PINNs by Raissi et al. (2019), which penalizes a solution for violating known dynamics, one can penalize symmetry violation for a known group; for example, Akhound-Sadegh et al. (2024) extends the PINN framework to penalize deviations from known Lie-point symmetries of a PDE. More generally, however, one can consider a candidate group of symmetries and promote as much symmetry as possible that is consistent with the available data. Wang et al. (2022) discusses these approaches, along with architecture-specific methods, including regularization functions involving summations or integrals over the candidate group of symmetries. While our regularization functions resemble these for discrete groups, we use a radically different regularization for continuous Lie groups. By leveraging the Lie algebra, our regularization functions eliminate the need to numerically integrate complicated functions over the group—a task that is already prohibitive for the 10-dimensional non-compact group of Galilean symmetries in classical mechanics.

Automated data augmentation techniques introduced by Cubuk et al. (2019); Hataya et al. (2020); Benton et al. (2020) are another class of methods that arguably promote symmetry. These techniques optimize the distribution of transformations applied to augment the data during training. For example “Augerino” is an elegant method developed by Benton et al. (2020) which averages an arbitrary network’s output over the augmentation distribution and relies on regularization to prevent the distribution of transformations from becoming concentrated near the identity. In essence, the regularization biases the averaged network towards increased symmetry.

In contrast, our regularization functions promote symmetry on an architectural level for the original network. This eliminates the need to perform averaging, which grows more costly for larger collections of symmetries. While a distribution over symmetries can be useful for learning interesting partial symmetries (e.g., 6 stays 6 for small rotations, before turning into 9), as is done by Benton et al. (2020) and Romero and Lohit (2022), it is not clear how to use a continuous distribution over transformations to identify lower-dimensional subgroups, which have measure zero. On the other hand, our linear-algebraic approach easily identifies and promotes symmetries in lower-dimensional connected subgroups.

3.4 Additional approaches and applications

There are several other approaches that incorporate various aspects of enforcing, discovering, and promoting symmetries. For example, Baddoo et al. (2023) developed algorithms to enforce and promote known symmetries in dynamic mode decomposition, through manifold constrained learning and regularization, respectively. Baddoo et al. (2023) also showed that discovering unknown symmetries is a dual problem to enforcing symmetry. Exploiting symmetry has also been a central theme in the reduced-order modeling of fluids for decades (Holmes et al., 2012). As machine learning methods are becoming widely used to develop these models (Brunton et al., 2020), the themes of enforcing and discovering symmetries in machine models are increasingly relevant. Known fluid symmetries have been enforced in SINDy for fluid systems (Loiseau and Brunton, 2018) through linear equality constraints; this approach was generalized to enforce more complex constraints (Champion et al., 2020). Unknown symmetries were similarly uncovered for electroconvective flows (Guan et al., 2021). Symmetry breaking is also important in many turbulent flows (Callaham et al., 2022).

4 Elementary theory of Lie group actions

This section provides background and notation required to understand the main results of this paper in the less abstract, but still remarkably useful, setting of Lie groups acting on vector spaces. In Section 5 we use this theory to study the symmetries of continuously differentiable functions between vector spaces. Such functions form the basic building blocks of many machine learning models such as basis function regression models, the layers of multilayer perceptrons, and the kernels of integral operators acting on spatial fields such as images. We emphasize that this is not the most general setting for our results, but we provide this section and simpler versions of our main theorems in Section 5 in order to make the presentation more accessible. We develop our main results in the more general setting of fiber-linear Lie group actions on sections of vector bundles in Section 11.

4.1 Lie groups and subgroups

Lie groups are ubiquitous in science and engineering. Some familiar examples include the general linear group \operatorname{GL}(n) consisting of all real, invertible, n\times n matrices; the orthogonal group

O(n)=\left\{Q\in\mathbb{R}^{n\times n}\ :\ Q^{T}Q=I\right\}; (5)

and the special Euclidean group

\operatorname{SE}(n)=\left\{\begin{bmatrix}Q&b\\ 0&1\end{bmatrix}\ :\ Q\in\mathbb{R}^{n\times n},\ b\in\mathbb{R}^{n},\ Q^{T}Q=I,\ \det(Q)=1\right\}, (6)

representing rotations and translations in real n-dimensional space, \mathbb{R}^{n}, embedded in \mathbb{R}^{n+1} via x\mapsto(x,1). Observe that the sets \operatorname{GL}(n), O(n), and \operatorname{SE}(n) contain the identity matrix and are closed under matrix multiplication and inversion, making them into (non-commutative) groups. They are also smooth manifolds, which makes them Lie groups (Lee, 2013). In general, a Lie group is a smooth manifold that is simultaneously an algebraic group whose composition and inversion operations are smooth maps. The identity element is usually denoted e for “Einselement”, which for a matrix Lie group is the identity matrix e=I. This section summarizes some basic results that can be found in references such as (Abraham et al., 1988; Lee, 2013; Varadarajan, 1984; Hall, 2015).

The most useful and profound property of a Lie group is the fact that the group is almost entirely characterized by an associated vector space called its Lie algebra. This allows global nonlinear questions about the group — such as which elements leave a function unchanged — to be answered using linear algebra. If G is a Lie group, its Lie algebra, commonly denoted \operatorname{Lie}(G) or {\mathfrak{g}}, is the vector space consisting of all smooth vector fields on G that remain invariant when pushed forward by left translation L_{g}:h\mapsto g\cdot h. Translating back and forth from the identity element, the Lie algebra can be identified with the tangent space \operatorname{Lie}(G)\cong T_{e}G. For example, the Lie algebra of the orthogonal group O(n) consists of all skew-symmetric matrices, and is denoted

\mathfrak{o}(n)=\left\{S\in\mathbb{R}^{n\times n}\ :\ S+S^{T}=0\right\}. (7)

A key fact is that the Lie algebra of G is closed under the “Lie bracket” of vector fields (in \mathbb{R}^{n}, a vector field V=(V^{1},\ldots,V^{n}) is equivalent to the directional derivative operator V^{1}\frac{\partial}{\partial x^{1}}+\cdots+V^{n}\frac{\partial}{\partial x^{n}}; a vector field on a smooth manifold is defined as an analogous linear operator acting on the space of smooth functions, and the Lie bracket is the commutator of these operators; see Lee (2013)):

[\xi,\eta]=\xi\eta-\eta\xi\in\operatorname{Lie}(G). (8)

For matrix Lie groups, this corresponds to the same commutator of matrices \xi,\eta\in T_{I}G, as shown by Theorem 3.20 in Hall (2015).

The key tool relating global properties of a Lie group back to its Lie algebra is the exponential map \exp:\operatorname{Lie}(G)\to G. A vector field \xi\in\operatorname{Lie}(G) has a unique integral curve \gamma:(-\infty,\infty)\to G passing through the identity \gamma(0)=e and satisfying \gamma^{\prime}(t)=\xi|_{\gamma(t)}. The exponential map defined by

\exp(\xi):=\gamma(1) (9)

reproduces the entire integral curve \exp(t\xi)=\gamma(t) thanks to Proposition 20.5 in Lee (2013). Such an exponential curve is illustrated in Figure 1. For a matrix Lie group and \xi\in T_{I}G, the exponential map is given by the matrix exponential

\exp(\xi)=e^{\xi}=\sum_{k=0}^{\infty}\frac{1}{k!}\xi^{k}. (10)

Proposition 20.8 in (Lee, 2013) provides many of the basic properties of the exponential map, such as \exp((s+t)\xi)=\exp(s\xi)\cdot\exp(t\xi), \exp(\xi)^{-1}=\exp(-\xi), and \operatorname{\mathrm{d}}\exp(0)=\operatorname{Id}_{T_{e}G}. Perhaps the most important is that it provides a diffeomorphism between an open neighborhood of the origin 0 in \operatorname{Lie}(G) and an open neighborhood of the identity element e in G.
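These properties are easy to verify numerically; the following sketch (using SciPy's matrix exponential, with values chosen by us for illustration) checks them for a random skew-symmetric generator in \mathfrak{o}(3):

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential (10)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
xi = A - A.T                               # skew-symmetric: an element of o(3)

R = expm(xi)
assert np.allclose(R.T @ R, np.eye(3))     # exp(xi) is orthogonal
assert np.allclose(np.linalg.det(R), 1.0)  # and lies in the identity
                                           # component SO(3)
s, t = 0.7, -0.3                           # one-parameter subgroup property
assert np.allclose(expm((s + t) * xi), expm(s * xi) @ expm(t * xi))
assert np.allclose(np.linalg.inv(R), expm(-xi))
```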

The connected component of G containing the identity element is called the “identity component” of the Lie group and is denoted G_{0}. Any element in this component can then be expressed as a finite product of exponentials thanks to Proposition 7.14 in (Lee, 2013), that is,

G_{0}=\big\{\exp(\xi_{1})\cdots\exp(\xi_{N})\ :\ \xi_{1},\ldots,\xi_{N}\in\operatorname{Lie}(G),\ N=1,2,3,\ldots\big\}. (11)

Moreover, the identity component is a normal subgroup of G and all of the other connected components of G are diffeomorphic cosets of G_{0} (Proposition 7.15 in Lee (2013)), as we illustrate in Figure 1. For example, the special Euclidean group \operatorname{SE}(n) is connected, and thus equal to its identity component. On the other hand, the orthogonal group O(n) is compact and has two components consisting of orthogonal matrices Q whose determinants are 1 and -1. The identity component of the orthogonal group is called the special orthogonal group and is denoted \operatorname{SO}(n). It is a general fact that when a Lie group such as \operatorname{SO}(n) is connected and compact, it is equal to the image of the exponential map without the need to consider products of exponentials; see Tao (2011) and Appendix C.1 of Lezcano-Casado and Martínez-Rubio (2019).

Figure 1: A Lie group G and its action \theta on a manifold {\mathcal{M}}. The Lie group G consists of three connected components with G_{0} being the one that contains the identity element e. Each non-identity component of G is a coset g_{i}G_{0} formed by translating the identity component by an arbitrary element g_{i} in the component. The Lie algebra \operatorname{Lie}(G) is identified with the tangent space T_{e}G and an exponential curve \exp(t\xi) generated by an element \xi\in\operatorname{Lie}(G) is shown. The infinitesimal generator \hat{\theta}(\xi) is the vector field on {\mathcal{M}} whose flow corresponds with the action \theta_{\exp(t\xi)} of group elements along \exp(t\xi).

A subgroup H of a Lie group G is called a “Lie subgroup” when H is an immersed submanifold of G and the group operations are smooth when restricted to H. An immersed submanifold does not necessarily inherit its topology as a subset of G, but rather H has a topology and smooth structure such that the inclusion \imath_{H}:H\hookrightarrow G is smooth and its derivative is injective (see Lee (2013)). The tangent space to a Lie subgroup H\subset G at the identity, defined as T_{e}H=\operatorname{Range}(\operatorname{\mathrm{d}}\imath_{H}(I))\subset\operatorname{Lie}(G), is closed under the Lie bracket and thus forms a “Lie subalgebra” of \operatorname{Lie}(G), denoted \operatorname{Lie}(H). Conversely, a remarkable result stated by Theorem 19.26 in Lee (2013) shows that any subalgebra {\mathfrak{h}}\subset\operatorname{Lie}(G), that is, any subspace closed under the Lie bracket, corresponds to a unique connected Lie subgroup H\subset G_{0} satisfying \operatorname{Lie}(H)={\mathfrak{h}}. Later on, we will use this fact to identify the connected subgroups of symmetries of machine learning models based on infinitesimal criteria. Another remarkable and useful fact is the “closed subgroup theorem” stated as Theorem 20.12 in Lee (2013). It says that if H\subset G is a closed subset and is closed under the group operations of G, then H is automatically an embedded Lie subgroup of G. Interestingly, while a Lie subgroup H\subset\operatorname{GL}(n) need not be a closed subset, it turns out that H can always be embedded as a closed subgroup in a larger \operatorname{GL}(n^{\prime}), n^{\prime}\geq n, thanks to Theorem 9 in Gotô (1950).

4.2 Group representations, actions, and infinitesimal generators

A Lie group homomorphism is a smooth map \Phi:G_{1}\to G_{2} between Lie groups that respects the group product, that is,

\Phi(g_{1}g_{2})=\Phi(g_{1})\cdot\Phi(g_{2}). (12)

The tangent map \phi:=\operatorname{\mathrm{d}}\Phi(e):T_{e}G_{1}\cong{\mathfrak{g}}_{1}\to T_{e}G_{2}\cong{\mathfrak{g}}_{2} is a Lie algebra homomorphism by Theorem 8.44 in Lee (2013), meaning that it is a linear map respecting the Lie bracket:

\phi\big([\xi_{1},\xi_{2}]\big)=\big[\phi(\xi_{1}),\phi(\xi_{2})\big]. (13)

Moreover (see Proposition 20.8 in Lee (2013)), the Lie group homomorphism and its induced Lie algebra homomorphism are related by the exponential maps on G_{1} and G_{2} via the identity

\Phi\big(\exp(\xi)\big)=\exp\big(\phi(\xi)\big). (14)

Another fundamental result (Theorem 20.19 in Lee (2013)) is that any Lie algebra homomorphism \operatorname{Lie}(G_{1})\to\operatorname{Lie}(G_{2}) corresponds to a unique Lie group homomorphism G_{1}\to G_{2} when G_{1} is simply connected. When G_{2} is the general linear group on a vector space, the Lie group and Lie algebra homomorphisms are called Lie group and Lie algebra “representations”.

A Lie group G can act on a vector space {\mathcal{V}} via a representation \Phi:G\to\operatorname{GL}({\mathcal{V}}) according to

\theta:(x,g)\mapsto\Phi(g^{-1})x, (15)

with x\in{\mathcal{V}} and g\in G. More generally, a nonlinear right action of a Lie group G on a manifold {\mathcal{M}} is any smooth map \theta:{\mathcal{M}}\times G\to{\mathcal{M}} satisfying

\theta(\theta(x,g_{1}),g_{2})=\theta(x,g_{1}g_{2})\qquad\mbox{and}\qquad\theta(x,e)=x (16)

for every x\in{\mathcal{M}} and g_{1},g_{2}\in G. Figure 1 depicts the action of a Lie group on a manifold. We make frequent use of the maps \theta_{g}=\theta(\cdot,g), which have smooth inverses \theta_{g^{-1}}, and the “orbit maps” \theta^{(x)}=\theta(x,\cdot). For example, using a representation \Phi:\operatorname{SE}(3)\to\operatorname{GL}(\mathbb{R}^{7}), the position q and velocity v of a particle in \mathbb{R}^{3} can be rotated and translated via the action

\theta\left(\begin{bmatrix}q\\ v\\ 1\end{bmatrix},\ \begin{bmatrix}Q&b\\ 0&1\end{bmatrix}\right)=\Phi\left(\begin{bmatrix}Q^{T}&-Q^{T}b\\ 0&1\end{bmatrix}\right)\begin{bmatrix}q\\ v\\ 1\end{bmatrix}=\begin{bmatrix}Q^{T}&0&-Q^{T}b\\ 0&Q^{T}&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}q\\ v\\ 1\end{bmatrix}=\begin{bmatrix}Q^{T}(q-b)\\ Q^{T}v\\ 1\end{bmatrix}.

The positions and velocities of n particles arranged as a vector (q_{1},\ldots,q_{n},v_{1},\ldots,v_{n},1) can be simultaneously rotated and translated via an analogous representation \Phi:\operatorname{SE}(3)\to\operatorname{GL}(\mathbb{R}^{6n+1}).
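The following is a small numerical check of this action (an illustrative sketch we wrote, with a rotation about the z-axis standing in for a general Q):

```python
import numpy as np

# Check of the SE(3) action above on a single particle's (q, v, 1) in R^7.
a = 0.7                                            # rotation angle about z
c, s = np.cos(a), np.sin(a)
Q = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
b = np.array([0.5, -1.0, 2.0])
q = np.array([1.0, 2.0, 3.0])
v = np.array([0.1, 0.0, -0.2])

Phi = np.zeros((7, 7))                             # Phi of the inverse element
Phi[:3, :3] = Q.T
Phi[:3, 6] = -Q.T @ b
Phi[3:6, 3:6] = Q.T
Phi[6, 6] = 1.0

out = Phi @ np.concatenate([q, v, [1.0]])
assert np.allclose(out[:3], Q.T @ (q - b))         # position: Q^T (q - b)
assert np.allclose(out[3:6], Q.T @ v)              # velocity: Q^T v
```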

The key fact about a group action is that it is almost completely characterized by a linear map called the infinitesimal generator. This map \hat{\theta} assigns to each element \xi\in\operatorname{Lie}(G) in the Lie algebra a vector field \hat{\theta}(\xi) on {\mathcal{M}} defined by

\hat{\theta}(\xi)_{x}=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}\right|_{t=0}\theta_{\exp(t\xi)}(x)=\operatorname{\mathrm{d}}\theta^{(x)}(e)\xi. (17)

The infinitesimal generator and its relation to the group action are illustrated in Figure 1. For the linear action in (15), the infinitesimal generator is the linear vector field given by the matrix-vector product \hat{\theta}(\xi)_{x}=-\phi(\xi)x. Crucially, the flow of the generator recovers the group action along the exponential curve \exp(t\xi), i.e.,

\operatorname{Fl}_{\hat{\theta}(\xi)}^{t}(x)=\theta_{\exp(t\xi)}(x). (18)

For the linear right action in (15), this is easily verified by differentiation, applying (14), and the fact that solutions of smooth ordinary differential equations are unique. For a nonlinear right action this follows from Lemma 20.14 and Proposition 9.13 in Lee (2013).

Remark 1

In contrast to a “right” action \theta:{\mathcal{M}}\times G\to{\mathcal{M}}, a “left” action \theta:G\times{\mathcal{M}}\to{\mathcal{M}} satisfies \theta(g_{2},\theta(g_{1},x))=\theta(g_{2}g_{1},x). While our main results work for left actions too, e.g., \theta(g,x)=\Phi(g)x, right actions are slightly more natural because the infinitesimal generator is a Lie algebra homomorphism, i.e.,

\hat{\theta}([\xi,\eta])=[\hat{\theta}(\xi),\hat{\theta}(\eta)], (19)

whereas this holds with a sign change for left actions. Every left action \theta^{L} can be converted into an equivalent right action defined by \theta^{R}(x,g)=\theta^{L}(g^{-1},x), and vice versa.

5 Fundamental operators for studying symmetry

Here we introduce our main theoretical results for studying symmetries of machine learning models by focusing on a concrete and useful special case. The basic building blocks of the machine learning models we consider here are continuously differentiable functions F:{\mathcal{V}}\to{\mathcal{W}} between finite-dimensional vector spaces. The space of functions {\mathcal{V}}\to{\mathcal{W}} with continuous derivatives up to order k\in\mathbb{N}\cup\{\infty\} is denoted C^{k}({\mathcal{V}};{\mathcal{W}}), with addition and scalar multiplication defined point-wise. These functions could be layers of a multilayer neural network, integral kernels to be applied to spatio-temporal fields, or simply linear combinations of user-specified basis functions in a regression task as in Brunton et al. (2016). General versions of our results for sections of vector bundles are developed later in Section 11. Our main results show that two families of fundamental linear operators encode the symmetries of these functions. The fundamental operators allow us to enforce, promote, and discover symmetry in machine learning models as we describe in Sections 6, 7, and 8.

We consider a general (perhaps nonlinear) right action \theta:{\mathcal{V}}\times G\to{\mathcal{V}} and a representation \Phi:G\to\operatorname{GL}({\mathcal{W}}). The definition of equivariance, the symmetry group of a function, and the first family of fundamental operators are introduced by the following:

Definition 2

We say that F is equivariant with respect to a group element g\in G if

({\mathcal{K}}_{g}F)(x):=\Phi(g)F(\theta_{g}(x))=F(x) (20)

for every x\in{\mathcal{V}}. These elements form a subgroup of G denoted \operatorname{Sym}_{G}(F).

Note that when the action \theta(x,g)=\Psi(g^{-1})x is also defined by a representation, then (20) becomes

({\mathcal{K}}_{g}F)(x):=\Phi(g)F(\Psi(g)^{-1}x)=F(x). (21)

The transformation operators {\mathcal{K}}_{g} are linear maps sending functions in C^{k}({\mathcal{V}};{\mathcal{W}}) to functions in C^{k}({\mathcal{V}};{\mathcal{W}}). These fundamental operators form a group with composition {\mathcal{K}}_{g}{\mathcal{K}}_{h}={\mathcal{K}}_{gh} and inversion {\mathcal{K}}_{g}^{-1}={\mathcal{K}}_{g^{-1}}. Thus, g\mapsto{\mathcal{K}}_{g} is an infinite-dimensional representation of G in C^{k}({\mathcal{V}};{\mathcal{W}}) for any k. These operators are useful for studying discrete symmetries of functions. However, for a continuous group G it is impractical to work directly with the uncountable family \{{\mathcal{K}}_{g}\}_{g\in G}.

The second family of fundamental operators are the key objects we use to study continuous symmetries of functions. These are the Lie derivatives {\mathcal{L}}_{\xi}:C^{1}({\mathcal{V}};{\mathcal{W}})\to C^{0}({\mathcal{V}};{\mathcal{W}}) defined along each \xi\in\operatorname{Lie}(G) by

({\mathcal{L}}_{\xi}F)(x)=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}\right|_{t=0}\big({\mathcal{K}}_{\exp(t\xi)}F\big)(x)=\phi(\xi)F(x)+\frac{\partial F(x)}{\partial x}\hat{\theta}(\xi)_{x}. (22)

Note that when the action is \theta(x,g)=\Psi(g^{-1})x, we have \hat{\theta}(\xi)_{x}=-\psi(\xi)x and (22) becomes

({\mathcal{L}}_{\xi}F)(x)=\phi(\xi)F(x)-\frac{\partial F(x)}{\partial x}\psi(\xi)x, (23)

where \phi,\psi are the Lie algebra representations corresponding to \Phi,\Psi. Evident from (22) is the fact that the Lie derivative is linear with respect to both \xi and F, and sends functions in C^{k+1} to functions in C^{k} for every k\geq 0. The geometric construction of the fundamental operators {\mathcal{K}}_{g} and {\mathcal{L}}_{\xi} is depicted in Figure 2. It turns out (see Proposition 24) that \xi\mapsto{\mathcal{L}}_{\xi} is the Lie algebra representation corresponding to g\mapsto{\mathcal{K}}_{g} on C^{\infty}({\mathcal{V}};{\mathcal{W}}), meaning that on this space we have the handy relations

\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}{\mathcal{K}}_{\exp(t\xi)}={\mathcal{L}}_{\xi}{\mathcal{K}}_{\exp(t\xi)}={\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\xi}\qquad\mbox{and}\qquad{\mathcal{L}}_{[\xi,\eta]}={\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}-{\mathcal{L}}_{\eta}{\mathcal{L}}_{\xi}. (24)

The results stated below are special cases of more general results developed later in Section 11.

Figure 2: The fundamental operators for functions between vector spaces and linear Lie group actions defined by representations. The finite transformation operators {\mathcal{K}}_{g} act on the function F:{\mathcal{V}}\to{\mathcal{W}} by composing it with the transformation \theta_{g} and then applying \Phi(g) to the values in {\mathcal{W}}. The function is g-equivariant when this process does not alter the function. The Lie derivative {\mathcal{L}}_{\xi} is formed by differentiating t\mapsto{\mathcal{K}}_{\exp(t\xi)} at t=0. Geometrically, {\mathcal{L}}_{\xi}F(x) is the vector in {\mathcal{W}} tangent to the curve t\mapsto{\mathcal{K}}_{\exp(t\xi)}F(x) in {\mathcal{W}} passing through F(x) at t=0.

Our first main result provides necessary and sufficient conditions for a continuously differentiable function F:{\mathcal{V}}\to{\mathcal{W}} to be equivariant with respect to the Lie group actions on {\mathcal{V}} and {\mathcal{W}}. This generalizes the constraints derived by Finzi et al. (2021) for the linear layers of equivariant multilayer perceptrons.

Theorem 3

Let \{\xi_{i}\}_{i=1}^{q} generate (via linear combinations and Lie brackets) the Lie algebra \operatorname{Lie}(G) and let \{g_{j}\}_{j=1}^{n_{G}-1} contain one element from each non-identity component of G. Then F\in C^{1}({\mathcal{V}};{\mathcal{W}}) is G-equivariant if and only if

{\mathcal{L}}_{\xi_{i}}F=0\qquad\mbox{and}\qquad{\mathcal{K}}_{g_{j}}F-F=0 (25)

for every i=1,\ldots,q and every j=1,\ldots,n_{G}-1. This is a special case of Theorem 26.

Since the fundamental operators {\mathcal{L}}_{\xi} and {\mathcal{K}}_{g} are linear, Theorem 3 provides linear constraints for a continuously differentiable function F to be G-equivariant.

Our second main result shows that the continuous symmetries of a given continuously differentiable function F:{\mathcal{V}}\to{\mathcal{W}} are encoded by its Lie derivatives.

Theorem 4

Given F\in C^{1}({\mathcal{V}};{\mathcal{W}}), the symmetry group \operatorname{Sym}_{G}(F) is a closed, embedded Lie subgroup of G with Lie subalgebra

\operatorname{\mathfrak{sym}}_{G}(F)=\left\{\xi\in\operatorname{Lie}(G)\ :\ {\mathcal{L}}_{\xi}F=0\right\}. (26)

This is a special case of Theorem 25.

This result completely characterizes the identity component of the symmetry group \operatorname{Sym}_{G}(F) because the connected Lie subgroups of G are in one-to-one correspondence with Lie subalgebras of \operatorname{Lie}(G) by Theorem 19.26 in Lee (2013). The Lie subalgebra of symmetries of a C^{1} function F can be identified via linear algebra. In particular, \operatorname{\mathfrak{sym}}_{G}(F) is the nullspace of the linear operator L_{F}:\operatorname{Lie}(G)\to C^{0}({\mathcal{V}};{\mathcal{W}}) defined by

L_{F}:\xi\mapsto{\mathcal{L}}_{\xi}F. (27)

Discretization methods suitable for linear-algebraic computations with the fundamental operators will be discussed in Section 10. The key point is that when the functions F lie in a finite-dimensional subspace {\mathcal{F}}\subset C^{1}({\mathcal{V}};{\mathcal{W}}), the ranges of the restricted Lie derivatives \{\left.{\mathcal{L}}_{\xi}\right|_{{\mathcal{F}}}\}_{\xi\in\operatorname{Lie}(G)}, hence also the ranges of \{L_{F}\}_{F\in{\mathcal{F}}}, are contained in a corresponding finite-dimensional subspace {\mathcal{F}}^{\prime}\subset C^{0}({\mathcal{V}};{\mathcal{W}}) on which inner products can be defined using sampling or quadrature.

The preceding two theorems already show the duality between enforcing and discovering continuous symmetries with respect to the Lie derivative, viewed as a bilinear form (\xi,F)\mapsto{\mathcal{L}}_{\xi}F. To discover symmetries, we seek generators \xi\in\operatorname{Lie}(G) satisfying {\mathcal{L}}_{\xi}F=0 for a known function F. On the other hand, to enforce a connected group of symmetries, we seek functions F satisfying {\mathcal{L}}_{\xi_{i}}F=0 with known generators \xi_{1},\ldots,\xi_{q} of \operatorname{Lie}(G).

6 Enforcing symmetry with linear constraints

Methods to enforce symmetry in neural networks and other machine learning models have been studied extensively, as we reviewed briefly in Section 3.1. A unifying theme in these techniques has been the use of linear constraints to enforce symmetry (Finzi et al., 2021; Loiseau and Brunton, 2018; Weiler et al., 2018; Cohen et al., 2019; Ahmadi and Khadir, 2020). The purpose of this section is to show how several of these methods can be understood in terms of the fundamental operators and linear constraints provided by Theorem 3.

6.1 Multilayer perceptrons

Enforcing symmetry in multilayer perceptrons was studied by Finzi et al. (2021). They provide a practical method based on enforcing linear constraints on the weights defining each layer of a neural network. The network uses specialized nonlinearities that are automatically equivariant, meaning that the constraints need only be enforced on the linear component of each layer. We show that the constraints derived by Finzi et al. (2021) are the same as those given by Theorem 3.

Specifically, each linear layer F^{(l)}:{\mathcal{V}}_{l-1}\to{\mathcal{V}}_{l}, for l=1,\ldots,L, is defined by

F^{(l)}(x)=W^{(l)}x+b^{(l)}, (28)

where W^{(l)} are weight matrices and b^{(l)} are bias vectors. Defining group representations \Phi_{l}:G\to\operatorname{GL}({\mathcal{V}}_{l}) for each layer yields fundamental operators given by

{\mathcal{K}}_{g}F^{(l)}(x)-F^{(l)}(x)=\big(\Phi_{l}(g)W^{(l)}\Phi_{l-1}(g)^{-1}-W^{(l)}\big)x+\Phi_{l}(g)b^{(l)}-b^{(l)}, (29)

{\mathcal{L}}_{\xi}F^{(l)}(x)=\big(\phi_{l}(\xi)W^{(l)}-W^{(l)}\phi_{l-1}(\xi)\big)x+\phi_{l}(\xi)b^{(l)}. (30)

Let \{\xi_{i}\}_{i=1}^{q} generate \operatorname{Lie}(G) and let \{g_{j}\}_{j=1}^{n_{G}-1} consist of an element from each non-identity component of G. Using the fundamental operators and Theorem 3, it follows that the layer F^{(l)} is G-equivariant if and only if the weights and biases satisfy

\phi_{l}(\xi_{i})W^{(l)}=W^{(l)}\phi_{l-1}(\xi_{i}),\quad\mbox{and}\quad\Phi_{l}(g_{j})W^{(l)}=W^{(l)}\Phi_{l-1}(g_{j}), (31)

\phi_{l}(\xi_{i})b^{(l)}=0,\quad\mbox{and}\quad\Phi_{l}(g_{j})b^{(l)}=b^{(l)} (32)

for every i=1,\ldots,q and j=1,\ldots,n_{G}-1. These are the same as the linear constraints one derives using the method of Finzi et al. (2021). The equivariant linear layers are then combined with specialized equivariant nonlinearities \sigma^{(l)}:{\mathcal{V}}_{l}\to{\mathcal{V}}_{l} to produce an equivariant network

F=\sigma^{(L)}\circ F^{(L)}\circ\cdots\circ\sigma^{(1)}\circ F^{(1)}:{\mathcal{V}}_{0}\to{\mathcal{V}}_{L}. (33)

The composition of equivariant functions is equivariant, as one can easily check using Definition 2.
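As a concrete illustration (a sketch of the general recipe, not the authors' released code), the constraint (31) for a single generator can be solved numerically by vectorizing it with Kronecker products and computing an SVD nullspace. Here we take G=\operatorname{SO}(2) with \phi_{l}=\phi_{l-1} equal to the 2D rotation generator; the recovered basis spans the weights commuting with rotations, W=aI+bJ.

```python
import numpy as np

phi_in = np.array([[0.0, -1.0], [1.0, 0.0]])   # phi_{l-1}(xi): SO(2) generator
phi_out = phi_in                               # phi_l(xi): same representation
n_out, n_in = phi_out.shape[0], phi_in.shape[0]

# Row-major vectorization: vec(phi_out @ W) = kron(phi_out, I) vec(W) and
# vec(W @ phi_in) = kron(I, phi_in.T) vec(W), so (31) becomes C vec(W) = 0.
C = np.kron(phi_out, np.eye(n_in)) - np.kron(np.eye(n_out), phi_in.T)

_, s, Vt = np.linalg.svd(C)
null = Vt[s < 1e-10]                           # rows: vec of equivariant W's
basis = [v.reshape(n_out, n_in) for v in null]
print(len(basis))                              # 2: the span of I and J

# The bias constraint phi_l(xi) b = 0 in (32) is handled the same way;
# here phi_out has full rank, so the only equivariant bias is b = 0.
print(np.linalg.matrix_rank(phi_out))
```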

6.2 Neural operators acting on fields

Enforcing symmetry in neural networks acting on spatial fields has been studied extensively by Weiler et al. (2018); Cohen et al. (2018); Esteves et al. (2018); Kondor and Trivedi (2018); Cohen et al. (2019) among others. Many of these techniques use integral operators to define equivariant linear layers, which are coupled with equivariant nonlinearities, such as the gated nonlinearities proposed by Weiler et al. (2018). Networks built by composing integral operators with nonlinearities constitute a large class of “neural operators” described by Kovachki et al. (2023); Goswami et al. (2023); Boullé and Townsend (2023). The key task is to identify appropriate bases for equivariant kernels. For certain groups, such as the special Euclidean group G=\operatorname{SE}(3), bases can be constructed explicitly using spherical harmonics, as in Weiler et al. (2018). We show that equivariance with respect to arbitrary group actions can be enforced via linear constraints on the integral kernels derived using the fundamental operators introduced in Section 5. Appropriate bases of kernel functions can then be constructed numerically by computing an appropriate nullspace, as is done by Finzi et al. (2021) for multilayer perceptrons.

For the sake of simplicity, we consider integral operators acting on vector-valued functions F:\mathbb{R}^{m}\to{\mathcal{V}}, where {\mathcal{V}} is a finite-dimensional vector space. Later on, in Section 11.4, we study higher-order integral operators acting on sections of vector bundles. If {\mathcal{W}} is another finite-dimensional vector space, an integral operator acting on F to produce a new function \mathbb{R}^{n}\to{\mathcal{W}} is defined by

{\mathcal{T}}_{K}F(x)=\int_{\mathbb{R}^{m}}K(x,y)F(y)\operatorname{\mathrm{d}}y, (34)

where the “kernel” function K provides a linear map K(x,y):{\mathcal{V}}\to{\mathcal{W}} at each (x,y)\in\mathbb{R}^{n}\times\mathbb{R}^{m}. In other words, the kernel is a function on \mathbb{R}^{n}\times\mathbb{R}^{m} taking values in the tensor product space {\mathcal{W}}\otimes{\mathcal{V}}^{*}, where {\mathcal{V}}^{*} denotes the algebraic dual of {\mathcal{V}}. Many of the neural operator architectures described by Kovachki et al. (2023); Goswami et al. (2023); Boullé and Townsend (2023) are constructed by composing layers defined by integral operators (34) with nonlinear activation functions, usually acting pointwise. The kernel functions K are optimized during training of the neural operator.
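For intuition, a layer of this type can be discretized by quadrature on a grid; the following minimal sketch (scalar fields, one spatial dimension, rectangle-rule quadrature, all names our own) applies (34) to a compactly supported field:

```python
import numpy as np

# Quadrature discretization of the integral operator (34) for scalar
# fields on R (m = n = 1); the Gaussian kernel is an illustrative choice.
x = np.linspace(-5.0, 5.0, 401)               # output evaluation grid
y = np.linspace(-5.0, 5.0, 401)               # input/quadrature grid
dy = y[1] - y[0]

K = np.exp(-(x[:, None] - y[None, :]) ** 2)   # K(x_i, y_j); depends only on
                                              # x - y, so T_K commutes with
                                              # translations
F = np.where(np.abs(y) < 1.0, 1.0, 0.0)       # compactly supported field
TKF = K @ F * dy                              # (T_K F)(x_i), rectangle rule
```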

With group actions defined by representations on \mathbb{R}^{m},\mathbb{R}^{n},{\mathcal{V}},{\mathcal{W}}, functions F:\mathbb{R}^{m}\to{\mathcal{V}} transform according to

{\mathcal{K}}_{g}^{(\mathbb{R}^{m},{\mathcal{V}})}F(x)=\Phi_{{\mathcal{V}}}(g)F(\Phi_{\mathbb{R}^{m}}(g)^{-1}x) (35)

for g\in G. Likewise, functions \mathbb{R}^{n}\to{\mathcal{W}} transform via an analogous operator {\mathcal{K}}_{g}^{(\mathbb{R}^{n},{\mathcal{W}})}.

Definition 5

The integral operator {\mathcal{T}}_{K} in (34) is equivariant with respect to g\in G when

{\mathcal{K}}_{g}^{(\mathbb{R}^{n},{\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{(\mathbb{R}^{m},{\mathcal{V}})}={\mathcal{T}}_{K}. (36)

The elements g satisfying this equation form a subgroup of G denoted \operatorname{Sym}_{G}({\mathcal{T}}_{K}).

By changing variables in the integral, the operator on the left is given by

{\mathcal{K}}_{g}^{(\mathbb{R}^{n},{\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{(\mathbb{R}^{m},{\mathcal{V}})}F(x)=\int_{\mathbb{R}^{m}}{\mathcal{K}}_{g}K(x,y)F(y)\operatorname{\mathrm{d}}y, (37)

where

{\mathcal{K}}_{g}K(x,y)=\Phi_{{\mathcal{W}}}(g)K\big(\Phi_{\mathbb{R}^{n}}(g)^{-1}x,\Phi_{\mathbb{R}^{m}}(g)^{-1}y\big)\Phi_{{\mathcal{V}}}(g)^{-1}\det\big[\Phi_{\mathbb{R}^{m}}(g)^{-1}\big]. (38)

The following result provides equivariance conditions in terms of the kernel, generalizing Lemma 1 in Weiler et al. (2018).

Proposition 6

Let K be continuous and suppose that {\mathcal{T}}_{K} acts on a function space containing all smooth, compactly supported fields. Then

\operatorname{Sym}_{G}({\mathcal{T}}_{K})=\left\{g\in G\ :\ {\mathcal{K}}_{g}K=K\right\}. (39)

We give a proof in Appendix A.

The Lie derivative of a continuously differentiable kernel function is given by

{\mathcal{L}}_{\xi}K(x,y)=\phi_{{\mathcal{W}}}(\xi)K(x,y)-K(x,y)\phi_{{\mathcal{V}}}(\xi)-K(x,y)\operatorname{Tr}[\phi_{\mathbb{R}^{m}}(\xi)]-\frac{\partial K(x,y)}{\partial x}\phi_{\mathbb{R}^{n}}(\xi)x-\frac{\partial K(x,y)}{\partial y}\phi_{\mathbb{R}^{m}}(\xi)y. (40)

The operators {\mathcal{K}}_{g} and {\mathcal{L}}_{\xi} are the fundamental operators from Section 5 because the transformation law for the kernel can be written as

{\mathcal{K}}_{g}K=\Phi_{{\mathcal{W}}\otimes{\mathcal{V}}^{*}}(g)K\circ\Phi_{\mathbb{R}^{n}\times\mathbb{R}^{m}}(g)^{-1}, (41)

where

\Phi_{\mathbb{R}^{n}\times\mathbb{R}^{m}}(g):(x,y)\mapsto\left(\Phi_{\mathbb{R}^{n}}(g)x,\Phi_{\mathbb{R}^{m}}(g)y\right)\qquad\mbox{and}\qquad\Phi_{{\mathcal{W}}\otimes{\mathcal{V}}^{*}}(g):T\mapsto\Phi_{{\mathcal{W}}}(g)T\Phi_{{\mathcal{V}}}(g)^{-1}\det\big[\Phi_{\mathbb{R}^{m}}(g)^{-1}\big] (42)

are representations of G in \mathbb{R}^{n}\times\mathbb{R}^{m} and {\mathcal{W}}\otimes{\mathcal{V}}^{*}.

As an immediate consequence of Theorem 3, we have the following corollary establishing linear constraints for the kernel to produce an equivariant integral operator.

Corollary 7

Let {ξi}i=1q\{\xi_{i}\}_{i=1}^{q} generate the Lie algebra Lie(G)\operatorname{Lie}(G) and let {gj}j=1nG1\{g_{j}\}_{j=1}^{n_{G}-1} contain one element from each non-identity component of GG. Under the same hypotheses as Proposition 6 and assuming KK is continuously differentiable, the integral operator 𝒯K{\mathcal{T}}_{K} in (34) is GG-equivariant in the sense of Definition 5 if and only if

ξiK=0and𝒦gjKK=0{\mathcal{L}}_{\xi_{i}}K=0\qquad\mbox{and}\qquad{\mathcal{K}}_{g_{j}}K-K=0 (43)

for every i=1,,qi=1,\ldots,q and every j=1,,nG1j=1,\ldots,n_{G}-1.

These linear constraint equations must be satisfied in order to enforce equivariance with respect to a known symmetry group GG in the machine learning process. By discretizing the operators 𝒦g{\mathcal{K}}_{g} and ξ{\mathcal{L}}_{\xi}, as discussed later in Section 10, one can solve these constraints numerically to construct a basis of kernel functions for equivariant integral operators.
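As a sketch of this numerical procedure, suppose the operators ℒ_{ξ_i} and 𝒦_{g_j} − I from (43) have already been discretized as matrices acting on the coefficient vector of a kernel in a chosen basis (Section 10 describes how such discretizations can be constructed). A basis for the admissible kernels is then a numerical nullspace computation; the helper below is our own illustration:

```python
import numpy as np

def equivariant_kernel_basis(constraint_mats, tol=1e-10):
    """Columns span the coefficient vectors of kernels satisfying (43).

    constraint_mats : list of matrices discretizing the operators
        L_{xi_i} and (K_{g_j} - I) on the kernel coefficient space.
    """
    A = np.vstack(constraint_mats)           # stack all linear constraints
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    return Vt[rank:].T                       # right singular vectors spanning Null(A)
```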

As an immediate consequence of Theorem 4, the following result shows that the Lie derivative of the kernel encodes the continuous symmetries of a given integral operator.

Corollary 8

Under the same hypotheses as Proposition 6 and assuming KK is continuously differentiable, it follows that SymG(𝒯K)\operatorname{Sym}_{G}({\mathcal{T}}_{K}) is a closed, embedded Lie subgroup of GG with Lie subalgebra

𝔰𝔶𝔪G(𝒯K)={ξLie(G):ξK=0}.\operatorname{\mathfrak{sym}}_{G}({\mathcal{T}}_{K})=\left\{\xi\in\operatorname{Lie}(G)\ :\ {\mathcal{L}}_{\xi}K=0\right\}. (44)

This result will be useful for methods that promote symmetry of the integral operator, as we describe later in Section 8.

7 Discovering symmetry by computing nullspaces

In this section we show that in a wide range of settings, the continuous symmetries of a manifold, point cloud, or map can be recovered by computing the nullspace of a linear operator. For functions, this is already covered by Theorem 4, which allows us to compute the connected subgroup of symmetries by identifying its Lie subalgebra

𝔰𝔶𝔪G(F)=Null(LF)\operatorname{\mathfrak{sym}}_{G}(F)=\operatorname{Null}(L_{F}) (45)

where LF:ξξL_{F}:\xi\mapsto{\mathcal{L}}_{\xi} is the linear operator defined by (27). Hence, if a machine learning model FF has a symmetry group SymG(F)\operatorname{Sym}_{G}(F), then its Lie algebra is equal to the nullspace of LFL_{F}.

This section explains how this is actually a special case of a more general result allowing us to reveal the symmetries of submanifolds via the nullspace of a closely related operator. We begin with the more general case where we study the symmetries of a submanifold of Euclidean space, and we explain how to recover symmetries from point clouds approximating submanifolds. The Lie derivative described in Section 5 is then recovered when the submanifold is the graph of a function. We also briefly describe how the fundamental operators from Section 5 can be used to recover symmetries and conservation laws of dynamical systems.

7.1 Symmetries of submanifolds

We begin by studying the symmetries of smooth submanifolds {\mathcal{M}} of Euclidean space d\mathbb{R}^{d} using an approach similar to Cahill et al. (2023). However, we use a different operator that generalizes more naturally to nonlinear group actions on arbitrary manifolds (see Section 12) and recovers the Lie derivative (see Section 7.2). With right action θ:d×Gd\theta:\mathbb{R}^{d}\times G\to\mathbb{R}^{d} of a Lie group, we define invariance of a submanifold as follows:

Definition 9

A submanifold d{\mathcal{M}}\subset\mathbb{R}^{d} is invariant with respect to a group element gGg\in G if

θg(z)\theta_{g}(z)\in{\mathcal{M}} (46)

for every zz\in{\mathcal{M}}. These elements form a subgroup of GG denoted SymG()\operatorname{Sym}_{G}({\mathcal{M}}).

The subgroup of symmetries of a submanifold is characterized by the following theorem.

Theorem 10

Let {\mathcal{M}} be a smooth, closed, embedded submanifold of d\mathbb{R}^{d}. Then SymG()\operatorname{Sym}_{G}({\mathcal{M}}) is a closed, embedded Lie subgroup of GG whose Lie subalgebra is

𝔰𝔶𝔪G()={ξLie(G):θ^(ξ)zTzz}.\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})=\{\xi\in\operatorname{Lie}(G)\ :\ \hat{\theta}(\xi)_{z}\in T_{z}{\mathcal{M}}\quad\forall z\in{\mathcal{M}}\}. (47)

This is a special case of Theorem 35.

The meaning of this result and its practical use for detecting symmetry are illustrated in Figure 3.

To reveal the connected component of SymG()\operatorname{Sym}_{G}({\mathcal{M}}), we let Pz:ddP_{z}:\mathbb{R}^{d}\to\mathbb{R}^{d} be a family of linear projections onto TzdT_{z}{\mathcal{M}}\subset\mathbb{R}^{d}. These are assumed to vary continuously with respect to zz\in{\mathcal{M}}. Then under the assumptions of the above theorem, 𝔰𝔶𝔪G()\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}}) is the nullspace of the symmetric, positive-semidefinite operator S:Lie(G)Lie(G)S_{{\mathcal{M}}}:\operatorname{Lie}(G)\to\operatorname{Lie}(G) defined by

η,SξLie(G)=θ^(η)zT(IPz)T(IPz)θ^(ξ)zdμ(z)\big{\langle}\eta,\ S_{{\mathcal{M}}}\xi\big{\rangle}_{\operatorname{Lie}(G)}=\int_{{\mathcal{M}}}\hat{\theta}(\eta)_{z}^{T}(I-P_{z})^{T}(I-P_{z})\hat{\theta}(\xi)_{z}\ \operatorname{\mathrm{d}}\mu(z) (48)

for every η,ξLie(G)\eta,\xi\in\operatorname{Lie}(G). We see in Figure 3 that (IPz)θ^(ξ)z(I-P_{z})\hat{\theta}(\xi)_{z} measures the component of the infinitesimal generator not tangent to the submanifold at zz. Here, μ\mu is any strictly positive measure on {\mathcal{M}} that makes all of these integrals finite. The above formula is useful for computing the matrix of SS_{{\mathcal{M}}} in an orthonormal basis for Lie(G)\operatorname{Lie}(G).

Figure 3: Tangency of infinitesimal generators and symmetries of submanifolds. The infinitesimal generator θ^(ξ)\hat{\theta}(\xi) is everywhere tangent to the submanifold {\mathcal{M}} if and only if the curves tθexp(tξ)(z)t\mapsto\theta_{\exp(t\xi)}(z), with zz\in{\mathcal{M}}, lie in {\mathcal{M}} for all tt. The Lie algebra elements ξ\xi satisfying this tangency condition form the Lie subalgebra of symmetries of {\mathcal{M}}. To test for tangency of the infinitesimal generator we use a family of projections PzP_{z} onto the tangent spaces TzT_{z}{\mathcal{M}} for every zz\in{\mathcal{M}}. Specifically, (IPz)θ^(ξ)z(I-P_{z})\hat{\theta}(\xi)_{z} is the component of the infinitesimal generator at zz that does not lie tangent to {\mathcal{M}}. Hence, ξ\xi generates a symmetry of {\mathcal{M}} if and only if (IPz)θ^(ξ)z=0(I-P_{z})\hat{\theta}(\xi)_{z}=0 for all zz\in{\mathcal{M}}.

Alternatively, when the dimension of GG is large, one can compute the nullspace using a Krylov algorithm such as the one described in Finzi et al. (2021). Such algorithms rely solely on queries of SS_{{\mathcal{M}}} acting on vectors ξLie(G)\xi\in\operatorname{Lie}(G). When θg(z)=Φ(g1)z\theta_{g}(z)=\Phi(g^{-1})z and θ^(ξ)z=ϕ(ξ)z\hat{\theta}(\xi)_{z}=-\phi(\xi)z are given by a Lie group representation (see Section 4.2), then the operator defined in (48) is given explicitly by

Sξ=dΦ(e)[(IPz)T(IPz)ϕ(ξ)zzT]dμ(z),S_{{\mathcal{M}}}\xi=\int_{{\mathcal{M}}}\operatorname{\mathrm{d}}\Phi(e)^{*}\left[(I-P_{z})^{T}(I-P_{z})\phi(\xi)zz^{T}\right]\ \operatorname{\mathrm{d}}\mu(z), (49)

where dΦ(e):d×dLie(G)\operatorname{\mathrm{d}}\Phi(e)^{*}:\mathbb{R}^{d\times d}\to\operatorname{Lie}(G) is the adjoint of dΦ(e):Lie(G)d×d\operatorname{\mathrm{d}}\Phi(e):\operatorname{Lie}(G)\to\mathbb{R}^{d\times d}. If Gd×dG\subset\mathbb{R}^{d\times d} is a matrix Lie group and Φ\Phi is the identity representation, then dΦ(e)\operatorname{\mathrm{d}}\Phi(e) is the injection Lie(G)d×d\operatorname{Lie}(G)\hookrightarrow\mathbb{R}^{d\times d}. When Lie(G)d×d\operatorname{Lie}(G)\subset\mathbb{R}^{d\times d} inherits its inner product from d×d\mathbb{R}^{d\times d}, then dΦ(e)\operatorname{\mathrm{d}}\Phi(e)^{*} is the orthogonal projection of d×d\mathbb{R}^{d\times d} onto Lie(G)\operatorname{Lie}(G). For example, if Φ\Phi is the identity representation of SE(d)\operatorname{SE}(d) in d+1\mathbb{R}^{d+1} with the inner product on 𝔰𝔢(d)(d+1)×(d+1)\operatorname{\mathfrak{se}}(d)\subset\mathbb{R}^{(d+1)\times(d+1)} given by the usual inner product of matrices M1,M2=Tr(M1TM2)\langle M_{1},M_{2}\rangle=\operatorname{Tr}(M_{1}^{T}M_{2}), then it can be readily verified that

dΦ(e)([AbcTa])=[12(AAT)b00],Ad×d,bd,cd,a.\operatorname{\mathrm{d}}\Phi(e)^{*}\left(\begin{bmatrix}A&b\\ c^{T}&a\end{bmatrix}\right)=\begin{bmatrix}\frac{1}{2}(A-A^{T})&b\\ 0&0\end{bmatrix},\qquad A\in\mathbb{R}^{d\times d},\ b\in\mathbb{R}^{d},\ c\in\mathbb{R}^{d},\ a\in\mathbb{R}. (50)
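For instance, the map (50) simply skew-symmetrizes the rotational block, retains the translation column, and zeros the remainder; a minimal NumPy sketch (the function name is ours) is:

```python
import numpy as np

def se_projection(M):
    """Orthogonal projection of a (d+1) x (d+1) matrix onto se(d), per (50)."""
    d = M.shape[0] - 1
    out = np.zeros_like(M)
    A = M[:d, :d]
    out[:d, :d] = 0.5 * (A - A.T)   # skew-symmetric part of the rotational block
    out[:d, d] = M[:d, d]           # translation component b is kept
    return out                      # last row (c^T, a) is projected to zero
```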

In practice, one can use sample points ziz_{i} on the manifold to obtain a Monte-Carlo estimate of SS_{{\mathcal{M}}} with approximate projections PziP_{z_{i}} computed using local principal component analysis (PCA), as described in Cahill et al. (2023). More accurate estimates of the tangent spaces can be obtained using the methods in Berry and Giannakis (2020). Assuming the PziP_{z_{i}} are accurate, the following proposition shows that the correct Lie subalgebra of symmetries is revealed using finitely many sample points ziz_{i}. However, this result does not tell us how many samples to use, or even when to stop sampling.
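The following self-contained sketch illustrates this pipeline, and the finite-sample estimator in Proposition 11 below, for a point cloud on the unit circle in ℝ² with the candidate algebra 𝔤𝔩(2) acting by θ̂(ξ)_z = −ξz; the sample size, neighborhood size, and tolerances are illustrative choices rather than recommendations. The single near-zero eigenvalue recovers the rotation generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Point cloud approximating the submanifold M: the unit circle in R^2.
angles = rng.uniform(0, 2 * np.pi, 400)
Z = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Candidate Lie algebra gl(2), with generators theta_hat(xi)_z = -xi z.
basis = [np.array([[1., 0.], [0., 0.]]), np.array([[0., 1.], [0., 0.]]),
         np.array([[0., 0.], [1., 0.]]), np.array([[0., 0.], [0., 1.]])]

def tangent_projection(Z, i, k=10):
    """Rank-1 projection onto the tangent line at Z[i] via local PCA."""
    d2 = np.sum((Z - Z[i]) ** 2, axis=1)
    nbrs = Z[np.argsort(d2)[1:k + 1]]              # k nearest neighbors
    U, _, _ = np.linalg.svd((nbrs - nbrs.mean(axis=0)).T)
    t = U[:, 0]                                    # leading principal direction
    return np.outer(t, t)

m = len(Z)
S = np.zeros((4, 4))
for i in range(m):
    Q = np.eye(2) - tangent_projection(Z, i)       # normal projection I - P_z
    gens = [-xi @ Z[i] for xi in basis]            # infinitesimal generators at z_i
    for a in range(4):
        for b in range(4):
            S[a, b] += gens[a] @ Q @ gens[b] / m   # Monte-Carlo estimate of (48)

evals, evecs = np.linalg.eigh(S)
print(np.round(evals, 6))        # exactly one eigenvalue near zero
print(np.round(evecs[:, 0], 3))  # approx. proportional to (0, -1, 1, 0): rotations
```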

Proposition 11

Let μ\mu be a strictly positive probability measure on a smooth manifold {\mathcal{M}} such that ξ,Sξ<\langle\xi,S_{{\mathcal{M}}}\xi\rangle<\infty for every ξLie(G)\xi\in\operatorname{Lie}(G). Let ziz_{i} be drawn independently from the distribution μ\mu and let Sm:Lie(G)Lie(G)S_{m}:\operatorname{Lie}(G)\to\operatorname{Lie}(G) be defined by

η,SmξLie(G)=1mi=1mθ^(η)ziT(IPzi)T(IPzi)θ^(ξ)zi.\big{\langle}\eta,\ S_{m}\xi\big{\rangle}_{\operatorname{Lie}(G)}=\frac{1}{m}\sum_{i=1}^{m}\hat{\theta}(\eta)_{z_{i}}^{T}(I-P_{z_{i}})^{T}(I-P_{z_{i}})\hat{\theta}(\xi)_{z_{i}}. (51)

Then there is almost surely an integer M0M_{0} such that for every mM0m\geq M_{0} we have Null(Sm)=𝔰𝔶𝔪G()\operatorname{Null}(S_{m})=\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}}). We provide a proof in Appendix A.

7.2 Symmetries of functions as symmetries of submanifolds

The method described above for studying symmetries of submanifolds can be applied to reveal the symmetries of smooth maps between vector spaces by identifying the map F:𝒱𝒲F:{\mathcal{V}}\to{\mathcal{W}} with its graph

gr(F)={(x,F(x))𝒱×𝒲:x𝒱}.\operatorname{gr}(F)=\{(x,F(x))\in{\mathcal{V}}\times{\mathcal{W}}\ :\ x\in{\mathcal{V}}\}. (52)

The graph is a smooth, closed, embedded submanifold of the space 𝒱×𝒲{\mathcal{V}}\times{\mathcal{W}} by Proposition 5.7 in Lee (2013). We show that this approach recovers the Lie derivative and our result in Theorem 4. By choosing bases for the domain and codomain, it suffices to consider smooth functions F:mnF:\mathbb{R}^{m}\to\mathbb{R}^{n}.

Supposing that we have representations Φm\Phi_{\mathbb{R}^{m}} and Φn\Phi_{\mathbb{R}^{n}} of GG in the domain and codomain, we consider a combined representation

Φ:g[Φm(g)00Φn(g)].\Phi:g\mapsto\begin{bmatrix}\Phi_{\mathbb{R}^{m}}(g)&0\\ 0&\Phi_{\mathbb{R}^{n}}(g)\end{bmatrix}. (53)

Defining a smoothly-varying family of projections

P(x,F(x))=[I0dF(x)0]P_{(x,F(x))}=\begin{bmatrix}I&0\\ \operatorname{\mathrm{d}}F(x)&0\end{bmatrix} (54)

onto T(x,F(x))gr(F)T_{(x,F(x))}\operatorname{gr}(F), it is easy to check that

[0ξF(x)]=([I00I][I0dF(x)0])IP(x,F(x))[ϕm(ξ)00ϕn(ξ)]ϕ(ξ)[xF(x)].\begin{bmatrix}0\\ {\mathcal{L}}_{\xi}F(x)\end{bmatrix}=\underbrace{\left(\begin{bmatrix}I&0\\ 0&I\end{bmatrix}-\begin{bmatrix}I&0\\ \operatorname{\mathrm{d}}F(x)&0\end{bmatrix}\right)}_{I-P_{(x,F(x))}}\underbrace{\begin{bmatrix}\phi_{\mathbb{R}^{m}}(\xi)&0\\ 0&\phi_{\mathbb{R}^{n}}(\xi)\end{bmatrix}}_{\phi(\xi)}\begin{bmatrix}x\\ F(x)\end{bmatrix}. (55)

We note that this is a special case of Theorem 39 describing the Lie derivative in terms of a projection onto the tangent space of a function’s graph. The resulting operator Sgr(F)S_{\operatorname{gr}(F)} defined by (48) is given by

η,Sgr(F)ξLie(G)=m(ηF(x))TξF(x)dμ(x),\left\langle\eta,S_{\operatorname{gr}(F)}\xi\right\rangle_{\operatorname{Lie}(G)}=\int_{\mathbb{R}^{m}}({\mathcal{L}}_{\eta}F(x))^{T}{\mathcal{L}}_{\xi}F(x)\ \operatorname{\mathrm{d}}\mu(x), (56)

for η,ξLie(G)\eta,\xi\in\operatorname{Lie}(G) and an appropriate positive measure μ\mu on m\mathbb{R}^{m} that makes the integrals finite. Therefore, Theorem 4 is recovered from our result about symmetries of submanifolds stated in Theorem 10.

Related quantities have been used to study the symmetries of trained neural networks, with FF being the network and its derivatives computed via back-propagation. The quantity ξ,Sgr(F)ξLie(G)=ξFL2(μ)2\left\langle\xi,S_{\operatorname{gr}(F)}\xi\right\rangle_{\operatorname{Lie}(G)}=\|{\mathcal{L}}_{\xi}F\|_{L^{2}(\mu)}^{2} was used by Gruver et al. (2022) to construct the Local Equivariant Error (LEE), measuring the extent to which a trained neural network FF fails to respect symmetries in the one-parameter group {exp(tξ)}t\{\exp(t\xi)\}_{t\in\mathbb{R}}. The nullspace of ξξF\xi\mapsto{\mathcal{L}}_{\xi}F in the special case where Φn(g)=I\Phi_{\mathbb{R}^{n}}(g)=I acts trivially was used by Moskalev et al. (2022) to identify the connected subgroup with respect to which a given network is invariant.

By viewing a function as a submanifold, we obtain a simple data-driven technique for estimating the Lie derivative and subgroup of symmetries of the function. To approximate ξF{\mathcal{L}}_{\xi}F, Sgr(F)S_{\operatorname{gr}(F)}, and 𝔰𝔶𝔪G(F)\operatorname{\mathfrak{sym}}_{G}(F) using input-output pairs (xi,yi=F(xi))(x_{i},y_{i}=F(x_{i})), one simply needs to approximate the projection in (54) using these data. To do this, we can obtain matrices UiU_{i} with mm columns spanning T(xi,yi)gr(F)T_{(x_{i},y_{i})}\operatorname{gr}(F) by applying local PCA to the data zi=(xi,yi)z_{i}=(x_{i},y_{i}), or by pruning the frames computed in Berry and Giannakis (2020). With E=[Im×m0m×n]E=\begin{bmatrix}I_{m\times m}&0_{m\times n}\end{bmatrix} the projection in (54) is given by

Pzi=Ui(EUi)1EP_{z_{i}}=U_{i}(EU_{i})^{-1}E (57)

because any projection is uniquely determined by its range and nullspace (see Section 5.9 of Meyer (2000)). This gives us a simple way to approximate (ξF)(zi)({\mathcal{L}}_{\xi}F)(z_{i}), Sgr(F)S_{\operatorname{gr}(F)}, and 𝔰𝔶𝔪G(F)\operatorname{\mathfrak{sym}}_{G}(F) using the input-output pairs. However, many such pairs are needed since the tangent space to the graph of FF at xix_{i} is well-approximated by local PCA only when there are at least mm neighboring samples sufficiently close to xix_{i}. Even more samples are needed when they are noisy. The convergence properties of the spectral methods in Berry and Giannakis (2020) are better, but they still require enough samples to obtain accurate Monte-Carlo or quadrature-based estimates of integrals, in this case over m\mathbb{R}^{m}.
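A minimal sketch of the projection (57), assuming a frame U for the estimated tangent space has already been computed from the data, is:

```python
import numpy as np

def graph_projection(U, m):
    """Projection (57) with range = span of the columns of U and
    nullspace {0} x R^n.

    U : ((m + n) x m) frame for the tangent space of gr(F) at (x_i, y_i),
        e.g. obtained from local PCA on nearby input-output pairs.
    """
    E = np.hstack([np.eye(m), np.zeros((m, U.shape[0] - m))])
    return U @ np.linalg.solve(E @ U, E)
```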

7.3 Symmetries and conservation laws of dynamical systems

Here, we consider the case when F:nnF:\mathbb{R}^{n}\to\mathbb{R}^{n} is a smooth function defining a dynamical system

ddtx(t)=F(x(t))\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}x(t)=F(x(t)) (58)

with state variables x(t)nx(t)\in\mathbb{R}^{n}. The solution of this equation is described by the flow map Fl:(t,x(τ))x(τ+t)\operatorname{Fl}:(t,x(\tau))\mapsto x(\tau+t), which is defined on a maximal connected open set DD containing {0}×n\{0\}\times\mathbb{R}^{n}. We often write Flt()=Fl(t,)\operatorname{Fl}^{t}(\cdot)=\operatorname{Fl}(t,\cdot). Given a Lie group representation Φ:GGL(n)\Phi:G\to\operatorname{GL}(\mathbb{R}^{n}), equivariance for the dynamical system is defined as follows:

Definition 12

The dynamical system in (58) is equivariant with respect to a group element gGg\in G if the flow map satisfies

𝒦gFlt(x):=Φ(g)Flt(Φ(g)1x)=Flt(x){\mathcal{K}}_{g}\operatorname{Fl}^{t}(x):=\Phi(g)\operatorname{Fl}^{t}(\Phi(g)^{-1}x)=\operatorname{Fl}^{t}(x) (59)

for every (t,x)D(t,x)\in D.

Differentiating at t=0t=0 shows that equivariance of the dynamical system implies that FF is equivariant in the sense of Definition 2. The converse is also true thanks to Corollary 9.14 in Lee (2013), meaning that equivariance for the dynamical system is equivalent to equivariance of FF. Therefore, we can study equivariance of the dynamical system in (58) by directly applying the tools developed in Section 5 to the function FF. Thanks to Theorem 4, identifying the connected subgroup of symmetries for the dynamical system is a simple matter of computing the nullspace of the linear map ξξF\xi\mapsto{\mathcal{L}}_{\xi}F, that is

𝔰𝔶𝔪G(F)={ξLie(G):ξF=0}.\operatorname{\mathfrak{sym}}_{G}(F)=\{\xi\in\operatorname{Lie}(G)\ :\ {\mathcal{L}}_{\xi}F=0\}. (60)

Here, the Lie derivative is given by

ξF(x)=ϕ(ξ)F(x)F(x)xϕ(ξ)x=[θ^(ξ),F](x),{\mathcal{L}}_{\xi}F(x)=\phi(\xi)F(x)-\frac{\partial F(x)}{\partial x}\phi(\xi)x=[\hat{\theta}(\xi),F](x), (61)

where [θ^(ξ),F][\hat{\theta}(\xi),F] is the Lie bracket of the infinitesimal generator defined by θ^(ξ)x=ϕ(ξ)x\hat{\theta}(\xi)_{x}=-\phi(\xi)x and the vector field FF. Symmetries can also be enforced as linear constraints on FF described by Theorem 3. This was done by Ahmadi and Khadir (2020) for polynomial dynamical systems with discrete symmetries. Later on in Section 11.1 we show that analogous results apply to dynamical systems defined by vector fields on manifolds and nonlinear Lie group actions.
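As a small numerical check of (61), consider the cubic vector field F(x) = ‖x‖²Jx on ℝ², where J generates rotations: the Lie derivative vanishes along J but not along the scaling direction. The example and the finite-difference Jacobian below are our own illustration:

```python
import numpy as np

J = np.array([[0., -1.], [1., 0.]])   # so(2) generator

def F(x):
    # Cubic vector field F(x) = ||x||^2 J x, equivariant under rotations.
    return (x @ x) * (J @ x)

def lie_derivative(F, xi, x, eps=1e-6):
    """(61): L_xi F(x) = phi(xi) F(x) - dF(x) phi(xi) x, with phi(xi) = xi
    and the Jacobian dF(x) estimated by central finite differences."""
    n = len(x)
    dF = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        dF[:, j] = (F(x + e) - F(x - e)) / (2 * eps)
    return xi @ F(x) - dF @ (xi @ x)

x = np.array([0.7, -0.3])
print(lie_derivative(F, J, x))          # ~0: rotations are symmetries
print(lie_derivative(F, np.eye(2), x))  # nonzero: scaling symmetry is broken
```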

A conserved quantity for the system in (58) is defined as follows:

Definition 13

A scalar-valued quantity f:nf:\mathbb{R}^{n}\to\mathbb{R} is said to be conserved when

𝒦tf(x):=f(Flt(x))=f(x)(t,x)D.{\mathcal{K}}_{t}f(x):=f(\operatorname{Fl}^{t}(x))=f(x)\qquad\forall(t,x)\in D. (62)

In this setting, the composition operators 𝒦t{\mathcal{K}}_{t} are often referred to as Koopman operators (see Koopman (1931); Mezić (2005); Mauroy et al. (2020); Otto and Rowley (2021); Brunton et al. (2022)). It is easy to see that a smooth function ff is conserved if and only if

Ff:=ddt|t=0𝒦tf=fxF=0.{\mathcal{L}}_{F}f:=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{t}f=\frac{\partial f}{\partial x}F=0. (63)

This relation is used by Kaiser et al. (2018, 2021) to identify conserved quantities by computing the nullspace of F{\mathcal{L}}_{F} restricted to finite-dimensional spaces of candidate functions. When the flow is defined for all tt\in\mathbb{R}, the operators 𝒦t{\mathcal{K}}_{t} and F{\mathcal{L}}_{F} are the fundamental operators from Section 5 for the right action θ(x,t)=Flt(x)\theta(x,t)=\operatorname{Fl}^{t}(x) and representation Φ(t)=I\Phi(t)=I of the Lie group G=(,+)G=(\mathbb{R},+).
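A minimal version of this computation, using an illustrative monomial dictionary for the harmonic oscillator (our own example), recovers the energy x₁² + x₂² as the nullspace direction of the discretized operator ℒ_F:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 2))   # sample points in the state space

def F(x):                              # harmonic oscillator: x1' = x2, x2' = -x1
    return np.array([x[1], -x[0]])

# Dictionary of candidate conserved quantities and their gradients.
names = ["x1", "x2", "x1^2", "x1*x2", "x2^2"]
grads = [lambda x: np.array([1., 0.]),
         lambda x: np.array([0., 1.]),
         lambda x: np.array([2 * x[0], 0.]),
         lambda x: np.array([x[1], x[0]]),
         lambda x: np.array([0., 2 * x[1]])]

# A[i, j] = (L_F D_j)(x_i) = grad D_j(x_i) . F(x_i), discretizing (63).
A = np.array([[g(x) @ F(x) for g in grads] for x in X])

_, s, Vt = np.linalg.svd(A)
c = Vt[-1]                             # nullspace direction = conserved quantity
print(dict(zip(names, np.round(c / np.abs(c).max(), 3))))
# expected: equal weights on x1^2 and x2^2, i.e., the energy x1^2 + x2^2
```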

Remark 14

For Hamiltonian dynamical systems Noether’s theorem establishes a remarkable equivalence between the symmetries of the Hamiltonian and conserved quantities of the system. We study Hamiltonian systems later in Section 11.3.

8 Promoting symmetry with convex penalties

In this section we show how to design custom convex regularization functions to promote symmetries within a given candidate group during training of a machine learning model. This allows us to train a model with as many symmetries as possible from among the candidates, while breaking candidate symmetries only when the data provides sufficient evidence. We study both discrete and continuous groups of candidate symmetries. We quantify the extent to which symmetries within the candidate group are broken using the fundamental operators described in Section 5. For discrete groups we use the transformation operators {𝒦g}gG\{{\mathcal{K}}_{g}\}_{g\in G} and for continuous groups we use the Lie derivatives {ξ}ξLie(G)\{{\mathcal{L}}_{\xi}\}_{\xi\in\operatorname{Lie}(G)}. In the continuous case we penalize a convex relaxation of the codimension of the subgroup of symmetries given by a nuclear norm (Schatten 11-norm) of the operator ξξF\xi\mapsto{\mathcal{L}}_{\xi}F defined by (27); minimizing this codimension via the proxy nuclear norm will promote the largest nullspace possible, and hence the largest admissible symmetry group. Once these regularization functions are developed abstractly in Sections 8.1 and 8.2, we show how the approach can be applied to basis function regression (Section 8.3), symmetric function recovery (Section 9), and neural networks (Section 8.4).

As in Section 5, the basic building blocks of the machine learning models we consider are continuously differentiable (C1C^{1}) functions F:𝒱𝒲F:{\mathcal{V}}\to{\mathcal{W}} between finite-dimensional vector spaces. While we consider this restricted setting here, our results readily generalize to sections of vector bundles, as we describe later in Section 11. These functions could be layers of a multilayer perceptron, integral kernels to be applied to spatio-temporal fields, or simply linear combinations of user-specified basis functions in a regression task. We consider parametric models where FF is constrained to lie in a given finite-dimensional subspace C1(𝒱;𝒲){\mathcal{F}}\subset C^{1}({\mathcal{V}};{\mathcal{W}}) of continuously differentiable functions. Working within a finite-dimensional subspace of functions will allow us to discretize the fundamental operators in Section 10.

We consider the same setting as Section 5, i.e., candidate symmetries are described by a Lie group GG acting on the domain and codomain of functions FF\in{\mathcal{F}} via a right action θ:𝒱×G𝒱\theta:{\mathcal{V}}\times G\to{\mathcal{V}} and a representation Φ:GGL(𝒲)\Phi:G\to\operatorname{GL}({\mathcal{W}}). Equivariance in this setting is described by Definition 2. When fitting the function FF to data, our regularization functions penalize the size of GSymG(F)G\setminus\operatorname{Sym}_{G}(F). For reasons that will become clear, we use different penalties corresponding to different notions of “size” when GG is a discrete group versus when GG is continuous. The main result describing the continuous symmetries of FF is Theorem 4.

8.1 Discrete symmetries

When the group GG has finitely many elements, one can measure the size of GSymG(F)G\setminus\operatorname{Sym}_{G}(F) simply by counting its elements:

RG,0(F)=|GSymG(F)|.R_{G,0}(F)=|G\setminus\operatorname{Sym}_{G}(F)|. (64)

However, this penalty is impractical for optimization owing to its discrete values and nonconvexity. Letting \|\cdot\| be any norm on the space ′′=span{𝒦gF:gG,F}{\mathcal{F}}^{\prime\prime}=\operatorname{span}\{{\mathcal{K}}_{g}F\ :\ g\in G,\ F\in{\mathcal{F}}\} yields a convex relaxation of the above penalty given by

RG,1(F)=gG𝒦gFF.R_{G,1}(F)=\sum_{g\in G}\|{\mathcal{K}}_{g}F-F\|. (65)

This is a convex function on {\mathcal{F}} because 𝒦g{\mathcal{K}}_{g} is a linear operator and vector space norms are convex. For example, if 𝒄=(c1,,cN){\boldsymbol{c}}=(c_{1},\ldots,c_{N}) are the coefficients of FF in a basis for ′′{\mathcal{F}}^{\prime\prime} and 𝑲g{\boldsymbol{K}}_{g} is the matrix of 𝒦g{\mathcal{K}}_{g} in this basis, then the Euclidean norm can be used to define

RG,1(F)=gG𝑲g𝒄𝒄2.R_{G,1}(F)=\sum_{g\in G}\|{\boldsymbol{K}}_{g}{\boldsymbol{c}}-{\boldsymbol{c}}\|_{2}. (66)

This is directly analogous to the group sparsity penalty proposed in Yuan and Lin (2006).
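A sketch of the penalty (66) in CVXPY, assuming the matrices 𝑲_g have been precomputed as in Section 10 (the helper function is ours):

```python
import cvxpy as cp

def discrete_symmetry_penalty(c, K_mats):
    """Convex penalty (66): sum over group elements of ||K_g c - c||_2.

    c      : cvxpy Variable of coefficients of F in a basis for F''
    K_mats : list of matrices of the operators K_g in that basis
    """
    return sum(cp.norm(Kg @ c - c, 2) for Kg in K_mats)
```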

8.2 Continuous symmetries

We now consider the case where GG is a Lie group of dimension greater than zero. Here we use the dimension of SymG(F)\operatorname{Sym}_{G}(F) to measure the symmetry of FF, seeking to penalize the complementary dimension or “codimension”, given by

RG,0(F)=codim(SymG(F))=dim(G)dim(SymG(F)).R_{G,0}(F)=\operatorname{codim}(\operatorname{Sym}_{G}(F))=\dim(G)-\dim(\operatorname{Sym}_{G}(F)). (67)

We take this approach in the continuous case because it is no longer possible to simply count the number of broken symmetries. While it is possible in principle to replace the sum in (65) by an integral of 𝒦gFF\|{\mathcal{K}}_{g}F-F\| over gGg\in G, the numerical quadrature required to approximate it becomes prohibitive for higher-dimensional candidate groups. This difficulty is exacerbated by the fact that the integrand is not smooth. The space ′′{\mathcal{F}}^{\prime\prime} can also become infinite-dimensional when GG has positive dimension, making it challenging to compute the norm \|\cdot\|.

In contrast, it is much easier to measure the “size” of a continuous symmetry group using its dimension because this can be computed via linear algebra. Specifically, the dimension of SymG(F)\operatorname{Sym}_{G}(F) is equal to that of its Lie algebra. Thanks to Theorem 4, this is the nullspace of a linear operator LF:Lie(G)C0(𝒱;𝒲)L_{F}:\operatorname{Lie}(G)\to C^{0}({\mathcal{V}};{\mathcal{W}}) defined by

LF:ξξF,L_{F}:\xi\mapsto{\mathcal{L}}_{\xi}F, (68)

where ξ{\mathcal{L}}_{\xi} is the Lie derivative in (22). By the rank and nullity theorem, the codimension of SymG(F)\operatorname{Sym}_{G}(F) is equal to the rank of this operator:

RG,0(F)=codim(SymG(F))=rank(LF).R_{G,0}(F)=\operatorname{codim}(\operatorname{Sym}_{G}(F))=\operatorname{rank}(L_{F}). (69)

Penalizing the rank of an operator is impractical for optimization owing to its discrete values and nonconvexity. A commonly used convex relaxation of the rank is provided by the Schatten 11-norm, also known as the “nuclear norm”, given by

RG,(F)=LF=i=1dim(G)σi(LF).R_{G,*}(F)=\|L_{F}\|_{*}=\sum_{i=1}^{\dim(G)}\sigma_{i}(L_{F}). (70)

Here σi(LF)\sigma_{i}(L_{F}) denotes the iith singular value of LFL_{F} with respect to inner products on Lie(G)\operatorname{Lie}(G) and =span{ξF:ξLie(G),F}{\mathcal{F}}^{\prime}=\operatorname{span}\{{\mathcal{L}}_{\xi}F\ :\ \xi\in\operatorname{Lie}(G),\ F\in{\mathcal{F}}\}. This space is finite-dimensional, being spanned by {ξiFj}i,j\{{\mathcal{L}}_{\xi_{i}}F_{j}\}_{i,j} where ξi\xi_{i} and FjF_{j} are basis elements for Lie(G)\operatorname{Lie}(G) and {\mathcal{F}}. This enables computations with discrete inner products on {\mathcal{F}}^{\prime}, as we describe in Section 10. For certain rank minimization problems, penalizing the nuclear norm is guaranteed to recover the true minimum rank solution (Candès and Recht, 2009; Recht et al., 2010; Gross, 2011).

The proposed regularization function (70) is convex on {\mathcal{F}} because FLFF\mapsto L_{F} is linear and the nuclear norm is convex. For example, if (c1,,cN)(c_{1},\ldots,c_{N}) are the coefficients of FF in a basis {F1,,FN}\{F_{1},\ldots,F_{N}\} for {\mathcal{F}} and 𝑳Fi{\boldsymbol{L}}_{F_{i}} are the matrices of LFiL_{F_{i}} in orthonormal bases for Lie(G)\operatorname{Lie}(G) and {\mathcal{F}}^{\prime}, then

RG,(F)=c1𝑳F1++cN𝑳FN.R_{G,*}(F)=\|c_{1}{\boldsymbol{L}}_{F_{1}}+\cdots+c_{N}{\boldsymbol{L}}_{F_{N}}\|_{*}. (71)

With {ξ1,,ξdim(G)}\{\xi_{1},\ldots,\xi_{\dim(G)}\} and {u1,,uN}\{u_{1},\ldots,u_{N^{\prime}}\} being the orthonormal bases for Lie(G)\operatorname{Lie}(G) and {\mathcal{F}}^{\prime}, one can compute and store the rank-33 tensor [𝑳Fi]j,k=uj,ξkFi[{\boldsymbol{L}}_{F_{i}}]_{j,k}=\left\langle u_{j},\ {\mathcal{L}}_{\xi_{k}}F_{i}\right\rangle_{{\mathcal{F}}^{\prime}}. Practical methods for constructing and computing with inner products on {\mathcal{F}}^{\prime} will be discussed in Section 10.
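For a fixed coefficient vector, evaluating (71) from the stored tensor is then a single contraction followed by a sum of singular values; a minimal NumPy sketch (names ours):

```python
import numpy as np

def nuclear_symmetry_penalty(c, L_tensor):
    """Evaluate (71): || c_1 L_{F_1} + ... + c_N L_{F_N} ||_*.

    L_tensor : array of shape (N, N', dim G) stacking the matrices
               [L_{F_i}]_{j,k} = <u_j, L_{xi_k} F_i>.
    """
    Lc = np.tensordot(c, L_tensor, axes=(0, 0))       # matrix of L_F, shape (N', dim G)
    return np.linalg.svd(Lc, compute_uv=False).sum()  # nuclear norm
```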

8.3 Promoting symmetry in basis function regression

To demonstrate how the symmetry-promoting regularization functions proposed above can be used in practice, consider a regression problem for a function F:mnF:\mathbb{R}^{m}\to\mathbb{R}^{n}. It is common to parameterize this problem by expressing F(x)=W𝒟(x)F(x)=W\mathcal{D}(x) in a dictionary 𝒟:mN\mathcal{D}:\mathbb{R}^{m}\to\mathbb{R}^{N} consisting of user-defined smooth functions with a matrix of weights Wn×NW\in\mathbb{R}^{n\times N} to be fit during the training process. For example, the sparse identification of nonlinear dynamics (SINDy) algorithm (Brunton et al., 2016) is one of many machine learning methods of this type (Brunton and Kutz, 2022). The fundamental operators (Section 5) for this class of functions are given by

(𝒦gF)(x)F(x)\displaystyle({\mathcal{K}}_{g}F)(x)-F(x) =Φ(g)W𝒟(θg(x))W𝒟(x),\displaystyle=\Phi(g)W\mathcal{D}(\theta_{g}(x))-W\mathcal{D}(x), (72)
(ξF)(x)\displaystyle({\mathcal{L}}_{\xi}F)(x) =ϕ(ξ)W𝒟(x)+W𝒟(x)xθ^(ξ)x.\displaystyle=\phi(\xi)W\mathcal{D}(x)+W\frac{\partial\mathcal{D}(x)}{\partial x}\hat{\theta}(\xi)_{x}. (73)

These can be used directly in (65) and (70) to construct symmetry-promoting regularization functions RG(W)R_{G}(W) that are convex with respect to the weight matrix WW. Given a training dataset consisting of input-output pairs {(xj,yj)}j=1M\{(x_{j},y_{j})\}_{j=1}^{M} we can seek a regularized least-squares fit by solving the convex optimization problem

minimizeWn×N1Mj=1MyjW𝒟(xj)2+γRG(W𝒟).\operatorname*{\min\!imize\enskip}_{W\in\mathbb{R}^{n\times N}}\frac{1}{M}\sum_{j=1}^{M}\|y_{j}-W\mathcal{D}(x_{j})\|^{2}+\gamma R_{G}(W{\mathcal{D}}). (74)

Here, γ0\gamma\geq 0 is a parameter controlling the strength of the regularization that can be determined using cross-validation. To examine when this approach could be beneficial, we study a simplified problem — symmetric function recovery — in Section 9, below.
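A sketch of the problem (74) in CVXPY for scalar-valued FF (n = 1), with the Lie-derivative matrices assumed precomputed as in Section 10; function and variable names are ours:

```python
import cvxpy as cp
import numpy as np

def fit_with_symmetry(D_X, Y, L_ops, gamma):
    """Regularized least squares (74) with the nuclear-norm penalty (70).

    D_X   : (M, N) matrix of dictionary evaluations D(x_j)
    Y     : (M,) vector of targets y_j
    L_ops : (N, N', dim G) tensor; L_ops[i] is the matrix of L_{F_i}
    gamma : regularization strength
    """
    N = D_X.shape[1]
    w = cp.Variable(N)
    L_w = sum(w[i] * L_ops[i] for i in range(N))   # matrix of the operator L_F
    loss = cp.sum_squares(D_X @ w - Y) / len(Y)
    prob = cp.Problem(cp.Minimize(loss + gamma * cp.normNuc(L_w)))
    prob.solve()
    return w.value
```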

Remark 15

The solutions F=W𝒟F=W{\mathcal{D}} of (74) do not depend on how the dictionary functions are normalized due to the fact that the function being minimized can be written entirely in terms of FF and the data (xj,yj)(x_{j},y_{j}). This is in contrast to other types of regularized regression problems that penalize the weights WW directly, and therefore depend on how the functions in 𝒟{\mathcal{D}} are normalized.

8.4 Promoting symmetry in neural networks

In this section we describe a convex regularizing penalty to promote GG-equivariance in feed-forward neural networks

F=F(L)F(1)F=F^{(L)}\circ\cdots\circ F^{(1)} (75)

composed of layers F(l):𝒱l1𝒱lF^{(l)}:{\mathcal{V}}_{l-1}\to{\mathcal{V}}_{l} with group representations Φl:GGL(𝒱l)\Phi_{l}:G\to\operatorname{GL}({\mathcal{V}}_{l}). Since the composition is gg-equivariant if every layer is gg-equivariant, the main idea is to measure the symmetries shared by all of the layers. Specifically, we aim to maximize the “size” of the subgroup

l=1LSymG(F(l))={gG:𝒦gF(l)=F(l),l=1,,L}SymG(F),\bigcap_{l=1}^{L}\operatorname{Sym}_{G}\big{(}F^{(l)}\big{)}=\{g\in G\ :\ {\mathcal{K}}_{g}F^{(l)}=F^{(l)},\ l=1,\ldots,L\}\subset\operatorname{Sym}_{G}(F), (76)

where the notion of “size” we adopt depends on whether GG is discrete or continuous. The same ideas can be applied to neural networks acting on fields with layers defined by integral operators as described in Section 6.2. In this case we consider symmetries shared by all of the integral kernels.

We consider the case in which the trainable layers are elements of vector spaces l{\mathcal{F}}_{l}, over which the optimization is carried out. For example, each layer may be given by F(l)=W(l)𝒟(l)F^{(l)}=W^{(l)}\mathcal{D}^{(l)} as in Section 8.3, where W(l)W^{(l)} is a trainable weight matrix and 𝒟(l)\mathcal{D}^{(l)} is a fixed dictionary of nonlinear functions. Alternatively, we could follow Finzi et al. (2021) and use trainable linear layers composed with fixed GG-equivariant nonlinearities. In contrast with Finzi et al. (2021), we do not force the linear layers to be GG-equivariant. Rather, we penalize the breaking of GG-symmetries in the linear layers as a means to regularize the neural network and to learn which symmetries are compatible with the data and which are not.

As in Section 8.1, when GG is a discrete group with finitely many elements, a convex relaxation of the cardinality of Gl=1LSymG(F(l))G\setminus\bigcap_{l=1}^{L}\operatorname{Sym}_{G}(F^{(l)}) is

RG,1(F(1),,F(L))=gGl=1L𝒦gF(l)F(l)2.R_{G,1}\big{(}F^{(1)},\ldots,F^{(L)}\big{)}=\sum_{g\in G}\sqrt{\sum_{l=1}^{L}\big{\|}{\mathcal{K}}_{g}F^{(l)}-F^{(l)}\big{\|}^{2}}. (77)

Again, this is analogous to the group-LASSO penalty developed in Yuan and Lin (2006).

When GG is a Lie group with nonzero dimension, we follow the approach in Section 8.2 using the following observation:

Proposition 16

The subgroup in (76) is closed and embedded in GG; its Lie subalgebra is

l=1L𝔰𝔶𝔪G(F(l))={ξLie(G):ξF(l)=0,l=1,,L}.\bigcap_{l=1}^{L}\operatorname{\mathfrak{sym}}_{G}\big{(}F^{(l)}\big{)}=\left\{\xi\in\operatorname{Lie}(G)\ :\ {\mathcal{L}}_{\xi}F^{(l)}=0,\ l=1,\ldots,L\right\}. (78)

We provide a proof in Appendix A.

The Lie subalgebra in the proposition is equal to the nullspace of the linear operator LF(1),,F(L):Lie(G)l=1LC(𝒱l1;𝒱l)L_{F^{(1)},\ldots,F^{(L)}}:\operatorname{Lie}(G)\to\bigoplus_{l=1}^{L}C^{\infty}({\mathcal{V}}_{l-1};{\mathcal{V}}_{l}) defined by

LF(1),,F(L):ξ(ξF(1),,ξF(L)).L_{F^{(1)},\ldots,F^{(L)}}:\xi\mapsto\big{(}{\mathcal{L}}_{\xi}F^{(1)},\ldots,{\mathcal{L}}_{\xi}F^{(L)}\big{)}. (79)

By the rank and nullity theorem, minimizing the rank of this operator is equivalent to maximizing the dimension of the subgroup of symmetries shared by all of the layers in the network. As in Section 8.2, a convex relaxation of the rank is provided by the nuclear norm

RG,(F(1),,F(L))=LF(1),,F(L)=[𝑳F(1)𝑳F(L)],R_{G,*}\big{(}F^{(1)},\ldots,F^{(L)}\big{)}=\big{\|}L_{F^{(1)},\ldots,F^{(L)}}\big{\|}_{*}=\left\|\begin{bmatrix}{\boldsymbol{L}}_{F^{(1)}}\\ \vdots\\ {\boldsymbol{L}}_{F^{(L)}}\end{bmatrix}\right\|_{*}, (80)

where 𝑳F(l){\boldsymbol{L}}_{F^{(l)}} are the matrices of LF(l)L_{F^{(l)}} in orthonormal bases for Lie(G)\operatorname{Lie}(G) and the associated spaces l{\mathcal{F}}_{l}^{\prime}.

9 Numerical study of sample complexity to recover symmetric functions

Can promoting symmetry help us learn an unknown symmetric function using less data? To begin answering this question, we perform numerical experiments studying the amount of sampled data needed to recover structured polynomial functions on n\mathbb{R}^{n} of the form

Frad(x)\displaystyle F_{\text{rad}}(x) =φrad(xc122,,xcr22)and\displaystyle=\varphi_{\text{rad}}\big{(}\|x-c_{1}\|_{2}^{2},\ \ldots,\ \|x-c_{r}\|_{2}^{2}\big{)}\quad\mbox{and} (81)
Flin(x)\displaystyle F_{\text{lin}}(x) =φlin(u1Tx,,urTx).\displaystyle=\varphi_{\text{lin}}\big{(}u_{1}^{T}x,\ \ldots,\ u_{r}^{T}x\big{)}. (82)

These possess various rotation and translation invariances when r<nr<n, as characterized in detail below by Proposition 17 and its corollaries.

We aim to recover the unknown function FF_{*} within the space 𝒫d(n){\mathcal{P}}_{d}(\mathbb{R}^{n}) of polynomial functions on n\mathbb{R}^{n} with degrees up to d=degFd=\deg F_{*} based on the values yj=F(xj)y_{j}=F_{*}(x_{j}) at sample points x1,,xNx_{1},\ldots,x_{N}. Our approximation F^\hat{F} of FF_{*} is computed by solving the convex optimization problem

minimizeF𝒫d(n)RG,(F)=LFs.t.F(xj)=yj,j=1,,N,\operatorname*{\min\!imize\enskip}_{F\in{\mathcal{P}}_{d}(\mathbb{R}^{n})}R_{G,*}(F)=\|L_{F}\|_{*}\qquad\text{s.t.}\qquad F(x_{j})=y_{j},\quad j=1,\ldots,N, (83)

where GG is a candidate Lie group of symmetries acting on n\mathbb{R}^{n}. This was done using the CVXPY software package developed by Diamond and Boyd (2016); Agrawal et al. (2018). The nuclear norm in (83) was defined with respect to inner products on the corresponding Lie algebras given by ξ,ηLie(G)=Tr(ξTη)\langle\xi,\ \eta\rangle_{\operatorname{Lie}(G)}=\operatorname{Tr}(\xi^{T}\eta). As we describe later in Section 10, the inner product on the space {\mathcal{F}}^{\prime} containing the ranges of every LFL_{F} was defined by (89) with unit weights wi=1w_{i}=1 and M=dim𝒫d(n)M=\dim{\mathcal{P}}_{d}(\mathbb{R}^{n}) points drawn uniformly from the cube [1,1]n[-1,1]^{n}. Note that these discretization points were not the same as the sample points xjx_{j} in (83). The validity of this inner product is guaranteed almost surely by Proposition 21.

To study the sample complexity for (83) to recover functions in the form of FradF_{\text{rad}} and FlinF_{\text{lin}}, we perform multiple experiments using random realizations of these functions sampled at random points xjx_{j}. In each experiment, the vectors cic_{i} were drawn uniformly from the cube [1,1]n[-1,1]^{n} and the vectors uiu_{i} were formed from the columns of a random n×rn\times r orthonormal matrix (specifically, the left singular vectors of an n×rn\times r matrix with standard Gaussian entries). The coefficients of φrad\varphi_{\text{rad}} and φlin\varphi_{\text{lin}} in a basis of monomials up to a specified degree were sampled uniformly from the interval [0,1][0,1]. This yielded random polynomial functions FradF_{\text{rad}} and FlinF_{\text{lin}} with degrees degFrad=2degφrad\deg F_{\text{rad}}=2\deg\varphi_{\text{rad}} and degFlin=degφlin\deg F_{\text{lin}}=\deg\varphi_{\text{lin}}.

The sample points xjx_{j} for each experiment were drawn uniformly from the cube [1,1]n[-1,1]^{n}. A total of dim𝒫d(n)\dim{\mathcal{P}}_{d}(\mathbb{R}^{n}) sample points, with d=degFd=\deg F_{*}, were drawn, which is sufficient to recover the function almost surely regardless of regularization. For each experiment we determine the smallest NN_{*} so that recovery is achieved by (83) with F^=F\hat{F}=F_{*} using the sample points x1,,xNx_{1},\ldots,x_{N} for every NNN\geq N_{*}. To be precise, successful recovery is declared when all coefficients describing F^\hat{F} and FF_{*} in the monomial basis for 𝒫d(n){\mathcal{P}}_{d}(\mathbb{R}^{n}) agree to a tolerance of 5×1035\times 10^{-3} times the magnitude of the largest coefficient of FF_{*}. The range of values for NN_{*} across 1010 such random experiments provides an estimate of the sample complexity. In Figures 4, 5, and 6 we plot the range of values for NN_{*} as shaded regions with the average values displayed as solid lines.

In Figure 4, we use the special Euclidean group G=SE(n)G=SE(n) as a candidate group to recover functions of the form FradF_{\text{rad}} with the degree of φrad\varphi_{\text{rad}} specified. The number of radial features rdegφradr\leq\deg\varphi_{\text{rad}} is selected in accordance with Corollary 18 in order to ensure that 𝔰𝔶𝔪G(Frad)=𝔤rad\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})={\mathfrak{g}}_{\text{rad}} has the known form and dimension stated in Proposition 17. The symmetry-promoting regularization significantly reduces the number of samples needed to recover FradF_{\text{rad}} compared to the number of samples needed to solve the linear system specifying this function within the space of polynomials with the same or lesser degree. As the number of radial features rr increases, so does the sample complexity to recover FradF_{\text{rad}}. This is likely due to the decreased dimension of 𝔰𝔶𝔪G(Frad)\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}}).

Figure 4: Sample complexity to recover polynomial functions FradF_{\text{rad}} of rr radial features, i.e., (81) with polynomial φrad\varphi_{\text{rad}} of the specified degree, by solving (83) using the special Euclidean group G=SE(n)G=SE(n). Black dots indicate the number of dictionary functions, dim𝒫d(n)\dim{\mathcal{P}}_{d}(\mathbb{R}^{n}), hence the number of samples needed to recover FradF_{\text{rad}} without regularization.

In Figures 5 and 6, we use G=SE(n)G=SE(n) and the group of translations G=(n,+)G=(\mathbb{R}^{n},+) as candidate symmetry groups to recover functions of the form FlinF_{\text{lin}} with the degree of φlin\varphi_{\text{lin}} specified. Obviously, FlinF_{\text{lin}} has an (nr)(n-r)-dimensional subgroup of translation symmetries orthogonal to span{u1,,ur}\operatorname{span}\{u_{1},\ldots,u_{r}\}. By Corollary 19, choosing degφlin2\deg\varphi_{\text{lin}}\geq 2 is sufficient to ensure that 𝔰𝔶𝔪SE(n)(Flin)=𝔤lin\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{lin}})={\mathfrak{g}}_{\text{lin}} has the known form and dimension stated in Proposition 17. The results in Figures 5 and 6 show that the symmetry-promoting regularization reduces the sample complexity to recover FlinF_{\text{lin}}. Moreover, fewer samples are needed when FlinF_{\text{lin}} depends on fewer linear features, as might be expected because the dimension of 𝔰𝔶𝔪G(Flin)\operatorname{\mathfrak{sym}}_{G}(F_{\text{lin}}) increases as rr decreases.

Figure 5: Sample complexity to recover polynomial functions FlinF_{\text{lin}} of rr linear features, i.e., (82) with polynomial φlin\varphi_{\text{lin}} of the specified degree, by solving (83) using the special Euclidean group G=SE(n)G=SE(n). Black dots indicate the number of dictionary functions, dim𝒫d(n)\dim{\mathcal{P}}_{d}(\mathbb{R}^{n}), hence the number of samples needed to recover FlinF_{\text{lin}} without regularization.
Figure 6: Analogue of Fig. 5 using the group of translations G=(n,+)G=(\mathbb{R}^{n},+).
Proposition 17

Let rnr\leq n and suppose that {ckc1}k=2r\{c_{k}-c_{1}\}_{k=2}^{r} and {uk}k=1r\{u_{k}\}_{k=1}^{r} are sets of linearly-independent vectors in n\mathbb{R}^{n}. Then, 𝔰𝔶𝔪SE(n)(Frad)\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}}) contains the 12(nr)(nr+1)\frac{1}{2}(n-r)(n-r+1)-dimensional subalgebra

𝔤rad={[Sv00]:ST=SandSc1==Scr=v}\mathfrak{g}_{\text{rad}}=\left\{\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\ :\ S^{T}=-S\ \mbox{and}\ Sc_{1}=\cdots=Sc_{r}=-v\right\} (84)

and 𝔰𝔶𝔪SE(n)(Flin)\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{lin}}) contains the 12(nr)(nr+1)\frac{1}{2}(n-r)(n-r+1)-dimensional subalgebra

𝔤lin={[Sv00]:ST=S,Su1==Sur=0,andu1Tv==urTv=0}.\mathfrak{g}_{\text{lin}}=\left\{\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\ :\ S^{T}=-S,\ Su_{1}=\cdots=Su_{r}=0,\ \mbox{and}\ u_{1}^{T}v=\cdots=u_{r}^{T}v=0\right\}. (85)

Either every polynomial φrad\varphi_{\text{rad}} with degree d\leq d gives 𝔰𝔶𝔪SE(n)(Frad)𝔤rad\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}})\neq\mathfrak{g}_{\text{rad}}, or the set of polynomials φrad\varphi_{\text{rad}} with degree d\leq d satisfying 𝔰𝔶𝔪SE(n)(Frad)𝔤rad\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}})\neq\mathfrak{g}_{\text{rad}} is a set of measure zero. The same dichotomy holds for φlin\varphi_{\text{lin}}, FlinF_{\text{lin}}, and 𝔤lin\mathfrak{g}_{\text{lin}}. See Appendix A for a proof.

Corollary 18

With the same hypotheses as Proposition 17, let drd\geq r. The set of polynomials φrad\varphi_{\text{rad}} with degree d\leq d satisfying 𝔰𝔶𝔪SE(n)(Frad)𝔤rad\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}})\neq\mathfrak{g}_{\text{rad}} is a set of measure zero. A proof is given in Appendix A.

Corollary 19

With the same hypotheses as Proposition 17, let d2d\geq 2. The set of polynomials φlin\varphi_{\text{lin}} with degree d\leq d satisfying 𝔰𝔶𝔪SE(n)(Flin)𝔤lin\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{lin}})\neq\mathfrak{g}_{\text{lin}} is a set of measure zero. A proof is given in Appendix A.

10 Discretizing the operators

This section describes how to construct matrices for the operators ξ{\mathcal{L}}_{\xi} and LFL_{F} for continuously differentiable functions FF in a user-specified finite-dimensional subspace C1(𝒱;𝒲){\mathcal{F}}\subset C^{1}({\mathcal{V}};{\mathcal{W}}). By choosing bases for the finite-dimensional vector spaces 𝒱{\mathcal{V}} and 𝒲{\mathcal{W}}, it suffices without loss of generality to consider the case in which 𝒱=m{\mathcal{V}}=\mathbb{R}^{m} and 𝒲=n{\mathcal{W}}=\mathbb{R}^{n}. We assume that Lie(G)\operatorname{Lie}(G) and {\mathcal{F}} are endowed with inner products and that {ξ1,ξdim(G)}\{\xi_{1},\ldots\xi_{\dim(G)}\} and {F1,Fdim()}\{F_{1},\ldots F_{\dim({\mathcal{F}})}\} are orthonormal bases for these spaces, respectively. The key task is to endow the finite-dimensional subspace

=span{ξF:ξLie(G),F}C0(m;n){\mathcal{F}}^{\prime}=\operatorname{span}\left\{{\mathcal{L}}_{\xi}F\ :\ \xi\in\operatorname{Lie}(G),\ F\in{\mathcal{F}}\right\}\subset C^{0}(\mathbb{R}^{m};\mathbb{R}^{n}) (86)

with a convenient inner product. Once this is done, an orthonormal basis {u1,,uN}\{u_{1},\ldots,u_{N}\} for {\mathcal{F}}^{\prime} can be constructed by applying a Gram-Schmidt process to the functions ξiFj{\mathcal{L}}_{\xi_{i}}F_{j}, which span {\mathcal{F}}^{\prime}. Matrices for ξ{\mathcal{L}}_{\xi} and LFL_{F} are then easily obtained by computing

[𝓛ξ]i,j=ui,ξFj,[𝑳F]i,k=ui,ξkF.\big{[}{\boldsymbol{{\mathcal{L}}}}_{\xi}\big{]}_{i,j}=\big{\langle}u_{i},\ {\mathcal{L}}_{\xi}F_{j}\big{\rangle}_{{\mathcal{F}}^{\prime}},\qquad\big{[}{\boldsymbol{L}}_{F}\big{]}_{i,k}=\big{\langle}u_{i},\ {\mathcal{L}}_{\xi_{k}}F\big{\rangle}_{{\mathcal{F}}^{\prime}}. (87)

The issue at hand is to choose the inner product on {\mathcal{F}}^{\prime} in a way that makes computing these matrices easy. A natural choice is to equip {\mathcal{F}}^{\prime} with an L2(m,μ;n)L^{2}(\mathbb{R}^{m},\mu;\mathbb{R}^{n}) inner product where μ\mu is a positive measure on m\mathbb{R}^{m} (such as a Gaussian distribution) for which the L2L^{2} norms of functions in {\mathcal{F}}^{\prime} are finite. The problem is that it is usually challenging or inconvenient to compute the required integrals

ξiFj,ξkFlL2(μ)=m(ξiFj)(x)T(ξkFl)(x)dμ(x)\big{\langle}{\mathcal{L}}_{\xi_{i}}F_{j},\ {\mathcal{L}}_{\xi_{k}}F_{l}\big{\rangle}_{L^{2}(\mu)}=\int_{\mathbb{R}^{m}}\big{(}{\mathcal{L}}_{\xi_{i}}F_{j}\big{)}(x)^{T}\big{(}{\mathcal{L}}_{\xi_{k}}F_{l}\big{)}(x)\operatorname{\mathrm{d}}\mu(x) (88)

analytically. In this section we discuss inner products that are easy to compute in practice.

10.1 Numerical quadrature and Monte-Carlo

When (88) cannot be computed analytically, one can resort to a numerical quadrature or Monte-Carlo approximation. In both cases the integral is approximated by a weighted sum, yielding a semi-inner product

f,gL2(μM)=1Mi=1Mwif(xi)Tg(xi)\langle f,\ g\rangle_{L^{2}(\mu_{M})}=\frac{1}{M}\sum_{i=1}^{M}w_{i}f(x_{i})^{T}g(x_{i}) (89)

that converges to f,gL2(μ)\langle f,\ g\rangle_{L^{2}(\mu)} as MM\to\infty. The following proposition shows that we do not have to pass to the limit MM\to\infty in order to obtain a valid inner product defined by (89) on {\mathcal{F}}^{\prime}.

Proposition 20

Suppose that {\mathcal{F}}^{\prime} is finite-dimensional and f,gL2(μM)f,gL2(μ)\langle f,\ g\rangle_{L^{2}(\mu_{M})}\to\langle f,\ g\rangle_{L^{2}(\mu)} as MM\to\infty for every f,gf,g\in{\mathcal{F}}^{\prime}. Then there is an M0M_{0} such that (89) is an inner product on {\mathcal{F}}^{\prime} for every MM0M\geq M_{0}. We give a proof in Appendix A.

For example, in Monte-Carlo approximation, the samples xix_{i} are drawn independently from a distribution ν\nu with the assumption that both μ\mu and ν\nu are σ\sigma-finite and μ\mu is absolutely continuous with respect to ν\nu. The weights are given by the Radon-Nikodym derivative wi=dμdν(xi)w_{i}=\frac{\operatorname{\mathrm{d}}\mu}{\operatorname{\mathrm{d}}\nu}(x_{i}). Then for every f,gL2(μ)f,g\in L^{2}(\mu) the approximate integral converges f,gL2(μM)f,gL2(μ)\langle f,\ g\rangle_{L^{2}(\mu_{M})}\to\langle f,\ g\rangle_{L^{2}(\mu)} as MM\to\infty almost surely thanks to the strong law of large numbers (see Theorem 7.7 in Koralov and Sinai (2012)). By the proposition, there is almost surely a finite M0M_{0} such that (89) is an inner product on {\mathcal{F}}^{\prime} for every MM0M\geq M_{0}.
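In practice, this suggests a simple recipe: evaluate the spanning functions ℒ_{ξ_i}F_j at the sample points, scale so that Euclidean dot products realize (89), and orthonormalize. The sketch below (our own helper, using an SVD in place of Gram-Schmidt for numerical stability) also returns the matrices (87):

```python
import numpy as np

def discrete_basis_and_matrices(evals, M, tol=1e-10):
    """evals : array (q, N_F, M*n) of values of L_{xi_i} F_j at the M
    sample points, flattened so that dot products of rows divided by M
    realize the inner product (89) with unit weights."""
    q, NF, S = evals.shape
    V = evals.reshape(q * NF, S) / np.sqrt(M)        # embed the spanning set
    _, s, Vt = np.linalg.svd(V, full_matrices=False)
    B = Vt[s > tol * max(s[0], 1.0)]                 # orthonormal basis of F'
    # Matrices (87): [L_{xi_k}]_{i,j} = <u_i, L_{xi_k} F_j>.
    L_mats = [B @ evals[k].T / np.sqrt(M) for k in range(q)]
    return B, L_mats
```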

10.2 Subspaces of polynomials

Here we consider the special case when {\mathcal{F}} is a finite-dimensional subspace consisting of polynomial functions mn\mathbb{R}^{m}\to\mathbb{R}^{n}. Examining the expression in (22), it is evident that ξF{\mathcal{L}}_{\xi}F is also a polynomial function mn\mathbb{R}^{m}\to\mathbb{R}^{n} with degree not greater than that of FF\in{\mathcal{F}}. Thus, {\mathcal{F}}^{\prime} is also a space of polynomial functions with degree not exceeding the maximum degree in {\mathcal{F}}. Since a polynomial that vanishes on an open set must be identically zero, we can take the integrals defining the inner product in (88) over a cube, such as [0,1]mm[0,1]^{m}\subset\mathbb{R}^{m}. This is convenient because polynomial integrals over the cube can be calculated analytically.

We can also use the sample-based inner product in (89) with randomly chosen points xix_{i} and positive weights wiw_{i}. The following proposition tells us exactly how many sample points we need.

Proposition 21

Let {\mathcal{F}}^{\prime} be a space of real polynomial functions mn\mathbb{R}^{m}\to\mathbb{R}^{n} and let πi:n\pi_{i}:\mathbb{R}^{n}\to\mathbb{R} be the iith coordinate projection πi(c1,,cn)=ci\pi_{i}(c_{1},\ldots,c_{n})=c_{i}. Let

MM0=max1indim(πi())M\geq M_{0}=\max_{1\leq i\leq n}\dim(\pi_{i}({\mathcal{F}}^{\prime})) (90)

and let w1,,wM>0w_{1},\ldots,w_{M}>0 be positive weights. Then for almost every set of points (x1,,xM)(m)M(x_{1},\ldots,x_{M})\in(\mathbb{R}^{m})^{M} with respect to Lebesgue measure, (89) is an inner product on {\mathcal{F}}^{\prime}. We give a proof in Appendix B.

This means that we can draw MM0M\geq M_{0} sample points independently from any absolutely continuous measure (such as a Gaussian distribution or the uniform distribution on a cube), and with probability one, (89) will be an inner product on {\mathcal{F}}^{\prime}. When {\mathcal{F}} consists of polynomials with degree at most dd, then taking

M|{(p0,,pm)0m+1:p0++pm=d}|=(d+mm)M\geq\left|\left\{(p_{0},\ldots,p_{m})\in\mathbb{N}_{0}^{m+1}\ :\ p_{0}+\cdots+p_{m}=d\right\}\right|=\binom{d+m}{m} (91)

is sufficient.

11 Generalization to sections of vector bundles

The machinery for promoting, discovering, and enforcing symmetry of maps F:𝒱𝒲F:{\mathcal{V}}\to{\mathcal{W}} between finite-dimensional vector spaces is a special case of more general machinery for sections of vector bundles presented here. Applications of this more general framework include studying the symmetries of vector fields, tensor fields, dynamical systems, and integral operators on manifolds with respect to nonlinear group actions (Abraham et al., 1988). We rely heavily on background, definitions, and results that can be found in Lee (2013) and Kolář et al. (1993).

First, we provide some background on smooth vector bundles that can be found in Lee (2013, Chapter 10). A rank-kk smooth vector bundle EE is a collection of kk-dimensional vector spaces EpE_{p}, called “fibers”, organized smoothly over a base manifold {\mathcal{M}}. These fibers are organized by the “bundle projection” π:E\pi:E\to{\mathcal{M}}, a surjective map whose preimages are the fibers Ep=π1(p)E_{p}=\pi^{-1}(p). Smoothness means that π\pi is a smooth submersion and that EE is a smooth manifold covered by smooth local trivializations

ψα:π1(𝒰α)E𝒰α×k\psi_{\alpha}:\pi^{-1}({\mathcal{U}}_{\alpha})\subset E\to{\mathcal{U}}_{\alpha}\times\mathbb{R}^{k}

with {𝒰α}α𝒜\{{\mathcal{U}}_{\alpha}\}_{\alpha\in{\mathcal{A}}} being open subsets covering {\mathcal{M}}. The transition functions between local trivializations are k\mathbb{R}^{k}-linear, meaning that there are smooth matrix-valued functions 𝑻α,β:𝒰α𝒰βk×k{\boldsymbol{T}}_{\alpha,\beta}:{\mathcal{U}}_{\alpha}\cap{\mathcal{U}}_{\beta}\to\mathbb{R}^{k\times k} satisfying

ψαψβ1(p,𝒗)=(p,𝑻α,β(p)𝒗)\psi_{\alpha}\circ\psi_{\beta}^{-1}(p,{\boldsymbol{v}})=(p,{\boldsymbol{T}}_{\alpha,\beta}(p){\boldsymbol{v}}) (92)

for every p𝒰α𝒰βp\in{\mathcal{U}}_{\alpha}\cap{\mathcal{U}}_{\beta} and 𝒗k{\boldsymbol{v}}\in\mathbb{R}^{k}. The bundle with this structure is often denoted π:E\pi:E\to{\mathcal{M}}.

A “section” of the rank-kk vector bundle π:E\pi:E\to{\mathcal{M}} is a map F:EF:{\mathcal{M}}\to E satisfying πF=Id\pi\circ F=\operatorname{Id}_{{\mathcal{M}}}. The space of (possibly rough) sections, denoted Σ(E)\operatorname{\Sigma}(E), is a vector space with addition and scalar multiplication defined pointwise in each fiber EpE_{p}. We equip Σ(E)\operatorname{\Sigma}(E) with the topology of pointwise convergence, making it into a locally-convex space. The space of sections possessing mm continuous derivatives is denoted Cm(,E)C^{m}({\mathcal{M}},E), with the space of merely continuous sections being C(,E)=C0(,E)C({\mathcal{M}},E)=C^{0}({\mathcal{M}},E) and the space of smooth sections being C(,E)C^{\infty}({\mathcal{M}},E). A vector bundle and a section are depicted in Figure 7, along with the fundamental operators for a group action that we introduce below.

Figure 7: Fundamental operators for sections of vector bundles equipped with fiber-linear Lie group actions. The action Θg\Theta_{g} is linear on each fiber EpE_{p} and descends under the bundle projection π:E\pi:E\to{\mathcal{M}} to an action θg\theta_{g} on {\mathcal{M}}. Given a section F:EF:{\mathcal{M}}\to E, the finite transformation operators 𝒦g{\mathcal{K}}_{g} produce a new section 𝒦gF{\mathcal{K}}_{g}F whose value at pp is given by evaluating FF at q=θg(p)q=\theta_{g}(p) and pulling the value in EqE_{q} back to EpE_{p} via the linear map Θg1\Theta_{g^{-1}} on EqE_{q}. The operators 𝒦g{\mathcal{K}}_{g} are linear thanks to linearity of Θg1\Theta_{g^{-1}} on every EqE_{q}. The Lie derivative ξ{\mathcal{L}}_{\xi} is the operator on sections formed by differentiating t𝒦exp(tξ)t\mapsto{\mathcal{K}}_{\exp(t\xi)} at t=0t=0. Geometrically, ξF(p){\mathcal{L}}_{\xi}F(p) is the vector in EpE_{p} lying tangent to the curve t𝒦exp(tξ)F(p)t\mapsto{\mathcal{K}}_{\exp(t\xi)}F(p) in EpE_{p} passing through F(p)F(p) at t=0t=0.

We consider a smooth “fiber-linear” right GG-action Θ:E×GE\Theta:E\times G\to E, meaning that every Θg=Θ(,g):EE\Theta_{g}=\Theta(\cdot,g):E\to E is a smooth vector bundle homomorphism. In other words, Θ\Theta descends under the bundle projection π\pi to a unique smooth right GG-action θ:×G\theta:{\mathcal{M}}\times G\to{\mathcal{M}} so that the diagram

\begin{array}{ccc}E&\xrightarrow{\;\Theta_{g}\;}&E\\ \pi\big\downarrow&&\big\downarrow\pi\\ {\mathcal{M}}&\xrightarrow{\;\theta_{g}\;}&{\mathcal{M}}\end{array} (93)

commutes and the restricted maps Θg|Ep:EpEθ(p,g)\Theta_{g}|_{E_{p}}:E_{p}\to E_{\theta(p,g)} are linear. We define what it means for a section to be symmetric with respect to this action as follows:

Definition 22

A section FΣ(E)F\in\Sigma(E) is equivariant with respect to a transformation gGg\in G if

𝒦gF:=Θg1Fθg=F.{\mathcal{K}}_{g}F:=\Theta_{g^{-1}}\circ F\circ\theta_{g}=F. (94)

These transformations form a subgroup of GG denoted SymG(F)\operatorname{Sym}_{G}(F).

The operators 𝒦g{\mathcal{K}}_{g} are depicted in Figure 7. Thanks to the vector bundle homomorphism properties of Θg1\Theta_{g^{-1}}, the operators 𝒦g:Σ(E)Σ(E){\mathcal{K}}_{g}:\operatorname{\Sigma}(E)\to\operatorname{\Sigma}(E) are well-defined and linear. Moreover, they form a group under composition 𝒦g1𝒦g2=𝒦g1g2{\mathcal{K}}_{g_{1}}{\mathcal{K}}_{g_{2}}={\mathcal{K}}_{g_{1}\cdot g_{2}}, with inverses given by 𝒦g1=𝒦g1{\mathcal{K}}_{g}^{-1}={\mathcal{K}}_{g^{-1}}.

The “infinitesimal generator” of the group action is the linear map Θ^:Lie(G)𝔛(E)\hat{\Theta}:\operatorname{Lie}(G)\to\mathfrak{X}(E) defined by

Θ^(ξ)=ddt|t=0Θexp(tξ).\hat{\Theta}(\xi)=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}\Theta_{\exp(t\xi)}. (95)

It turns out that this vector field is Θ\Theta-related to 0×ξ𝔛(E×G)0\times\xi\in\mathfrak{X}(E\times G) (see Lemma 5.13 in Kolář et al. (1993), Lemma 20.14 in Lee (2013)), meaning that the flow of Θ^(ξ)\hat{\Theta}(\xi) is given by

FlΘ^(ξ)t=Θexp(tξ).\operatorname{Fl}_{\hat{\Theta}(\xi)}^{t}=\Theta_{\exp(t\xi)}. (96)

Likewise, θexp(tξ)\theta_{\exp(t\xi)} is the flow of θ^(ξ)=ddt|t=0θexp(tξ)𝔛()\hat{\theta}(\xi)=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}\theta_{\exp(t\xi)}\in\mathfrak{X}({\mathcal{M}}), which is π\pi-related to Θ^(ξ)\hat{\Theta}(\xi).

Differentiating the smooth curves t𝒦exp(tξ)F(p)t\mapsto{\mathcal{K}}_{\exp(t\xi)}F(p) lying in EpE_{p} for each pp\in{\mathcal{M}} gives rise to the Lie derivative ξ:D(ξ)Σ(E)Σ(E){\mathcal{L}}_{\xi}:D({\mathcal{L}}_{\xi})\subset\operatorname{\Sigma}(E)\to\operatorname{\Sigma}(E) along ξLie(G)\xi\in\operatorname{Lie}(G) defined by

ξF=ddt|t=0𝒦exp(tξ)F=limt01t(Θexp(tξ)Fθexp(tξ)F),\boxed{{\mathcal{L}}_{\xi}F=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{\exp(t\xi)}F=\lim_{t\to 0}\frac{1}{t}\left(\Theta_{\exp(-t\xi)}\circ F\circ\theta_{\exp(t\xi)}-F\right),} (97)

where FD(ξ)F\in D({\mathcal{L}}_{\xi}) if and only if the limit converges in Σ(E)\operatorname{\Sigma}(E), i.e., pointwise. Note that we implicitly identify TF(p)EpEpT_{F(p)}E_{p}\cong E_{p}. This construction is illustrated in Figure 7. We recover (22) from (97) in the special case where a smooth function F:𝒱𝒲F:{\mathcal{V}}\to{\mathcal{W}} is viewed as a section x(x,F(x))x\mapsto(x,F(x)) of the bundle π:𝒱×𝒲𝒱\pi:{\mathcal{V}}\times{\mathcal{W}}\to{\mathcal{V}} and acted upon by group representations. Critically, the Lie derivative ξ{\mathcal{L}}_{\xi}, as defined above, is a linear operator on sections of the vector bundle EE. This allows us to formulate convex symmetry-promoting regularization functions as in Section 8 using Lie derivatives in the broader setting of vector bundle sections.

Remark 23 (Lie derivatives using flows)

Thanks to (96), the Lie derivative defined in (97) depends only on the infinitesimal generator Θ^(ξ)𝔛(E)\hat{\Theta}(\xi)\in\mathfrak{X}(E) and its flow for small times tt. Hence, any vector field in 𝔛(E)\mathfrak{X}(E) whose flow is fiber-linear, but not necessarily defined for all tt\in\mathbb{R}, gives rise to an analogously-defined Lie derivative acting linearly on Σ(E)\operatorname{\Sigma}(E). These are the so-called “linear vector fields” described by Kolář et al. (1993) in Section 47.9. In fact, more general versions of the Lie derivative based on flows for maps between manifolds are described by Kolář et al. (1993) in Chapter 11. However, these generalizations are nonlinear operators, destroying the convex properties of the symmetry-promoting regularization functions in Section 8.

In addition to linearity, the key properties of the operators 𝒦g{\mathcal{K}}_{g} and ξ{\mathcal{L}}_{\xi} for studying symmetries of sections are:

Proposition 24

For every ξ,ηLie(G)\xi,\eta\in\operatorname{Lie}(G), and α,β,t\alpha,\beta,t\in\mathbb{R}, we have

ddt𝒦exp(tξ)F=ξ𝒦exp(tξ)F=𝒦exp(tξ)ξFFD(ξ),\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\exp(t\xi)}F={\mathcal{L}}_{\xi}{\mathcal{K}}_{\exp(t\xi)}F={\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\xi}F\qquad\forall F\in D({\mathcal{L}}_{\xi}), (98)
αξ+βηF=(αξ+βη)FFC1(,E),{\mathcal{L}}_{\alpha\xi+\beta\eta}F=\big{(}\alpha{\mathcal{L}}_{\xi}+\beta{\mathcal{L}}_{\eta}\big{)}F\qquad\forall F\in C^{1}({\mathcal{M}},E), (99)

and

[ξ,η]F=(ξηηξ)FFC2(,E).{\mathcal{L}}_{[\xi,\eta]}F=\big{(}{\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}-{\mathcal{L}}_{\eta}{\mathcal{L}}_{\xi}\big{)}F\qquad\forall F\in C^{2}({\mathcal{M}},E). (100)

We give a proof in Appendix C.

Taken together, these results mean that Π:g𝒦g\Pi:g\mapsto{\mathcal{K}}_{g} and Π:ξξ\Pi_{*}:\xi\mapsto{\mathcal{L}}_{\xi} are (infinite-dimensional) representations of GG and Lie(G)\operatorname{Lie}(G) in C(,E)C^{\infty}({\mathcal{M}},E).

The main results of this section are the following two theorems. The first completely characterizes the identity component of SymG(F)\operatorname{Sym}_{G}(F) by correspondence with its Lie subalgebra (see Theorem 19.26 in Lee (2013)). The second gives necessary and sufficient conditions for a section to be GG-equivariant.

Theorem 25

If FC(,E)F\in C({\mathcal{M}},E) is a continuous section, then SymG(F)\operatorname{Sym}_{G}(F) is a closed, embedded Lie subgroup of GG whose Lie subalgebra is

𝔰𝔶𝔪G(F)={ξLie(G):FD(ξ)andξF=0}.\operatorname{\mathfrak{sym}}_{G}(F)=\left\{\xi\in\operatorname{Lie}(G)\ :\ F\in D({\mathcal{L}}_{\xi})\ \mbox{and}\ {\mathcal{L}}_{\xi}F=0\right\}. (101)

We give a proof in Appendix D.

Theorem 26

Suppose that GG has nGn_{G} connected components with G0G_{0} being the component containing the identity element. Let ξ1,,ξq\xi_{1},\ldots,\xi_{q} generate Lie(G)\operatorname{Lie}(G) and let g1,,gnG1g_{1},\ldots,g_{n_{G}-1} be elements from each non-identity component of GG. A continuous section FC(,E)F\in C({\mathcal{M}},E) is G0G_{0}-equivariant if and only if

FD(ξi)andξiF=0,i=1,,q.F\in D({\mathcal{L}}_{\xi_{i}})\quad\mbox{and}\quad{\mathcal{L}}_{\xi_{i}}F=0,\qquad i=1,\ldots,q. (102)

If, in addition, we have

𝒦gjF=F,j=1,,nG1,{\mathcal{K}}_{g_{j}}F=F,\qquad j=1,\ldots,n_{G}-1, (103)

then FF is GG-equivariant. We give a proof in Appendix E.

These results allow us to promote, enforce, and discover symmetries for sections of vector bundles in fundamentally the same way we did for maps between finite-dimensional vector spaces in Sections 6, 7, and 8. In particular, symmetry can be enforced through analogous linear constraints, discovered through nullspaces of analogous operators, and promoted through analogous convex penalties based on the nuclear norm.
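
To make the discovery task concrete in this setting, the following sketch (ours, a simplified illustration rather than an implementation from this paper) samples the linear map xi -> L_xi F for the action (K_g F)(x) = g^{-1} F(g x) of GL(2) on maps F: R^2 -> R^2, and reads off the symmetry algebra from the near-null right singular vectors of the sampled operator.

import numpy as np

rng = np.random.default_rng(0)

def jacobian(F, x, h=1e-5):
    # central-difference Jacobian of F at x
    return np.column_stack([(F(x + h * e) - F(x - h * e)) / (2 * h)
                            for e in np.eye(len(x))])

def lie_derivative(F, A, x):
    # L_A F(x) = dF(x) A x - A F(x) for the action (K_g F)(x) = g^{-1} F(g x)
    return jacobian(F, x) @ (A @ x) - A @ F(x)

F = lambda x: np.dot(x, x) * x        # rotation-equivariant test map, ||x||^2 x

# sample xi -> L_xi F over the basis E_ij of the candidate algebra gl(2)
basis = [np.outer(np.eye(2)[i], np.eye(2)[j]) for i in range(2) for j in range(2)]
rows = []
for _ in range(50):
    x = rng.standard_normal(2)
    rows.append(np.column_stack([lie_derivative(F, A, x) for A in basis]))
M = np.vstack(rows)                   # discretized Lie-derivative operator
_, s, Vt = np.linalg.svd(M)
print(s)                              # one near-zero singular value
print(Vt[-1].reshape(2, 2))           # ~ [[0, -1], [1, 0]] up to scale: rotations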

Remark 27 (Left actions)

Theorems 26 and 25 hold without any modification for left G-actions ΘL:G×EE\Theta^{L}:G\times E\to E. This is because we can define a corresponding right GG-action by ΘR(p,g)=ΘL(g1,p)\Theta^{R}(p,g)=\Theta^{L}(g^{-1},p) with associated operators related by

𝒦gR=𝒦g1LandξR=ξL.{\mathcal{K}}^{R}_{g}={\mathcal{K}}^{L}_{g^{-1}}\qquad\mbox{and}\qquad{\mathcal{L}}^{R}_{\xi}=-{\mathcal{L}}^{L}_{\xi}. (104)

The symmetry group SymG(F)\operatorname{Sym}_{G}(F) does not depend on whether it is defined by the condition 𝒦gRF=F{\mathcal{K}}^{R}_{g}F=F or by 𝒦gLF=F{\mathcal{K}}^{L}_{g}F=F. It is slightly less natural to work with left actions because ΠL:g𝒦gL\Pi^{L}:g\mapsto{\mathcal{K}}^{L}_{g} and ΠL:ξξL\Pi^{L}_{*}:\xi\mapsto{\mathcal{L}}^{L}_{\xi} are Lie group and Lie algebra anti-homomorphisms, that is,

ΠL(g1g2)=ΠL(g2)ΠL(g1)andΠL([ξ,η])=[ΠL(η),ΠL(ξ)].\Pi^{L}(g_{1}g_{2})=\Pi^{L}(g_{2})\Pi^{L}(g_{1})\qquad\mbox{and}\qquad\Pi^{L}_{*}\big{(}[\xi,\eta]\big{)}=\big{[}\Pi^{L}_{*}(\eta),\Pi^{L}_{*}(\xi)\big{]}. (105)

11.1 Vector fields

Here we study the symmetries of a vector field V𝔛()V\in\mathfrak{X}({\mathcal{M}}) under a right GG-action θ:×G\theta:{\mathcal{M}}\times G\to{\mathcal{M}}. This allows us to extend the discussion in Section 7.3 to dynamical systems described by vector fields on smooth manifolds and acted upon nonlinearly by arbitrary Lie groups. The tangent map of the diffeomorphism θg=θ(,g):\theta_{g}=\theta(\cdot,g):{\mathcal{M}}\to{\mathcal{M}} transforms vector fields via the pushforward map (θg):𝔛()𝔛()(\theta_{g})_{*}:\mathfrak{X}({\mathcal{M}})\to\mathfrak{X}({\mathcal{M}}) defined by

((θg)V)pg=dθg(p)Vp((\theta_{g})_{*}V)_{p\cdot g}=\operatorname{\mathrm{d}}\theta_{g}(p)V_{p} (106)

for every pp\in{\mathcal{M}}.

Definition 28

Given gGg\in G, we say that a vector field V𝔛()V\in\mathfrak{X}({\mathcal{M}}) is gg-invariant if and only if (θg)V=V(\theta_{g})_{*}V=V, that is,

Vpg=dθg(p)Vpp.V_{p\cdot g}=\operatorname{\mathrm{d}}\theta_{g}(p)V_{p}\qquad\forall p\in{\mathcal{M}}. (107)

Because (θg1)(θg)=(θg1θg)=Id𝔛()(\theta_{g^{-1}})_{*}(\theta_{g})_{*}=(\theta_{g^{-1}}\circ\theta_{g})_{*}=\operatorname{Id}_{\mathfrak{X}({\mathcal{M}})}, it is clear that a vector field is gg-invariant if and only if it is g1g^{-1}-invariant.

Recall that vector fields V𝔛()V\in\mathfrak{X}({\mathcal{M}}) are smooth sections of the tangent bundle E=TE=T{\mathcal{M}}. The right GG-action θ\theta on {\mathcal{M}} induces a right GG-action Θ:T×GT\Theta:T{\mathcal{M}}\times G\to T{\mathcal{M}} on the tangent bundle defined by

Θg(vp)=dθg(p)vp.\Theta_{g}(v_{p})=\operatorname{\mathrm{d}}\theta_{g}(p)v_{p}. (108)

It is easy to see that each Θg\Theta_{g} is a vector bundle homomorphism descending to θg\theta_{g} under the bundle projection π\pi. Crucially, we have

𝒦gV=Θg1Vθg=(θg1)V,{\mathcal{K}}_{g}V=\Theta_{g^{-1}}\circ V\circ\theta_{g}=(\theta_{g^{-1}})_{*}V, (109)

meaning that a vector field V𝔛()V\in\mathfrak{X}({\mathcal{M}}) is gg-invariant if and only if it is gg-equivariant as a section of TT{\mathcal{M}} with respect to the action Θ\Theta. Recall that (by Lemma 20.14 in Lee (2013)) the left-invariant vector field ξLie(G)𝔛(G)\xi\in\operatorname{Lie}(G)\subset\mathfrak{X}(G) and its infinitesimal generator θ^(ξ)𝔛()\hat{\theta}(\xi)\in\mathfrak{X}({\mathcal{M}}) are θ(p)\theta^{(p)}-related, where θ(p):gθ(p,g)\theta^{(p)}:g\mapsto\theta(p,g) is the orbit map. This means that θexp(tξ)\theta_{\exp(t\xi)} is the time-tt flow of θ^(ξ)\hat{\theta}(\xi) by Proposition 9.13 in Lee (2013). As a result, the Lie derivative in (97) agrees with the standard Lie derivative of VV along θ^(ξ)\hat{\theta}(\xi) (see Lee (2013, p.228)), that is,

ξV(p)=limt01t[dθexp(tξ)(θexp(tξ)(p))Vθexp(tξ)(p)Vp]=[θ^(ξ),V]p,\boxed{{\mathcal{L}}_{\xi}V(p)=\lim_{t\to 0}\frac{1}{t}\left[\operatorname{\mathrm{d}}\theta_{\exp(-t\xi)}(\theta_{\exp(t\xi)}(p))V_{\theta_{\exp(t\xi)}(p)}-V_{p}\right]=[\hat{\theta}(\xi),V]_{p},} (110)

where the expression on the right is the Lie bracket of θ^(ξ)\hat{\theta}(\xi) and VV.
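
Condition (110) can be checked pointwise once the generator theta_hat(xi) is known. The sketch below (ours, illustrative) evaluates the bracket [theta_hat(xi), V] by finite differences on M = R^2 with the rotation generator, confirming that a radial field is rotation-invariant while a non-symmetric field is not.

import numpy as np

def jacobian(V, p, h=1e-5):
    return np.column_stack([(V(p + h * e) - V(p - h * e)) / (2 * h)
                            for e in np.eye(len(p))])

def bracket(X, Y, p):
    # Lie bracket [X, Y]_p = dY(p) X(p) - dX(p) Y(p)
    return jacobian(Y, p) @ X(p) - jacobian(X, p) @ Y(p)

gen = lambda p: np.array([-p[1], p[0]])         # rotation generator theta_hat(xi)
V_invariant = lambda p: p                       # radial field, rotation-invariant
V_broken = lambda p: np.array([p[0]**2, 0.0])   # not rotation-invariant

p = np.array([0.7, 0.4])
print(bracket(gen, V_invariant, p))             # ~ [0, 0], so L_xi V = 0
print(bracket(gen, V_broken, p))                # nonzero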

11.2 Tensor fields

Symmetries of a tensor field can also be revealed using our framework. This will be useful for our study of Hamiltonian dynamics in Section 11.3 and for our study of integral operators, whose kernels can be viewed as tensor fields, in Section 11.4. For simplicity, we consider a rank-kk covariant tensor field A𝔗k()A\in\mathfrak{T}^{k}({\mathcal{M}}), although our results extend to contravariant and mixed tensor fields with minimal modification. We rely on the basic definitions and machinery found in Lee (2013, Chapter 12). Under a right GG-action θ\theta on {\mathcal{M}}, the tensor field transforms via the pullback map θg:𝔗k()𝔗k()\theta_{g}^{*}:\mathfrak{T}^{k}({\mathcal{M}})\to\mathfrak{T}^{k}({\mathcal{M}}) defined by

(θgA)p(v1,,vk)=(dθg(p)Apg)(v1,,vk)=Apg(dθg(p)v1,,dθg(p)vk)(\theta_{g}^{*}A)_{p}(v_{1},\ldots,v_{k})=(\operatorname{\mathrm{d}}\theta_{g}(p)^{*}A_{p\cdot g})(v_{1},\ldots,v_{k})=A_{p\cdot g}(\operatorname{\mathrm{d}}\theta_{g}(p)v_{1},\ldots,\operatorname{\mathrm{d}}\theta_{g}(p)v_{k}) (111)

for every v1,,vkTpv_{1},\ldots,v_{k}\in T_{p}{\mathcal{M}}.

Definition 29

Given gGg\in G, a tensor field A𝔗k()A\in\mathfrak{T}^{k}({\mathcal{M}}) is gg-invariant if and only if θgA=A\theta_{g}^{*}A=A, that is,

Apg(dθg(p)v1,,dθg(p)vk)=Ap(v1,,vk)p.A_{p\cdot g}(\operatorname{\mathrm{d}}\theta_{g}(p)v_{1},\ldots,\operatorname{\mathrm{d}}\theta_{g}(p)v_{k})=A_{p}(v_{1},\ldots,v_{k})\qquad\forall p\in{\mathcal{M}}. (112)

To study the invariance of tensor fields in our framework, we recall that a tensor field is a section of the tensor bundle E=TkT=p(Tp)kE=T^{k}T^{*}{\mathcal{M}}=\coprod_{p\in{\mathcal{M}}}(T_{p}^{*}{\mathcal{M}})^{\otimes k}, a vector bundle over {\mathcal{M}}, where TpT_{p}^{*}{\mathcal{M}} is the dual space of TpT_{p}{\mathcal{M}}. The right GG-action θ\theta on {\mathcal{M}} induces a right GG-action Θ:TkT*×GTkT*\Theta:T^{k}T^{*}{\mathcal{M}}\times G\to T^{k}T^{*}{\mathcal{M}} defined by

Θg(Ap)=dθg1(θg(p))Ap.\Theta_{g}(A_{p})=\operatorname{\mathrm{d}}\theta_{g^{-1}}(\theta_{g}(p))^{*}A_{p}. (113)

It is clear that each Θg\Theta_{g} is a homomorphism of the vector bundle TkTT^{k}T^{*}{\mathcal{M}} descending to θg\theta_{g} under the bundle projection. Crucially, we have

𝒦gA=Θg1Aθg=θgA,{\mathcal{K}}_{g}A=\Theta_{g^{-1}}\circ A\circ\theta_{g}=\theta_{g}^{*}A, (114)

meaning that A𝔗k()A\in\mathfrak{T}^{k}({\mathcal{M}}) is gg-invariant if and only if it is gg-equivariant as a section of TkTT^{k}T^{*}{\mathcal{M}} with respect to the action Θ\Theta. Since θexp(tξ)\theta_{\exp(t\xi)} gives the time-tt flow of the vector field θ^(ξ)𝔛()\hat{\theta}(\xi)\in\mathfrak{X}({\mathcal{M}}), the Lie derivative in (97) for this action agrees with the standard Lie derivative of A𝔗k()A\in\mathfrak{T}^{k}({\mathcal{M}}) along θ^(ξ)\hat{\theta}(\xi) (see Lee (2013, p.321)), that is

(ξA)p=limt01t[dθexp(tξ)(p)Aθexp(tξ)(p)Ap]=(θ^(ξ)A)p.\boxed{({\mathcal{L}}_{\xi}A)_{p}=\lim_{t\to 0}\frac{1}{t}\left[\operatorname{\mathrm{d}}\theta_{\exp(t\xi)}(p)^{*}A_{\theta_{\exp(t\xi)}(p)}-A_{p}\right]=({\mathcal{L}}_{\hat{\theta}(\xi)}A)_{p}.} (115)

The Lie derivative for arbitrary covariant tensor fields can be computed by applying Proposition 12.32 in Lee (2013) and its corollaries. More generally, thanks to Sections 6.16–6.18 of Kolář et al. (1993), the Lie derivative for any tensor product of sections of natural vector bundles can be computed via the formula

ξ(A1A2)=(ξA1)A2+A1(ξA2).{\mathcal{L}}_{\xi}(A_{1}\otimes A_{2})=({\mathcal{L}}_{\xi}A_{1})\otimes A_{2}+A_{1}\otimes({\mathcal{L}}_{\xi}A_{2}). (116)

For example, this holds when A1,A2A_{1},A_{2} are arbitrary smooth tensor fields of mixed types. The Lie derivative of a differential form ω\omega on {\mathcal{M}} can be computed by Cartan’s magic formula

ξω=θ^(ξ)(dω)+d(θ^(ξ)ω),{\mathcal{L}}_{\xi}\omega=\hat{\theta}(\xi)\operatorname{{\lrcorner}}(\operatorname{\mathrm{d}}\omega)+\operatorname{\mathrm{d}}(\hat{\theta}(\xi)\operatorname{{\lrcorner}}\omega), (117)

where d\operatorname{\mathrm{d}} is the exterior derivative.
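
As a sanity check of (117), the following SymPy sketch (ours; the interior product and exterior derivative are written out in components for a 1-form on R^2) confirms that the angular form (-y dx + x dy)/(x^2 + y^2) is invariant under rotations.

import sympy as sp

x, y = sp.symbols('x y')
X = sp.Matrix([-y, x])                          # rotation generator theta_hat(xi)
w = sp.Matrix([-y, x]) / (x**2 + y**2)          # angular 1-form, rotation-invariant

def cartan(X, w):
    # components of X _| (dw) + d(X _| w) for a 1-form w on R^2
    iXw = X[0] * w[0] + X[1] * w[1]             # interior product X _| w
    dw = sp.diff(w[1], x) - sp.diff(w[0], y)    # dw = (coefficient) dx ^ dy
    interior = sp.Matrix([-X[1] * dw, X[0] * dw])
    d_iXw = sp.Matrix([sp.diff(iXw, x), sp.diff(iXw, y)])
    return sp.simplify(interior + d_iXw)

print(cartan(X, w))                             # Matrix([[0], [0]])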

11.3 Hamiltonian dynamics

The dynamics of frictionless mechanical systems can be described by Hamiltonian vector fields on symplectic manifolds. Roughly speaking, these encompass systems that conserve energy, such as motion of rigid bodies and particles interacting via conservative forces. The celebrated theorem of Noether (1918) says that conserved quantities of Hamiltonian systems correspond with symmetries of the energy function (the system’s Hamiltonian). In this section, we briefly illustrate how to enforce Hamiltonicity constraints on learned dynamical systems and how to promote, discover, and enforce conservation laws. A thorough treatment of classical mechanics, symplectic manifolds, and Hamiltonian systems can be found in Abraham and Marsden (2008); Marsden and Ratiu (1999). This includes methods for reduction of systems with known symmetries and conservation laws. The following brief introduction follows Chapter 22 of Lee (2013).

Hamiltonian systems are defined on symplectic manifolds, that is, smooth even-dimensional manifolds 𝒮{\mathcal{S}} equipped with a smooth, nondegenerate, closed differential 22-form ω\omega, called the symplectic form. Nondegeneracy means that the map ω^p:vωp(v,)\hat{\omega}_{p}:v\mapsto\omega_{p}(v,\cdot) is a bijective linear map of Tp𝒮T_{p}{\mathcal{S}} onto its dual Tp𝒮T_{p}^{*}{\mathcal{S}} for every p𝒮p\in{\mathcal{S}}. Closure means that dω=0\operatorname{\mathrm{d}}\omega=0, where d\operatorname{\mathrm{d}} is the exterior derivative. Thanks to nondegeneracy, any smooth function HC(𝒮)H\in C^{\infty}({\mathcal{S}}) gives rise to a smooth vector field

VH=ω^1(dH)V_{H}=\hat{\omega}^{-1}(\operatorname{\mathrm{d}}H) (118)

called the “Hamiltonian vector field” of HH. A vector field V𝔛(𝒮)V\in\mathfrak{X}({\mathcal{S}}) is said to be Hamiltonian if V=VHV=V_{H} for some function HH, called the Hamiltonian of VV. A vector field is locally Hamiltonian if it is Hamiltonian in a neighborhood of each point of 𝒮{\mathcal{S}}.

The symplectic manifolds considered in classical mechanics usually consist of the cotangent bundle 𝒮=T{\mathcal{S}}=T^{*}{\mathcal{M}} of an mm-dimensional manifold {\mathcal{M}} describing the “configuration” of the system, e.g., the positions of particles. The cotangent bundle has a canonical symplectic form given by

ω=i=1mdxidξi,\omega=\sum_{i=1}^{m}\operatorname{\mathrm{d}}x^{i}\wedge\operatorname{\mathrm{d}}\xi_{i}, (119)

where (xi,ξi)(x^{i},\xi_{i}) are any choice of natural coordinates on a patch of TT^{*}{\mathcal{M}} (see Proposition 22.11 in Lee (2013)). Here, each xix^{i} is a generalized coordinate describing the configuration and ξi\xi_{i} is its “conjugate” or “generalized” momentum. The Darboux theorem (Theorem 22.13 in Lee (2013)) says that any symplectic form on a manifold can be put into the form of (119) by a choice of local coordinates. In these “Darboux” coordinates, the dynamics of a Hamiltonian system are governed by the equations

ddtxi=VH(xi)=Hξi,ddtξi=VH(ξi)=Hxi,\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}x^{i}=V_{H}(x^{i})=\frac{\partial H}{\partial\xi_{i}},\qquad\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\xi_{i}=V_{H}(\xi_{i})=-\frac{\partial H}{\partial x^{i}}, (120)

which should be familiar to anyone who has studied undergraduate mechanics.

Enforcing local Hamiltonicity on a vector field V𝔛(𝒮)V\in\mathfrak{X}({\mathcal{S}}) is equivalent to the linear constraint

Vω=0{\mathcal{L}}_{V}\omega=0 (121)

thanks to Proposition 22.17 in Lee (2013). Here V{\mathcal{L}}_{V} is the Lie derivative of the tensor field ω𝔗2(𝒮)\omega\in\mathfrak{T}^{2}({\mathcal{S}}), i.e., (115) with θ\theta being the flow of VV and its generator being the identity θ^(V)=V\hat{\theta}(V)=V. Note that the Lie derivative still makes sense even when the orbits tθt(p)=θexp(t1)(p)t\mapsto\theta_{t}(p)=\theta_{\exp(t1)}(p) are only defined for small t(ε,ε)t\in(-\varepsilon,\varepsilon). In Darboux coordinates, this constraint is equivalent to the set of equations

V(xi)xj+V(ξj)ξi=0,V(ξi)xjV(ξj)xi=0,V(xi)ξjV(xj)ξi=0\frac{\partial V(x^{i})}{\partial x^{j}}+\frac{\partial V(\xi_{j})}{\partial\xi_{i}}=0,\qquad\frac{\partial V(\xi_{i})}{\partial x^{j}}-\frac{\partial V(\xi_{j})}{\partial x^{i}}=0,\qquad\frac{\partial V(x^{i})}{\partial\xi_{j}}-\frac{\partial V(x^{j})}{\partial\xi_{i}}=0 (122)

for all 1i,jm1\leq i,j\leq m. When the first de Rham cohomology group satisfies HdR1(𝒮)=0H^{1}_{\text{dR}}({\mathcal{S}})=0, for example when 𝒮{\mathcal{S}} is contractible, local Hamiltonicity implies the existence of a global Hamiltonian for VV, unique on each component of 𝒮{\mathcal{S}} up to addition of a constant by Lee (2013, Proposition 22.17).
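
For a single degree of freedom (m = 1), the constraints (122) collapse to a single divergence-free condition on V. The following SymPy sketch (ours, illustrative) verifies the constraint for the harmonic oscillator and shows that adding damping violates it.

import sympy as sp

x, xi = sp.symbols('x xi')
H = (x**2 + xi**2) / 2                          # harmonic-oscillator Hamiltonian

# Hamiltonian vector field in Darboux coordinates, Eq. (120)
Vx, Vxi = sp.diff(H, xi), -sp.diff(H, x)
print(sp.diff(Vx, x) + sp.diff(Vxi, xi))        # 0: constraint (122) holds

# a damped (non-Hamiltonian) field violates the constraint
Vx_d, Vxi_d = xi, -x - xi / 10
print(sp.diff(Vx_d, x) + sp.diff(Vxi_d, xi))    # -1/10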

Of course, our approach also makes it possible to promote Hamiltonicity with respect to candidate symplectic structures when learning a vector field VV. To do this, we can penalize the nuclear norm of V{\mathcal{L}}_{V} restricted to a subspace Ω~\tilde{\Omega} of candidate closed 22-forms using the regularization function

RΩ~,(V)=V|Ω~.R_{\tilde{\Omega},*}(V)=\big{\|}\left.{\mathcal{L}}_{V}\right|_{\tilde{\Omega}}\big{\|}_{*}. (123)

The strength of this penalty can be increased when solving a regression problem for VV until there is a non-degenerate 22-form in the nullspace Null(V)Ω~\operatorname{Null}({\mathcal{L}}_{V})\cap\tilde{\Omega}. This gives a symplectic form with respect to which VV is locally Hamiltonian.

Another option is to learn a (globally-defined) Hamiltonian function HH directly by fitting VHV_{H} to data. In this case, we can regularize the learning problem by penalizing the breaking of conservation laws. The time-derivative of a quantity, that is, of a smooth function fC(𝒮)f\in C^{\infty}({\mathcal{S}}), under the flow of VHV_{H} is given by the Poisson bracket

{f,H}:=ω(Vf,VH)=df(VH)=VH(f)=Vf(H).\{f,H\}:=\omega(V_{f},V_{H})=\operatorname{\mathrm{d}}f(V_{H})=V_{H}(f)=-V_{f}(H). (124)

Hence, ff is a conserved quantity if and only if HH is invariant under the flow of VfV_{f} — this is Noether’s theorem. It is also evident that the Poisson bracket is linear with respect to both of its arguments. In fact, the Poisson bracket turns C(𝒮)C^{\infty}({\mathcal{S}}) into a Lie algebra with fVff\mapsto V_{f} being a Lie algebra homomorphism, i.e., V{f,g}=[Vf,Vg]V_{\{f,g\}}=[V_{f},V_{g}].

As a result of these basic properties of the Poisson bracket, the quantities conserved by a given Hamiltonian vector field VHV_{H} form a Lie subalgebra given by the nullspace of a linear operator LH:C(𝒮)C(𝒮)L_{H}:C^{\infty}({\mathcal{S}})\to C^{\infty}({\mathcal{S}}) defined by

LH:f{f,H}.L_{H}:f\mapsto\{f,H\}. (125)

To promote conservation of quantities in a given subalgebra 𝔤C(𝒮){\mathfrak{g}}\subset C^{\infty}({\mathcal{S}}) when learning a Hamiltonian HH, we can penalize the nuclear norm of LHL_{H} restricted to 𝔤{\mathfrak{g}}, that is

R𝔤,(H)=LH|𝔤.R_{{\mathfrak{g}},*}(H)=\big{\|}\left.L_{H}\right|_{{\mathfrak{g}}}\big{\|}_{*}. (126)

For example, we might expect a mechanical system to conserve angular momentum about some axes, but not others due to applied torques. In the absence of data to the contrary, it often makes sense to assume that various linear and angular momenta are conserved.
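
The following sketch (ours; a sample-based discretization under strong simplifying assumptions, using the harmonic-oscillator Hamiltonian and a three-dimensional candidate space spanned by x, xi, and H itself, which happens to be invariant under L_H) assembles the matrix of L_H restricted to this subspace; its nuclear norm is the penalty (126), and its nullspace recovers a conserved quantity.

import numpy as np

rng = np.random.default_rng(0)
pts = rng.standard_normal((200, 2))             # samples of (x, xi)
x, xi = pts[:, 0], pts[:, 1]

# candidate observables spanning the subspace, and their gradients at the samples
basis = [x, xi, 0.5 * (x**2 + xi**2)]
grads = [(np.ones(200), np.zeros(200)),
         (np.zeros(200), np.ones(200)),
         (x, xi)]

# Poisson bracket {f, H} = f_x H_xi - f_xi H_x with H = (x^2 + xi^2)/2
Phi = np.column_stack(basis)
Psi = np.column_stack([fx * xi - fxi * x for fx, fxi in grads])

L = np.linalg.lstsq(Phi, Psi, rcond=None)[0]    # matrix of L_H on the subspace
print(np.linalg.norm(L, 'nuc'))                 # the penalty R(H), here 2
print(np.linalg.svd(L)[2][-1])                  # ~ (0, 0, 1): H itself is conserved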

11.4 Multilinear integral operators

In this section we provide machinery to study the symmetries of linear and nonlinear integral operators acting on sections of vector bundles, yielding far-reaching generalizations of our results in Section 6.2. Such operators can form the layers of neural networks acting on various vector and tensor fields supported on manifolds.

Let π0:E00\pi_{0}:E_{0}\to{\mathcal{M}}_{0} and πj:Ejj\pi_{j}:E_{j}\to{\mathcal{M}}_{j} be vector bundles with j{\mathcal{M}}_{j} being djd_{j}-dimensional orientable Riemannian manifolds with volume forms dVjΩdj(Tj)\operatorname{dV}_{j}\in\Omega^{d_{j}}(T^{*}{\mathcal{M}}_{j}), j=1,,rj=1,\ldots,r. Note that here, dVj\operatorname{dV}_{j} does not denote the exterior derivative of a (dj1)(d_{j}-1)-form. A continuous section KK of the bundle

E=E0E1Er:=(p,q1,,qr)0××rE0,pE1,q1Er,qrE=E_{0}\otimes E_{1}^{*}\otimes\cdots\otimes E_{r}^{*}:=\coprod_{(p,q_{1},\ldots,q_{r})\in{\mathcal{M}}_{0}\times\cdots\times{\mathcal{M}}_{r}}E_{0,p}\otimes E_{1,q_{1}}^{*}\otimes\cdots\otimes E_{r,q_{r}}^{*} (127)

can be viewed as a continuous family of rr-multilinear maps

K(p,q1,,qr):j=1rEj,qjE0,p.K(p,q_{1},\ldots,q_{r}):\bigoplus_{j=1}^{r}E_{j,q_{j}}\to E_{0,p}. (128)

The section KK can serve as the kernel to define an rr-multilinear integral operator 𝒯K:D(𝒯K)j=1rΣ(Ej)Σ(E0){\mathcal{T}}_{K}:D({\mathcal{T}}_{K})\subset\bigoplus_{j=1}^{r}\operatorname{\Sigma}(E_{j})\to\operatorname{\Sigma}(E_{0}) with action on (F1,,Fr)D(𝒯K)(F_{1},\ldots,F_{r})\in D({\mathcal{T}}_{K}) given by

𝒯K[F1,,Fr](p)=1××rK(p,q1,,qr)[F1(q1),,Fr(qr)]dV1(q1)dVr(qr).{\mathcal{T}}_{K}[F_{1},\ldots,F_{r}](p)=\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}K(p,q_{1},\ldots,q_{r})\big{[}F_{1}(q_{1}),\ldots,F_{r}(q_{r})\big{]}\operatorname{dV}_{1}(q_{1})\wedge\cdots\wedge\operatorname{dV}_{r}(q_{r}). (129)

This operator is linear when r=1r=1. When r>1r>1 and E1==ErE_{1}=\cdots=E_{r}, (129) can be used to define a nonlinear integral operator Σ(E1)Σ(E0)\operatorname{\Sigma}(E_{1})\to\operatorname{\Sigma}(E_{0}) with action F𝒯K[F,,F]F\mapsto{\mathcal{T}}_{K}[F,\ldots,F].
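
In computations, the integral in (129) is replaced by quadrature. The sketch below (ours, illustrative) discretizes a bilinear (r = 2) operator acting on scalar functions over [0, 1], that is, sections of trivial line bundles, with uniform quadrature weights standing in for the volume forms.

import numpy as np

n = 64
q = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / n)                         # quadrature weights for dV_1, dV_2

# smooth kernel K(p, q1, q2) on the product of three copies of [0, 1]
K = np.exp(-(q[:, None, None] - q[None, :, None])**2
           - (q[:, None, None] - q[None, None, :])**2)

F1, F2 = np.sin(2 * np.pi * q), np.cos(2 * np.pi * q)

# T_K[F1, F2](p) = sum over (q1, q2) of K(p, q1, q2) F1(q1) F2(q2) w(q1) w(q2)
out = np.einsum('pab,a,b,a,b->p', K, F1, F2, w, w)
print(out.shape)                                # (64,)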

Given fiber-linear right GG-actions Θj:Ej×GEj\Theta_{j}:E_{j}\times G\to E_{j}, there is an induced fiber-linear right GG-action Θ:E×GE\Theta:E\times G\to E on EE defined by

Θg(Kp,q1,,qr)[v1,,vr]=Θ0,g(Kp,q1,,qr[Θ1,g1(v1),,Θr,g1(vr)])\Theta_{g}(K_{p,q_{1},\ldots,q_{r}})\big{[}v_{1},\ldots,v_{r}\big{]}=\Theta_{0,g}\left(K_{p,q_{1},\ldots,q_{r}}\big{[}\Theta_{1,g^{-1}}(v_{1}),\ldots,\Theta_{r,g^{-1}}(v_{r})\big{]}\right) (130)

for Kp,q1,,qrEK_{p,q_{1},\ldots,q_{r}}\in E viewed as an rr-multilinear map E1,q1Er,qrE0,pE_{1,q_{1}}\oplus\cdots\oplus E_{r,q_{r}}\to E_{0,p} and vjEj,θj,g(qj)v_{j}\in E_{j,\theta_{j,g}(q_{j})}. Sections FjΣ(Ej)F_{j}\in\operatorname{\Sigma}(E_{j}) transform according to

𝒦gEjFj=Θj,g1Fjθj,g,{\mathcal{K}}^{E_{j}}_{g}F_{j}=\Theta_{j,g^{-1}}\circ F_{j}\circ\theta_{j,g}, (131)

with the section defining the integral kernel transforming according to

𝒦gEK(p,q1,,qr)[vq1,,vqr]=Θ0,g1{K(θ0,g(p),θ1,g(q1),,θr,g(qr))[Θ1,g(vq1),,Θr,g(vqr)]}.{\mathcal{K}}^{E}_{g}K(p,q_{1},\ldots,q_{r})[v_{q_{1}},\ldots,v_{q_{r}}]=\\ \Theta_{0,g^{-1}}\Big{\{}K\big{(}\theta_{0,g}(p),\theta_{1,g}(q_{1}),\ldots,\theta_{r,g}(q_{r})\big{)}\big{[}\Theta_{1,g}(v_{q_{1}}),\ldots,\Theta_{r,g}(v_{q_{r}})\big{]}\Big{\}}. (132)

Using these transformation laws, we define equivariance for the integral operator as follows:

Definition 30

The integral operator in (129) is equivariant with respect to gGg\in G if

𝒦gE0𝒯K[𝒦g1E1F1,,𝒦g1ErFr]=𝒯K[F1,,Fr]{\mathcal{K}}^{E_{0}}_{g}{\mathcal{T}}_{K}\big{[}{\mathcal{K}}^{E_{1}}_{g^{-1}}F_{1},\ldots,{\mathcal{K}}^{E_{r}}_{g^{-1}}F_{r}\big{]}={\mathcal{T}}_{K}\big{[}F_{1},\ldots,F_{r}\big{]} (133)

for every (F1,,Fr)D(𝒯K)(F_{1},\ldots,F_{r})\in D({\mathcal{T}}_{K}).

This definition is equivalent to the following condition on the integral kernel:

Lemma 31

The integral operator in (129) is equivariant with respect to gGg\in G if and only if

𝒦g(KdV1dVr):=𝒦gE(K)θ1,gdV1θr,gdVr=KdV1dVr,{\mathcal{K}}_{g}\big{(}K\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}\big{)}:={\mathcal{K}}^{E}_{g}(K)\ \theta_{1,g}^{*}\operatorname{dV}_{1}\wedge\cdots\wedge\theta_{r,g}^{*}\operatorname{dV}_{r}=K\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}, (134)

where 𝒦gE{\mathcal{K}}_{g}^{E} is defined by (132). A proof is available in Appendix A.

We note that 𝒦gΩ(dVj)=θj,gdVj{\mathcal{K}}^{\Omega}_{g}(\operatorname{dV}_{j})=\theta_{j,g}^{*}\operatorname{dV}_{j} is the natural transformation of the differential form dVjΩdj(Tj)\operatorname{dV}_{j}\in\Omega^{d_{j}}(T^{*}{\mathcal{M}}_{j}) (a covariant tensor field) described in Section 11.2. The Lie derivative of this action on volume forms is given by

ξΩdV=d(θ^(ξ)dV)=divθ^(ξ)dV{\mathcal{L}}^{\Omega}_{\xi}\operatorname{dV}=\operatorname{\mathrm{d}}\big{(}\hat{\theta}(\xi)\operatorname{{\lrcorner}}\operatorname{dV}\big{)}=\operatorname{div}\hat{\theta}(\xi)\operatorname{dV} (135)

thanks to Cartan’s magic formula and the definition of divergence (see Lee (2013)). Therefore, differentiating (134) along the curve g(t)=exp(tξ)g(t)=\exp(t\xi) yields the Lie derivative

ξ(KdV1dVr)=[ξE(K)+Kj=1rdivθ^j(ξ)]dV1dVr.{\mathcal{L}}_{\xi}\big{(}K\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}\big{)}=\bigg{[}{\mathcal{L}}^{E}_{\xi}(K)+K\sum_{j=1}^{r}\operatorname{div}\hat{\theta}_{j}(\xi)\bigg{]}\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}. (136)

Here, ξE{\mathcal{L}}_{\xi}^{E} is the Lie derivative associated with (132). For the integral operators discussed in Section 6.2, the formulas derived here recover Eqs. 38 and 40.

12 Invariant submanifolds and tangency

Studying the symmetries of smooth maps can be cast into a more general framework in which we study the symmetries of submanifolds. Specifically, the symmetries of a smooth map F:01F:{\mathcal{M}}_{0}\to{\mathcal{M}}_{1} between manifolds correspond to symmetries of its graph, gr(F)\operatorname{gr}(F), and the symmetries of a smooth section of a vector bundle FC(,E)F\in C^{\infty}({\mathcal{M}},E) correspond to symmetries of its image, im(F)\operatorname{im}(F) — both of which are properly embedded submanifolds of 0×1{\mathcal{M}}_{0}\times{\mathcal{M}}_{1} and EE, respectively. We show that symmetries of a large class of submanifolds, including the above, are revealed by checking whether the infinitesimal generators of the group action are tangent to the submanifold. In this setting, the Lie derivative of FC(,E)F\in C^{\infty}({\mathcal{M}},E) has a geometric interpretation as a projection of the infinitesimal generator onto the tangent space of the image im(F)\operatorname{im}(F), viewed as a submanifold of the bundle.

12.1 Symmetry of submanifolds

In this section we study the infinitesimal conditions for a submanifold to be invariant under the action of a Lie group. Suppose that 𝒩{\mathcal{N}} is a manifold and θ:𝒩×G𝒩\theta:{\mathcal{N}}\times G\to{\mathcal{N}} is a right action of a Lie group GG on 𝒩{\mathcal{N}}. Sometimes we denote this action by pg=θ(p,g)p\cdot g=\theta(p,g) when there is no ambiguity. Though our results also hold for left actions, as we discuss later in Remark 37, working with right actions is standard in this context and allows us to leverage results from Lee (2013) more naturally in our proofs. Fixing p𝒩p\in{\mathcal{N}}, the orbit map of this action is denoted θ(p):G𝒩\theta^{(p)}:G\to{\mathcal{N}}. Fixing gGg\in G, the map θg:𝒩𝒩\theta_{g}:{\mathcal{N}}\to{\mathcal{N}} defined by θg:pθ(p,g)\theta_{g}:p\mapsto\theta(p,g) is a diffeomorphism with inverse θg1\theta_{g^{-1}}.

Definition 32

A subset 𝒩{\mathcal{M}}\subset{\mathcal{N}} is GG-invariant if and only if θ(p,g)\theta(p,g)\in{\mathcal{M}} for every gGg\in G and pp\in{\mathcal{M}}.

Sometimes we will denote G={θ(p,g):p,gG}{\mathcal{M}}\cdot G=\{\theta(p,g)\ :\ p\in{\mathcal{M}},\ g\in G\}, in which case GG-invariance of {\mathcal{M}} can be stated as G{\mathcal{M}}\cdot G\subset{\mathcal{M}}.

We study the group invariance of submanifolds of the following type:

Definition 33

Let {\mathcal{M}} be a weakly embedded mm-dimensional submanifold of an nn-dimensional manifold 𝒩{\mathcal{N}}. We say that {\mathcal{M}} is arcwise-closed if any smooth curve γ:[a,b]𝒩\gamma:[a,b]\to{\mathcal{N}} satisfying γ((a,b))\gamma((a,b))\subset{\mathcal{M}} must also satisfy γ([a,b])\gamma([a,b])\subset{\mathcal{M}}.

Submanifolds of this type include all properly embedded submanifolds of 𝒩{\mathcal{N}} because properly embedded submanifolds are closed subsets (Proposition 5.5 in Lee (2013)). More interestingly, we have the following:

Proposition 34

The leaves of any (nonsingular) foliation of 𝒩{\mathcal{N}} are arcwise-closed. We provide a proof in Appendix A.

This means that the kinds of submanifolds we are considering include all possible Lie subgroups (Lee (2013, Theorem 19.25)) as well as their orbits under free and proper group actions (Lee (2013, Proposition 21.7)). The leaves of singular foliations associated with integrable distributions of nonconstant rank (see Kolář et al. (1993, Sections 3.18–25)) can fail to be arcwise-closed. For example, the distribution spanned by the vector field xxx\frac{\partial}{\partial x} on \mathbb{R} has maximal integral manifolds (,0)(-\infty,0), {0}\{0\}, and (0,)(0,\infty) forming a singular foliation of \mathbb{R}. Obviously, the leaves (,0)(-\infty,0) and (0,)(0,\infty) are not arcwise-closed.

Given a submanifold and a candidate group of transformations, the following theorem describes the largest connected Lie subgroup of symmetries of the submanifold. Specifically, these symmetries can be identified by checking tangency conditions between infinitesimal generators and the submanifold.

Theorem 35

Let {\mathcal{M}} be an immersed submanifold of 𝒩{\mathcal{N}} and let θ:𝒩×G𝒩\theta:{\mathcal{N}}\times G\to{\mathcal{N}} be a right action of a Lie group GG on 𝒩{\mathcal{N}} with infinitesimal generator θ^:Lie(G)𝔛(𝒩)\hat{\theta}:\operatorname{Lie}(G)\to\mathfrak{X}({\mathcal{N}}). Then

𝔰𝔶𝔪G()={ξLie(G):θ^(ξ)pTpp}\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})=\big{\{}\xi\in\operatorname{Lie}(G)\ :\ \hat{\theta}(\xi)_{p}\in T_{p}{\mathcal{M}}\quad\forall p\in{\mathcal{M}}\big{\}} (137)

is the Lie subalgebra of a unique connected Lie subgroup HGH\subset G. If {\mathcal{M}} is weakly-embedded and arcwise-closed in 𝒩{\mathcal{N}}, then this subgroup has the following properties:

  1. (i)

    H{\mathcal{M}}\cdot H\subset{\mathcal{M}}

  2. (ii)

    If H~\tilde{H} is a connected Lie subgroup of GG such that H~{\mathcal{M}}\cdot\tilde{H}\subset{\mathcal{M}}, then H~H\tilde{H}\subset H.

If {\mathcal{M}} is properly embedded in 𝒩{\mathcal{N}} then HH is the identity component of the closed, properly embedded Lie subgroup

SymG()={gG:g}.\operatorname{Sym}_{G}({\mathcal{M}})=\{g\in G\ :\ {\mathcal{M}}\cdot g\subset{\mathcal{M}}\}. (138)

A proof is provided in Appendix F.

Since the infinitesimal generator θ^\hat{\theta} is a linear map and TpT_{p}{\mathcal{M}} is a subspace of Tp𝒩T_{p}{\mathcal{N}}, the tangency conditions defining the Lie subalgebra (137) can be viewed as a set of linear equality constraints on the elements ξLie(G)\xi\in\operatorname{Lie}(G). Hence, 𝔰𝔶𝔪G()\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}}) can be computed as the nullspace of a positive semidefinite operator on Lie(G)\operatorname{Lie}(G), defined analogously to the case described earlier in Section 7.1.
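
For instance, the following sketch (ours, illustrative) takes M to be the unit circle in N = R^2 with G = GL(2) acting linearly, so that theta_hat(A)_p = A p and tangency at p amounts to <p, A p> = 0; the nullspace of the sampled positive semidefinite Gram operator then recovers the rotation algebra.

import numpy as np

rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 100)
pts = np.stack([np.cos(t), np.sin(t)], axis=1)  # samples on the unit circle

# basis E_ij of the candidate algebra gl(2); generators act as p -> A p
basis = [np.outer(np.eye(2)[i], np.eye(2)[j]) for i in range(2) for j in range(2)]

# normal component <p, A p> of each generator: zero iff tangent to the circle
R = np.column_stack([np.einsum('ni,ij,nj->n', pts, A, pts) for A in basis])
Gram = R.T @ R                                  # positive semidefinite operator
evals, evecs = np.linalg.eigh(Gram)
print(evals[0])                                 # ~ 0: one symmetry direction
print(evecs[:, 0].reshape(2, 2))                # ~ rotation generator, up to scale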

The following theorem provides necessary and sufficient conditions for arcwise-closed weakly-embedded submanifolds to be GG-invariant. These are generally nonlinear constraints on the submanifold, regarded as the zero section of its normal bundle under identification with a tubular neighborhood. However, we will show in Section 12.2 that these become linear constraints recovering the Lie derivative when the submanifold in question is the image of a section of a vector bundle and the group action is fiber-linear.

Theorem 36

Let {\mathcal{M}} be an arcwise-closed weakly-embedded submanifold of 𝒩{\mathcal{N}} and let θ:𝒩×G𝒩\theta:{\mathcal{N}}\times G\to{\mathcal{N}} be a right action of a Lie group GG on 𝒩{\mathcal{N}} with infinitesimal generator θ^:Lie(G)𝔛(𝒩)\hat{\theta}:\operatorname{Lie}(G)\to\mathfrak{X}({\mathcal{N}}). Let ξ1,,ξq\xi_{1},\ldots,\xi_{q} generate Lie(G)\operatorname{Lie}(G) and let g1,,gnG1g_{1},\ldots,g_{n_{G}-1} be elements from each non-identity component of GG. Then {\mathcal{M}} is G0G_{0}-invariant if and only if

θ^(ξi)pTpp,i=1,,q.\hat{\theta}(\xi_{i})_{p}\in T_{p}{\mathcal{M}}\qquad\forall p\in{\mathcal{M}},\quad i=1,\ldots,q. (139)

If, in addition, we have gj{\mathcal{M}}\cdot g_{j}\subset{\mathcal{M}} for every j=1,,nG1j=1,\ldots,n_{G}-1, then {\mathcal{M}} is GG-invariant. A proof is provided in Appendix G.

Remark 37 (Left actions)

When the group GG acts on 𝒩{\mathcal{N}} from the left according to θL:G×𝒩𝒩\theta^{L}:G\times{\mathcal{N}}\to{\mathcal{N}}, we can always construct an equivalent right action θR:𝒩×G𝒩\theta^{R}:{\mathcal{N}}\times G\to{\mathcal{N}} by setting θR(p,g)=θL(g1,p)\theta^{R}(p,g)=\theta^{L}(g^{-1},p). The corresponding infinitesimal generators are related by θ^R=θ^L\hat{\theta}^{R}=-\hat{\theta}^{L}. Since θ^L(ξ)pTp\hat{\theta}^{L}(\xi)_{p}\in T_{p}{\mathcal{M}} if and only if θ^R(ξ)pTp\hat{\theta}^{R}(\xi)_{p}\in T_{p}{\mathcal{M}}, Theorems 36 and 35 hold without modification for left GG-actions.

12.2 The Lie derivative as a projection

We provide a geometric interpretation of the Lie derivative in (97) by expressing it in terms of a projection of the infinitesimal generator of the group action onto the tangent space of im(F)\operatorname{im}(F) for smooth sections FC(,E)F\in C^{\infty}({\mathcal{M}},E). This allows us to connect the Lie derivative to the tangency conditions for symmetry of submanifolds presented in Section 12.1.

The Lie derivative ξF(p){\mathcal{L}}_{\xi}F(p) lies in EpE_{p}, while TF(p)im(F)T_{F(p)}\operatorname{im}(F) is a subspace of TF(p)ET_{F(p)}E. To relate quantities in these different spaces, the following lemma introduces a lifting of each EpE_{p} to a subspace of TF(p)ET_{F(p)}E.

Lemma 38

For every smooth section FC(,E)F\in C^{\infty}({\mathcal{M}},E) there is a well-defined injective vector bundle homomorphism ıF:ETE\imath_{F}:E\to TE that is expressed in any local trivialization Φ:π1(𝒰)𝒰×k\Phi:\pi^{-1}({\mathcal{U}})\to{\mathcal{U}}\times\mathbb{R}^{k} as

dΦıFΦ1:𝒰×k\displaystyle\operatorname{\mathrm{d}}\Phi\circ\imath_{F}\circ\Phi^{-1}:{\mathcal{U}}\times\mathbb{R}^{k} T(𝒰×k)\displaystyle\to T({\mathcal{U}}\times\mathbb{R}^{k}) (140)
(p,𝒗)\displaystyle(p,{\boldsymbol{v}}) (0,𝒗)Φ(F(p)).\displaystyle\mapsto(0,{\boldsymbol{v}})_{\Phi(F(p))}.

We give a proof in Appendix H.

This is a special case of the “vertical lift” of EE into the vertical bundle VE={vTE:dπv=0}VE=\{v\in TE\ :\ \operatorname{\mathrm{d}}\pi v=0\} described by Kolář et al. (1993) in Section 6.11. The “vertical projection” vprE:VEE\operatorname{vpr}_{E}:VE\to E provides a left-inverse satisfying vprEıF=IdE\operatorname{vpr}_{E}\circ\imath_{F}=\operatorname{Id}_{E}.

The following result relates the Lie derivative to a projection via the vertical lifting.

Theorem 39

Given a smooth section FC(,E)F\in C^{\infty}({\mathcal{M}},E) and pp\in{\mathcal{M}}, the map PF(p):=d(Fπ)(F(p)):TF(p)ETF(p)EP_{F(p)}:=\operatorname{\mathrm{d}}(F\circ\pi)(F(p)):T_{F(p)}E\to T_{F(p)}E is a linear projection onto TF(p)im(F)T_{F(p)}\operatorname{im}(F) and for every ξLie(G)\xi\in\operatorname{Lie}(G) we have

ıF(ξF)(p)=[IPF(p)]Θ^(ξ)F(p).\imath_{F}\circ({\mathcal{L}}_{\xi}F)(p)=-\big{[}I-P_{F(p)}\big{]}\hat{\Theta}(\xi)_{F(p)}. (141)

We give a proof in Appendix H.
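
The identity (141) is easy to verify in coordinates. The sketch below (ours) does so for the trivial bundle R x R over R with G = R acting by translating the base, where K_t F = F(. + t), the generator at a point (x, v) is (1, 0), and L_xi F = F'.

import numpy as np

F, dF = np.sin, np.cos
x = 0.3

# P = d(F o pi) at (x, F(x)) for the trivial bundle R x R over R
P = np.array([[1.0, 0.0], [dF(x), 0.0]])
gen = np.array([1.0, 0.0])          # generator of base translations, Theta_hat(xi)

lhs = np.array([0.0, dF(x)])        # vertical lift of L_xi F(x) = F'(x)
rhs = -(np.eye(2) - P) @ gen
print(lhs, rhs)                     # both equal (0, cos(0.3)), verifying (141)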

For the special case of smooth maps F:mnF:\mathbb{R}^{m}\to\mathbb{R}^{n} viewed as sections x(x,F(x))x\mapsto(x,F(x)) of the bundle π:m×nm\pi:\mathbb{R}^{m}\times\mathbb{R}^{n}\to\mathbb{R}^{m}, this theorem reproduces (55). The following corollary provides a link between our main results for sections of vector bundles and our main results for symmetries of submanifolds.

Corollary 40

For every smooth section FC(,E)F\in C^{\infty}({\mathcal{M}},E), ξLie(G)\xi\in\operatorname{Lie}(G), and pp\in{\mathcal{M}} we have

(ξF)(p)=0Θ^(ξ)F(p)TF(p)im(F).({\mathcal{L}}_{\xi}F)(p)=0\qquad\Leftrightarrow\qquad\hat{\Theta}(\xi)_{F(p)}\in T_{F(p)}\operatorname{im}(F). (142)

In particular, this means that for smooth sections, Theorems 26 and 25 are special cases of Theorems 36 and 35.

13 Conclusion

This paper provides a unified theoretical approach to enforce, discover, and promote symmetries in machine learning models. In particular, we provide theoretical foundations for Lie group symmetry in machine learning from a linear-algebraic viewpoint. This perspective unifies and generalizes several leading approaches in the literature, including approaches for incorporating and uncovering symmetries in neural networks and more general machine learning models. The central objects in this work are linear operators describing the finite and infinitesimal transformations of smooth sections of vector bundles with fiber-linear Lie group actions. To make the paper accessible to a wide range of practitioners, Sections 4–10 deal with the special case where the machine learning models are built using smooth functions between vector spaces. Our main results establish that the infinitesimal operators — the Lie derivatives — fully encode the connected subgroup of symmetries for sections of vector bundles (resp. functions between vector spaces). In other words, the Lie derivatives encode the symmetries with respect to which the machine learning models are equivariant.

We illustrate that enforcing and discovering continuous symmetries in large classes of machine learning models are dual problems with respect to the bilinear structure of the Lie derivative. Moreover, these ideas extend naturally to identify continuous symmetries of arbitrary submanifolds, recovering the Lie derivative when the submanifold is the image of a section of a vector bundle (resp., the graph of a function between vector spaces). Using the fundamental operators, we also describe how symmetries can be promoted as inductive biases during training of machine learning models using convex penalties. Our numerical results show that minimizing these convex penalties can be used to recover highly symmetric polynomial functions using fewer samples than are required to determine the polynomial coefficients directly as the solution of a linear system. This reduction in sample complexity becomes more pronounced in higher dimensions and with increasing symmetry of the function to be recovered. Finally, we provide rigorous data-driven methods for discretizing and approximating the fundamental operators to accomplish the tasks of enforcing, promoting, and discovering symmetry. Importantly, these theoretical concepts, while extremely general, admit efficient computational implementations via simple linear algebra.

The main limitations of our approach come from the need to make appropriate choices for key objects including the candidate group GG, the space of functions {\mathcal{F}} defining the machine learning model, and appropriate inner products for discretizing the fundamental operators. For example, it is possible that the only GG-symmetric functions in {\mathcal{F}} are trivial, meaning that enforcing symmetry results in learning only trivial models. One open question is whether our framework can be used in such cases to learn relaxed symmetries, as described by Wang et al. (2022). In other words, we may hope to find elements in {\mathcal{F}} that are nearly symmetric, and to bound the degree of asymmetry based on quantities derived from the fundamental operators, such as their norms. Additionally, the choice of inner products associated with the discretization of the fundamental operators could affect the results of nuclear norm penalization. Our reliance on the Lie algebra to study continuous symmetries also limits the ability of our proposed methods to account for partial symmetries, such as the invariance of the classification of the characters “Z” and “N” under rotations by small, but not large, angles.

In follow-up work, we aim to apply the proposed methods to a wide range of examples, and to explain practical implementation details. A main goal will be to study the extent to which nuclear norm relaxation can recover underlying symmetry groups and reduce the amount of data required to train accurate machine learning models on realistic data sets. Additionally, we will examine how the proposed techniques perform in the presence of noisy data, with the goal of understanding the empirical effects of problem dimension, noise level, and the candidate symmetry group.

Other important avenues of future work include investigating computationally efficient approaches to discretize the fundamental operators and use them to enforce, discover, and promote symmetry within our framework. This could involve leveraging sparse structure of the discretized operators in certain bases to enable the use of efficient Krylov subspace algorithms. It will also be useful to identify efficient optimization algorithms for training symmetry-constrained or symmetry-regularized machine learning models. Promising candidates include projected gradient descent, proximal splitting algorithms, and the Iteratively Reweighted Least Squares (IRLS) algorithms described by Mohan and Fazel (2012). Using IRLS could enable symmetry-promoting penalties to be based on non-convex Schatten pp-norms with 0<p<10<p<1, potentially improving the recovery of underlying symmetry groups compared to the nuclear norm where p=1p=1.

There are also several avenues we plan to explore in future theoretical work. These include extending the techniques presented here via jet bundle prolongation, as described by Olver (1986), to study symmetries in machine learning for Partial Differential Equations (PDEs). Combining analogues of our proposed methods in this setting with techniques using the weak formulation proposed by Messenger and Bortz (2021); Reinbold et al. (2020) could provide robust ways to identify symmetric PDEs in the presence of high noise and limited training data. We also aim to study the perturbative effects of noisy data in algorithms to discover and promote symmetry with the goal of understanding the effects of problem dimension, noise level, and number of data points on recovery of symmetry groups. Another important direction of theoretical study will be to build on the work of Peitz et al. (2023); Steyert (2022) by studying symmetry in the setting of Koopman operators for dynamical systems. To do this, one might follow the program set forth by Colbrook (2023), where the measure preserving property of certain dynamical systems is exploited to enhance the Extended Dynamic Mode Decomposition (EDMD) algorithm of Williams et al. (2015).

Acknowledgements

The authors acknowledge support from the National Science Foundation AI Institute in Dynamic Systems (grant number 2112085). SLB acknowledges support from the Army Research Office (ARO W911NF-19-1-0045) and the Boeing Company. The authors would also like to acknowledge valuable discussions with Tess Smidt and Matthew Colbrook.

References

  • Abraham et al. (1988) R. Abraham, J. E. Marsden, and T. Ratiu. Manifolds, Tensor Analysis, and Applications, volume 75 of Applied Mathematical Sciences. Springer-Verlag, 1988.
  • Abraham and Marsden (2008) Ralph Abraham and Jerrold E Marsden. Foundations of mechanics. AMS Chelsea Publishing, 2 edition, 2008.
  • Agrawal et al. (2018) Akshay Agrawal, Robin Verschueren, Steven Diamond, and Stephen Boyd. A rewriting system for convex optimization problems. Journal of Control and Decision, 5(1):42–60, 2018.
  • Ahmadi and Khadir (2020) Amir Ali Ahmadi and Bachir El Khadir. Learning dynamical systems with side information. In Proceedings of the 2nd Conference on Learning for Dynamics and Control, volume 120 of Proceedings of Machine Learning Research, pages 718–727. PMLR, 10–11 Jun 2020. URL https://proceedings.mlr.press/v120/ahmadi20a.html.
  • Akhound-Sadegh et al. (2024) Tara Akhound-Sadegh, Laurence Perreault-Levasseur, Johannes Brandstetter, Max Welling, and Siamak Ravanbakhsh. Lie point symmetry and physics-informed networks. Advances in Neural Information Processing Systems, 36, 2024.
  • Baddoo et al. (2023) Peter J Baddoo, Benjamin Herrmann, Beverley J McKeon, J Nathan Kutz, and Steven L Brunton. Physics-informed dynamic mode decomposition. Proceedings of the Royal Society A, 479(2271):20220576, 2023.
  • Batzner et al. (2022) Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E Smidt, and Boris Kozinsky. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature communications, 13(1):2453, 2022.
  • Benton et al. (2020) Gregory Benton, Marc Finzi, Pavel Izmailov, and Andrew G Wilson. Learning invariances in neural networks from training data. Advances in neural information processing systems, 33:17605–17616, 2020.
  • Berry and Giannakis (2020) Tyrus Berry and Dimitrios Giannakis. Spectral exterior calculus. Communications on Pure and Applied Mathematics, 73(4):689–770, 2020.
  • Boullé and Townsend (2023) Nicolas Boullé and Alex Townsend. A mathematical guide to operator learning. arXiv preprint arXiv:2312.14688, 2023.
  • Bouwmans et al. (2018) Thierry Bouwmans, Sajid Javed, Hongyang Zhang, Zhouchen Lin, and Ricardo Otazo. On the applications of robust PCA in image and video processing. Proceedings of the IEEE, 106(8):1427–1457, 2018.
  • Brandstetter et al. (2022) Johannes Brandstetter, Max Welling, and Daniel E Worrall. Lie point symmetry data augmentation for neural pde solvers. In International Conference on Machine Learning, pages 2241–2256. PMLR, 2022.
  • Brunton and Kutz (2022) S. L. Brunton and J. N. Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2nd edition, 2022.
  • Brunton et al. (2016) Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the national academy of sciences, 113(15):3932–3937, 2016.
  • Brunton et al. (2020) Steven L. Brunton, Bernd R. Noack, and Petros Koumoutsakos. Machine learning for fluid mechanics. Annual Review of Fluid Mechanics, 52:477–508, 2020.
  • Brunton et al. (2022) Steven L Brunton, Marko Budišić, Eurika Kaiser, and J Nathan Kutz. Modern Koopman theory for dynamical systems. SIAM Review, 64(2):229–340, 2022.
  • Cahill et al. (2023) Jameson Cahill, Dustin G Mixon, and Hans Parshall. Lie PCA: Density estimation for symmetric manifolds. Applied and Computational Harmonic Analysis, 2023.
  • Callaham et al. (2022) Jared L Callaham, Georgios Rigas, Jean-Christophe Loiseau, and Steven L Brunton. An empirical mean-field model of symmetry-breaking in a turbulent wake. Science Advances, 8(eabm4786), 2022.
  • Candès and Recht (2009) Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9:717–772, 2009.
  • Candès et al. (2011) Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):1–37, 2011.
  • Caron and Traynor (2005) Richard Caron and Tim Traynor. The zero set of a polynomial. WSMR Report 05-02, 2005. URL https://www.researchgate.net/profile/Richard-Caron-3/publication/281285245_The_Zero_Set_of_a_Polynomial/links/55df56b608aecb1a7cc1a043/The-Zero-Set-of-a-Polynomial.pdf.
  • Champion et al. (2020) Kathleen Champion, Peng Zheng, Aleksandr Y Aravkin, Steven L Brunton, and J Nathan Kutz. A unified sparse optimization framework to learn parsimonious physics-informed models from data. IEEE Access, 8:169259–169271, 2020.
  • Chen et al. (2020) Shuxiao Chen, Edgar Dobriban, and Jane H. Lee. A group-theoretic framework for data augmentation. J. Mach. Learn. Res., 21(1), jan 2020. ISSN 1532-4435.
  • Cohen and Welling (2016) Taco Cohen and Max Welling. Group equivariant convolutional networks. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2990–2999, New York, New York, USA, 20–22 Jun 2016. PMLR. URL https://proceedings.mlr.press/v48/cohenc16.html.
  • Cohen et al. (2018) Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. arXiv preprint arXiv:1801.10130, 2018.
  • Cohen et al. (2019) Taco S Cohen, Mario Geiger, and Maurice Weiler. A general theory of equivariant CNNs on homogeneous spaces. Advances in neural information processing systems, 32, 2019.
  • Colbrook (2023) Matthew J Colbrook. The mpEDMD algorithm for data-driven computations of measure-preserving dynamical systems. SIAM Journal on Numerical Analysis, 61(3):1585–1608, 2023.
  • Cubuk et al. (2019) Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 113–123, 2019.
  • Desai et al. (2022) Krish Desai, Benjamin Nachman, and Jesse Thaler. Symmetry discovery with deep learning. Physical Review D, 105(9):096031, 2022.
  • Diamond and Boyd (2016) Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
  • Esteves et al. (2018) Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–68, 2018.
  • Finzi et al. (2020) Marc Finzi, Samuel Stanton, Pavel Izmailov, and Andrew Gordon Wilson. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. In International Conference on Machine Learning, pages 3165–3176. PMLR, 2020.
  • Finzi et al. (2021) Marc Finzi, Max Welling, and Andrew Gordon Wilson. A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups. In International Conference on Machine Learning, pages 3318–3328. PMLR, 2021.
  • Fukushima (1980) Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological cybernetics, 36(4):193–202, 1980.
  • Goswami et al. (2023) Somdatta Goswami, Aniruddha Bora, Yue Yu, and George Em Karniadakis. Physics-informed deep neural operator networks. In Machine Learning in Modeling and Simulation: Methods and Applications, pages 219–254. Springer, 2023.
  • Gotô (1950) Morikuni Gotô. Faithful representations of Lie groups II. Nagoya mathematical journal, 1:91–107, 1950.
  • Gross (2011) David Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548–1566, 2011.
  • Gross (1996) David J Gross. The role of symmetry in fundamental physics. Proceedings of the National Academy of Sciences, 93(25):14256–14259, 1996.
  • Gruver et al. (2022) Nate Gruver, Marc Anton Finzi, Micah Goldblum, and Andrew Gordon Wilson. The Lie derivative for measuring learned equivariance. In The Eleventh International Conference on Learning Representations, 2022.
  • Guan et al. (2021) Yifei Guan, Steven L Brunton, and Igor Novosselov. Sparse nonlinear models of chaotic electroconvection. Royal Society Open Science, 8(8):202367, 2021.
  • Hall (2015) Brian C. Hall. Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Springer, 2015.
  • Hataya et al. (2020) Ryuichiro Hataya, Jan Zdenek, Kazuki Yoshizoe, and Hideki Nakayama. Faster autoaugment: Learning augmentation strategies using backpropagation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 1–16. Springer, 2020.
  • Holmes et al. (2012) P. J. Holmes, J. L. Lumley, G. Berkooz, and C. W. Rowley. Turbulence, coherent structures, dynamical systems and symmetry. Cambridge Monographs in Mechanics. Cambridge University Press, Cambridge, England, 2nd edition, 2012.
  • Horn and Johnson (2013) Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2 edition, 2013.
  • Kaiser et al. (2018) Eurika Kaiser, J Nathan Kutz, and Steven L Brunton. Discovering conservation laws from data for control. In 2018 IEEE Conference on Decision and Control (CDC), pages 6415–6421. IEEE, 2018.
  • Kaiser et al. (2021) Eurika Kaiser, J Nathan Kutz, and Steven L Brunton. Data-driven discovery of koopman eigenfunctions for control. Machine Learning: Science and Technology, 2(3):035023, 2021.
  • Kolář et al. (1993) Ivan Kolář, Peter W. Michor, and Jan Slovák. Natural Operations in Differential Geometry. Springer-Verlag, 1993.
  • Kondor and Trivedi (2018) Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In International Conference on Machine Learning, pages 2747–2755. PMLR, 2018.
  • Koopman (1931) B. O. Koopman. Hamiltonian systems and transformations in Hilbert space. Proceedings of the National Academy of Sciences, 17:315–318, 1931.
  • Koralov and Sinai (2012) Leonid B. Koralov and Yakov G. Sinai. Theory of Probability and Random Processes. Springer, 2 edition, 2012.
  • Kovachki et al. (2023) Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89):1–97, 2023.
  • LeCun et al. (1989) Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.
  • Lee (2013) John M. Lee. Introduction to Smooth Manifolds: Second Edition. Springer, 2013.
  • Lezcano-Casado and Martınez-Rubio (2019) Mario Lezcano-Casado and David Martınez-Rubio. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. In International Conference on Machine Learning, pages 3794–3803. PMLR, 2019.
  • Liu and Tegmark (2022) Ziming Liu and Max Tegmark. Machine learning hidden symmetries. Phys. Rev. Lett., 128:180201, May 2022. doi: 10.1103/PhysRevLett.128.180201. URL https://link.aps.org/doi/10.1103/PhysRevLett.128.180201.
  • Loiseau and Brunton (2018) J.-C. Loiseau and S. L. Brunton. Constrained sparse Galerkin regression. Journal of Fluid Mechanics, 838:42–67, 2018.
  • Maron et al. (2018) Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902, 2018.
  • Marsden and Ratiu (1999) J. E. Marsden and T. S. Ratiu. Introduction to mechanics and symmetry. Springer-Verlag, 2nd edition, 1999.
  • Mauroy et al. (2020) Alexandre Mauroy, Y Susuki, and I Mezić. Koopman operator in systems and control. Springer, 2020.
  • Messenger and Bortz (2021) Daniel A Messenger and David M Bortz. Weak SINDy: Galerkin-based data-driven model selection. Multiscale Modeling & Simulation, 19(3):1474–1497, 2021.
  • Meyer (2000) Carl D Meyer. Matrix analysis and applied linear algebra, volume 71. Siam, 2000.
  • Mezić (2005) Igor Mezić. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41:309–325, 2005.
  • Miao and Rao (2007) Xu Miao and Rajesh PN Rao. Learning the Lie groups of visual invariance. Neural computation, 19(10):2665–2693, 2007.
  • Mohan and Fazel (2012) Karthik Mohan and Maryam Fazel. Iterative reweighted algorithms for matrix rank minimization. Journal of Machine Learning Research, 13(1):3441–3473, 2012.
  • Moskalev et al. (2022) Artem Moskalev, Anna Sepliarskaia, Ivan Sosnovik, and Arnold Smeulders. LieGG: Studying learned Lie group generators. Advances in Neural Information Processing Systems, 35:25212–25223, 2022.
  • Noether (1918) E. Noether. Invariante Variationsprobleme. Nachrichten von der Königlichen Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1918:235–257, 1918. English reprint: physics/0503066, http://dx.doi.org/10.1080/00411457108231446.
  • Olver (1986) Peter J. Olver. Applications of Lie Groups to Differential Equations. Springer, 1986.
  • Otto and Rowley (2021) Samuel E Otto and Clarence W Rowley. Koopman operators for estimation and control of dynamical systems. Annual Review of Control, Robotics, and Autonomous Systems, 4:59–87, 2021.
  • Peitz et al. (2023) Sebastian Peitz, Hans Harder, Feliks Nüske, Friedrich Philipp, Manuel Schaller, and Karl Worthmann. Partial observations, coarse graining and equivariance in Koopman operator theory for large-scale dynamical systems. arXiv preprint arXiv:2307.15325, 2023.
  • Raissi et al. (2019) Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
  • Rao and Ruderman (1999) Rajesh Rao and Daniel Ruderman. Learning Lie groups for invariant visual perception. Advances in neural information processing systems, 11, 1999.
  • Recht et al. (2010) Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
  • Reinbold et al. (2020) Patrick AK Reinbold, Daniel R Gurevich, and Roman O Grigoriev. Using noisy or incomplete data to discover models of spatiotemporal dynamics. Physical Review E, 101:010203, 2020.
  • Romero and Lohit (2022) David W Romero and Suhas Lohit. Learning partial equivariances from data. Advances in Neural Information Processing Systems, 35:36466–36478, 2022.
  • Rowley et al. (2003) Clarence W. Rowley, Ioannis G. Kevrekidis, Jerrold E. Marsden, and Kurt Lust. Reduction and reconstruction for self-similar dynamical systems. Nonlinearity, 16(4):1257, 2003.
  • Shorten and Khoshgoftaar (2019) Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of big data, 6(1):1–48, 2019.
  • Steyert (2022) Vivian T Steyert. Uncovering Structure with Data-driven Reduced-Order Modeling. PhD thesis, Princeton University, 2022.
  • Tao (2011) Terence Tao. Two small facts about Lie groups. https://terrytao.wordpress.com/2011/06/25/two-small-facts-about-lie-groups/, 6 2011.
  • Van Dyk and Meng (2001) David A Van Dyk and Xiao-Li Meng. The art of data augmentation. Journal of Computational and Graphical Statistics, 10(1):1–50, 2001.
  • Varadarajan (1984) V. S. Varadarajan. Lie groups, Lie algebras, and their representations. Springer, 1984.
  • Wang et al. (2022) Rui Wang, Robin Walters, and Rose Yu. Approximately equivariant networks for imperfectly symmetric dynamics. In International Conference on Machine Learning, pages 23078–23091. PMLR, 2022.
  • Weiler and Cesa (2019) Maurice Weiler and Gabriele Cesa. General E(2)-equivariant steerable CNNs. Advances in neural information processing systems, 32, 2019.
  • Weiler et al. (2018) Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco S Cohen. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. Advances in Neural Information Processing Systems, 31, 2018.
  • Williams et al. (2015) Matthew O Williams, Ioannis G Kevrekidis, and Clarence W Rowley. A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. Journal of Nonlinear Science, 25(6):1307–1346, 2015.
  • Yang et al. (2023a) Jianke Yang, Nima Dehmamy, Robin Walters, and Rose Yu. Latent space symmetry discovery. arXiv preprint arXiv:2310.00105, 2023a.
  • Yang et al. (2023b) Jianke Yang, Robin Walters, Nima Dehmamy, and Rose Yu. Generative adversarial symmetry discovery. arXiv preprint arXiv:2302.00236, 2023b.
  • Yang et al. (2024) Jianke Yang, Wang Rao, Nima Dehmamy, Robin Walters, and Rose Yu. Symmetry-informed governing equation discovery. arXiv preprint arXiv:2405.16756, 2024.
  • Yuan and Lin (2006) Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67, 2006.

Appendix A Proofs of minor results

Proof [Proposition 6] Obviously, if ${\mathcal{K}}_{g}K=K$ then ${\mathcal{K}}_{g}^{({\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{({\mathcal{V}})}={\mathcal{T}}_{K}$. On the other hand, suppose that ${\mathcal{K}}_{g}K(x_{0},y_{0})\neq K(x_{0},y_{0})$ for some $(x_{0},y_{0})\in\mathbb{R}^{n}\times\mathbb{R}^{m}$. Hence, there are vectors $v\in{\mathcal{V}}$ and $w\in{\mathcal{W}}^{*}$ such that $\langle w,\ {\mathcal{K}}_{g}K(x_{0},y_{0})v-K(x_{0},y_{0})v\rangle>0$. This remains true for all $y$ in a neighborhood ${\mathcal{U}}$ of $y_{0}$ by continuity of $K$ and ${\mathcal{K}}_{g}K$. Letting $F(x)=v\varphi(x)$, where $\varphi$ is a smooth, nonnegative function with $\varphi(y_{0})>0$ and support in ${\mathcal{U}}$, we obtain

$$\big\langle w,\ {\mathcal{K}}_{g}^{({\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{({\mathcal{V}})}F(x)-{\mathcal{T}}_{K}F(x)\big\rangle=\int_{\mathbb{R}^{m}}\big\langle w,\ {\mathcal{K}}_{g}K(x,y)v-K(x,y)v\big\rangle\varphi(y)\operatorname{\mathrm{d}}y>0,\qquad(143)$$

meaning ${\mathcal{K}}_{g}^{({\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{({\mathcal{V}})}\neq{\mathcal{T}}_{K}$. Therefore, ${\mathcal{K}}_{g}K=K$ if and only if ${\mathcal{K}}_{g}^{({\mathcal{W}})}\circ{\mathcal{T}}_{K}\circ{\mathcal{K}}_{g^{-1}}^{({\mathcal{V}})}={\mathcal{T}}_{K}$.
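As a sanity check, the following minimal sketch (ours, not part of the original development) verifies the discrete analogue of this equivalence: for scalar fields on a finite set with trivial bundle maps $\Theta$ and a cyclic-shift action, invariance of the kernel matrix under simultaneous permutation of both arguments coincides with equivariance of the induced matrix operator.

```python
import numpy as np

n = 6
sigma = (np.arange(n) + 1) % n                     # cyclic shift on {0, ..., n-1}
P = np.zeros((n, n))
P[np.arange(n), sigma] = 1.0                       # permutation matrix: (P F)(x) = F(sigma(x))

# circulant kernel: K(x, y) depends only on (x - y) mod n, so K(sigma(x), sigma(y)) = K(x, y)
c = np.random.default_rng(0).standard_normal(n)
K = np.array([[c[(x - y) % n] for y in range(n)] for x in range(n)])

# discrete analogue of K_g o T_K o K_{g^{-1}}: conjugation by the permutation matrix
print(np.allclose(P @ K @ P.T, K))                 # True exactly when the kernel is invariant
```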

We use the following lemma in the proof of Proposition 11.

Lemma 41

Suppose that $S_{m}\to S$ is a convergent sequence of matrices and $\operatorname{Null}(S)\subset\operatorname{Null}(S_{n})\subset\operatorname{Null}(S_{m})$ when $n\geq m$. Then there is an integer $M_{0}$ such that for every $m\geq M_{0}$ we have $\operatorname{Null}(S_{m})=\operatorname{Null}(S)$.

Proof Since the sequence of matrices acts on a finite-dimensional space, $\dim\operatorname{Null}(S_{m})$ is a monotone, bounded sequence of integers. Therefore, there exists an integer $M_{0}$ such that $\dim\operatorname{Null}(S_{m})=\dim\operatorname{Null}(S_{M_{0}})$ for every $m\geq M_{0}$. Since $\operatorname{Null}(S_{m})\subset\operatorname{Null}(S_{M_{0}})$, we must have $\operatorname{Null}(S_{m})=\operatorname{Null}(S_{M_{0}})$ for every $m\geq M_{0}$. Since $\operatorname{Null}(S)\subset\operatorname{Null}(S_{M_{0}})$ by assumption, it remains to show the reverse containment. Suppose $\xi\in\operatorname{Null}(S_{M_{0}})$; then

$$S\xi=\lim_{m\to\infty}S_{m}\xi=0,\qquad(144)$$

meaning that $\operatorname{Null}(S_{M_{0}})\subset\operatorname{Null}(S)$.
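A toy numerical illustration of Lemma 41 (our construction, chosen only to satisfy the hypotheses): the nullspaces of a convergent sequence with nested nullspaces stabilize at $\operatorname{Null}(S)$ after finitely many terms.

```python
import numpy as np

def null_dim(A, tol=1e-12):
    return A.shape[1] - np.linalg.matrix_rank(A, tol=tol)

S = np.diag([1.0, 1.0, 0.0])                            # limit matrix, dim Null(S) = 1
for m in range(1, 8):
    S_m = np.diag([1.0, 1.0 if m >= 3 else 0.0, 0.0])   # S_m -> S with nested nullspaces
    print(m, null_dim(S_m))                             # dims: 2, 2, 1, 1, ... so M_0 = 3
```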

Proof [Proposition 11] By the Cauchy–Schwarz inequality, our assumption means that $\langle\eta,S_{{\mathcal{M}}}\xi\rangle<\infty$ for every $\eta,\xi\in\operatorname{Lie}(G)$. Let $\xi_{1},\ldots,\xi_{\dim G}$ be a basis for $\operatorname{Lie}(G)$. By the strong law of large numbers, specifically Theorem 7.7 in Koralov and Sinai (2012), we have

$$\big\langle\xi_{j},\ S_{m}\xi_{k}\big\rangle_{\operatorname{Lie}(G)}\to\langle\xi_{j},S_{{\mathcal{M}}}\xi_{k}\rangle\qquad(145)$$

for every $j,k$ almost surely. Consequently, $S_{m}\to S_{{\mathcal{M}}}$ almost surely. By nonnegativity of each term in the sum defining $\big\langle\xi,\ S_{m}\xi\big\rangle_{\operatorname{Lie}(G)}$, it follows that $\operatorname{Null}(S_{n})\subset\operatorname{Null}(S_{m})$ when $n\geq m$. Moreover, if $\xi\in\operatorname{Null}(S_{{\mathcal{M}}})$ then it follows from the continuity of $z\mapsto(I-P_{z})\hat{\theta}(\xi)_{z}$ that $(I-P_{z})\hat{\theta}(\xi)_{z}=0$ for every $z\in{\mathcal{M}}$. Hence, $\xi\in\operatorname{Null}(S_{m})$ for every $m$. Therefore, $S_{m}$ and $S_{{\mathcal{M}}}$ obey the hypotheses of Lemma 41 almost surely, and the conclusion follows.

Proof [Proposition 17] Consider the function $\tilde{F}_{\text{rad}}:\mathbb{R}^{n}\to\mathbb{R}^{r}$ defined by

$$\tilde{F}_{\text{rad}}(x)=(\|x-c_{1}\|_{2}^{2},\ \ldots,\ \|x-c_{r}\|_{2}^{2})\qquad(146)$$

with the standard action of $\operatorname{SE}(n)$ on its domain and the trivial action on its codomain. The symmetries of $\tilde{F}_{\text{rad}}$ are shared by $F_{\text{rad}}$. By Theorem 4, the Lie algebra of $\tilde{F}_{\text{rad}}$'s symmetry group is characterized by

$$\xi=\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\in\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}\tilde{F}_{\text{rad}}\quad\Leftrightarrow\quad 0={\mathcal{L}}_{\xi}\tilde{F}_{\text{rad}}(x)=\frac{\partial\tilde{F}_{\text{rad}}(x)}{\partial x}(Sx+v)\quad\forall x\in\mathbb{R}^{n}.\qquad(147)$$

This means the generators $\xi$ are characterized by the equations

$$0=(x-c_{i})^{T}(Sx+v)=x^{T}Sx-c_{i}^{T}Sx+x^{T}v-c_{i}^{T}v\qquad\forall x\in\mathbb{R}^{n},\quad i=1,\ldots,r.\qquad(148)$$

Since $\xi\in\operatorname{\mathfrak{se}}(n)$, the matrix $S$ is skew-symmetric, $S^{T}=-S$, giving $x^{T}Sx=0$. The above is satisfied if and only if

$$Sc_{i}=-v,\qquad(149)$$

which automatically yields $c_{i}^{T}v=-c_{i}^{T}Sc_{i}=0$. Therefore, $\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}\tilde{F}_{\text{rad}}=\mathfrak{g}_{\text{rad}}\subset\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}F_{\text{rad}}$. To determine the dimension of the symmetry group, we observe that $S$ must satisfy

$$S(c_{2}-c_{1})=\cdots=S(c_{r}-c_{1})=0,\qquad(150)$$

and any such $S$ uniquely determines $v=-Sc_{1}$. Therefore, the dimension of $\mathfrak{g}_{\text{rad}}$ equals the dimension of the space of skew-symmetric matrices satisfying (150). Let the columns of $W=\begin{bmatrix}W_{1}&W_{2}\end{bmatrix}$ form an orthonormal basis for $\mathbb{R}^{n}$ with the $r-1$ columns of $W_{1}$ being a basis for $\operatorname{span}\{(c_{2}-c_{1}),\ldots,(c_{r}-c_{1})\}$. The above constraints, together with skew-symmetry, mean that $S$ takes the form

$$S=\begin{bmatrix}W_{1}&W_{2}\end{bmatrix}\begin{bmatrix}0&0\\ 0&\tilde{S}\end{bmatrix}\begin{bmatrix}W_{1}^{T}\\ W_{2}^{T}\end{bmatrix},\qquad(151)$$

where $\tilde{S}$ is an $(n-r+1)\times(n-r+1)$ skew-symmetric matrix. Therefore, the dimension of $\mathfrak{g}_{\text{rad}}$ equals the dimension of the space of $(n-r+1)\times(n-r+1)$ skew-symmetric matrices, which is $\frac{1}{2}(n-r)(n-r+1)$.
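This dimension count can be checked numerically. The sketch below (ours, using generic random centers $c_{1},\ldots,c_{r}$) imposes the linear constraints (150) on the space of skew-symmetric matrices and compares the resulting nullspace dimension against $\frac{1}{2}(n-r)(n-r+1)$.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, r = 7, 3
C = rng.standard_normal((n, r))                    # generic centers c_1, ..., c_r as columns

# basis of skew-symmetric matrices: B_{jk} = e_j e_k^T - e_k e_j^T
basis = []
for j, k in combinations(range(n), 2):
    B = np.zeros((n, n)); B[j, k] = 1.0; B[k, j] = -1.0
    basis.append(B)

# constraints (150): S (c_i - c_1) = 0 for i = 2, ..., r, as equations in skew coordinates
D = C[:, 1:] - C[:, :1]
A = np.array([(B @ D).ravel() for B in basis]).T
dim_g_rad = len(basis) - np.linalg.matrix_rank(A)
print(dim_g_rad, (n - r) * (n - r + 1) // 2)       # both equal 10 for n = 7, r = 3
```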

The argument for $F_{\text{lin}}$ is similar, with the symmetries of

$$\tilde{F}_{\text{lin}}(x)=\big(u_{1}^{T}x,\ \ldots,\ u_{r}^{T}x\big)=U^{T}x\qquad(152)$$

also being symmetries of $F_{\text{lin}}$. The condition $0={\mathcal{L}}_{\xi}\tilde{F}_{\text{lin}}$ is equivalent to

$$U^{T}Sx+U^{T}v=0\qquad\forall x\in\mathbb{R}^{n},\qquad(153)$$

which occurs if and only if $U^{T}S=0$ and $U^{T}v=0$. This immediately yields $\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}\tilde{F}_{\text{lin}}=\mathfrak{g}_{\text{lin}}\subset\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}F_{\text{lin}}$. Per our earlier argument, the skew-symmetric matrices $S$ satisfying

$$Su_{1}=\cdots=Su_{r}=0\qquad(154)$$

form a vector space with dimension $\frac{1}{2}(n-r)(n-r-1)$. The subspace of vectors $v\in\mathbb{R}^{n}$ satisfying $U^{T}v=0$ is $(n-r)$-dimensional. Adding these gives the dimension of $\mathfrak{g}_{\text{lin}}$, which is $\frac{1}{2}(n-r)(n-r+1)$.

Suppose there exists a polynomial $\varphi_{0}$ with degree $\leq d$ such that $\operatorname{\mathfrak{sym}}_{\operatorname{SE}(n)}(F_{\text{rad}})=\mathfrak{g}_{\text{rad}}$ when $\varphi_{\text{rad}}=\varphi_{0}$. Let ${\mathfrak{g}}_{\perp}$ be a complementary subspace to ${\mathfrak{g}}_{\text{rad}}$ in $\operatorname{\mathfrak{se}}(n)$, that is, ${\mathfrak{g}}_{\perp}\oplus{\mathfrak{g}}_{\text{rad}}=\operatorname{\mathfrak{se}}(n)$. We observe that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\neq{\mathfrak{g}}_{\text{rad}}$ if and only if there is a nonzero $\xi_{\perp}\in{\mathfrak{g}}_{\perp}$ satisfying ${\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}=0$. The "if" part of this claim is obvious. The "only if" part follows from the fact that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\neq{\mathfrak{g}}_{\text{rad}}$ means that ${\mathcal{L}}_{\xi}F_{\text{rad}}=0$ for some nonzero $\xi\notin{\mathfrak{g}}_{\text{rad}}$. Using the direct-sum decomposition of $\operatorname{\mathfrak{se}}(n)$, there are unique $\xi_{\perp}\in{\mathfrak{g}}_{\perp}$ and $\xi_{\text{rad}}\in{\mathfrak{g}}_{\text{rad}}$ such that $\xi=\xi_{\perp}+\xi_{\text{rad}}$, yielding

$$0={\mathcal{L}}_{\xi}F_{\text{rad}}={\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}+{\mathcal{L}}_{\xi_{\text{rad}}}F_{\text{rad}}={\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}.\qquad(155)$$

Moreover, $\xi_{\perp}\neq 0$ because $\xi\notin{\mathfrak{g}}_{\text{rad}}$. Letting $\xi_{1},\ldots,\xi_{D}$ form a basis for ${\mathfrak{g}}_{\perp}$, we consider the $D\times D$ Gram matrix ${\boldsymbol{G}}(F_{\text{rad}})$ with entries

$$\big[{\boldsymbol{G}}(F_{\text{rad}})\big]_{i,j}=\int_{[0,1]^{n}}{\mathcal{L}}_{\xi_{i}}F_{\text{rad}}(x)\,{\mathcal{L}}_{\xi_{j}}F_{\text{rad}}(x)\ \operatorname{\mathrm{d}}x.\qquad(156)$$

This matrix is singular if and only if there is a nonzero $\xi_{\perp}\in{\mathfrak{g}}_{\perp}$ satisfying ${\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}(x)=0$ for every $x$ in the cube $[0,1]^{n}$. Since

$${\mathcal{L}}_{\xi_{\perp}}F_{\text{rad}}(x)=\frac{\partial F_{\text{rad}}(x)}{\partial x}(Sx+v),\qquad\xi_{\perp}=\begin{bmatrix}S&v\\ 0&0\end{bmatrix},\qquad(157)$$

is a polynomial function of $x$, it vanishes in the cube if and only if it vanishes everywhere. Hence ${\boldsymbol{G}}(F_{\text{rad}})$ is singular if and only if $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\neq{\mathfrak{g}}_{\text{rad}}$. Letting ${\boldsymbol{c}}$ denote the vector of coefficients defining $\varphi_{\text{rad}}$ in a basis for the polynomials of degree $\leq d$ on $\mathbb{R}^{r}$, we observe that

$$f:{\boldsymbol{c}}\mapsto\det\big({\boldsymbol{G}}(F_{\text{rad}})\big)\qquad(158)$$

is a polynomial function of ${\boldsymbol{c}}$. The set of polynomials $\varphi_{\text{rad}}$ with degree $\leq d$ for which $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\neq{\mathfrak{g}}_{\text{rad}}$ corresponds to the zero level set of $f$, i.e., those ${\boldsymbol{c}}$ such that $f({\boldsymbol{c}})=0$. Obviously, $f(0)=0$, and taking the coefficients ${\boldsymbol{c}}_{0}$ corresponding to $\varphi_{0}$ gives $f({\boldsymbol{c}}_{0})\neq 0$, meaning $f$ is a nonconstant polynomial. Since each level set of a nonconstant polynomial is a set of measure zero (Caron and Traynor, 2005), it follows that the zero level set of $f$ has measure zero. Precisely the same argument works for $\varphi_{\text{lin}}$, $F_{\text{lin}}$, and ${\mathfrak{g}}_{\text{lin}}$.

Proof [Corollary 18] By Proposition 17, it suffices to find a degree-$r$ polynomial $\varphi_{\text{rad}}$ such that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\subset{\mathfrak{g}}_{\text{rad}}$. We choose $\varphi_{\text{rad}}(z_{1},\ldots,z_{r})=z_{1}+z_{2}^{2}+\cdots+z_{r}^{r}$, giving

$$F_{\text{rad}}(x)=\sum_{k=1}^{r}\|x-c_{k}\|_{2}^{2k}.\qquad(159)$$

If $\xi=\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\in\operatorname{\mathfrak{se}}(n)$ generates a symmetry of $F_{\text{rad}}$ then

$$0={\mathcal{L}}_{\xi}F_{\text{rad}}(x)=\sum_{k=1}^{r}k\|x-c_{k}\|_{2}^{2(k-1)}\underbrace{(x-c_{k})^{T}(Sx+v)}_{x^{T}(Sc_{k}+v)-c_{k}^{T}v}\qquad\forall x\in\mathbb{R}^{n}.\qquad(160)$$

The terms in this expression with highest degree in $x$ must vanish, yielding

$$0=r\|x\|_{2}^{2(r-1)}x^{T}(Sc_{r}+v)\qquad\forall x\in\mathbb{R}^{n}.\qquad(161)$$

This implies that $Sc_{r}+v=0$. Proceeding inductively, suppose that $Sc_{l}+v=0$ for every $l>k$. Then, requiring the highest-degree remaining term in (160) to vanish gives

$$0=k\|x\|_{2}^{2(k-1)}x^{T}(Sc_{k}+v)\qquad\forall x\in\mathbb{R}^{n},\qquad(162)$$

implying that $Sc_{k}+v=0$. It follows by induction that

$$Sc_{k}+v=0\qquad\forall k=1,\ldots,r,\qquad(163)$$

meaning that $\xi\in{\mathfrak{g}}_{\text{rad}}$. Hence, $\operatorname{\mathfrak{sym}}_{G}(F_{\text{rad}})\subset{\mathfrak{g}}_{\text{rad}}$, which completes the proof.
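The following sketch (ours) verifies this computation numerically for $r=2$: a skew-symmetric $S$ annihilating $c_{2}-c_{1}$, together with $v=-Sc_{1}$, satisfies $Sc_{k}+v=0$ and hence annihilates $F_{\text{rad}}$ in (159) at randomly sampled points.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 5, 2
C = rng.standard_normal((n, r))                    # centers c_1, c_2 as columns

# a skew-symmetric S with S (c_2 - c_1) = 0, built from two vectors orthogonal to c_2 - c_1
d = C[:, 1] - C[:, 0]
a, b = rng.standard_normal(n), rng.standard_normal(n)
a -= (a @ d) / (d @ d) * d
b -= (b @ d) / (d @ d) * d
S = np.outer(a, b) - np.outer(b, a)
v = -S @ C[:, 0]                                   # then S c_k + v = 0 for k = 1, 2

def grad_F(x):                                     # gradient of F_rad(x) = sum_k ||x - c_k||^{2k}
    return sum(2 * k * np.linalg.norm(x - C[:, k - 1]) ** (2 * (k - 1)) * (x - C[:, k - 1])
               for k in range(1, r + 1))

x = rng.standard_normal(n)
print(np.isclose(grad_F(x) @ (S @ x + v), 0.0))    # Lie derivative (160) vanishes
```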

Proof [Corollary 19] By Proposition 17, it suffices to find a quadratic polynomial $\varphi_{\text{lin}}$ such that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{lin}})\subset{\mathfrak{g}}_{\text{lin}}$. Letting $D=\operatorname{diag}[1,2,\ldots,r]$ and $U=\begin{bmatrix}u_{1}&\cdots&u_{r}\end{bmatrix}$, consider the quadratic function $\varphi_{\text{lin}}(z)=\frac{1}{2}z^{T}Dz$, giving

$$F_{\text{lin}}(x)=\frac{1}{2}x^{T}UDU^{T}x.\qquad(164)$$

If $\xi=\begin{bmatrix}S&v\\ 0&0\end{bmatrix}\in\operatorname{\mathfrak{se}}(n)$ generates a symmetry of $F_{\text{lin}}$ then

$$0={\mathcal{L}}_{\xi}F_{\text{lin}}(x)=x^{T}UDU^{T}Sx+x^{T}UDU^{T}v\qquad\forall x\in\mathbb{R}^{n}.\qquad(165)$$

Differentiating the above with respect to $x$ at $x=0$ yields $UDU^{T}v=0$, which, because $UD$ is injective, means that $U^{T}v=0$. The fact that $x^{T}UDU^{T}Sx=0$ for every $x$ means that $UDU^{T}S+S^{T}UDU^{T}=0$, i.e.,

$$UDU^{T}S=SUDU^{T}.\qquad(166)$$

Letting the columns of $U_{\perp}$ span the orthogonal complement of the columns of $U$ and expressing

$$S=U\tilde{S}_{1,1}U^{T}+U\tilde{S}_{1,2}U_{\perp}^{T}+U_{\perp}\tilde{S}_{2,1}U^{T}+U_{\perp}\tilde{S}_{2,2}U_{\perp}^{T},\qquad(167)$$

the above commutation relation with $UDU^{T}$ gives

$$UD\tilde{S}_{1,1}U^{T}+UD\tilde{S}_{1,2}U_{\perp}^{T}=U\tilde{S}_{1,1}DU^{T}+U_{\perp}\tilde{S}_{2,1}DU^{T}.\qquad(168)$$

Multiplying on the left and right by combinations of $U^{T}$ or $U_{\perp}^{T}$ and $U$ or $U_{\perp}$ extracts the relations

$$D\tilde{S}_{1,1}=\tilde{S}_{1,1}D,\qquad D\tilde{S}_{1,2}=0,\qquad\tilde{S}_{2,1}D=0.\qquad(169)$$

Since $D$ is invertible, we must have $\tilde{S}_{1,2}=0$ and $\tilde{S}_{2,1}=0$. Since $S^{T}=-S$, we must also have $\tilde{S}_{1,1}^{T}=-\tilde{S}_{1,1}$, meaning that its diagonal entries are identically zero. Considering the $(j,k)$ element of $\tilde{S}_{1,1}$ with $j\neq k$, we have

$$j[\tilde{S}_{1,1}]_{j,k}=[D\tilde{S}_{1,1}]_{j,k}=[\tilde{S}_{1,1}D]_{j,k}=k[\tilde{S}_{1,1}]_{j,k},\qquad(170)$$

meaning that $[\tilde{S}_{1,1}]_{j,k}=0$. Therefore, only $\tilde{S}_{2,2}$ can be nonzero, which gives $SU=0$. Combined with the fact that $U^{T}v=0$, we conclude that $\operatorname{\mathfrak{sym}}_{G}(F_{\text{lin}})\subset{\mathfrak{g}}_{\text{lin}}$, completing the proof.
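A quick numerical check of the converse direction (our sketch): any generator with $SU=0$ and $U^{T}v=0$ gives a vanishing Lie derivative of $F_{\text{lin}}$ in (164).

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 6, 2
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal columns u_1, ..., u_r
D = np.diag(np.arange(1.0, r + 1))

# a generator in g_lin: skew-symmetric S with S U = 0, and v orthogonal to the columns of U
P = np.eye(n) - U @ U.T                            # projector onto span(U)^perp
A = rng.standard_normal((n, n))
S = P @ (A - A.T) @ P                              # skew-symmetric and annihilates U
v = P @ rng.standard_normal(n)

x = rng.standard_normal(n)
lie_F = x @ U @ D @ U.T @ (S @ x + v)              # L_xi F_lin(x) from (165)
print(np.isclose(lie_F, 0.0))                      # True
```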

Proof [Proposition 16] As an intersection of closed subgroups, $H:=\bigcap_{l=1}^{L}\operatorname{Sym}_{G}\big(F^{(l)}\big)$ is a closed subgroup of $G$. By the closed subgroup theorem (see Theorem 20.12 in Lee (2013)), $H$ is an embedded Lie subgroup, whose Lie subalgebra we denote by $\mathfrak{h}$. If $\xi\in\mathfrak{h}$ then $\exp(t\xi)\in\operatorname{Sym}_{G}\big(F^{(l)}\big)$ for all $t\in\mathbb{R}$ and every $l=1,\ldots,L$. Differentiating ${\mathcal{K}}_{\exp(t\xi)}F^{(l)}=F^{(l)}$ at $t=0$ proves that ${\mathcal{L}}_{\xi}F^{(l)}=0$, i.e., $\xi\in\operatorname{\mathfrak{sym}}_{G}(F^{(l)})$ by Theorem 4. Conversely, if ${\mathcal{L}}_{\xi}F^{(l)}=0$ for every $l=1,\ldots,L$, then by Theorem 4, $\exp(t\xi)\in H$. Since $H$ is a Lie subgroup, differentiating $\exp(t\xi)$ at $t=0$ proves that $\xi\in\mathfrak{h}$.

Proof [Proposition 20] Let $f_{1},\ldots,f_{N}$ be a basis for ${\mathcal{F}}^{\prime}$. Consider the sequence of Gram matrices ${\boldsymbol{G}}_{M}$ with entries

$$[{\boldsymbol{G}}_{M}]_{i,j}=\left\langle f_{i},\ f_{j}\right\rangle_{L^{2}(\mu_{M})}.\qquad(171)$$

It suffices to show that ${\boldsymbol{G}}_{M}$ is positive-definite for sufficiently large $M$. Since the $L^{2}(\mu)$ inner product is positive-definite on ${\mathcal{F}}^{\prime}$, it follows that the Gram matrix ${\boldsymbol{G}}$ with entries

$$[{\boldsymbol{G}}]_{i,j}=\left\langle f_{i},\ f_{j}\right\rangle_{L^{2}(\mu)}\qquad(172)$$

is positive-definite. Hence, its smallest eigenvalue $\lambda_{\text{min}}({\boldsymbol{G}})$ is positive. Since the ordered eigenvalues of symmetric matrices are continuous with respect to their entries (see Corollary 4.3.15 in Horn and Johnson (2013)) and $[{\boldsymbol{G}}_{M}]_{i,j}\to[{\boldsymbol{G}}]_{i,j}$ for all $1\leq i,j\leq N$ by assumption, we have $\lambda_{\text{min}}({\boldsymbol{G}}_{M})\to\lambda_{\text{min}}({\boldsymbol{G}})$ as $M\to\infty$. Therefore, there is an $M_{0}$ so that for every $M\geq M_{0}$ we have $\lambda_{\text{min}}({\boldsymbol{G}}_{M})>0$, i.e., ${\boldsymbol{G}}_{M}$ is positive-definite.
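A numerical illustration of this convergence (our sketch, taking ${\mathcal{F}}^{\prime}=\operatorname{span}\{1,x,x^{2}\}$ on $[0,1]$ and $\mu$ the uniform measure, so that ${\boldsymbol{G}}$ is the $3\times 3$ Hilbert matrix): the smallest eigenvalue of the empirical Gram matrix approaches $\lambda_{\min}({\boldsymbol{G}})\approx 0.00269>0$.

```python
import numpy as np

rng = np.random.default_rng(4)
G = np.array([[1.0 / (i + j + 1) for j in range(3)] for i in range(3)])
print("exact:", np.linalg.eigvalsh(G)[0])          # lambda_min of the 3x3 Hilbert matrix

for M in [10, 100, 10000]:
    x = rng.random(M)                              # samples from the uniform measure on [0, 1]
    Phi = np.stack([x ** 0, x, x ** 2])            # basis functions f_i evaluated at the samples
    G_M = Phi @ Phi.T / M                          # empirical Gram matrix in L^2(mu_M)
    print(M, np.linalg.eigvalsh(G_M)[0])           # approaches the exact value, staying positive
```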

Proof [Lemma 31] Using the fact that the integral is invariant under pullbacks by diffeomorphisms, we can express the left-hand side of the equivariance condition in Definition 30 as

$$\begin{aligned}
{\mathcal{K}}_{0,g}{\mathcal{T}}_{K}\big[{\mathcal{K}}_{1,g^{-1}}F_{1},\ldots,{\mathcal{K}}_{r,g^{-1}}F_{r}\big](p)&=\Theta_{0,g^{-1}}\Bigg\{\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}K(\theta_{0,g}(p),q_{1},\ldots,q_{r})\big[\Theta_{1,g}\circ F_{1}\circ\theta_{1,g^{-1}}(q_{1}),\ldots,\Theta_{r,g}\circ F_{r}\circ\theta_{r,g^{-1}}(q_{r})\big]\operatorname{dV}_{1}(q_{1})\wedge\cdots\wedge\operatorname{dV}_{r}(q_{r})\Bigg\}\\
&=\Theta_{0,g^{-1}}\Bigg\{\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}K(\theta_{0,g}(p),\theta_{1,g}(q_{1}),\ldots,\theta_{r,g}(q_{r}))\big[\Theta_{1,g}\circ F_{1}(q_{1}),\ldots,\Theta_{r,g}\circ F_{r}(q_{r})\big]\,\theta_{1,g}^{*}\operatorname{dV}_{1}(q_{1})\wedge\cdots\wedge\theta_{r,g}^{*}\operatorname{dV}_{r}(q_{r})\Bigg\}\\
&=\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}{\mathcal{K}}^{E}_{g}K(p,q_{1},\ldots,q_{r})\big[F_{1}(q_{1}),\ldots,F_{r}(q_{r})\big]\,\theta_{1,g}^{*}\operatorname{dV}_{1}(q_{1})\wedge\cdots\wedge\theta_{r,g}^{*}\operatorname{dV}_{r}(q_{r}).
\end{aligned}\qquad(173)$$

Hence, by comparing the integrand to (129), it is clear that (134) implies that ${\mathcal{T}}_{K}$ is equivariant in the sense of Definition 30. Conversely, if ${\mathcal{T}}_{K}$ is $g$-equivariant, then

$$\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}{\mathcal{K}}^{E}_{g}K\big[F_{1},\ldots,F_{r}\big]\,\theta_{1,g}^{*}\operatorname{dV}_{1}\wedge\cdots\wedge\theta_{r,g}^{*}\operatorname{dV}_{r}=\int_{{\mathcal{M}}_{1}\times\cdots\times{\mathcal{M}}_{r}}K\big[F_{1},\ldots,F_{r}\big]\operatorname{dV}_{1}\wedge\cdots\wedge\operatorname{dV}_{r}\qquad(174)$$

holds for every $(F_{1},\ldots,F_{r})\in D({\mathcal{T}}_{K})$. Since the domain contains all smooth, compactly-supported fields $(F_{1},\ldots,F_{r})$, it follows that (134) holds.

Proof [Proposition 34] Consider a leaf ${\mathcal{M}}$ of an $m$-dimensional foliation on the $n$-dimensional manifold ${\mathcal{N}}$ and let $\gamma:[a,b]\to{\mathcal{N}}$ be a smooth curve satisfying $\gamma((a,b))\subset{\mathcal{M}}$. First, it is clear that ${\mathcal{M}}$ is a weakly embedded submanifold of ${\mathcal{N}}$ since ${\mathcal{M}}$ is an integral manifold of an involutive distribution (Lee (2013, Proposition 19.19)) and the local structure theorem for integral manifolds (Lee (2013, Proposition 19.16)) shows that such manifolds are weakly embedded.

By continuity of $\gamma$, any neighborhood of $\gamma(b)$ in ${\mathcal{N}}$ must have nonempty intersection with ${\mathcal{M}}$. By definition of a foliation (see Lee (2013)), there is a coordinate chart $({\mathcal{U}},{\boldsymbol{x}})$ for ${\mathcal{N}}$ with $\gamma(b)\in{\mathcal{U}}$ such that ${\boldsymbol{x}}({\mathcal{U}})$ is a coordinate-aligned cube in $\mathbb{R}^{n}$ and ${\mathcal{M}}\cap{\mathcal{U}}$ consists of countably many slices of the form $x^{m+1}=c^{m+1},\ldots,x^{n}=c^{n}$ for constants $c^{m+1},\ldots,c^{n}$. Since $\gamma$ is continuous, there is a $\delta>0$ so that $\gamma((b-\delta,b])\subset{\mathcal{U}}$, and in particular, $\gamma((b-\delta,b))\subset{\mathcal{M}}\cap{\mathcal{U}}$. By continuity of $\gamma$, there are constants $c^{m+1},\ldots,c^{n}$ such that $x^{i}(\gamma(t))=c^{i}$ for every $i=m+1,\ldots,n$ and $t\in(b-\delta,b)$. Hence, we have

$$x^{i}(\gamma(b))=\lim_{t\to b}x^{i}(\gamma(t))=c^{i},\qquad i=m+1,\ldots,n,\qquad(175)$$

meaning that $\gamma(b)\in{\mathcal{M}}$. An analogous argument shows that $\gamma(a)\in{\mathcal{M}}$, completing the proof that ${\mathcal{M}}$ is arcwise-closed.

Appendix B Proof of Proposition 21

Our proof relies on the following lemma:

Lemma 42

Let ${\mathcal{P}}$ denote a finite-dimensional vector space of polynomials $\mathbb{R}^{m}\to\mathbb{R}$. If $M\geq\dim({\mathcal{P}})$ then the evaluation map $T_{(x_{1},\ldots,x_{M})}:{\mathcal{P}}\to\mathbb{R}^{M}$ defined by

$$T_{(x_{1},\ldots,x_{M})}:P\mapsto(P(x_{1}),\ldots,P(x_{M}))\qquad(176)$$

is injective for almost every $(x_{1},\ldots,x_{M})\in(\mathbb{R}^{m})^{M}$ with respect to Lebesgue measure.

Proof Letting $M_{0}=\dim({\mathcal{P}})$ and choosing a basis $P_{1},\ldots,P_{M_{0}}$ for ${\mathcal{P}}$, injectivity of $T_{(x_{1},\ldots,x_{M})}$ is equivalent to injectivity of the $M\times M_{0}$ matrix

$${\boldsymbol{T}}_{(x_{1},\ldots,x_{M})}=\begin{bmatrix}P_{1}(x_{1})&\cdots&P_{M_{0}}(x_{1})\\ \vdots&\ddots&\vdots\\ P_{1}(x_{M})&\cdots&P_{M_{0}}(x_{M})\end{bmatrix}.\qquad(177)$$

Finally, this is equivalent to

$$\varphi(x_{1},\ldots,x_{M})=\det\big(({\boldsymbol{T}}_{(x_{1},\ldots,x_{M})})^{T}{\boldsymbol{T}}_{(x_{1},\ldots,x_{M})}\big)\qquad(178)$$

taking a nonzero value. We observe that $\varphi$ is a polynomial on the Euclidean space $(\mathbb{R}^{m})^{M}$.

Suppose there exists a set of points $(\bar{x}_{1},\ldots,\bar{x}_{M})\in(\mathbb{R}^{m})^{M}$ such that $T_{(\bar{x}_{1},\ldots,\bar{x}_{M})}$ is injective. Then for this set $\varphi(\bar{x}_{1},\ldots,\bar{x}_{M})\neq 0$. Obviously, $\varphi(0,\ldots,0)=0$, meaning that $\varphi$ cannot be constant. Thanks to the main result in Caron and Traynor (2005), this means that each level set of $\varphi$ has zero Lebesgue measure in $(\mathbb{R}^{m})^{M}$. In particular, the level set $\varphi^{-1}(0)$, consisting of those $x_{1},\ldots,x_{M}$ for which $T_{(x_{1},\ldots,x_{M})}$ fails to be injective, has zero Lebesgue measure. Therefore, it suffices to prove that there exists $(\bar{x}_{1},\ldots,\bar{x}_{M})\in(\mathbb{R}^{m})^{M}$ such that $T_{(\bar{x}_{1},\ldots,\bar{x}_{M})}$ is injective. We do this by induction.

It is clear that there exists $\bar{x}_{1}$ so that the $1\times 1$ matrix

$${\boldsymbol{T}}_{1}=\begin{bmatrix}P_{1}(\bar{x}_{1})\end{bmatrix}\qquad(179)$$

has full rank, since $P_{1}$ cannot be the zero polynomial. Proceeding by induction, we assume that there exist $\bar{x}_{1},\ldots,\bar{x}_{s}$ so that

$${\boldsymbol{T}}_{s}=\begin{bmatrix}P_{1}(\bar{x}_{1})&\cdots&P_{s}(\bar{x}_{1})\\ \vdots&\ddots&\vdots\\ P_{1}(\bar{x}_{s})&\cdots&P_{s}(\bar{x}_{s})\end{bmatrix}\qquad(180)$$

has full rank. Suppose that the matrix

$${\boldsymbol{\tilde{T}}}_{s+1}(x)=\begin{bmatrix}P_{1}(\bar{x}_{1})&\cdots&P_{s}(\bar{x}_{1})&P_{s+1}(\bar{x}_{1})\\ \vdots&\ddots&\vdots&\vdots\\ P_{1}(\bar{x}_{s})&\cdots&P_{s}(\bar{x}_{s})&P_{s+1}(\bar{x}_{s})\\ P_{1}(x)&\cdots&P_{s}(x)&P_{s+1}(x)\end{bmatrix}\qquad(181)$$

has rank $<s+1$ for every $x\in\mathbb{R}^{m}$. Since the upper-left $s\times s$ block of ${\boldsymbol{\tilde{T}}}_{s+1}(x)$ is ${\boldsymbol{T}}_{s}$, we must always have $\operatorname{rank}({\boldsymbol{\tilde{T}}}_{s+1}(x))=s$. The nullspace of ${\boldsymbol{\tilde{T}}}_{s+1}(x)$ is contained in the nullspace of the upper $s\times(s+1)$ block of ${\boldsymbol{\tilde{T}}}_{s+1}(x)$. Since both nullspaces are one-dimensional, they are equal. The upper $s\times(s+1)$ block of ${\boldsymbol{\tilde{T}}}_{s+1}(x)$ does not depend on $x$, so there is a fixed nonzero vector ${\boldsymbol{v}}\in\mathbb{R}^{s+1}$ so that ${\boldsymbol{\tilde{T}}}_{s+1}(x){\boldsymbol{v}}={\boldsymbol{0}}$ for every $x\in\mathbb{R}^{m}$. The last row of this expression reads

$$v_{1}P_{1}(x)+\cdots+v_{s+1}P_{s+1}(x)=0\qquad\forall x\in\mathbb{R}^{m},\qquad(182)$$

contradicting the linear independence of $P_{1},\ldots,P_{s+1}$. Therefore, there exists $\bar{x}_{s+1}$ so that ${\boldsymbol{T}}_{s+1}={\boldsymbol{\tilde{T}}}_{s+1}(\bar{x}_{s+1})$ has full rank. It follows by induction on $s$ that there exist $\bar{x}_{1},\ldots,\bar{x}_{M_{0}}\in\mathbb{R}^{m}$ so that ${\boldsymbol{T}}_{(\bar{x}_{1},\ldots,\bar{x}_{M_{0}})}={\boldsymbol{T}}_{M_{0}}$ has full rank. Choosing any $M-M_{0}$ additional points yields an injective ${\boldsymbol{T}}_{(\bar{x}_{1},\ldots,\bar{x}_{M})}$, which completes the proof.
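A quick numerical check of Lemma 42 (our sketch, using the monomial basis of polynomials $\mathbb{R}^{2}\to\mathbb{R}$ of degree $\leq 3$): the evaluation matrix (177) at $M=M_{0}$ random points has full column rank, in agreement with the almost-everywhere injectivity claim.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(5)
m, d = 2, 3                                        # polynomials R^2 -> R of degree <= 3
exps = [e for k in range(d + 1) for e in combinations_with_replacement(range(m), k)]

def monomials(x):                                  # the basis P_1, ..., P_{M_0} evaluated at x
    return np.array([np.prod([x[i] for i in e]) for e in exps])

M0 = len(exps)                                     # dim(P) = C(m + d, d) = 10
X = rng.standard_normal((M0, m))                   # M = M_0 random evaluation points
T = np.stack([monomials(x) for x in X])            # the matrix (177)
print(np.linalg.matrix_rank(T) == M0)              # True for almost every choice of points
```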

Proof [Proposition 21] The sum in (89) clearly defines a symmetric, positive-semidefinite bilinear form on ${\mathcal{F}}^{\prime}$. It remains to show that this bilinear form is positive-definite. Suppose that there is a function $f\in{\mathcal{F}}^{\prime}$ such that $\langle f,f\rangle_{L^{2}(\mu_{M})}=0$. Thanks to Lemma 42, our assumption that $M\geq\dim(\pi_{i}({\mathcal{F}}^{\prime}))$ means that the evaluation operator $T_{(x_{1},\ldots,x_{M})}$ is injective on $\pi_{i}({\mathcal{F}}^{\prime})$ for almost every $(x_{1},\ldots,x_{M})\in(\mathbb{R}^{m})^{M}$ with respect to Lebesgue measure. Since a finite union of sets of measure zero has measure zero, it follows that for almost every $(x_{1},\ldots,x_{M})\in(\mathbb{R}^{m})^{M}$ with respect to Lebesgue measure, $T_{(x_{1},\ldots,x_{M})}$ is injective on every $\pi_{i}({\mathcal{F}}^{\prime})$, $i=1,\ldots,n$. Defining the positive diagonal matrix

$${\boldsymbol{D}}=\frac{1}{\sqrt{N}}\begin{bmatrix}\sqrt{w_{1}}&&\\ &\ddots&\\ &&\sqrt{w_{M}}\end{bmatrix},\qquad(183)$$

and using (89) yields

$$0=\langle f,f\rangle_{L^{2}(\mu_{M})}=\sum_{j=1}^{n}\big({\boldsymbol{D}}T_{(x_{1},\ldots,x_{M})}\pi_{j}f\big)^{T}{\boldsymbol{D}}T_{(x_{1},\ldots,x_{M})}\pi_{j}f.\qquad(184)$$

This implies that $T_{(x_{1},\ldots,x_{M})}\pi_{j}f={\boldsymbol{0}}$ for $j=1,\ldots,n$. Since $T_{(x_{1},\ldots,x_{M})}$ is injective on each $\pi_{j}({\mathcal{F}}^{\prime})$, it follows that each $\pi_{j}f=0$, meaning that $f=0$. This completes the proof.

Appendix C Proof of Proposition 24

We begin by proving

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\exp(t\xi)}F={\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\xi}F={\mathcal{L}}_{\xi}{\mathcal{K}}_{\exp(t\xi)}F\qquad(185)$$

for every $F\in D({\mathcal{L}}_{\xi})$. To prove the first equality, we choose $p\in{\mathcal{M}}$, let $p^{\prime}=\theta_{\exp(t_{0}\xi)}(p)$, and compute

$$\frac{1}{t}\left[\big({\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{K}}_{\exp(t\xi)}F\big)(p)-\big({\mathcal{K}}_{\exp(t_{0}\xi)}F\big)(p)\right]=\frac{1}{t}\Theta_{\exp(-t_{0}\xi)}\circ\left({\mathcal{K}}_{\exp(t\xi)}F-F\right)\circ\theta_{\exp(t_{0}\xi)}(p)=\Theta_{\exp(-t_{0}\xi)}\left(\frac{1}{t}\left[\big({\mathcal{K}}_{\exp(t\xi)}F\big)(p^{\prime})-F(p^{\prime})\right]\right).\qquad(186)$$

Here, we have used the composition law for the operators, ${\mathcal{K}}_{gh}={\mathcal{K}}_{g}{\mathcal{K}}_{h}$, and the fact that $\Theta_{\exp(-t_{0}\xi)}$ is fiber-linear. Taking the limit as $t\to 0$ yields

$$\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=t_{0}}\big({\mathcal{K}}_{\exp(t\xi)}F\big)(p)=\Theta_{\exp(-t_{0}\xi)}\left({\mathcal{L}}_{\xi}F(p^{\prime})\right)=\big({\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{L}}_{\xi}F\big)(p),\qquad(187)$$

which is the first equality in (185).

The second equality in (185) follows from

$$\lim_{t\to 0}\frac{1}{t}\left[{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t_{0}\xi)}F-{\mathcal{K}}_{\exp(t_{0}\xi)}F\right]=\lim_{t\to 0}\frac{1}{t}\left[{\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{K}}_{\exp(t\xi)}F-{\mathcal{K}}_{\exp(t_{0}\xi)}F\right]={\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{L}}_{\xi}F.\qquad(188)$$

This shows that ${\mathcal{K}}_{\exp(t_{0}\xi)}F\in D({\mathcal{L}}_{\xi})$ and ${\mathcal{L}}_{\xi}{\mathcal{K}}_{\exp(t_{0}\xi)}F={\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{L}}_{\xi}F$.
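The identity (185) can be checked numerically in a minimal setting (our sketch, assuming a trivial bundle map $\Theta$ so that ${\mathcal{K}}_{g}F=F\circ\theta_{g}$, with scalar functions on $\mathbb{R}^{2}$ and rotations; since $\operatorname{SO}(2)$ is abelian, the composition-law conventions coincide): a finite-difference derivative of $t\mapsto{\mathcal{K}}_{\exp(t\xi)}F$ at $t=t_{0}$ matches ${\mathcal{K}}_{\exp(t_{0}\xi)}{\mathcal{L}}_{\xi}F$.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])            # generator xi of rotations in so(2)
R = lambda t: np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])  # exp(t xi)

F      = lambda x: x[0] ** 3 + np.sin(x[1])        # a smooth scalar field on R^2
grad_F = lambda x: np.array([3 * x[0] ** 2, np.cos(x[1])])
lie_F  = lambda x: grad_F(x) @ (J @ x)             # L_xi F(x) = d/dt F(exp(t xi) x) at t = 0

x, t0, h = np.array([0.3, -1.2]), 0.7, 1e-6
K = lambda t: F(R(t) @ x)                          # (K_{exp(t xi)} F)(x) with trivial Theta
dKdt = (K(t0 + h) - K(t0 - h)) / (2 * h)           # finite-difference d/dt at t = t0
print(np.isclose(dKdt, lie_F(R(t0) @ x)))          # matches (K_{exp(t0 xi)} L_xi F)(x)
```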

Next, we prove

$${\mathcal{L}}_{\alpha\xi+\beta\eta}F=\alpha{\mathcal{L}}_{\xi}F+\beta{\mathcal{L}}_{\eta}F\qquad(189)$$

when $F\in C^{1}({\mathcal{M}},E)$. To do this, we choose $p\in{\mathcal{M}}$ and define the map $h:G\to E_{p}$ by

$$h:g\mapsto{\mathcal{K}}_{g}F(p)=\Theta\big(F(\theta(p,g)),g^{-1}\big).\qquad(190)$$

As a composition of $C^{1}$ maps, $h$ is $C^{1}$, and its derivative at the identity is

$$\operatorname{\mathrm{d}}h(e)\xi_{e}=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}h(\exp(t\xi))={\mathcal{L}}_{\xi}F(p)\qquad(191)$$

for every $\xi_{e}\in T_{e}G\cong\operatorname{Lie}(G)$. Since the derivative is linear, it follows that $\xi\mapsto{\mathcal{L}}_{\xi}F(p)$ is linear.

Finally, we prove that

$${\mathcal{L}}_{[\xi,\eta]}F=\frac{1}{2}\left.\frac{\operatorname{\mathrm{d}}^{2}}{\operatorname{\mathrm{d}}t^{2}}\right|_{t=0}{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F={\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}F-{\mathcal{L}}_{\eta}{\mathcal{L}}_{\xi}F\qquad(192)$$

when $F\in C^{2}({\mathcal{M}},E)$. Recall that $\operatorname{Fl}_{\xi}^{t}:g\mapsto g\cdot\exp(t\xi)$ gives the flow of the left-invariant vector field $\xi\in\operatorname{Lie}(G)$ (see Theorem 4.18(3) in Kolář et al. (1993)). By Theorem 3.16 in Kolář et al. (1993), the curve $\gamma:\mathbb{R}\to G$ given by

$$\gamma(t)=\operatorname{Fl}_{-\eta}^{t}\circ\operatorname{Fl}_{-\xi}^{t}\circ\operatorname{Fl}_{\eta}^{t}\circ\operatorname{Fl}_{\xi}^{t}(e)=\exp(t\xi)\exp(t\eta)\exp(-t\xi)\exp(-t\eta)\qquad(193)$$

satisfies $\gamma(0)=e$, $\gamma^{\prime}(0)=0$, and

$$\frac{1}{2}\gamma^{\prime\prime}(0)=[\xi,\eta]_{e}\in T_{e}G\qquad(194)$$

in the sense that $\gamma^{\prime\prime}(0):f\mapsto(f\circ\gamma)^{\prime\prime}(0)$ is a derivation on $C^{\infty}(G)$, hence an element of $T_{e}G$. Composing with the map in (190) yields

$$0=\operatorname{\mathrm{d}}h(e)\gamma^{\prime}(0)=(h\circ\gamma)^{\prime}(0)=\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}F(p).\qquad(195)$$

Combining (194) and (191) (noting the definition of the tangent map $\operatorname{\mathrm{d}}h(e)$ acting on derivations, as in Kolář et al. (1993) and Lee (2013)) gives

$${\mathcal{L}}_{[\xi,\eta]}F(p)=\frac{1}{2}\operatorname{\mathrm{d}}h(e)\gamma^{\prime\prime}(0)=\frac{1}{2}(h\circ\gamma)^{\prime\prime}(0)=\frac{1}{2}\left.\frac{\operatorname{\mathrm{d}}^{2}}{\operatorname{\mathrm{d}}t^{2}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}F(p).\qquad(196)$$

This proves the first equality in (192) thanks to the composition law

$${\mathcal{K}}_{\gamma(t)}={\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}.\qquad(197)$$

To differentiate the above expression, we use the following observations. If $F_{t}\in\operatorname{\Sigma}(E)$ is such that $(t,p)\mapsto F_{t}(p)$ is $C^{2}(\mathbb{R}\times{\mathcal{M}},E)$, then clearly $\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}\in C^{1}({\mathcal{M}},E)$ with the usual identification $TE_{p}\cong E_{p}$. Moreover, we have

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{g}F_{t}(p)=\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\Theta_{g^{-1}}\big(F_{t}(\theta_{g}(p))\big)=\Theta_{g^{-1}}\Big(\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}(\theta_{g}(p))\Big)={\mathcal{K}}_{g}\Big(\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}\Big)(p)\qquad(198)$$

because $F_{t}(\theta_{g}(p))\in E_{\theta_{g}(p)}$ for all $t\in\mathbb{R}$ and $\Theta_{g^{-1}}$ is linear on $E_{\theta_{g}(p)}$. Using this, we obtain

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{L}}_{\xi}F_{t}(p)=\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}\tau}\right|_{\tau=0}{\mathcal{K}}_{\exp(\tau\xi)}F_{t}(p)=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}\tau}\right|_{\tau=0}\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\exp(\tau\xi)}F_{t}(p)=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}\tau}\right|_{\tau=0}{\mathcal{K}}_{\exp(\tau\xi)}\Big(\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}\Big)(p)={\mathcal{L}}_{\xi}\Big(\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}F_{t}\Big)(p)\qquad(199)$$

because $(t,\tau)\mapsto{\mathcal{K}}_{\exp(\tau\xi)}F_{t}(p)$ lies in the vector space $E_{p}$, allowing us to exchange the order of differentiation. Since $(t_{1},t_{2},t_{3},t_{4})\mapsto{\mathcal{K}}_{\exp(t_{1}\xi)}{\mathcal{K}}_{\exp(t_{2}\eta)}{\mathcal{K}}_{\exp(-t_{3}\xi)}{\mathcal{K}}_{\exp(-t_{4}\eta)}F(p)$ lies in the vector space $E_{p}$ for all $(t_{1},t_{2},t_{3},t_{4})\in\mathbb{R}^{4}$, we can apply the chain rule and (198) to obtain

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\gamma(t)}F=\left.\frac{\partial}{\partial t_{1}}\right|_{t_{1}=t}{\mathcal{K}}_{\exp(t_{1}\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+{\mathcal{K}}_{\exp(t\xi)}\left.\frac{\partial}{\partial t_{2}}\right|_{t_{2}=t}{\mathcal{K}}_{\exp(t_{2}\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}\left.\frac{\partial}{\partial t_{3}}\right|_{t_{3}=t}{\mathcal{K}}_{\exp(-t_{3}\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}\left.\frac{\partial}{\partial t_{4}}\right|_{t_{4}=t}{\mathcal{K}}_{\exp(-t_{4}\eta)}F.\qquad(200)$$

Using (185) gives

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\gamma(t)}F={\mathcal{L}}_{\xi}\overbrace{{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}}^{{\mathcal{K}}_{\gamma(t)}}F+{\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\eta}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{L}}_{-\xi}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}F+\underbrace{{\mathcal{K}}_{\exp(t\xi)}{\mathcal{K}}_{\exp(t\eta)}{\mathcal{K}}_{\exp(-t\xi)}{\mathcal{K}}_{\exp(-t\eta)}}_{{\mathcal{K}}_{\gamma(t)}}{\mathcal{L}}_{-\eta}F.\qquad(201)$$

Applying the same technique to differentiate a second time and using the linearity in (189) to cancel terms yields

$$\left.\frac{\operatorname{\mathrm{d}}^{2}}{\operatorname{\mathrm{d}}t^{2}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}F={\mathcal{L}}_{\xi}\underbrace{\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}F}_{0}+\underbrace{{\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}F+{\mathcal{L}}_{\eta}{\mathcal{L}}_{-\xi}F+{\mathcal{L}}_{\eta}{\mathcal{L}}_{-\xi}F+{\mathcal{L}}_{-\xi}{\mathcal{L}}_{-\eta}F}_{2\big({\mathcal{L}}_{\xi}{\mathcal{L}}_{\eta}F-{\mathcal{L}}_{\eta}{\mathcal{L}}_{\xi}F\big)}+\underbrace{\left.\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}\right|_{t=0}{\mathcal{K}}_{\gamma(t)}{\mathcal{L}}_{-\eta}F}_{0},\qquad(202)$$

which completes the proof. $\blacksquare$
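Both halves of (192) admit simple numerical checks (our sketch, not part of the proof). The first verifies (193)-(194) directly with matrix exponentials; the second verifies the commutator identity on linear observables $F(x)=w^{T}x$ under the right action $\theta_{g}(x)=g^{-1}x$ with trivial $\Theta$, in which case ${\mathcal{L}}_{\xi}$ acts on the coefficient vector as $w\mapsto-\xi^{T}w$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
xi, eta = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
comm = xi @ eta - eta @ xi

# (193)-(194): gamma(t) = exp(t xi) exp(t eta) exp(-t xi) exp(-t eta), gamma''(0)/2 = [xi, eta]
gamma = lambda t: expm(t * xi) @ expm(t * eta) @ expm(-t * xi) @ expm(-t * eta)
h = 1e-4
second = (gamma(h) - 2.0 * gamma(0.0) + gamma(-h)) / h ** 2
print(np.allclose(second / 2.0, comm, atol=1e-4))  # True

# (192) on linear observables F(x) = w^T x: L_xi sends the coefficient vector w to -xi^T w
L = lambda A: -A.T
print(np.allclose(L(comm), L(xi) @ L(eta) - L(eta) @ L(xi)))  # True
```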

Appendix D Proof of Theorem 25

We begin by showing that $\operatorname{Sym}_{G}(F)$ is a closed subgroup of $G$. It is obviously a subgroup, for if $g_{1},g_{2}\in\operatorname{Sym}_{G}(F)$ then

$${\mathcal{K}}_{g_{1}g_{2}}F={\mathcal{K}}_{g_{1}}{\mathcal{K}}_{g_{2}}F={\mathcal{K}}_{g_{1}}F=F,\qquad(203)$$

meaning that $g_{1}g_{2}\in\operatorname{Sym}_{G}(F)$. To show that $\operatorname{Sym}_{G}(F)$ is closed, we observe that for each $p\in{\mathcal{M}}$, the map $h_{p}:G\to E$ defined by

$$h_{p}:g\mapsto{\mathcal{K}}_{g}F(p)=\Theta\big(F(\theta(p,g)),g^{-1}\big)\qquad(204)$$

is continuous, as it is a composition of continuous maps. As $F(p)$ is a single point in $E$, the preimage set $h_{p}^{-1}\big(\{F(p)\}\big)$ is closed in $G$. Since $\operatorname{Sym}_{G}(F)$ is an intersection,

$$\operatorname{Sym}_{G}(F)=\bigcap_{p\in{\mathcal{M}}}h_{p}^{-1}\big(\{F(p)\}\big),\qquad(205)$$

of closed sets, it follows that $\operatorname{Sym}_{G}(F)$ is closed in $G$. By the closed subgroup theorem (Theorem 20.12 in Lee (2013)), it follows that $\operatorname{Sym}_{G}(F)$ is an embedded Lie subgroup of $G$.

Let ${\mathfrak{h}}=\operatorname{Lie}(\operatorname{Sym}_{G}(F))$ be the Lie algebra of $\operatorname{Sym}_{G}(F)$. Choosing any $\xi\in{\mathfrak{h}}$, we have $\exp(t\xi)\in\operatorname{Sym}_{G}(F)$ for every $t\in\mathbb{R}$, yielding

$$\lim_{t\to 0}\frac{1}{t}\left[{\mathcal{K}}_{\exp(t\xi)}F-F\right]=0.\qquad(206)$$

Hence, $F\in D({\mathcal{L}}_{\xi})$ and ${\mathcal{L}}_{\xi}F=0$, meaning that ${\mathfrak{h}}\subset\operatorname{\mathfrak{sym}}_{G}(F)$, as defined by (101).

To show the reverse containment, choose $\xi\in\operatorname{\mathfrak{sym}}_{G}(F)$, meaning that $F\in D({\mathcal{L}}_{\xi})$ and ${\mathcal{L}}_{\xi}F=0$. We observe that (98) in Proposition 24 yields

$$\operatorname{\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}}{\mathcal{K}}_{\exp(t\xi)}F={\mathcal{K}}_{\exp(t\xi)}{\mathcal{L}}_{\xi}F=0\qquad\forall t\in\mathbb{R}.\qquad(207)$$

It follows that ${\mathcal{K}}_{\exp(t\xi)}F=F$, that is, $\exp(t\xi)\in\operatorname{Sym}_{G}(F)$ for all $t\in\mathbb{R}$. Differentiating at $t=0$ proves that $\xi\in{\mathfrak{h}}$. Therefore, ${\mathfrak{h}}=\operatorname{\mathfrak{sym}}_{G}(F)$, which completes the proof. $\blacksquare$

Appendix E Proof of Theorem 26

If $F\in C({\mathcal{M}},E)$ is $G_{0}$-equivariant, then ${\mathcal{K}}_{\exp(t\xi)}F=F$ for all $\xi\in\operatorname{Lie}(G)$ and $t\in\mathbb{R}$. Differentiating with respect to $t$ at $t=0$ gives ${\mathcal{L}}_{\xi}F=0$.

Conversely, suppose that ${\mathcal{L}}_{\xi_{i}}F=0$ for a collection of generators $\xi_{1},\ldots,\xi_{q}$ of $\operatorname{Lie}(G)$. By Theorem 25, $\operatorname{Sym}_{G}(F)$ is a closed Lie subgroup of $G$ whose Lie subalgebra $\operatorname{\mathfrak{sym}}_{G}(F)$ contains $\xi_{1},\ldots,\xi_{q}$. Since $\xi_{1},\ldots,\xi_{q}$ generate $\operatorname{Lie}(G)$, it follows that $\operatorname{\mathfrak{sym}}_{G}(F)=\operatorname{Lie}(G)$. This means that $G_{0}\subset\operatorname{Sym}_{G}(F)$ due to the correspondence between connected Lie subgroups and their Lie subalgebras established by Theorem 19.26 in Lee (2013). Specifically, the identity component of $\operatorname{Sym}_{G}(F)$ must coincide with $G_{0}$ since both are connected Lie subgroups of $G$ with identical Lie subalgebras.

Now, let us suppose in addition that ${\mathcal{K}}_{g_{j}}F=F$ for an element $g_{j}$ from each non-identity component $G_{j}$, $j=1,\ldots,n_{G}-1$, of $G$. By Proposition 7.15 in Lee (2013), $G_{0}$ is a normal subgroup of $G$ and every connected component $G_{j}$ of $G$ is diffeomorphic to $G_{0}$. In fact, in the proof of this result it is shown that every connected component of $G$ is a coset of $G_{0}$, meaning that $G_{j}=G_{0}\cdot g_{j}$. Choosing any $g\in G_{j}$, there is an element $g_{0}\in G_{0}$ such that $g=g_{0}\cdot g_{j}$, and we obtain

$${\mathcal{K}}_{g}F={\mathcal{K}}_{g_{0}}{\mathcal{K}}_{g_{j}}F={\mathcal{K}}_{g_{0}}F=F.\qquad(208)$$

This completes the proof because $G=\bigcup_{j=0}^{n_{G}-1}G_{j}$. $\blacksquare$

Appendix F Proof of Theorem 35

Our proof of the theorem relies on the following technical lemma concerning the integral curves of vector fields tangent to weakly embedded, arcwise-closed submanifolds.

Lemma 43

Let ${\mathcal{M}}$ be an arcwise-closed, weakly embedded submanifold of a manifold ${\mathcal{N}}$. Let $V\in\mathfrak{X}({\mathcal{N}})$ be a vector field tangent to ${\mathcal{M}}$, that is,

$$V_{p}\in T_{p}{\mathcal{M}}\qquad\forall p\in{\mathcal{M}}.\qquad(209)$$

If $\gamma:I\to{\mathcal{N}}$ is a maximal integral curve of $V$ that intersects ${\mathcal{M}}$, then $\gamma$ lies in ${\mathcal{M}}$.

Proof By the translation lemma (Lemma 9.4 in Lee (2013)), we can assume without loss of generality that $0\in I$ and $p_{0}=\gamma(0)\in{\mathcal{M}}$. Let $\imath_{{\mathcal{M}}}:{\mathcal{M}}\hookrightarrow{\mathcal{N}}$ denote the inclusion map. Since ${\mathcal{M}}$ is an immersed submanifold of ${\mathcal{N}}$ and $V$ is tangent to ${\mathcal{M}}$, there is a unique smooth vector field $V|_{{\mathcal{M}}}\in\mathfrak{X}({\mathcal{M}})$ that is $\imath_{{\mathcal{M}}}$-related to $V$ thanks to Proposition 8.23 in Lee (2013). Let $\tilde{\gamma}:\tilde{I}\to{\mathcal{M}}$ be the maximal integral curve of $V|_{{\mathcal{M}}}$ with $\tilde{\gamma}(0)=p_{0}$. By the naturality of integral curves (Proposition 9.6 in Lee (2013)), $\imath_{{\mathcal{M}}}\circ\tilde{\gamma}$ is an integral curve of $V$ with $\imath_{{\mathcal{M}}}\circ\tilde{\gamma}(0)=p_{0}$. Since integral curves of smooth vector fields starting at the same point are unique (Theorem 9.12, part (a) in Lee (2013)), we have $\tilde{I}\subset I$ and

$$\imath_{{\mathcal{M}}}\circ\tilde{\gamma}(t)=\gamma(t)\qquad\forall t\in\tilde{I}.\qquad(210)$$

Therefore, it remains to show that $\tilde{I}=I$.

By the local existence of integral curves (Proposition 9.2 in Lee (2013)), the domains $I$ and $\tilde{I}=(a,b)$ of the maximal integral curves $\gamma$ and $\tilde{\gamma}$ are open intervals in $\mathbb{R}$. Suppose, for the sake of producing a contradiction, that there exists $t\in I$ with $t>s$ for every $s\in\tilde{I}$. Then it follows that the least upper bound $b=\sup\tilde{I}$ is an element of $I$. By (210) and continuity of $\gamma$, we have

$$q_{0}=\gamma(b)=\lim_{t\to b}\imath_{{\mathcal{M}}}\circ\tilde{\gamma}(t).\qquad(211)$$

Since ${\mathcal{M}}$ is arcwise-closed, it follows that $q_{0}\in{\mathcal{M}}$.

To complete the proof, we use the local existence of an integral curve for $V|_{{\mathcal{M}}}$ starting at $q_{0}$ to contradict the maximality of $\tilde{\gamma}$. By the local existence of integral curves (Proposition 9.2 in Lee (2013)) and the translation lemma (Lemma 9.4 in Lee (2013)), there is an $\varepsilon>0$ and an integral curve $\hat{\gamma}:(b-\varepsilon,b+\varepsilon)\to{\mathcal{M}}$ of $V|_{{\mathcal{M}}}$ such that $\hat{\gamma}(b)=q_{0}=\gamma(b)$. Shrinking the interval, we take $0<\varepsilon<b-a$. Again, by naturality and uniqueness of integral curves, we must have $\imath_{{\mathcal{M}}}\circ\hat{\gamma}(t)=\gamma(t)$ for all $t\in(b-\varepsilon,b+\varepsilon)$. Hence, by (210) and injectivity of $\imath_{{\mathcal{M}}}$, it follows that $\hat{\gamma}(t)=\tilde{\gamma}(t)$ for all $t\in(b-\varepsilon,b)$. Applying the gluing lemma (Corollary 2.8 in Lee (2013)) to $\tilde{\gamma}$ and $\hat{\gamma}$ yields an extension of $\tilde{\gamma}$ to the larger open interval $\tilde{I}\cup(b-\varepsilon,b+\varepsilon)$. Since this contradicts the maximality of $\tilde{\gamma}$, there is no $t\in I$ exceeding all of $\tilde{I}$. The same argument shows that there is no $t\in I$ below all of $\tilde{I}$, and so we must have $\tilde{I}=I$.

We also use the following lemma describing the elements of a Lie group that can be constructed from products of exponentials.

Lemma 44

Let $G_{0}$ be the identity component of a Lie group $G$. Then every element $g\in G_{0}$ can be expressed as a finite product $g=h_{m}\cdots h_{1}$ of elements $h_{i}=\exp(\xi_{i})$ for $\xi_{i}\in\operatorname{Lie}(G)$. Let $G_{i}$ be a connected component of $G$ and let $g_{i}\in G_{i}$. Then every element $g\in G_{i}$ can be expressed as $g=g_{0}g_{i}$ for some $g_{0}\in G_{0}$.

Proof By the inverse function theorem (more specifically, by Proposition 20.8(f) in Lee (2013)), the range of the exponential map contains an open, connected neighborhood ${\mathcal{U}}$ of the identity element $e\in G$. The inverses of the elements in ${\mathcal{U}}$ also belong to the range of the exponential map thanks to Proposition 20.8(c) in Lee (2013). By Proposition 7.14(b) and Proposition 7.15 in Lee (2013), it follows that ${\mathcal{U}}$ generates the identity component $G_{0}$ of $G$. That is, any element $g\in G_{0}$ can be written as a finite product of elements in ${\mathcal{U}}$ and their inverses, which proves the first claim.

By Proposition 7.15 in Lee (2013), $G_{0}$ is a normal subgroup of $G$ and every connected component of $G$ is diffeomorphic to $G_{0}$. In fact, in the proof of this result it is shown that every connected component of $G$ is a coset of $G_{0}$. Therefore, if $G_{i}$ is a non-identity connected component of $G$ and $g_{i}\in G_{i}$, then $G_{i}=G_{0}\cdot g_{i}$, which proves the second claim.

Proof [Theorem 35] The set $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ is a subspace of $\operatorname{Lie}(G)$, for if $\xi_{1},\xi_{2}\in\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ and $a_{1},a_{2}\in\mathbb{R}$ then

$$\hat{\theta}(a_{1}\xi_{1}+a_{2}\xi_{2})_{p}=a_{1}\underbrace{\hat{\theta}(\xi_{1})_{p}}_{\in T_{p}{\mathcal{M}}}+a_{2}\underbrace{\hat{\theta}(\xi_{2})_{p}}_{\in T_{p}{\mathcal{M}}}\in T_{p}{\mathcal{M}}\qquad(212)$$

thanks to linearity of the infinitesimal generator $\hat{\theta}$. To show that $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ is a Lie subalgebra, we must show that it is also closed under the Lie bracket. Recall that $\hat{\theta}$ is a Lie algebra homomorphism (see Theorem 20.15 in Lee (2013)), and so $\hat{\theta}([\xi_{1},\xi_{2}])=[\hat{\theta}(\xi_{1}),\hat{\theta}(\xi_{2})]$. Since the Lie bracket of two vector fields tangent to an immersed submanifold is also tangent to the submanifold (see Corollary 8.32 in Lee (2013)), it follows that $[\hat{\theta}(\xi_{1}),\hat{\theta}(\xi_{2})]$ is tangent to ${\mathcal{M}}$. Hence, $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ is closed under the Lie bracket and is therefore a Lie subalgebra of $\operatorname{Lie}(G)$. By Theorem 19.26 in Lee (2013), there is a unique connected Lie subgroup $H\subset G$ whose Lie subalgebra is $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$.

Now suppose that ${\mathcal{M}}$ is weakly embedded and arcwise-closed in ${\mathcal{N}}$. We aim to show that ${\mathcal{M}}\cdot H\subset{\mathcal{M}}$. Choosing any $\xi\in\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$, Lemma 20.14 in Lee (2013) shows that $\xi$, regarded as a left-invariant vector field on $G$, and $\hat{\xi}=\hat{\theta}(\xi)$ are $\theta^{(p)}$-related for every $p\in{\mathcal{N}}$. By the naturality of integral curves (Proposition 9.6 in Lee (2013)), it follows that $\gamma_{\xi}^{(p)}:\mathbb{R}\to{\mathcal{N}}$ defined by

$$\gamma_{\xi}^{(p)}(t)=p\cdot\exp(t\xi)\qquad(213)$$

is the unique maximal integral curve of $\hat{\xi}$ passing through $p$ at $t=0$. When $p\in{\mathcal{M}}$, this integral curve lies in ${\mathcal{M}}$ thanks to Lemma 43. This means that ${\mathcal{M}}$ is invariant under the action of any group element in the range of the exponential map restricted to $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$. Proceeding by induction, suppose that ${\mathcal{M}}$ is invariant under the action of any product of $m$ such elements. If $g=h_{1}\cdots h_{m}\cdot h_{m+1}$ is a product of $m+1$ elements $h_{i}\in\exp\big(\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})\big)\subset H$, then it follows from associativity and the induction hypothesis that

$${\mathcal{M}}\cdot(h_{1}\cdots h_{m}\cdot h_{m+1})=({\mathcal{M}}\cdot h_{1}\cdots h_{m})\cdot h_{m+1}\subset{\mathcal{M}}\cdot h_{m+1}\subset{\mathcal{M}}.\qquad(214)$$

Therefore, ${\mathcal{M}}$ is invariant under the action of any finite product of group elements in $\exp\big(\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})\big)$ by induction on $m$. By Lemma 44, it follows that ${\mathcal{M}}$ is $H$-invariant, proving claim (i).

To prove claim (ii), suppose that $\tilde{H}$ is another connected Lie subgroup of $G$ such that ${\mathcal{M}}\cdot\tilde{H}\subset{\mathcal{M}}$. Choosing any $p\in{\mathcal{M}}$ and $\xi\in\operatorname{Lie}(\tilde{H})$, we have

$$p\cdot\exp(t\xi)\in{\mathcal{M}}\qquad\forall t\in\mathbb{R}.\qquad(215)$$

Since ${\mathcal{M}}$ is weakly embedded in ${\mathcal{N}}$, this defines a smooth curve $\gamma:\mathbb{R}\to{\mathcal{M}}$ such that $\imath_{{\mathcal{M}}}\circ\gamma(t)=p\cdot\exp(t\xi)$, where $\imath_{{\mathcal{M}}}:{\mathcal{M}}\hookrightarrow{\mathcal{N}}$ is the inclusion map. Differentiating and using the definition of the infinitesimal generator gives

$$\hat{\theta}(\xi)_{p}=\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}\right|_{t=0}p\cdot\exp(t\xi)=\operatorname{\mathrm{d}}\imath_{{\mathcal{M}}}(p)\left.\frac{\operatorname{\mathrm{d}}}{\operatorname{\mathrm{d}}t}\right|_{t=0}\gamma(t)\in T_{p}{\mathcal{M}}.\qquad(216)$$

Therefore, $\operatorname{Lie}(\tilde{H})\subset\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$, which implies that $\tilde{H}\subset H$ by Theorem 19.26 in Lee (2013), establishing claim (ii).

Now suppose that {\mathcal{M}} is properly embedded in 𝒩{\mathcal{N}} and denote

SymG()={gG:g}=p(θ(p))1().\operatorname{Sym}_{G}({\mathcal{M}})=\{g\in G\ :{\mathcal{M}}\cdot g\subset{\mathcal{M}}\}=\bigcap_{p\in{\mathcal{M}}}\big{(}\theta^{(p)}\big{)}^{-1}({\mathcal{M}}). (217)

The equality of these expressions is a simple matter of unwinding their definitions. It is clear that $\operatorname{Sym}_{G}({\mathcal{M}})$ is a subgroup of $G$, for if $g_{1},g_{2}\in\operatorname{Sym}_{G}({\mathcal{M}})$ then the composition law for the group action gives ${\mathcal{M}}\cdot(g_{1}\cdot g_{2})=({\mathcal{M}}\cdot g_{1})\cdot g_{2}\subset{\mathcal{M}}\cdot g_{2}\subset{\mathcal{M}}$. Since ${\mathcal{M}}$ is properly embedded, it is closed in ${\mathcal{N}}$ (see Lee (2013, Proposition 5.5)), meaning that each preimage set $\big(\theta^{(p)}\big)^{-1}({\mathcal{M}})$ is closed in $G$ by continuity of $\theta^{(p)}$. As an intersection of closed subsets, it follows that $\operatorname{Sym}_{G}({\mathcal{M}})$ is closed in $G$. By the closed subgroup theorem (Lee (2013, Theorem 20.12)), $\operatorname{Sym}_{G}({\mathcal{M}})$ is a properly embedded Lie subgroup of $G$. The same holds for the identity component $\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ of $\operatorname{Sym}_{G}({\mathcal{M}})$, since $\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ is closed in $\operatorname{Sym}_{G}({\mathcal{M}})$, which implies that $\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ is closed in $G$.

Finally, we show that $H=\operatorname{Sym}_{G}({\mathcal{M}})_{0}$, the identity component of $\operatorname{Sym}_{G}({\mathcal{M}})$. First, we observe that $H\subset\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ because $H$ is connected and contained in $\operatorname{Sym}_{G}({\mathcal{M}})$. The reverse containment follows from the fact that $\operatorname{Sym}_{G}({\mathcal{M}})_{0}$ is a connected Lie subgroup satisfying ${\mathcal{M}}\cdot\operatorname{Sym}_{G}({\mathcal{M}})_{0}\subset{\mathcal{M}}$, which by claim (ii) implies that $\operatorname{Sym}_{G}({\mathcal{M}})_{0}\subset H$. $\blacksquare$

Appendix G Proof of Theorem 36

First, suppose that ${\mathcal{M}}$ is $G_{0}$-invariant. In particular, this means that for every $p\in{\mathcal{M}}$ and $\xi\in\operatorname{Lie}(G)$, the smooth curve $\gamma_{\xi}^{(p)}:\mathbb{R}\to{\mathcal{N}}$ defined by

\gamma_{\xi}^{(p)}(t)=p\cdot\exp(t\xi) \qquad (218)

lies in ${\mathcal{M}}$. Since ${\mathcal{M}}$ is weakly embedded in ${\mathcal{N}}$, $\gamma_{\xi}^{(p)}$ is also smooth as a map into ${\mathcal{M}}$. Specifically, there is a smooth curve $\tilde{\gamma}_{\xi}^{(p)}:\mathbb{R}\to{\mathcal{M}}$ so that $\gamma_{\xi}^{(p)}=\imath_{{\mathcal{M}}}\circ\tilde{\gamma}_{\xi}^{(p)}$, where $\imath_{{\mathcal{M}}}:{\mathcal{M}}\hookrightarrow{\mathcal{N}}$ is the inclusion map. Differentiating at $t=0$ yields

\hat{\theta}(\xi)_{p}=\left.\frac{\mathrm{d}}{\mathrm{d}t}\gamma_{\xi}^{(p)}(t)\right|_{t=0}=\mathrm{d}\imath_{{\mathcal{M}}}(p)\left.\frac{\mathrm{d}}{\mathrm{d}t}\tilde{\gamma}_{\xi}^{(p)}(t)\right|_{t=0}, \qquad (219)

which lies in $T_{p}{\mathcal{M}}=\operatorname{Range}\left(\mathrm{d}\imath_{{\mathcal{M}}}(p)\right)$. In particular, $\hat{\theta}(\xi_{i})_{p}\in T_{p}{\mathcal{M}}$ for every $p\in{\mathcal{M}}$ and $i=1,\ldots,q$.

Conversely, suppose that the tangency condition expressed in (139) holds. By Theorem 35, the elements $\xi_{1},\ldots,\xi_{q}$ belong to the Lie subalgebra $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})$ of the largest connected Lie subgroup $H\subset G$ of symmetries of ${\mathcal{M}}$. Since $\xi_{1},\ldots,\xi_{q}$ generate $\operatorname{Lie}(G)$, it follows that $\operatorname{\mathfrak{sym}}_{G}({\mathcal{M}})=\operatorname{Lie}(G)$. Therefore, by Theorem 19.26 in Lee (2013), we obtain $H=G_{0}$ because both are connected Lie subgroups of $G$ with identical Lie subalgebras.

Finally, suppose, in addition, that ${\mathcal{M}}\cdot g_{j}\subset{\mathcal{M}}$ for an element $g_{j}$ from each non-identity component $G_{j}$ of $G$. By Lemma 44, if $g\in G_{j}$ then there is an element $g_{0}\in G_{0}$ such that $g=g_{0}\cdot g_{j}$. Therefore, we obtain

{\mathcal{M}}\cdot g={\mathcal{M}}\cdot g_{0}\cdot g_{j}\subset{\mathcal{M}}\cdot g_{j}\subset{\mathcal{M}}, \qquad (220)

which completes the proof because $G=\bigcup_{j=0}^{n_{G}-1}G_{j}$. $\blacksquare$
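Theorem 36 thus reduces checking full $G$-invariance to a tangency test on the generators plus one group element per non-identity component. A small numerical sketch for $G=O(2)$ acting on the circle ${\mathcal{M}}=S^{1}$ (an illustrative example, not drawn from the paper; `scipy.linalg.expm` is assumed for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

J = np.array([[0.0, -1.0],
              [1.0, 0.0]])                 # generator of so(2) = Lie(O(2))
g1 = np.diag([1.0, -1.0])                  # a reflection from the non-identity component

p = np.array([np.cos(0.3), np.sin(0.3)])   # a point on the circle M = S^1

# Tangency of the generator certifies invariance under G_0 = SO(2) ...
assert abs(p @ (J @ p)) < 1e-12
# ... and one reflection certifies the remaining component of O(2):
assert abs(np.linalg.norm(g1 @ p) - 1.0) < 1e-12

# Spot-check: arbitrary elements of both components preserve the circle.
for s in np.linspace(0.0, 2.0 * np.pi, 7):
    for g in (expm(s * J), g1 @ expm(s * J)):
        assert abs(np.linalg.norm(g @ p) - 1.0) < 1e-12
```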

Appendix H Proof of Theorem 39

Proof [Lemma 38] The map $\imath_{F}$ defined in a local trivialization $\Phi$ by (140) is injective. It is a vector bundle homomorphism because $\mathrm{d}\Phi\circ\imath_{F}\circ\Phi^{-1}$, $\Phi$, and $\mathrm{d}\Phi$ are vector bundle homomorphisms and $\Phi$ and $\mathrm{d}\Phi$ are invertible. It remains to show that the definition of $\imath_{F}$ does not depend on the choice of local trivialization. Given two local trivializations $\Phi$ and $\tilde{\Phi}$ defined on $\pi^{-1}({\mathcal{U}})\subset E$, where ${\mathcal{U}}$ is an open subset of ${\mathcal{M}}$, it suffices to show that the following diagram commutes:

\begin{tikzcd}
{\mathcal{U}}\times\mathbb{R}^{k} \arrow[d, "\imath_{\Phi\circ F}"'] & \pi^{-1}({\mathcal{U}})\subset E \arrow[l, "\Phi"'] \arrow[r, "\tilde{\Phi}"] \arrow[d, "\imath_{F}"] & {\mathcal{U}}\times\mathbb{R}^{k} \arrow[d, "\imath_{\tilde{\Phi}\circ F}"] \\
T({\mathcal{U}}\times\mathbb{R}^{k}) & T(\pi^{-1}({\mathcal{U}}))\subset TE \arrow[l, "\mathrm{d}\Phi"] \arrow[r, "\mathrm{d}\tilde{\Phi}"'] & T({\mathcal{U}}\times\mathbb{R}^{k})
\end{tikzcd} \qquad (221)

Since $\tilde{\Phi}\circ\Phi^{-1}$ is a bundle homomorphism descending to the identity, it can be written as

\tilde{\Phi}\circ\Phi^{-1}:(p,{\boldsymbol{v}})\mapsto(p,{\boldsymbol{T}}(p){\boldsymbol{v}}) \qquad (222)

for a matrix-valued function ${\boldsymbol{T}}:{\mathcal{U}}\to\mathbb{R}^{k\times k}$. Moreover, the matrices are invertible because the local trivializations are bundle isomorphisms. Differentiating, we obtain

\mathrm{d}\tilde{\Phi}\circ\mathrm{d}\Phi^{-1}:(w_{p},{\boldsymbol{w}})_{(p,{\boldsymbol{v}})}\mapsto\big(w_{p},\ \big(\mathrm{d}{\boldsymbol{T}}(p)w_{p}\big){\boldsymbol{v}}+{\boldsymbol{T}}(p){\boldsymbol{w}}\big)_{\tilde{\Phi}\circ\Phi^{-1}(p,{\boldsymbol{v}})}, \qquad (223)

where $w_{p}\in T_{p}{\mathcal{U}}$. Composing this with $\imath_{\Phi\circ F}:(p,{\boldsymbol{v}})\mapsto(0,{\boldsymbol{v}})_{\Phi(F(p))}$, we obtain

\mathrm{d}\tilde{\Phi}\circ\mathrm{d}\Phi^{-1}\circ\imath_{\Phi\circ F}(p,{\boldsymbol{v}})=(0,{\boldsymbol{T}}(p){\boldsymbol{v}})_{\tilde{\Phi}(F(p))}=\imath_{\tilde{\Phi}\circ F}(p,{\boldsymbol{T}}(p){\boldsymbol{v}})=\imath_{\tilde{\Phi}\circ F}\circ\tilde{\Phi}\circ\Phi^{-1}(p,{\boldsymbol{v}}), \qquad (224)

proving that the diagram commutes. $\blacksquare$
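The fiber component of (223) is simply the product rule for $t\mapsto{\boldsymbol{T}}(p(t)){\boldsymbol{v}}(t)$. A quick finite-difference check, assuming for illustration a transition function ${\boldsymbol{T}}(p)$ given by rotation through angle $p$ over the base ${\mathcal{U}}=\mathbb{R}$:

```python
import numpy as np

# Transition matrices T(p): rotation by angle p, with derivative dT(p).
T  = lambda p: np.array([[np.cos(p), -np.sin(p)],
                         [np.sin(p),  np.cos(p)]])
dT = lambda p: np.array([[-np.sin(p), -np.cos(p)],
                         [ np.cos(p), -np.sin(p)]])

p, v  = 0.5, np.array([1.0, 2.0])      # a point (p, v) in the trivialization
wp, w = 0.7, np.array([-0.2, 0.4])     # a tangent vector (w_p, w) at (p, v)

# Push the curve t -> (p + t*wp, v + t*w) through (p, v) -> (p, T(p) v)
# and differentiate at t = 0; compare with the fiber component of (223).
h = 1e-6
fd = (T(p + h * wp) @ (v + h * w) - T(p - h * wp) @ (v - h * w)) / (2 * h)
assert np.allclose(fd, (dT(p) * wp) @ v + T(p) @ w, atol=1e-6)
```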

Proof [Theorem 39] We observe that $F\circ\pi:E\to E$ is a smooth idempotent map whose image is $\operatorname{im}(F)\subset E$. By differentiating the expression $(F\circ\pi)\circ(F\circ\pi)=F\circ\pi$ at a point $F(p)\in\operatorname{im}(F)$, we obtain

\mathrm{d}(F\circ\pi)(F(p))\,\mathrm{d}(F\circ\pi)(F(p))=\mathrm{d}(F\circ\pi)(F(p)), \qquad (225)

meaning that $\mathrm{d}(F\circ\pi)(F(p)):T_{F(p)}E\to T_{F(p)}E$ is a linear projection. Since

\mathrm{d}(F\circ\pi)(F(p))=\mathrm{d}F(p)\,\mathrm{d}\pi(F(p)), \qquad (226)

we have $\operatorname{Range}\big(\mathrm{d}(F\circ\pi)(F(p))\big)\subset\operatorname{Range}(\mathrm{d}F(p))=T_{F(p)}\operatorname{im}(F)$. Differentiating $F=(F\circ\pi)\circ F$ yields

\mathrm{d}F(p)=\mathrm{d}(F\circ\pi)(F(p))\,\mathrm{d}F(p), \qquad (227)

meaning that $T_{F(p)}\operatorname{im}(F)\subset\operatorname{Range}\big(\mathrm{d}(F\circ\pi)(F(p))\big)$. Combining the two containments gives $\operatorname{Range}\big(\mathrm{d}(F\circ\pi)(F(p))\big)=T_{F(p)}\operatorname{im}(F)$, so $\mathrm{d}(F\circ\pi)(F(p))$ is a linear projection onto $T_{F(p)}\operatorname{im}(F)$.
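For intuition, when $F$ is a section of the trivial bundle $\mathbb{R}\times\mathbb{R}$, i.e., the graph of a scalar function $f$, this projection can be written down explicitly. A minimal sketch with the arbitrary illustrative choice $f=\tanh$:

```python
import numpy as np

f  = np.tanh                          # F(p) = (p, f(p)) is a section of R x R
df = lambda p: 1.0 - np.tanh(p)**2

p = 0.7
# Jacobian of (F o pi)(p, v) = (p, f(p)) at the point F(p), in coordinates (p, v):
P = np.array([[1.0,   0.0],
              [df(p), 0.0]])
assert np.allclose(P @ P, P)          # idempotent: a linear projection
# Its range is spanned by (1, f'(p)), the tangent line to the graph im(F).
```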

We observe that the generalized Lie derivative in (97) can be expressed as

\begin{aligned}
({\mathcal{L}}_{\xi}F)(p) &=\lim_{t\to 0}\frac{1}{t}\left[\Theta_{\exp(-t\xi)}(F(\theta_{\exp(t\xi)}(p)))-F(p)\right] \\
&=\lim_{t\to 0}\Theta\left(\exp(-t\xi),\ \frac{1}{t}\left[F(\theta_{\exp(t\xi)}(p))-\Theta_{\exp(t\xi)}(F(p))\right]\right) \\
&=\lim_{t\to 0}\frac{1}{t}\left[F(\theta_{\exp(t\xi)}(p))-\Theta_{\exp(t\xi)}(F(p))\right].
\end{aligned} \qquad (228)

The first equality follows because $\Theta_{g^{-1}}$ is a vector bundle homomorphism, meaning that the restricted map $\Theta_{g^{-1}}|_{E_{p\cdot g}}:E_{p\cdot g}\to E_{p}$ is linear; here $g=\exp(t\xi)$. The second equality follows because $\Theta:E\times G\to E$ is continuous. Note that in the first expression the limit is taken in the vector space $E_{p}$, whereas in the last expression the limit must be taken in $E$.

We proceed by expressing everything in a local trivialization $\Phi:\pi^{-1}({\mathcal{U}})\to{\mathcal{U}}\times\mathbb{R}^{k}$ of an open neighborhood ${\mathcal{U}}\subset{\mathcal{M}}$ of $p\in{\mathcal{M}}$. Since the maps $\Theta_{g}$, $\Phi$, and $\Phi^{-1}$ are vector bundle homomorphisms, there is a matrix-valued function ${\boldsymbol{T}}_{g}:{\mathcal{U}}\to\mathbb{R}^{k\times k}$ such that

\tilde{\Theta}_{g}=\Phi\circ\Theta_{g}\circ\Phi^{-1}:(p,{\boldsymbol{v}})\mapsto(\theta_{g}(p),\ {\boldsymbol{T}}_{g}(p){\boldsymbol{v}}). \qquad (229)

Differentiating $\tilde{\Theta}_{\exp(t\xi)}(p,{\boldsymbol{v}})$ with respect to $t$ yields the generator

\hat{\tilde{\Theta}}(\xi)_{(p,{\boldsymbol{v}})}=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\tilde{\Theta}_{\exp(t\xi)}(p,{\boldsymbol{v}})=\left(\hat{\theta}(\xi)_{p},\ \hat{\boldsymbol{T}}(\xi)_{p}{\boldsymbol{v}}\right)_{(p,{\boldsymbol{v}})}, \qquad (230)

where $\hat{\boldsymbol{T}}(\xi)_{p}=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}{\boldsymbol{T}}_{\exp(t\xi)}(p)$. We define the function $\tilde{\boldsymbol{F}}:{\mathcal{U}}\to\mathbb{R}^{k}$ by

(p,\tilde{\boldsymbol{F}}(p))=\Phi\circ F(p)\qquad\forall p\in{\mathcal{U}}. \qquad (231)

Using the above definitions, we can express the generalized Lie derivative in the local trivialization:

\begin{aligned}
\Phi\circ({\mathcal{L}}_{\xi}F)(p) &=\lim_{t\to 0}\left(\theta_{\exp(t\xi)}(p),\ \frac{1}{t}\left[\tilde{\boldsymbol{F}}(\theta_{\exp(t\xi)}(p))-{\boldsymbol{T}}_{\exp(t\xi)}(p)\tilde{\boldsymbol{F}}(p)\right]\right) \\
&=\left(p,\ \mathrm{d}\tilde{\boldsymbol{F}}(p)\hat{\theta}(\xi)_{p}-\hat{\boldsymbol{T}}(\xi)_{p}\tilde{\boldsymbol{F}}(p)\right).
\end{aligned} \qquad (232)
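Formula (232) is straightforward to validate against the last limit in (228) by finite differences. A hedged sketch for the trivial bundle $\mathbb{R}^{2}\times\mathbb{R}^{2}$ with $SO(2)$ rotating both the base point and the fiber, and an arbitrary illustrative choice of $\tilde{\boldsymbol{F}}$:

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])               # so(2) generator: theta-hat(xi)_p = J p
R = lambda t: np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])     # theta_exp(t xi); here T_exp(t xi)(p) = R(t) as well

F  = lambda p: np.array([p[0]**2, p[0] * p[1]])       # F-tilde in the trivialization
dF = lambda p: np.array([[2 * p[0], 0.0],
                         [p[1],     p[0]]])           # its Jacobian

p = np.array([0.8, -0.3])
lie_formula = dF(p) @ (J @ p) - J @ F(p)              # (232): dF(p) theta-hat(xi)_p - T-hat(xi)_p F(p)

t = 1e-6                                              # the last limit in (228)
lie_fd = (F(R(t) @ p) - R(t) @ F(p)) / t
assert np.allclose(lie_formula, lie_fd, atol=1e-5)
```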

Applying Lemma 38 allows us to express the left-hand side of (141) as

\mathrm{d}\Phi\left[\imath_{F}\circ({\mathcal{L}}_{\xi}F)(p)\right]=\left(0,\ \mathrm{d}\tilde{\boldsymbol{F}}(p)\hat{\theta}(\xi)_{p}-\hat{\boldsymbol{T}}(\xi)_{p}\tilde{\boldsymbol{F}}(p)\right)_{\Phi(F(p))}. \qquad (233)

We can also express the quantities on the right-hand side of (141) in the local trivialization. To do this, we compute

\begin{aligned}
\mathrm{d}\Phi(F(p))\,\mathrm{d}(F\circ\pi)(F(p))\,\hat{\Theta}(\xi)_{F(p)} &=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\Phi\circ F\circ\pi\circ\Theta_{\exp(t\xi)}(F(p)) \\
&=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\Phi\circ F(\theta_{\exp(t\xi)}(p)) \\
&=\left(\hat{\theta}(\xi)_{p},\ \mathrm{d}\tilde{\boldsymbol{F}}(p)\hat{\theta}(\xi)_{p}\right)_{\Phi(F(p))}
\end{aligned} \qquad (234)

and

\begin{aligned}
\mathrm{d}\Phi(F(p))\,\hat{\Theta}(\xi)_{F(p)} &=\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\Phi\circ\Theta_{\exp(t\xi)}\circ\Phi^{-1}\circ\Phi\circ F(p) \\
&=\hat{\tilde{\Theta}}(\xi)_{\Phi(F(p))}=\left(\hat{\theta}(\xi)_{p},\ \hat{\boldsymbol{T}}(\xi)_{p}\tilde{\boldsymbol{F}}(p)\right)_{\Phi(F(p))}.
\end{aligned} \qquad (235)

Subtracting these yields

\mathrm{d}\Phi\left[\left(\mathrm{d}(F\circ\pi)(F(p))-\operatorname{Id}_{T_{F(p)}E}\right)\hat{\Theta}(\xi)_{F(p)}\right]=\left(0,\ \mathrm{d}\tilde{\boldsymbol{F}}(p)\hat{\theta}(\xi)_{p}-\hat{\boldsymbol{T}}(\xi)_{p}\tilde{\boldsymbol{F}}(p)\right)_{\Phi(F(p))}, \qquad (236)

which, upon comparison with (233), completes the proof. $\blacksquare$
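In the same illustrative $SO(2)$ setting as above, the equality of (233) and (236), and hence identity (141), can be confirmed with block matrices. This is again a hedged numerical sketch, not the paper's implementation:

```python
import numpy as np

J  = np.array([[0.0, -1.0], [1.0, 0.0]])              # so(2) generator
F  = lambda p: np.array([p[0]**2, p[0] * p[1]])       # F-tilde for the section F(p) = (p, F-tilde(p))
dF = lambda p: np.array([[2 * p[0], 0.0],
                         [p[1],     p[0]]])

p = np.array([0.8, -0.3])
# d(F o pi) at F(p) in coordinates (w_p, w), and the bundle-action generator at F(p):
P  = np.block([[np.eye(2), np.zeros((2, 2))],
               [dF(p),     np.zeros((2, 2))]])
Xi = np.concatenate([J @ p, J @ F(p)])                # Theta-hat(xi)_{F(p)} = (J p, J F-tilde(p))

lhs = np.concatenate([np.zeros(2), dF(p) @ (J @ p) - J @ F(p)])   # (233)
rhs = (P - np.eye(4)) @ Xi                                        # (236)
assert np.allclose(lhs, rhs)
```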