
On the Universality of Rotation Equivariant Point Cloud Networks

Nadav Dym
Duke University
[email protected]
Haggai Maron
NVIDIA Research
[email protected]
Abstract

Learning functions on point clouds has applications in many fields, including computer vision, computer graphics, physics, and chemistry. Recently, there has been a growing interest in neural architectures that are invariant or equivariant to all three shape-preserving transformations of point clouds: translation, rotation, and permutation. In this paper, we present a first study of the approximation power of these architectures. We first derive two sufficient conditions for an equivariant architecture to have the universal approximation property, based on a novel characterization of the space of equivariant polynomials. We then use these conditions to show that two recently suggested models (Thomas et al., 2018; Fuchs et al., 2020) are universal, and to devise two other novel universal architectures.

1 Introduction

Designing neural networks that respect data symmetry is a powerful approach for obtaining efficient deep models. Prominent examples include convolutional networks, which respect the translational invariance of images; graph neural networks, which respect the permutation invariance of graphs (Gilmer et al., 2017; Maron et al., 2019b); networks such as (Zaheer et al., 2017; Qi et al., 2017a), which respect the permutation invariance of sets; and networks which respect 3D rotational symmetries (Cohen et al., 2018; Weiler et al., 2018; Esteves et al., 2018; Worrall & Brostow, 2018; Kondor et al., 2018a).


While the expressive power of equivariant models is restricted by design to equivariant functions, a desirable property of equivariant networks is universality: the ability to approximate any continuous equivariant function. This property does not always hold: while convolutional networks and networks for sets are universal (Yarotsky, 2018; Segol & Lipman, 2019), popular graph neural networks are not (Xu et al., 2019; Morris et al., 2018).

In this paper, we consider the universality of networks that respect the symmetries of 3D point clouds: translations, rotations, and permutations. Designing such networks has been a popular paradigm in recent years (Thomas et al., 2018; Fuchs et al., 2020; Poulenard et al., 2019; Zhao et al., 2019). While there have been many works on the universality of permutation invariant networks (Zaheer et al., 2017; Maron et al., 2019c; Keriven & Peyré, 2019), and a recent work discussing the universality of rotation equivariant networks (Bogatskiy et al., 2020), this is the first paper to discuss the universality of networks which combine rotations, permutations, and translations.

We start the paper with a general, architecture-agnostic, discussion, and derive two sufficient conditions for universality. These conditions are a result of a novel characterization of equivariant polynomials for the symmetry group of interest. We use these conditions in order to prove universality of the prominent Tensor Field Networks (TFN) architecture (Thomas et al., 2018; Fuchs et al., 2020). The following is a weakened and simplified statement of Theorem 2 stated later on in the paper:

Theorem (Simplification of Theorem 2).

Any continuous equivariant function on point clouds can be approximated uniformly on compact sets by a composition of TFN layers.

We use our general discussion to prove the universality of two additional equivariant models: the first is a simple modification of the TFN architecture which allows for universality using only low dimensional filters. The second is a minimal architecture which is based on tensor product representations, rather than the more commonly used irreducible representations of $\mathrm{SO}(3)$. We discuss the advantages and disadvantages of both approaches.

To summarize, the contributions of this paper are: (1) A general approach for proving the universality of rotation equivariant models for point clouds; (2) A proof that two recent equivariant models (Thomas et al., 2018; Fuchs et al., 2020) are universal; (3) Two additional simple and novel universal architectures.

2 Previous work

Deep learning on point clouds. (Qi et al., 2017a; Zaheer et al., 2017) were the first to apply neural networks directly to raw point cloud data, by using pointwise functions and pooling operations. Many subsequent works used local neighborhood information (Qi et al., 2017b; Wang et al., 2019; Atzmon et al., 2018). We refer the reader to a recent survey for more details (Guo et al., 2020). In contrast with the aforementioned works, which focused solely on permutation invariance, more related to this paper are works that additionally incorporate invariance to rigid motions. (Thomas et al., 2018) proposed Tensor Field Networks (TFN) and showed their efficacy on physics and chemistry tasks. (Kondor et al., 2018b) also suggested an equivariant model for continuous rotations. (Li et al., 2019) suggested models that are equivariant to discrete subgroups of $\mathrm{SO}(3)$. (Poulenard et al., 2019) suggested an invariant model based on spherical harmonics. (Fuchs et al., 2020) followed TFN and added an attention mechanism. Recently, (Zhao et al., 2019) proposed a quaternion equivariant point capsule network that also achieves rotation and translation invariance.

Universal approximation for invariant networks. Understanding the approximation power of invariant models is a popular research goal. Most of the current results assume that the symmetry group is a permutation group. (Zaheer et al., 2017; Qi et al., 2017a; Segol & Lipman, 2019; Maron et al., 2020; Serviansky et al., 2020) proved universality for several $S_{n}$-invariant and equivariant models. (Maron et al., 2019b; a; Keriven & Peyré, 2019; Maehara & NT, 2019) studied the approximation power of high-order graph neural networks. (Maron et al., 2019c; Ravanbakhsh, 2020) targeted the universality of networks that use high-order representations of permutation groups. (Yarotsky, 2018) provided several theoretical constructions of universal equivariant neural network models based on polynomial invariants, including an $SE(2)$-equivariant model. In a recent work, (Bogatskiy et al., 2020) presented a universal approximation theorem for networks that are equivariant to several Lie groups, including $\mathrm{SO}(3)$. The main difference from our paper is that we prove a universality theorem for a more complex group that, besides rotations, also includes translations and permutations.

3 A framework for proving universality

In this section, we describe a framework for proving the universality of equivariant networks. We begin with some mathematical preliminaries:

3.1 Mathematical setup

An action of a group $G$ on a real vector space $W$ is a collection of maps $\rho(g):W\to W$, defined for any $g\in G$, such that $\rho(g_{1})\circ\rho(g_{2})=\rho(g_{1}g_{2})$ for all $g_{1},g_{2}\in G$, and such that the identity element of $G$ is mapped to the identity mapping on $W$. We say $\rho$ is a representation of $G$ if $\rho(g)$ is a linear map for every $g\in G$. As is customary, when it does not cause confusion we often say that $W$ itself is a representation of $G$.

In this paper, we are interested in functions on point clouds. Point clouds are sets of vectors in $\mathbb{R}^{3}$ arranged as matrices:

X=(x_{1},\ldots,x_{n})\in\mathbb{R}^{3\times n}.

Many machine learning tasks on point clouds, such as classification, aim to learn a function which is invariant to rigid motions and relabeling of the points. Put differently, such functions are required to be invariant to the action of $G=\mathbb{R}^{3}\rtimes\mathrm{SO}(3)\times S_{n}$ on $\mathbb{R}^{3\times n}$ via

\rho_{G}(t,R,P)(X)=R(X-t1_{n}^{T})P^{T}, \qquad (1)

where $t\in\mathbb{R}^{3}$ defines a translation, $R$ is a rotation, and $P$ is a permutation matrix.
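As a concrete illustration, the following is a minimal numpy sketch of the action in equation 1, applied to a toy point cloud; the array shapes and the specific rotation, permutation, and translation are illustrative assumptions, not part of the paper.

```python
import numpy as np

def rho_G(X, t, R, P):
    """Action of equation 1: translate, rotate, and permute the columns of X.

    X : (3, n) point cloud, t : (3,) translation, R : (3, 3) rotation, P : (n, n) permutation matrix.
    """
    return R @ (X - t[:, None]) @ P.T

# toy example: rotation about the z-axis and a cyclic permutation of 4 points
n = 4
X = np.random.randn(3, n)
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
P = np.eye(n)[:, [1, 2, 3, 0]]      # permutation matrix
t = np.array([1.0, -2.0, 0.5])
Y = rho_G(X, t, R, P)               # another point cloud of shape (3, n)
```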

Equivariant functions are generalizations of invariant functions: if $G$ acts on $W_{1}$ via some action $\rho_{1}(g)$, and on $W_{2}$ via some other group action $\rho_{2}(g)$, we say that a function $f:W_{1}\to W_{2}$ is equivariant if

f(\rho_{1}(g)w)=\rho_{2}(g)f(w),\quad\forall w\in W_{1}\text{ and }g\in G.

Invariant functions correspond to the special case where $\rho_{2}(g)$ is the identity mapping for all $g\in G$.

In some machine learning tasks on point clouds, the functions learned are not invariant but rather equivariant. For example, segmentation tasks assign a discrete label to each point; such functions are invariant to translations and rotations but equivariant to permutations, in the sense that permuting the input causes a corresponding permutation of the output. Another example is predicting a normal for each point of a point cloud. This task is invariant to translations but equivariant to both rotations and permutations.

In this paper, we are interested in learning equivariant functions from point clouds into $W_{T}^{n}$, where $W_{T}$ is some representation of $\mathrm{SO}(3)$. The equivariance of these functions is with respect to the action $\rho_{G}$ on point clouds defined in equation 1, and the action of $G$ on $W_{T}^{n}$ defined by applying the rotation action from the left and the permutation action from the right as in equation 1, but ‘ignoring’ the translation component. Thus, $G$-equivariant functions will be translation invariant. This formulation of equivariance includes the normal prediction example by taking $W_{T}=\mathbb{R}^{3}$, as well as the segmentation case by setting $W_{T}=\mathbb{R}$ with the trivial identity representation. We focus on the harder case of functions into $W_{T}^{n}$ which are equivariant to permutations, since it easily implies the easier case of permutation invariant functions into $W_{T}$.

Notation. We use the notation $\mathbb{N}_{+}=\mathbb{N}\cup\{0\}$ and $\mathbb{N}^{*}_{+}=\bigcup_{r\in\mathbb{N}}\mathbb{N}_{+}^{r}$. We set $[D]=\{1,\ldots,D\}$ and $[D]_{0}=\{0,\ldots,D\}$.

Proofs. Proofs appear in the appendices, arranged according to sections.

3.2 Conditions for universality

The semi-lifted approach

In general, highly expressive equivariant neural networks can be achieved by using a ‘lifted approach’, where intermediate features in the network belong to high dimensional representations of the group. In the context of point clouds, where typically $n\gg 3$, many papers, e.g., (Thomas et al., 2018; Kondor, 2018; Bogatskiy et al., 2020), use a ‘semi-lifted’ approach, where hidden layers hold only higher dimensional representations of $\mathrm{SO}(3)$, but not high order permutation representations. In this subsection, we propose a strategy for achieving universality with the semi-lifted approach.

We begin with an axiomatic formulation of the semi-lifted approach (see illustration in the inset): we assume that our neural networks are composed of two main components. The first component is a family $\mathcal{F}_{\mathrm{feat}}$ of parametric continuous $G$-equivariant functions $f_{\mathrm{feat}}$ which map the original point cloud $\mathbb{R}^{3\times n}$ to a semi-lifted point cloud $W_{\mathrm{feat}}^{n}$, where $W_{\mathrm{feat}}$ is a lifted representation of $\mathrm{SO}(3)$.

[Inset: illustration of the semi-lifted approach.]

The second component is a family of parametric linear $\mathrm{SO}(3)$-equivariant functions $\mathcal{F}_{\mathrm{pool}}$, which map from the high order representation $W_{\mathrm{feat}}$ down to the target representation $W_{T}$. Each such $\mathrm{SO}(3)$-equivariant function $\Lambda:W_{\mathrm{feat}}\to W_{T}$ can be extended to an $\mathrm{SO}(3)\times S_{n}$ equivariant function $\hat{\Lambda}:W_{\mathrm{feat}}^{n}\to W_{T}^{n}$ by applying $\Lambda$ elementwise. For every positive integer $C$, these two families of functions induce a family of functions $\mathcal{F}_{C}$ obtained by summing $C$ different compositions of these functions:

\mathcal{F}_{C}(\mathcal{F}_{\mathrm{feat}},\mathcal{F}_{\mathrm{pool}})=\left\{f\ \middle|\ f(X)=\sum_{c=1}^{C}\hat{\Lambda}_{c}(g_{c}(X)),\ (\Lambda_{c},g_{c})\in\mathcal{F}_{\mathrm{pool}}\times\mathcal{F}_{\mathrm{feat}}\right\}. \qquad (2)
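To make the construction concrete, here is a small numpy sketch of the family in equation 2, under the assumption that each $g_{c}$ returns an $(n,\dim W_{\mathrm{feat}})$ array and each $\Lambda_{c}$ is represented by a matrix applied per point; all names are illustrative.

```python
import numpy as np

def semi_lifted_model(X, feature_maps, pools):
    """Sketch of equation 2: f(X) = sum_c Lambda_hat_c(g_c(X)).

    feature_maps : list of callables g_c, each mapping the (3, n) cloud to an (n, d_feat) array,
    pools        : list of (d_feat, d_T) matrices, each applied to every point independently.
    Returns an (n, d_T) array, i.e. an element of W_T^n.
    """
    out = 0.0
    for g_c, Lambda_c in zip(feature_maps, pools):
        V = g_c(X)                  # semi-lifted point cloud in W_feat^n
        out = out + V @ Lambda_c    # elementwise (per-point) linear pooling Lambda_hat_c
    return out
```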

Conditions for universality

We now describe sufficient conditions for universality using the semi-lifted approach. The first step is showing, as in (Yarotsky, 2018), that continuous $G$-equivariant functions $\mathcal{C}_{G}(\mathbb{R}^{3\times n},W_{T}^{n})$ can be approximated by $G$-equivariant polynomials $\mathcal{P}_{G}(\mathbb{R}^{3\times n},W_{T}^{n})$.

Lemma 1.

Any continuous $G$-equivariant function in $\mathcal{C}_{G}(\mathbb{R}^{3\times n},W_{T}^{n})$ can be approximated uniformly on compact sets by $G$-equivariant polynomials in $\mathcal{P}_{G}(\mathbb{R}^{3\times n},W_{T}^{n})$.

Universality is now reduced to the approximation of $G$-equivariant polynomials. We provide two sufficient conditions which guarantee that $G$-equivariant polynomials of degree $D$ can be expressed by function spaces $\mathcal{F}_{C}(\mathcal{F}_{\mathrm{feat}},\mathcal{F}_{\mathrm{pool}})$ as defined in equation 2.

The first sufficient condition is that $\mathcal{F}_{\mathrm{feat}}$ is able to represent all polynomials which are translation invariant and permutation equivariant (but not necessarily $\mathrm{SO}(3)$-equivariant). More precisely:

Definition 1 ($D$-spanning).

For $D\in\mathbb{N}_{+}$, let $\mathcal{F}_{\mathrm{feat}}$ be a subset of $\mathcal{C}_{G}(\mathbb{R}^{3\times n},W_{\mathrm{feat}}^{n})$. We say that $\mathcal{F}_{\mathrm{feat}}$ is $D$-spanning if there exist $f_{1},\ldots,f_{K}\in\mathcal{F}_{\mathrm{feat}}$ such that every polynomial $p:\mathbb{R}^{3\times n}\to\mathbb{R}^{n}$ of degree $D$ which is invariant to translations and equivariant to permutations can be written as

p(X)=\sum_{k=1}^{K}\hat{\Lambda}_{k}(f_{k}(X)), \qquad (3)

where $\Lambda_{k}:W_{\mathrm{feat}}\to\mathbb{R}$ are all linear functionals, and $\hat{\Lambda}_{k}:W_{\mathrm{feat}}^{n}\to\mathbb{R}^{n}$ are the functions defined by elementwise application of $\Lambda_{k}$.

In Lemma 3 we provide a concrete condition which implies $D$-spanning.

The second sufficient condition is that the set of linear $\mathrm{SO}(3)$-equivariant functionals $\mathcal{F}_{\mathrm{pool}}$ contains all possible equivariant linear functionals:

Definition 2 (Linear universality).

We say that a collection $\mathcal{F}_{\mathrm{pool}}$ of equivariant linear functionals between two representations $W_{\mathrm{feat}}$ and $W_{T}$ of $\mathrm{SO}(3)$ is linearly universal if it contains all linear $\mathrm{SO}(3)$-equivariant mappings between the two representations.

When these two sufficient conditions hold, a rather simple symmetrization argument leads to the following theorem:

Theorem 1.

If $\mathcal{F}_{\mathrm{feat}}$ is $D$-spanning and $\mathcal{F}_{\mathrm{pool}}$ is linearly universal, then there exists some $C(D)\in\mathbb{N}$ such that for all $C\geq C(D)$ the function space $\mathcal{F}_{C}(\mathcal{F}_{\mathrm{feat}},\mathcal{F}_{\mathrm{pool}})$ contains all $G$-equivariant polynomials of degree $\leq D$.

[Inset: illustration of the universality result.]

As a result of Theorem 1 and Lemma 1 we obtain our universality result (see the inset for an illustration):

Corollary 1.

For all $C,D\in\mathbb{N}_{+}$, let $\mathcal{F}_{C,D}$ denote the function space generated, as in equation 2, by a pair of function spaces which are $D$-spanning and linearly universal. Then any continuous $G$-equivariant function in $\mathcal{C}_{G}(\mathbb{R}^{3\times n},W_{T}^{n})$ can be approximated uniformly on compact sets by equivariant functions in

\mathcal{F}=\bigcup_{D\in\mathbb{N}}\mathcal{F}_{C(D),D}.

3.3 Sufficient conditions in action

In the remainder of the paper, we prove the universality of several $G$-equivariant architectures, based on the framework we discussed in the previous subsection. We discuss two different strategies for achieving universality, which differ mainly in the type of lifted representations of $\mathrm{SO}(3)$ they use: (i) the first strategy uses (direct sums of) tensor-product representations; (ii) the second uses (direct sums of) irreducible representations. The main advantage of the first strategy, from the perspective of our methodology, is that achieving the $D$-spanning property is more straightforward. The advantage of irreducible representations is that they almost automatically guarantee the linear universality property.

In Section 4 we discuss universality through tensor product representations, and give an example of a minimal tensor representation network architecture that satisfies universality. In Section 5 we discuss universality through irreducible representations, which is currently the more common strategy. We show that the TFN architecture (Thomas et al., 2018; Fuchs et al., 2020), which follows this strategy, is universal, and describe a simple tweak that achieves universality using only low order filters, though the representations throughout the network are high dimensional.

4 Universality with tensor representations

In this section, we prove universality for models that are based on tensor product representations, as defined below. The main advantage of this approach is that $D$-spanning is achieved rather easily. The main drawbacks are that its data representation is somewhat redundant and that characterizing the linear equivariant layers is more laborious.

[Inset: the tensor representation spaces $\mathcal{T}_{k}$ and the action $R^{\otimes k}$ for $k=1,2,3$.]

Tensor representations We begin by defining tensor representations. For $k\in\mathbb{N}_{+}$ denote $\mathcal{T}_{k}=\mathbb{R}^{3^{k}}$. $\mathrm{SO}(3)$ acts on $\mathcal{T}_{k}$ by the tensor product representation, i.e., by applying the matrix Kronecker product $k$ times: $\rho_{k}(R):=R^{\otimes k}$. The inset illustrates the vector spaces and the action for $k=1,2,3$. With this action, for any $i_{1},\ldots,i_{k}\in[n]$, the map from $\mathbb{R}^{3\times n}$ to $\mathcal{T}_{k}$ defined by

(x_{1},\ldots,x_{n})\mapsto x_{i_{1}}\otimes x_{i_{2}}\otimes\ldots\otimes x_{i_{k}} \qquad (4)

is $\mathrm{SO}(3)$-equivariant.
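The equivariance of equation 4 can be checked numerically with Kronecker products; the sketch below, with an assumed rotation about the z-axis, verifies it for $k=2$ and $i_{1}=i_{2}$.

```python
import numpy as np

def tensor_power(x, k):
    """k-fold Kronecker (tensor) power of a 3-vector, an element of T_k = R^(3^k)."""
    out = np.array([1.0])
    for _ in range(k):
        out = np.kron(out, x)
    return out

x = np.random.randn(3)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
rho_2 = np.kron(R, R)     # the action of R on T_2
# equivariance of x -> x ⊗ x: acting on the input matches acting on the output
assert np.allclose(tensor_power(R @ x, 2), rho_2 @ tensor_power(x, 2))
```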

A $D$-spanning family We now show that tensor representations can be used to define a finite set of $D$-spanning functions. The lifted representation $W_{\mathrm{feat}}$ will be given by

W_{\mathrm{feat}}^{\mathcal{T}}=\bigoplus_{T=0}^{D}\mathcal{T}_{T}.

The $D$-spanning functions are indexed by vectors $\vec{r}=(r_{1},\ldots,r_{K})$, where each $r_{k}$ is a non-negative integer. The functions $Q^{(\vec{r})}=(Q_{j}^{(\vec{r})})_{j=1}^{n}$ are defined for fixed $j\in[n]$ by

Q_{j}^{(\vec{r})}(X)=\sum_{i_{2},\ldots,i_{K}=1}^{n}x_{j}^{\otimes r_{1}}\otimes x_{i_{2}}^{\otimes r_{2}}\otimes x_{i_{3}}^{\otimes r_{3}}\otimes\ldots\otimes x_{i_{K}}^{\otimes r_{K}}. \qquad (5)

The functions $Q_{j}^{(\vec{r})}$ are $\mathrm{SO}(3)$-equivariant, as they are a sum of equivariant functions from equation 4. Thus $Q^{(\vec{r})}$ is $\mathrm{SO}(3)\times S_{n}$ equivariant. The motivation behind the definition of these functions is that known characterizations of permutation equivariant polynomials (Segol & Lipman, 2019) tell us that the entries of these tensor valued functions span all permutation equivariant polynomials (see the proof of Lemma 1 for more details). To account for translation invariance, we compose the functions $Q^{(\vec{r})}$ with the centralization operation and define the set of functions

{\mathcal{Q}}_{D}=\{\iota\circ Q^{(\vec{r})}(X-\frac{1}{n}X1_{n}1_{n}^{T})\,|\,\|\vec{r}\|_{1}\leq D\}, \qquad (6)

where $\iota$ is the natural embedding that takes each $\mathcal{T}_{T}$ into $W_{\mathrm{feat}}^{\mathcal{T}}=\bigoplus_{T=0}^{D}\mathcal{T}_{T}$. In the following lemma, we prove that this set is $D$-spanning.

Lemma 1.

For every $D\in\mathbb{N}_{+}$, the set ${\mathcal{Q}}_{D}$ is $D$-spanning.
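For intuition, the sum over $i_{2},\ldots,i_{K}$ in equation 5 factorizes into a product of per-block power-sum tensors, which makes these functions easy to evaluate; the following numpy sketch of $Q_{j}^{(\vec{r})}$ on the centered cloud is an illustrative implementation, not the paper's code.

```python
import numpy as np

def tensor_power(x, k):
    """k-fold Kronecker (tensor) power of a 3-vector."""
    out = np.array([1.0])
    for _ in range(k):
        out = np.kron(out, x)
    return out

def Q_j(X, r, j):
    """Q_j^{(r)} of equation 5 evaluated on the centered cloud, as in equation 6.

    X : (3, n) point cloud, r : list of non-negative integers (r_1, ..., r_K), j : point index.
    The sum over i_2, ..., i_K factorizes: each block k >= 2 contributes sum_i x_i^{⊗ r_k}.
    """
    Xc = X - X.mean(axis=1, keepdims=True)          # centralization X - (1/n) X 1 1^T
    out = tensor_power(Xc[:, j], r[0])              # the block attached to the fixed index j
    for r_k in r[1:]:
        block = sum(tensor_power(Xc[:, i], r_k) for i in range(X.shape[1]))
        out = np.kron(out, block)
    return out                                      # an element of T_{|r|}
```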

A minimal universal architecture Once we have shown that ${\mathcal{Q}}_{D}$ is $D$-spanning, we can design $D$-spanning architectures by devising architectures that are able to span all elements of ${\mathcal{Q}}_{D}$. As we will now show, the compositional nature of neural networks allows us to do this in a very clean manner.

We define a parametric function $f(X,V|\theta_{1},\theta_{2})$ which maps $\mathbb{R}^{3\times n}\oplus\mathcal{T}_{k}^{n}$ to $\mathbb{R}^{3\times n}\oplus\mathcal{T}_{k+1}^{n}$ as follows: for all $j\in[n]$, we have $f_{j}(X,V)=(X_{j},\tilde{V}_{j}(X,V))$, where

\tilde{V}_{j}(X,V|\theta_{1},\theta_{2})=\theta_{1}\left(x_{j}\otimes V_{j}\right)+\theta_{2}\sum_{i}\left(x_{i}\otimes V_{i}\right) \qquad (7)

We denote the set of functions $(X,V)\mapsto f(X,V|\theta_{1},\theta_{2})$ obtained by choosing the parameters $\theta_{1},\theta_{2}\in\mathbb{R}$ by $\mathcal{F}_{min}$. While in the hidden layers of our network the data is represented using both coordinates $(X,V)$, the input to the network only contains an $X$ coordinate and the output only contains a $V$ coordinate. To this end, we define the functions

\mathrm{ext}(X)=(X,1_{n})\ \text{ and }\ \pi_{V}(X,V)=V. \qquad (8)

We can achieve $D$-spanning by composing functions in $\mathcal{F}_{min}$ with these functions and centralizing:

Lemma 2.

The function set ${\mathcal{Q}}_{D}$ is contained in

\mathcal{F}_{\mathrm{feat}}=\{\iota\circ\pi_{V}\circ f^{1}\circ f^{2}\circ\ldots\circ f^{T}\circ\mathrm{ext}(X-\frac{1}{n}X1_{n}1_{n}^{T})\,|\,f^{j}\in\mathcal{F}_{min},\ T\leq D\}. \qquad (9)

Thus $\mathcal{F}_{\mathrm{feat}}$ is $D$-spanning.
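The composition in equation 9 is easy to spell out; below is a minimal numpy sketch of an $\mathcal{F}_{min}$ layer (equation 7) and of the composition $\pi_{V}\circ f^{1}\circ\ldots\circ f^{T}\circ\mathrm{ext}$ applied to the centered cloud. The flat-vector layout of the tensor features (length $3^{k}$ per point) and the function names are assumptions made for illustration.

```python
import numpy as np

def f_min(X, V, theta1, theta2):
    """One layer of F_min (equation 7): V~_j = theta1 * (x_j ⊗ V_j) + theta2 * sum_i (x_i ⊗ V_i).

    X : (3, n) centered cloud, V : (n, 3**k) per-point tensor features.
    Returns (X, V~) with V~ of shape (n, 3**(k+1)).
    """
    per_point = np.stack([np.kron(X[:, j], V[j]) for j in range(X.shape[1])])
    pooled = per_point.sum(axis=0)                   # sum_i x_i ⊗ V_i, shared by all points
    return X, theta1 * per_point + theta2 * pooled

def minimal_net(X, thetas):
    """pi_V ∘ f^1 ∘ ... ∘ f^T ∘ ext applied to the centered cloud, as in equation 9."""
    Xc = X - X.mean(axis=1, keepdims=True)           # centralization
    V = np.ones((X.shape[1], 1))                     # ext: the constant feature 1_n, living in T_0
    for theta1, theta2 in thetas:                    # one (theta1, theta2) pair per layer
        Xc, V = f_min(Xc, V, theta1, theta2)
    return V                                         # pi_V: keep only the tensor features
```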

To complete the construction of a universal network, we now need to characterize all linear equivariant functions from $W_{\mathrm{feat}}^{\mathcal{T}}$ to the target representation $W_{T}$. In Appendix G we show how this can be done for the trivial representation $W_{T}=\mathbb{R}$. This characterization gives us a set of linear functions $\mathcal{F}_{\mathrm{pool}}$, which, combined with $\mathcal{F}_{\mathrm{feat}}$ defined in equation 9 (this corresponds to $\mathrm{SO}(3)$-invariant functions), gives us a universal architecture as in Theorem 1. However, the disadvantage of this approach is that the implementation of the linear functions in $\mathcal{F}_{\mathrm{pool}}$ is somewhat cumbersome.

In the next section we discuss irreducible representations, which give us a systematic way to address linear equivariant mappings into any $W_{T}$. Proving $D$-spanning for these networks is accomplished via the $D$-spanning property of tensor representations, through the following lemma:

Lemma 3.

If all functions in ${\mathcal{Q}}_{D}$ can be written as

\iota\circ Q^{(\vec{r})}(X-\frac{1}{n}X1_{n}1_{n}^{T})=\sum_{k=1}^{K}\hat{A}_{k}f_{k}(X),

where $f_{k}\in\mathcal{F}_{\mathrm{feat}}$, $A_{k}:W_{\mathrm{feat}}\to W_{\mathrm{feat}}^{\mathcal{T}}$, and $\hat{A}_{k}:W_{\mathrm{feat}}^{n}\to(W_{\mathrm{feat}}^{\mathcal{T}})^{n}$ is defined by elementwise application of $A_{k}$, then $\mathcal{F}_{\mathrm{feat}}$ is $D$-spanning.

5 Universality with irreducible representations

In this section, we discuss how to achieve universality when using irreducible representations of $\mathrm{SO}(3)$. We will begin by defining irreducible representations and explaining how linear universality is easily achieved by them, while the $D$-spanning properties of tensor representations can be preserved. This discussion can be seen as an interpretation of the choices made in the construction of TFN and similar networks in the literature. We then show that these architectures are indeed universal.

5.1 Irreducible representations of $\mathrm{SO}(3)$

In general, any finite-dimensional representation $W$ of a compact group $H$ can be decomposed into irreducible representations: a subspace $W_{0}\subset W$ is $H$-invariant if $hw\in W_{0}$ for all $h\in H,w\in W_{0}$. A representation $W$ is irreducible if it has no non-trivial invariant subspaces. In the case of $\mathrm{SO}(3)$, all irreducible real representations are defined by matrices $D^{(\ell)}(R)$, called the real Wigner D-matrices, acting on $W_{\ell}:=\mathbb{R}^{2\ell+1}$ by matrix multiplication. In particular, the representations for $\ell=0,1$ are $D^{(0)}(R)=1$ and $D^{(1)}(R)=R$.

Linear maps between irreducible representations As mentioned above, one of the main advantages of using irreducible representations is that there is a very simple characterization of all linear equivariant maps between two direct sums of irreducible representations. We use the notation $W_{\bm{l}}$ for direct sums of irreducible representations, where ${\bm{l}}=(\ell_{1},\ldots,\ell_{K})\in\mathbb{N}_{+}^{K}$ and $W_{\bm{l}}=\bigoplus_{k=1}^{K}W_{\ell_{k}}$.

Lemma 1.

Let ${\bm{l}}^{(1)}=(\ell_{1}^{(1)},\ldots,\ell_{K_{1}}^{(1)})$ and ${\bm{l}}^{(2)}=(\ell_{1}^{(2)},\ldots,\ell_{K_{2}}^{(2)})$. A function $\Lambda=(\Lambda_{1},\ldots,\Lambda_{K_{2}})$ is a linear equivariant mapping between $W_{{\bm{l}}^{(1)}}$ and $W_{{\bm{l}}^{(2)}}$ if and only if there exists a $K_{1}\times K_{2}$ matrix $M$ with $M_{ij}=0$ whenever $\ell_{i}^{(1)}\neq\ell_{j}^{(2)}$, such that

\Lambda_{j}(V)=\sum_{i=1}^{K_{1}}M_{ij}V_{i} \qquad (10)

where $V=(V_{i})_{i=1}^{K_{1}}$ and $V_{i}\in W_{\ell_{i}^{(1)}}$ for all $i=1,\ldots,K_{1}$.

We note that this lemma, which is based on Schur's lemma, was proven in the complex setting in (Kondor, 2018). Here we observe that it holds for real irreducible representations of $\mathrm{SO}(3)$ as well, since their dimension is always odd.
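In code, Lemma 1 says that such a map is specified by a single matrix $M$ whose $(i,j)$ entry may be nonzero only when the $i$-th input block and the $j$-th output block have the same degree; the following sketch applies such a map to one point's features (block sizes $2\ell+1$), with all names being illustrative.

```python
import numpy as np

def equivariant_linear(V_blocks, l_in, l_out, M):
    """Apply the map of equation 10: Lambda_j(V) = sum_i M[i, j] * V_i, with M[i, j] = 0
    whenever l_in[i] != l_out[j] (Schur's lemma forbids mixing different degrees).

    V_blocks : list of arrays, V_blocks[i] has shape (2 * l_in[i] + 1,),
    M        : (len(l_in), len(l_out)) matrix respecting the sparsity pattern above.
    """
    out = []
    for j, lj in enumerate(l_out):
        acc = np.zeros(2 * lj + 1)
        for i, li in enumerate(l_in):
            if li == lj:                 # only same-degree blocks may interact
                acc = acc + M[i, j] * V_blocks[i]
        out.append(acc)
    return out
```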

Clebsch-Gordan decomposition of tensor products As any finite-dimensional representation of $\mathrm{SO}(3)$ can be decomposed into a direct sum of irreducible representations, this is true for tensor representations as well. In particular, the Clebsch-Gordan coefficients provide an explicit formula for decomposing the tensor product of two irreducible representations $W_{\ell_{1}}$ and $W_{\ell_{2}}$ into a direct sum of irreducible representations. This decomposition can be easily extended to decompose the tensor product $W_{{\bm{l}}_{1}}\otimes W_{{\bm{l}}_{2}}$ into a direct sum of irreducible representations, where ${\bm{l}}_{1},{\bm{l}}_{2}$ are now vectors. In matrix notation, this means there is a unitary linear equivariant map $U({\bm{l}}_{1},{\bm{l}}_{2})$ taking $W_{{\bm{l}}_{1}}\otimes W_{{\bm{l}}_{2}}$ onto $W_{{\bm{l}}}$, where the explicit values of ${\bm{l}}={\bm{l}}({\bm{l}}_{1},{\bm{l}}_{2})$ and the matrix $U({\bm{l}}_{1},{\bm{l}}_{2})$ can be inferred directly from the case where $\ell_{1}$ and $\ell_{2}$ are scalars.

By repeatedly taking tensor products and applying Clebsch-Gordan decompositions to the result, TFN and similar architectures can achieve the $D$-spanning property in a manner analogous to tensor representations, and also enjoy linear universality, since they maintain irreducible representations throughout the network.

5.2 Tensor field networks

We now describe the basic layers of the TFN architecture (Thomas et al., 2018), which are based on irreducible representations, and suggest an architecture based on these layers which can approximate $G$-equivariant maps into any representation $W_{{\bm{l}}_{T}}^{n}$, ${\bm{l}}_{T}\in\mathbb{N}^{*}_{+}$. There are some superficial differences between our description of TFN and the description in the original paper; for more details see Appendix F.

We note that the universality of TFN also implies the universality of (Fuchs et al., 2020), which is a generalization of TFN that enables adding an attention mechanism. Assuming the attention mechanism is not restricted to local neighborhoods, this method is at least as expressive as TFN.

TFNs are composed of three types of layers: (i) convolution, (ii) self-interaction, and (iii) non-linearities. In our architecture, we only use the first two layer types, which we will now describe (since convolution layers in TFN are not linear, the non-linearities are formally redundant):

Convolution. Convolutional layers involve taking tensor products of a filter and a feature vector to create a new feature vector, and then decomposing it into irreducible representations. Unlike in standard CNNs, a filter here depends on the input, and is a function $F:\mathbb{R}^{3}\to W_{{\bm{l}}_{D}}$, where ${\bm{l}}_{D}=[0,1,\ldots,D]^{T}$. The $\ell$-th component of the filter $F(x)=\left[F^{(0)}(x),\ldots,F^{(D)}(x)\right]$ is given by

F^{(\ell)}_{m}(x)=R^{(\ell)}(\|x\|)Y^{\ell}_{m}(\hat{x}),\quad m=-\ell,\ldots,\ell \qquad (11)

where $\hat{x}=x/\|x\|$ if $x\neq 0$ and $\hat{x}=0$ otherwise, $Y^{\ell}_{m}$ are the spherical harmonics, and $R^{(\ell)}$ is any polynomial of degree $\leq D$. In Appendix F we show that these polynomial functions can be replaced by fully connected networks, since the latter can approximate all polynomials uniformly.
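As an illustration, the degree-1 filter component of equation 11 can be written down directly, since the degree-1 real spherical harmonics are, up to normalization, the Cartesian coordinates of $\hat{x}$; the ordering and normalization used below are one common convention, chosen here only for illustration.

```python
import numpy as np

def filter_degree_1(x, radial_coeffs):
    """F^{(1)}(x) = R^{(1)}(||x||) * (Y^1_{-1}(x_hat), Y^1_0(x_hat), Y^1_1(x_hat)), as in equation 11.

    radial_coeffs : coefficients of the scalar polynomial R^{(1)}, highest degree first.
    """
    r = np.linalg.norm(x)
    x_hat = x / r if r > 0 else np.zeros(3)
    radial = np.polyval(radial_coeffs, r)            # R^{(1)}(||x||)
    # degree-1 real spherical harmonics, proportional to (y, z, x) of the unit vector
    Y1 = np.sqrt(3.0 / (4.0 * np.pi)) * np.array([x_hat[1], x_hat[2], x_hat[0]])
    return radial * Y1
```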

The convolution of an input feature $V\in W_{{\bm{l}}_{i}}^{n}$ and a filter $F$ as defined above gives an output feature $\tilde{V}=(\tilde{V}_{a})_{a=1}^{n}\in W_{{\bm{l}}_{o}}^{n}$, where ${\bm{l}}_{o}={\bm{l}}({\bm{l}}_{f},{\bm{l}}_{i})$, which is given by

\tilde{V}_{a}(X,V)=U({\bm{l}}_{f},{\bm{l}}_{i})\left(\theta_{0}V_{a}+\sum_{b=1}^{n}F(x_{a}-x_{b})\otimes V_{b}\right). \qquad (12)

More formally, we will think of convolutional layers as functions of the form $f(X,V)=(X,\tilde{V}(X,V))$. These functions are defined by a choice of $D$, a choice of scalar polynomials $R^{(\ell)},\ell=0,\ldots,D$, and a choice of the parameter $\theta_{0}\in\mathbb{R}$ in equation 12. We denote the set of all such functions $f$ by $\mathcal{F}_{D}$.

Self-interaction layers. Self-interaction layers are linear functions $\hat{\Lambda}:W_{{\bm{l}}}^{n}\to W_{{\bm{l}}_{T}}^{n}$ which are obtained from elementwise application of equivariant linear functions $\Lambda:W_{{\bm{l}}}\to W_{{\bm{l}}_{T}}$. These linear functions can be specified by a choice of a matrix $M$ with the sparsity pattern described in Lemma 1.

Network architecture. For our universality proof, we suggest a simple architecture which depends on two positive integer parameters $(C,D)$: for a given $D$, we define $\mathcal{F}_{\mathrm{feat}}(D)$ as the set of functions obtained by $2D$ recursive convolutions

\mathcal{F}_{\mathrm{feat}}(D)=\{\pi_{V}\circ f^{2D}\circ\ldots\circ f^{2}\circ f^{1}\circ\mathrm{ext}(X)\,|\,f^{j}\in\mathcal{F}_{D}\},

where $\mathrm{ext}$ and $\pi_{V}$ are defined as in equation 8. The output of a function in $\mathcal{F}_{\mathrm{feat}}(D)$ is in $W_{{\bm{l}}(D)}^{n}$, for some ${\bm{l}}(D)$ which depends on $D$. We then define $\mathcal{F}_{\mathrm{pool}}(D)$ to be the self-interaction layers which map $W_{{\bm{l}}(D)}^{n}$ to $W_{{\bm{l}}_{T}}^{n}$. This choice of $\mathcal{F}_{\mathrm{feat}}(D)$ and $\mathcal{F}_{\mathrm{pool}}(D)$, together with a choice of the number of channels $C$, defines the final network architecture $\mathcal{F}^{\mathrm{TFN}}_{C,D}=\mathcal{F}_{C}(\mathcal{F}_{\mathrm{feat}}(D),\mathcal{F}_{\mathrm{pool}}(D))$ as in equation 2. In the appendix we prove the universality of TFN:

Theorem 2.

For all $n\in\mathbb{N}$, ${\bm{l}}_{T}\in\mathbb{N}^{*}_{+}$,

  1.

    For $D\in\mathbb{N}_{+}$, every $G$-equivariant polynomial $p:\mathbb{R}^{3\times n}\to W_{T}^{n}$ of degree $D$ is in $\mathcal{F}^{\mathrm{TFN}}_{C(D),D}$.

  2.

    Every continuous $G$-equivariant function can be approximated uniformly on compact sets by functions in $\cup_{D\in\mathbb{N}_{+}}\mathcal{F}^{\mathrm{TFN}}_{C(D),D}$.

As discussed previously, the linear universality of $\mathcal{F}_{\mathrm{pool}}$ is guaranteed. Thus proving Theorem 2 amounts to showing that $\mathcal{F}_{\mathrm{feat}}(D)$ is $D$-spanning. This is done using the sufficient condition for $D$-spanning defined in Lemma 3.

Alternative architecture

The complexity of the TFN network used to construct $G$-equivariant polynomials of degree $D$ can be reduced using a simple modification of the convolutional layer in equation 12: we add two parameters $\theta_{1},\theta_{2}\in\mathbb{R}$ to the convolutional layer, which is now defined as

\tilde{V}_{a}(X,V)=U({\bm{l}}_{f},{\bm{l}}_{i})\left(\theta_{1}\sum_{b=1}^{n}F(x_{a}-x_{b})\otimes V_{b}+\theta_{2}\sum_{b=1}^{n}F(x_{a}-x_{b})\otimes V_{a}\right). \qquad (13)

With this simple change, we can show that $\mathcal{F}_{\mathrm{feat}}(D)$ is $D$-spanning even if we only take filters of order $0$ and $1$ throughout the network. This is shown in Appendix E.
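The modified convolution is straightforward to sketch; the numpy code below implements the two sums of equation 13 while omitting the unitary change of basis $U({\bm{l}}_{f},{\bm{l}}_{i})$, which only reorganizes the tensor product into irreducible blocks. The flat-vector feature layout and the function names are assumptions made for illustration.

```python
import numpy as np

def modified_convolution(X, V, filt, theta1, theta2):
    """Equation 13 without U(l_f, l_i):
    V~_a = theta1 * sum_b F(x_a - x_b) ⊗ V_b  +  theta2 * sum_b F(x_a - x_b) ⊗ V_a.

    X : (3, n) cloud, V : (n, d) per-point features, filt : 3-vector -> filter vector of length d_f.
    Returns an (n, d_f * d) array of tensor-product features.
    """
    n = X.shape[1]
    out = []
    for a in range(n):
        msg = sum(np.kron(filt(X[:, a] - X[:, b]), V[b]) for b in range(n))       # sum_b F ⊗ V_b
        self_msg = sum(np.kron(filt(X[:, a] - X[:, b]), V[a]) for b in range(n))  # sum_b F ⊗ V_a
        out.append(theta1 * msg + theta2 * self_msg)
    return np.stack(out)
```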

6 Conclusion

In this paper, we have presented a new framework for proving the universality of $G$-equivariant point cloud networks. We used this framework to prove the universality of the TFN model (Thomas et al., 2018; Fuchs et al., 2020), and to devise two additional novel, simple universal architectures.

We believe that the framework we developed here will be useful for proving the universality of other $G$-equivariant models for point clouds, and other related equivariant models. We note that while the discussion in the paper was limited to point clouds in $\mathbb{R}^{3}$ and to the action of $\mathrm{SO}(3)$, large parts of it are relevant to many other setups involving $d$-dimensional point clouds with symmetry groups of the form $G=\mathbb{R}^{d}\rtimes H\times S_{n}$ acting on $\mathbb{R}^{d\times n}$, where $H$ can be any compact topological group.

Acknowledgements

The authors would like to thank Taco Cohen for helpful discussions. N.D. acknowledges the support of Simons Math+X Investigators Award 400837.

References

  • Atzmon et al. (2018) Matan Atzmon, Haggai Maron, and Yaron Lipman. Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091, 2018.
  • Bogatskiy et al. (2020) Alexander Bogatskiy, Brandon Anderson, Jan T Offermann, Marwah Roussi, David W Miller, and Risi Kondor. Lorentz group equivariant neural network for particle physics. arXiv preprint arXiv:2006.04780, 2020.
  • Cohen et al. (2018) Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical cnns. arXiv preprint arXiv:1801.10130, 2018.
  • Dai & Xu (2013) Feng Dai and Yuan Xu. Approximation theory and harmonic analysis on spheres and balls, volume 23. Springer, 2013.
  • Esteves et al. (2018) Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 52–68, 2018.
  • Fuchs et al. (2020) Fabian B Fuchs, Daniel E Worrall, Volker Fischer, and Max Welling. SE(3)-Transformers: 3D roto-translation equivariant attention networks. arXiv preprint arXiv:2006.10503, 2020.
  • Fulton & Harris (2013) William Fulton and Joe Harris. Representation theory: a first course, volume 129. Springer Science & Business Media, 2013.
  • Gilmer et al. (2017) Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.
  • Guo et al. (2020) Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep learning for 3d point clouds: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
  • Keriven & Peyré (2019) Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. CoRR, abs/1905.04943, 2019. URL http://arxiv.org/abs/1905.04943.
  • Kondor (2018) Risi Kondor. N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint arXiv:1803.01588, 2018.
  • Kondor et al. (2018a) Risi Kondor, Zhen Lin, and Shubhendu Trivedi. Clebsch–gordan nets: a fully fourier space spherical convolutional neural network. In Advances in Neural Information Processing Systems, pp. 10117–10126, 2018a.
  • Kondor et al. (2018b) Risi Kondor, Hy Truong Son, Horace Pan, Brandon Anderson, and Shubhendu Trivedi. Covariant compositional networks for learning graphs. arXiv preprint arXiv:1801.02144, 2018b.
  • Kraft & Procesi (2000) Hanspeter Kraft and Claudio Procesi. Classical invariant theory, a primer. Lecture Notes, Version, 2000.
  • Li et al. (2019) Jiaxin Li, Yingcai Bi, and Gim Hee Lee. Discrete rotation equivariance for point cloud recognition. In 2019 International Conference on Robotics and Automation (ICRA), pp.  7269–7275. IEEE, 2019.
  • Maehara & NT (2019) Takanori Maehara and Hoang NT. A simple proof of the universality of invariant/equivariant graph neural networks, 2019.
  • Maron et al. (2019a) Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. arXiv preprint arXiv:1905.11136, 2019a.
  • Maron et al. (2019b) Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. In International Conference on Learning Representations, 2019b. URL https://openreview.net/forum?id=Syx72jC9tm.
  • Maron et al. (2019c) Haggai Maron, Ethan Fetaya, Nimrod Segol, and Yaron Lipman. On the universality of invariant networks. In International conference on machine learning, 2019c.
  • Maron et al. (2020) Haggai Maron, Or Litany, Gal Chechik, and Ethan Fetaya. On learning sets of symmetric elements. arXiv preprint arXiv:2002.08599, 2020.
  • Morris et al. (2018) Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. arXiv preprint arXiv:1810.02244, 2018.
  • Poulenard et al. (2019) Adrien Poulenard, Marie-Julie Rakotosaona, Yann Ponty, and Maks Ovsjanikov. Effective rotation-invariant point cnn with spherical harmonics kernels. In 2019 International Conference on 3D Vision (3DV), pp. 47–56. IEEE, 2019.
  • Qi et al. (2017a) Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4, 2017a.
  • Qi et al. (2017b) Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108, 2017b.
  • Ravanbakhsh (2020) Siamak Ravanbakhsh. Universal equivariant multilayer perceptrons. arXiv preprint arXiv:2002.02912, 2020.
  • Segol & Lipman (2019) Nimrod Segol and Yaron Lipman. On universal equivariant set networks. arXiv preprint arXiv:1910.02421, 2019.
  • Serviansky et al. (2020) Hadar Serviansky, Nimrod Segol, Jonathan Shlomi, Kyle Cranmer, Eilam Gross, Haggai Maron, and Yaron Lipman. Set2graph: Learning graphs from sets. arXiv preprint arXiv:2002.08772, 2020.
  • Thomas et al. (2018) Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219, 2018.
  • Wang et al. (2019) Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph cnn for learning on point clouds. Acm Transactions On Graphics (tog), 38(5):1–12, 2019.
  • Weiler et al. (2018) Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco Cohen. 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data. 2018. URL http://arxiv.org/abs/1807.02547.
  • Worrall & Brostow (2018) Daniel Worrall and Gabriel Brostow. Cubenet: Equivariance to 3d rotation and translation. In Proceedings of the European Conference on Computer Vision (ECCV), pp.  567–584, 2018.
  • Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
  • Yarotsky (2018) Dmitry Yarotsky. Universal approximations of invariant maps by neural networks. arXiv preprint arXiv:1804.10306, 2018.
  • Zaheer et al. (2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in neural information processing systems, pp. 3391–3401, 2017.
  • Zhao et al. (2019) Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, and Federico Tombari. Quaternion equivariant capsule networks for 3d point clouds. arXiv preprint arXiv:1912.12098, 2019.

Appendix A Notation

We introduce some notation for the proofs in the appendices. We use the shortened notation $\bar{X}=X-\frac{1}{n}X1_{n}1_{n}^{T}$ and denote the columns of $\bar{X}$ by $(\bar{x}_{1},\ldots,\bar{x}_{n})$. We denote

\Sigma_{T}=\{\vec{r}\in\mathbb{N}^{*}_{+}\,|\,\|\vec{r}\|_{1}=T\}.

Appendix B Proofs for Section 3

B.1 $G$-equivariant polynomials are dense

A first step in proving the denseness of $G$-equivariant polynomials, and in the proof used in the next subsection, is the following simple lemma, which shows that translation invariance can be dealt with simply by centralizing the point cloud.

In the following, $\rho_{W_{T}}$ is some representation of $\mathrm{SO}(3)$ on a finite dimensional real vector space $W_{T}$. This induces an action $\rho_{W_{T}\times S_{n}}$ of $\mathrm{SO}(3)\times S_{n}$ on $W_{T}^{n}$ by

\rho_{W_{T}\times S_{n}}(R,P)(Y)=\rho_{W_{T}}(R)YP^{T}.

This is also the action of $G$ which we consider, $\rho_{G}=\rho_{W_{T}\times S_{n}}$, where we have invariance with respect to the translation coordinate. The action of $G$ on $\mathbb{R}^{3\times n}$ is defined in equation 1.

Lemma 1.

A function $f:\mathbb{R}^{3\times n}\to W_{T}^{n}$ is $G$-equivariant if and only if there exists a function $h$ which is equivariant with respect to the action of $\mathrm{SO}(3)\times S_{n}$ on $\mathbb{R}^{3\times n}$, and

f(X)=h(X-\frac{1}{n}X1_{n}1_{n}^{T}) \qquad (14)
Proof.

Recall that $G$-equivariance means $\mathrm{SO}(3)\times S_{n}$ equivariance and translation invariance. Thus if $f$ is $G$-equivariant then equation 14 holds with $h=f$.

On the other hand, if $f$ satisfies equation 14 then we claim it is $G$-equivariant. Indeed, for all $(t,R,P)\in\mathbb{R}^{3}\rtimes\mathrm{SO}(3)\times S_{n}$, since $P^{T}1_{n}1_{n}^{T}=1_{n}1_{n}^{T}=1_{n}1_{n}^{T}P^{T}$,

f\left(\rho_{G}(t,R,P)(X)\right)=f(R(X-t1_{n}^{T})P^{T})=h\left(R(X-t1_{n}^{T})P^{T}-\frac{1}{n}R(X-t1_{n}^{T})P^{T}1_{n}1_{n}^{T}\right)
=h\left(R(X-\frac{1}{n}X1_{n}1_{n}^{T})P^{T}\right)=h\left(\rho_{\mathbb{R}^{3}\times S_{n}}(R,P)(X-\frac{1}{n}X1_{n}1_{n}^{T})\right)
=\rho_{W_{T}\times S_{n}}(R,P)h\left(X-\frac{1}{n}X1_{n}1_{n}^{T}\right)
=\rho_{G}(t,R,P)f(X). ∎

We now prove the denseness of $G$-equivariant polynomials in the space of continuous $G$-equivariant functions (Lemma 1).

Proof of Lemma 1.

Let $K\subseteq\mathbb{R}^{3\times n}$ be a compact set. We need to show that continuous $G$-equivariant functions can be approximated uniformly on $K$ by $G$-equivariant polynomials. Let $K_{0}$ denote the compact set which is the image of $K$ under the centralizing map $X\mapsto X-\frac{1}{n}X1_{n}1_{n}^{T}$. By Lemma 1, it is sufficient to show that every $\mathrm{SO}(3)\times S_{n}$ equivariant continuous function $f$ can be approximated uniformly on $K_{0}$ by a sequence of $\mathrm{SO}(3)\times S_{n}$ equivariant polynomials $p_{k}$. The argument is now concluded by the following general lemma:

Lemma 2.

Let $G$ be a compact group, let $\rho_{1}$ and $\rho_{2}$ be continuous representations of $G$ on the Euclidean spaces $W_{1}$ and $W_{2}$ (by continuous we mean that the maps $(g,X)\mapsto\rho_{j}(g)X$, $j=1,2$, are jointly continuous), and let $K\subseteq W_{1}$ be a compact set. Then every equivariant function $f:W_{1}\to W_{2}$ can be approximated uniformly on $K$ by a sequence of equivariant polynomials $p_{k}:W_{1}\to W_{2}$.

Let $\mu$ be the Haar probability measure associated with the compact group $G$. Let $K_{1}$ denote the compact set obtained as the image of the compact set $G\times K$ under the continuous mapping

(g,X)\mapsto\rho_{1}(g)X.

Using the Stone-Weierstrass theorem, let $p_{k}$ be a sequence of (not necessarily equivariant) polynomials which approximate $f$ uniformly on $K_{1}$. Every degree $D$ polynomial $p:W_{1}\to W_{2}$ induces a $G$-equivariant function

\langle p\rangle(X)=\int_{G}\rho_{2}(g^{-1})p(\rho_{1}(g)X)d\mu(g).

This function $\langle p\rangle$ is a degree $D$ polynomial as well: this is because $\langle p\rangle$ can be approximated uniformly on $K_{1}$ by “Riemann sums” of the form $\sum_{j=1}^{N}w_{j}\rho_{2}(g_{j}^{-1})p(\rho_{1}(g_{j})X)$, which are degree $D$ polynomials, and because degree $D$ polynomials are closed in $C(K_{1})$.
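The symmetrization $\langle p\rangle$ can also be approximated numerically by replacing the Haar integral with an average over sampled group elements; the sketch below assumes matrix representations $\rho_{1},\rho_{2}$ and is meant only to illustrate the construction.

```python
import numpy as np

def symmetrize(p, rho1, rho2_inv, group_samples, X):
    """Monte Carlo approximation of <p>(X) = integral over G of rho2(g^{-1}) p(rho1(g) X) dmu(g).

    p : callable W_1 -> W_2, rho1(g), rho2_inv(g) : representation matrices, group_samples : list of g.
    """
    terms = [rho2_inv(g) @ p(rho1(g) @ X) for g in group_samples]
    return sum(terms) / len(group_samples)
```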

Now for all $X\in K_{1}$, continuity of the function $g\mapsto\rho_{2}(g^{-1})$ implies that the operator norm of $\rho_{2}(g^{-1})$ is bounded uniformly by some constant $N>0$, and so

|\langle p_{k}\rangle(X)-f(X)|=\left|\int_{G}\rho_{2}(g^{-1})p_{k}(\rho_{1}(g)X)-\rho_{2}(g^{-1})f(\rho_{1}(g)X)\,d\mu(g)\right|
=\left|\int_{G}\rho_{2}(g^{-1})\left[p_{k}(\rho_{1}(g)X)-f(\rho_{1}(g)X)\right]d\mu(g)\right|\leq N\|f-p_{k}\|_{C(K_{1})}\rightarrow 0.

We now prove Theorem 1.

Proof.

By the $D$-spanning assumption, there exist $f_{1},\ldots,f_{K}\in\mathcal{F}_{\mathrm{feat}}$ such that any vector valued polynomial $p:\mathbb{R}^{3\times n}\to\mathbb{R}^{n}$ invariant to translations and equivariant to permutations is of the form

p(X)=\sum_{k=1}^{K}\hat{\Lambda}_{k}(f_{k}(X)), \qquad (15)

where $\Lambda_{k}$ are linear functions to $\mathbb{R}$. If $p$ is a matrix valued polynomial mapping $\mathbb{R}^{3\times n}$ to $W_{T}^{n}=\mathbb{R}^{t\times n}$, which is invariant to translations and equivariant to permutations, then it is of the form $p=(p_{ij})_{i\in[t],j\in[n]}$, and each $p_{i}=(p_{ij})_{j\in[n]}$ is itself invariant to translations and permutation equivariant. It follows that matrix valued $p$ can also be written in the form of equation 15, the only difference being that the image of the linear functions $\Lambda_{k}$ is now $\mathbb{R}^{t}$.

Now let $p:\mathbb{R}^{3\times n}\to W_{T}^{n}$ be a $G$-equivariant polynomial of degree $\leq D$. It remains to show that we can choose the $\Lambda_{k}$ to be $\mathrm{SO}(3)$ equivariant. We do this by a symmetrization argument: denote the Haar probability measure on $\mathrm{SO}(3)$ by $\nu$, and the actions of $\mathrm{SO}(3)$ on $W_{\mathrm{feat}}$ and $W_{T}$ by $\rho_{1}$ and $\rho_{2}$, respectively. Denote $p=(p_{j})_{j=1}^{n}$ and $f_{k}=(f_{k}^{j})_{j=1}^{n}$. For every $j=1,\ldots,n$, we use the $\mathrm{SO}(3)$ equivariance of $p_{j}$ and $f_{k}^{j}$ to obtain

p_{j}(X)=\int_{\mathrm{SO}(3)}\rho_{2}(R^{-1})\circ p_{j}(RX)d\nu(R)=\sum_{k=1}^{K}\int_{\mathrm{SO}(3)}\rho_{2}(R^{-1})\circ\Lambda_{k}\circ f_{k}^{j}(RX)d\nu(R)
=\sum_{k=1}^{K}\int_{\mathrm{SO}(3)}\rho_{2}(R^{-1})\circ\Lambda_{k}\left(\rho_{1}(R)\circ f_{k}^{j}(X)\right)d\nu(R)=\sum_{k=1}^{K}\tilde{\Lambda}_{k}\circ f_{k}^{j}(X),

where $\tilde{\Lambda}_{k}$ stands for the equivariant linear functional from $W_{\mathrm{feat}}$ to $W_{T}$, defined for $w\in W_{\mathrm{feat}}$ by

\tilde{\Lambda}_{k}(w)=\int_{\mathrm{SO}(3)}\rho_{2}(R^{-1})\circ\Lambda_{k}\left(\rho_{1}(R)w\right)d\nu(R).

Thus we have shown that $p$ is in $\mathcal{F}_{C}(\mathcal{F}_{\mathrm{feat}},\mathcal{F}_{\mathrm{pool}})$ for $C=K$, as required. ∎

Appendix C Proofs for Section 4

We now prove Lemma 1.

Proof.

It is known (Segol & Lipman, 2019, Theorem 2) that polynomials $p:\mathbb{R}^{3\times n}\to\mathbb{R}^{n}$ which are $S_{n}$-equivariant are spanned by polynomials of the form $p_{\vec{\alpha}}=(p^{j}_{\vec{\alpha}})_{j=1}^{n}$, defined as

p^{j}_{\vec{\alpha}}(X)=\sum_{i_{2},\ldots,i_{K}=1}^{n}x_{j}^{\alpha_{1}}x_{i_{2}}^{\alpha_{2}}\ldots x_{i_{K}}^{\alpha_{K}} \qquad (16)

where $\vec{\alpha}=(\alpha_{1},\ldots,\alpha_{K})$ and each $\alpha_{k}\in\mathbb{N}_{+}^{3}$ is a multi-index. It follows that $S_{n}$-equivariant polynomials of degree $\leq D$ are spanned by polynomials of the form $p^{j}_{\vec{\alpha}}$ where $\sum_{k=1}^{K}|\alpha_{k}|\leq D$. Denoting $r_{k}=|\alpha_{k}|,k=1,\ldots,K$, the sum of all $r_{k}$ by $T$, and $\vec{r}=(r_{k})_{k=1}^{K}$, we see that there exists a linear functional $\Lambda_{\vec{\alpha},\vec{r}}:\mathcal{T}_{T}\to\mathbb{R}$ such that

p^{j}_{\vec{\alpha}}(X)=\Lambda_{\vec{\alpha},\vec{r}}\circ Q_{j}^{(\vec{r})}(X)

where we recall that $Q^{(\vec{r})}=\left(Q_{j}^{(\vec{r})}(X)\right)_{j=1}^{n}$ is defined in equation 5 as

Q_{j}^{(\vec{r})}(X)=\sum_{i_{2},\ldots,i_{K}=1}^{n}x_{j}^{\otimes r_{1}}\otimes x_{i_{2}}^{\otimes r_{2}}\otimes x_{i_{3}}^{\otimes r_{3}}\otimes\ldots\otimes x_{i_{K}}^{\otimes r_{K}}.

Thus polynomials $p=(p_{j})_{j=1}^{n}$ which are of degree $\leq D$ and are $S_{n}$ equivariant can be written as

p_{j}(X)=\sum_{T\leq D}\sum_{\vec{r}\in\Sigma_{T}}\sum_{\vec{\alpha}:|\alpha_{k}|=r_{k}}\Lambda_{\vec{\alpha},\vec{r}}\left(Q^{(\vec{r})}_{j}(X)\right)=\sum_{T\leq D}\sum_{\vec{r}\in\Sigma_{T}}\Lambda_{\vec{r}}\left(\iota\circ Q^{(\vec{r})}_{j}(X)\right),\quad j=1,\ldots,n,

where $\Lambda_{\vec{r}}=\sum_{\vec{\alpha}:|\alpha_{k}|=r_{k}}\Lambda_{\vec{\alpha},\vec{r}}\circ\iota_{T}^{-1}$, and $\iota_{T}^{-1}$ is the left inverse of the embedding $\iota$. If $p$ is also translation invariant, then

p(X)=p(X-\frac{1}{n}X1_{n}1_{n}^{T})=\sum_{T\leq D}\sum_{\vec{r}\in\Sigma_{T}}\hat{\Lambda}_{\vec{r}}\left(\iota\circ Q^{(\vec{r})}(X-\frac{1}{n}X1_{n}1_{n}^{T})\right).

Thus ${\mathcal{Q}}_{D}$ is $D$-spanning. ∎

We now prove Lemma 2.

Proof.

In this proof we make the dependence of $\mathcal{F}_{\mathrm{feat}}$ on $D$ explicit and denote it by $\mathcal{F}_{\mathrm{feat}}(D)$.

We prove the claim by induction on $D$. Assume $D=0$. Then ${\mathcal{Q}}_{0}$ contains only the constant function $X\mapsto 1_{n}\in\mathcal{T}_{0}^{n}$, and this is precisely the function $\pi_{V}\circ\mathrm{ext}\in\mathcal{F}_{\mathrm{feat}}(0)$.

Now assume the claim holds for all $D^{\prime}$ with $D-1\geq D^{\prime}\geq 0$, and let us prove the claim for $D$. Choose $\vec{r}=(r_{1},\ldots,r_{K})\in\Sigma_{T}$ for some $T\leq D$; we need to show that the function $Q^{(\vec{r})}$ is in $\mathcal{F}_{\mathrm{feat}}(D)$. Since $\mathcal{F}_{\mathrm{feat}}(D-1)\subseteq\mathcal{F}_{\mathrm{feat}}(D)$, we know from the induction hypothesis that this is true if $T<D$. Now assume $T=D$. We consider two cases:

  1.

    If $r_{1}>0$, we set $\tilde{r}=(r_{1}-1,r_{2},\ldots,r_{K})$. We know that $\iota\circ Q^{(\tilde{r})}(\bar{X})\in\mathcal{F}_{\mathrm{feat}}(D-1)$ by the induction hypothesis. So there exist $f_{2},\ldots,f_{D}$ such that

    \iota\circ\pi_{V}\circ f_{2}\circ\ldots\circ f_{D}\circ\mathrm{ext}(\bar{X})=\iota\circ Q^{(\tilde{r})}(\bar{X}). \qquad (17)

    Now choose $f_{1}\in\mathcal{F}_{min}$ to be the function whose $V$ coordinate $\tilde{V}=(\tilde{V}_{j})_{j=1}^{n}$ is given by $\tilde{V}_{j}(X,V)=x_{j}\otimes V_{j}$, obtained by setting $\theta_{1}=1,\theta_{2}=0$ in equation 7. Then we have

    \tilde{V}_{j}(\bar{X},Q^{(\tilde{r})}(\bar{X}))=\sum_{i_{2},\ldots,i_{K}=1}^{n}\bar{x}_{j}\otimes\bar{x}_{j}^{\otimes(r_{1}-1)}\otimes\bar{x}_{i_{2}}^{\otimes r_{2}}\otimes\ldots\otimes\bar{x}_{i_{K}}^{\otimes r_{K}}=Q_{j}^{(\vec{r})}(\bar{X}),

    and so

    \iota\circ\pi_{V}\circ f_{1}\circ f_{2}\circ\ldots\circ f_{D}\circ\mathrm{ext}(X-\frac{1}{n}X1_{n}1_{n}^{T})=\iota\circ Q^{(\vec{r})}(\bar{X}), \qquad (18)

    and so $\iota\circ Q^{(\vec{r})}(X-\frac{1}{n}X1_{n}1_{n}^{T})\in\mathcal{F}_{\mathrm{feat}}(D)$.

  2.

    If $r_{1}=0$, we assume without loss of generality that $r_{2}>0$ and set $\tilde{r}=(r_{2}-1,r_{3},\ldots,r_{K})$. As before, by the induction hypothesis there exist $f_{2},\ldots,f_{D}$ which satisfy equation 17. This time we choose $f_{1}\in\mathcal{F}_{min}$ to be the function whose $V$ coordinate $\tilde{V}=(\tilde{V}_{j})_{j=1}^{n}$ is given by $\tilde{V}_{j}(X,V)=\sum_{i}x_{i}\otimes V_{i}$, obtained by setting $\theta_{1}=0,\theta_{2}=1$ in equation 7. Then we have

    \tilde{V}_{j}(\bar{X},Q^{(\tilde{r})}(\bar{X}))=\sum_{i_{2}=1}^{n}\sum_{i_{3},\ldots,i_{K}=1}^{n}\bar{x}_{i_{2}}\otimes\bar{x}_{i_{2}}^{\otimes(r_{2}-1)}\otimes\bar{x}_{i_{3}}^{\otimes r_{3}}\otimes\ldots\otimes\bar{x}_{i_{K}}^{\otimes r_{K}}=\sum_{i_{2},i_{3},\ldots,i_{K}=1}^{n}\bar{x}_{i_{2}}^{\otimes r_{2}}\otimes\bar{x}_{i_{3}}^{\otimes r_{3}}\otimes\ldots\otimes\bar{x}_{i_{K}}^{\otimes r_{K}}=Q_{j}^{(\vec{r})}(\bar{X}).

    Thus equation 18 holds, and so again we have that $\iota\circ Q^{(\vec{r})}(X-\frac{1}{n}X1_{n}1_{n}^{T})\in\mathcal{F}_{\mathrm{feat}}(D)$. ∎

Finally, we prove Lemma 3.

Proof.

If the conditions in Lemma 3 hold, then since ${\mathcal{Q}}_{D}$ is $D$-spanning, every translation invariant and permutation equivariant polynomial $p$ of degree $D$ can be written as

p(X)=\sum_{\vec{r}:\|\vec{r}\|_{1}\leq D}\hat{\Lambda}_{\vec{r}}\left(\iota\circ Q^{(\vec{r})}(X-\frac{1}{n}X1_{n}1_{n}^{T})\right)=\sum_{\vec{r}:\|\vec{r}\|_{1}\leq D}\hat{\Lambda}_{\vec{r}}\left(\sum_{k=1}^{K_{\vec{r}}}\iota\circ\hat{A}_{k,\vec{r}}f_{k,\vec{r}}(X)\right)=\sum_{\vec{r}:\|\vec{r}\|_{1}\leq D}\sum_{k=1}^{K_{\vec{r}}}\hat{\Lambda}_{k,\vec{r}}\left(f_{k,\vec{r}}(X)\right),

where we denote $\Lambda_{k,\vec{r}}=\Lambda_{\vec{r}}\circ\iota\circ A_{k,\vec{r}}$. Thus we have proved that $\mathcal{F}_{\mathrm{feat}}$ is $D$-spanning. ∎

Appendix D Proofs for Section 5

We now prove Lemma 1.

Proof.

As mentioned in the main text, this lemma is based on Schur's lemma. Schur's lemma is typically stated for complex representations, but holds for odd dimensional real representations as well. We recount the lemma and its proof here for completeness (see also (Fulton & Harris, 2013)).

Lemma 1 (Schur's Lemma for $\mathrm{SO}(3)$).

Let $\Lambda:W_{\ell_{1}}\to W_{\ell_{2}}$ be a linear equivariant map. If $\ell_{1}\neq\ell_{2}$ then $\Lambda=0$. Otherwise $\Lambda$ is a scalar multiple of the identity.

Proof.

Let $\Lambda:W_{\ell_{1}}\to W_{\ell_{2}}$ be a linear equivariant map. The kernel and image of $\Lambda$ are invariant subspaces of $W_{\ell_{1}}$ and $W_{\ell_{2}}$, respectively. It follows that if $\Lambda\neq 0$ then $\Lambda$ is a linear isomorphism, so necessarily $\ell_{1}=\ell_{2}$. Now assume $\ell_{1}=\ell_{2}$. Since the dimension of $W_{\ell_{1}}$ is odd, $\Lambda$ has a real eigenvalue $\lambda$. The linear function $\Lambda-\lambda I$ is equivariant and has a non-trivial kernel, so $\Lambda-\lambda I=0$. ∎

We now return to the proof of Lemma 1. Note that each $\Lambda_{j}:W_{{\bm{l}}^{(1)}}\to W_{\ell_{j}^{(2)}}$ is linear and $\mathrm{SO}(3)$-equivariant. Next, denote the restriction of each $\Lambda_{j}$ to $W_{\ell_{i}^{(1)}}$, $i=1,\ldots,K_{1}$, by $\Lambda_{ij}$, and note that

\Lambda_{j}(V_{1},\ldots,V_{K_{1}})=\sum_{i=1}^{K_{1}}\Lambda_{ij}(V_{i}). (19)

By considering vectors in $W_{{\bm{l}}^{(1)}}$ of the form $(0,\ldots,0,V_{i},0,\ldots,0)$, we see that each $\Lambda_{ij}:W_{\ell_{i}^{(1)}}\to W_{\ell_{j}^{(2)}}$ is linear and $\mathrm{SO}(3)$-equivariant. Thus, by Schur's lemma, if $\ell_{i}^{(1)}=\ell_{j}^{(2)}$ then $\Lambda_{ij}(V_{i})=M_{ij}V_{i}$ for some real $M_{ij}$, and otherwise $M_{ij}=0$. Plugging this into equation 19, we obtain equation 10. ∎
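
As a concrete illustration of the block structure in equation 10 (with an arbitrarily chosen pair of representation lists, used only for this example): for ${\bm{l}}^{(1)}=(0,1,1)$ and ${\bm{l}}^{(2)}=(1,2)$, Schur's lemma forces

\Lambda_{1}(V_{1},V_{2},V_{3})=M_{21}V_{2}+M_{31}V_{3},\qquad\Lambda_{2}(V_{1},V_{2},V_{3})=0,

since only the two input features with $\ell_{2}^{(1)}=\ell_{3}^{(1)}=1$ match $\ell_{1}^{(2)}=1$, and no input feature lies in $W_{2}$.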

We now prove Theorem 2, which shows that the TFN network described in the main text is universal.

Proof.

As mentioned in the main text, we only need to show that the function space $\mathcal{F}_{\mathrm{feat}}(D)$ is $D$-spanning. Recall that $\mathcal{F}_{\mathrm{feat}}(D)$ is obtained by $2D$ consecutive convolutions with $D$-filters. In general, we denote the space of functions defined by applying $J$ consecutive convolutions (with $D$-filters) by $\mathcal{G}_{J,D}$.

If ${\mathcal{Y}}$ is a space of functions from $\mathbb{R}^{3\times n}$ to $Y^{n}$, we denote by $\langle{\mathcal{Y}},\mathcal{T}_{T}\rangle$ the space of all functions $p:\mathbb{R}^{3\times n}\to\mathcal{T}_{T}^{n}$ of the form

p(X)=\sum_{k=1}^{K}\hat{A}_{k}f_{k}(X), (20)

where $A_{k}:Y\to\mathcal{T}_{T}$ are linear functions, $\hat{A}_{k}:Y^{n}\to\mathcal{T}_{T}^{n}$ are induced by elementwise application, and $f_{k}\in{\mathcal{Y}}$. This notation is useful because (i) by Lemma 3 it is sufficient to show that $Q^{(\vec{r})}(\bar{X})$ is in $\langle\mathcal{G}_{2D,D},{\mathcal{T}_{T}}\rangle$ for all $\vec{r}\in\Sigma_{T}$ and all $T\leq D$, and (ii) it enables comparison of the expressive power of function spaces ${\mathcal{Y}}_{1},{\mathcal{Y}}_{2}$ whose elements map to different spaces $Y_{1}^{n},Y_{2}^{n}$, since the elements of $\langle{\mathcal{Y}}_{i},\mathcal{T}_{T}\rangle,i=1,2$ both map to the same space. In particular, note that if for every $f\in{\mathcal{Y}}_{2}$ there is a $g\in{\mathcal{Y}}_{1}$ and a linear map $A:Y_{1}\to Y_{2}$ such that $f(X)=\hat{A}\circ g(X)$, then $\langle{\mathcal{Y}}_{2},\mathcal{T}_{T}\rangle\subseteq\langle{\mathcal{Y}}_{1},\mathcal{T}_{T}\rangle$.
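
To spell out the last claim (a one-line verification; the symbols $B_{k}$, $C_{k}$ and $g_{k}$ are local notation for this remark only): write $p\in\langle{\mathcal{Y}}_{2},\mathcal{T}_{T}\rangle$ as $p=\sum_{k=1}^{K}\hat{B}_{k}f_{k}$ as in equation 20, with $B_{k}:Y_{2}\to\mathcal{T}_{T}$ linear, and write each $f_{k}=\hat{C}_{k}\circ g_{k}$ with $C_{k}:Y_{1}\to Y_{2}$ linear and $g_{k}\in{\mathcal{Y}}_{1}$. Then

p(X)=\sum_{k=1}^{K}\hat{B}_{k}\hat{C}_{k}g_{k}(X)=\sum_{k=1}^{K}\widehat{(B_{k}C_{k})}\,g_{k}(X)\in\langle{\mathcal{Y}}_{1},\mathcal{T}_{T}\rangle,

since a composition of elementwise applications is the elementwise application of the composed linear map $B_{k}C_{k}:Y_{1}\to\mathcal{T}_{T}$.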

We now use this abstract discussion to prove some useful results. The first is that, for the purposes of this proof, we can 'forget about' the multiplication by a unitary matrix in equation 12, which is used for the decomposition into irreducible representations. To see this, denote by $\tilde{\mathcal{G}}_{J,D}$ the function space obtained by taking $J$ consecutive convolutions with $D$-filters without multiplying by a unitary matrix in equation 12. Since Kronecker products of unitary matrices are unitary, the elements of $\mathcal{G}_{J,D}$ and $\tilde{\mathcal{G}}_{J,D}$ differ only by multiplication by a unitary matrix, and thus $\langle\tilde{\mathcal{G}}_{J,D},\mathcal{T}_{T}\rangle\subseteq\langle\mathcal{G}_{J,D},\mathcal{T}_{T}\rangle$ and $\langle\mathcal{G}_{J,D},\mathcal{T}_{T}\rangle\subseteq\langle\tilde{\mathcal{G}}_{J,D},\mathcal{T}_{T}\rangle$, so the two sets are equal.

Next, we prove that adding convolutional layers (enlarging $J$) or taking higher-order filters (enlarging $D$) can only increase the expressive power of the network.

Lemma 2.

For all $J,D,T\in\mathbb{N}_{+}$,

  1.

    $\langle\mathcal{G}_{J,D},{\mathcal{T}_{T}}\rangle\subseteq\langle\mathcal{G}_{J+1,D},{\mathcal{T}_{T}}\rangle$.

  2.

    $\langle\mathcal{G}_{J,D},{\mathcal{T}_{T}}\rangle\subseteq\langle\mathcal{G}_{J,D+1},{\mathcal{T}_{T}}\rangle$.

Proof.

The first claim follows from the fact that every function $f$ in $\langle\mathcal{G}_{J,D},{\mathcal{T}_{T}}\rangle$ can be identified with a function in $\langle\mathcal{G}_{J+1,D},{\mathcal{T}_{T}}\rangle$ by taking the $(J+1)$-th convolutional layer in equation 12 with $\theta_{0}=1$ and $F=0$.

The second claim follows from the fact that $D$-filters can be identified with $(D+1)$-filters whose $(D+1)$-th entry is zero. ∎

The last preliminary lemma we will need is

Lemma 3.

For every $J,D\in\mathbb{N}_{+}$ and every $t,s\in\mathbb{N}_{+}$, if $p\in\langle\mathcal{G}_{J,D},{\mathcal{T}_{t}}\rangle$, then the function $q$ defined by

q_{a}(X)=\sum_{b=1}^{n}(\bar{x}_{a}-\bar{x}_{b})^{\otimes s}\otimes p_{b}(X)

is in $\langle\mathcal{G}_{J+1,D},{\mathcal{T}_{t+s}}\rangle$.

Proof.

This lemma is based on the fact that the space of $s$-homogeneous polynomials on $\mathbb{R}^{3}$ is spanned by polynomials of the form $\|x\|^{s-\ell}Y^{\ell}_{m}(x)$ for $\ell=s,s-2,s-4,\ldots$ (Dai & Xu, 2013). For each such $\ell$ and $s\leq D$, these polynomials can be realized by filters $F^{(\ell)}$ by setting $R^{(\ell)}(\|x\|)=\|x\|^{s}$, so that

F_{m}^{(\ell)}(x)=\|x\|^{s}Y^{\ell}_{m}(\hat{x})=\|x\|^{s-\ell}Y_{m}^{\ell}(x).

For every $D\in\mathbb{N}$ and $s\leq D$, we can construct a $D$-filter $F^{s,D}=(F^{(0)},\ldots,F^{(D)})$ where $F^{(s)},F^{(s-2)},\ldots$ are as defined above and the other filters are zero. Since both the entries of $F^{s,D}(x)$ and the entries of $x^{\otimes s}$ span the space of $s$-homogeneous polynomials on $\mathbb{R}^{3}$, it follows that there exists a linear mapping $B_{s}:W_{{\bm{l}}_{D}}\to\mathcal{T}_{s}$ such that

x^{\otimes s}=B_{s}(F^{s,D}(x)),\quad\forall x\in\mathbb{R}^{3}. (21)

Thus, since $p$ can be written as a sum of compositions of linear mappings with functions in $\mathcal{G}_{J,D}$ as in equation 20, and similarly $x^{\otimes s}$ is obtained as a linear image of functions in $\mathcal{G}_{1,D}$ as in equation 21, we deduce that

q_{a}(X)=\sum_{b=1}^{n}(x_{a}-x_{b})^{\otimes s}\otimes p_{b}(X)=\sum_{b=1}^{n}(\bar{x}_{a}-\bar{x}_{b})^{\otimes s}\otimes p_{b}(X)

is in $\langle\mathcal{G}_{J+1,D},{\mathcal{T}_{t+s}}\rangle$. ∎
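
As a sanity check of equation 21 (our own numerical illustration, not part of the proof), the following short Python sketch verifies, for $s=2$, that the entries of $x^{\otimes 2}$ are a linear image of the entries of the filter $F^{2,2}$; normalization constants of the spherical harmonics are dropped, since they are absorbed into the linear map $B_{2}$.

import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3))                      # random points x in R^3
x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
r2 = x**2 + y**2 + z**2

# Non-zero entries of F^{2,2}(x): the l=0 entry ||x||^2 and the five
# degree-2 real solid harmonics (up to normalization constants).
feats = np.stack([r2, x*y, y*z, x*z, x**2 - y**2, 3*z**2 - r2], axis=1)

# Entries of x (x) x, flattened to 9 numbers per point.
target = np.einsum('ni,nj->nij', pts, pts).reshape(-1, 9)

# Solve feats @ B2 = target for a linear map B2 (equation 21 with s = 2).
B2, *_ = np.linalg.lstsq(feats, target, rcond=None)
print(np.abs(feats @ B2 - target).max())             # ~1e-13: exact up to round-off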

As a final preliminary, we note that $D$-filters can perform an averaging operation by setting $R^{(0)}=1$ and $\theta_{0}=R^{(1)}=\ldots=R^{(D)}=0$ in equation 11 and equation 12. We call this $D$-filter an averaging filter.

We are now ready to prove our claim: we need to show that for every $D,T\in\mathbb{N}_{+}$ with $T\leq D$ and every $\vec{r}\in\Sigma_{T}$, the function $Q^{(\vec{r})}$ is in $\langle\mathcal{G}_{2D,D},{\mathcal{T}_{T}}\rangle$. Due to the inclusion relations in Lemma 2, it is sufficient to prove this for the case $T=D$. We prove this by induction on $D$. For $D=0$, the vectors $\vec{r}\in\Sigma_{0}$ contain only zeros, and so

Q^{(\vec{r})}(\bar{X})=1_{n}=\pi_{V}\circ\mathrm{ext}(X)\in\langle\mathcal{G}_{0,0},{\mathcal{T}_{0}}\rangle.

We now assume the claim is true for all $D^{\prime}$ with $D>D^{\prime}\geq 0$ and prove that it holds for $D$. We need to show that for every $\vec{r}\in\Sigma_{D}$ the function $Q^{(\vec{r})}$ is in $\langle\mathcal{G}_{2D,D},{\mathcal{T}_{D}}\rangle$. We prove this by a second induction, this time on the value of $r_{1}$. For the base case, assume that $\vec{r}\in\Sigma_{D}$ and $r_{1}=0$, and denote by $\tilde{r}$ the vector in $\Sigma_{D-1}$ defined by

\tilde{r}=(r_{2}-1,r_{3},\ldots,r_{K}).

By the induction hypothesis on $D$, we know that $Q^{(\tilde{r})}(\bar{X})\in\langle\mathcal{G}_{2(D-1),D-1},{\mathcal{T}_{D-1}}\rangle$, and so

q_{a}(X) =\sum_{b=1}^{n}(\bar{x}_{a}-\bar{x}_{b})\otimes Q^{(\tilde{r})}_{b}(\bar{X})=\sum_{b=1}^{n}(\bar{x}_{a}-\bar{x}_{b})\otimes\bar{x}_{b}^{\otimes(r_{2}-1)}\otimes\sum_{i_{3},\ldots,i_{K}=1}^{n}\bar{x}_{i_{3}}^{\otimes r_{3}}\otimes\ldots\otimes\bar{x}_{i_{K}}^{\otimes r_{K}}
=\left(\bar{x}_{a}\otimes\sum_{b=1}^{n}Q^{(\tilde{r})}_{b}(\bar{X})\right)-Q_{a}^{(\vec{r})}(\bar{X})

is in $\langle\mathcal{G}_{2D-1,D-1},{\mathcal{T}_{D}}\rangle$ by Lemma 3, which is contained in $\langle\mathcal{G}_{2D-1,D},{\mathcal{T}_{D}}\rangle$ by Lemma 2. Since $\bar{x}_{a}$ has zero mean, while $Q_{a}^{(\vec{r})}(\bar{X})$ does not depend on $a$ (as $r_{1}=0$), applying an averaging filter to $q_{a}$ gives the constant value $-Q_{a}^{(\vec{r})}(\bar{X})$ in each coordinate $a\in[n]$, and so $Q^{(\vec{r})}(\bar{X})$ is in $\langle\mathcal{G}_{2D,D},{\mathcal{T}_{D}}\rangle$.

Now assume the claim is true for all $\vec{r}\in\Sigma_{D}$ whose first coordinate is smaller than some $r_{1}^{\prime}\geq 1$; we now prove the claim when the first coordinate of $\vec{r}$ is equal to $r_{1}^{\prime}$. The vector $\tilde{r}=(r_{2},\ldots,r_{K})$, obtained from $\vec{r}$ by removing the first coordinate, sums to $D^{\prime}=D-r_{1}^{\prime}<D$, and so by the induction hypothesis on $D$ we know that $Q^{(\tilde{r})}\in\langle\mathcal{G}_{2D^{\prime},D^{\prime}},{\mathcal{T}_{D^{\prime}}}\rangle$. By Lemma 3 we obtain a function $q_{a}\in\langle\mathcal{G}_{2D^{\prime}+1,D^{\prime}},{\mathcal{T}_{D}}\rangle\subseteq\langle\mathcal{G}_{2D,D},{\mathcal{T}_{D}}\rangle$ defined by

q_{a}(X) =\sum_{b=1}^{n}(\bar{x}_{a}-\bar{x}_{b})^{\otimes r_{1}}\otimes Q_{b}^{(\tilde{r})}(\bar{X})
=\sum_{b=1}^{n}(\bar{x}_{a}-\bar{x}_{b})^{\otimes r_{1}}\otimes\bar{x}_{b}^{\otimes r_{2}}\otimes\sum_{i_{3},\ldots,i_{K}=1}^{n}\bar{x}_{i_{3}}^{\otimes r_{3}}\otimes\ldots\otimes\bar{x}_{i_{K}}^{\otimes r_{K}}
=Q_{a}^{(\vec{r})}(\bar{X})+\text{additional terms},

where the additional terms are linear combinations of functions of the form $P_{D}Q_{a}^{(r^{\prime})}(\bar{X})$, where $r^{\prime}\in\Sigma_{D}$ has first coordinate smaller than $r_{1}^{\prime}$ and $P_{D}:\mathcal{T}_{D}\to\mathcal{T}_{D}$ is a permutation. By the induction hypothesis on $r_{1}$, each such $Q^{(r^{\prime})}$ is in $\langle\mathcal{G}_{2D,D},{\mathcal{T}_{D}}\rangle$. It follows that $P_{D}Q_{a}^{(r^{\prime})}(\bar{X}),a=1,\ldots,n$, and thus $Q^{(\vec{r})}(\bar{X})$, are in $\langle\mathcal{G}_{2D,D},{\mathcal{T}_{D}}\rangle$ as well. This concludes the proof of Theorem 2. ∎
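
For illustration, in the simplest case $r_{1}^{\prime}=1$ the expansion used in the last step of the proof reads (by directly expanding the display above)

q_{a}(X)=\bar{x}_{a}\otimes\sum_{b=1}^{n}Q_{b}^{(\tilde{r})}(\bar{X})-\sum_{b=1}^{n}\bar{x}_{b}^{\otimes(r_{2}+1)}\otimes\sum_{i_{3},\ldots,i_{K}=1}^{n}\bar{x}_{i_{3}}^{\otimes r_{3}}\otimes\ldots\otimes\bar{x}_{i_{K}}^{\otimes r_{K}}=Q_{a}^{(\vec{r})}(\bar{X})-Q_{a}^{(0,r_{2}+1,r_{3},\ldots,r_{K})}(\bar{X}),

so there is a single additional term, with $P_{D}$ the identity; since its first coordinate is $0<r_{1}^{\prime}$, the induction hypothesis on $r_{1}$ applies to it, and $Q^{(\vec{r})}(\bar{X})$ is recovered by adding it back to $q_{a}$.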

Appendix E Alternative TFN architecture

In this appendix we show that, by replacing the standard TFN convolutional layer with the layer defined in equation 13,

\tilde{V}_{a}(X,V)=U({\bm{l}}_{f},{\bm{l}}_{i})\left(\theta_{1}\sum_{b=1}^{n}F(x_{a}-x_{b})\otimes V_{b}+\theta_{2}\sum_{b=1}^{n}F(x_{a}-x_{b})\otimes V_{a}\right),

we can obtain $D$-spanning networks using $2D$ consecutive convolutions with $1$-filters (that is, filters in $W_{{\bm{l}}_{1}}$, where ${\bm{l}}_{1}=[0,1]^{T}$). Our discussion here is somewhat informal, meant to provide the general ideas without delving into the details as we have done for the standard TFN architecture in the proof of Theorem 2. At the end of our discussion we will explain what is necessary to make this argument completely rigorous.

We will only need two fixed filters for our argument: the first is the $1$-filter $F_{Id}=(F^{(0)},F^{(1)})$, defined by setting $R^{(0)}(\|x\|)=0$ and $R^{(1)}(\|x\|)=\|x\|$, to obtain

F_{Id}(x)=\|x\|Y^{1}(\hat{x})=\|x\|\hat{x}=x.

The second is the filter $F_{1}$, defined by setting $R^{(0)}(\|x\|)=1$ and $R^{(1)}(\|x\|)=0$, so that

F_{1}(x)=1.

We prove our claim by showing that a pair of convolutions with $1$-filters can implement any convolutional layer defined in equation 7 for the $D$-spanning architecture based on tensor representations. The claim then follows from the fact that $D$ convolutions of the latter architecture suffice for $D$-spanning, as shown in Lemma 2.

Convolutions for tensor representations, defined in equation 7, are composed of two terms:

\tilde{V}_{a}^{\mathrm{tensor},1}(\bar{X},V)=\bar{x}_{a}\otimes V_{a}\quad\text{and}\quad\tilde{V}_{a}^{\mathrm{tensor},2}(\bar{X},V)=\sum_{b=1}^{n}\bar{x}_{b}\otimes V_{b}.

To obtain the first term $\tilde{V}_{a}^{\mathrm{tensor},1}$, we set $\theta_{1}=0,\theta_{2}=1/n,F=F_{Id}$ in equation 13 and obtain (the decomposition into irreducibles of) $\tilde{V}_{a}^{\mathrm{tensor},1}(\bar{X},V)=\bar{x}_{a}\otimes V_{a}$. Thus this term can in fact be expressed by a single convolution. We can leave this outcome unchanged by a second convolution, defined by setting $\theta_{1}=0,\theta_{2}=1/n,F=F_{1}$.

To obtain the second term $\tilde{V}_{a}^{\mathrm{tensor},2}$, we apply a first convolution with $\theta_{1}=-1,F=F_{Id},\theta_{2}=0$ to obtain

\sum_{b=1}^{n}(x_{b}-x_{a})\otimes V_{b}=\sum_{b=1}^{n}(\bar{x}_{b}-\bar{x}_{a})\otimes V_{b}=\tilde{V}_{a}^{\mathrm{tensor},2}(\bar{X},V)-\bar{x}_{a}\otimes\sum_{b=1}^{n}V_{b}.

By applying an additional averaging filter, defined by setting $\theta_{1}=\frac{1}{n},F=F_{1},\theta_{2}=0$, we obtain $\tilde{V}_{a}^{\mathrm{tensor},2}(\bar{X},V)$. This concludes our 'informal proof'.
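
The following minimal Python sketch (our own illustration; the helper conv and the variable names are hypothetical, and the unitary matrix $U$ and the decomposition into irreducibles are dropped, exactly as in the informal argument above) checks numerically that the two choices of $(\theta_{1},\theta_{2},F)$ described above recover the two tensor terms.

import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4
X = rng.normal(size=(3, n))                    # point cloud, column a is x_a
V = rng.normal(size=(n, d))                    # an arbitrary feature V_a per point
Xbar = X - X.mean(axis=1, keepdims=True)       # centered points \bar{x}_a

def conv(X, V, theta1, theta2, F):
    # Equation 13 without the unitary matrix U:
    #   theta1 * sum_b F(x_a - x_b) (x) V_b + theta2 * sum_b F(x_a - x_b) (x) V_a
    n = X.shape[1]
    out = []
    for a in range(n):
        Fab = [F(X[:, a] - X[:, b]) for b in range(n)]
        t1 = sum(np.tensordot(Fab[b], V[b], axes=0) for b in range(n))
        t2 = sum(np.tensordot(Fab[b], V[a], axes=0) for b in range(n))
        out.append(theta1 * t1 + theta2 * t2)
    return np.stack(out)

F_id = lambda x: x                             # the 1-filter F_Id(x) = x
F_one = lambda x: np.ones(1)                   # the 1-filter F_1(x) = 1

# First tensor term, \bar{x}_a (x) V_a, from a single convolution.
term1 = conv(X, V, theta1=0.0, theta2=1.0 / n, F=F_id)
assert np.allclose(term1, np.einsum('ia,aj->aij', Xbar, V))

# Second tensor term, sum_b \bar{x}_b (x) V_b, from two consecutive convolutions.
step = conv(X, V, theta1=-1.0, theta2=0.0, F=F_id)        # sum_b (x_b - x_a) (x) V_b
term2 = conv(X, step.reshape(n, -1), theta1=1.0 / n, theta2=0.0, F=F_one)
assert np.allclose(term2.reshape(n, 3, d), Xbar @ V)      # same value at every point a
print("both constructions verified")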

Our discussion here has been somewhat inaccurate, since in practice $F_{Id}(x)=(0,x)\in W_{0}\oplus W_{1}$ and $F_{1}(x)=(1,0)\in W_{0}\oplus W_{1}$. Moreover, in our proof we have glossed over the multiplication by the unitary matrix used to obtain the decomposition into irreducible representations. However, the ideas discussed here can be used to show that $2D$ convolutions with $1$-filters can satisfy the sufficient condition for $D$-spanning defined in Lemma 3. See our treatment of Theorem 2 for more details.

Appendix F Comparison with original TFN paper

In this Appendix we discuss three superficial differences between the presentation of the TFN architecture in Thomas et al. (2018) and our presentation here:

  1.

    We define convolutional layers between features residing in direct sums of irreducible representations, while (Thomas et al., 2018) focuses on features which inhabit a single irreducible representation. This difference is non-essential, as direct sums of irreducible representations can be represented as multiple channels where each feature inhabits a single irreducible representation.

  2.

    The term $\theta_{0}V_{a}$ in equation 12 appears in (Fuchs et al., 2020), but does not appear explicitly in (Thomas et al., 2018). However, it can be obtained by concatenating the input of a self-interaction layer to its output and then applying a self-interaction layer.

  3.

    We take the scalar functions $R^{(\ell)}$ to be polynomials, while (Thomas et al., 2018) take them to be fully connected networks composed with radial basis functions. Using polynomial scalar bases is convenient for our presentation, since it enables exact expression of equivariant polynomials. Replacing polynomial bases with fully connected networks, we obtain approximation of equivariant polynomials instead of exact expression. It can be shown that if $p$ is a $G$-equivariant polynomial which can be expressed by some network in $\mathcal{F}_{C,D}$ defined with filters coming from a polynomial scalar basis, then $p$ can be approximated on a compact set $K$, up to an arbitrary $\epsilon$ error, by a similar network with scalar functions coming from a sufficiently large fully connected network.

Appendix G Tensor Universality

In this section we show how to construct the complete set $\mathcal{F}_{\mathrm{pool}}$ of linear $\mathrm{SO}(3)$-invariant functionals from $W_{\mathrm{feat}}^{\mathcal{T}}=\bigoplus_{T=0}^{D}\mathcal{T}_{T}$ to $\mathbb{R}$. Since each such functional $\Lambda$ is of the form

\Lambda(w_{0},\ldots,w_{D})=\sum_{T=0}^{D}\Lambda_{T}(w_{T}),

where each $\Lambda_{T}$ is $\mathrm{SO}(3)$-invariant, it is sufficient to characterize all linear $\mathrm{SO}(3)$-invariant functionals $\Lambda:\mathcal{T}_{D}\to\mathbb{R}$.

It will be convenient to denote

W=\mathbb{R}^{3}\quad\text{and}\quad W^{\otimes D}\cong\mathbb{R}^{3^{D}}=\mathcal{T}_{D}.

We achieve our characterization using the bijective correspondence between linear functionals $\Lambda:W^{\otimes D}\to\mathbb{R}$ and multi-linear functionals $\tilde{\Lambda}:W^{D}\to\mathbb{R}$: each such $\Lambda$ corresponds to a unique $\tilde{\Lambda}$, such that

\tilde{\Lambda}(e_{i_{1}},\ldots,e_{i_{D}})=\Lambda(e_{i_{1}}\otimes\ldots\otimes e_{i_{D}}),\quad\forall(i_{1},\ldots,i_{D})\in[3]^{D}, (22)

where $e_{1},e_{2},e_{3}$ denote the standard basis elements of $\mathbb{R}^{3}$. We define a spanning set of invariant linear functionals on $W^{\otimes D}$ via a corresponding characterization for multi-linear functionals on $W^{D}$. Specifically, set

K_{D}=\{k\in\mathbb{N}_{\geq 0}\,|\,D-3k\text{ is even and non-negative}\}.

For $k\in K_{D}$ we define a multi-linear functional:

\tilde{\Lambda}_{k}(w_{1},\ldots,w_{D})=\det(w_{1},w_{2},w_{3})\times\ldots\times\det(w_{3k-2},w_{3k-1},w_{3k})\times\langle w_{3k+1},w_{3k+2}\rangle\times\ldots\times\langle w_{D-1},w_{D}\rangle, (23)

and for $(k,\sigma)\in K_{D}\times S_{D}$ we define

\tilde{\Lambda}_{k,\sigma}(w_{1},\ldots,w_{D})=\tilde{\Lambda}_{k}(w_{\sigma(1)},\ldots,w_{\sigma(D)}). (24)
Proposition 1.

The space of linear invariant functionals from $\mathcal{T}_{D}$ to $\mathbb{R}$ is spanned by the set of linear invariant functionals $\lambda_{D}=\{\Lambda_{k,\sigma}\,|\,(k,\sigma)\in K_{D}\times S_{D}\}$ induced by the multi-linear functionals $\tilde{\Lambda}_{k,\sigma}$ described in equation 23 and equation 24.

We note that (i) equation 22 provides a (cumbersome) way to compute all linear invariant functionals $\Lambda_{k,\sigma}$ explicitly, by evaluating the corresponding $\tilde{\Lambda}_{k,\sigma}$ on the $3^{D}$ elements of the standard basis, and (ii) the set $\lambda_{D}$ is spanning but not linearly independent. For example, since $\langle w_{1},w_{2}\rangle=\langle w_{2},w_{1}\rangle$, the space of $\mathrm{SO}(3)$-invariant functionals on $\mathcal{T}_{2}=W^{\otimes 2}$ is one-dimensional, while $|\lambda_{2}|=2$.
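
As a quick numerical sanity check (our own illustration, not part of the paper's argument), the invariance of the functionals induced by $\langle w_{1},w_{2}\rangle$ for $D=2$ (contraction with the identity, i.e. the trace) and by $\det(w_{1},w_{2},w_{3})$ for $D=3$ (contraction with the Levi-Civita symbol) under the action $\rho(R)=R^{\otimes D}$ can be verified as follows.

import numpy as np

rng = np.random.default_rng(0)

def random_rotation():
    # A random matrix in SO(3) via QR, flipping the sign if det = -1.
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return Q if np.linalg.det(Q) > 0 else -Q

R = random_rotation()

# D = 2: Lambda(T) = sum_i T_ii corresponds to <w1, w2> via equation 22.
T2 = rng.normal(size=(3, 3))
T2_rot = np.einsum('ia,jb,ab->ij', R, R, T2)               # action of R^{(x)2}
assert np.isclose(np.trace(T2), np.trace(T2_rot))

# D = 3: Lambda(T) = sum_ijk eps_ijk T_ijk corresponds to det(w1, w2, w3).
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0                 # Levi-Civita symbol
T3 = rng.normal(size=(3, 3, 3))
T3_rot = np.einsum('ia,jb,kc,abc->ijk', R, R, R, T3)       # action of R^{(x)3}
assert np.isclose(np.einsum('ijk,ijk->', eps, T3), np.einsum('ijk,ijk->', eps, T3_rot))
print("both invariances verified")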

Proof of Proposition 1.

We first show that the bijective correspondence between linear functionals $\Lambda:W^{\otimes D}\to\mathbb{R}$ and multi-linear functionals $\tilde{\Lambda}:W^{D}\to\mathbb{R}$ extends to a bijective correspondence between $\mathrm{SO}(3)$-invariant linear/multi-linear functionals. The action of $\mathrm{SO}(3)$ on $W^{D}$ is defined by

\tilde{\rho}(R)(w_{1},\ldots,w_{D})=(Rw_{1},\ldots,Rw_{D}).

The action $\rho(R)=R^{\otimes D}$ of $\mathrm{SO}(3)$ on $W^{\otimes D}$ is such that the map

(w_{1},\ldots,w_{D})\mapsto w_{1}\otimes w_{2}\otimes\ldots\otimes w_{D}

is $\mathrm{SO}(3)$-equivariant. It follows that if $\tilde{\Lambda}$ and $\Lambda$ satisfy equation 22, then for all $R\in\mathrm{SO}(3)$ the same equation holds for the pair $\tilde{\Lambda}\circ\tilde{\rho}(R)$ and $\Lambda\circ\rho(R)$. Thus $\mathrm{SO}(3)$-invariance of $\tilde{\Lambda}$ is equivalent to $\mathrm{SO}(3)$-invariance of $\Lambda$.

Multi-linear functionals on $W^{D}$ invariant to $\tilde{\rho}$ are a subset of the set of polynomials on $W^{D}$ invariant to $\tilde{\rho}$. It is known (see (Kraft & Procesi, 2000), page 114) that all such polynomials are algebraically generated by functions of the form

\det(w_{i_{1}},w_{i_{2}},w_{i_{3}})\text{ and }\langle w_{j_{1}},w_{j_{2}}\rangle,\text{ where }i_{1},i_{2},i_{3},j_{1},j_{2}\in[D].

Equivalently, $\mathrm{SO}(3)$-invariant polynomials are spanned by polynomials of the form

\det(w_{i_{1}},w_{i_{2}},w_{i_{3}})\det(w_{i_{4}},w_{i_{5}},w_{i_{6}})\ldots\langle w_{j_{1}},w_{j_{2}}\rangle\langle w_{j_{3}},w_{j_{4}}\rangle\ldots. (25)

When restricting to the subset of multi-linear invariant polynomials, we see that they must be spanned by polynomials as in equation 25 in which each of $w_{1},\ldots,w_{D}$ appears exactly once. These precisely correspond to the functions in $\lambda_{D}$. ∎