On the Universality of Rotation Equivariant Point Cloud Networks
Abstract
Learning functions on point clouds has applications in many fields, including computer vision, computer graphics, physics, and chemistry. Recently, there has been a growing interest in neural architectures that are invariant or equivariant to all three shape-preserving transformations of point clouds: translation, rotation, and permutation. In this paper, we present a first study of the approximation power of these architectures. We first derive two sufficient conditions for an equivariant architecture to have the universal approximation property, based on a novel characterization of the space of equivariant polynomials. We then use these conditions to show that two recently suggested models (Thomas et al., 2018; Fuchs et al., 2020) are universal, and to devise two other novel universal architectures.
1 Introduction
Designing neural networks that respect data symmetry is a powerful approach for obtaining efficient deep models. Prominent examples include convolutional networks, which respect the translational invariance of images; graph neural networks, which respect the permutation invariance of graphs (Gilmer et al., 2017; Maron et al., 2019b); networks such as (Zaheer et al., 2017; Qi et al., 2017a), which respect the permutation invariance of sets; and networks that respect 3D rotational symmetries (Cohen et al., 2018; Weiler et al., 2018; Esteves et al., 2018; Worrall & Brostow, 2018; Kondor et al., 2018a).
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/514df3c2-f687-41c9-b21e-1948160d8f2d/universality.png)
While the expressive power of equivariant models is restricted by design to equivariant functions, a desirable property of equivariant networks is universality: the ability to approximate any continuous equivariant function. This is not always the case: while convolutional networks and networks for sets are universal (Yarotsky, 2018; Segol & Lipman, 2019), popular graph neural networks are not (Xu et al., 2019; Morris et al., 2018).
In this paper, we consider the universality of networks that respect the symmetries of 3D point clouds: translations, rotations, and permutations. Designing such networks has been a popular paradigm in recent years (Thomas et al., 2018; Fuchs et al., 2020; Poulenard et al., 2019; Zhao et al., 2019). While there have been many works on the universality of permutation invariant networks (Zaheer et al., 2017; Maron et al., 2019c; Keriven & Peyré, 2019), and a recent work discussing the universality of rotation equivariant networks (Bogatskiy et al., 2020), this is the first paper to discuss the universality of networks that combine rotations, permutations, and translations.
We start the paper with a general, architecture-agnostic discussion, and derive two sufficient conditions for universality. These conditions are a result of a novel characterization of equivariant polynomials for the symmetry group of interest. We use these conditions to prove the universality of the prominent Tensor Field Networks (TFN) architecture (Thomas et al., 2018; Fuchs et al., 2020). The following is a weakened and simplified statement of Theorem 2, which is stated later in the paper:
Theorem (Simplification of Theorem 2).
Any continuous equivariant function on point clouds can be approximated uniformly on compact sets by a composition of TFN layers.
We use our general discussion to prove the universality of two additional equivariant models: the first is a simple modification of the TFN architecture which allows for universality using only low dimensional filters. The second is a minimal architecture based on tensor product representations, rather than the more commonly used irreducible representations of SO(3). We discuss the advantages and disadvantages of both approaches.
To summarize, the contributions of this paper are: (1) A general approach for proving the universality of rotation equivariant models for point clouds; (2) A proof that two recent equivariant models (Thomas et al., 2018; Fuchs et al., 2020) are universal; (3) Two additional simple and novel universal architectures.
2 Previous work
Deep learning on point clouds. (Qi et al., 2017a; Zaheer et al., 2017) were the first to apply neural networks directly to raw point cloud data, by using pointwise functions and pooling operations. Many subsequent works used local neighborhood information (Qi et al., 2017b; Wang et al., 2019; Atzmon et al., 2018). We refer the reader to a recent survey for more details (Guo et al., 2020). In contrast with the aforementioned works, which focused solely on permutation invariance, more related to this paper are works that additionally incorporated invariance to rigid motions. (Thomas et al., 2018) proposed Tensor Field Networks (TFN) and showed their efficacy on physics and chemistry tasks. (Kondor et al., 2018b) also suggested an equivariant model for continuous rotations. (Li et al., 2019) suggested models that are equivariant to discrete subgroups of the rotation group. (Poulenard et al., 2019) suggested an invariant model based on spherical harmonics. (Fuchs et al., 2020) followed TFN and added an attention mechanism. Recently, (Zhao et al., 2019) proposed a quaternion equivariant point capsule network that also achieves rotation and translation invariance.
Universal approximation for invariant networks. Understanding the approximation power of invariant models is a popular research goal. Most of the current results assume that the symmetry group is a permutation group. (Zaheer et al., 2017; Qi et al., 2017a; Segol & Lipman, 2019; Maron et al., 2020; Serviansky et al., 2020) proved universality for several invariant and equivariant models. (Maron et al., 2019b; a; Keriven & Peyré, 2019; Maehara & NT, 2019) studied the approximation power of high-order graph neural networks. (Maron et al., 2019c; Ravanbakhsh, 2020) targeted universality of networks that use high-order representations for permutation groups. Yarotsky (2018) provided several theoretical constructions of universal equivariant neural network models based on polynomial invariants, including an equivariant model. In a recent work, Bogatskiy et al. (2020) presented a universal approximation theorem for networks that are equivariant to several Lie groups, including the Lorentz group. The main difference from our paper is that we prove a universality theorem for a more complex group that, besides rotations, also includes translations and permutations.
3 A framework for proving universality
In this section, we describe a framework for proving the universality of equivariant networks. We begin with some mathematical preliminaries:
3.1 Mathematical setup
An action of a group G on a real vector space V is a collection of maps ρ(g): V → V, defined for every g ∈ G, such that ρ(g)ρ(h) = ρ(gh) for all g, h ∈ G, and the identity element of G is mapped to the identity mapping on V. We say V is a representation of G if ρ(g) is a linear map for every g ∈ G. As is customary, when it does not cause confusion we often say that V itself is a representation of G.
In this paper, we are interested in functions on point clouds. Point clouds are sets of n vectors in R^3, arranged as 3 × n matrices X = (x_1, …, x_n) ∈ R^{3×n}.
Many machine learning tasks on point clouds, such as classification, aim to learn a function which is invariant to rigid motions and relabeling of the points. Put differently, such functions are required to be invariant to the joint action of translations, rotations, and permutations on R^{3×n} via
(t, R, P) · X = R X P + t 1^T,   (1)
where t ∈ R^3 defines a translation, R ∈ SO(3) is a rotation, P is an n × n permutation matrix, and 1 ∈ R^n denotes the all-ones vector.
Equivariant functions are generalizations of invariant functions: if a group G acts on a space V via some action ρ1, and on a space W via some other group action ρ2, we say that a function f: V → W is equivariant if f(ρ1(g)v) = ρ2(g)f(v) for all g ∈ G and v ∈ V.
Invariant functions correspond to the special case where ρ2(g) is the identity mapping for all g ∈ G.
In some machine learning tasks on point clouds, the functions learned are not invariant but rather equivariant. For example, segmentation tasks assign a discrete label to each point. They are invariant to translations and rotations but equivariant to permutations – in the sense that permuting the input causes a corresponding permutation of the output. Another example is predicting a normal for each point of a point cloud. This task is invariant to translations but equivariant to both rotations and permutations.
In this paper, we are interested in learning equivariant functions from point clouds into n copies of some target representation of the rotation group. The equivariance of these functions is with respect to the action on point clouds defined in equation 1, and the action on the target defined by applying the rotation action from the left and the permutation action from the right as in equation 1, but ‘ignoring’ the translation component. Thus, such equivariant functions will be translation invariant. This formulation of equivariance includes the normal prediction example, where the target representation is R^3 itself, as well as the segmentation case, where the target is the trivial identity representation. We focus on the harder case of functions whose outputs are equivariant to permutations, since it easily implies the easier case of permutation invariant functions.
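To make these symmetry requirements concrete, the following small numerical sketch (our illustration, not part of the paper's constructions) checks the two cases on random data: a translation/rotation/permutation invariant function (sum of pairwise distances) and a translation invariant but rotation- and permutation-equivariant function (centering the point cloud). The specific test functions are illustrative choices.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
n = 7
X = rng.normal(size=(3, n))                       # point cloud: columns are points

def f_inv(X):
    # invariant to translations, rotations and permutations: sum of pairwise distances
    diffs = X[:, :, None] - X[:, None, :]
    return np.linalg.norm(diffs, axis=0).sum()

def f_eq(X):
    # translation invariant, rotation- and permutation-equivariant: centering
    return X - X.mean(axis=1, keepdims=True)

t = rng.normal(size=(3, 1))                       # translation
R = Rotation.random(random_state=1).as_matrix()   # rotation
P = np.eye(n)[rng.permutation(n)]                 # permutation matrix

Y = R @ X @ P + t                                 # group action on the point cloud
assert np.isclose(f_inv(Y), f_inv(X))
assert np.allclose(f_eq(Y), R @ f_eq(X) @ P)      # translation is 'ignored' in the output
print("invariance and equivariance checks passed")
```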
Notation. We use the notation and . We set and .
Proofs. Proofs appear in the appendices, arranged according to sections.
3.2 Conditions for universality
The semi-lifted approach
In general, highly expressive equivariant neural networks can be achieved by using a ‘lifted approach’, where intermediate features in the network belong to high dimensional representations of the group. In the context of point clouds, where the number of points n is typically large, many papers, e.g., (Thomas et al., 2018; Kondor, 2018; Bogatskiy et al., 2020), use a ‘semi-lifted’ approach, where hidden layers hold only higher dimensional representations of SO(3), but not high order permutation representations. In this subsection, we propose a strategy for achieving universality with the semi-lifted approach.
We begin with an axiomatic formulation of the semi-lifted approach (see illustration in inset): we assume that our neural networks are composed of two main components. The first component is a family of parametric continuous equivariant functions which map the original point cloud to a semi-lifted point cloud, whose per-point features lie in a lifted representation of SO(3).
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/514df3c2-f687-41c9-b21e-1948160d8f2d/network_parts.png)
The second component is a family of parametric linear SO(3)-equivariant functions, which map from the high-order lifted representation down to the target representation. Each such SO(3)-equivariant function can be extended to an equivariant function on the semi-lifted point cloud by applying it elementwise. For every positive integer (the number of channels), these two families of functions induce a family of functions obtained by summing different compositions of these functions:
(2)
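Schematically, each channel in equation 2 composes a lift with an elementwise linear pooling map, and the channels are summed. The toy sketch below (our own instantiation, not an architecture from the paper) uses a hand-coded lift into the tensor representation R^{3×3} and the trace as the linear SO(3)-equivariant map into the trivial representation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                       # columns are points

def lift(X):
    """Toy equivariant lift: per point, the outer product of the centered
    coordinate with itself, an element of the tensor representation R^{3x3}."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return np.einsum('ai,bi->iab', Xc, Xc)        # shape (n, 3, 3)

def pool(T):
    """Toy linear SO(3)-equivariant map into the trivial representation:
    the trace, applied elementwise to every point's tensor."""
    return np.trace(T, axis1=1, axis2=2)          # shape (n,)

def net(X, channel_weights):
    """Sum of channel-wise compositions, mirroring the structure of equation 2.
    In a real architecture each channel would use a different lift / pooling map;
    here the channels differ only by a scalar weight, for brevity."""
    return sum(w * pool(lift(X)) for w in channel_weights)

print(net(X, channel_weights=[0.5, -1.0]))        # one invariant scalar per point
```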
Conditions for universality
We now describe sufficient conditions for universality using the semi-lifted approach. The first step is showing, as in (Yarotsky, 2018), that continuous -equivariant functions can be approximated by -equivariant polynomials .
Lemma 1.
Any continuous -equivariant function in can be approximated uniformly on compact sets by -equivariant polynomials in .
Universality is now reduced to the approximation of -equivariant polynomials. We provide two sufficient conditions which guarantee that -equivariant polynomials of degree can be expressed by function spaces as defined in equation 2.
The first sufficient condition is that is able to represent all polynomials which are translation invariant and permutation-equivariant (but not necessarily equivariant to rotations). More precisely:
Definition 1 (-spanning).
For , let be a subset of . We say that is -spanning, if there exist , such that every polynomial of degree which is invariant to translations and equivariant to permutations, can be written as
(3)
where are all linear functionals, and are the functions defined by elementwise applications of .
In Lemma 3 we provide a concrete condition which implies -spanning.
The second sufficient condition is that the set of linear equivariant functionals contains all possible equivariant linear functionals:
Definition 2 (Linear universality).
We say that a collection of equivariant linear functionals between two representations and of is linearly universal, if it contains all linear -equivariant mappings between the two representations.
When these two sufficient conditions hold, a rather simple symmetrization argument leads to the following theorem:
Theorem 1.
If is -spanning and is linearly universal, then there exists some such that for all the function space contains all -equivariant polynomials of degree .
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/514df3c2-f687-41c9-b21e-1948160d8f2d/function_spaces.png)
Corollary 1.
For all , let denote function spaces generated by a pair of functions spaces which are -spanning and linearly universal as in equation 2. Then any continuous -equivariant function in can be approximated uniformly on compact sets by equivariant functions in
3.3 Sufficient conditions in action
In the remainder of the paper, we prove the universality of several -equivariant architectures, based on the framework we discussed in the previous subsection. We discuss two different strategies for achieving universality, which differ mainly in the type of lifted representations of SO(3) they use: (i) the first strategy uses (direct sums of) tensor-product representations; (ii) the second uses (direct sums of) irreducible representations. The main advantage of the first strategy, from the perspective of our methodology, is that achieving the -spanning property is more straightforward. The advantage of irreducible representations is that they almost automatically guarantee the linear universality property.
In Section 4 we discuss universality through tensor product representations, and give an example of a minimal tensor representation network architecture that satisfies universality. In Section 5 we discuss universality through irreducible representations, which is currently the more common strategy. We show that the TFN architecture (Thomas et al., 2018; Fuchs et al., 2020), which follows this strategy, is universal, and describe a simple tweak that achieves universality using only low order filters, though the representations throughout the network are high dimensional.
4 Universality with tensor representations
In this section, we prove universality for models that are based on tensor product representations, as defined below. The main advantage of this approach is that -spanning is achieved rather easily. The main drawbacks are that its data representation is somewhat redundant and that characterizing the linear equivariant layers is more laborious.
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/514df3c2-f687-41c9-b21e-1948160d8f2d/tensor_prod_rep.png)
Tensor representations. We begin by defining tensor representations. For a non-negative integer k, SO(3) acts on R^{3^k} by the tensor product representation, i.e., by applying the matrix Kronecker product k times: R ↦ R ⊗ R ⊗ ⋯ ⊗ R. The inset illustrates the vector spaces and the action for small values of k. With this action, for any k, the map from R^3 to R^{3^k} defined by
x ↦ x ⊗ x ⊗ ⋯ ⊗ x (k times)   (4)
is equivariant.
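A quick numerical check of this equivariance (a sketch we add for illustration): the k-fold Kronecker power of a rotated vector equals the k-fold Kronecker power of the rotation applied to the k-fold Kronecker power of the vector.

```python
import numpy as np
from functools import reduce
from scipy.spatial.transform import Rotation

def kron_power(a, k):
    """k-fold Kronecker power a ⊗ ... ⊗ a (works for vectors and square matrices)."""
    unit = np.array([[1.0]]) if a.ndim == 2 else np.array([1.0])
    return reduce(np.kron, [a] * k, unit)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
R = Rotation.random(random_state=1).as_matrix()
k = 3

lhs = kron_power(R @ x, k)                        # lift the rotated point
rhs = kron_power(R, k) @ kron_power(x, k)         # rotate in the tensor representation
assert np.allclose(lhs, rhs)
print("x -> x^(⊗k) intertwines the two actions for k =", k)
```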
A -spanning family We now show that tensor representations can be used to define a finite set of -spanning functions. The lifted representation will be given by
The -spanning functions are indexed by vectors , where each is a non-negative integer. The functions are defined for fixed by
(5)
The functions are equivariant as they are a sum of equivariant functions from equation 4. Thus is equivariant. The motivation behind the definition of these functions is that known characterizations of permutation equivariant polynomials (Segol & Lipman, 2019) tell us that the entries of these tensor valued functions span all permutation equivariant polynomials (see the proof of Lemma 1 for more details). To account for translation invariance, we compose the functions with the centralization operation and define the set of functions
(6)
where is the natural embedding that takes each into . In the following lemma, we prove that this set is -spanning.
Lemma 1.
For every , the set is -spanning.
A minimal universal architecture Once we have shown that is -spanning, we can design -spanning architectures, by devising architectures that are able to span all elements of . As we will now show, the compositional nature of neural networks allows us to do this in a very clean manner.
We define a parametric function which maps to as follows: For all , we have , where
(7)
We denote the set of functions obtained by choosing the parameters , by . While in the hidden layers of our network the data is represented using both coordinates , the input to the network only contains an coordinate and the output only contains a coordinate. To this end, we define the functions
(8)
We can achieve -spanning by composition of functions in with these functions and centralizing:
Lemma 2.
The function set is contained in
(9)
Thus is -spanning.
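As a rough illustration of the kind of layer this construction uses, the sketch below (ours, and assumption-laden: the exact parametric layer of equation 7 may differ) stacks a hypothetical layer combining a per-point tensor product with an averaged, pooled tensor product, showing how repeated layers reach higher and higher tensor powers of the centralized coordinates.

```python
import numpy as np

def tensor_layer(X, Y, theta1, theta2):
    """Hypothetical layer in the spirit of equation 7 (an assumption: the exact
    layer may differ). It combines a per-point tensor product x_i ⊗ y_i with a
    pooled term averaged over the whole cloud.
    X: (3, n) coordinates; Y: (n, d) current per-point tensor features."""
    n, d = Y.shape
    per_point = np.einsum('ai,ib->iab', X, Y).reshape(n, 3 * d)   # x_i ⊗ y_i
    pooled = per_point.mean(axis=0, keepdims=True)                # mean_j x_j ⊗ y_j
    return theta1 * per_point + theta2 * pooled                   # broadcasts over i

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Xc = X - X.mean(axis=1, keepdims=True)   # centralize, for translation invariance
Y = np.ones((5, 1))                      # start from the trivial (constant) feature
for _ in range(3):
    Y = tensor_layer(Xc, Y, theta1=1.0, theta2=0.5)
print(Y.shape)                           # (5, 27): third tensor power per point
```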
To complete the construction of a universal network, we now need to characterize all linear equivariant functions from to the target representation . In Appendix G we show how this can be done for the trivial representation. This characterization gives us a set of linear functions which, combined with the functions defined in equation 9 (corresponding to invariant functions), gives us a universal architecture as in Theorem 1. However, the disadvantage of this approach is that implementation of these linear functions is somewhat cumbersome.
In the next section we discuss irreducible representations, which give us a systematic way to address linear equivariant mappings into any . Proving -spanning for these networks is accomplished via the -spanning property of tensor representations, through the following lemma
Lemma 3.
If all functions in can be written as
where , and is defined by elementwise application of , then is -spanning.
5 Universality with irreducible representations
In this section, we discuss how to achieve universality when using irreducible representations of . We will begin by defining irreducible representations, and explaining how linear universality is easily achieved by them, while the -spanning properties of tensor representations can be preserved. This discussion can be seen as an interpretation of the choices made in the construction of TFN and similar networks in the literature. We then show that these architectures are indeed universal.
5.1 Irreducible representations of SO(3)
In general, any finite-dimensional representation of a compact group can be decomposed into irreducible representations: a subspace U of a representation V is invariant if ρ(g)U ⊆ U for all g in the group, and a representation is irreducible if it has no non-trivial invariant subspaces. In the case of SO(3), the irreducible real representations are indexed by ℓ = 0, 1, 2, … and are given by (2ℓ+1) × (2ℓ+1) matrices, called the real Wigner D-matrices, acting on R^{2ℓ+1} by matrix multiplication. In particular, the representations for ℓ = 0 and ℓ = 1 are the trivial representation on R and the standard action of rotations on R^3, respectively.
Linear maps between irreducible representations As mentioned above, one of the main advantages of using irreducible representations is that there is a very simple characterization of all linear equivariant maps between two direct sums of irreducible representations. We use the notation for direct sums of irreducible representations, where and .
Lemma 1.
Let and . A function is a linear equivariant mapping between and , if and only if there exists a matrix with whenever , such that
(10)
where and for all .
We note that this lemma, which is based on Schur's lemma, was proven in the complex setting in (Kondor, 2018). Here we observe that it holds for real irreducible representations of SO(3) as well, since their dimension is always odd.
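The block structure asserted by the lemma is easy to realize in code. The sketch below (ours) assembles the general equivariant linear map between two direct sums of real irreducibles: a free scalar for every pair of input/output irreducibles of the same degree ℓ, each multiplying a (2ℓ+1) × (2ℓ+1) identity block, and zeros everywhere else.

```python
import numpy as np

def equivariant_linear(ls_in, ls_out, coeffs):
    """General SO(3)-equivariant linear map between direct sums of real irreducibles,
    following the block structure of the lemma above: block (i, j) equals
    c_ij * I_(2l+1) when the i-th output and j-th input irreducibles share the same
    degree l, and is zero otherwise. `coeffs` lists one scalar per matching pair,
    in row-major order."""
    dims_in = [2 * l + 1 for l in ls_in]
    dims_out = [2 * l + 1 for l in ls_out]
    M = np.zeros((sum(dims_out), sum(dims_in)))
    it = iter(coeffs)
    row = 0
    for i, lo in enumerate(ls_out):
        col = 0
        for j, li in enumerate(ls_in):
            if lo == li:
                M[row:row + dims_out[i], col:col + dims_in[j]] = next(it) * np.eye(dims_in[j])
            col += dims_in[j]
        row += dims_out[i]
    return M

# input 0 ⊕ 1 ⊕ 1, output 1 ⊕ 2: only the two (output l=1, input l=1) blocks are free
M = equivariant_linear(ls_in=[0, 1, 1], ls_out=[1, 2], coeffs=[0.7, -1.3])
print(M.shape)   # (8, 7); the l=0 column block and l=2 row block carry no parameters
```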
Clebsch-Gordan decomposition of tensor products. As any finite-dimensional representation of SO(3) can be decomposed into a direct sum of irreducible representations, this is true for tensor representations as well. In particular, the Clebsch-Gordan coefficients provide an explicit formula for decomposing the tensor product of two irreducible representations into a direct sum of irreducible representations. This decomposition can be easily extended to decompose the tensor product of two direct sums of irreducible representations into a direct sum of irreducible representations. In matrix notation, this means there is a unitary linear equivariant mapping of the tensor product onto a direct sum of irreducible representations, where the explicit multiplicities and the unitary matrix can be inferred directly from the case of two single irreducible representations.
By repeatedly taking tensor products and applying Clebsch-Gordan decompositions to the result, TFN and similar architectures can achieve the -spanning property in a manner analogous to tensor representations, and also enjoy linear universality since they maintain irreducible representations throughout the network.
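As a small concrete illustration (added here, not from the paper) of a Clebsch-Gordan decomposition: the tensor product of two ℓ = 1 representations, i.e., of two ordinary vectors, decomposes into an ℓ = 0 part (the dot product), an ℓ = 1 part (the cross product), and an ℓ = 2 part (the traceless symmetric matrix), of dimensions 1 + 3 + 5 = 9.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)
R = Rotation.random(random_state=1).as_matrix()

def traceless_sym(a, b):
    """l=2 component of a ⊗ b: the traceless symmetric part of the outer product."""
    return 0.5 * (np.outer(a, b) + np.outer(b, a)) - (a @ b / 3.0) * np.eye(3)

# l=0 piece: the dot product is invariant
assert np.isclose((R @ u) @ (R @ v), u @ v)
# l=1 piece: the cross product rotates like an ordinary vector (det R = 1)
assert np.allclose(np.cross(R @ u, R @ v), R @ np.cross(u, v))
# l=2 piece: the traceless symmetric part transforms by conjugation, R S R^T
assert np.allclose(traceless_sym(R @ u, R @ v), R @ traceless_sym(u, v) @ R.T)
print("3 ⊗ 3 decomposes into pieces of dimension 1 + 3 + 5")
```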
5.2 Tensor field networks
We now describe the basic layers of the TFN architecture (Thomas et al., 2018), which are based on irreducible representations, and suggest an architecture based on these layers which can approximate -equivariant maps into any representation. There are some superficial differences between our description of TFN and the description in the original paper; for more details, see Appendix F.
We note that the universality of TFN also implies the universality of SE(3)-Transformers (Fuchs et al., 2020), which are a generalization of TFN that adds an attention mechanism. Assuming the attention mechanism is not restricted to local neighborhoods, this method is at least as expressive as TFN.
TFNs are composed of three types of layers: (i) convolution, (ii) self-interaction, and (iii) non-linearities. In our architecture, we only use the first two layer types, which we will now describe. (Since convolution layers in TFN are not linear, the non-linearities are formally redundant.)
Convolution. Convolutional layers involve taking tensor products of a filter and a feature vector to create a new feature vector, and then decomposing the result into irreducible representations. Unlike in standard CNNs, a filter here depends on the input and is a function on R^3. The -th component of the filter will be given by
(11)
where if and otherwise, are spherical harmonics, and any polynomial of degree . In Appendix F we show that these polynomial functions can be replaced by fully connected networks, since the latter can approximate all polynomials uniformly.
The convolution of an input feature and a filter as defined above, will give an output feature , where , which is given by
(12)
More formally, we will think of convolutional layers as functions of the form . These functions are defined by a choice of , a choice of a scalar polynomial , and a choice of the parameter in equation 12. We denote the set of all such functions by .
Self-interaction layers. Self-interaction layers are linear functions from , which are obtained from elementwise application of equivariant linear functions . These linear functions can be specified by a choice of a matrix with the sparsity pattern described in Lemma 1.
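To fix ideas, here is a heavily simplified sketch (ours, not the exact layers of Thomas et al. (2018)) of the two layer types for the easiest case: a convolution that maps scalar (ℓ = 0) input features to vector (ℓ = 1) output features using filters of the assumed form radial(|x_i − x_j|)(x_i − x_j), followed by a self-interaction that linearly mixes channels of the same degree. Since the input features are scalars, no Clebsch-Gordan decomposition is needed in this special case.

```python
import numpy as np

def conv_l0_to_l1(X, f, radial):
    """Simplified TFN-style convolution (a sketch, not the exact layer of equation 12):
    scalar (l=0) features f_j are lifted to vector (l=1) features using filters of the
    assumed form radial(|x_i - x_j|) * (x_i - x_j). X: (3, n) points, f: (n,) scalars."""
    n = X.shape[1]
    out = np.zeros((3, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = X[:, i] - X[:, j]
            out[:, i] += f[j] * radial(np.linalg.norm(r)) * r
    return out   # if X -> R X P + t 1^T (and f is permuted accordingly), out -> R out P

def self_interaction(feats, W):
    """Elementwise linear mixing of channels within a single irreducible degree."""
    # feats: (channels_in, 3, n) stack of l=1 features; W: (channels_out, channels_in)
    return np.einsum('oc,cdn->odn', W, feats)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 6))
f = rng.normal(size=6)
v = conv_l0_to_l1(X, f, radial=lambda d: np.exp(-d ** 2))
mixed = self_interaction(np.stack([v, 2.0 * v]), W=rng.normal(size=(3, 2)))
print(v.shape, mixed.shape)   # (3, 6) (3, 3, 6)
```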
Network architecture. For our universality proof, we suggest a simple architecture which depends on two positive integer parameters : for given , we will define as the set of functions obtained by recursive convolutions
where and are defined as in equation 8. The output of a function in is in , for some which depends on . We then define to be the self-interaction layers which map to . This choice of and , together with a choice of the number of channels , defines the final network architecture as in equation 2. In the appendix we prove the universality of TFN:
Theorem 2.
For all ,
1. For , every -equivariant polynomial of degree is in .
2. Every continuous -equivariant function can be approximated uniformly on compact sets by functions in
As discussed previously, the linear universality of is guaranteed. Thus proving Theorem 2 amounts to showing that is -spanning. This is done using the sufficient condition for -spanning defined in Lemma 3.
Alternative architecture
The complexity of the TFN network used to construct -equivariant polynomials of degree can be reduced using a simple modification of the convolutional layer in equation 12: we add two parameters to the convolutional layer, which is now defined as:
(13)
With this simple change, we can show that is -spanning even if we only take filters of order and throughout the network. This is shown in Appendix E.
6 Conclusion
In this paper, we have presented a new framework for proving the universality of -equivariant point cloud networks. We used this framework for proving the universality of the TFN model (Thomas et al., 2018; Fuchs et al., 2020), and for devising two additional novel simple universal architectures.
We believe that the framework we developed here will be useful for proving the universality of other -equivariant models for point cloud networks, and other related equivariant models. We note that while the discussion in the paper was limited to point clouds in and to the action of , large parts of it are relevant to many other setups involving -dimensional point clouds with symmetry groups of the form on , where can be any compact topological group.
Acknowledgements
The authors would like to thank Taco Cohen for helpful discussions. N.D. acknowledges the support of Simons Math+X Investigators Award 400837.
References
- Atzmon et al. (2018) Matan Atzmon, Haggai Maron, and Yaron Lipman. Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091, 2018.
- Bogatskiy et al. (2020) Alexander Bogatskiy, Brandon Anderson, Jan T Offermann, Marwah Roussi, David W Miller, and Risi Kondor. Lorentz group equivariant neural network for particle physics. arXiv preprint arXiv:2006.04780, 2020.
- Cohen et al. (2018) Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. arXiv preprint arXiv:1801.10130, 2018.
- Dai & Xu (2013) Feng Dai and Yuan Xu. Approximation theory and harmonic analysis on spheres and balls, volume 23. Springer, 2013.
- Esteves et al. (2018) Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 52–68, 2018.
- Fuchs et al. (2020) Fabian B Fuchs, Daniel E Worrall, Volker Fischer, and Max Welling. SE(3)-Transformers: 3D roto-translation equivariant attention networks. arXiv preprint arXiv:2006.10503, 2020.
- Fulton & Harris (2013) William Fulton and Joe Harris. Representation theory: a first course, volume 129. Springer Science & Business Media, 2013.
- Gilmer et al. (2017) Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.
- Guo et al. (2020) Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep learning for 3d point clouds: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
- Keriven & Peyré (2019) Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. CoRR, abs/1905.04943, 2019. URL http://arxiv.org/abs/1905.04943.
- Kondor (2018) Risi Kondor. N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint arXiv:1803.01588, 2018.
- Kondor et al. (2018a) Risi Kondor, Zhen Lin, and Shubhendu Trivedi. Clebsch–Gordan nets: A fully Fourier space spherical convolutional neural network. In Advances in Neural Information Processing Systems, pp. 10117–10126, 2018a.
- Kondor et al. (2018b) Risi Kondor, Hy Truong Son, Horace Pan, Brandon Anderson, and Shubhendu Trivedi. Covariant compositional networks for learning graphs. arXiv preprint arXiv:1801.02144, 2018b.
- Kraft & Procesi (2000) Hanspeter Kraft and Claudio Procesi. Classical invariant theory, a primer. Lecture Notes, Version, 2000.
- Li et al. (2019) Jiaxin Li, Yingcai Bi, and Gim Hee Lee. Discrete rotation equivariance for point cloud recognition. In 2019 International Conference on Robotics and Automation (ICRA), pp. 7269–7275. IEEE, 2019.
- Maehara & NT (2019) Takanori Maehara and Hoang NT. A simple proof of the universality of invariant/equivariant graph neural networks, 2019.
- Maron et al. (2019a) Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. arXiv preprint arXiv:1905.11136, 2019a.
- Maron et al. (2019b) Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. In International Conference on Learning Representations, 2019b. URL https://openreview.net/forum?id=Syx72jC9tm.
- Maron et al. (2019c) Haggai Maron, Ethan Fetaya, Nimrod Segol, and Yaron Lipman. On the universality of invariant networks. In International conference on machine learning, 2019c.
- Maron et al. (2020) Haggai Maron, Or Litany, Gal Chechik, and Ethan Fetaya. On learning sets of symmetric elements. arXiv preprint arXiv:2002.08599, 2020.
- Morris et al. (2018) Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. arXiv preprint arXiv:1810.02244, 2018.
- Poulenard et al. (2019) Adrien Poulenard, Marie-Julie Rakotosaona, Yann Ponty, and Maks Ovsjanikov. Effective rotation-invariant point cnn with spherical harmonics kernels. In 2019 International Conference on 3D Vision (3DV), pp. 47–56. IEEE, 2019.
- Qi et al. (2017a) Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4, 2017a.
- Qi et al. (2017b) Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108, 2017b.
- Ravanbakhsh (2020) Siamak Ravanbakhsh. Universal equivariant multilayer perceptrons. arXiv preprint arXiv:2002.02912, 2020.
- Segol & Lipman (2019) Nimrod Segol and Yaron Lipman. On universal equivariant set networks. arXiv preprint arXiv:1910.02421, 2019.
- Serviansky et al. (2020) Hadar Serviansky, Nimrod Segol, Jonathan Shlomi, Kyle Cranmer, Eilam Gross, Haggai Maron, and Yaron Lipman. Set2graph: Learning graphs from sets. arXiv preprint arXiv:2002.08772, 2020.
- Thomas et al. (2018) Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219, 2018.
- Wang et al. (2019) Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1–12, 2019.
- Weiler et al. (2018) Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco Cohen. 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data. 2018. URL http://arxiv.org/abs/1807.02547.
- Worrall & Brostow (2018) Daniel Worrall and Gabriel Brostow. Cubenet: Equivariance to 3d rotation and translation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 567–584, 2018.
- Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
- Yarotsky (2018) Dmitry Yarotsky. Universal approximations of invariant maps by neural networks. arXiv preprint arXiv:1804.10306, 2018.
- Zaheer et al. (2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in neural information processing systems, pp. 3391–3401, 2017.
- Zhao et al. (2019) Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, and Federico Tombari. Quaternion equivariant capsule networks for 3d point clouds. arXiv preprint arXiv:1912.12098, 2019.
Appendix A Notation
We introduce some notation for the proofs in the appendices. We use the shortened notation and denote the columns of by . We denote
Appendix B Proofs for Section 3
B.1 -equivariant polynomials are dense
A first step, both in proving denseness of -equivariant polynomials and in the proof used in the next subsection, is the following simple lemma, which shows that translation invariance can be dealt with simply by centralizing the point cloud.
In the following, is some representation of on a finite dimensional real vector space . This induces an action of on by
This is also the action of which we consider, where we have invariance with respect to the translation coordinate. The action of on is defined in equation 1.
Lemma 1.
A function is -equivariant, if and only if there exists a function which is equivariant with respect to the action of on , and
(14)
Proof.
Recall that -equivariance means equivariance and translation invariance. Thus if is -equivariant then equation 14 holds with .
On the other hand, if satisfies equation 14 then we claim it is -equivariant. Indeed, for all , since ,
∎
We now prove denseness of -equivariant polynomials in the space of -equivariant continuous functions (Lemma 1).
Proof of Lemma 1.
Let be a compact set. We need to show that continuous -equivariant functions can be approximated uniformly in by -equivariant polynomials. Let denote the compact set which is the image of under the centralizing map . By Lemma 1, it is sufficient to show that every equivariant continuous function can be approximated uniformly on by a sequence of equivariant polynomials . The argument is now concluded by the following general lemma:
Lemma 2.
Let be a compact group, and let and be continuous representations of on the Euclidean spaces and (by continuous we mean that the maps are jointly continuous). Let be a compact set. Then every equivariant function can be approximated uniformly on by a sequence of equivariant polynomials .
Let be the Haar probability measure associated with the compact group . Let denote the compact set obtained as an image of the compact set under the continuous mapping
Using the Stone-Weierstrass theorem, let be a sequence of (not necessarily equivariant) polynomials which approximate uniformly on . Every degree polynomial induces a -equivariant function
This function is a degree polynomial as well: This is because can be approximated uniformly on by “Riemann Sums” of the form which are degree polynomials, and because degree polynomials are closed in .
Now for all , continuity of the function implies that the operator norm of is bounded uniformly by some constant , and so
∎
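The symmetrization step used above can be mimicked numerically. The sketch below (an illustration we add, with both representations taken to be the standard rotation action on R^3) averages ρ2(g)^{-1} f(ρ1(g) x) over a finite sample of approximately Haar-random rotations; the result is equivariant only up to Monte-Carlo error, which vanishes as the sample grows.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def symmetrize(f, n_samples=2000, seed=0):
    """Monte-Carlo analogue of the symmetrization in the proof: average
    R^T f(R x) over (approximately) Haar-random rotations R. Here both
    representations are the standard action on R^3, so the output is an
    approximately SO(3)-equivariant map R^3 -> R^3."""
    Rs = Rotation.random(n_samples, random_state=seed).as_matrix()
    def f_sym(x):
        return np.mean([R.T @ f(R @ x) for R in Rs], axis=0)
    return f_sym

f = lambda x: np.array([x[0] ** 2, x[1] * x[2], x[2]])   # not equivariant
f_sym = symmetrize(f)

x = np.array([0.3, -1.2, 0.7])
Q = Rotation.random(random_state=7).as_matrix()
# equivariance defect: exactly zero in the limit, small for a finite sample
print(np.linalg.norm(f_sym(Q @ x) - Q @ f_sym(x)))
```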
Proof.
By the -spanning assumption, there exist such that any vector valued polynomial invariant to translations and equivariant to permutations is of the form
(15)
where are linear functions to . If is a matrix-valued polynomial mapping to , which is invariant to translations and equivariant to permutations, then it is of the form , and each is itself invariant to translations and permutation equivariant. It follows that the matrix-valued can also be written in the form of equation 15, the only difference being that the image of the linear functions is now .
Now let be a -equivariant polynomial of degree . It remains to show that we can choose to be equivariant. We do this by a symmetrization argument: denote the Haar probability measure on by , and the action of on and by and respectively. Denote and . For every , we use the equivariance of and to obtain
where stands for the equivariant linear functional from to , defined for by
Thus we have shown that is in for , as required. ∎
Appendix C Proofs for Section 4
Proof.
It is known (Segol & Lipman, 2019, Theorem 2) that polynomials which are -equivariant are spanned by polynomials of the form , defined as
(16)
where and each is a multi-index. It follows that -equivariant polynomials of degree are spanned by polynomials of the form where . Denoting , the sum of all by , and , we see that there exists a linear functional such that
where we recall that is defined in equation 5 as
Thus polynomials which are of degree , and are equivariant, can be written as
where , and is the left inverse of the embedding . If is also translation invariant, then
Thus is -spanning. ∎
Proof.
In this proof we make the dependence of on explicit and denote .
We prove the claim by induction on . Assume . Then contains only the constant function , and this is precisely the function .
Now assume the claim holds for all with , and prove the claim for . Choose for some , we need to show that the function is in . Since we know from the induction hypothesis that this is true if . Now assume . We consider two cases:
1. If , we set . We know that by the induction hypothesis. So there exist such that
(17)
Now choose to be the function whose coordinate , is given by , obtained by setting in equation 7. Then , we have
and so
(18)
and .
2. If . We assume without loss of generality that . Set . As before by the induction hypothesis there exist which satisfy equation 17. This time we choose to be the function whose coordinate , is given by , obtained by setting in equation 7. Then we have
Thus equation 18 holds, and so again we have that .
∎
Proof.
If the conditions in Lemma 3 hold, then since is -spanning, every translation invariant and permutation equivariant polynomial of degree can be written as
where we denote . Thus we proved is -spanning. ∎
Appendix D Proofs for Section 5
Proof.
As mentioned in the main text, this lemma is based on Schur's lemma. This lemma is typically stated for complex representations, but holds for odd-dimensional real representations as well. We recount the lemma and its proof here for completeness (see also (Fulton & Harris, 2013)).
Lemma 1 (Schur’s Lemma for ).
Let be a linear equivariant map. If then . Otherwise is a scalar multiple of the identity.
Proof.
Let be a linear equivariant map. The image and kernel of are invariant subspaces of and , respectively. It follows that if then is a linear isomorphism so necessarily . Now assume . Since the dimension of is odd, has a real eigenvalue . The linear function is equivariant and has a non-trivial kernel, so . ∎
We now return to the proof of Lemma 1. Note that each is linear and equivariant. Next denote the restrictions of each to by , and note that
(19)
By considering vectors in of the form we see that each is linear and -equivariant. Thus by Schur’s lemma, if then for some real , and otherwise . Plugging this into equation 19 we obtain equation 10.
∎
Proof.
As mentioned in the main text, we only need to show that the function space is -spanning. Recall that is obtained by consecutive convolutions with -filters. In general, we denote the space of functions defined by applying consecutive convolutions by .
If is a space of functions from , we denote by the space of all functions of the form
(20)
where are linear functions, are induced by elementwise application, and . This notation is useful because: (i) by Lemma 3 it is sufficient to show that is in for all and all , and because (ii) it enables comparison of the expressive power of function spaces whose elements map to different spaces , since the elements in both map to the same space. In particular, note that if for every there is a and a linear map such that , then .
We now use this abstract discussion to prove some useful results: the first is that for the purpose of this lemma, we can ‘forget about’ the multiplication by a unitary matrix in equation 12, used for decomposition into irreducible representations: To see this, denote by the function space obtained by taking consecutive convolutions with -filters without multiplying by a unitary matrix in equation 12. Since Kronecker products of unitary matrices are unitary matrices, we obtain that the elements in and differ only by multiplication by a unitary matrix, and thus and , so both sets are equal.
Next, we prove that adding convolutional layers (enlarging ) or taking higher order filters (enlarging ) can only increase the expressive power of a network.
Lemma 2.
For all ,
1.
2.
Proof.
The first claim follows from the fact that every function in can be identified with a function in by taking the convolutional layer in equation 12 with .
The second claim follows from the fact that -filters can be identified with -filters whose -th entry is . ∎
The last preliminary lemma we will need is
Lemma 3.
For every , and every , if , then the function defined by
is in .
Proof.
This lemma is based on the fact that the space of homogeneous polynomials on is spanned by polynomials of the form for (Dai & Xu, 2013). For each such , and , these polynomials can be realized by filters by setting so that
For every and , we can construct a -filter where are as defined above and the other filters are zero. Since both the entries of , and the entries of , span the space of -homogeneous polynomials on , it follows that there exists a linear mapping so that
(21)
As a final preliminary, we note that -filters can perform an averaging operation by setting and in equation 11 and equation 12. We call this -filter an averaging filter.
We are now ready to prove our claim: we need to show that for every where , for every , the function is in . Note that due to the inclusion relations in Lemma 2 it is sufficient to prove this for the case . We prove this by induction on . For , vectors contains only zeros and so
We now assume the claim is true for all with and prove the claim is true for . We need to show that for every the function is in . We prove this yet again by induction, this time on the value of : assume that and . Denote by the vector in defined by
By the induction assumption on , we know that and so
is in by Lemma 3, which is contained in by Lemma 2. Since has zero mean, while does not depend on since , applying an averaging filter to gives us a constant value in each coordinate , and so is in .
Now assume the claim is true for all which sum to and whose first coordinate is smaller than some ; we now prove the claim is true when the first coordinate of is equal to . The vector obtained from by removing the first coordinate sums to , and so by the induction hypothesis on we know that .
where the additional terms are linear combinations of functions of the form where and their first coordinate is smaller than , and is a permutation. By the induction hypothesis on , each such is in . It follows that , and thus , are in as well. This concludes the proof of Theorem 2.
∎
Appendix E Alternative TFN architecture
In this appendix we show that replacing the standard TFN convolutional layer with the layer defined in equation 13:
we can obtain -spanning networks using consecutive convolutions with -filters (that is, filters in , where ). Our discussion here is somewhat informal, meant to provide the general ideas without delving into the details as we have done for the standard TFN architecture in the proof of Theorem 2. At the end of our discussion we will explain what is necessary to make this argument completely rigorous.
We will only need two fixed filters for our argument here: The first is the -filter defined by setting and to obtain
The second is the filter defined by setting and , so that
We prove our claim by showing that a pair of convolutions with -filters can construct any convolutional layer defined in equation 7 for the -spanning architecture using tensor representations. The claim then follows from the fact that convolutions of the latter architecture suffice for achieving -spanning, as shown in Lemma 2.
Convolutions for tensor representations, defined in equation 7, are composed of two terms:
To obtain the first term , we set in equation 13 and obtain (the decomposition into irreducibles of) . Thus this term can in fact be expressed by a single convolution. We can leave this outcome unchanged by a second convolution, defined by setting .
To obtain the second term , we apply a first convolution with , to obtain
By applying an additional averaging filter, defined by setting , we obtain . This concludes our ‘informal proof’.
Our discussion here has been somewhat inaccurate, since in practice and . Moreover, in our proof we have glossed over the multiplication by the unitary matrix used to obtain the decomposition into irreducible representations. However, the ideas discussed here can be used to show that convolutions with -filters can satisfy the sufficient condition for -spanning defined in Lemma 3. See our treatment of Theorem 2 for more details.
Appendix F Comparison with original TFN paper
In this Appendix we discuss three superficial differences between the presentation of the TFN architecture in Thomas et al. (2018) and our presentation here:
1. We define convolutional layers between features residing in direct sums of irreducible representations, while (Thomas et al., 2018) focuses on features which inhabit a single irreducible representation. This difference is non-essential, as direct sums of irreducible representations can be represented as multiple channels where each feature inhabits a single irreducible representation.
2.
3. We take the scalar functions to be polynomials, while (Thomas et al., 2018) take them to be fully connected networks composed with radial basis functions. Using polynomial scalar bases is convenient for our presentation here since it enables exact expression of equivariant polynomials. Replacing polynomial bases with fully connected networks, we obtain approximation of equivariant polynomials instead of exact expression. It can be shown that if is a -equivariant polynomial which can be expressed by some network defined with filters coming from a polynomial scalar basis, then can be approximated on a compact set , up to an arbitrary error, by a similar network with scalar functions coming from a sufficiently large fully connected network.
Appendix G Tensor Universality
In this section we show how to construct the complete set of linear invariant functionals from to . Since each such functional is of the form
where each is -invariant, it is sufficient to characterize all linear -invariant functionals .
It will be convenient to denote
We achieve our characterization using the bijective correspondence between linear functionals and multi-linear functions: each such corresponds to a unique , such that
(22)
where denote the standard basis elements of . We define a spanning set of equivariant linear functionals on via a corresponding characterization for multi-linear functionals on . Specifically, set
For we define a multi-linear functional:
(23)
and for we define
(24)
Proposition 1.
We note that (i) equation 22 provides a (cumbersome) way to compute all linear invariant functionals explicitly by evaluating the corresponding on the elements of the standard basis and (ii) the set is spanning, but is not linearly independent. For example, since , the space of invariant functionals on is one dimensional while .
Proof of Proposition 1.
We first show that the bijective correspondence between linear functionals and multi-linear functions extends to a bijective correspondence between -invariant linear/multi-linear functionals. The action of on is defined by
The action of on is such that the map
is -equivariant. It follows that if and satisfy equation 22, then for all , the same equation holds for the pair and . Thus -invariance of is equivalent to -invariance of .
Multi-linear functionals on invariant to are a subset of the set of polynomials on invariant to . It is known (see Kraft & Procesi (2000), page 114) that all such polynomials are algebraically generated by functions of the form
Equivalently, -invariant polynomials are spanned by linear combinations of polynomials of the form
(25)
When considering the subset of multi-linear invariant polynomials, we see that they must be spanned by polynomials as in equation 25, where each appears exactly once in each polynomial in the spanning set. These precisely correspond to the functions in .
∎
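As a numerical sanity check of the classical generators quoted above (assuming, as in the first fundamental theorem for SO(3), that they are the pairwise inner products and the 3 × 3 determinants of triples of points), the following verifies their rotation invariance on random data.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                       # four points x_1, ..., x_4
R = Rotation.random(random_state=1).as_matrix()
Y = R @ X

gram = lambda Z: Z.T @ Z                          # all pairwise inner products
dets = lambda Z: [np.linalg.det(Z[:, [i, j, k]])  # all 3x3 determinants of triples
                  for i in range(4) for j in range(4) for k in range(4)]

assert np.allclose(gram(Y), gram(X))              # <R x_i, R x_j> = <x_i, x_j>
assert np.allclose(dets(Y), dets(X))              # det(R) = 1 preserves determinants
print("inner products and determinants are SO(3)-invariant")
```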