On the Universality of Rotation Equivariant Point Cloud Networks
Abstract
Learning functions on point clouds has applications in many fields, including computer vision, computer graphics, physics, and chemistry. Recently, there has been a growing interest in neural architectures that are invariant or equivariant to all three shape-preserving transformations of point clouds: translation, rotation, and permutation. In this paper, we present a first study of the approximation power of these architectures. We first derive two sufficient conditions for an equivariant architecture to have the universal approximation property, based on a novel characterization of the space of equivariant polynomials. We then use these conditions to show that two recently suggested models (Thomas et al., 2018; Fuchs et al., 2020) are universal, and to devise two other novel universal architectures.
1 Introduction
Designing neural networks that respect data symmetry is a powerful approach for obtaining efficient deep models. Prominent examples include convolutional networks, which respect the translational invariance of images; graph neural networks, which respect the permutation invariance of graphs (Gilmer et al., 2017; Maron et al., 2019b); networks such as (Zaheer et al., 2017; Qi et al., 2017a), which respect the permutation invariance of sets; and networks that respect 3D rotational symmetries (Cohen et al., 2018; Weiler et al., 2018; Esteves et al., 2018; Worrall & Brostow, 2018; Kondor et al., 2018a).
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/514df3c2-f687-41c9-b21e-1948160d8f2d/universality.png)
While the expressive power of equivariant models is restricted by design to equivariant functions, a desirable property of equivariant networks is universality: the ability to approximate any continuous equivariant function. This is not always the case: while convolutional networks and networks for sets are universal (Yarotsky, 2018; Segol & Lipman, 2019), popular graph neural networks are not (Xu et al., 2019; Morris et al., 2018).
In this paper, we consider the universality of networks that respect the symmetries of 3D point clouds: translations, rotations, and permutations. Designing such networks has been a popular paradigm in recent years (Thomas et al., 2018; Fuchs et al., 2020; Poulenard et al., 2019; Zhao et al., 2019). While there have been many works on the universality of permutation invariant networks (Zaheer et al., 2017; Maron et al., 2019c; Keriven & Peyré, 2019), and a recent work discussing the universality of rotation equivariant networks (Bogatskiy et al., 2020), this is the first paper to discuss the universality of networks that combine rotations, permutations, and translations.
We start the paper with a general, architecture-agnostic discussion, and derive two sufficient conditions for universality. These conditions are a result of a novel characterization of equivariant polynomials for the symmetry group of interest. We use these conditions to prove the universality of the prominent Tensor Field Networks (TFN) architecture (Thomas et al., 2018; Fuchs et al., 2020). The following is a weakened and simplified statement of Theorem 2, which is stated later in the paper:
Theorem (Simplification of Theorem 2).
Any continuous equivariant function on point clouds can be approximated uniformly on compact sets by a composition of TFN layers.
We use our general discussion to prove the universality of two additional equivariant models: the first is a simple modification of the TFN architecture which allows for universality using only low dimensional filters. The second is a minimal architecture based on tensor product representations, rather than the more commonly used irreducible representations of SO(3). We discuss the advantages and disadvantages of both approaches.
To summarize, the contributions of this paper are: (1) A general approach for proving the universality of rotation equivariant models for point clouds; (2) A proof that two recent equivariant models (Thomas et al., 2018; Fuchs et al., 2020) are universal; (3) Two additional simple and novel universal architectures.
2 Previous work
Deep learning on point clouds. (Qi et al., 2017a; Zaheer et al., 2017) were the first to apply neural networks directly to raw point cloud data, by using pointwise functions and pooling operations. Many subsequent works used local neighborhood information (Qi et al., 2017b; Wang et al., 2019; Atzmon et al., 2018). We refer the reader to a recent survey for more details (Guo et al., 2020). In contrast with the aforementioned works, which focused solely on permutation invariance, more related to this paper are works that additionally incorporated invariance to rigid motions. (Thomas et al., 2018) proposed Tensor Field Networks (TFN) and showed their efficacy on physics and chemistry tasks. (Kondor et al., 2018b) also suggested an equivariant model for continuous rotations. (Li et al., 2019) suggested models that are equivariant to discrete subgroups of the rotation group. (Poulenard et al., 2019) suggested an invariant model based on spherical harmonics. (Fuchs et al., 2020) followed TFN and added an attention mechanism. Recently, (Zhao et al., 2019) proposed a quaternion equivariant point capsule network that also achieves rotation and translation invariance.
Universal approximation for invariant networks. Understanding the approximation power of invariant models is a popular research goal. Most of the current results assume that the symmetry group is a permutation group. (Zaheer et al., 2017; Qi et al., 2017a; Segol & Lipman, 2019; Maron et al., 2020; Serviansky et al., 2020) proved universality for several invariant and equivariant models. (Maron et al., 2019b; a; Keriven & Peyré, 2019; Maehara & NT, 2019) studied the approximation power of high-order graph neural networks. (Maron et al., 2019c; Ravanbakhsh, 2020) targeted universality of networks that use high-order representations for permutation groups. Yarotsky (2018) provided several theoretical constructions of universal equivariant neural network models based on polynomial invariants, including an equivariant model. In a recent work, Bogatskiy et al. (2020) presented a universal approximation theorem for networks that are equivariant to several Lie groups, including the Lorentz group. The main difference from our paper is that we prove a universality theorem for a more complex group that, besides rotations, also includes translations and permutations.
3 A framework for proving universality
In this section, we describe a framework for proving the universality of equivariant networks. We begin with some mathematical preliminaries:
3.1 Mathematical setup
An action of a group G on a real vector space V is a collection of maps ρ(g): V → V, defined for every g ∈ G, such that ρ(g)ρ(h) = ρ(gh) for all g, h ∈ G, and the identity element of G is mapped to the identity mapping on V. We say V is a representation of G if ρ(g) is a linear map for every g ∈ G. As is customary, when it does not cause confusion we often say that V itself is a representation of G.
In this paper, we are interested in functions on point clouds. Point clouds are sets of n vectors in R^3, arranged as 3 × n matrices X = (x_1, …, x_n) ∈ R^{3×n}.
Many machine learning tasks on point clouds, such as classification, aim to learn a function which is invariant to rigid motions and relabeling of the points. Put differently, such functions are required to be invariant to the joint action of translations, rotations, and permutations on R^{3×n} via
(t, R, P) · X = R X P + t 1^T,   (1)
where t ∈ R^3 defines a translation, R ∈ SO(3) is a rotation, P is an n × n permutation matrix, and 1 ∈ R^n denotes the all-ones vector.
Equivariant functions are generalizations of invariant functions: if a group G acts on a space V via some action ρ1, and on a space W via some other group action ρ2, we say that a function f: V → W is equivariant if f(ρ1(g)v) = ρ2(g)f(v) for all g ∈ G and v ∈ V.
Invariant functions correspond to the special case where ρ2(g) is the identity mapping for all g ∈ G.
In some machine learning tasks on point clouds, the functions learned are not invariant but rather equivariant. For example, segmentation tasks assign a discrete label to each point. They are invariant to translations and rotations but equivariant to permutations – in the sense that permuting the input causes a corresponding permutation of the output. Another example is predicting a normal for each point of a point cloud. This task is invariant to translations but equivariant to both rotations and permutations.
In this paper, we are interested in learning equivariant functions from point clouds into n copies of some target representation of the rotation group. The equivariance of these functions is with respect to the action on point clouds defined in equation 1, and the action on the target defined by applying the rotation action from the left and the permutation action from the right as in equation 1, but ‘ignoring’ the translation component. Thus, such equivariant functions will be translation invariant. This formulation of equivariance includes the normal prediction example, where the target representation is R^3 itself, as well as the segmentation case, where the target is the trivial identity representation. We focus on the harder case of functions whose outputs are equivariant to permutations, since it easily implies the easier case of permutation invariant functions.
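To make these symmetry requirements concrete, the following small numerical sketch (our illustration, not part of the paper's constructions) checks the two cases on random data: a translation/rotation/permutation invariant function (sum of pairwise distances) and a translation invariant but rotation- and permutation-equivariant function (centering the point cloud). The specific test functions are illustrative choices.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
n = 7
X = rng.normal(size=(3, n))                       # point cloud: columns are points

def f_inv(X):
    # invariant to translations, rotations and permutations: sum of pairwise distances
    diffs = X[:, :, None] - X[:, None, :]
    return np.linalg.norm(diffs, axis=0).sum()

def f_eq(X):
    # translation invariant, rotation- and permutation-equivariant: centering
    return X - X.mean(axis=1, keepdims=True)

t = rng.normal(size=(3, 1))                       # translation
R = Rotation.random(random_state=1).as_matrix()   # rotation
P = np.eye(n)[rng.permutation(n)]                 # permutation matrix

Y = R @ X @ P + t                                 # group action on the point cloud
assert np.isclose(f_inv(Y), f_inv(X))
assert np.allclose(f_eq(Y), R @ f_eq(X) @ P)      # translation is 'ignored' in the output
print("invariance and equivariance checks passed")
```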
Notation. We use the notation and . We set and .
Proofs. Proofs appear in the appendices, arranged according to sections.
3.2 Conditions for universality
The semi-lifted approach
In general, highly expressive equivariant neural networks can be achieved by using a ‘lifted approach’, where intermediate features in the network belong to high dimensional representations of the group. In the context of point clouds, where the number of points n is typically large, many papers, e.g., (Thomas et al., 2018; Kondor, 2018; Bogatskiy et al., 2020), use a ‘semi-lifted’ approach, where hidden layers hold only higher dimensional representations of SO(3), but not high order permutation representations. In this subsection, we propose a strategy for achieving universality with the semi-lifted approach.
We begin with an axiomatic formulation of the semi-lifted approach (see illustration in inset): we assume that our neural networks are composed of two main components. The first component is a family of parametric continuous equivariant functions which map the original point cloud to a semi-lifted point cloud, whose per-point features lie in a lifted representation of SO(3).
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/514df3c2-f687-41c9-b21e-1948160d8f2d/network_parts.png)
The second component is a family of parametric linear SO(3)-equivariant functions, which map from the high-order lifted representation down to the target representation. Each such SO(3)-equivariant function can be extended to an equivariant function on the semi-lifted point cloud by applying it elementwise. For every positive integer (the number of channels), these two families of functions induce a family of functions obtained by summing different compositions of these functions:
(2)
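Schematically, each channel in equation 2 composes a lift with an elementwise linear pooling map, and the channels are summed. The toy sketch below (our own instantiation, not an architecture from the paper) uses a hand-coded lift into the tensor representation R^{3×3} and the trace as the linear SO(3)-equivariant map into the trivial representation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                       # columns are points

def lift(X):
    """Toy equivariant lift: per point, the outer product of the centered
    coordinate with itself, an element of the tensor representation R^{3x3}."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return np.einsum('ai,bi->iab', Xc, Xc)        # shape (n, 3, 3)

def pool(T):
    """Toy linear SO(3)-equivariant map into the trivial representation:
    the trace, applied elementwise to every point's tensor."""
    return np.trace(T, axis1=1, axis2=2)          # shape (n,)

def net(X, channel_weights):
    """Sum of channel-wise compositions, mirroring the structure of equation 2.
    In a real architecture each channel would use a different lift / pooling map;
    here the channels differ only by a scalar weight, for brevity."""
    return sum(w * pool(lift(X)) for w in channel_weights)

print(net(X, channel_weights=[0.5, -1.0]))        # one invariant scalar per point
```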
Conditions for universality
We now describe sufficient conditions for universality using the semi-lifted approach. The first step is showing, as in (Yarotsky, 2018), that continuous -equivariant functions can be approximated by -equivariant polynomials .
Lemma 1.
Any continuous -equivariant function in can be approximated uniformly on compact sets by -equivariant polynomials in .
Universality is now reduced to the approximation of -equivariant polynomials. We provide two sufficient conditions which guarantee that -equivariant polynomials of degree can be expressed by function spaces as defined in equation 2.
The first sufficient condition is that is able to represent all polynomials which are translation invariant and permutation-equivariant (but not necessarily equivariant to rotations). More precisely:
Definition 1 (-spanning).
For , let be a subset of . We say that is -spanning, if there exist , such that every polynomial of degree which is invariant to translations and equivariant to permutations, can be written as
(3)
where are all linear functionals, and are the functions defined by elementwise applications of .
In Lemma 3 we provide a concrete condition which implies -spanning.
The second sufficient condition is that the set of linear equivariant functionals contains all possible equivariant linear functionals:
Definition 2 (Linear universality).
We say that a collection of equivariant linear functionals between two representations and of is linearly universal, if it contains all linear -equivariant mappings between the two representations.
When these two sufficient conditions hold, a rather simple symmetrization argument leads to the following theorem:
Theorem 1.
If is -spanning and is linearly universal, then there exists some such that for all the function space contains all -equivariant polynomials of degree .
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/514df3c2-f687-41c9-b21e-1948160d8f2d/function_spaces.png)
Corollary 1.
For all , let denote function spaces generated by a pair of functions spaces which are -spanning and linearly universal as in equation 2. Then any continuous -equivariant function in can be approximated uniformly on compact sets by equivariant functions in
3.3 Sufficient conditions in action
In the remainder of the paper, we prove the universality of several -equivariant architectures, based on the framework we discussed in the previous subsection. We discuss two different strategies for achieving universality, which differ mainly in the type of lifted representations of SO(3) they use: (i) the first strategy uses (direct sums of) tensor-product representations; (ii) the second uses (direct sums of) irreducible representations. The main advantage of the first strategy, from the perspective of our methodology, is that achieving the -spanning property is more straightforward. The advantage of irreducible representations is that they almost automatically guarantee the linear universality property.
In Section 4 we discuss universality through tensor product representations, and give an example of a minimal tensor representation network architecture that satisfies universality. In Section 5 we discuss universality through irreducible representations, which is currently the more common strategy. We show that the TFN architecture (Thomas et al., 2018; Fuchs et al., 2020), which follows this strategy, is universal, and describe a simple tweak that achieves universality using only low order filters, though the representations throughout the network are high dimensional.
4 Universality with tensor representations
In this section, we prove universality for models that are based on tensor product representations, as defined below. The main advantage of this approach is that -spanning is achieved rather easily. The main drawbacks are that its data representation is somewhat redundant and that characterizing the linear equivariant layers is more laborious.
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/514df3c2-f687-41c9-b21e-1948160d8f2d/tensor_prod_rep.png)
Tensor representations. We begin by defining tensor representations. For a non-negative integer k, SO(3) acts on R^{3^k} by the tensor product representation, i.e., by applying the matrix Kronecker product k times: R ↦ R ⊗ R ⊗ ⋯ ⊗ R. The inset illustrates the vector spaces and the action for small values of k. With this action, for any k, the map from R^3 to R^{3^k} defined by
x ↦ x ⊗ x ⊗ ⋯ ⊗ x (k times)   (4)
is equivariant.
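A quick numerical check of this equivariance (a sketch we add for illustration): the k-fold Kronecker power of a rotated vector equals the k-fold Kronecker power of the rotation applied to the k-fold Kronecker power of the vector.

```python
import numpy as np
from functools import reduce
from scipy.spatial.transform import Rotation

def kron_power(a, k):
    """k-fold Kronecker power a ⊗ ... ⊗ a (works for vectors and square matrices)."""
    unit = np.array([[1.0]]) if a.ndim == 2 else np.array([1.0])
    return reduce(np.kron, [a] * k, unit)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
R = Rotation.random(random_state=1).as_matrix()
k = 3

lhs = kron_power(R @ x, k)                        # lift the rotated point
rhs = kron_power(R, k) @ kron_power(x, k)         # rotate in the tensor representation
assert np.allclose(lhs, rhs)
print("x -> x^(⊗k) intertwines the two actions for k =", k)
```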
A -spanning family We now show that tensor representations can be used to define a finite set of -spanning functions. The lifted representation will be given by
The -spanning functions are indexed by vectors , where each is a non-negative integer. The functions are defined for fixed by
(5)
The functions are equivariant as they are a sum of equivariant functions from equation 4. Thus is equivariant. The motivation behind the definition of these functions is that known characterizations of permutation equivariant polynomials (Segol & Lipman, 2019) tell us that the entries of these tensor valued functions span all permutation equivariant polynomials (see the proof of Lemma 1 for more details). To account for translation invariance, we compose the functions with the centralization operation and define the set of functions
(6)
where is the natural embedding that takes each into . In the following lemma, we prove that this set is -spanning.
Lemma 1.
For every , the set is -spanning.
A minimal universal architecture Once we have shown that is -spanning, we can design -spanning architectures, by devising architectures that are able to span all elements of . As we will now show, the compositional nature of neural networks allows us to do this in a very clean manner.
We define a parametric function which maps to as follows: For all , we have , where
(7)
We denote the set of functions obtained by choosing the parameters , by . While in the hidden layers of our network the data is represented using both coordinates , the input to the network only contains an coordinate and the output only contains a coordinate. To this end, we define the functions
(8)
We can achieve -spanning by composition of functions in with these functions and centralizing:
Lemma 2.
The function set is contained in
(9)
Thus is -spanning.
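As a rough illustration of the kind of layer this construction uses, the sketch below (ours, and assumption-laden: the exact parametric layer of equation 7 may differ) stacks a hypothetical layer combining a per-point tensor product with an averaged, pooled tensor product, showing how repeated layers reach higher and higher tensor powers of the centralized coordinates.

```python
import numpy as np

def tensor_layer(X, Y, theta1, theta2):
    """Hypothetical layer in the spirit of equation 7 (an assumption: the exact
    layer may differ). It combines a per-point tensor product x_i ⊗ y_i with a
    pooled term averaged over the whole cloud.
    X: (3, n) coordinates; Y: (n, d) current per-point tensor features."""
    n, d = Y.shape
    per_point = np.einsum('ai,ib->iab', X, Y).reshape(n, 3 * d)   # x_i ⊗ y_i
    pooled = per_point.mean(axis=0, keepdims=True)                # mean_j x_j ⊗ y_j
    return theta1 * per_point + theta2 * pooled                   # broadcasts over i

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Xc = X - X.mean(axis=1, keepdims=True)   # centralize, for translation invariance
Y = np.ones((5, 1))                      # start from the trivial (constant) feature
for _ in range(3):
    Y = tensor_layer(Xc, Y, theta1=1.0, theta2=0.5)
print(Y.shape)                           # (5, 27): third tensor power per point
```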
To complete the construction of a universal network, we now need to characterize all linear equivariant functions from to the target representation . In Appendix G we show how this can be done for the trivial representation. This characterization gives us a set of linear functions which, combined with the functions defined in equation 9 (corresponding to invariant functions), gives us a universal architecture as in Theorem 1. However, the disadvantage of this approach is that implementation of these linear functions is somewhat cumbersome.
In the next section we discuss irreducible representations, which give us a systematic way to address linear equivariant mappings into any . Proving -spanning for these networks is accomplished via the -spanning property of tensor representations, through the following lemma
Lemma 3.
If all functions in can be written as
where , and is defined by elementwise application of , then is -spanning.
5 Universality with irreducible representations
In this section, we discuss how to achieve universality when using irreducible representations of . We will begin by defining irreducible representations, and explaining how linear universality is easily achieved by them, while the -spanning properties of tensor representations can be preserved. This discussion can be seen as an interpretation of the choices made in the construction of TFN and similar networks in the literature. We then show that these architectures are indeed universal.
5.1 Irreducible representations of SO(3)
In general, any finite-dimensional representation of a compact group can be decomposed into irreducible representations: a subspace U of a representation V is invariant if ρ(g)U ⊆ U for all g in the group, and a representation is irreducible if it has no non-trivial invariant subspaces. In the case of SO(3), the irreducible real representations are indexed by ℓ = 0, 1, 2, … and are given by (2ℓ+1) × (2ℓ+1) matrices, called the real Wigner D-matrices, acting on R^{2ℓ+1} by matrix multiplication. In particular, the representations for ℓ = 0 and ℓ = 1 are the trivial representation on R and the standard action of rotations on R^3, respectively.
Linear maps between irreducible representations As mentioned above, one of the main advantages of using irreducible representations is that there is a very simple characterization of all linear equivariant maps between two direct sums of irreducible representations. We use the notation for direct sums of irreducible representations, where and .
Lemma 1.
Let and . A function is a linear equivariant mapping between and , if and only if there exists a matrix with whenever , such that
(10)
where and for all .
We note that this lemma, which is based on Schur's lemma, was proven in the complex setting in (Kondor, 2018). Here we observe that it holds for real irreducible representations of SO(3) as well, since their dimension is always odd.
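The block structure asserted by the lemma is easy to realize in code. The sketch below (ours) assembles the general equivariant linear map between two direct sums of real irreducibles: a free scalar for every pair of input/output irreducibles of the same degree ℓ, each multiplying a (2ℓ+1) × (2ℓ+1) identity block, and zeros everywhere else.

```python
import numpy as np

def equivariant_linear(ls_in, ls_out, coeffs):
    """General SO(3)-equivariant linear map between direct sums of real irreducibles,
    following the block structure of the lemma above: block (i, j) equals
    c_ij * I_(2l+1) when the i-th output and j-th input irreducibles share the same
    degree l, and is zero otherwise. `coeffs` lists one scalar per matching pair,
    in row-major order."""
    dims_in = [2 * l + 1 for l in ls_in]
    dims_out = [2 * l + 1 for l in ls_out]
    M = np.zeros((sum(dims_out), sum(dims_in)))
    it = iter(coeffs)
    row = 0
    for i, lo in enumerate(ls_out):
        col = 0
        for j, li in enumerate(ls_in):
            if lo == li:
                M[row:row + dims_out[i], col:col + dims_in[j]] = next(it) * np.eye(dims_in[j])
            col += dims_in[j]
        row += dims_out[i]
    return M

# input 0 ⊕ 1 ⊕ 1, output 1 ⊕ 2: only the two (output l=1, input l=1) blocks are free
M = equivariant_linear(ls_in=[0, 1, 1], ls_out=[1, 2], coeffs=[0.7, -1.3])
print(M.shape)   # (8, 7); the l=0 column block and l=2 row block carry no parameters
```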
Clebsch-Gordan decomposition of tensor products. As any finite-dimensional representation of SO(3) can be decomposed into a direct sum of irreducible representations, this is true for tensor representations as well. In particular, the Clebsch-Gordan coefficients provide an explicit formula for decomposing the tensor product of two irreducible representations into a direct sum of irreducible representations. This decomposition can be easily extended to decompose the tensor product of two direct sums of irreducible representations into a direct sum of irreducible representations. In matrix notation, this means there is a unitary linear equivariant mapping of the tensor product onto a direct sum of irreducible representations, where the explicit multiplicities and the unitary matrix can be inferred directly from the case of two single irreducible representations.
By repeatedly taking tensor products and applying Clebsch-Gordan decompositions to the result, TFN and similar architectures can achieve the -spanning property in a manner analogous to tensor representations, and also enjoy linear universality since they maintain irreducible representations throughout the network.
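As a small concrete illustration (added here, not from the paper) of a Clebsch-Gordan decomposition: the tensor product of two ℓ = 1 representations, i.e., of two ordinary vectors, decomposes into an ℓ = 0 part (the dot product), an ℓ = 1 part (the cross product), and an ℓ = 2 part (the traceless symmetric matrix), of dimensions 1 + 3 + 5 = 9.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)
R = Rotation.random(random_state=1).as_matrix()

def traceless_sym(a, b):
    """l=2 component of a ⊗ b: the traceless symmetric part of the outer product."""
    return 0.5 * (np.outer(a, b) + np.outer(b, a)) - (a @ b / 3.0) * np.eye(3)

# l=0 piece: the dot product is invariant
assert np.isclose((R @ u) @ (R @ v), u @ v)
# l=1 piece: the cross product rotates like an ordinary vector (det R = 1)
assert np.allclose(np.cross(R @ u, R @ v), R @ np.cross(u, v))
# l=2 piece: the traceless symmetric part transforms by conjugation, R S R^T
assert np.allclose(traceless_sym(R @ u, R @ v), R @ traceless_sym(u, v) @ R.T)
print("3 ⊗ 3 decomposes into pieces of dimension 1 + 3 + 5")
```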
5.2 Tensor field networks
We now describe the basic layers of the TFN architecture (Thomas et al., 2018), which are based on irreducible representations, and suggest an architecture based on these layers which can approximate -equivariant maps into any representation. There are some superficial differences between our description of TFN and the description in the original paper; for more details, see Appendix F.
We note that the universality of TFN also implies the universality of SE(3)-Transformers (Fuchs et al., 2020), which are a generalization of TFN that adds an attention mechanism. Assuming the attention mechanism is not restricted to local neighborhoods, this method is at least as expressive as TFN.
TFNs are composed of three types of layers: (i) convolution, (ii) self-interaction, and (iii) non-linearities. In our architecture, we only use the first two layer types, which we will now describe. (Since convolution layers in TFN are not linear, the non-linearities are formally redundant.)
Convolution. Convolutional layers involve taking tensor products of a filter and a feature vector to create a new feature vector, and then decomposing the result into irreducible representations. Unlike in standard CNNs, a filter here depends on the input and is a function on R^3. The -th component of the filter will be given by
(11)
where if and otherwise, are spherical harmonics, and any polynomial of degree . In Appendix F we show that these polynomial functions can be replaced by fully connected networks, since the latter can approximate all polynomials uniformly.
The convolution of an input feature and a filter as defined above, will give an output feature , where , which is given by
(12)
More formally, we will think of convolutional layers as functions of the form . These functions are defined by a choice of , a choice of a scalar polynomial , and a choice of the parameter in equation 12. We denote the set of all such functions by .
Self-interaction layers. Self-interaction layers are linear functions from , which are obtained from elementwise application of equivariant linear functions . These linear functions can be specified by a choice of a matrix with the sparsity pattern described in Lemma 1.
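To fix ideas, here is a heavily simplified sketch (ours, not the exact layers of Thomas et al. (2018)) of the two layer types for the easiest case: a convolution that maps scalar (ℓ = 0) input features to vector (ℓ = 1) output features using filters of the assumed form radial(|x_i − x_j|)(x_i − x_j), followed by a self-interaction that linearly mixes channels of the same degree. Since the input features are scalars, no Clebsch-Gordan decomposition is needed in this special case.

```python
import numpy as np

def conv_l0_to_l1(X, f, radial):
    """Simplified TFN-style convolution (a sketch, not the exact layer of equation 12):
    scalar (l=0) features f_j are lifted to vector (l=1) features using filters of the
    assumed form radial(|x_i - x_j|) * (x_i - x_j). X: (3, n) points, f: (n,) scalars."""
    n = X.shape[1]
    out = np.zeros((3, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = X[:, i] - X[:, j]
            out[:, i] += f[j] * radial(np.linalg.norm(r)) * r
    return out   # if X -> R X P + t 1^T (and f is permuted accordingly), out -> R out P

def self_interaction(feats, W):
    """Elementwise linear mixing of channels within a single irreducible degree."""
    # feats: (channels_in, 3, n) stack of l=1 features; W: (channels_out, channels_in)
    return np.einsum('oc,cdn->odn', W, feats)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 6))
f = rng.normal(size=6)
v = conv_l0_to_l1(X, f, radial=lambda d: np.exp(-d ** 2))
mixed = self_interaction(np.stack([v, 2.0 * v]), W=rng.normal(size=(3, 2)))
print(v.shape, mixed.shape)   # (3, 6) (3, 3, 6)
```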
Network architecture. For our universality proof, we suggest a simple architecture which depends on two positive integer parameters : for given , we will define as the set of functions obtained by recursive convolutions
where and are defined as in equation 8. The output of a function in is in , for some which depends on . We then define to be the self-interaction layers which map to . This choice of and , together with a choice of the number of channels , defines the final network architecture as in equation 2. In the appendix we prove the universality of TFN:
Theorem 2.
For all ,
1. For , every -equivariant polynomial of degree is in .
2. Every continuous -equivariant function can be approximated uniformly on compact sets by functions in
As discussed previously, the linear universality of is guaranteed. Thus proving Theorem 2 amounts to showing that is -spanning. This is done using the sufficient condition for -spanning defined in Lemma 3.
Alternative architecture
The complexity of the TFN network used to construct -equivariant polynomials of degree can be reduced using a simple modification of the convolutional layer in equation 12: we add two parameters to the convolutional layer, which is now defined as:
(13)
With this simple change, we can show that is -spanning even if we only take filters of order and throughout the network. This is shown in Appendix E.
6 Conclusion
In this paper, we have presented a new framework for proving the universality of -equivariant point cloud networks. We used this framework for proving the universality of the TFN model (Thomas et al., 2018; Fuchs et al., 2020), and for devising two additional novel simple universal architectures.
We believe that the framework we developed here will be useful for proving the universality of other -equivariant models for point cloud networks, and other related equivariant models. We note that while the discussion in the paper was limited to point clouds in and to the action of , large parts of it are relevant to many other setups involving -dimensional point clouds with symmetry groups of the form on , where can be any compact topological group.
Acknowledgements
The authors would like to thank Taco Cohen for helpful discussions. N.D. acknowledges the support of Simons Math+X Investigators Award 400837.
References
- Atzmon et al. (2018) Matan Atzmon, Haggai Maron, and Yaron Lipman. Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091, 2018.
- Bogatskiy et al. (2020) Alexander Bogatskiy, Brandon Anderson, Jan T Offermann, Marwah Roussi, David W Miller, and Risi Kondor. Lorentz group equivariant neural network for particle physics. arXiv preprint arXiv:2006.04780, 2020.
- Cohen et al. (2018) Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. arXiv preprint arXiv:1801.10130, 2018.
- Dai & Xu (2013) Feng Dai and Yuan Xu. Approximation theory and harmonic analysis on spheres and balls, volume 23. Springer, 2013.
- Esteves et al. (2018) Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 52–68, 2018.
- Fuchs et al. (2020) Fabian B Fuchs, Daniel E Worrall, Volker Fischer, and Max Welling. SE(3)-Transformers: 3D roto-translation equivariant attention networks. arXiv preprint arXiv:2006.10503, 2020.
- Fulton & Harris (2013) William Fulton and Joe Harris. Representation theory: a first course, volume 129. Springer Science & Business Media, 2013.
- Gilmer et al. (2017) Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.
- Guo et al. (2020) Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep learning for 3d point clouds: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
- Keriven & Peyré (2019) Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. CoRR, abs/1905.04943, 2019. URL http://arxiv.org/abs/1905.04943.
- Kondor (2018) Risi Kondor. N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint arXiv:1803.01588, 2018.
- Kondor et al. (2018a) Risi Kondor, Zhen Lin, and Shubhendu Trivedi. Clebsch–Gordan nets: A fully Fourier space spherical convolutional neural network. In Advances in Neural Information Processing Systems, pp. 10117–10126, 2018a.
- Kondor et al. (2018b) Risi Kondor, Hy Truong Son, Horace Pan, Brandon Anderson, and Shubhendu Trivedi. Covariant compositional networks for learning graphs. arXiv preprint arXiv:1801.02144, 2018b.
- Kraft & Procesi (2000) Hanspeter Kraft and Claudio Procesi. Classical invariant theory, a primer. Lecture Notes, Version, 2000.
- Li et al. (2019) Jiaxin Li, Yingcai Bi, and Gim Hee Lee. Discrete rotation equivariance for point cloud recognition. In 2019 International Conference on Robotics and Automation (ICRA), pp. 7269–7275. IEEE, 2019.
- Maehara & NT (2019) Takanori Maehara and Hoang NT. A simple proof of the universality of invariant/equivariant graph neural networks, 2019.
- Maron et al. (2019a) Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. arXiv preprint arXiv:1905.11136, 2019a.
- Maron et al. (2019b) Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. In International Conference on Learning Representations, 2019b. URL https://openreview.net/forum?id=Syx72jC9tm.
- Maron et al. (2019c) Haggai Maron, Ethan Fetaya, Nimrod Segol, and Yaron Lipman. On the universality of invariant networks. In International conference on machine learning, 2019c.
- Maron et al. (2020) Haggai Maron, Or Litany, Gal Chechik, and Ethan Fetaya. On learning sets of symmetric elements. arXiv preprint arXiv:2002.08599, 2020.
- Morris et al. (2018) Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. arXiv preprint arXiv:1810.02244, 2018.
- Poulenard et al. (2019) Adrien Poulenard, Marie-Julie Rakotosaona, Yann Ponty, and Maks Ovsjanikov. Effective rotation-invariant point cnn with spherical harmonics kernels. In 2019 International Conference on 3D Vision (3DV), pp. 47–56. IEEE, 2019.
- Qi et al. (2017a) Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4, 2017a.
- Qi et al. (2017b) Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108, 2017b.
- Ravanbakhsh (2020) Siamak Ravanbakhsh. Universal equivariant multilayer perceptrons. arXiv preprint arXiv:2002.02912, 2020.
- Segol & Lipman (2019) Nimrod Segol and Yaron Lipman. On universal equivariant set networks. arXiv preprint arXiv:1910.02421, 2019.
- Serviansky et al. (2020) Hadar Serviansky, Nimrod Segol, Jonathan Shlomi, Kyle Cranmer, Eilam Gross, Haggai Maron, and Yaron Lipman. Set2graph: Learning graphs from sets. arXiv preprint arXiv:2002.08772, 2020.
- Thomas et al. (2018) Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219, 2018.
- Wang et al. (2019) Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1–12, 2019.
- Weiler et al. (2018) Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco Cohen. 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data. 2018. URL http://arxiv.org/abs/1807.02547.
- Worrall & Brostow (2018) Daniel Worrall and Gabriel Brostow. Cubenet: Equivariance to 3d rotation and translation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 567–584, 2018.
- Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
- Yarotsky (2018) Dmitry Yarotsky. Universal approximations of invariant maps by neural networks. arXiv preprint arXiv:1804.10306, 2018.
- Zaheer et al. (2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in neural information processing systems, pp. 3391–3401, 2017.
- Zhao et al. (2019) Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, and Federico Tombari. Quaternion equivariant capsule networks for 3d point clouds. arXiv preprint arXiv:1912.12098, 2019.
Appendix A Notation
We introduce some notation for the proofs in the appendices. We use the shortened notation and denote the columns of by . We denote
Appendix B Proofs for Section 3
B.1 -equivariant polynomials are dense
A first step, both in proving denseness of -equivariant polynomials and in the proof used in the next subsection, is the following simple lemma, which shows that translation invariance can be dealt with simply by centralizing the point cloud.
In the following, is some representation of on a finite dimensional real vector space . This induces an action of on by
This is also the action of which we consider, where we have invariance with respect to the translation coordinate. The action of on is defined in equation 1.
Lemma 1.
A function is -equivariant, if and only if there exists a function which is equivariant with respect to the action of on , and
(14)
Proof.
Recall that -equivariance means equivariance and translation invariance. Thus if is -equivariant then equation 14 holds with .
On the other hand, if satisfies equation 14 then we claim it is -equivariant. Indeed, for all , since ,
∎
We now prove denseness of -equivariant polynomials in the space of -equivariant continuous functions (Lemma 1).
Proof of Lemma 1.
Let be a compact set. We need to show that continuous -equivariant functions can be approximated uniformly in by -equivariant polynomials. Let denote the compact set which is the image of under the centralizing map . By Lemma 1, it is sufficient to show that every equivariant continuous function can be approximated uniformly on by a sequence of equivariant polynomials . The argument is now concluded by the following general lemma:
Lemma 2.
Let be a compact group, and let and be continuous representations of on the Euclidean spaces and (by continuous we mean that the maps are jointly continuous). Let be a compact set. Then every equivariant function can be approximated uniformly on by a sequence of equivariant polynomials .
Let be the Haar probability measure associated with the compact group . Let denote the compact set obtained as an image of the compact set under the continuous mapping
Using the Stone-Weierstrass theorem, let be a sequence of (not necessarily equivariant) polynomials which approximate uniformly on . Every degree polynomial induces a -equivariant function
This function is a degree polynomial as well: This is because can be approximated uniformly on by “Riemann Sums” of the form which are degree polynomials, and because degree polynomials are closed in .
Now for all , continuity of the function implies that the operator norm of is bounded uniformly by some constant , and so
∎
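The symmetrization step used above can be mimicked numerically. The sketch below (an illustration we add, with both representations taken to be the standard rotation action on R^3) averages ρ2(g)^{-1} f(ρ1(g) x) over a finite sample of approximately Haar-random rotations; the result is equivariant only up to Monte-Carlo error, which vanishes as the sample grows.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def symmetrize(f, n_samples=2000, seed=0):
    """Monte-Carlo analogue of the symmetrization in the proof: average
    R^T f(R x) over (approximately) Haar-random rotations R. Here both
    representations are the standard action on R^3, so the output is an
    approximately SO(3)-equivariant map R^3 -> R^3."""
    Rs = Rotation.random(n_samples, random_state=seed).as_matrix()
    def f_sym(x):
        return np.mean([R.T @ f(R @ x) for R in Rs], axis=0)
    return f_sym

f = lambda x: np.array([x[0] ** 2, x[1] * x[2], x[2]])   # not equivariant
f_sym = symmetrize(f)

x = np.array([0.3, -1.2, 0.7])
Q = Rotation.random(random_state=7).as_matrix()
# equivariance defect: exactly zero in the limit, small for a finite sample
print(np.linalg.norm(f_sym(Q @ x) - Q @ f_sym(x)))
```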
Proof.
By the -spanning assumption, there exist such that any vector valued polynomial invariant to translations and equivariant to permutations is of the form
(15)
where are linear functions to . If is a matrix-valued polynomial mapping to , which is invariant to translations and equivariant to permutations, then it is of the form , and each is itself invariant to translations and permutation equivariant. It follows that the matrix-valued can also be written in the form of equation 15, the only difference being that the image of the linear functions is now .
Now let be a -equivariant polynomial of degree . It remains to show that we can choose to be equivariant. We do this by a symmetrization argument: denote the Haar probability measure on by , and the action of on and by and respectively. Denote and . For every , we use the equivariance of and to obtain
where stands for the equivariant linear functional from to , defined for by
Thus we have shown that is in for , as required. ∎
Appendix C Proofs for Section 4
Proof.
It is known (Segol & Lipman, 2019, Theorem 2) that polynomials which are -equivariant are spanned by polynomials of the form , defined as
(16)
where and each is a multi-index. It follows that -equivariant polynomials of degree are spanned by polynomials of the form where . Denoting , the sum of all by , and , we see that there exists a linear functional such that
where we recall that is defined in equation 5 as
Thus polynomials which are of degree , and are equivariant, can be written as
where , and is the left inverse of the embedding . If is also translation invariant, then
Thus is -spanning. ∎
Proof.
In this proof we make the dependence of on explicit and denote .
We prove the claim by induction on . Assume . Then contains only the constant function , and this is precisely the function .
Now assume the claim holds for all with , and prove the claim for . Choose for some , we need to show that the function is in . Since we know from the induction hypothesis that this is true if . Now assume . We consider two cases:
1. If , we set . We know that by the induction hypothesis. So there exist such that
(17)
Now choose to be the function whose coordinate , is given by , obtained by setting in equation 7. Then , we have
and so
(18)
and .
2. If . We assume without loss of generality that . Set . As before by the induction hypothesis there exist which satisfy equation 17. This time we choose to be the function whose coordinate , is given by , obtained by setting in equation 7. Then we have
Thus equation 18 holds, and so again we have that .
∎
Proof.
If the conditions in Lemma 3 hold, then since is -spanning, every translation invariant and permutation equivariant polynomial of degree can be written as
where we denote . Thus we proved is -spanning. ∎
Appendix D Proofs for Section 5
Proof.
As mentioned in the main text, this lemma is based on Schur's lemma. This lemma is typically stated for complex representations, but holds for odd-dimensional real representations as well. We recount the lemma and its proof here for completeness (see also (Fulton & Harris, 2013)).
Lemma 1 (Schur’s Lemma for ).
Let be a linear equivariant map. If then . Otherwise is a scalar multiple of the identity.
Proof.
Let be a linear equivariant map. The image and kernel of are invariant subspaces of and , respectively. It follows that if then is a linear isomorphism so necessarily . Now assume . Since the dimension of is odd, has a real eigenvalue . The linear function is equivariant and has a non-trivial kernel, so . ∎
We now return to the proof of Lemma 1. Note that each is linear and equivariant. Next denote the restrictions of each to by , and note that
(19)
By considering vectors in of the form we see that each is linear and -equivariant. Thus by Schur’s lemma, if then for some real , and otherwise . Plugging this into equation 19 we obtain equation 10.
∎
Proof.
As mentioned in the main text, we only need to show that the function space is -spanning. Recall that is obtained by consecutive convolutions with -filters. In general, we denote the space of functions defined by applying consecutive convolutions by .
If is a space of functions from , we denote by the space of all functions of the form
(20)
where are linear functions, are induced by elementwise application, and . This notation is useful because: (i) by Lemma 3 it is sufficient to show that is in for all and all , and because (ii) it enables comparison of the expressive power of function spaces whose elements map to different spaces , since the elements in both map to the same space. In particular, note that if for every there is a and a linear map such that , then .
We now use this abstract discussion to prove some useful results: the first is that for the purpose of this lemma, we can ‘forget about’ the multiplication by a unitary matrix in equation 12, used for decomposition into irreducible representations: To see this, denote by the function space obtained by taking consecutive convolutions with -filters without multiplying by a unitary matrix in equation 12. Since Kronecker products of unitary matrices are unitary matrices, we obtain that the elements in and differ only by multiplication by a unitary matrix, and thus and , so both sets are equal.
Next, we prove that adding convolutional layers (enlarging ) or taking higher order filters (enlarging ) can only increase the expressive power of a network.
Lemma 2.
For all ,
1.
2.
Proof.
The first claim follows from the fact that every function in can be identified with a function in by taking the convolutional layer in equation 12 with .
The second claim follows from the fact that -filters can be identified with -filters whose -th entry is . ∎
The last preliminary lemma we will need is
Lemma 3.
For every , and every , if , then the function defined by
is in .
Proof.
This lemma is based on the fact that the space of homogeneous polynomials on is spanned by polynomials of the form for (Dai & Xu, 2013). For each such , and , these polynomials can be realized by filters by setting so that
For every and , we can construct a -filter where are as defined above and the other filters are zero. Since both the entries of , and the entries of , span the space of -homogeneous polynomials on , it follows that there exists a linear mapping so that
(21)
As a final preliminary, we note that -filters can perform an averaging operation by setting and in equation 11 and equation 12. We call this -filter an averaging filter.
We are now ready to prove our claim: we need to show that for every where , for every , the function is in . Note that due to the inclusion relations in Lemma 2 it is sufficient to prove this for the case . We prove this by induction on . For , vectors contains only zeros and so
We now assume the claim is true for all with and prove the claim is true for . We need to show that for every the function is in . We prove this yet again by induction, this time on the value of : assume that and . Denote by the vector in defined by
By the induction assumption on , we know that and so
is in by Lemma 3, which is contained in by Lemma 2. Since has zero mean, while does not depend on since , applying an averaging filter to gives us a constant value in each coordinate , and so is in .
Now assume the claim is true for all which sum to and whose first coordinate is smaller than some ; we now prove the claim is true when the first coordinate of is equal to . The vector obtained from by removing the first coordinate sums to , and so by the induction hypothesis on we know that .
where the additional terms are linear combinations of functions of the form where and their first coordinate is smaller than , and is a permutation. By the induction hypothesis on , each such is in . It follows that , and thus , are in as well. This concludes the proof of Theorem 2.
∎
Appendix E Alternative TFN architecture
In this appendix we show that replacing the standard TFN convolutional layer with the layer defined in equation 13:
we can obtain -spanning networks using consecutive convolutions with -filters (that is, filters in , where ). Our discussion here is somewhat informal, meant to provide the general ideas without delving into the details as we have done for the standard TFN architecture in the proof of Theorem 2. At the end of our discussion we will explain what is necessary to make this argument completely rigorous.
We will only need two fixed filters for our argument here: The first is the -filter defined by setting and to obtain
The second is the filter defined by setting and , so that
We prove our claim by showing that a pair of convolutions with -filters can construct any convolutional layer defined in equation 7 for the -spanning architecture using tensor representations. The claim then follows from the fact that convolutions of the latter architecture suffice for achieving -spanning, as shown in Lemma 2.
Convolutions for tensor representations, defined in equation 7, are composed of two terms:
To obtain the first term , we set in equation 13 and obtain (the decomposition into irreducibles of) . Thus this term can in fact be expressed by a single convolution. We can leave this outcome unchanged by a second convolution, defined by setting .
To obtain the second term , we apply a first convolution with , to obtain
By applying an additional averaging filter, defined by setting , we obtain . This concludes our ‘informal proof’.
Our discussion here has been somewhat inaccurate, since in practice and . Moreover, in our proof we have glossed over the multiplication by the unitary matrix used to obtain the decomposition into irreducible representations. However, the ideas discussed here can be used to show that convolutions with -filters can satisfy the sufficient condition for -spanning defined in Lemma 3. See our treatment of Theorem 2 for more details.
Appendix F Comparison with original TFN paper
In this Appendix we discuss three superficial differences between the presentation of the TFN architecture in Thomas et al. (2018) and our presentation here:
1. We define convolutional layers between features residing in direct sums of irreducible representations, while (Thomas et al., 2018) focuses on features which inhabit a single irreducible representation. This difference is non-essential, as direct sums of irreducible representations can be represented as multiple channels where each feature inhabits a single irreducible representation.
2.
3. We take the scalar functions to be polynomials, while (Thomas et al., 2018) take them to be fully connected networks composed with radial basis functions. Using polynomial scalar bases is convenient for our presentation here since it enables exact expression of equivariant polynomials. Replacing polynomial bases with fully connected networks, we obtain approximation of equivariant polynomials instead of exact expression. It can be shown that if is a -equivariant polynomial which can be expressed by some network defined with filters coming from a polynomial scalar basis, then can be approximated on a compact set , up to an arbitrary error, by a similar network with scalar functions coming from a sufficiently large fully connected network.
Appendix G Tensor Universality
In this section we show how to construct the complete set of linear invariant functionals from to . Since each such functional is of the form
where each is -invariant, it is sufficient to characterize all linear -invariant functionals .
It will be convenient to denote
We achieve our characterization using the bijective correspondence between linear functionals and multi-linear functions: each such corresponds to a unique , such that
(22)
where denote the standard basis elements of . We define a spanning set of equivariant linear functionals on via a corresponding characterization for multi-linear functionals on . Specifically, set
For we define a multi-linear functional:
(23)
and for we define
(24)
Proposition 1.
We note that (i) equation 22 provides a (cumbersome) way to compute all linear invariant functionals explicitly by evaluating the corresponding on the elements of the standard basis and (ii) the set is spanning, but is not linearly independent. For example, since , the space of invariant functionals on is one dimensional while .
Proof of Proposition 1.
We first show that the bijective correspondence between linear functionals and multi-linear functions extends to a bijective correspondence between -invariant linear/multi-linear functionals. The action of on is defined by
The action of on is such that the map
is -equivariant. It follows that if and satisfy equation 22, then for all , the same equation holds for the pair and . Thus -invariance of is equivalent to -invariance of .
Multi-linear functionals on invariant to are a subset of the set of polynomials on invariant to . It is known (see Kraft & Procesi (2000), page 114) that all such polynomials are algebraically generated by functions of the form
Equivalently, -invariant polynomials are spanned by linear combinations of polynomials of the form
(25)
When considering the subset of multi-linear invariant polynomials, we see that they must be spanned by polynomials as in equation 25, where each appears exactly once in each polynomial in the spanning set. These precisely correspond to the functions in .
∎
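As a numerical sanity check of the classical generators quoted above (assuming, as in the first fundamental theorem for SO(3), that they are the pairwise inner products and the 3 × 3 determinants of triples of points), the following verifies their rotation invariance on random data.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                       # four points x_1, ..., x_4
R = Rotation.random(random_state=1).as_matrix()
Y = R @ X

gram = lambda Z: Z.T @ Z                          # all pairwise inner products
dets = lambda Z: [np.linalg.det(Z[:, [i, j, k]])  # all 3x3 determinants of triples
                  for i in range(4) for j in range(4) for k in range(4)]

assert np.allclose(gram(Y), gram(X))              # <R x_i, R x_j> = <x_i, x_j>
assert np.allclose(dets(Y), dets(X))              # det(R) = 1 preserves determinants
print("inner products and determinants are SO(3)-invariant")
```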