Finding symmetry-breaking Order Parameters with Euclidean Neural Networks
Abstract
Curie’s principle states that “when effects show certain asymmetry, this asymmetry must be found in the causes that gave rise to them”. We demonstrate that symmetry equivariant neural networks uphold Curie’s principle and can be used to articulate many symmetry-relevant scientific questions into simple optimization problems. We prove these properties mathematically and demonstrate them numerically by training a Euclidean symmetry equivariant neural network to learn symmetry-breaking input to deform a square into a rectangle and to generate octahedra tilting patterns in perovskites.
Machine learning techniques such as neural networks are data-driven methods for building models that have been successfully applied to many areas of physics, such as quantum matter, particle physics, and cosmology Carleo et al. (2019); Schütt et al. (2020).
All machine learning models can be abstracted as a function $f$ parameterized by learnable weights $W$ that maps one vector space $X$ to another vector space $Y$, i.e. $f: X \to Y$. The weights are updated by using a loss function $L$ which evaluates the performance of the model. In the case of neural networks, which are differentiable models, the weights are updated using the gradients of the loss with respect to the weights, $W \mapsto W - \epsilon\, \partial L / \partial W$, where $\epsilon$ is the learning rate.
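As a minimal illustration of this update rule (a generic PyTorch sketch, not the specific training loop used in this work):

```python
import torch

# Toy differentiable model f: R^3 -> R^3 with learnable weights W.
model = torch.nn.Linear(3, 3)
loss_fn = torch.nn.MSELoss()

x = torch.randn(8, 3)           # inputs
y_true = torch.randn(8, 3)      # targets
epsilon = 1e-2                  # learning rate

loss = loss_fn(model(x), y_true)
loss.backward()                 # gradients of the loss w.r.t. every weight
with torch.no_grad():
    for W in model.parameters():
        W -= epsilon * W.grad   # W <- W - epsilon * dL/dW
        W.grad = None
```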
An important consideration for enhancing the performance and interpretability of these “black box” models when used for physics is how to incorporate axioms of symmetry Behler and Parrinello (2007); Bartók et al. (2013); Schütt et al. (2018); Thomas et al. (2018); Kondor et al. (2018); Weiler et al. (2018); Anderson et al. (2019); Grisafi et al. (2018). Building symmetry into the model prevents the model from learning unphysical bias and can lead to new capabilities for investigating physical systems.
Symmetry invariant models only operate on invariant quantities, i.e. scalars, while symmetry equivariant models can preserve equivariant transformations, e.g. a change of coordinate system. A function $f: X \to Y$ is equivariant under a symmetry group $G$ if and only if the group action commutes with the function, i.e. $f(D_X(g)\,x) = D_Y(g)\,f(x)$ for the group representations $D_X$ and $D_Y$ acting on the vector spaces $X$ and $Y$, respectively, $g \in G$, and $x \in X$. While equivariance is more general, invariant models are easier to build; most present-day symmetry-aware models are invariant. However, only symmetry equivariant models can fully express the richness of symmetry-related phenomena of physical systems, e.g. degeneracy and symmetry breaking.
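A small numerical check of this definition, using a toy rotation-equivariant map (not one of the networks used in this work):

```python
import numpy as np

# Toy rotation-equivariant map f: R^3 -> R^3 (scales a point by its norm).
def f(x):
    return np.linalg.norm(x) * x

# Representation of a group element g in SO(3): a rotation matrix acting on X and Y.
theta = 0.7
D_g = np.array([[np.cos(theta), -np.sin(theta), 0.],
                [np.sin(theta),  np.cos(theta), 0.],
                [0., 0., 1.]])

x = np.array([1.0, 2.0, -0.5])
assert np.allclose(f(D_g @ x), D_g @ f(x))   # equivariance: f(D_X(g) x) = D_Y(g) f(x)
```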
Identifying sources of symmetry breaking is an essential technique for understanding complex physical systems. Many discoveries in physics have been made when symmetry implied something was missing (e.g. the first postulation of the neutrino by Pauli Brown (1978)); many physical phenomena are now understood to be consequences of symmetry breaking Anderson (1972): the mechanism that generates mass Englert and Brout (1964); Higgs (1964); Guralnik et al. (1964), superconductivity Bardeen et al. (1957); Nambu (1960), and phase transitions leading to ferroelectricity Landau (1937).
In this Letter, we show how symmetry equivariant models can perform symmetry-related tasks without the conventional tools of symmetry analysis (e.g. character tables and related subgroup conventions). Using these networks, we can pose symmetry-related scientific questions as simple optimization problems without using explicit knowledge of the subgroup symmetry of the input or output. These networks can e.g. identify when data (input and output) are not compatible by symmetry, recover missing symmetry-breaking information, find symmetry-intermediate solutions between a given input and target output, and build symmetry-compatible models from limited data.
These applications are possible due to two properties of symmetry equivariant neural networks that we prove in this Letter: (1) Symmetry equivariant functions exhibit Curie’s Principle Curie (1894); Chalmers (1970); (2) Gradients of an invariant loss function acting on both the network and target outputs can be used to recover the form (representation) of symmetry-breaking information missing from the network input.
We organize this Letter as follows: First, we provide background on symmetry equivariant neural networks. Second, we prove the symmetry properties of the output and gradients of Euclidean symmetry equivariant neural networks and demonstrate them numerically by training a Euclidean neural network to deform a square into a rectangle. Third, we use this technique on a more complex physical example, octahedral tilting in perovskites.
Euclidean neural networks are a general class of networks that have been explored by multiple groups Thomas et al. (2018); Kondor et al. (2018); Weiler et al. (2018) and that build on previous work incorporating equivariance into convolutional neural networks Worrall et al. (2017); Kondor and Trivedi (2018); Cohen et al. (2019).
The success of convolutional neural networks at a variety of tasks is due to their translation equivariance (e.g. a pattern can be identified in any location). Euclidean neural networks are a subset of convolutional neural networks whose filters are constrained to be equivariant to 3D rotations. To accomplish this, the filter functions are defined to be separable into a learned radial function and real spherical harmonics, $F(\vec{r}) = R(|\vec{r}|)\,Y_{lm}(\hat{r})$, analogous to the separable nature of the hydrogenic wavefunctions.
An additional consequence of Euclidean equivariance is that all “tensors” in a Euclidean neural network are geometric tensors, and input and filter geometric tensors must be combined according to the rules of tensor algebra, using Clebsch-Gordan coefficients or Wigner 3j symbols (they are equivalent) to contract representation indices. We express these geometric tensors in an irreducible representation basis. The only convention in these networks is the choice of basis for the irreducible representations of $SO(3)$, which dictates the spherical harmonics and Wigner 3j symbols we use.
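As a sketch of such a contraction, the snippet below combines two degree-1 features into degree-0, 1, and 2 outputs; it assumes the e3nn package exposes `o3.wigner_3j`, which may differ between library versions:

```python
import torch
from e3nn import o3   # assumes a recent e3nn release exposing o3.wigner_3j

# Two degree-1 (vector) features on the same point.
a = torch.randn(3)
b = torch.randn(3)

# Combine them into output irreps of degree 0, 1, and 2 by contracting with
# Wigner 3j symbols -- the equivariant analogue of taking an outer product.
for l_out in (0, 1, 2):
    w3j = o3.wigner_3j(l_out, 1, 1)                   # shape (2*l_out + 1, 3, 3)
    out = torch.einsum("kij,i,j->k", w3j, a, b)
    print(l_out, out)
```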
In our experiments, we use geometric tensors to express spatial functions; specifically, the coefficients resulting from projecting a local point cloud onto the spherical harmonics. We treat a local point cloud around a chosen origin as a set of $\delta$ functions and evaluate the spherical harmonics at the corresponding angles (up to some maximum degree $l_{\max}$). Then, we weigh the spherical harmonic projection of each point by its radial distance $r_i$ from the origin,
$$f(\hat{r}) = \sum_{i} r_i \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} Y_{lm}(\hat{r}_i)\, Y_{lm}(\hat{r}). \qquad (1)$$
The coefficients of this projection form a geometric tensor in the irreducible basis. We interpret the magnitude of the resulting function on the sphere as a radial distance from the origin. We additionally re-scale this signal to account for finite basis effects, ensuring the maximum of the function corresponds to the original radial distance $r_i$.
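A minimal sketch of the projection in Eq. (1), assuming e3nn provides `o3.spherical_harmonics` with a `component` normalization option (an assumption about the library version); the finite-basis rescaling described above is omitted:

```python
import torch
from e3nn import o3   # assumes o3.spherical_harmonics is available in this e3nn version

def project_point_cloud(points, lmax):
    """Spherical-harmonic projection of a point cloud about the origin (Eq. 1)."""
    radii = points.norm(dim=1, keepdim=True)           # r_i
    sh = o3.spherical_harmonics(list(range(lmax + 1)), points,
                                normalize=True,         # evaluate Y_lm on the unit directions
                                normalization="component")
    return (radii * sh).sum(dim=0)                      # sum over points, weighted by r_i

# Vertices of a square in the xy plane.
square = torch.tensor([[1., 0., 0.], [0., 1., 0.], [-1., 0., 0.], [0., -1., 0.]])
coeffs = project_point_cloud(square, lmax=4)            # geometric tensor with irreps l = 0..4
```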
First, we prove that symmetry equivariant functions obey “Curie’s principle”. For a group $G$, a vector space $V$, and a representation $D: G \to GL(V)$, the symmetry group of $x \in V$ is defined as
$$\mathrm{Sym}_G(x) \equiv \{\, g \in G \mid D(g)\, x = x \,\}. \qquad (2)$$
Let $f: X \to Y$ be equivariant to group $G$. “Curie’s principle” can then be articulated as
$$\mathrm{Sym}_G(x) \subseteq \mathrm{Sym}_G(f(x)). \qquad (3)$$
Proof: For $g \in \mathrm{Sym}_G(x)$ (i.e. $D_X(g)\, x = x$),
$$D_Y(g)\, f(x) = f(D_X(g)\, x) \qquad (4)$$
$$= f(x), \qquad (5)$$
so $g \in \mathrm{Sym}_G(f(x))$.
According to Eqn. 3, since Euclidean neural networks are equivariant to Euclidean symmetry, the symmetry of the output can only be of equal or higher symmetry than the input. This implies that the network will also preserve any subgroup of Euclidean symmetry, e.g. point groups and space groups.
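The following toy example (a sketch, not using our networks) illustrates Eq. 3 numerically: a rotation that stabilizes the input point set also stabilizes the output of an equivariant map.

```python
import numpy as np

# Toy rotation-equivariant map applied pointwise: scale each point by its norm.
def f(points):
    return np.array([np.linalg.norm(p) * p for p in points])

# Input with four-fold symmetry: vertices of a square of "radius" 2 in the xy plane.
square = 2.0 * np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0]], dtype=float)

# A 90-degree rotation about z is in Sym(square): it only permutes the vertices.
C4 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])

def same_point_set(a, b):
    """Order-insensitive comparison of two point sets."""
    return all(any(np.allclose(p, q) for q in b) for p in a)

assert same_point_set(square @ C4.T, square)          # C4 stabilizes the input...
assert same_point_set(f(square) @ C4.T, f(square))    # ...and therefore the output (Eq. 3)
```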
To demonstrate this, we train Euclidean neural networks to deform two arrangements of points in the plane into one another, one with four points at the vertices of a square, and another with four points at the vertices of a rectangle, shown as blue and orange points in Figure 1.
To conduct our experiments, we use the e3nn framework Geiger et al. (2020) for 3D Euclidean equivariant neural networks, written with PyTorch Paszke et al. (2019). The Jupyter Kluyver et al. (2016) notebooks used for running the experiments and creating the figures for this Letter are made available at Ref. Smidt (2020).
We train each network to match the spherical harmonic projection of the desired displacement vector i.e. final point location. As we will show, this representation is helpful for identifying degeneracies when they arise.
First, we train a Euclidean neural network to deform the rectangle into the square. This network is able to accomplish this quickly and accurately. Second, we train another Euclidean neural network to deform the square into the rectangle. No matter the amount of training, this network cannot accurately perform the desired task.
In Figure 1, we show output of the trained networks for both cases. On the right, we see that the model trained to deform the square into the rectangle produces symmetric spherical harmonic signals, each with two maxima. Due to being rotation equivariant, the network cannot distinguish distorting the square to form a rectangle aligned along the $x$ axis from a rectangle aligned along the $y$ axis. The model automatically weighs symmetrically degenerate possibilities equally. By Eqn. 3, the output of the network has to have equal or higher symmetry than the input.
We emphasize here that the network does not “know” the symmetry of the inputs; the network predicts a degenerate answer simply because it is constrained to be equivariant. This is analogous to how physical systems operate and why physical systems exhibit “Curie’s principle”.
Having a dataset where the “inputs” have higher symmetry than the “outputs” implies there is missing data – an asymmetry waiting to be discovered. In the context of phase transitions as described by Landau theory Landau (1937), symmetry-breaking factors are called order parameters. To update its weights, a neural network is required to be differentiable, such that gradients of the loss can be taken with respect to every parameter in the model. This technique can be extended to the input; we use this approach to recover symmetry-breaking order parameters.
To prove that this is possible, we must show that the gradients of a $G$-invariant scalar loss $L$ (such as the MSE loss) evaluated on the output of a $G$-equivariant neural network $f(x)$ and ground truth data $y_{\text{true}}$, e.g. $\partial L(f(x), y_{\text{true}})/\partial x$, can have lower symmetry than the input $x$.
The symmetry of the combined inputs to the invariant loss function is equal to or higher than the intersection of the symmetries of the predicted and ground truth outputs
$$\mathrm{Sym}_G(f(x)) \cap \mathrm{Sym}_G(y_{\text{true}}) \subseteq \mathrm{Sym}_G\big((f(x),\, y_{\text{true}})\big). \qquad (6)$$
Proof: For $g \in \mathrm{Sym}_G(f(x)) \cap \mathrm{Sym}_G(y_{\text{true}})$,
$$D(g)\,(f(x),\, y_{\text{true}}) = \big(D_Y(g)\, f(x),\; D_Y(g)\, y_{\text{true}}\big) = (f(x),\, y_{\text{true}}). \qquad (7)$$
Furthermore, if $L$ is a differentiable and invariant function $L: Y \to \mathbb{R}$, then its gradient $\partial L / \partial y$ is equivariant to $G$ by the equivariance of differentiation,
$$\frac{\partial L}{\partial y}\bigg|_{D_Y(g)\, y} = D_Y(g)\, \frac{\partial L}{\partial y}\bigg|_{y}. \qquad (8)$$
Thus, if the symmetry of the ground truth output is lower than the input to the network, the gradients can have symmetry lower than the input, allowing for the use of gradients to update the input to the network to make the network input and output symmetrically compatible. This procedure can be used to find symmetry-breaking order parameters missing in the original data but implied by symmetry.
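Eq. 8 can be checked numerically with a toy invariant loss (a sketch; the loss and rotation below are illustrative only and not the losses used in our experiments):

```python
import math
import torch

# Rotation-invariant toy loss: depends only on the norm of y.
def loss(y):
    return y.pow(2).sum() ** 2

c, s = math.cos(0.3), math.sin(0.3)
D_g = torch.tensor([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])   # rotation about z

y = torch.randn(3, requires_grad=True)
grad_at_y = torch.autograd.grad(loss(y), y)[0]                # dL/dy at y

y_rot = (D_g @ y.detach()).requires_grad_(True)
grad_at_rot = torch.autograd.grad(loss(y_rot), y_rot)[0]      # dL/dy at D(g) y

# Eq. 8: gradients of an invariant loss transform equivariantly.
assert torch.allclose(grad_at_rot, D_g @ grad_at_y, atol=1e-4)
```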
Now, we demonstrate that the symmetry properties of Euclidean neural networks expressed in Eqns. 6 and 8 can be used to learn symmetry-breaking order parameters to deform the square into the rectangle.
In this task, we allow additional inputs on each point consisting of irreps of several degrees and both parities, where the number denotes the irrep degree and the subscript denotes even ($e$) or odd ($o$) parity (e.g. $1_o$ transforms as a vector and $1_e$ transforms as a pseudovector). These inputs are initialized to zero and we modify the training procedure. We require the input to be the same on each point, such that we learn a “global” order parameter. We also add a component-wise mean absolute error (MAE) loss on the input features to encourage sparsity. We train the network in the coordinate frame that matches the conventions of point group tables.
We first train the model normally until the loss no longer improves. Then we alternate between updating the parameters of the model and updating the input using gradients of the loss. As the loss converges, we find that the input consists of four non-zero order parameters, all of which transform in the same way as $x^2 - y^2$ and differ only in the degree of the irrep to which they belong. See Fig. 2 for images of the evolution of the input and output signals during the model and order parameter optimization process. The order parameters distinguish the $x$ direction from the $y$ direction while maintaining the full symmetry of the rectangle.
Our optimization returns four order parameters rather than one because the gradients cannot break the degeneracy between these equally valid order parameters. To recover only e.g. the degree-2 ($x^2 - y^2$) order parameter using Euclidean neural networks, we can do one of two things to break this degeneracy: limit the possible input order parameters (e.g. to irreps of degree $l \leq 2$) or add a loss that penalizes higher-degree order parameters. Thus, Euclidean neural networks can recover both the most general order parameters (including degeneracies) and more constrained order parameters, e.g. by using a custom loss function.
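A schematic of this alternating optimization (a simplified PyTorch sketch; `model`, `geometry`, and the flat `order_param` tensor are toy stand-ins for the Euclidean network and the irrep-structured inputs described above):

```python
import torch

torch.manual_seed(0)
num_points, feat_dim, op_dim = 4, 3, 6
geometry = torch.randn(num_points, feat_dim)              # fixed, known input
target = torch.randn(num_points, feat_dim)                # ground-truth output
model = torch.nn.Linear(feat_dim + op_dim, feat_dim)      # toy stand-in for the equivariant network

# Learnable symmetry-breaking input, initialized to zero and shared by every point.
order_param = torch.zeros(op_dim, requires_grad=True)
opt_model = torch.optim.Adam(model.parameters(), lr=1e-2)
opt_input = torch.optim.Adam([order_param], lr=1e-2)
mse = torch.nn.MSELoss()

for step in range(2000):
    x = torch.cat([geometry, order_param.expand(num_points, -1)], dim=-1)
    loss = mse(model(x), target) + 1e-2 * order_param.abs().mean()   # MAE term encourages sparsity
    if step < 1000:                                    # first fit the model weights alone...
        opt = opt_model
    else:                                              # ...then alternate weight and input updates
        opt = opt_model if (step // 50) % 2 == 0 else opt_input
    opt.zero_grad()
    loss.backward()
    opt.step()
```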
To arrive at this conclusion from the perspective of a conventional symmetry analysis: First, the symmetry of the square and the rectangle must be identified as point group $4/mmm$ ($D_{4h}$) and point group $mmm$ ($D_{2h}$), respectively. Second, the lost symmetries need to be enumerated; going from the square to the rectangle, 8 symmetry operations are lost – two four-fold rotations, two two-fold rotations, two improper four-fold rotations, and two mirror reflections. Then, the character table for the point group $4/mmm$ is used to find which direct sum of irreps breaks these symmetries. In this case, there is one 1-dimensional irreducible representation of $4/mmm$ that breaks all these symmetries, $B_{1g}$. The character table additionally lists that this irrep has a basis function of $x^2 - y^2$ in the coordinate system with $z$ along the highest symmetry axis and $x$ and $y$ aligned with two of the mirror planes. Character tables typically only report basis functions up to degree $l = 2$, so the higher-degree order parameters recovered above are not listed, but one can confirm with simple calculations that they also transform as $B_{1g}$. This conventional approach becomes more involved for objects with more complicated symmetry. In such cases, it is standard practice to employ computer algorithms to find e.g. relevant isotropy subgroups. However, many databases and tools for performing conventional symmetry analyses are not open source, making them difficult to incorporate into specific efforts.
Now, we demonstrate this method on a more complicated example and use it to find symmetrically intermediate structures. Perovskite crystal structures are composed of octahedrally coordinated transition metal sites on a cubic lattice where the octahedra share corner vertices (Figure 3). Perovskites display a wealth of exotic electronic and magnetic phenomena, the onset of which is often accompanied by a structural distortion of the parent structure in space group $Pm\bar{3}m$ (221), caused by the softening of phonon modes or the onset of magnetic orders Benedek and Fennie (2013).
The octahedra in perovskites can distort in a variety of ways, one of which is by developing tilting patterns, commonly classified using the Glazer notation introduced in Ref. Glazer (1972). Using the same procedure as in the previous section, we recover the order parameters for two perovskite structures with different octahedral tilting. We use periodic boundary conditions to treat the crystal as an infinite periodic solid.
For this demonstration we compare the parent structure in $Pm\bar{3}m$ (221) to the structure in the subgroup $Pnma$ (62). We use the same training procedure as above except for the following: we only apply order parameters to the octahedrally coordinated $B$ sites and we allow each site to have its own order parameter. We also add a penalty that increases with the irrep degree $l$ of the candidate order parameter.
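A sketch of such a degree-weighted penalty (the block layout of per-site order parameters below is a hypothetical stand-in for the irrep-structured features used in our experiments):

```python
import torch

def degree_weighted_l1(order_params, degrees):
    """Sparsity penalty that grows with irrep degree l.

    order_params: list of tensors, one block per irrep (num_sites, 2*l + 1).
    degrees: list of the corresponding irrep degrees l.
    """
    penalty = 0.0
    for block, l in zip(order_params, degrees):
        penalty = penalty + (1.0 + l) * block.abs().mean()   # higher l -> larger penalty
    return penalty

# Example: per-site order parameters with degrees l = 0, 1, 2 on four B sites.
blocks = [torch.randn(4, 2 * l + 1, requires_grad=True) for l in (0, 1, 2)]
loss_penalty = degree_weighted_l1(blocks, [0, 1, 2])
```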
From training, the model is able to recover that each $B$ site has a non-trivial pseudovector order parameter of equal magnitude; this can be intuitively interpreted as a different rotation axis with an equal rotation angle for each site. The pattern of rotation axes and the corresponding octahedral tilting are described and shown in Figure 3. Consulting tabulated irreps of space group $Pm\bar{3}m$, we can confirm that this pattern of pseudovectors matches the irreps recovered in Ref. Howard and Stokes (1998). In contrast to conventional symmetry analysis, our method provides a clearer geometric interpretation of these order parameters as rotation axes. Additionally, the same model can be used both to determine the form of the order parameter and to build a model that predicts the amplitude of this distortion, e.g. based on composition and the parent structure.
We can also learn to produce output that is symmetrically intermediate between the input and the ground truth output by restricting the learnable order parameters. If we train an identical model, but constrain the pseudovector order parameters to be zero along one of the crystal axes and constrain non-adjacent $B$ sites to have identical order parameters, we recover an intermediate structure in space group $Imma$ (74), described and shown in Figure 3.
In contrast to conventional symmetry analysis which requires classifying the symmetry of given systems, we perform symmetry analyses with Euclidean neural networks by learning equivariant mappings. This allows us to gain symmetry insights using standard neural network training procedures. Our methods do not rely on any tabulated information and can be directly applied to tensor fields of arbitrary complexity.
Symmetry equivariant neural networks act as “symmetry compilers”: they can only fit data that is symmetrically compatible and can be used to help find symmetry-breaking order parameters necessary for compatibility. The properties proven in this Letter generalize to any symmetry-equivariant network and are relevant to any branch of physics using symmetry-aware machine learning models to create surrogate or generative models of physical systems. The same procedures demonstrated in this Letter can be used to find order parameters of other physical systems, e.g. missing environmental parameters of an experimental setup (such as anisotropy in the magnetic field of an accelerator magnet) or identifying other symmetry-implied information unbeknownst to the researcher.
Acknowledgements
T.E.S. thanks Sean Lubner, Josh Rackers, Sinéad Griffin, Robert Littlejohn, James Sethian, Tamara Kolda, Frank Noé, Bert de Jong, and Christopher Sutton for helpful discussions. T.E.S. and M.G. were supported by the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory and B.K.M. was supported by CAMERA, both under U.S. Department of Energy Contract No. DE-AC02-05CH11231.

References
- Carleo et al. (2019) G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Rev. Mod. Phys. 91, 045002 (2019), URL https://link.aps.org/doi/10.1103/RevModPhys.91.045002.
- Schütt et al. (2020) K. T. Schütt, S. Chmiela, O. A. von Lilienfeld, A. Tkatchenko, K. Tsuda, and K.-R. Müller, eds., Machine Learning Meets Quantum Physics (Springer International Publishing, 2020), URL https://doi.org/10.1007/978-3-030-40245-7.
- Behler and Parrinello (2007) J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007), URL https://link.aps.org/doi/10.1103/PhysRevLett.98.146401.
- Bartók et al. (2013) A. P. Bartók, R. Kondor, and G. Csányi, Phys. Rev. B 87, 184115 (2013), URL https://link.aps.org/doi/10.1103/PhysRevB.87.184115.
- Schütt et al. (2018) K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, The Journal of Chemical Physics 148, 241722 (2018), eprint https://doi.org/10.1063/1.5019779, URL https://doi.org/10.1063/1.5019779.
- Thomas et al. (2018) N. Thomas, T. E. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley, Tensor field networks: Rotation- and translation-equivariant neural networks for 3d point clouds (2018), eprint arXiv:1802.08219.
- Kondor et al. (2018) R. Kondor, Z. Lin, and S. Trivedi, in Advances in Neural Information Processing Systems 32 (2018), pp. 10117–10126.
- Weiler et al. (2018) M. Weiler, M. Geiger, M. Welling, W. Boomsma, and T. Cohen, in Advances in Neural Information Processing Systems 32 (2018), pp. 10402–10413.
- Anderson et al. (2019) B. Anderson, T. S. Hy, and R. Kondor, in Advances in Neural Information Processing Systems (2019), pp. 14537–14546.
- Grisafi et al. (2018) A. Grisafi, D. M. Wilkins, G. Csányi, and M. Ceriotti, Phys. Rev. Lett. 120, 036002 (2018), URL https://link.aps.org/doi/10.1103/PhysRevLett.120.036002.
- Brown (1978) L. M. Brown, Physics Today 31, 23 (1978), URL https://doi.org/10.1063/1.2995181.
- Anderson (1972) P. W. Anderson, Science 177, 393 (1972), ISSN 0036-8075, URL https://science.sciencemag.org/content/177/4047/393.
- Englert and Brout (1964) F. Englert and R. Brout, Phys. Rev. Lett. 13, 321 (1964), URL https://link.aps.org/doi/10.1103/PhysRevLett.13.321.
- Higgs (1964) P. W. Higgs, Phys. Rev. Lett. 13, 508 (1964), URL https://link.aps.org/doi/10.1103/PhysRevLett.13.508.
- Guralnik et al. (1964) G. S. Guralnik, C. R. Hagen, and T. W. B. Kibble, Phys. Rev. Lett. 13, 585 (1964), URL https://link.aps.org/doi/10.1103/PhysRevLett.13.585.
- Bardeen et al. (1957) J. Bardeen, L. N. Cooper, and J. R. Schrieffer, Phys. Rev. 108, 1175 (1957), URL https://link.aps.org/doi/10.1103/PhysRev.108.1175.
- Nambu (1960) Y. Nambu, Phys. Rev. 117, 648 (1960), URL https://link.aps.org/doi/10.1103/PhysRev.117.648.
- Landau (1937) L. Landau, Zh. Eksp. Teor. Fiz. 7, 19 (1937).
- Curie (1894) P. Curie, Journal de Physique Théorique et Appliquée 3, 393 (1894), URL https://doi.org/10.1051/jphystap:018940030039300.
- Chalmers (1970) A. F. Chalmers, The British Journal for the Philosophy of Science 21, 133 (1970).
- Worrall et al. (2017) D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).
- Kondor and Trivedi (2018) R. Kondor and S. Trivedi, in Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, edited by J. G. Dy and A. Krause (PMLR, 2018), vol. 80 of Proceedings of Machine Learning Research, pp. 2752–2760, URL http://proceedings.mlr.press/v80/kondor18a.html.
- Cohen et al. (2019) T. S. Cohen, M. Geiger, and M. Weiler, in Advances in Neural Information Processing Systems 32, edited by H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (2019), pp. 9142–9153.
- Geiger et al. (2020) M. Geiger, T. E. Smidt, B. K. Miller, W. Boomsma, K. Lapchevskyi, M. Weiler, M. Tyszkiewicz, and J. Frellsen, github.com/e3nn/e3nn (2020), URL https://doi.org/10.5281/zenodo.3723557.
- Paszke et al. (2019) A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., in Advances in Neural Information Processing Systems 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’ Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019), pp. 8024–8035.
- Kluyver et al. (2016) T. Kluyver, B. Ragan-Kelley, F. Pérez, B. Granger, M. Bussonnier, J. Frederic, K. Kelley, J. Hamrick, J. Grout, S. Corlay, et al., in Positioning and Power in Academic Publishing: Players, Agents and Agendas, edited by F. Loizides and B. Schmidt (IOS Press, 2016), pp. 87 – 90.
- Smidt (2020) T. E. Smidt, Code repository for “Finding Symmetry Breaking Order Parameters with Euclidean Neural Networks, https://github.com/blondegeek/e3nn_symm_breaking (2020), URL https://doi.org/10.5281/zenodo.4087189.
- Benedek and Fennie (2013) N. A. Benedek and C. J. Fennie, The Journal of Physical Chemistry C 117, 13339 (2013), URL https://doi.org/10.1021/jp402046t.
- Howard and Stokes (1998) C. J. Howard and H. T. Stokes, Acta Crystallographica Section B Structural Science 54, 782 (1998), URL https://doi.org/10.1107/s0108768198004200.
- Glazer (1972) A. M. Glazer, Acta Crystallographica Section B Structural Crystallography and Crystal Chemistry 28, 3384 (1972), URL https://doi.org/10.1107/s0567740872007976.