Polynomial Approximation of Symmetric Functions
Abstract
We study the polynomial approximation of symmetric multivariate functions and of multi-set functions. Specifically, we consider $f(x_1, \dots, x_N)$, where $x_i \in \mathbb{R}^d$, and $f$ is invariant under permutations of its $N$ arguments. We demonstrate how these symmetries can be exploited to improve the cost versus error ratio in a polynomial approximation of the function $f$, and in particular study the dependence of that ratio on $N$ and the polynomial degree. These results are then used to construct approximations and prove approximation rates for functions defined on multi-sets, where $N$ becomes a parameter of the input.
1 Introduction
Many quantities of interest in the sciences and engineering exhibit symmetries. The approximation of such quantities can be made more efficient when these symmetries are correctly exploited. A typical problem we have in mind is the approximation of particle models, in particular interatomic potentials, or Hamiltonians in quantum chemistry, materials science or bio-chemistry. Similar symmetries are present in $N$-point correlation or cumulant functions of stochastic processes. Our current work focuses on polynomial approximation of multivariate functions that are symmetric under arbitrary permutations of coordinates, as a foundation for more complex symmetries. Even though our focus in the present work is on symmetric functions, our results are also directly relevant for anti-symmetric wave functions [6]: first, our complexity estimates immediately apply to anti-symmetric parameterisations; and secondly, many successful nonlinear parameterisations of wave functions (e.g., Jastrow or backflow) are in terms of symmetric components.
The performance of an approximation can be measured in different ways. One can measure the number of degrees of freedom required to achieve a given accuracy, in a target norm. A second important factor is the evaluation cost of the approximate function. Indeed, one may reduce the number of basis functions required to approximate a given function within a given tolerance while greatly increasing the evaluation cost. Therefore, a good compromise between these two aspects is particularly important.
Thus, the aim of this article is to show how the approximation of permutation invariant functions can be made efficient by combining two elements: first, a particular symmetrisation in the evaluation of the function, leading to a linear evaluation cost of each basis function with respect to the number of variables; and second, profiting from the symmetries to speed up the convergence of the approximation with respect to the number of basis functions and the evaluation cost.
For physical models, one should also incorporate isometry invariance into the analysis; however, this would make the analysis significantly more complex while only marginally improving our results. Thus, for the sake of simplicity, the present work will only consider permutation invariant functions (symmetric functions), and we refer to [24, 7], where it is explained how invariance under isometries (or other groups) can in principle also be incorporated into our framework.
In general, multivariate approximation of functions by polynomials on product domains (see for example [13, 9, 21]) is – even for analytic approximands – subject to the curse of dimensionality: the number of parameters necessary to reach a given accuracy increases exponentially with the space dimension. One possibility to avoid this effect when approximating smooth high-dimensional functions is to exploit anisotropy in the different dimensions [3, 9]. In the present setting, however, due to the symmetry all dimensions play an equivalent role. Approximations of symmetric functions have been studied in [10] in the context of nonlinear approximation, where the authors provide bounds on the number of parameters needed to approximate symmetric or anti-symmetric functions within a target accuracy. These results also exhibit the curse of dimensionality. However, note that due to the two different sets of assumptions used in [10] and our work, it is not straightforward to directly compare the results. Related to our own work is the study of deep set architectures [25, 15] for approximating set-functions, where the symmetry of the approximation is enforced by summation in a latent space. Theoretical results on deep sets [25, 22] relate the maximum number of elements in the set input to the dimension of the latent space (which is analogous to the total degree of our approximation) that is required to represent the function exactly, but no error estimates are available so far.
In the present work we develop a rigorous approximation theory for symmetric (and multi-set) functions that directly relates the number of parameters to the approximation error, independently of the space dimension (or the number of inputs). We also note that such efficient representations of symmetric functions can heavily rely on invariant theory [4] and group theory [19]. We will start in Section 2 by considering functions with full permutation symmetry, that is, invariance with respect to the permutation of any two variables. In Section 3, we will provide generic results for functions exhibiting symmetry with respect to permutations of vectorial arguments $x_j \in \mathbb{R}^d$, which is the most typical situation for physical models. Here, $d$ is the dimension of each argument $x_j$, which could for example represent the position or momentum of a particle, while $N$ denotes the number of such arguments. Our analysis is particularly concerned with the question of how the dimensions $N, d$ and the polynomial degree are connected in terms of approximation error and computational cost. In this regard, our point of view differs from the one of independent accuracy and dimensionality parameters, as taken for instance in [23, 10].
Our primary motivation for this study is as a foundation for the approximation of extremely high-dimensional functions and of multi-set functions which can be decomposed in terms of a body-order expansion. The multi-set function setting is particularly interesting for us since it commonly arises in the representation of particle interactions. We will show in Section 4 how to extend our symmetric function approximation results to obtain approximation rates for functions defined on multi-sets. In that setting we will have to address the simultaneous approximation of a family of related functions with increasing dimensionality. Since our framework has close connections with deep sets, our analysis may also shed new light on those architectures. We briefly comment on the connection in Remark 4.4, but do not include a deeper exploration in the present work.
2 Symmetric Functions in $[-1,1]^N$
Before we formulate our most general results we consider the approximation of a smooth symmetric function $f : [-1,1]^N \to \mathbb{R}$. By $f$ being symmetric we mean that
(2.1)   $f(x_{\sigma(1)}, \dots, x_{\sigma(N)}) = f(x_1, \dots, x_N)$ for all $\sigma \in S_N$,
with $S_N$ denoting the symmetric group of degree $N$. For later reference we define the space of symmetric continuous functions, $C_{\rm sym}([-1,1]^N) := \{ g \in C([-1,1]^N) : g \text{ satisfies (2.1)} \}$.
In this section we will outline the main ideas of how symmetry can be optimally incorporated into approximation schemes in the simplest possible concrete setting, and will then generalize them in various ways in § 3.
A general $f \in C([-1,1]^N)$ can be expanded as a Chebyshev series,
$f = \sum_{\bm k \in \mathbb{N}_0^N} \tilde f_{\bm k}\, T_{\bm k}, \qquad T_{\bm k}(\bm x) = \prod_{t=1}^N T_{k_t}(x_t),$
where $\tilde f_{\bm k}$ are the Chebyshev coefficients and $T_k$ are the standard Chebyshev polynomials of the first kind. To allow for the possibility of constructing sparse approximations we will assume that $f$ belongs to a Korobov class,
(2.2) |
which one can justify, e.g., through a suitable multi-variate generalisation of analyticity. The dependence of the upper bound on $N$ also arises naturally in this context.
In high-dimensional approximation one often considers spaces with weighted norms that reflect the relative importance of dimensions, which would lead to replacing (2.2) by accordingly weighted quantities; see, e.g., Chapter 5 of [14] and the references given there. However, due to our assumption of symmetry, all dimensions are equally important. In this case, the following total degree approximation is natural: For $D \in \mathbb{N}$, define
(2.3) |
then we immediately obtain the exponential approximation error estimate
(2.4) |
Although the exponential term suggests an excellent approximation rate, the curse of dimensionality still enters through the prefactor as well as through the cost of evaluating $f_D$, which scales as
where is the number of terms in (2.3). The number of operations involved in each term can be reduced to $\mathcal{O}(1)$; cf. Remark 3.6. Although this is far superior to the naive tensor product approximation leading to $(D+1)^N$ terms, it remains expensive in high dimensions.
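As a quick sanity check on these counts, a short Python sketch (our notation: $N$ variables, total degree $D$ as in (2.3)) enumerates the total-degree index set, whose cardinality is $\binom{N+D}{N}$, and compares it with the $(D+1)^N$ tensor-product indices:

```python
import math
from itertools import product

def total_degree_indices(N, D):
    """Enumerate multi-indices k in {0,...,D}^N with k_1 + ... + k_N <= D."""
    return [k for k in product(range(D + 1), repeat=N) if sum(k) <= D]

N, D = 4, 6
indices = total_degree_indices(N, D)
# the total-degree set has binomial(N + D, N) elements ...
assert len(indices) == math.comb(N + D, N)
# ... far fewer than the (D + 1)^N tensor-product indices
assert len(indices) < (D + 1) ** N
```

For $N = 4$, $D = 6$ this gives 210 total-degree indices versus 2401 tensor-product ones; the gap widens rapidly with $N$.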
Incorporating the symmetry into the parameterisation allows us to significantly reduce the number of basis functions (or, parameters): If we require that $f_D$ inherits the symmetry (2.1) – see [7, Section 3] for why this does not worsen the approximation quality – one can readily see that the symmetry is reflected in the coefficients via
Thus, we can obtain a symmetrised representation
(2.5)
where
(2.6)
and $\mathcal{K}^{\rm ord}$ denotes the set of all ordered $N$-tuples, i.e., tuples with $k_1 \le k_2 \le \dots \le k_N$. When $\bm k$ is not strictly ordered then $\sigma \bm k = \bm k$ for some permutations $\sigma$, and hence the symmetrised coefficient is different from the original one.
It is immediate to see that these symmetrised basis functions form a basis of the space of symmetric polynomials, which in turn is dense in $C_{\rm sym}$; see [7, Proposition 1] for more details.
Although the representation (2.5) significantly reduces the number of parameters (almost by a factor of $N!$), it does not reduce the cost of evaluating $f_D$, due to the cost of evaluating each symmetrised basis function. However, an elementary idea from invariant theory leads to an alternative symmetric basis, and hence an alternative scheme to evaluate $f_D$, which significantly lowers the cost and appears to entirely overcome the curse of dimensionality: It is a classical result that, since $f_D$ is a symmetric polynomial, it can be written in the form
(2.7)   $f_D(\bm x) = q\big(p_1(\bm x), \dots, p_N(\bm x)\big)$ for some polynomial $q$,
where $p_k(\bm x) = \sum_{i=1}^N x_i^k$ are the power sum polynomials. This representation fully exploits the symmetry, and one could expand on this idea to construct an efficient evaluation scheme.
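The classical representation (2.7) can be illustrated on a small example. The following sketch (assuming the standard power sums $p_k(\bm x) = \sum_i x_i^k$) verifies Newton's identity expressing the elementary symmetric polynomial $e_2$, which is symmetric, as a polynomial in $p_1$ and $p_2$:

```python
from itertools import combinations

xs = [1.5, -2.0, 0.25, 3.0]
p = lambda k: sum(x ** k for x in xs)            # power sum polynomial p_k
e2 = sum(a * b for a, b in combinations(xs, 2))  # elementary symmetric e_2

# Newton's identity: e_2 = (p_1^2 - p_2) / 2
assert abs(e2 - (p(1) ** 2 - p(2)) / 2) < 1e-12
```

Analogous identities express every symmetric polynomial in $N$ variables as a polynomial in $p_1, \dots, p_N$.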
Here, we follow a closely related construction, inspired by [5], which is easier to analyze and most importantly to generalize to more complex scenarios (cf. § 3). A straightforward generalisation of the power sum polynomials is the following symmetric one-body basis,
(2.8)   $a_k(\bm x) := \sum_{i=1}^N T_k(x_i), \qquad k \in \mathbb{N}_0.$
Remark 2.1.
Indeed, the $a_k$ could play the same role as the power sums in (2.7); however, we use them differently, by forming the products
(2.9)   $A_{\bm k}(\bm x) := \prod_{t} a_{k_t}(\bm x).$
If $\bm k'$ is a permutation of $\bm k$ then the two products $A_{\bm k'}$ and $A_{\bm k}$ coincide; that is, the $A_{\bm k}$ are again symmetric polynomials. In fact, they form a complete basis of that space.
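A minimal numerical check, assuming the Chebyshev-based one-body functions $a_k(\bm x) = \sum_i T_k(x_i)$ of (2.8), confirms that the products (2.9) are invariant under every permutation of the inputs:

```python
import math
from itertools import permutations

def a(k, xs):
    """Symmetric one-body function a_k(x) = sum_i T_k(x_i), Chebyshev T_k."""
    return sum(math.cos(k * math.acos(x)) for x in xs)

def A(ks, xs):
    """Product basis function A_k = prod_t a_{k_t}(x), cf. (2.9)."""
    out = 1.0
    for k in ks:
        out *= a(k, xs)
    return out

xs = [0.3, -0.7, 0.1, 0.9]
# all 24 orderings of the inputs give the same value of A_(2,3,5)
vals = [A((2, 3, 5), list(p)) for p in permutations(xs)]
assert max(vals) - min(vals) < 1e-9
```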
Lemma 2.2.
The set $\{A_{\bm k} : \bm k \text{ ordered}\}$ is a complete basis of the space of symmetric polynomials.
Proof.
We begin by rewriting the naive symmetrised basis as
where the remainder contains the “self-interactions”, i.e., terms with repeated indices. Therefore, since this remainder only involves terms with strictly fewer products, the change of basis from the symmetrised tensor products to the $A_{\bm k}$ is lower triangular with nonzero entries on the diagonal, and is hence invertible. ∎
A second immediate observation is that the total degree of a tensor product basis function immediately translates to the $A_{\bm k}$ basis. That is, the total degree of $A_{\bm k}$ is again $\sum_t k_t$, which yields the next result.
Corollary 2.3.
There exist coefficients $c_{\bm k}$ such that
(2.10)   $f_D = \sum_{\bm k \text{ ordered},\ \sum_t k_t \le D} c_{\bm k}\, A_{\bm k}.$
In Remark 3.6 we explain that the computational cost of evaluating (2.10) is directly proportional to the number of terms, or parameters, which we denote by
When clear from the context we will drop the subscript. To estimate that number we observe that the set
can be interpreted as the set of all integer partitions of the integers up to $D$, of length at most $N$ (indices equal to zero do not contribute). There exist various bounds for the number of such partitions that incorporate both $D$ and $N$, such as [17, Theorem 4.9.2], originally presented in [1],
(2.11) |
which (unsurprisingly) suggests that we gain an additional factor of order $1/N!$ in the number of parameters and in the computational cost, compared to the total degree approximation, which has asymptotic cost $D^N/N!$ as $D \to \infty$. We will return to these estimates below.
Since we are particularly interested in an $N$-independent bound we will instead use a classical result of Hardy and Ramanujan [11].
Lemma 2.4.
For any $N$ and $D$ we have
(2.12)   $\# \le (D+1)\, e^{\pi\sqrt{2D/3}}.$
Proof.
The key property of this bound is that it is independent of the domain dimension , which yields the following super-algebraic (but sub-exponential) convergence rate.
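The partition counts behind Lemma 2.4 are easy to tabulate with the standard bounded-part recursion; the sketch below (Python, standard library only) reproduces the classical values of the partition number $p(d)$ and checks the Hardy–Ramanujan-type bound $p(d) \le e^{\pi\sqrt{2d/3}}$ numerically:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def parts(d, n, m):
    """Number of partitions of d into at most n parts, each of size <= m."""
    if d == 0:
        return 1
    if n == 0 or m == 0:
        return 0
    take = parts(d - m, n - 1, m) if d >= m else 0  # use one part of size m
    return parts(d, n, m - 1) + take                # or no part of size m

def p(d):
    """Unrestricted partition number p(d)."""
    return parts(d, d, d)

# classical values p(0), ..., p(10)
assert [p(d) for d in range(11)] == [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
# Hardy-Ramanujan-type bound, independent of any length restriction
assert all(p(d) <= math.exp(math.pi * math.sqrt(2 * d / 3))
           for d in range(1, 60))
```

The bound is extremely generous for small $d$ but captures the correct $e^{c\sqrt{d}}$ growth, which is what drives the sub-exponential rate in Theorem 2.5.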
Theorem 2.5.
Let $f$ satisfy (2.2); then there exists a constant such that for all $N$ and $D$, the symmetric total degree approximation (2.10) satisfies
where the constants are independent of $N$ and $D$ but may depend on $\alpha$.
Proof.
From our foregoing discussion we obtain that
Fix any , then clearly, there exists such that implies
hence, in this regime, we obtain
(2.13) |
Next, we estimate in terms of the number of parameters, using Lemma 2.4, by , which we invert to obtain
Inserting this into (2.13) completes the proof with and . ∎
3 General Results
3.1 A basis for symmetric functions in $d$ dimensions
We consider the approximation of functions $f : \Omega^N \to \mathbb{R}$, where each coordinate $x_j$ belongs to a domain $\Omega \subset \mathbb{R}^d$, and $f$ is invariant under permutations, i.e.,
(3.1) |
We will indicate this scenario by saying that $f$ is symmetric. We assume throughout that $\Omega$ is the closure of a domain, and define the space
To construct approximants we begin from a one-body basis,
Assumption 3.1.
To enable the convenient extension of uni-variate to multi-variate constructions we make the following assumptions, which are easily justified for all basis sets that we have in mind; see Remark 3.4.
-
(1)
The set of basis functions is dense in .
-
(2)
Each has an associated degree .
-
(3)
is an admissible index, and .
Finally, we need a definition of the “intrinsic dimensionality” of our basis, and for simplicity we also require that it matches the dimensionality of the domain $\Omega$. We therefore make the following additional assumption:
-
(4)
The number of one-body basis functions of degree is bounded by
where is a constant.
Under assumption (1), the tensor product basis functions
then span a dense subspace of the continuous functions on $\Omega^N$, by the characterization of this space as an $N$-fold injective tensor product (see, e.g., [18, §3.2]). As an immediate consequence, we obtain that the symmetrised tensor products,
span the symmetric functions, that is, we have the following result.
Proposition 3.2.
is dense in , and
Proof.
An alternative symmetric many-body basis can be constructed by mimicking the construction of the power sum polynomials in § 2 (see also [5, 7] which are our inspiration for this construction),
(3.2) | ||||
(3.3) |
We denote the corresponding basis set by
where we recall that . For future reference we define
Proposition 3.3.
= , and in particular, is dense in . Moreover, is linearly independent.
Proof.
The argument is identical to the one-dimensional case of § 2. ∎
Remark 3.4 (Examples of Basis Sets).
(i) Tensor product Chebyshev polynomials. The first concrete case we consider is $\Omega = [-1,1]$ with the Chebyshev one-particle basis of § 2. This is the simplest setting to which our analysis applies. In this case, we would simply take the associated degree to be the polynomial degree.
(ii) Two-dimensional rotational symmetry. Assuming a rotationally symmetric domain, , a natural one-body basis is
where the radial basis or could, e.g., be chosen as transformed Zernike, Chebyshev, or other orthogonal polynomials. This has the natural associated total degree definition
(iii) Three-dimensional rotational symmetry. Our final example concerns three-dimensional spherical symmetry, where the one-particle domain is given by In this case, it is natural to employ spherical harmonics to discretize the angular component, i.e.,
with associated degree Since and since are coupled under rotations it is often more natural though to take , which does not change our results.
Remark 3.5 (Lie Group Symmetries).
In many physical modelling scenarios one would also like to incorporate Lie group symmetries, e.g. invariance or covariance under rotations and reflections, or, in relativity theory, under the Lorentz group. This can in principle be incorporated into our analysis, but in general the coupling between the permutation group and such Lie groups is non-trivial and it becomes difficult to give sharp results. We therefore postpone such an analysis to future work.
That said, such additional symmetry does not reduce the parameterisation complexity nearly as much as the permutation symmetry does. For example in case (ii) of the previous remark, if rotational symmetry were imposed then this would only lead to an additional constraint on the basis functions. Secondly, such structural sparsity changes our remarks (e.g., Remark 3.6) on computational complexity. We show in [12] that asymptotically optimal evaluation algorithms can still be constructed for those cases.
Remark 3.6 (Computational Cost).
We briefly justify our claim that the computational cost for both (2.3) and (2.10) is directly proportional to the number of parameters. For the sake of brevity we focus only on the evaluation of (2.10), which proceeds in two stages. First, we compute the symmetric one-body basis defined in (2.8). According to assumption (4) the number of elements of that basis is bounded in terms of the degree. We shall assume that the cost of evaluating it scales linearly in that number, which can for example be achieved by suitable recursive evaluations of a polynomial basis.
Once the one-body basis is precomputed and stored, the products can be evaluated recursively. Assume that the multi-indices are stored in a lexicographically ordered list. As we traverse the list we store the computed basis functions. For each new multi-index $(k_1, \dots, k_t)$ in the list,
we express the associated basis function as $A_{(k_1, \dots, k_t)} = A_{(k_1, \dots, k_{t-1})} \cdot a_{k_t}$,
thus evaluating each new basis function with $\mathcal{O}(1)$ cost. Note that the multi-indices in the total degree approximations form a downset; hence, if $(k_1, \dots, k_t)$ is part of the approximation then the truncated multi-index $(k_1, \dots, k_{t-1})$ is also part of the approximation. In summary, we therefore obtain that the total cost of evaluating (2.10) is bounded, up to a constant, by the number of parameters.
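The recursive evaluation strategy of this remark can be sketched as follows, assuming the Chebyshev one-body basis of § 2 and a prefix-closed (downset) list of sorted multi-indices; each new product then costs a single multiplication:

```python
import math

def cheb(k, x):
    """Chebyshev polynomial T_k on [-1, 1]."""
    return math.cos(k * math.acos(x))

def evaluate_basis(xs, multi_indices, D):
    """Evaluate all products A_k = a_{k_1} * ... * a_{k_t} over a
    lexicographically ordered downset of sorted multi-indices,
    reusing each prefix so that every new A_k costs one multiplication."""
    # stage 1: symmetric one-body basis a_k(x) = sum_i T_k(x_i)
    a = [sum(cheb(k, x) for x in xs) for k in range(D + 1)]
    A = {(): 1.0}                    # empty product
    for k in sorted(multi_indices):  # prefixes precede their extensions
        A[k] = A[k[:-1]] * a[k[-1]]
    return A

xs = [0.2, -0.5, 0.8]
idx = [(1,), (2,), (1, 1), (1, 2), (1, 1, 2)]
A = evaluate_basis(xs, idx, D=4)
a1 = sum(cheb(1, x) for x in xs)
a2 = sum(cheb(2, x) for x in xs)
assert abs(A[(1, 1, 2)] - a1 * a1 * a2) < 1e-12
```

The lexicographic sort is what guarantees that `A[k[:-1]]` is always already available when `k` is reached.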
3.2 Approximation errors
Our starting point in the one-dimensional case was to relate analyticity of the target function to approximation rates for the Chebyshev expansion. The same idea can be applied here, but would be restrictive. In the multi-dimensional setting there is a far greater choice of coordinates and corresponding basis sets, which are highly problem-dependent. What is essential for our analysis is that the naive unsymmetrized (but sparse) approximation scheme has an exponential rate of convergence. In order to maintain the generality of our results we now formulate this as an assumption, which, loosely speaking, encapsulates that we have chosen a “good” basis for approximating smooth functions on $\Omega$, and therefore a good basis for approximation on $\Omega^N$:
Assumption 3.7.
The target function $f$ has a sparse polynomial approximant $f_D$, satisfying
(3.4) |
where $D$ is the total degree of $f_D$, defined by
and the maximum is taken over all basis functions involved in the definition of $f_D$.
Arguing as above we have
hence we may again assume without loss of generality that is symmetric. It can be represented either in terms of the basis or equivalently in terms of the basis , which optimizes the evaluation cost. This yields the representation
To obtain approximation rates in terms of the number of parameters,
we generalize the one-dimensional result of Hardy and Ramanujan [11].
Theorem 3.8.
Assume that Assumption 3.1 is satisfied, in particular (4) defining the intrinsic dimension . Then, there exists a constant such that for any ,
(3.5) |
Proof.
We adapt the proof of Erdös presented in [8] for the estimation of the number of integer partitions to the current context. First, we define the partition function
(3.6) |
where denotes the number of -body basis functions composed only of one-body basis functions with partial degree . Then, in the decomposition of as
(3.7) |
the coefficient is equal to the number of basis functions with total degree . Note that there is no limitation on the number of terms in the product in (3.6), which corresponds to having no restriction on $N$.
Now, denoting by the number of one-body basis functions with degree , the number is the number of combinations with repetitions of in the set , which is
Then, for , a well-known identity on power series gives
Hence, the partition function can be written as
Taking the logarithm of , we obtain
from which we deduce, differentiating this expression, that
Using the expansion (3.7), we write , which leads to
Therefore, expanding as a power series,
Multiplying by on both sides, and substituting , we obtain
Hence, equating the coefficients of the power series on both sides, there holds
Let us now show by induction that , with
(3.8) |
where
(3.9) |
comes from (4) and .
First, , therefore the property is satisfied for .
Now, we assume that for any , . Using the recurrence property, there holds
(3.10) |
Then, using the concavity of the function and noting that , we obtain
Inserting the latter bound in (3.10) and using Assumption (4), we deduce
Summing over all , using that for ,
we obtain, taking ,
Introducing defined in (3.9), we get
Since and , we obtain
from which we deduce that for all , , i.e.
We then easily deduce the result of the theorem. ∎
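The generating-function argument can be explored numerically: the sketch below (the function name and interface are ours) computes the coefficients of $\prod_{k \ge 1} (1-x^k)^{-c_k}$ by repeated multiplication with geometric series, recovering the ordinary partition numbers in the special case of one basis function per degree ($c_k = 1$):

```python
def weighted_partition_counts(c, D):
    """Coefficients up to degree D of prod_{k>=1} (1 - x^k)^(-c[k]),
    i.e. the number of multisets of one-body degrees summing to each d."""
    coeffs = [1] + [0] * D
    for k in range(1, D + 1):
        for _ in range(c.get(k, 0)):
            # multiply the series by 1/(1 - x^k) via a running sum
            for d in range(k, D + 1):
                coeffs[d] += coeffs[d - k]
    return coeffs

# with c_k = 1 for every degree we recover the partition numbers p(d)
D = 10
ones = {k: 1 for k in range(1, D + 1)}
assert weighted_partition_counts(ones, D) == [1, 1, 2, 3, 5, 7,
                                              11, 15, 22, 30, 42]
```

General weights $c_k$ model a one-body basis with several functions per degree, which is exactly the situation covered by assumption (4).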
With the basis size estimate in hand, we can now readily extend the one-dimensional approximation rate.
Proof.
3.3 Asymptotic Results for
Our results up to this point are uniform in $N$; in fact, they turn out to be sharp only in the regime of fairly moderate degree $D$. Our final set of results provides improved complexity and error estimates for the complementary regime of large $D$. In particular, they will also provide strong indication that Theorem 3.9 is sharp in the moderate-degree regime.
Here and below it will be convenient to use the symbols $\lesssim$, $\gtrsim$, $\sim$ to mean less than, greater than, or bounded above and below, respectively, up to uniform positive constants.
Our subsequent analysis is based on the fact that there exist constants such that, for with sufficiently large,
(3.11) |
The lower bound is straightforward to establish. The upper bound for follows immediately from (2.11). For , and in the limit it is again straightforward to establish. We prove a slightly modified generic upper bound for below. We then recover the upper bound (3.11) from the following theorem noting that with . (The constants in (3.11) and (3.12) are different).
Theorem 3.10.
Assume that Assumption 3.1 is satisfied, in particular (4) defining the intrinsic dimension . Then, there exist two constants that depend on such that for any ,
(3.12) |
Proof.
First, we define the partition function
(3.13) |
where denotes the number of -body basis functions composed only of one-body basis functions with partial degree . Then, in the decomposition of as
(3.14) |
the coefficient is equal to the number of basis functions with total degree and number of parts .
Now, as in the proof of Theorem 3.8, denoting by the number of one-body basis functions of degree , we can write the partition function as
Taking the logarithm of , we obtain
from which we deduce, differentiating this expression with respect to , that
Using the expansion (3.14), we write , which leads to
Multiplying by on both sides, and substituting for , we obtain
Hence, equating the coefficients of the power series on both sides, we conclude
(3.15) |
We now show by induction that for any that satisfy
(3.16) |
where is the constant appearing in (4), , and for , we have
(3.17) |
Note that (3.16) holds in particular for the choice and .
First, we observe that , for , for , , and for all . Thus (3.17) holds in each of these cases.
Now for arbitrary but fixed and , we assume that (3.17) holds true for any , . Using the recurrence property, starting from (3.15), and noting that appearing in (4) can be bounded by for some , we find
(3.18) |
with
where , , . For estimating this quantity, we use the following two lemmas.
Lemma 3.11.
For , , , we have
(3.19) |
with , , and for .
Proof.
To prove (3.19), we introduce the function and compare the sum to the integral of . Note that (3.19) trivially holds for . Hence we assume . We treat three different cases depending on the values of and .
As the third case, we consider . For ,
so the function is increasing on , and decreasing on . In particular, the function has a single local maximum in . Hence, there holds
Integrating by parts times, we obtain
Moreover,
Since we easily deduce
and thus
Therefore, (3.19) holds also in this third case, with . This concludes the proof of the lemma. ∎
We now show that the prefactor can be bounded uniformly in and .
Lemma 3.12.
For , we have
(3.20) |
with and for , provided that .
Proof.
Coming back to (3.18) and using the estimates of the two lemmas, we obtain
Combining this with the estimates
yields
Noting that we obtain, replacing the index with ,
Taking such that and dividing by yields
Therefore, (3.17) is satisfied if that is, if This completes the induction. Hence, we deduce that
Finally, we note that a sufficient condition for to be increasing is that . Since this is satisfied under (3.16),
Summing over the possible values of , we easily deduce the result of the Theorem. ∎
With this result at hand, we now consider the dependency of the number of parameters on .
Lemma 3.13.
Suppose that with . Then, for sufficiently large ,
(3.21) |
uniformly for all (but the constants may depend on ).
Proof.
From (3.11) we can estimate
Noting that asymptotically as , we easily obtain the upper bound. The lower bound is analogous. ∎
Proposition 3.14.
Suppose that for some , and that are sufficiently large, then
(3.22) |
Proof.
In particular, in the regime , if we start from the exponential convergence rate of Assumption 3.7, then we obtain that there exists a symmetric approximation with
(3.24) |
for some . This rate is clearly superior to the uniform $N$-independent rate of Theorem 3.9 (in this regime at least). Moreover, it strongly hints at a kind of “phase transition”; in other words, the behaviour of the approximation scheme changes significantly at that point.
Remark 3.15.
It is furthermore interesting to compare our result (3.24) with analogous estimates if we had not exploited symmetry, or sparsity. An analogous analysis shows that if we use a (sparse) total degree approximation, but use a naive basis instead of a symmetrized basis, then we obtain
Finally, if we drop even the sparsity, so that , then we obtain
Note here that the dependence of the exponents highlights quite different qualitative behaviour of the three bases. Specifically, this provides strong qualitative evidence that (unsurprisingly) the sparse basis gives a significant advantage over the naive tensor product basis especially in the regime , but that it is still significantly more costly than the sparse symmetrized basis. Moreover, these estimates clearly highlight the unique advantage that the symmetrized basis provides in the regime treated in Theorem 3.9.
4 Approximation of multi-set functions
Our original motivation for studying the approximation of symmetric functions arises from the atomic cluster expansion [5, 7], which is in fact concerned with the approximation of multi-set functions. We now study how the approximation results of the previous sections can be applied in this context. As in the previous sections, this discussion will again ignore isometry invariance. We will focus on a general abstract setting, but employ assumptions that can be rigorously established in the setting of [5, 7].
Let $\mathbb{M}$ denote the set of all multi-sets (or msets) on $\Omega$; i.e.,
(4.1) |
where denotes an unordered tuple, or mset, e.g. describing a collection of (positions of) classical particles. The crucial aspect is that the number of elements $N$ is now variable and no longer fixed. This is equivalent to the classical definition of a multiset, which has the defining feature of allowing multiple instances of each of its elements. We are interested in parameterising (approximating) mset functions
(4.2) |
Remark 4.1 (Context).
This situation arises, e.g., when modelling interactions between particles. Different local or global structures of particle systems lead to a variable number of particles entering the range of the interaction law. It seems tempting to take the limit $N \to \infty$, which leads to a mean-field-like scenario where a signal processing perspective could be of interest. However, we are interested in an intermediate situation where $N$ is “moderate”, and thus this limit is not of interest. Indeed, the number of interacting particles can be understood as another approximation parameter, which we discuss in more detail in Remark 4.3 below.
In order to reduce the approximation of an mset function to our foregoing results we must first produce a representation of it in terms of finite-dimensional symmetric components. A classical idea is the many-body or ANOVA expansion, which we will formulate as an assumption. However, we emphasize that our recent results [20] rigorously justify this assumption in the context of coarse-graining electronic structure models into interatomic potential models. The following formulation is modelled on those results, which are summarized in Appendix A.
Assumption 4.2.
(i) For all , there exist symmetric for , and such that , defined by
(4.3) |
for all .
(ii) Moreover, we suppose that each has a sparse polynomial approximation, where denotes the total degree, satisfying
(4.4) |
for some and . Here, is simply a constant term and requires no approximation.
Part (i) of Assumption 4.2 is the main result of [20], while part (ii) essentially encodes the assumption that the component functions are analytic, with the region of analyticity possibly varying between components. Under a suitable choice of one-particle basis, one then obtains (4.4). Note in particular that in this context the dimensionality of the target functions is an approximation parameter, which makes it particularly natural to consider the different regimes, studied in the foregoing sections, of how the dimension and the polynomial degree are related.
Throughout the following discussion suppose that Assumption 4.2 is satisfied. Then, for a tuple specifying the total degrees used to approximate the components , the cluster expansion approximation to is given by
(4.5) |
with of the form , for some coefficients and one-body basis functions . We immediately deduce an approximation error estimate: for all ,
(4.6) |
From this expression we can now minimize the computational cost subject to the constraint that the overall error is no worse than the many-body approximation error, and thereby obtain a resulting error versus cost estimate.
Remark 4.3.
As we already hinted in Remark 4.1, the function is often more naturally defined on the whole of (or even on the space of infinite multi-sets) but only “weakly” depends on points far away. That is, on defining (i.e. we remove from only points outside ) we have
that is, the domain itself becomes an approximation parameter as well. Such exponential “locality” results arise, for example, in the modelling of interatomic interaction laws [20, 2]. For the sake of simplicity we will not incorporate this feature into our analysis, except for a natural assumption on how the domain size and $N$ are related:
If we approximate the restriction of to then we obtain
(4.7) |
Balancing the error contributions, we choose . In many physical situations we can assume that particles do not cluster, and this leads to the bound . More generally, we will therefore assume below that $N$ is bounded by a polynomial in the domain size, which will make our analysis a little more concrete.
Remark 4.4 (Connection to Deep sets).
Deep set architectures are based on the idea that a set-function which is continuous when restricted to any fixed number of inputs is symmetric if and only if there exist continuous functions $\varphi$ (mapping into a latent space of dimension $L$) and $\rho$ such that
(4.8)   $f(\{x_1, \dots, x_N\}) = \rho\Big(\sum_{j=1}^N \varphi(x_j)\Big).$
For scalar inputs, it has been shown that latent dimension $L = N$ is sufficient [25], and that there exist functions for which $L \ge N$ is also necessary [22]. Motivated by this characterisation, one obtains a deep set approximation to a set-function by choosing a latent space dimension and learning the mappings $\varphi$ and $\rho$, e.g. parametrised using multi-layer perceptrons.
The cluster expansion approximation (4.5) can therefore be thought of as a special case of (4.8) with both $\varphi$ and $\rho$ polynomials. However, the key conceptual difference in our setting is that, on the one hand, the latent dimension is significantly larger and should be taken as an approximation parameter, while on the other hand, the embedding $\varphi$ is known a priori and need not be learned. Despite these philosophical differences, one can view the approximation results of this section as approximation rates for deep set approximations.
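A minimal sketch of the sum-decomposition (4.8) follows, with a hypothetical Chebyshev embedding $\varphi$ and a hypothetical polynomial readout $\rho$ chosen purely for illustration; permutation invariance holds by construction of the inner sum:

```python
import math
from itertools import permutations

def deep_set(xs, phi, rho):
    """Sum-decomposition f(X) = rho(sum_{x in X} phi(x)), cf. (4.8)."""
    latent = [0.0] * len(phi(xs[0]))
    for x in xs:
        latent = [s + v for s, v in zip(latent, phi(x))]
    return rho(latent)

# hypothetical choices (made up for illustration): latent dimension L = 3,
# Chebyshev features as phi, a simple polynomial readout as rho
phi = lambda x: [math.cos(k * math.acos(x)) for k in (1, 2, 3)]
rho = lambda z: z[0] * z[1] + z[2] ** 2

xs = [0.4, -0.2, 0.9]
vals = [deep_set(list(p), phi, rho) for p in permutations(xs)]
assert max(vals) - min(vals) < 1e-9   # invariant under input reordering
```

With polynomial $\varphi$ and $\rho$ as here, this is structurally the same object as the cluster expansion evaluated via density projections.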
4.1 Computational cost of the cluster expansion
The expression in (4.5) suggests that the evaluation cost scales as (not counting the cost of evaluating the components), but in fact a transformation similar to that in § 2 allows us to reduce this to a cost that is linear in ; see also [5, 7].
We consider a single term,
where contains the artificial self-interactions introduced when converting from to . We will return to this term momentarily. Now, inserting the expansion of , writing instead of , and interchanging summation order, we obtain
where the inner-most terms (power sum polynomials, atomic base, density projection) are now computed over the full input range instead of only a subcluster,
The self-interaction terms are simply polynomials of lower correlation-order and can be absorbed into the terms, provided that . Thus, we can equivalently write (4.5) as
(4.9)
where the sum ranges over all ordered tuples of length , with .
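The reduction from a sum over clusters to a sum over ordered tuples can be illustrated with a small sketch (the monomial one-particle basis and the function names are illustrative assumptions): the density projections are accumulated once over all inputs, after which each correlation is a product of precomputed scalars, independent of the number of particles.

```python
def density_projections(xs, K):
    # A_k = sum_j phi_k(x_j), accumulated once over the full input range at
    # cost O(N K); phi_k(x) = x^k is a hypothetical one-particle basis.
    return [sum(x ** k for x in xs) for k in range(1, K + 1)]

def correlation(A, v):
    # An n-correlation over ordered tuples equals a product of density
    # projections; evaluating it costs O(n), independent of N.
    prod = 1.0
    for k in v:
        prod *= A[k - 1]
    return prod

xs = [0.1, 0.7, -0.4, 1.3]
A = density_projections(xs, K=3)
# Naive evaluation of sum_{j1,j2} x_{j1} x_{j2}^2 over all ordered pairs
# (self-interactions included) costs O(N^2):
naive = sum(x1 * x2 ** 2 for x1 in xs for x2 in xs)
# ... while the product of density projections gives the same value in O(1):
fast = correlation(A, (1, 2))
```

Note that the product automatically includes the self-interaction terms (tuples with repeated indices), which is exactly why the reformulation (4.9) sums over all ordered tuples rather than over distinct clusters.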
This leads to the following result, which states that the cost of evaluating is the same as evaluating a single instance of each of the components ; that is, the sum over all possible clusters does not incur an additional cost.
Proposition 4.5.
Assume that Assumption 3.1 is satisfied, that the one-particle basis can be evaluated with operations per basis function (e.g. via recursion), and that the degrees are decreasing, . Then, the cluster expansion can be evaluated with at most
(4.10)
arithmetic operations, where is the number of parameters used to represent the components for and is some positive constant.
Remark 4.6.
Proof.
Assumption 3.1 implies that there are one-particle basis functions, where . Precomputing all density projections therefore requires operations. The -correlations can be evaluated recursively with increasing correlation-order , with cost. Hence, the total cost of evaluating (4.5) will be as stated in (4.10). ∎
4.2 Error vs Cost estimates: special case
Having established the remarkably low computational cost of the cluster expansion in the reformulation (4.9), we can now return to the derivation of error versus cost estimates from (4.6). In order to illustrate our main results, we first consider the simplest case and suppose is constant. Motivated by Remark 4.3, we assume for some and define (independently of ), for some to be determined later. Since is constant, the number of parameters for the -correlation contributions will dominate. This puts us into the regime of the integer-partition-type estimates and yields the following result.
Theorem 4.7.
Assume that appearing in (4.4) is constant and . If we choose for sufficiently large, then
(4.11)
In terms of the total number of free parameters, , which is also directly proportional to the computational cost, the estimate reads
(4.12)
for all with , for some , which is still a super-algebraic rate of convergence.
4.3 Error vs Cost estimates: General Case
To make this analysis concrete we will assume that in (4.4)
(4.13)
For the sake of generality, we consider this more general class, which better highlights the importance of balancing degree and dimensionality , and also motivates us to revisit and try to sharpen the analysis of Appendix A and [20] in the future. In light of Proposition 4.5 we will formulate the result in terms of the number of parameters rather than in terms of computational cost.
Theorem 4.8.
Assume that for some and . If we choose
(4.14)
for sufficiently large, then
(4.15)
In terms of the total number of free parameters, , we have, for sufficiently large ,
(4.16)
for all with , for some .
Proof.
Part 1: Error estimates (4.15). We balance the error by choosing such that for where is the constant from Assumption 4.2. To do so, we first apply Stirling’s formula, which gives for
Using that for , we therefore choose such that
That is, to obtain the required estimate, noting that , it is enough to choose
Since for all , we may instead choose such that
where . Since for all and for all , and by incorporating the condition that is decreasing, we may choose
To conclude, we note that, for , the function is decreasing and is increasing and so there exists for which for all and for all . Solving yields as required. This concludes the proof of (4.15).
Remark 4.9.
Moreover, by the same arguments, if , then we may choose for all (which, in particular, agrees with the choice in Theorem 4.7), and if , we can choose
Part 2: Estimates on the number of parameters (4.16). Let be the number of parameters needed to construct . According to (3.12), there exist such that
Using Stirling’s estimate, we obtain
Hence, for some ,
On the other hand, by (3.5), there exists such that
(4.17)
for all .
Case (i): . In this case, we have following (4.14) and so if and only if . Therefore, assuming is sufficiently large such that
we obtain
(4.18)
Case (ii): . In this case, and so using (4.17)
(4.20)
In order to bound the total number of parameters needed to construct for , we consider the sum of (4.18), (4.19), and (4.20) over their respective ranges of :
We start with (4.18). Since the function (for ) has a global maximum at with , we have taking and
(4.21)
for some . Here, we have used the bound .
Next, we consider the sum of (4.19). On the relevant range of , we have , and thus
(4.22)
for some .
Finally, summing (4.20) gives
(4.23)
5 Conclusion
We have established rigorous approximation rates for sparse and symmetric polynomial approximations of symmetric functions in high dimension. What is particularly intriguing about our results is that they highlight clearly how symmetry reduces the curse of dimensionality, sometimes significantly so. Our results also build a foundation for analysing the approximation of functions defined on a configuration space (multi-sets), which we outlined as well.
Further open challenges include the incorporation of more complex symmetries, e.g., coupling permutation with Lie group symmetries such as for classical particles or for relativistic particles, as well as the identification of practical algorithms for constructing approximants in our context that achieve (close to) the optimal rates.
Acknowledgements
M.B. acknowledges funding by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Projektnummer 233630050 – TRR 146; G. D.’s work was supported by the French “Investissements d’Avenir” program, project ISITE-BFC (contract ANR-15-IDEX-0003); C.O. is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) [IDGR019381]; J.T. is supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant EP/W522594/1.
Appendix A Justification of the Many-Body Expansion
The body-ordered approximation asserted in Assumption 4.2 is motivated by recent results presented in [20] for a wide class of tight binding models. In this section, we give a brief justification of these results.
A.1 Boundedness:
Here, we will briefly note that naive body-ordered approximations satisfy the required bound. For , we may define the following vacuum cluster expansion:
(A.1)
(A.2)
That is, the -body potential is obtained by considering the variables of interest and removing the terms that correspond to strictly smaller body-order. The alternating summation comes from the inclusion-exclusion principle. By construction, the expansion is exact for systems of size at most (i.e. for all ). Moreover, assuming is uniformly bounded, we obtain the required estimate .
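The inclusion–exclusion construction can be checked on a toy example; the multiset function below is a hypothetical stand-in for the quantity being expanded. Each n-body term is obtained by Möbius inversion over subsets, and the truncated expansion reproduces the original function exactly on systems of at most the truncation size.

```python
from itertools import combinations

def V(f, cluster):
    # n-body potential via inclusion-exclusion (Moebius inversion):
    # V_n(y_1..y_n) = sum over subsets S of the cluster of (-1)^(n-|S|) f(S).
    n = len(cluster)
    return sum(
        (-1) ** (n - k) * f(sub)
        for k in range(n + 1)
        for sub in combinations(cluster, k)
    )

def expansion(f, xs, order):
    # Truncated vacuum cluster expansion: the vacuum term plus the
    # contributions of all clusters of size at most `order`.
    total = f(())
    for k in range(1, order + 1):
        for cl in combinations(xs, k):
            total += V(f, cl)
    return total

# Hypothetical stand-in for a bounded multiset function:
f = lambda ys: sum(ys) ** 2 if ys else 0.0
xs = (0.2, 1.1, -0.7)
# Exact whenever the system size does not exceed the truncation order:
print(expansion(f, xs, order=len(xs)), f(xs))
```

The telescoping of the alternating sums is exactly the inclusion–exclusion principle referred to in the text: summing V over all subsets of a configuration collapses to the value of f on that configuration.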
While this classical vacuum cluster expansion is perhaps the most natural, it is not guaranteed to converge rapidly, if at all. Instead, we will review recent results for the site energies of a particular class of simple electronic structure models.
A.2 Convergence: A tight-binding example
For an atomic configuration , a (two-centre) tight binding Hamiltonian , describing the interaction between the atoms in the system, is given by the following matrix
for some smooth function satisfying .
For functions , the corresponding observable, , is written as the following function of the Hamiltonian:
(A.3)
We will think of as the total energy of the system and the right-hand side of (A.3) as a site energy decomposition. We will justify Assumption 4.2 for the site energy . By approximating with a polynomial of degree , we define the body-ordered approximation by
(A.4)
a function of body-order at most . Convergence results for this approximation scheme follow from the estimate:
Now, if is an analytic function in a neighbourhood of the spectrum , we are able to construct approximations that give an exponential rate of convergence with rate depending on the region of analyticity of [20].
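As an illustration of the scheme, the following sketch compares the exact site energy [f(H)]_ii with the value obtained from a truncated polynomial approximation of f; the pair function h(r) = e^{-r}, the choice f = exp, and the Taylor truncation are illustrative assumptions standing in for the quantities in the text. The p-th power term [H^p]_ii involves paths through at most p + 1 atoms and hence has body-order at most p + 1.

```python
import math
import numpy as np

def hamiltonian(xs, h):
    # Two-centre tight-binding matrix: H_ij = h(|x_i - x_j|) for i != j,
    # with zero on-site terms; h is a hypothetical smooth pair function.
    n = len(xs)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                H[i, j] = h(abs(xs[i] - xs[j]))
    return H

def site_energy(H, f, i=0):
    # Exact site energy [f(H)]_ii via the spectral decomposition of H:
    # [f(H)]_ii = sum_k f(lambda_k) Q_ik^2 for symmetric H.
    lam, Q = np.linalg.eigh(H)
    return float(np.sum(f(lam) * Q[i] ** 2))

def site_energy_poly(H, coeffs, i=0):
    # Body-ordered approximation: replace f by the polynomial sum_p c_p t^p
    # and accumulate the diagonal entries of the powers of H.
    e, Hp = 0.0, np.eye(H.shape[0])
    for c in coeffs:
        e += c * Hp[i, i]
        Hp = Hp @ H
    return e

xs = [0.0, 1.0, 2.5, 4.0]
H = hamiltonian(xs, lambda r: math.exp(-r))
coeffs = [1.0 / math.factorial(p) for p in range(12)]  # Taylor coeffs of exp
exact = site_energy(H, np.exp)
approx = site_energy_poly(H, coeffs)
```

Since f = exp is entire and the spectrum of H is confined to a small interval here, the degree-11 truncation already agrees with the exact site energy to high accuracy, consistent with the exponential rate discussed above.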
Written more explicitly, the body-ordered approximation has a similar expression to that of (A.2):
(A.5)
Here, is the restriction of to defined as if and otherwise. Therefore, in this expression, is an -body potential of the central atom and (some authors therefore call an -correlation potential). The final line in (A.5) follows from an inclusion-exclusion principle [20]. In particular, the boundedness of the follows from the boundedness of [20], as in (A.2). Moreover, inherits the analyticity properties of the Hamiltonian.
We have therefore seen that the site energies in the tight binding framework satisfy both the conditions in Assumption 4.2: (i) convergence of a body-ordered approximation, with an exponential rate, and (ii) the are analytic (with region of analyticity depending on the regularity of the Hamiltonian).
References
- [1] A. G. Beged-Dov, Lower and upper bounds for the number of lattice points in a simplex, SIAM J. Appl. Math., 22 (1972), pp. 106–108.
- [2] H. Chen and C. Ortner, QM/MM methods for crystalline defects. part 1: Locality of the tight binding model, Multiscale Model. Simul., 14 (2016), pp. 232–264.
- [3] A. Cohen and R. DeVore, Approximation of high-dimensional parametric PDEs, Acta Numer., 24 (2015), pp. 1–159.
- [4] H. Derksen and G. Kemper, Computational Invariant Theory, Springer, Dec. 2015.
- [5] R. Drautz, Atomic cluster expansion for accurate and transferable interatomic potentials, Phys. Rev. B Condens. Matter, 99 (2019), p. 014104.
- [6] R. Drautz and C. Ortner, in preparation.
- [7] G. Dusson, M. Bachmayr, G. Csanyi, R. Drautz, S. Etter, C. van der Oord, and C. Ortner, Atomic cluster expansion: Completeness, efficiency and stability, J. Comp. Phys., 454 (2022).
- [8] P. Erdős, On an elementary proof of some asymptotic formulas in the theory of partitions, Ann. Math., 43 (1942), pp. 437–450.
- [9] M. Griebel and J. Oettershagen, On tensor product approximation of analytic functions, Journal of Approximation Theory, 207 (2016), pp. 348–379.
- [10] J. Han, Y. Li, L. Lin, J. Lu, J. Zhang, and L. Zhang, Universal approximation of symmetric and anti-symmetric functions, arXiv e-prints, 1912.01765 (2019).
- [11] G. H. Hardy and S. Ramanujan, Asymptotic formulæ in combinatory analysis, Proceedings of the London Mathematical Society, (1918).
- [12] I. Kaliuzhnyi and C. Ortner, Optimal evaluation of symmetry-adapted -correlations via recursive contraction of sparse symmetric tensors. arXiv:2202.04140.
- [13] J. C. Mason, Near-best multivariate approximation by Fourier series, Chebyshev series and Chebyshev interpolation, J. Approx. Theory, 28 (1980), pp. 349–358.
- [14] E. Novak and H. Woźniakowski, Tractability of multivariate problems. Vol. 1: Linear information, vol. 6 of EMS Tracts in Mathematics, European Mathematical Society (EMS), Zürich, 2008.
- [15] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, Pointnet: Deep learning on point sets for 3d classification and segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- [16] R. B. J. T. Allenby and A. Slomson, How to Count: An Introduction to Combinatorics, Second Edition, CRC Press, July 2011.
- [17] J. L. Ramírez Alfonsín, The Diophantine Frobenius Problem, OUP Oxford, Dec. 2005.
- [18] R. A. Ryan, Introduction to Tensor Products of Banach Spaces, Springer Monographs in Mathematics, Springer London, 2002.
- [19] B. Sagan, The symmetric group: representations, combinatorial algorithms, and symmetric functions, vol. 203, Springer Science & Business Media, 2001.
- [20] J. Thomas, H. Chen, and C. Ortner, Rigorous body-order approximations of an electronic structure potential energy landscape, arXiv e-prints, 2106.12572 (2021).
- [21] L. Trefethen, Multivariate polynomial approximation in the hypercube, Proc. Am. Math. Soc., (2017).
- [22] E. Wagstaff, F. Fuchs, M. Engelcke, I. Posner, and M. A. Osborne, On the limitations of representing functions on sets, in Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, 09–15 Jun 2019, pp. 6487–6494.
- [23] M. Weimar, The complexity of linear tensor product problems in (anti)symmetric Hilbert spaces, J. Approx. Theory, 164 (2012), pp. 1345–1368.
- [24] A. P. Yutsis and A. A. Bandzaitis, Theory of angular momentum in quantum mechanics, Vil’nyus, (1965).
- [25] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, Deep sets, in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds., vol. 30, Curran Associates, Inc., 2017.