
Polynomial Approximation of Symmetric Functions

Markus Bachmayr, Institut für Geometrie und Praktische Mathematik, RWTH Aachen University, Templergraben 55, 52056 Aachen, Germany · Geneviève Dusson, Laboratoire de Mathématiques de Besançon, UMR CNRS 6623, Université Bourgogne Franche-Comté, 16 route de Gray, 25030 Besançon, France · Christoph Ortner, Department of Mathematics, University of British Columbia, 1984 Mathematics Road, Vancouver, BC, V6T 1Z2, Canada · Jack Thomas, Mathematics Institute, Zeeman Building, University of Warwick, CV4 7AL, UK
Abstract

We study the polynomial approximation of symmetric multivariate functions and of multi-set functions. Specifically, we consider $f(x_{1},\dots,x_{N})$, where $x_{i}\in\mathbb{R}^{d}$, and $f$ is invariant under permutations of its $N$ arguments. We demonstrate how these symmetries can be exploited to improve the cost versus error ratio in a polynomial approximation of the function $f$, and in particular study the dependence of that ratio on $d$, $N$ and the polynomial degree. These results are then used to construct approximations and prove approximation rates for functions defined on multi-sets where $N$ becomes a parameter of the input.

1 Introduction

Many quantities of interest in the sciences and engineering exhibit symmetries. The approximation of such quantities can be made more efficient when these symmetries are correctly exploited. A typical problem we have in mind is the approximation of particle models, in particular interatomic potentials, or Hamiltonians in quantum chemistry, materials science or bio-chemistry. Similar symmetries are present in $n$-point correlation or cumulant functions of stochastic processes. Our current work focuses on polynomial approximation of multivariate functions that are symmetric under arbitrary permutations of coordinates, as a foundation for more complex symmetries. Even though our focus in the present work is on symmetric functions, our results are also directly relevant for anti-symmetric wave functions [6]: first, our complexity estimates immediately apply to anti-symmetric parameterisations; and second, many successful nonlinear parameterisations of wave functions (e.g., Jastrow or backflow) are in terms of symmetric components.

The performance of an approximation can be measured in different ways. One can measure the number of degrees of freedom required to achieve a given accuracy, in a target norm. A second important factor is the evaluation cost of the approximate function. Indeed, one may reduce the number of basis functions required to approximate a given function within a given tolerance while greatly increasing the evaluation cost. Therefore, a good compromise between these two aspects is particularly important.

Thus, the aim of this article is to show how the approximation of permutation-invariant functions can be made efficient by combining two elements: first, a particular symmetrisation in the evaluation of the function, leading to an evaluation cost of each basis function that is linear in the number of variables; and second, exploiting the symmetries to speed up the convergence of the approximation with respect to the number of basis functions and the evaluation cost.

For physical models, one should also incorporate isometry invariance into the analysis; however, this would make the analysis significantly more complex while only marginally improving our results. For the sake of simplicity, the present work therefore only considers permutation-invariant functions (symmetric functions), and we refer to [24, 7] for how invariance under $O(d)$ (or other groups) can in principle also be incorporated into our framework.

In general, multivariate approximation of functions by polynomials on product domains (see for example [13, 9, 21]) is – even for analytic approximands – subject to the curse of dimensionality: the number of parameters necessary to reach a given accuracy increases exponentially with the space dimension. One possibility to avoid this effect when approximating smooth high-dimensional functions is to exploit anisotropy in the different dimensions [3, 9]. In the present setting, however, due to the symmetry all dimensions play an equivalent role. Approximations of symmetric functions have been studied in [10] in a context of nonlinear approximation, where the authors provide bounds on the number of parameters needed to approximate symmetric or anti-symmetric functions within a target accuracy. These results also exhibit the curse of dimensionality. However, note that due to the two different sets of assumptions used in [10] and our work, it is not straightforward to directly compare the results. Related to our own work is the study of deep set architectures [25, 15] for approximating set-functions, where the symmetry of the approximation is enforced by summation in a latent space. Theoretical results on deep sets [25, 22] relate the maximum number of elements in the set input to the dimension of the latent space (which is analogous to the total degree of our approximation) that is required to represent the function exactly, but no error estimates are available so far.

In the present work we develop a rigorous approximation theory for symmetric (and multi-set) functions that directly relates the number of parameters to the approximation error, independently of the space dimension (or the number of inputs). We also note that such efficient representations of symmetric functions can heavily rely on invariant theory [4] and group theory [19]. We will start in Section 2 by considering functions with full permutation symmetry, that is, invariance with respect to the permutation of any two variables. In Section 3, we will provide generic results for functions exhibiting symmetry with respect to permutations of vectorial arguments (d)N(\mathbb{R}^{d})^{N}, which is the most typical situation for physical models. Here, dd is the dimension of each argument xix_{i}, which could for example represent the position or momentum of a particle, while NN denotes the number of such arguments. Our analysis is particularly concerned with the question of how the two dimensions d,Nd,N and the polynomial degree DD are connected in terms of approximation error and computational cost. In this regard, our point of view differs from the one of independent accuracy and dimensionality parameters, as taken for instance in [23, 10].

Our primary motivation for this study is as a foundation for the approximation of extremely high-dimensional functions and of multi-set functions which can be decomposed in terms of a body-order expansion. The multi-set function setting is particularly interesting for us since it commonly arises in the representation of particle interactions. We will show in Section 4 how to extend our symmetric function approximation results to obtain approximation rates for functions defined on multi-sets. In that setting we will have to address the simultaneous approximation of a family of related functions with increasing dimensionality. Since our framework has close connections with deep sets, our analysis may also shed new light on those architectures. We briefly comment on the connection in Remark 4.4, but do not include a deeper exploration in the present work.

2 Symmetric Functions in $\mathbb{R}^{N}$

Before we formulate our most general results we consider the approximation of a smooth symmetric function f:[1,1]Nf:[-1,1]^{N}\to\mathbb{R}. By ff being symmetric we mean that

f(xσ1,,xσN)=f(x1,,xN)𝒙[1,1]N,σSym(N),f(x_{\sigma 1},\dots,x_{\sigma N})=f(x_{1},\dots,x_{N})\qquad\forall{\bm{x}}\in[-1,1]^{N},\quad\sigma\in{\rm Sym}(N), (2.1)

with Sym(N){\rm Sym}(N) denoting the symmetric group of degree NN. For later reference we define

Csym([1,1]N):={fC([1,1]N):f is symmetric}.C_{\rm sym}([-1,1]^{N}):=\big{\{}f\in C([-1,1]^{N})\colon f\text{ is symmetric}\big{\}}.

In this section we will outline, in the simplest possible concrete setting, the main ideas of how symmetry can be optimally incorporated into approximation schemes; we will then generalize them in various ways in § 3.

A general fC([1,1]N)f\in C([-1,1]^{N}) can be expanded as a Chebyshev series,

f(𝒙)=𝒗Nf^𝒗T𝒗(𝒙),f({\bm{x}})=\sum_{{\bm{v}}\in\mathbb{N}^{N}}\hat{f}_{\bm{v}}T_{\bm{v}}({\bm{x}}),

where T𝒗=n=1NTvnT_{\bm{v}}=\otimes_{n=1}^{N}T_{v_{n}} and TvT_{v} are the standard Chebyshev polynomials of the first kind. To allow for the possibility of constructing sparse approximations we will assume that ff belongs to a Korobov class,

𝒦(M,μ,ρ):={f s.t. 𝒗Nρ𝒗1|f^𝒗|MμN},\mathcal{K}(M,\mu,\rho):=\big{\{}f\text{\leavevmode\nobreak\ s.t.\leavevmode\nobreak\ }{\textstyle\sum_{{\bm{v}}\in\mathbb{N}^{N}}}\rho^{\|{\bm{v}}\|_{1}}|\hat{f}_{\bm{v}}|\leq M\mu^{N}\big{\}}, (2.2)

which one can justify, e.g., through a suitable multi-variate generalisation of analyticity. The dependence of the upper bound on NN also arises naturally in this context.

In high-dimensional approximation one often considers spaces with weighted norms that reflect the relative importance of dimensions, which would lead to replacing 𝒗1\|{\bm{v}}\|_{1} by accordingly weighted quantities; see, e.g., Chapter 5 of [14] and the references given there. However, due to our assumption of symmetry, all dimensions are equally important. In this case, the following total degree approximation is natural: For D>0D>0, define

fD(𝒙):=𝒗:𝒗1Df^𝒗T𝒗(𝒙),f_{D}({\bm{x}}):=\sum_{{\bm{v}}:\|{\bm{v}}\|_{1}\leq D}\hat{f}_{\bm{v}}T_{\bm{v}}({\bm{x}}), (2.3)

then we immediately obtain the exponential approximation error estimate

ffDLMμNρD.\|f-f_{D}\|_{L^{\infty}}\leq M\mu^{N}\rho^{-D}. (2.4)

Although the term ρD\rho^{-D} suggests an excellent approximation rate, the curse of dimensionality still enters through the prefactor μN\mu^{N} as well as through the cost of evaluating fDf_{D}, which scales as

(N+DD){DN/N!,as D,ND/D!,as N.\binom{N+D}{D}\sim\begin{cases}D^{N}/N!,&\text{as }D\to\infty,\\ N^{D}/D!,&\text{as }N\to\infty.\end{cases}

where $\binom{N+D}{D}$ is the number of terms in (2.3). The number of operations involved in each term can be reduced to $\mathcal{O}(1)$; cf. Remark 3.6. Although this is far superior to the naive tensor product approximation leading to $\mathcal{O}(D^{N})$ terms, it remains expensive in high dimensions.
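
To make these counts concrete, the following short sketch (our own illustration, with arbitrary example values of $N$ and $D$, not part of the paper) compares the number of coefficients in the total degree truncation (2.3) with the naive tensor product count.

```python
# A tiny illustration (not from the paper): number of coefficients in the
# total degree truncation (2.3) versus the full tensor product truncation,
# for a few example values of N and D.
import math

for N, D in [(5, 10), (10, 10), (10, 20)]:
    total_degree = math.comb(N + D, D)   # multi-indices with |v|_1 <= D
    tensor_product = (D + 1) ** N        # multi-indices with max_i v_i <= D
    print(N, D, total_degree, tensor_product)
```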

Incorporating the symmetry into the parameterisation allows us to significantly reduce the number of basis functions (or, parameters): if we require that $f_{D}$ inherits the symmetry (2.1) – see [7, Section 3] for why this does not worsen the approximation quality – one can readily see that the symmetry is reflected in the coefficients via

f^𝒗=f^σ(𝒗)σSym(N).\hat{f}_{{\bm{v}}}=\hat{f}_{\sigma({\bm{v}})}\qquad\forall\sigma\in{\rm Sym}(N).

Thus, we can obtain a symmetrised representation

fD(𝒙)\displaystyle f_{D}({\bm{x}}) =𝒗ordN:𝒗1Dc𝒗symT𝒗(𝒙)\displaystyle=\sum_{{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:\|{\bm{v}}\|_{1}\leq D}c_{{\bm{v}}}\;\operatorname{sym}T_{{\bm{v}}}({\bm{x}}) (2.5)
where symT𝒗(𝒙)=σSym(N)T𝒗(σ𝒙)=σSym(N)Tσ𝒗(𝒙)\displaystyle\operatorname{sym}T_{{\bm{v}}}({\bm{x}})=\sum_{\sigma\in{\rm Sym}(N)}T_{{\bm{v}}}(\sigma{\bm{x}})=\sum_{\sigma\in{\rm Sym}(N)}T_{\sigma{\bm{v}}}({\bm{x}}) (2.6)

and ordN\mathbb{N}^{N}_{\rm ord} denotes the set of all ordered NN-tuples, i.e., 𝒗ordN{\bm{v}}\in\mathbb{N}^{N}_{\rm ord} if v1v2vNv_{1}\leq v_{2}\leq\dots\leq v_{N}. When 𝒗{\bm{v}} is not strictly ordered then σ𝒗=𝒗\sigma{\bm{v}}={\bm{v}} for some permutations and hence the coefficient c𝒗c_{{\bm{v}}} is different from f^𝒗\hat{f}_{{\bm{v}}}.

It is immediate to see that symT𝒗\operatorname{sym}T_{\bm{v}} form a basis of the space of symmetric polynomials, which in turn is dense in CsymC_{\rm sym}; see [7, Proposition 1] for more details.

Although the representation (2.5) significantly reduces the number of parameters c𝒗c_{\bm{v}} (almost by a factor N!N!), it does not reduce the cost of evaluating fDf_{D} due to the N!N! cost of evaluating each symmetrised basis function T𝒗symT^{\rm sym}_{{\bm{v}}}. However, an elementary idea from invariant theory leads to an alternative symmetric basis, and hence an alternative scheme to evaluate fDf_{D}, which significantly lowers the cost and appears to entirely overcome the curse of dimensionality: It is a classical result that, since fDf_{D} is a symmetric polynomial, it can be written in the form

fD(𝒙)=qD(p1,,pN),f_{D}({\bm{x}})=q_{D}(p_{1},\dots,p_{N}), (2.7)

where pn(𝒙):=j=1Nxjnp_{n}({\bm{x}}):=\sum_{j=1}^{N}x_{j}^{n} are the power sum polynomials. This representation fully exploits the symmetry and one could expand on this idea to construct an efficient evaluation scheme.

Here, we follow a closely related construction, inspired by [5], which is easier to analyze and most importantly to generalize to more complex scenarios (cf. § 3). A straightforward generalisation of the power sum polynomials is the following symmetric one-body basis,

Av(𝒙):=n=1NTv(xn),v.A_{v}({\bm{x}}):=\sum_{n=1}^{N}T_{v}(x_{n}),\qquad v\in\mathbb{N}. (2.8)
Remark 2.1.

Since (Tv(xn))n=1N\big{(}T_{v}(x_{n})\big{)}_{n=1}^{N} represents a feature vector, (2.8) is a pooling operation analogous to that introduced in [25] to impose symmetry in deep set architectures. We will say more about this connection in Remark 4.4.

Indeed, $A_{1},\dots,A_{N}$ could play the same role as $p_{1},\dots,p_{N}$ in (2.7); however, we use them differently by forming the products

𝑨𝒗(𝒙):=t=1NAvt(𝒙),𝒗N.{\bm{A}}_{\bm{v}}({\bm{x}}):=\prod_{t=1}^{N}A_{v_{t}}({\bm{x}}),\qquad{\bm{v}}\in\mathbb{N}^{N}. (2.9)

If σ(𝒗)\sigma({\bm{v}}) is a permutation of 𝒗{\bm{v}} then the two products 𝑨𝒗{\bm{A}}_{{\bm{v}}} and 𝑨σ(𝒗){\bm{A}}_{\sigma({\bm{v}})} coincide; that is, 𝑨𝒗{\bm{A}}_{\bm{v}} are again symmetric polynomials. In fact they form a complete basis of that space.
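
The following minimal sketch (our own code, not part of the paper; the helper names are ours) implements the pooled one-body basis (2.8) and the product basis (2.9) for $d=1$ with Chebyshev polynomials, and checks permutation invariance numerically.

```python
# A minimal numerical sketch (not the authors' code) of the pooled basis (2.8)
# and the product basis (2.9) for d = 1, using Chebyshev polynomials of the
# first kind. The helper names (cheb_features, A_pool, A_product) are ours.
import numpy as np
from numpy.polynomial.chebyshev import chebval


def cheb_features(x, vmax):
    """Feature matrix (T_v(x_n))_{n,v} for v = 0, ..., vmax."""
    # chebval with a unit coefficient vector evaluates a single T_v.
    return np.stack([chebval(x, np.eye(vmax + 1)[v]) for v in range(vmax + 1)], axis=1)


def A_pool(x, vmax):
    """Symmetric one-body basis A_v(x) = sum_n T_v(x_n), cf. (2.8)."""
    return cheb_features(x, vmax).sum(axis=0)


def A_product(x, v):
    """Product basis A_v(x) = prod_t A_{v_t}(x) for a multi-index v, cf. (2.9)."""
    A = A_pool(x, max(v))
    return np.prod([A[vt] for vt in v])


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=8)   # N = 8 particles in [-1, 1]
v = (0, 1, 1, 3)                     # an ordered multi-index with |v|_1 = 5

# Permutation invariance: the value is unchanged under a random shuffle of x.
print(A_product(x, v), A_product(rng.permutation(x), v))
```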

Lemma 2.2.

The set 𝐀:={𝐀𝐯:𝐯ordN}{\bm{A}}:=\{{\bm{A}}_{\bm{v}}\colon\,{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}\} is a complete basis of the space of symmetric polynomials.

Proof.

We begin by rewriting the naive symmetrised basis as

symT𝒗(𝒙)=σSym(N)t=1NTvt(xσt)\displaystyle\operatorname{sym}T_{{\bm{v}}}({\bm{x}})=\sum_{\sigma\in{\rm Sym}(N)}\prod_{t=1}^{N}T_{v_{t}}(x_{\sigma t}) =1N!j1jNt=1NTvt(xjt)\displaystyle=\frac{1}{N!}\sum_{j_{1}\neq\cdots\neq j_{N}}\prod_{t=1}^{N}T_{v_{t}}(x_{j_{t}})
=1N!j1,,jNt=1NTvt(xjt)+B\displaystyle=\frac{1}{N!}\sum_{j_{1},\dots,j_{N}}\prod_{t=1}^{N}T_{v_{t}}(x_{j_{t}})+B^{\prime}
=1N!𝑨𝒗(𝒙)+B,\displaystyle=\frac{1}{N!}{\bm{A}}_{{\bm{v}}}({\bm{x}})+B^{\prime},

where BB^{\prime} contains the “self-interactions”, i.e., repeated jtj_{t} indices. Therefore,

Bspan{σSym(N)j=1NTvj(xσj):at least one vj is 0}.B^{\prime}\in\text{span}\bigg{\{}\sum_{\sigma\in{\rm Sym}(N)}\prod_{j=1}^{N}T_{v_{j}}(x_{\sigma j})\,\colon\;\text{at least one $v_{j}$ is 0}\bigg{\}}.

Since T0=1,T_{0}=1, BB^{\prime} only involves terms with strictly less than NN products. Therefore, the change of basis from T𝒗symT^{\rm sym}_{\bm{v}} to 𝑨𝒗{\bm{A}}_{\bm{v}} is lower triangular with 1/N!1/N! terms on the diagonal, and is hence invertible. ∎

A second immediate observation is that the total degree 𝒗1\|{\bm{v}}\|_{1} of a tensor product basis function t=1NTvt\otimes_{t=1}^{N}T_{v_{t}} immediately translates to the 𝑨{\bm{A}} basis. That is, the total degree of 𝑨𝒗{\bm{A}}_{\bm{v}} is again 𝒗1\|{\bm{v}}\|_{1}, which yields the next result.

Corollary 2.3.

There exist coefficients c~𝐯\tilde{c}_{\bm{v}} such that

fD(𝒙)=𝒗ordN:𝒗1Dc~𝒗𝑨𝒗(𝒙).f_{D}({\bm{x}})=\sum_{{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:\|{\bm{v}}\|_{1}\leq D}\tilde{c}_{{\bm{v}}}{\bm{A}}_{{\bm{v}}}({\bm{x}}). (2.10)

In Remark 3.6 we explain that the computational cost of evaluating (2.10) is directly proportional to the number of terms, or parameters, which we denote by

P(N,D):=#{𝒗ordN:𝒗1D}.P(N,D):=\#\big{\{}{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:\|{\bm{v}}\|_{1}\leq D\big{\}}.

When clear from the context we will write P=P(N,D)P=P(N,D). To estimate that number we observe that the set

{𝒗ordN:𝒗1=D}\big{\{}{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:\|{\bm{v}}\|_{1}=D\big{\}}

can be interpreted as the set of all integer partitions of DD, of length at most NN (indices v=0v=0 do not contribute). There exist various bounds for the number of such partitions that incorporate both NN and DD, such as [17, Theorem 4.9.2], originally presented in [1],

P(N,D)(D+N(N+1)2)N(N!)2DN(N!)2as D,P(N,D)\leq\frac{\left(D+\frac{N(N+1)}{2}\right)^{N}}{(N!)^{2}}\sim\frac{D^{N}}{(N!)^{2}}\qquad\text{as }D\rightarrow\infty, (2.11)

which (unsurprisingly) suggests that we gain an additional factor N!N! in the number of parameters and in the computational cost, compared to the total degree approximation which has asymptotic cost (N+DD)DNN!\binom{N+D}{D}\sim\frac{D^{N}}{N!} as DD\to\infty. We will return to these estimates below.

Since we are particularly interested in an NN-independent bound we will instead use a classical result of Hardy and Ramanujan [11].

Lemma 2.4.

For any N,DN,D we have

P(N,D)183Dexp(π43D).P(N,D)\leq\frac{1}{8\sqrt{3}D}\exp\left(\pi\sqrt{\textstyle{\frac{4}{3}}D}\right). (2.12)
Proof.

The result of [16, Theorem 6.8] states that the cardinality of the set $\big\{{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:\|{\bm{v}}\|_{1}\leq D\big\}$ is bounded by the number of additive integer partitions of $2D$, i.e.,

P(N,D)=#{𝒗ordN:𝒗1D}#{𝒗ordN:𝒗1=2D}.P(N,D)=\#\big{\{}{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:\|{\bm{v}}\|_{1}\leq D\big{\}}\leq\#\big{\{}{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:\|{\bm{v}}\|_{1}=2D\big{\}}.

The result of Hardy and Ramanujan [11] estimates the latter cardinality as stated in (2.12). ∎
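
As a sanity check, the following sketch (our own helper functions, not from the paper) computes $P(N,D)$ exactly by counting integer partitions with at most $N$ parts and compares it with the bound (2.11) and the $N$-independent Hardy–Ramanujan bound (2.12).

```python
# A small sketch (our own helpers, not from the paper) that counts
# P(N, D) = #{ordered v in N^N : |v|_1 <= D} exactly, i.e. the number of
# integer partitions of 0, ..., D into at most N parts, and compares it with
# the bound (2.11) and the N-independent Hardy-Ramanujan bound (2.12).
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def partitions_at_most(n, k):
    """Number of partitions of n into at most k parts."""
    if n == 0:
        return 1
    if k == 0:
        return 0
    # Standard recursion p(n, k) = p(n, k-1) + p(n-k, k): either at most k-1
    # parts are used, or one is subtracted from each of k parts.
    return partitions_at_most(n, k - 1) + (partitions_at_most(n - k, k) if n >= k else 0)


def P(N, D):
    return sum(partitions_at_most(n, N) for n in range(D + 1))


N, D = 8, 20
exact = P(N, D)
bound_2_11 = (D + N * (N + 1) / 2) ** N / math.factorial(N) ** 2
bound_2_12 = math.exp(math.pi * math.sqrt(4 * D / 3)) / (8 * math.sqrt(3) * D)
print(exact, bound_2_11, bound_2_12)
```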

The key property of this bound is that it is independent of the domain dimension NN, which yields the following super-algebraic (but sub-exponential) convergence rate.

Theorem 2.5.

Let fCsym([1,1]N)𝒦(M,μ,ρ)f\in C_{\rm sym}([-1,1]^{N})\cap\mathcal{K}(M,\mu,\rho), then there exists a constant c>0c>0 such that for all DcND\geq cN, the symmetric total degree approximation (2.10) satisfies

ffDLCexp(α[logP]2),\|f-f_{D}\|_{L^{\infty}}\leq C\exp\Big{(}-\alpha[\log P]^{2}\Big{)},

where C,α>0C,\alpha>0 are independent of NN and DD but may depend on M,μ,ρM,\mu,\rho.

Proof.

From our foregoing discussion we obtain that

logffDLlogM+NlogμDlogρ.\log\|f-f_{D}\|_{L^{\infty}}\leq\log M+N\log\mu-D\log\rho.

Fix any 1<ρ¯<ρ1<\bar{\rho}<\rho, then clearly, there exists c>0c>0 such that DcND\geq cN implies

logM+NlogμDlogρlogMDlogρ¯,\log M+N\log\mu-D\log\rho\leq\log M-D\log\bar{\rho},

hence, in this regime, we obtain

ffDLMρ¯D.\|f-f_{D}\|_{L^{\infty}}\leq M\bar{\rho}^{-D}. (2.13)

Next, we estimate DD in terms of the number of parameters, using Lemma 2.4, by logPC1D\log P\leq C_{1}\sqrt{D}, which we invert to obtain

D[logPC1]2.D\geq\bigg{[}\frac{\log P}{C_{1}}\bigg{]}^{2}.

Inserting this into (2.13) completes the proof with C=MC=M and α=log(ρ¯)/C12\alpha=\log(\bar{\rho})/C_{1}^{2}. ∎

Remark 2.6.

Although the proof of Theorem 2.5 appears to sacrifice a significant amount of information by using the Hardy-Ramanujan result instead of the sharper estimate (2.11), it turns out to be sharp in the regime c1NDC1N2c_{1}N\leq D\leq C_{1}N^{2}. We will explain this observation in more detail in § 3.3.

3 General Results

3.1 A basis for symmetric functions in $(\mathbb{R}^{d})^{N}$

We consider the approximation of functions f(𝒙1,,𝒙N)f({\bm{x}}_{1},\dots,{\bm{x}}_{N}) where each coordinate 𝒙jΩd{\bm{x}}_{j}\in\Omega\subset\mathbb{R}^{d}, dd\in\mathbb{N}, and ff is invariant under permutations, i.e.,

f(𝒙1,,𝒙N)=f(𝒙σ1,,𝒙σN)σSym(N).f({\bm{x}}_{1},\dots,{\bm{x}}_{N})=f({\bm{x}}_{\sigma 1},\dots,{\bm{x}}_{\sigma N})\qquad\forall\sigma\in{\rm Sym}(N). (3.1)

We will indicate this scenario by saying that f:ΩNf:\Omega^{N}\to\mathbb{R} is symmetric. We assume throughout that Ω\Omega is the closure of a domain, and define the space

Csym(ΩN):={fC(ΩN):f is symmetric}.C_{\rm sym}(\Omega^{N}):=\{f\in C(\Omega^{N})\,\colon\;f\text{ is symmetric}\}.

To construct approximants we begin from a one-body basis,

Φ:={ϕv:v}.\Phi:=\big{\{}\phi_{v}\,\colon\;v\in\mathbb{N}\big{\}}.
Assumption 3.1.

To enable the convenient extension of uni-variate to multi-variate constructions we make the following assumptions which are easily justified for all basis sets that we have in mind; see Remark 3.4.

  • (Φ\Phi1)

    The set of basis functions Φ\Phi is dense in C(Ω)C(\Omega).

  • (Φ\Phi2)

    Each ϕvΦ\phi_{v}\in\Phi has an associated degree degϕv\operatorname{deg}\phi_{v}\in\mathbb{N}.

  • (Φ\Phi3)

    v=0v=0 is an admissible index, ϕ0=1\phi_{0}=1 and degϕ0=0\operatorname{deg}\phi_{0}=0.

Finally, we need a definition of “intrinsic dimensionality” of our basis, and for simplicity also require that it matches the dimensionality of the domain Ω\Omega. We therefore make the following additional assumption:

  • (Φ\Phi4)

    The number of one-body basis functions ϕv\phi_{v} of degree ii is bounded by

    cd(i)c(i+d1)(i+d2)(i+1),c_{d}(i)\leq c(i+d-1)(i+d-2)\ldots(i+1),

    where c>0c>0 is a constant.

Under assumption (Φ\Phi1), the tensor product basis functions

ΦN:={ϕ𝒗=t=1Nϕvt:𝒗N}\Phi^{\otimes N}:=\big{\{}\phi_{{\bm{v}}}=\otimes_{t=1}^{N}\phi_{v_{t}}\colon\,{\bm{v}}\in\mathbb{N}^{N}\big{\}}

then satisfy that spanΦN\operatorname{span}\Phi^{\otimes N} is dense in C(ΩN)C(\Omega^{N}) by the characterization of this space as an NN-fold injective tensor product (see, e.g., [18, §3.2]). As an immediate consequence, we obtain that the symmetrised tensor products,

sym[ΦN]:={sym[ϕ𝒗]:=1N!σSym(N)ϕ𝒗σ:𝒗N}\operatorname{sym}\big{[}\Phi^{\otimes N}\big{]}:=\bigg{\{}\operatorname{sym}\big{[}\phi_{{\bm{v}}}\big{]}:=\frac{1}{N!}\sum_{\sigma\in{\rm Sym}(N)}\phi_{{\bm{v}}}\circ\sigma\,\colon\;{\bm{v}}\in\mathbb{N}^{N}\bigg{\}}

span the symmetric functions, that is, we have the following result.

Proposition 3.2.

spansym[ΦN]{\rm span}\,\operatorname{sym}\big{[}\Phi^{\otimes N}\big{]} is dense in Csym(ΩN)C_{\rm sym}(\Omega^{N}), and

fsym[fD]ffD.\|f-\operatorname{sym}[f_{D}]\|\leq\|f-f_{D}\|.
Proof.

The argument is a verbatim repetition of the arguments in § 2 (Lemma 2.2) and of [7, Sec. 3]. ∎

An alternative symmetric many-body basis can be constructed by mimicking the construction of the power sum polynomials in § 2 (see also [5, 7] which are our inspiration for this construction),

𝑨𝒗(𝑿)\displaystyle{\bm{A}}_{\bm{v}}({\bm{X}}) =t=1NAvt,where\displaystyle=\prod_{t=1}^{N}A_{v_{t}},\qquad\text{where} (3.2)
Av(𝑿)\displaystyle A_{v}({\bm{X}}) =n=1Nϕv(𝒙n).\displaystyle=\sum_{n=1}^{N}\phi_{v}({\bm{x}}_{n}). (3.3)

We denote the corresponding basis set by

𝑨:={𝑨𝒗:𝒗ordN},{\bm{A}}:=\big{\{}{\bm{A}}_{{\bm{v}}}\,\colon\;{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}\},

where we recall that ordN:={𝒗N:v1v2}\mathbb{N}^{N}_{\rm ord}:=\{{\bm{v}}\in\mathbb{N}^{N}\,\colon\;v_{1}\leq v_{2}\leq\dots\}. For future reference we define

deg(𝒗):=t=1Ndegϕvt,for 𝒗N.{\rm deg}({\bm{v}}):=\sum_{t=1}^{N}\operatorname{deg}\phi_{v_{t}},\qquad\text{for }{\bm{v}}\in\mathbb{N}^{N}.
Proposition 3.3.

span𝑨{\rm span}\,{\bm{A}} = spansym[ΦN]{\rm span}\,\operatorname{sym}\big{[}\Phi^{\otimes N}\big{]}, and in particular, span𝐀{\rm span}\,{\bm{A}} is dense in Csym(ΩN)C_{\rm sym}(\Omega^{N}). Moreover, 𝐀{\bm{A}} is linearly independent.

Proof.

The argument is identical to the d=1d=1 case. ∎

Remark 3.4 (Examples of Basis Sets).

(i) Tensor product Chebyshev polynomials. The first concrete case we consider is $\Omega=[-1,1]^{d}$ with one-particle basis $\Phi:=\big\{T_{\bm{k}}:=\otimes_{i=1}^{d}T_{k_{i}}\,\colon\,{\bm{k}}\in\mathbb{N}^{d}\big\}$. This is the simplest setting to which our analysis applies. In this case, we would simply take $\operatorname{deg}T_{\bm{k}}=\|{\bm{k}}\|_{1}$.

(ii) Two-dimensional rotational symmetry. Assuming a rotationally symmetric domain, Ω:={x2:r0|x|r1}\Omega:=\{x\in\mathbb{R}^{2}\,\colon\,r_{0}\leq|x|\leq r_{1}\}, a natural one-body basis is

Φ:={ϕnm(x)=Rnm(|x|)eimargx:n,m},\Phi:=\big{\{}\phi_{nm}(x)=R_{nm}(|x|)e^{im\,{\arg x}}\,\colon\,n\in\mathbb{N},m\in\mathbb{Z}\big{\}},

where the radial basis RnR_{n} or RnmR_{nm} could, e.g., be chosen as transformed Zernike, Chebyshev, or other orthogonal polynomials. This has the natural associated total degree definition degϕnm:=n+|m|.{\rm deg}\phi_{nm}:=n+|m|.

(iii) Three-dimensional rotational symmetry. Our final example concerns three-dimensional spherical symmetry, where the one-particle domain is given by Ω:={x3:r0|x|r1}.\Omega:=\{x\in\mathbb{R}^{3}\,\colon\,r_{0}\leq|x|\leq r_{1}\}. In this case, it is natural to employ spherical harmonics to discretize the angular component, i.e.,

Φ:={ϕnlm(x):=Rnl(|x|)Ylm(x^):n,l,m=l,,l},\Phi:=\big{\{}\phi_{nlm}(x):=R_{nl}(|x|)Y_{l}^{m}(\hat{x})\,\colon\,n,l\in\mathbb{N},m=-l,\dots,l\big{\}},

with associated degree degϕnlm:=n+l+|m|.{\rm deg}\phi_{nlm}:=n+l+|m|. Since |m|l|m|\leq l and since YlmY_{l}^{m} are coupled under rotations it is often more natural though to take degϕnlm=n+l\operatorname{deg}\phi_{nlm}=n+l, which does not change our results.

Remark 3.5 (Lie Group Symmetries).

In many physical modelling scenarios one would also like to incorporate Lie group symmetries, e.g. invariance or covariance under rotations and reflections, or, in relativity theory, under the Lorentz group. This can in principle be incorporated into our analysis, but in general the coupling between the permutation group and such Lie groups is non-trivial, and it becomes difficult to give sharp results. We therefore postpone such an analysis to future works.

That said, such additional symmetry does not reduce the parameterisation complexity nearly as much as the permutation symmetry does. For example, in case (ii) of the previous remark, imposing rotational symmetry would only lead to an additional constraint $\sum_{t=1}^{N}m_{t}=0$ on the basis functions. Such structural sparsity does, however, affect our remarks (e.g., Remark 3.6) on computational complexity; we show in [12] that asymptotically optimal evaluation algorithms can still be constructed in those cases.

Remark 3.6 (Computational Cost).

We briefly justify our claim that the computational cost for both (2.3) and (2.10) is directly proportional to the number of parameters. For the sake of brevity we focus only on the evaluation (2.10) in two stages. First, we compute the symmetric one-body basis AvA_{v} defined in (2.8), where v=0,,maxϕvΦdeg(ϕv)v=0,\dots,\max_{\phi_{v}\in\Phi}\operatorname{deg}(\phi_{v}). According to (Φ4\Phi 4) there can be at most cd(D)c_{d}(D) elements of that basis. We shall assume that the cost of evaluating it scales as 𝒪(NDd)\mathcal{O}(ND^{d}), which can for example be achieved by suitable recursive evaluations of a polynomial basis.

Once the one-body basis AvA_{v} is precomputed and stored, the products 𝑨𝒗{\bm{A}}_{\bm{v}} can be evaluated recursively. Assume that the multi-indices are stored in a lexicographically ordered list. As we traverse the list we store the computed basis functions 𝑨𝒗{\bm{A}}_{{\bm{v}}}. For each new multi-index in the list,

𝒗=(v1,,vN),{\bm{v}}=(v_{1},\dots,v_{N}),

we express the associated basis function as

𝑨𝒗=N1Av1𝑨0,v2,,vN,{\bm{A}}_{{\bm{v}}}=N^{-1}A_{v_{1}}{\bm{A}}_{0,v_{2},\dots,v_{N}},

thus evaluating 𝑨𝒗{\bm{A}}_{{\bm{v}}} with 𝒪(1)\mathcal{O}(1) cost. Note that the multi-indices 𝒗{\bm{v}} in the total degree approximations are a downset, hence if (v1,,vN)(v_{1},\dots,v_{N}) is part of the approximation then the new multi-index (0,v2,,vN)(0,v_{2},\dots,v_{N}) is also part of the approximation. In summary we therefore obtain that the total cost in evaluating (2.10) is bounded, up to a constant, by

NDd+P(N,D).ND^{d}+P(N,D).
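
The following sketch (our own illustration of this remark, not the authors' implementation) carries out the two-stage evaluation for $d=1$: precompute the pooled basis $A_{v}$, then evaluate all products over the ordered multi-indices with total degree at most $D$, each with $\mathcal{O}(1)$ work. For simplicity the zero entries of the multi-indices are dropped, which only rescales each $\bm{A}_{\bm{v}}$ by a power of $N$ (since $A_{0}=N$) and can be absorbed into the coefficients.

```python
# A minimal sketch (our own, not the authors' code) of the two-stage
# evaluation of (2.10): precompute the pooled one-body basis A_v, then
# evaluate all products over an ordered downset of multi-indices with O(1)
# work each. Zero entries are dropped, which only rescales each basis
# function by a power of N (A_0 = N) and can be absorbed into the coefficients.
import numpy as np
from numpy.polynomial.chebyshev import chebval
from itertools import combinations_with_replacement


def pooled_basis(x, vmax):
    """A_v(x) = sum_n T_v(x_n) for v = 0, ..., vmax."""
    return np.array([chebval(x, np.eye(vmax + 1)[v]).sum() for v in range(vmax + 1)])


def evaluate_products(x, D):
    """All A_v(x) with v ordered, nonzero entries, and |v|_1 <= D (here D <= N)."""
    A = pooled_basis(x, D)
    # Ordered multi-indices with entries >= 1 and total degree <= D, processed
    # shortest first; this matches the lexicographic order of the zero-padded indices.
    indices = sorted(
        (v for k in range(1, D + 1)
         for v in combinations_with_replacement(range(1, D + 1), k)
         if sum(v) <= D),
        key=lambda v: (len(v), v))
    values = {(): 1.0}
    for v in indices:
        values[v] = A[v[0]] * values[v[1:]]   # O(1) work per basis function
    return values


x = np.random.default_rng(1).uniform(-1, 1, size=10)   # N = 10
prods = evaluate_products(x, D=6)
print(len(prods), prods[(1, 2, 3)])
```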

3.2 Approximation errors

Our starting point in the $d=1$ case was to relate analyticity of the target function $f$ to approximation rates for the Chebyshev expansion. The same idea can be applied here but would be restrictive. In the multi-dimensional setting there is a far greater choice of coordinates and corresponding basis sets, which are highly problem-dependent. What is essential for our analysis is that the naive unsymmetrized (but sparse) approximation scheme has an exponential rate of convergence. In order to maintain the generality of our results we now formulate this as an assumption, which, loosely speaking, encapsulates that $\Phi$ is a “good” basis to approximate smooth functions on $\Omega$ and that therefore $\Phi^{\otimes N}$ is a good basis for approximation in $\Omega^{N}$:

Assumption 3.7.

The target function $f\in C_{\rm sym}(\Omega^{N})$ has a sparse polynomial approximant $f_{D}\in{\rm span}(\Phi^{\otimes N})$ satisfying

ffDMeαD,\|f-f_{D}\|_{\infty}\leq Me^{-\alpha^{\prime}D}, (3.4)

where M,α>0M,\alpha^{\prime}>0, DD is the total degree of fDf_{D} defined by

D=max𝒗deg(𝒗),D=\max_{{\bm{v}}}{\rm deg}({\bm{v}}),

and the maximum is taken over all basis functions ϕ𝐯\phi_{\bm{v}} involved in the definition of fDf_{D}.

Arguing as above we have

fsym[fD]ffD,\|f-\operatorname{sym}[f_{D}]\|_{\infty}\leq\|f-f_{D}\|_{\infty},

hence we may again assume without loss of generality that fDf_{D} is symmetric. It can be represented either in terms of the basis sym[ΦN]\operatorname{sym}[\Phi^{\otimes N}] or equivalently in terms of the basis 𝑨{\bm{A}}, which optimizes the evaluation cost. This yields the representation

fD=deg(𝒗)Dc~𝒗𝑨𝒗.f_{D}=\sum_{{\rm deg}({\bm{v}})\leq D}\tilde{c}_{{\bm{v}}}{\bm{A}}_{{\bm{v}}}.

To obtain approximation rates in terms of the number of parameters,

P(N,D,d):=#{𝒗N:ordered, and deg(𝒗)D},P(N,D,d):=\#\big{\{}{\bm{v}}\in\mathbb{N}^{N}\colon\text{ordered, and }{\rm deg}({\bm{v}})\leq D\big{\}},

we generalize the one-dimensional result of Hardy and Ramanujan [11].

Theorem 3.8.

Assume that Assumption 3.1 is satisfied, in particular (Φ\Phi4) defining the intrinsic dimension dd. Then, there exists a constant βd>0{\beta_{d}}>0 such that for any N,DN,D,

P(N,D,d)=#{𝒗ordN:deg(𝒗)D}DeβdDd/(d+1).P(N,D,d)=\#\big{\{}{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:{\rm deg}({\bm{v}})\leq D\big{\}}\leq De^{{\beta_{d}}D^{d/(d+1)}}. (3.5)
Proof.

We adapt the proof of Erdös presented in [8] for the estimation of the number of integer partitions to the current context. First, we define the partition function

F(x)=i=1(1+g(i,1)xi+g(i,2)x2i+),F(x)=\prod_{i=1}^{\infty}(1+g(i,1)x^{i}+g(i,2)x^{2i}+\cdots), (3.6)

where g(i,k)g(i,k) denotes the number of kk-body basis functions composed only of one-body basis functions with partial degree ii. Then, in the decomposition of F(x)F(x) as

F(x)=n0f(n)xn,F(x)=\sum_{n\geq 0}f(n)x^{n}, (3.7)

the coefficient $f(n)$ is equal to the number of basis functions with total degree $n$. Note that there is no limitation on the number of terms in the product in (3.6), which corresponds to having no restriction on $N$.

Now, denoting by $c_{d}(i)$ the number of one-body basis functions with degree $i$, the number $g(i,k)$ is the number of combinations with repetition of $k$ elements from the set $\{1,2,\ldots,c_{d}(i)\}$, which is

g(i,k)=(k+cd(i)1k)=(k+cd(i)1cd(i)1).g(i,k)={k+c_{d}(i)-1\choose k}={k+c_{d}(i)-1\choose c_{d}(i)-1}.

Then, for 0<x<10<x<1, a well-known identity on power series gives

k=0(k+cd(i)1cd(i)1)xk=1(cd(i)1)!dcd(i)1dxcd(i)1(1x)1=1(1x)cd(i).\displaystyle\sum_{k=0}^{\infty}{k+c_{d}(i)-1\choose c_{d}(i)-1}x^{k}=\frac{1}{(c_{d}(i)-1)!}\;\frac{\mathrm{d}^{c_{d}(i)-1}}{\mathrm{d}x^{c_{d}(i)-1}}(1-x)^{-1}=\frac{1}{(1-x)^{c_{d}(i)}}.

Hence, the partition function F(x)F(x) can be written as

F(x)=i=11(1xi)cd(i).F(x)=\prod_{i=1}^{\infty}\frac{1}{(1-x^{i})^{c_{d}(i)}}.

Taking the logarithm of F(x)F(x), we obtain

logF(x)=i=1cd(i)log(1xi),\log F(x)=-\sum_{i=1}^{\infty}c_{d}(i)\log(1-x^{i}),

from which we deduce, differentiating this expression, that

F(x)F(x)=i=1cd(i)ixi11xi.\frac{F^{\prime}(x)}{F(x)}=\sum_{i=1}^{\infty}c_{d}(i)\frac{ix^{i-1}}{1-x^{i}}.

Using the expansion (3.7), we write F(x)=n1nf(n)xn1F^{\prime}(x)=\sum_{n\geq 1}nf(n)x^{n-1}, which leads to

n1nf(n)xn1=l0i1f(l)xlcd(i)ixi11xi.\sum_{n\geq 1}nf(n)x^{n-1}=\sum_{l\geq 0}\sum_{i\geq 1}f(l)x^{l}c_{d}(i)\frac{ix^{i-1}}{1-x^{i}}.

Therefore, expanding (1xi)1(1-x^{i})^{-1} as a power series,

n1nf(n)xn1=l0i1k0f(l)cd(i)ixl+i(k+1)1.\sum_{n\geq 1}nf(n)x^{n-1}=\sum_{l\geq 0}\sum_{i\geq 1}\sum_{k\geq 0}f(l)c_{d}(i)ix^{l+i(k+1)-1}.

Multiplying by xx on both sides, and substituting k=k+1k=k+1, we obtain

n1nf(n)xn=l0i1k1f(l)cd(i)ixl+ik.\sum_{n\geq 1}nf(n)x^{n}=\sum_{l\geq 0}\sum_{i\geq 1}\sum_{k\geq 1}f(l)c_{d}(i)ix^{l+ik}.

Hence, equating the coefficients of the power series on both sides, there holds

nf(n)=i1,k1kincd(i)if(nki).nf(n)=\sum_{\begin{subarray}{c}i\geq 1,k\geq 1\\ ki\leq n\end{subarray}}c_{d}(i)if(n-ki).

Let us now show by induction that f(n)eβdnαf(n)\leq e^{{\beta_{d}}n^{\alpha}}, with

βd=d+1d(cd!pdζ(d+1))1/(d+1),{\beta_{d}}=\frac{d+1}{d}\left(cd!p_{d}\zeta(d+1)\right)^{1/(d+1)}, (3.8)

where

pd=supx+xd+1ex(1ex)d+1,p_{d}=\sup_{x\in\mathbb{R}^{+}}\frac{x^{d+1}e^{-x}}{(1-e^{-x})^{d+1}}, (3.9)

cc comes from (Φ\Phi4) and α=dd+1\displaystyle\alpha=\frac{d}{d+1}.

First, f(0)=1f(0)=1, therefore the property f(n)eβdnαf(n)\leq e^{{\beta_{d}}n^{\alpha}} is satisfied for n=0n=0.

Now, we assume that for any k<nk<n, f(k)eβdkαf(k)\leq e^{{\beta_{d}}k^{\alpha}}. Using the recurrence property, there holds

nf(n)\displaystyle nf(n) =i,k1kincd(i)if(nki)i,k1kincd(i)ieβd(nki)α.\displaystyle=\sum_{\begin{subarray}{c}i,k\geq 1\\ ki\leq n\end{subarray}}c_{d}(i)if(n-ki)\leq\sum_{\begin{subarray}{c}i,k\geq 1\\ ki\leq n\end{subarray}}c_{d}(i)ie^{{\beta_{d}}(n-ki)^{\alpha}}. (3.10)

Then, using the concavity of the function xxαx\mapsto x^{\alpha} and noting that kin1\frac{ki}{n}\leq 1, we obtain

(nki)α\displaystyle(n-ki)^{\alpha} =nα(1kin)αnα(1αkin).\displaystyle=n^{\alpha}\left(1-\frac{ki}{n}\right)^{\alpha}\leq n^{\alpha}\left(1-\alpha\frac{ki}{n}\right).

Inserting the latter bound in (3.10) and using Assumption (Φ\Phi4), we deduce

nf(n)\displaystyle nf(n) ceβdnαi,k1kin(i+d1)(i+1)ieβdαkinα1.\displaystyle\leq ce^{{\beta_{d}}n^{\alpha}}\sum_{\begin{subarray}{c}i,k\geq 1\\ ki\leq n\end{subarray}}(i+d-1)\dots(i+1)ie^{-{\beta_{d}}\alpha kin^{\alpha-1}}.

Summing over all ii\in\mathbb{N}, using that for 0<x<10<x<1,

\displaystyle\sum_{i\geq 1}(i+d-1)\dots(i+1)\,i\,x^{i}=x\left[\frac{1}{1-x}\right]^{(d)}=\frac{d!\;x}{(1-x)^{d+1}},

we obtain, taking x=eβdαknα1x=e^{-{\beta_{d}}\alpha kn^{\alpha-1}},

nf(n)\displaystyle nf(n) cd!eβdnαk1eβdαknα1(1eβdαknα1)d+1.\displaystyle\leq c\;d!\;e^{{\beta_{d}}n^{\alpha}}\sum_{k\geq 1}\frac{e^{-{\beta_{d}}\alpha kn^{\alpha-1}}}{\left(1-e^{-{\beta_{d}}\alpha kn^{\alpha-1}}\right)^{d+1}}.

Introducing pdp_{d} defined in (3.9), we get

nf(n)\displaystyle nf(n) cd!pdeβdnαk11(βdαknα1)d+1\displaystyle\leq c\;d!\;p_{d}\;e^{{\beta_{d}}n^{\alpha}}\sum_{k\geq 1}\frac{1}{({\beta_{d}}\alpha kn^{\alpha-1})^{d+1}}
=cd!pdζ(d+1)βdd+1αd+1n(d+1)(1α)eβdnα.\displaystyle=c\;d!\;p_{d}\;\frac{\zeta(d+1)}{{\beta_{d}}^{d+1}\alpha^{d+1}}n^{(d+1)(1-\alpha)}e^{{\beta_{d}}n^{\alpha}}.

Since α=dd+1\alpha=\frac{d}{d+1} and βd=d+1d(cd!pdζ(d+1))1/(d+1){\beta_{d}}=\frac{d+1}{d}\left(cd!p_{d}\zeta(d+1)\right)^{1/(d+1)}, we obtain

nf(n)neβdnα,nf(n)\leq ne^{{\beta_{d}}n^{\alpha}},

from which we deduce that for all nn\in\mathbb{N}, f(n)eβdnαf(n)\leq e^{{\beta_{d}}n^{\alpha}}, i.e.

#{𝒗ordN:deg(𝒗)=D}eβdDd/(d+1).\#\big{\{}{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:{\rm deg}({\bm{v}})=D\big{\}}\leq e^{{\beta_{d}}D^{d/(d+1)}}.

We then easily deduce the result of the theorem. ∎
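
The recurrence $nf(n)=\sum_{ki\leq n}c_{d}(i)\,i\,f(n-ki)$ at the heart of this proof is easy to evaluate numerically. The following sketch (our own check, not from the paper) specialises it to $d=1$ with $c_{1}(i)=1$, in which case $f(n)$ is the classical partition function, and compares it with the induction bound $e^{\beta_{1}\sqrt{n}}$; for $d=1$ one checks $p_{1}=1$, so that $\beta_{1}=2\sqrt{\zeta(2)}=\pi\sqrt{2/3}$, the classical Hardy–Ramanujan exponent for the partition function.

```python
# A small numerical check (our own sketch) of the recurrence used in the proof
# of Theorem 3.8, specialised to d = 1 with c_1(i) = 1 (one Chebyshev basis
# function per degree). Then f(n) is the classical partition function p(n),
# and the induction bound exp(beta_1 * sqrt(n)) with beta_1 = 2*sqrt(zeta(2))
# (here c = 1 and p_1 = 1) uses the Hardy-Ramanujan exponent pi*sqrt(2/3).
import math


def f_recurrence(nmax, c_d=lambda i: 1):
    """f(n) from n*f(n) = sum_{ki<=n} c_d(i)*i*f(n-ki), with f(0) = 1."""
    f = [1] + [0] * nmax
    for n in range(1, nmax + 1):
        s = 0
        for i in range(1, n + 1):
            for k in range(1, n // i + 1):
                s += c_d(i) * i * f[n - k * i]
        f[n] = s // n          # the sum is always divisible by n
    return f


nmax = 30
f = f_recurrence(nmax)
beta1 = 2.0 * math.sqrt(math.pi ** 2 / 6.0)   # beta_d from (3.8) for d = 1
for n in (10, 20, 30):
    print(n, f[n], math.exp(beta1 * math.sqrt(n)))
```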

With the basis size estimate in hand, we can now readily extend the one-dimensional approximation rate.

Theorem 3.9.

Under Assumptions 3.1 and 3.7 there exist constants α,M>0\alpha,M>0 such that

ffDMeα[logP]1+1/d\|f-f_{D}\|_{\infty}\leq Me^{-\alpha[\log P]^{1+1/d}}
Proof.

This follows immediately from (3.4) and, denoting cd=maxD1βdlogDDd/(d+1)c_{d}=\max_{D}\frac{1}{\beta_{d}}\frac{\log D}{D^{d/(d+1)}}, using that logPβdDd/(d+1)+logD\log P\leq\beta_{d}D^{d/(d+1)}+\log D implies

D[logP(1+cd)βd]1+1/d-D\leq-\left[\frac{\log P}{(1+c_{d})\beta_{d}}\right]^{1+1/d}

and absorbing βd\beta_{d} into α:=α/((1+cd)βd)1+1/d\alpha:=\alpha^{\prime}/((1+c_{d})\beta_{d})^{1+1/d}. ∎

3.3 Asymptotic Results for $D\gg N^{1+1/d}$

Our results up to this point are uniform in N,DN,D but in fact, they turn out to be sharp only in the regime of fairly moderate degree DD, specifically for DN1+1/dD\lesssim N^{1+1/d}. Our final set of results provides improved complexity and error estimates for the regime DN1+1/dD\gg N^{1+1/d}. In particular, they will also provide strong indication that Theorem 3.9 is sharp in the regime DN1+1/dD\lesssim N^{1+1/d}.

Here, and below it will be convenient to use the symbols ,,\lesssim,\gtrsim,\eqsim to mean less than, greater than, or bounded above and below up to uniform positive constants.

Our subsequent analysis is based on the fact that there exist constants c0,c1c_{0},{c_{1}} such that, for DcN1+1/dD\geq cN^{1+1/d} with cc sufficiently large,

c0NDdN(dN)!N!Pc1NDdN+1(dN)!N!.c_{0}^{N}\frac{D^{dN}}{(dN)!N!}\leq P\leq c_{1}^{N}\frac{D^{dN+1}}{(dN)!N!}. (3.11)

The lower bound is straightforward to establish. The upper bound for d=1d=1 follows immediately from (2.11). For d>1d>1, and in the limit DD\to\infty it is again straightforward to establish. We prove a slightly modified generic upper bound for d1d\geq 1 below. We then recover the upper bound (3.11) from the following theorem noting that DdN+1(dN)!(N1)!cNDdN+1(dN)!N!\frac{D^{dN+1}}{(dN)!(N-1)!}\leq c^{N}\frac{D^{dN+1}}{(dN)!N!} with c=31/31.44c=3^{1/3}\approx 1.44. (The constants in (3.11) and (3.12) are different).

Theorem 3.10.

Assume that Assumption 3.1 is satisfied, in particular (Φ\Phi4) defining the intrinsic dimension dd. Then, there exist two constants c1,c2>0c_{1},c_{2}>0 that depend on dd such that for any N,DN,D,

P(N,D,d)=#{𝒗ordN:deg(𝒗)D}Dc1N(D+c2Nd+1d)dN(dN)!(N1)!.P(N,D,d)=\#\big{\{}{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:{\rm deg}({\bm{v}})\leq D\big{\}}\leq Dc_{1}^{N}\frac{(D+c_{2}N^{\frac{d+1}{d}})^{dN}}{(dN)!(N-1)!}. (3.12)
Proof.

First, we define the partition function

F(x,y)=i=1(1+g(i,1)xiy+g(i,2)x2iy2+),F(x,y)=\prod_{i=1}^{\infty}(1+g(i,1)x^{i}y+g(i,2)x^{2i}y^{2}+\cdots), (3.13)

where g(i,k)g(i,k) denotes the number of kk-body basis functions composed only of one-body basis function with partial degree ii. Then, in the decomposition of F(x,y)F(x,y) as

F(x,y)=n,m0f(n,m)xnym,F(x,y)=\sum_{n,m\geq 0}f(n,m)x^{n}y^{m}, (3.14)

the coefficient f(n,m)f(n,m) is equal to the number of basis functions with total degree nn and number of parts mm.

Now, as in the proof of Theorem 3.8, denoting by cd(i)c_{d}(i) the number of one-body basis functions of degree ii, we can write the partition function F(x,y)F(x,y) as

F(x,y)=i=11(1xiy)cd(i).F(x,y)=\prod_{i=1}^{\infty}\frac{1}{(1-x^{i}y)^{c_{d}(i)}}.

Taking the logarithm of F(x,y)F(x,y), we obtain

logF(x,y)=i=1cd(i)log(1xiy),\log F(x,y)=-\sum_{i=1}^{\infty}c_{d}(i)\log(1-x^{i}y),

from which we deduce, differentiating this expression with respect to yy, that

Fy(x,y)F(x,y)=i=1cd(i)xi1xiy.\frac{\frac{\partial F}{\partial y}(x,y)}{F(x,y)}=\sum_{i=1}^{\infty}c_{d}(i)\frac{x^{i}}{1-x^{i}y}.

Using the expansion (3.14), we write Fy(x,y)=n,m0mf(n,m)xnym1\frac{\partial F}{\partial y}(x,y)=\sum_{n,m\geq 0}mf(n,m)x^{n}y^{m-1}, which leads to

n,m0mf(n,m)xnym1\displaystyle\sum_{n,m\geq 0}mf(n,m)x^{n}y^{m-1} =p,q0f(p,q)xpyqi1cd(i)xi[k0(xiy)k]\displaystyle=\sum_{p,q\geq 0}f(p,q)x^{p}y^{q}\sum_{i\geq 1}c_{d}(i)x^{i}\left[\sum_{k\geq 0}(x^{i}y)^{k}\right]
=p,q0i1k0f(p,q)cd(i)xp+i(k+1)yk+q.\displaystyle=\sum_{p,q\geq 0}\sum_{i\geq 1}\sum_{k\geq 0}f(p,q)c_{d}(i)x^{p+i(k+1)}y^{k+q}.

Multiplying by yy on both sides, and substituting k+1k+1 for kk, we obtain

n,m0mf(n,m)xnym=p,q0i,k1f(p,q)cd(i)xp+ikyk+q.\sum_{n,m\geq 0}mf(n,m)x^{n}y^{m}=\sum_{p,q\geq 0}\sum_{i,k\geq 1}f(p,q)c_{d}(i)x^{p+ik}y^{k+q}.

Hence, equating the coefficients of the power series on both sides, we conclude

mf(n,m)=i1,k1kinkmcd(i)f(nki,mk).mf(n,m)=\sum_{\begin{subarray}{c}i\geq 1,k\geq 1\\ ki\leq n\\ k\leq m\end{subarray}}c_{d}(i)f(n-ki,m-k). (3.15)

We now show by induction that for any c1,c2>0c_{1},c_{2}>0 that satisfy

c1c2d>dd and (c1c~αd)c2ddd,c_{1}c_{2}^{d}>d^{d}\;\text{ and }\;(c_{1}-\widetilde{c}\alpha_{d})c_{2}^{d}\geq d^{d}, (3.16)

where c~\tilde{c} is the constant appearing in (Φ\Phi4), α1=1\alpha_{1}=1, and αd=max{(d1)! 2d,(d1)!+d(d1)d1}\alpha_{d}=\max\{(d-1)!\,2^{d},(d-1)!+d(d-1)^{d-1}\} for d2d\geq 2, we have

f(n,m)c1m(n+c2md+1d)dm(dm)!m!.f(n,m)\leq c_{1}^{m}\frac{(n+c_{2}m^{\frac{d+1}{d}})^{dm}}{(dm)!m!}. (3.17)

Note that (3.16) holds in particular for the choice c1=1+c~αdc_{1}=1+\widetilde{c}\alpha_{d} and c2=dc_{2}=d.

First, we observe that f(0,0)=1f(0,0)=1, f(n,0)=0f(n,0)=0 for n1n\geq 1, f(0,m)=0f(0,m)=0 for m1m\geq 1, f(1,1)=cd(1)c~f(1,1)=c_{d}(1)\leq\widetilde{c}, and f(1,m)=0f(1,m)=0 for all m2m\geq 2. Thus (3.17) holds in each of these cases.

Now for arbitrary but fixed mm and nn, we assume that (3.17) holds true for any k<mk<m, i<ni<n. Using the recurrence property, starting from (3.15), and noting that cd(i)c_{d}(i) appearing in (Φ\Phi4) can be bounded by c~id1\tilde{c}i^{d-1} for some c~>0\tilde{c}>0, we find

mf(n,m)\displaystyle mf(n,m) i,k1kinkmcd(i)c1mk(nik+c2(mk)d+1d)d(mk)(d(mk))!(mk)!\displaystyle\leq\sum_{\begin{subarray}{c}i,k\geq 1\\ ki\leq n\\ k\leq m\end{subarray}}c_{d}(i)c_{1}^{m-k}\frac{\left(n-ik+c_{2}(m-k)^{\frac{d+1}{d}}\right)^{d(m-k)}}{(d(m-k))!(m-k)!}
c~1kmc1mk(d(mk))!(mk)!S(n,k,c2(mk)d+1d,d(mk),d1),\displaystyle\leq\widetilde{c}\sum_{1\leq k\leq m}\frac{c_{1}^{m-k}}{(d(m-k))!(m-k)!}S(n,k,c_{2}(m-k)^{\frac{d+1}{d}},d(m-k),d-1), (3.18)

with

S(n,k,γ,p,q)=1ink(nik+γ)piq,S(n,k,\gamma,p,q)=\sum_{1\leq i\leq\left\lfloor\frac{n}{k}\right\rfloor}(n-ik+\gamma)^{p}i^{q},

where γ0\gamma\geq 0, p,q,kp,q,k\in\mathbb{N}, k1k\geq 1. For estimating this quantity, we use the following two lemmas.

Lemma 3.11.

For γ0\gamma\geq 0, n,k,p,qn,k,p,q\in\mathbb{N}, k1k\geq 1, we have

S(n,k,γ,p,q)β(q,p,k,γ)(n+γ)p+q+1(p+1)(p+2)(p+q+1),S(n,k,\gamma,p,q)\leq\beta(q,p,k,\gamma)\frac{(n+\gamma)^{p+q+1}}{(p+1)(p+2)\ldots(p+q+1)}, (3.19)

with β(0,p,k,γ)=1\beta(0,p,k,\gamma)=1, β(q,0,k,γ)=q! 2q+1\beta(q,0,k,\gamma)=q!\,2^{q+1}, and β(q,p,k,γ)=q!kq+1+ppqqkq(1+γ)(p+q+1)(p+q)p\beta(q,p,k,\gamma)=\frac{q!}{k^{q+1}}+\frac{p^{p}q^{q}}{k^{q}(1+\gamma)}\frac{(p+q+1)}{(p+q)^{p}} for p,q1p,q\geq 1.

Proof.

To prove (3.19), we introduce the function g:x(nkx+γ)pxqg:x\mapsto(n-kx+\gamma)^{p}x^{q} and compare the sum S(n,k,γ,p,q)=1inkg(i)S(n,k,\gamma,p,q)=\sum_{1\leq i\leq\left\lfloor\frac{n}{k}\right\rfloor}g(i) to the integral of gg. Note that (3.19) trivially holds for n=0n=0. Hence we assume n1n\geq 1. We treat three different cases depending on the values of pp and qq.

If q=0q=0, the function gg decreases on [0,nk][0,\frac{n}{k}]. Therefore,

S(n,k,γ,p,q)0nkg(x)𝑑x=[(nkx+γ)p+1(p+1)k]0nk(n+γ)p+1(p+1)k.S(n,k,\gamma,p,q)\leq\int_{0}^{\frac{n}{k}}g(x)dx=\left[-\frac{(n-kx+\gamma)^{p+1}}{(p+1)k}\right]_{0}^{\frac{n}{k}}\leq\frac{(n+\gamma)^{p+1}}{(p+1)k}.

Since k1k\geq 1, (3.19) holds in this case, with β(0,p,k,γ)=1\beta(0,p,k,\gamma)=1.

If p=0p=0, using k1k\geq 1, we obtain

S(n,k,γ,p,q)(nk+1)q+1q+1(n+1)q+1q+1nq+1(q+1)!q!(n+1n)q+1.S(n,k,\gamma,p,q)\leq\frac{(\frac{n}{k}+1)^{q+1}}{q+1}\leq\frac{(n+1)^{q+1}}{q+1}\leq\frac{n^{q+1}}{(q+1)!}q!\left(\frac{n+1}{n}\right)^{q+1}.

Hence (3.19) is satisfied with β(q,0,k,γ)=q! 2q+1\beta(q,0,k,\gamma)=q!\,2^{q+1}.

As the third case, we consider p1,q1p\geq 1,q\geq 1. For x0x\geq 0,

g(x)=(nkx+γ)p1xq1(q(n+γ)xk(p+q)),g^{\prime}(x)=(n-kx+\gamma)^{p-1}x^{q-1}\left(q(n+\gamma)-xk(p+q)\right),

so the function gg is increasing on [0,q(n+γ)k(p+q)]\left[0,\frac{q(n+\gamma)}{k(p+q)}\right], and decreasing on [q(n+γ)k(p+q),+[\left[\frac{q(n+\gamma)}{k(p+q)},+\infty\right[. In particular, the function has a single local maximum in x=q(n+γ)k(p+q)x=\frac{q(n+\gamma)}{k(p+q)}. Hence, there holds

1inkg(i)0nkg(x)𝑑x+g(q(n+γ)k(p+q)).\sum_{1\leq i\leq\left\lfloor\frac{n}{k}\right\rfloor}g(i)\leq\int_{0}^{\frac{n}{k}}g(x)dx+g\left(\frac{q(n+\gamma)}{k(p+q)}\right).

Integrating by parts qq times, we obtain

0nkg(x)𝑑x\displaystyle\int_{0}^{\frac{n}{k}}g(x)dx =[xq(nkx+γ)p+1k(p+1)]0nk+0nkqk(p+1)(nkx+γ)p+1xq1𝑑x\displaystyle=\left[-\frac{x^{q}(n-kx+\gamma)^{p+1}}{k(p+1)}\right]_{0}^{\frac{n}{k}}+\int_{0}^{\frac{n}{k}}\frac{q}{k(p+1)}(n-kx+\gamma)^{p+1}x^{q-1}dx
qk(p+1)0nk(nkx+γ)p+1xq1𝑑x\displaystyle\leq\frac{q}{k(p+1)}\int_{0}^{\frac{n}{k}}(n-kx+\gamma)^{p+1}x^{q-1}dx
q!kq(p+1)(p+q)0nk(nkx+γ)p+q𝑑x\displaystyle\leq\frac{q!}{k^{q}(p+1)\ldots(p+q)}\int_{0}^{\frac{n}{k}}(n-kx+\gamma)^{p+q}dx
q!(n+γ)p+q+1kq+1(p+1)(p+q+1).\displaystyle\leq\frac{q!(n+\gamma)^{p+q+1}}{k^{q+1}(p+1)\ldots(p+q+1)}.

Moreover,

g(q(n+γ)k(p+q))=(nq(n+γ)p+q+γ)p(q(n+γ)k(p+q))q=(n+γ)p+qppqqkq(p+q)p+q.g\left(\frac{q(n+\gamma)}{k(p+q)}\right)=\left(n-\frac{q(n+\gamma)}{p+q}+\gamma\right)^{p}\left(\frac{q(n+\gamma)}{k(p+q)}\right)^{q}=\left(n+\gamma\right)^{p+q}\frac{p^{p}q^{q}}{k^{q}(p+q)^{p+q}}.

Since n,k,p,q1,n,k,p,q\geq 1, we easily deduce

g(q(n+γ)k(p+q))\displaystyle g\left(\frac{q(n+\gamma)}{k(p+q)}\right) =(n+γ)p+q+1(p+1)(p+q+1)ppqqkq(n+γ)(p+1)(p+q)(p+q+1)(p+q)p+q\displaystyle=\frac{(n+\gamma)^{p+q+1}}{{(p+1)\ldots(p+q+1)}}\frac{p^{p}q^{q}}{k^{q}(n+\gamma)}\frac{{(p+1)\ldots(p+q)(p+q+1)}}{(p+q)^{p+q}}
(n+γ)p+q+1(p+1)(p+q+1)ppqqkq(n+γ)(p+q+1)(p+q)p,\displaystyle\leq\frac{(n+\gamma)^{p+q+1}}{{(p+1)\ldots(p+q+1)}}\frac{p^{p}q^{q}}{k^{q}(n+\gamma)}\frac{(p+q+1)}{(p+q)^{p}},

and thus

S(n,k,γ,p,q)(n+γ)p+q+1(p+1)(p+q+1)(q!kq+1+ppqqkq(n+γ)(p+q+1)(p+q)p).S(n,k,\gamma,p,q)\leq\frac{(n+\gamma)^{p+q+1}}{(p+1)\ldots(p+q+1)}\left(\frac{q!}{k^{q+1}}+\frac{p^{p}q^{q}}{k^{q}(n+\gamma)}\frac{(p+q+1)}{(p+q)^{p}}\right).

Therefore, (3.19) holds also in this third case, with β(q,p,k,γ)=q!kq+1+ppqqkq(1+γ)(p+q+1)(p+q)p\beta(q,p,k,\gamma)=\frac{q!}{k^{q+1}}+\frac{p^{p}q^{q}}{k^{q}(1+\gamma)}\frac{(p+q+1)}{(p+q)^{p}}. This concludes the proof of the lemma. ∎

We now show that the prefactor can be bounded uniformly in mm and kk.

Lemma 3.12.

For m,k,m,k\in\mathbb{N}, m,k1m,k\geq 1, we have

β(d1,d(mk),k,c2(mk)d+1d)αd,\beta\bigl{(}d-1,d(m-k),k,c_{2}(m-k)^{\frac{d+1}{d}}\bigr{)}\leq\alpha_{d}, (3.20)

with α1=1,\alpha_{1}=1, and αd=max{(d1)! 2d,(d1)!+d(d1)d1}\alpha_{d}=\max\{(d-1)!\,2^{d},(d-1)!+d(d-1)^{d-1}\} for d2d\geq 2, provided that c21c_{2}\geq 1.

Proof.

If d=1d=1 (corresponding to q=0q=0 in the previous lemma) or k=mk=m (corresponding to p=0p=0), the estimate (3.20) holds with α1=1\alpha_{1}=1 and αd=(d1)! 2d\alpha_{d}=(d-1)!\,2^{d}, respectively, by Lemma 3.11. Now if 1k<m1\leq k<m,

β(d1,d(mk),k,c2(mk)d+1d)=(d1)!kd+[d(mk)]d(mk)(d1)(d1)kd1(1+c2(mk)d+1d)(d(mk)+d)(d(mk)+d1)d(mk)(d1)!+(d1)(d1)kd1d(mk+1)(1+c2(mk)d+1d)[d(mk)d(mk)+d1]d(mk).\beta\bigl{(}d-1,d(m-k),k,c_{2}(m-k)^{\frac{d+1}{d}}\bigr{)}\\ =\frac{(d-1)!}{k^{d}}+\frac{[d(m-k)]^{d(m-k)}(d-1)^{(d-1)}}{k^{d-1}(1+c_{2}(m-k)^{\frac{d+1}{d}})}\frac{(d(m-k)+d)}{(d(m-k)+d-1)^{d(m-k)}}\\ \leq(d-1)!+\frac{(d-1)^{(d-1)}}{k^{d-1}}\frac{d(m-k+1)}{(1+c_{2}(m-k)^{\frac{d+1}{d}})}\left[\frac{d(m-k)}{d(m-k)+d-1}\right]^{d(m-k)}.

Noting that d(mk)d(mk)+d11\frac{d(m-k)}{d(m-k)+d-1}\leq 1, k1k\geq 1, and that d(mk+1)1+c2(mk)d+1dd\frac{d(m-k+1)}{1+c_{2}(m-k)^{\frac{d+1}{d}}}\leq d if c21c_{2}\geq 1, we obtain

β(d1,d(mk),k,c2(mk)d+1d)(d1)!+d(d1)d1,\beta\bigl{(}d-1,d(m-k),k,c_{2}(m-k)^{\frac{d+1}{d}}\bigr{)}\leq(d-1)!+d(d-1)^{d-1},

completing the proof of (3.20). ∎

Coming back to (3.18) and using the estimates of the two lemmas, we obtain

mf(n,m)\displaystyle mf(n,m) c~αd1kmc1mk(d(mk))!(mk)!(n+c2(mk)d+1d)d(mk)+d(d(mk)+1)(d(mk)+2)(d(mk)+d)\displaystyle\leq\widetilde{c}\alpha_{d}\sum_{1\leq k\leq m}\frac{c_{1}^{m-k}}{(d(m-k))!(m-k)!}\frac{(n+c_{2}(m-k)^{\frac{d+1}{d}})^{d(m-k)+d}}{(d(m-k)+1)(d(m-k)+2)\ldots(d(m-k)+d)}
=c~αd1kmc1mk(n+c2(mk)d+1d)d(mk+1)(d(mk+1))!(mk)!\displaystyle=\widetilde{c}\alpha_{d}\sum_{1\leq k\leq m}\frac{c_{1}^{m-k}(n+c_{2}(m-k)^{\frac{d+1}{d}})^{d(m-k+1)}}{(d(m-k+1))!(m-k)!}
c1m(n+c2md+1d)dm(dm)!m!c~αd1km1c1k(dm)!(d(mk+1))!m!(mk)!(n+c2(mk)d+1d)d(mk+1)(n+c2md+1d)dm.\displaystyle\leq c_{1}^{m}\frac{(n+c_{2}m^{\frac{d+1}{d}})^{dm}}{(dm)!m!}\widetilde{c}\alpha_{d}\sum_{1\leq k\leq m}\frac{1}{c_{1}^{k}}\frac{(dm)!}{(d(m-k+1))!}\frac{m!}{(m-k)!}\frac{(n+c_{2}(m-k)^{\frac{d+1}{d}})^{d(m-k+1)}}{(n+c_{2}m^{\frac{d+1}{d}})^{dm}}.

Combining this with the estimates

(dm)!(d(mk+1))!(dm)d(k1),m!(mk)!mk,(n+c2(mk)d+1d)d(mk+1)(n+c2md+1d)dm1(n+c2md+1d)d(k1)1(c2md+1d)d(k1)\begin{gathered}\frac{(dm)!}{(d(m-k+1))!}\leq(dm)^{d(k-1)},\qquad\frac{m!}{(m-k)!}\leq m^{k},\\ \frac{(n+c_{2}(m-k)^{\frac{d+1}{d}})^{d(m-k+1)}}{(n+c_{2}m^{\frac{d+1}{d}})^{dm}}\leq\frac{1}{(n+c_{2}m^{\frac{d+1}{d}})^{d(k-1)}}\leq\frac{1}{(c_{2}m^{\frac{d+1}{d}})^{d(k-1)}}\end{gathered}

yields

mf(n,m)c1m(n+c2md+1d)dm(dm)!m!c~αd1km(dm)d(k1)mkc1k(c2md+1d)d(k1).mf(n,m)\leq c_{1}^{m}\frac{(n+c_{2}m^{\frac{d+1}{d}})^{dm}}{(dm)!m!}\widetilde{c}\alpha_{d}\sum_{1\leq k\leq m}\frac{(dm)^{d(k-1)}m^{k}}{c_{1}^{k}(c_{2}m^{\frac{d+1}{d}})^{d(k-1)}}.

Noting that (dm)d(k1)mk(c2md+1d)d(k1)=mdd(k1)c2d(k1)\displaystyle\frac{(dm)^{d(k-1)}m^{k}}{(c_{2}m^{\frac{d+1}{d}})^{d(k-1)}}=m\frac{d^{d(k-1)}}{c_{2}^{d(k-1)}} we obtain, replacing the index kk with k1k-1,

mf(n,m)\displaystyle mf(n,m) c1m(n+c2md+1d)dm(dm)!m!c~αdmc10k(m1)(ddc1c2d)k.\displaystyle\leq c_{1}^{m}\frac{(n+c_{2}m^{\frac{d+1}{d}})^{dm}}{(dm)!m!}\widetilde{c}\frac{\alpha_{d}m}{c_{1}}\sum_{0\leq k\leq(m-1)}\left(\frac{d^{d}}{c_{1}c_{2}^{d}}\right)^{k}.

Taking c1,c2c_{1},c_{2} such that c1c2d>ddc_{1}c_{2}^{d}>d^{d} and dividing by mm yields

f(n,m)c1m(n+c2md+1d)dm(dm)!m!c~αdc111ddc1c2d.f(n,m)\leq c_{1}^{m}\frac{(n+c_{2}m^{\frac{d+1}{d}})^{dm}}{(dm)!m!}\widetilde{c}\frac{\alpha_{d}}{c_{1}}\frac{1}{1-\frac{d^{d}}{c_{1}c_{2}^{d}}}.

Therefore, (3.17) is satisfied if c~αdc2dc1c2ddd,\displaystyle\widetilde{c}\alpha_{d}c_{2}^{d}\leq c_{1}c_{2}^{d}-d^{d}, that is, if c1c2dc~αdc2d+dd.\displaystyle c_{1}c_{2}^{d}\geq\widetilde{c}\alpha_{d}c_{2}^{d}+d^{d}. This completes the induction. Hence, we deduce that

#{𝒗ordN:deg(𝒗)=D}\displaystyle\#\big{\{}{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:{\rm deg}({\bm{v}})=D\big{\}} =m=0Nf(D,m)m=0Nc1m(D+c2md+1d)dm(dm)!m!.\displaystyle=\sum_{m=0}^{N}f(D,m)\leq\sum_{m=0}^{N}c_{1}^{m}\frac{(D+c_{2}m^{\frac{d+1}{d}})^{dm}}{(dm)!m!}.

Finally, we note that a sufficient condition for $\mathbb{N}\ni m\mapsto c_{1}^{m}\frac{(D+c_{2}m^{\frac{d+1}{d}})^{dm}}{(dm)!m!}$ to be increasing is that $c_{1}c_{2}^{d}\geq d^{d}$. Since this is satisfied by (3.16),

#{𝒗ordN:deg(𝒗)=D}c1N(D+c2Nd+1d)dN(dN)!(N1)!.\#\big{\{}{\bm{v}}\in\mathbb{N}^{N}_{\rm ord}:{\rm deg}({\bm{v}})=D\big{\}}\leq c_{1}^{N}\frac{(D+c_{2}N^{\frac{d+1}{d}})^{dN}}{(dN)!(N-1)!}.

Summing over the possible values of $D$, we easily deduce the result of the theorem. ∎

With this result at hand, we now consider the dependency of the number of parameters on DD.

Lemma 3.13.

Suppose that D=NtD=N^{t} with t1+1/dt\geq 1+1/d. Then, for sufficiently large DD,

logPD1/t+(t11/d)D1/tlogD1/t,\log P\eqsim D^{1/t}+(t-1-1/d)D^{1/t}\log D^{1/t}, (3.21)

uniformly for all t1+1/dt\geq 1+1/d (but the constants may depend on dd).

Proof.

From (3.11) we can estimate

logP\displaystyle\log P Nlogc1+dNlogDdNlog(dN/e)Nlog(N/e)+logD\displaystyle\leq N\log c_{1}+dN\log D-dN\log\big{(}dN/e\big{)}-N\log\big{(}N/e\big{)}+\log D
=dN(c2+logDlogNlogN1/d)+logD\displaystyle=dN\Big{(}c_{2}+\log D-\log N-\log N^{1/d}\Big{)}+\log D
=dD1/t(c2+logD11/t1/dt)+logD.\displaystyle=dD^{1/t}\Big{(}c_{2}+\log D^{1-1/t-1/dt})+\log D.

Noting that asymptotically as DD\to\infty, logDD1/t\log D\ll D^{1/t} we easily obtain the upper bound. The lower bound is analogous. ∎

Proposition 3.14.

Suppose that D=NtD=N^{t} for some t>1+1/dt>1+1/d, and that N,DN,D are sufficiently large, then

D1(t11/d)t[logPloglogP]t.D\gtrsim\frac{1}{(t-1-1/d)^{t}}\bigg{[}\frac{\log P}{\log\log P}\bigg{]}^{t}. (3.22)
Proof.

From Lemma 3.13, for the case t>1+1/dt>1+1/d and N,DN,D sufficiently large we have

a0(t11/d)D1/tlogD1/tlogPa1(t11/d)D1/tlogD1/t.a_{0}(t-1-1/d)D^{1/t}\log D^{1/t}\leq\log P\leq a_{1}(t-1-1/d)D^{1/t}\log D^{1/t}. (3.23)

From the lower bound we deduce

log(a0(t11/d)/t)+logD1/t+loglogD1/tloglogP,\log(a_{0}(t-1-1/d)/t)+\log D^{1/t}+\log\log D^{1/t}\leq\log\log P,

and for DD sufficiently large, this implies

logD1/tloglogP,or,1loglogP1logD1/t.\log D^{1/t}\leq\log\log P,\qquad\text{or,}\qquad\frac{1}{\log\log P}\leq\frac{1}{\log D^{1/t}}.

Multiplying this with the right-hand inequality in (3.23) yields

logPloglogPa1(t11/d)D1/t,\frac{\log P}{\log\log P}\leq a_{1}(t-1-1/d)D^{1/t},

and taking this to power tt gives the stated result. ∎

In particular, in the regime D=Nt,t>1+1/dD=N^{t},t>1+1/d, if we start from the exponential convergence rate of Assumption 3.7, then we obtain that there exists a symmetric approximation fDf_{D} with

ffDexp(α(t11/d)t[logPloglogP]t),\|f-f_{D}\|_{\infty}\lesssim\exp\bigg{(}-\alpha(t-1-1/d)^{-t}\bigg{[}\frac{\log P}{\log\log P}\bigg{]}^{t}\bigg{)}, (3.24)

for some α>0\alpha>0. This rate is clearly superior to the uniform NN-independent rate of Theorem 3.9 (in this regime at least). Moreover, as t1+1/dt\to 1+1/d it strongly hints at a kind of “phase transition”, in other words, that the behaviour of the approximation scheme significantly changes at that point.

Remark 3.15.

It is furthermore interesting to compare our result (3.24) with analogous estimates if we had not exploited symmetry, or sparsity. An analogous analysis shows that if we use a (sparse) total degree approximation, but use a naive basis instead of a symmetrized basis, then we obtain

ffDexp(α2(t1)t[logPloglogP]t),\|f-f_{D}\|_{\infty}\lesssim\exp\bigg{(}-\alpha_{2}(t-1)^{-t}\bigg{[}\frac{\log P}{\log\log P}\bigg{]}^{t}\bigg{)},

Finally, if we drop even the sparsity, so that PDdNP\approx D^{dN}, then we obtain

ffDexp(α3tt[logPloglogP]t)\|f-f_{D}\|_{\infty}\lesssim\exp\bigg{(}-\alpha_{3}t^{-t}\bigg{[}\frac{\log P}{\log\log P}\bigg{]}^{t}\bigg{)}

Note here that the tt dependence of the exponents highlights quite different qualitative behaviour of the three bases. Specifically, this provides strong qualitative evidence that (unsurprisingly) the sparse basis gives a significant advantage over the naive tensor product basis especially in the regime 1t1\leq t, but that it is still significantly more costly than the sparse symmetrized basis. Moreover, these estimates clearly highlight the unique advantage that the symmetrized basis provides in the regime DN1+1/dD\lesssim N^{1+1/d} treated in Theorem 3.9.

4 Approximation of multi-set functions

Our original motivation to study the approximation of symmetric functions arises in the atomic cluster expansion [5, 7] which is in fact concerned with the approximation of multi-set functions. We now study how our approximation results of the previous sections can be applied in this context. Similarly as in the previous sections, this discussion will again ignore isometry-invariance. We will focus on a general abstract setting, but employ assumptions that can be rigorously established in the setting of [5, 7].

Let MS(Ω)\mathrm{MS}(\Omega) denote the set of all multi-sets (or, msets) on Ω\Omega; i.e.,

MS(Ω):={𝒙=[x1,,xM]:M,{xj}j=1MΩ},\mathrm{MS}(\Omega):=\big{\{}{\bm{x}}=[x_{1},\dots,x_{M}]\colon M\in\mathbb{N},\{x_{j}\}_{j=1}^{M}\subset\Omega\big{\}}, (4.1)

where [x1,,xM][x_{1},\dots,x_{M}] denotes an unordered tuple or mset, e.g. describing a collection of (positions of) MM classical particles. The crucial aspect is that MM is now variable and no longer fixed. This is equivalent to the classical definition of a multiset which has the defining feature of allowing for multiple instances for each of its elements. We are interested in parameterising (approximating) mset functions

f:MS(Ω).\displaystyle f\colon\mathrm{MS}(\Omega)\to\mathbb{R}. (4.2)
Remark 4.1 (Context).

This situation arises, e.g., when modelling interactions between particles. Different local or global structures of particle systems lead to a flexible number of particles entering the range of the interaction law. It seems tempting to take the limit MM\to\infty, which leads to a mean-field-like scenario where a signal processing perspective could be of interest. However, we are interested in an intermediate situation where MM is “moderate”; say, in the range M=10M=10 to 100100, and thus this limit is not of interest. Indeed, the number of interacting particles MM can be understood as another approximation parameter, which we discuss in more detail in Remark 4.3 below.

In order to reduce the approximation of an mset function ff to our foregoing results, we must first produce a representation of ff in terms of finite-dimensional symmetric components. A classical idea is the many-body or ANOVA expansion, which we formulate here as an assumption. We emphasize, however, that our recent results [20] rigorously justify this assumption in the context of coarse-graining electronic structure models into interatomic potential models. The following formulation is modelled on those results, which are summarized in Appendix A.

Assumption 4.2.

(i) For all NN, there exist symmetric VnN:ΩnV_{nN}\colon\Omega^{n}\to\mathbb{R} for n=0,,Nn=0,\dots,N, and η>0\eta>0 such that fN:MS(Ω)f_{N}\colon\mathrm{MS}(\Omega)\to\mathbb{R}, defined by

fN([x1,,xM]):=V0N+jV1N(xj)++j1<<jNVNN(xj1,,xjN)\displaystyle f_{N}([x_{1},\dots,x_{M}]):=V_{0N}+\sum_{j}V_{1N}(x_{j})+\dots+\sum_{j_{1}<\dots<j_{N}}V_{NN}(x_{j_{1}},\dots,x_{j_{N}})
satisfies|f(𝒙)fN(𝒙)|eηN\displaystyle\text{satisfies}\qquad|f(\bm{x})-f_{N}(\bm{x})|\lesssim e^{-\eta N} (4.3)

for all 𝐱MS(Ω)\bm{x}\in\mathrm{MS}(\Omega).

(ii) Moreover, we suppose that each VnNV_{nN} has a sparse polynomial approximation VnNDV_{nND}, where DD denotes the total degree, satisfying

|VnN(x1,,xn)VnND(x1,,xn)|cneαnDx1,,xnΩ\big{|}V_{nN}(x_{1},\dots,x_{n})-V_{nND}(x_{1},\dots,x_{n})\big{|}\lesssim c^{n}e^{-\alpha_{n}D}\qquad\forall x_{1},\dots,x_{n}\in\Omega (4.4)

for some c,αn>0c,\alpha_{n}>0 and n=1,,Nn=1,\dots,N. Here, V0NV_{0N} is simply a constant term and requires no approximation.

Part (i) of Assumption 4.2 is the main result of [20], while part (ii) essentially encodes the assumption that the VnNV_{nN} are analytic, with the (possibly nn-dependent) region of analyticity encoded in αn\alpha_{n}. Under a suitable choice of one-particle basis, one then obtains (4.4). Note in particular that in this context the dimensionality of the target functions VnNV_{nN} is itself an approximation parameter, which makes it particularly natural to consider the different regimes relating DD and NN studied in the foregoing sections.

Throughout the following discussion suppose that Assumption 4.2 is satisfied. Then, for a tuple 𝑫=(D1,,DN){\bm{D}}=(D_{1},\dots,D_{N}) specifying the total degrees DnD_{n} used to approximate the components VnNV_{nN}, the cluster expansion approximation to ff is given by

fN𝑫(𝒙):=V0N+n=1Nj1<<jnVnNDn(xj1,,xjn),f_{N{\bm{D}}}(\bm{x}):=V_{0N}+\sum_{n=1}^{N}\sum_{j_{1}<\dots<j_{n}}V_{nND_{n}}(x_{j_{1}},\dots,x_{j_{n}}), (4.5)

with VnNDnV_{nND_{n}} of the form VnNDn(xj1,,xjn)=v1,,vnc𝒗nNDnt=1nϕvt(xjt)V_{nND_{n}}(x_{j_{1}},\dots,x_{j_{n}})=\sum_{v_{1},\dots,v_{n}}c^{nND_{n}}_{\bm{v}}\prod_{t=1}^{n}\phi_{v_{t}}(x_{j_{t}}), for some coefficients c𝒗nNDnc^{nND_{n}}_{\bm{v}} and one-body basis functions (ϕv)v(\phi_{v})_{v\in\mathbb{N}}. We immediately deduce an approximation error estimate: for all 𝒙=[x1,,xM]MS(Ω)\bm{x}=[x_{1},\dots,x_{M}]\in\mathrm{MS}(\Omega),

|f(𝒙)fN𝑫(𝒙)|\displaystyle\left|f(\bm{x})-f_{N\bm{D}}(\bm{x})\right| |f(𝒙)fN(𝒙)|+n=1N1j1<<jnM|(VnNVnND)(xj1,,xjn)|\displaystyle\leq|f(\bm{x})-f_{N}(\bm{x})|+\sum_{n=1}^{N}\sum_{1\leq j_{1}<\dots<j_{n}\leq M}\big{|}(V_{nN}-V_{nND})(x_{j_{1}},\dots,x_{j_{n}})\big{|}
eηN+n=1N(Mn)cneαnDn.\displaystyle\lesssim e^{-\eta N}+\sum_{n=1}^{N}{M\choose n}c^{n}e^{-\alpha_{n}D_{n}}. (4.6)

From this expression we can now minimize the computational cost subject to the constraint that the overall error is no worse than the many-body approximation error eηNe^{-\eta N}, and thereby obtain an error versus cost estimate.

Remark 4.3.

As we already hinted in Remark 4.1, the function ff is often more naturally defined on the whole of MS(d)\mathrm{MS}(\mathbb{R}^{d}) (or even on the space of infinite multi-sets) but only “weakly” depends on points far away. That is, on defining 𝒙R:=[x𝒙:|x|R]\bm{x}_{R}:=[x\in\bm{x}\colon|x|\leq R] (i.e. we remove from 𝒙\bm{x} only points outside BRB_{R}) we have

|f(𝒙)f(𝒙R)|eγR,\big{|}f(\bm{x})-f(\bm{x}_{R})\big{|}\lesssim e^{-\gamma R},

that is, the domain Ω\Omega itself becomes an approximation parameter as well. Such exponential “locality” results arise for example in modelling of interatomic interaction laws [20, 2]. For the sake of simplicity we will not incorporate this feature into our analysis, except for a natural assumption on how MM and NN are related:

If we approximate the restriction of ff to MS(BR)\mathrm{MS}(B_{R}) then we obtain

|f(𝒙)fN(𝒙R)|eγR+eηN\displaystyle|f(\bm{x})-f_{N}(\bm{x}_{R})|\lesssim e^{-\gamma R}+e^{-\eta N} (4.7)

Balancing the two error contributions, we choose γR=ηN\gamma R=\eta N. In many physical situations we can assume that particles do not cluster, which leads to the bound MRdM\lesssim R^{d}; combined with the choice R=ηN/γR=\eta N/\gamma, this gives MNdM\lesssim N^{d}. More generally, we will therefore assume below that MM is bounded by a polynomial in NN, which makes our analysis a little more concrete.

Remark 4.4 (Connection to Deep sets).

Deep set architectures are based on the observation that a set function f:m=0M(d)mf\colon\bigcup_{m=0}^{M}\big{(}\mathbb{R}^{d}\big{)}^{m}\to\mathbb{R} that is continuous when restricted to any fixed number of inputs is symmetric if and only if there exist continuous functions ϕ:dZ\phi\colon\mathbb{R}^{d}\to\mathbb{R}^{Z} and ρ:Z\rho\colon\mathbb{R}^{Z}\to\mathbb{R} such that

f(𝒙)=ρ(x𝒙ϕ(x)).\displaystyle f(\bm{x})=\rho\Big{(}\textstyle\sum_{x\in\bm{x}}\phi(x)\Big{)}. (4.8)

For d=1d=1, it has been shown that Z=MZ=M is sufficient [25], and there exist functions where Z=MZ=M is also necessary [22]. Motivated by this characterisation, one obtains a deep set approximation to ff by choosing a latent space dimension ZZ and learning the mappings ϕ\phi and ρ\rho, e.g. parametrised using multi-layer perceptrons.

The cluster expansion approximation fN𝑫f_{N\bm{D}} (4.5) can therefore be thought of as a special case of (4.8) with both ϕ\phi and ρ\rho polynomials. However, the key conceptual difference in our setting is that, on the one hand, ZZ is significantly larger than dd and should be taken as an approximation parameter, while on the other hand, the embedding ϕ\phi is known a priori and need not be learned. Despite these philosophical differences, one can view the results of this section as approximation rates for deep set architectures.
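
The following minimal sketch illustrates the structure (4.8). The particular choices of ϕ and ρ are placeholders (neither learned networks nor the polynomial basis of this paper); they serve only to show that the sum-pooling form is automatically permutation invariant, which is the same mechanism underlying the cluster expansion (4.5).

```python
import numpy as np

# Minimal sketch of the deep-set form (4.8): f(x) = rho(sum_j phi(x_j)).
# phi and rho below are illustrative placeholders, not learned networks.
def phi(x, Z=8):
    """Toy embedding R^d -> R^Z: powers of the first coordinate."""
    return np.array([x[0] ** k for k in range(Z)])

def rho(a):
    """Toy readout R^Z -> R."""
    return float(np.tanh(a).sum())

def deep_set(points):
    pooled = sum(phi(x) for x in points)   # permutation-invariant sum pooling
    return rho(pooled)

rng = np.random.default_rng(0)
pts = [rng.standard_normal(3) for _ in range(10)]
shuffled = [pts[i] for i in rng.permutation(10)]
assert np.isclose(deep_set(pts), deep_set(shuffled))   # invariance check
```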

4.1 Computational cost of the cluster expansion

The expression in (4.5) suggests that the evaluation cost scales as (MN){M\choose N} (leaving aside the cost of evaluating the VnNV_{nN} components), but in fact a transformation similar to that of § 2 allows us to reduce this to a cost that is linear in MM; see also [5, 7].

We consider a single term,

j1<<jnVnNDn(xj1,,xjn)\displaystyle\sum_{j_{1}<\dots<j_{n}}V_{nND_{n}}(x_{j_{1}},\dots,x_{j_{n}}) =1n!j1jnVnNDn(xj1,,xjn)\displaystyle=\frac{1}{n!}\sum_{j_{1}\neq\dots\neq j_{n}}V_{nND_{n}}(x_{j_{1}},\dots,x_{j_{n}})
=1n!j1,,jnVnNDn(xj1,,xjn)+Wn1,\displaystyle=\frac{1}{n!}\sum_{j_{1},\dots,j_{n}}V_{nND_{n}}(x_{j_{1}},\dots,x_{j_{n}})+W_{n-1},

where Wn1W_{n-1} contains the artificial self-interactions introduced when converting from j1jn\sum_{j_{1}\neq\dots\neq j_{n}} to j1,,jn\sum_{j_{1},\dots,j_{n}}. We will return to this term momentarily. Now, inserting the expansion of VnNDnV_{nND_{n}}, writing c𝒗c_{\bm{v}} instead of c𝒗nNDnc^{nND_{n}}_{\bm{v}}, and interchanging summation order, we obtain

j1,,jn=1MVnNDn(xj1,,xjn)\displaystyle\sum_{j_{1},\dots,j_{n}=1}^{M}V_{nND_{n}}(x_{j_{1}},\dots,x_{j_{n}}) =j1,,jn=1Mv1,,vnc𝒗t=1nϕvt(xjt)\displaystyle=\sum_{j_{1},\dots,j_{n}=1}^{M}\sum_{v_{1},\dots,v_{n}}c_{\bm{v}}\prod_{t=1}^{n}\phi_{v_{t}}(x_{j_{t}})
=v1,,vnc𝒗t=1nj=1Mϕvt(xj)\displaystyle=\sum_{v_{1},\dots,v_{n}}c_{\bm{v}}\prod_{t=1}^{n}\sum_{j=1}^{M}\phi_{v_{t}}(x_{j})
=v1,,vnc𝒗t=1nAvt(𝒙),\displaystyle=\sum_{v_{1},\dots,v_{n}}c_{\bm{v}}\prod_{t=1}^{n}A_{v_{t}}({\bm{x}}),

where the innermost terms (known variously as power sum polynomials, the atomic basis, or the density projection) are now computed over the full input range x1,,xMx_{1},\dots,x_{M} instead of only over a subcluster,

Av(𝒙):=j=1Mϕv(xj).A_{v}({\bm{x}}):=\sum_{j=1}^{M}\phi_{v}(x_{j}).

The self-interaction terms Wn1W_{n-1} are simply polynomials of correlation-order at most n1n-1 and can be absorbed into the lower-order terms of the expansion, provided that DnDn1D_{n}\leq D_{n-1}. Thus, we can equivalently write (4.5) as

fN𝑫=n=0N𝒗c~𝒗t=1nAvt,f_{N{\bm{D}}}=\sum_{n=0}^{N}\sum_{{\bm{v}}}\tilde{c}_{\bm{v}}\prod_{t=1}^{n}A_{v_{t}}, (4.9)

where the sum 𝒗\sum_{{\bm{v}}} ranges over all ordered tuples 𝒗{\bm{v}} of length nn, with t=1ndeg(ϕvt)Dn\sum_{t=1}^{n}{\rm deg}(\phi_{v_{t}})\leq D_{n}.
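
The reformulation (4.9) translates directly into a two-stage evaluation algorithm, which the following sketch illustrates for scalar inputs. The monomial one-particle basis ϕ_v(x) = x^v and the coefficient dictionary are hypothetical placeholders; in practice the coefficients come from the sparse approximations of the components. For simplicity the sketch recomputes each product of density projections from scratch rather than recursively (cf. [12]), but the dependence on the number of particles M already enters only through the precomputation of the A_v.

```python
import numpy as np
from itertools import combinations_with_replacement

# Sketch of evaluating the reformulation (4.9) with placeholder ingredients:
# a scalar monomial one-particle basis phi_v(x) = x^v (so deg(phi_v) = v) and
# an illustrative coefficient dictionary `coeffs`.
def evaluate_cluster_expansion(points, coeffs, num_basis, N, degrees):
    # Stage 1: density projections A_v(x) = sum_j phi_v(x_j),
    # computed once at cost O(M * num_basis).
    A = np.array([sum(x ** v for x in points) for v in range(num_basis)])
    # Stage 2: accumulate c_v * prod_t A_{v_t} over the retained tuples v.
    # The cost is proportional to the number of retained tuples (the
    # parameter count), independently of the number of particles M.
    total = coeffs.get((), 0.0)                      # constant (n = 0) term
    for n in range(1, N + 1):
        for v in combinations_with_replacement(range(num_basis), n):
            if sum(v) <= degrees[n - 1]:             # total degree <= D_n
                total += coeffs.get(v, 0.0) * np.prod(A[list(v)])
    return total

# toy usage: 100 scalar particles, correlation order up to N = 3
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=100)
coeffs = {(): 0.5, (1,): 0.1, (1, 2): 0.02, (1, 1, 2): 0.003}
print(evaluate_cluster_expansion(points, coeffs, num_basis=6, N=3, degrees=[5, 5, 4]))
```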

This leads to the following result, which states that the cost of evaluating fN𝑫f_{N{\bm{D}}} is the same as evaluating a single instance of each of the components VnNDnV_{nND_{n}}; that is, the sum over all possible clusters does not incur an additional cost.

Proposition 4.5.

Assume that Assumption 3.1 is satisfied, that the one-particle basis can be evaluated with 𝒪(1)\mathcal{O}(1) operations per basis function (e.g. via recursion), and that the degrees are decreasing, DnDn1D_{n}\leq D_{n-1}. Then, the cluster expansion fN𝐃f_{N{\bm{D}}} can be evaluated with at most

C(MD1d+𝒫),C\bigg{(}MD_{1}^{d}+\mathcal{P}\bigg{)}, (4.10)

arithmetic operations (+,×+,\times), where 𝒫n=1NP(n,Dn,d)\mathcal{P}\coloneqq\sum_{n=1}^{N}P(n,D_{n},d) is the number of parameters used to represent the NN components VnNDnV_{nND_{n}} for n=1,,Nn=1,\dots,N and CC is some positive constant.

Remark 4.6.

The number of parameters used to represent VnNDnV_{nND_{n}} for all n=1,,Nn=1,\dots,N can be estimated using the results of the foregoing sections. In particular, this allows us to obtain error estimates in Theorems 4.7 and 4.8 in terms of the computational cost.

Proof.

Assumption 3.1 implies that there are 𝒪(D1d)\mathcal{O}(D_{1}^{d}) one-particle basis functions, where D1=max1nNDnD_{1}=\max_{1\leq n\leq N}D_{n}. Precomputing all density projections AvA_{v} therefore requires 𝒪(MD1d)\mathcal{O}(MD_{1}^{d}) operations. The nn-correlations t=1nAvt\prod_{t=1}^{n}A_{v_{t}} can then be evaluated recursively with increasing correlation-order nn, at 𝒪(1)\mathcal{O}(1) cost per retained tuple. Hence, the total cost of evaluating fN𝑫f_{N\bm{D}} via (4.9) is as stated in (4.10). ∎

4.2 Error vs Cost estimates: special case

Having established the remarkably low computational cost of the cluster expansion in the reformulation (4.9), we can now return to the derivation of error versus cost estimates from (4.6). In order to illustrate our main results, we first consider the simplest case and suppose that αn=α\alpha_{n}=\alpha is constant. Motivated by Remark 4.3, we assume NMNpN\ll M\leq N^{p} for some p>1p>1 and define Dn=D:=c1NlogND_{n}=D:=\lceil c_{1}N\log N\rceil (independently of nn), for some c1c_{1} to be determined later. Since DnD_{n} is independent of nn, the number of parameters for the NN-correlation contributions will dominate. Since NDN1+1/dN\ll D\ll N^{1+1/d}, this puts us into the regime of the integer partition type estimates, which yields the following result.

Theorem 4.7.

Assume that αn=α\alpha_{n}=\alpha appearing in (4.4) is constant and p1p\geq 1. If we choose Dn=D=c1NlogND_{n}=D=c_{1}N\log N for c1c_{1} sufficiently large, then

|f(𝒙)fN𝑫(𝒙)|eηN𝒙=[x1,,xM]MS(Ω) with MNp.|f(\bm{x})-f_{N\bm{D}}(\bm{x})|\lesssim e^{-\eta N}\qquad\forall\bm{x}=[x_{1},\dots,x_{M}]\in\mathrm{MS}(\Omega)\text{ with }M\leq N^{p}. (4.11)

In terms of the total number of free parameters, 𝒫=n=1NP(n,Dn,d)\mathcal{P}=\sum_{n=1}^{N}P(n,D_{n},d), which is also directly proportional to the computational cost, the estimate reads

|f(𝒙)fN𝑫(𝒙)|exp(η~[log𝒫]1+1/dloglog𝒫),|f(\bm{x})-f_{N\bm{D}}(\bm{x})|\lesssim\exp\Big{(}-\tilde{\eta}\frac{[\log\mathcal{P}]^{1+1/d}}{\log\log\mathcal{P}}\Big{)}, (4.12)

for all 𝐱=[x1,,xM]MS(Ω)\bm{x}=[x_{1},\dots,x_{M}]\in\mathrm{MS}(\Omega) with MNpM\leq N^{p}, for some η~>0\tilde{\eta}>0, which is still a super-algebraic rate of convergence.

Proof.

Applying (4.6) directly with Dn=DD_{n}=D for 1nN1\leq n\leq N, we obtain

|f(𝒙)fN𝑫(𝒙)|\displaystyle|f(\bm{x})-f_{N\bm{D}}(\bm{x})| eηN+n=1N(Mn)cnNαc1N\displaystyle\lesssim e^{-\eta N}+\sum_{n=1}^{N}{M\choose n}c^{n}N^{-\alpha c_{1}N}
eηN+N(max{1,c}Np)NNαc1N.\displaystyle\leq e^{-\eta N}+N({\max\{1,c\}}N^{p})^{N}N^{-\alpha c_{1}N}.

Therefore, we obtain the desired estimate (4.11) by choosing c1>0c_{1}>0 sufficiently large.

Using (3.5), there exist c2,c3>0c_{2},c_{3}>0 such that

𝒫=n=1NP(n,D,d)c1N2logNec2(c1NlogN)dd+1ec3(NlogN)dd+1.\mathcal{P}=\sum_{n=1}^{N}P(n,D,d)\leq c_{1}N^{2}\log Ne^{c_{2}(c_{1}N\log N)^{\frac{d}{d+1}}}\leq e^{c_{3}(N\log N)^{\frac{d}{d+1}}}.

That is, c3d+1dNlogN[log𝒫]1+1dc_{3}^{\frac{d+1}{d}}N\log N\geq[\log\mathcal{P}]^{1+\frac{1}{d}}. Now, if a=NlogNa=N\log N, then N=logalogNaloga=(1+loglogNlogN)alogaN=\frac{\log a}{\log N}\frac{a}{\log a}=\big{(}1+\frac{\log\log N}{\log N}\big{)}\frac{a}{\log a}. Therefore, since tloglogtlogtt\mapsto\frac{\log\log t}{\log t} is bounded for t>1+δt>1+\delta (for all δ>0\delta>0) and ttlogtt\mapsto\frac{t}{\log t} is increasing for t>et>e, there exists c4>0c_{4}>0 such that Nc4[log𝒫]1+1dloglog𝒫N\geq c_{4}\frac{[\log\mathcal{P}]^{1+\frac{1}{d}}}{\log\log\mathcal{P}} which proves (4.12). ∎

4.3 Error vs Cost estimates: general case

To make this analysis concrete we will assume that in (4.4)

αn=α1nβ,for some β(0,1).\alpha_{n}=\alpha_{1}n^{\beta},\qquad\text{for some }\beta\in(0,1). (4.13)

We consider this more general class because it better highlights the importance of balancing the degree DnD_{n} against the dimensionality nn, and because it motivates us to revisit and sharpen the analysis of Appendix A and [20] in future work. In light of Proposition 4.5, we formulate the result in terms of the number of parameters rather than in terms of computational cost.

Theorem 4.8.

Assume that αn=α1nβ\alpha_{n}=\alpha_{1}n^{\beta} for some β(0,1)\beta\in(0,1) and p1p\geq 1. If we choose

Dn=c1{nβNif n(logN)1βNN1βlogNotherwise,D_{n}=c_{1}\begin{cases}n^{-\beta}N&\text{if }n\leq(\log N)^{-\frac{1}{\beta}}N\\ N^{1-\beta}\log N&\text{otherwise,}\end{cases} (4.14)

for c1c_{1} sufficiently large, then

|f(𝒙)fN𝑫(𝒙)|eηN𝒙=[x1,,xM]MS(Ω) with MNp.\left|f(\bm{x})-f_{N\bm{D}}(\bm{x})\right|\lesssim e^{-\eta N}\qquad\forall\bm{x}=[x_{1},\dots,x_{M}]\in\mathrm{MS}(\Omega)\text{ with }M\leq N^{p}. (4.15)

In terms of the total number of free parameters, 𝒫=n=1NP(n,Dn,d)\mathcal{P}=\sum_{n=1}^{N}P(n,D_{n},d), we have, for sufficiently large NN,

|f(𝒙)fN𝑫(𝒙)|exp(η~[log𝒫]1+β+1d)\left|f(\bm{x})-f_{N\bm{D}}(\bm{x})\right|\lesssim\exp\left(-\tilde{\eta}[\log\mathcal{P}]^{1+\beta+\frac{1}{d}}\right) (4.16)

for all 𝐱=[x1,,xM]MS(Ω)\bm{x}=[x_{1},\dots,x_{M}]\in\mathrm{MS}(\Omega) with MNpM\leq N^{p}, for some η~>0\tilde{\eta}>0.

Proof.

Part 1: Error estimates (4.15). We balance the error by choosing 𝑫=(Dn)\bm{D}=(D_{n}) such that (Mn)cneα1nβDn1NeηN\displaystyle{M\choose n}c^{n}e^{-\alpha_{1}n^{\beta}D_{n}}\leq\frac{1}{N}e^{-\eta N} for n=1,,Nn=1,\dots,N where c>0c>0 is the constant from Assumption 4.2. To do so, we first apply Stirling’s formula, which gives for n1n\geq 1

(Mn)cneα1nβDn=M!n!(Mn)!cneα1nβDn(ceMn)neα1nβDn.\displaystyle{M\choose n}c^{n}e^{-\alpha_{1}n^{\beta}D_{n}}=\frac{M!}{n!(M-n)!}c^{n}e^{-\alpha_{1}n^{\beta}D_{n}}\leq\left(\frac{ceM}{n}\right)^{n}e^{-\alpha_{1}n^{\beta}D_{n}}.

Using that eN1/Ne^{-N}\leq 1/N for N1N\geq 1, we therefore choose DnD_{n} such that

(ceMn)neα1nβDne(η+1)N.\displaystyle\left(\frac{ceM}{n}\right)^{n}e^{-\alpha_{1}n^{\beta}D_{n}}\leq e^{-(\eta+1)N}.

That is, to obtain the required estimate, noting that nNn\leq N, it is enough to choose

α1nβDnnlogMn+(η+logc+2)N.\alpha_{1}n^{\beta}D_{n}\geq n\log\frac{M}{n}+(\eta+\log c+2)N.

Since logMnlogNpnplogN\log\frac{M}{n}\leq\log\frac{N^{p}}{n}\leq p\log N for all 1nN1\leq n\leq N, we may instead choose (Dn)(D_{n}) such that

Dnc1nβmax{nlogN,N}D_{n}\geq c_{1}n^{-\beta}\max\left\{n\log N,N\right\}

where c1:=α11(p+η+logc+2)c_{1}:=\alpha_{1}^{-1}(p+\eta+\log c+2). Since nlogNNn\log N\leq N for all 1n(logN)1N1\leq n\leq(\log N)^{-1}N and nlogNNn\log N\geq N for all n(logN)1Nn\geq(\log N)^{-1}N, and by incorporating the condition that (Dn)(D_{n}) is decreasing, we may choose

Dn:=c1maxkn{kβNif 1k(logN)1Nk1βlogNif (logN)1NkN.D_{n}:=c_{1}\max_{k\geq n}\begin{cases}k^{-\beta}N&\text{if }1\leq k\leq(\log N)^{-1}N\\ k^{1-\beta}\log N&\text{if }(\log N)^{-1}N\leq k\leq N.\end{cases}

To conclude, we note that, for β(0,1)\beta\in(0,1), the function kkβNk\mapsto k^{-\beta}N is decreasing and kk1βlogNk\mapsto k^{1-\beta}\log N is increasing and so there exists 1n(logN)1N1\leq n^{\star}\leq(\log N)^{-1}N for which Dn=c1nβND_{n}=c_{1}n^{-\beta}N for all nnn\leq n^{\star} and Dn=c1N1βlogND_{n}=c_{1}N^{1-\beta}\log N for all nnn\geq n^{\star}. Solving (n)βN=N1βlogN(n^{\star})^{-\beta}N=N^{1-\beta}\log N yields n=(logN)1βNn^{\star}=(\log N)^{-\frac{1}{\beta}}N as required. This concludes the proof of (4.15).

Remark 4.9.

Moreover, by the same arguments, if β0\beta\leq 0, then we may choose Dn=c1N1βlogND_{n}=c_{1}N^{1-\beta}\log N for all 1nN1\leq n\leq N (which, in particular, agrees with the choice in Theorem 4.7), and if β1\beta\geq 1, we can choose

Dn=c1{nβNif n(logN)1Nn1βlogNotherwise.D_{n}=c_{1}\begin{cases}n^{-\beta}N&\text{if }n\leq(\log N)^{-1}N\\ n^{1-\beta}\log N&\text{otherwise.}\end{cases}

Part 2: Estimates on the number of parameters (4.16). Let PnP(n,Dn,d)P_{n}\coloneqq P(n,D_{n},d) be the number of parameters needed to construct VnNDnV_{nND_{n}}. According to (3.12), there exist c2,c3>0c_{2},c_{3}>0 such that

Pn\displaystyle P_{n} nc2nDn(Dn+c3nd+1d)dn(dn)!n!.\displaystyle\leq nc_{2}^{n}D_{n}\frac{(D_{n}+c_{3}n^{\frac{d+1}{d}})^{dn}}{(dn)!n!}.

Using Stirling’s estimate, we obtain

Pn(c2ed+1)nDn2πddn+12n(d+1)n 2dnmax{Dn,c3nd+1d}dn.P_{n}\leq\frac{(c_{2}e^{d+1})^{n}D_{n}}{2\pi d^{dn+\frac{1}{2}}n^{(d+1)n}}\;2^{dn}\max\{D_{n},c_{3}n^{\frac{d+1}{d}}\}^{dn}.

Hence, for some c4>0c_{4}>0,

Pn\displaystyle P_{n} ec4nDnmax{(Dndnd+1)n,1}={Dn(ec4Dndnd+1)nif 1nDndd+1Dnec4nif Dndd+1nN\displaystyle\leq e^{c_{4}n}D_{n}\max\left\{\left(\tfrac{D_{n}^{d}}{n^{d+1}}\right)^{n},1\right\}=\begin{cases}D_{n}\left(\frac{e^{c_{4}}D_{n}^{d}}{n^{d+1}}\right)^{n}&\text{if }1\leq n\leq D_{n}^{\frac{d}{d+1}}\\ D_{n}e^{c_{4}n}&\text{if }D_{n}^{\frac{d}{d+1}}\leq n\leq N\end{cases}

On the other hand, by (3.5), there exists c5>0c_{5}>0 such that

PnDnexp[c5Dndd+1],P_{n}\leq D_{n}\mathrm{exp}\big{[}c_{5}D_{n}^{\frac{d}{d+1}}\big{]}, (4.17)

for all 1nN1\leq n\leq N.

Case (i): 1n(logN)1βN1\leq n\leq(\log N)^{-\frac{1}{\beta}}N. In this case, by (4.14) we have Dn=c1nβND_{n}=c_{1}n^{-\beta}N, and so nDndd+1n\leq D_{n}^{\frac{d}{d+1}} if and only if n(c1N)11+β+1/dn\leq(c_{1}N)^{\frac{1}{1+\beta+1/d}}. Therefore, assuming NN is sufficiently large that

(c1N)11+β+1/d(logN)1βN,(c_{1}N)^{\frac{1}{1+\beta+1/d}}\leq(\log N)^{-\frac{1}{\beta}}N,

we obtain

PnDn(ec4Dndnd+1)nc1nβN(ec4(c1N)dn1+βd+d)nfor 1n(c1N)11+β+1/d.\displaystyle P_{n}\leq D_{n}\left(\frac{e^{c_{4}}D_{n}^{d}}{n^{d+1}}\right)^{n}\leq c_{1}{n^{-\beta}}N\left(\frac{e^{c_{4}}(c_{1}N)^{d}}{n^{1+\beta d+d}}\right)^{n}\qquad\textrm{for }1\leq n\leq(c_{1}N)^{\frac{1}{1+\beta+1/d}}. (4.18)

Moreover, in the case (c1N)11+β+1/dn(logN)1βN(c_{1}N)^{\frac{1}{1+\beta+1/d}}\leq n\leq(\log N)^{-\frac{1}{\beta}}N, we have

c5Dndd+1=c5(c1nβN)dd+1c5[(c1N)1β1+β+1/d]dd+1=c5(c1N)11+β+1/dc_{5}D_{n}^{\frac{d}{d+1}}=c_{5}(c_{1}n^{-\beta}N)^{\frac{d}{d+1}}\leq c_{5}\left[(c_{1}N)^{1-\frac{\beta}{1+\beta+1/d}}\right]^{\frac{d}{d+1}}=c_{5}(c_{1}N)^{\frac{1}{1+\beta+1/d}}

and thus using (4.17)

PnDnec5Dndd+1c1nβNec5(c1N)11+β+1/dfor (c1N)11+β+1/dn(logN)1βN.\displaystyle P_{n}\leq D_{n}e^{c_{5}D_{n}^{\frac{d}{d+1}}}\leq c_{1}n^{-\beta}Ne^{c_{5}(c_{1}N)^{\frac{1}{1+\beta+1/d}}}\qquad\textrm{for }(c_{1}N)^{\frac{1}{1+\beta+1/d}}\leq n\leq{(\log N)^{-\frac{1}{\beta}}N}. (4.19)

Case (ii): (logN)1βNnN(\log N)^{-\frac{1}{\beta}}N\leq n\leq N. In this case, Dn=c1N1βlogND_{n}=c_{1}N^{1-\beta}\log N and so using (4.17)

PnDnec5Dndd+1c1N1βlogNec5(c1N1βlogN)dd+1.\displaystyle P_{n}\leq D_{n}e^{c_{5}D_{n}^{\frac{d}{d+1}}}\leq c_{1}N^{1-\beta}\log Ne^{c_{5}(c_{1}N^{1-\beta}\log N)^{\frac{d}{d+1}}}. (4.20)

In order to bound the total number of parameters needed to construct VnNDnV_{nND_{n}} for 1nN1\leq n\leq N, we sum (4.18), (4.19), and (4.20) over their respective ranges of nn:

We start with (4.18). Since the function g(t):=(ctα)tg(t):=\big{(}\frac{c}{t^{\alpha}}\big{)}^{t} (for c,α>1c,\alpha>1) has a global maximum at t=1ec1/αt^{\star}=\frac{1}{e}c^{1/\alpha} with g(t)=exp[αec1α]g(t^{\star})=\mathrm{exp}\big{[}\frac{\alpha}{e}c^{\frac{1}{\alpha}}\big{]}, we obtain, taking c=ec4(c1N)dc=e^{c_{4}}(c_{1}N)^{d} and α=1+βd+d\alpha=1+\beta d+d,

n=1(c1N)11+β+1/dPn\displaystyle\sum_{n=1}^{\lfloor(c_{1}N)^{\frac{1}{1+\beta+1/d}}\rfloor}P_{n} n=1(c1N)11+β+1/dc1nβN(ec4(c1N)dn1+βd+d)n\displaystyle\leq\sum_{n=1}^{\lfloor(c_{1}N)^{\frac{1}{1+\beta+1/d}}\rfloor}c_{1}n^{-\beta}N\left(\frac{e^{c_{4}}(c_{1}N)^{d}}{n^{1+\beta d+d}}\right)^{n}
(c1N)1+11+β+1/dexp[1+βd+de[ec4(c1N)d]11+βd+d]\displaystyle\leq(c_{1}N)^{1+\frac{1}{1+\beta+1/d}}\cdot\mathrm{exp}\left[\frac{1+\beta d+d}{e}[e^{c_{4}}(c_{1}N)^{d}]^{\frac{1}{1+\beta d+d}}\right]
(c1N)2+β+1/d1+β+1/dec6N11+β+1/d\displaystyle\leq(c_{1}N)^{\frac{2+\beta+1/d}{1+\beta+1/d}}e^{c_{6}N^{\frac{1}{1+\beta+1/d}}} (4.21)

for some c6>0c_{6}>0. Here, we have used the bound nβ1n^{-\beta}\leq 1.

Next, we consider the sum of (4.19). On the relevant range of nn, we have nβ(c1N)β1+β+1/dn^{-\beta}\leq(c_{1}N)^{-\frac{\beta}{1+\beta+1/d}}, and thus

n=(c1N)11+β+1/d(logN)1βNPn\displaystyle\sum_{n=\lceil(c_{1}N)^{\frac{1}{1+\beta+1/d}}\rceil}^{(\log N)^{-\frac{1}{\beta}}N}P_{n} n=(c1N)11+β+1/d(logN)1βNc1nβNec5(c1N)11+β+1/d\displaystyle\leq\sum_{n=\lceil(c_{1}N)^{\frac{1}{1+\beta+1/d}}\rceil}^{(\log N)^{-\frac{1}{\beta}}N}c_{1}n^{-\beta}Ne^{c_{5}(c_{1}N)^{\frac{1}{1+\beta+1/d}}}
N(c1N)1β1+β+1/dec5(c1N)11+β+1/d\displaystyle\leq N(c_{1}N)^{1-\frac{\beta}{1+\beta+1/d}}e^{c_{5}(c_{1}N)^{\frac{1}{1+\beta+1/d}}}
N2+β+2/d1+β+1/dec7N11+β+1/d\displaystyle\lesssim N^{\frac{2+\beta+2/d}{1+\beta+1/d}}e^{c_{7}N^{\frac{1}{1+\beta+1/d}}} (4.22)

for some c7>0c_{7}>0.

Finally, summing (4.20) gives

n=(logN)1βNNPnc1N2βlogNec5(c1N1βlogN)dd+1.\displaystyle\sum_{n=(\log N)^{-\frac{1}{\beta}}N}^{N}P_{n}\leq c_{1}N^{2-\beta}\log Ne^{c_{5}(c_{1}N^{1-\beta}\log N)^{\frac{d}{d+1}}}. (4.23)

The dominant contribution (for large NN) is either (4.21) or (4.22) since (1β)dd+1<11+β+1/d(1-\beta)\frac{d}{d+1}<\frac{1}{1+\beta+1/d} for all β(0,1)\beta\in(0,1) and d1d\geq 1. In particular, there exists c8>0c_{8}>0 such that the total number of parameters needed to construct VnNDnV_{nND_{n}} for n=1,,Nn=1,\dots,N satisfies 𝒫ec8N11+β+1/d\mathcal{P}\leq e^{c_{8}N^{\frac{1}{1+\beta+1/d}}} and thus

N(1c8log𝒫)1+β+1dN\geq\Big{(}\frac{1}{c_{8}}\log\mathcal{P}\Big{)}^{1+\beta+\frac{1}{d}}

as required to prove (4.16). ∎

5 Conclusion

We have established rigorous approximation rates for sparse and symmetric polynomial approximations of symmetric functions in high dimension. What is particularly intriguing about our results is that they show clearly how symmetry reduces the curse of dimensionality, sometimes significantly so. Our results also build a foundation for analysing the approximation of functions defined on configuration spaces (multi-sets), which we outlined in § 4.

Further open challenges include the incorporation of more complex symmetries, e.g., coupling permutation invariance with Lie group symmetries such as O(d)O(d) for classical particles or O(1,d)O(1,d) for relativistic particles, as well as the development of practical algorithms that construct approximants achieving (close to) the optimal rates in our setting.

Acknowledgements

M.B. acknowledges funding by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Projektnummer 233630050 – TRR 146; G. D.’s work was supported by the French “Investissements d’Avenir” program, project ISITE-BFC (contract ANR-15-IDEX-0003); C.O. is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) [IDGR019381]; J.T. is supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant EP/W522594/1.

Appendix A Justification of the Many-Body Expansion

The body-ordered approximation asserted in Assumption 4.2 is motivated by recent results presented in [20] for a wide class of tight binding models. In this section, we give a brief justification of these results.

A.1 Boundedness: |VnN(xj1,,xjn)|2n|V_{nN}(x_{j_{1}},\dots,x_{j_{n}})|\lesssim 2^{n}

Here, we will briefly note that naive body-ordered approximations satisfy the required bound. For f:MS(Ω)f\colon\mathrm{MS}(\Omega)\to\mathbb{R}, we may define the following vacuum cluster expansion:

fN(𝒙)\displaystyle f_{N}(\bm{x}) :=V0+n=1Nj1<<jnVn(xj1,,xjn)\displaystyle:=V_{0}+\sum_{n=1}^{N}\sum_{j_{1}<\dots<j_{n}}V_{n}(x_{j_{1}},\dots,x_{j_{n}}) (A.1)
whereVn(xj1,,xjn)\displaystyle\textrm{where}\qquad V_{n}(x_{j_{1}},\dots,x_{j_{n}}) :=K{j1,,jn}(1)n|K|f([xk:kK]).\displaystyle:=\sum_{K\subseteq\{j_{1},\dots,j_{n}\}}(-1)^{n-|K|}f\big{(}[x_{k}\colon k\in K]\big{)}. (A.2)

That is, the nn-body potential is obtained by considering the nn variables of interest and subtracting all contributions of strictly smaller body-order. The alternating summation comes from the inclusion-exclusion principle. By construction, the expansion is exact for systems of size at most NN (i.e. f([x1,,xM])=fN([x1,,xM])f([x_{1},\dots,x_{M}])=f_{N}([x_{1},\dots,x_{M}]) for all MNM\leq N). Moreover, assuming ff is uniformly bounded, the 2n2^{n} terms in (A.2) immediately yield the required estimate |Vn(xj1,,xjn)|2n|V_{n}(x_{j_{1}},\dots,x_{j_{n}})|\lesssim 2^{n}.
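
As a concrete illustration, the following short sketch implements (A.1)–(A.2) for a user-supplied symmetric function (the particular toy f below is only a placeholder) and checks the exactness property for M ≤ N noted above.

```python
from itertools import combinations

# Sketch of the vacuum cluster expansion (A.1)-(A.2) for a generic multi-set
# function f, supplied as a Python callable on tuples of points.
def V_n(f, cluster):
    """n-body potential (A.2) via inclusion-exclusion over sub-clusters."""
    n = len(cluster)
    return sum((-1) ** (n - k) * f(sub)
               for k in range(n + 1)
               for sub in combinations(cluster, k))

def f_N(f, points, N):
    """Body-ordered approximation (A.1), truncated at body-order N."""
    total = f(())                                     # vacuum term V_0
    for n in range(1, N + 1):
        for cluster in combinations(points, n):
            total += V_n(f, cluster)
    return total

# By construction the expansion is exact whenever M <= N:
f = lambda pts: sum(x * y for x in pts for y in pts)  # toy symmetric function
pts = (0.3, -1.2, 2.0)
assert abs(f_N(f, pts, N=len(pts)) - f(pts)) < 1e-10
```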

While this classical vacuum cluster expansion is perhaps the most natural, it is not guaranteed to converge rapidly, if at all. Instead, we review recent results on the site energies of a particular class of simple electronic structure models.

A.2 Convergence: A tight-binding example

For an atomic configuration 𝒙=[x1,,xM]MS(Ω)\bm{x}=[x_{1},\dots,x_{M}]\in\mathrm{MS}(\Omega), a (two-centre) tight binding Hamiltonian (𝒙)\mathcal{H}(\bm{x}), describing the interaction between the atoms in the system, is given by the following matrix

(𝒙)ij:=h(xixj)\mathcal{H}(\bm{x})_{ij}:=h(x_{i}-x_{j})

for some smooth function h:dh\colon\mathbb{R}^{d}\to\mathbb{R} satisfying |h(ξ)|+|h(ξ)|h0eγ0|ξ||h(\xi)|+|\nabla h(\xi)|\leq h_{0}e^{-\gamma_{0}|\xi|}.

For a function o:o\colon\mathbb{R}\to\mathbb{R}, the corresponding observable O(𝒙)O(\bm{x}) is written as the following function of the Hamiltonian:

O(𝒙):=Tro((𝒙))=i=1Mo((𝒙))ii.\displaystyle O(\bm{x}):=\mathrm{Tr}\,o\big{(}\mathcal{H}(\bm{x})\big{)}=\sum_{i=1}^{M}o\big{(}\mathcal{H}(\bm{x})\big{)}_{ii}. (A.3)

We will think of O(𝒙)O(\bm{x}) as the total energy of the system and the right-hand side of (A.3) as a site energy decomposition. We will justify Assumption 4.2 for the site energy f(𝒙):=o((𝒙))11f(\bm{x}):=o\big{(}\mathcal{H}(\bm{x})\big{)}_{11}. By approximating oo with a polynomial oN(x)k=0Nckxko_{N}(x)\coloneqq\sum_{k=0}^{N}c_{k}x^{k} of degree NN, we define the body-ordered approximation by

fN(𝒙)\displaystyle f_{N}(\bm{x}) :=oN((𝒙))11=k=0Nck[(𝒙)k]11\displaystyle:=o_{N}\big{(}\mathcal{H}(\bm{x})\big{)}_{11}=\sum_{k=0}^{N}c_{k}\big{[}\mathcal{H}(\bm{x})^{k}\big{]}_{11}
=k=0Ncki1,,ik1h(x1xi1)h(xi1xi2)h(xik2xik1)h(xik1x1),\displaystyle=\sum_{k=0}^{N}c_{k}\sum_{i_{1},\dots,i_{k-1}}h(x_{1}-x_{i_{1}})h(x_{i_{1}}-x_{i_{2}})\cdots h(x_{i_{k-2}}-x_{i_{k-1}})h(x_{i_{k-1}}-x_{1}), (A.4)

a function of body-order at most NN. Convergence results for this approximation scheme follow from the estimate:

|f(𝒙)fN(𝒙)||[o((𝒙))oN((𝒙))]11|supzσ((𝒙))|o(z)oN(z)|.|f(\bm{x})-f_{N}(\bm{x})|\leq\big{|}\big{[}o\big{(}\mathcal{H}(\bm{x})\big{)}-o_{N}\big{(}\mathcal{H}(\bm{x})\big{)}\big{]}_{11}\big{|}\leq\sup_{z\in\sigma(\mathcal{H}(\bm{x}))}|o(z)-o_{N}(z)|.

Now, if oo is an analytic function in a neighbourhood of the spectrum σ((𝒙))\sigma\big{(}\mathcal{H}(\bm{x})\big{)}, we are able to construct approximations oNo_{N} that give an exponential rate of convergence with rate depending on the region of analyticity of oo [20].
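
The following sketch illustrates this construction with placeholder choices that are not taken from [20]: a Gaussian hopping function h(x_i - x_j) = exp(-|x_i - x_j|^2) and the observable o(z) = exp(z), with o replaced by its degree-N Taylor polynomial in place of a general polynomial approximant. The printed errors decay roughly exponentially in N, in line with the discussion above.

```python
import math
import numpy as np

# Tight-binding site energy sketch with placeholder hopping h and observable
# o(z) = exp(z); o is replaced by its degree-N Taylor polynomial
# o_N(z) = sum_{k <= N} z^k / k!, mirroring the structure of (A.4).
def hamiltonian(points):
    diffs = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diffs ** 2, axis=-1))       # symmetric matrix H(x)

def site_energy(points):
    """f(x) = o(H(x))_{11}, computed by diagonalisation."""
    w, U = np.linalg.eigh(hamiltonian(points))
    return (U @ np.diag(np.exp(w)) @ U.T)[0, 0]

def site_energy_body_ordered(points, N):
    """f_N(x) = o_N(H(x))_{11}, a function of body-order at most N."""
    H = hamiltonian(points)
    Hk, val = np.eye(len(points)), 0.0
    for k in range(N + 1):
        val += Hk[0, 0] / math.factorial(k)           # c_k [H^k]_{11}
        Hk = Hk @ H
    return val

rng = np.random.default_rng(0)
pts = rng.standard_normal((6, 3))                     # M = 6 atoms in R^3
for N in (2, 4, 8, 12):
    err = abs(site_energy(pts) - site_energy_body_ordered(pts, N))
    print(f"N = {N:2d}:  |f - f_N| = {err:.2e}")
```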

Written more explicitly, the body-ordered approximation has an expression similar to that of (A.1)–(A.2):

fN(𝒙)\displaystyle f_{N}(\bm{x}) =V0N+n=1N1j1<<jnVnN(xj1,,xjn)where\displaystyle=V_{0N}+\sum_{n=1}^{N-1}\sum_{j_{1}<\dots<j_{n}}V_{nN}(x_{j_{1}},\dots,x_{j_{n}})\qquad\text{where}
VnN(xj1,,xjn)\displaystyle V_{nN}(x_{j_{1}},\dots,x_{j_{n}}) :=k=0Ncki0,i1,,ik:i0=ik=1,{i0,,ik}={1,j1,,jn}l=0k1h(xilxil+1)\displaystyle:=\sum_{k=0}^{N}c_{k}\sum_{\genfrac{}{}{0.0pt}{}{i_{0},i_{1},\dots,i_{k}\colon i_{0}=i_{k}=1,}{\{i_{0},\dots,i_{k}\}=\{1,j_{1},\dots,j_{n}\}}}\prod_{l=0}^{k-1}h(x_{i_{l}}-x_{i_{l+1}})
=K{j1,,jn}(1)n|K|oN((𝒙)|1,K)11.\displaystyle=\sum_{K\subseteq\{j_{1},\dots,j_{n}\}}(-1)^{n-|K|}o_{N}\big{(}\mathcal{H}(\bm{x})|_{1,K}\big{)}_{11}. (A.5)

Here, (𝒙)|1,K\mathcal{H}(\bm{x})|_{1,K} is the restriction of (𝒙)\mathcal{H}(\bm{x}) to {1}K\{1\}\cup K defined as [(𝒙)|1,K]ij=(𝒙)ij[\mathcal{H}(\bm{x})|_{1,K}]_{ij}=\mathcal{H}(\bm{x})_{ij} if i,j{1}Ki,j\in\{1\}\cup K and [(𝒙)|1,K]ij=0[\mathcal{H}(\bm{x})|_{1,K}]_{ij}=0 otherwise. Therefore, in this expression, VnN(xj1,,xjn)V_{nN}(x_{j_{1}},\dots,x_{j_{n}}) is an (n+1)(n+1)-body potential of the central atom x1x_{1} and xj1,,xjnx_{j_{1}},\dots,x_{j_{n}} (some authors therefore call VnNV_{nN} an nn-correlation potential). The final line in (A.5) follows from an inclusion-exclusion principle [20]. In particular, the boundedness of the VnNV_{nN} follows from the boundedness of oNo_{N} [20], as in (A.2). Moreover, VnNV_{nN} inherits the analyticity properties of the Hamiltonian.

We have therefore seen that the site energies in the tight binding framework satisfy both conditions of Assumption 4.2: (i) convergence of a body-ordered approximation at an exponential rate, and (ii) analyticity of the VnNV_{nN} (with region of analyticity depending on the regularity of the Hamiltonian).

References

  • [1] A. G. Beged-dov, Lower and upper bounds for the number of lattice points in a simplex, SIAM J. Appl. Math., 22 (1972), pp. 106–108.
  • [2] H. Chen and C. Ortner, QM/MM methods for crystalline defects. part 1: Locality of the tight binding model, Multiscale Model. Simul., 14 (2016), pp. 232–264.
  • [3] A. Cohen and R. DeVore, Approximation of high-dimensional parametric PDEs, Acta Numer., 24 (2015), pp. 1–159.
  • [4] H. Derksen and G. Kemper, Computational Invariant Theory, Springer, Dec. 2015.
  • [5] R. Drautz, Atomic cluster expansion for accurate and transferable interatomic potentials, Phys. Rev. B Condens. Matter, 99 (2019), p. 014104.
  • [6] R. Drautz and C. Ortner. in preparation.
  • [7] G. Dusson, M. Bachmayr, G. Csanyi, R. Drautz, S. Etter, C. van der Oord, and C. Ortner, Atomic cluster expansion: Completeness, efficiency and stability, J. Comp. Phys., 454 (2022).
  • [8] P. Erdős, On an elementary proof of some asymptotic formulas in the theory of partitions, Ann. Math., 43 (1942), pp. 437–450.
  • [9] M. Griebel and J. Oettershagen, On tensor product approximation of analytic functions, Journal of Approximation Theory, 207 (2016), pp. 348–379.
  • [10] J. Han, Y. Li, L. Lin, J. Lu, J. Zhang, and L. Zhang, Universal approximation of symmetric and anti-symmetric functions, arXiv e-prints, 1912.01765 (2019).
  • [11] G. H. Hardy and S. Ramanujan, Asymptotic formulæ in combinatory analysis, Proceedings of the London Mathematical Society, (1918).
  • [12] I. Kaliuzhnyi and C. Ortner, Optimal evaluation of symmetry-adapted nn-correlations via recursive contraction of sparse symmetric tensors. arXiv:2202.04140.
  • [13] J. C. Mason, Near-best multivariate approximation by fourier series, chebyshev series and chebyshev interpolation, J. Approx. Theory, 28 (1980), pp. 349–358.
  • [14] E. Novak and H. Woźniakowski, Tractability of multivariate problems. Vol. 1: Linear information, vol. 6 of EMS Tracts in Mathematics, European Mathematical Society (EMS), Zürich, 2008.
  • [15] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, Pointnet: Deep learning on point sets for 3d classification and segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [16] R. B. J. T. Allenby and A. Slomson, How to Count: An Introduction to Combinatorics, Second Edition, CRC Press, July 2011.
  • [17] J. L. Ramírez Alfonsín, The Diophantine Frobenius Problem, OUP Oxford, Dec. 2005.
  • [18] R. A. Ryan, Introduction to Tensor Products of Banach Spaces, Springer Monographs in Mathematics, Springer London, 2002.
  • [19] B. Sagan, The symmetric group: representations, combinatorial algorithms, and symmetric functions, vol. 203, Springer Science & Business Media, 2001.
  • [20] J. Thomas, H. Chen, and C. Ortner, Rigorous body-order approximations of an electronic structure potential energy landscape, arXiv e-prints, 2106.12572 (2021).
  • [21] L. Trefethen, Multivariate polynomial approximation in the hypercube, Proc. Am. Math. Soc., (2017).
  • [22] E. Wagstaff, F. Fuchs, M. Engelcke, I. Posner, and M. A. Osborne, On the limitations of representing functions on sets, in Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, 09–15 Jun 2019, pp. 6487–6494.
  • [23] M. Weimar, The complexity of linear tensor product problems in (anti)symmetric Hilbert spaces, J. Approx. Theory, 164 (2012), pp. 1345–1368.
  • [24] A. P. Yutsis and A. A. Bandzaitis, Theory of angular momentum in quantum mechanics, Vil’nyus, (1965).
  • [25] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, Deep sets, in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds., vol. 30, Curran Associates, Inc., 2017.