Polynomial Approximation of Symmetric Functions
Abstract
We study the polynomial approximation of symmetric multivariate functions and of multi-set functions. Specifically, we consider $f(x_1, \dots, x_N)$, where $x_i \in \mathbb{R}^d$, and $f$ is invariant under permutations of its $N$ arguments. We demonstrate how these symmetries can be exploited to improve the cost versus error ratio in a polynomial approximation of the function $f$, and in particular study the dependence of that ratio on $N$ and the polynomial degree. These results are then used to construct approximations and prove approximation rates for functions defined on multi-sets, where $N$ becomes a parameter of the input.
1 Introduction
Many quantities of interest in the sciences and engineering exhibit symmetries. The approximation of such quantities can be made more efficient when these symmetries are correctly exploited. A typical problem we have in mind is the approximation of particle models, in particular interatomic potentials, or Hamiltonians in quantum chemistry, materials science or bio-chemistry. Similar symmetries are present in $N$-point correlation or cumulant functions of stochastic processes. Our current work focuses on polynomial approximation of multivariate functions that are symmetric under arbitrary permutations of coordinates, as a foundation for more complex symmetries. Even though our focus in the present work is on symmetric functions, our results are also directly relevant for anti-symmetric wave functions [6]: first, our complexity estimates immediately apply to anti-symmetric parameterisations; and secondly, many successful nonlinear parameterisations of wave functions (e.g., Jastrow or backflow) are in terms of symmetric components.
The performance of an approximation can be measured in different ways. One can measure the number of degrees of freedom required to achieve a given accuracy, in a target norm. A second important factor is the evaluation cost of the approximate function. Indeed, one may reduce the number of basis functions required to approximate a given function within a given tolerance while greatly increasing the evaluation cost. Therefore, a good compromise between these two aspects is particularly important.
Thus, the aim of this article is to show how the approximation of permutation invariant functions can be made efficient by combining two elements: first, a particular symmetrisation in the evaluation of the function, leading to a linear evaluation cost of each basis function with respect to the number of variables; and second, profiting from the symmetries to speed up the convergence of the approximation with respect to the number of basis functions and the evaluation cost.
For physical models, one should also incorporate isometry invariance into the analysis; however, this would make the analysis significantly more complex while only marginally improving our results. Thus, for the sake of simplicity, the present work will only consider permutation invariant functions (symmetric functions), and we refer to [24, 7], where it is explained how invariance under isometries (or other groups) can in principle also be incorporated into our framework.
In general, multivariate approximation of functions by polynomials on product domains (see for example [13, 9, 21]) is – even for analytic approximands – subject to the curse of dimensionality: the number of parameters necessary to reach a given accuracy increases exponentially with the space dimension. One possibility to avoid this effect when approximating smooth high-dimensional functions is to exploit anisotropy in the different dimensions [3, 9]. In the present setting, however, due to the symmetry all dimensions play an equivalent role. Approximations of symmetric functions have been studied in [10] in the context of nonlinear approximation, where the authors provide bounds on the number of parameters needed to approximate symmetric or anti-symmetric functions within a target accuracy. These results also exhibit the curse of dimensionality. However, note that due to the two different sets of assumptions used in [10] and our work, it is not straightforward to directly compare the results. Related to our own work is the study of deep set architectures [25, 15] for approximating set-functions, where the symmetry of the approximation is enforced by summation in a latent space. Theoretical results on deep sets [25, 22] relate the maximum number of elements in the set input to the dimension of the latent space (which is analogous to the total degree of our approximation) that is required to represent the function exactly, but no error estimates are available so far.
In the present work we develop a rigorous approximation theory for symmetric (and multi-set) functions that directly relates the number of parameters to the approximation error, independently of the space dimension (or the number of inputs). We also note that such efficient representations of symmetric functions can heavily rely on invariant theory [4] and group theory [19]. We will start in Section 2 by considering functions with full permutation symmetry, that is, invariance with respect to the permutation of any two variables. In Section 3, we will provide generic results for functions exhibiting symmetry with respect to permutations of vectorial arguments $x_j \in \mathbb{R}^d$, which is the most typical situation for physical models. Here, $d$ is the dimension of each argument $x_j$, which could for example represent the position or momentum of a particle, while $N$ denotes the number of such arguments. Our analysis is particularly concerned with the question of how the dimensions $N, d$ and the polynomial degree are connected in terms of approximation error and computational cost. In this regard, our point of view differs from the one of independent accuracy and dimensionality parameters, as taken for instance in [23, 10].
Our primary motivation for this study is as a foundation for the approximation of extremely high-dimensional functions and of multi-set functions which can be decomposed in terms of a body-order expansion. The multi-set function setting is particularly interesting for us since it commonly arises in the representation of particle interactions. We will show in Section 4 how to extend our symmetric function approximation results to obtain approximation rates for functions defined on multi-sets. In that setting we will have to address the simultaneous approximation of a family of related functions with increasing dimensionality. Since our framework has close connections with deep sets, our analysis may also shed new light on those architectures. We briefly comment on the connection in Remark 4.4, but do not include a deeper exploration in the present work.
2 Symmetric Functions in $[-1,1]^N$
Before we formulate our most general results we consider the approximation of a smooth symmetric function $f : [-1,1]^N \to \mathbb{R}$. By $f$ being symmetric we mean that
(2.1)   $f(x_{\sigma(1)}, \dots, x_{\sigma(N)}) = f(x_1, \dots, x_N)$ for all $\sigma \in S_N$,
with $S_N$ denoting the symmetric group of degree $N$. For later reference we define the space of symmetric continuous functions, $C_{\rm sym}([-1,1]^N) := \{ g \in C([-1,1]^N) : g \text{ satisfies (2.1)} \}$.
In this section we will outline the main ideas of how symmetry can be optimally incorporated into approximation schemes in the simplest possible concrete setting, and will then generalize them in various ways in § 3.
A general $f \in C([-1,1]^N)$ can be expanded as a Chebyshev series,
$f = \sum_{\bm k \in \mathbb{N}_0^N} \tilde f_{\bm k}\, T_{\bm k}, \qquad T_{\bm k}(\bm x) = \prod_{t=1}^N T_{k_t}(x_t),$
where $\tilde f_{\bm k}$ are the Chebyshev coefficients and $T_k$ are the standard Chebyshev polynomials of the first kind. To allow for the possibility of constructing sparse approximations we will assume that $f$ belongs to a Korobov class,
(2.2) |
which one can justify, e.g., through a suitable multi-variate generalisation of analyticity. The dependence of the upper bound on $N$ also arises naturally in this context.
In high-dimensional approximation one often considers spaces with weighted norms that reflect the relative importance of dimensions, which would lead to replacing (2.2) by accordingly weighted quantities; see, e.g., Chapter 5 of [14] and the references given there. However, due to our assumption of symmetry, all dimensions are equally important. In this case, the following total degree approximation is natural: For $D \in \mathbb{N}$, define
(2.3) |
then we immediately obtain the exponential approximation error estimate
(2.4) |
Although the exponential term suggests an excellent approximation rate, the curse of dimensionality still enters through the prefactor as well as through the cost of evaluating $f_D$, which scales as
where is the number of terms in (2.3). The number of operations involved in each term can be reduced to $\mathcal{O}(1)$; cf. Remark 3.6. Although this is far superior to the naive tensor product approximation leading to $(D+1)^N$ terms, it remains expensive in high dimensions.
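As a quick sanity check on these counts, a short Python sketch (our notation: $N$ variables, total degree $D$ as in (2.3)) enumerates the total-degree index set, whose cardinality is $\binom{N+D}{N}$, and compares it with the $(D+1)^N$ tensor-product indices:

```python
import math
from itertools import product

def total_degree_indices(N, D):
    """Enumerate multi-indices k in {0,...,D}^N with k_1 + ... + k_N <= D."""
    return [k for k in product(range(D + 1), repeat=N) if sum(k) <= D]

N, D = 4, 6
indices = total_degree_indices(N, D)
# the total-degree set has binomial(N + D, N) elements ...
assert len(indices) == math.comb(N + D, N)
# ... far fewer than the (D + 1)^N tensor-product indices
assert len(indices) < (D + 1) ** N
```

For $N = 4$, $D = 6$ this gives 210 total-degree indices versus 2401 tensor-product ones; the gap widens rapidly with $N$.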
Incorporating the symmetry into the parameterisation allows us to significantly reduce the number of basis functions (or, parameters): If we require that $f_D$ inherits the symmetry (2.1) – see [7, Section 3] for why this does not worsen the approximation quality – one can readily see that the symmetry is reflected in the coefficients via
Thus, we can obtain a symmetrised representation
(2.5)
where
(2.6)
and $\mathcal{K}^{\rm ord}$ denotes the set of all ordered $N$-tuples, i.e., tuples with $k_1 \le k_2 \le \dots \le k_N$. When $\bm k$ is not strictly ordered then $\sigma \bm k = \bm k$ for some permutations $\sigma$, and hence the symmetrised coefficient is different from the original one.
It is immediate to see that these symmetrised basis functions form a basis of the space of symmetric polynomials, which in turn is dense in $C_{\rm sym}$; see [7, Proposition 1] for more details.
Although the representation (2.5) significantly reduces the number of parameters (almost by a factor of $N!$), it does not reduce the cost of evaluating $f_D$, due to the cost of evaluating each symmetrised basis function. However, an elementary idea from invariant theory leads to an alternative symmetric basis, and hence an alternative scheme to evaluate $f_D$, which significantly lowers the cost and appears to entirely overcome the curse of dimensionality: It is a classical result that, since $f_D$ is a symmetric polynomial, it can be written in the form
(2.7)   $f_D(\bm x) = q\big(p_1(\bm x), \dots, p_N(\bm x)\big)$ for some polynomial $q$,
where $p_k(\bm x) = \sum_{i=1}^N x_i^k$ are the power sum polynomials. This representation fully exploits the symmetry, and one could expand on this idea to construct an efficient evaluation scheme.
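The classical representation (2.7) can be illustrated on a small example. The following sketch (assuming the standard power sums $p_k(\bm x) = \sum_i x_i^k$) verifies Newton's identity expressing the elementary symmetric polynomial $e_2$, which is symmetric, as a polynomial in $p_1$ and $p_2$:

```python
from itertools import combinations

xs = [1.5, -2.0, 0.25, 3.0]
p = lambda k: sum(x ** k for x in xs)            # power sum polynomial p_k
e2 = sum(a * b for a, b in combinations(xs, 2))  # elementary symmetric e_2

# Newton's identity: e_2 = (p_1^2 - p_2) / 2
assert abs(e2 - (p(1) ** 2 - p(2)) / 2) < 1e-12
```

Analogous identities express every symmetric polynomial in $N$ variables as a polynomial in $p_1, \dots, p_N$.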
Here, we follow a closely related construction, inspired by [5], which is easier to analyze and most importantly to generalize to more complex scenarios (cf. § 3). A straightforward generalisation of the power sum polynomials is the following symmetric one-body basis,
(2.8)   $a_k(\bm x) := \sum_{i=1}^N T_k(x_i), \qquad k \in \mathbb{N}_0.$
Remark 2.1.
Indeed, the $a_k$ could play the same role as the power sums in (2.7); however, we use them differently, by forming the products
(2.9)   $A_{\bm k}(\bm x) := \prod_{t} a_{k_t}(\bm x).$
If $\bm k'$ is a permutation of $\bm k$ then the two products $A_{\bm k'}$ and $A_{\bm k}$ coincide; that is, the $A_{\bm k}$ are again symmetric polynomials. In fact, they form a complete basis of that space.
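A minimal numerical check, assuming the Chebyshev-based one-body functions $a_k(\bm x) = \sum_i T_k(x_i)$ of (2.8), confirms that the products (2.9) are invariant under every permutation of the inputs:

```python
import math
from itertools import permutations

def a(k, xs):
    """Symmetric one-body function a_k(x) = sum_i T_k(x_i), Chebyshev T_k."""
    return sum(math.cos(k * math.acos(x)) for x in xs)

def A(ks, xs):
    """Product basis function A_k = prod_t a_{k_t}(x), cf. (2.9)."""
    out = 1.0
    for k in ks:
        out *= a(k, xs)
    return out

xs = [0.3, -0.7, 0.1, 0.9]
# all 24 orderings of the inputs give the same value of A_(2,3,5)
vals = [A((2, 3, 5), list(p)) for p in permutations(xs)]
assert max(vals) - min(vals) < 1e-9
```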
Lemma 2.2.
The set $\{A_{\bm k} : \bm k \text{ ordered}\}$ is a complete basis of the space of symmetric polynomials.
Proof.
We begin by rewriting the naive symmetrised basis as
where the remainder contains the “self-interactions”, i.e., terms with repeated indices. Therefore, since this remainder only involves terms with strictly fewer products, the change of basis from the symmetrised tensor products to the $A_{\bm k}$ is lower triangular with nonzero entries on the diagonal, and is hence invertible. ∎
A second immediate observation is that the total degree of a tensor product basis function immediately translates to the $A_{\bm k}$ basis. That is, the total degree of $A_{\bm k}$ is again $\sum_t k_t$, which yields the next result.
Corollary 2.3.
There exist coefficients $c_{\bm k}$ such that
(2.10)   $f_D = \sum_{\bm k \text{ ordered},\ \sum_t k_t \le D} c_{\bm k}\, A_{\bm k}.$
In Remark 3.6 we explain that the computational cost of evaluating (2.10) is directly proportional to the number of terms, or parameters, which we denote by
When clear from the context we will drop the subscript. To estimate that number we observe that the set
can be interpreted as the set of all integer partitions of the integers up to $D$, of length at most $N$ (indices equal to zero do not contribute). There exist various bounds for the number of such partitions that incorporate both $D$ and $N$, such as [17, Theorem 4.9.2], originally presented in [1],
(2.11) |
which (unsurprisingly) suggests that we gain an additional factor of order $1/N!$ in the number of parameters and in the computational cost, compared to the total degree approximation, which has asymptotic cost $D^N/N!$ as $D \to \infty$. We will return to these estimates below.
Since we are particularly interested in an $N$-independent bound we will instead use a classical result of Hardy and Ramanujan [11].
Lemma 2.4.
For any $N$ and $D$ we have
(2.12)   $\# \le (D+1)\, e^{\pi\sqrt{2D/3}}.$
Proof.
The key property of this bound is that it is independent of the domain dimension , which yields the following super-algebraic (but sub-exponential) convergence rate.
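The partition counts behind Lemma 2.4 are easy to tabulate with the standard bounded-part recursion; the sketch below (Python, standard library only) reproduces the classical values of the partition number $p(d)$ and checks the Hardy–Ramanujan-type bound $p(d) \le e^{\pi\sqrt{2d/3}}$ numerically:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def parts(d, n, m):
    """Number of partitions of d into at most n parts, each of size <= m."""
    if d == 0:
        return 1
    if n == 0 or m == 0:
        return 0
    take = parts(d - m, n - 1, m) if d >= m else 0  # use one part of size m
    return parts(d, n, m - 1) + take                # or no part of size m

def p(d):
    """Unrestricted partition number p(d)."""
    return parts(d, d, d)

# classical values p(0), ..., p(10)
assert [p(d) for d in range(11)] == [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
# Hardy-Ramanujan-type bound, independent of any length restriction
assert all(p(d) <= math.exp(math.pi * math.sqrt(2 * d / 3))
           for d in range(1, 60))
```

The bound is extremely generous for small $d$ but captures the correct $e^{c\sqrt{d}}$ growth, which is what drives the sub-exponential rate in Theorem 2.5.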
Theorem 2.5.
Let $f$ satisfy (2.2); then there exists a constant such that for all $N$ and $D$, the symmetric total degree approximation (2.10) satisfies
where the constants are independent of $N$ and $D$ but may depend on $\alpha$.
Proof.
From our foregoing discussion we obtain that
Fix any , then clearly, there exists such that implies
hence, in this regime, we obtain
(2.13) |
Next, we estimate in terms of the number of parameters, using Lemma 2.4, by , which we invert to obtain
Inserting this into (2.13) completes the proof with and . ∎
3 General Results
3.1 A basis for symmetric functions in $d$ dimensions
We consider the approximation of functions $f : \Omega^N \to \mathbb{R}$, where each coordinate $x_j$ belongs to a domain $\Omega \subset \mathbb{R}^d$, and $f$ is invariant under permutations, i.e.,
(3.1) |
We will indicate this scenario by saying that $f$ is symmetric. We assume throughout that $\Omega$ is the closure of a domain, and define the space
To construct approximants we begin from a one-body basis,
Assumption 3.1.
To enable the convenient extension of uni-variate to multi-variate constructions we make the following assumptions, which are easily justified for all basis sets that we have in mind; see Remark 3.4.
-
(1)
The set of basis functions is dense in .
-
(2)
Each has an associated degree .
-
(3)
is an admissible index, and .
Finally, we need a definition of the “intrinsic dimensionality” of our basis, and for simplicity we also require that it matches the dimensionality of the domain $\Omega$. We therefore make the following additional assumption:
-
(4)
The number of one-body basis functions of degree is bounded by
where is a constant.
Under assumption (1), the tensor product basis functions
then span a dense subspace of the continuous functions on $\Omega^N$, by the characterization of this space as an $N$-fold injective tensor product (see, e.g., [18, §3.2]). As an immediate consequence, we obtain that the symmetrised tensor products,
span the symmetric functions, that is, we have the following result.
Proposition 3.2.
is dense in , and
Proof.
An alternative symmetric many-body basis can be constructed by mimicking the construction of the power sum polynomials in § 2 (see also [5, 7] which are our inspiration for this construction),
(3.2) | ||||
(3.3) |
We denote the corresponding basis set by
where we recall that . For future reference we define
Proposition 3.3.
= , and in particular, is dense in . Moreover, is linearly independent.
Proof.
The argument is identical to the one-dimensional case of § 2. ∎
Remark 3.4 (Examples of Basis Sets).
(i) Tensor product Chebyshev polynomials. The first concrete case we consider is $\Omega = [-1,1]$ with the Chebyshev one-particle basis of § 2. This is the simplest setting to which our analysis applies. In this case, we would simply take the associated degree to be the polynomial degree.
(ii) Two-dimensional rotational symmetry. Assuming a rotationally symmetric domain, , a natural one-body basis is
where the radial basis or could, e.g., be chosen as transformed Zernike, Chebyshev, or other orthogonal polynomials. This has the natural associated total degree definition
(iii) Three-dimensional rotational symmetry. Our final example concerns three-dimensional spherical symmetry, where the one-particle domain is given by In this case, it is natural to employ spherical harmonics to discretize the angular component, i.e.,
with associated degree Since and since are coupled under rotations it is often more natural though to take , which does not change our results.
Remark 3.5 (Lie Group Symmetries).
In many physical modelling scenarios one would also like to incorporate Lie group symmetries, e.g. invariance or covariance under rotations and reflections, or, in relativity theory, under the Lorentz group. This can in principle be incorporated into our analysis, but in general the coupling between the permutation group and such Lie groups is non-trivial and it becomes difficult to give sharp results. We therefore postpone such an analysis to future work.
That said, such additional symmetry does not reduce the parameterisation complexity nearly as much as the permutation symmetry does. For example in case (ii) of the previous remark, if rotational symmetry were imposed then this would only lead to an additional constraint on the basis functions. Secondly, such structural sparsity changes our remarks (e.g., Remark 3.6) on computational complexity. We show in [12] that asymptotically optimal evaluation algorithms can still be constructed for those cases.
Remark 3.6 (Computational Cost).
We briefly justify our claim that the computational cost for both (2.3) and (2.10) is directly proportional to the number of parameters. For the sake of brevity we focus only on the evaluation of (2.10), which proceeds in two stages. First, we compute the symmetric one-body basis defined in (2.8). According to assumption (4) the number of elements of that basis is bounded in terms of the degree. We shall assume that the cost of evaluating it scales linearly in that number, which can for example be achieved by suitable recursive evaluations of a polynomial basis.
Once the one-body basis is precomputed and stored, the products can be evaluated recursively. Assume that the multi-indices are stored in a lexicographically ordered list. As we traverse the list we store the computed basis functions. For each new multi-index $(k_1, \dots, k_t)$ in the list,
we express the associated basis function as $A_{(k_1, \dots, k_t)} = A_{(k_1, \dots, k_{t-1})} \cdot a_{k_t}$,
thus evaluating each new basis function with $\mathcal{O}(1)$ cost. Note that the multi-indices in the total degree approximations form a downset; hence, if $(k_1, \dots, k_t)$ is part of the approximation then the truncated multi-index $(k_1, \dots, k_{t-1})$ is also part of the approximation. In summary, we therefore obtain that the total cost of evaluating (2.10) is bounded, up to a constant, by the number of parameters.
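The recursive evaluation strategy of this remark can be sketched as follows, assuming the Chebyshev one-body basis of § 2 and a prefix-closed (downset) list of sorted multi-indices; each new product then costs a single multiplication:

```python
import math

def cheb(k, x):
    """Chebyshev polynomial T_k on [-1, 1]."""
    return math.cos(k * math.acos(x))

def evaluate_basis(xs, multi_indices, D):
    """Evaluate all products A_k = a_{k_1} * ... * a_{k_t} over a
    lexicographically ordered downset of sorted multi-indices,
    reusing each prefix so that every new A_k costs one multiplication."""
    # stage 1: symmetric one-body basis a_k(x) = sum_i T_k(x_i)
    a = [sum(cheb(k, x) for x in xs) for k in range(D + 1)]
    A = {(): 1.0}                    # empty product
    for k in sorted(multi_indices):  # prefixes precede their extensions
        A[k] = A[k[:-1]] * a[k[-1]]
    return A

xs = [0.2, -0.5, 0.8]
idx = [(1,), (2,), (1, 1), (1, 2), (1, 1, 2)]
A = evaluate_basis(xs, idx, D=4)
a1 = sum(cheb(1, x) for x in xs)
a2 = sum(cheb(2, x) for x in xs)
assert abs(A[(1, 1, 2)] - a1 * a1 * a2) < 1e-12
```

The lexicographic sort is what guarantees that `A[k[:-1]]` is always already available when `k` is reached.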
3.2 Approximation errors
Our starting point in the one-dimensional case was to relate analyticity of the target function to approximation rates for the Chebyshev expansion. The same idea can be applied here, but would be restrictive. In the multi-dimensional setting there is a far greater choice of coordinates and corresponding basis sets, which are highly problem-dependent. What is essential for our analysis is that the naive unsymmetrized (but sparse) approximation scheme has an exponential rate of convergence. In order to maintain the generality of our results we now formulate this as an assumption, which, loosely speaking, encapsulates that we have chosen a “good” basis for approximating smooth functions on $\Omega$, and therefore a good basis for approximation on $\Omega^N$:
Assumption 3.7.
The target function $f$ has a sparse polynomial approximant $f_D$, satisfying
(3.4) |
where $D$ is the total degree of $f_D$, defined by
and the maximum is taken over all basis functions involved in the definition of $f_D$.
Arguing as above we have
hence we may again assume without loss of generality that is symmetric. It can be represented either in terms of the basis or equivalently in terms of the basis , which optimizes the evaluation cost. This yields the representation
To obtain approximation rates in terms of the number of parameters,
we generalize the one-dimensional result of Hardy and Ramanujan [11].
Theorem 3.8.
Assume that Assumption 3.1 is satisfied, in particular (4) defining the intrinsic dimension . Then, there exists a constant such that for any ,
(3.5) |
Proof.
We adapt the proof of Erdös presented in [8] for the estimation of the number of integer partitions to the current context. First, we define the partition function
(3.6) |
where denotes the number of -body basis functions composed only of one-body basis functions with partial degree . Then, in the decomposition of as
(3.7) |
the coefficient is equal to the number of basis functions with total degree . Note that there is no limitation on the number of terms in the product in (3.6), which corresponds to having no restriction on $N$.
Now, denoting by the number of one-body basis functions with degree , the number is the number of combinations with repetitions of in the set , which is
Then, for , a well-known identity on power series gives
Hence, the partition function can be written as
Taking the logarithm of , we obtain
from which we deduce, differentiating this expression, that
Using the expansion (3.7), we write , which leads to
Therefore, expanding as a power series,
Multiplying by on both sides, and substituting , we obtain
Hence, equating the coefficients of the power series on both sides, there holds
Let us now show by induction that , with
(3.8) |
where
(3.9) |
comes from (4) and .
First, , therefore the property is satisfied for .
Now, we assume that for any , . Using the recurrence property, there holds
(3.10) |
Then, using the concavity of the function and noting that , we obtain
Inserting the latter bound in (3.10) and using Assumption (4), we deduce
Summing over all , using that for ,
we obtain, taking ,
Introducing defined in (3.9), we get
Since and , we obtain
from which we deduce that for all , , i.e.
We then easily deduce the result of the theorem. ∎
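The generating-function argument can be explored numerically: the sketch below (the function name and interface are ours) computes the coefficients of $\prod_{k \ge 1} (1-x^k)^{-c_k}$ by repeated multiplication with geometric series, recovering the ordinary partition numbers in the special case of one basis function per degree ($c_k = 1$):

```python
def weighted_partition_counts(c, D):
    """Coefficients up to degree D of prod_{k>=1} (1 - x^k)^(-c[k]),
    i.e. the number of multisets of one-body degrees summing to each d."""
    coeffs = [1] + [0] * D
    for k in range(1, D + 1):
        for _ in range(c.get(k, 0)):
            # multiply the series by 1/(1 - x^k) via a running sum
            for d in range(k, D + 1):
                coeffs[d] += coeffs[d - k]
    return coeffs

# with c_k = 1 for every degree we recover the partition numbers p(d)
D = 10
ones = {k: 1 for k in range(1, D + 1)}
assert weighted_partition_counts(ones, D) == [1, 1, 2, 3, 5, 7,
                                              11, 15, 22, 30, 42]
```

General weights $c_k$ model a one-body basis with several functions per degree, which is exactly the situation covered by assumption (4).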
With the basis size estimate in hand, we can now readily extend the one-dimensional approximation rate.
Proof.
3.3 Asymptotic Results for
Our results up to this point are uniform in $N$; in fact, they turn out to be sharp only in the regime of fairly moderate degree $D$. Our final set of results provides improved complexity and error estimates for the complementary regime of large $D$. In particular, they will also provide strong indication that Theorem 3.9 is sharp in the moderate-degree regime.
Here and below it will be convenient to use the symbols $\lesssim$, $\gtrsim$, $\sim$ to mean less than, greater than, or bounded above and below, respectively, up to uniform positive constants.
Our subsequent analysis is based on the fact that there exist constants such that, for with sufficiently large,
(3.11) |
The lower bound is straightforward to establish. The upper bound for follows immediately from (2.11). For , and in the limit it is again straightforward to establish. We prove a slightly modified generic upper bound for below. We then recover the upper bound (3.11) from the following theorem noting that with . (The constants in (3.11) and (3.12) are different).
Theorem 3.10.
Assume that Assumption 3.1 is satisfied, in particular (4) defining the intrinsic dimension . Then, there exist two constants that depend on such that for any ,
(3.12) |
Proof.
First, we define the partition function
(3.13) |
where denotes the number of -body basis functions composed only of one-body basis functions with partial degree . Then, in the decomposition of as
(3.14) |
the coefficient is equal to the number of basis functions with total degree and number of parts .
Now, as in the proof of Theorem 3.8, denoting by the number of one-body basis functions of degree , we can write the partition function as
Taking the logarithm of , we obtain
from which we deduce, differentiating this expression with respect to , that
Using the expansion (3.14), we write , which leads to
Multiplying by on both sides, and substituting for , we obtain
Hence, equating the coefficients of the power series on both sides, we conclude
(3.15) |
We now show by induction that for any that satisfy
(3.16) |
where is the constant appearing in (4), , and for , we have
(3.17) |
Note that (3.16) holds in particular for the choice and .
First, we observe that , for , for , , and for all . Thus (3.17) holds in each of these cases.
Now for arbitrary but fixed and , we assume that (3.17) holds true for any , . Using the recurrence property, starting from (3.15), and noting that appearing in (4) can be bounded by for some , we find
(3.18) |
with
where , , . For estimating this quantity, we use the following two lemmas.
Lemma 3.11.
For , , , we have
(3.19) |
with , , and for .
Proof.
To prove (3.19), we introduce the function and compare the sum to the integral of . Note that (3.19) trivially holds for . Hence we assume . We treat three different cases depending on the values of and .
As the third case, we consider . For ,
so the function is increasing on , and decreasing on . In particular, the function has a single local maximum in . Hence, there holds
Integrating by parts times, we obtain
Moreover,
Since we easily deduce
and thus
Therefore, (3.19) holds also in this third case, with . This concludes the proof of the lemma. ∎
We now show that the prefactor can be bounded uniformly in and .
Lemma 3.12.
For , we have
(3.20) |
with and for , provided that .
Proof.
Coming back to (3.18) and using the estimates of the two lemmas, we obtain
Combining this with the estimates
yields
Noting that we obtain, replacing the index with ,
Taking such that and dividing by yields
Therefore, (3.17) is satisfied if that is, if This completes the induction. Hence, we deduce that
Finally, we note that a sufficient condition for to be increasing is that . Since this is satisfied under (3.16),
Summing over the possible values of , we easily deduce the result of the Theorem. ∎
With this result at hand, we now consider the dependency of the number of parameters on .
Lemma 3.13.
Suppose that with . Then, for sufficiently large ,
(3.21) |
uniformly for all (but the constants may depend on ).
Proof.
From (3.11) we can estimate
Noting that asymptotically as , we easily obtain the upper bound. The lower bound is analogous. ∎
Proposition 3.14.
Suppose that for some , and that are sufficiently large, then
(3.22) |
Proof.
In particular, in the regime , if we start from the exponential convergence rate of Assumption 3.7, then we obtain that there exists a symmetric approximation with
(3.24) |
for some . This rate is clearly superior to the uniform $N$-independent rate of Theorem 3.9 (in this regime at least). Moreover, it strongly hints at a kind of “phase transition”; in other words, the behaviour of the approximation scheme changes significantly at that point.
Remark 3.15.
It is furthermore interesting to compare our result (3.24) with analogous estimates if we had not exploited symmetry, or sparsity. An analogous analysis shows that if we use a (sparse) total degree approximation, but use a naive basis instead of a symmetrized basis, then we obtain
Finally, if we drop even the sparsity, so that , then we obtain
Note here that the dependence of the exponents highlights quite different qualitative behaviour of the three bases. Specifically, this provides strong qualitative evidence that (unsurprisingly) the sparse basis gives a significant advantage over the naive tensor product basis especially in the regime , but that it is still significantly more costly than the sparse symmetrized basis. Moreover, these estimates clearly highlight the unique advantage that the symmetrized basis provides in the regime treated in Theorem 3.9.
4 Approximation of multi-set functions
Our original motivation for studying the approximation of symmetric functions arises from the atomic cluster expansion [5, 7], which is in fact concerned with the approximation of multi-set functions. We now study how the approximation results of the previous sections can be applied in this context. As in the previous sections, this discussion will again ignore isometry invariance. We will focus on a general abstract setting, but employ assumptions that can be rigorously established in the setting of [5, 7].
Let $\mathbb{M}$ denote the set of all multi-sets (or msets) on $\Omega$; i.e.,
(4.1) |
where denotes an unordered tuple, or mset, e.g. describing a collection of (positions of) classical particles. The crucial aspect is that the number of elements $N$ is now variable and no longer fixed. This is equivalent to the classical definition of a multiset, which has the defining feature of allowing multiple instances of each of its elements. We are interested in parameterising (approximating) mset functions
(4.2) |
Remark 4.1 (Context).
This situation arises, e.g., when modelling interactions between particles. Different local or global structures of particle systems lead to a variable number of particles entering the range of the interaction law. It seems tempting to take the limit $N \to \infty$, which leads to a mean-field-like scenario where a signal processing perspective could be of interest. However, we are interested in an intermediate situation where $N$ is “moderate”, and thus this limit is not of interest. Indeed, the number of interacting particles can be understood as another approximation parameter, which we discuss in more detail in Remark 4.3 below.
In order to reduce the approximation of an mset function to our foregoing results we must first produce a representation of it in terms of finite-dimensional symmetric components. A classical idea is the many-body or ANOVA expansion, which we will formulate as an assumption. However, we emphasize that our recent results [20] rigorously justify this assumption in the context of coarse-graining electronic structure models into interatomic potential models. The following formulation is modelled on those results, which are summarized in Appendix A.
Assumption 4.2.
(i) For all , there exist symmetric for , and such that , defined by
(4.3) |
for all .
(ii) Moreover, we suppose that each has a sparse polynomial approximation, where denotes the total degree, satisfying
(4.4) |
for some and . Here, is simply a constant term and requires no approximation.
Part (i) of Assumption 4.2 is the main result of [20], while part (ii) essentially encodes the assumption that the component functions are analytic, with the region of analyticity possibly varying between components. Under a suitable choice of one-particle basis, one then obtains (4.4). Note in particular that in this context the dimensionality of the target functions is an approximation parameter, which makes it particularly natural to consider the different regimes, studied in the foregoing sections, of how the dimension and the polynomial degree are related.
Throughout the following discussion suppose that Assumption 4.2 is satisfied. Then, for a tuple specifying the total degrees used to approximate the components , the cluster expansion approximation to is given by
(4.5) |
with of the form , for some coefficients and one-body basis functions . We immediately deduce an approximation error estimate: for all ,
(4.6) |
From this expression we can now minimize the computational cost subject to the constraint that the overall error is no worse than the many-body approximation error, and thereby obtain a resulting error versus cost estimate.
Remark 4.3.
As we already hinted in Remark 4.1, the function is often more naturally defined on the whole of (or even on the space of infinite multi-sets) but only “weakly” depends on points far away. That is, on defining (i.e. we remove from only points outside ) we have
that is, the domain itself becomes an approximation parameter as well. Such exponential “locality” results arise, for example, in the modelling of interatomic interaction laws [20, 2]. For the sake of simplicity we will not incorporate this feature into our analysis, except for a natural assumption on how the domain size and $N$ are related:
If we approximate the restriction of to then we obtain
(4.7) |
Balancing the error contributions, we choose . In many physical situations we can assume that particles do not cluster, and this leads to the bound . More generally, we will therefore assume below that $N$ is bounded by a polynomial in the domain size, which will make our analysis a little more concrete.
Remark 4.4 (Connection to Deep sets).
Deep set architectures are based on the idea that a set-function which is continuous when restricted to any fixed number of inputs is symmetric if and only if there exist continuous functions $\varphi$ (mapping into a latent space of dimension $L$) and $\rho$ such that
(4.8)   $f(\{x_1, \dots, x_N\}) = \rho\Big(\sum_{j=1}^N \varphi(x_j)\Big).$
For scalar inputs, it has been shown that latent dimension $L = N$ is sufficient [25], and that there exist functions for which $L \ge N$ is also necessary [22]. Motivated by this characterisation, one obtains a deep set approximation to a set-function by choosing a latent space dimension and learning the mappings $\varphi$ and $\rho$, e.g. parametrised using multi-layer perceptrons.
The cluster expansion approximation (4.5) can therefore be thought of as a special case of (4.8) with both $\varphi$ and $\rho$ polynomials. However, the key conceptual difference in our setting is that, on the one hand, the latent dimension is significantly larger and should be taken as an approximation parameter, while on the other hand, the embedding $\varphi$ is known a priori and need not be learned. Despite these philosophical differences, one can view the approximation results of this section as approximation rates for deep set approximations.
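A minimal sketch of the sum-decomposition (4.8) follows, with a hypothetical Chebyshev embedding $\varphi$ and a hypothetical polynomial readout $\rho$ chosen purely for illustration; permutation invariance holds by construction of the inner sum:

```python
import math
from itertools import permutations

def deep_set(xs, phi, rho):
    """Sum-decomposition f(X) = rho(sum_{x in X} phi(x)), cf. (4.8)."""
    latent = [0.0] * len(phi(xs[0]))
    for x in xs:
        latent = [s + v for s, v in zip(latent, phi(x))]
    return rho(latent)

# hypothetical choices (made up for illustration): latent dimension L = 3,
# Chebyshev features as phi, a simple polynomial readout as rho
phi = lambda x: [math.cos(k * math.acos(x)) for k in (1, 2, 3)]
rho = lambda z: z[0] * z[1] + z[2] ** 2

xs = [0.4, -0.2, 0.9]
vals = [deep_set(list(p), phi, rho) for p in permutations(xs)]
assert max(vals) - min(vals) < 1e-9   # invariant under input reordering
```

With polynomial $\varphi$ and $\rho$ as here, this is structurally the same object as the cluster expansion evaluated via density projections.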
4.1 Computational cost of the cluster expansion
The expression in (4.5) suggests that the evaluation cost scales as (not counting the cost of evaluating the components), but in fact a transformation similar to that in § 2 allows us to reduce this to a cost that is linear in ; see also [5, 7].
We consider a single term,
where contains the artificial self-interactions introduced when converting from to . We will return to this term momentarily. Now, inserting the expansion of , writing instead of , and interchanging summation order, we obtain
where the inner-most terms (power sum polynomials, atomic base, density projection) are now computed over the full input range instead of only a subcluster,
The self-interaction terms are simply polynomials of lower correlation-order and can be absorbed into the terms, provided that . Thus, we can equivalently write (4.5) as
(4.9)
where the sum ranges over all ordered tuples of length , with .
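The reduction from a sum over clusters to a sum over ordered tuples can be illustrated with a small sketch (the monomial one-particle basis and the function names are illustrative assumptions): the density projections are accumulated once over all inputs, after which each correlation is a product of precomputed scalars, independent of the number of particles.

```python
def density_projections(xs, K):
    # A_k = sum_j phi_k(x_j), accumulated once over the full input range at
    # cost O(N K); phi_k(x) = x^k is a hypothetical one-particle basis.
    return [sum(x ** k for x in xs) for k in range(1, K + 1)]

def correlation(A, v):
    # An n-correlation over ordered tuples equals a product of density
    # projections; evaluating it costs O(n), independent of N.
    prod = 1.0
    for k in v:
        prod *= A[k - 1]
    return prod

xs = [0.1, 0.7, -0.4, 1.3]
A = density_projections(xs, K=3)
# Naive evaluation of sum_{j1,j2} x_{j1} x_{j2}^2 over all ordered pairs
# (self-interactions included) costs O(N^2):
naive = sum(x1 * x2 ** 2 for x1 in xs for x2 in xs)
# ... while the product of density projections gives the same value in O(1):
fast = correlation(A, (1, 2))
```

Note that the product automatically includes the self-interaction terms (tuples with repeated indices), which is exactly why the reformulation (4.9) sums over all ordered tuples rather than over distinct clusters.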
This leads to the following result, which states that the cost of evaluating is the same as evaluating a single instance of each of the components ; that is, the sum over all possible clusters does not incur an additional cost.
Proposition 4.5.
Assume that Assumption 3.1 is satisfied, that the one-particle basis can be evaluated with operations per basis function (e.g. via recursion), and that the degrees are decreasing, . Then, the cluster expansion can be evaluated with at most
(4.10)
arithmetic operations, where is the number of parameters used to represent the components for and is some positive constant.
Remark 4.6.
Proof.
Assumption 3.1 implies that there are one-particle basis functions, where . Precomputing all density projections therefore requires operations. The -correlations can be evaluated recursively with increasing correlation-order , with cost. Hence, the total cost of evaluating (4.5) will be as stated in (4.10). ∎
4.2 Error vs Cost estimates: special case
Having established the remarkably low computational cost of the cluster expansion in the reformulation (4.9), we can now return to the derivation of error versus cost estimates from (4.6). In order to illustrate our main results, we first consider the simplest case and suppose is constant. Motivated by Remark 4.3, we assume for some and define (independently of ), for some to be determined later. Since is constant, the number of parameters for the -correlation contributions will dominate. This puts us into the regime of the integer-partition-type estimates and yields the following result.
Theorem 4.7.
Assume that appearing in (4.4) is constant and . If we choose for sufficiently large, then
(4.11)
In terms of the total number of free parameters, , which is also directly proportional to the computational cost, the estimate reads
(4.12)
for all with , for some , which is still a super-algebraic rate of convergence.
4.3 Error vs Cost estimates: General Case
To make this analysis concrete we will assume that in (4.4)
(4.13)
For the sake of generality, we consider this more general class, which better highlights the importance of balancing degree and dimensionality , and also motivates us to revisit and try to sharpen the analysis of Appendix A and [20] in the future. In light of Proposition 4.5 we will formulate the result in terms of the number of parameters rather than in terms of computational cost.
Theorem 4.8.
Assume that for some and . If we choose
(4.14)
for sufficiently large, then
(4.15)
In terms of the total number of free parameters, , we have, for sufficiently large ,
(4.16)
for all with , for some .
Proof.
Part 1: Error estimates (4.15). We balance the error by choosing such that for where is the constant from Assumption 4.2. To do so, we first apply Stirling’s formula, which gives for
Using that for , we therefore choose such that
That is, to obtain the required estimate, noting that , it is enough to choose
Since for all , we may instead choose such that
where . Since for all and for all , and by incorporating the condition that is decreasing, we may choose
To conclude, we note that, for , the function is decreasing and is increasing and so there exists for which for all and for all . Solving yields as required. This concludes the proof of (4.15).
Remark 4.9.
Moreover, by the same arguments, if , then we may choose for all (which, in particular, agrees with the choice in Theorem 4.7), and if , we can choose
Part 2: Estimates on the number of parameters (4.16). Let be the number of parameters needed to construct . According to (3.12), there exist such that
Using Stirling’s estimate, we obtain
Hence, for some ,
On the other hand, by (3.5), there exists such that
(4.17)
for all .
Case (i): . In this case, we have following (4.14) and so if and only if . Therefore, assuming is sufficiently large such that
we obtain
(4.18)
Case (ii): . In this case, and so using (4.17)
(4.20)
In order to bound the total number of parameters needed to construct for , we consider the sum of (4.18), (4.19), and (4.20) over their respective ranges of :
We start with (4.18). Since the function (for ) has a global maximum at with , we have taking and
(4.21)
for some . Here, we have used the bound .
Next, we consider the sum of (4.19). On the relevant range of , we have , and thus
(4.22)
for some .
Finally, summing (4.20) gives
(4.23)
5 Conclusion
We have established rigorous approximation rates for sparse and symmetric polynomial approximations of symmetric functions in high dimension. What is particularly intriguing about our results is that they highlight clearly how symmetry reduces the curse of dimensionality, sometimes significantly so. Our results also build a foundation for analysing the approximation of functions defined on a configuration space (multi-sets), which we outlined as well.
Further open challenges include the incorporation of more complex symmetries, e.g., coupling permutation with Lie group symmetries such as for classical particles or for relativistic particles, as well as the identification of practical algorithms for constructing approximants in our context that achieve (close to) the optimal rates.
Acknowledgements
M.B. acknowledges funding by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Projektnummer 233630050 – TRR 146; G. D.’s work was supported by the French “Investissements d’Avenir” program, project ISITE-BFC (contract ANR-15-IDEX-0003); C.O. is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) [IDGR019381]; J.T. is supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant EP/W522594/1.
Appendix A Justification of the Many-Body Expansion
The body-ordered approximation asserted in Assumption 4.2 is motivated by recent results presented in [20] for a wide class of tight binding models. In this section, we give a brief justification of these results.
A.1 Boundedness:
Here, we will briefly note that naive body-ordered approximations satisfy the required bound. For , we may define the following vacuum cluster expansion:
(A.1)
(A.2)
That is, the -body potential is obtained by considering the variables of interest and removing the terms that correspond to strictly smaller body-order. The alternating summation comes from the inclusion-exclusion principle. By construction, the expansion is exact for systems of size at most (i.e. for all ). Moreover, assuming is uniformly bounded, we obtain the required estimate .
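The inclusion–exclusion construction can be checked on a toy example; the multiset function below is a hypothetical stand-in for the quantity being expanded. Each n-body term is obtained by Möbius inversion over subsets, and the truncated expansion reproduces the original function exactly on systems of at most the truncation size.

```python
from itertools import combinations

def V(f, cluster):
    # n-body potential via inclusion-exclusion (Moebius inversion):
    # V_n(y_1..y_n) = sum over subsets S of the cluster of (-1)^(n-|S|) f(S).
    n = len(cluster)
    return sum(
        (-1) ** (n - k) * f(sub)
        for k in range(n + 1)
        for sub in combinations(cluster, k)
    )

def expansion(f, xs, order):
    # Truncated vacuum cluster expansion: the vacuum term plus the
    # contributions of all clusters of size at most `order`.
    total = f(())
    for k in range(1, order + 1):
        for cl in combinations(xs, k):
            total += V(f, cl)
    return total

# Hypothetical stand-in for a bounded multiset function:
f = lambda ys: sum(ys) ** 2 if ys else 0.0
xs = (0.2, 1.1, -0.7)
# Exact whenever the system size does not exceed the truncation order:
print(expansion(f, xs, order=len(xs)), f(xs))
```

The telescoping of the alternating sums is exactly the inclusion–exclusion principle referred to in the text: summing V over all subsets of a configuration collapses to the value of f on that configuration.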
While this classical vacuum cluster expansion is perhaps the most natural, it is not guaranteed to converge rapidly, if at all. Instead, we will review recent results for the site energies of a particular class of simple electronic structure models.
A.2 Convergence: A tight-binding example
For an atomic configuration , a (two-centre) tight binding Hamiltonian , describing the interaction between the atoms in the system, is given by the following matrix
for some smooth function satisfying .
For functions , the corresponding observable, , is written as the following function of the Hamiltonian:
(A.3)
We will think of as the total energy of the system and the right-hand side of (A.3) as a site energy decomposition. We will justify Assumption 4.2 for the site energy . By approximating with a polynomial of degree , we define the body-ordered approximation by
(A.4)
a function of body-order at most . Convergence results for this approximation scheme follow from the estimate:
Now, if is an analytic function in a neighbourhood of the spectrum , we are able to construct approximations that give an exponential rate of convergence with rate depending on the region of analyticity of [20].
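As an illustration of the scheme, the following sketch compares the exact site energy [f(H)]_ii with the value obtained from a truncated polynomial approximation of f; the pair function h(r) = e^{-r}, the choice f = exp, and the Taylor truncation are illustrative assumptions standing in for the quantities in the text. The p-th power term [H^p]_ii involves paths through at most p + 1 atoms and hence has body-order at most p + 1.

```python
import math
import numpy as np

def hamiltonian(xs, h):
    # Two-centre tight-binding matrix: H_ij = h(|x_i - x_j|) for i != j,
    # with zero on-site terms; h is a hypothetical smooth pair function.
    n = len(xs)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                H[i, j] = h(abs(xs[i] - xs[j]))
    return H

def site_energy(H, f, i=0):
    # Exact site energy [f(H)]_ii via the spectral decomposition of H:
    # [f(H)]_ii = sum_k f(lambda_k) Q_ik^2 for symmetric H.
    lam, Q = np.linalg.eigh(H)
    return float(np.sum(f(lam) * Q[i] ** 2))

def site_energy_poly(H, coeffs, i=0):
    # Body-ordered approximation: replace f by the polynomial sum_p c_p t^p
    # and accumulate the diagonal entries of the powers of H.
    e, Hp = 0.0, np.eye(H.shape[0])
    for c in coeffs:
        e += c * Hp[i, i]
        Hp = Hp @ H
    return e

xs = [0.0, 1.0, 2.5, 4.0]
H = hamiltonian(xs, lambda r: math.exp(-r))
coeffs = [1.0 / math.factorial(p) for p in range(12)]  # Taylor coeffs of exp
exact = site_energy(H, np.exp)
approx = site_energy_poly(H, coeffs)
```

Since f = exp is entire and the spectrum of H is confined to a small interval here, the degree-11 truncation already agrees with the exact site energy to high accuracy, consistent with the exponential rate discussed above.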
Written more explicitly, the body-ordered approximation has a similar expression to that of (A.2):
(A.5)
Here, is the restriction of to defined as if and otherwise. Therefore, in this expression, is an -body potential of the central atom and (some authors therefore call an -correlation potential). The final line in (A.5) follows from an inclusion-exclusion principle [20]. In particular, the boundedness of the follows from the boundedness of [20], as in (A.2). Moreover, inherits the analyticity properties of the Hamiltonian.
We have therefore seen that the site energies in the tight binding framework satisfy both the conditions in Assumption 4.2: (i) convergence of a body-ordered approximation, with an exponential rate, and (ii) the are analytic (with region of analyticity depending on the regularity of the Hamiltonian).
References
- [1] A. G. Beged-Dov, Lower and upper bounds for the number of lattice points in a simplex, SIAM J. Appl. Math., 22 (1972), pp. 106–108.
- [2] H. Chen and C. Ortner, QM/MM methods for crystalline defects. part 1: Locality of the tight binding model, Multiscale Model. Simul., 14 (2016), pp. 232–264.
- [3] A. Cohen and R. DeVore, Approximation of high-dimensional parametric PDEs, Acta Numer., 24 (2015), pp. 1–159.
- [4] H. Derksen and G. Kemper, Computational Invariant Theory, Springer, Dec. 2015.
- [5] R. Drautz, Atomic cluster expansion for accurate and transferable interatomic potentials, Phys. Rev. B Condens. Matter, 99 (2019), p. 014104.
- [6] R. Drautz and C. Ortner, in preparation.
- [7] G. Dusson, M. Bachmayr, G. Csanyi, R. Drautz, S. Etter, C. van der Oord, and C. Ortner, Atomic cluster expansion: Completeness, efficiency and stability, J. Comp. Phys., 454 (2022).
- [8] P. Erdős, On an elementary proof of some asymptotic formulas in the theory of partitions, Ann. Math., 43 (1942), pp. 437–450.
- [9] M. Griebel and J. Oettershagen, On tensor product approximation of analytic functions, Journal of Approximation Theory, 207 (2016), pp. 348–379.
- [10] J. Han, Y. Li, L. Lin, J. Lu, J. Zhang, and L. Zhang, Universal approximation of symmetric and anti-symmetric functions, arXiv e-prints, 1912.01765 (2019).
- [11] G. H. Hardy and S. Ramanujan, Asymptotic formulæ in combinatory analysis, Proceedings of the London Mathematical Society, (1918).
- [12] I. Kaliuzhnyi and C. Ortner, Optimal evaluation of symmetry-adapted -correlations via recursive contraction of sparse symmetric tensors. arXiv:2202.04140.
- [13] J. C. Mason, Near-best multivariate approximation by Fourier series, Chebyshev series and Chebyshev interpolation, J. Approx. Theory, 28 (1980), pp. 349–358.
- [14] E. Novak and H. Woźniakowski, Tractability of multivariate problems. Vol. 1: Linear information, vol. 6 of EMS Tracts in Mathematics, European Mathematical Society (EMS), Zürich, 2008.
- [15] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, Pointnet: Deep learning on point sets for 3d classification and segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- [16] R. B. J. T. Allenby and A. Slomson, How to Count: An Introduction to Combinatorics, Second Edition, CRC Press, July 2011.
- [17] J. L. Ramírez Alfonsín, The Diophantine Frobenius Problem, OUP Oxford, Dec. 2005.
- [18] R. A. Ryan, Introduction to Tensor Products of Banach Spaces, Springer Monographs in Mathematics, Springer London, 2002.
- [19] B. Sagan, The symmetric group: representations, combinatorial algorithms, and symmetric functions, vol. 203, Springer Science & Business Media, 2001.
- [20] J. Thomas, H. Chen, and C. Ortner, Rigorous body-order approximations of an electronic structure potential energy landscape, arXiv e-prints, 2106.12572 (2021).
- [21] L. Trefethen, Multivariate polynomial approximation in the hypercube, Proc. Am. Math. Soc., (2017).
- [22] E. Wagstaff, F. Fuchs, M. Engelcke, I. Posner, and M. A. Osborne, On the limitations of representing functions on sets, in Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, 09–15 Jun 2019, pp. 6487–6494.
- [23] M. Weimar, The complexity of linear tensor product problems in (anti)symmetric Hilbert spaces, J. Approx. Theory, 164 (2012), pp. 1345–1368.
- [24] A. P. Yutsis and A. A. Bandzaitis, Theory of angular momentum in quantum mechanics, Vil’nyus, (1965).
- [25] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, Deep sets, in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds., vol. 30, Curran Associates, Inc., 2017.