
Average-Case Complexity of Tensor Decomposition
for Low-Degree Polynomials

Alexander S. Wein, Department of Mathematics, University of California, Davis. Email: [email protected]. Part of this work was done while with the Algorithms and Randomness Center at Georgia Tech, supported by NSF grants CCF-2007443 and CCF-2106444.
Abstract

Suppose we are given an n-dimensional order-3 symmetric tensor T\in(\mathbb{R}^{n})^{\otimes 3} that is the sum of r random rank-1 terms. The problem of recovering the rank-1 components is possible in principle when r\lesssim n^{2} but polynomial-time algorithms are only known in the regime r\ll n^{3/2}. Similar “statistical-computational gaps” occur in many high-dimensional inference tasks, and in recent years there has been a flurry of work on explaining the apparent computational hardness in these problems by proving lower bounds against restricted (yet powerful) models of computation such as statistical queries (SQ), sum-of-squares (SoS), and low-degree polynomials (LDP). However, no such prior work exists for tensor decomposition, largely because its hardness does not appear to be explained by a “planted versus null” testing problem.

We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest (to break symmetry), and the components are drawn uniformly from the hypercube. We resolve the computational complexity in the LDP model: O(\log n)-degree polynomial functions of the tensor entries can accurately estimate the largest component when r\ll n^{3/2} but fail to do so when r\gg n^{3/2}. This provides rigorous evidence suggesting that the best known algorithms for tensor decomposition cannot be improved, at least by known approaches. A natural extension of the result holds for tensors of any fixed order k\geq 3, in which case the LDP threshold is r\sim n^{k/2}.

1 Introduction

Tensor decomposition is a basic computational primitive underlying many algorithms for a variety of data analysis tasks, including phylogenetic reconstruction [MR05], topic modeling [AFH+12], community detection [AGHK13, HS17, AAA17, JLLX21], learning mixtures of Gaussians [HK13, GHK15, BCMV14, ABG+14], independent component analysis [GVX14], dictionary learning [BKS15, MSS16], and multi-reference alignment [PWB+19]. For further discussion, we refer the reader to [Moi14, RSG17, SDF+17, BM20].

In this work we will consider order-k tensors of the form T\in(\mathbb{R}^{n})^{\otimes k} for an integer k\geq 3, that is, T is an n\times\cdots\times n (k times) array of real numbers with entries denoted by T_{i_{1},\ldots,i_{k}} with i_{1},\ldots,i_{k}\in[n]:=\{1,2,\ldots,n\}. A vector u=(u_{i})_{i\in[n]}\in\mathbb{R}^{n} can be used to form a rank-1 tensor u^{\otimes k} defined by (u^{\otimes k})_{i_{1},\ldots,i_{k}}=u_{i_{1}}u_{i_{2}}\cdots u_{i_{k}}, akin to the rank-1 matrix uu^{\top}.
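To make the rank-1 construction concrete, here is a minimal numerical sketch (purely illustrative, not part of the formal development; it assumes numpy, and the dimension n=5 is an arbitrary placeholder):

import numpy as np

n = 5
u = np.random.choice([-1.0, 1.0], size=n)     # an example vector u in R^n
# rank-1 order-3 tensor: (u ⊗ u ⊗ u)_{i1,i2,i3} = u_{i1} u_{i2} u_{i3}
U3 = np.einsum('i,j,k->ijk', u, u, u)
assert U3.shape == (n, n, n)
assert np.isclose(U3[0, 1, 2], u[0] * u[1] * u[2])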

In the tensor decomposition problem, we observe a rank-r tensor of the form

T=\sum_{j=1}^{r}a_{j}^{\otimes k} (1)

with unknown components a_{j}\in\mathbb{R}^{n}. The goal is to recover the components a_{1},\ldots,a_{r} up to the inherent symmetries in the problem, i.e., we cannot recover the ordering of the list a_{1},\ldots,a_{r} and if k is even we cannot distinguish a_{j} from -a_{j}. Our regime of interest will be k fixed and n large, with r possibly growing with n.

A large number of algorithmic results for tensor decomposition now exist under various assumptions on k, n, r and \{a_{j}\} [LRA93, DCC07, GVX14, BCMV14, AGJ15, AGJ14, BKS15, GM15, HSSS16, MSS16, GM17, SS17, HSS19, KP20, DdL+22]. We will focus on the order-3 case (k=3) to simplify the discussion. If a_{1},\ldots,a_{r} are linearly independent, the decomposition problem can be solved by a classical method based on simultaneous diagonalization [LRA93] (this method is sometimes called Jennrich’s algorithm, although this may be a misnomer; see [Kol21]). Most recent work focuses on the more difficult overcomplete regime where r>n. For random order-3 tensors, where the components a_{j} are drawn uniformly from the unit sphere, the state-of-the-art polynomial-time algorithms succeed when r\ll n^{3/2} (where \ll hides a polylogarithmic factor (\log n)^{O(1)}); this was achieved first by a method based on the sum-of-squares hierarchy [MSS16] and then later by a faster spectral method [DdL+22]. Under the weaker condition r\lesssim n^{2} (where \lesssim hides a constant factor), the decomposition is unique [BCO14] and so the problem is solvable in principle. However, no polynomial-time algorithm is known in the regime r\gtrsim n^{3/2}.

For random tensors of any fixed order k\geq 4, the situation is similar. The decomposition is unique when r\lesssim n^{k-1} [BCO14], but poly-time algorithms are only known for substantially smaller rank. For k=4, poly-time decomposition is known when r\ll n^{2} [DCC07, MSS16], and it is expected that r\ll n^{k/2} is achievable for general k; however, to our knowledge, the best poly-time algorithms in the literature for k\geq 5 require r\lesssim n^{\lfloor(k-1)/2\rfloor} [LRA93, BCMV14]. These algorithms for k\geq 4 work not just for random tensors but also under much weaker assumptions on the components.

Statistical-computational gaps.

The motivation for this work is to understand whether the “statistical-computational gap” mentioned above is inherent; that is, we investigate whether a poly-time algorithm exists in the apparent “possible but hard” regime n^{k/2}\lesssim r\lesssim n^{k-1}. For context, gaps between statistical and computational thresholds are a nearly ubiquitous phenomenon throughout high-dimensional statistics, appearing in settings such as planted clique, sparse PCA, community detection, tensor PCA, and more; see e.g. [WX21, GMZ22] for exposition. In these average-case problems where the input is random, classical computational complexity theory has little to say: results on NP-hardness (including those for tensor decomposition [Hås89, HL13]) typically show hardness for worst-case instances, which does not imply hardness in the average case. Still, a number of frameworks have emerged to justify why the “hard” regime is actually hard, and to predict the location of the hard regime in new problems. One approach is based on average-case reductions which formally transfer suspected hardness of one average-case problem (usually planted clique) to another (e.g. [BR13, HWX15, BBH18, BB20]). Another approach is to prove unconditional lower bounds against particular algorithms or classes of algorithms (e.g. [Jer92, DKMZ11, LKZ15]), and sometimes this analysis is based on intricate geometric properties of the solution space (e.g. [AC08, GS17, GZ17]; see [Gam21] for a survey). Perhaps some of the most powerful and well-established classes of algorithms to prove lower bounds against are statistical query (SQ) algorithms (e.g. [FGR+17, DKS17]), the sum-of-squares (SoS) hierarchy (e.g. [BHK+19, KMOW17]; see [RSS18] for a survey), and low-degree polynomials (LDP) (e.g. [HS17, HKP+17, Hop18]; see [KWB22] for a survey). In recent years, all of the above frameworks have been widely successful at providing many different perspectives on what makes statistical problems easy versus hard, and there has also been progress on understanding formal connections between different frameworks [HKP+17, GJW20, BBH+21, BEH+22]. For many conjectured statistical-computational gaps, we now have sharp lower bounds in one (or more) of these frameworks, suggesting that the best known algorithms cannot be improved (at least by certain known approaches).

Despite all these successes, the tensor decomposition problem is a rare example where (prior to this work) essentially no progress has been made on any kind of average-case hardness, even though the problem itself and the suspected threshold at r\sim n^{k/2} have maintained a high profile in the community since the work of [GM15] in 2015. The lecture notes [Kun22] highlight “hardness of tensor decomposition” as one of 15 prominent open problems related to sum-of-squares (Open Problem 7.1). In Section 1.3.2 we explain why tensor decomposition is more difficult to analyze than the other statistical problems that were understood previously, and in Section 1.4.1 we explain how to overcome these challenges.

The LDP framework.

Our main result (Theorem 1.4) determines the computational complexity of tensor decomposition in the low-degree polynomial (LDP) framework, establishing an easy-hard phase transition at the threshold r\sim n^{k/2}. This means we consider a restricted class of algorithms, namely those that can be expressed as O(\log n)-degree multivariate polynomials in the tensor entries. The study of this class of algorithms in the context of high-dimensional statistics first emerged from a line of work on the sum-of-squares (SoS) hierarchy [BHK+19, HS17, HKP+17, Hop18] and is now a popular tool for predicting and explaining statistical-computational gaps; see [KWB22] for a survey. Low-degree polynomials capture various algorithmic paradigms such as spectral methods and approximate message passing (see e.g. Section 4 of [KWB22], Appendix A of [GJW20], and Theorem 1.4 of [MW22]), and so LDP lower bounds allow us to rule out a large class of known approaches all at once. For the types of problems that typically arise in high-dimensional statistics (informally speaking), the LDP framework has a great track record for consistently matching the performance of the best known poly-time algorithms. As a result, LDP lower bounds can be taken as evidence for inherent hardness of certain types of average-case problems. While there are some settings where LDPs are outperformed by another algorithm, these other algorithms tend to be “brittle” algebraic methods with almost no noise tolerance, and so the LDP framework is arguably still saying something meaningful in these settings; see [HW21, Kun21, KM21a, ZSWB22, DK22] for discussion.

Existing LDP lower bounds apply to a wide variety of statistical tasks, which can be classified as hypothesis testing (e.g. [HS17, HKP+17, Hop18]), estimation (e.g. [SW22, KM21a]), and optimization (e.g. [GJW20, Wei22, BH22]). The techniques used to prove these lower bounds are quite different in the three cases, and the current work introduces a new technique for the case of estimation.

1.1 Largest Component Recovery

In order to formulate the random tensor decomposition problem in the LDP framework, we will define a variant of the problem where one component has slightly larger norm than the rest, and the goal is to recover this particular component.

Problem 1.1 (Largest component recovery).

Let k\geq 3 be odd and \delta>0. In the largest component recovery problem, we observe

T=(1+\delta)a_{1}^{\otimes k}+\sum_{j=2}^{r}a_{j}^{\otimes k}

where each a_{j} (for 1\leq j\leq r) is drawn uniformly and independently from the hypercube \{\pm 1\}^{n}. The goal is to recover a_{1} with high probability.

One can think of \delta as a small constant, although our results will apply more generally. The purpose of increasing the norm of the first component is to give the algorithm a concrete goal, namely to recover a_{1}; without this, the algorithm has no way to break symmetry among the components. We have restricted to odd k here because otherwise there is no hope to disambiguate between a_{1} and -a_{1}; however, we give similar results for even k in Section 2. The assumption that a_{j} is drawn from the hypercube helps to make some of the proofs cleaner, but our approach can likely handle other natural distributions for a_{j} such as \mathcal{N}(0,I_{n}) (and we do not expect this to change the threshold).
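For concreteness, here is a small sampling sketch of Problem 1.1 for k=3 (again illustrative only; numpy is assumed, and the values of n, r, delta are arbitrary placeholders rather than a meaningful parameter regime):

import numpy as np

n, r, delta = 30, 50, 0.1
# columns of A are the components a_1, ..., a_r, i.i.d. uniform on {±1}^n
A = np.random.choice([-1.0, 1.0], size=(n, r))
weights = np.ones(r)
weights[0] = 1.0 + delta          # the first component is slightly heavier
# T = (1 + delta) a_1^{⊗3} + sum_{j >= 2} a_j^{⊗3}
T = np.einsum('j,ij,kj,lj->ikl', weights, A, A, A)
# the task: estimate the column A[:, 0] given only T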

Connection to decomposition.

While Problem 1.1 does not have formal implications for the canonical random tensor decomposition model (i.e., (1) with a_{j} uniform on the sphere), it does have implications for a mild generalization of this problem.

Problem 1.2 (Semirandom tensor decomposition).

Let k\geq 3 be odd and \delta>0. In the semirandom tensor decomposition problem, we observe

T=\sum_{j=1}^{r}\lambda_{j}a_{j}^{\otimes k}

where \lambda_{j}\in[1,1+\delta] are arbitrary and a_{j} are drawn uniformly and independently from the hypercube \{\pm 1\}^{n}. The goal is to recover a_{1},\ldots,a_{r} (up to re-ordering) as well as the corresponding \lambda_{j}’s, with high probability.

Note that any algorithm for Problem 1.2 can be used to solve Problem 1.1 (with the same parameters k,n,r,\delta): simply run the algorithm to recover all the (\lambda_{j},a_{j}) pairs and then output the a_{j} with the largest corresponding \lambda_{j}. When r\gg n^{k/2}, our main result (Theorem 1.4) will give rigorous evidence for hardness of Problem 1.2 via a two-step argument: we will show LDP-hardness of Problem 1.1, justifying the conjecture that there is no poly-time algorithm for Problem 1.1; this conjecture, if true, formally implies that there is no poly-time algorithm for Problem 1.2.

We see Problem 1.2 as a natural variant of random tensor decomposition. In particular, we expect that existing algorithms [MSS16, DdL+22] can be readily adapted to solve this variant when r\ll n^{k/2}, at least for k\in\{3,4\}. As such, understanding the complexity of Problem 1.2 (via the connection to Problem 1.1 described above) can be viewed as the end-goal of this work. From this point onward, we will study Problem 1.1. In Section 1.5 we discuss possible extensions of our result that would more directly address the canonical model for random tensor decomposition.

Remark 1.3.

In line with the discussion in the Introduction, our results should really only be considered evidence for hardness of Problems 1.1 and 1.2 in the presence of noise. Indeed, if \delta is not an integer, there is a trivial algorithm for Problem 1.1 by examining the non-integer part of T (the author thanks Bruce Hajek for pointing this out). This algorithm is easily defeated by adding Gaussian noise of variance \sigma^{2}\gg 1 to each entry of T. This is a low level of noise, as we believe our low-degree upper bound tolerates \sigma\ll n^{k/4}, akin to tensor PCA.

1.2 Main Results

We now show how to cast Problem 1.1 in the LDP framework and state our main results. The class of algorithms we consider are (multivariate) polynomials f of degree (at most) D with real coefficients, whose input variables are the n^{k} entries of T\in(\mathbb{R}^{n})^{\otimes k} and whose output is a real scalar value; we write \mathbb{R}[T]_{\leq D} for the space of such polynomials. We will be interested in whether such a polynomial can accurately estimate a_{11}:=(a_{1})_{1}, the first entry of the vector a_{1}. Following [SW22], define the degree-D minimum mean squared error

\mathsf{MMSE}_{\leq D}:=\inf_{f\in\mathbb{R}[T]_{\leq D}}\mathbb{E}[(f(T)-a_{11})^{2}]

where the expectation is over \{a_{j}\} and T sampled as in Problem 1.1. Note that the above “scalar MMSE” is directly related to the “vector MMSE”: by linearity of expectation and symmetry among the coordinates,

\inf_{f_{1},\ldots,f_{n}\in\mathbb{R}[T]_{\leq D}}\mathbb{E}\sum_{i=1}^{n}(f_{i}(T)-(a_{1})_{i})^{2}=n\cdot\mathsf{MMSE}_{\leq D}. (2)

It therefore suffices to study the scalar MMSE, with \mathsf{MMSE}_{\leq D}\to 0 indicating near-perfect estimation and \mathsf{MMSE}_{\leq D}\to 1 indicating near-trivial estimation (no better than the constant function f(T)\equiv 0).
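For completeness, the short calculation behind (2): the objective on the left-hand side decouples across coordinates, and each coordinate problem has the same optimal value by symmetry,

\inf_{f_{1},\ldots,f_{n}\in\mathbb{R}[T]_{\leq D}}\mathbb{E}\sum_{i=1}^{n}(f_{i}(T)-(a_{1})_{i})^{2}=\sum_{i=1}^{n}\inf_{f_{i}\in\mathbb{R}[T]_{\leq D}}\mathbb{E}[(f_{i}(T)-(a_{1})_{i})^{2}]=n\cdot\mathsf{MMSE}_{\leq D},

where the last equality uses that the joint distribution of (T,(a_{1})_{i}) is, up to relabeling the coordinates of T, the same for every i, so each inner infimum equals \mathsf{MMSE}_{\leq D}.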

Theorem 1.4.

Fix any k\geq 3 odd and \delta>0. As in Problem 1.1, let

T=(1+\delta)a_{1}^{\otimes k}+\sum_{j=2}^{r}a_{j}^{\otimes k}

where each a_{j} (for 1\leq j\leq r) is drawn uniformly and independently from the hypercube \{\pm 1\}^{n}. Let r=r_{n} grow with n as r=\Theta(n^{\alpha}) for an arbitrary fixed \alpha>0.

  • (“Easy”) If \alpha<k/2 then \lim_{n\to\infty}\mathsf{MMSE}_{\leq C\log n}=0 for a constant C=C(k,\delta)>0.

  • (“Hard”) If \alpha>k/2 then \lim_{n\to\infty}\mathsf{MMSE}_{\leq n^{c}}=1 for a constant c=c(k,\alpha)>0.

More precise and more general statements of the lower bound (“hard” regime) and upper bound (“easy” regime) can be found in Section 2 (Theorems 2.1 and 2.2), where the case of k even is also handled. The lower bound shows failure of LDP algorithms when r\gg n^{k/2}. Since no one has managed to find a poly-time algorithm in the LDP-hard regime for many other problems of a similar nature (including planted clique [Hop18], community detection [HS17], tensor PCA [HKP+17, KWB22], and more), this suggests that there is no poly-time algorithm for Problem 1.1 (and therefore also for Problem 1.2) when r\gg n^{k/2}, at least without a new algorithmic breakthrough. This is the first average-case hardness result of any type for tensor decomposition, and it gives a concrete reason to believe that the existing algorithms for k=3 (which succeed when r\ll n^{3/2} [MSS16, DdL+22]) are essentially optimal. Following the heuristic correspondence between polynomial degree and runtime in Hypothesis 2.1.5 of [Hop18] (see also [KWB22, DKWB19]), namely that degree D corresponds to runtime \exp(\tilde{\Theta}(D)) where \tilde{\Theta} hides factors of \log n, the lower bound suggests that runtime \exp(n^{\Omega(1)}) is required in the hard regime.

The upper bound shows that a degree-O(\log n) polynomial successfully estimates a_{1} in the regime r\ll n^{k/2}. Such a polynomial has n^{O(\log n)} terms and can thus be evaluated in time n^{O(\log n)} (quasi-polynomial). This is in stark contrast to the \exp(n^{\Omega(1)}) runtime that we expect to need in the hard regime. We are confident that for k\in\{3,4\} and r\ll n^{k/2}, Problems 1.1 and 1.2 can in fact be solved in polynomial time by adapting existing algorithms [MSS16, DdL+22], but we have not attempted this here as our main focus is on establishing the phase transition for polynomials. We consider the lower bound to be our main contribution, and the upper bound serves to make the lower bound more meaningful.

1.3 Discussion

1.3.1 Why the LDP framework?

Since there are many different frameworks for exploring statistical-computational gaps, we briefly discuss the possibility of applying some of the others to tensor decomposition, and explain why the LDP framework seems especially suited to this task.

Statistical query (SQ) model.

The SQ model (e.g. [FGR+17, DKS17]) is applicable only to settings where the observation consists of i.i.d. samples from some unknown distribution. There does not seem to be a clear way to cast tensor decomposition in this form.

Sum-of-squares (SoS) hierarchy.

SoS is a powerful algorithmic tool, and SoS lower bounds (e.g. [BHK+19, KMOW17, RSS18]) are sometimes viewed as a gold standard in average-case hardness. However, it is important to note that SoS lower bounds show hardness of certification problems, whereas Problems 1.1 and 1.2 are recovery problems. Hardness of certification does not necessarily imply hardness of recovery, as discussed in [BKW20, BMR21, BBK+21]. Therefore, while there are some natural certification problems related to tensor decomposition that would be nice to have SoS lower bounds for (e.g. Open Problem 7.2 in [Kun22]), it is not clear how to use an SoS lower bound to directly address the decomposition problem.

Optimization landscape.

Prior work [AGJ15, GM17] has studied the landscape of a certain non-convex optimization problem related to tensor decomposition, leading to some algorithmic results. In other settings, recent work has studied structural properties of optimization problems as a means to prove failure of certain algorithms (e.g., local search, Markov chains, and gradient descent [GZ17, BGJ20, GJS21, BWZ20]) to recover a planted structure. If results of this type were obtained for tensor decomposition, they would complement ours by ruling out different types of algorithms. One caveat is that recent work on tensor PCA [RM14, BGJ20, MKUZ19, WEM19, BCRT20] and planted clique [GZ19, CMZ22] has revealed that the choice of which function to optimize can be very important. Thus, “hardness” of one particular canonical optimization landscape need not indicate inherent hardness of the recovery problem.

Reductions.

Reductions from planted clique (e.g. [BR13, HWX15, BBH18, BB20]) are among the strongest forms of average-case hardness that exist for statistical problems, and it would be great to have reduction-based evidence for hardness of tensor decomposition. This would likely require new ideas beyond those in the current literature, as Problem 1.1 is somewhat different in structure from planted clique and the problems it has been reduced to thus far. These differences are explained in Section 1.3.2 below.


In summary, there are many complementary approaches that could be explored in future work in order to give new perspectives on hardness of tensor decomposition. In this work we have chosen to pursue the LDP framework, some strengths of which include its stellar track record of predicting statistical-computational gaps in the past, and its flexibility to directly address the decomposition problem (Problem 1.2) rather than a related certification or optimization problem.

1.3.2 Difficulties of tensor decomposition

Although the suspected threshold r\sim n^{k/2} for random tensor decomposition has been around for some time now [GM15, MSS16], and despite the many recent successes in computational lower bounds for other statistical problems, there are (prior to this work) effectively no results that point to whether r\sim n^{k/2} is a fundamental barrier or not. Here we will discuss what makes tensor decomposition more difficult to analyze than the other statistical-computational gaps that have received recent attention in the literature.

As a point of comparison, we briefly introduce the tensor principal component analysis (tensor PCA) problem [RM14]: here we observe an order-k tensor Y=\lambda x^{\otimes k}+Z where \lambda>0 is a signal-to-noise ratio, x\in\mathbb{R}^{n} is drawn uniformly from the unit sphere, and Z\in(\mathbb{R}^{n})^{\otimes k} is a tensor with i.i.d. \mathcal{N}(0,1) entries; the goal is to (approximately) recover the vector x (up to sign, if k is even). Tensor PCA also has a statistical-computational gap, and it is now well-understood: the best known poly-time algorithms require \lambda\gtrsim n^{k/4}, and there are matching lower bounds in both the SoS and LDP frameworks [HSS15, HKP+17, PR20, KWB22]. While tensor decomposition and tensor PCA may look superficially similar, tensor decomposition is much harder to analyze for the reason described below.

With a few exceptions ([SW22, KM21b, KM21a]), essentially all existing lower bounds for recovering a planted signal in the SQ, SoS, or LDP frameworks leverage hardness of testing between a “planted” distribution and a “null” distribution, where the null distribution has independent entries. In the case of tensor PCA, in order to show hardness of recovering x when \lambda\ll n^{k/4}, the SoS and LDP lower bounds crucially leverage the fact that it is hard to even detect the planted structure when \lambda\ll n^{k/4}, that is, it is hard to distinguish between a spiked tensor Y and an i.i.d. Gaussian tensor Z. Now for the decomposition problem, we are hoping to show hardness of decomposition whenever r\gg n^{k/2}, but the problem of distinguishing between a random rank-r tensor and the appropriate Gaussian tensor (namely the one matching moments of order 1 and 2 with the rank-r tensor; one could choose a different null distribution, but the analysis may be difficult; for instance, see Section 1.5) is actually easy when r\ll n^{k}; this can be achieved by thresholding a particular degree-4 polynomial in the tensor entries, where each monomial forms a double cover of 2k elements of [n]. This “detection-recovery gap” creates difficulty because the standard tools for proving lower bounds cannot show hardness of recovery unless detection is also hard. The tools with this limitation include the SDA (statistical dimension on average) approach for SQ [FGR+17], the pseudo-calibration approach for SoS [BHK+19], and the low-degree likelihood ratio [HS17, HKP+17, Hop18, KWB22] (as well as its conditional variants [BEH+22, CGH+22]) for LDP. Reductions from planted clique also typically require hardness of detection [BBH18].

The method of [SW22] overcomes this challenge in some settings and gives LDP lower bounds for recovery problems even in regimes where detection is easy. This method applies to problems of the form (roughly speaking) “signal plus noise” where the noise has independent entries. For instance, this gives LDP-hardness of recovery for variants of tensor PCA with detection-recovery gaps [LZ22]. However, tensor decomposition does not take this “signal plus noise” form: Problem 1.1 has a “signal” term (1+\delta)a_{1}^{\otimes k} but the remaining “noise” term is a rank-(r-1) tensor, which does not have independent entries. As a result, we will need to introduce a new method in order to prove a lower bound for tensor decomposition. The proof strategy is outlined in Section 1.4.1 below.

1.4 Proof Techniques

1.4.1 Lower bound

Instead of mean squared error it will be convenient to work with an equivalent quantity, the degree-D maximum correlation

\mathsf{Corr}_{\leq D}:=\sup_{f\in\mathbb{R}[T]_{\leq D}}\frac{\mathbb{E}[f(T)\cdot a_{11}]}{\sqrt{\mathbb{E}[f(T)^{2}]}}.

This is directly related to \mathsf{MMSE}_{\leq D} via the identity \mathsf{MMSE}_{\leq D}=\mathbb{E}[a_{11}^{2}]-\mathsf{Corr}_{\leq D}^{2}=1-\mathsf{Corr}_{\leq D}^{2} (see [SW22, Fact 1.1]), so our objective is to show \mathsf{Corr}_{\leq D}=o(1) when r\gg n^{k/2} and D=n^{\Omega(1)}.

A first attempt is to compute \mathsf{Corr}_{\leq D} explicitly. Expand an arbitrary degree-D polynomial in the monomial basis: f(T)=\sum_{0\leq|S|\leq D}\hat{f}_{S}T^{S} where S ranges over multi-sets of tensor entries and T^{S}:=\prod_{I\in S}T_{I}. The numerator of \mathsf{Corr}_{\leq D} is a linear functional of the coefficient vector \hat{f}=(\hat{f}_{S})_{|S|\leq D}:

\mathbb{E}[f(T)\cdot a_{11}]=\sum_{S}\hat{f}_{S}\,\mathbb{E}[T^{S}\cdot a_{11}]=c^{\top}\hat{f}\qquad\text{where }c_{S}:=\mathbb{E}[T^{S}\cdot a_{11}].

For the denominator, \mathbb{E}[f(T)^{2}] is a quadratic form:

\mathbb{E}[f(T)^{2}]=\sum_{S,S^{\prime}}\hat{f}_{S}\hat{f}_{S^{\prime}}\,\mathbb{E}[T^{S}T^{S^{\prime}}]=\hat{f}^{\top}P\hat{f}\qquad\text{where }P_{S,S^{\prime}}:=\mathbb{E}[T^{S}T^{S^{\prime}}].

This means we have the explicit formula

\mathsf{Corr}_{\leq D}=\sup_{\hat{f}}\frac{c^{\top}\hat{f}}{\sqrt{\hat{f}^{\top}P\hat{f}}}=\sqrt{c^{\top}P^{-1}c}.

The difficulty here is that, while we can write down an explicit formula for the vector c and matrix P, it does not seem tractable to write down an explicit formula for the inverse matrix P^{-1}. We will instead find an upper bound on \mathsf{Corr}_{\leq D} that is manageable to work with.
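As an aside, the closed form \sqrt{c^{\top}P^{-1}c} in the display above follows from the standard quadratic-form optimization, assuming the positive semidefinite matrix P is invertible: substituting h=P^{1/2}\hat{f} and applying Cauchy–Schwarz,

\sup_{\hat{f}}\frac{c^{\top}\hat{f}}{\sqrt{\hat{f}^{\top}P\hat{f}}}=\sup_{h}\frac{(P^{-1/2}c)^{\top}h}{\|h\|}=\|P^{-1/2}c\|=\sqrt{c^{\top}P^{-1}c},

with the supremum attained at h\propto P^{-1/2}c, i.e., \hat{f}\propto P^{-1}c.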

Our difficulties stem from the fact that we do not have an orthogonal basis of polynomials to work with. After all, if \{T^{S}\}_{|S|\leq D} were orthogonal polynomials (i.e., if P were a diagonal matrix) then we would have no problem inverting P. Since the distribution of T is complicated (in particular, it is not a product measure), it seems difficult to construct an explicit basis of orthogonal polynomials. Instead, we will think of f(T) as a function of the underlying i.i.d. Rademacher random variables A=(a_{ij}):=(a_{j})_{i} and work with an orthogonal basis of polynomials in those variables. For any f(T) there is an associated polynomial g(A) such that f(T)=g(A), and we can expand these as

\sum_{|S|\leq D}\hat{f}_{S}T^{S}=f(T)=g(A)=\sum_{|U|\leq kD}\hat{g}_{U}A^{U},

where U\subseteq[n]\times[r], A^{U}:=\prod_{(i,j)\in U}a_{ij}, and \hat{g}=(\hat{g}_{U})_{|U|\leq kD} is some vector of coefficients. Here \{A^{U}\} is the standard Fourier basis for functions on the hypercube, which crucially does have the desired orthogonality property: \mathbb{E}[A^{U}A^{U^{\prime}}]=\mathbbm{1}_{U=U^{\prime}}. As a result, we can express the denominator of \mathsf{Corr}_{\leq D} as simply

\mathbb{E}[f(T)^{2}]=\mathbb{E}[g(A)^{2}]=\|\hat{g}\|^{2}.

To exploit this, we will need to understand the relation between \hat{f} and the corresponding \hat{g}. We will write down an explicit linear transformation, i.e., a matrix M such that \hat{g}=M\hat{f}. The crux of our new strategy is that to obtain an upper bound on \mathsf{Corr}_{\leq D} it suffices to construct a left-inverse for M, i.e., a matrix M^{+} such that M^{+}M=I. To see why this suffices,

\mathsf{Corr}_{\leq D}=\sup_{\hat{f}}\frac{c^{\top}\hat{f}}{\|M\hat{f}\|}=\sup_{\hat{f}}\frac{c^{\top}M^{+}M\hat{f}}{\|M\hat{f}\|}\leq\sup_{h}\frac{c^{\top}M^{+}h}{\|h\|}=\|c^{\top}M^{+}\|,

where in the inequality step, h plays the role of \hat{g}=M\hat{f} except we have relaxed the problem to allow h to be any vector, not necessarily in the image of M.

In light of the above, our remaining task is to construct an explicit left-inverse M^{+}. There are many possible choices, some of which will yield better bounds on \mathsf{Corr}_{\leq D} than others. We will need one that yields a good bound but is also analytically tractable to write down explicitly. We now explain the intuition behind our construction for M^{+}. Note that the left-inverse property is equivalent to M^{+}\hat{g}=\hat{f} whenever \hat{g}=M\hat{f}. That is, M^{+} is a procedure to recover the (coefficients of the) polynomial f when given the (coefficients of the) polynomial g that satisfies f(T)=g(A). Luckily there is a methodical process for this task which starts from the highest-degree terms and works downward. At each stage, there is always a particular monomial in A whose coefficient in g allows us to immediately deduce some particular coefficient in f. To illustrate, if k=3 and D=2 then the coefficient of the monomial

a_{27}a_{37}a_{47}a_{28}a_{38}a_{58} (3)

in g(A) allows us to immediately deduce the coefficient of the monomial T_{234}T_{235} in f(T), since this is the only monomial in T (of degree at most 2) whose expansion in terms of A contains the term (3). Once we have deduced the highest-degree coefficients of f, we can subtract off the corresponding terms in g and repeat the process recursively. Our choice of M^{+} is inspired by this process; the full details can be found in Section 3.4.
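To spell out this example: writing \lambda_{1}=1+\delta and \lambda_{j}=1 for j\geq 2, so that in Problem 1.1 we have T_{234}=\sum_{j}\lambda_{j}a_{2j}a_{3j}a_{4j}, the expansion of the candidate monomial in terms of A is

T_{234}T_{235}=\Big(\sum_{j=1}^{r}\lambda_{j}a_{2j}a_{3j}a_{4j}\Big)\Big(\sum_{j^{\prime}=1}^{r}\lambda_{j^{\prime}}a_{2j^{\prime}}a_{3j^{\prime}}a_{5j^{\prime}}\Big)=\sum_{j,j^{\prime}}\lambda_{j}\lambda_{j^{\prime}}\,a_{2j}a_{3j}a_{4j}\,a_{2j^{\prime}}a_{3j^{\prime}}a_{5j^{\prime}},

and the term (j,j^{\prime})=(7,8) is exactly (3). Conversely, (3) has the two non-empty columns \{2,3,4\} and \{2,3,5\}, and (as stated above) no other monomial in T of degree at most 2 produces it, so the coefficient of (3) in g equals \lambda_{7}\lambda_{8}\hat{f}_{S} for S=\{T_{234},T_{235}\}, from which \hat{f}_{S} can be read off.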

Recall now that the final step is to bound \|c^{\top}M^{+}\| for our choice of M^{+}. Due to the recursive structure of M^{+}, this boils down to analyzing certain recursively-defined values v_{S} (see Section 3.6). As above, S is a multi-set of tensor entries, which can be thought of as a hypergraph on vertex set [n] (with a hyperedge for each tensor entry). The value v_{S} is computed by summing over all possible ways to reduce S to a smaller hypergraph by replacing some collection of hyperedges by a new hyperedge equal to their symmetric difference. Analyzing this recurrence is the most technically involved part of the proof, since there are subtle cancellations that need to be understood in order to get the right bound.

1.4.2 Upper bound

To construct a polynomial that accurately estimates the largest component, we take inspiration from a line of algorithmic work that builds spectral methods from tensor networks [HSSS16, MW19, DdL+22, OTR22]. A tensor network is a diagram that specifies how to “multiply” together a collection of tensors to form another tensor; see [MW19] for details. Some existing algorithms for order-3 tensor decomposition [HSSS16, DdL+22] (including the fastest known method that reaches the threshold r\sim n^{3/2} [DdL+22]) are based on the idea of building a tensor network from multiple copies of the input tensor, flattening the result to a matrix, and running a spectral method on this matrix. Morally speaking, these are all steps that one should be able to capture using low-degree polynomials, but the algorithm of [DdL+22] is actually a multi-stage process that seems challenging to write as a single polynomial. Also, since our goal is to estimate the largest component, we do not need to inject randomness to break symmetry among the components as in [DdL+22].

To construct our polynomial, we multiply O(\log n) copies of T together in a tensor network whose shape is an expander graph. The output is a single vector, which we take as our estimate for a_{1}. There is one key trick we use to greatly simplify the analysis: we deviate slightly from the standard notion of a tensor network by disallowing repeated “edge labels.” The precise definition of the polynomial appears in Section 4.2.
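As a toy illustration of the kind of low-degree building block involved (not the actual estimator, which contracts the copies of T according to an expander-shaped network and is defined precisely in Section 4.2), a single tensor-network contraction of an order-3 tensor against two vectors looks as follows; numpy is assumed and the function name is a placeholder:

import numpy as np

def contract_step(T, u, v):
    # w_i = sum_{j,k} T[i, j, k] * u[j] * v[k];
    # each entry of w has degree 1 in the entries of T, so chaining
    # O(log n) such steps (one fresh copy of T per step) produces
    # entries of degree O(log n) in T
    return np.einsum('ijk,j,k->i', T, u, v)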

1.5 Future Directions

We collect here a list of possible extensions of our main result and related open problems.

  • Distribution of a_{j}: We have taken a_{j} drawn uniformly from the hypercube in the interest of keeping the proofs as clean as possible. We expect our approach could be adapted to other distributions for a_{j} such as \mathcal{N}(0,I_{n}).

  • Canonical model: While we feel that the semirandom tensor decomposition model (Problem 1.2) is quite natural, there may be interest in showing hardness for the more canonical model where every \lambda_{j} is equal to 1. One way to approach this could be to study the task of hypothesis testing between a random rank-r tensor and a random rank-(r+1) tensor. Hardness (for poly-time algorithms) of this testing problem implies hardness of decomposition in the canonical model. It may be possible to use our proof technique to show LDP-hardness of this testing problem. Another alternative approach could be to consider a variant of Problem 1.1 where symmetry between the tensor components is broken by giving the algorithm a small amount of side-information rather than increasing the norm of one component.

  • Tensors with algebraic structure: Hopefully the proof techniques introduced here will open the door to other hardness results for related problems. For instance, orbit recovery problems—a class of statistical problems involving group actions [BBK+17, FSWW20]—give rise to variants of tensor decomposition with algebraic structure. One example is multi-reference alignment (MRA), where the tensor components are cyclic shifts of one another [PWB+19]. A statistical-computational gap in heterogeneous MRA was conjectured in [BBLS18]; the positive side was resolved in [Wei18] using [MSS16], and the negative side should now be approachable using our techniques. Tensors with continuous group symmetry are even less well understood [MW19, LM21].

  • Smoothed order-3 tensor decomposition: For k=3, the algorithms achieving the optimal condition r\ll n^{3/2} seem to crucially exploit randomness in the components [MSS16, DdL+22]. In contrast, other algorithms succeed in the smoothed analysis model (which imposes minimal assumptions on the components) but require r\lesssim n [GVX14, BCMV14]. It remains unclear whether better algorithms for the smoothed model exist, or whether there is an inherent gap between the random and smoothed settings. For k=4 there is no such gap, with both random and smoothed algorithms achieving r\sim n^{2} [DCC07, MSS16].

  • Other LDP estimation lower bounds: Proving LDP lower bounds for estimation problems remains a difficult task in many settings, with relatively few tools available compared to detection (hypothesis testing) problems. A few open problems are to resolve the LDP recovery threshold for sparse linear regression and group testing; these problems have detection-recovery gaps and the LDP detection thresholds were resolved in [BEH+22, CGH+22].

2 General Setting and Results

It will be convenient to generalize Problem 1.1 in a few different ways. First, we allow all the components to have potentially different norms. This way, there is nothing distinguished about the first component, eliminating some tedious casework. To this end, we will consider tensors of the form

\sum_{j=1}^{r}\lambda_{j}a_{j}^{\otimes k} (4)

where each component a_{j} is drawn uniformly and independently from the hypercube \{\pm 1\}^{n}, and \lambda_{j}\in\mathbb{R} are deterministic nonzero scalars that are known to the algorithm. Without loss of generality we assume 1=\lambda_{1}\geq|\lambda_{2}|\geq|\lambda_{3}|\geq\cdots\geq|\lambda_{r}|=:\lambda_{\min}>0. The goal is to recover the largest component a_{1}. The case \lambda_{2}=\lambda_{3}=\cdots=\lambda_{r}=(1+\delta)^{-1} is equivalent to Problem 1.1 (rescale the tensor in Problem 1.1 by (1+\delta)^{-1}).

Second, we will potentially allow the algorithm access not just to an order-k tensor but also to tensors of lower order. For I\subseteq[n] write

T_{I}:=\sum_{j=1}^{r}\lambda_{j}a_{j}^{I} (5)

with \lambda_{j} and a_{j} as above, and where for a vector v\in\mathbb{R}^{n}, v^{I}:=\prod_{i\in I}v_{i}. Note that since the a_{j} have \{\pm 1\}-valued entries, each entry of the tensor (4) takes the form T_{I} for some 0\leq|I|\leq k.

Let \Omega be a collection of subsets of [n]. We consider the task of estimating a_{11}:=(a_{1})_{1} using a degree-D polynomial f:\mathbb{R}^{\Omega}\to\mathbb{R} whose input variables are \{T_{I}\}_{I\in\Omega}. Accordingly, we define

\mathsf{MMSE}_{\leq D}^{\Omega}:=\inf_{\begin{subarray}{c}f:\mathbb{R}^{\Omega}\to\mathbb{R}\\ \deg(f)\leq D\end{subarray}}\mathbb{E}[(f(T)-a_{11})^{2}].

To make our lower bound for order-k tensors as strong as possible, we will choose \Omega=\{I\subseteq[n]\,:\,0<|I|\leq k\}. This is equivalent to giving the algorithm access to all the tensors

\sum_{j=1}^{r}\lambda_{j}a_{j},\qquad\sum_{j=1}^{r}\lambda_{j}a_{j}^{\otimes 2},\qquad\ldots,\qquad\sum_{j=1}^{r}\lambda_{j}a_{j}^{\otimes k},

(which is a situation that often arises when using the method of moments, e.g. [HK13, PWB+19]). Note that a lower bound in this setting implies a lower bound in the original setting where only the order-k tensor is revealed.

To make our upper bound as strong as possible, we will give the algorithm access to minimal information. For k odd, we will take \Omega=\{I\subseteq[n]\,:\,|I|=k\}, that is, our polynomial only needs access to the “off-diagonal” entries of the order-k tensor (T_{i_{1},\ldots,i_{k}} where i_{1},\ldots,i_{k} are all distinct). For k even, the order-k tensor alone does not suffice to disambiguate between a_{1} and -a_{1}, so we additionally reveal the (off-diagonal part of the) order-(k-1) tensor, that is, we take \Omega=\{I\subseteq[n]\,:\,k-1\leq|I|\leq k\}. (An alternative formulation for k even could be to reveal only the order-k tensor and ask the algorithm to estimate (a_{1})_{1}\cdot(a_{1})_{2}; we expect this could be analyzed using our methods.)

The following are non-asymptotic statements of our main results. Together, these immediately imply Theorem 1.4.

Theorem 2.1 (Lower bound).

Consider the setting described above with any parameters k\geq 3, n\geq 1, D\geq 0, and set \Omega=\{I\subseteq[n]\,:\,0<|I|\leq k\}. If

r\geq 19k^{k}D^{k+4}\lambda_{\min}^{-2}n^{k/2} (6)

then

\mathsf{MMSE}_{\leq D}^{\Omega}\geq 1-n^{-1/2}.

In particular, if k\geq 3 is fixed, \lambda_{\min}\geq\delta for fixed \delta>0, and r=r_{n} grows as r=\Theta(n^{\alpha}) for fixed \alpha>k/2, then \lim_{n\to\infty}\mathsf{MMSE}_{\leq n^{c}}^{\Omega}=1 for a constant c=c(k,\alpha)>0.

Theorem 2.2 (Upper bound).

Consider the setting described above with any k\geq 3. Let k^{\prime}\in\{k-1,k\} be odd, and set \Omega=\{I\subseteq[n]\,:\,k^{\prime}\leq|I|\leq k\}. Suppose n\geq n_{0} for some constant n_{0}=n_{0}(k). Let D be the smallest odd integer such that

D\geq\frac{k\log n}{1-|\lambda_{2}|}.

If |\lambda_{2}|\leq 1-n^{-1/52} and

r\leq\frac{1}{2}k^{-k/2}D^{-27k}n^{k/2} (7)

then

\mathsf{MMSE}_{\leq D}^{\Omega}\leq 10k^{k-1}D^{52k}\frac{r}{n^{k-1-\mathbbm{1}_{k\text{ even}}}}\leq 10k^{k-1}D^{52k}\frac{r}{n^{k/2}}. (8)

In particular, if k\geq 3 is fixed, |\lambda_{2}|\leq 1-\delta for fixed \delta>0, and r=r_{n} grows as r=\Theta(n^{\alpha}) for fixed 0<\alpha<k/2, then \lim_{n\to\infty}\mathsf{MMSE}_{\leq C\log n}^{\Omega}=0 for a constant C=C(k,\delta)>0.

Note that (7) is the bottleneck, requiring r\ll n^{k/2}. The intermediate step in (8) is a sharper bound on the MMSE that we use for the remark below, but it does not dictate the threshold for r.

Remark 2.3 (Exact recovery).

We remark that if \mathsf{MMSE}_{\leq D}=o(1/n) then exact recovery of a_{1} is possible with probability 1-o(1) by thresholding the values of certain degree-D polynomials f_{1},\ldots,f_{n}. To see this: combining (2) with Markov’s inequality, we have f_{1},\ldots,f_{n} such that \sum_{i=1}^{n}(f_{i}(T)-(a_{1})_{i})^{2}<1 with probability 1-o(1). Furthermore, thresholding f_{1},\ldots,f_{n} is guaranteed to exactly recover a_{1} whenever \sum_{i=1}^{n}(f_{i}(T)-(a_{1})_{i})^{2}<1, since this forces |f_{i}(T)-(a_{1})_{i}|<1 for every i, and hence \mathrm{sign}(f_{i}(T))=(a_{1})_{i} because (a_{1})_{i}\in\{\pm 1\}.

As a result, when |\lambda_{2}|\leq 1-\Omega(1) and r\leq n^{k/2}/(\log n)^{O(1)}, our upper bound (Theorem 2.2) gives exact recovery for any fixed k\geq 5. We expect a similar result to hold for k\in\{3,4\} but this may require modifying the construction of the polynomial in the proof of Theorem 2.2.

3 Proof of Theorem 2.1: Lower Bound

3.1 Setup

Throughout, define \Omega:=\{I\subseteq[n]\,:\,0<|I|\leq k\}. Our goal will be to give an upper bound on

\mathsf{Corr}_{\leq D}:=\sup_{\begin{subarray}{c}f:\mathbb{R}^{\Omega}\to\mathbb{R}\\ \deg(f)\leq D\end{subarray}}\frac{\mathbb{E}[f(T)\cdot a_{11}]}{\sqrt{\mathbb{E}[f(T)^{2}]}}.

This will imply the desired result due to the following direct relation between \mathsf{Corr}_{\leq D} and the associated MMSE.

Fact 3.1 ([SW22, Fact 1.1]).

\mathsf{MMSE}_{\leq D}^{\Omega}=\mathbb{E}[a_{11}^{2}]-\mathsf{Corr}_{\leq D}^{2}=1-\mathsf{Corr}_{\leq D}^{2}.
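For completeness, a sketch of the standard argument behind this fact: for any fixed f, optimizing the mean squared error over rescalings c\cdot f gives

\inf_{c\in\mathbb{R}}\mathbb{E}[(c\,f(T)-a_{11})^{2}]=\mathbb{E}[a_{11}^{2}]-\frac{\mathbb{E}[f(T)\cdot a_{11}]^{2}}{\mathbb{E}[f(T)^{2}]},

attained at c=\mathbb{E}[f(T)\cdot a_{11}]/\mathbb{E}[f(T)^{2}]. Since the class of degree-D polynomials is closed under rescaling, taking the infimum over f and using \mathbb{E}[a_{11}^{2}]=1 yields \mathsf{MMSE}_{\leq D}^{\Omega}=1-\mathsf{Corr}_{\leq D}^{2}.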

Any degree-D polynomial f:\mathbb{R}^{\Omega}\to\mathbb{R} admits an expansion of the form

f(T)=\sum_{0\leq|S|\leq D}\hat{f}_{S}T^{S}

for some coefficients \hat{f}_{S}\in\mathbb{R}, where S takes values S\in\mathbb{N}^{\Omega}:=\{0,1,2,\ldots\}^{\Omega} with |S|:=\sum_{I\in\Omega}S_{I} and T^{S}:=\prod_{I\in\Omega}T_{I}^{S_{I}}.

At the same time, T can be thought of as a function of A, the n\times r matrix with columns a_{j}. This means we can also expand f as

f(T)=g(A)=\sum_{U}\hat{g}_{U}A^{U}

where U ranges over subsets U\subseteq[n]\times[r] of cardinality |U|\leq kD. This expansion will be useful because, since A has i.i.d. Rademacher entries, \{A^{U}\} forms an orthonormal basis in the sense \mathbb{E}[A^{U}A^{U^{\prime}}]=\mathbbm{1}_{U=U^{\prime}}. As a consequence,

\mathbb{E}[f(T)^{2}]=\mathbb{E}[g(A)^{2}]=\|\hat{g}\|^{2}:=\sum_{U}\hat{g}_{U}^{2}. (9)

Any vector of coefficients \hat{f}=(\hat{f}_{S})_{|S|\leq D} induces a unique choice of \hat{g}=(\hat{g}_{U})_{|U|\leq kD} such that

\sum_{|S|\leq D}\hat{f}_{S}T^{S}=f(T)=g(A)=\sum_{|U|\leq kD}\hat{g}_{U}A^{U}. (10)

It will be important to understand this mapping from \hat{f} to \hat{g}.

3.2 Proof Overview

We will show that the mapping from \hat{f} to \hat{g} in (10) is a linear transformation that takes the form \hat{g}=M\hat{f} for an explicit matrix M=(M_{US}). A key step in the proof will be to construct an explicit left inverse for M, that is, a matrix M^{+} satisfying M^{+}M=I. In other words, M^{+} is a matrix that recovers the coefficients \hat{f} from the coefficients \hat{g}: M^{+}\hat{g}=M^{+}M\hat{f}=\hat{f}.

The numerator of \mathsf{Corr}_{\leq D} can be expressed as

\mathbb{E}[f\cdot a_{11}]=\sum_{|S|\leq D}\hat{f}_{S}\,\mathbb{E}[T^{S}\cdot a_{11}]=c^{\top}\hat{f}

where the vector c=(c_{S})_{|S|\leq D} is defined by

c_{S}=\mathbb{E}[T^{S}\cdot a_{11}]. (11)

Using (9), the denominator can be expressed as \sqrt{\mathbb{E}[f(T)^{2}]}=\|\hat{g}\|=\|M\hat{f}\|. This means we can write

\mathsf{Corr}_{\leq D}=\sup_{\hat{f}}\frac{c^{\top}\hat{f}}{\|M\hat{f}\|}=\sup_{\hat{f}}\frac{c^{\top}M^{+}M\hat{f}}{\|M\hat{f}\|}\leq\sup_{h}\frac{c^{\top}M^{+}h}{\|h\|}=\|c^{\top}M^{+}\|. (12)

In the crucial inequality above, h plays the role of \hat{g}=M\hat{f} except we have relaxed the problem to allow h to be any vector, not necessarily in the image of M. After this simplification, the optimizer for h is h^{*}=(c^{\top}M^{+})^{\top} (or any scalar multiple thereof), yielding the explicit expression \|c^{\top}M^{+}\| for the optimum. So long as we can construct a left inverse M^{+}, this gives an upper bound on \mathsf{Corr}_{\leq D}. While many choices for M^{+} are possible, we will need to find one that (i) is simple enough to work with explicitly, and (ii) results in a good bound on \mathsf{Corr}_{\leq D} at the end of the day.

3.3 Computing M

The first step in writing down an explicit expression for M will be to express T^{S} in the basis \{A^{U}\}.

Definition 3.2 (List notation for S).

We will identify S\in\mathbb{N}^{\Omega} with a multi-set, namely the multi-set containing S_{I} copies of each I\in\Omega. With some abuse of notation, write S as an ordered list S=(I_{1},\ldots,I_{|S|}) containing the elements of the associated multi-set sorted according to some arbitrary but fixed ordering on \Omega. For a labeling \ell=(\ell_{1},\ldots,\ell_{|S|})\in[r]^{|S|}, define S(\ell)\subseteq[n]\times[r] to be the subset containing all (i,j) pairs with the following property: the element i occurs in an odd number of the sets \{I_{d}\,:\,\ell_{d}=j\}. (Informally, S(\ell) is produced by placing each I_{d} into column \ell_{d} of an n\times r grid and then XOR’ing the contents of each column.)
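As a small example with k=3: take S=(I_{1},I_{2}) with I_{1}=\{1,2,3\} and I_{2}=\{1,2,4\}. For \ell=(5,6) the two sets land in different columns, so S(\ell) has column 5 equal to \{1,2,3\} and column 6 equal to \{1,2,4\}. For \ell=(5,5) both sets land in column 5 and are XOR’ed, so S(\ell) has the single non-empty column \{3,4\}: the elements 1 and 2 each appear twice and cancel.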

With this notation we can write

T^{S}=\prod_{I\in\Omega}\left(\sum_{j=1}^{r}\lambda_{j}a_{j}^{I}\right)^{S_{I}}=\prod_{d=1}^{|S|}\left(\sum_{\ell_{d}=1}^{r}\lambda_{\ell_{d}}a_{\ell_{d}}^{I_{d}}\right)=\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}A^{S(\ell)} (13)

where \lambda^{\ell}:=\prod_{d}\lambda_{\ell_{d}}.

Next we consider an arbitrary vector of coefficients \hat{f} and write down an expression for the corresponding \hat{g} in (10). We have

f(T) =\sum_{|S|\leq D}\hat{f}_{S}T^{S}
=\sum_{|S|\leq D}\hat{f}_{S}\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}A^{S(\ell)}
=\sum_{|U|\leq kD}A^{U}\underbrace{\sum_{|S|\leq D}\hat{f}_{S}\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}\,\mathbbm{1}_{S(\ell)=U}}_{\hat{g}_{U}}.

In other words, \hat{g}=M\hat{f} where

M_{US}:=\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}\,\mathbbm{1}_{S(\ell)=U}. (14)

3.4 Constructing the Left Inverse

We now construct a left inverse M^{+} for M, which, recall, is needed to apply the bound (12). The intuition behind this construction is described in Section 1.4.1.

Definition 3.3 (Generic U).

We call U\subseteq[n]\times[r] generic provided that every column U_{j}\subseteq[n] satisfies |U_{j}|\leq k, and at most D columns satisfy |U_{j}|>0.

These are the “generic” terms appearing in the expansion (13): U is generic if and only if there exist S\in\mathbb{N}^{\Omega} with |S|\leq D and \ell\in[r]^{|S|} with distinct entries \ell_{1},\ldots,\ell_{|S|} such that S(\ell)=U. (Note that (6) implies D\leq r, so it is possible for \ell to have distinct entries.) Furthermore, if U is generic then there is a unique corresponding S, namely \mathsf{cols}(U) defined as follows.

Definition 3.4.

For a generic U, define \mathsf{cols}(U)\in\mathbb{N}^{\Omega} by letting \mathsf{cols}(U)_{I} be the number of columns j for which U_{j}=I.

When viewing S as a multi-set as per Definition 3.2, \mathsf{cols}(U) is simply the multi-set of columns U_{j}. Recalling the definition |S|:=\sum_{I\in\Omega}S_{I}, note that |\mathsf{cols}(U)| denotes the number of non-empty columns of U.
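For example (with any k\geq 3 and D\geq 3), if the only non-empty columns of U are U_{2}=U_{5}=\{1,3,4\} and U_{9}=\{2,6\}, then \mathsf{cols}(U) is the multi-set \{\{1,3,4\},\{1,3,4\},\{2,6\}\}, i.e., the element of \mathbb{N}^{\Omega} with \mathsf{cols}(U)_{\{1,3,4\}}=2 and \mathsf{cols}(U)_{\{2,6\}}=1; in particular |\mathsf{cols}(U)|=3.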

Our left inverse M^{+} will satisfy M^{+}_{SU}=0 whenever U is not generic; in other words, our procedure for recovering \hat{f} from \hat{g} only uses the values \hat{g}_{U} for which U is generic. Write M in block form

M=\left[\begin{array}{c}G\\ N\end{array}\right]

where G (“generic”) is indexed by generic U’s and N (“not”) is indexed by the rest. It suffices to construct a left inverse G^{+} for G and then set

M^{+}=\left[\begin{array}{cc}G^{+}&0\end{array}\right]. (15)

Note that G=G(D) has the recursive structure

G(D)=\left[\begin{array}{cc}G(D-1)&R(D)\\ 0&Q(D)\end{array}\right]

where the first block of columns is indexed by |S|\leq D-1 and the second is indexed by |S|=D, and the first block of rows is indexed by |\mathsf{cols}(U)|\leq D-1 and the second by |\mathsf{cols}(U)|=D. Crucially, the lower-left block is 0: recall (14) and note that S(\ell)=U is only possible when |S|\geq|\mathsf{cols}(U)|. Given any left inverses G(D-1)^{+},Q(D)^{+} for G(D-1),Q(D) respectively, one can verify that the following matrix is a valid left inverse for G(D):

G(D)^{+}:=\left[\begin{array}{cc}G(D-1)^{+}&-G(D-1)^{+}R(D)Q(D)^{+}\\ 0&Q(D)^{+}\end{array}\right]. (16)

The left inverse G(D-1)^{+} can be constructed by applying (16) recursively. The matrix Q(D) has only one nonzero entry per row and so we will be able to construct Q(D)^{+} by hand.
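For completeness, the block computation verifying that (16) is a left inverse of G(D):

G(D)^{+}G(D)=\left[\begin{array}{cc}G(D-1)^{+}G(D-1)&G(D-1)^{+}R(D)-G(D-1)^{+}R(D)Q(D)^{+}Q(D)\\ 0&Q(D)^{+}Q(D)\end{array}\right]=\left[\begin{array}{cc}I&0\\ 0&I\end{array}\right],

using G(D-1)^{+}G(D-1)=I and Q(D)^{+}Q(D)=I in the last step.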

Lemma 3.5.

The following is a valid left inverse for Q=Q(D): for U generic and |S|=|\mathsf{cols}(U)|=D, define Q^{+}=(Q^{+}_{SU}) by

Q^{+}_{SU}:=\frac{\mathbbm{1}_{\mathsf{cols}(U)=S}}{\lambda^{U}r^{\underline{|S|}}} (17)

where

\lambda^{U}:=\prod_{j\,:\,|U_{j}|>0}\lambda_{j}

and

r^{\underline{d}}:=\underbrace{r(r-1)(r-2)\cdots(r-d+1)}_{d\text{ factors}}. (18)
Proof.

We need to verify Q^{+}Q=I. For |S|=|S^{\prime}|=D,

(Q^{+}Q)_{SS^{\prime}}=\sum_{U\,:\,|\mathsf{cols}(U)|=D}Q^{+}_{SU}Q_{US^{\prime}}=\sum_{U\,:\,|\mathsf{cols}(U)|=D}\frac{\mathbbm{1}_{\mathsf{cols}(U)=S}}{\lambda^{U}r^{\underline{|S|}}}\sum_{\ell\in[r]^{|S^{\prime}|}}\lambda^{\ell}\,\mathbbm{1}_{S^{\prime}(\ell)=U}=\mathbbm{1}_{S=S^{\prime}},

where the last step is justified as follows: since |S^{\prime}|=|\mathsf{cols}(U)|=D, the indicator \mathbbm{1}_{S^{\prime}(\ell)=U} can only be satisfied when \ell has distinct entries. There are r^{\underline{|S^{\prime}|}} such \ell’s, and for each there is exactly one term in the first sum satisfying \mathbbm{1}_{S^{\prime}(\ell)=U}, namely U=S^{\prime}(\ell), and this implies \lambda^{U}=\lambda^{\ell} and S^{\prime}=\mathsf{cols}(U). This means the indicator \mathbbm{1}_{\mathsf{cols}(U)=S} becomes \mathbbm{1}_{S=S^{\prime}}, and the other factors cancel. This completes the proof. ∎

This completes the description of the left inverse M^{+}.

3.5 Recurrence for w

Ultimately we are interested in an expression not for M^{+} but for the vector w^{\top}:=c^{\top}M^{+} appearing in (12). We will use the calculations from the previous section to write down a self-contained recursive formula for the entries of w. Note that from (15), w_{U}=0 whenever U is not generic, so we will focus on computing w_{\mathrm{gen}}=(w_{U}\,:\,U\text{ generic}). Using (16),

w_{\mathrm{gen}}^{\top}=c^{\top}G^{+}=\left[c^{\top}\left[\begin{array}{c}G(D-1)^{+}\\ 0\end{array}\right]\quad c^{\top}\left[\begin{array}{c}-G(D-1)^{+}R(D)Q(D)^{+}\\ Q(D)^{+}\end{array}\right]\right]=:\left[x^{\top}\quad y^{\top}\right],

where x is indexed by |\mathsf{cols}(U)|\leq D-1 and y is indexed by |\mathsf{cols}(U)|=D. The expression for x reveals that w_{U} does not depend on D (so long as D\geq|\mathsf{cols}(U)|). By comparing the expressions for x and y we can write

y^{\top}=c^{\top}\left[\begin{array}{c}0\\ Q(D)^{+}\end{array}\right]-x^{\top}R(D)Q(D)^{+}.

For any generic U, the case D=|\mathsf{cols}(U)| of the above gives

w_{U}=\underbrace{\sum_{S\,:\,|S|=|\mathsf{cols}(U)|}c_{S}Q(D)^{+}_{SU}}_{\text{(I)}}-\underbrace{\sum_{\begin{subarray}{c}\text{generic }U^{\prime}\\ |\mathsf{cols}(U^{\prime})|<|\mathsf{cols}(U)|\end{subarray}}\;\sum_{S\,:\,|S|=|\mathsf{cols}(U)|}w_{U^{\prime}}R(D)_{U^{\prime}S}Q(D)^{+}_{SU}}_{\text{(II)}}.

We treat the two terms (I) and (II) separately. Using the definition (17) for Q^{+},

(I) =\sum_{S\,:\,|S|=|\mathsf{cols}(U)|}c_{S}\,\frac{\mathbbm{1}_{\mathsf{cols}(U)=S}}{\lambda^{U}r^{\underline{|S|}}}
=\frac{c_{S}}{\lambda^{U}r^{\underline{|S|}}}\qquad\text{where }S=\mathsf{cols}(U).

Now for the second term (II), suppressing the constraints on U^{\prime} for ease of notation,

(II) =\sum_{U^{\prime}}\;\sum_{S\,:\,|S|=|\mathsf{cols}(U)|}w_{U^{\prime}}R(D)_{U^{\prime}S}Q(D)^{+}_{SU}
=\sum_{U^{\prime}}\;\sum_{S\,:\,|S|=|\mathsf{cols}(U)|}w_{U^{\prime}}\left(\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}\,\mathbbm{1}_{S(\ell)=U^{\prime}}\right)\frac{\mathbbm{1}_{\mathsf{cols}(U)=S}}{\lambda^{U}r^{\underline{|S|}}}
=\sum_{U^{\prime}}w_{U^{\prime}}\left(\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}\,\mathbbm{1}_{S(\ell)=U^{\prime}}\right)\frac{1}{\lambda^{U}r^{\underline{|S|}}}\qquad\text{where }S=\mathsf{cols}(U).

Putting it together, we have now shown

\mathsf{Corr}_{\leq D}^{2}\leq\sum_{\begin{subarray}{c}\text{generic }U\\ |\mathsf{cols}(U)|\leq D\end{subarray}}w_{U}^{2} (19)

where, for generic U, the w_{U} are defined by the recurrence

w_{U}=\frac{1}{\lambda^{U}r^{\underline{|S|}}}\left(c_{S}-\sum_{\begin{subarray}{c}\text{generic }U^{\prime}\\ |\mathsf{cols}(U^{\prime})|<|\mathsf{cols}(U)|\end{subarray}}w_{U^{\prime}}\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}\,\mathbbm{1}_{S(\ell)=U^{\prime}}\right)\qquad\text{where }S=\mathsf{cols}(U). (20)

3.6 Recurrence for vv

It will be convenient to rewrite the recurrence (20) in terms of a different quantity indexed by SΩS\in\mathbb{N}^{\Omega} instead of U[n]×[r]U\subseteq[n]\times[r], namely

vS:=λUr|S|¯wUfor U satisfying 𝖼𝗈𝗅𝗌(U)=S.v_{S}:=\lambda^{U}r^{\underline{|S|}}\,w_{U}\qquad\text{for $U$ satisfying }\mathsf{cols}(U)=S. (21)

It can be seen from (20) that vSv_{S} is well-defined in the sense that it does not depend on the choice of UU in (21). In particular,

vS=cSgeneric U|𝖼𝗈𝗅𝗌(U)|<|S|wU[r]|S|λ 1S()=U.v_{S}=c_{S}-\sum_{\begin{subarray}{c}\text{generic }U^{\prime}\\ |\mathsf{cols}(U^{\prime})|<|S|\end{subarray}}w_{U^{\prime}}\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}\,\mathbbm{1}_{S(\ell)=U^{\prime}}.

Using (21) we can turn this into a self-contained recurrence for vv (not involving ww):

vS\displaystyle v_{S} =cSgeneric U|𝖼𝗈𝗅𝗌(U)|<|S|vSλUr|S|¯[r]|S|λ 1S()=Uwhere S=𝖼𝗈𝗅𝗌(U)\displaystyle=c_{S}-\sum_{\begin{subarray}{c}\text{generic }U^{\prime}\\ |\mathsf{cols}(U^{\prime})|<|S|\end{subarray}}\frac{v_{S^{\prime}}}{\lambda^{U^{\prime}}r^{\underline{|S^{\prime}|}}}\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}\,\mathbbm{1}_{S(\ell)=U^{\prime}}\qquad\text{where }S^{\prime}=\mathsf{cols}(U^{\prime})
=cSSΩ|S|<|S|vS[r]|S|λλS()r|S|¯ 1𝖼𝗈𝗅𝗌(S())=S\displaystyle=c_{S}-\sum_{\begin{subarray}{c}S^{\prime}\in\mathbb{N}^{\Omega}\\ |S^{\prime}|<|S|\end{subarray}}v_{S^{\prime}}\sum_{\ell\in[r]^{|S|}}\frac{\lambda^{\ell}}{\lambda^{S(\ell)}r^{\underline{|S^{\prime}|}}}\,\mathbbm{1}_{\mathsf{cols}(S(\ell))=S^{\prime}} (22)

where the predicate 𝖼𝗈𝗅𝗌(S())=S\mathsf{cols}(S(\ell))=S^{\prime} in particular requires S()S(\ell) to be generic. For any SΩS\in\mathbb{N}^{\Omega} there are at most r|S|¯r^{\underline{|S|}} corresponding generic UU’s for which S=𝖼𝗈𝗅𝗌(U)S=\mathsf{cols}(U). For any such UU, we have from (21) that |wU||vS|/(λmin|S|r|S|¯)|w_{U}|\leq|v_{S}|/(\lambda_{\min}^{|S|}\,r^{\underline{|S|}}). This means (19) gives

𝖢𝗈𝗋𝗋D2SΩ|S|Dr|S|¯(vSλmin|S|r|S|¯)2=SΩ|S|DvS2λmin2|S|r|S|¯\mathsf{Corr}_{\leq D}^{2}\leq\sum_{\begin{subarray}{c}S\in\mathbb{N}^{\Omega}\\ |S|\leq D\end{subarray}}r^{\underline{|S|}}\left(\frac{v_{S}}{\lambda_{\min}^{|S|}\,r^{\underline{|S|}}}\right)^{2}=\sum_{\begin{subarray}{c}S\in\mathbb{N}^{\Omega}\\ |S|\leq D\end{subarray}}\frac{v_{S}^{2}}{\lambda_{\min}^{2|S|}\,r^{\underline{|S|}}} (23)

where vSv_{S} is defined by the recurrence (22).

3.7 Grouping by Patterns

The core challenge remaining is to analyze the recurrence (22) and establish an upper bound on |vS||v_{S}|. Naively bounding the terms in (22) will not suffice, as there are subtle cancellations that occur. To understand the nature of these cancellations, we will rewrite (22) in a different form. First we will group the terms in (22) by their type, defined as follows. We use \oplus to denote the XOR (symmetric difference) operation on sets.

Definition 3.6.

Fix SΩS\in\mathbb{N}^{\Omega}, viewed as a list S=(I1,,I|S|)S=(I_{1},\ldots,I_{|S|}) as per Definition 3.2. Define a pattern π=(π1,,π|S|)\pi=(\pi_{1},\ldots,\pi_{|S|}) to be an element of ([r]{})|S|([r]\cup\{\star\})^{|S|} satisfying the following rules:

  • (i) “No singletons”: if πd=j[r]\pi_{d}=j\in[r] then there must exist ddd^{\prime}\neq d such that πd=j\pi_{d^{\prime}}=j.

  • (ii) “Not all stars”: there must exist dd such that πd\pi_{d}\neq\star.

  • (iii) “Every column valid”: for every j[r]j\in[r], we have |d:πd=jId|k|\oplus_{d\,:\,\pi_{d}=j}I_{d}|\leq k.

Let Π(S)\Pi(S) denote the set of all such patterns. We let S(π)j:=d:πd=jIdS(\pi)_{j}:=\oplus_{d\,:\,\pi_{d}=j}I_{d} and define S(π)[n]×[r]S(\pi)\subseteq[n]\times[r] to have jjth column S(π)jS(\pi)_{j} for all j[r]j\in[r].

Note that if π\pi has no stars, it is simply a labeling [r]|S|\ell\in[r]^{|S|}, in which case the definitions S(π)S(\pi) and S()S(\ell) coincide. Intuitively, a pattern π\pi describes a class of possible labelings \ell, where the stars are “wildcards” that may represent any element of [r][r] (subject to some restrictions). More formally, we now define which \ell’s belong to a pattern π\pi.

Definition 3.7.

Fix SΩS\in\mathbb{N}^{\Omega}, viewed as a list S=(I1,,I|S|)S=(I_{1},\ldots,I_{|S|}). For [r]|S|\ell\in[r]^{|S|} and πΠ(S)\pi\in\Pi(S), write π\pi\vdash\ell if the following conditions hold:

  • (i) For each dd, if πd=j[r]\pi_{d}=j\in[r] then d=j\ell_{d}=j.

  • (ii) The “starred” columns :={d:πd=}\ell_{\star}:=\{\ell_{d}\,:\,\pi_{d}=\star\} are distinct.

  • (iii) For every jj\in\ell_{\star} we have S(π)j=S(\pi)_{j}=\emptyset.

Let (S)\mathcal{L}(S) denote the set of labelings [r]|S|\ell\in[r]^{|S|} for which there exists SΩS^{\prime}\in\mathbb{N}^{\Omega} such that |S|<|S||S^{\prime}|<|S| and 𝖼𝗈𝗅𝗌(S())=S\mathsf{cols}(S(\ell))=S^{\prime}; these are the \ell’s that contribute to the sum in (22). For any (S)\ell\in\mathcal{L}(S), there is at least one (and possibly more than one) pattern πΠ(S)\pi\in\Pi(S) for which π\pi\vdash\ell. The following inclusion-exclusion formula will allow us to sum over πΠ(S)\pi\in\Pi(S) in a way that counts every (S)\ell\in\mathcal{L}(S) exactly once.

Lemma 3.8.

Fix SΩS\in\mathbb{N}^{\Omega}, viewed as a list S=(I1,,I|S|)S=(I_{1},\ldots,I_{|S|}). Fix a function ϕ:[r]|S|\phi:[r]^{|S|}\to\mathbb{R}. For πΠ(S)\pi\in\Pi(S) and j[r]j\in[r], define

mπ(j)=|{d:πd=j and Id=S(π)j}|m_{\pi}(j)=|\{d\,:\,\pi_{d}=j\text{ and }I_{d}=S(\pi)_{j}\}|

and

mπ=j[r](1mπ(j)).m_{\pi}=\prod_{j\in[r]}(1-m_{\pi}(j)).

Then we have

(S)ϕ()=πΠ(S)mπ[r]|S|:πϕ().\sum_{\ell\in\mathcal{L}(S)}\phi(\ell)=\sum_{\pi\in\Pi(S)}m_{\pi}\sum_{\ell\in[r]^{|S|}\,:\,\pi\vdash\ell}\phi(\ell). (24)
Proof.

First note that π\pi\vdash\ell implies (S)\ell\in\mathcal{L}(S) and so there are no “extra” terms ϕ()\phi(\ell) on the right-hand side that are not present on the left-hand side. For any fixed (S)\ell\in\mathcal{L}(S), the term ϕ()\phi(\ell) appears exactly once on the left-hand side of (24). Our goal is to show that it also appears exactly once on the right-hand side, that is, it suffices to prove

πΠ(S):πmπ=1.\sum_{\pi\in\Pi(S)\,:\,\pi\vdash\ell}m_{\pi}=1.

For a fixed \ell, we need to enumerate the possible patterns π\pi for which π\pi\vdash\ell. The rules for these patterns are as follows:

  • (Case 1) For any jj, if there is exactly one dd for which d=j\ell_{d}=j then πd=\pi_{d}=\star. In this case, mπ(j)=0m_{\pi}(j)=0.

  • (Case 2) For any jj, if S()j=S(\ell)_{j}=\emptyset then there are no stars among {πd:d=j}\{\pi_{d}\,:\,\ell_{d}=j\}. In this case, mπ(j)=0m_{\pi}(j)=0.

  • (Case 3) For any jj not in Case 1, if S()jS(\ell)_{j}\neq\emptyset then among {πd:d=j}\{\pi_{d}\,:\,\ell_{d}=j\} there are either no stars or exactly one star of the form πd=\pi_{d}=\star where d=j\ell_{d}=j and Id=S()jI_{d}=S(\ell)_{j}. If there are no stars then mπ(j)=m(j):=|{d:d=j and Id=S()j}|m_{\pi}(j)=m_{\ell}(j):=|\{d\,:\,\ell_{d}=j\text{ and }I_{d}=S(\ell)_{j}\}|. There are m(j)m_{\ell}(j) ways to have one star, and each yields mπ(j)=0m_{\pi}(j)=0.

Now we have

π:πmπ=π:πj[r](1mπ(j))=j in Case 3[(1m(j))+m(j)(10)]=1\sum_{\pi\,:\,\pi\vdash\ell}m_{\pi}=\sum_{\pi\,:\,\pi\vdash\ell}\prod_{j\in[r]}(1-m_{\pi}(j))=\prod_{j\text{ in Case 3}}[(1-m_{\ell}(j))+m_{\ell}(j)(1-0)]=1

as desired. ∎

Using Lemma 3.8, we can rewrite (22) as

vS\displaystyle v_{S} =cSS:|S|<|S|vS[r]|S|λλS()r|S|¯ 1𝖼𝗈𝗅𝗌(S())=S\displaystyle=c_{S}-\sum_{S^{\prime}\,:\,|S^{\prime}|<|S|}v_{S^{\prime}}\sum_{\ell\in[r]^{|S|}}\frac{\lambda^{\ell}}{\lambda^{S(\ell)}r^{\underline{|S^{\prime}|}}}\,\mathbbm{1}_{\mathsf{cols}(S(\ell))=S^{\prime}}
=cSS:|S|<|S|vSπΠ(S)mπ:πλλS()r|S|¯ 1𝖼𝗈𝗅𝗌(S())=S.\displaystyle=c_{S}-\sum_{S^{\prime}\,:\,|S^{\prime}|<|S|}v_{S^{\prime}}\sum_{\pi\in\Pi(S)}m_{\pi}\sum_{\ell\,:\,\pi\vdash\ell}\frac{\lambda^{\ell}}{\lambda^{S(\ell)}r^{\underline{|S^{\prime}|}}}\,\mathbbm{1}_{\mathsf{cols}(S(\ell))=S^{\prime}}.

The number of labelings \ell such that π\pi\vdash\ell is (rπ)sπ¯(r_{\pi})^{\underline{s_{\pi}}} where rπr_{\pi} is the number of columns j[r]j\in[r] for which S(π)j=S(\pi)_{j}=\emptyset, and sπs_{\pi} is the number of stars in π\pi (i.e., the number of indices dd such that πd=\pi_{d}=\star). Note that the ratio λλS()\frac{\lambda^{\ell}}{\lambda^{S(\ell)}} depends only on π\pi (not on \ell), and so we define λ(π):=λλS()\lambda^{(\pi)}:=\frac{\lambda^{\ell}}{\lambda^{S(\ell)}}. Also, the predicate 𝖼𝗈𝗅𝗌(S())=S\mathsf{cols}(S(\ell))=S^{\prime} depends only on π\pi (not on \ell), and we write this predicate as S𝜋SS\overset{\pi}{\longrightarrow}S^{\prime}. With this notation, the recurrence for vSv_{S} becomes

vS=cSS:|S|<|S|vSπΠ(S)mπ(rπ)sπ¯λ(π)r|S|¯ 1S𝜋S.v_{S}=c_{S}-\sum_{S^{\prime}\,:\,|S^{\prime}|<|S|}v_{S^{\prime}}\sum_{\pi\in\Pi(S)}m_{\pi}\,(r_{\pi})^{\,\underline{s_{\pi}}}\,\frac{\lambda^{(\pi)}}{r^{\underline{|S^{\prime}|}}}\,\mathbbm{1}_{S\overset{\pi}{\longrightarrow}S^{\prime}}. (25)

3.8 Unravelling the Recurrence

Next we will rewrite (25) in closed form (without recursion) by expanding the recursion tree as a sum over “paths.”

We first unpack the definition (11) for cSc_{S}. Using (13) and recalling that AA has i.i.d. Rademacher entries,

cS=𝔼[TSa11]=[r]|S|λ𝔼[AS()a11]=[r]|S|λ 1S()={(1,1)}.c_{S}=\mathbb{E}[T^{S}\cdot a_{11}]=\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}\,\mathbb{E}[A^{S(\ell)}\cdot a_{11}]=\sum_{\ell\in[r]^{|S|}}\lambda^{\ell}\,\mathbbm{1}_{S(\ell)=\{(1,1)\}}. (26)

By expanding the recursion tree of (25) we can write vSv_{S} as a sum over paths which we denote by

S=S0π0S1π1πp1Spπp.S=S^{0}\overset{\pi^{0}}{\longrightarrow}S^{1}\overset{\pi^{1}}{\longrightarrow}\cdots\overset{\pi^{p-1}}{\longrightarrow}S^{p}\overset{\pi^{p}}{\longrightarrow}\,\perp. (27)

Formally, a path consists of S0S^{0} and π0,,πp\pi^{0},\ldots,\pi^{p} (which then determine S1,,SpS^{1},\ldots,S^{p}) with StΩS^{t}\in\mathbb{N}^{\Omega} for all tt, πtΠ(St)\pi^{t}\in\Pi(S^{t}) for tp1t\leq p-1, and πp[r]|Sp|\pi^{p}\in[r]^{|S^{p}|} (“no stars at the final step”) such that the following properties hold:

  • For all 0tp10\leq t\leq p-1, we require |St+1|<|St||S^{t+1}|<|S^{t}| and the predicate StπtSt+1S^{t}\overset{\pi^{t}}{\longrightarrow}S^{t+1} holds.

  • For the final step SpπpS^{p}\overset{\pi^{p}}{\longrightarrow}\,\perp we require Sp(πp)={(1,1)}S^{p}(\pi^{p})=\{(1,1)\}.

With this notation, (25) can be written as

vS=p0S=S0π0S1π1πp1Spπp(1)p(t=0p1mπt(rπt)sπt¯λ(πt)r|St+1|¯)λπp,v_{S}=\sum_{p\geq 0}\;\sum_{S=S^{0}\overset{\pi^{0}}{\longrightarrow}S^{1}\overset{\pi^{1}}{\longrightarrow}\,\cdots\,\overset{\pi^{p-1}}{\longrightarrow}S^{p}\overset{\pi^{p}}{\longrightarrow}\,\perp}(-1)^{p}\left(\prod_{t=0}^{p-1}m_{\pi^{t}}\,(r_{\pi^{t}})^{\,\underline{s_{\pi^{t}}}}\,\frac{\lambda^{(\pi^{t})}}{r^{\underline{|S^{t+1}|}}}\right)\lambda^{\pi^{p}}, (28)

where the special rule for the final step comes from (26).

3.9 Excluding Bad Paths

The next step is the crux of the argument: we will show that only certain types of paths contribute to (28), due to cancellations among the remaining paths.

Definition 3.9 (Event).

For a path of the form (27), we say an event occurs at timestep t{0,1,,p}t\in\{0,1,\ldots,p\} on column j[r]j\in[r] if there exists dd for which πdt=j\pi^{t}_{d}=j.

Note that Definition 3.6 requires every timestep tt to have an event on at least one column jj.

Definition 3.10 (Deletion event).

An event at timestep tt on column jj is called a deletion event if St(πt)j=S^{t}(\pi^{t})_{j}=\emptyset.

Definition 3.11 (Good/bad paths).

A path of the form (27) is called bad if there exists a column j[r]j\in[r] such that the last event (i.e., with the largest tt) on that column is a deletion event. If a path is not bad, it is called good.

Lemma 3.12.

The total contribution from bad paths to (28) is 0. That is,

vS=p0S=S0π0S1π1πp1Spπpgood(1)p(t=0p1mπt(rπt)sπt¯λ(πt)r|St+1|¯)λπp.v_{S}=\sum_{p\geq 0}\;\sum_{\begin{subarray}{c}S=S^{0}\overset{\pi^{0}}{\longrightarrow}S^{1}\overset{\pi^{1}}{\longrightarrow}\,\cdots\,\overset{\pi^{p-1}}{\longrightarrow}S^{p}\overset{\pi^{p}}{\longrightarrow}\,\perp\\ \text{good}\end{subarray}}(-1)^{p}\left(\prod_{t=0}^{p-1}m_{\pi^{t}}\,(r_{\pi^{t}})^{\,\underline{s_{\pi^{t}}}}\,\frac{\lambda^{(\pi^{t})}}{r^{\underline{|S^{t+1}|}}}\right)\lambda^{\pi^{p}}. (29)
Proof.

We will show that the bad paths can be paired up so that within each pair, the two paths contribute the same term to (28) but with opposite signs. The pairing is described by the following procedure, an involution that maps each bad path to its partner (and vice versa).

  1. (1) Given a bad path as input, let jj^{*} denote the largest column index for which the last event is a deletion event. Let tt^{*} denote the timestep on which this deletion event occurs.

  2. (2a) If there exists another event at timestep tt^{*} (on some column jjj\neq j^{*}), “promote” the (t,j)(t^{*},j^{*}) deletion event to its own timestep. That is, replace

    StπtbySt𝜏S𝜎\cdots\;S^{t^{*}}\overset{\pi^{t^{*}}}{\longrightarrow}\cdots\qquad\text{by}\qquad\cdots\;S^{t^{*}}\overset{\tau}{\longrightarrow}S^{\prime}\overset{\sigma}{\longrightarrow}\cdots

    where τ,σ\tau,\sigma are defined as follows. For all dd such that πdt=j\pi^{t^{*}}_{d}=j^{*}, set τd=j\tau_{d}=j^{*}; for all other dd, set τd=\tau_{d}=\star. Now SS^{\prime} (viewed as a list) is produced from StS^{t^{*}} by removing the elements IdI_{d} for which πdt=j\pi^{t^{*}}_{d}=j^{*}; similarly, σ\sigma is produced from πt\pi^{t^{*}} by removing the elements πdt\pi^{t^{*}}_{d} that are equal to jj^{*}.

  3. (2b) If instead there is no other event at timestep tt^{*} (which cannot happen if t=pt^{*}=p due to the non-deletion event on column 1), “merge” timestep tt^{*} with the subsequent timestep. That is, replace

    StπtSt+1πt+1bySt𝜏\cdots\;S^{t^{*}}\overset{\pi^{t^{*}}}{\longrightarrow}S^{t^{*}+1}\overset{\pi^{t^{*}+1}}{\longrightarrow}\cdots\qquad\text{by}\qquad\cdots\;S^{t^{*}}\overset{\tau}{\longrightarrow}\cdots

    where τ\tau is defined as follows. For all dd such that πdt=j\pi^{t^{*}}_{d}=j^{*}, set τd=j\tau_{d}=j^{*}. The number of remaining entries of τ\tau is exactly |St+1||S^{t^{*}+1}|, and we designate these entries as “unassigned”; for each 1i|St+1|1\leq i\leq|S^{t^{*}+1}|, set the iith unassigned entry of τ\tau to be πit+1\pi^{t^{*}+1}_{i}. Note that since the (t,j)(t^{*},j^{*}) event was the last event in its column, the merge operation will not cause it to “collide” with another event.

A few claims remain to be checked before the proof is complete. First note that the above procedure maps any bad path to a different bad path (its “partner”), and applying the procedure again on the partner recovers the original path. For instance, if applying the procedure to the original path resulted in a “promote” operation on column jj^{*}, applying the procedure on the partner will undo this using a “merge” operation on the same column jj^{*}.

We furthermore claim that for any bad path and its partner, both paths have the same value for the factor (t=0p1)λπp\left(\prod_{t=0}^{p-1}\cdots\right)\lambda^{\pi^{p}} in (28). However, the lengths of the two paths differ by one, causing the two corresponding terms in (28) to cancel due to the (1)p(-1)^{p} factor. To prove the claim, compare the cases StπtSt+1S^{t}\overset{\pi^{t}}{\longrightarrow}S^{t+1} and St𝜏S𝜎St+1S^{t}\overset{\tau}{\longrightarrow}S^{\prime}\overset{\sigma}{\longrightarrow}S^{t+1} from (2a) above, where for now we assume St+1S^{t+1}\neq\perp. For the mm factors, note that mπ(j)=0m_{\pi}(j)=0 whenever π\pi has either a deletion event on column jj or no event on column jj, and so mπt=mτmσm_{\pi^{t}}=m_{\tau}m_{\sigma}. For the rs¯r^{\underline{s}} factors, note that rπt=rσr_{\pi^{t}}=r_{\sigma} and sπt=sσs_{\pi^{t}}=s_{\sigma}; also, rτ=rr_{\tau}=r and sτ=|S|s_{\tau}=|S^{\prime}|, so (rτ)sτ¯(r_{\tau})^{\underline{s_{\tau}}} cancels with the existing factor of 1/r|S|¯1/r^{\underline{|S^{\prime}|}} in (28). Finally, for the λ\lambda factors we have λ(πt)=λ(τ)λ(σ)\lambda^{(\pi^{t})}=\lambda^{(\tau)}\lambda^{(\sigma)}. The other case StπtS^{t}\overset{\pi^{t}}{\longrightarrow}\,\perp versus St𝜏S𝜎S^{t}\overset{\tau}{\longrightarrow}S^{\prime}\overset{\sigma}{\longrightarrow}\,\perp is treated similarly, where now we have mτ=1m_{\tau}=1, (rτ)sτ¯=r|S|¯(r_{\tau})^{\underline{s_{\tau}}}=r^{\underline{|S^{\prime}|}}, and λπt=λ(τ)λσ\lambda^{\pi^{t}}=\lambda^{(\tau)}\lambda^{\sigma}. This completes the proof. ∎

3.10 Bounding |vS||v_{S}|

Now that we have identified the crucial cancellations between pairs of bad paths, the rest of the proof will follow by bounding the terms in (29) in a straightforward way. We start by collecting some bounds on the individual pieces of (29).

Lemma 3.13.

For any step S𝜋SS\overset{\pi}{\longrightarrow}S^{\prime}, |mπ|2|S||S||m_{\pi}|\leq 2^{|S|-|S^{\prime}|}.

Proof.

Note that

|mπ(j)||{d:πd=j}|=:|π1(j)|.|m_{\pi}(j)|\leq|\{d\,:\,\pi_{d}=j\}|=:|\pi^{-1}(j)|.

The number of stars plus the number of distinct columns in π\pi must be at least |S||S^{\prime}|. This leaves at most |S||S||S|-|S^{\prime}| entries of π\pi that repeat a previous column, i.e.,

j[r]:|π1(j)|2(|π1(j)|1)|S||S|.\sum_{j\in[r]\,:\,|\pi^{-1}(j)|\geq 2}(|\pi^{-1}(j)|-1)\leq|S|-|S^{\prime}|. (30)

This means

|mπ|=j[r]|mπ(j)1|\displaystyle|m_{\pi}|=\prod_{j\in[r]}|m_{\pi}(j)-1| j[r]:|mπ(j)|2(mπ(j)1)j[r]:|π1(j)|2(|π1(j)|1)\displaystyle\leq\prod_{j\in[r]\,:\,|m_{\pi}(j)|\geq 2}(m_{\pi}(j)-1)\leq\prod_{j\in[r]\,:\,|\pi^{-1}(j)|\geq 2}(|\pi^{-1}(j)|-1)
j[r]:|π1(j)|22(|π1(j)|1)=2j[r]:|π1(j)|2(|π1(j)|1)\displaystyle\leq\prod_{j\in[r]\,:\,|\pi^{-1}(j)|\geq 2}2^{(|\pi^{-1}(j)|-1)}=2^{\sum_{j\in[r]\,:\,|\pi^{-1}(j)|\geq 2}(|\pi^{-1}(j)|-1)}
2|S||S|\displaystyle\leq 2^{|S|-|S^{\prime}|}

where the final step uses (30). ∎

Lemma 3.14.

For any step S𝜋SS\overset{\pi}{\longrightarrow}S^{\prime}, |λ(π)|1|\lambda^{(\pi)}|\leq 1.

Proof.

Recall λ(π):=λλS()\lambda^{(\pi)}:=\frac{\lambda^{\ell}}{\lambda^{S(\ell)}} for any \ell such that π\pi\vdash\ell. Recall that λS()\lambda^{S(\ell)} is the product of λj\lambda_{j} over the non-empty columns jj of S()S(\ell). For any such non-empty column jj, there must exist dd for which d=j\ell_{d}=j. Thus, every factor of λj\lambda_{j} in the denominator of λλS()\frac{\lambda^{\ell}}{\lambda^{S(\ell)}} is cancelled by the numerator, and the result now follows because |λj|1|\lambda_{j}|\leq 1. ∎

We now state the main conclusion of this section.

Lemma 3.15.

For any SΩS\in\mathbb{N}^{\Omega} with |S|1|S|\geq 1, we have |vS|(3|S|2)|S||v_{S}|\leq(3|S|^{2})^{|S|}.

Note that for |S|=0|S|=0, it can be verified directly that v=0v_{\emptyset}=0.

Proof.

Proceed by induction on |S||S|. We will bound the sum of absolute values of terms in (29); it will no longer be important to exploit cancellations between positive and negative terms. First consider paths of the form Sπ0S\overset{\pi^{0}}{\longrightarrow}\,\perp. There is at most one such path that is good, namely π0=(1,1,,1)\pi^{0}=(1,1,\ldots,1); the value of this term is |λπ0|1|\lambda^{\pi^{0}}|\leq 1.

All remaining paths take the form Sπ0S1S\overset{\pi^{0}}{\longrightarrow}S^{1}\cdots where |S1|=i|S^{1}|=i for some 1i|S|11\leq i\leq|S|-1. In order to produce S1S^{1}, π0\pi^{0} must have exactly ii entries dd that are either stars (i.e., πd0=\pi^{0}_{d}=\star) or first in a “combination” event (i.e., for some j[r]j\in[r] with S(π0)jS(\pi^{0})_{j}\neq\emptyset, dd is the lowest index such that πd0=j\pi^{0}_{d}=j). There are (|S|i)\binom{|S|}{i} choices for which ii elements of π0\pi^{0} play these roles; by default they will all be stars, and will be converted to “combinations” if another entry of π0\pi^{0} decides to join them on the same column.

Now there are |S|i|S|-i entries of π0\pi^{0} remaining, which have a few options. One option is to participate in a combination event by joining one of the ii previously designated entries on the same column. The other option is to participate in a deletion event. Since we are only counting good paths, this can only happen on a column on which a later event will occur. Regardless of the remainder of the path S1π1S^{1}\overset{\pi^{1}}{\longrightarrow}\cdots\,\perp, there are at most |S1|=i|S^{1}|=i such columns available. Since each of the |S|i|S|-i remaining entries of π0\pi^{0} has at most 2i2i choices, this gives a total of at most (2i)|S|i(2i)^{|S|-i} choices.

We also need to decide which columns the combination events occur on. If π0\pi^{0} has sπ0s_{\pi^{0}} stars and isπ0i-s_{\pi^{0}} combination events, there are risπ0¯r^{\underline{i-s_{\pi^{0}}}} choices for the columns. Note that this exactly cancels the factor (rπ0)sπ0¯/r|S1|¯(r_{\pi^{0}})^{\underline{s_{\pi^{0}}}}/r^{\underline{|S^{1}|}} in (29) because rπ0=r(isπ0)r_{\pi^{0}}=r-(i-s_{\pi^{0}}) and |S1|=i|S^{1}|=i.

Recall that we aim to show |vS|b(|S|)|v_{S}|\leq b(|S|) where b(d):=(3d2)db(d):=(3d^{2})^{d}. Plugging the above calculations (along with Lemmas 3.13 and 3.14) into (29) and using the induction hypothesis to handle the remainder of the path S1π1S^{1}\overset{\pi^{1}}{\longrightarrow}\cdots\,\perp, we have

|vS|\displaystyle|v_{S}| 1+i=1|S|1b(i)(|S|i)(2i)|S|i 2|S|i\displaystyle\leq 1+\sum_{i=1}^{|S|-1}b(i)\binom{|S|}{i}(2i)^{|S|-i}\,2^{|S|-i}
=1+i=1|S|1(3i2)i(|S|i)(4i)|S|i.\displaystyle=1+\sum_{i=1}^{|S|-1}(3i^{2})^{i}\binom{|S|}{i}(4i)^{|S|-i}.
At this point we can verify the case |S|=1|S|=1 directly. Assuming |S|2|S|\geq 2, we continue:
1+i=1|S|1(|S|i)(3(|S|1)2)i(4(|S|1))|S|i\displaystyle\leq 1+\sum_{i=1}^{|S|-1}\binom{|S|}{i}(3(|S|-1)^{2})^{i}(4(|S|-1))^{|S|-i}
=1+i=0|S|(|S|i)(3(|S|1)2)i(4(|S|1))|S|i(4(|S|1))|S|(3(|S|1)2)|S|\displaystyle=1+\sum_{i=0}^{|S|}\binom{|S|}{i}(3(|S|-1)^{2})^{i}(4(|S|-1))^{|S|-i}-(4(|S|-1))^{|S|}-(3(|S|-1)^{2})^{|S|}
i=0|S|(|S|i)(3(|S|1)2)i(4(|S|1))|S|i\displaystyle\leq\sum_{i=0}^{|S|}\binom{|S|}{i}(3(|S|-1)^{2})^{i}(4(|S|-1))^{|S|-i}
=[3(|S|1)2+4(|S|1)]|S|\displaystyle=[3(|S|-1)^{2}+4(|S|-1)]^{|S|}
=(3|S|26|S|+3+4|S|4)|S|\displaystyle=(3|S|^{2}-6|S|+3+4|S|-4)^{|S|}
(3|S|2)|S|,\displaystyle\leq(3|S|^{2})^{|S|},

completing the proof. ∎
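
As a purely numerical illustration (not needed for the argument), the chain of inequalities above amounts to the closed-form bound 1 + \sum_{i=1}^{d-1}(3i^2)^i\binom{d}{i}(4i)^{d-i} \leq (3d^2)^d for d = |S| \geq 2. The short Python script below, with helper names lhs and rhs of our choosing, spot-checks this inequality for small values of d.

```python
# Sanity check (ours) of the numerical inequality behind the inductive step of
# Lemma 3.15: 1 + sum_{i=1}^{d-1} (3 i^2)^i * C(d, i) * (4 i)^{d-i} <= (3 d^2)^d.
from math import comb

def lhs(d):
    return 1 + sum((3 * i**2)**i * comb(d, i) * (4 * i)**(d - i) for i in range(1, d))

def rhs(d):
    return (3 * d**2)**d

for d in range(2, 15):
    assert lhs(d) <= rhs(d), f"inequality fails at d={d}"
print("inequality verified for d = 2, ..., 14")
```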

3.11 Putting it Together

We now combine (23) with Lemma 3.15 to complete the proof of Theorem 2.1. We first note that vS0v_{S}\neq 0 only when the elements of SS (viewed as a multi-set) together with {1}\{1\} form an even cover of [n][n].

Lemma 3.16.

Let P(S)P(S) denote the property that S()={(1,1)}S(\ell)=\{(1,1)\} for =(1,1,,1)\ell=(1,1,\ldots,1). If P(S)P(S) fails to hold then vS=0v_{S}=0.

Proof.

From (26), note that if cS0c_{S}\neq 0 then S()={(1,1)}S(\ell)=\{(1,1)\} for some \ell, which implies S()={(1,1)}S(\ell)=\{(1,1)\} for =(1,1,,1)\ell=(1,1,\ldots,1). Thus cS=0c_{S}=0 whenever P(S)P(S) fails.

Also note that if P(S)P(S) fails and 𝖼𝗈𝗅𝗌(S())=S\mathsf{cols}(S(\ell))=S^{\prime} for some \ell, then P(S)P(S^{\prime}) fails. The result now follows from (22) using induction on |S||S|. ∎

Lemma 3.17.

For any d1d\geq 1, the number of multi-sets SΩS\in\mathbb{N}^{\Omega} with |S|=d|S|=d such that P(S)P(S) holds is at most n(kd1)/2((kd+3)/2)kdn^{(kd-1)/2}((kd+3)/2)^{kd}.

Proof.

Since P(S)P(S) holds, SS together with {1}\{1\} forms an even cover of [n][n]. Therefore the number of elements of [n]{1}[n]\setminus\{1\} covered by SS is at most (kd1)/2(kd-1)/2. The number of ways to choose this many elements is at most n(kd1)/2n^{(kd-1)/2}. Once these are chosen, SS has at most (kd1)/2+1=(kd+1)/2(kd-1)/2+1=(kd+1)/2 elements to draw from, so the number of possibilities for SS is at most ((kd+1)/2+1)kd=((kd+3)/2)kd((kd+1)/2+1)^{kd}=((kd+3)/2)^{kd}. ∎

Proof of Theorem 2.1.

Starting from (23) and using Lemmas 3.15, 3.16, and 3.17,

𝖢𝗈𝗋𝗋D2\displaystyle\mathsf{Corr}_{\leq D}^{2} |S|DvS2λmin2|S|r|S|¯\displaystyle\leq\sum_{|S|\leq D}\frac{v_{S}^{2}}{\lambda_{\min}^{2|S|}\,r^{\underline{|S|}}}
d=1Dn(kd1)/2((kd+3)/2)kd(3d2)2dλmin2d(rd+1)d\displaystyle\leq\sum_{d=1}^{D}n^{(kd-1)/2}((kd+3)/2)^{kd}\frac{(3d^{2})^{2d}}{\lambda_{\min}^{2d}(r-d+1)^{d}}
=n1/2d=1D(nk/2((kd+3)/2)k(3d2)2λmin2(rd+1))d\displaystyle=n^{-1/2}\sum_{d=1}^{D}\left(\frac{n^{k/2}((kd+3)/2)^{k}(3d^{2})^{2}}{\lambda_{\min}^{2}(r-d+1)}\right)^{d}
n1/2d=1D(9D4(k(D+1)/2)knk/2λmin2(rD))d\displaystyle\leq n^{-1/2}\sum_{d=1}^{D}\left(9D^{4}(k(D+1)/2)^{k}\cdot\frac{n^{k/2}}{\lambda_{\min}^{2}(r-D)}\right)^{d}
n1/2d=1D(9D4(kD)k19nk/218λmin2r)d\displaystyle\leq n^{-1/2}\sum_{d=1}^{D}\left(9D^{4}(kD)^{k}\cdot\frac{19n^{k/2}}{18\lambda_{\min}^{2}r}\right)^{d}
since (6) implies Dr/19D\leq r/19
=n1/2d=1D(192kkDk+4nk/2λmin2r)d\displaystyle=n^{-1/2}\sum_{d=1}^{D}\left(\frac{19}{2}\,k^{k}D^{k+4}\cdot\frac{n^{k/2}}{\lambda_{\min}^{2}r}\right)^{d}
n1/2d=1D(12)d\displaystyle\leq n^{-1/2}\sum_{d=1}^{D}\left(\frac{1}{2}\right)^{d}
where we have used the assumption (6)
n1/2.\displaystyle\leq n^{-1/2}.

Using Fact 3.1, this completes the proof. ∎

4 Proof of Theorem 2.2: Upper Bound

4.1 Expander Graphs

We begin by collecting some standard properties of expander graphs.

Proposition 4.1.

Fix an integer k3k\geq 3. For all even NN exceeding a constant N0=N0(k)N_{0}=N_{0}(k), there exists a kk-regular NN-vertex (simple) graph with the following properties:

  • the minimum cut value is kk (achieved by a single vertex), and

  • for any SV(G)S\subseteq V(G) with 0<|S|N/20<|S|\leq N/2, |S|c|S||\partial S|\geq c|S|,

where S\partial S is the set of edges with exactly one endpoint in SS, and c0.08c\geq 0.08 is an absolute constant.

Proof.

It follows from classical results that a uniformly random kk-regular NN-vertex graph has these properties with high probability, i.e., probability 1o(1)1-o(1) as NN\to\infty with kk fixed. Letting GG be such a graph, it is well-known that GG is kk-connected with high probability [Bol98, Section 7.6], which proves the first statement about the minimum cut.

For the second statement, let k=μ1μ2μNk=\mu_{1}\geq\mu_{2}\geq\cdots\geq\mu_{N} denote the eigenvalues of the adjacency matrix of GG. Friedman’s second eigenvalue theorem [Fri08] states that for any fixed ϵ>0\epsilon>0, μ22k1+ϵ\mu_{2}\leq 2\sqrt{k-1}+\epsilon with high probability. Cheeger’s inequality [Dod84, AM85] (see Theorem 2.4 of [HLW06]) tells us that for any SV(G)S\subseteq V(G) with 0<|S|N/20<|S|\leq N/2, |S||S|12(kμ2)\frac{|\partial S|}{|S|}\geq\frac{1}{2}(k-\mu_{2}). Combining these gives

|S||S|2(kμ2)|S|2(k2k1ϵ)|\partial S|\geq\frac{|S|}{2}(k-\mu_{2})\geq\frac{|S|}{2}(k-2\sqrt{k-1}-\epsilon)

which concludes the proof for any choice of cc satisfying 0<c<12(k2k1)0<c<\frac{1}{2}(k-2\sqrt{k-1}). The expression 12(k2k1)\frac{1}{2}(k-2\sqrt{k-1}) is minimized (over k3k\geq 3) when k=3k=3. ∎
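
As an aside, the spectral facts invoked in this proof are easy to observe numerically. The following Python sketch (parameters of our choosing, using the networkx library) samples a random kk-regular graph, computes μ2\mu_{2}, and reports the edge expansion (kμ2)/2(k-\mu_{2})/2 guaranteed by the Cheeger-type bound; it is an illustration only, not part of the proof.

```python
# Illustration (our parameters) of the spectral argument in Proposition 4.1:
# for a random k-regular graph, mu_2 is close to 2*sqrt(k-1), and Cheeger's
# inequality then guarantees |dS|/|S| >= (k - mu_2)/2 for all |S| <= N/2.
import numpy as np
import networkx as nx

k, N = 3, 500                                   # N even; k = 3 is the worst case
G = nx.random_regular_graph(k, N, seed=0)
eigs = np.sort(np.linalg.eigvalsh(nx.to_numpy_array(G)))[::-1]
mu2 = eigs[1]
print(f"mu_2 = {mu2:.3f}   (Friedman bound ~ 2*sqrt(k-1) = {2*np.sqrt(k-1):.3f})")
print(f"guaranteed edge expansion (k - mu_2)/2 = {(k - mu2)/2:.3f}")
```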

4.2 Constructing the Polynomial

Let N=D1N=D-1 where DD is defined as in Theorem 2.2, choosing n0n_{0} large enough so that NN0N\geq N_{0}. From this point onward, let GG denote the kk-regular NN-vertex graph guaranteed by Proposition 4.1. We construct a new graph HH as follows. Starting with GG, add two additional vertices called \circ and uu, and add the edge (,u)(\circ,u). Recall that kk^{\prime} is the odd element of {k1,k}\{k-1,k\}. Choose p:=(k1)/2p:=(k^{\prime}-1)/2 arbitrary edges (i1,j1),,(ip,jp)(i_{1},j_{1}),\ldots,(i_{p},j_{p}) of GG with no endpoints in common. Delete these pp edges and add the edges (u,i1),(u,j1),,(u,ip),(u,jp)(u,i_{1}),(u,j_{1}),\ldots,(u,i_{p}),(u,j_{p}). This completes the description of HH. Note that HH is kk-regular aside from the degree-1 vertex \circ and the degree-kk^{\prime} vertex uu.

Definition 4.2.

Define an edge-labeling of HH to be a function ϕ:E(H)[n]\phi:E(H)\to[n] that is injective (no two edges get the same label) with ϕ(,u)=1\phi(\circ,u)=1. Let Φ\Phi denote the set of all edge-labelings of HH.

For an edge-labeling ϕ\phi and a vertex vV(H){}v\in V(H)\setminus\{\circ\}, define Tv(ϕ)T_{v}(\phi) to be the following entry of the input tensor: let e1,,eme_{1},\ldots,e_{m} be the edges incident to vv (where m{k,k}m\in\{k,k^{\prime}\}) and then let Tv(ϕ):=Tϕ(e1),,ϕ(em)T_{v}(\phi):=T_{\phi(e_{1}),\ldots,\phi(e_{m})}. Our polynomial estimator is defined as follows:

f(T)=1|Φ|ϕΦvV(H){}Tv(ϕ).f(T)=\frac{1}{|\Phi|}\sum_{\phi\in\Phi}\,\prod_{v\in V(H)\setminus\{\circ\}}T_{v}(\phi). (31)
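
To make (31) concrete, the following Python sketch implements a toy version of the estimator for k=3k=3. It is illustrative only: we substitute the complete graph K_4 for the expander GG, replace the exact average over Φ\Phi with Monte Carlo sampling of injective edge-labelings, and choose λ_1 = 1 with the remaining λ_j = 0.4 of our own choosing; the helper names planted_tensor, build_H, and estimate_f are ours. At this toy scale (and with sampling error) the output only roughly tracks a_{11}.

```python
# A toy sketch (ours) of the estimator (31) for k = 3. The expander G is
# replaced by the 3-regular graph K_4, and the exact average over all
# edge-labelings Phi is replaced by Monte Carlo sampling, so the output only
# roughly tracks a_{11} at this small scale.
import random
import numpy as np
import networkx as nx

def planted_tensor(n, r, lam, rng):
    """T = sum_j lam[j] * a_j (x) a_j (x) a_j with each a_j uniform in {-1,+1}^n."""
    A = rng.choice([-1.0, 1.0], size=(n, r))
    T = np.zeros((n, n, n))
    for j in range(r):
        a = A[:, j]
        T += lam[j] * np.einsum('i,j,k->ijk', a, a, a)
    return T, A

def build_H():
    """Graph H for k = 3: start from G = K_4, delete one edge (i1, j1), then
    add a vertex u adjacent to i1, j1, and to a pendant vertex 'o'."""
    H = nx.complete_graph(4)
    i1, j1 = next(iter(H.edges()))
    H.remove_edge(i1, j1)
    H.add_edges_from([('u', i1), ('u', j1), ('o', 'u')])
    return H

def estimate_f(T, H, samples=100000, seed=0):
    """Monte Carlo version of (31): average the product of tensor entries over
    random injective edge-labelings phi, with phi(o,u) fixed to index 0."""
    rng = random.Random(seed)
    n = T.shape[0]
    other_edges = [e for e in H.edges() if set(e) != {'o', 'u'}]
    total = 0.0
    for _ in range(samples):
        labels = rng.sample(range(1, n), len(other_edges))  # distinct, never 0
        phi = {frozenset(e): lab for e, lab in zip(other_edges, labels)}
        phi[frozenset(('o', 'u'))] = 0
        prod = 1.0
        for v in H.nodes():
            if v == 'o':
                continue
            idx = tuple(phi[frozenset((v, w))] for w in H.neighbors(v))
            prod *= T[idx]
        total += prod
    return total / samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, r = 60, 8
    lam = np.array([1.0] + [0.4] * (r - 1))   # lambda_1 = 1, the rest smaller
    T, A = planted_tensor(n, r, lam, rng)
    print("estimate of a_11:", round(estimate_f(T, build_H()), 3), " truth:", A[0, 0])
```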

4.3 Vertex Labelings

Definition 4.3.

Define a vertex-labeling of HH to be a function ψ:V(H){}[r]\psi:V(H)\setminus\{\circ\}\to[r]. Let Ψ\Psi denote the set of all vertex-labelings of HH.

For ϕΦ\phi\in\Phi, ψΨ\psi\in\Psi, and vV(H){}v\in V(H)\setminus\{\circ\}, define Tv(ϕ,ψ)T_{v}(\phi,\psi) as follows: letting e1,,eme_{1},\ldots,e_{m} be the edges incident to vv, and j:=ψ(v)j:=\psi(v), let Tv(ϕ,ψ):=λji=1m(aj)ϕ(ei)T_{v}(\phi,\psi):=\lambda_{j}\prod_{i=1}^{m}(a_{j})_{\phi(e_{i})}. Recalling (5), we can expand (31) as

f(T)\displaystyle f(T) =1|Φ|ϕΦψΨvV(H){}Tv(ϕ,ψ)\displaystyle=\frac{1}{|\Phi|}\sum_{\phi\in\Phi}\,\sum_{\psi\in\Psi}\,\prod_{v\in V(H)\setminus\{\circ\}}T_{v}(\phi,\psi)
=ψΨ1|Φ|ϕΦvV(H){}Tv(ϕ,ψ).\displaystyle=\sum_{\psi\in\Psi}\frac{1}{|\Phi|}\sum_{\phi\in\Phi}\,\prod_{v\in V(H)\setminus\{\circ\}}T_{v}(\phi,\psi).

We will break the above sum into different terms depending on ψ\psi. Define the partition Ψ=Ψ1Ψ2Ψ3\Psi=\Psi_{1}\sqcup\Psi_{2}\sqcup\Psi_{3} as follows:

  • Ψ1={ψ1}\Psi_{1}=\{\psi_{1}\} where ψ1\psi_{1} denotes the all-ones labeling: ψ1(v)=1\psi_{1}(v)=1 for all vv,

  • Ψ2={ψ2,,ψr}\Psi_{2}=\{\psi_{2},\ldots,\psi_{r}\} where ψj\psi_{j} denotes the all-jj’s labeling: ψj(v)=j\psi_{j}(v)=j for all vv,

  • Ψ3=Ψ(Ψ1Ψ2)\Psi_{3}=\Psi\setminus(\Psi_{1}\cup\Psi_{2}).

We can now write f=f1+f2+f3f=f_{1}+f_{2}+f_{3} where for i{1,2,3}i\in\{1,2,3\},

fi:=ψΨi1|Φ|ϕΦvV(H){}Tv(ϕ,ψ).f_{i}:=\sum_{\psi\in\Psi_{i}}\frac{1}{|\Phi|}\sum_{\phi\in\Phi}\,\prod_{v\in V(H)\setminus\{\circ\}}T_{v}(\phi,\psi).

4.4 Signal Term

We first handle the terms f1f_{1} and f2f_{2}.

Lemma 4.4.

f1=a11f_{1}=a_{11}.

Proof.

We have

f1=1|Φ|ϕΦvV(H){}Tv(ϕ,ψ1).f_{1}=\frac{1}{|\Phi|}\sum_{\phi\in\Phi}\,\prod_{v\in V(H)\setminus\{\circ\}}T_{v}(\phi,\psi_{1}).

Recalling that aja_{j} has {±1}\{\pm 1\}-valued entries and λ1=1\lambda_{1}=1, note that for any ϕΦ\phi\in\Phi,

vV(H){}Tv(ϕ,ψ1)=a11,\prod_{v\in V(H)\setminus\{\circ\}}T_{v}(\phi,\psi_{1})=a_{11},

because each edge eE(H)e\in E(H) contributes a factor of (a1)ϕ(e)2=1(a_{1})_{\phi(e)}^{2}=1 except the edge (,u)(\circ,u), which is required by Definition 4.2 to have ϕ(,u)=1\phi(\circ,u)=1 and thus contributes a factor of a11a_{11}. The result follows. ∎

Lemma 4.5.

𝔼[f22](r1)|λ2|2(N+1)\mathbb{E}[f_{2}^{2}]\leq(r-1)\cdot|\lambda_{2}|^{2(N+1)}.

Proof.

Similarly to the proof of Lemma 4.4,

f2=j=2rλj|V(H)|1(aj)1.f_{2}=\sum_{j=2}^{r}\lambda_{j}^{|V(H)|-1}(a_{j})_{1}.

Note that |V(H)|1=N+1|V(H)|-1=N+1 where, recall, N=|V(G)|N=|V(G)|. Now compute

𝔼[f22]\displaystyle\mathbb{E}[f_{2}^{2}] =j=2rj=2rλjN+1λjN+1𝔼[(aj)1(aj)1]\displaystyle=\sum_{j=2}^{r}\,\sum_{j^{\prime}=2}^{r}\lambda_{j}^{N+1}\lambda_{j^{\prime}}^{N+1}\,\mathbb{E}[(a_{j})_{1}(a_{j^{\prime}})_{1}]
=j=2rλj2(N+1)\displaystyle=\sum_{j=2}^{r}\lambda_{j}^{2(N+1)}
(r1)|λ2|2(N+1),\displaystyle\leq(r-1)\cdot|\lambda_{2}|^{2(N+1)},

completing the proof. ∎
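
As a quick sanity check (ours, and not needed for the proof), the identity 𝔼[f_2^2] = \sum_{j\geq 2}\lambda_j^{2(N+1)} computed above is easy to confirm by simulation, since the entries (a_j)_1 are i.i.d. Rademacher; the parameter values below are our own.

```python
# Simulation (ours) of the identity used in Lemma 4.5:
# E[f_2^2] = sum_{j>=2} lambda_j^{2(N+1)} when the (a_j)_1 are i.i.d. Rademacher.
import numpy as np

rng = np.random.default_rng(0)
r, N = 20, 6
lam = np.full(r, 0.7)
lam[0] = 1.0
eps = rng.choice([-1.0, 1.0], size=(200000, r - 1))   # (a_j)_1 for j = 2, ..., r
f2 = eps @ lam[1:] ** (N + 1)
print("empirical E[f_2^2]:", round(float(np.mean(f2 ** 2)), 4),
      "  exact:", round(float(np.sum(lam[1:] ** (2 * (N + 1)))), 4))
```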

4.5 Noise Term

We now handle the term f3f_{3}. This section is devoted to proving the following.

Lemma 4.6.

Under the assumptions of Theorem 2.2, 𝔼[f32]4kk1D52krnk1𝟙k even\mathbb{E}[f_{3}^{2}]\leq 4k^{k-1}D^{52k}\frac{r}{n^{k-1-\mathbbm{1}_{k\text{ even}}}}.

To compute 𝔼[f32]\mathbb{E}[f_{3}^{2}], it will help to introduce an auxiliary graph H~\tilde{H} defined as follows. Start with two disjoint copies of HH, called HH and HH^{\prime}. Delete the vertices \circ and \circ^{\prime} and connect the two leftover half-edges to form the edge (u,u)(u,u^{\prime}). This completes the description of H~\tilde{H}.

The vertices of H~\tilde{H} can be partitioned as V(H~)={u,u}VVV(\tilde{H})=\{u,u^{\prime}\}\sqcup V\sqcup V^{\prime} where VV comes from the copy of GG in HH, and VV^{\prime} from HH^{\prime}. Similarly, the edges of H~\tilde{H} can be partitioned as E(H~)={(u,u)}EEE(\tilde{H})=\{(u,u^{\prime})\}\sqcup E\sqcup E^{\prime} where EE comes from HH, and EE^{\prime} from HH^{\prime}.

Definition 4.7.

Define an edge-labeling of H~\tilde{H} to be a function ϕ:E(H~)[n]\phi:E(\tilde{H})\to[n] such that ϕ(u,u)=1\phi(u,u^{\prime})=1, no other edge has the label 1, no two edges in EE (defined above) have the same label, and no two edges in EE^{\prime} have the same label. Let Φ~\tilde{\Phi} denote the set of all edge-labelings of H~\tilde{H}.

Definition 4.8.

Define a vertex-labeling of H~\tilde{H} to be a function ψ:V(H~)[r]\psi:V(\tilde{H})\to[r] such that ψ\psi takes at least two different values within V{u}V\cup\{u\} (defined above), and ψ\psi takes at least two different values within V{u}V^{\prime}\cup\{u^{\prime}\}. Let Ψ~\tilde{\Psi} denote the set of all vertex-labelings of H~\tilde{H}.

For ϕΦ~\phi\in\tilde{\Phi}, ψΨ~\psi\in\tilde{\Psi}, and vV(H~)v\in V(\tilde{H}), define Tv(ϕ,ψ)T_{v}(\phi,\psi) similarly to above: letting e1,,eme_{1},\ldots,e_{m} be the edges incident to vv, and j:=ψ(v)j:=\psi(v), let Tv(ϕ,ψ):=λji=1m(aj)ϕ(ei)T_{v}(\phi,\psi):=\lambda_{j}\prod_{i=1}^{m}(a_{j})_{\phi(e_{i})}.

With the above definitions in hand, and recalling that Ψ3\Psi_{3} is the set of vertex-labelings of HH that take at least two different values, we can write

f32=1|Φ|2ϕΦ~ψΨ~vV(H~)Tv(ϕ,ψ).f_{3}^{2}=\frac{1}{|\Phi|^{2}}\sum_{\phi\in\tilde{\Phi}}\,\sum_{\psi\in\tilde{\Psi}}\,\prod_{v\in V(\tilde{H})}T_{v}(\phi,\psi). (32)

Only certain “valid” pairs (ϕ,ψ)(\phi,\psi) yield a term with nonzero expectation.

Definition 4.9.

For ϕΦ~\phi\in\tilde{\Phi} and ψΨ~\psi\in\tilde{\Psi}, we say (ϕ,ψ)(\phi,\psi) is valid if the following holds: for each i[n]i\in[n] and j[r]j\in[r], there is an even number of edges eE(H~)e\in E(\tilde{H}) with the property that ϕ(e)=i\phi(e)=i and exactly one endpoint of ee has vertex-label jj. In fact, this even number must be either 0 or 2 because, by Definition 4.7, only two edges can share the same label ii.

Note that valid pairs (ϕ,ψ)(\phi,\psi) are precisely those for which the corresponding term in (32) has an even number of factors of each (aj)i(a_{j})_{i}. We can now write

𝔼[f32]\displaystyle\mathbb{E}[f_{3}^{2}] =1|Φ|2(ϕ,ψ) valid𝔼vV(H~)Tv(ϕ,ψ)\displaystyle=\frac{1}{|\Phi|^{2}}\sum_{(\phi,\psi)\text{ valid}}\mathbb{E}\prod_{v\in V(\tilde{H})}T_{v}(\phi,\psi)
=1|Φ|2(ϕ,ψ) validvV(H~)λψ(v).\displaystyle=\frac{1}{|\Phi|^{2}}\sum_{(\phi,\psi)\text{ valid}}\,\prod_{v\in V(\tilde{H})}\lambda_{\psi(v)}. (33)
Definition 4.10.

For ψΨ~\psi\in\tilde{\Psi}, a region is the preimage under ψ\psi of some j[r]j\in[r]. In other words, a region consists of all vertices of H~\tilde{H} that have a particular label.

For a valid (ϕ,ψ)(\phi,\psi) pair, let RR denote the number of non-empty regions. By Definition 4.8 we must have R2R\geq 2. Since (u,u)(u,u^{\prime}) is the only edge with label 11 (Definition 4.7) and (ϕ,ψ)(\phi,\psi) is valid, uu and uu^{\prime} must belong to the same region; call this region 1, and number the other non-empty regions 2,,R2,\ldots,R. For 1iR1\leq i\leq R, let sis_{i} denote the number of vertices in VV that belong to region ii and let sis^{\prime}_{i} denote the number of vertices in VV^{\prime} that belong to region ii. Let s¯i=min{si,Nsi}\bar{s}_{i}=\min\{s_{i},N-s_{i}\} and s¯i=min{si,Nsi}\bar{s}^{\prime}_{i}=\min\{s^{\prime}_{i},N-s^{\prime}_{i}\} where, recall, N=|V|=|V|N=|V|=|V^{\prime}|. For 1iR1\leq i\leq R, let i\ell_{i} denote the number of edges in EE that “cross” region ii (i.e., have exactly one endpoint in region ii). Since the edges of H~\tilde{H} crossing region ii must be paired up with each pair having the same edge-label (Definition 4.9), and edge-labels cannot repeat within EE or EE^{\prime} (Definition 4.7), i\ell_{i} must also be equal to the number of edges in EE^{\prime} that cross region ii. The total number of cross-edges (i.e., edges of H~\tilde{H} whose endpoints have different vertex-labels) is =i=1Ri\ell=\sum_{i=1}^{R}\ell_{i}. Note that (u,u)(u,u^{\prime}) is never a cross-edge since both its endpoints belong to region 1. As a consequence of the above discussion, every non-empty region must include at least one vertex from both V{u}V\cup\{u\} and V{u}V^{\prime}\cup\{u^{\prime}\}.

Lemma 4.11.

For any valid (ϕ,ψ)(\phi,\psi) pair and any i{1,2}i\in\{1,2\},

imax{k1,cs¯i,cs¯i}\ell_{i}\geq\max\{k^{\prime}-1,c\bar{s}_{i},c\bar{s}^{\prime}_{i}\}

where c>0c>0 is the constant from Proposition 4.1.

Proof.

Recall that uu belongs to region 1 by convention. Let SVS\subseteq V denote the vertices in VV that belong to region ii. The case S=S=\emptyset is possible only if i=1i=1, in which case we have i=k1\ell_{i}=k^{\prime}-1. The case S=VS=V is possible only if i=R=2i=R=2 (since there must be at least 2 regions, each containing a vertex from both V{u}V\cup\{u\} and V{u}V^{\prime}\cup\{u^{\prime}\}), in which case again i=k1\ell_{i}=k^{\prime}-1. This leaves the case 0<|S|<N0<|S|<N. Proposition 4.1 (applied to either SS or VSV\setminus S, whichever is smaller) tells us that the number of original edges of GG (recall some edges were deleted to form HH) crossing SS is at least the maximum of kk and cmin{|S|,N|S|}=cs¯ic\cdot\min\{|S|,N-|S|\}=c\bar{s}_{i}. For each edge (v1,v2)(v_{1},v_{2}) that was deleted from GG to form HH, if (v1,v2)(v_{1},v_{2}) crosses SS then one of the two new edges (u,v1)(u,v_{1}) or (u,v2)(u,v_{2}) must cross region ii. We conclude that at least max{k,cs¯i}\max\{k,c\bar{s}_{i}\} edges in EE cross region ii. The same argument applied to VV^{\prime} gives the bound max{k,cs¯i}\max\{k,c\bar{s}^{\prime}_{i}\}. ∎

Lemma 4.12.

For any valid (ϕ,ψ)(\phi,\psi) pair and any 3iR3\leq i\leq R,

imax{k,cs¯i,cs¯i}\ell_{i}\geq\max\{k,c\bar{s}_{i},c\bar{s}^{\prime}_{i}\}

where c>0c>0 is the constant from Proposition 4.1.

Proof.

The proof is the same as that of Lemma 4.11 except now the cases S=S=\emptyset and S=VS=V are impossible. ∎

Proof of Lemma 4.6.

Combining Lemmas 4.11 and 4.12, we have for any valid (ϕ,ψ)(\phi,\psi), the total number of cross-edges is

=i=1Ri\displaystyle\ell=\sum_{i=1}^{R}\ell_{i} i=1Rmax{k,cs¯i,cs¯i}2(1+𝟙k even)\displaystyle\geq\sum_{i=1}^{R}\max\{k,c\bar{s}_{i},c\bar{s}^{\prime}_{i}\}-2(1+\mathbbm{1}_{k\text{ even}})
=i=1R(k+Δi)2(1+𝟙k even)\displaystyle=\sum_{i=1}^{R}(k+\Delta_{i})-2(1+\mathbbm{1}_{k\text{ even}})
=Rk2(1+𝟙k even)+Δ\displaystyle=Rk-2(1+\mathbbm{1}_{k\text{ even}})+\Delta (34)

where

Δi:=max{0,cs¯ik,cs¯ik}\Delta_{i}:=\max\{0,c\bar{s}_{i}-k,c\bar{s}^{\prime}_{i}-k\} (35)

and

Δ:=i=1RΔi.\Delta:=\sum_{i=1}^{R}\Delta_{i}.

We now work towards bounding (33). Since every non-empty region must include at least one vertex from both V{u}V\cup\{u\} and V{u}V^{\prime}\cup\{u^{\prime}\}, the possible values for RR are 2RN+12\leq R\leq N+1. The number of ways to choose the values s1,,sRs_{1},\ldots,s_{R} and s1,,sRs_{1}^{\prime},\ldots,s_{R}^{\prime} is at most N2RN^{2R} because 0s1,s1N10\leq s_{1},s_{1}^{\prime}\leq N-1 and for i2i\geq 2, 1si,siN1\leq s_{i},s_{i}^{\prime}\leq N. Once these values are chosen, the number of ways to partition V(H~)V(\tilde{H}) into RR non-empty regions of the prescribed sizes is at most

i=1R(Nsi)(Nsi)i=1RNs¯i+s¯iN2Rk/c+2Δ/c,\prod_{i=1}^{R}\binom{N}{s_{i}}\binom{N}{s_{i}^{\prime}}\leq\prod_{i=1}^{R}N^{\bar{s}_{i}+\bar{s}^{\prime}_{i}}\leq N^{2Rk/c+2\Delta/c},

where the last step uses (35) to conclude s¯i,s¯i(k+Δi)/c\bar{s}_{i},\bar{s}_{i}^{\prime}\leq(k+\Delta_{i})/c.

Now that the regions are chosen, we next count the number of ways to assign vertex-labels ψ\psi that respect these regions. At the same time, we will also bound the term vV(H~)λψ(v)\prod_{v\in V(\tilde{H})}\lambda_{\psi(v)} appearing in (33). Recall that all vertices in a given region must have the same vertex-label. We consider two cases. First suppose every region contains at most N+1N+1 vertices (half the total number in H~\tilde{H}). There are at most rRr^{R} ways to assign the vertex-labels and, since at most half the vertices have label 1, we have vV(H~)λψ(v)|λ2|N+1\prod_{v\in V(\tilde{H})}\lambda_{\psi(v)}\leq|\lambda_{2}|^{N+1}. Now consider the other case where some “large” region has more than N+1N+1 vertices. If we choose to assign vertex-label 1 to the large region, then there are at most rR1r^{R-1} ways to assign the remaining labels and vV(H~)λψ(v)1\prod_{v\in V(\tilde{H})}\lambda_{\psi(v)}\leq 1; otherwise, there are at most rRr^{R} ways to assign the labels and vV(H~)λψ(v)|λ2|N+1\prod_{v\in V(\tilde{H})}\lambda_{\psi(v)}\leq|\lambda_{2}|^{N+1}.

Now we count the number of ways to assign edge-labels ϕ\phi. Recall that the edge (u,u)(u,u^{\prime}) is required to have edge-label 11, and no other edge can have edge-label 11. Recall that there are /2\ell/2 cross-edges in EE and /2\ell/2 cross-edges in EE^{\prime}. These need to be paired up, with each cross-edge in EE having the same edge-label as some cross-edge in EE^{\prime}. There are (/2)!(\ell/2)! ways to choose the pairing and then (n1)/2¯(n-1)^{\underline{\ell/2}} ways to assign edge-labels to the cross-edges, recalling the falling factorial notation (18). There are |E|/2|E|-\ell/2 edges in EE remaining, which can have any edge-labels subject to not repeating within EE, so there are (n/21)|E|/2¯(n-\ell/2-1)^{\underline{|E|-\ell/2}} ways to label these edges and the same number of ways to label the rest of EE^{\prime}. (Here we have assumed /2n1\ell/2\leq n-1, which will indeed be the case: /2|E|\ell/2\leq|E| by definition, and we will see |E|n/2|E|\leq n/2 below.)

Note that |E|=kN2+k1212k(N+1)=12kD|E|=\frac{kN}{2}+\frac{k^{\prime}-1}{2}\leq\frac{1}{2}k(N+1)=\frac{1}{2}kD, and by the assumptions of Theorem 2.2,

Dkn1/52logn+2.D\leq kn^{1/52}\log n+2. (36)

Thus, for sufficiently large n0n_{0} we have |E|n/2|E|\leq n/2.

Putting it all together, (33) becomes

𝔼[f32]\displaystyle\mathbb{E}[f_{3}^{2}] =1|Φ|2(ϕ,ψ) validvV(H~)λψ(v)\displaystyle=\frac{1}{|\Phi|^{2}}\sum_{(\phi,\psi)\text{ valid}}\,\prod_{v\in V(\tilde{H})}\lambda_{\psi(v)}
1|Φ|2R=2N+1N2R+2Rk/c+2Δ/c(rR1+rR|λ2|N+1)sup(/2)!(n1)/2¯[(n/21)|E|/2¯]2.\displaystyle\leq\frac{1}{|\Phi|^{2}}\sum_{R=2}^{N+1}N^{2R+2Rk/c+2\Delta/c}(r^{R-1}+r^{R}|\lambda_{2}|^{N+1})\sup_{\ell}\,(\ell/2)!(n-1)^{\underline{\ell/2}}\left[(n-\ell/2-1)^{\underline{|E|-\ell/2}}\right]^{2}.

We will bound pieces of this expression separately. First, since |Φ|=(n1)|E|¯|\Phi|=(n-1)^{\underline{|E|}}, we have

1|Φ|2(/2)!(n1)/2¯[(n/21)|E|/2¯]2\displaystyle\frac{1}{|\Phi|^{2}}(\ell/2)!(n-1)^{\underline{\ell/2}}\left[(n-\ell/2-1)^{\underline{|E|-\ell/2}}\right]^{2} =(/2)!(n1)/2¯[(n/21)|E|/2¯(n1)|E|¯]2\displaystyle=(\ell/2)!(n-1)^{\underline{\ell/2}}\left[\frac{(n-\ell/2-1)^{\underline{|E|-\ell/2}}}{(n-1)^{\underline{|E|}}}\right]^{2}
=(/2)!(n1)/2¯[1(n1)/2¯]2\displaystyle=(\ell/2)!(n-1)^{\underline{\ell/2}}\left[\frac{1}{(n-1)^{\underline{\ell/2}}}\right]^{2}
(/2)/2(n/2)/2\displaystyle\leq(\ell/2)^{\ell/2}(n-\ell/2)^{-\ell/2}
(/2n/2)/2\displaystyle\leq\left(\frac{\ell/2}{n-\ell/2}\right)^{\ell/2}
and recalling /2|E|n/2\ell/2\leq|E|\leq n/2 from above,
(/2n/2)/2.\displaystyle\leq\left(\frac{\ell/2}{n/2}\right)^{\ell/2}.
Recalling from (34) that Rk2(1+𝟙k even)+Δ\ell\geq Rk-2(1+\mathbbm{1}_{k\text{ even}})+\Delta, this becomes
(n)12(Rk2(1+𝟙k even)+Δ)\displaystyle\leq\left(\frac{\ell}{n}\right)^{\frac{1}{2}(Rk-2(1+\mathbbm{1}_{k\text{ even}})+\Delta)}
and recalling 2|E|kD\ell\leq 2|E|\leq kD,
(kDn)12(Rk2(1+𝟙k even)+Δ).\displaystyle\leq\left(\frac{kD}{n}\right)^{\frac{1}{2}(Rk-2(1+\mathbbm{1}_{k\text{ even}})+\Delta)}.

We now show r|λ2|N+11r|\lambda_{2}|^{N+1}\leq 1. Using 1|λ2|log(1/|λ2|)1-|\lambda_{2}|\leq\log(1/|\lambda_{2}|) and the definition of DD (see Theorem 2.2),

|λ2|N+1=|λ2|D|λ2|klogn1|λ2|=exp(klogn1|λ2|log1|λ2|)exp(klogn)=nk.|\lambda_{2}|^{N+1}=|\lambda_{2}|^{D}\leq|\lambda_{2}|^{\frac{k\log n}{1-|\lambda_{2}|}}=\exp\left(-\frac{k\log n}{1-|\lambda_{2}|}\log\frac{1}{|\lambda_{2}|}\right)\leq\exp\left(-k\log n\right)=n^{-k}. (37)

Since (7) implies rnk/2r\leq n^{k/2}, this gives r|λ2|N+11r|\lambda_{2}|^{N+1}\leq 1 as desired, implying

rR1+rR|λ2|N+1=rR1(1+r|λ2|N+1)2rR1.r^{R-1}+r^{R}|\lambda_{2}|^{N+1}=r^{R-1}(1+r|\lambda_{2}|^{N+1})\leq 2r^{R-1}.
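
The bound (37) used here follows from 1x1/log(1/x)1-x\leq\log(1/x); a quick numerical spot-check (ours), with k=3k=3 and a small grid of values for |λ_2| and nn of our choosing, is below.

```python
# Numerical spot-check (ours) of (37): with D >= k*log(n)/(1 - |lambda_2|),
# we have |lambda_2|^D <= n^{-k}.
import math

k = 3
for lam2 in [0.1, 0.5, 0.9, 0.99]:
    for n in [10, 100, 1000]:
        D = k * math.log(n) / (1 - lam2)
        assert lam2 ** D <= n ** (-k) * (1 + 1e-9)
print("(37) verified on the grid of test values")
```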

Combining the above,

𝔼[f32]\displaystyle\mathbb{E}[f_{3}^{2}] 2R=2N+1supΔN2R+2Rk/c+2Δ/crR1(k(N+1)n)12(Rk2(1+𝟙k even)+Δ)\displaystyle\leq 2\sum_{R=2}^{N+1}\sup_{\Delta}N^{2R+2Rk/c+2\Delta/c}r^{R-1}\left(\frac{k(N+1)}{n}\right)^{\frac{1}{2}(Rk-2(1+\mathbbm{1}_{k\text{ even}})+\Delta)}
=2r(nk(N+1))1+𝟙k evenR=2N+1(kk/2N2+2k/c(N+1)k/2rnk/2)RsupΔ(N2/ck(N+1)n)Δ.\displaystyle=\frac{2}{r}\left(\frac{n}{k(N+1)}\right)^{1+\mathbbm{1}_{k\text{ even}}}\sum_{R=2}^{N+1}\left(k^{k/2}N^{2+2k/c}(N+1)^{k/2}\frac{r}{n^{k/2}}\right)^{R}\sup_{\Delta}\left(N^{2/c}\sqrt{\frac{k(N+1)}{n}}\right)^{\Delta}.

Recall c0.08c\geq 0.08 (Proposition 4.1), which gives 1/c12.51/c\leq 12.5. Using (36),

N2/ck(N+1)nD2/c+1/2knD25.5kn(kn1/52logn+2)25.5kn1N^{2/c}\sqrt{\frac{k(N+1)}{n}}\leq D^{2/c+1/2}\sqrt{\frac{k}{n}}\leq D^{25.5}\sqrt{\frac{k}{n}}\leq\left(kn^{1/52}\log n+2\right)^{25.5}\sqrt{\frac{k}{n}}\leq 1
for sufficiently large n0n_{0}. Since Δ0\Delta\geq 0, the supremum above is achieved at Δ=0\Delta=0 and the bound on 𝔼[f32]\mathbb{E}[f_{3}^{2}] becomes
=2r(nk(N+1))1+𝟙k evenR=2N+1(kk/2N2+2k/c(N+1)k/2rnk/2)R\displaystyle=\frac{2}{r}\left(\frac{n}{k(N+1)}\right)^{1+\mathbbm{1}_{k\text{ even}}}\sum_{R=2}^{N+1}\left(k^{k/2}N^{2+2k/c}(N+1)^{k/2}\frac{r}{n^{k/2}}\right)^{R}
2r(nkD)1+𝟙k evenR=2N+1(kk/2D2+2k/c+k/2rnk/2)R\displaystyle\leq\frac{2}{r}\left(\frac{n}{kD}\right)^{1+\mathbbm{1}_{k\text{ even}}}\sum_{R=2}^{N+1}\left(k^{k/2}D^{2+2k/c+k/2}\frac{r}{n^{k/2}}\right)^{R}
2r(nkD)1+𝟙k evenR=2(kk/2D25.5k+2rnk/2)R\displaystyle\leq\frac{2}{r}\left(\frac{n}{kD}\right)^{1+\mathbbm{1}_{k\text{ even}}}\sum_{R=2}^{\infty}\left(k^{k/2}D^{25.5k+2}\frac{r}{n^{k/2}}\right)^{R}
and now (7) implies kk/2D25.5k+2rnk/2kk/2D27krnk/21/2k^{k/2}D^{25.5k+2}\frac{r}{n^{k/2}}\leq k^{k/2}D^{27k}\frac{r}{n^{k/2}}\leq 1/2, which gives
4r(nkD)1+𝟙k even(kk/2D25.5k+2rnk/2)2\displaystyle\leq\frac{4}{r}\left(\frac{n}{kD}\right)^{1+\mathbbm{1}_{k\text{ even}}}\left(k^{k/2}D^{25.5k+2}\frac{r}{n^{k/2}}\right)^{2}
=4kk1𝟙k evenD51k+3𝟙k evenrnk1𝟙k even\displaystyle=4k^{k-1-\mathbbm{1}_{k\text{ even}}}D^{51k+3-\mathbbm{1}_{k\text{ even}}}\frac{r}{n^{k-1-\mathbbm{1}_{k\text{ even}}}}
4kk1D52krnk1𝟙k even,\displaystyle\leq 4k^{k-1}D^{52k}\frac{r}{n^{k-1-\mathbbm{1}_{k\text{ even}}}},

completing the proof. ∎

4.6 Putting it Together

Proof of Theorem 2.2.

Using Lemma 4.4,

𝔼[(fa11)2]\displaystyle\mathbb{E}[(f-a_{11})^{2}] =𝔼[(f1+f2+f3a11)2]\displaystyle=\mathbb{E}[(f_{1}+f_{2}+f_{3}-a_{11})^{2}]
=𝔼[(f2+f3)2]\displaystyle=\mathbb{E}[(f_{2}+f_{3})^{2}]
2(𝔼[f22]+𝔼[f32]).\displaystyle\leq 2(\mathbb{E}[f_{2}^{2}]+\mathbb{E}[f_{3}^{2}]).

Recall that Lemma 4.5 gives 𝔼[f22]r|λ2|2(N+1)=r|λ2|2D\mathbb{E}[f_{2}^{2}]\leq r|\lambda_{2}|^{2(N+1)}=r|\lambda_{2}|^{2D}, and from (37) we have |λ2|2Dn2k|\lambda_{2}|^{2D}\leq n^{-2k}. Also, (7) implies rnk/2r\leq n^{k/2}, which gives 𝔼[f22]nk\mathbb{E}[f_{2}^{2}]\leq n^{-k}. Combining this with Lemma 4.6 yields

𝔼[(fa11)2]2(nk+4kk1D52krnk1𝟙k even)10kk1D52krnk1𝟙k even,\mathbb{E}[(f-a_{11})^{2}]\leq 2(n^{-k}+4k^{k-1}D^{52k}\frac{r}{n^{k-1-\mathbbm{1}_{k\text{ even}}}})\leq 10k^{k-1}D^{52k}\frac{r}{n^{k-1-\mathbbm{1}_{k\text{ even}}}},

completing the proof. ∎

Acknowledgments

The author is indebted to Jonathan Niles-Weed, Tselil Schramm, and Jerry Li for numerous detailed discussions about this problem. The author also thanks Jingqiu Ding, Bruce Hajek, Tim Kunisky, Cris Moore, Aaron Potechin, Bobby Shi, and anonymous reviewers, for helpful discussions and comments.

References

  • [AAA17] Esraa Al-Sharoa, Mahmood Al-Khassaweneh, and Selin Aviyente. A tensor based framework for community detection in dynamic networks. In International conference on acoustics, speech and signal processing (ICASSP), pages 2312–2316. IEEE, 2017.
  • [ABG+14] Joseph Anderson, Mikhail Belkin, Navin Goyal, Luis Rademacher, and James Voss. The more, the merrier: the blessing of dimensionality for learning large gaussian mixtures. In Conference on Learning Theory, pages 1135–1164. PMLR, 2014.
  • [AC08] Dimitris Achlioptas and Amin Coja-Oghlan. Algorithmic barriers from phase transitions. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 793–802. IEEE, 2008.
  • [AFH+12] Anima Anandkumar, Dean P Foster, Daniel J Hsu, Sham M Kakade, and Yi-Kai Liu. A spectral algorithm for latent dirichlet allocation. Advances in neural information processing systems, 25, 2012.
  • [AGHK13] Animashree Anandkumar, Rong Ge, Daniel Hsu, and Sham Kakade. A tensor spectral approach to learning mixed membership community models. In Conference on Learning Theory, pages 867–881. PMLR, 2013.
  • [AGJ14] Anima Anandkumar, Rong Ge, and Majid Janzamin. Analyzing tensor power method dynamics: Applications to learning overcomplete latent variable models. arXiv preprint arXiv:1411.1488, 2014.
  • [AGJ15] Animashree Anandkumar, Rong Ge, and Majid Janzamin. Learning overcomplete latent variable models through tensor methods. In Conference on Learning Theory, pages 36–112. PMLR, 2015.
  • [AM85] Noga Alon and Vitali D Milman. λ1\lambda_{1}, isoperimetric inequalities for graphs, and superconcentrators. Journal of Combinatorial Theory, Series B, 38(1):73–88, 1985.
  • [BB20] Matthew Brennan and Guy Bresler. Reducibility and statistical-computational gaps from secret leakage. In Conference on Learning Theory, pages 648–847. PMLR, 2020.
  • [BBH18] Matthew Brennan, Guy Bresler, and Wasim Huleihel. Reducibility and computational lower bounds for problems with planted sparse structure. In Conference On Learning Theory, pages 48–166. PMLR, 2018.
  • [BBH+21] Matthew S Brennan, Guy Bresler, Sam Hopkins, Jerry Li, and Tselil Schramm. Statistical query algorithms and low degree tests are almost equivalent. In Conference on Learning Theory, pages 774–774. PMLR, 2021.
  • [BBK+17] Afonso S Bandeira, Ben Blum-Smith, Joe Kileel, Amelia Perry, Jonathan Weed, and Alexander S Wein. Estimation under group actions: recovering orbits from invariants. arXiv preprint arXiv:1712.10163, 2017.
  • [BBK+21] Afonso S Bandeira, Jess Banks, Dmitriy Kunisky, Christopher Moore, and Alexander S Wein. Spectral planting and the hardness of refuting cuts, colorability, and communities in random graphs. In Conference on Learning Theory, pages 410–473. PMLR, 2021.
  • [BBLS18] Nicolas Boumal, Tamir Bendory, Roy R Lederman, and Amit Singer. Heterogeneous multireference alignment: A single pass approach. In 52nd Annual Conference on Information Sciences and Systems (CISS), pages 1–6. IEEE, 2018.
  • [BCMV14] Aditya Bhaskara, Moses Charikar, Ankur Moitra, and Aravindan Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 594–603, 2014.
  • [BCO14] Cristiano Bocci, Luca Chiantini, and Giorgio Ottaviani. Refined methods for the identifiability of tensors. Annali di Matematica Pura ed Applicata (1923-), 193(6):1691–1702, 2014.
  • [BCRT20] Giulio Biroli, Chiara Cammarota, and Federico Ricci-Tersenghi. How to iron out rough landscapes and get optimal performances: averaged gradient descent and its application to tensor PCA. Journal of Physics A: Mathematical and Theoretical, 53(17):174003, 2020.
  • [BEH+22] Afonso S Bandeira, Ahmed El Alaoui, Samuel B Hopkins, Tselil Schramm, Alexander S Wein, and Ilias Zadik. The Franz-Parisi criterion and computational trade-offs in high dimensional statistics. arXiv preprint arXiv:2205.09727, 2022.
  • [BGJ20] Gerard Ben Arous, Reza Gheissari, and Aukosh Jagannath. Algorithmic thresholds for tensor PCA. The Annals of Probability, 48(4):2052–2087, 2020.
  • [BH22] Guy Bresler and Brice Huang. The algorithmic phase transition of random k-SAT for low degree polynomials. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 298–309. IEEE, 2022.
  • [BHK+19] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019.
  • [BKS15] Boaz Barak, Jonathan A Kelner, and David Steurer. Dictionary learning and tensor decomposition via the sum-of-squares method. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 143–151, 2015.
  • [BKW20] Afonso S Bandeira, Dmitriy Kunisky, and Alexander S Wein. Computational hardness of certifying bounds on constrained PCA problems. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020), volume 151, page 78. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
  • [BM20] Davide Bacciu and Danilo P Mandic. Tensor decompositions in deep learning. arXiv preprint arXiv:2002.11835, 2020.
  • [BMR21] Jess Banks, Sidhanth Mohanty, and Prasad Raghavendra. Local statistics, semidefinite programming, and community detection. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1298–1316. SIAM, 2021.
  • [Bol98] Béla Bollobás. Random graphs. In Modern graph theory, pages 215–252. Springer, 1998.
  • [BR13] Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In Conference on learning theory, pages 1046–1066. PMLR, 2013.
  • [BWZ20] Gérard Ben Arous, Alexander S Wein, and Ilias Zadik. Free energy wells and overlap gap property in sparse PCA. In Conference on Learning Theory, pages 479–482. PMLR, 2020.
  • [CGH+22] Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth, Alexander S Wein, and Ilias Zadik. Statistical and computational phase transitions in group testing. In Conference on Learning Theory, pages 4764–4781. PMLR, 2022.
  • [CMZ22] Zongchen Chen, Elchanan Mossel, and Ilias Zadik. Almost-linear planted cliques elude the metropolis process. arXiv preprint arXiv:2204.01911, 2022.
  • [DCC07] Lieven De Lathauwer, Joséphine Castaing, and Jean-François Cardoso. Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Transactions on Signal Processing, 55(6):2965–2973, 2007.
  • [DdL+22] Jingqiu Ding, Tommaso d’Orsi, Chih-Hung Liu, David Steurer, and Stefan Tiegel. Fast algorithm for overcomplete order-3 tensor decomposition. In Conference on Learning Theory, pages 3741–3799. PMLR, 2022.
  • [DK22] Ilias Diakonikolas and Daniel Kane. Non-gaussian component analysis via lattice basis reduction. In Conference on Learning Theory, pages 4535–4547. PMLR, 2022.
  • [DKMZ11] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, 2011.
  • [DKS17] Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 73–84. IEEE, 2017.
  • [DKWB19] Yunzi Ding, Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira. Subexponential-time algorithms for sparse PCA. arXiv preprint arXiv:1907.11635, 2019.
  • [Dod84] Jozef Dodziuk. Difference equations, isoperimetric inequality and transience of certain random walks. Transactions of the American Mathematical Society, 284(2):787–794, 1984.
  • [FGR+17] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. Journal of the ACM (JACM), 64(2):1–37, 2017.
  • [Fri08] Joel Friedman. A proof of Alon’s second eigenvalue conjecture and related problems. American Mathematical Soc., 2008.
  • [FSWW20] Zhou Fan, Yi Sun, Tianhao Wang, and Yihong Wu. Likelihood landscape and maximum likelihood estimation for the discrete orbit recovery model. Communications on Pure and Applied Mathematics, 2020.
  • [Gam21] David Gamarnik. The overlap gap property: A topological barrier to optimizing over random structures. Proceedings of the National Academy of Sciences, 118(41):e2108492118, 2021.
  • [GHK15] Rong Ge, Qingqing Huang, and Sham M Kakade. Learning mixtures of gaussians in high dimensions. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 761–770, 2015.
  • [GJS21] David Gamarnik, Aukosh Jagannath, and Subhabrata Sen. The overlap gap property in principal submatrix recovery. Probability Theory and Related Fields, 181(4):757–814, 2021.
  • [GJW20] David Gamarnik, Aukosh Jagannath, and Alexander S Wein. Low-degree hardness of random optimization problems. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 131–140. IEEE, 2020.
  • [GM15] Rong Ge and Tengyu Ma. Decomposing overcomplete 3rd order tensors using sum-of-squares algorithms. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 829–849, 2015.
  • [GM17] Rong Ge and Tengyu Ma. On the optimization landscape of tensor decompositions. Advances in Neural Information Processing Systems, 30, 2017.
  • [GMZ22] David Gamarnik, Cristopher Moore, and Lenka Zdeborová. Disordered systems insights on computational hardness. arXiv preprint arXiv:2210.08312, 2022.
  • [GS17] David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. The Annals of Probability, pages 2353–2376, 2017.
  • [GVX14] Navin Goyal, Santosh Vempala, and Ying Xiao. Fourier PCA and robust tensor decomposition. In Proceedings of the forty-sixth annual ACM symposium on Theory of Computing, pages 584–593, 2014.
  • [GZ17] David Gamarnik and Ilias Zadik. Sparse high-dimensional linear regression. Algorithmic barriers and a local search algorithm. arXiv preprint arXiv:1711.04952, 2017.
  • [GZ19] David Gamarnik and Ilias Zadik. The landscape of the planted clique problem: Dense subgraphs and the overlap gap property. arXiv preprint arXiv:1904.07174, 2019.
  • [Hås89] Johan Håstad. Tensor rank is NP-complete. In International Colloquium on Automata, Languages, and Programming, pages 451–460. Springer, 1989.
  • [HK13] Daniel Hsu and Sham M Kakade. Learning mixtures of spherical gaussians: moment methods and spectral decompositions. In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pages 11–20, 2013.
  • [HKP+17] Samuel B Hopkins, Pravesh K Kothari, Aaron Potechin, Prasad Raghavendra, Tselil Schramm, and David Steurer. The power of sum-of-squares for detecting hidden structures. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 720–731. IEEE, 2017.
  • [HL13] Christopher J Hillar and Lek-Heng Lim. Most tensor problems are NP-hard. Journal of the ACM (JACM), 60(6):1–39, 2013.
  • [HLW06] Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43(4):439–561, 2006.
  • [Hop18] Samuel Hopkins. Statistical Inference and the Sum of Squares Method. PhD thesis, Cornell University, 2018.
  • [HS17] Samuel B Hopkins and David Steurer. Efficient bayesian estimation from few samples: community detection and related problems. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 379–390. IEEE, 2017.
  • [HSS15] Samuel B Hopkins, Jonathan Shi, and David Steurer. Tensor principal component analysis via sum-of-squares proofs. In Conference on Learning Theory, pages 956–1006. PMLR, 2015.
  • [HSS19] Samuel B Hopkins, Tselil Schramm, and Jonathan Shi. A robust spectral algorithm for overcomplete tensor decomposition. In Conference on Learning Theory, pages 1683–1722. PMLR, 2019.
  • [HSSS16] Samuel B Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer. Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 178–191, 2016.
  • [HW21] Justin Holmgren and Alexander S Wein. Counterexamples to the low-degree conjecture. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021), volume 185, 2021.
  • [HWX15] Bruce Hajek, Yihong Wu, and Jiaming Xu. Computational lower bounds for community detection on random graphs. In Conference on Learning Theory, pages 899–928. PMLR, 2015.
  • [Jer92] Mark Jerrum. Large cliques elude the Metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992.
  • [JLLX21] Bing-Yi Jing, Ting Li, Zhongyuan Lyu, and Dong Xia. Community detection on mixture multilayer networks via regularized tensor decomposition. The Annals of Statistics, 49(6):3181–3205, 2021.
  • [KM21a] Frederic Koehler and Elchanan Mossel. Reconstruction on trees and low-degree polynomials. arXiv preprint arXiv:2109.06915, 2021.
  • [KM21b] Pravesh K Kothari and Peter Manohar. A stress-free sum-of-squares lower bound for coloring. arXiv preprint arXiv:2105.07517, 2021.
  • [KMOW17] Pravesh K Kothari, Ryuhei Mori, Ryan O’Donnell, and David Witmer. Sum of squares lower bounds for refuting any CSP. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 132–145, 2017.
  • [Kol21] Tamara G. Kolda. Will the real Jennrich’s algorithm please stand up? Available online at www.mathsci.ai/post/jennrich, December 2021. Accessed: 10-22-2022.
  • [KP20] Bohdan Kivva and Aaron Potechin. Exact nuclear norm, completion and decomposition for random overcomplete tensors via degree-4 SOS. arXiv preprint arXiv:2011.09416, 2020.
  • [Kun21] Dmitriy Kunisky. Hypothesis testing with low-degree polynomials in the Morris class of exponential families. In Conference on Learning Theory, pages 2822–2848. PMLR, 2021.
  • [Kun22] Dmitriy Kunisky. Lecture notes on sum-of-squares optimization. Available online at www.kunisky.com/static/teaching/2022spring-sos/sos-notes.pdf, 2022. Accessed: 09-29-2022.
  • [KWB22] Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira. Notes on computational hardness of hypothesis testing: Predictions using the low-degree likelihood ratio. In ISAAC Congress (International Society for Analysis, its Applications and Computation), pages 1–50. Springer, 2022.
  • [LKZ15] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 680–687. IEEE, 2015.
  • [LM21] Allen Liu and Ankur Moitra. Algorithms from invariants: Smoothed analysis of orbit recovery over SO(3)SO(3). arXiv preprint arXiv:2106.02680, 2021.
  • [LRA93] Sue E Leurgans, Robert T Ross, and Rebecca B Abel. A decomposition for three-way arrays. SIAM Journal on Matrix Analysis and Applications, 14(4):1064–1083, 1993.
  • [LZ22] Yuetian Luo and Anru R Zhang. Tensor clustering with planted structures: Statistical optimality and computational limits. The Annals of Statistics, 50(1):584–613, 2022.
  • [MKUZ19] Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Passed & spurious: Descent algorithms and local minima in spiked matrix-tensor models. In International Conference on Machine Learning, pages 4333–4342. PMLR, 2019.
  • [Moi14] Ankur Moitra. Algorithmic aspects of machine learning. Lecture notes, 2014.
  • [MR05] Elchanan Mossel and Sébastien Roch. Learning nonsingular phylogenies and hidden Markov models. In Proceedings of the thirty-seventh annual ACM symposium on Theory of Computing, pages 366–375, 2005.
  • [MSS16] Tengyu Ma, Jonathan Shi, and David Steurer. Polynomial-time tensor decompositions with sum-of-squares. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 438–446. IEEE, 2016.
  • [MW19] Ankur Moitra and Alexander S Wein. Spectral methods from tensor networks. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 926–937, 2019.
  • [MW22] Andrea Montanari and Alexander S Wein. Equivalence of approximate message passing and low-degree polynomials in rank-one matrix estimation. arXiv preprint arXiv:2212.06996, 2022.
  • [OTR22] Mohamed Ouerfelli, Mohamed Tamaazousti, and Vincent Rivasseau. Random tensor theory for tensor decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
  • [PR20] Aaron Potechin and Goutham Rajendran. Machinery for proving sum-of-squares lower bounds on certification problems. arXiv preprint arXiv:2011.04253, 2020.
  • [PWB+19] Amelia Perry, Jonathan Weed, Afonso S Bandeira, Philippe Rigollet, and Amit Singer. The sample complexity of multireference alignment. SIAM Journal on Mathematics of Data Science, 1(3):497–517, 2019.
  • [RM14] Emile Richard and Andrea Montanari. A statistical model for tensor PCA. Advances in Neural Information Processing Systems, 27, 2014.
  • [RSG17] Stephan Rabanser, Oleksandr Shchur, and Stephan Günnemann. Introduction to tensor decompositions and their applications in machine learning. arXiv preprint arXiv:1711.10781, 2017.
  • [RSS18] Prasad Raghavendra, Tselil Schramm, and David Steurer. High dimensional estimation via sum-of-squares proofs. In Proceedings of the International Congress of Mathematicians: Rio de Janeiro 2018, pages 3389–3423. World Scientific, 2018.
  • [SDF+17] Nicholas D Sidiropoulos, Lieven De Lathauwer, Xiao Fu, Kejun Huang, Evangelos E Papalexakis, and Christos Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13):3551–3582, 2017.
  • [SS17] Tselil Schramm and David Steurer. Fast and robust tensor decomposition with applications to dictionary learning. In Conference on Learning Theory, pages 1760–1793. PMLR, 2017.
  • [SW22] Tselil Schramm and Alexander S Wein. Computational barriers to estimation from low-degree polynomials. The Annals of Statistics, 50(3):1833–1858, 2022.
  • [Wei18] Alexander Spence Wein. Statistical estimation in the presence of group actions. PhD thesis, Massachusetts Institute of Technology, 2018.
  • [Wei22] Alexander S Wein. Optimal low-degree hardness of maximum independent set. Mathematical Statistics and Learning, 4(3):221–251, 2022.
  • [WEM19] Alexander S Wein, Ahmed El Alaoui, and Cristopher Moore. The Kikuchi hierarchy and tensor PCA. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 1446–1468. IEEE, 2019.
  • [WX21] Yihong Wu and Jiaming Xu. Statistical problems with planted structures: Information-theoretical and computational limits. Information-Theoretic Methods in Data Science, 383, 2021.
  • [ZSWB22] Ilias Zadik, Min Jae Song, Alexander S Wein, and Joan Bruna. Lattice-based methods surpass sum-of-squares in clustering. In Conference on Learning Theory, pages 1247–1248. PMLR, 2022.