Average-Case Complexity of Tensor Decomposition
for Low-Degree Polynomials
Abstract
Suppose we are given an $n$-dimensional order-3 symmetric tensor that is the sum of $r$ random rank-1 terms. The problem of recovering the rank-1 components is possible in principle when $r \lesssim n^2$ but polynomial-time algorithms are only known in the regime $r \ll n^{3/2}$. Similar “statistical-computational gaps” occur in many high-dimensional inference tasks, and in recent years there has been a flurry of work on explaining the apparent computational hardness in these problems by proving lower bounds against restricted (yet powerful) models of computation such as statistical queries (SQ), sum-of-squares (SoS), and low-degree polynomials (LDP). However, no such prior work exists for tensor decomposition, largely because its hardness does not appear to be explained by a “planted versus null” testing problem.
We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest (to break symmetry), and the components are drawn uniformly from the hypercube. We resolve the computational complexity in the LDP model: low-degree polynomial functions of the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$. This provides rigorous evidence suggesting that the best known algorithms for tensor decomposition cannot be improved, at least by known approaches. A natural extension of the result holds for tensors of any fixed order $p \ge 3$, in which case the LDP threshold is $r \sim n^{p/2}$.
1 Introduction
Tensor decomposition is a basic computational primitive underlying many algorithms for a variety of data analysis tasks, including phylogenetic reconstruction [MR05], topic modeling [AFH+12], community detection [AGHK13, HS17, AAA17, JLLX21], learning mixtures of Gaussians [HK13, GHK15, BCMV14, ABG+14], independent component analysis [GVX14], dictionary learning [BKS15, MSS16], and multi-reference alignment [PWB+19]. For further discussion, we refer the reader to [Moi14, RSG17, SDF+17, BM20].
In this work we will consider order-$p$ tensors of the form $T \in (\mathbb{R}^n)^{\otimes p}$ for an integer $p$, that is, $T$ is an $n \times n \times \cdots \times n$ ($p$ times) array of real numbers with entries denoted by $T_{i_1,\ldots,i_p}$ with $i_1,\ldots,i_p \in [n]$. A vector $a \in \mathbb{R}^n$ can be used to form a rank-1 tensor $a^{\otimes p}$ defined by $(a^{\otimes p})_{i_1,\ldots,i_p} = \prod_{j=1}^{p} a_{i_j}$, akin to the rank-1 matrix $a a^\top = a^{\otimes 2}$.
In the tensor decomposition problem, we observe a rank-$r$ tensor of the form
$T = \sum_{i=1}^{r} a_i^{\otimes p}$ (1)
with unknown components $a_1, \ldots, a_r \in \mathbb{R}^n$. The goal is to recover the components up to the inherent symmetries in the problem, i.e., we cannot recover the ordering of the list $a_1, \ldots, a_r$, and if $p$ is even we cannot distinguish $a_i$ from $-a_i$. Our regime of interest will be $p$ fixed and $n$ large, with $r$ possibly growing with $n$.
A large number of algorithmic results for tensor decomposition now exist under various assumptions on $r$ and $n$ [LRA93, DCC07, GVX14, BCMV14, AGJ15, AGJ14, BKS15, GM15, HSSS16, MSS16, GM17, SS17, HSS19, KP20, DdL+22]. We will focus on the order-3 case ($p = 3$) to simplify the discussion. If $a_1, \ldots, a_r$ are linearly independent, the decomposition problem can be solved by a classical method based on simultaneous diagonalization (this method is sometimes called Jennrich’s algorithm, although this may be a misnomer; see [Kol21]) [LRA93]. Most recent work focuses on the more difficult overcomplete regime where $r > n$. For random order-3 tensors, where the components are drawn uniformly from the unit sphere, the state-of-the-art polynomial-time algorithms succeed when $r \le \tilde{O}(n^{3/2})$ (where $\tilde{O}$ hides a polylogarithmic factor); this was achieved first by a method based on the sum-of-squares hierarchy [MSS16] and then later by a faster spectral method [DdL+22]. Under the weaker condition $r \lesssim n^2$ (where $\lesssim$ hides a constant factor), the decomposition is unique [BCO14] and so the problem is solvable in principle. However, no polynomial-time algorithm is known in the regime $n^{3/2} \lesssim r \lesssim n^2$.
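For concreteness, here is a minimal Python/NumPy sketch of the simultaneous-diagonalization idea for an order-3 tensor with linearly independent components; the random-contraction details, parameter values, and function names are our own illustration rather than the precise procedure of [LRA93]:

import numpy as np

def simultaneous_diag_decompose(T, r, seed=0):
    # Recover r linearly independent components a_1, ..., a_r (up to scaling and
    # order) from T = sum_i a_i^{(x)3}, by contracting the third mode with two
    # random vectors and diagonalizing the resulting matrix pencil.
    rng = np.random.default_rng(seed)
    n = T.shape[0]
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    Tx = np.einsum('ijk,k->ij', T, x)    # Tx = sum_i <a_i, x> a_i a_i^T
    Ty = np.einsum('ijk,k->ij', T, y)    # Ty = sum_i <a_i, y> a_i a_i^T
    # Tx @ pinv(Ty) = A diag(<a_i,x>/<a_i,y>) A^+, so its top-r eigenvectors
    # are (generically) the component directions a_i.
    vals, vecs = np.linalg.eig(Tx @ np.linalg.pinv(Ty))
    top = np.argsort(-np.abs(vals))[:r]
    return np.real(vecs[:, top])         # columns are the recovered directions

# Smoke test with r = 3 random components in dimension n = 8 (undercomplete).
A = np.random.default_rng(1).standard_normal((8, 3))
T = np.einsum('ia,ja,ka->ijk', A, A, A)
A_hat = simultaneous_diag_decompose(T, r=3)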
For random tensors of any fixed order $p$, the situation is similar. The decomposition is unique when $r \lesssim n^{p-1}$ [BCO14], but poly-time algorithms are only known for substantially smaller rank. For , poly-time decomposition is known when [DCC07, MSS16], and it is expected that is achievable for general $p$; however, to our knowledge, the best poly-time algorithms in the literature for require [LRA93, BCMV14]. These algorithms for work not just for random tensors but also under much weaker assumptions on the components.
Statistical-computational gaps.
The motivation for this work is to understand whether the “statistical-computational gap” mentioned above is inherent, that is, we hope to investigate whether or not a poly-time algorithm exists in the apparent “possible but hard” regime $n^{3/2} \lesssim r \lesssim n^2$. For context, gaps between statistical and computational thresholds are a nearly ubiquitous phenomenon throughout high-dimensional statistics, appearing in settings such as planted clique, sparse PCA, community detection, tensor PCA, and more; see e.g. [WX21, GMZ22] for exposition. In these average-case problems where the input is random, classical computational complexity theory has little to say: results on NP-hardness (including those for tensor decomposition [Hås89, HL13]) typically show hardness for worst-case instances, which does not imply hardness in the average case. Still, a number of frameworks have emerged to justify why the “hard” regime is actually hard, and to predict the location of the hard regime in new problems. One approach is based on average-case reductions which formally transfer suspected hardness of one average-case problem (usually planted clique) to another (e.g. [BR13, HWX15, BBH18, BB20]). Another approach is to prove unconditional lower bounds against particular algorithms or classes of algorithms (e.g. [Jer92, DKMZ11, LKZ15]), and sometimes this analysis is based on intricate geometric properties of the solution space (e.g. [AC08, GS17, GZ17]; see [Gam21] for a survey). Perhaps some of the most powerful and well-established classes of algorithms to prove lower bounds against are statistical query (SQ) algorithms (e.g. [FGR+17, DKS17]), the sum-of-squares (SoS) hierarchy (e.g. [BHK+19, KMOW17]; see [RSS18] for a survey), and low-degree polynomials (LDP) (e.g. [HS17, HKP+17, Hop18]; see [KWB22] for a survey). In recent years, all of the above frameworks have been widely successful at providing many different perspectives on what makes statistical problems easy versus hard, and there has also been progress on understanding formal connections between different frameworks [HKP+17, GJW20, BBH+21, BEH+22]. For many conjectured statistical-computational gaps, we now have sharp lower bounds in one (or more) of these frameworks, suggesting that the best known algorithms cannot be improved (at least by certain known approaches).
Despite all these successes, the tensor decomposition problem is a rare example where (prior to this work) essentially no progress has been made on any kind of average-case hardness, even though the problem itself and the suspected threshold at $r \approx n^{3/2}$ have maintained a high profile in the community since the work of [GM15] in 2015. The lecture notes [Kun22] highlight “hardness of tensor decomposition” as one of 15 prominent open problems related to sum-of-squares (Open Problem 7.1). In Section 1.3.2 we explain why tensor decomposition is more difficult to analyze than the other statistical problems that were understood previously, and in Section 1.4.1 we explain how to overcome these challenges.
The LDP framework.
Our main result (Theorem 1.4) determines the computational complexity of tensor decomposition in the low-degree polynomial (LDP) framework, establishing an easy-hard phase transition at the threshold $r \approx n^{p/2}$. This means we consider a restricted class of algorithms, namely those that can be expressed as multivariate polynomials of low degree in the tensor entries. The study of this class of algorithms in the context of high-dimensional statistics first emerged from a line of work on the sum-of-squares (SoS) hierarchy [BHK+19, HS17, HKP+17, Hop18] and is now a popular tool for predicting and explaining statistical-computational gaps; see [KWB22] for a survey. Low-degree polynomials capture various algorithmic paradigms such as spectral methods and approximate message passing (see e.g. Section 4 of [KWB22], Appendix A of [GJW20], and Theorem 1.4 of [MW22]), and so LDP lower bounds allow us to rule out a large class of known approaches all at once. For the types of problems that typically arise in high-dimensional statistics (informally speaking), the LDP framework has a great track record for consistently matching the performance of the best known poly-time algorithms. As a result, LDP lower bounds can be taken as evidence for inherent hardness of certain types of average-case problems. While there are some settings where LDPs are outperformed by another algorithm, these other algorithms tend to be “brittle” algebraic methods with almost no noise tolerance, and so the LDP framework is arguably still saying something meaningful in these settings; see [HW21, Kun21, KM21a, ZSWB22, DK22] for discussion.
Existing LDP lower bounds apply to a wide variety of statistical tasks, which can be classified as hypothesis testing (e.g. [HS17, HKP+17, Hop18]), estimation (e.g. [SW22, KM21a]), and optimization (e.g. [GJW20, Wei22, BH22]). The techniques used to prove these lower bounds are quite different in the three cases, and the current work introduces a new technique for the case of estimation.
1.1 Largest Component Recovery
In order to formulate the random tensor decomposition problem in the LDP framework, we will define a variant of the problem where one component has slightly larger norm than the rest, and the goal is to recover this particular component.
Problem 1.1 (Largest component recovery).
Let $p \ge 3$ be odd and $\lambda > 0$. In the largest component recovery problem, we observe
$T = (1 + \lambda)\, a_1^{\otimes p} + \sum_{i=2}^{r} a_i^{\otimes p},$
where each $a_i$ (for $i \in [r]$) is drawn uniformly and independently from the hypercube $\{\pm 1\}^n$. The goal is to recover $a_1$ with high probability.
One can think of $\lambda$ as a small constant, although our results will apply more generally. The purpose of increasing the norm of the first component is to give the algorithm a concrete goal, namely to recover $a_1$; without this, the algorithm has no way to break symmetry among the components. We have restricted $p$ to be odd here because otherwise there is no hope to disambiguate between $a_1$ and $-a_1$; however, we give similar results for even $p$ in Section 2. The assumption that $a_i$ is drawn from the hypercube helps to make some of the proofs cleaner, but our approach can likely handle other natural distributions for $a_i$ such as $\mathcal{N}(0, I_n)$ (and we do not expect this to change the threshold).
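For reference, here is a small Python sketch of how an instance of Problem 1.1 can be sampled; the parameter values and names (n, r, p, lam) are our own illustrative choices:

import numpy as np

def sample_instance(n, r, p=3, lam=0.1, seed=None):
    # T = (1 + lam) * a_1^{(x)p} + sum_{i=2}^r a_i^{(x)p}, with each a_i drawn
    # uniformly from the hypercube {+1, -1}^n; returns T and the target a_1.
    rng = np.random.default_rng(seed)
    A = rng.choice([-1.0, 1.0], size=(r, n))
    T = np.zeros((n,) * p)
    for i in range(r):
        rank1 = A[i]
        for _ in range(p - 1):
            rank1 = np.multiply.outer(rank1, A[i])   # build a_i^{(x)p} one mode at a time
        T += (1.0 + lam if i == 0 else 1.0) * rank1
    return T, A[0]

T, a1 = sample_instance(n=10, r=30, lam=0.1, seed=0)  # overcomplete: r > n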
Connection to decomposition.
While Problem 1.1 does not have formal implications for the canonical random tensor decomposition model (i.e., (1) with uniform on the sphere), it does have implications for a mild generalization of this problem.
Problem 1.2 (Semirandom tensor decomposition).
Let $p \ge 3$ be odd and $\lambda_1, \ldots, \lambda_r > 0$. In the semirandom tensor decomposition problem, we observe
$T = \sum_{i=1}^{r} \lambda_i\, a_i^{\otimes p},$
where the $\lambda_i$ are arbitrary and $a_1, \ldots, a_r$ are drawn uniformly and independently from the hypercube $\{\pm 1\}^n$. The goal is to recover $a_1, \ldots, a_r$ (up to re-ordering) as well as the corresponding $\lambda_i$’s, with high probability.
Note that any algorithm for Problem 1.2 can be used to solve Problem 1.1 (with the same parameters $n$, $r$, $p$): simply run the algorithm to recover all the pairs $(\lambda_i, a_i)$ and then output the $a_i$ with the largest corresponding $\lambda_i$. When $r \gg n^{p/2}$, our main result (Theorem 1.4) will give rigorous evidence for hardness of Problem 1.2 via a two-step argument: we will show LDP-hardness of Problem 1.1, justifying the conjecture that there is no poly-time algorithm for Problem 1.1; this conjecture, if true, formally implies that there is no poly-time algorithm for Problem 1.2.
We see Problem 1.2 as a natural variant of random tensor decomposition. In particular, we expect that existing algorithms [MSS16, DdL+22] can be readily adapted to solve this variant when $r \ll n^{3/2}$, at least for $p = 3$. As such, understanding the complexity of Problem 1.2 (via the connection to Problem 1.1 described above) can be viewed as the end-goal of this work. From this point onward, we will study Problem 1.1. In Section 1.5 we discuss possible extensions of our result that would more directly address the canonical model for random tensor decomposition.
Remark 1.3.
In line with the discussion in the Introduction, our results should really only be considered evidence for hardness of Problems 1.1 and 1.2 in the presence of noise. Indeed, if $\lambda$ is not an integer, there is a trivial algorithm for Problem 1.1 by examining the non-integer part of $T$ (the author thanks Bruce Hajek for pointing this out). This algorithm is easily defeated by adding Gaussian noise of small variance to each entry of $T$. This is a low level of noise, as we believe our low-degree upper bound tolerates a much larger noise level, akin to tensor PCA.
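To illustrate the remark, here is a toy Python sketch (our own, with illustrative parameters and assuming $0 < \lambda < 1/2$) of the trivial attack in the noiseless model: each diagonal entry $T_{jjj}$ equals $(1+\lambda)(a_1)_j$ plus an integer, so its offset from the nearest integer reveals $(a_1)_j$, while even mild Gaussian noise destroys this structure:

import numpy as np

rng = np.random.default_rng(0)
n, r, lam = 10, 30, 0.1
A = rng.choice([-1.0, 1.0], size=(r, n))
T = (np.einsum('ai,aj,ak->ijk', A, A, A)
     + lam * np.einsum('i,j,k->ijk', A[0], A[0], A[0]))   # (1+lam)*a_1^{x3} + sum_{i>=2} a_i^{x3}

# Each diagonal entry T[j,j,j] = (1+lam)*a_1[j] + (an integer), so for
# 0 < lam < 1/2 its offset from the nearest integer is exactly lam * a_1[j].
diag = np.einsum('jjj->j', T)
a1_hat = np.sign(diag - np.round(diag))
assert np.array_equal(a1_hat, A[0])

# Even mild Gaussian noise wipes out this fractional structure, which is why the
# remark treats the hardness claims as meaningful only in the presence of noise.
T_noisy = T + lam * rng.standard_normal(T.shape)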
1.2 Main Results
We now show how to cast Problem 1.1 in the LDP framework and state our main results. The class of algorithms we consider are (multivariate) polynomials of degree (at most) $D$ with real coefficients, whose input variables are the entries of $T$ and whose output is a real scalar value; we write $\mathbb{R}[T]_{\le D}$ for the space of such polynomials. We will be interested in whether such a polynomial can accurately estimate $x := (a_1)_1$, the first entry of the vector $a_1$. Following [SW22], define the degree-$D$ minimum mean squared error
$\mathrm{MMSE}_{\le D} := \inf_{f \in \mathbb{R}[T]_{\le D}} \mathbb{E}\big[(f(T) - x)^2\big],$
where the expectation is over $T$ and $a_1$ sampled as in Problem 1.1. Note that the above “scalar MMSE” is directly related to the “vector MMSE”: by linearity of expectation and symmetry among the coordinates,
$\inf_{f} \mathbb{E}\big[\|f(T) - a_1\|^2\big] = n \cdot \mathrm{MMSE}_{\le D},$ (2)
where the infimum is over $n$-tuples $f = (f_1, \ldots, f_n)$ of degree-$D$ polynomials. It therefore suffices to study the scalar MMSE, with $\mathrm{MMSE}_{\le D} \approx 0$ indicating near-perfect estimation and $\mathrm{MMSE}_{\le D} \approx 1$ indicating near-trivial estimation (no better than the constant function $0$).
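At very small scale, $\mathrm{MMSE}_{\le D}$ can be approximated numerically by regressing $x$ onto all monomials of degree at most $D$ in the tensor entries. The following rough Monte Carlo sketch (our own illustration; the parameters are arbitrary small values, and the held-out error is an upward-biased proxy for the true $\mathrm{MMSE}_{\le D}$) shows the idea:

import numpy as np
from itertools import combinations_with_replacement

def approx_mmse(n=3, r=5, lam=0.3, D=2, m=4000, seed=0):
    # Regress x = a_1[0] onto all monomials of degree <= D in the n^3 entries of T,
    # fitting on half the samples and reporting mean squared error on the other half.
    rng = np.random.default_rng(seed)
    monos = [c for d in range(D + 1)
             for c in combinations_with_replacement(range(n ** 3), d)]
    X, y = np.empty((m, len(monos))), np.empty(m)
    for s in range(m):
        A = rng.choice([-1.0, 1.0], size=(r, n))
        T = (np.einsum('ai,aj,ak->ijk', A, A, A)
             + lam * np.einsum('i,j,k->ijk', A[0], A[0], A[0]))
        t = T.ravel()
        X[s] = [np.prod(t[list(c)]) for c in monos]   # all monomials T^alpha with |alpha| <= D
        y[s] = A[0, 0]
    coef, *_ = np.linalg.lstsq(X[: m // 2], y[: m // 2], rcond=None)
    return np.mean((y[m // 2:] - X[m // 2:] @ coef) ** 2)

print(approx_mmse(D=1), approx_mmse(D=2))   # the error typically shrinks as D grows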
Theorem 1.4.
Fix any odd $p \ge 3$ and $\lambda > 0$. As in Problem 1.1, let
$T = (1 + \lambda)\, a_1^{\otimes p} + \sum_{i=2}^{r} a_i^{\otimes p},$
where each $a_i$ (for $i \in [r]$) is drawn uniformly and independently from the hypercube $\{\pm 1\}^n$. Let the degree $D$ grow with $n$ as for an arbitrary fixed .
-
•
(“Easy”) If then for a constant .
-
•
(“Hard”) If then for a constant .
More precise and more general statements of the lower bound (“hard” regime) and upper bound (“easy” regime) can be found in Section 2 (Theorems 2.1 and 2.2), where the case of even $p$ is also handled. The lower bound shows failure of LDP algorithms when $r \gg n^{p/2}$. Since no one has managed to find a poly-time algorithm in the LDP-hard regime for many other problems of a similar nature (including planted clique [Hop18], community detection [HS17], tensor PCA [HKP+17, KWB22], and more), this suggests that there is no poly-time algorithm for Problem 1.1 (and therefore also for Problem 1.2) when $r \gg n^{p/2}$, at least without a new algorithmic breakthrough. This is the first average-case hardness result of any type for tensor decomposition, and it gives a concrete reason to believe that the existing algorithms for $p = 3$ (which succeed when $r \lesssim n^{3/2}$ [MSS16, DdL+22]) are essentially optimal. Following the heuristic correspondence between polynomial degree and runtime in Hypothesis 2.1.5 of [Hop18] (see also [KWB22, DKWB19])—namely, degree $D$ corresponds to runtime $e^{\tilde{\Theta}(D)}$ where $\tilde{\Theta}$ hides factors of $\log n$—the lower bound suggests that runtime $e^{\tilde{\Omega}(D)}$ is required in the hard regime.
The upper bound shows that a polynomial of polylogarithmic degree successfully estimates $a_1$ in the regime $r \ll n^{p/2}$. Such a polynomial has $n^{\mathrm{polylog}(n)}$ terms and can thus be evaluated in quasi-polynomial time. This is in stark contrast to the runtime that we expect to need in the hard regime. We are confident that for $p = 3$ and $r \ll n^{3/2}$, Problems 1.1 and 1.2 can in fact be solved in polynomial time by adapting existing algorithms [MSS16, DdL+22], but we have not attempted this here as our main focus is on establishing the phase transition for low-degree polynomials. We consider the lower bound to be our main contribution, and the upper bound serves to make the lower bound more meaningful.
1.3 Discussion
1.3.1 Why the LDP framework?
Since there are many different frameworks for exploring statistical-computational gaps, we briefly discuss the possibility of applying some of the others to tensor decomposition, and explain why the LDP framework seems especially suited to this task.
Statistical query (SQ) model.
Sum-of-squares (SoS) hierarchy.
SoS is a powerful algorithmic tool, and SoS lower bounds (e.g. [BHK+19, KMOW17, RSS18]) are sometimes viewed as a gold standard in average-case hardness. However, it is important to note that SoS lower bounds show hardness of certification problems, whereas Problems 1.1 and 1.2 are recovery problems. Hardness of certification does not necessarily imply hardness of recovery, as discussed in [BKW20, BMR21, BBK+21]. Therefore, while there are some natural certification problems related to tensor decomposition that would be nice to have SoS lower bounds for (e.g. Open Problem 7.2 in [Kun22]), it is not clear how to use an SoS lower bound to directly address the decomposition problem.
Optimization landscape.
Prior work [AGJ15, GM17] has studied the landscape of a certain non-convex optimization problem related to tensor decomposition, leading to some algorithmic results. In other settings, recent work has studied structural properties of optimization problems as a means to prove failure of certain algorithms (e.g., local search, Markov chains, and gradient descent [GZ17, BGJ20, GJS21, BWZ20]) to recover a planted structure. If results of this type were obtained for tensor decomposition, they would complement ours by ruling out different types of algorithms. One caveat is that recent work on tensor PCA [RM14, BGJ20, MKUZ19, WEM19, BCRT20] and planted clique [GZ19, CMZ22] has revealed that the choice of which function to optimize can be very important. Thus, “hardness” of one particular canonical optimization landscape need not indicate inherent hardness of the recovery problem.
Reductions.
Reductions from planted clique (e.g. [BR13, HWX15, BBH18, BB20]) are among the strongest forms of average-case hardness that exist for statistical problems, and it would be great to have reduction-based evidence for hardness of tensor decomposition. This would likely require new ideas beyond those in the current literature, as Problem 1.1 is somewhat different in structure from planted clique and the problems it has been reduced to thus far. These differences are explained in Section 1.3.2 below.
In summary, there are many complementary approaches that could be explored in future work in order to give new perspectives on hardness of tensor decomposition. In this work we have chosen to pursue the LDP framework, some strengths of which include its stellar track record of predicting statistical-computational gaps in the past, and its flexibility to directly address the decomposition problem (Problem 1.2) rather than a related certification or optimization problem.
1.3.2 Difficulties of tensor decomposition
Although the suspected threshold $r \approx n^{3/2}$ for random tensor decomposition has been around for some time now [GM15, MSS16], and despite the many recent successes in computational lower bounds for other statistical problems, there are (prior to this work) effectively no results that point to whether $n^{3/2}$ is a fundamental barrier or not. Here we will discuss what makes tensor decomposition more difficult to analyze than the other statistical-computational gaps that have received recent attention in the literature.
As a point of comparison, we briefly introduce the tensor principal component analysis (tensor PCA) problem [RM14]: here we observe an order-$p$ tensor $Y = \lambda u^{\otimes p} + G$ where $\lambda > 0$ is a signal-to-noise ratio, $u \in \mathbb{R}^n$ is drawn uniformly from the unit sphere, and $G$ is a tensor with i.i.d. $\mathcal{N}(0,1)$ entries; the goal is to (approximately) recover the vector $u$ (up to sign, if $p$ is even). Tensor PCA also has a statistical-computational gap, and it is now well-understood: the best known poly-time algorithms require $\lambda \gtrsim n^{p/4}$, and there are matching lower bounds in both the SoS and LDP frameworks [HSS15, HKP+17, PR20, KWB22]. While tensor decomposition and tensor PCA may look superficially similar, tensor decomposition is much harder to analyze for the reason described below.
With a few exceptions ([SW22, KM21b, KM21a]), essentially all existing lower bounds for recovering a planted signal in the SQ, SoS, or LDP frameworks leverage hardness of testing between a “planted” distribution and a “null” distribution, where the null distribution has independent entries. In the case of tensor PCA, in order to show hardness of recovering $u$ when $\lambda \ll n^{p/4}$, the SoS and LDP lower bounds crucially leverage the fact that it is hard to even detect the planted structure in this regime, that is, it is hard to distinguish between a spiked tensor $\lambda u^{\otimes p} + G$ and an i.i.d. Gaussian tensor $G$. Now for the decomposition problem, we are hoping to show hardness of decomposition whenever $r \gg n^{p/2}$, but the problem of distinguishing between a random rank-$r$ tensor and the appropriate Gaussian tensor (namely, the Gaussian tensor matching moments of order 1 and 2 with the rank-$r$ tensor; one could choose a different null distribution, but the analysis may be difficult; for instance, see Section 1.5) is actually easy throughout the regime of interest; this can be achieved by thresholding a particular degree-4 polynomial in the tensor entries, where each monomial forms a double cover of its elements of $[n]$. This “detection-recovery gap” creates difficulty because the standard tools for proving lower bounds cannot show hardness of recovery unless detection is also hard. The tools with this limitation include the SDA (statistical dimension on average) approach for SQ [FGR+17], the pseudo-calibration approach for SoS [BHK+19], and the low-degree likelihood ratio [HS17, HKP+17, Hop18, KWB22] (as well as its conditional variants [BEH+22, CGH+22]) for LDP. Reductions from planted clique also typically require hardness of detection [BBH18].
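To illustrate why detection does not become hard at the conjectured recovery threshold, the following sketch evaluates one natural degree-4 statistic of this flavor, namely the sum over pairs of distinct slices of the squared inner product between slices; every monomial in its expansion covers each of its indices an even number of times. We have not verified that this is exactly the polynomial alluded to in the text, so it should be read as an illustrative candidate; empirically it is much larger for a random rank-$r$ tensor than for a variance-matched Gaussian tensor:

import numpy as np

def double_cover_stat(T):
    # sum over slice pairs k != l of <T[:,:,k], T[:,:,l]>^2; each monomial
    # T_ijk T_ijl T_i'j'k T_i'j'l covers every index value an even number of times.
    M = np.einsum('ijk,ijl->kl', T, T)
    np.fill_diagonal(M, 0.0)
    return np.sum(M ** 2)

rng = np.random.default_rng(0)
n, r, lam = 15, 70, 0.2                        # n^{3/2} ~ 58 < r < n^2 = 225
sigma = np.sqrt((1 + lam) ** 2 + (r - 1))      # match the per-entry variance of the planted model
for _ in range(3):
    A = rng.choice([-1.0, 1.0], size=(r, n))
    T_planted = (np.einsum('ai,aj,ak->ijk', A, A, A)
                 + lam * np.einsum('i,j,k->ijk', A[0], A[0], A[0]))
    T_null = sigma * rng.standard_normal((n, n, n))
    print(double_cover_stat(T_planted), double_cover_stat(T_null))   # planted is typically much larger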
The method of [SW22] overcomes this challenge in some settings and gives LDP lower bounds for recovery problems even in regimes where detection is easy. This method applies to problems of the form (roughly speaking) “signal plus noise” where the noise has independent entries. For instance, this gives LDP-hardness of recovery for variants of tensor PCA with detection-recovery gaps [LZ22]. However, tensor decomposition does not take this “signal plus noise” form: Problem 1.1 has a “signal” term $(1+\lambda)\, a_1^{\otimes p}$ but the remaining “noise” term $\sum_{i=2}^{r} a_i^{\otimes p}$ is a rank-$(r-1)$ tensor, which does not have independent entries. As a result, we will need to introduce a new method in order to prove a lower bound for tensor decomposition. The proof strategy is outlined in Section 1.4.1 below.
1.4 Proof Techniques
1.4.1 Lower bound
Instead of mean squared error it will be convenient to work with an equivalent quantity, the degree-$D$ maximum correlation
$\mathrm{Corr}_{\le D} := \sup_{f \in \mathbb{R}[T]_{\le D}} \frac{\mathbb{E}[f(T)\, x]}{\sqrt{\mathbb{E}[f(T)^2]}}.$
This is directly related to $\mathrm{MMSE}_{\le D}$ via the identity $\mathrm{MMSE}_{\le D} = 1 - \mathrm{Corr}_{\le D}^2$ (see [SW22, Fact 1.1]), so our objective is to show $\mathrm{Corr}_{\le D} = o(1)$ when $r \gg n^{p/2}$ and $D$ is in the regime of Theorem 1.4.
A first attempt is to compute $\mathrm{Corr}_{\le D}$ explicitly. Expand an arbitrary degree-$D$ polynomial in the monomial basis: $f(T) = \sum_{\alpha} c_\alpha T^\alpha$, where $\alpha$ ranges over multi-sets of tensor entries of size at most $D$ and $T^\alpha := \prod_{e \in \alpha} T_e$. The numerator of $\mathrm{Corr}_{\le D}$ is a linear functional of the coefficient vector $c$:
$\mathbb{E}[f(T)\, x] = \sum_{\alpha} c_\alpha\, \mathbb{E}[x\, T^\alpha] =: \langle c, v \rangle.$
For the denominator, $\mathbb{E}[f(T)^2]$ is a quadratic form:
$\mathbb{E}[f(T)^2] = \sum_{\alpha, \beta} c_\alpha c_\beta\, \mathbb{E}[T^\alpha T^\beta] =: c^\top M c.$
This means we have the explicit formula
$\mathrm{Corr}_{\le D}^2 = \max_{c \ne 0} \frac{\langle c, v \rangle^2}{c^\top M c} = v^\top M^{-1} v.$
The difficulty here is that, while we can write down an explicit formula for the vector $v$ and matrix $M$, it does not seem tractable to write down an explicit formula for the inverse matrix $M^{-1}$. We will instead find an upper bound on $\mathrm{Corr}_{\le D}$ that is manageable to work with.
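Although inverting $M$ analytically is intractable, at very small scale $v$, $M$, and hence $\mathrm{Corr}_{\le D}$ can be computed by brute force, which makes the formula concrete. In the sketch below (a toy illustration with our own parameter choices), the expectations defining $v_\alpha = \mathbb{E}[x\, T^\alpha]$ and $M_{\alpha\beta} = \mathbb{E}[T^\alpha T^\beta]$ are exact because all component sign patterns are enumerated:

import numpy as np
from itertools import combinations_with_replacement, product

n, r, lam = 3, 2, 0.5
entries = n ** 3
for D in (1, 2):
    monos = [c for d in range(D + 1)
             for c in combinations_with_replacement(range(entries), d)]
    v = np.zeros(len(monos))
    M = np.zeros((len(monos), len(monos)))
    patterns = list(product([-1.0, 1.0], repeat=n * r))       # all 2^(n*r) sign patterns of the a_i
    for s in patterns:
        A = np.array(s).reshape(r, n)
        T = (np.einsum('ai,aj,ak->ijk', A, A, A)
             + lam * np.einsum('i,j,k->ijk', A[0], A[0], A[0]))
        t = T.ravel()
        F = np.array([np.prod(t[list(c)]) for c in monos])     # monomials T^alpha with |alpha| <= D
        v += A[0, 0] * F / len(patterns)
        M += np.outer(F, F) / len(patterns)
    corr_sq = v @ np.linalg.pinv(M) @ v                        # Corr_{<=D}^2 = v^T M^+ v
    print(D, 1.0 - corr_sq)                                    # MMSE_{<=D} = 1 - Corr_{<=D}^2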
Our difficulties stem from the fact that we do not have an orthogonal basis of polynomials to work with. After all, if were orthogonal polynomials—i.e., if were a diagonal matrix—then we would have no problem inverting . Since the distribution of is complicated (in particular, it is not a product measure), it seems difficult to construct an explicit basis of orthogonal polynomials. Instead, we will think of as a function of the underlying i.i.d. Rademacher random variables and work with an orthogonal basis of polynomials in those variables. For any there is an associated polynomial such that , and we can expand these as
where , , and is some vector of coefficients. Here is the standard Fourier basis for functions on the hypercube, which crucially does have the desired orthogonality property: . As a result, we can express the denominator of as simply
To exploit this, we will need to understand the relation between and the corresponding . We will write down an explicit linear transformation, i.e., a matrix such that . The crux of our new strategy is that to obtain an upper bound on it suffices to construct a left-inverse for , i.e., a matrix such that . To see why this suffices,
where in the inequality step, plays the role of except we have relaxed the problem to allow to be any vector, not necessarily in the image of .
In light of the above, our remaining task is to construct an explicit left-inverse . There are many possible choices, some of which will yield better bounds on than others. We will need one that yields a good bound but is also analytically tractable to write down explicitly. We now explain the intuition behind our construction for . Note that the left-inverse property is equivalent to whenever . That is, is a procedure to recover the (coefficients of the) polynomial when given the (coefficients of the) polynomial that satisfies . Luckily there is a methodical process for this task which starts from the highest-degree terms and works downward. At each stage, there is always a particular monomial in whose coefficient in allows us to immediately deduce some particular coefficient in . To illustrate, if and then the coefficient of the monomial
(3) |
in allows us to immediately deduce the coefficient of the monomial in , since this is the only monomial in (of degree at most ) whose expansion in terms of contains the term (3). Once we have deduced the highest-degree coefficients of , we can subtract off the corresponding terms in and repeat the process recursively. Our choice of is inspired by this process; the full details can be found in Section 3.4.
Recall now that the final step is to bound for our choice of . Due to the recursive structure of , this boils down to analyzing certain recursively-defined values (see Section 3.6). As above, is a multi-set of tensor entries, which can be thought of as a hypergraph on vertex set (with a hyperedge for each tensor entry). The value is computed by summing over all possible ways to reduce to a smaller hypergraph by replacing some collection of hyperedges by a new hyperedge equal to their symmetric difference. Analyzing this recurrence is the most technically involved part of the proof, since there are subtle cancellations that need to be understood in order to get the right bound.
1.4.2 Upper bound
To construct a polynomial that accurately estimates the largest component, we take inspiration from a line of algorithmic work that builds spectral methods from tensor networks [HSSS16, MW19, DdL+22, OTR22]. A tensor network is a diagram that specifies how to “multiply” together a collection of tensors to form another tensor; see [MW19] for details. Some existing algorithms for order-3 tensor decomposition [HSSS16, DdL+22] (including the fastest known method that reaches the threshold [DdL+22]) are based on the idea of building a tensor network from multiple copies of the input tensor, flattening the result to a matrix, and running a spectral method on this matrix. Morally speaking, these are all steps that one should be able to capture using low-degree polynomials, but the algorithm of [DdL+22] is actually a multi-stage process that seems challenging to write as a single polynomial. Also, since our goal is to estimate the largest component, we do not need to inject randomness to break symmetry among the components as in [DdL+22].
To construct our polynomial, we multiply copies of together in a tensor network whose shape is an expander graph. The output is a single vector, which we take as our estimate for . There is one key trick we use to greatly simplify the analysis: we deviate slightly from the standard notion of a tensor network by disallowing repeated “edge labels.” The precise definition of the polynomial appears in Section 4.2.
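To make the notion of a tensor network concrete, here is a toy contraction of three copies of an order-3 tensor along a triangle-shaped network, followed by a flattening to a matrix. This only illustrates the general mechanism (copies of $T$ glued along shared indices and then flattened), not the expander-shaped network or the edge-label restriction used in Section 4.2:

import numpy as np

rng = np.random.default_rng(0)
n = 8
T = rng.standard_normal((n, n, n))

# Triangle-shaped network: three copies of T, each pair of copies sharing one
# contracted index (a, b, c); the free indices (i, j, k) survive as output modes.
N = np.einsum('iab,jbc,kca->ijk', T, T, T)

# Flattening one mode against the other two gives an n-by-n^2 matrix, the kind of
# object that spectral methods built from tensor networks operate on.
M = N.reshape(n, n * n)
print(M.shape, np.linalg.svd(M, compute_uv=False)[:3])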
1.5 Future Directions
We collect here a list of possible extensions of our main result and related open problems.
-
•
Distribution of : We have taken drawn uniformly from the hypercube in the interest of keeping the proofs as clean as possible. We expect our approach could be adapted to other distributions for such as .
-
•
Canonical model: While we feel that the semirandom tensor decomposition model (Problem 1.2) is quite natural, there may be interest in showing hardness for the more canonical model where every is equal to 1. One way to approach this could be to study the task of hypothesis testing between a random rank- tensor and a random rank- tensor. Hardness (for poly-time algorithms) of this testing problem implies hardness of decomposition in the canonical model. It may be possible to use our proof technique to show LDP-hardness of this testing problem. Another alternative approach could be to consider a variant of Problem 1.1 where symmetry between the tensor components is broken by giving the algorithm a small amount of side-information rather than increasing the norm of one component.
-
•
Tensors with algebraic structure: Hopefully the proof techniques introduced here will open the door to other hardness results for related problems. For instance, orbit recovery problems—a class of statistical problems involving group actions [BBK+17, FSWW20]—give rise to variants of tensor decomposition with algebraic structure. One example is multi-reference alignment (MRA), where the tensor components are cyclic shifts of one another [PWB+19]. A statistical-computational gap in heterogeneous MRA was conjectured in [BBLS18]; the positive side was resolved in [Wei18] using [MSS16], and the negative side should now be approachable using our techniques. Tensors with continuous group symmetry are even less well understood [MW19, LM21].
-
•
Smoothed order-3 tensor decomposition: For , the algorithms achieving the optimal condition seem to crucially exploit randomness in the components [MSS16, DdL+22]. In contrast, other algorithms succeed in the smoothed analysis model (which imposes minimal assumptions on the components) but require [GVX14, BCMV14]. It remains unclear whether better algorithms for the smoothed model exist, or whether there is an inherent gap between the random and smoothed settings. For there is no such gap, with both random and smoothed algorithms achieving [DCC07, MSS16].
-
•
Other LDP estimation lower bounds: Proving LDP lower bounds for estimation problems remains a difficult task in many settings, with relatively few tools available compared to detection (hypothesis testing) problems. A few open problems are to resolve the LDP recovery threshold for sparse linear regression and group testing; these problems have detection-recovery gaps and the LDP detection thresholds were resolved in [BEH+22, CGH+22].
2 General Setting and Results
It will be convenient to generalize Problem 1.1 in a few different ways. First, we allow all the components to have potentially different norms. This way, there is nothing distinguished about the first component, eliminating some tedious casework. To this end, we will consider tensors of the form
$T = \sum_{i=1}^{r} \lambda_i\, a_i^{\otimes p}$ (4)
where each component is drawn uniformly and independently from the hypercube , and are deterministic nonzero scalars that are known to the algorithm. Without loss of generality we assume . The goal is to recover the largest component . The case is equivalent to Problem 1.1.
Second, we will potentially allow the algorithm access not just to an order- tensor but also to tensors of lower order. For write
(5) |
with and as above, and where for a vector , . Note that since the have -valued entries, each entry of the tensor (4) takes the form for some .
Let be a collection of subsets of . We consider the task of estimating using a degree- polynomial whose input variables are . Accordingly, we define
To make our lower bound for order- tensors as strong as possible, we will choose . This is equivalent to giving the algorithm access to all the tensors
(which is a situation that often arises when using the method of moments, e.g. [HK13, PWB+19]). Note that a lower bound in this setting implies a lower bound in the original setting where only the order- tensor is revealed.
To make our upper bound as strong as possible, we will give the algorithm access to minimal information. For odd, we will take , that is, our polynomial only needs access to the “off-diagonal” entries of the order- tensor ( where are all distinct). For even, the order- tensor alone does not suffice to disambiguate between and , so we additionally reveal the (off-diagonal part of the) order- tensor, that is, we take . (An alternative formulation for even could be to reveal only the order- tensor and ask the algorithm to estimate ; we expect this could be analyzed using our methods.)
The following are non-asymptotic statements of our main results. Together, these immediately imply Theorem 1.4.
Theorem 2.1 (Lower bound).
Consider the setting described above with any parameters , , , and set . If
(6) |
then
In particular, if is fixed, for fixed , and grows as for fixed , then for a constant .
Theorem 2.2 (Upper bound).
Consider the setting described above with any . Let be odd, and set . Suppose for some constant . Let be the smallest odd integer such that
If and
(7) |
then
(8) |
In particular, if is fixed, for fixed , and grows as for fixed , then for a constant .
Note that (7) is the bottleneck, requiring . The intermediate step in (8) is a sharper bound on the MMSE that we use for the remark below, but it does not dictate the threshold for .
Remark 2.3 (Exact recovery).
We remark that if then exact recovery of is possible with probability by thresholding the values of certain degree- polynomials . To see this: combining (2) with Markov’s inequality, we have such that with probability . Furthermore, thresholding is guaranteed to exactly recover whenever .
3 Proof of Theorem 2.1: Lower Bound
3.1 Setup
Throughout, define . Our goal will be to give an upper bound on
This will imply the desired result due to the following direct relation between and the associated MMSE.
Fact 3.1 ([SW22, Fact 1.1]).
.
Any degree- polynomial admits an expansion of the form
for some coefficients , where takes values with and .
At the same time, can be thought of as a function of , the matrix with columns . This means we can also expand as
where ranges over subsets of cardinality . This expansion will be useful because, since has i.i.d. Rademacher entries, forms an orthonormal basis in the sense . As a consequence,
(9) |
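As a quick sanity check of the orthogonality property behind (9), the following snippet verifies by exact enumeration that the Fourier characters $\chi_S(x) = \prod_{i \in S} x_i$ of i.i.d. Rademacher variables satisfy $\mathbb{E}[\chi_S \chi_{S'}] = \mathbb{1}[S = S']$ (a toy check in our own notation, not part of the proof):

import numpy as np
from itertools import product, combinations

m = 4                                                        # number of Rademacher variables
points = np.array(list(product([-1.0, 1.0], repeat=m)))      # all 2^m equally likely sign patterns
subsets = [S for d in range(m + 1) for S in combinations(range(m), d)]

def chi(S, x):
    # chi_S(x) = prod_{i in S} x_i, evaluated on each row of x
    return np.prod(x[:, list(S)], axis=1) if S else np.ones(len(x))

G = np.array([[np.mean(chi(S, points) * chi(Sp, points)) for Sp in subsets] for S in subsets])
assert np.allclose(G, np.eye(len(subsets)))                  # E[chi_S chi_S'] = 1{S = S'}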
Any vector of coefficients induces a unique choice of such that
(10) |
It will be important to understand this mapping from to .
3.2 Proof Overview
We will show that the mapping from to in (10) is a linear transformation that takes the form for an explicit matrix . A key step in the proof will be to construct an explicit left inverse for , that is, a matrix satisfying . In other words, is a matrix that recovers the coefficients from the coefficients : .
The numerator of can be expressed as
where the vector is defined by
(11) |
Using (9), the denominator can be expressed as . This means we can write
(12) |
In the crucial inequality above, plays the role of except we have relaxed the problem to allow to be any vector, not necessarily in the image of . After this simplification, the optimizer for is (or any scalar multiple thereof), yielding the explicit expression for the optimum. So long as we can construct a left inverse , this gives an upper bound on . While many choices for are possible, we will need to find one that (i) is simple enough to work with explicitly, and (ii) results in a good bound on at the end of the day.
3.3 Computing
The first step in writing down an explicit expression for will be to express in the basis .
Definition 3.2 (List notation for ).
We will identify with a multi-set, namely the multi-set containing copies of each . With some abuse of notation, write as an ordered list containing the elements of the associated multi-set sorted according to some arbitrary but fixed ordering on . For a labeling , define to be the subset containing all pairs with the following property: the element occurs in an odd number of the sets . (Informally, is produced by placing each into column of an grid and then XOR’ing the contents of each column.)
With this notation we can write
(13) |
where .
Next we consider an arbitrary vector of coefficients and write down an expression for the corresponding in (10). We have
In other words, where
(14) |
3.4 Constructing the Left Inverse
We now construct a left inverse for , which, recall, is needed to apply the bound (12). The intuition behind this construction is described in Section 1.4.1.
Definition 3.3 (Generic ).
We call generic provided that every column satisfies , and at most columns satisfy .
These are the “generic” terms appearing in the expansion (13): is generic if and only if there exist with and with distinct entries such that . (Note that (6) implies , so it is possible for to have distinct entries.) Furthermore, if is generic then there is a unique corresponding , namely defined as follows.
Definition 3.4.
For a generic , define by letting be the number of columns for which .
When viewing as a multi-set as per Definition 3.2, is simply the multi-set of columns . Recalling the definition , note that denotes the number of non-empty columns of .
Our left inverse will satisfy whenever is not generic; in other words, our procedure for recovering from only uses the values for which is generic. Write in block form
where (“generic”) is indexed by generic ’s and (“not”) is indexed by the rest. It suffices to construct a left inverse for and then set
(15) |
Note that has the recursive structure
where the first block of columns is indexed by and the second is indexed by , and the first block of rows is indexed by and the second by . Crucially, the lower-left block is 0: recall (14) and note that is only possible when . Given any left inverses for respectively, one can verify that the following matrix is a valid left inverse for :
(16) |
The left inverse can be constructed by applying (16) recursively. The matrix has only one nonzero entry per row and so we will be able to construct by hand.
Lemma 3.5.
The following is a valid left inverse for : for generic and , define by
(17) |
where
and
(18) |
Proof.
We need to verify . For ,
where the last step is justified as follows: since , the indicator can only be satisfied when has distinct entries. There are such ’s, and for each there is exactly one term in the first sum satisfying , namely , and this implies and . This means the indicator becomes , and the other factors cancel. This completes the proof. ∎
This completes the description of the left inverse .
3.5 Recurrence for
Ultimately we are interested in an expression not for but for the vector appearing in (12). We will use the calculations from the previous section to write down a self-contained recursive formula for the entries of . Note that from (15), whenever is not generic, so we will focus on computing . Using (16),
where is indexed by and is indexed by . The expression for reveals that does not depend on (so long as ). By comparing the expressions for and we can write
For any generic , the case of the above gives
We treat the two terms separately. Using the definition (17) for ,
(I)
Now for the second term (II), suppressing the constraints on for ease of notation,
(II)
Putting it together, we have now shown
(19) |
where, for generic , the are defined by the recurrence
(20) |
3.6 Recurrence for
It will be convenient to rewrite the recurrence (20) in terms of a different quantity indexed by instead of , namely
(21) |
It can be seen from (20) that is well-defined in the sense that it does not depend on the choice of in (21). In particular,
Using (21) we can turn this into a self-contained recurrence for (not involving ):
(22) |
where the predicate in particular requires to be generic. For any there are at most corresponding generic ’s for which . For any such , we have from (21) that . This means (19) gives
(23) |
where is defined by the recurrence (22).
3.7 Grouping by Patterns
The core challenge remaining is to analyze the recurrence (22) and establish an upper bound on . Naively bounding the terms in (22) will not suffice, as there are subtle cancellations that occur. To understand the nature of these cancellations, we will rewrite (22) in a different form. First we will group the terms in (22) by their type, defined as follows. We use to denote the XOR (symmetric difference) operation on sets.
Definition 3.6.
Fix , viewed as a list as per Definition 3.2. Define a pattern to be an element of satisfying the following rules:
-
(i)
“No singletons”: if then there must exist such that .
-
(ii)
“Not all stars”: there must exist such that .
-
(iii)
“Every column valid”: for every , we have .
Let denote the set of all such patterns. We let and define to have th column for all .
Note that if has no stars, it is simply a labeling , in which case the definitions and coincide. Intuitively, a pattern describes a class of possible labelings , where the stars are “wildcards” that may represent any element of (subject to some restrictions). More formally, we now define which ’s belong to a pattern .
Definition 3.7.
Fix , viewed as a list . For and , write if the following conditions hold:
-
(i)
For each , if then .
-
(ii)
The “starred” columns are distinct.
-
(iii)
For every we have .
Let denote the set of labelings for which there exists such that and ; these are the ’s that contribute to the sum in (22). For any , there is at least one (and possibly more than one) pattern for which . The following inclusion-exclusion formula will allow us to sum over in a way that counts every exactly once.
Lemma 3.8.
Fix , viewed as a list . Fix a function . For and , define
and
Then we have
(24) |
Proof.
First note that implies and so there are no “extra” terms on the right-hand side that are not present on the left-hand side. For any fixed , the term appears exactly once on the left-hand side of (24). Our goal is to show that it also appears exactly once on the right-hand side, that is, it suffices to prove
For a fixed , we need to enumerate the possible patterns for which . The rules for these patterns are as follows:
-
•
(Case 1) For any , if there is exactly one for which then . In this case, .
-
•
(Case 2) For any , if then there are no stars among . In this case, .
-
•
(Case 3) For any not in Case 1, if then among there are either no stars or exactly one star of the form where and . If there are no stars then . There are ways to have one star, and each yields .
Now we have
as desired. ∎
Using Lemma 3.8, we can rewrite (22) as
The number of labelings such that is where is the number of columns for which , and is the number of stars in (i.e., the number of indices such that ). Note that the ratio depends only on (not on ), and so we define . Also, the predicate depends only on (not on ), and we write this predicate as . With this notation, the recurrence for becomes
(25) |
3.8 Unravelling the Recurrence
Next we will rewrite (25) in closed form (without recursion) by expanding the recursion tree as a sum over “paths.”
We first unpack the definition (11) for . Using (13) and recalling that has i.i.d. Rademacher entries,
(26) |
By expanding the recursion tree of (25) we can write as a sum over paths which we denote by
(27) |
Formally, a path consists of and (which then determine ) with for all , for , and (“no stars at the final step”) such that the following properties hold:
-
•
For all , we require and the predicate holds.
-
•
For the final step we require .
With this notation, (25) can be written as
(28) |
where the special rule for the final step comes from (26).
3.9 Excluding Bad Paths
The next step is the crux of the argument: we will show that only certain types of paths contribute to (28), due to cancellations among the remaining paths.
Definition 3.9 (Event).
For a path of the form (27), we say an event occurs at timestep on column if there exists for which .
Note that Definition 3.6 requires every timestep to have an event on at least one column .
Definition 3.10 (Deletion event).
An event at timestep on column is called a deletion event if .
Definition 3.11 (Good/bad paths).
A path of the form (27) is called bad if there exists a column such that the last event (i.e., with the largest ) on that column is a deletion event. If a path is not bad, it is called good.
Lemma 3.12.
The total contribution from bad paths to (28) is 0. That is,
(29) |
Proof.
We will show that the bad paths can be paired up so that within each pair, the two paths contribute the same term to (28) but with opposite signs. The pairing is described by the following procedure, an involution that maps each bad path to its partner (and vice versa).
-
(1)
Given a bad path as input, let denote the largest column index for which the last event is a deletion event. Let denote the timestep on which this deletion event occurs.
-
(2a)
If there exists another event at timestep (on some column ), “promote” the deletion event to its own timestep. That is, replace
where are defined as follows. For all such that , set ; for all other , set . Now (viewed as a list) is produced from by removing the elements for which ; similarly, is produced from by removing the elements that are equal to .
-
(2b)
If instead there is no other event at timestep (which cannot happen if due to the non-deletion event on column 1), “merge” timestep with the subsequent timestep. That is, replace
where is defined as follows. For all such that , set . The number of remaining entries of is exactly , and we designate these entries as “unassigned”; for each , set the th unassigned entry of to be . Note that since the event was the last event in its column, the merge operation will not cause it to “collide” with another event.
A few claims remain to be checked before the proof is complete. First note that the above procedure maps any bad path to a different bad path (its “partner”), and applying the procedure again on the partner recovers the original path. For instance, if applying the procedure to the original path resulted in a “promote” operation on column , applying the procedure on the partner will undo this using a “merge” operation on the same column .
We furthermore claim that for any bad path and its partner, both paths have the same value for the factor in (28). However, the lengths of the two paths differ by one, causing the two corresponding terms in (28) to cancel due to the factor. To prove the claim, compare the cases and from (2a) above, where for now we assume . For the factors, note that whenever has either a deletion event on column or no event on column , and so . For the factors, note that and ; also, and , so cancels with the existing factor of in (28). Finally, for the factors we have . The other case versus is treated similarly, where now we have , , and . This completes the proof. ∎
3.10 Bounding
Now that we have identified the crucial cancellations between pairs of bad paths, the rest of the proof will follow by bounding the terms in (29) in a straightforward way. We start by collecting some bounds on the individual pieces of (29).
Lemma 3.13.
For any step , .
Proof.
Note that
The number of stars plus the number of distinct columns in must be at least . This leaves at most entries of that repeat a previous column, i.e.,
(30) |
This means
where the final step uses (30). ∎
Lemma 3.14.
For any step , .
Proof.
Recall for any such that . Recall that is the product of over the non-empty columns of . For any such non-empty column , there must exist for which . Thus, every factor of in the denominator of is cancelled by the numerator, and the result now follows because . ∎
We now state the main conclusion of this section.
Lemma 3.15.
For any with , we have .
Note that for , it can be verified directly that .
Proof.
Proceed by induction on . We will bound the sum of absolute values of terms in (29); it will no longer be important to exploit cancellations between positive and negative terms. First consider paths of the form . There is at most one such path that is good, namely ; the value of this term is .
All remaining paths take the form where for some . In order to produce , must have exactly entries that are either stars (i.e., ) or first in a “combination” event (i.e., for some with , is the lowest index such that ). There are choices for which elements of play these roles; by default they will all be stars, and will be converted to “combinations” if another entry of decides to join them on the same column.
Now there are entries of remaining, which have a few options. One option is to participate in a combination event by joining one of the previously designated entries on the same column. The other option is to participate in a deletion event. Since we are only counting good paths, this can only happen on a column on which a later event will occur. Regardless of the remainder of the path , there are at most such columns available. Since each of entries of has at most choices, this gives a total of at most choices.
We also need to decide which columns the combination events occur on. If has stars and combination events, there are choices for the columns. Note that this exactly cancels the factor in (29) because and .
3.11 Putting it Together
We now combine (23) with Lemma 3.15 to complete the proof of Theorem 2.1. We first note that only when the elements of (viewed as a multi-set) together with form an even cover of .
Lemma 3.16.
Let denote the property that for . If fails to hold then .
Proof.
From (26), note that if then for some , which implies for . Thus whenever fails.
Also note that if fails and for some , then fails. The result now follows from (22) using induction on . ∎
Lemma 3.17.
For any , the number of multi-sets with such that holds is at most .
Proof.
Since holds, together with forms an even cover of . Therefore the number of elements of covered by is at most . The number of ways to choose this many elements is at most . Once these are chosen, has at most elements to draw from, so the number of possibilities for is at most . ∎
4 Proof of Theorem 2.2: Upper Bound
4.1 Expander Graphs
We begin by collecting some standard properties of expander graphs.
Proposition 4.1.
Fix an integer . For all even exceeding a constant , there exists a -regular -vertex (simple) graph with the following properties:
-
•
the minimum cut value is (achieved by a single vertex), and
-
•
for any with , ,
where is the set of edges with exactly one endpoint in , and is an absolute constant.
Proof.
It follows from classical results that a uniformly random -regular -vertex graph has these properties with high probability, i.e., probability as with fixed. Letting be such a graph, it is well-known that is -connected with high probability [Bol98, Section 7.6], which proves the first statement about the minimum cut.
For the second statement, let denote the eigenvalues of the adjacency matrix of . Friedman’s second eigenvalue theorem [Fri08] states that for any fixed , with high probability. Cheeger’s inequality [Dod84, AM85] (see Theorem 2.4 of [HLW06]) tells us that for any with , . Combining these gives
which concludes the proof for any choice of satisfying . The expression is minimized (over ) when . ∎
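Both ingredients of this proof can be checked numerically on small examples. The sketch below (using networkx, with the degree d and vertex count m chosen by us for illustration) draws a random d-regular graph, compares its second adjacency eigenvalue to 2*sqrt(d-1), and computes the worst edge-expansion ratio over all subsets of at most half the vertices by brute force:

import numpy as np
import networkx as nx
from itertools import combinations

d, m = 4, 14                                   # degree and (even) number of vertices
G = nx.random_regular_graph(d, m, seed=0)

# Second-largest eigenvalue of the adjacency matrix: for a random regular graph this
# is close to 2*sqrt(d-1) (Friedman's theorem, asymptotically in the graph size).
eigs = np.sort(np.linalg.eigvalsh(nx.to_numpy_array(G)))[::-1]
print("lambda_2 =", eigs[1], "   2*sqrt(d-1) =", 2 * np.sqrt(d - 1))

# Edge expansion by brute force: every subset S with |S| <= m/2 should have many
# edges leaving it (at least a constant fraction of |S|, per Cheeger's inequality).
worst = min(nx.cut_size(G, S) / len(S)
            for size in range(1, m // 2 + 1)
            for S in combinations(G.nodes, size))
print("min |E(S, S^c)| / |S| =", worst)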
4.2 Constructing the Polynomial
Let where is defined as in Theorem 2.2, choosing large enough so that . From this point onward, let denote the -regular -vertex graph guaranteed by Proposition 4.1. We construct a new graph as follows. Starting with , add two additional vertices called and , and add the edge . Recall that is the odd element of . Choose arbitrary edges of with no endpoints in common. Delete these edges and add the edges . This completes the description of . Note that is -regular aside from the degree-1 vertex and the degree- vertex .
Definition 4.2.
Define an edge-labeling of to be a function that is injective (no two edges get the same label) with . Let denote the set of all edge-labelings of .
For an edge-labeling and a vertex , define to be the following entry of the input tensor: let be the edges incident to (where ) and then let . Our polynomial estimator is defined as follows:
(31) |
4.3 Vertex Labelings
Definition 4.3.
Define a vertex-labeling of to be a function . Let denote the set of all vertex-labelings of .
For , , and , define as follows: letting be the edges incident to , and , let . Recalling (5), we can expand (31) as
We will break the above sum into different terms depending on . Define the partition as follows:
-
•
where denotes the all-ones labeling: for all ,
-
•
where denotes the all-’s labeling: for all ,
-
•
.
We can now write where for ,
4.4 Signal Term
We first handle the terms and .
Lemma 4.4.
.
Proof.
We have
Recalling that has -valued entries and , note that for any ,
because each edge contributes a factor of except the edge , which is required by Definition 4.2 to have and thus contributes a factor of . The result follows. ∎
Lemma 4.5.
.
Proof.
4.5 Noise Term
We now handle the term . This section is devoted to proving the following.
Lemma 4.6.
Under the assumptions of Theorem 2.2, .
To compute , it will help to introduce an auxiliary graph defined as follows. Start with two disjoint copies of , called and . Delete the vertices and and connect the two leftover half-edges to form the edge . This completes the description of .
The vertices of can be partitioned as where comes from the copy of in , and from . Similarly, the edges of can be partitioned as where comes from , and from .
Definition 4.7.
Define an edge-labeling of to be a function such that , no other edge has the label 1, no two edges in (defined above) have the same label, and no two edges in have the same label. Let denote the set of all edge-labelings of .
Definition 4.8.
Define a vertex-labeling of to be a function such that takes at least two different values within (defined above), and takes at least two different values within . Let denote the set of all vertex-labelings of .
For , , and , define similarly to above: letting be the edges incident to , and , let .
With the above definitions in hand, and recalling that is the set of vertex-labelings of that take at least two different values, we can write
(32) |
Only certain “valid” pairs yield a term with nonzero expectation.
Definition 4.9.
For and , we say is valid if the following holds: for each and , there is an even number of edges with the property that and exactly one endpoint of has vertex-label . In fact, this even number must be 2 because, by Definition 4.7, only two edges can share the same label .
Note that valid pairs are precisely those for which the corresponding term in (32) has an even number of factors of each . We can now write
(33) |
Definition 4.10.
For , a region is the preimage under of some . In other words, a region consists of all vertices of that have a particular label.
For a valid pair, let denote the number of non-empty regions. By Definition 4.8 we must have . Since is the only edge with label (Definition 4.7) and is valid, and must belong to the same region; call this region 1, and number the other non-empty regions . For , let denote the number of vertices in that belong to region and let denote the number of vertices in that belong to region . Let and where, recall, . For , let denote the number of edges in that “cross” region (i.e., have exactly one endpoint in region ). Since the edges of crossing region must be paired up with each pair having the same edge-label (Definition 4.9), and edge-labels cannot repeat within or (Definition 4.7), must also be equal to the number of edges in that cross region . The total number of cross-edges (i.e., edges of whose endpoints have different vertex-labels) is . Note that is never a cross-edge since both its endpoints belong to region 1. As a consequence of the above discussion, every non-empty region must include at least one vertex from both and .
Lemma 4.11.
Proof.
Recall that belongs to region 1 by convention. Let denote the vertices in that belong to region . The case is possible only if , in which case we have . The case is possible only if (since there must be at least 2 regions, each containing a vertex from both and ), in which case again . This leaves the case . Proposition 4.1 (applied to either or , whichever is smaller) tells us that the number of original edges of (recall some edges were deleted to form ) crossing is at least the maximum of and . For each edge that was deleted from to form , if crosses then one of the two new edges or must cross region . We conclude that at least edges in cross region . The same argument applied to gives the bound . ∎
Lemma 4.12.
Proof.
The proof is the same as that of Lemma 4.11 except now the cases and are impossible. ∎
Proof of Lemma 4.6.
Combining Lemmas 4.11 and 4.12, we have for any valid , the total number of cross-edges is
(34)
where
(35)
and
We now work towards bounding (33). Since every non-empty region must include at least one vertex from both and , the possible values for are . The number of ways to choose the values and is at most because and for , . Once these values are chosen, the number of ways to partition into non-empty regions of the prescribed sizes is at most
where the last step uses (35) to conclude .
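The underlying count being bounded here is the standard multinomial coefficient for splitting a labeled set into ordered parts of prescribed sizes; the exact upper bound in the (missing) display is not reproduced below, only a numerical sanity check of that multinomial count.

```python
# Sanity check of the partition-counting step; hypothetical helper name.
from math import factorial

def ordered_partitions(sizes):
    """Number of ways to split a labeled set of size sum(sizes) into ordered
    regions with the prescribed (positive) sizes: the multinomial coefficient."""
    n = sum(sizes)
    count = factorial(n)
    for s in sizes:
        count //= factorial(s)
    return count

# e.g. 6 labeled vertices into regions of sizes 3, 2, 1
print(ordered_partitions((3, 2, 1)))  # 6! / (3! 2! 1!) = 60
```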
Now that the regions are chosen, we next count the number of ways to assign vertex-labels that respect these regions. At the same time, we will also bound the term appearing in (33). Recall that all vertices in a given region must have the same vertex-label. We consider two cases. First suppose every region contains at most vertices (half the total number in ). There are at most ways to assign the vertex-labels and, since at most half the vertices have label 1, we have . Now consider the other case where some “large” region has more than vertices. If we choose to assign vertex-label 1 to the large region, then there are at most ways to assign the remaining labels and ; otherwise, there are at most ways to assign the labels and .
Now we count the number of ways to assign edge-labels . Recall that the edge is required to have edge-label , and no other edge can have edge-label . Recall that there are cross-edges in and cross-edges in . These need to be paired up, with each cross-edge in having the same edge-label as some cross-edge in . There are ways to choose the pairing and then ways to assign edge-labels to the cross-edges, recalling the falling factorial notation (18). There are edges in remaining, which can have any edge-labels subject to not repeating within , so there are ways to label these edges and the same number of ways to label the rest of . (Here we have assumed , which will indeed be the case: by definition, and we will see below.)
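For a rough numerical illustration of this step (the paper's exact parameters are not reproduced), one can multiply the number of pairings of the cross-edges by a falling-factorial number of label choices, using the notation of (18):

```python
# Illustrative helpers only; parameter names and values are hypothetical.
from math import factorial

def falling_factorial(n, k):
    """(n)_k = n (n - 1) ... (n - k + 1), the falling factorial of (18)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out

def cross_edge_label_count(c, available_labels):
    """Toy count: c! ways to pair the c cross-edges of one side with the c
    cross-edges of the other, times a falling-factorial number of ways to give
    the c pairs distinct labels from the available (non-1) label set."""
    return factorial(c) * falling_factorial(available_labels, c)

print(cross_edge_label_count(2, 5))  # 2! * (5)(4) = 40
```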
Putting it all together, (33) becomes
We will bound pieces of this expression separately. First, since , we have
and recalling from above,
Recalling from (34) that , this becomes
and recalling ,
4.6 Putting it Together
Acknowledgments
The author is indebted to Jonathan Niles-Weed, Tselil Schramm, and Jerry Li for numerous detailed discussions about this problem. The author also thanks Jingqiu Ding, Bruce Hajek, Tim Kunisky, Cris Moore, Aaron Potechin, Bobby Shi, and anonymous reviewers, for helpful discussions and comments.
References
- [AAA17] Esraa Al-Sharoa, Mahmood Al-Khassaweneh, and Selin Aviyente. A tensor based framework for community detection in dynamic networks. In International conference on acoustics, speech and signal processing (ICASSP), pages 2312–2316. IEEE, 2017.
- [ABG+14] Joseph Anderson, Mikhail Belkin, Navin Goyal, Luis Rademacher, and James Voss. The more, the merrier: the blessing of dimensionality for learning large gaussian mixtures. In Conference on Learning Theory, pages 1135–1164. PMLR, 2014.
- [AC08] Dimitris Achlioptas and Amin Coja-Oghlan. Algorithmic barriers from phase transitions. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 793–802. IEEE, 2008.
- [AFH+12] Anima Anandkumar, Dean P Foster, Daniel J Hsu, Sham M Kakade, and Yi-Kai Liu. A spectral algorithm for latent dirichlet allocation. Advances in neural information processing systems, 25, 2012.
- [AGHK13] Animashree Anandkumar, Rong Ge, Daniel Hsu, and Sham Kakade. A tensor spectral approach to learning mixed membership community models. In Conference on Learning Theory, pages 867–881. PMLR, 2013.
- [AGJ14] Anima Anandkumar, Rong Ge, and Majid Janzamin. Analyzing tensor power method dynamics: Applications to learning overcomplete latent variable models. arXiv preprint arXiv:1411.1488, 2014.
- [AGJ15] Animashree Anandkumar, Rong Ge, and Majid Janzamin. Learning overcomplete latent variable models through tensor methods. In Conference on Learning Theory, pages 36–112. PMLR, 2015.
- [AM85] Noga Alon and Vitali D Milman. λ1, isoperimetric inequalities for graphs, and superconcentrators. Journal of Combinatorial Theory, Series B, 38(1):73–88, 1985.
- [BB20] Matthew Brennan and Guy Bresler. Reducibility and statistical-computational gaps from secret leakage. In Conference on Learning Theory, pages 648–847. PMLR, 2020.
- [BBH18] Matthew Brennan, Guy Bresler, and Wasim Huleihel. Reducibility and computational lower bounds for problems with planted sparse structure. In Conference On Learning Theory, pages 48–166. PMLR, 2018.
- [BBH+21] Matthew S Brennan, Guy Bresler, Sam Hopkins, Jerry Li, and Tselil Schramm. Statistical query algorithms and low degree tests are almost equivalent. In Conference on Learning Theory, pages 774–774. PMLR, 2021.
- [BBK+17] Afonso S Bandeira, Ben Blum-Smith, Joe Kileel, Amelia Perry, Jonathan Weed, and Alexander S Wein. Estimation under group actions: recovering orbits from invariants. arXiv preprint arXiv:1712.10163, 2017.
- [BBK+21] Afonso S Bandeira, Jess Banks, Dmitriy Kunisky, Christopher Moore, and Alexander S Wein. Spectral planting and the hardness of refuting cuts, colorability, and communities in random graphs. In Conference on Learning Theory, pages 410–473. PMLR, 2021.
- [BBLS18] Nicolas Boumal, Tamir Bendory, Roy R Lederman, and Amit Singer. Heterogeneous multireference alignment: A single pass approach. In 52nd Annual Conference on Information Sciences and Systems (CISS), pages 1–6. IEEE, 2018.
- [BCMV14] Aditya Bhaskara, Moses Charikar, Ankur Moitra, and Aravindan Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 594–603, 2014.
- [BCO14] Cristiano Bocci, Luca Chiantini, and Giorgio Ottaviani. Refined methods for the identifiability of tensors. Annali di Matematica Pura ed Applicata (1923-), 193(6):1691–1702, 2014.
- [BCRT20] Giulio Biroli, Chiara Cammarota, and Federico Ricci-Tersenghi. How to iron out rough landscapes and get optimal performances: averaged gradient descent and its application to tensor PCA. Journal of Physics A: Mathematical and Theoretical, 53(17):174003, 2020.
- [BEH+22] Afonso S Bandeira, Ahmed El Alaoui, Samuel B Hopkins, Tselil Schramm, Alexander S Wein, and Ilias Zadik. The Franz-Parisi criterion and computational trade-offs in high dimensional statistics. arXiv preprint arXiv:2205.09727, 2022.
- [BGJ20] Gerard Ben Arous, Reza Gheissari, and Aukosh Jagannath. Algorithmic thresholds for tensor PCA. The Annals of Probability, 48(4):2052–2087, 2020.
- [BH22] Guy Bresler and Brice Huang. The algorithmic phase transition of random k-SAT for low degree polynomials. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 298–309. IEEE, 2022.
- [BHK+19] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019.
- [BKS15] Boaz Barak, Jonathan A Kelner, and David Steurer. Dictionary learning and tensor decomposition via the sum-of-squares method. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 143–151, 2015.
- [BKW20] Afonso S Bandeira, Dmitriy Kunisky, and Alexander S Wein. Computational hardness of certifying bounds on constrained PCA problems. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020), volume 151, page 78. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
- [BM20] Davide Bacciu and Danilo P Mandic. Tensor decompositions in deep learning. arXiv preprint arXiv:2002.11835, 2020.
- [BMR21] Jess Banks, Sidhanth Mohanty, and Prasad Raghavendra. Local statistics, semidefinite programming, and community detection. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1298–1316. SIAM, 2021.
- [Bol98] Béla Bollobás. Random graphs. In Modern graph theory, pages 215–252. Springer, 1998.
- [BR13] Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In Conference on learning theory, pages 1046–1066. PMLR, 2013.
- [BWZ20] Gérard Ben Arous, Alexander S Wein, and Ilias Zadik. Free energy wells and overlap gap property in sparse PCA. In Conference on Learning Theory, pages 479–482. PMLR, 2020.
- [CGH+22] Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth, Alexander S Wein, and Ilias Zadik. Statistical and computational phase transitions in group testing. In Conference on Learning Theory, pages 4764–4781. PMLR, 2022.
- [CMZ22] Zongchen Chen, Elchanan Mossel, and Ilias Zadik. Almost-linear planted cliques elude the metropolis process. arXiv preprint arXiv:2204.01911, 2022.
- [DCC07] Lieven De Lathauwer, Joséphine Castaing, and Jean-François Cardoso. Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Transactions on Signal Processing, 55(6):2965–2973, 2007.
- [DdL+22] Jingqiu Ding, Tommaso d’Orsi, Chih-Hung Liu, David Steurer, and Stefan Tiegel. Fast algorithm for overcomplete order-3 tensor decomposition. In Conference on Learning Theory, pages 3741–3799. PMLR, 2022.
- [DK22] Ilias Diakonikolas and Daniel Kane. Non-gaussian component analysis via lattice basis reduction. In Conference on Learning Theory, pages 4535–4547. PMLR, 2022.
- [DKMZ11] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, 2011.
- [DKS17] Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 73–84. IEEE, 2017.
- [DKWB19] Yunzi Ding, Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira. Subexponential-time algorithms for sparse PCA. arXiv preprint arXiv:1907.11635, 2019.
- [Dod84] Jozef Dodziuk. Difference equations, isoperimetric inequality and transience of certain random walks. Transactions of the American Mathematical Society, 284(2):787–794, 1984.
- [FGR+17] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. Journal of the ACM (JACM), 64(2):1–37, 2017.
- [Fri08] Joel Friedman. A proof of Alon’s second eigenvalue conjecture and related problems. American Mathematical Soc., 2008.
- [FSWW20] Zhou Fan, Yi Sun, Tianhao Wang, and Yihong Wu. Likelihood landscape and maximum likelihood estimation for the discrete orbit recovery model. Communications on Pure and Applied Mathematics, 2020.
- [Gam21] David Gamarnik. The overlap gap property: A topological barrier to optimizing over random structures. Proceedings of the National Academy of Sciences, 118(41):e2108492118, 2021.
- [GHK15] Rong Ge, Qingqing Huang, and Sham M Kakade. Learning mixtures of gaussians in high dimensions. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 761–770, 2015.
- [GJS21] David Gamarnik, Aukosh Jagannath, and Subhabrata Sen. The overlap gap property in principal submatrix recovery. Probability Theory and Related Fields, 181(4):757–814, 2021.
- [GJW20] David Gamarnik, Aukosh Jagannath, and Alexander S Wein. Low-degree hardness of random optimization problems. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 131–140. IEEE, 2020.
- [GM15] Rong Ge and Tengyu Ma. Decomposing overcomplete 3rd order tensors using sum-of-squares algorithms. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 829–849, 2015.
- [GM17] Rong Ge and Tengyu Ma. On the optimization landscape of tensor decompositions. Advances in Neural Information Processing Systems, 30, 2017.
- [GMZ22] David Gamarnik, Cristopher Moore, and Lenka Zdeborová. Disordered systems insights on computational hardness. arXiv preprint arXiv:2210.08312, 2022.
- [GS17] David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. The Annals of Probability, pages 2353–2376, 2017.
- [GVX14] Navin Goyal, Santosh Vempala, and Ying Xiao. Fourier PCA and robust tensor decomposition. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 584–593, 2014.
- [GZ17] David Gamarnik and Ilias Zadik. Sparse high-dimensional linear regression. algorithmic barriers and a local search algorithm. arXiv preprint arXiv:1711.04952, 2017.
- [GZ19] David Gamarnik and Ilias Zadik. The landscape of the planted clique problem: Dense subgraphs and the overlap gap property. arXiv preprint arXiv:1904.07174, 2019.
- [Hås89] Johan Håstad. Tensor rank is NP-complete. In International Colloquium on Automata, Languages, and Programming, pages 451–460. Springer, 1989.
- [HK13] Daniel Hsu and Sham M Kakade. Learning mixtures of spherical gaussians: moment methods and spectral decompositions. In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pages 11–20, 2013.
- [HKP+17] Samuel B Hopkins, Pravesh K Kothari, Aaron Potechin, Prasad Raghavendra, Tselil Schramm, and David Steurer. The power of sum-of-squares for detecting hidden structures. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 720–731. IEEE, 2017.
- [HL13] Christopher J Hillar and Lek-Heng Lim. Most tensor problems are NP-hard. Journal of the ACM (JACM), 60(6):1–39, 2013.
- [HLW06] Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43(4):439–561, 2006.
- [Hop18] Samuel Hopkins. Statistical Inference and the Sum of Squares Method. PhD thesis, Cornell University, 2018.
- [HS17] Samuel B Hopkins and David Steurer. Efficient bayesian estimation from few samples: community detection and related problems. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 379–390. IEEE, 2017.
- [HSS15] Samuel B Hopkins, Jonathan Shi, and David Steurer. Tensor principal component analysis via sum-of-square proofs. In Conference on Learning Theory, pages 956–1006. PMLR, 2015.
- [HSS19] Samuel B Hopkins, Tselil Schramm, and Jonathan Shi. A robust spectral algorithm for overcomplete tensor decomposition. In Conference on Learning Theory, pages 1683–1722. PMLR, 2019.
- [HSSS16] Samuel B Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer. Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 178–191, 2016.
- [HW21] Justin Holmgren and Alexander S Wein. Counterexamples to the low-degree conjecture. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021), volume 185, 2021.
- [HWX15] Bruce Hajek, Yihong Wu, and Jiaming Xu. Computational lower bounds for community detection on random graphs. In Conference on Learning Theory, pages 899–928. PMLR, 2015.
- [Jer92] Mark Jerrum. Large cliques elude the metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992.
- [JLLX21] Bing-Yi Jing, Ting Li, Zhongyuan Lyu, and Dong Xia. Community detection on mixture multilayer networks via regularized tensor decomposition. The Annals of Statistics, 49(6):3181–3205, 2021.
- [KM21a] Frederic Koehler and Elchanan Mossel. Reconstruction on trees and low-degree polynomials. arXiv preprint arXiv:2109.06915, 2021.
- [KM21b] Pravesh K Kothari and Peter Manohar. A stress-free sum-of-squares lower bound for coloring. arXiv preprint arXiv:2105.07517, 2021.
- [KMOW17] Pravesh K Kothari, Ryuhei Mori, Ryan O’Donnell, and David Witmer. Sum of squares lower bounds for refuting any CSP. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 132–145, 2017.
- [Kol21] Tamara G. Kolda. Will the real Jennrich’s algorithm please stand up? Available online at www.mathsci.ai/post/jennrich, December 2021. Accessed: 10-22-2022.
- [KP20] Bohdan Kivva and Aaron Potechin. Exact nuclear norm, completion and decomposition for random overcomplete tensors via degree-4 SOS. arXiv preprint arXiv:2011.09416, 2020.
- [Kun21] Dmitriy Kunisky. Hypothesis testing with low-degree polynomials in the Morris class of exponential families. In Conference on Learning Theory, pages 2822–2848. PMLR, 2021.
- [Kun22] Dmitriy Kunisky. Lecture notes on sum-of-squares optimization. Available online at www.kunisky.com/static/teaching/2022spring-sos/sos-notes.pdf, 2022. Accessed: 09-29-2022.
- [KWB22] Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira. Notes on computational hardness of hypothesis testing: Predictions using the low-degree likelihood ratio. In ISAAC Congress (International Society for Analysis, its Applications and Computation), pages 1–50. Springer, 2022.
- [LKZ15] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 680–687. IEEE, 2015.
- [LM21] Allen Liu and Ankur Moitra. Algorithms from invariants: Smoothed analysis of orbit recovery over SO(3). arXiv preprint arXiv:2106.02680, 2021.
- [LRA93] Sue E Leurgans, Robert T Ross, and Rebecca B Abel. A decomposition for three-way arrays. SIAM Journal on Matrix Analysis and Applications, 14(4):1064–1083, 1993.
- [LZ22] Yuetian Luo and Anru R Zhang. Tensor clustering with planted structures: Statistical optimality and computational limits. The Annals of Statistics, 50(1):584–613, 2022.
- [MKUZ19] Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Passed & spurious: Descent algorithms and local minima in spiked matrix-tensor models. In International conference on machine learning, pages 4333–4342. PMLR, 2019.
- [Moi14] Ankur Moitra. Algorithmic aspects of machine learning. Lecture notes, 2014.
- [MR05] Elchanan Mossel and Sébastien Roch. Learning nonsingular phylogenies and hidden markov models. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 366–375, 2005.
- [MSS16] Tengyu Ma, Jonathan Shi, and David Steurer. Polynomial-time tensor decompositions with sum-of-squares. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 438–446. IEEE, 2016.
- [MW19] Ankur Moitra and Alexander S Wein. Spectral methods from tensor networks. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 926–937, 2019.
- [MW22] Andrea Montanari and Alexander S Wein. Equivalence of approximate message passing and low-degree polynomials in rank-one matrix estimation. arXiv preprint arXiv:2212.06996, 2022.
- [OTR22] Mohamed Ouerfelli, Mohamed Tamaazousti, and Vincent Rivasseau. Random tensor theory for tensor decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
- [PR20] Aaron Potechin and Goutham Rajendran. Machinery for proving sum-of-squares lower bounds on certification problems. arXiv preprint arXiv:2011.04253, 2020.
- [PWB+19] Amelia Perry, Jonathan Weed, Afonso S Bandeira, Philippe Rigollet, and Amit Singer. The sample complexity of multireference alignment. SIAM Journal on Mathematics of Data Science, 1(3):497–517, 2019.
- [RM14] Emile Richard and Andrea Montanari. A statistical model for tensor PCA. Advances in neural information processing systems, 27, 2014.
- [RSG17] Stephan Rabanser, Oleksandr Shchur, and Stephan Günnemann. Introduction to tensor decompositions and their applications in machine learning. arXiv preprint arXiv:1711.10781, 2017.
- [RSS18] Prasad Raghavendra, Tselil Schramm, and David Steurer. High dimensional estimation via sum-of-squares proofs. In Proceedings of the International Congress of Mathematicians: Rio de Janeiro 2018, pages 3389–3423. World Scientific, 2018.
- [SDF+17] Nicholas D Sidiropoulos, Lieven De Lathauwer, Xiao Fu, Kejun Huang, Evangelos E Papalexakis, and Christos Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13):3551–3582, 2017.
- [SS17] Tselil Schramm and David Steurer. Fast and robust tensor decomposition with applications to dictionary learning. In Conference on Learning Theory, pages 1760–1793. PMLR, 2017.
- [SW22] Tselil Schramm and Alexander S Wein. Computational barriers to estimation from low-degree polynomials. The Annals of Statistics, 50(3):1833–1858, 2022.
- [Wei18] Alexander Spence Wein. Statistical estimation in the presence of group actions. PhD thesis, Massachusetts Institute of Technology, 2018.
- [Wei22] Alexander S Wein. Optimal low-degree hardness of maximum independent set. Mathematical Statistics and Learning, 4(3):221–251, 2022.
- [WEM19] Alexander S Wein, Ahmed El Alaoui, and Cristopher Moore. The Kikuchi hierarchy and tensor PCA. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 1446–1468. IEEE, 2019.
- [WX21] Yihong Wu and Jiaming Xu. Statistical problems with planted structures: Information-theoretical and computational limits. Information-Theoretic Methods in Data Science, 383, 2021.
- [ZSWB22] Ilias Zadik, Min Jae Song, Alexander S Wein, and Joan Bruna. Lattice-based methods surpass sum-of-squares in clustering. In Conference on Learning Theory, pages 1247–1248. PMLR, 2022.