Circuit Lower Bounds for the p-Spin Optimization Problem
Abstract
We consider the problem of finding a near ground state of a -spin model with Rademacher couplings by means of a low-depth circuit. As a direct extension of the authors’ recent work [GJW20], we establish that any poly-size -output circuit that produces a spin assignment with objective value within a certain constant factor of optimality, must have depth at least as grows. This is stronger than the known state of the art bounds of the form for similar combinatorial optimization problems, where depends on the optimality value. For example, for the largest clique problem corresponds to the square of the size of the clique [Ros10]. At the same time our results are not quite comparable since in our case the circuits are required to produce a solution itself rather than solving the associated decision problem. As in our earlier work [GJW20], the approach is based on the overlap gap property (OGP) exhibited by random -spin models, but the derivation of the circuit lower bound relies further on standard facts from Fourier analysis on the Boolean cube, in particular the Linial-Mansour-Nisan Theorem.
To the best of our knowledge, this is the first instance when methods from spin glass theory have ramifications for circuit complexity.
1 Introduction
Boolean circuits constitute one of the standard models for understanding algorithmic tractability and hardness in combinatorial optimization problems [AB09, Sip97, AS04]. One potential route to proving the widely-believed conjecture would be to show non-existence of polynomial-size Boolean circuits solving problems in . A large body of literature in circuit complexity is devoted to establishing limits on the power of circuits with bounds on its depth, specifically the power of constant depth circuits known as AC[0] circuits and its immediate extension – circuits with nearly logarithmic (in problem size) depth. Many hardness results have been obtained by working on models with random inputs. A classical result is one by Håstad [Hås86] who established a lower bound on the depth of any poly-size circuit that computes the -parity function. A lot of focus is on circuit complexity for computing various natural combinatorial optimization problems. For example Rossman [Ros10] has shown that poly-size circuits detecting the presence of a -size clique in a graph must have depth at least , where is the number of graph nodes and is any function growing in . This was obtained by working with the sparse random Erdös-Rényi graph model with the edge probability tuned so that the largest “naturally occurring” clique is of size smaller than , with high probability as increases. In fact, for combinatorial optimization problems this represents the state of the art lower bound on circuit depth for polynomial-size circuits, though many extensions exist for subgraph homomorphism existence problems [Ros18, LRR17].
In this paper we establish a lower bound of on the depth of any polynomial-size Boolean circuit that solves another natural combinatorial optimization problem. It is a core computational problem in statistical physics of finding a near ground state of a p-spin model with i.i.d. Rademacher couplings. Specifically, we show that any polynomial-size -output Boolean circuit that produces a spin assignment with objective value within a certain constant factor of optimality, has depth at least . In particular, unlike the aforementioned result for cliques, our lower bound does not depend on the objective value. The result though, strictly speaking, does not improve upon [Ros10], since our circuits are supposed to produce the entire solution, as opposed to just solving the associated YES/NO decision problem. Nevertheless, to the best of our knowledge, our result presents the strongest known circuit depth lower bounds for combinatorial optimization problems.
Our result is obtained as a fairly direct extension of our recent work [GJW20], where limits are obtained for algorithms based on low-degree polynomials for the same problem (though with Gaussian as opposed to Rademacher couplings). The associated optimization problem admits a polynomial-time approximation scheme for the case [Mon21] and also for some so-called mixed spin models [AMS20], as well as for some so-called spherical spin glass models [Sub21] associated with optimizing over the sphere as opposed to the binary cube. But as shown in [GJW20], algorithms based on low-degree polynomials fail in general for the problem of finding near ground states in -spin models.
As in our earlier paper we employ the overlap gap property (OGP) exhibited by this and many other combinatorial optimization and constraint satisfaction problems with random inputs. In the present context, the OGP says that every two spin assignments that achieve value within a certain multiplicative factor away from optimality have to either agree on many coordinates (high overlap) or disagree on many coordinates (low overlap). More specifically, as in the aforementioned reference, we employ the so-called ensemble overlap gap property (e-OGP), which we formally introduce in the next section. At the same time, employing standard circuit insensitivity results, specifically the Linial-Mansour-Nisan Theorem [LMN93], we show that vectors produced by the outputs of low-depth circuits can produce two spin assignments with overlap value ruled out by the e-OGP, thus obtaining a contradiction. The Linial-Mansour-Nisan Theorem states that low-depth polynomial-size circuits are well approximated by the low-degree terms in their Fourier expansion. The version we use in this paper is found on page 93 in [O’D14]. The tightest known bound of this kind was obtained by Tal in [Tal17].
We believe that the approach used in this paper can be extended to many other models exhibiting OGP and its variants (see [GJW20] for a more thorough treatment of the subjects). However, additional analysis seems necessary for validating such extensions, since many such models are defined on sparse random graphs, and the variant of the Linial-Mansour-Nisan Theorem for sparse graph models (see for example Lemma 9 in [FJS91]) appears unfortunately too weak to rule out depths of order . Similarly, it is an open question whether OGP-based methods can be used to rule out decision-type (single output) low-depth circuits for these problems.
2 Models, Assumptions and Results
2.1 -spin model and its ground state
We begin by describing the optimization problem of finding the ground state value of a -spin model. Let denote a -tensor on with -valued entries. That is, and for each -tuple . For every , we let
The optimal value of the optimization problem
(1) |
is called the ground state energy of the -spin model with coupling . Any optimizer achieving the optimal value is called the ground state. We will assume that the entries of are i.i.d. equiprobable (that is, Rademacher) entries. In this setting, we denote the optimal value of (1) by , which is a random variable. The normalization by in the objective function is chosen so that the ground state energy is typically order 1. In particular, we recall here the following well-known concentration result.
Theorem 2.1.
For every , there exists such that for every
for some and all large enough .
The existence of the limit above follows from a rather simple interpolation argument based essentially on Gaussian comparison inequality techniques. This was shown first by Guerra and Toninelli [GT02]. The value of can be computed to any desired precision using the powerful Parisi variational representation as was done first by Parisi [Par80], and rigorously verified later by Talagrand [Tal06]. See Panchenko [Pan13] for a book treatment of the subject.
We now state the ensemble overlap gap property (e-OGP) exhibited by the -spin model above. For this purpose we need to consider an ensemble of random tensors constructed as follows. Let and be two independent copies of the random -spin model above with Rademacher entries constructed as follows. Let denote the (random) total order on , given by drawing points from that set uniformly at random without replacement. We call this the uniform at random order. Let be the point that was drawn at time . Starting with , let denote the random tensor obtained from , by resampling the entry indexed by according to the Rademacher distribution, independently. Define . Note that for each fixed , the values and are identically distributed (but not independent). We now formally state the e-OGP.
Theorem 2.2.
For every even , there exists and , and such that for all sufficiently large , with probability at least , the following holds:
-
(a)
For every and every pair satisfying for , it is the case
-
(b)
In the special case , in fact
The proof is given in Section 3.2. Informally, e-OGP states that for every pair of instances in the interpolated family, including the case of identical instances (that is ), every two solutions achieving multiplicative factor to optimality in those two instances are either close or far from each other. Furthermore, for the case of independent instances (that is ), only the latter is possible. Theorem 2.2 was proved in [GJ21] for the case of Gaussian as opposed to Rademacher distribution, but its adaptation to the latter case follows from standard universality-type arguments based on Lindeberg’s approach, which can be found in the original paper [GT02] which established the existence of the limit . For a recent work that covers many examples of such arguments see [Sen18] and for a textbook presentation of this and related spin glass results, see [Pan13].
The core of the proof in [GJ21] is the case of a single-instance (non-ensemble) OGP for the Gaussian case which was done in [CGPR19]. Its adaptation to the ensemble case follows in a rather straightforward way using the chaos property exhibited by the -spin models. Adaptation to non-Gaussian distribution, as we said, uses standard universality arguments.
2.2 Boolean circuits
We now turn to the discussion of Boolean circuits. Formal definitions of those can be found in most standard textbooks on algorithms and computation [AB09, Sip97, AS04]. Informally, these are functions from obtained by considering a directed graph with input nodes which have in-degree zero, and output nodes which have out-degree zero. Each intermediate node corresponds to one of three standard Boolean operation or . The size of the circuit is the number of nodes in the associated graph and its depth is the length of its longest directed path. Equivalently, one may consider instead functions from by adopting the appropriate logical functions in gates. We also adopt this assumption for convenience. In particular, when , such a Boolean function maps any -tensor with entries into an -vector with entries. Thus given an -output, -input Boolean circuit and -tensor with entries, we denote by the binary vector in produced when takes input . We denote by the single-output Boolean circuit associated with the -th component of so that .
As stated in the introduction, we are interested in Boolean circuits with bounds on their size and depth. Fix any two -dependent positive integer-valued sequences and a constant . We denote by the family of all -input, -output Boolean circuits which satisfy the following properties:
-
(a)
The size (the number of gates) of is at most and its depth is at most .
- (b)
In other words, this is a family of circuits which is required to produce a solution with value which is at least a multiplicative constant away from optimality when the value of the optimization problem is non-negative. Clearly a family of such circuits can be empty if either or is too restrictive.
We now state our main result.
Theorem 2.3.
For every even , and , for all sufficiently large , either the associated family of circuits is empty, or , or .
The result above essentially rules out poly-size circuits with the stated bound on depth that obtain a solution with value at least times the optimum value, where is any constant larger than . The proof is given in Section 3.1. The constant in the depth lower bound can be slightly improved, although it appears that it has to be larger than for our analysis to go through. We note that does not appear in the claimed lower bound on the circuit depth because the bound holds even for mildly super-polynomially sized circuits. We will not attempt to find the optimal growth rate.
3 Proofs
3.1 Proof of Theorem 2.3
We fix any and any circuit . Recall that the output of the Boolean circuit acting on any input is a vector in and thus has norm equal . Fix a constant . Consider any two tensors which differ in at most one entry. Namely, there exists such that and are identical at all multi-indices that are distinct from the -tuple . We say that the pair is -bad if
(2) |
Fix two independent copies and of a random -tensor with i.i.d. entries and consider the discrete interpolation associated with the statement of Theorem 2.2. The following “stability” result is a direct analogue of Theorem 4.2 in [GJW20] (but for circuits instead of low-degree polynomials).
Theorem 3.1.
The probability that no pair is -bad is at least
(3) |
for all large enough .
The constants will be subsumed by as we will see. Note that this probability converges to zero if is bounded away from zero. For our purposes, we will need the rate of convergence in this bound to be no faster than exponential. (In particular, this requirement will control our depth bound.)
The proof of Theorem 3.1 (which we include for completeness) is similar to Theorem 4.2 of [GJW20] but with two minor changes: first we observe that [GJW20] applies not just to functions of bounded degree but more generally to functions of bounded total influence; we then use the Linial-Mansour-Nisan Theorem to bound the total influence of any low-depth circuit.
Let’s first see how Theorem 3.1 implies the result.
Proof of Theorem 2.3.
We fix with value specified later. By Theorem 2.1, with probability at least , the objective values associated with instances are all at least . By Theorem 2.2, with probability at least the e-OGP holds with parameters . Call the intersection of these two events and assume the same constant for convenience, so that and the multiplier in the union bound is absorbed by making smaller. Fix any . Denote the event described in Theorem 3.1—namely the event that no pair is -bad—by . We claim that . Assuming this claim, we obtain
and the desired lower bound on depth is obtained by changing the constant from to in order to subsume the term depending on .
It remains to prove the claim . By way of contradiction, suppose that is non-empty. (Note that the event implies in particular that the value of the optimization problem (1) is non-negative for every .) By assumption, the circuit must produce solutions with value at least a factor of from optimality. On the other hand, on the event , we have that the optimal value is at least So if we choose small enough that , we have that the solutions yield an objective value above the onset of OGP, that is, .
On we have by part (b) of the e-OGP that . Let be the smallest index for which , and thus . (Note that this includes the possibility .) On , part (a) of the e-OGP applied in the case implies that in fact . By Cauchy–Schwarz,
and so, since ,
and thus is a -bad pair, contradicting the definition of the event . ∎
Before turning to the proof of Theorem 3.1, let us briefly recall the following notions from Boolean analysis/Fourier analysis on ; see e.g. [O’D14] for a reference. (In our setting, we will always work with the specific case .) Consider the standard Fourier expansion of functions on associated with the uniform measure on . The basis of this expansion are monomials of the form . For every function , the associated Fourier coefficients are
where the expectation is with respect to the uniform measure on . Then the Fourier expansion of is , and the Parseval (or Walsh) identity states that . For , let denote the Laplacian operator:
and let be the total influence
(4) |
Also note that
(5) |
where for , is obtained from by fixing the -th coordinate of to .
Proof of Theorem 3.1.
Fix with some constant . Let be the Fourier coefficients of the function , which we recall is the -component of . Fix a constant and let
(6) |
for all sufficiently large . We use the Linial-Mansour-Nisan Theorem [LMN93] (the version we take here is the version from page 93 in [O’D14]) which states for some universal constant the spectrum of a depth- circuit with size is -concentrated on degree . More precisely, in our context, it states that for each ,
Thus for the influence function we have for each
(7) |
where in the last equation we use the fact and thus .
Let be event that the pair is -bad (as defined in (2)). The following is similar to Lemma 4.3 in [GJW20].
Lemma 3.2.
The following holds for any fixed order on the entries of the tensor:
(8) |
As we have and we obtain that for all large enough ,
(9) |
For any fixed let be the probability of the event when we condition on . Namely is the probability of there being no -bad pairs when the initial tensor of the interpolation sequence is . The following is a special case of Lemma 4.4 in [GJW20]; we will include the proof for completeness.
Lemma 3.3.
The following bound holds for any fixed order on the entries of the tensor:
where is i.i.d.
We now combine these two results to complete the proof of Theorem 3.1. Note that the probability of no -bad pairs is
where the expectation is with respect to the starting sample and . By Lemma 3.3 we have
Applying (9) this is at most . Exponentiating, we obtain
Recalling bound (6) and since is a constant, we obtain the claim by increasing the exponent in (6) to . ∎
Proof of Lemma 3.2.
Using the inequality , the right-hand side is at least
Note that for any ,
Summing over we obtain
using (2). ∎
Proof of Lemma 3.3.
Fix any order with respect to which the sequence is generated. We refer to coordinates as coordinates associated with this order. In particular, coordinate is resampled when moving from to . Coordinate is resampled when moving from to , etc.
For every and tensor let be the probability that the truncated interpolation path does not contain any -bad pairs , when conditioned on . We claim the following stronger bound, from which the required claim follows as a special case : for every ,
The proof is by induction on . In the base case , the inequality above is in fact an equality with both sides equal to 0.
Assume the assertion holds for . We now establish it for . For any tensor let be the probability of the event that the interpolation path does not contain any -bad pairs , when conditioned on . Observe that by our inductive assumption we have
(10) |
For any tensor let be the tensor obtained from by forcing coordinate to be . We have
(11) |
Let be the event that changing the value of the coordinate is -bad. Note that this event is measurable (determined) by the realization of , and furthermore on the event that coordinate of (and therefore ) is different from one of , and otherwise. In particular, .
We claim
(12) |
We justify this identity as follows. With probability the coordinate of is , namely . Conditioned on this event we have . On the other hand, with probability coordinate of is . In this case no bad pair occurs if takes place and furthermore no bad pairs occur during the remaining resamplings, the probability of which is .
3.2 Proof of Theorem 2.2
The proof of Theorem 2.2 is similar to that in the Gaussian case after a coupling argument where we couple the path above to a discretized induced by correlated Rademacher random vectors. We explain here the key steps that are different. In the following for two mean zero random vectors we say that , if and are independent for and for each .
Let us begin with the following construction of which corresponds to said coupling. Fix so that is an integer, and let be . We now construct a sequence of correlated instances of Rademacher random vectors as follows. Let be an i.i.d. Ber(1/2) random vector. Now for each entry of , we resample it independently with probability and leave it as it was with probability , to obtain . Notice that . Now for those entries of that have not been resampled, resample them with probability . Repeat this process times to obtain a sequence of vectors each of which has i.i.d. Ber(1/2) entries and are such that if then . Furthermore, notice that as , in the last step of this sequence, all of the remaining entries are resampled (independently) with probability 1, so that is an independent copy of . Now for each , let be indices that have been resampled when going from to . We now define in the obvious way: if , then if . Within we let be the uniform at random order for . Note that is a partition of . Evidently the law of is that of the uniform at random order. Notice that if we construct in the corresponding way and let be such that and , then .
Now let us note the following continuity argument that allows us to prove the OGP statement for . To this end, let us note the following lemma whose proof we defer momentarily:
Lemma 3.4.
We have that
has
for some depending only on .
With this in hand, note that there are at most pairs of such , so that by a union bound and this lemma, we have that
From here, the result follows from the following theorem which is essentially Theorem 2.2 but for the -correlated instances. The proof of this theorem is found in [GJ21] for the case of Gaussian distribution of the entries of . Its extension for the case of Rademacher distribution is obtained by standard universality type arguments and is omitted. See for example [CH06, AC16, Sen18, CGPR19] for similar arguments.
Theorem 3.5.
Let be i.i.d. bounded random variables with
for some . Let be such that . Let denote the Hamiltonian corresponding to and denote that corresponding to . Then for every and , there exists such that with probability , for every satisfying we have that . If , the same result holds except we have that there are such that .
Proof of Lemma 3.4.
First, by symmetrization, it suffices to prove the same bound for . In this case, note that conditionally on , the map
is uniformly -Lipschitz. On the other hand, conditionally on , we have that is a Rademacher vector of length at most where are those indices resample between time and . Note that . Thus applying McDairmids inequality, we have that, conditionally on , this maximum,
has
Now, since is conditionally sub-gaussian, we may apply Talagrand’s comparison inequality [Ver18, Cor 8.6.2], to obtain
for some universal constant where is obtained from by replacing with gaussian random vectors correlated in the same fashion. By a standard application of Slepian’s inequality, we obtain,
where we have used here that the variance of is at most . Finally note that with probability , we have . Combining these bounds yields the desired result. ∎
Acknowledgements
The authors are very grateful to Benjamin Rossman for educating us on the state of the art bounds in circuit complexity. The first author gratefully acknowledges the support from the NSF grant DMS-2015517. The second author gratefully acknowledges support from NSERC; cette recherche a été financée en partie par CRSNG [RGPIN-2020-04597, DGECR-2020-00199]. The third author was partially supported by NSF grant DMS-1712730 and by the Simons Collaboration on Algorithms and Geometry.
References
- [AB09] Sanjeev Arora and Boaz Barak, Computational complexity: a modern approach, Cambridge University Press, 2009.
- [AC16] Antonio Auffinger and Wei-Kuo Chen, Universality of chaos and ultrametricity in mixed p-spin models, Communications on Pure and Applied Mathematics 69 (2016), no. 11, 2107–2130.
- [AMS20] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke, Optimization of mean-field spin glasses, arXiv preprint arXiv:2001.00904 (2020).
- [AS04] Noga Alon and Joel H Spencer, The probabilistic method, John Wiley & Sons, 2004.
- [CGPR19] Wei-Kuo Chen, David Gamarnik, Dmitry Panchenko, and Mustazee Rahman, Suboptimality of local algorithms for a class of max-cut problems, The Annals of Probability 47 (2019), no. 3, 1587–1618.
- [CH06] Philippe Carmona and Yueyun Hu, Universality in Sherrington–Kirkpatrick’s spin glass model, Annales de l’Institut Henri Poincare (B) Probability and Statistics, vol. 42, Elsevier, 2006, pp. 215–222.
- [FJS91] Merrick L Furst, Jeffrey C Jackson, and Sean W Smith, Improved learning of AC0 functions, COLT, vol. 91, 1991, pp. 317–325.
- [GJ21] David Gamarnik and Aukosh Jagannath, The overlap gap property and approximate message passing algorithms for -spin models, The Annals of Probability 49 (2021), no. 1, 180–205.
- [GJW20] David Gamarnik, Aukosh Jagannath, and Alexander S Wein, Low-degree hardness of random optimization problems, 61st Annual Symposium on Foundations of Computer Science, 2020.
- [GT02] F. Guerra and F. L. Toninelli, The thermodynamic limit in mean field spin glass models, Commun. Math. Phys. 230 (2002), 71–79.
- [Hås86] Johan Håstad, Almost optimal lower bounds for small depth circuits, Proceedings of the eighteenth annual ACM symposium on Theory of computing, 1986, pp. 6–20.
- [LMN93] Nathan Linial, Yishay Mansour, and Noam Nisan, Constant depth circuits, fourier transform, and learnability, Journal of the ACM (JACM) 40 (1993), no. 3, 607–620.
- [LRR17] Yuan Li, Alexander Razborov, and Benjamin Rossman, On the AC0 complexity of subgraph isomorphism, SIAM Journal on Computing 46 (2017), no. 3, 936–971.
- [Mon21] Andrea Montanari, Optimization of the Sherrington–Kirkpatrick hamiltonian, SIAM Journal on Computing (2021), FOCS19–1.
- [O’D14] Ryan O’Donnell, Analysis of boolean functions, Cambridge University Press, 2014.
- [Pan13] Dmitry Panchenko, The Sherrington-Kirkpatrick model, Springer Science & Business Media, 2013.
- [Par80] Giorgio Parisi, A sequence of approximated solutions to the SK model for spin glasses, Journal of Physics A: Mathematical and General 13 (1980), no. 4, L115.
- [Ros10] Benjamin Rossman, Average-case complexity of detecting cliques, Ph.D. thesis, Massachusetts Institute of Technology, 2010.
- [Ros18] , Lower bounds for subgraph isomorphism, Proceedings of the International Congress of Mathematicians: Rio de Janeiro 2018, World Scientific, 2018, pp. 3425–3446.
- [Sen18] Subhabrata Sen, Optimization on sparse random hypergraphs and spin glasses, Random Structures & Algorithms 53 (2018), no. 3, 504–536.
- [Sip97] M. Sipser, Introduction to the theory of computability, PWS Publishing Company, 1997.
- [Sub21] Eliran Subag, Following the ground states of full-RSB spherical spin glasses, Communications on Pure and Applied Mathematics 74 (2021), no. 5, 1021–1044.
- [Tal06] Michel Talagrand, The Parisi formula, Annals of mathematics (2006), 221–263.
- [Tal17] Avishay Tal, Tight bounds on the fourier spectrum of AC0, 32nd Computational Complexity Conference (CCC 2017), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
- [Ver18] Roman Vershynin, High-dimensional probability: An introduction with applications in data science, vol. 47, Cambridge University Press, 2018.