This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Circuit Lower Bounds for the p-Spin Optimization Problem

David Gamarnik Operations Research Center and Sloan School of Management, MIT. Email: [email protected]    Aukosh Jagannath Department of Statistics and Actuarial Science and Department of Applied Mathematics, University of Waterloo. Email: [email protected]    Alexander S. Wein Department of Mathematics, Courant Institute of Mathematical Sciences, New York University. Email: [email protected]
Abstract

We consider the problem of finding a near ground state of a pp-spin model with Rademacher couplings by means of a low-depth circuit. As a direct extension of the authors’ recent work [GJW20], we establish that any poly-size nn-output circuit that produces a spin assignment with objective value within a certain constant factor of optimality, must have depth at least logn/(2loglogn)\log n/(2\log\log n) as nn grows. This is stronger than the known state of the art bounds of the form Ω(logn/(k(n)loglogn))\Omega(\log n/(k(n)\log\log n)) for similar combinatorial optimization problems, where k(n)k(n) depends on the optimality value. For example, for the largest clique problem k(n)k(n) corresponds to the square of the size of the clique [Ros10]. At the same time our results are not quite comparable since in our case the circuits are required to produce a solution itself rather than solving the associated decision problem. As in our earlier work [GJW20], the approach is based on the overlap gap property (OGP) exhibited by random pp-spin models, but the derivation of the circuit lower bound relies further on standard facts from Fourier analysis on the Boolean cube, in particular the Linial-Mansour-Nisan Theorem.

To the best of our knowledge, this is the first instance when methods from spin glass theory have ramifications for circuit complexity.

1 Introduction

Boolean circuits constitute one of the standard models for understanding algorithmic tractability and hardness in combinatorial optimization problems [AB09, Sip97, AS04]. One potential route to proving the widely-believed conjecture PNPP\neq NP would be to show non-existence of polynomial-size Boolean circuits solving problems in NPNP. A large body of literature in circuit complexity is devoted to establishing limits on the power of circuits with bounds on its depth, specifically the power of constant depth circuits known as AC[0] circuits and its immediate extension – circuits with nearly logarithmic (in problem size) depth. Many hardness results have been obtained by working on models with random inputs. A classical result is one by Håstad [Hås86] who established a logn/(loglogn+O(1))\log n/\left(\log\log n+O(1)\right) lower bound on the depth of any poly-size circuit that computes the nn-parity function. A lot of focus is on circuit complexity for computing various natural combinatorial optimization problems. For example Rossman [Ros10] has shown that poly-size circuits detecting the presence of a k(n)k(n)-size clique in a graph must have depth at least Ω(logn/(k2(n)loglogn))\Omega(\log n/(k^{2}(n)\log\log n)), where nn is the number of graph nodes and k(n)k(n) is any function growing in nn. This was obtained by working with the sparse random Erdös-Rényi graph model with the edge probability tuned so that the largest “naturally occurring” clique is of size smaller than k(n)k(n), with high probability as nn increases. In fact, for combinatorial optimization problems this represents the state of the art lower bound on circuit depth for polynomial-size circuits, though many extensions exist for subgraph homomorphism existence problems [Ros18, LRR17].

In this paper we establish a lower bound of logn/(2loglogn)\log n/(2\log\log n) on the depth of any polynomial-size Boolean circuit that solves another natural combinatorial optimization problem. It is a core computational problem in statistical physics of finding a near ground state of a p-spin model with i.i.d. Rademacher couplings. Specifically, we show that any polynomial-size nn-output Boolean circuit that produces a ±1\pm 1 spin assignment with objective value within a certain constant factor of optimality, has depth at least logn/(2loglogn)\log n/(2\log\log n). In particular, unlike the aforementioned result for cliques, our lower bound does not depend on the objective value. The result though, strictly speaking, does not improve upon [Ros10], since our circuits are supposed to produce the entire solution, as opposed to just solving the associated YES/NO decision problem. Nevertheless, to the best of our knowledge, our result presents the strongest known circuit depth lower bounds for combinatorial optimization problems.

Our result is obtained as a fairly direct extension of our recent work [GJW20], where limits are obtained for algorithms based on low-degree polynomials for the same problem (though with Gaussian as opposed to Rademacher couplings). The associated optimization problem admits a polynomial-time approximation scheme for the case p=2p=2 [Mon21] and also for some so-called mixed spin models [AMS20], as well as for some so-called spherical spin glass models [Sub21] associated with optimizing over the sphere as opposed to the binary cube. But as shown in [GJW20], algorithms based on low-degree polynomials fail in general for the problem of finding near ground states in pp-spin models.

As in our earlier paper we employ the overlap gap property (OGP) exhibited by this and many other combinatorial optimization and constraint satisfaction problems with random inputs. In the present context, the OGP says that every two spin assignments that achieve value within a certain multiplicative factor away from optimality have to either agree on many coordinates (high overlap) or disagree on many coordinates (low overlap). More specifically, as in the aforementioned reference, we employ the so-called ensemble overlap gap property (e-OGP), which we formally introduce in the next section. At the same time, employing standard circuit insensitivity results, specifically the Linial-Mansour-Nisan Theorem [LMN93], we show that vectors produced by the outputs of nn low-depth circuits can produce two spin assignments with overlap value ruled out by the e-OGP, thus obtaining a contradiction. The Linial-Mansour-Nisan Theorem states that low-depth polynomial-size circuits are well approximated by the low-degree terms in their Fourier expansion. The version we use in this paper is found on page 93 in [O’D14]. The tightest known bound of this kind was obtained by Tal in [Tal17].

We believe that the approach used in this paper can be extended to many other models exhibiting OGP and its variants (see [GJW20] for a more thorough treatment of the subjects). However, additional analysis seems necessary for validating such extensions, since many such models are defined on sparse random graphs, and the variant of the Linial-Mansour-Nisan Theorem for sparse graph models (see for example Lemma 9 in [FJS91]) appears unfortunately too weak to rule out depths of order logn/loglogn\log n/\log\log n. Similarly, it is an open question whether OGP-based methods can be used to rule out decision-type (single output) low-depth circuits for these problems.

2 Models, Assumptions and Results

2.1 pp-spin model and its ground state

We begin by describing the optimization problem of finding the ground state value of a pp-spin model. Let J{±1}npJ\in\{\pm 1\}^{n^{p}} denote a pp-tensor on n\mathbb{R}^{n} with {±1}\{\pm 1\}-valued entries. That is, J=(Ji1,,ip,1i1,i2,,ipn)J=\left(J_{i_{1},\ldots,i_{p}},1\leq i_{1},i_{2},\ldots,i_{p}\leq n\right) and Ji1,,ip=±1J_{i_{1},\ldots,i_{p}}=\pm 1 for each pp-tuple i1,,ipi_{1},\ldots,i_{p}. For every σ{±1}n\sigma\in\{\pm 1\}^{n}, we let

J,σ1i1,i2,,ipnJi1,i2,,ipσi1σi2σip.\displaystyle\langle J,\sigma\rangle\triangleq\sum_{1\leq i_{1},i_{2},\ldots,i_{p}\leq n}J_{i_{1},i_{2},\ldots,i_{p}}\sigma_{i_{1}}\sigma_{i_{2}}\cdots\sigma_{i_{p}}.

The optimal value of the optimization problem

np+12maxσ{±1}nJ,σ\displaystyle n^{-{p+1\over 2}}\max_{\sigma\in\{\pm 1\}^{n}}\langle J,\sigma\rangle (1)

is called the ground state energy of the pp-spin model with coupling JJ. Any optimizer σ\sigma achieving the optimal value is called the ground state. We will assume that the entries of JJ are i.i.d. equiprobable ±1\pm 1 (that is, Rademacher) entries. In this setting, we denote the optimal value of (1) by ηn,p\eta^{*}_{n,p}, which is a random variable. The normalization by np+12n^{p+1\over 2} in the objective function is chosen so that the ground state energy is typically order 1. In particular, we recall here the following well-known concentration result.

Theorem 2.1.

For every pp, there exists ηp>0\eta^{*}_{p}>0 such that for every ϵ>0\epsilon>0

(|ηn,pηp|ϵ)exp(γn),\displaystyle\mathbb{P}(|\eta^{*}_{n,p}-\eta^{*}_{p}|\geq\epsilon)\leq\exp(-\gamma n),

for some γ>0\gamma>0 and all large enough nn.

The existence of the limit above follows from a rather simple interpolation argument based essentially on Gaussian comparison inequality techniques. This was shown first by Guerra and Toninelli [GT02]. The value of ηp\eta^{*}_{p} can be computed to any desired precision using the powerful Parisi variational representation as was done first by Parisi [Par80], and rigorously verified later by Talagrand [Tal06]. See Panchenko [Pan13] for a book treatment of the subject.

We now state the ensemble overlap gap property (e-OGP) exhibited by the pp-spin model above. For this purpose we need to consider an ensemble of random tensors constructed as follows. Let JJ and J~\tilde{J} be two independent copies of the random pp-spin model above with Rademacher entries constructed as follows. Let R\leq_{R} denote the (random) total order on [np][n^{p}], given by drawing points from that set uniformly at random without replacement. We call this the uniform at random order. Let It[np]I_{t}\in[n^{p}] be the point that was drawn at time tt. Starting with J0=JJ_{0}=J, let JtJ_{t} denote the random tensor obtained from Jt1J_{t-1}, by resampling the entry indexed by ItI_{t} according to the Rademacher distribution, independently. Define J~=Jnp\tilde{J}=J_{n^{p}}. Note that for each fixed tt, the values Jt,JJ_{t},J and J~\tilde{J} are identically distributed (but not independent). We now formally state the e-OGP.

Theorem 2.2.

For every even p4p\geq 4, there exists 0<ηOGP,p<ηp0<\eta_{\rm OGP,p}<\eta_{p}^{*} and 0<ν1,p<ν2,p<10<\nu_{1,p}<\nu_{2,p}<1, and γ>0\gamma>0 such that for all sufficiently large nn, with probability at least 1exp(γn)1-\exp(-\gamma n), the following holds:

  1. (a)

    For every 0t1t2np0\leq t_{1}\leq t_{2}\leq n^{p} and every pair σ1,σ2{±1}n\sigma_{1},\sigma_{2}\in\{\pm 1\}^{n} satisfying Jtj,σjηOGP,pnp+12\langle J_{t_{j}},\sigma_{j}\rangle\geq\eta_{\rm OGP,p}n^{p+1\over 2} for j=1,2j=1,2, it is the case

    |σ1,σ2|n(ν1,p,ν2,p),\displaystyle{|\langle\sigma_{1},\sigma_{2}\rangle|\over n}\notin(\nu_{1,p},\nu_{2,p}),
  2. (b)

    In the special case t1=0,t2=npt_{1}=0,t_{2}=n^{p}, in fact

    |σ1,σ2|nν1,p.\displaystyle{|\langle\sigma_{1},\sigma_{2}\rangle|\over n}\leq\nu_{1,p}.

The proof is given in Section 3.2. Informally, e-OGP states that for every pair of instances in the interpolated family, including the case of identical instances (that is t1=t2t_{1}=t_{2}), every two solutions achieving multiplicative factor ηOGP,p/η\eta_{\rm OGP,p}/\eta^{*} to optimality in those two instances are either close or far from each other. Furthermore, for the case of independent instances (that is t1=0,t2=npt_{1}=0,t_{2}=n^{p}), only the latter is possible. Theorem 2.2 was proved in [GJ21] for the case of Gaussian as opposed to Rademacher distribution, but its adaptation to the latter case follows from standard universality-type arguments based on Lindeberg’s approach, which can be found in the original paper [GT02] which established the existence of the limit ηp\eta_{p}^{*}. For a recent work that covers many examples of such arguments see [Sen18] and for a textbook presentation of this and related spin glass results, see [Pan13].

The core of the proof in [GJ21] is the case of a single-instance (non-ensemble) OGP for the Gaussian case which was done in [CGPR19]. Its adaptation to the ensemble case follows in a rather straightforward way using the chaos property exhibited by the pp-spin models. Adaptation to non-Gaussian distribution, as we said, uses standard universality arguments.

2.2 Boolean circuits

We now turn to the discussion of Boolean circuits. Formal definitions of those can be found in most standard textbooks on algorithms and computation [AB09, Sip97, AS04]. Informally, these are functions from {0,1}M{0,1}n\{0,1\}^{M}\to\{0,1\}^{n} obtained by considering a directed graph with MM input nodes which have in-degree zero, and nn output nodes which have out-degree zero. Each intermediate node corresponds to one of three standard Boolean operation ,\vee,\wedge or ¬\neg. The size of the circuit is the number of nodes in the associated graph and its depth is the length of its longest directed path. Equivalently, one may consider instead functions from {±1}M{±1}n\{\pm 1\}^{M}\to\{\pm 1\}^{n} by adopting the appropriate logical functions in gates. We also adopt this assumption for convenience. In particular, when M=npM=n^{p}, such a Boolean function maps any pp-tensor with ±1\pm 1 entries into an nn-vector with ±1\pm 1 entries. Thus given an nn-output, M=npM=n^{p}-input Boolean circuit CC and pp-tensor JJ with ±1\pm 1 entries, we denote by C(J)C(J) the binary vector in {±1}n\{\pm 1\}^{n} produced when CC takes input JJ. We denote by CjC_{j} the single-output Boolean circuit associated with the jj-th component of CC so that C(J)=(Cj(J),1jn)C(J)=(C_{j}(J),1\leq j\leq n).

As stated in the introduction, we are interested in Boolean circuits with bounds on their size and depth. Fix any two nn-dependent positive integer-valued sequences s(n),d(n)s(n),d(n) and a constant 0<ρ<10<\rho<1. We denote by 𝒞p(n,s(n),d(n),ρ)\mathcal{C}_{p}(n,s(n),d(n),\rho) the family of all npn^{p}-input, nn-output Boolean circuits CC which satisfy the following properties:

  1. (a)

    The size (the number of gates) of CC is at most s(n)s(n) and its depth is at most d(n)d(n).

  2. (b)

    For every tensor J{±1}npJ\in\{\pm 1\}^{n^{p}} such that the value in (1) is non-negative, the output C(G)C(G) satisfies

    J,C(J)ρmaxσ{±1}nJ,σ.\displaystyle\langle J,C(J)\rangle\geq\rho\max_{\sigma\in\{\pm 1\}^{n}}\langle J,\sigma\rangle.

    If the value in (1) is negative, the output C(J)C(J) can be arbitrary.

In other words, this is a family of circuits which is required to produce a solution with value which is at least a multiplicative constant ρ\rho away from optimality when the value of the optimization problem is non-negative. Clearly a family of such circuits can be empty if either s(n)s(n) or d(n)d(n) is too restrictive.

We now state our main result.

Theorem 2.3.

For every even p4p\geq 4, α>0\alpha>0 and ρ>ηOGP,p/ηp\rho>\eta_{\rm OGP,p}/\eta^{*}_{p}, for all sufficiently large nn, either the associated family of circuits 𝒞p(n,s(n),d(n),ρ)\mathcal{C}_{p}(n,s(n),d(n),\rho) is empty, or s(n)nαs(n)\geq n^{\alpha}, or d(n)logn/(2loglogn)d(n)\geq\log n/(2\log\log n).

The result above essentially rules out poly-size circuits with the stated bound on depth that obtain a solution with value at least ρ\rho times the optimum value, where ρ\rho is any constant larger than ηOGP,p/ηp\eta_{\rm OGP,p}/\eta^{*}_{p}. The proof is given in Section 3.1. The constant 22 in the depth lower bound can be slightly improved, although it appears that it has to be larger than 11 for our analysis to go through. We note that α\alpha does not appear in the claimed lower bound on the circuit depth because the bound holds even for mildly super-polynomially sized circuits. We will not attempt to find the optimal growth rate.

3 Proofs

3.1 Proof of Theorem 2.3

We fix any α>0\alpha>0 and any circuit C𝒞p(n,nα,d(n),ρ)C\in\mathcal{C}_{p}(n,n^{\alpha},d(n),\rho). Recall that the output of the Boolean circuit CC acting on any input J{±1}npJ\in\{\pm 1\}^{n^{p}} is a vector in {±1}n\{\pm 1\}^{n} and thus has 22\|\cdot\|_{2}^{2} norm equal nn. Fix a constant κ>0\kappa>0. Consider any two tensors J1,J2{±1}npJ_{1},J_{2}\in\{\pm 1\}^{n^{p}} which differ in at most one entry. Namely, there exists 1i1,,ipn1\leq i_{1},\ldots,i_{p}\leq n such that J1J_{1} and J2J_{2} are identical at all multi-indices (i1,,ip)(i_{1}^{\prime},\ldots,i_{p}^{\prime}) that are distinct from the pp-tuple (i1,,ip)(i_{1},\ldots,i_{p}). We say that the pair (J1,J2)(J_{1},J_{2}) is κ\kappa-bad if

C(J1)C(J2)22κn.\|C(J_{1})-C(J_{2})\|_{2}^{2}\geq\kappa n. (2)

Fix two independent copies JJ and J~\tilde{J} of a random pp-tensor with i.i.d. ±1\pm 1 entries and consider the discrete interpolation J=J0,J1,J2,,Jnp=J~J=J_{0},J_{1},J_{2},\ldots,J_{n^{p}}=\tilde{J} associated with the statement of Theorem 2.2. The following “stability” result is a direct analogue of Theorem 4.2 in [GJW20] (but for circuits instead of low-degree polynomials).

Theorem 3.1.

The probability that no pair (Jt,Jt+1),0tnp1(J_{t},J_{t+1}),0\leq t\leq n^{p}-1 is κ\kappa-bad is at least

exp((logn)1.2d(n)),\displaystyle\exp\left(-\left(\log n\right)^{1.2d(n)}\right), (3)

for all large enough nn.

The constants κ,α,ηOGP,p\kappa,\alpha,\eta_{\rm OGP,p} will be subsumed by logn\log n as we will see. Note that this probability converges to zero if d(n)d(n) is bounded away from zero. For our purposes, we will need the rate of convergence in this bound to be no faster than exponential. (In particular, this requirement will control our depth bound.)

The proof of Theorem 3.1 (which we include for completeness) is similar to Theorem 4.2 of [GJW20] but with two minor changes: first we observe that [GJW20] applies not just to functions of bounded degree but more generally to functions of bounded total influence; we then use the Linial-Mansour-Nisan Theorem to bound the total influence of any low-depth circuit.

Let’s first see how Theorem 3.1 implies the result.

Proof of Theorem 2.3.

We fix ϵ>0\epsilon>0 with value specified later. By Theorem 2.1, with probability at least 1(np+1)exp(γn)1-(n^{p}+1)\exp(-\gamma n), the objective values associated with instances JtJ_{t} are all at least (1ϵ)ηpnp+12(1-\epsilon)\eta_{p}^{*}n^{p+1\over 2}. By Theorem 2.2, with probability at least 1exp(γn)1-\exp(-\gamma n) the e-OGP holds with parameters ηOGP,p,ν1,p<ν2,p\eta_{\rm OGP,p},~{}\nu_{1,p}<\nu_{2,p}. Call the intersection of these two events 𝒜\mathcal{A} and assume the same constant γ\gamma for convenience, so that (𝒜)1exp(γn)\mathbb{P}(\mathcal{A})\geq 1-\exp(-\gamma n) and the multiplier npn^{p} in the union bound is absorbed by making γ\gamma smaller. Fix any κ(ν2,pν1,p)2\kappa\leq(\nu_{2,p}-\nu_{1,p})^{2}. Denote the event described in Theorem 3.1—namely the event that no pair (Jt,Jt+1),0tnp1(J_{t},J_{t+1}),0\leq t\leq n^{p}-1 is κ\kappa-bad—by \mathcal{B}. We claim that 𝒜c\mathcal{B}\subset\mathcal{A}^{c}. Assuming this claim, we obtain

exp((logn)1.2d(n))()(𝒜c)exp(γn),\displaystyle\exp\left(-\left(\log n\right)^{1.2d(n)}\right)\leq\mathbb{P}(\mathcal{B})\leq\mathbb{P}(\mathcal{A}^{c})\leq\exp(-\gamma n),

and the desired lower bound on depth d(n)d(n) is obtained by changing the constant from 1.21.2 to 22 in order to subsume the term depending on γ\gamma.

It remains to prove the claim 𝒜c\mathcal{B}\subset\mathcal{A}^{c}. By way of contradiction, suppose that 𝒜\mathcal{B}\cap\mathcal{A} is non-empty. (Note that the event 𝒜\mathcal{A} implies in particular that the value of the optimization problem (1) is non-negative for every JtJ_{t}.) By assumption, the circuit CC must produce solutions C(Jt),1tnpC(J_{t}),1\leq t\leq n^{p} with value at least a factor of ρ\rho from optimality. On the other hand, on the event 𝒜\mathcal{A}, we have that the optimal value is at least (1ϵ)ηpnp+12.(1-\epsilon)\eta_{p}^{*}n^{p+1\over 2}. So if we choose ϵ\epsilon small enough that ρ(1ϵ)ηp>ηOGP,p\rho(1-\epsilon)\eta_{p}^{*}>\eta_{{\rm OGP},p}, we have that the solutions C(Jt)C(J_{t}) yield an objective value above the onset of OGP, that is, Jt,C(Jt)ηOGP,pnp+12\langle J_{t},C(J_{t})\rangle\geq\eta_{\rm OGP,p}n^{p+1\over 2}.

On 𝒜\mathcal{A} we have by part (b) of the e-OGP that C(J0),C(Jnp)ν1,pn\langle C(J_{0}),C(J_{n^{p}})\rangle\leq\nu_{1,p}n. Let 1t0np1\leq t_{0}\leq n^{p} be the smallest index for which C(J0),C(Jt0)ν1,pn\langle C(J_{0}),C(J_{t_{0}})\rangle\leq\nu_{1,p}n, and thus C(J0),C(Jt01)>ν1,pn\langle C(J_{0}),C(J_{t_{0}-1})\rangle>\nu_{1,p}n. (Note that this includes the possibility t0=1t_{0}=1.) On 𝒜\mathcal{A}, part (a) of the e-OGP applied in the case t1=0,t2=t01t_{1}=0,t_{2}=t_{0}-1 implies that in fact C(J0),C(Jt01)ν2,pn\langle C(J_{0}),C(J_{t_{0}-1})\rangle\geq\nu_{2,p}n. By Cauchy–Schwarz,

C(J0)2C(Jt01)C(Jt0)2\displaystyle\|C(J_{0})\|_{2}\cdot\|C(J_{t_{0}-1})-C(J_{t_{0}})\|_{2} |C(J0),C(Jt01)C(Jt0)|\displaystyle\geq|\langle C(J_{0}),C(J_{t_{0}-1})-C(J_{t_{0}})\rangle|
=|C(J0),C(Jt01)C(J0),C(Jt0)|\displaystyle=|\langle C(J_{0}),C(J_{t_{0}-1})\rangle-\langle C(J_{0}),C(J_{t_{0}})\rangle|
ν2,pnν1,pn\displaystyle\geq\nu_{2,p}n-\nu_{1,p}n

and so, since C(J0)2=n\|C(J_{0})\|_{2}=\sqrt{n},

C(Jt01)C(Jt0)22(ν2,pν1,p)2nκn,\|C(J_{t_{0}-1})-C(J_{t_{0}})\|_{2}^{2}\geq(\nu_{2,p}-\nu_{1,p})^{2}n\geq\kappa n,

and thus (Jt01,Jt0)(J_{t_{0}-1},J_{t_{0}}) is a κ\kappa-bad pair, contradicting the definition of the event \mathcal{B}. ∎

Before turning to the proof of Theorem 3.1, let us briefly recall the following notions from Boolean analysis/Fourier analysis on {±1}m\{\pm 1\}^{m}; see e.g. [O’D14] for a reference. (In our setting, we will always work with the specific case m=npm=n^{p}.) Consider the standard Fourier expansion of functions on {±1}m\{\pm 1\}^{m} associated with the uniform measure on {±1}m\{\pm 1\}^{m}. The basis of this expansion are monomials of the form xSiSxi,S[m]x_{S}\triangleq\prod_{i\in S}x_{i},\,S\subset[m]. For every function g:{±1}mg:\{\pm 1\}^{m}\to\mathbb{R}, the associated Fourier coefficients are

g^S=𝔼[g(x)xS],S[m],\displaystyle\hat{g}_{S}=\mathbb{E}\!\left[g(x)x_{S}\right],\qquad S\subset[m],

where the expectation is with respect to the uniform measure on x=(x1,,xm){±1}mx=(x_{1},\ldots,x_{m})\in\{\pm 1\}^{m}. Then the Fourier expansion of gg is g=S[m]g^SxSg=\sum_{S\subset[m]}\hat{g}_{S}x_{S}, and the Parseval (or Walsh) identity states that Sg^S2=𝔼[g(x)2]\sum_{S}\hat{g}_{S}^{2}=\mathbb{E}\!\left[g(x)^{2}\right]. For i[m]i\in[m], let LiL_{i} denote the Laplacian operator:

Lig(x)=Sig^SxS,\displaystyle L_{i}g(x)=\sum_{S\ni i}\hat{g}_{S}x_{S},

and let I(g)I(g) be the total influence

I(g)=i[m]𝔼[Lig(x)2]=S[m]|S|g^S2.\displaystyle I(g)=\sum_{i\in[m]}\mathbb{E}\!\left[L_{i}g(x)^{2}\right]=\sum_{S\subset[m]}|S|\,\hat{g}_{S}^{2}. (4)

Also note that

𝔼[Lig(x)2]=12(𝔼[Lig(xi,1)2]+𝔼[Lig(xi,1)2])\displaystyle\mathbb{E}\!\left[L_{i}g(x)^{2}\right]={1\over 2}\left(\mathbb{E}\!\left[L_{i}g(x_{-i,-1})^{2}\right]+\mathbb{E}\!\left[L_{i}g(x_{-i,1})^{2}\right]\right) (5)

where for =±1\ell=\pm 1, xi,{±1}mx_{-i,\ell}\in\{\pm 1\}^{m} is obtained from xx by fixing the ii-th coordinate of xx to \ell.

Proof of Theorem 3.1.

Fix δn=nβ\delta_{n}=n^{-\beta} with some constant β>p\beta>p. Let C^S,j,S[np]\hat{C}_{S,j},S\subset[n^{p}] be the Fourier coefficients of the function CjC_{j}, which we recall is the jj-component of CC. Fix a constant c>0c>0 and let

Dn\displaystyle D_{n} =log(1/δn)(clogs(n)δn)d(n)1\displaystyle=\log(1/\delta_{n})\left(c\log{s(n)\over\delta_{n}}\right)^{d(n)-1}
=βlogn(c(α+β)logn)d(n)1\displaystyle=\beta\log n\left(c(\alpha+\beta)\log n\right)^{d(n)-1}
(logn)1.1d(n)\displaystyle\leq\left(\log n\right)^{1.1d(n)} (6)

for all sufficiently large nn. We use the Linial-Mansour-Nisan Theorem [LMN93] (the version we take here is the version from page 93 in [O’D14]) which states for some universal constant cc the spectrum of a depth-d(n)d(n) circuit with size s(n)s(n) is δn\delta_{n}-concentrated on degree Dn\leq D_{n}. More precisely, in our context, it states that for each j=1,2,,nj=1,2,\ldots,n,

S[np]:|S|>Dn(C^S,j)2δn.\displaystyle\sum_{S\subset[n^{p}]\,:\,|S|>D_{n}}\left(\hat{C}_{S,j}\right)^{2}\leq\delta_{n}.

Thus for the influence function we have for each jj

I(Cj)\displaystyle I(C_{j}) =S[np]|S|(C^S,j)2\displaystyle=\sum_{S\subset[n^{p}]}|S|\left(\hat{C}_{S,j}\right)^{2}
S:|S|Dn|S|(C^S,j)2+npδn\displaystyle\leq\sum_{S\,:\,|S|\leq D_{n}}|S|\left(\hat{C}_{S,j}\right)^{2}+n^{p}\delta_{n}
DnS:|S|Dn(C^S,j)2+npδn\displaystyle\leq D_{n}\sum_{S\,:\,|S|\leq D_{n}}\left(\hat{C}_{S,j}\right)^{2}+n^{p}\delta_{n}
DnS(C^S,j)2+npδn\displaystyle\leq D_{n}\sum_{S}\left(\hat{C}_{S,j}\right)^{2}+n^{p}\delta_{n}
=Dn+npδn,\displaystyle=D_{n}+n^{p}\delta_{n}, (7)

where in the last equation we use the fact Cj=±1C_{j}=\pm 1 and thus 1=𝔼[Cj2]=SC^S,j21=\mathbb{E}\!\left[C_{j}^{2}\right]=\sum_{S}\hat{C}_{S,j}^{2}.

Let t\mathcal{E}_{t} be event that the pair (Jt,Jt+1)(J_{t},J_{t+1}) is κ\kappa-bad (as defined in (2)). The following is similar to Lemma 4.3 in [GJW20].

Lemma 3.2.

The following holds for any fixed order R\leq_{R} on the npn^{p} entries of the tensor:

κ40tnp1(t)Dn+npδn.\displaystyle{\kappa\over 4}\sum_{0\leq t\leq n^{p}-1}\mathbb{P}(\mathcal{E}_{t})\leq D_{n}+n^{p}\delta_{n}. (8)

As β>p\beta>p we have npδn=o(1)n^{p}\delta_{n}=o(1) and we obtain that for all large enough nn,

κ40tnp1(t)2Dn.\displaystyle{\kappa\over 4}\sum_{0\leq t\leq n^{p}-1}\mathbb{P}(\mathcal{E}_{t})\leq 2D_{n}. (9)

For any fixed x{±1}npx\in\{\pm 1\}^{n^{p}} let q(x)q(x) be the probability of the event ttc\cap_{t}\mathcal{E}_{t}^{c} when we condition on J0=xJ_{0}=x. Namely q(x)q(x) is the probability of there being no κ\kappa-bad pairs when the initial tensor of the interpolation sequence is xx. The following is a special case of Lemma 4.4 in [GJW20]; we will include the proof for completeness.

Lemma 3.3.

The following bound holds for any fixed order R\leq_{R} on the npn^{p} entries of the tensor:

𝔼[logq(J)]2log20tnp1(t),\displaystyle-\mathbb{E}\!\left[\log q(J)\right]\leq 2\log 2\sum_{0\leq t\leq n^{p}-1}\mathbb{P}(\mathcal{E}_{t}),

where J{±1}npJ\in\{\pm 1\}^{n^{p}} is i.i.d.

We now combine these two results to complete the proof of Theorem 3.1. Note that the probability of no κ\kappa-bad pairs is

(ttc)=𝔼[q(J)],\displaystyle\mathbb{P}\left(\cap_{t}\mathcal{E}^{c}_{t}\right)=\mathbb{E}\!\left[q(J)\right],

where the expectation is with respect to the starting sample J=J0J=J_{0} and R\leq_{R}. By Lemma 3.3 we have

log𝔼[q(J)]𝔼[logq(J)]2log20tnp1(t)\displaystyle-\log\mathbb{E}\!\left[q(J)\right]\leq-\mathbb{E}\!\left[\log q(J)\right]\leq 2\log 2\sum_{0\leq t\leq n^{p}-1}\mathbb{P}(\mathcal{E}_{t})

Applying (9) this is at most 16(log2)Dn/κ16(\log 2)D_{n}/\kappa. Exponentiating, we obtain

(ttc)exp(16(log2)Dn/κ).\displaystyle\mathbb{P}\left(\cap_{t}\mathcal{E}^{c}_{t}\right)\geq\exp\left(-16(\log 2)D_{n}/\kappa\right).

Recalling bound (6) and since κ\kappa is a constant, we obtain the claim by increasing the exponent 1.1d(n)1.1d(n) in (6) to 1.2d(n)1.2d(n). ∎

Proof of Lemma 3.2.

Combining (7),(4) and (5) we obtain for each jj,

Dn+npδni[np]12(𝔼[LiCj(xi,1)2+LiCj(xi,1)2]).\displaystyle D_{n}+n^{p}\delta_{n}\geq\sum_{i\in[n^{p}]}{1\over 2}\left(\mathbb{E}\!\left[L_{i}C_{j}(x_{-i,-1})^{2}+L_{i}C_{j}(x_{-i,1})^{2}\right]\right).

Using the inequality a2+b212(ab)2a^{2}+b^{2}\geq\frac{1}{2}(a-b)^{2}, the right-hand side is at least

14i[np]𝔼[(LiCj(xi,1)LiCj(xi,1))2].\displaystyle{1\over 4}\sum_{i\in[n^{p}]}\mathbb{E}\!\left[\left(L_{i}C_{j}(x_{-i,-1})-L_{i}C_{j}(x_{-i,1})\right)^{2}\right].

Note that for any gg,

Lig(xi,1)Lig(xi,1)=g(xi,1)g(xi,1).\displaystyle L_{i}g(x_{-i,-1})-L_{i}g(x_{-i,1})=g(x_{-i,-1})-g(x_{-i,1}).

Summing over jj we obtain

(Dn+npδn)n\displaystyle(D_{n}+n^{p}\delta_{n})n 14i[np]1jn𝔼[(Cj(xi,1)Cj(xi,1))2]\displaystyle\geq{1\over 4}\sum_{i\in[n^{p}]}\sum_{1\leq j\leq n}\mathbb{E}\!\left[\left(C_{j}(x_{-i,-1})-C_{j}(x_{-i,1})\right)^{2}\right]
=14i[np]𝔼[C(xi,1)C(xi,1)22]\displaystyle={1\over 4}\sum_{i\in[n^{p}]}\mathbb{E}\!\left[\|C(x_{-i,-1})-C(x_{-i,1})\|_{2}^{2}\right]
14t[np]κn(t),\displaystyle\geq{1\over 4}\sum_{t\in[n^{p}]}\kappa n\,\mathbb{P}(\mathcal{E}_{t}),

using (2). ∎

Proof of Lemma 3.3.

Fix any order R\leq_{R} with respect to which the sequence J0,J1,,JnpJ_{0},J_{1},\ldots,J_{n^{p}} is generated. We refer to coordinates 1,2,,np1,2,\ldots,n^{p} as coordinates associated with this order. In particular, coordinate 11 is resampled when moving from J0J_{0} to J1J_{1}. Coordinate 22 is resampled when moving from J1J_{1} to J2J_{2}, etc.

For every 0mnp0\leq m\leq n^{p} and tensor x{±}npx\in\{\pm\}^{n^{p}} let qm(x)q_{m}(x) be the probability that the truncated interpolation path J0,J1,,JmJ_{0},J_{1},\ldots,J_{m} does not contain any κ\kappa-bad pairs (Jt,Jt+1),tm1(J_{t},J_{t+1}),t\leq m-1, when conditioned on J0=xJ_{0}=x. We claim the following stronger bound, from which the required claim follows as a special case m=npm=n^{p}: for every m=0,1,,npm=0,1,\ldots,n^{p},

𝔼[logqm(J0)]2log20tm1(t).\displaystyle-\mathbb{E}\!\left[\log q_{m}(J_{0})\right]\leq 2\log 2\sum_{0\leq t\leq m-1}\mathbb{P}(\mathcal{E}_{t}).

The proof is by induction on mm. In the base case m=0m=0, the inequality above is in fact an equality with both sides equal to 0.

Assume the assertion holds for mm1m^{\prime}\leq m-1. We now establish it for mm. For any tensor x{±}npx\in\{\pm\}^{n^{p}} let q1,m(x)q_{1,m}(x) be the probability of the event that the interpolation path J1,J2,,JmJ_{1},J_{2},\ldots,J_{m} does not contain any κ\kappa-bad pairs (Jt,Jt+1),1tm1(J_{t},J_{t+1}),1\leq t\leq m-1, when conditioned on J1=xJ_{1}=x. Observe that by our inductive assumption we have

𝔼[logq1,m(J1)]2log21tm1(t).\displaystyle-\mathbb{E}\!\left[\log q_{1,m}(J_{1})\right]\leq 2\log 2\sum_{1\leq t\leq m-1}\mathbb{P}(\mathcal{E}_{t}). (10)

For any tensor x{±}npx\in\{\pm\}^{n^{p}} let x±x_{\pm} be the tensor obtained from xx by forcing coordinate 11 to be ±\pm. We have

𝔼[logqm(J0)]=𝔼[(1/2)logqm(J0,+)+(1/2)logqm(J0,)].\displaystyle-\mathbb{E}\!\left[\log q_{m}(J_{0})\right]=-\mathbb{E}\!\left[(1/2)\log q_{m}(J_{0,+})+(1/2)\log q_{m}(J_{0,-})\right]. (11)

Let \mathcal{F} be the event that changing the value of the coordinate 11 is κ\kappa-bad. Note that this event is measurable (determined) by the realization of J0J_{0}, and furthermore 0=\mathcal{E}_{0}=\mathcal{F} on the event that coordinate 11 of J1J_{1} (and therefore J2,,JmJ_{2},\ldots,J_{m}) is different from one of J0J_{0}, and 0=\mathcal{E}_{0}=\emptyset otherwise. In particular, (0)=(1/2)()\mathbb{P}(\mathcal{E}_{0})=(1/2)\mathbb{P}(\mathcal{F}).

We claim

qm(J0,+)=(1/2)q1,m(J1,+)+(1/2)q1,m(J1,)𝟏(c).\displaystyle q_{m}(J_{0,+})=(1/2)q_{1,m}(J_{1,+})+(1/2)q_{1,m}(J_{1,-})\mbox{\boldmath$1$}(\mathcal{F}^{c}). (12)

We justify this identity as follows. With probability 1/21/2 the coordinate 11 of J1J_{1} is +1+1, namely J1=J1,+J_{1}=J_{1,+}. Conditioned on this event we have qm(J0,+)=q1,m(J1,+)q_{m}(J_{0,+})=q_{1,m}(J_{1,+}). On the other hand, with probability 1/21/2 coordinate 11 of J1J_{1} is 1-1. In this case no bad pair occurs if c\mathcal{F}^{c} takes place and furthermore no bad pairs occur during the remaining m1m-1 resamplings, the probability of which is q1,m(J1,)q_{1,m}(J_{1,-}).

Similarly,

qm(J0,)=(1/2)q1,m(J1,)+(1/2)q1,m(J1,+)𝟏(c).\displaystyle q_{m}(J_{0,-})=(1/2)q_{1,m}(J_{1,-})+(1/2)q_{1,m}(J_{1,+})\mbox{\boldmath$1$}(\mathcal{F}^{c}). (13)

If the event c\mathcal{F}^{c} occurs, the right-hand sides of the expressions (12) and (13) are identical, so on this event we obtain by the concavity of log\log

(1/2)logqm(J0,+)+(1/2)logqm(J0,)\displaystyle(1/2)\log q_{m}(J_{0,+})+(1/2)\log q_{m}(J_{0,-}) =log((1/2)q1,m(J1,+)+(1/2)q1,m(J1,))\displaystyle=\log\left((1/2)q_{1,m}(J_{1,+})+(1/2)q_{1,m}(J_{1,-})\right)
(1/2)log(q1,m(J1,+))+(1/2)log(q1,m(J1,)).\displaystyle\geq(1/2)\log\left(q_{1,m}(J_{1,+})\right)+(1/2)\log\left(q_{1,m}(J_{1,-})\right).

On the other hand, if the event \mathcal{F} occurs then

logqm(J0,+)=log((1/2)q1,m(J1,+)),\displaystyle\log q_{m}(J_{0,+})=\log\left((1/2)q_{1,m}(J_{1,+})\right),

and

logqm(J0,)=log((1/2)q1,m(J0,)).\displaystyle\log q_{m}(J_{0,-})=\log\left((1/2)q_{1,m}(J_{0,-})\right).

In this case

(1/2)logqm(J0,+)+(1/2)logqm(J0,)\displaystyle(1/2)\log q_{m}(J_{0,+})+(1/2)\log q_{m}(J_{0,-}) =(1/2)log((1/2)q1,m(J1,+))+(1/2)log((1/2)q1,m(J1,))\displaystyle=(1/2)\log\left((1/2)q_{1,m}(J_{1,+})\right)+(1/2)\log\left((1/2)q_{1,m}(J_{1,-})\right)
=log2+(1/2)log(q1,m(J1,+))+(1/2)log(q1,m(J1,)).\displaystyle=-\log 2+(1/2)\log\left(q_{1,m}(J_{1,+})\right)+(1/2)\log\left(q_{1,m}(J_{1,-})\right).

Combining, we obtain

(1/2)logqm(J0,+)+(1/2)logqm(J0,)\displaystyle(1/2)\log q_{m}(J_{0,+})+(1/2)\log q_{m}(J_{0,-})
(log2)𝟏()+(1/2)log(q1,m(J1,+))+(1/2)log(q1,m(J1,)).\displaystyle\geq-(\log 2)\mbox{\boldmath$1$}(\mathcal{F})+(1/2)\log\left(q_{1,m}(J_{1,+})\right)+(1/2)\log\left(q_{1,m}(J_{1,-})\right).

Applying (11) we obtain

𝔼[logqm(J0)]\displaystyle-\mathbb{E}\!\left[\log q_{m}(J_{0})\right] (log2)()(1/2)𝔼[logq1,m(J1,+)](1/2)𝔼[logq1,m(J1,)]\displaystyle\leq(\log 2)\mathbb{P}(\mathcal{F})-(1/2)\mathbb{E}\!\left[\log q_{1,m}(J_{1,+})\right]-(1/2)\mathbb{E}\!\left[\log q_{1,m}(J_{1,-})\right]
=(log2)()𝔼[logq1,m(J1)].\displaystyle=(\log 2)\mathbb{P}(\mathcal{F})-\mathbb{E}\!\left[\log q_{1,m}(J_{1})\right].

Recalling ()=2(0)\mathbb{P}(\mathcal{F})=2\mathbb{P}(\mathcal{E}_{0}) and applying (10) we obtain the claim. ∎

3.2 Proof of Theorem 2.2

The proof of Theorem 2.2 is similar to that in the Gaussian case after a coupling argument where we couple the path JtJ_{t} above to a discretized J^t\hat{J}_{t} induced by correlated Rademacher random vectors. We explain here the key steps that are different. In the following for two mean zero random vectors we say that XρYX\sim_{\rho}Y, if (Xi,Yi)(X_{i},Y_{i}) and (Xj,Yj)(X_{j},Y_{j}) are independent for iji\neq j and 𝔼[XiYi]=ρ\mathbb{E}\!\left[X_{i}Y_{i}\right]=\rho for each ii.

Let us begin with the following construction of R\leq_{R} which corresponds to said coupling. Fix δ>0\delta>0 so that 1/δ1/\delta is an integer, and let 0=ρ0<ρ1<<ρ1δ=10=\rho_{0}<\rho_{1}<\ldots<\rho_{\frac{1}{\delta}}=1 be ρk=kδ\rho_{k}=k\delta. We now construct a sequence of correlated instances of Rademacher random vectors as follows. Let J^0\hat{J}_{0} be an i.i.d. Ber(1/2) random vector. Now for each entry of J^0\hat{J}_{0}, we resample it independently with probability ρ1\rho_{1} and leave it as it was with probability 1ρ11-\rho_{1}, to obtain J^1\hat{J}_{1}. Notice that J^11ρ1J^0\hat{J}_{1}\sim_{1-\rho_{1}}\hat{J}_{0}. Now for those entries of J^1\hat{J}_{1} that have not been resampled, resample them with probability (ρ2ρ1)/(1ρ1)(\rho_{2}-\rho_{1})/(1-\rho_{1}). Repeat this process 1/δ1/\delta times to obtain a sequence J^k\hat{J}_{k} of vectors each of which has i.i.d. Ber(1/2) entries and are such that if ρk>ρ\rho_{k}>\rho_{\ell} then J^k1(ρkρ)J^\hat{J}_{k}\sim_{1-(\rho_{k}-\rho_{\ell})}\hat{J}_{\ell}. Furthermore, notice that as ρ1/δ=1\rho_{1/\delta}=1, in the last step of this sequence, all of the remaining entries are resampled (independently) with probability 1, so that J^1/δ\hat{J}_{1/\delta} is an independent copy of J^0\hat{J}_{0}. Now for each k=1,,1/δk=1,\ldots,1/\delta, let k{\cal I}_{k} be indices that have been resampled when going from J^k1\hat{J}_{k-1} to J^k\hat{J}_{k}. We now define R\leq_{R} in the obvious way: if Ik,II\in{\cal I}_{k},I^{\prime}\in{\cal I}_{\ell}, then IRII\leq_{R}I^{\prime} if kk\leq\ell. Within k{\cal I}_{k} we let R\leq_{R} be the uniform at random order for k{\cal I}_{k}. Note that {k}\{{\cal I}_{k}\} is a partition of [np][n^{p}]. Evidently the law of R\leq_{R} is that of the uniform at random order. Notice that if we construct (Jk)(J_{k}) in the corresponding way and let τk\tau_{k} be such that τ0=0\tau_{0}=0 and τkτk1=|k|\tau_{k}-\tau_{k-1}=\lvert\mathcal{I}_{k}\rvert, then Jτk=J^kJ_{\tau_{k}}=\hat{J}_{k}.

Now let us note the following continuity argument that allows us to prove the OGP statement for (Jτk)(J_{\tau_{k}}). To this end, let us note the following lemma whose proof we defer momentarily:

Lemma 3.4.

We have that

Ht,s(σ)=1np12(Jt,σJs,σ)H_{t,s}(\sigma)=\frac{1}{n^{\frac{p-1}{2}}}(\langle{J_{t},\sigma}\rangle-\langle{J_{s},\sigma}\rangle)

has

P(maxσ|Ht,s(σ)|Ktsn)Cecn,P(\max_{\sigma}\lvert H_{t,s}(\sigma)\rvert\geq K\cdot\sqrt{t-s}n)\leq Ce^{-cn},

for some K,C,c>0K,C,c>0 depending only on pp.

With this in hand, note that there are at most O(n2p)O(n^{2p}) pairs of such t,st,s, so that by a union bound and this lemma, we have that

maxk1δmaxτk1tsτk|Ht,s(σ)|Kδn.\max_{k\leq\frac{1}{\delta}}\max_{\tau_{k-1}\leq t\leq s\leq\tau_{k}}\lvert H_{t,s}(\sigma)\rvert\leq K\sqrt{\delta}n.

From here, the result follows from the following theorem which is essentially Theorem 2.2 but for the ρ\rho-correlated instances. The proof of this theorem is found in [GJ21] for the case of Gaussian distribution of the entries of JJ. Its extension for the case of Rademacher distribution is obtained by standard universality type arguments and is omitted. See for example [CH06, AC16, Sen18, CGPR19] for similar arguments.

Theorem 3.5.

Let (XI)(X_{I}) be i.i.d. bounded random variables with

𝔼[XI]=0𝔼[XI2]=1XI<M\mathbb{E}\!\left[X_{I}\right]=0\qquad\mathbb{E}\!\left[X_{I}^{2}\right]=1\qquad\|X_{I}\|_{\infty}<M

for some M>0M>0. Let (XIρ)(X^{\rho}_{I}) be such that XρρXX^{\rho}\sim_{\rho}X. Let H(σ)H(\sigma) denote the Hamiltonian corresponding to XX and HρH_{\rho} denote that corresponding to XρX^{\rho}. Then for every ϵ>0\epsilon>0 and ρ(0,1)\rho\in(0,1), there exists C,μ~>0C,\tilde{\mu}>0 such that with probability 1exp(Cn)/C1-\exp(-Cn)/C, for every σ1,σ2{±1}n\sigma_{1},\sigma_{2}\in\{\pm 1\}^{n} satisfying H(σ1)/n𝔼[ηn]μ~,Hρ(σ2)/n𝔼[ηn]μ~H(\sigma_{1})/n\geq\mathbb{E}\!\left[\eta_{n}\right]-\tilde{\mu},H^{\rho}(\sigma_{2})/n\geq\mathbb{E}\!\left[\eta_{n}\right]-\tilde{\mu} we have that |σ1,σ2|nϵ\lvert\langle\sigma_{1},\sigma_{2}\rangle\rvert\leq n\epsilon. If ρ=1\rho=1, the same result holds except we have that there are 0a<b10\leq a<b\leq 1 such that |σ1,σ2|[0,a][b,1]\lvert\langle\sigma_{1},\sigma_{2}\rangle\rvert\leq[0,a]\cup[b,1].

Proof of Lemma 3.4.

First, by symmetrization, it suffices to prove the same bound for Ht,s(σ)H_{t,s}(\sigma). In this case, note that conditionally on R\leq_{R}, the map

J^tJ^s1nmaxσHt,s(σ)\hat{J}_{t}-\hat{J}_{s}\mapsto\frac{1}{n}\max_{\sigma}H_{t,s}(\sigma)

is uniformly 2p|τkτs|/np+1\sqrt{2^{p}\lvert\tau_{k}-\tau_{s}\rvert/n^{p+1}}-Lipschitz. On the other hand, conditionally on R\leq_{R}, we have that Y=JtJsY=J_{t}-J_{s} is a Rademacher vector of length at most t,s\mathcal{I}_{t,s} where t,s\mathcal{I}_{t,s} are those indices resample between time tt and ss. Note that |t,s|=|τkτs|\lvert\mathcal{I}_{t,s}\rvert=\lvert\tau_{k}-\tau_{s}\rvert. Thus applying McDairmids inequality, we have that, conditionally on R\leq_{R}, this maximum,

M=1nmaxσHt,s(σ)M=\frac{1}{n}\max_{\sigma}H_{t,s}(\sigma)

has

P(|M𝔼[M]||τkτs|η)exp(2η2|τkτs|).P(\lvert M-\mathbb{E}\!\left[M\right]\rvert\geq\sqrt{\lvert\tau_{k}-\tau_{s}\rvert}\eta)\leq\exp(-2\eta^{2}\lvert\tau_{k}-\tau_{s}\rvert).

Now, since Ht,sH_{t,s} is conditionally sub-gaussian, we may apply Talagrand’s comparison inequality [Ver18, Cor 8.6.2], to obtain

𝔼[M]C𝔼[M~]\mathbb{E}\!\left[M\right]\leq C\mathbb{E}\!\left[\tilde{M}\right]

for some universal constant where M~\tilde{M} is obtained from MM by replacing (XI,XIρ)(X_{I},X^{\rho}_{I}) with gaussian random vectors correlated in the same fashion. By a standard application of Slepian’s inequality, we obtain,

𝔼[M~]C(p)|τkτs|,\mathbb{E}\!\left[\tilde{M}\right]\leq C(p)\sqrt{\lvert\tau_{k}-\tau_{s}\rvert},

where we have used here that the variance of Ht,sH_{t,s} is at most 2(ts)2(t-s). Finally note that with probability 1exp(cnp)1-\exp(-cn^{p}), we have |τkτs|2(ts)\lvert\tau_{k}-\tau_{s}\rvert\leq 2(t-s). Combining these bounds yields the desired result. ∎

Acknowledgements

The authors are very grateful to Benjamin Rossman for educating us on the state of the art bounds in circuit complexity. The first author gratefully acknowledges the support from the NSF grant DMS-2015517. The second author gratefully acknowledges support from NSERC; cette recherche a été financée en partie par CRSNG [RGPIN-2020-04597, DGECR-2020-00199]. The third author was partially supported by NSF grant DMS-1712730 and by the Simons Collaboration on Algorithms and Geometry.

References

  • [AB09] Sanjeev Arora and Boaz Barak, Computational complexity: a modern approach, Cambridge University Press, 2009.
  • [AC16] Antonio Auffinger and Wei-Kuo Chen, Universality of chaos and ultrametricity in mixed p-spin models, Communications on Pure and Applied Mathematics 69 (2016), no. 11, 2107–2130.
  • [AMS20] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke, Optimization of mean-field spin glasses, arXiv preprint arXiv:2001.00904 (2020).
  • [AS04] Noga Alon and Joel H Spencer, The probabilistic method, John Wiley & Sons, 2004.
  • [CGPR19] Wei-Kuo Chen, David Gamarnik, Dmitry Panchenko, and Mustazee Rahman, Suboptimality of local algorithms for a class of max-cut problems, The Annals of Probability 47 (2019), no. 3, 1587–1618.
  • [CH06] Philippe Carmona and Yueyun Hu, Universality in Sherrington–Kirkpatrick’s spin glass model, Annales de l’Institut Henri Poincare (B) Probability and Statistics, vol. 42, Elsevier, 2006, pp. 215–222.
  • [FJS91] Merrick L Furst, Jeffrey C Jackson, and Sean W Smith, Improved learning of AC0 functions, COLT, vol. 91, 1991, pp. 317–325.
  • [GJ21] David Gamarnik and Aukosh Jagannath, The overlap gap property and approximate message passing algorithms for pp-spin models, The Annals of Probability 49 (2021), no. 1, 180–205.
  • [GJW20] David Gamarnik, Aukosh Jagannath, and Alexander S Wein, Low-degree hardness of random optimization problems, 61st Annual Symposium on Foundations of Computer Science, 2020.
  • [GT02] F. Guerra and F. L. Toninelli, The thermodynamic limit in mean field spin glass models, Commun. Math. Phys. 230 (2002), 71–79.
  • [Hås86] Johan Håstad, Almost optimal lower bounds for small depth circuits, Proceedings of the eighteenth annual ACM symposium on Theory of computing, 1986, pp. 6–20.
  • [LMN93] Nathan Linial, Yishay Mansour, and Noam Nisan, Constant depth circuits, fourier transform, and learnability, Journal of the ACM (JACM) 40 (1993), no. 3, 607–620.
  • [LRR17] Yuan Li, Alexander Razborov, and Benjamin Rossman, On the AC0 complexity of subgraph isomorphism, SIAM Journal on Computing 46 (2017), no. 3, 936–971.
  • [Mon21] Andrea Montanari, Optimization of the Sherrington–Kirkpatrick hamiltonian, SIAM Journal on Computing (2021), FOCS19–1.
  • [O’D14] Ryan O’Donnell, Analysis of boolean functions, Cambridge University Press, 2014.
  • [Pan13] Dmitry Panchenko, The Sherrington-Kirkpatrick model, Springer Science & Business Media, 2013.
  • [Par80] Giorgio Parisi, A sequence of approximated solutions to the SK model for spin glasses, Journal of Physics A: Mathematical and General 13 (1980), no. 4, L115.
  • [Ros10] Benjamin Rossman, Average-case complexity of detecting cliques, Ph.D. thesis, Massachusetts Institute of Technology, 2010.
  • [Ros18]  , Lower bounds for subgraph isomorphism, Proceedings of the International Congress of Mathematicians: Rio de Janeiro 2018, World Scientific, 2018, pp. 3425–3446.
  • [Sen18] Subhabrata Sen, Optimization on sparse random hypergraphs and spin glasses, Random Structures & Algorithms 53 (2018), no. 3, 504–536.
  • [Sip97] M. Sipser, Introduction to the theory of computability, PWS Publishing Company, 1997.
  • [Sub21] Eliran Subag, Following the ground states of full-RSB spherical spin glasses, Communications on Pure and Applied Mathematics 74 (2021), no. 5, 1021–1044.
  • [Tal06] Michel Talagrand, The Parisi formula, Annals of mathematics (2006), 221–263.
  • [Tal17] Avishay Tal, Tight bounds on the fourier spectrum of AC0, 32nd Computational Complexity Conference (CCC 2017), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
  • [Ver18] Roman Vershynin, High-dimensional probability: An introduction with applications in data science, vol. 47, Cambridge University Press, 2018.