This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Almost-Linear Planted Cliques
Elude the Metropolis Process

Zongchen Chen [email protected] Department of Mathematics, MIT Elchanan Mossel [email protected] Department of Mathematics, MIT Ilias Zadik [email protected] Department of Mathematics, MIT
Abstract

A seminal work of Jerrum (1992) showed that large cliques elude the Metropolis process. More specifically, Jerrum showed that the Metropolis algorithm cannot find a clique of size k=Θ(nα),α(0,1/2)k=\Theta(n^{\alpha}),\alpha\in(0,1/2), which is planted in the Erdős-Rényi random graph G(n,1/2)G(n,1/2), in polynomial time. Information theoretically it is possible to find such planted cliques as soon as k(2+ε)lognk\geq(2+\varepsilon)\log n.

Since the work of Jerrum, the computational problem of finding a planted clique in G(n,1/2)G(n,1/2) was studied extensively and many polynomial time algorithms were shown to find the planted clique if it is of size k=Ω(n)k=\Omega(\sqrt{n}), while no polynomial-time algorithm is known to work when k=o(n)k=o(\sqrt{n}). The computational problem of finding a planted clique of k=o(n)k=o(\sqrt{n}) is now widely considered as a foundational problem in the study of computational-statistical gaps. Notably, the first evidence of the problem’s algorithmic hardness is commonly attributed to the result of Jerrum from 1992.

In this paper we revisit the original Metropolis algorithm suggested by Jerrum. Interestingly, we find that the Metropolis algorithm actually fails to recover a planted clique of size k=Θ(nα)k=\Theta(n^{\alpha}) for any constant 0α<10\leq\alpha<1, unlike many other efficient algorithms that succeed when α>1/2\alpha>1/2. Moreover, we strengthen Jerrum’s results in a number of other ways including:

  • Like many results in the MCMC literature, the result of Jerrum shows that there exists a starting state (which may depend on the instance) for which the Metropolis algorithm fails to find the planted clique in polynomial time. For a wide range of temperatures, we show that the algorithm fails when started at the most natural initial state, which is the empty clique. This answers an open problem stated in Jerrum (1992). It is rather rare to be able to show the failure of a Markov chain starting from a specific state.

  • We show that the simulated tempering version of the Metropolis algorithm, a more sophisticated temperature-exchange variant of it, also fails at the same regime of parameters.

Our results substantially extend Jerrum’s result. Furthermore, they confirm recent predictions by Gamarnik and Zadik (2019) and Angelini, Fachin, de Feo (2021). Finally, they highlight the subtleties of using the sole failure of one, however natural, family of algorithms as a strong sign of a fundamental statistical-computational gap.

1 Introduction

The problem of finding in polynomial-time large cliques in the nn-vertex Erdős-Rényi random graph 𝒢(n,1/2),\mathcal{G}(n,1/2), where each edge is present independently with probability 1/21/2, is a fundamental open problem in algorithmic random graph theory [Kar79]. In 𝒢(n,1/2)\mathcal{G}(n,1/2) it is known that there is a clique of size (2o(1))logn(2-o(1))\log n with high probability (w.h.p.) as n+n\rightarrow+\infty, and several simple polynomial-time algorithms can find a clique of size (1+o(1))logn(1+o(1))\log n w.h.p. which is nearly half the size of the maximum clique (e.g. see [GM75]). (Note that here and everywhere we denote by log\log the logarithm with base 2.) Interestingly, there is no known polynomial-time algorithm which is able to find a clique of size (1+ε)logn(1+\varepsilon)\log n for some constant ε>0\varepsilon>0. The problem of finding such a clique in polynomial-time was suggested by Karp [Kar79] and still remains open to this day.

Jerrum’s result and the planted clique model

Motivated by the challenge of finding a clique in 𝒢(n,1/2)\mathcal{G}(n,1/2), Jerrum in [Jer92] established that large cliques elude the Metropolis process in 𝒢(n,1/2)\mathcal{G}(n,1/2). Specifically, he considered the following Gibbs Measure for β>0,\beta>0, G𝒢(n,1/2)G\sim\mathcal{G}(n,1/2)

πβ(C)exp(β|C|),\pi_{\beta}(C)\propto\exp(\beta|C|), (1)

where CV(G)C\subseteq V(G) induces a clique in the instance GG. Notice that since β>0,\beta>0, by definition πβ\pi_{\beta} assigns higher mass on cliques of larger size. Jerrum considered the Metropolis process with stationary measure πβ\pi_{\beta}, which is initialized in some clique, say X0X_{0} of G.G. Then the process moves “locally” between cliques which differ in exactly one vertex. In more detail, every step of the Metropolis process is described as follows (see also Algorithm 1). Choose a vertex vVv\in V uniformly at random. If vXtv\in X_{t} where XtX_{t} is the current clique, then let Xt+1=Xt{v}X_{t+1}=X_{t}\setminus\{v\} (a “downward” step) with probability eβe^{-\beta} and let Xt+1=XtX_{t+1}=X_{t} with the remaining probability; else if vCtv\notin C_{t} and Xt{v}X_{t}\cup\{v\} is a clique, then let Xt+1=Xt{v}X_{t+1}=X_{t}\cup\{v\} (an “upward” step); otherwise, let Xt+1=XtX_{t+1}=X_{t}. For this process, Jerrum established the negative result that it fails to reach a clique of size (1+ε)logn(1+\varepsilon)\log n in polynomial-time for any constant ε>0\varepsilon>0. We note that, as customary in the theory of Markov chains, the failure is subject to the Metropolis process being “worst-case initialized”, that is starting from some “bad” clique X0X_{0}. This is a point we revisit later in this work.

The planted clique problem was introduced by Jerrum in [Jer92] in order to highlight the failure of the Metropolis process. For k,knk\in\mathbb{N},k\leq n the planted clique model 𝒢(n,1/2,k)\mathcal{G}(n,1/2,k) is defined by first sampling an instance of 𝒢(n,1/2)\mathcal{G}(n,1/2), then choosing kk out of the nn vertices uniformly at random and finally adding all the edges between them (if they did not already exist from the randomness of 𝒢(n,1/2)\mathcal{G}(n,1/2)). The set of kk chosen vertices is called the planted clique 𝒫𝒞\mathcal{PC}. It is perhaps natural to expect that the existence of 𝒫𝒞\mathcal{PC} in GG can assist the Metropolis process to reach faster cliques of larger size. Yet, Jerrum proved that as long as k=nαk=\left\lfloor n^{\alpha}\right\rfloor for some constant α<1/2\alpha<1/2 the Metropolis process continues to fail to find a clique of size (1+ε)logn(1+\varepsilon)\log n in polynomial-time, for any ε>0\varepsilon>0. As he also noticed when α>1/2\alpha>1/2 one can actually trivially recover 𝒫𝒞\mathcal{PC} from GG via a simple heuristic which chooses the top-kk degrees of the observed graph (see also [Kuč95]). In particular, one can trivially find a clique of size much larger than logn\log n when α>1/2.\alpha>1/2. Importantly though, he never proved that the Metropolis process actually succeeds in finding large cliques when α>1/2,\alpha>1/2, leaving open a potentially important piece of the performance of the Metropolis process in the planted clique model. In his words from the conclusion of [Jer92]:

“If the failure of the Metropolis process to reach (1+ε)logn(1+\varepsilon)\log n-size cliques could be shown to hold for some α>1/2\alpha>1/2, it would represent a severe indictment of the Metropolis process as a heuristic search technique for large cliques in a random graph.”

In this work, we seek to investigate the performance of the Metropolis process for all α(0,1).\alpha\in(0,1).

Input: a graph GG, a starting clique X0ΩX_{0}\in\Omega, stopping time TT
for t=1,,Tt=1,\dots,T do
       Pick vVv\in V uniformly at random;
       CXt1{v}C\leftarrow X_{t-1}\oplus\{v\};
       if CΩC\in\Omega then
             Xt{C,with probability min{1,exp(β(|C||Xt|))};Xt1,with remaining probability;X_{t}\leftarrow\begin{cases}C,&\text{with probability~{}}\min\{1,\exp(\beta(|C|-|X_{t}|))\};\\ X_{t-1},&\text{with remaining probability};\end{cases}
      else
            XtXt1X_{t}\leftarrow X_{t-1};
            
       end if
      
end for
Output: XTX_{T}
Algorithm 1 Metropolis Process [Jer92]
The planted clique conjecture

Following the work of Jerrum [Jer92] and Kucera [Kuč95] the planted clique model has received a great deal of attention and became a hallmark of a research area that is now called study of statistical-computational gaps. The planted clique problem can be phrased as a statistical or inference problem in the following way: given an instance of 𝒢(n,1/2,k)\mathcal{G}(n,1/2,k) recover 𝒫𝒞,\mathcal{PC}, the planted clique vertices. It is impossible to recover the clique when it is of size k<(2ε)lognk<(2-\varepsilon)\log n for any constant ε>0\varepsilon>0 (see e.g. [ACV14]), but possible information theoretically (and in quasi-polynomial time nO(logn)n^{O(\log n)}) whenever k>(2+ε)lognk>(2+\varepsilon)\log n for any constant ε>0\varepsilon>0 (see e.g. the discussion in [FGR+17]). Meanwhile, multiple polynomial-time algorithms have been proven to succeed in recovering the planted clique but only under the much larger size k=Ω(n)k=\Omega(\sqrt{n}) [AKS98, RF10, DGGP14, DM15]. The intense study of the model, as well as the failure to find better algorithms, has lead to the planted clique conjecture, stating that the planted clique recovery task is impossible in polynomial-time, albeit information-theoretically possible, whenever (2+ε)lognk=o(n)(2+\varepsilon)\log n\leq k=o(\sqrt{n}). The planted clique conjecture has since lead to many applications, a highlight of which is that it serves as the main starting point for building reductions between a plethora of statistical tasks and their computational-statistical trade-offs (see e.g. [BR13, MW15, BB20]).

Unfortunately, because of the average-case nature of the planted clique model, a complexity theory explanation is still lacking for the planted clique conjecture. For this reason, researchers have so far mainly focused on supporting the conjecture by establishing the failure of restricted families of polynomial-time methods, examples of which are Sum-of-Squares lower bounds [BHK+19], statistical-query lower bounds [FGR+17] and low-temperature MCMC lower bounds [GZ19]. Because of the focus on establishing such restricted lower bounds, the vast majority of works studying the planted clique conjecture cite the result of Jerrum [Jer92] on the failure of the Metropolis process as the “first evidence” for the validity of it. Note though that given what we discussed above, such a claim for Jerrum’s result can be problematic. Indeed, recall that Jerrum have not established the success of the Metropolis process when k=nα,α>1/2k=\left\lfloor n^{\alpha}\right\rfloor,\alpha>1/2 in reaching cliques of size (1+ε)logn(1+\varepsilon)\log n, let alone identifying the planted clique. That means that the Metropolis process could in principle simply be not recovering the planted clique for all α(0,1),\alpha\in(0,1), offering no evidence for the planted clique conjecture.

In fact, in 2019, Gamarnik and Zadik [GZ19] studied the performance of various MCMC methods (not including the Metropolis process described above) for the planted clique model and conjectured their failure to recover the planted clique when k<n2/3k<n^{2/3}, that is the failure much beyond the n\sqrt{n} threshold. For this reason, they raised again the question whether Jerrum’s original Metropolis process actually fails for kk larger than n\sqrt{n}. Later work by Angelini, Fachin and de Feo [AFdF21] simulated the performance of the Metropolis process and predicted its failure to recover the planted clique again much beyond the n\sqrt{n} threshold. Both of these results suggest that the Metropolis process may not be able to recover the planted clique for some values of k=nα,α>1/2k=n^{\alpha},\alpha>1/2. We consider the absence of a negative or positive result for whether the Metropolis process of [Jer92] recovers the planted clique a major gap in the literature of the planted clique conjecture which we investigate in this work.

Empty clique initialization

A common deficiency of many Markov chain lower bounds is their failure to establish lower bounds for starting from any particular state. Indeed, many of the lower bounds in the theory of Markov chains including spectral and conductance lower bounds are proved with high probability or in expectation over the stationary measure of the chain. Of course, since the lower bounds provide usually evidence that is hard to sample from the stationary distribution, the deficiency of the lower bound is that it is hard to find a state where the lower bound applies! This is indeed the case for the specific example of the Metropolis process in [Jer92], where a-priori, other than the empty set we do not know which subsets of the nodes make a clique.

Jerrum noted this deficiency in his paper [Jer92]:

“The most obviously unsatisfactory feature of Theorems … is that these theorems assert only the existence of a starting state from which the Metropolis process takes super-polynomial time to reach a large clique … It seems almost certain that the empty clique is a particular example of a bad starting state, but different proof techniques would be required to demonstrate this.”

In this work we develop a new approach based on comparison with birth and death processes allowing us to prove lower bounds starting from the empty clique. We note that previous work on birth and death processes established Markov chain bounds starting from a specific state [DSC06].

2 Main Contribution

We now present our main contributions on the performance of the Metropolis process for the planted clique model 𝒢(n,1/2,k),\mathcal{G}(n,1/2,k), where k=nαk=\left\lfloor n^{\alpha}\right\rfloor for some constant α(0,1)\alpha\in(0,1). Our results hold for the Metropolis process associated with any Gibbs measure defined as follows. For an arbitrary Hamiltonian vector h=(hq,q[n])h=(h_{q},q\in[n]) and arbitrary β0,\beta\geq 0, let

πβ(C)exp(βh|C|),CΩ(G)\displaystyle\pi_{\beta}(C)\propto\exp(\beta h_{|C|}),C\in\Omega(G) (2)

where by Ω(G)\Omega(G) we denote the cliques of GG. In some results, we would require a small degree of regularity from the vector hh, which for us is to satisfy h0=0h_{0}=0 and to be 1-Lipschitz in the sense that

|hqhq||qq|,q,q[2logn].\displaystyle|h_{q}-h_{q^{\prime}}|\leq|q-q^{\prime}|,\forall q,q^{\prime}\in[\left\lfloor 2\log n\right\rfloor]. (3)

We call these two conditions, as simply hh being “regular”. The regularity property is for technical reasons, as it allows to appropriately bound the transition probabilities of the Metropolis process between small cliques. Notice that hq=q,q[n]h_{q}=q,q\in[n] is trivially regular and corresponds to the Gibbs measure and Metropolis process considered by Jerrum, per (1).

Our first theorem is a very general lower bound which holds under worst-case initialization for all α(0,1).\alpha\in(0,1).

Theorem 2.1 (Informal version of Theorem 6.1 and Theorem 7.1).

Let k=nαk=\left\lfloor n^{\alpha}\right\rfloor for any α(0,1)\alpha\in(0,1).

  1. I.

    For arbitrary hh and arbitrary inverse temperature β\beta, for any constant γ>0\gamma>0 there exists a “bad” initialization such that the Metropolis process requires nΩ(logn)n^{\Omega(\log n)} time to reach γlogn\gamma\log n intersection with the planted clique.

  2. II.

    For arbitrary regular hh and arbitrary inverse temperature β=O(logn)\beta=O(\log n), for any constant ε>0,\varepsilon>0, there exists a “bad” initialization such that the Metropolis process requires nΩ(logn)n^{\Omega(\log n)} time to reach a clique of size (1+ε)logn(1+\varepsilon)\log n.

One way to think about the two parts of Theorem 2.1 is in terms of the statistical failure and the optimization failure of the Metropolis algorithm. The first part establishes the statistical failure as the algorithm cannot even find γlogn\gamma\log n vertices of the planted clique. The second part shows that it fails as an optimization algorithm: the existence of a huge planted clique still does not improve the performance over the (1+ε)logn(1+\varepsilon)\log n level.

Note that second part extends the result of [Jer92] to all α(0,1),\alpha\in(0,1), when β=O(logn)\beta=O(\log n). (The case β=ω(logn)\beta=\omega(\log n) is proven below in Theorem 2.3 since for that low temperature the process behaves like the greedy algorithm). In Jerrum’s words, our result reveals “a severe indictment of the Metropolis process in finding large cliques in random graphs”; even a n1δn^{1-\delta}-size planted clique does not help the Metropolis process to reach cliques of size (1+ε)logn(1+\varepsilon)\log n in polynomial time. At a technical level it is an appropriate combination of the bottleneck argument used by Jerrum in [Jer92], which focus on comparing cliques based on their size, and a separate bottleneck argument comparing cliques based on how much they intersect the planted clique.

Our next result concerns the case where the Metropolis process is initialized from the empty clique. We obtain for all β=o(logn)\beta=o(\log n) the failure of the Metropolis process starting from any o(logn)o(\log n)-size clique (including in particular the empty clique). In particular, this answers in the affirmative the question from [Jer92] for all β=o(logn).\beta=o(\log n).

Theorem 2.2 (Informal version of Theorem 6.2 and Theorem 7.6).

Let k=nαk=\left\lfloor n^{\alpha}\right\rfloor for any α(0,1)\alpha\in(0,1). Then for arbitrary regular hh and arbitrary inverse temperature β=o(logn)\beta=o(\log n), the Metropolis process started from any clique of size o(logn)o(\log n) (in particular the empty clique) requires nΩ(logn)n^{\Omega(\log n)} time to reach a clique which for some constants γ,ε>0\gamma,\varepsilon>0 either

  • has at least γlogn\gamma\log n intersection with the planted clique or,

  • has size at least (1+ε)logn.(1+\varepsilon)\log n.

The proof of Theorem 2.2 is based on the expansion properties of all (1ε)logn(1-\varepsilon)\log n-cliques of G(n,1/2)G(n,1/2). The expansion properties allow us to compare the Metropolis process to an one dimensional birth and death process that keeps track of the size of the clique (or the size of the intersection with the hidden clique). The analysis of this process is based on a time-reversal argument.

One can wonder, whether a lower bound can be also established in the case β=Ω(logn)\beta=\Omega(\log n) when we start from the empty clique. We partially answer this, by obtaining the failure of the Metropolis process starting from the empty clique in the case where hq=q,q[n]h_{q}=q,q\in[n] and β=ω(logn)\beta=\omega(\log n).

Theorem 2.3 (Informal version of Theorem 7.7).

Let k=nαk=\left\lfloor n^{\alpha}\right\rfloor for any α(0,1)\alpha\in(0,1). For hq=q,q[n]h_{q}=q,q\in[n] and arbitrary inverse temperature β=ω(logn)\beta=\omega(\log n), the Metropolis process started from any clique of size o(logn)o(\log n) (in particular the empty clique) requires nω(1)n^{\omega(1)} time to reach a clique which for some constants γ,ε>0\gamma,\varepsilon>0 either

  • has at least γlogn\gamma\log n intersection with the planted clique or,

  • has size at least (1+ε)logn.(1+\varepsilon)\log n.

The key idea here is that for hq=q,h_{q}=q, if β=ω(logn)\beta=\omega(\log n) then with high probability the Metropolis process never removes vertices, so it is in fact the same as the greedy algorithm. This observation allows for a much easier analysis of the algorithm.

In a different direction, Jerrum in [Jer92] asked whether one can extend the failure of the Metropolis process to the failure of simulated annealing on finding large cliques in random graphs. We make a step also in this direction, by considering the simulated tempering (ST) version of the Metropolis process [MP92]. The simulated tempering is a celebrated Monte Carlo scheme originated in the physics literature that considers a Gibbs measure, say the one in (2), at different temperatures β1<β2<<βm\beta_{1}<\beta_{2}<\ldots<\beta_{m}, in other words considers the Gibbs measures πβ1,πβ2,,πβM\pi_{\beta_{1}},\pi_{\beta_{2}},\ldots,\pi_{\beta_{M}}. Then it runs a Metropolis process on the product space between the set of temperatures and the Gibbs measures, which allows to modify the temperature during its evolution and interpolate between the different πβi\pi_{\beta_{i}} (for the exact definitions see Section 8.1). The ST dynamics have been observed extensively in practise to outperform their single temperature Metropolis process counterparts but rigorous analysis of these processes is rather rare, see [BR04].

It turns out that our “bad” initialization results extend in a straightforward manner to the ST dynamics.

Theorem 2.4 (Informal version of Theorem 8.3, and Theorem 8.4).

Let k=nαk=\left\lfloor n^{\alpha}\right\rfloor for any α(0,1)\alpha\in(0,1).

  • For arbitrary hh, arbitrary mm\in\mathbb{N} and arbitrary sequence of inverse temperatures β1<β2<<βm\beta_{1}<\beta_{2}<\ldots<\beta_{m}, there exists a “bad” initialization such as the ST dynamics require nΩ(logn)n^{\Omega(\log n)} time to reach εlogn\varepsilon\log n intersection with the planted clique, for some constant ε>0.\varepsilon>0.

  • For arbitrary regular hh, arbitrary mm\in\mathbb{N} and arbitrary sequence of inverse temperatures β1<β2<<βm\beta_{1}<\beta_{2}<\ldots<\beta_{m} with maxi|βi|=O(logn)\max_{i}|\beta_{i}|=O(\log n), there exists a “bad” initialization such as the ST dynamics require nΩ(logn)n^{\Omega(\log n)} time to reach a clique of size (1+ε)logn(1+\varepsilon)\log n-size for some constant ε>0.\varepsilon>0.

The key idea behind the proof of Theorem 2.4 is that the bottleneck set considered in the proof of Theorem 2.1 is “universal”, in the sense that the same bottleneck set applies to all inverse temperatures β.\beta. For this reason, the same bottleneck argument can be applied to simulated tempering that is allowed to change temperatures during its evolution.

On top of this, we establish a lower bound on the ST dynamics, when started from the empty clique.

Theorem 2.5 (Informal version of Theorem 8.5).

Let k=nαk=\left\lfloor n^{\alpha}\right\rfloor for any α(0,1)\alpha\in(0,1). For arbitrary regular and increasing hq=q,q[n],h_{q}=q,q\in[n], arbitrary m=o(logn)m=o(\log n) and arbitrary sequence of inverse temperatures β1<β2<<βm\beta_{1}<\beta_{2}<\ldots<\beta_{m} with maxi|βi|=O(1)\max_{i}|\beta_{i}|=O(1), the ST dynamics started from the pair of the empty clique and the temperature β1\beta_{1} require nω(1)n^{\omega(1)} time to reach a clique which for some constant γ,ε>0\gamma,\varepsilon>0 either

  • has at least γlogn\gamma\log n intersection with the planted clique or,

  • has size at least (1+ε)logn.(1+\varepsilon)\log n.

2.1 Further Comparison with Related Work

Comparison with [AFdF21]

As mentioned in the Introduction, the authors of [AFdF21] predicted the failure of the Metropolis process for the Gibbs measure (1) to recover the planted clique. Specifically, based on simulations they predicted the failure for all k=nα,k=\left\lfloor n^{\alpha}\right\rfloor, for α<α0.91,\alpha<\alpha^{*}\approx 0.91, though they comment that the exact predicted threshold α0.91\alpha^{*}\approx 0.91 could be an artifact of finite sized effects. In this work, using Theorem 2.1 we establish that (worst-case initialized) the Metropolis process fails for all α(0,1),\alpha\in(0,1), confirming their prediction but for α=1.\alpha^{*}=1. In the same work, the authors suggest studying instead the Metropolis process for another Gibbs measure of the form (2), which they call BayesMC. Their suggested Gibbs measure for a specific value of β\beta matches a (slightly perturbed) version of the posterior of the planted clique recovery problem. The authors predict based on simulations that by appropriately turning (and “mismatching”) β>0\beta>0, the Metropolis chain now recovers the planted clique for all k=nα,α>1/2k=\left\lfloor n^{\alpha}\right\rfloor,\alpha>1/2. In this work, we partially refute this prediction using Theorem 2.1, as (worst-case initialized) the Metropolis process for any Gibbs measure (2), including the mismatched posterior that [AFdF21] considers, fails to recover the planted clique for all α(0,1).\alpha\in(0,1).

MCMC underperformance in statistical inference

Our lower bounds show the suboptimality of certain natural MCMC methods in inferring the planted clique in a regime where simple algorithms work. Interestingly, this agrees with a line of work establishing the suboptimality of MCMC methods in inferring hidden structures in noisy environments. Such a phenomenon have been generally well-understood in the context of tensor principal component analysis [RM14, BAGJ20], where Langevin dynamics and gradient descent on the empirical landscape are known to fail to infer the hidden tensor, when simple spectral methods succeed. Moreover, this suboptimality has been recently observed through statistical physics methods for other models including sparse PCA [AFUZ19], the spiked matrix-tensor model [MKUZ19] and phase retrieval [MBC+20]. Finally, it has been also observed for a different family of MCMC methods but again for the planted clique model in [GZ19].

3 Proof Techniques and Intuitions

In this section, we offer intuition regarding the proofs of our results. In the first two subsections, we make a proof overview subsection we provide intuition behind our worst-case initialization results for the Metropolis process Theorem 2.1 and ST dynamics Theorem 2.4. In the following one we discuss about our results on the Metropolis process and ST dynamics with the empty clique initialization Theorems 6.2 and 7.6.

3.1 Worst-case Initialization for Reaching Large Intersection

We start with discussing the lower bound for obtaining γlogn\gamma\log n intersection with the planted clique. We employ a relatively simple bottleneck argument, based on Lemma 5.5.

For q,rq,r\in\mathbb{N} let us denote by Wq,rW_{q,r} the number of cliques in G𝒢(n,1/2,k)G\sim\mathcal{G}(n,1/2,k) which have size qq and intersect the planted clique exactly on rr vertices. Our first starting point towards building the bottleneck is the following simple observation. For any q<2logn,q<2\log n, and r=γlognr=\gamma\log n for a constant 0<γ<2(1α)0<\gamma<2(1-\alpha) we have w.h.p. as n+.n\rightarrow+\infty.

Wq,r/Wq,0nΩ(logn).\displaystyle W_{q,r}/W_{q,0}\leq n^{-\Omega(\log n)}. (4)

In words, the number of qq-cliques of intersection rr with the planted clique are a quasi-polynomial factor less than the number of qq-cliques which are disjoint with the planted clique. Indeed, at the first-moment level it can be checked to hold 𝔼[Wq,r]/𝔼[Wq,0]=exp((((1α)γ+γ22)ln2(logn)2)\mathbb{E}[W_{q,r}]/\mathbb{E}[W_{q,0}]=\exp((-((1-\alpha)\gamma+\frac{\gamma^{2}}{2})\ln 2(\log n)^{2}) and a standard second moment argument gives the result (both results are direct outcomes of Lemma 5.1). Notice that this property holds for all α(0,1).\alpha\in(0,1).

Now let us assume for start that β=0\beta=0. In that case the Gibbs measure π0\pi_{0} is simply the uniform measure over cliques of GG. For r=γlognr=\gamma\log n let 𝒜r\mathcal{A}_{r} be the subset of the cliques with intersection with the planted clique at most rr. Using standard results (see Lemma 5.5) to prove our hitting time lower bound it suffice to show for any r=γlognr=\gamma\log n with constant γ>0,\gamma>0,

π0(𝒜r)π0(𝒜r)=q=1nWq,rq[n],0srWq,snΩ(logn),\displaystyle\frac{\pi_{0}(\partial\mathcal{A}_{r})}{\pi_{0}(\mathcal{A}_{r})}=\frac{\sum_{q=1}^{n}W_{q,r}}{\sum_{q\in[n],0\leq s\leq r}W_{q,s}}\leq n^{-\Omega(\log n)}, (5)

w.h.p. as n+n\rightarrow+\infty. Indeed, given such result based on Lemma 5.5 there is an initialization of the Metropolis process in 𝒜r\mathcal{A}_{r} from which it will take quasi-polynomial time to reach the boundary of 𝒜r\mathcal{A}_{r}, which are exactly the cliques of intersection rr with the planted clique.

Now another first moment argument, (an outcome of part (3) of Lemma 5.1) allows us to conclude that since γ<2(1α)\gamma<2(1-\alpha) is not too large, the only cliques of intersection r=γlognr=\gamma\log n with the planted clique satisfy q<(2ε)lognq<(2-\varepsilon)\log n for small enough ε>0.\varepsilon>0. In other words Wq,r=0W_{q,r}=0 unless q<(2ε)logn.q<(2-\varepsilon)\log n. But now using also (4) we have

q[n],0srWq,sq[(2ε)logn]Wq,0nΩ(logn)q[(2ε)logn]Wq,r=nΩ(logn)q[n]Wq,r,\displaystyle\sum_{q\in[n],0\leq s\leq r}W_{q,s}\geq\sum_{q\in[(2-\varepsilon)\log n]}W_{q,0}\geq n^{\Omega(\log n)}\sum_{q\in[(2-\varepsilon)\log n]}W_{q,r}=n^{\Omega(\log n)}\sum_{q\in[n]}W_{q,r},

and the result follows.

Now, the interesting thing is that the exact same calculation works for arbitrary β0\beta\geq 0 and arbitrary Hamiltonian vector h.h. Consider as before the same subset of cliques 𝒜r.\mathcal{A}_{r}. It suffices to show

πβ(𝒜r)πβ(𝒜r)=q=1nWq,reβhqq[n],0srWq,seβhqnΩ(logn),\displaystyle\frac{\pi_{\beta}(\partial\mathcal{A}_{r})}{\pi_{\beta}(\mathcal{A}_{r})}=\frac{\sum_{q=1}^{n}W_{q,r}e^{\beta h_{q}}}{\sum_{q\in[n],0\leq s\leq r}W_{q,s}e^{\beta h_{q}}}\leq n^{-\Omega(\log n)}, (6)

But as before

q[n],0srWq,seβhqq[(2ε)logn]Wq,0eβhqnΩ(logn)q[(2ε)logn]Wq,reβhq=nΩ(logn)q[n]Wq,reβhq,\displaystyle\sum_{q\in[n],0\leq s\leq r}W_{q,s}e^{\beta h_{q}}\geq\sum_{q\in[(2-\varepsilon)\log n]}W_{q,0}e^{\beta h_{q}}\geq n^{\Omega(\log n)}\sum_{q\in[(2-\varepsilon)\log n]}W_{q,r}e^{\beta h_{q}}=n^{\Omega(\log n)}\sum_{q\in[n]}W_{q,r}e^{\beta h_{q}},

and the general result follows.

Interestingly, this bottleneck argument transfers easily to a bottleneck for the ST dynamics. The key observation is that to construct the bottleneck set for the Metropolis process, we used the same bottleneck set 𝒜r\mathcal{A}_{r} for all inverse temperatures β\beta and Hamiltonian vectors hh. Now the ST dynamics are a Metropolis process on the enlarged product space of the temperatures times the cliques. In particular, the stationary distribution of the ST dynamics is simply a mixture of the Gibbs distributions πβi,i[m]\pi_{\beta_{i}},i\in[m] say πST=i=1mviπβi\pi_{\mathrm{ST}}=\sum_{i=1}^{m}v_{i}\pi_{\beta_{i}} for some weights vi,i[m]v_{i},i\in[m] (see Section 8.1 for the exact choice of the weights vi,i[m]v_{i},i\in[m], though they are not essential for the argument). Now using (6) and that the boundary operator satisfies ([m]×𝒜r)=[m]×(𝒜r)\partial([m]\times\mathcal{A}_{r})=[m]\times\partial(\mathcal{A}_{r}) we immediately have

πST(([m]×𝒜r))πST([m]×𝒜r)=i=1mviπβi(𝒜r)i=1mviπβi(𝒜r)nΩ(logn).\displaystyle\frac{\pi_{\mathrm{ST}}(\partial([m]\times\mathcal{A}_{r}))}{\pi_{\mathrm{ST}}([m]\times\mathcal{A}_{r})}=\frac{\sum_{i=1}^{m}v_{i}\pi_{\beta_{i}}(\partial\mathcal{A}_{r})}{\sum_{i=1}^{m}v_{i}\pi_{\beta_{i}}(\mathcal{A}_{r})}\leq n^{-\Omega(\log n)}. (7)

The result then follows again using Lemma 5.5.

3.2 Worst-case Initialization for Reaching Large Cliques

The worst-case initialization lower bound for reaching (1+ε)logn(1+\varepsilon)\log n cliques is also based on a bottleneck argument. This time the argument is more involved and is a combination of the bottleneck argument by Jerrum in [Jer92] which worked for α<1/2\alpha<1/2 and the bottleneck argument used in the previous subsection for reaching large intersection with the planted clique which worked for all α<1.\alpha<1.

We first need some notation. For rqr\leq q we denote by Ωq,r\Omega_{q,r} the set of qq-cliques of intersection rr with the planted clique and Ωq,\Omega_{q,*} the set of all cliques of size qq. We also define Ωq,<r\Omega_{q,<r} the set of qq-cliques of intersection less than rr with the planted clique and analogously by Ω<q,r\Omega_{<q,r} the set of cliques of size less than qq and intersection rr with the planted clique

We first quickly remind the reader the argument of Jerrum. Recall the notion of a qq-gateway clique from [Jer92], which informally is the last clique of its size in some path starting from the empty clique and ending to a clique of size qq (see Section 7 for the exact definition). For qq we call Ψq\Psi_{q} the set of of qq-gateway cliques. Importantly, for any p<qp<q any path from the empty clique to a qq-clique crosses from some qq-gateway of size p.p. Jerrum’s bottleneck argument in [Jer92] is then based on the observation that assuming α<1/2\alpha<1/2 for ε>0\varepsilon>0 a small enough constant, if p=(1+2ε/3)lognp=\left\lfloor(1+2\varepsilon/3)\log n\right\rfloor and q=(1+ε)lognq=\left\lfloor(1+\varepsilon)\log n\right\rfloor then

|ΨqΩp,|/|Ωp,|nΩ(logn).\displaystyle|\Psi_{q}\cap\Omega_{p,*}|/|\Omega_{p,*}|\leq n^{-\Omega(\log n)}. (8)

Unfortunately, such a bottleneck argument is hopeless if α>1/2\alpha>1/2 since in that case most cliques of size at most qq are fully included in the planted clique and the ratio trivializes.

We identify the new bottleneck by leveraging more the “intersection axis” with the planted clique”. Our first observation is that a relation like (8) holds actually for all α<1\alpha<1 if the cliques are restricted to have low-intersection with the planted clique, i.e. for all α(0,1)\alpha\in(0,1) if r=γlognr=\gamma\log n for small enough constant γ>0\gamma>0 and p,qp,q defined as above then it holds

|ΨqΩp,<r|/|Ωp,<r|nΩ(logn).\displaystyle|\Psi_{q}\cap\Omega_{p,<r}|/|\Omega_{p,<r}|\leq n^{-\Omega(\log n)}. (9)

The second observation, is that for the Metropolis process to hit a clique of size q=(1+ε)lognq=(1+\varepsilon)\log n one needs to hit either Ω<q,r,\Omega_{<q,r}, that is a clique of size less than qq and intersection rr with the planted clique, or a clique in ΨqΩp,<r,\Psi_{q}\cap\Omega_{p,<r}, that is a qq-gateway of size pp and intersection less than r.r. Set this target set =(ΨqΩp,<r)Ω<q,r\mathcal{B}=(\Psi_{q}\cap\Omega_{p,<r})\cup\Omega_{<q,r} which we hope to be “small” enough to create a bottleneck.

We now make the following construction which has boundary included in the target set .\mathcal{B}. Consider 𝒜\mathcal{A} the set of all cliques reachable from a path with start from the empty clique and uses only cliques not included in ,\mathcal{B}, except maybe for the destination. It is easy to see AB\partial A\subseteq B and because of the inclusion of qq-gateways in the definition of ,\mathcal{B}, one can also see that no qq-clique is in 𝒜\mathcal{A}. Therefore using Lemma 5.5 it suffices to show that w.h.p.

πβ()/πβ(𝒜)nΩ(logn).\displaystyle\pi_{\beta}(\mathcal{B})/\pi_{\beta}(\mathcal{A})\leq n^{-\Omega(\log n)}. (10)

To show (10) we observe that Ωp,<rΩp,0𝒜\Omega_{p,<r}\cup\Omega_{\leq p,0}\subset\mathcal{A}, that is 𝒜\mathcal{A} contains all cliques of size pp and intersection less than rr with the planted clique, or size less than pp and are disjoint with the planted clique. Indeed, it is straightforward than one can reach these cliques from a path from the empty clique without using cliques from \mathcal{B} besides maybe the destination.

A final calculation then gives

πβ()/πβ(𝒜)|ΨqΩp,<r|/|Ωp,<r|+πβ(Ω<q,r)/πβ(Ωp,0)\displaystyle\pi_{\beta}(\mathcal{B})/\pi_{\beta}(\mathcal{A})\leq|\Psi_{q}\cap\Omega_{p,<r}|/|\Omega_{p,<r}|+\pi_{\beta}(\Omega_{<q,r})/\pi_{\beta}(\Omega_{\leq p,0}) (11)

The first term is quasipolynomially small according to (9). For the second term, notice that from the equation (5) from the previous subsection for all xqx\leq q it holds

|Ωx,r|=Wx,rnΩ(logn)Wx,0=nΩ(logn)|Ωx,0|.|\Omega_{x,r}|=W_{x,r}\leq n^{-\Omega(\log n)}W_{x,0}=n^{-\Omega(\log n)}|\Omega_{x,0}|.

Now using a first and second moment argument mostly based on Lemma 5.1 we prove that for all p<pqp<p^{\prime}\leq q it holds w.h.p.

πβ(Ωp,0)exp(O((β+logn)(|pp|))πβ(Ωp,0).\pi_{\beta}(\Omega_{p,0})\leq\exp({O((\beta+\log n)(|p-p^{\prime}|))}\pi_{\beta}(\Omega_{p^{\prime},0}).

Combining the above with a small case-analysis this allows us to conclude

πβ(Ω<q,r)/πβ(Ωp,0)exp(O((β+logn)(qp)))nΩ(logn).\displaystyle\pi_{\beta}(\Omega_{<q,r})/\pi_{\beta}(\Omega_{\leq p,0})\leq\exp\left({O((\beta+\log n)(q-p))}\right)n^{-\Omega(\log n)}.

Now since |qp|=O(εlogn)|q-p|=O(\varepsilon\log n) and β=O(logn)\beta=O(\log n) we can always choose ε>0\varepsilon>0 small enough but constant so that βε/logn\beta\varepsilon/\log n is at most a small enough constant and in particular

πβ(Ω<q,r)/πβ(Ωp,0)nΩ(logn).\displaystyle\pi_{\beta}(\Omega_{<q,r})/\pi_{\beta}(\Omega_{\leq p,0})\leq n^{-\Omega(\log n)}.

This completes the proof overview for the failure of the Metropolis process.

For the ST dynamics notice that the only way the value of β\beta is used for the construction of the bottleneck is to identify a value of ε\varepsilon so that the term βε/logn\beta\varepsilon/\log n is small enough, which then allows to choose the values of p,q.p,q. But now if we have a sequence of inverse temperatures β1<β2<<βm\beta_{1}<\beta_{2}<\ldots<\beta_{m} with maxi|βi|=O(logn)\max_{i}|\beta_{i}|=O(\log n) we can choose a “universal” ε\varepsilon so that for all ii, (maxi=1mβi)ε\left(\max_{i=1}^{m}\beta_{i}\right)\varepsilon is small enough, leading to a “universal” bottleneck construction for all πβi.\pi_{\beta_{i}}. The proof then follows exactly the proof for the ST failure described in the previous subsection.

3.3 Failure when Starting from the Empty Clique

Here we explain the failure of the Metropolis process in the high temperature regime β=o(logn)\beta=o(\log n) when starting from the empty clique. We show in Theorems 6.2 and 7.6 that, when starting from the empty clique which is the most natural and common choice of initialization, the Metropolis process fails to reach either cliques of intersection with the empty clique at least γlogn\gamma\log n for any small constant γ>0\gamma>0, or cliques of size (1+ε)logn(1+\varepsilon)\log n for any small constant ε>0\varepsilon>0.

One important observation is that since we are only considering when the process reaches either intersection γlogn\gamma\log n or size (1+ε)logn(1+\varepsilon)\log n, we may assume that the process will stop and stay at the same state once hitting such cliques. In particular, this means we can exclude from the state space all cliques of intersection >γlogn>\gamma\log n and all cliques of size >(1+ε)logn>(1+\varepsilon)\log n, and consider only cliques CC such that

|C𝒫𝒞|γlognand|C|(1+ε)logn.|C\cap\mathcal{PC}|\leq\gamma\log n\quad\text{and}\quad|C|\leq(1+\varepsilon)\log n. (12)

Indeed, the geometry of the restricted state space to these cliques are what really matters for whether the Metropolis process starting from the empty clique can reach our desired destinations or not. In particular, for γ\gamma sufficiently small (say, γ1α\gamma\leq 1-\alpha), most of cliques satisfying Eq. 12 will have intersection o(logn)o(\log n) and also size (1±o(1))logn(1\pm o(1))\log n. Note that this partially explains why having a large planted clique of size nαn^{\alpha} does not help much for the tasks of interest, since for the restricted state space (those satisfying Eq. 12) most cliques do not have vertices from the planted clique and so any constant α<1\alpha<1 does not help.

The key property that allows us to establish our result is a notion we call the “expansion” property. Observe that, for a clique CC of size q=ρlognq=\rho\log n with intersection o(logn)o(\log n), the expected number of vertices which can be added to CC to become a larger clique is n/2q=n1ρ~{}n/2^{q}=n^{1-\rho}; this is also the number of common neighbors of vertices from CC. Via the union bound and a concentration inequality, one can easily show that in fact, for all cliques with ρ<1\rho<1 the number of common neighbors is concentrated around its expectation with constant multiplicative error; see Definitions 6.4 and 6.5 (where we actually only need the lower bound). This immediately implies that

Pr(|Xt|=p1|Xt1|=p)=pneβandPr(|Xt|=p+1|Xt1|=p)12p.\displaystyle\Pr(|X_{t}|=p-1\mid|X_{t-1}|=p)=\frac{p}{n}e^{-\beta}\quad\text{and}\quad\Pr(|X_{t}|=p+1\mid|X_{t-1}|=p)\approx\frac{1}{2^{p}}.

which allows us to track how the size of cliques evolves for the Metropolis process, under the assumption that the intersection is always o(logn)o(\log n).

To actually prove our result, it will be helpful to consider the time-reversed dynamics and argue that when starting from cliques of intersection γlogn\gamma\log n or cliques of size (1+ε)logn(1+\varepsilon)\log n, it is unlikely to reach the empty clique. Suppose we have the identity Hamiltonian function hi=ih_{i}=i for simplicity. Consider as an example the probability of hitting cliques of large intersection as in Theorem 6.2. Recall that Ωp,s\Omega_{p,s} is the collection of cliques of size p=ρlognp=\rho\log n for ρ1+ε\rho\leq 1+\varepsilon and of intersection s=γlogns=\gamma\log n. Then by reversibility we have that for all t1t\geq 1,

Pr(XtΩp,sX0=)=σΩp,sPr(Xt=σX0=)=eβpσΩp,sPr(Xt=X0=σ)\displaystyle\Pr\left(X_{t}\in\Omega_{p,s}\mid X_{0}=\emptyset\right)=\sum_{\sigma\in\Omega_{p,s}}\Pr\left(X_{t}=\sigma\mid X_{0}=\emptyset\right)=e^{\beta p}\sum_{\sigma\in\Omega_{p,s}}\Pr\left(X_{t}=\emptyset\mid X_{0}=\sigma\right)

Notice that eβp=no(logn)e^{\beta p}=n^{o(\log n)} as β=o(logn)\beta=o(\log n). Our intuition is that in fact, for any clique σ\sigma of size pp, the probability of reaching the empty clique when starting from σ\sigma is at most 1/𝔼[Wp,]~{}1/\mathbb{E}[W_{p^{\prime},*}] where p=min{p,logn}p^{\prime}=\min\{p,\log n\}; that is, for all integer t1t\geq 1 and all clique σ\sigma of size pp,

Pr(Xt=X0=σ)1𝔼[Wp,],\Pr\left(X_{t}=\emptyset\mid X_{0}=\sigma\right)\leq\frac{1}{\mathbb{E}[W_{p^{\prime},*}]}, (13)

and hence we obtain

Pr(XtΩp,sX0=)eβp𝔼[Wp,s]𝔼[Wp,]nclogn,\Pr\left(X_{t}\in\Omega_{p,s}\mid X_{0}=\emptyset\right)\lesssim e^{\beta p}\frac{\mathbb{E}[W_{p,s}]}{\mathbb{E}[W_{p^{\prime},*}]}\leq n^{-c^{\prime}\log n},

where the last inequality is because most cliques has intersection o(logn)o(\log n) and size (1+o(1))logn(1+o(1))\log n.

The proof of Eq. 13 utilizes the expansion property mentioned above to analyze the time-reversed dynamics for |Xt||X_{t}|. More specifically, we introduce an auxiliary birth and death process {Yt}\{Y_{t}\} on [n][n] which is stochastically dominated by {|Xt|}\{|X_{t}|\}, in the sense that

Pr(Yt=p1Yt1=p)\displaystyle\Pr(Y_{t}=p-1\mid Y_{t-1}=p) =pneβ=Pr(|Xt|=p1|Xt1|=p);\displaystyle=\frac{p}{n}e^{-\beta}=\Pr(|X_{t}|=p-1\mid|X_{t-1}|=p);
Pr(Yt=p+1Yt1=p)\displaystyle\Pr(Y_{t}=p+1\mid Y_{t-1}=p) =1202p<12pPr(|Xt|=p+1|Xt1|=p).\displaystyle=\frac{1}{20\cdot 2^{p}}<\frac{1}{2^{p}}\approx\Pr(|X_{t}|=p+1\mid|X_{t-1}|=p).

Through step-by-step coupling it is easy to see that Yt|Xt|Y_{t}\leq|X_{t}| always, and thus,

Pr(Xt=X0=σ)Pr(Yt=0Y0=p).\Pr\left(X_{t}=\emptyset\mid X_{0}=\sigma\right)\leq\Pr\left(Y_{t}=0\mid Y_{0}=p\right).

This allows us to establish Eq. 13 by studying the much simpler process {Yt}\{Y_{t}\}. Furthermore, we assume the YtY_{t} process does not go beyond value logn\log n (in fact, (1η)logn(1-\eta)\log n for any fixed constant η(0,1)\eta\in(0,1)) so that we are in the regime where the expansion property holds; this is the reason p=min{p,logn}p^{\prime}=\min\{p,\log n\} appears.

The same approach works perfectly for bounding the probability of hitting cliques of size (1+ε)logn(1+\varepsilon)\log n when starting from the empty clique, as in Theorem 7.6. For the low-temperature metropolis process with β=ω(logn)\beta=\omega(\log n) in Theorem 7.7, we also need one more observation that the process in fact does not remove any vertex in polynomially many steps and so it is equivalent to a greedy algorithm. In particular, we can apply the same approach to argue that the process never reaches cliques of size logn\log n that are subsets of cliques with large intersection or large size, which have much smaller measure. For the ST dynamics in Theorem 8.5, the same approach also works but we need a more sophisticated auxiliary process for the pair of clique size and inverse temperature, along with more complicated coupling arguments.

4 Organization of Main Body

The rest of the paper is organized as follows. In Section 5 we introduce the needed definitions and notation for the formal statements and proofs. In Section 6 we present our lower bounds for reaching a clique of intersection γlogn\gamma\log n with the planted clique. First we present the worst-case initialization result and then the case of empty clique initialization. Then in Section 7 we discuss our lower bounds for reaching a clique of size (1+ε)logn.(1+\varepsilon)\log n. As before, we first present the worst-case initialization result, and then discuss the empty clique initialization ones. Finally, in Section 8 we present our lower bounds for the simulated tempering dynamics.

5 Getting Started

For nn\in\mathbb{N}, let [n]={0,1,,n}[n]=\{0,1,\dots,n\} to be the set of all non-negative integers which are at most nn. Throughout the paper, we use log\log to represent logarithm to the base 22, i.e., logx=log2x\log x=\log_{2}x for x+x\in\mathbb{R}^{+}.

We say an event 𝒜\mathcal{A} holds with high probability (w.h.p.) if 1Pr(𝒜)o(1)1-\Pr(\mathcal{A})\leq o(1).

5.1 Random Graphs with a Planted Clique

Let 𝒢(n,12)\mathcal{G}(n,\frac{1}{2}) denote the random graph on nn vertices where every pair {u,v}\{u,v\} is an edge with probability 1/21/2 independently. For k[n]k\in[n], we denote by 𝒢(n,12,k)\mathcal{G}(n,\frac{1}{2},k) the random graph 𝒢(n,12)\mathcal{G}(n,\frac{1}{2}) with a planted kk-clique, where a subset of kk out of nn vertices is chosen uniformly at random and the random graph is obtained by taking the union of 𝒢(n,12)\mathcal{G}(n,\frac{1}{2}) and 𝒫𝒞\mathcal{PC} the kk-clique formed by those vertices.

Let Ω=Ω(G)\Omega=\Omega(G) be the collection of all cliques of an instance of graph G𝒢(n,12,k)G\sim\mathcal{G}(n,\frac{1}{2},k), and for q,r[n]q,r\in[n] with qrq\geq r let

Ωq,r={CΩ:|C|=q,|C𝒫𝒞|=r}.\Omega_{q,r}=\{C\in\Omega:|C|=q,\,|C\cap\mathcal{PC}|=r\}.

We also define for convenience Ωq,r=\Omega_{q,r}=\emptyset when q<rq<r or q>nq>n. Furthermore, let

Ωq,=r=0qΩq,randΩ,r=q=rnΩq,r.\Omega_{q,*}=\bigcup_{r=0}^{q}\Omega_{q,r}\quad\text{and}\quad\Omega_{*,r}=\bigcup_{q=r}^{n}\Omega_{q,r}.

We also define Ωq,=q=0qΩq,\Omega_{\leq q,*}=\bigcup_{q^{\prime}=0}^{q}\Omega_{q^{\prime},*} and Ω,r=r=0rΩ,r\Omega_{*,\leq r}=\bigcup_{r^{\prime}=0}^{r}\Omega_{*,r^{\prime}}.

For q,r[n]q,r\in[n] with qrq\geq r, let Wq,r=|Ωq,r|W_{q,r}=|\Omega_{q,r}| the number of qq-cliques CC in GG with |C𝒫𝒞|=r|C\cap\mathcal{PC}|=r. Similarly, Wq,r=0W_{q,r}=0 when q<rq<r or q>nq>n.

Lemma 5.1.

Let a constant α[0,1)\alpha\in[0,1) and consider the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique. Fix any absolute constant ε>0\varepsilon>0.

  1. (1)

    For any q=ρlog2nq=\left\lfloor\rho\log_{2}n\right\rfloor with parameter ρ>0\rho>0 and any r=γlog2nr=\left\lfloor\gamma\log_{2}n\right\rfloor with parameter 0γρ0\leq\gamma\leq\rho, it holds

    𝔼[Wq,r]=exp[(ln2)(logn)2(ρρ22(1α)γ+γ22+o(1))]\displaystyle\mathbb{E}[W_{q,r}]=\exp\left[(\ln 2)(\log n)^{2}\left(\rho-\frac{\rho^{2}}{2}-(1-\alpha)\gamma+\frac{\gamma^{2}}{2}+o(1)\right)\right]

    w.h.p. as n+n\rightarrow+\infty.

  2. (2)

    For r=0r=0 and any q=ρlog2nq=\left\lfloor\rho\log_{2}n\right\rfloor with 0<ρ2ε0<\rho\leq 2-\varepsilon, it holds

    Wq,012𝔼[Wq,0]\displaystyle W_{q,0}\geq\frac{1}{2}\,\mathbb{E}[W_{q,0}]

    w.h.p. as n+n\rightarrow+\infty.

  3. (3)

    For any r=γlognr=\left\lfloor\gamma\log n\right\rfloor with 0γ1α0\leq\gamma\leq 1-\alpha and any q=ρlog2nq=\left\lfloor\rho\log_{2}n\right\rfloor satisfying the inequality ρ1+(1γ)2+2αγ+ε\rho\geq 1+\sqrt{(1-\gamma)^{2}+2\alpha\gamma}+\varepsilon, it holds

    Wq,r=0\displaystyle W_{q,r}=0

    w.h.p. as n+n\rightarrow+\infty.

The proof of the lemma is deferred to the Appendix.

Definition 5.2 (Clique-Counts Properties 𝒫𝗎𝗉𝗉\mathcal{P}_{\mathsf{upp}} and 𝒫𝗅𝗈𝗐\mathcal{P}_{\mathsf{low}}).

Let ε(0,1)\varepsilon\in(0,1) be an arbitrary constant.

  1. (1)

    (Upper Bounds) We say the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the property 𝒫𝗎𝗉𝗉(ε)\mathcal{P}_{\mathsf{upp}}(\varepsilon) if the following is true: For all integers q,rq,r\in\mathbb{N} with 0rqn0\leq r\leq q\leq n, it holds

    Wq,rn3𝔼[Wq,r];W_{q,r}\leq n^{3}\,\mathbb{E}[W_{q,r}];

    in particular, for 0r(1α)logn0\leq r\leq(1-\alpha)\log n and q(1+(1γ)2+2αγ+ε)lognq\geq(1+\sqrt{(1-\gamma)^{2}+2\alpha\gamma}+\varepsilon)\log n where γ=r/logn\gamma=r/\log n, it holds

    Wq,r=0.W_{q,r}=0.
  2. (2)

    (Lower Bounds) We say the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the property 𝒫𝗅𝗈𝗐(ε)\mathcal{P}_{\mathsf{low}}(\varepsilon) if the following is true: For every integer qq\in\mathbb{N} with 0q(2ε)logn0\leq q\leq(2-\varepsilon)\log n, it holds

    Wq,012𝔼[Wq,0].W_{q,0}\geq\frac{1}{2}\,\mathbb{E}[W_{q,0}].
Lemma 5.3.

For any constant α(0,1)\alpha\in(0,1) and any constant ε(0,1)\varepsilon\in(0,1), the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies both 𝒫𝗎𝗉𝗉(ε)\mathcal{P}_{\mathsf{upp}}(\varepsilon) and 𝒫𝗅𝗈𝗐(ε)\mathcal{P}_{\mathsf{low}}(\varepsilon) with probability 1o(1)1-o(1) as nn\to\infty.

Proof.

Follows immediately from Lemma 5.1, the Markov’s inequality, and the union bound. ∎

5.2 Hamiltonian and Gibbs Measure

For given nn\in\mathbb{N}, let h:[n]h:[n]\to\mathbb{R} be an arbitrary function. For ease of notations we write hq=h(q),q[n]h_{q}=h(q),q\in[n] and thus the function hh is identified by the vector (h0,h1,,hn)(h_{0},h_{1},\dots,h_{n}). Given an nn-vertex graph GG, consider the Hamiltonian function H:ΩH:\Omega\to\mathbb{R} where H(C)=h|C|H(C)=h_{|C|}. For β\beta\in\mathbb{R}, the corresponding Gibbs measure is defined as

πβ(C)wβ(C):=exp(βh|C|).\pi_{\beta}(C)\propto w_{\beta}(C):=\exp\left(\beta h_{|C|}\right). (14)

Let Z(β)=Z(G,h;β)Z(\beta)=Z(G,h;\beta) to be the partition function given by

Z(β)=CΩwβ(C).Z(\beta)=\sum_{C\in\Omega}w_{\beta}(C).

Furthermore, let

Zq,(β)=CΩq,wβ(C)andZ,r(β)=CΩ,rwβ(C).Z_{q,*}(\beta)=\sum_{C\in\Omega_{q,*}}w_{\beta}(C)\quad\text{and}\quad Z_{*,r}(\beta)=\sum_{C\in\Omega_{*,r}}w_{\beta}(C).
Assumption 5.4.

We assume that the Hamiltonian hh satisfies

  1. (a)

    h0=0h_{0}=0;

  2. (b)

    hh is 11-Lipschitz, i.e., |h(q)h(q)||qq||h(q)-h(q^{\prime})|\leq|q-q^{\prime}| for q,q2.1lognq,q^{\prime}\leq 2.1\log n.

5.3 Metropolis Process and the Hitting Time Lower Bound

In this work, we study the dynamics of the Metropolis process with the respect to the Gibbs measure defined in (14). The Metropolis process is a Markov chain on Ω=Ω(G)\Omega=\Omega(G), the space of all cliques of GG. The Metropolis process is described in Algorithm 2.

Input: a graph GG, a starting clique X0ΩX_{0}\in\Omega, stopping time TT
for t=1,,Tt=1,\dots,T do
       Pick vVv\in V uniformly at random;
       CXt1{v}C\leftarrow X_{t-1}\oplus\{v\};
       if CΩC\in\Omega then
             Xt{C,with probability min{1,πβ(C)/πβ(Xt)};Xt1,with remaining probability;X_{t}\leftarrow\begin{cases}C,&\text{with probability~{}}\min\{1,\pi_{\beta}(C)/\pi_{\beta}(X_{t})\};\\ X_{t-1},&\text{with remaining probability};\end{cases}
      else
            XtXt1X_{t}\leftarrow X_{t-1};
            
       end if
      
end for
Output: XTX_{T}
Algorithm 2 Metropolis Process

The Metropolis process is an ergodic and reversible Markov chain, with the unique stationary distribution πβ\pi_{\beta} except in the degenerate case when β=0\beta=0 and GG is the complete graph; see [Jer92].

The following lemma is a well-known fact for lower bounding the hitting time of some target set of a Markov chain using conductance; see [MWW09, Claim 2.1], [LP17, Theorem 7.4] and [AWZ20, Proposition 2.2]. We rely crucially on the following lemma for our worst-case initialization results.

Lemma 5.5.

Let PP be the transition matrix of an ergodic Markov chain on a finite state space Ω\Omega with stationary distribution π\pi. Let AΩA\subseteq\Omega be a set of states and let BΩB\subseteq\Omega be a set of boundary states for AA such that P(x,y)=0P(x,y)=0 for all xABx\in A\setminus B and yA𝖼By\in A^{\mathsf{c}}\setminus B. Then for any t>0t>0 there exists an initial state xAx\in A such that

Pr(it such that XiA𝖼X0=x)π(B)π(A)t.\Pr\left(\exists i\leq t\text{~{}such that~{}}X_{i}\in A^{\mathsf{c}}\mid X_{0}=x\right)\leq\frac{\pi(B)}{\pi(A)}\,t.

6 Quasi-polynomial Hitting Time of Large intersection

6.1 Existence of a Bad Initial Clique

Theorem 6.1.

Let α(0,1)\alpha\in(0,1) be any fixed constant. For any constant γ>0\gamma>0, the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the following with probability 1o(1)1-o(1) as nn\to\infty.

Consider the general Gibbs measure given by Eq. 14 for arbitrary hh and arbitrary inverse temperature β\beta. There exists a constant c=c(α,γ)>0c=c(\alpha,\gamma)>0 and an initialization state for the Metropolis process from which it requires at least nclognn^{c\log n} steps to reach a clique of intersection with the planted clique at least γlogn\gamma\log n, with probability at least 1nclogn1-n^{-c\log n}. In particular, under the worst-case initialization it fails to recover the planted clique in polynomial-time.

Proof.

Notice that we can assume without loss of generality that the constant γ\gamma satisfies 0<γ1α0<\gamma\leq 1-\alpha. We pick

ε=12(1(1γ)2+2αγ).\varepsilon=\frac{1}{2}\left(1-\sqrt{(1-\gamma)^{2}+2\alpha\gamma}\right). (15)

Note that ε>0\varepsilon>0 since 0<γ1α0<\gamma\leq 1-\alpha. Then, by Lemma 5.3 we know that the random graph 𝒢(n,12,nα)\mathcal{G}(n,\frac{1}{2},\left\lfloor n^{\alpha}\right\rfloor) satisfies both properties 𝒫𝗎𝗉𝗉(ε)\mathcal{P}_{\mathsf{upp}}(\varepsilon) and 𝒫𝗅𝗈𝗐(ε)\mathcal{P}_{\mathsf{low}}(\varepsilon) simultaneously with probability 1o(1)1-o(1) as nn\to\infty. In the rest of the proof we assume that both 𝒫𝗎𝗉𝗉(ε)\mathcal{P}_{\mathsf{upp}}(\varepsilon) and 𝒫𝗅𝗈𝗐(ε)\mathcal{P}_{\mathsf{low}}(\varepsilon) hold.

For any γ(0,1α]\gamma\in(0,1-\alpha], let r=γlognr=\left\lfloor\gamma\log n\right\rfloor. It suffices to show that there exists a constant c=c(α,γ)>0c=c(\alpha,\gamma)>0 such that

πβ(Ω,r)πβ(Ω,r)=Z,rZ,rexp(clog2n).\displaystyle\frac{\pi_{\beta}(\Omega_{*,r})}{\pi_{\beta}(\Omega_{*,\leq r})}=\frac{Z_{*,r}}{Z_{*,\leq r}}\leq\exp\left(-c\log^{2}n\right). (16)

Indeed, given Eq. 16, Theorem 6.1 is an immediate consequence of Lemma 5.5.

By the property 𝒫𝗎𝗉𝗉(ε)\mathcal{P}_{\mathsf{upp}}(\varepsilon) we have that Wq,r=0W_{q,r}=0 for all q>q¯q>\bar{q} where

q¯=1+(1γ)2+2αγ+εlogn=2εlogn.\bar{q}=\left\lfloor 1+\sqrt{(1-\gamma)^{2}+2\alpha\gamma}+\varepsilon\right\rfloor\log n=\left\lfloor 2-\varepsilon\right\rfloor\log n.

Hence, we get from the definition of the restricted partition function Z,rZ_{*,r} that

Z,r=q=rnZq,r=q=rnWq,rexp(βhq)=q=rq¯Wq,rexp(βhq)n3q=rq¯𝔼[Wq,r]exp(βhq),Z_{*,r}=\sum_{q=r}^{n}Z_{q,r}=\sum_{q=r}^{n}W_{q,r}\exp\left(\beta h_{q}\right)=\sum_{q=r}^{\bar{q}}W_{q,r}\exp\left(\beta h_{q}\right)\leq n^{3}\sum_{q=r}^{\bar{q}}\mathbb{E}\left[W_{q,r}\right]\exp\left(\beta h_{q}\right),

where the last inequality again follows from 𝒫𝗎𝗉𝗉(ε)\mathcal{P}_{\mathsf{upp}}(\varepsilon). We define

q=argmaxq[q¯]𝔼[Wq,r]exp(βhq).q^{*}=\operatorname*{arg\,max}_{q\in[\bar{q}]}\,\mathbb{E}\left[W_{q,r}\right]\exp\left(\beta h_{q}\right).

Thus, we have that

Z,rn4𝔼[Wq,r]exp(βhq).Z_{*,r}\leq n^{4}\,\mathbb{E}\left[W_{q^{*},r}\right]\exp\left(\beta h_{q^{*}}\right). (17)

Meanwhile, we have

Z,rZq,0=Wq,0exp(βhq).Z_{*,\leq r}\geq Z_{q^{*},0}=W_{q^{*},0}\exp\left(\beta h_{q^{*}}\right).

We deduce from qq¯(2ε)lognq^{*}\leq\bar{q}\leq(2-\varepsilon)\log n and the property 𝒫𝗅𝗈𝗐(ε)\mathcal{P}_{\mathsf{low}}(\varepsilon) that

Z,r12𝔼[Wq,0]exp(βhq).Z_{*,\leq r}\geq\frac{1}{2}\,\mathbb{E}\left[W_{q^{*},0}\right]\exp\left(\beta h_{q^{*}}\right). (18)

Combining Eqs. 17 and 18, we get that,

Z,rZ,r2n4𝔼[Wq,r]𝔼[Wq,0].\frac{Z_{*,r}}{Z_{*,\leq r}}\leq 2n^{4}\,\frac{\mathbb{E}\left[W_{q^{*},r}\right]}{\mathbb{E}\left[W_{q^{*},0}\right]}.

Suppose that ρ=q/logn\rho=q^{*}/\log n. Then, an application of Item 1 of Lemma 5.1 implies that,

Z,rZ,r\displaystyle\frac{Z_{*,r}}{Z_{*,\leq r}} 2n4exp[(ln2)(logn)2(ρρ22(1α)γ+γ22+o(1))]exp[(ln2)(logn)2(ρρ22+o(1))]\displaystyle\leq 2n^{4}\,\frac{\exp\left[(\ln 2)(\log n)^{2}\left(\rho-\frac{\rho^{2}}{2}-(1-\alpha)\gamma+\frac{\gamma^{2}}{2}+o(1)\right)\right]}{\exp\left[(\ln 2)(\log n)^{2}\left(\rho-\frac{\rho^{2}}{2}+o(1)\right)\right]}
exp[(ln2)(logn)2((1α)γ+γ22+o(1))].\displaystyle\leq\exp\left[(\ln 2)(\log n)^{2}\left(-(1-\alpha)\gamma+\frac{\gamma^{2}}{2}+o(1)\right)\right].

This establishes (16) for c=c(α,γ):=(1α)γγ22>0c=c(\alpha,\gamma):=(1-\alpha)\gamma-\frac{\gamma^{2}}{2}>0 since 0<γ1α0<\gamma\leq 1-\alpha, as we wanted. ∎

6.2 Starting from the Empty Clique

In this section, we strengthen Theorem 6.1 for a wide range of temperatures by showing that the Metropolis Dynamics still fails to obtain a significant intersection with the planted clique when starting from the empty clique (or any clique of sufficiently small size), a nature choice of initial configuration in practice.

6.2.1 Our Result

Theorem 6.2.

Let α(0,1)\alpha\in(0,1) be any fixed constant. For any constant γ(0,1α)\gamma\in(0,1-\alpha), the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the following with probability 1o(1)1-o(1) as nn\to\infty.

Consider the general Gibbs measure given by Eq. 14 for arbitrary 11-Lipschitz hh with h0=0h_{0}=0 and inverse temperature β(ln2)γlogn\beta\leq(\ln 2)\gamma\log n. Let {Xt}\{X_{t}\} denote the Metropolis process on GG with stationary distribution πβ\pi_{\beta}. Then there exist constants ξ=ξ(α,γ)>0\xi=\xi(\alpha,\gamma)>0 and c=c(α,γ)>0c=c(\alpha,\gamma)>0 such that for any clique CΩC\in\Omega of size at most ξlogn\xi\log n, one has

Pr(ttnclogn s.t. |Xt𝒫𝒞|γlogn|X0=C)nclogn.\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq n^{c\log n}\text{~{}s.t.~{}}|X_{t}\cap\mathcal{PC}|\geq\gamma\log n\;\Big{|}\;X_{0}=C\Big{)}\leq n^{-c\log n}.

In particular, the Metropolis process starting from the empty clique requires nΩ(logn)n^{\Omega(\log n)} steps to reach cliques of intersection with the planted clique at least γlogn\gamma\log n, with probability 1nΩ(logn)1-n^{-\Omega(\log n)}. As a consequence, it fails to recover the planted clique in polynomial time.

We proceed with the proof of the theorem.

6.2.2 Key Lemmas

We start with tracking how the size of the clique changes during the process. It is also helpful to consider the reversed process: show that it is unlikely to hit the empty clique \emptyset when starting from some clique of size logn\log n.

Definition 6.3.

For a graph G=(V,E)G=(V,E) and a subset UVU\subseteq V of vertices, we say a vertex vVUv\in V\setminus U is fully adjacent to UU if vv is adjacent to all vertices in UU; equivalently, UN(v)U\subseteq N(v) where N(v)N(v) denotes the set of neighboring vertices of vv in the graph GG. Let A(U)A(U) denote the set of all vertices in VUV\setminus U that are fully adjacent to UU; equivalently, A(U)A(U) is the set of all common neighbors in VUV\setminus U of all vertices from UU.

Definition 6.4 (Expansion Property 𝒫𝖾𝗑𝗉\mathcal{P}_{\mathsf{exp}}).

Let η(0,1)\eta\in(0,1) be an arbitrary constant. We say the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the expansion property 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) if the following is true: For every UVU\subseteq V with |U|(1η)logn|U|\leq(1-\eta)\log n, it holds

|A(U)|n202|U|.|A(U)|\geq\frac{n}{20\cdot 2^{|U|}}.

The following lemma establishes the desired “expansion lemma” on the cliques of size less than logn\log n in G(n,12,k).G(n,\frac{1}{2},k).

Lemma 6.5 (“Expansion Lemma”).

For any constant α(0,1)\alpha\in(0,1) and any constant η(0,1)\eta\in(0,1), the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the expansion property 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) with probability 1o(1)1-o(1) as nn\to\infty.

The proof of the lemma is deferred to the Appendix.

The Lemma 6.5 is quite useful for us, as it allows us to obtain the following bounds on the size transitions for the Metropolis process:

Pr(|Xt|=q1|Xt1|=q)\displaystyle\Pr\left(|X_{t}|=q-1\mid|X_{t-1}|=q\right) =qnmin{1,exp[β(hq1hq)]}\displaystyle=\frac{q}{n}\min\left\{1,\exp\left[\beta\left(h_{q-1}-h_{q}\right)\right]\right\} (19)
Pr(|Xt|=q+1|Xt1|=q)\displaystyle\Pr\left(|X_{t}|=q+1\mid|X_{t-1}|=q\right) 1202qmin{1,exp[β(hq+1hq)]}.\displaystyle\geq\frac{1}{20\cdot 2^{q}}\min\left\{1,\exp\left[\beta\left(h_{q+1}-h_{q}\right)\right]\right\}. (20)

The proof is also making use of the following delicate lemma.

Lemma 6.6.

Consider the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique conditional on satisfying the property 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) and the expansion property 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) for some fixed constant η(0,1)\eta\in(0,1). Let t,p,q,r+t,p,q,r\in\mathbb{N}_{+} be integers with prqp\leq r\leq q. Denote also ξ=p/logn,\xi=p/\log n, ρ=q/logn\rho=q/\log n and γ=r/logn.\gamma=r/\log n. For any CΩp,C\in\Omega_{p,*} we have

Pr(XtΩq,rX0=C)texp[(ln2)(logn)2(((1+β^)ρρ22)((1+β^)ρ(ρ)22)((1α)γγ22ξ+ξ22)+o(1))],\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right)\leq\\ t\exp\left[(\ln 2)(\log n)^{2}\left(\left((1+\hat{\beta})\rho-\frac{\rho^{2}}{2}\right)-\left((1+\hat{\beta})\rho^{\prime}-\frac{(\rho^{\prime})^{2}}{2}\right)-\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\xi+\frac{\xi^{2}}{2}\right)+o(1)\right)\right],

where ρ=min{ρ,1η}\rho^{\prime}=\min\{\rho,1-\eta\} and β^=β/((ln2)(logn)).\hat{\beta}=\beta/((\ln 2)(\log n)).

We postpone the proof of Lemma 6.6 to Section 6.2.4 and first show how it can be used to prove Theorem 6.2. On a high level, we bound the probability of hitting Ωq,r\Omega_{q,r} by studying how the size of the clique evolves during the process up to size (1η)logn(1-\eta)\log n. The term ((1+β^)ρρ2/2)((1+β^)ρ(ρ)2/2)((1+\hat{\beta})\rho-\rho^{2}/2)-((1+\hat{\beta})\rho^{\prime}-(\rho^{\prime})^{2}/2) represents the approximation error when the clique goes beyond size (1η)logn(1-\eta)\log n; in particular, when the destination clique size q=ρlognq=\rho\log n is at most (1η)logn(1-\eta)\log n, we have ρ=ρ\rho^{\prime}=\rho and this error is zero. Meanwhile, the term (1α)γγ2/2ξ+ξ2/2(1-\alpha)\gamma-\gamma^{2}/2-\xi+\xi^{2}/2 corresponds to the fact that the number of cliques of size qq and intersection r=γlognr=\gamma\log n with the planted clique is much smaller than the total number of cliques of size qq, and hence reaching intersection rr is very unlikely.

6.2.3 Proof of Theorem 6.2, given Lemma 6.6

Proof of Theorem 6.2.

Let β^=β/((ln2)(logn))\hat{\beta}=\beta/((\ln 2)(\log n)) and recall β^γ<1α.\hat{\beta}\leq\gamma<1-\alpha. As will be clear later, we shall choose

ξ=14(1αγ)γandη=min{18(1αγ),γ}.\xi=\frac{1}{4}(1-\alpha-\gamma)\gamma\quad\text{and}\quad\eta=\min\left\{\frac{1}{8}(1-\alpha-\gamma),\gamma\right\}.

By Lemmas 5.3 and 6.5, the random graph 𝒢(n,12,nα)\mathcal{G}(n,\frac{1}{2},\left\lfloor n^{\alpha}\right\rfloor) satisfies both 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) and 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) with probability 1o(1)1-o(1) as nn\to\infty, for the choice of η\eta given above. In the rest of the proof we assume that both 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) and 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) are satisfied.

Suppose that |C|=p=ξlognξlogn|C|=p=\xi^{\prime}\log n\leq\xi\log n. By the union bound we have

Pr(ttT:XtΩ,r|X0=C)\displaystyle\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{*,r}\;\Big{|}\;X_{0}=C\Big{)} t=1Tq=rnPr(XtΩq,rX0=C)\displaystyle\leq\sum_{t=1}^{T}\sum_{q=r}^{n}\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right)
Tnmaxt[T]maxq[n]Pr(XtΩq,rX0=C).\displaystyle\leq Tn\,\max_{t\in[T]}\,\max_{q\in[n]}\,\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right). (21)

Now let us fix some t[T],q[n].t\in[T],q\in[n]. By Lemma 6.6,

Pr(XtΩq,rX0=C)\displaystyle\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right)
\displaystyle\leq{} texp[(ln2)(logn)2(((1+β^)ρρ22)((1+β^)ρ(ρ)22)((1α)γγ22ξ+(ξ)22)+o(1))]\displaystyle t\exp\left[(\ln 2)(\log n)^{2}\left(\left((1+\hat{\beta})\rho-\frac{\rho^{2}}{2}\right)-\left((1+\hat{\beta})\rho^{\prime}-\frac{(\rho^{\prime})^{2}}{2}\right)-\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\xi^{\prime}+\frac{(\xi^{\prime})^{2}}{2}\right)+o(1)\right)\right]
\displaystyle\leq{} texp[(ln2)(logn)2(((1+β^)ρρ22)((1+β^)ρ(ρ)22)((1α)γγ22ξ+ξ22)+o(1))],\displaystyle t\exp\left[(\ln 2)(\log n)^{2}\left(\left((1+\hat{\beta})\rho-\frac{\rho^{2}}{2}\right)-\left((1+\hat{\beta})\rho^{\prime}-\frac{(\rho^{\prime})^{2}}{2}\right)-\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\xi+\frac{\xi^{2}}{2}\right)+o(1)\right)\right],

where we have used that ξξ1.\xi^{\prime}\leq\xi\leq 1. Write for shorthand

A=((1+β^)ρρ22)((1+β^)ρ(ρ)22)andB=(1α)γγ22ξ+ξ22.A=\left((1+\hat{\beta})\rho-\frac{\rho^{2}}{2}\right)-\left((1+\hat{\beta})\rho^{\prime}-\frac{(\rho^{\prime})^{2}}{2}\right)\quad\text{and}\quad B=(1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\xi+\frac{\xi^{2}}{2}.

So we have

Pr(XtΩq,rX0=C)texp[(ln2)(logn)2(AB+o(1))].\displaystyle\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right)\leq t\exp\left[(\ln 2)(\log n)^{2}\left(A-B+o(1)\right)\right]. (22)

Given (22) it suffices to show that for all α(0,1),γ(0,1α)\alpha\in(0,1),\gamma\in(0,1-\alpha) there exists c0(α,γ)>0c_{0}(\alpha,\gamma)>0 such that uniformly for all values of interest of the parameters ρ,β^\rho,\hat{\beta} we have ABc0(α,γ)A-B\leq-c_{0}(\alpha,\gamma). Indeed, then by (21) we have

Pr(ttT:XtΩ,r|X0=C)T2nexp[c0(α,γ)(ln2)(logn)2]\displaystyle\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{*,r}\;\Big{|}\;X_{0}=C\Big{)}\leq T^{2}n\exp\left[-c_{0}(\alpha,\gamma)(\ln 2)(\log n)^{2}\right]

and Theorem 6.2 follows e.g. for c(α,γ)=ln220c0(α,γ).c(\alpha,\gamma)=\frac{\ln 2}{20}c_{0}(\alpha,\gamma).

We now construct the desired function c0(α,γ)>0.c_{0}(\alpha,\gamma)>0. If ρ1η\rho\leq 1-\eta, then ρ=ρ\rho^{\prime}=\rho and A=0A=0. Meanwhile, we have

B=(1α)γγ22ξ+ξ22(1α)γγ2214(1αγ)γ34(1αγ)γ.B=(1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\xi+\frac{\xi^{2}}{2}\geq(1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\frac{1}{4}(1-\alpha-\gamma)\gamma\geq\frac{3}{4}(1-\alpha-\gamma)\gamma.

So AB34(1αγ)γA-B\leq-\frac{3}{4}(1-\alpha-\gamma)\gamma.

If ρ>1η\rho>1-\eta, then ρ=1η\rho^{\prime}=1-\eta and we have

A\displaystyle A =((1+β^)ρρ22)((1+β^)ρ(ρ)22)\displaystyle=\left((1+\hat{\beta})\rho-\frac{\rho^{2}}{2}\right)-\left((1+\hat{\beta})\rho^{\prime}-\frac{(\rho^{\prime})^{2}}{2}\right)
=β^(ρ1)12(ρ1)2+β^η+η22\displaystyle=\hat{\beta}(\rho-1)-\frac{1}{2}(\rho-1)^{2}+\hat{\beta}\eta+\frac{\eta^{2}}{2}
γ22+14(1αγ)γ,\displaystyle\leq\frac{\gamma^{2}}{2}+\frac{1}{4}(1-\alpha-\gamma)\gamma,

where the last inequality follows from

β^(ρ1)12(ρ1)2β^22γ22\hat{\beta}(\rho-1)-\frac{1}{2}(\rho-1)^{2}\leq\frac{\hat{\beta}^{2}}{2}\leq\frac{\gamma^{2}}{2}

and also

β^η+η22γ18(1αγ)+γ218(1αγ)14(1αγ)γ\hat{\beta}\eta+\frac{\eta^{2}}{2}\leq\gamma\cdot\frac{1}{8}(1-\alpha-\gamma)+\frac{\gamma}{2}\cdot\frac{1}{8}(1-\alpha-\gamma)\leq\frac{1}{4}(1-\alpha-\gamma)\gamma

since η(1αγ)/8\eta\leq(1-\alpha-\gamma)/8 and ηγ\eta\leq\gamma. Meanwhile, we have

B=(1α)γγ22ξ+ξ22(1α)γγ2214(1αγ)γ.B=(1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\xi+\frac{\xi^{2}}{2}\geq(1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\frac{1}{4}(1-\alpha-\gamma)\gamma.

So, we deduce that

ABγ22+14(1αγ)γ(1α)γ+γ22+14(1αγ)γ=12(1αγ)γ.A-B\leq\frac{\gamma^{2}}{2}+\frac{1}{4}(1-\alpha-\gamma)\gamma-(1-\alpha)\gamma+\frac{\gamma^{2}}{2}+\frac{1}{4}(1-\alpha-\gamma)\gamma=-\frac{1}{2}(1-\alpha-\gamma)\gamma.

Hence, in all cases ABc0(α,γ)A-B\leq-c_{0}(\alpha,\gamma) for

c0(α,γ)=12(1αγ)γ.c_{0}(\alpha,\gamma)=\frac{1}{2}(1-\alpha-\gamma)\gamma.

This completes the proof of the theorem. ∎

6.2.4 Proof of Lemma 6.6

In this subsection we prove the crucial Lemma 6.6. Throughout this subsection we assume that the property 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) and the expansion property 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) hold for some fixed constant η(0,1)\eta\in(0,1). First, recall that Wq,r=|Ωq,r|W_{q,r}=|\Omega_{q,r}|. We have from 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) that

Pr(XtΩq,rX0=C)\displaystyle\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right) =σΩq,rPr(Xt=σX0=C)\displaystyle=\sum_{\sigma\in\Omega_{q,r}}\Pr\left(X_{t}=\sigma\mid X_{0}=C\right)
Wq,rmaxσΩq,rPr(Xt=σX0=C)\displaystyle\leq W_{q,r}\max_{\sigma\in\Omega_{q,r}}\Pr\left(X_{t}=\sigma\mid X_{0}=C\right)
n3𝔼[Wq,r]maxσΩq,rPr(Xt=σX0=C).\displaystyle\leq n^{3}\,\mathbb{E}[W_{q,r}]\max_{\sigma\in\Omega_{q,r}}\Pr\left(X_{t}=\sigma\mid X_{0}=C\right). (23)

For now, we fix a σΩq,r\sigma\in\Omega_{q,r} and focus on bounding Pr(Xt=σX0=C).\Pr\left(X_{t}=\sigma\mid X_{0}=C\right). The key idea to bound this probability is to exploit the reversibility of the Metropolis process. The following two standard facts are going to be useful.

Fact 6.7 ([LP17]).

If PP is the transition matrix of a reversible Markov chain over a finite state space Γ\Gamma with stationary distribution μ\mu, then for all x,yΓx,y\in\Gamma and all integer t1t\geq 1 it holds

μ(x)Pt(x,y)=μ(y)Pt(y,x).\mu(x)P^{t}(x,y)=\mu(y)P^{t}(y,x).
Fact 6.8 ([LP17]).

For a birth-death process on [n][n] with transition probabilities

P(i,i+1)=pi,P(i,i1)=qi,andP(i,i)=1piqi,P(i,i+1)=p_{i},\quad P(i,i-1)=q_{i},\quad\text{and}\quad P(i,i)=1-p_{i}-q_{i},

the stationary distribution is given by

μ(i)s=1ips1qs,i[n].\mu(i)\propto\prod_{s=1}^{i}\frac{p_{s-1}}{q_{s}},\quad\forall i\in[n].

Now, notice that using the time-reversed dynamics it suffices try to bound the probability of reaching a small clique CC when starting from a large clique σ\sigma. Indeed, by reversibility we have

Pr(Xt=σX0=C)=exp(β(hqhp))Pr(Xt=CX0=σ).\Pr\left(X_{t}=\sigma\mid X_{0}=C\right)=\exp\left(\beta\left(h_{q}-h_{p}\right)\right)\Pr\left(X_{t}=C\mid X_{0}=\sigma\right). (24)

which is an application of Fact 6.7.

We introduce a birth-death process on [n][n] denoted by {Yt}\{Y_{t}\} with transition matrix PP given by the following transition probabilities:

P(s,s1)\displaystyle P(s,s-1) =snmin{exp(β(hs1hs)),1},1sn;\displaystyle=\frac{s}{n}\min\left\{\exp\left(\beta\left(h_{s-1}-h_{s}\right)\right),1\right\},\quad 1\leq s\leq n;
P(s,s+1)\displaystyle P(s,s+1) ={1202smin{exp(β(hs+1hs)),1},0s<(1η)logn;0,(1η)lognsn1;\displaystyle=\begin{cases}\frac{1}{20\cdot 2^{s}}\min\left\{\exp\left(\beta\left(h_{s+1}-h_{s}\right)\right),1\right\},&\quad 0\leq s<\left\lfloor(1-\eta)\log n\right\rfloor;\\ 0,&\quad\left\lfloor(1-\eta)\log n\right\rfloor\leq s\leq n-1;\end{cases}
P(s,s)\displaystyle P(s,s) =1P(s,s1)P(s,s+1),0sn.\displaystyle=1-P(s,s-1)-P(s,s+1),\quad 0\leq s\leq n.

Denote the stationary distribution of {Yt}\{Y_{t}\} by ν\nu, which is supported on {0,1,,(1η)logn}\{0,1,\dots,\left\lfloor(1-\eta)\log n\right\rfloor\}. The process {Yt}\{Y_{t}\} serves as an approximation of {|Xt|}\{|X_{t}|\}; note that {|Xt|}\{|X_{t}|\} itself is not a Markov process.

The following lemma shows that YtY_{t} is stochastically dominated by |Xt||X_{t}|. The proof of this fact is essentially based on the expansion property 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta), and the derived in the proof below bounds on the size transition of |Xt||X_{t}|, described in Eqs. 19 and 20.

Lemma 6.9.

Let {Xt}\{X_{t}\} denote the Metropolis process starting from some X0=σΩq,X_{0}=\sigma\in\Omega_{q,*}. Let {Yt}\{Y_{t}\} denote the birth-death process described above with parameter η(0,1)\eta\in(0,1) starting from Y0=qY_{0}=q. Then there exists a coupling {(Xt,Yt)}\{(X_{t},Y_{t})\} of the two processes such that for all integer t1t\geq 1 it holds

Yt|Xt|.Y_{t}\leq|X_{t}|.

In particular, for all integer t1t\geq 1 it holds

Pr(Xt=CX0=σ)Pr(ttt:Yt=pY0=q)t=1tPr(Yt=pY0=q).\Pr\left(X_{t}=C\mid X_{0}=\sigma\right)\leq\Pr\left(\exists t^{\prime}\in\mathbb{N}\wedge t^{\prime}\leq t:\,Y_{t^{\prime}}=p\mid Y_{0}=q\right)\leq\sum_{t^{\prime}=1}^{t}\Pr\left(Y_{t^{\prime}}=p\mid Y_{0}=q\right).
Proof.

We couple {|Xt|}\{|X_{t}|\} and {Yt}\{Y_{t}\} as follows. Suppose that Yt1|Xt1|Y_{t-1}\leq|X_{t-1}| for some integer t1t\geq 1. We will construct a coupling of XtX_{t} and YtY_{t} such that Yt|Xt|Y_{t}\leq|X_{t}|. Notice that the following probability inequality is a straightforward corollary of that.

Since the probability that Yt=Yt1+1Y_{t}=Y_{t-1}+1 is less than 1/21/2 and so does the probability of |Xt|=|Xt1|1|X_{t}|=|X_{t-1}|-1, we may couple XtX_{t} and YtY_{t} such that |Xt|Yt|X_{t}|-Y_{t} decreases at most one; namely, it never happens that YtY_{t} increases by 11 while XtX_{t} decreases in size. Thus, it suffices to consider the extremal case when |Xt1|=Yt1=s|X_{t-1}|=Y_{t-1}=s. We have

Pr(|Xt|=s1|Xt1|=s)\displaystyle\Pr\left(|X_{t}|=s-1\mid|X_{t-1}|=s\right) =snmin{1,exp[β(hs1hs)]}=P(s,s1)\displaystyle=\frac{s}{n}\min\left\{1,\exp\left[\beta\left(h_{s-1}-h_{s}\right)\right]\right\}=P\left(s,s-1\right) (25)

Meanwhile, recall that A(Xt1)A(X_{t-1}) is the set of vertices vv such that Xt1{v}ΩX_{t-1}\cup\{v\}\in\Omega. Then we have

|A(Xt1)|n202s|A(X_{t-1})|\geq\frac{n}{20\cdot 2^{s}}

whenever snη:=(1η)logns\leq n_{\eta}:=\left\lfloor(1-\eta)\log n\right\rfloor by the expansion property 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta). Hence, we deduce that

Pr(|Xt|=s+1|Xt1|=s)\displaystyle\Pr\left(|X_{t}|=s+1\mid|X_{t-1}|=s\right) =|A(Xt1)|nmin{1,exp[β(hs+1hs)]}\displaystyle=\dfrac{|A(X_{t-1})|}{n}\min\left\{1,\exp\left[\beta\left(h_{s+1}-h_{s}\right)\right]\right\}
𝟙{s<nη}202smin{1,exp[β(hs+1hs)]}=P(s,s+1)\displaystyle\geq\dfrac{\mathbbm{1}\{s<n_{\eta}\}}{20\cdot 2^{s}}\min\left\{1,\exp\left[\beta\left(h_{s+1}-h_{s}\right)\right]\right\}=P\left(s,s+1\right) (26)

Using (25) and (26) we can couple |Xt||X_{t}| and YtY_{t} such that either |Xt|=Yt|X_{t}|=Y_{t} or |Xt|=s+1|X_{t}|=s+1 and Yt=sY_{t}=s, as desired. ∎

The next lemma upper bounds the tt-step transition probability Pr(Yt=pY0=q)\Pr\left(Y_{t}=p\mid Y_{0}=q\right).

Lemma 6.10.

Let {Yt}\{Y_{t}\} denote the birth-death process described above with parameter η(0,1)\eta\in(0,1) starting from Y0=q=ρlognY_{0}=q=\rho\log n and let p=ξlognp=\xi\log n with ξ1η\xi\leq 1-\eta. Then for all integer t1t\geq 1 we have

Pr(Yt=pY0=q)exp[(ln2)(logn)2(ρ+(ρ)22+ξξ22+o(1))]exp[β(hphq)]\displaystyle\Pr\left(Y_{t}=p\mid Y_{0}=q\right)\leq\exp\left[(\ln 2)(\log n)^{2}\left(-\rho^{\prime}+\frac{(\rho^{\prime})^{2}}{2}+\xi-\frac{\xi^{2}}{2}+o(1)\right)\right]\exp\left[\beta\left(h_{p}-h_{q^{\prime}}\right)\right] (27)

where ρ=min{ρ,1η}\rho^{\prime}=\min\{\rho,1-\eta\} and q=ρlognq^{\prime}=\rho^{\prime}\log n.

Proof.

We consider first the case where ρ1η\rho\leq 1-\eta. By Fact 6.7, we have

Pr(Yt=pY0=q)=Pt(q,p)=ν(p)ν(q)Pt(p,q)ν(p)ν(q).\displaystyle\Pr\left(Y_{t}=p\mid Y_{0}=q\right)=P^{t}(q,p)=\frac{\nu(p)}{\nu(q)}P^{t}(p,q)\leq\frac{\nu(p)}{\nu(q)}. (28)

By Fact 6.8, we have

ν(p)ν(q)\displaystyle\frac{\nu(p)}{\nu(q)} =s=p+1qsnmin{exp[β(hs1hs)],1}1202s1min{exp[β(hshs1)],1}\displaystyle=\prod_{s=p+1}^{q}\frac{\frac{s}{n}\min\left\{\exp\left[\beta\left(h_{s-1}-h_{s}\right)\right],1\right\}}{\frac{1}{20\cdot 2^{s-1}}\min\left\{\exp\left[\beta\left(h_{s}-h_{s-1}\right)\right],1\right\}}
=s=p+1q20s2s1nexp[β(hs1hs)]\displaystyle=\prod_{s=p+1}^{q}\frac{20s\cdot 2^{s-1}}{n}\exp\left[\beta\left(h_{s-1}-h_{s}\right)\right]
=20qp(q!/p!)2(q2)(p2)nqpexp[β(hphq)]\displaystyle=\frac{20^{q-p}\cdot(q!/p!)\cdot 2^{\binom{q}{2}-\binom{p}{2}}}{n^{q-p}}\exp\left[\beta\left(h_{p}-h_{q}\right)\right]
=exp[(ln2)(logn)2(ρ+ρ22+ξξ22+o(1))]exp[β(hphq)].\displaystyle=\exp\left[(\ln 2)(\log n)^{2}\left(-\rho+\frac{\rho^{2}}{2}+\xi-\frac{\xi^{2}}{2}+o(1)\right)\right]\exp\left[\beta\left(h_{p}-h_{q}\right)\right].

Next, consider the case where ρ>1η\rho>1-\eta, or equivalently q>qq>q^{\prime}. Let τ\tau be the first time that Yt=qY_{t^{\prime}}=q^{\prime} and we obtain from (28) that

Pr(Yt=pY0=q)\displaystyle\Pr\left(Y_{t}=p\mid Y_{0}=q\right) =t=0tPr(τ=tY0=q)Pr(Ytt=pY0=q)\displaystyle=\sum_{t^{\prime}=0}^{t}\Pr\left(\tau=t^{\prime}\mid Y_{0}=q\right)\Pr\left(Y_{t-t^{\prime}}=p\mid Y_{0}=q^{\prime}\right)
ν(p)ν(q)t=0tPr(τ=tY0=q)=ν(p)ν(q)Pr(τtY0=q)ν(p)ν(q).\displaystyle\leq\frac{\nu(p)}{\nu(q^{\prime})}\sum_{t^{\prime}=0}^{t}\Pr\left(\tau=t^{\prime}\mid Y_{0}=q\right)=\frac{\nu(p)}{\nu(q^{\prime})}\Pr\left(\tau\leq t\mid Y_{0}=q\right)\leq\frac{\nu(p)}{\nu(q^{\prime})}.

This completes the proof of the lemma. ∎

We now proceed with the proof of Lemma 6.6.

Proof of Lemma 6.6.

From Lemmas 6.9 and 6.10, we have for each σΩq,r\sigma\in\Omega_{q,r}

Pr(Xt=σX0=C)\displaystyle\Pr\left(X_{t}=\sigma\mid X_{0}=C\right) =exp[β(hqhp)]Pr(Xt=CX0=σ)\displaystyle=\exp\left[\beta\left(h_{q}-h_{p}\right)\right]\Pr\left(X_{t}=C\mid X_{0}=\sigma\right)
exp[β(hqhp)]maxt[t]tPr(Yt=pY0=q)\displaystyle\leq\exp\left[\beta\left(h_{q}-h_{p}\right)\right]\,\max_{t^{\prime}\in[t]}\,t\Pr\left(Y_{t^{\prime}}=p\mid Y_{0}=q\right)
texp[(ln2)(logn)2(ρ+(ρ)22+ξξ22+o(1))]exp[β(hqhq)],\displaystyle\leq t\exp\left[(\ln 2)(\log n)^{2}\left(-\rho^{\prime}+\frac{(\rho^{\prime})^{2}}{2}+\xi-\frac{\xi^{2}}{2}+o(1)\right)\right]\exp\left[\beta\left(h_{q}-h_{q^{\prime}}\right)\right],

where ρ=min{ρ,1η}\rho^{\prime}=\min\{\rho,1-\eta\} and q=ρlognq^{\prime}=\rho^{\prime}\log n.

Combining now with Eq. 23 and Lemma 5.1 we have that Pr(XtΩq,rX0=C)\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right) is at most

n3𝔼[Wq,r]maxσΩq,rPr(Xt=σX0=C)\displaystyle n^{3}\,\mathbb{E}[W_{q,r}]\max_{\sigma\in\Omega_{q,r}}\Pr\left(X_{t}=\sigma\mid X_{0}=C\right)
\displaystyle\leq{} n3texp[(ln2)(logn)2(ρρ22(1α)γ+γ22+o(1))]\displaystyle n^{3}t\exp\left[(\ln 2)(\log n)^{2}\left(\rho-\frac{\rho^{2}}{2}-(1-\alpha)\gamma+\frac{\gamma^{2}}{2}+o(1)\right)\right]
exp[(ln2)(logn)2(ρ+(ρ)22+ξξ22+o(1))]exp[β(hqhq)]\displaystyle\cdot\exp\left[(\ln 2)(\log n)^{2}\left(-\rho^{\prime}+\frac{(\rho^{\prime})^{2}}{2}+\xi-\frac{\xi^{2}}{2}+o(1)\right)\right]\exp\left[\beta\left(h_{q}-h_{q^{\prime}}\right)\right]
=\displaystyle={} texp[(ln2)(logn)2((ρρ22)(ρ(ρ)22)((1α)γγ22ξ+ξ22)+o(1))]exp[β(hqhq)].\displaystyle t\exp\left[(\ln 2)(\log n)^{2}\left(\left(\rho-\frac{\rho^{2}}{2}\right)-\left(\rho^{\prime}-\frac{(\rho^{\prime})^{2}}{2}\right)-\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\xi+\frac{\xi^{2}}{2}\right)+o(1)\right)\right]\exp\left[\beta\left(h_{q}-h_{q^{\prime}}\right)\right].

Since hh is 11-Lipschitz and qqq^{\prime}\leq q, we have

exp[β(hqhq)]exp[β(qq)]exp[(ln2)(logn)2(β^(ρρ))]\exp\left[\beta\left(h_{q}-h_{q^{\prime}}\right)\right]\leq\exp\left[\beta\left(q-q^{\prime}\right)\right]\leq\exp\left[(\ln 2)(\log n)^{2}\left(\hat{\beta}\left(\rho-\rho^{\prime}\right)\right)\right]

where we recall that β=(ln2)β^logn\beta=(\ln 2)\hat{\beta}\log n. Therefore,

Pr(XtΩq,rX0=C)\displaystyle\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right)
\displaystyle\leq{} texp[(ln2)(logn)2(((1+β^)ρρ22)((1+β^)ρ(ρ)22)((1α)γγ22ξ+ξ22)+o(1))].\displaystyle t\exp\left[(\ln 2)(\log n)^{2}\left(\left((1+\hat{\beta})\rho-\frac{\rho^{2}}{2}\right)-\left((1+\hat{\beta})\rho^{\prime}-\frac{(\rho^{\prime})^{2}}{2}\right)-\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\xi+\frac{\xi^{2}}{2}\right)+o(1)\right)\right].

The proof of the lemma is complete. ∎

7 Quasi-polynomial Hitting Time of Large Cliques

In this section, we present our results about the failure of the Metropolis process to even find cliques of size at least (1+ε)logn(1+\varepsilon)\log n, for any planted clique size k=nα,α(0,1).k=\left\lfloor n^{\alpha}\right\rfloor,\alpha\in(0,1).

7.1 Existence of a Bad Initial Clique

We start with the “worst-case” initialization result which now works for all inverse temperatures β=O(logn)\beta=O(\log n), but establishes that from this initialization the Metropolis process fails to find either a clique of size at least (1+ε)logn(1+\varepsilon)\log n or to find a clique with intersection at least γlogn\gamma\log n with the planted clique.

Theorem 7.1.

Let α[0,1)\alpha\in[0,1) be any fixed constant. Then the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the following with probability 1o(1)1-o(1) as nn\to\infty.

Consider the general Gibbs measure given by Eq. 14 for arbitrary hh satisfying 5.4 and arbitrary inverse temperature β=O(logn)\beta=O(\log n). For any constants ε(0,1α)\varepsilon\in(0,1-\alpha) and γ(0,1α]\gamma\in(0,1-\alpha], there exists a constant c>0c>0 and an initialization state for the Metropolis process from which it requires at least nclognn^{c\log n} steps to reach

  • either cliques of size at least (1+ε)logn(1+\varepsilon)\log n,

  • or cliques of intersection with the planted clique at least γlogn\gamma\log n,

with probability at least 1nclogn1-n^{-c\log n}.

We now present the proof of Theorem 7.1. We first need the notion of gateways as introduced by [Jer92] in his original argument for the failure of the Metropolis process.

Definition 7.2 (Gateways).

For q[n]q\in[n], we say a clique CΩC\in\Omega is a qq-gateway if there exists \ell\in\mathbb{N} and a sequence of cliques C0=C,C1,,CΩC_{0}=C,C_{1},\dots,C_{\ell}\in\Omega such that

  1. (1)

    For every 1i1\leq i\leq\ell, Ci1C_{i-1} and CiC_{i} differ by exactly one vertex;

  2. (2)

    For every 0i0\leq i\leq\ell, |Ci||C||C_{i}|\geq|C|;

  3. (3)

    |C|=q|C_{\ell}|=q.

Let Ψq\Psi_{q} denote the collection of all cliques that are qq-gateways.

Notice that by definition if a clique σ\sigma is a qq-gateway then |σ|q|\sigma|\leq q.

Definition 7.3 (Gateway-Counts Property 𝒫𝗀𝗐\mathcal{P}_{\mathsf{gw}}).

We say the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the gateway-counts property 𝒫𝗀𝗐\mathcal{P}_{\mathsf{gw}} if the following is true: For all integers q=(1+ε)lognq=\left\lfloor(1+\varepsilon)\log n\right\rfloor, p=(1+εθ)lognp=\left\lfloor(1+\varepsilon-\theta)\log n\right\rfloor, and u=(ε/6)lognu=\left\lfloor(\varepsilon/6)\log n\right\rfloor with parameters ε(0,1α)\varepsilon\in(0,1-\alpha) and θ(0,ε)\theta\in(0,\varepsilon), it holds

|ΨqΩp,u|exp[(ln2)(logn)2((1+εθ)12(1+εθ)2θ(56ε2θ)+o(1))].\left|\Psi_{q}\cap\Omega_{p,\leq u}\right|\leq\exp\left[(\ln 2)(\log n)^{2}\left((1+\varepsilon-\theta)-\frac{1}{2}(1+\varepsilon-\theta)^{2}-\theta\left(\frac{5}{6}\varepsilon-2\theta\right)+o(1)\right)\right].

The following lemma follows immediately from arguments in [Jer92].

Lemma 7.4.

For any constant α[0,1)\alpha\in[0,1), the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the gateway-counts property 𝒫𝗀𝗐\mathcal{P}_{\mathsf{gw}} with probability 1o(1)1-o(1) as nn\to\infty.

Proof.

We follow the same approach as in [Jer92] with the slight modification that θ\theta is not equal to ε/3\varepsilon/3 but arbitrary. For any CΨqΩp,uC\in\Psi_{q}\cap\Omega_{p,\leq u}, there exists a set UVCU\in V\setminus C of size |U|=qp|U|=q-p and a subset WC𝒫𝒞W\subseteq C\setminus\mathcal{PC} of size 2pqu2p-q-u such that every vertex from UU is adjacent to every vertex from WW. To see this, consider a path C0=C,C1,,CC_{0}=C,C_{1},\dots,C_{\ell} as in Definition 7.2, and consider the first clique CC^{\prime} in the path such that |CC|=qp|C^{\prime}\setminus C|=q-p. Such CC^{\prime} must exist since the destination clique CC_{\ell} has size qq while |C|=p<q|C|=p<q. Note that CC^{\prime} corresponds to the first time when qpq-p new vertices are added. Meanwhile, since |C|p|C^{\prime}|\geq p, we have |CC|p(qp)=2pq|C\cap C^{\prime}|\geq p-(q-p)=2p-q and |CC𝒫𝒞|2pqu|C\cap C^{\prime}\setminus\mathcal{PC}|\geq 2p-q-u. We can thus take U=CCU=C^{\prime}\setminus C and any WCC𝒫𝒞W\subseteq C\cap C^{\prime}\setminus\mathcal{PC} of size 2pqu2p-q-u.

Hence, we can associate every qq-gateway CC in Ωp,u\Omega_{p,\leq u} with a tuple (C,U,W)(C,U,W) satisfying all the conditions mentioned above: CC is a clique of size pp and intersection at most uu with 𝒫𝒞\mathcal{PC}, UVCU\subseteq V\setminus C has size qpq-p, WC𝒫𝒞W\subseteq C\setminus\mathcal{PC} has size 2pqu2p-q-u, and U,WU,W are fully connected. Let XX denote the number of such tuples. Then, |ΨqΩp,u|X|\Psi_{q}\cap\Omega_{p,\leq u}|\leq X. The first moment of XX is given by

𝔼[X]\displaystyle\mathbb{E}[X] =r=0u(kr)(nkpr)(npqp)(pr2pqu)(12)(p2)(r2)+(qp)(2pqu)\displaystyle=\sum_{r=0}^{u}\binom{k}{r}\binom{n-k}{p-r}\binom{n-p}{q-p}\binom{p-r}{2p-q-u}\left(\frac{1}{2}\right)^{\binom{p}{2}-\binom{r}{2}+(q-p)(2p-q-u)}
exp[(ln2)(logn)2((1+εθ)12(1+εθ)2θ(56ε2θ)+o(1))],\displaystyle\leq\exp\left[(\ln 2)(\log n)^{2}\left((1+\varepsilon-\theta)-\frac{1}{2}(1+\varepsilon-\theta)^{2}-\theta\left(\frac{5}{6}\varepsilon-2\theta\right)+o(1)\right)\right],

where in the first equality, (kr)(nkpr)\binom{k}{r}\binom{n-k}{p-r} counts the number of choices of CC for rr ranging from 0 to uu, (1/2)(p2)(r2)\left(1/2\right)^{\binom{p}{2}-\binom{r}{2}} is the probability CC being a clique, (npqp)\binom{n-p}{q-p} is the number of choices of UU, (pr2pqu)\binom{p-r}{2p-q-u} is for WW, and finally (1/2)(qp)(2pqu)\left(1/2\right)^{(q-p)(2p-q-u)} is the probability of U,WU,W being fully connected. The lemma then follows from the Markov’s inequality

|ΨqΩp,u|Xn𝔼[X],|\Psi_{q}\cap\Omega_{p,\leq u}|\leq X\leq n\,\mathbb{E}[X],

and a union bound over the choices of qq, pp, and uu. ∎

We now present the proof of Theorem 7.1.

Proof of Theorem 7.1.

By Lemmas 5.3 and 7.4, the random graph 𝒢(n,12,nα)\mathcal{G}(n,\frac{1}{2},\left\lfloor n^{\alpha}\right\rfloor) satisfies both the clique-counts properties 𝒫𝗎𝗉𝗉(ε0)\mathcal{P}_{\mathsf{upp}}(\varepsilon_{0}) and 𝒫𝗅𝗈𝗐(ε0)\mathcal{P}_{\mathsf{low}}(\varepsilon_{0}) for ε0=α1ε\varepsilon_{0}=\alpha\leq 1-\varepsilon and the gateway-counts properties 𝒫𝗀𝗐\mathcal{P}_{\mathsf{gw}} simultaneously with probability 1o(1)1-o(1) as nn\to\infty. Throughout the proof we assume that 𝒫𝗎𝗉𝗉(ε0)\mathcal{P}_{\mathsf{upp}}(\varepsilon_{0}), 𝒫𝗅𝗈𝗐(ε0)\mathcal{P}_{\mathsf{low}}(\varepsilon_{0}), and 𝒫𝗀𝗐\mathcal{P}_{\mathsf{gw}} are all satisfied.

Suppose β^\hat{\beta} is such that β^=β/((ln2)(logn))\hat{\beta}=\beta/((\ln 2)(\log n)) so that β^=O(1)\hat{\beta}=O(1). Pick a constant θ(0,ε/3)\theta\in(0,\varepsilon/3) such that

β^θ12((1α)γγ22).\hat{\beta}\theta\leq\frac{1}{2}\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right).

Let q=(1+ε)lognq=(1+\varepsilon)\log n, p=(1+εθ)lognp=(1+\varepsilon-\theta)\log n, and r=γlognr=\gamma\log n. We define

=(ΨqΩp,<r)Ω<q,r\mathcal{B}=\left(\Psi_{q}\cap\Omega_{p,<r}\right)\cup\Omega_{<q,r}

to be the “bottleneck” set to which we will apply Lemma 5.5. Let 𝒜Ω\mathcal{A}\subseteq\Omega denote the collection of cliques that are reachable from the empty clique through a path (i.e. a sequence of cliques where each adjacent pair differs by exactly one vertex) not including any clique from \mathcal{B} except possibly for the destination. The following claim, whose proof is postponed to the end of this subsection, follows easily from the definitions of 𝒜\mathcal{A} and \mathcal{B}.

Claim 7.5.
  1. 1.

    Cliques from 𝒜\mathcal{A}\setminus\mathcal{B} are not adjacent to cliques from 𝒜𝖼\mathcal{A}^{\mathsf{c}} (i.e., they differ by at least two vertices);

  2. 2.

    Ωq,𝒜𝖼\Omega_{q,*}\subseteq\mathcal{A}^{\mathsf{c}}\setminus\mathcal{B};

  3. 3.

    Ω,r𝒜𝖼\Omega_{*,r}\subseteq\mathcal{A}^{\mathsf{c}}\cup\mathcal{B}.

Now observe from Claim 7.5 that to prove what we want, it suffices to show that starting from an appropriate state the Metropolis process does not hit any clique from \mathcal{B} in exp(clog2n)\exp(c\log^{2}n)-time with probability 1exp(clog2n)1-\exp(-c\log^{2}n). For a collection 𝒰Ω\mathcal{U}\subseteq\Omega of cliques we write Z(𝒰)=Z(β;𝒰)=σ𝒰eβh|σ|Z(\mathcal{U})=Z(\beta;\mathcal{U})=\sum_{\sigma\in\mathcal{U}}e^{\beta h_{|\sigma|}} to represent the partition function restricted to the set 𝒰\mathcal{U}. Since the boundary of 𝒜\mathcal{A} is included in \mathcal{B} by Claim 7.5 it suffices to show that there exists a constant c>0c>0 such that

Z()Z(𝒜)exp(clog2n).\displaystyle\frac{Z(\mathcal{B})}{Z(\mathcal{A})}\leq\exp\left(-c\log^{2}n\right). (29)

Given (29), Theorem 7.1 is an immediate consequence of Lemma 5.5.

Observe that we have the following inclusion,

𝒜Ωp,<rΩp,0.\mathcal{A}\supseteq\Omega_{p,<r}\cup\Omega_{\leq p,0}.

To see this, for every clique in Ωp,<r\Omega_{p,<r}, it can be reached from the empty clique by adding vertices one by one in any order, so that none of the intermediate cliques are from Ωp,<r(ΨqΩp,<r)\Omega_{p,<r}\supseteq\left(\Psi_{q}\cap\Omega_{p,<r}\right) or from Ω<q,r\Omega_{<q,r}, except for possibly the last one. Similarly, every clique in Ωp,0\Omega_{\leq p,0} can be reached from the empty clique in the same way. (Note that cliques in Ωq,0\Omega_{\leq q,0}, however, may not be reachable from \emptyset without crossing \mathcal{B}; for example, by adding vertices one by one to reach a clique of size qq, it may first reach a clique of size p<qp<q which is a qq-gateway with intersection <r<r.) Thus, we have

Z()Z(𝒜)Z(ΨqΩp,<r)Z(𝒜)+Z(Ω<q,r)Z(𝒜)Z(ΨqΩp,<r)Z(Ωp,<r)+Z(Ω<q,r)Z(Ωp,0)=|ΨqΩp,<r|Wp,<r+Z<q,rZp,0.\frac{Z(\mathcal{B})}{Z(\mathcal{A})}\leq\frac{Z\left(\Psi_{q}\cap\Omega_{p,<r}\right)}{Z(\mathcal{A})}+\frac{Z(\Omega_{<q,r})}{Z(\mathcal{A})}\leq\frac{Z\left(\Psi_{q}\cap\Omega_{p,<r}\right)}{Z(\Omega_{p,<r})}+\frac{Z(\Omega_{<q,r})}{Z(\Omega_{\leq p,0})}=\frac{\left|\Psi_{q}\cap\Omega_{p,<r}\right|}{W_{p,<r}}+\frac{Z_{<q,r}}{Z_{\leq p,0}}. (30)

The rest of the proof aims to upper bound the two ratios in Eq. 30 respectively.

For the first ratio, the key observation is that since γ1α\gamma\leq 1-\alpha, Ωp,<r\Omega_{p,<r} is dominated by cliques of intersection o(logn)o(\log n) (almost completely outside the planted clique) or more rigorously speaking those with sufficiently small intersection, say, in Ωp,u\Omega_{p,\leq u} where u=(ε/6)lognu=(\varepsilon/6)\log n. Hence, we write

|ΨqΩp,<r|=|ΨqΩp,u|+|ΨqΩp,u<<r||ΨqΩp,u|+Wp,u<<r,\left|\Psi_{q}\cap\Omega_{p,<r}\right|=\left|\Psi_{q}\cap\Omega_{p,\leq u}\right|+\left|\Psi_{q}\cap\Omega_{p,u<\cdot<r}\right|\leq\left|\Psi_{q}\cap\Omega_{p,\leq u}\right|+W_{p,u<\cdot<r},

and combining Wp,<rWp,0W_{p,<r}\geq W_{p,0} we get

|ΨqΩp,<r|Wp,<r|ΨqΩp,u|Wp,0+Wp,u<<rWp,0.\frac{\left|\Psi_{q}\cap\Omega_{p,<r}\right|}{W_{p,<r}}\leq\frac{\left|\Psi_{q}\cap\Omega_{p,\leq u}\right|}{W_{p,0}}+\frac{W_{p,u<\cdot<r}}{W_{p,0}}. (31)

By Lemma 5.1, the clique-counts property 𝒫𝗅𝗈𝗐(ε0)\mathcal{P}_{\mathsf{low}}(\varepsilon_{0}), and the gateway-counts property 𝒫𝗀𝗐\mathcal{P}_{\mathsf{gw}}, we upper bound the first term in Eq. 31 by

|ΨqΩp,u|Wp,0\displaystyle\frac{\left|\Psi_{q}\cap\Omega_{p,\leq u}\right|}{W_{p,0}} 2|ΨqΩp,u|𝔼[Wp,0]\displaystyle\leq\frac{2\left|\Psi_{q}\cap\Omega_{p,\leq u}\right|}{\mathbb{E}[W_{p,0}]}
exp[(ln2)(logn)2((1+εθ)12(1+εθ)2θ(56ε2θ)+o(1))]\displaystyle\leq\exp\left[(\ln 2)(\log n)^{2}\left((1+\varepsilon-\theta)-\frac{1}{2}(1+\varepsilon-\theta)^{2}-\theta\left(\frac{5}{6}\varepsilon-2\theta\right)+o(1)\right)\right]
exp[(ln2)(logn)2((1+εθ)+12(1+εθ)2+o(1))]\displaystyle~{}~{}\cdot\exp\left[(\ln 2)(\log n)^{2}\left(-(1+\varepsilon-\theta)+\frac{1}{2}(1+\varepsilon-\theta)^{2}+o(1)\right)\right]
=exp[(ln2)(logn)2(θ(56ε2θ)+o(1))].\displaystyle=\exp\left[(\ln 2)(\log n)^{2}\left(-\theta\left(\frac{5}{6}\varepsilon-2\theta\right)+o(1)\right)\right]. (32)

For the second term in Eq. 31, 𝒫𝗎𝗉𝗉(ε0)\mathcal{P}_{\mathsf{upp}}(\varepsilon_{0}) and 𝒫𝗅𝗈𝗐(ε0)\mathcal{P}_{\mathsf{low}}(\varepsilon_{0}) imply that

Wp,u<<rWp,02n3𝔼[Wp,u<<r]𝔼[Wp,0]2n4𝔼[Wp,u]𝔼[Wp,0]exp[(ln2)(logn)2(112(1α)ε+o(1))],\frac{W_{p,u<\cdot<r}}{W_{p,0}}\leq 2n^{3}\,\frac{\mathbb{E}\left[W_{p,u<\cdot<r}\right]}{\mathbb{E}\left[W_{p,0}\right]}\leq 2n^{4}\,\frac{\mathbb{E}\left[W_{p,u}\right]}{\mathbb{E}\left[W_{p,0}\right]}\leq\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{1}{12}(1-\alpha)\varepsilon+o(1)\right)\right], (33)

where the last inequality uses ε1α\varepsilon\leq 1-\alpha. Combining Eqs. 31, 7.1 and 33, we obtain

|ΨqΩp,<r|Wp,<rexp(c1log2n+o(log2n))\frac{\left|\Psi_{q}\cap\Omega_{p,<r}\right|}{W_{p,<r}}\leq\exp\left(-c_{1}\log^{2}n+o(\log^{2}n)\right)

for some constant c1=c1(α,ε,θ)c_{1}=c_{1}(\alpha,\varepsilon,\theta). This bounds the first ratio in Eq. 30.

For the second ratio in Eq. 30, we have from 𝒫𝗎𝗉𝗉(ε0)\mathcal{P}_{\mathsf{upp}}(\varepsilon_{0}) and 𝒫𝗅𝗈𝗐(ε0)\mathcal{P}_{\mathsf{low}}(\varepsilon_{0}) that

Z<q,rZp,02n3𝔼[Z<q,r)]𝔼[Zp,0].\frac{Z_{<q,r}}{Z_{\leq p,0}}\leq 2n^{3}\,\frac{\mathbb{E}\left[Z_{<q,r})\right]}{\mathbb{E}\left[Z_{\leq p,0}\right]}. (34)

Using linearity of expectation we have that

𝔼[Z<q,r]=q=rq1𝔼[Zq,r]=q=rq1𝔼[Wq,r]exp(βhq).\mathbb{E}\left[Z_{<q,r}\right]=\sum_{q^{\prime}=r}^{q-1}\mathbb{E}\left[Z_{q^{\prime},r}\right]=\sum_{q^{\prime}=r}^{q-1}\mathbb{E}\left[W_{q^{\prime},r}\right]\exp\left(\beta h_{q^{\prime}}\right).

Let

q=argmaxq[n]:rq<q𝔼[Wq,r]exp(βhq).q^{*}=\operatorname*{arg\,max}_{q^{\prime}\in[n]:\;r\leq q^{\prime}<q}\,\mathbb{E}\left[W_{q^{\prime},r}\right]\exp\left(\beta h_{q^{\prime}}\right).

It follows that

𝔼[Z<q,r]n𝔼[Wq,r]exp(βhq).\mathbb{E}\left[Z_{<q,r}\right]\leq n\,\mathbb{E}\left[W_{q^{*},r}\right]\exp\left(\beta h_{q^{*}}\right). (35)

Meanwhile, let q=min{q,p}q^{*\prime}=\min\{q^{*},p\}, and we have

𝔼[Zp,0]𝔼[Wq,0]exp(βhq).\mathbb{E}\left[Z_{\leq p,0}\right]\geq\mathbb{E}\left[W_{q^{*\prime},0}\right]\exp\left(\beta h_{q^{*\prime}}\right). (36)

Combining Eqs. 34, 35 and 36, we get that

Z<q,rZp,02n4𝔼[Wq,r]𝔼[Wq,0]exp[β(hqhq)].\frac{Z_{<q,r}}{Z_{\leq p,0}}\leq 2n^{4}\,\frac{\mathbb{E}\left[W_{q^{*},r}\right]}{\mathbb{E}\left[W_{q^{*\prime},0}\right]}\exp\left[\beta(h_{q^{*}}-h_{q^{*\prime}})\right].

Let ρ=q/logn\rho=q^{*}/\log n and ρ=q/logn\rho^{\prime}=q^{*\prime}/\log n. Then by definition we have that ρρ1+ε\rho^{\prime}\leq\rho\leq 1+\varepsilon and ρ=min{ρ,1+εθ}\rho^{\prime}=\min\{\rho,1+\varepsilon-\theta\}. Furthermore by 5.4 we have

β(hqhq)(ln2)(logn)2β^(ρρ).\beta(h_{q^{*}}-h_{q^{*\prime}})\leq(\ln 2)(\log n)^{2}\hat{\beta}(\rho-\rho^{\prime}).

Then, an application of Item 1 of Lemma 5.1 implies that

Z<q,rZp,0\displaystyle\frac{Z_{<q,r}}{Z_{\leq p,0}} exp[(ln2)(logn)2(ρρ22(1α)γ+γ22ρ+(ρ)22+β^(ρρ)+o(1))]\displaystyle\leq\exp\left[(\ln 2)(\log n)^{2}\left(\rho-\frac{\rho^{2}}{2}-(1-\alpha)\gamma+\frac{\gamma^{2}}{2}-\rho^{\prime}+\frac{(\rho^{\prime})^{2}}{2}+\hat{\beta}(\rho-\rho^{\prime})+o(1)\right)\right]
=exp[(ln2)(logn)2((ρρ)(β^+112(ρ+ρ))((1α)γγ22)+o(1))].\displaystyle=\exp\left[(\ln 2)(\log n)^{2}\left((\rho-\rho^{\prime})\left(\hat{\beta}+1-\frac{1}{2}(\rho+\rho^{\prime})\right)-\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right)+o(1)\right)\right].

If ρ=ρ\rho=\rho^{\prime} then Z<q,r/Zp,0exp(c2log2n+o(log2n))Z_{<q,r}/Z_{\leq p,0}\leq\exp\left(-c_{2}\log^{2}n+o(\log^{2}n)\right) for c2=(ln2)((1α)γγ2/2)c_{2}=(\ln 2)\left((1-\alpha)\gamma-\gamma^{2}/2\right). If ρ>ρ\rho>\rho^{\prime} then

(ρρ)(β^+112(ρ+ρ))θβ^12((1α)γγ22)(\rho-\rho^{\prime})\left(\hat{\beta}+1-\frac{1}{2}(\rho+\rho^{\prime})\right)\leq\theta\hat{\beta}\leq\frac{1}{2}\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right)

since 11+εθ=ρ<ρ1+ε1\leq 1+\varepsilon-\theta=\rho^{\prime}<\rho\leq 1+\varepsilon. Therefore, we have Z<q,r/Zp,0exp(c2log2n+o(log2n))Z_{<q,r}/Z_{\leq p,0}\leq\exp\left(-c_{2}\log^{2}n+o(\log^{2}n)\right) for c2=ln22((1α)γγ2/2)c_{2}=\frac{\ln 2}{2}\left((1-\alpha)\gamma-\gamma^{2}/2\right). This bounds the second ratio in Eq. 30. Hence, we establish Eq. 29 and the theorem then follows. ∎

Proof of Claim 7.5.

The first item is obvious, since if a clique σ𝒜\sigma\in\mathcal{A}\setminus\mathcal{B} is adjacent to another clique σ\sigma^{\prime}, then by appending (σ,σ)(\sigma,\sigma^{\prime}) to the path from \emptyset to σ\sigma we get a path from \emptyset to σ\sigma^{\prime} without passing through cliques from \mathcal{B} except possibly at σ′′\sigma^{\prime\prime}, since σ\sigma\notin\mathcal{B}. This implies that σ′′𝒜\sigma^{\prime\prime}\in\mathcal{A} which proves the first item.

For the second item, suppose for contradiction that there exists a clique CC of size qq that is in 𝒜\mathcal{A}\cup\mathcal{B}. Clearly CC\notin\mathcal{B} and so C𝒜C\in\mathcal{A}. Then there exists a path of cliques =C0,C1,,C=C\emptyset=C_{0},C_{1},\dots,C_{\ell}=C which contain no cliques from \mathcal{B}. Let CjC_{j} for j[]j\in[\ell] be the first clique of size qq in this path; that is, |Ci|<q|C_{i}|<q for 0i<j0\leq i<j. Then, the (sub)path =C0,C1,,Cj\emptyset=C_{0},C_{1},\dots,C_{j} must contain a qq-gateway of size pp, call it CC^{\prime}, as one can choose the largest i<ji<j for which |Ci|=p|C_{i}|=p and set C=CiC^{\prime}=C_{i}. If CC^{\prime} has intersection with the planted clique less than rr, then CΨqΩp,<rC^{\prime}\in\Psi_{q}\cap\Omega_{p,<r}\subseteq\mathcal{B}, contradiction. Otherwise, CC^{\prime} has intersection at least rr which means at some earlier time, it will pass a clique of intersection exactly rr whose size is less than qq, which is again in Ω<q,r\Omega_{<q,r}\subseteq\mathcal{B} leading to a contradiction. This establishes the second item.

For the third one, if σ\sigma is a clique of intersection rr, then either |σ|<q|\sigma|<q meaning σΩ<q,r\sigma\in\Omega_{<q,r}\subseteq\mathcal{B} or |σ|q|\sigma|\geq q meaning any path from \emptyset to σ\sigma must pass through a clique of size exactly qq and by the second item must pass through cliques from \mathcal{B}, implying σ𝒜𝖼\sigma\in\mathcal{A}^{\mathsf{c}}. ∎

7.2 Starting from the Empty Clique

Theorem 7.6.

Let α[0,1)\alpha\in[0,1) be any fixed constant. For any constant ε(0,1)\varepsilon\in(0,1), the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the following with probability 1o(1)1-o(1) as nn\to\infty.

Consider the general Gibbs measure given by Eq. 14 for arbitrary 11-Lipschitz hh with h0=0h_{0}=0 and inverse temperature β=o(logn)\beta=o(\log n). Let {Xt}\{X_{t}\} denote the Metropolis process on GG with stationary distribution πβ\pi_{\beta}. Then there exist constants ξ=ξ(α,ε)>0\xi=\xi(\alpha,\varepsilon)>0 and c=c(α,ε)>0c=c(\alpha,\varepsilon)>0 such that for any clique CΩC\in\Omega of size at most ξlogn\xi\log n, one has

Pr(ttnclogn s.t. |Xt|(1+ε)logn|X0=C)nclogn.\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq n^{c\log n}\text{~{}s.t.~{}}|X_{t}|\geq(1+\varepsilon)\log n\;\Big{|}\;X_{0}=C\Big{)}\leq n^{-c\log n}.

In particular, the Metropolis process starting from the empty clique requires nΩ(logn)n^{\Omega(\log n)} steps to reach cliques of size at least (1+ε)logn(1+\varepsilon)\log n, with probability 1nΩ(logn)1-n^{-\Omega(\log n)}.

Proof.

By Lemmas 5.3 and 6.5, the random graph 𝒢(n,12,nα)\mathcal{G}(n,\frac{1}{2},\left\lfloor n^{\alpha}\right\rfloor) satisfies both 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) and 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) for η=ε/2\eta=\varepsilon/2 with probability 1o(1)1-o(1) as nn\to\infty. In the rest of the proof we assume that both 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) and 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) are satisfied.

Suppose that |C|=p=ξlogn|C|=p=\xi^{\prime}\log n, for 0ξξ0\leq\xi^{\prime}\leq\xi and ξ>0\xi>0 is a sufficiently small constant to be determined. Let q=(1+ε)lognq=(1+\varepsilon)\log n and s=(1α)logns=(1-\alpha)\log n. First observe that if the chain arrives at a clique in Ωq,\Omega_{q,*}, it either hits a clique from Ωq,<s\Omega_{q,<s} or previously reached a clique in Ω<q,s\Omega_{<q,s}. This implies that

Pr(ttT:XtΩq,|X0=C)Pr(ttT:XtΩq,<s|X0=C)+Pr(ttT:XtΩ<q,s|X0=C).\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{q,*}\;\Big{|}\;X_{0}=C\Big{)}\\ \leq\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{q,<s}\;\Big{|}\;X_{0}=C\Big{)}+\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{<q,s}\;\Big{|}\;X_{0}=C\Big{)}.

It suffices to upper bound each of the two terms respectively.

Similar as in Eq. 21, we deduce from the union bound and 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) that

Pr(ttT:XtΩq,<s|X0=C)\displaystyle\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{q,<s}\;\Big{|}\;X_{0}=C\Big{)} Tnmaxt[T]maxr[s]Pr(XtΩq,rX0=C);\displaystyle\leq Tn\,\max_{t\in[T]}\,\max_{r\in[s]}\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right);
Pr(ttT:XtΩ<q,s|X0=C)\displaystyle\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{<q,s}\;\Big{|}\;X_{0}=C\Big{)} Tnmaxt[T]maxp[q]Pr(XtΩp,sX0=C).\displaystyle\leq Tn\,\max_{t\in[T]}\,\max_{p\in[q]}\Pr\left(X_{t}\in\Omega_{p,s}\mid X_{0}=C\right).

It suffices to show that for all integer t1t\geq 1, all pairs

(p,r){(q,r):r[s]}{(p,s):p[q]},(p,r)\in\left\{(q,r):r\in[s]\right\}\cup\left\{(p,s):p\in[q]\right\},

and all clique σΩp,r\sigma\in\Omega_{p,r}, it holds

Pr(XtΩp,rX0=C)tnclogn\Pr\left(X_{t}\in\Omega_{p,r}\mid X_{0}=C\right)\leq tn^{-c\log n} (37)

for some constant c>0c>0.

Without loss of generality we may assume that ε1α\varepsilon\leq 1-\alpha. Let η=ε/2\eta=\varepsilon/2 and ξ=ε2/8\xi=\varepsilon^{2}/8. Write β=(ln2)β^logn\beta=(\ln 2)\hat{\beta}\log n with β^=o(1)\hat{\beta}=o(1). Consider first the pair (q,r)(q,r) where r[s]r\in[s]. Suppose that r=γlognr=\gamma\log n with 0γ1α0\leq\gamma\leq 1-\alpha. We deduce from Lemma 6.6, with β^=o(1)\hat{\beta}=o(1), ρ=1+ε\rho=1+\varepsilon, γ[0,1α]\gamma\in[0,1-\alpha], and ρ=min{ρ,1η}=1η\rho^{\prime}=\min\{\rho,1-\eta\}=1-\eta, that

Pr(XtΩq,rX0=C)\displaystyle\Pr\left(X_{t}\in\Omega_{q,r}\mid X_{0}=C\right)
\displaystyle\leq{} exp[(ln2)(logn)2(((1+ε)12(1+ε)2)((1η)12(1η)2)((1α)γγ22ξ+ξ22)+o(1))]\displaystyle\exp\left[(\ln 2)(\log n)^{2}\left(\left((1+\varepsilon)-\frac{1}{2}(1+\varepsilon)^{2}\right)-\left((1-\eta)-\frac{1}{2}(1-\eta)^{2}\right)-\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}-\xi+\frac{\xi^{2}}{2}\right)+o(1)\right)\right]
\displaystyle\leq{} exp[(ln2)(logn)2(ε22+η22+ξ+o(1))]=exp[(ln2)(logn)2(ε24+o(1))].\displaystyle\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{\varepsilon^{2}}{2}+\frac{\eta^{2}}{2}+\xi+o(1)\right)\right]=\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{\varepsilon^{2}}{4}+o(1)\right)\right].

This shows Eq. 37 for the first case.

For the second case, we have the pair (p,s)(p,s) where p[q]p\in[q]. Suppose p=ρlognp=\rho\log n with 0ρ1+ε0\leq\rho\leq 1+\varepsilon. Also recall that s=(1α)logns=(1-\alpha)\log n. Again, we deduce from Lemma 6.6, with β^=o(1)\hat{\beta}=o(1), ρ[0,1+ε]\rho\in[0,1+\varepsilon], γ=1α\gamma=1-\alpha, and ρ=min{ρ,1η}\rho^{\prime}=\min\{\rho,1-\eta\}, that

Pr(XtΩp,sX0=C)\displaystyle\Pr\left(X_{t}\in\Omega_{p,s}\mid X_{0}=C\right)
\displaystyle\leq{} exp[(ln2)(logn)2((ρρ22)(ρ(ρ)22)(12(1α)2ξ+ξ22)+o(1))]\displaystyle\exp\left[(\ln 2)(\log n)^{2}\left(\left(\rho-\frac{\rho^{2}}{2}\right)-\left(\rho^{\prime}-\frac{(\rho^{\prime})^{2}}{2}\right)-\left(\frac{1}{2}(1-\alpha)^{2}-\xi+\frac{\xi^{2}}{2}\right)+o(1)\right)\right]
\displaystyle\leq{} exp[(ln2)(logn)2(12(ρ1)2+12(ρ1)238(1α)2+o(1))],\displaystyle\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{1}{2}(\rho-1)^{2}+\frac{1}{2}(\rho^{\prime}-1)^{2}-\frac{3}{8}(1-\alpha)^{2}+o(1)\right)\right],

where the last inequality follows from

12(1α)2ξ+ξ2212(1α)2ε2838(1α)2\frac{1}{2}(1-\alpha)^{2}-\xi+\frac{\xi^{2}}{2}\geq\frac{1}{2}(1-\alpha)^{2}-\frac{\varepsilon^{2}}{8}\geq\frac{3}{8}(1-\alpha)^{2}

since ε1α\varepsilon\leq 1-\alpha. If ρ=ρ1η\rho^{\prime}=\rho\leq 1-\eta, then 12(ρ1)2+12(ρ1)2=0-\frac{1}{2}(\rho-1)^{2}+\frac{1}{2}(\rho^{\prime}-1)^{2}=0. If ρ=1η<ρ1+ε\rho^{\prime}=1-\eta<\rho\leq 1+\varepsilon, then

12(ρ1)2+12(ρ1)2η22ε2818(1α)2-\frac{1}{2}(\rho-1)^{2}+\frac{1}{2}(\rho^{\prime}-1)^{2}\leq\frac{\eta^{2}}{2}\leq\frac{\varepsilon^{2}}{8}\leq\frac{1}{8}(1-\alpha)^{2}

where the last inequality is because ε1α\varepsilon\leq 1-\alpha. Hence,

Pr(XtΩp,sX0=C)exp[(ln2)(logn)2(14(1α)2+o(1))].\Pr\left(X_{t}\in\Omega_{p,s}\mid X_{0}=C\right)\leq\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{1}{4}(1-\alpha)^{2}+o(1)\right)\right].

This shows Eq. 37 for the second case. The theorem then follows from Eq. 37. ∎

7.3 Low Temperature Regime and Greedy Algorithm

Theorem 7.7.

Let α[0,1)\alpha\in[0,1) be any fixed constant. For any constant ε(0,1)\varepsilon\in(0,1), the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the following with probability 1o(1)1-o(1) as nn\to\infty.

Consider the general Gibbs measure given by Eq. 14 for the identity function hq=qh_{q}=q and inverse temperature β=ω(logn)\beta=\omega(\log n). For any constant γ(0,1α]\gamma\in(0,1-\alpha], there exists constant ξ=ξ(α,ε)>0\xi=\xi(\alpha,\varepsilon)>0 such that for any clique CΩC\in\Omega of size at most ξlogn\xi\log n and any constant c+c\in\mathbb{N}^{+}, with probability at least 1nω(1)1-n^{-\omega(1)} the Metropolis process starting from CC will not reach

  • either cliques of size at least (1+ε)logn(1+\varepsilon)\log n,

  • or cliques of intersection at least γlogn\gamma\log n with the planted clique,

within ncn^{c} steps.

For a subset 𝒮Ω\mathcal{S}\subseteq\Omega of cliques and an integer p+p\in\mathbb{N}^{+}, let 𝒮[p]\mathcal{S}^{[p]} denote the collection of all cliques of size pp that are subsets of cliques from 𝒮\mathcal{S}, i.e.,

𝒮[p]={CΩp,:σ𝒮s.t.Cσ}.\mathcal{S}^{[p]}=\left\{C\in\Omega_{p,*}:\exists\sigma\in\mathcal{S}\mathrm{~{}s.t.~{}}C\subseteq\sigma\right\}.

For 0rq0\leq r\leq q we define Wq,r[p]=|Ωq,r[p]|W^{[p]}_{q,r}=\left|\Omega^{[p]}_{q,r}\right| and similarly for Wq,<r[p]W^{[p]}_{q,<r}, W<q,r[p]W^{[p]}_{<q,r}, etc.

Lemma 7.8.

Consider the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique conditional on satisfying the property 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1). For any 0pq=ρlogn0\leq p\leq q=\rho\log n with ρ0\rho\geq 0 and any r=γlognr=\gamma\log n with 0γρ0\leq\gamma\leq\rho we have

Wq,r[p]exp[(ln2)(logn)2(ρρ22(1α)γ+γ22+o(1))]W^{[p]}_{q,r}\leq\exp\left[(\ln 2)(\log n)^{2}\left(\rho-\frac{\rho^{2}}{2}-(1-\alpha)\gamma+\frac{\gamma^{2}}{2}+o(1)\right)\right]

with high probability as nn\to\infty.

Proof.

Notice that every clique of size qq has (qp)2qn2\binom{q}{p}\leq 2^{q}\leq n^{2} subsets of size pp. The lemma then follows immediately from 𝒫𝗎𝗉𝗉(ε)\mathcal{P}_{\mathsf{upp}}(\varepsilon). ∎

We now give the proof of Theorem 7.7.

Proof of Theorem 7.7.

By Lemmas 5.3 and 6.5, the random graph 𝒢(n,12,nα)\mathcal{G}(n,\frac{1}{2},\left\lfloor n^{\alpha}\right\rfloor) satisfies both 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) and 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) for η=ε/2\eta=\varepsilon/2 with probability 1o(1)1-o(1) as nn\to\infty. In the rest of the proof we assume that both 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) and 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) are satisfied.

First, a simple observation is that the process actually never remove vertices. Indeed, the probability of removing a vertex from the current clique in one step is at most eβ=nω(1)e^{-\beta}=n^{-\omega(1)}. Since we run the Metropolis process for polynomially many steps, the probability that the chain ever remove a vertex is upper bounded by poly(n)nω(1)=nω(1)\mathrm{poly}(n)\cdot n^{-\omega(1)}=n^{-\omega(1)}. Hence, in this low temperature regime the Metropolis process is equivalent to the greedy algorithm.

Without loss of generality, we may assume that

ε2((1α)γγ22).\varepsilon\leq\sqrt{2\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right)}.

Let q=(1+ε)lognq=(1+\varepsilon)\log n and s=γlogns=\gamma\log n. Suppose the initial state is a clique CC of size ξlogn\xi\log n for ξε2/8\xi\leq\varepsilon^{2}/8. Let η=ε/2\eta=\varepsilon/2 and q=(1η)lognq^{\prime}=(1-\eta)\log n. If the chain arrives at a clique in Ωq,\Omega_{q,*}, it either hits a clique from Ωq,<s\Omega_{q,<s} or previously reached a clique in Ω<q,s\Omega_{<q,s}. Similarly, if the chain hits Ω,s\Omega_{*,s}, then it must also reach either Ω<q,s\Omega_{<q,s} or Ωq,<s\Omega_{q,<s}. Hence, to bound the probability of reaching either Ωq,\Omega_{q,*} or Ω,s\Omega_{*,s}, we only need to bound the probability of reaching Ωq,<s\Omega_{q,<s} or Ω<q,s\Omega_{<q,s}. Furthermore, since the process never removes a vertex with high probability in polynomially many steps, in the case this happens it must first reach a clique from Ωq,<s[q]\Omega^{[q^{\prime}]}_{q,<s}, or Ωq<<q,s[q]\Omega^{[q^{\prime}]}_{q^{\prime}<\cdot<q,s}, or Ωq,s\Omega_{\leq q^{\prime},s}. To summarize, we can deduce from the union bound that

Pr(ttT:XtΩq,Ω,s|X0=C)\displaystyle\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{q,*}\cup\Omega_{*,s}\;\Big{|}\;X_{0}=C\Big{)}
\displaystyle\leq{} Pr(ttT:XtΩq,<s[q]|X0=C)+Pr(ttT:XtΩq<<q,s[q]|X0=C)\displaystyle\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega^{[q^{\prime}]}_{q,<s}\;\Big{|}\;X_{0}=C\Big{)}+\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega^{[q^{\prime}]}_{q^{\prime}<\cdot<q,s}\;\Big{|}\;X_{0}=C\Big{)}
+Pr(ttT:XtΩq,s|X0=C)+1nω(1).\displaystyle+\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{\leq q^{\prime},s}\;\Big{|}\;X_{0}=C\Big{)}+\frac{1}{n^{\omega(1)}}.

We bound each of the three probabilities respectively.

For the first case, we have from the union bound and 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) that,

Pr(ttT:XtΩq,<s[q]|X0=C)Tn4maxt[T]maxr[s]maxσΩq,r[q]𝔼[Wq,r[q]]Pr(Xt=σX0=C).\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega^{[q^{\prime}]}_{q,<s}\;\Big{|}\;X_{0}=C\Big{)}\leq Tn^{4}\,\max_{t\in[T]}\,\max_{r\in[s]}\max_{\sigma\in\Omega^{[q^{\prime}]}_{q,r}}\mathbb{E}\left[W^{[q^{\prime}]}_{q,r}\right]\Pr\left(X_{t}=\sigma\mid X_{0}=C\right).

Suppose r=γlogn[s]r=\gamma^{\prime}\log n\in[s]. We deduce from Lemmas 5.1 and 7.8 and Lemma 6.6 (with ρ=ρ=1η\rho=\rho^{\prime}=1-\eta and γ=0\gamma=0 for the notations of Lemma 6.6) that for every integer t1t\geq 1 and every clique σΩq,r[q]\sigma\in\Omega^{[q^{\prime}]}_{q,r} (note that |σ|=q|\sigma|=q^{\prime} since q<qq^{\prime}<q),

𝔼[Wq,r[q]]Pr(Xt=σX0=C)\displaystyle\mathbb{E}\left[W^{[q^{\prime}]}_{q,r}\right]\Pr\left(X_{t}=\sigma\mid X_{0}=C\right)
\displaystyle\leq{} 𝔼[Wq,r[q]]𝔼[Wq,0]𝔼[Wq,0]Pr(Xt=σX0=C)\displaystyle\frac{\mathbb{E}\left[W^{[q^{\prime}]}_{q,r}\right]}{\mathbb{E}\left[W_{q^{\prime},0}\right]}\cdot\mathbb{E}\left[W_{q^{\prime},0}\right]\Pr\left(X_{t}=\sigma\mid X_{0}=C\right)
\displaystyle\leq{} texp[(ln2)(logn)2((1+ε)12(1+ε)2(1α)γ+(γ)22(1η)+12(1η)2+ξξ22+o(1))]\displaystyle t\exp\left[(\ln 2)(\log n)^{2}\left((1+\varepsilon)-\frac{1}{2}(1+\varepsilon)^{2}-(1-\alpha)\gamma^{\prime}+\frac{(\gamma^{\prime})^{2}}{2}-(1-\eta)+\frac{1}{2}(1-\eta)^{2}+\xi-\frac{\xi^{2}}{2}+o(1)\right)\right]
\displaystyle\leq{} texp[(ln2)(logn)2(ε22+η22+ξ+o(1))]texp[(ln2)(logn)2(ε24+o(1))],\displaystyle t\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{\varepsilon^{2}}{2}+\frac{\eta^{2}}{2}+\xi+o(1)\right)\right]\leq t\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{\varepsilon^{2}}{4}+o(1)\right)\right],

where the last inequality follows from

η22+ξε28+ε28=ε24.\frac{\eta^{2}}{2}+\xi\leq\frac{\varepsilon^{2}}{8}+\frac{\varepsilon^{2}}{8}=\frac{\varepsilon^{2}}{4}.

Next consider the second case, where we have

Pr(ttT:XtΩq<<q,s[q]|X0=C)Tn4maxt[T]maxp[q][q]maxσΩp,s[q]𝔼[Wp,s[q]]Pr(Xt=σX0=C).\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega^{[q^{\prime}]}_{q^{\prime}<\cdot<q,s}\;\Big{|}\;X_{0}=C\Big{)}\leq Tn^{4}\,\max_{t\in[T]}\,\max_{p\in[q]\setminus[q^{\prime}]}\max_{\sigma\in\Omega^{[q^{\prime}]}_{p,s}}\mathbb{E}\left[W^{[q^{\prime}]}_{p,s}\right]\Pr\left(X_{t}=\sigma\mid X_{0}=C\right).

Suppose p=ρlogn[q][q]p=\rho\log n\in[q]\setminus[q^{\prime}] and so 1ηρ1+ε1-\eta\leq\rho\leq 1+\varepsilon. Also recall that s=γlogns=\gamma\log n. We deduce from Lemmas 5.1 and 7.8 and Lemma 6.6 (with ρ=ρ=1η\rho=\rho^{\prime}=1-\eta and γ=0\gamma=0 for the notations of Lemma 6.6) that for every integer t1t\geq 1 and every clique σΩp,s[q]\sigma\in\Omega^{[q^{\prime}]}_{p,s} (note that |σ|=q|\sigma|=q^{\prime} since q<qq^{\prime}<q),

𝔼[Wp,s[q]]Pr(Xt=σX0=C)\displaystyle\mathbb{E}\left[W^{[q^{\prime}]}_{p,s}\right]\Pr\left(X_{t}=\sigma\mid X_{0}=C\right)
\displaystyle\leq{} 𝔼[Wp,s[q]]𝔼[Wq,0]𝔼[Wq,0]Pr(Xt=σX0=C)\displaystyle\frac{\mathbb{E}\left[W^{[q^{\prime}]}_{p,s}\right]}{\mathbb{E}[W_{q^{\prime},0}]}\cdot\mathbb{E}[W_{q^{\prime},0}]\Pr\left(X_{t}=\sigma\mid X_{0}=C\right)
\displaystyle\leq{} texp[(ln2)(logn)2(ρρ22(1α)γ+γ22(1η)+12(1η)2+ξξ22+o(1))]\displaystyle t\exp\left[(\ln 2)(\log n)^{2}\left(\rho-\frac{\rho^{2}}{2}-(1-\alpha)\gamma+\frac{\gamma^{2}}{2}-(1-\eta)+\frac{1}{2}(1-\eta)^{2}+\xi-\frac{\xi^{2}}{2}+o(1)\right)\right]
\displaystyle\leq{} texp[(ln2)(logn)2(((1α)γγ22)+η22+ξ+o(1))]\displaystyle t\exp\left[(\ln 2)(\log n)^{2}\left(-\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right)+\frac{\eta^{2}}{2}+\xi+o(1)\right)\right]
\displaystyle\leq{} texp[(ln2)(logn)2(12((1α)γγ22)+o(1))],\displaystyle t\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{1}{2}\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right)+o(1)\right)\right],

where the last inequality follows from

η22+ξε2412((1α)γγ22).\frac{\eta^{2}}{2}+\xi\leq\frac{\varepsilon^{2}}{4}\leq\frac{1}{2}\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right).

Finally consider the third case. Again we have

Pr(ttT:XtΩq,s|X0=C)Tn4maxt[T]maxp[q]maxσΩp,s𝔼[Wp,s]Pr(Xt=σX0=C).\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{\leq q^{\prime},s}\;\Big{|}\;X_{0}=C\Big{)}\leq Tn^{4}\,\max_{t\in[T]}\,\max_{p\in[q^{\prime}]}\max_{\sigma\in\Omega_{p,s}}\mathbb{E}[W_{p,s}]\Pr\left(X_{t}=\sigma\mid X_{0}=C\right).

Suppose p=ρlogn[q]p=\rho\log n\in[q^{\prime}] and so ρ1η\rho\leq 1-\eta. Also recall that s=γlogns=\gamma\log n. We deduce from Lemma 6.6 that for every integer t1t\geq 1 and every clique σΩp,s\sigma\in\Omega_{p,s},

𝔼[Wp,s]Pr(Xt=σX0=C)\displaystyle\mathbb{E}[W_{p,s}]\Pr\left(X_{t}=\sigma\mid X_{0}=C\right) texp[(ln2)(logn)2(((1α)γγ22)+ξξ22+o(1))]\displaystyle\leq t\exp\left[(\ln 2)(\log n)^{2}\left(-\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right)+\xi-\frac{\xi^{2}}{2}+o(1)\right)\right]
texp[(ln2)(logn)2(34((1α)γγ22)+o(1))],\displaystyle\leq t\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{3}{4}\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right)+o(1)\right)\right],

where the last inequality follows from

ξε2814((1α)γγ22).\xi\leq\frac{\varepsilon^{2}}{8}\leq\frac{1}{4}\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right).

Therefore, we conclude that

Pr(ttT:XtΩq,Ω,s|X0=C)Tn2nΩ(logn)+1nω(1)1nω(1),\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,X_{t}\in\Omega_{q,*}\cup\Omega_{*,s}\;\Big{|}\;X_{0}=C\Big{)}\leq\frac{Tn^{2}}{n^{\Omega(\log n)}}+\frac{1}{n^{\omega(1)}}\leq\frac{1}{n^{\omega(1)}},

as we wanted. ∎

8 Simulated Tempering

In this section, we discuss our lower bounds against the simulated tempering versions of the Metropolis process.

8.1 Definition of the dynamics

We start with the formal definition. Suppose for some mm\in\mathbb{N} we have a collection of inverse temperatures β0<β1<<βm\beta_{0}<\beta_{1}<\dots<\beta_{m}. For each i[m]i\in[m], let Z^(βi)\widehat{Z}(\beta_{i}) denote an estimate of the partition function Z(βi)Z(\beta_{i}). The simulated tempering (ST) dynamics is a Markov chain on the state space Ω×[m]\Omega\times[m], which seeks to optimize a Hamiltonian defined on Ω×[m]\Omega\times[m], say given according to H(C)=h|C|H(C)=h_{|C|} for an arbitrary vector {hq,q[n]}\{h_{q},q\in[n]\}. The transition matrix is given as follows.

  • A level move: For C,CΩC,C^{\prime}\in\Omega and i[m]i\in[m] such that CC and CC^{\prime} differ by exactly one vertex,

    Pst((C,i),(C,i))=anmin{1,exp[βi(h|C|h|C|)]}P_{\textsc{st}}((C,i),(C^{\prime},i))=\frac{a}{n}\min\left\{1,\exp\left[\beta_{i}\left(h_{|C^{\prime}|}-h_{|C|}\right)\right]\right\}
  • A temperature move: For i,i[m]i,i^{\prime}\in[m] such that |ii|=1|i-i^{\prime}|=1,

    Pst((C,i),(C,i))=1a2min{1,Z^(βi)Z^(βi)exp[(βiβi)h|C|]}.P_{\textsc{st}}((C,i),(C,i^{\prime}))=\frac{1-a}{2}\min\left\{1,\frac{\widehat{Z}(\beta_{i})}{\widehat{Z}(\beta_{i^{\prime}})}\exp\left[\left(\beta_{i^{\prime}}-\beta_{i}\right)h_{|C|}\right]\right\}.

Some remarks are in order.

Remark 8.1.

The stationary distribution of the ST dynamics can be straightforwardly checked to be given by μ(C,i)Z(βi)Z^(βi)πβi(C)\mu(C,i)\propto\frac{Z(\beta_{i})}{\widehat{Z}(\beta_{i})}\pi_{\beta_{i}}(C), for πβi(C)\pi_{\beta_{i}}(C) the generic Gibbs measure defined in Eq. 14. Notice that if Z^(βi)=Z(βi)\widehat{Z}(\beta_{i})=Z(\beta_{i}) for all i[m]i\in[m] then we have

μ(C,i)=1m+1πβi(C).\mu(C,i)=\frac{1}{m+1}\pi_{\beta_{i}}(C).

and along a single temperature the ST dynamics is identical to the Metropolis process introduced in Section 5.3.

Remark 8.2.

The use of estimates Z^(βi)\hat{Z}(\beta_{i}) of the partition function Z(βi)Z(\beta_{i}) in the definition of the ST dynamics, as opposed to the original values is naturally motivated from applications where one cannot efficiently compute the value of Z(βi)Z(\beta_{i}) to decide the temperature move step.

8.2 Existence of a Bad Initial Clique

We now present our lower bound results which are for the ST dynamics under a worst-case initialization.

Our first result is about the ST dynamics failing to reach γlogn\gamma\log n intersection with the planted clique, similar to the Metropolis process according to Theorem 6.1. Interestingly, the lower bound holds for any choice of arbitrarily many temperatures and for any choice of estimators of the partition function.

Theorem 8.3.

Let α(0,1)\alpha\in(0,1) be any fixed constant. For any constant γ>0\gamma>0, the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the following with probability 1o(1)1-o(1) as nn\to\infty.

Consider the general ST dynamics given in Section 8.1 for arbitrary h,h, arbitrary mm\in\mathbb{N}, arbitrary inverse temperatures β1<β2<<βm\beta_{1}<\beta_{2}<\ldots<\beta_{m}, and arbitrary estimates Z^(βi),i=1,,m\hat{Z}(\beta_{i}),i=1,\ldots,m. Then there is an initialization pair of temperature and clique for the ST dynamics from which it requires exp(Ω(log2n))\exp(\Omega(\log^{2}n))-time to reach a pair of temperature and clique where the clique is of intersection with the planted clique at least γlog2n,\gamma\log_{2}n, with probability at least 1exp(Ω(log2n))1-\exp(-\Omega(\log^{2}n)). In particular, under worst-case initialization it fails to recover the planted clique in polynomial-time.

Proof.

Throughout the proof we assume that both 𝒫𝗎𝗉𝗉(ε)\mathcal{P}_{\mathsf{upp}}(\varepsilon) and 𝒫𝗅𝗈𝗐(ε)\mathcal{P}_{\mathsf{low}}(\varepsilon) are satisfied for ε>0\varepsilon>0 given by Eq. 15, which happens with probability 1o(1)1-o(1) as nn\to\infty by Lemma 5.3.

Notice that we can assume without loss of generality that γ\gamma satisfies 0<γ<2(1α)0<\gamma<2(1-\alpha). We let r=γlognr=\left\lfloor\gamma\log n\right\rfloor. Now from the proof of Theorem 6.1 we have that for any such γ,\gamma, there exists a constant c=c(α,γ)>0c=c(\alpha,\gamma)>0 such that for all βi,i=1,,m\beta_{i},i=1,\ldots,m and the corresponding πβi,i=1,,m\pi_{\beta_{i}},i=1,\ldots,m Gibbs measure per Eq. 14,

πβi(Ω,r)πβi(Ω,r)=Z,rZ,rexp(clog2n),\displaystyle\frac{\pi_{\beta_{i}}(\Omega_{*,r})}{\pi_{\beta_{i}}(\Omega_{*,\leq r})}=\frac{Z_{*,r}}{Z_{*,\leq r}}\leq\exp\left(-c\log^{2}n\right), (38)

w.h.p. as n+.n\rightarrow+\infty. We now consider the set 𝒜=i[m]Ω,r×{βi}\mathcal{A}=\bigcup_{i\in[m]}\Omega_{*,\leq r}\times\{\beta_{i}\} the subset of the state space of the ST dynamics, and notice 𝒜=i[m]Ω,r×{βi},\partial\mathcal{A}=\bigcup_{i\in[m]}\Omega_{*,r}\times\{\beta_{i}\}, where A\partial A is the boundary of AA. In particular using (38) we conclude that w.h.p. as n+n\rightarrow+\infty

μ(𝒜)μ(𝒜)\displaystyle\frac{\mu(\partial\mathcal{A})}{\mu(\mathcal{A})} =i=1mZ(βi)Z^(βi)πβi(Ω,r)i=1mZ(βi)Z^(βi)πβi(Ω,r)exp(clog2n),\displaystyle=\frac{\sum_{i=1}^{m}\frac{Z(\beta_{i})}{\widehat{Z}(\beta_{i})}\pi_{\beta_{i}}(\Omega_{*,r})}{\sum_{i=1}^{m}\frac{Z(\beta_{i})}{\widehat{Z}(\beta_{i})}\pi_{\beta_{i}}(\Omega_{*,\leq r})}\leq\exp\left(-c\log^{2}n\right), (39)

Given Eq. 39, Theorem 8.3 is an immediate consequence of Lemma 5.5. ∎

Our second result is about the ST dynamics under the additional restriction that maxi[m]|βi|=O(logn).\max_{i\in[m]}|\beta_{i}|=O(\log n). In this case, similar to the Metropolis process per Theorem 7.1, we show that the ST dynamics fail to reach either (1+ε)logn(1+\varepsilon)\log n-cliques or cliques with intersection at least γlogn\gamma\log n with the planted clique. Interestingly, again, the lower bound holds for any choice of arbitrarily many temperatures of magnitude O(logn)O(\log n) and for any choice of estimators of the partition function.

Theorem 8.4.

Let α[0,1)\alpha\in[0,1) be any fixed constant. Then the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the following with probability 1o(1)1-o(1) as nn\to\infty.

Consider the general ST dynamics given in Section 8.1 for arbitrary hh satisfying 5.4, arbitrary mm\in\mathbb{N}, arbitrary inverse temperatures β1<β2<<βm\beta_{1}<\beta_{2}<\ldots<\beta_{m} with maxi[m]|βi|=O(logn)\max_{i\in[m]}|\beta_{i}|=O(\log n), and arbitrary estimates Z^(βi),i=1,,m\hat{Z}(\beta_{i}),i=1,\ldots,m. For any constants ε(0,1α)\varepsilon\in(0,1-\alpha) and γ(0,1α]\gamma\in(0,1-\alpha], there is an initialization pair of temperature and clique for the ST dynamics from which it requires exp(Ω(log2n))\exp(\Omega(\log^{2}n))-time to reach a pair of temperature and clique where

  • either the clique is of size at least (1+ε)logn(1+\varepsilon)\log n,

  • or the clique is of intersection at least γlogn\gamma\log n with the planted clique,

with probability at least 1exp(Ω(log2n))1-\exp(-\Omega(\log^{2}n)).

Proof.

Throughout the proof we assume that 𝒫𝗎𝗉𝗉(ε0)\mathcal{P}_{\mathsf{upp}}(\varepsilon_{0}), 𝒫𝗅𝗈𝗐(ε0)\mathcal{P}_{\mathsf{low}}(\varepsilon_{0}), and 𝒫𝗀𝗐\mathcal{P}_{\mathsf{gw}} are all satisfied for ε0=α1ε\varepsilon_{0}=\alpha\leq 1-\varepsilon, which happens with probability 1o(1)1-o(1) as nn\to\infty by Lemmas 5.3 and 7.4.

We start with following the proof of Theorem 7.1. For i=1,,mi=1,\ldots,m let β^i\hat{\beta}_{i} be such that β^i=βi/((ln2)(logn))\hat{\beta}_{i}=\beta_{i}/((\ln 2)(\log n)). By assumption we have maxi[m]|βi^|=O(1)\max_{i\in[m]}|\hat{\beta_{i}}|=O(1). Pick a constant θ(0,ε/3)\theta\in(0,\varepsilon/3) such that for all i[m]i\in[m]

β^iθ12((1α)γγ22).\hat{\beta}_{i}\theta\leq\frac{1}{2}\left((1-\alpha)\gamma-\frac{\gamma^{2}}{2}\right).

Let q=(1+ε)lognq=(1+\varepsilon)\log n, p=(1+εθ)lognp=(1+\varepsilon-\theta)\log n, and r=γlognr=\gamma\log n. Let also

=(ΨqΩp,<r)Ω<q,r.\mathcal{B}=\left(\Psi_{q}\cap\Omega_{p,<r}\right)\cup\Omega_{<q,r}.

Let also 𝒜Ω\mathcal{A}\subseteq\Omega denote the collection of cliques that are reachable from the empty clique through a path (i.e. a sequence of cliques where each adjacent pair differs by exactly one vertex) not including any clique from \mathcal{B} except possibly for the destination.

From the proof of Theorem 7.1 we have that there exists a constant c=c(α,γ,θ)>0c=c(\alpha,\gamma,\theta)>0 such that for all βi,i=1,,m\beta_{i},i=1,\ldots,m and the corresponding πβi,i=1,,m\pi_{\beta_{i}},i=1,\ldots,m being the Gibbs measure per Eq. 14,

πβi(𝒜)πβi(𝒜)πβi()πβi(𝒜)exp(clog2n),\displaystyle\frac{\pi_{\beta_{i}}(\partial\mathcal{A})}{\pi_{\beta_{i}}(\mathcal{A})}\leq\frac{\pi_{\beta_{i}}(\mathcal{B})}{\pi_{\beta_{i}}(\mathcal{A})}\leq\exp\left(-c\log^{2}n\right), (40)

w.h.p. as n+.n\rightarrow+\infty. We now consider the set 𝒢=i[m]𝒜×{βi}\mathcal{G}=\bigcup_{i\in[m]}\mathcal{A}\times\{\beta_{i}\} the subset of the state space of the ST dynamics, and notice 𝒢=i[m]𝒜×{βi}.\partial\mathcal{G}=\bigcup_{i\in[m]}\partial\mathcal{A}\times\{\beta_{i}\}. In particular using (40) we conclude that w.h.p. as n+n\rightarrow+\infty

μ(𝒢)μ(𝒢)\displaystyle\frac{\mu(\partial\mathcal{G})}{\mu(\mathcal{G})} =i=1mZ(βi)Z^(βi)πβi(𝒜)i=1mZ(βi)Z^(βi)πβi(𝒜)exp(clog2n),\displaystyle=\frac{\sum_{i=1}^{m}\frac{Z(\beta_{i})}{\widehat{Z}(\beta_{i})}\pi_{\beta_{i}}(\partial\mathcal{A})}{\sum_{i=1}^{m}\frac{Z(\beta_{i})}{\widehat{Z}(\beta_{i})}\pi_{\beta_{i}}(\mathcal{A})}\leq\exp\left(-c\log^{2}n\right), (41)

Given Eq. 41, Theorem 8.3 is an immediate consequence of Lemma 5.5. ∎

8.3 Starting From the Empty Clique

Theorem 8.5.

Let α[0,1)\alpha\in[0,1) be any fixed constant. Then the random graph 𝒢(n,12,k=nα)\mathcal{G}(n,\frac{1}{2},k=\left\lfloor n^{\alpha}\right\rfloor) with a planted clique satisfies the following with probability 1o(1)1-o(1) as nn\to\infty.

Consider the general ST dynamics given in Section 8.1 for monotone 11-Lipschitz hh with h0=0h_{0}=0, arbitrary inverse temperatures β0<β1<<βm\beta_{0}<\beta_{1}<\ldots<\beta_{m} with m=o(logn)m=o(\log n) and βm=O(1)\beta_{m}=O(1), and arbitrary estimates Z^(β0)<Z^(β1)<<Z^(βm)\hat{Z}(\beta_{0})<\hat{Z}(\beta_{1})<\cdots<\hat{Z}(\beta_{m}). For any constants cc\in\mathbb{N}, ε(0,1α)\varepsilon\in(0,1-\alpha), and γ(0,1α]\gamma\in(0,1-\alpha], the ST dynamics starting from (,0)(\emptyset,0) will not reach within ncn^{c} steps a pair of temperature and clique where

  • either the clique is of size at least (1+ε)logn(1+\varepsilon)\log n,

  • or the clique is of intersection at least γlogn\gamma\log n with the planted clique,

with probability 1nω(1)1-n^{-\omega(1)}.

In what follows we condition on both 𝒫𝗎𝗉𝗉(0.1)\mathcal{P}_{\mathsf{upp}}(0.1) and 𝒫𝖾𝗑𝗉(η)\mathcal{P}_{\mathsf{exp}}(\eta) for η=ε/2\eta=\varepsilon/2, which by Lemmas 5.3 and 6.5 hold with probability 1o(1)1-o(1) as nn\to\infty. Also, we may assume that

Z^(βi)Z^(βi+1)1nlognm\frac{\widehat{Z}(\beta_{i})}{\widehat{Z}(\beta_{i+1})}\geq\frac{1}{n^{\sqrt{\frac{\log n}{m}}}} (42)

for all i=0,1,,m1i=0,1,\dots,m-1. Otherwise, for any clique CC of size O(logn)O(\log n) one has

Pst((C,i),(C,i+1))=1a2min{1,Z^(βi)Z^(βi+1)exp[(βi+1βi)h|C|]}eO(logn)nlognm=1nω(1)P_{\textsc{st}}((C,i),(C,i+1))=\frac{1-a}{2}\min\left\{1,\frac{\widehat{Z}(\beta_{i})}{\widehat{Z}(\beta_{i+1})}\exp\left[\left(\beta_{i+1}-\beta_{i}\right)h_{|C|}\right]\right\}\leq\frac{e^{O(\log n)}}{n^{\sqrt{\frac{\log n}{m}}}}=\frac{1}{n^{\omega(1)}}

since βm=O(1)\beta_{m}=O(1), h|C||C|=O(logn)h_{|C|}\leq|C|=O(\log n) and m=o(logn)m=o(\log n). This means that the chain with high probability will never make a move to the inverse temperature βi+1\beta_{i+1} in polynomially many steps, unless already having clique size, say, 10logn\geq 10\log n. Since we are studying reaching cliques of size (1+ε)logn(1+\varepsilon)\log n, we may assume that Eq. 42 holds for all ii, by removing those temperatures violating Eq. 42 and those larger since the chain does not reach them with high probability in poly(n)\mathrm{poly}(n) steps. An immediate corollary of Eq. 42 is that

Z^(βm)Z^(β0)nmlognm=nmlogn=no(logn).\frac{\widehat{Z}(\beta_{m})}{\widehat{Z}(\beta_{0})}\leq n^{m\sqrt{\frac{\log n}{m}}}=n^{\sqrt{m\log n}}=n^{o(\log n)}. (43)

Let q=(1+ε)lognq=(1+\varepsilon)\log n and s=γlogns=\gamma\log n. By the union bound we have

Pr(ttT:(Xt,It)(Ωq,<sΩ<q,s)×[m]|(X0,I0)=(,0))\displaystyle\Pr\Big{(}\exists t\in\mathbb{N}\wedge t\leq T:\,(X_{t},I_{t})\in(\Omega_{q,<s}\cup\Omega_{<q,s})\times[m]\;\Big{|}\;(X_{0},I_{0})=(\emptyset,0)\Big{)}
\displaystyle\leq{} T(m+1)n4maxt[T]max[m]maxr[s]maxσΩq,r𝔼[Wq,r]Pr((Xt,It)=(σ,)(X0,I0)=(,0))\displaystyle T(m+1)n^{4}\,\max_{t\in[T]}\,\max_{\ell\in[m]}\,\max_{r\in[s]}\max_{\sigma\in\Omega_{q,r}}\mathbb{E}[W_{q,r}]\Pr\left((X_{t},I_{t})=(\sigma,\ell)\mid(X_{0},I_{0})=(\emptyset,0)\right)
+T(m+1)n4maxt[T]max[m]maxp[q]maxσΩp,s𝔼[Wp,s]Pr((Xt,It)=(σ,)(X0,I0)=(,0))\displaystyle+T(m+1)n^{4}\,\max_{t\in[T]}\,\max_{\ell\in[m]}\,\max_{p\in[q]}\max_{\sigma\in\Omega_{p,s}}\mathbb{E}[W_{p,s}]\Pr\left((X_{t},I_{t})=(\sigma,\ell)\mid(X_{0},I_{0})=(\emptyset,0)\right) (44)

We will show that for all integer t1t\geq 1, all integer [m]\ell\in[m], all integer rsr\leq s, and all clique σΩq,r\sigma\in\Omega_{q,r}, it holds

𝔼[Wq,r]Pr((Xt,It)=(σ,)(X0,I0)=(,0))nω(1),\mathbb{E}[W_{q,r}]\Pr\left((X_{t},I_{t})=(\sigma,\ell)\mid(X_{0},I_{0})=(\emptyset,0)\right)\leq n^{-\omega(1)}, (45)

and all integer pqp\leq q, and all clique σΩp,s\sigma\in\Omega_{p,s}, it holds

𝔼[Wp,s]Pr((Xt,It)=(σ,)(X0,I0)=(,0))nω(1),\mathbb{E}[W_{p,s}]\Pr\left((X_{t},I_{t})=(\sigma,\ell)\mid(X_{0},I_{0})=(\emptyset,0)\right)\leq n^{-\omega(1)}, (46)

The theorem then follows from Eqs. 44, 45 and 46.

It will be helpful to consider the time-reversed dynamics and try to bound the probability of reaching \emptyset when starting from a large clique σ\sigma. By reversibility, we have

Pr((Xt,It)=(σ,)(X0,I0)=(,0))=Z^(β0)Z^(β)exp(βhq)exp(β0h0)Pr((Xt,It)=(,0)(X0,I0)=(σ,)).\Pr\left((X_{t},I_{t})=(\sigma,\ell)\mid(X_{0},I_{0})=(\emptyset,0)\right)={\frac{\widehat{Z}(\beta_{0})}{\widehat{Z}(\beta_{\ell})}\frac{\exp\left(\beta_{\ell}h_{q}\right)}{\exp\left(\beta_{0}h_{0}\right)}}\Pr\left((X_{t},I_{t})=(\emptyset,0)\mid(X_{0},I_{0})=(\sigma,\ell)\right). (47)

which is an application of Fact 6.7.

Let η(0,1)\eta\in(0,1) be a constant. Introduce a random walk {(Yt,Jt)}\{(Y_{t},J_{t})\} on [n]×[m][n]\times[m] with transition matrix PP given by

P((s,j),(s1,j))\displaystyle P\left((s,j),(s-1,j)\right) =asnmin{1,exp[βj(hs1hs)]};\displaystyle=\frac{as}{n}\min\left\{1,\exp\left[\beta_{j}\left(h_{s-1}-h_{s}\right)\right]\right\};
P((s,j),(s+1,j))\displaystyle P\left((s,j),(s+1,j)\right) ={a202smin{1,exp[βj(hs+1hs)]},for 0s<(1η)logn;0,for (1η)lognsn;\displaystyle=\begin{cases}\dfrac{a}{20\cdot 2^{s}}\min\left\{1,\exp\left[\beta_{j}\left(h_{s+1}-h_{s}\right)\right]\right\},&\text{for~{}}0\leq s<\left\lfloor(1-\eta)\log n\right\rfloor;\vspace{0.2em}\\ 0,&\text{for~{}}\left\lfloor(1-\eta)\log n\right\rfloor\leq s\leq n;\end{cases}
P((s,j),(s,j))\displaystyle P\left((s,j),(s,j^{\prime})\right) =1a2min{1,Z^(βj)Z^(βj)exp[(βjβj)hs]}for j=j±1.\displaystyle=\frac{1-a}{2}\min\left\{1,\frac{\widehat{Z}(\beta_{j})}{\widehat{Z}(\beta_{j^{\prime}})}\exp\left[\left(\beta_{j^{\prime}}-\beta_{j}\right)h_{s}\right]\right\}\quad\text{for $j^{\prime}=j\pm 1$}.

and for all (s,j)(s,j),

P((s,j),(s,j))=1s=s±1P((s,j),(s,j))j=j±1P((s,j),(s,j)).P\left((s,j),(s,j)\right)=1-\sum_{s^{\prime}=s\pm 1}P\left((s,j),(s^{\prime},j)\right)-\sum_{j^{\prime}=j\pm 1}P\left((s,j),(s,j^{\prime})\right).

To be more precise, the above definition of P((s,j),(s,j))P\left((s,j),(s^{\prime},j^{\prime})\right) applies when (s,j)[n]×[m](s^{\prime},j^{\prime})\in[n]\times[m] and we assume P((s,j),(s,j))=0P\left((s,j),(s^{\prime},j^{\prime})\right)=0 if (s,j)[n]×[m](s^{\prime},j^{\prime})\notin[n]\times[m], e.g., when j=0j=0 and j=1j^{\prime}=-1.

We now calculate the stationary distribution of PP on states (s,i)(s,i) when restricted to s(1η)logns\leq\left\lfloor(1-\eta)\log n\right\rfloor. We start with proving that the random walk is time-reversible. Note that the random walk introduced is clearly aperiodic, positive recurrent and irreducible. Hence, by the Kolmogorov’s criterion the random walk is time-reversible if and only if for any cycle in the state space, the probability the random walk moves along the cycle in one direction equals to the probability of moving in the opposite direction. Given that the minimal cycles in the finite box [n]×[m][n]\times[m] are simply squares of the form {s,s+1}×{j,j+1}\{s,s+1\}\times\{j,j+1\}, it suffices to show that for any s[n1],j[m1],s\in[n-1],j\in[m-1], the criterion solely for these cycles, that is to show

P((s,j),(s+1,j))P((s+1,j),(s+1,j+1))P((s+1,j+1),(s+1,j))P((s+1,j),(s,j))\displaystyle P\left((s,j),(s+1,j)\right)P\left((s+1,j),(s+1,j+1)\right)P\left((s+1,j+1),(s+1,j)\right)P\left((s+1,j),(s,j)\right)
=\displaystyle= P((s,j),(s,j+1))P((s,j+1),(s+1,j+1))P((s+1,j+1),(s,j+1))P((s,j+1),(s,j)),\displaystyle P\left((s,j),(s,j+1)\right)P\left((s,j+1),(s+1,j+1)\right)P\left((s+1,j+1),(s,j+1)\right)P\left((s,j+1),(s,j)\right),

which can be straightforwardly checked to be true.

We now calculate the stationary distribution. Using the reversibility and that hh_{\ell} is monotonically increasing in \ell\in\mathbb{Z}, we have for arbitrary s,js,j

ν((s,j))\displaystyle\nu((s,j)) =ν((0,j))t=1sP((t,j),(t1,j))P((t1,j),(t,j))\displaystyle=\nu((0,j))\prod_{t=1}^{s}\frac{P\left((t,j),(t-1,j)\right)}{P\left((t-1,j),(t,j)\right)}
=ν((0,j))t=1satnmin{exp[βj(ht1ht)],1}a202tmin{exp[βj(htht1)],1}\displaystyle=\nu((0,j))\prod_{t=1}^{s}\frac{\frac{at}{n}\min\left\{\exp\left[\beta_{j}\left(h_{t-1}-h_{t}\right)\right],1\right\}}{\frac{a}{20\cdot 2^{t}}\min\left\{\exp\left[\beta_{j}\left(h_{t}-h_{t-1}\right)\right],1\right\}}
=ν((0,j))t=1s20t2tnexp[βj(ht1ht)]\displaystyle=\nu((0,j))\prod_{t=1}^{s}\frac{20t\cdot 2^{t}}{n}\exp\left[\beta_{j}\left(h_{t-1}-h_{t}\right)\right]
=ν((0,j))20ss!2(s2)nsexp[βj(h0hs)].\displaystyle=\nu((0,j))\frac{20^{s}\cdot s!\cdot 2^{\binom{s}{2}}}{n^{s}}\exp\left[\beta_{j}(h_{0}-h_{s})\right].

Furthermore, since h0=0,h_{0}=0, we have again by reversibility,

ν((0,j))\displaystyle\nu((0,j)) =ν((0,1))t=1jP((0,t),(0,t1))P((0,t1),(0,t))\displaystyle=\nu((0,1))\prod_{t=1}^{j}\frac{P\left((0,t),(0,t-1)\right)}{P\left((0,t-1),(0,t)\right)}
=ν((0,1))t=1j1a2min{1,Z^(βt)Z^(βt1)exp[(βt1βj)h0]}1a2min{1,Z^(βt1)Z^(βt)exp[(βtβt1)h0]}\displaystyle=\nu((0,1))\prod_{t=1}^{j}\frac{\frac{1-a}{2}\min\left\{1,\frac{\widehat{Z}(\beta_{t})}{\widehat{Z}(\beta_{t-1})}\exp\left[\left(\beta_{t-1}-\beta_{j}\right)h_{0}\right]\right\}}{\frac{1-a}{2}\min\left\{1,\frac{\widehat{Z}(\beta_{t-1})}{\widehat{Z}(\beta_{t})}\exp\left[\left(\beta_{t}-\beta_{t-1}\right)h_{0}\right]\right\}}
=ν((0,1))t=1jZ^(βt)Z^(βt1)\displaystyle=\nu((0,1))\prod_{t=1}^{j}\frac{\widehat{Z}(\beta_{t})}{\widehat{Z}(\beta_{t-1})}
Z^(βj).\displaystyle\propto\widehat{Z}(\beta_{j}).

Combining the above, we conclude

ν((s,j))\displaystyle\nu((s,j)) Z^(βj)20ss!2(s2)nsexp[βjhs].\displaystyle\propto\widehat{Z}(\beta_{j})\frac{20^{s}\cdot s!\cdot 2^{\binom{s}{2}}}{n^{s}}\exp\left[-\beta_{j}h_{s}\right]. (48)

The following lemma shows that (Yt,Jt)(Y_{t},J_{t}) is stochastically dominated by the pair (|Xt|,It)(|X_{t}|,I_{t}).

Lemma 8.6.

Let {(Xt,It)}\{(X_{t},I_{t})\} denote the Simulated Tempering process starting from some X0=σΩq,X_{0}=\sigma\in\Omega_{q,*} and I0=I_{0}=\ell. Let {(Yt,Jt)}\{(Y_{t},J_{t})\} denote the stochastic process described above with parameter η(0,1)\eta\in(0,1) starting from Y0=qY_{0}=q and J0=J_{0}=\ell. Assume that GG satisfies the conclusion of Lemma 6.5 with parameter η\eta, then there exists a coupling {((Xt,It),(Yt,Jt))}\{((X_{t},I_{t}),(Y_{t},J_{t}))\} of the two processes such that for all integer t1t\geq 1 it holds

Yt|Xt|andJtIt.Y_{t}\leq|X_{t}|\quad\text{and}\quad J_{t}\leq I_{t}.

In particular, for all integer t1t\geq 1 it holds

Pr((Xt,It)=(,0)(X0,I0)=(σ,))Pr((Yt,Jt)=(0,0)(Y0,J0)=(q,)).\displaystyle\Pr\left((X_{t},I_{t})=(\emptyset,0)\mid(X_{0},I_{0})=(\sigma,\ell)\right)\leq\Pr\left((Y_{t},J_{t})=(0,0)\mid(Y_{0},J_{0})=(q,\ell)\right).
Proof.

We couple {(Xt,It)}\{(X_{t},I_{t})\} and {(Yt,Jt)}\{(Y_{t},J_{t})\} as follows. Suppose that {(Yt1,Jt1)}{(Xt1,It1)}\{(Y_{t-1},J_{t-1})\}\leq\{(X_{t-1},I_{t-1})\} for some integer t1t\geq 1. We will construct a coupling of {(Xt,It)}\{(X_{t},I_{t})\} and {(Yt,Jt)}\{(Y_{t},J_{t})\} such that {(Yt,Jt)}{(Xt,It)}\{(Y_{t},J_{t})\}\leq\{(X_{t},I_{t})\}. With probability aa, the two chains both attempt to update the first coordinate, and with probability 1a1-a the second.

Consider first updating the first coordinate. Since the probability that Yt=Yt1+1Y_{t}=Y_{t-1}+1 is less than 1/21/2 and so does the probability of |Xt|=|Xt1|1|X_{t}|=|X_{t-1}|-1, we may couple XtX_{t} and YtY_{t} such that |Xt|Yt|X_{t}|-Y_{t} decreases at most one; namely, it never happens that YtY_{t} increases by 11 while XtX_{t} decreases in size. Thus, it suffices to consider the extremal case when |Xt1|=Yt1=s|X_{t-1}|=Y_{t-1}=s. Since i=It1Jt1=ji=I_{t-1}\geq J_{t-1}=j, we have βiβj\beta_{i}\geq\beta_{j} and thus

Pr(|Xt|=s1|Xt1|=s,It1=i)\displaystyle\Pr\left(|X_{t}|=s-1\mid|X_{t-1}|=s,I_{t-1}=i\right) =asnmin{1,exp[βi(hs1hs)]}\displaystyle=\frac{as}{n}\min\left\{1,\exp\left[\beta_{i}\left(h_{s-1}-h_{s}\right)\right]\right\}
asnmin{1,exp[βj(hs1hs)]}=P((s,j),(s1,j))\displaystyle\leq\frac{as}{n}\min\left\{1,\exp\left[\beta_{j}\left(h_{s-1}-h_{s}\right)\right]\right\}=P\left((s,j),(s-1,j)\right)

So we can couple {(Xt,It)}\{(X_{t},I_{t})\} and {(Yt,Jt)}\{(Y_{t},J_{t})\} such that either |Xt|=Yt|X_{t}|=Y_{t} or |Xt|=s|X_{t}|=s, Yt=s1Y_{t}=s-1. Meanwhile, recall that A(Xt1)A(X_{t-1}) is the set of vertices vv such that Xt1{v}ΩX_{t-1}\cup\{v\}\in\Omega. Then we have

|A(Xt1)|n202s|A(X_{t-1})|\geq\frac{n}{20\cdot 2^{s}}

whenever snηs\leq n_{\eta} by Lemma 6.5. Hence, we deduce that

Pr(|Xt|=s+1|Xt1|=s,It1=i)\displaystyle\Pr\left(|X_{t}|=s+1\mid|X_{t-1}|=s,I_{t-1}=i\right) =a|ext(Xt1)|nmin{1,exp[βi(hs+1hs)]}\displaystyle=\dfrac{a|\mathrm{ext}(X_{t-1})|}{n}\min\left\{1,\exp\left[\beta_{i}\left(h_{s+1}-h_{s}\right)\right]\right\}
a𝟙{snη}202smin{1,exp[βj(hs+1hs)]}=P((s,j),(s+1,j))\displaystyle\geq\dfrac{a\mathbbm{1}\{s\leq n_{\eta}\}}{20\cdot 2^{s}}\min\left\{1,\exp\left[\beta_{j}\left(h_{s+1}-h_{s}\right)\right]\right\}=P\left((s,j),(s+1,j)\right)

So we can couple {(Xt,It)}\{(X_{t},I_{t})\} and {(Yt,Jt)}\{(Y_{t},J_{t})\} such that either |Xt|=Yt|X_{t}|=Y_{t} or |Xt|=s+1|X_{t}|=s+1 and Yt=sY_{t}=s, as desired.

Next we consider update the second coordinate. Since the probability that Jt=Jt1+1J_{t}=J_{t-1}+1 is less than 1/21/2 and so does the probability of It=It11I_{t}=I_{t-1}-1, we can couple ItI_{t} and JtJ_{t} such that it never happens both Jt=Jt1+1J_{t}=J_{t-1}+1 and It=It11I_{t}=I_{t-1}-1. This means that it suffices for us to consider the extremal case where It=Jt=iI_{t}=J_{t}=i. Since g=|Xt1|Yt1=sg=|X_{t-1}|\geq Y_{t-1}=s, we have hghsh_{g}\geq h_{s} and therefore

Pr(It=i1It1=i,|Xt1|=g)\displaystyle\Pr\left(I_{t}=i-1\mid I_{t-1}=i,|X_{t-1}|=g\right) =1a2min{1,Z^(βi)Z^(βi1)exp[(βi1βi)hg]}\displaystyle=\frac{1-a}{2}\min\left\{1,\frac{\widehat{Z}(\beta_{i})}{\widehat{Z}(\beta_{i-1})}\exp\left[\left(\beta_{i-1}-\beta_{i}\right)h_{g}\right]\right\}
1a2min{1,Z^(βi)Z^(βi1)exp[(βi1βi)hs]}=P((s,i),(s,i1)).\displaystyle\leq\frac{1-a}{2}\min\left\{1,\frac{\widehat{Z}(\beta_{i})}{\widehat{Z}(\beta_{i-1})}\exp\left[\left(\beta_{i-1}-\beta_{i}\right)h_{s}\right]\right\}=P\left((s,i),(s,i-1)\right).

and similarly,

Pr(It=i+1It1=i,|Xt1|=g)\displaystyle\Pr\left(I_{t}=i+1\mid I_{t-1}=i,|X_{t-1}|=g\right) =1a2min{1,Z^(βi)Z^(βi+1)exp[(βi+1βi)hg]}\displaystyle=\frac{1-a}{2}\min\left\{1,\frac{\widehat{Z}(\beta_{i})}{\widehat{Z}(\beta_{i+1})}\exp\left[\left(\beta_{i+1}-\beta_{i}\right)h_{g}\right]\right\}
1a2min{1,Z^(βi)Z^(βi+1)exp[(βi+1βi)hs]}=P((s,i),(s,i+1)).\displaystyle\geq\frac{1-a}{2}\min\left\{1,\frac{\widehat{Z}(\beta_{i})}{\widehat{Z}(\beta_{i+1})}\exp\left[\left(\beta_{i+1}-\beta_{i}\right)h_{s}\right]\right\}=P\left((s,i),(s,i+1)\right).

Therefore, we can couple ItI_{t} and JtJ_{t} such that only one of the followings can happen:

  1. (i)

    It=JtI_{t}=J_{t};

  2. (ii)

    It=iI_{t}=i and Jt=i1J_{t}=i-1;

  3. (iii)

    It=i+1I_{t}=i+1 and Jt=iJ_{t}=i.

Therefore, one always has ItJtI_{t}\geq J_{t} under the constructed coupling. ∎

The next lemma upper bounds the tt-step transition probability Pr(Yt=pY0=q)\Pr\left(Y_{t}=p\mid Y_{0}=q\right).

Lemma 8.7.

Let {(Yt,Jt)}\{(Y_{t},J_{t})\} denote the Markov process on 2\mathbb{Z}^{2} described above with parameter η(0,1)\eta\in(0,1) starting from Y0=q=ρlognY_{0}=q=\rho\log n and J0=jJ_{0}=j. Then for all integer t1t\geq 1 we have

Pr((Yt,Jt)=(0,0)(Y0,J0)=(q,))exp[(ln2)(logn)2(ρ+(ρ)22+o(1))]Z^(β)Z^(β0)exp(β0h0)exp(βhq).\Pr\left((Y_{t},J_{t})=(0,0)\mid(Y_{0},J_{0})=(q,\ell)\right)\leq\exp\left[(\ln 2)(\log n)^{2}\left(-\rho^{\prime}+\frac{(\rho^{\prime})^{2}}{2}+o(1)\right)\right]{\frac{\widehat{Z}(\beta_{\ell})}{\widehat{Z}(\beta_{0})}\frac{\exp\left(\beta_{0}h_{0}\right)}{\exp\left(\beta_{\ell}h_{q}\right)}}.

where ρ=min{ρ,1η}\rho^{\prime}=\min\{\rho,1-\eta\}.

Proof.

When ρ>1η\rho>1-\eta, namely q>(1η)lognq>(1-\eta)\log n, the chain will first move to (1η)logn(1-\eta)\log n before reaching pp. By Fact 6.7, we have

Pr((Yt,Jt)=(0,0)(Y0,J0)=(q,))=Pt((q,),(0,0))=ν((q,))ν((0,0))Pt((0,0),(q,))ν((q,))ν((0,0)).\Pr\left((Y_{t},J_{t})=(0,0)\mid(Y_{0},J_{0})=(q,\ell)\right)=P^{t}((q,\ell),(0,0))=\frac{\nu((q,\ell))}{\nu((0,0))}P^{t}((0,0),(q,\ell))\leq\frac{\nu((q,\ell))}{\nu((0,0))}.

By (48),

ν((q,))ν((0,0)).\displaystyle\frac{\nu((q,\ell))}{\nu((0,0))}. =Z^(β)Z^(β0)20q(q!)2(q2)nqexp[β0h0βhq]\displaystyle=\frac{\widehat{Z}(\beta_{\ell})}{\widehat{Z}(\beta_{0})}\frac{20^{q}\cdot(q!)\cdot 2^{\binom{q}{2}}}{n^{q}}\exp\left[\beta_{0}h_{0}-\beta_{\ell}h_{q}\right]
=Z^(β)Z^(β0)exp(β0h0)exp(βhq)exp[(ln2)(logn)2(ρ+ρ22+o(1))].\displaystyle=\frac{\widehat{Z}(\beta_{\ell})}{\widehat{Z}(\beta_{0})}\frac{\exp\left(\beta_{0}h_{0}\right)}{\exp\left(\beta_{\ell}h_{q}\right)}\exp\left[(\ln 2)(\log n)^{2}\left(-\rho+\frac{\rho^{2}}{2}+o(1)\right)\right].

For ρ>1η\rho>1-\eta, let τ\tau be the first time that the chain reach size q=(1η)lognq^{\prime}=(1-\eta)\log n. Then we have

Pr((Yt,Jt)=(0,0)(Y0,J0)=(q,))\displaystyle\Pr\left((Y_{t},J_{t})=(0,0)\mid(Y_{0},J_{0})=(q,\ell)\right)
=\displaystyle={} t=0tPr(τ=t)[m]Pr((Yt,Jt)=(q,)(Y0,J0)=(q,),τ=t)\displaystyle\sum_{t^{\prime}=0}^{t}\Pr(\tau=t^{\prime})\sum_{\ell^{\prime}\in[m]}\Pr\left((Y_{t^{\prime}},J_{t^{\prime}})=(q^{\prime},\ell^{\prime})\mid(Y_{0},J_{0})=(q,\ell),\tau=t^{\prime}\right)
Pr((Ytt,Jtt)=(0,0)(Y0,J0)=(q,))\displaystyle\cdot\Pr\left((Y_{t-t^{\prime}},J_{t-t^{\prime}})=(0,0)\mid(Y_{0},J_{0})=(q^{\prime},\ell^{\prime})\right)
max[m]ν((q,))ν((0,0))\displaystyle\leq\max_{\ell^{\prime}\in[m]}\frac{\nu((q^{\prime},\ell^{\prime}))}{\nu((0,0))}
max[m]Z^(β)Z^(β0)exp(β0h0)exp(βhq)exp[(ln2)(logn)2(ρ+(ρ)22+o(1))].\displaystyle\leq\max_{\ell^{\prime}\in[m]}\frac{\widehat{Z}(\beta_{\ell^{\prime}})}{\widehat{Z}(\beta_{0})}\frac{\exp\left(\beta_{0}h_{0}\right)}{\exp\left(\beta_{\ell^{\prime}}h_{q^{\prime}}\right)}\exp\left[(\ln 2)(\log n)^{2}\left(-\rho^{\prime}+\frac{(\rho^{\prime})^{2}}{2}+o(1)\right)\right].

The lemma then follows from the fact that for all [m]\ell^{\prime}\in[m]

Z^(β)Z^(β)exp(βhq)exp(βhq)no(logn)eO(logn)=no(logn),\frac{\widehat{Z}(\beta_{\ell^{\prime}})}{\widehat{Z}(\beta_{\ell})}\frac{\exp\left(\beta_{\ell}h_{q}\right)}{\exp\left(\beta_{\ell^{\prime}}h_{q^{\prime}}\right)}\leq n^{o(\log n)}e^{O(\log n)}=n^{o(\log n)},

where we use Eq. 43. ∎

We now present the proof of Theorem 8.5 provided Lemmas 8.6 and 8.7.

Proof of Theorem 8.5.

Recall that γ<2(1α)\gamma<2(1-\alpha). As will be clear later, we define

η=(1α)γγ22.\eta=\sqrt{(1-\alpha)\gamma-\frac{\gamma^{2}}{2}}.

We assume that our graph satisfies the conclusions of Lemmas 5.1 and 6.5 with parameter η\eta.

Consider first Eq. 45. From Lemmas 8.6 and 8.7, we deduce that

𝔼[Wq,r]Pr((Xt,It)=(σ,)(X0,I0)=(,0))\displaystyle~{}~{}~{}~{}\mathbb{E}[W_{q,r}]\Pr\left((X_{t},I_{t})=(\sigma,\ell)\mid(X_{0},I_{0})=(\emptyset,0)\right)
=𝔼[Wq,r]Z^(β0)Z^(β)exp(βhq)exp(β0h0)Pr((Xt,It)=(,0)(X0,I0)=(σ,))\displaystyle=\mathbb{E}[W_{q,r}]\frac{\widehat{Z}(\beta_{0})}{\widehat{Z}(\beta_{\ell})}\frac{\exp\left(\beta_{\ell}h_{q}\right)}{\exp\left(\beta_{0}h_{0}\right)}\Pr\left((X_{t},I_{t})=(\emptyset,0)\mid(X_{0},I_{0})=(\sigma,\ell)\right)
𝔼[Wq,0]Z^(β0)Z^(β)exp(βhq)exp(β0h0)Pr((Yt,Jt)=(0,0)(Y0,J0)=(q,))\displaystyle\leq\mathbb{E}[W_{q,0}]\frac{\widehat{Z}(\beta_{0})}{\widehat{Z}(\beta_{\ell})}\frac{\exp\left(\beta_{\ell}h_{q}\right)}{\exp\left(\beta_{0}h_{0}\right)}\Pr\left((Y_{t},J_{t})=(0,0)\mid(Y_{0},J_{0})=(q,\ell)\right)
exp[(ln2)(logn)2(ρρ22+o(1))]Z^(β0)Z^(β)exp(βhq)exp(β0h0)\displaystyle\leq\exp\left[(\ln 2)(\log n)^{2}\left(\rho-\frac{\rho^{2}}{2}+o(1)\right)\right]\frac{\widehat{Z}(\beta_{0})}{\widehat{Z}(\beta_{\ell})}\frac{\exp\left(\beta_{\ell}h_{q}\right)}{\exp\left(\beta_{0}h_{0}\right)}
exp[(ln2)(logn)2(ρ+(ρ)22+o(1))]Z^(β)Z^(β0)exp(β0h0)exp(βhq)\displaystyle~{}~{}~{}\cdot\exp\left[(\ln 2)(\log n)^{2}\left(-\rho^{\prime}+\frac{(\rho^{\prime})^{2}}{2}+o(1)\right)\right]\frac{\widehat{Z}(\beta_{\ell})}{\widehat{Z}(\beta_{0})}\frac{\exp\left(\beta_{0}h_{0}\right)}{\exp\left(\beta_{\ell}h_{q}\right)}
=exp[(ln2)(logn)2(ε22+η22+o(1))]exp[(ln2)(logn)2(38ε2+o(1))],\displaystyle=\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{\varepsilon^{2}}{2}+\frac{\eta^{2}}{2}+o(1)\right)\right]\leq\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{3}{8}\varepsilon^{2}+o(1)\right)\right],

where recall that ρ=1+ε\rho=1+\varepsilon and ρ=1η\rho^{\prime}=1-\eta and we choose η=ε/2\eta=\varepsilon/2.

For Eq. 46, by similar argument we have

𝔼[Wp,s]Pr((Xt,It)=(σ,)(X0,I0)=(,0))\displaystyle~{}~{}~{}~{}\mathbb{E}[W_{p,s}]\Pr\left((X_{t},I_{t})=(\sigma,\ell)\mid(X_{0},I_{0})=(\emptyset,0)\right)
exp[(ln2)(logn)2(1212(1α)2+o(1))]Z^(β0)Z^(β)exp(βhq)exp(β0h0)\displaystyle\leq\exp\left[(\ln 2)(\log n)^{2}\left(\frac{1}{2}-\frac{1}{2}(1-\alpha)^{2}+o(1)\right)\right]\frac{\widehat{Z}(\beta_{0})}{\widehat{Z}(\beta_{\ell})}\frac{\exp\left(\beta_{\ell}h_{q}\right)}{\exp\left(\beta_{0}h_{0}\right)}
exp[(ln2)(logn)2(ρ+(ρ)22+o(1))]Z^(β)Z^(β0)exp(β0h0)exp(βhq)\displaystyle~{}~{}~{}\cdot\exp\left[(\ln 2)(\log n)^{2}\left(-\rho^{\prime}+\frac{(\rho^{\prime})^{2}}{2}+o(1)\right)\right]\frac{\widehat{Z}(\beta_{\ell})}{\widehat{Z}(\beta_{0})}\frac{\exp\left(\beta_{0}h_{0}\right)}{\exp\left(\beta_{\ell}h_{q}\right)}
=exp[(ln2)(logn)2(12(1α)2+η22+o(1))]exp[(ln2)(logn)2(38(1α)2+o(1))],\displaystyle=\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{1}{2}(1-\alpha)^{2}+\frac{\eta^{2}}{2}+o(1)\right)\right]\leq\exp\left[(\ln 2)(\log n)^{2}\left(-\frac{3}{8}(1-\alpha)^{2}+o(1)\right)\right],

where recall that ρ=1\rho=1 (for maximizing 𝔼[Wp,s]\mathbb{E}[W_{p,s}]) and ρ=1η\rho^{\prime}=1-\eta and we choose η=ε/2(1α)/2\eta=\varepsilon/2\leq(1-\alpha)/2.

Therefore, we obtain Eqs. 45 and 46. The theorem then follows. ∎

9 Conclusion

In this work we revisit the work by Jerrum [Jer92] that large cliques elude the Metropolis process. We extend [Jer92] by establishing the failure of the Metropolis process (1) for all planted clique sizes k=nαk=n^{\alpha} for any constant α(0,1),\alpha\in(0,1), (2) for arbitrary temperature and Hamiltonian vector (under worst-case initialization), (3) for a large family of of temperatures and Hamiltonian vectors (under the empty clique initialization) and obtain as well (4) lower bounds for the performance of the simulated tempering dynamics.

An important future direction would be to explore the generality of our proposed reversibility and birth and death process arguments which allowed us to prove the failure of the Metropolis process when initialized at the empty clique. It is interesting to see whether the proposed method can lead to MCMC lower bounds from a specific state in other inference settings beyond the planted clique model.

Moreover, it would be interesting to see if our results can be strengthened even more. First, a current shortcoming of our lower bounds for the Metropolis process when initialized from the empty clique do not completely cover the case where β=Clogn\beta=C\log n for an arbitrary constant C>0C>0. While we almost certainly think the result continues to hold in this case, some new idea seem to be needed to prove it. Second, it seems interesting to study the regime where α=1o(1).\alpha=1-o(1). Recall that there are polynomial-time algorithms that can find a clique of size (logn/loglogn)2(\log n/\log\log n)^{2} whenever a (worst-case) graph has a clique of size n/(logn)bn/(\log n)^{b}, for some constant b>0b>0 [Fei05]. If our lower bounds for the Metropolis process could be extended to the case α=1O(loglogn/logn)\alpha=1-O(\log\log n/\log n), this would mean that for some kk the Metropolis process fails to find in polynomial-time a clique of size (1+ε)logn(1+\varepsilon)\log n when a kk-clique is planted in 𝒢(n,1/2),\mathcal{G}(n,1/2), while some other polynomial-time algorithm can find a clique of size (1+ε)logn(1+\varepsilon)\log n on every (worst-case) graph which has a clique of size k.k. Such a result, if true, will make the failure of the Metropolis process even more profound.

Acknowledgement

EM and IZ are supported by the Simons-NSF grant DMS-2031883 on the Theoretical Foundations of Deep Learning and the Vannevar Bush Faculty Fellowship ONR-N00014-20-1-2826. EM is also supported by the Simons Investigator award (622132).

References

  • [ACV14] Ery Arias-Castro and Nicolas Verzelen. Community detection in dense random networks. The Annals of Statistics, 42(3):940–969, 2014.
  • [AFdF21] Maria Chiara Angelini, Paolo Fachin, and Simone de Feo. Mismatching as a tool to enhance algorithmic performances of monte carlo methods for the planted clique model, 2021.
  • [AFUZ19] Fabrizio Antenucci, Silvio Franz, Pierfrancesco Urbani, and Lenka Zdeborová. On the glassy nature of the hard phase in inference problems. Physical Review X, 9:011020, January 2019.
  • [AKS98] Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3-4):457–466, 1998.
  • [AWZ20] Gérard Ben Arous, Alexander S. Wein, and Ilias Zadik. Free energy wells and overlap gap property in sparse PCA. In Jacob D. Abernethy and Shivani Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 479–482. PMLR, 2020.
  • [BAGJ20] Gerard Ben Arous, Reza Gheissari, and Aukosh Jagannath. Algorithmic thresholds for tensor pca. Annals of Probability, 48:2052–2087, 07 2020.
  • [BB20] Matthew Brennan and Guy Bresler. Reducibility and statistical-computational gaps from secret leakage. In Conference on Learning Theory, pages 648–847. PMLR, 2020.
  • [BE76] B. Bollobas and P. Erdös. Cliques in random graphs. Mathematical Proceedings of the Cambridge Philosophical Society, 80(3):419–427, 1976.
  • [BHK+19] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019.
  • [BR04] Nayantara Bhatnagar and Dana Randall. Torpid mixing of simulated tempering on the potts model. In SODA, volume 4, pages 478–487. Citeseer, 2004.
  • [BR13] Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In Conference on learning theory, pages 1046–1066. PMLR, 2013.
  • [DGGP14] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden cliques in linear time with high probability. Combinatorics, Probability and Computing, 23(1):29–49, 2014.
  • [DM15] Yash Deshpande and Andrea Montanari. Finding hidden cliques of size N/e\sqrt{N/e} in nearly linear time. Foundations of Computational Mathematics, 15(4):1069–1128, 2015.
  • [DSC06] Persi Diaconis and Laurent Saloff-Coste. Separation cut-offs for birth and death chains. The Annals of Applied Probability, 16(4):2098–2122, 2006.
  • [Fei05] Uriel Feige. Approximating maximum clique by removing subgraphs. SIAM J. Discret. Math., 18(2):219–225, feb 2005.
  • [FGR+17] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. Journal of the ACM (JACM), 64(2):1–37, 2017.
  • [GM75] G. R. Grimmett and C. J. H. McDiarmid. On colouring random graphs. Mathematical Proceedings of the Cambridge Philosophical Society, 77(2):313–324, 1975.
  • [GZ19] David Gamarnik and Ilias Zadik. The landscape of the planted clique problem: Dense subgraphs and the overlap gap property, 2019.
  • [Jer92] Mark Jerrum. Large cliques elude the metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992.
  • [Kar79] Richard M Karp. Recent advances in the probabilistic analysis of graph-theoretic algorithms. In International Colloquium on Automata, Languages, and Programming, pages 338–339. Springer, 1979.
  • [Kuč95] Luděk Kučera. Expected complexity of graph partitioning problems. Discrete Applied Mathematics, 57(2-3):193–212, 1995.
  • [LP17] David A Levin and Yuval Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017.
  • [MBC+20] Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Complex dynamics in simple neural networks: Understanding gradient flow in phase retrieval, 2020.
  • [MKUZ19] Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborova. Passed & spurious: Descent algorithms and local minima in spiked matrix-tensor models. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4333–4342. PMLR, 09–15 Jun 2019.
  • [MP92] Enzo Marinari and Giorgio Parisi. Simulated tempering: a new monte carlo scheme. EPL (Europhysics Letters), 19(6):451, 1992.
  • [MW15] Zongming Ma and Yihong Wu. Computational barriers in minimax submatrix detection. The Annals of Statistics, 43(3):1089–1116, 2015.
  • [MWW09] Elchanan Mossel, Dror Weitz, and Nicholas Wormald. On the hardness of sampling independent sets beyond the tree threshold. Probability Theory and Related Fields, 143(3):401–439, 2009.
  • [RF10] Dorit Ron and Uriel Feige. Finding hidden cliques in linear time. Discrete Mathematics & Theoretical Computer Science, 2010.
  • [RM14] Emile Richard and Andrea Montanari. A statistical model for tensor pca. Advances in Neural Information Processing Systems, 27, 2014.

Appendix A Deferred Proofs

Proof of Lemma 5.1.

For part (1), notice that by linearity of expectation and the elementary application of Stirling’s formula that for m2m1m_{2}\leq m_{1} with m2=o(m1)m_{2}=o(m_{1}) it holds (m1m2)=(m1/m2)m2(1+o(1))\binom{m_{1}}{m_{2}}=(m_{1}/m_{2})^{m_{2}(1+o(1))} we have

𝔼[Wq,r]\displaystyle\mathbb{E}[W_{q,r}] =(kr)(nkqr)2(r2)(q2)\displaystyle=\binom{k}{r}\binom{n-k}{q-r}2^{\binom{r}{2}-\binom{q}{2}}
=(nαγlogn)(nnαρlognγlogn)2(γlogn2)(ρlogn2)\displaystyle=\binom{n^{\alpha}}{\left\lfloor\gamma\log n\right\rfloor}\binom{n-n^{\alpha}}{\left\lfloor\rho\log n\right\rfloor-\left\lfloor\gamma\log n\right\rfloor}2^{\binom{\left\lfloor\gamma\log n\right\rfloor}{2}-\binom{\left\lfloor\rho\log n\right\rfloor}{2}}
=exp[(ln2)(logn)2(αγ+(ργ)+γ22ρ22+o(1))]\displaystyle=\exp\left[(\ln 2)(\log n)^{2}\left(\alpha\gamma+(\rho-\gamma)+\frac{\gamma^{2}}{2}-\frac{\rho^{2}}{2}+o(1)\right)\right]
=exp[(ln2)(logn)2(ρρ22(1α)γ+γ22+o(1))].\displaystyle=\exp\left[(\ln 2)(\log n)^{2}\left(\rho-\frac{\rho^{2}}{2}-(1-\alpha)\gamma+\frac{\gamma^{2}}{2}+o(1)\right)\right].

For part (2), notice that Wq,0W_{q,0} is distributed as the number of qq-cliques in 𝒢(nk,12)\mathcal{G}(n-k,\frac{1}{2}). Hence, standard calculation (e.g. [BE76, Proof of Theorem 1]) prove that since ρ=2Ω(1),\rho=2-\Omega(1),

Var(Wq,0)𝔼[Wq,0]2=O(q4n2)=O(1n).\frac{\mathrm{Var}(W_{q,0})}{\mathbb{E}[W_{q,0}]^{2}}=O(\frac{q^{4}}{n^{2}})=O(\frac{1}{n}).

Hence, Chebyshev’s inequality yields that with probability 1O(1/n),1-O(1/n), Wq,012𝔼[Wq,0].W_{q,0}\geq\frac{1}{2}\mathbb{E}[W_{q,0}]. Taking a union bound over the different values of q=O(logn)q=O(\log n) completes the proof of this part.

Finally, part (3) follows directly from part (1), Markov’s inequality and a union bound over the possible values of r,qr,q. ∎

Proof of Lemma 6.5.

It clearly suffices to establish this result for k=0k=0, i.e. for an the random graph G(n,12).G(n,\frac{1}{2}). For any fixed |U|(1η)logn,|U|\leq(1-\eta)\log n, |A(U)||A(U)| follows a Binomial distribution with n|U|n-|U| trials and probability 12|U|.\frac{1}{2^{|U|}}. In particular, it has a mean (1+o(1))n2|U|=Ω(nη)(1+o(1))\frac{n}{2^{|U|}}=\Omega(n^{\eta}). Hence, by Hoeffding’s inequality with probability 1exp(Ω(nη))1-\exp(-\Omega(n^{\eta})) it holds |A(U)|n202|U||A(U)|\geq\frac{n}{20\cdot 2^{|U|}}. As there are only (nlogn)=nO(logn)\binom{n}{\left\lfloor\log n\right\rfloor}=n^{O(\log n)} the result follows from a union bound over |U||U|. ∎