Almost-Linear Planted Cliques Elude the Metropolis Process
Abstract
A seminal work of Jerrum (1992) showed that large cliques elude the Metropolis process. More specifically, Jerrum showed that the Metropolis algorithm cannot find a clique of size , which is planted in the Erdős-Rényi random graph , in polynomial time. Information theoretically it is possible to find such planted cliques as soon as .
Since the work of Jerrum, the computational problem of finding a planted clique in has been studied extensively, and many polynomial-time algorithms were shown to find the planted clique if it is of size , while no polynomial-time algorithm is known to work when . The computational problem of finding a planted clique of is now widely considered a foundational problem in the study of computational-statistical gaps. Notably, the first evidence of the problem’s algorithmic hardness is commonly attributed to the result of Jerrum from 1992.
In this paper we revisit the original Metropolis algorithm suggested by Jerrum. Interestingly, we find that the Metropolis algorithm actually fails to recover a planted clique of size for any constant , unlike many other efficient algorithms that succeed when . Moreover, we strengthen Jerrum’s results in a number of other ways including:
• Like many results in the MCMC literature, the result of Jerrum shows that there exists a starting state (which may depend on the instance) for which the Metropolis algorithm fails to find the planted clique in polynomial time. For a wide range of temperatures, we show that the algorithm fails when started at the most natural initial state, which is the empty clique. This answers an open problem stated in Jerrum (1992). It is rather rare to be able to show the failure of a Markov chain starting from a specific state.
• We show that the simulated tempering version of the Metropolis algorithm, a more sophisticated temperature-exchange variant of it, also fails in the same regime of parameters.
Our results substantially extend Jerrum’s result. Furthermore, they confirm recent predictions by Gamarnik and Zadik (2019) and Angelini, Fachin, de Feo (2021). Finally, they highlight the subtleties of using the sole failure of one, however natural, family of algorithms as a strong sign of a fundamental statistical-computational gap.
1 Introduction
The problem of finding, in polynomial time, large cliques in the -vertex Erdős-Rényi random graph , where each edge is present independently with probability , is a fundamental open problem in algorithmic random graph theory [Kar79]. In it is known that there is a clique of size with high probability (w.h.p.) as , and several simple polynomial-time algorithms can find a clique of size w.h.p., which is nearly half the size of the maximum clique (see e.g. [GM75]). (Note that here and everywhere we denote by the logarithm to base 2.) Interestingly, there is no known polynomial-time algorithm which is able to find a clique of size for some constant . The problem of finding such a clique in polynomial time was suggested by Karp [Kar79] and remains open to this day.
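For concreteness, the following is a minimal Python sketch (illustrative only, not part of the original analysis) of the simple greedy procedure that w.h.p. finds a clique of size roughly log2(n) in G(n, 1/2): scan the vertices in a random order and keep every vertex adjacent to all vertices kept so far.

```python
import random

def sample_gnp(n, p=0.5):
    """Sample an Erdos-Renyi graph G(n, p) as an adjacency-set dictionary."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_clique(adj):
    """Scan vertices in a random order, adding each vertex adjacent to every
    vertex already chosen.  On G(n, 1/2) this finds a clique of size roughly
    log2(n) with high probability."""
    clique = []
    order = list(adj)
    random.shuffle(order)
    for v in order:
        if all(v in adj[u] for u in clique):
            clique.append(v)
    return clique

if __name__ == "__main__":
    g = sample_gnp(512)
    print(len(greedy_clique(g)))  # typically close to log2(512) = 9
```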
Jerrum’s result and the planted clique model
Motivated by the challenge of finding a clique in , Jerrum in [Jer92] established that large cliques elude the Metropolis process in . Specifically, he considered the following Gibbs measure for
(1)
where induces a clique in the instance . Notice that since by definition assigns higher mass to cliques of larger size. Jerrum considered the Metropolis process with stationary measure , which is initialized in some clique, say of Then the process moves “locally” between cliques which differ in exactly one vertex. In more detail, every step of the Metropolis process is described as follows (see also Algorithm 1). Choose a vertex uniformly at random. If where is the current clique, then let (a “downward” step) with probability and let with the remaining probability; else if and is a clique, then let (an “upward” step); otherwise, let . For this process, Jerrum established the negative result that it fails to reach a clique of size in polynomial time for any constant . We note that, as customary in the theory of Markov chains, the failure is subject to the Metropolis process being “worst-case initialized”, that is, starting from some “bad” clique . This is a point we revisit later in this work.
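A minimal Python sketch of one such step is given below; it assumes Jerrum's Gibbs measure, in which a clique receives weight proportional to $\lambda^{|C|}$ with $\lambda \ge 1$ (so a downward move is accepted with probability $1/\lambda$). The function and variable names are ours and purely illustrative.

```python
import random

def metropolis_step(adj, clique, lam):
    """One step of the Metropolis process on cliques of a graph, for a Gibbs
    measure assumed proportional to lam**len(clique) with lam >= 1.
    `adj` maps each vertex to its set of neighbours; `clique` is a set."""
    v = random.choice(list(adj))          # pick a uniformly random vertex
    if v in clique:
        # "downward" step: remove v with probability 1/lam, else stay put
        if random.random() < 1.0 / lam:
            clique = clique - {v}
    elif all(v in adj[u] for u in clique):
        # "upward" step: v extends the current clique, always accepted
        clique = clique | {v}
    # otherwise the proposal is rejected and the clique is unchanged
    return clique
```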
The planted clique problem was introduced by Jerrum in [Jer92] in order to highlight the failure of the Metropolis process. For the planted clique model is defined by first sampling an instance of , then choosing out of the vertices uniformly at random and finally adding all the edges between them (if they did not already exist from the randomness of ). The set of chosen vertices is called the planted clique . It is perhaps natural to expect that the existence of in can assist the Metropolis process to reach cliques of larger size faster. Yet, Jerrum proved that as long as for some constant , the Metropolis process continues to fail to find a clique of size in polynomial time, for any . As he also noticed, when one can actually trivially recover from via a simple heuristic which chooses the top- degrees of the observed graph (see also [Kuč95]). In particular, one can trivially find a clique of size much larger than when . Importantly though, he never proved that the Metropolis process actually succeeds in finding large cliques when , leaving open a potentially important piece of the performance of the Metropolis process in the planted clique model. In his words from the conclusion of [Jer92]:
“If the failure of the Metropolis process to reach -size cliques could be shown to hold for some , it would represent a severe indictment of the Metropolis process as a heuristic search technique for large cliques in a random graph.”
In this work, we seek to investigate the performance of the Metropolis process for all .
The planted clique conjecture
Following the work of Jerrum [Jer92] and Kucera [Kuč95] the planted clique model has received a great deal of attention and became a hallmark of a research area that is now called study of statistical-computational gaps. The planted clique problem can be phrased as a statistical or inference problem in the following way: given an instance of recover the planted clique vertices. It is impossible to recover the clique when it is of size for any constant (see e.g. [ACV14]), but possible information theoretically (and in quasi-polynomial time ) whenever for any constant (see e.g. the discussion in [FGR+17]). Meanwhile, multiple polynomial-time algorithms have been proven to succeed in recovering the planted clique but only under the much larger size [AKS98, RF10, DGGP14, DM15]. The intense study of the model, as well as the failure to find better algorithms, has lead to the planted clique conjecture, stating that the planted clique recovery task is impossible in polynomial-time, albeit information-theoretically possible, whenever . The planted clique conjecture has since lead to many applications, a highlight of which is that it serves as the main starting point for building reductions between a plethora of statistical tasks and their computational-statistical trade-offs (see e.g. [BR13, MW15, BB20]).
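The simplest polynomial-time method in this larger-size regime is the degree heuristic of [Kuč95], which can be sketched as follows. This is an illustrative sketch only, under the assumption that the planted clique size is at least a large constant times $\sqrt{n\log n}$; the function name is ours.

```python
def top_degree_recovery(adj, k):
    """Return the k vertices of largest degree.  When the planted clique has
    size k >= c * sqrt(n * log n) for a large enough constant c, these are
    exactly the planted vertices with high probability [Kuč95].
    `adj` maps each vertex to its set of neighbours."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])
```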
Unfortunately, because of the average-case nature of the planted clique model, a complexity theory explanation is still lacking for the planted clique conjecture. For this reason, researchers have so far mainly focused on supporting the conjecture by establishing the failure of restricted families of polynomial-time methods, examples of which are Sum-of-Squares lower bounds [BHK+19], statistical-query lower bounds [FGR+17] and low-temperature MCMC lower bounds [GZ19]. Because of the focus on establishing such restricted lower bounds, the vast majority of works studying the planted clique conjecture cite the result of Jerrum [Jer92] on the failure of the Metropolis process as the “first evidence” for its validity. Note though that given what we discussed above, such a claim for Jerrum’s result can be problematic. Indeed, recall that Jerrum did not establish the success of the Metropolis process when in reaching cliques of size , let alone identifying the planted clique. That means that the Metropolis process could in principle simply fail to recover the planted clique for all , offering no evidence for the planted clique conjecture.
In fact, in 2019, Gamarnik and Zadik [GZ19] studied the performance of various MCMC methods (not including the Metropolis process described above) for the planted clique model and conjectured their failure to recover the planted clique when , that is, failure much beyond the threshold. For this reason, they raised again the question of whether Jerrum’s original Metropolis process actually fails for larger than . Later work by Angelini, Fachin and de Feo [AFdF21] simulated the performance of the Metropolis process and predicted its failure to recover the planted clique again much beyond the threshold. Both of these results suggest that the Metropolis process may not be able to recover the planted clique for some values of . We consider the absence of a negative or positive result on whether the Metropolis process of [Jer92] recovers the planted clique to be a major gap in the literature on the planted clique conjecture, which we investigate in this work.
Empty clique initialization
A common deficiency of many Markov chain lower bounds is their failure to establish lower bounds for starting from any particular state. Indeed, many of the lower bounds in the theory of Markov chains including spectral and conductance lower bounds are proved with high probability or in expectation over the stationary measure of the chain. Of course, since such lower bounds usually provide evidence that it is hard to sample from the stationary distribution, the deficiency of the lower bound is that it is hard to find a state where the lower bound applies! This is indeed the case for the specific example of the Metropolis process in [Jer92], where, a priori, other than the empty set, we do not know which subsets of the nodes form a clique.
Jerrum noted this deficiency in his paper [Jer92]:
“The most obviously unsatisfactory feature of Theorems … is that these theorems assert only the existence of a starting state from which the Metropolis process takes super-polynomial time to reach a large clique … It seems almost certain that the empty clique is a particular example of a bad starting state, but different proof techniques would be required to demonstrate this.”
In this work we develop a new approach based on comparison with birth and death processes allowing us to prove lower bounds starting from the empty clique. We note that previous work on birth and death processes established Markov chain bounds starting from a specific state [DSC06].
2 Main Contribution
We now present our main contributions on the performance of the Metropolis process for the planted clique model where for some constant . Our results hold for the Metropolis process associated with any Gibbs measure defined as follows. For an arbitrary Hamiltonian vector and arbitrary let
(2)
where by we denote the cliques of . In some results, we require a small degree of regularity from the vector , which for us means that it satisfies and is 1-Lipschitz in the sense that
(3)
We refer to a vector satisfying these two conditions simply as being “regular”. The regularity property is needed for technical reasons, as it allows us to appropriately bound the transition probabilities of the Metropolis process between small cliques. Notice that is trivially regular and corresponds to the Gibbs measure and Metropolis process considered by Jerrum, per (1).
Our first theorem is a very general lower bound which holds under worst-case initialization for all
Theorem 2.1 (Informal version of Theorem 6.1 and Theorem 7.1).
Let for any .
I. For arbitrary and arbitrary inverse temperature , for any constant there exists a “bad” initialization such that the Metropolis process requires time to reach intersection with the planted clique.
II. For arbitrary regular and arbitrary inverse temperature , for any constant there exists a “bad” initialization such that the Metropolis process requires time to reach a clique of size .
One way to think about the two parts of Theorem 2.1 is in terms of the statistical failure and the optimization failure of the Metropolis algorithm. The first part establishes the statistical failure as the algorithm cannot even find vertices of the planted clique. The second part shows that it fails as an optimization algorithm: the existence of a huge planted clique still does not improve the performance over the level.
Note that the second part extends the result of [Jer92] to all when . (The case is proven below in Theorem 2.3, since for that low temperature the process behaves like the greedy algorithm.) In Jerrum’s words, our result reveals “a severe indictment of the Metropolis process in finding large cliques in random graphs”; even a -size planted clique does not help the Metropolis process to reach cliques of size in polynomial time. At a technical level, the proof is an appropriate combination of the bottleneck argument used by Jerrum in [Jer92], which focuses on comparing cliques based on their size, and a separate bottleneck argument comparing cliques based on how much they intersect the planted clique.
Our next result concerns the case where the Metropolis process is initialized from the empty clique. We obtain, for all , the failure of the Metropolis process starting from any -size clique (including the empty clique). In particular, this answers in the affirmative the question from [Jer92] for all .
Theorem 2.2 (Informal version of Theorem 6.2 and Theorem 7.6).
Let for any . Then for arbitrary regular and arbitrary inverse temperature , the Metropolis process started from any clique of size (in particular the empty clique) requires time to reach a clique which for some constants either
• has at least intersection with the planted clique or,
• has size at least .
The proof of Theorem 2.2 is based on the expansion properties of all -cliques of . The expansion properties allow us to compare the Metropolis process to a one-dimensional birth and death process that keeps track of the size of the clique (or the size of the intersection with the planted clique). The analysis of this process is based on a time-reversal argument.
One can wonder whether a lower bound can also be established in the case when we start from the empty clique. We partially answer this by obtaining the failure of the Metropolis process starting from the empty clique in the case where and .
Theorem 2.3 (Informal version of Theorem 7.7).
Let for any . For and arbitrary inverse temperature , the Metropolis process started from any clique of size (in particular the empty clique) requires time to reach a clique which for some constants either
• has at least intersection with the planted clique or,
• has size at least .
The key idea here is that for if then with high probability the Metropolis process never removes vertices, so it is in fact the same as the greedy algorithm. This observation allows for a much easier analysis of the algorithm.
In a different direction, Jerrum in [Jer92] asked whether one can extend the failure of the Metropolis process to the failure of simulated annealing on finding large cliques in random graphs. We make a step also in this direction, by considering the simulated tempering (ST) version of the Metropolis process [MP92]. Simulated tempering is a celebrated Monte Carlo scheme originating in the physics literature that considers a Gibbs measure, say the one in (2), at different temperatures , in other words considers the Gibbs measures . Then it runs a Metropolis process on the product space between the set of temperatures and the Gibbs measures, which allows it to modify the temperature during its evolution and to interpolate between the different (for the exact definitions see Section 8.1). The ST dynamics have been observed extensively in practice to outperform their single temperature Metropolis process counterparts but rigorous analysis of these processes is rather rare, see [BR04].
It turns out that our “bad” initialization results extend in a straightforward manner to the ST dynamics.
Theorem 2.4 (Informal version of Theorem 8.3, and Theorem 8.4).
Let for any .
• For arbitrary , arbitrary and arbitrary sequence of inverse temperatures , there exists a “bad” initialization such that the ST dynamics require time to reach intersection with the planted clique, for some constant .
• For arbitrary regular , arbitrary and arbitrary sequence of inverse temperatures with , there exists a “bad” initialization such that the ST dynamics require time to reach a clique of size , for some constant .
The key idea behind the proof of Theorem 2.4 is that the bottleneck set considered in the proof of Theorem 2.1 is “universal”, in the sense that the same bottleneck set applies to all inverse temperatures . For this reason, the same bottleneck argument can be applied to simulated tempering that is allowed to change temperatures during its evolution.
On top of this, we establish a lower bound on the ST dynamics, when started from the empty clique.
Theorem 2.5 (Informal version of Theorem 8.5).
Let for any . For arbitrary regular and increasing , arbitrary , and an arbitrary sequence of inverse temperatures with , the ST dynamics started from the pair of the empty clique and the temperature require time to reach a clique which for some constant either
• has at least intersection with the planted clique or,
• has size at least .
2.1 Further Comparison with Related Work
Comparison with [AFdF21]
As mentioned in the Introduction, the authors of [AFdF21] predicted the failure of the Metropolis process for the Gibbs measure (1) to recover the planted clique. Specifically, based on simulations they predicted the failure for all for , though they comment that the exact predicted threshold could be an artifact of finite-size effects. In this work, using Theorem 2.1 we establish that (worst-case initialized) the Metropolis process fails for all , confirming their prediction but for . In the same work, the authors suggest studying instead the Metropolis process for another Gibbs measure of the form (2), which they call BayesMC. Their suggested Gibbs measure for a specific value of matches a (slightly perturbed) version of the posterior of the planted clique recovery problem. The authors predict based on simulations that by appropriately tuning (and “mismatching”) , the Metropolis chain now recovers the planted clique for all . In this work, we partially refute this prediction using Theorem 2.1, as (worst-case initialized) the Metropolis process for any Gibbs measure (2), including the mismatched posterior that [AFdF21] considers, fails to recover the planted clique for all .
MCMC underperformance in statistical inference
Our lower bounds show the suboptimality of certain natural MCMC methods in inferring the planted clique in a regime where simple algorithms work. Interestingly, this agrees with a line of work establishing the suboptimality of MCMC methods in inferring hidden structures in noisy environments. Such a phenomenon has been generally well understood in the context of tensor principal component analysis [RM14, BAGJ20], where Langevin dynamics and gradient descent on the empirical landscape are known to fail to infer the hidden tensor, when simple spectral methods succeed. Moreover, this suboptimality has been recently observed through statistical physics methods for other models including sparse PCA [AFUZ19], the spiked matrix-tensor model [MKUZ19] and phase retrieval [MBC+20]. Finally, it has also been observed for a different family of MCMC methods but again for the planted clique model in [GZ19].
3 Proof Techniques and Intuitions
In this section, we offer intuition regarding the proofs of our results. In the first two subsections we provide intuition behind our worst-case initialization results for the Metropolis process (Theorem 2.1) and the ST dynamics (Theorem 2.4). In the following one we discuss our results on the Metropolis process and ST dynamics with the empty clique initialization (Theorems 6.2 and 7.6).
3.1 Worst-case Initialization for Reaching Large Intersection
We start with discussing the lower bound for obtaining intersection with the planted clique. We employ a relatively simple bottleneck argument, based on Lemma 5.5.
For let us denote by the number of cliques in which have size and intersect the planted clique in exactly vertices. Our starting point towards building the bottleneck is the following simple observation. For any and for a constant we have w.h.p. as
(4)
In words, the number of -cliques of intersection with the planted clique is a quasi-polynomial factor smaller than the number of -cliques which are disjoint from the planted clique. Indeed, at the first-moment level it can be checked to hold and a standard second moment argument gives the result (both results are direct outcomes of Lemma 5.1). Notice that this property holds for all .
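For the reader's convenience, a minimal version of the first-moment computation behind this comparison is the following, in notation introduced only for this display: $K$ denotes the planted clique size and $\mathcal{C}_{k,i}$ the set of $k$-cliques meeting the planted clique in exactly $i$ vertices.

```latex
% Edges inside the planted clique come for free; the remaining
% \binom{k}{2}-\binom{i}{2} pairs are each present with probability 1/2.
\mathbb{E}\bigl[|\mathcal{C}_{k,i}|\bigr]
  \;=\; \binom{K}{i}\binom{n-K}{k-i}\,2^{-\binom{k}{2}+\binom{i}{2}},
\qquad
\frac{\mathbb{E}\bigl[|\mathcal{C}_{k,i}|\bigr]}{\mathbb{E}\bigl[|\mathcal{C}_{k,0}|\bigr]}
  \;=\; \frac{\binom{K}{i}\binom{n-K}{k-i}}{\binom{n-K}{k}}\,2^{\binom{i}{2}} .
```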
Now let us assume to start that . In that case the Gibbs measure is simply the uniform measure over cliques of . For let be the subset of the cliques with intersection with the planted clique at most . Using standard results (see Lemma 5.5), to prove our hitting time lower bound it suffices to show that for any with constant
(5) |
w.h.p. as . Indeed, given such a result, based on Lemma 5.5 there is an initialization of the Metropolis process in from which it takes quasi-polynomial time to reach the boundary of , which consists exactly of the cliques of intersection with the planted clique.
Now another first moment argument (an outcome of part (3) of Lemma 5.1) allows us to conclude that, since is not too large, the only cliques of intersection with the planted clique satisfy for small enough . In other words, unless . But now, using also (4), we have
and the result follows.
Now, the interesting thing is that the exact same calculation works for arbitrary and arbitrary Hamiltonian vector . Consider as before the same subset of cliques . It suffices to show
(6) |
But as before
and the general result follows.
Interestingly, this bottleneck argument transfers easily to a bottleneck for the ST dynamics. The key observation is that to construct the bottleneck set for the Metropolis process, we used the same bottleneck set for all inverse temperatures and Hamiltonian vectors . Now the ST dynamics are a Metropolis process on the enlarged product space of the temperatures times the cliques. In particular, the stationary distribution of the ST dynamics is simply a mixture of the Gibbs distributions , say with weights (see Section 8.1 for the exact choice of the weights , though they are not essential for the argument). Now, using (6) and the fact that the boundary operator satisfies , we immediately have
(7) |
The result then follows again using Lemma 5.5.
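For completeness, the mixture computation alluded to above can be spelled out as follows, writing the ST stationary measure as a mixture over the temperatures with weights $w_j$ (a sketch in our own notation, with $A$ the common bottleneck set of cliques).

```latex
% The ST stationary measure is the mixture
%   \pi_{\mathrm{ST}} = \sum_{j=1}^{m} w_j \,(\delta_j \otimes \mu_{\beta_j}),
% and the same clique bottleneck A is used at every temperature, so
\frac{\pi_{\mathrm{ST}}\bigl(\partial A\times[m]\bigr)}
     {\pi_{\mathrm{ST}}\bigl(A\times[m]\bigr)}
 \;=\; \frac{\sum_{j} w_j\,\mu_{\beta_j}(\partial A)}
            {\sum_{j} w_j\,\mu_{\beta_j}(A)}
 \;\le\; \max_{1\le j\le m}\frac{\mu_{\beta_j}(\partial A)}{\mu_{\beta_j}(A)},
% using that \sum_j w_j a_j \le (\max_j a_j/b_j)\sum_j w_j b_j
% for nonnegative a_j, b_j.
```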
3.2 Worst-case Initialization for Reaching Large Cliques
The worst-case initialization lower bound for reaching cliques is also based on a bottleneck argument. This time the argument is more involved and is a combination of the bottleneck argument by Jerrum in [Jer92], which worked for , and the bottleneck argument used in the previous subsection for reaching large intersection with the planted clique, which worked for all .
We first need some notation. For we denote by the set of -cliques of intersection with the planted clique and the set of all cliques of size . We also define the set of -cliques of intersection less than with the planted clique and analogously by the set of cliques of size less than and intersection with the planted clique
We first quickly remind the reader of the argument of Jerrum. Recall the notion of a -gateway clique from [Jer92], which informally is the last clique of its size in some path starting from the empty clique and ending at a clique of size (see Section 7 for the exact definition). For we denote by the set of -gateway cliques. Importantly, for any , any path from the empty clique to a -clique crosses some -gateway of size . Jerrum’s bottleneck argument in [Jer92] is then based on the observation that assuming for a small enough constant, if and then
(8) |
Unfortunately, such a bottleneck argument is hopeless if since in that case most cliques of size at most are fully included in the planted clique and the ratio trivializes.
We identify the new bottleneck by further leveraging the “intersection axis” with the planted clique. Our first observation is that a relation like (8) actually holds for all if the cliques are restricted to have low intersection with the planted clique, i.e., for all , if for a small enough constant and defined as above, then it holds
(9) |
The second observation is that for the Metropolis process to hit a clique of size , one needs to hit either , that is a clique of size less than and intersection with the planted clique, or a clique in , that is a -gateway of size and intersection less than . Set to be this target set, which we hope is “small” enough to create a bottleneck.
We now make the following construction, which has boundary included in the target set . Consider the set of all cliques reachable by a path that starts from the empty clique and uses only cliques not included in , except maybe for the destination. It is easy to see that , and because of the inclusion of -gateways in the definition of one can also see that no -clique is in . Therefore, using Lemma 5.5, it suffices to show that w.h.p.
(10) |
To show (10) we observe that ; that is, contains all cliques of size and intersection less than with the planted clique, as well as all cliques of size less than that are disjoint from the planted clique. Indeed, it is straightforward that one can reach these cliques by a path from the empty clique without using cliques from , besides maybe the destination.
A final calculation then gives
(11) |
The first term is quasi-polynomially small according to (9). For the second term, notice that from equation (5) of the previous subsection, for all it holds
Now using a first and second moment argument mostly based on Lemma 5.1 we prove that for all it holds w.h.p.
Combining the above with a small case analysis allows us to conclude
Now since and we can always choose small enough but constant so that is at most a small enough constant and in particular
This completes the proof overview for the failure of the Metropolis process.
For the ST dynamics notice that the only way the value of is used for the construction of the bottleneck is to identify a value of so that the term is small enough, which then allows us to choose the values of . But now, if we have a sequence of inverse temperatures with , we can choose a “universal” so that, for all , is small enough, leading to a “universal” bottleneck construction for all . The proof then follows exactly the proof for the ST failure described in the previous subsection.
3.3 Failure when Starting from the Empty Clique
Here we explain the failure of the Metropolis process in the high temperature regime when starting from the empty clique. We show in Theorems 6.2 and 7.6 that, when starting from the empty clique which is the most natural and common choice of initialization, the Metropolis process fails to reach either cliques of intersection with the planted clique at least for any small constant , or cliques of size for any small constant .
One important observation is that since we are only considering when the process reaches either intersection or size , we may assume that the process will stop and stay at the same state once hitting such cliques. In particular, this means we can exclude from the state space all cliques of intersection and all cliques of size , and consider only cliques such that
(12) |
Indeed, the geometry of the state space restricted to these cliques is what really matters for whether the Metropolis process starting from the empty clique can reach our desired destinations or not. In particular, for sufficiently small (say, ), most cliques satisfying Eq. 12 will have intersection and also size . Note that this partially explains why having a large planted clique of size does not help much for the tasks of interest, since for the restricted state space (those satisfying Eq. 12) most cliques do not have vertices from the planted clique and so any constant does not help.
The key property that allows us to establish our result is a notion we call the “expansion” property. Observe that, for a clique of size with intersection , the expected number of vertices which can be added to to become a larger clique is ; this is also the number of common neighbors of vertices from . Via the union bound and a concentration inequality, one can easily show that in fact, for all cliques with the number of common neighbors is concentrated around its expectation with constant multiplicative error; see Definitions 6.4 and 6.5 (where we actually only need the lower bound). This immediately implies that
which allows us to track how the size of cliques evolves for the Metropolis process, under the assumption that the intersection is always .
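To make the expectation in this discussion concrete, the following display records the exact expected number of vertices that can be appended to a clique $C$ of size $\ell$ with intersection $i$; the notation is introduced only here, with $K$ the planted clique size and $N^{*}(C)$ the set of common neighbours of $C$.

```latex
% A planted vertex outside C is adjacent to the i planted vertices of C for
% free and must hit the remaining \ell - i; a non-planted vertex outside C
% must hit all \ell vertices of C:
\mathbb{E}\bigl[|N^{*}(C)|\bigr]
  \;=\; (K-i)\,2^{-(\ell-i)} \;+\; (n-K-\ell+i)\,2^{-\ell}.
```

For $i=0$ and $K,\ell\ll n$ this is approximately $n\,2^{-\ell}$, which is the quantity driving the drift of the clique-size process.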
To actually prove our result, it will be helpful to consider the time-reversed dynamics and argue that when starting from cliques of intersection or cliques of size , it is unlikely to reach the empty clique. Suppose we have the identity Hamiltonian function for simplicity. Consider as an example the probability of hitting cliques of large intersection as in Theorem 6.2. Recall that is the collection of cliques of size for and of intersection . Then by reversibility we have that for all ,
Notice that as . Our intuition is that in fact, for any clique of size , the probability of reaching the empty clique when starting from is at most where ; that is, for all integer and all clique of size ,
(13) |
and hence we obtain
where the last inequality is because most cliques have intersection and size .
The proof of Eq. 13 utilizes the expansion property mentioned above to analyze the time-reversed dynamics for . More specifically, we introduce an auxiliary birth and death process on which is stochastically dominated by , in the sense that
Through step-by-step coupling it is easy to see that always, and thus,
This allows us to establish Eq. 13 by studying the much simpler process . Furthermore, we assume the process does not go beyond value (in fact, for any fixed constant ) so that we are in the regime where the expansion property holds; this is the reason appears.
The same approach works perfectly for bounding the probability of hitting cliques of size when starting from the empty clique, as in Theorem 7.6. For the low-temperature Metropolis process with in Theorem 7.7, we also need the additional observation that the process in fact does not remove any vertex within polynomially many steps and so it is equivalent to a greedy algorithm. In particular, we can apply the same approach to argue that the process never reaches cliques of size that are subsets of cliques with large intersection or large size, which have much smaller measure. For the ST dynamics in Theorem 8.5, the same approach also works but we need a more sophisticated auxiliary process for the pair of clique size and inverse temperature, along with more complicated coupling arguments.
4 Organization of Main Body
The rest of the paper is organized as follows. In Section 5 we introduce the needed definitions and notation for the formal statements and proofs. In Section 6 we present our lower bounds for reaching a clique of intersection with the planted clique. First we present the worst-case initialization result and then the case of empty clique initialization. Then in Section 7 we discuss our lower bounds for reaching a clique of size . As before, we first present the worst-case initialization result, and then discuss the empty clique initialization ones. Finally, in Section 8 we present our lower bounds for the simulated tempering dynamics.
5 Getting Started
For , let be the set of all non-negative integers which are at most . Throughout the paper, we use to represent logarithm to the base , i.e., for .
We say an event holds with high probability (w.h.p.) if .
5.1 Random Graphs with a Planted Clique
Let denote the random graph on vertices where every pair is an edge with probability independently. For , we denote by the random graph with a planted -clique, where a subset of out of vertices is chosen uniformly at random and the random graph is obtained by taking the union of and the -clique formed by those vertices.
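A minimal Python sketch of this sampling procedure (illustrative names only, not part of the formal development):

```python
import random

def sample_planted_clique(n, K, p=0.5):
    """Sample G(n, p) and plant a clique on K uniformly random vertices.
    Returns (adjacency dict, set of planted vertices)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    planted = set(random.sample(range(n), K))
    for u in planted:                     # add all edges inside the planted set
        for v in planted:
            if u != v:
                adj[u].add(v)
    return adj, planted
```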
Let be the collection of all cliques of an instance of graph , and for with let
We also define for convenience when or . Furthermore, let
We also define and .
For with , let the number of -cliques in with . Similarly, when or .
Lemma 5.1.
Let be a constant and consider the random graph with a planted clique. Fix any absolute constant .
(1) For any with parameter and any with parameter , it holds
w.h.p. as .
(2) For and any with , it holds
w.h.p. as .
(3) For any with and any satisfying the inequality , it holds
w.h.p. as .
The proof of the lemma is deferred to the Appendix.
Definition 5.2 (Clique-Counts Properties and ).
Let be an arbitrary constant.
(1) (Upper Bounds) We say the random graph with a planted clique satisfies the property if the following is true: For all integers with , it holds
in particular, for and where , it holds
(2) (Lower Bounds) We say the random graph with a planted clique satisfies the property if the following is true: For every integer with , it holds
Lemma 5.3.
For any constant and any constant , the random graph with a planted clique satisfies both and with probability as .
Proof.
Follows immediately from Lemma 5.1, Markov’s inequality, and the union bound. ∎
5.2 Hamiltonian and Gibbs Measure
For given , let be an arbitrary function. For ease of notation we write , and thus the function is identified with the vector . Given an -vertex graph , consider the Hamiltonian function where . For , the corresponding Gibbs measure is defined as
(14) |
Let be the partition function given by
Furthermore, let
Assumption 5.4.
We assume that the Hamiltonian satisfies
(a) ;
(b) is -Lipschitz, i.e., for .
5.3 Metropolis Process and the Hitting Time Lower Bound
In this work, we study the dynamics of the Metropolis process with respect to the Gibbs measure defined in (14). The Metropolis process is a Markov chain on , the space of all cliques of . The Metropolis process is described in Algorithm 2.
The Metropolis process is an ergodic and reversible Markov chain, with the unique stationary distribution except in the degenerate case when and is the complete graph; see [Jer92].
The following lemma is a well-known fact for lower bounding the hitting time of some target set of a Markov chain using conductance; see [MWW09, Claim 2.1], [LP17, Theorem 7.4] and [AWZ20, Proposition 2.2]. We rely crucially on this lemma for our worst-case initialization results.
Lemma 5.5.
Let be the transition matrix of an ergodic Markov chain on a finite state space with stationary distribution . Let be a set of states and let be a set of boundary states for such that for all and . Then for any there exists an initial state such that
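One standard form of such a conductance-type hitting-time bound (cf. [MWW09, Claim 2.1]), stated here only as a guide for the reader and in our own notation, with $\tau_B$ the hitting time of $B$:

```latex
% There exists a state x in A \setminus B such that, for every t >= 1,
\Pr_{x}\bigl[\tau_{B} \le t\bigr] \;\le\; \frac{t\,\pi(B)}{\pi(A)}.
% Sketch: start the chain from \pi conditioned on A; its law at any step s
% is bounded pointwise by \pi/\pi(A), so \Pr[X_s \in B] \le \pi(B)/\pi(A).
% A union bound over s \le t and averaging over the starting state in A
% yield the display.
```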
6 Quasi-polynomial Hitting Time of Large intersection
6.1 Existence of a Bad Initial Clique
Theorem 6.1.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for arbitrary and arbitrary inverse temperature . There exists a constant and an initialization state for the Metropolis process from which it requires at least steps to reach a clique of intersection with the planted clique at least , with probability at least . In particular, under the worst-case initialization it fails to recover the planted clique in polynomial-time.
Proof.
Notice that we can assume without loss of generality that the constant satisfies . We pick
(15) |
Note that since . Then, by Lemma 5.3 we know that the random graph satisfies both properties and simultaneously with probability as . In the rest of the proof we assume that both and hold.
For any , let . It suffices to show that there exists a constant such that
(16) |
Indeed, given Eq. 16, Theorem 6.1 is an immediate consequence of Lemma 5.5.
By the property we have that for all where
Hence, we get from the definition of the restricted partition function that
where the last inequality again follows from . We define
Thus, we have that
(17) |
Meanwhile, we have
We deduce from and the property that
(18) |
6.2 Starting from the Empty Clique
In this section, we strengthen Theorem 6.1 for a wide range of temperatures by showing that the Metropolis dynamics still fails to obtain a significant intersection with the planted clique when starting from the empty clique (or any clique of sufficiently small size), a natural choice of initial configuration in practice.
6.2.1 Our Result
Theorem 6.2.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for arbitrary -Lipschitz with and inverse temperature . Let denote the Metropolis process on with stationary distribution . Then there exist constants and such that for any clique of size at most , one has
In particular, the Metropolis process starting from the empty clique requires steps to reach cliques of intersection with the planted clique at least , with probability . As a consequence, it fails to recover the planted clique in polynomial time.
We proceed with the proof of the theorem.
6.2.2 Key Lemmas
We start by tracking how the size of the clique changes during the process. It is also helpful to consider the reversed process and show that it is unlikely to hit the empty clique when starting from some clique of size .
Definition 6.3.
For a graph and a subset of vertices, we say a vertex is fully adjacent to if is adjacent to all vertices in ; equivalently, where denotes the set of neighboring vertices of in the graph . Let denote the set of all vertices in that are fully adjacent to ; equivalently, is the set of all common neighbors in of all vertices from .
Definition 6.4 (Expansion Property ).
Let be an arbitrary constant. We say the random graph with a planted clique satisfies the expansion property if the following is true: For every with , it holds
The following lemma establishes the desired expansion property for the cliques of size less than in .
Lemma 6.5 (“Expansion Lemma”).
For any constant and any constant , the random graph with a planted clique satisfies the expansion property with probability as .
The proof of the lemma is deferred to the Appendix.
Lemma 6.5 is quite useful for us, as it allows us to obtain the following bounds on the size transitions for the Metropolis process:
(19)
(20)
The proof also makes use of the following delicate lemma.
Lemma 6.6.
Consider the random graph with a planted clique conditional on satisfying the property and the expansion property for some fixed constant . Let be integers with . Denote also and . For any we have
where and
We postpone the proof of Lemma 6.6 to Section 6.2.4 and first show how it can be used to prove Theorem 6.2. On a high level, we bound the probability of hitting by studying how the size of the clique evolves during the process up to size . The term represents the approximation error when the clique goes beyond size ; in particular, when the destination clique size is at most , we have and this error is zero. Meanwhile, the term corresponds to the fact that the number of cliques of size and intersection with the planted clique is much smaller than the total number of cliques of size , and hence reaching intersection is very unlikely.
6.2.3 Proof of Theorem 6.2, given Lemma 6.6
Proof of Theorem 6.2.
Let and recall . As will be clear later, we shall choose
By Lemmas 5.3 and 6.5, the random graph satisfies both and with probability as , for the choice of given above. In the rest of the proof we assume that both and are satisfied.
Suppose that . By the union bound we have
(21) |
Now let us fix some . By Lemma 6.6,
where we have used that . Write for shorthand
So we have
(22) |
Given (22) it suffices to show that for all there exists such that uniformly for all values of interest of the parameters we have . Indeed, then by (21) we have
and Theorem 6.2 follows e.g. for
We now construct the desired function . If , then and . Meanwhile, we have
So .
If , then and we have
where the last inequality follows from
and also
since and . Meanwhile, we have
So, we deduce that
Hence, in all cases for
This completes the proof of the theorem. ∎
6.2.4 Proof of Lemma 6.6
In this subsection we prove the crucial Lemma 6.6. Throughout this subsection we assume that the property and the expansion property hold for some fixed constant . First, recall that . We have from that
(23) |
For now, we fix a and focus on bounding . The key idea to bound this probability is to exploit the reversibility of the Metropolis process. The following two standard facts are going to be useful.
Fact 6.7 ([LP17]).
If is the transition matrix of a reversible Markov chain over a finite state space with stationary distribution , then for all and all integer it holds
Fact 6.8 ([LP17]).
For a birth-death process on with transition probabilities
the stationary distribution is given by
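In the standard notation for a birth-death chain on $\{0,\dots,N\}$, with up-probability $p_k$ and down-probability $q_k$ at state $k$ (holding otherwise), this distribution takes the form

```latex
% Detailed balance \pi(k)\,p_k = \pi(k+1)\,q_{k+1} gives
\pi(k) \;=\; \frac{1}{Z}\prod_{j=1}^{k}\frac{p_{j-1}}{q_{j}},
\qquad
Z \;=\; \sum_{k=0}^{N}\,\prod_{j=1}^{k}\frac{p_{j-1}}{q_{j}} .
```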
Now, notice that using the time-reversed dynamics it suffices to bound the probability of reaching a small clique when starting from a large clique . Indeed, by reversibility we have
(24) |
which is an application of Fact 6.7.
We introduce a birth-death process on denoted by with transition matrix given by the following transition probabilities:
Denote the stationary distribution of by , which is supported on . The process serves as an approximation of ; note that itself is not a Markov process.
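As an illustration of how such a one-dimensional comparison chain is analyzed, the following Python sketch computes, by dynamic programming, the probability that a generic birth-death chain hits state 0 within $t$ steps. The transition probabilities used in the example are placeholders, not the ones arising from the expansion property.

```python
def hit_zero_within(p, q, start, t):
    """Probability that a birth-death chain on {0, ..., N}, moving up with
    probability p[k] and down with probability q[k] from state k (holding
    otherwise), visits state 0 within t steps when started at `start`.
    Computed by making state 0 absorbing and iterating the one-step
    recursion t times."""
    N = len(p) - 1
    hit = [1.0] + [0.0] * N              # "within 0 steps": only state 0 counts
    for _ in range(t):
        new = [1.0]                      # state 0 stays absorbed
        for k in range(1, N + 1):
            up = p[k] * hit[k + 1] if k < N else 0.0
            stay_prob = 1.0 - q[k] - (p[k] if k < N else 0.0)
            new.append(up + q[k] * hit[k - 1] + stay_prob * hit[k])
        hit = new
    return hit[start]

# Placeholder parameters: an upward-biased chain, like the comparison process,
# is very unlikely to fall back to 0 from a moderately large state.
N = 30
p = [0.9] * (N + 1)
q = [0.0] + [0.1] * N
print(hit_zero_within(p, q, start=10, t=10_000))
```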
The following lemma shows that is stochastically dominated by . The proof of this fact is essentially based on the expansion property and the bounds on the size transitions of derived in the proof below, described in Eqs. 19 and 20.
Lemma 6.9.
Let denote the Metropolis process starting from some . Let denote the birth-death process described above with parameter starting from . Then there exists a coupling of the two processes such that for all integer it holds
In particular, for all integer it holds
Proof.
We couple and as follows. Suppose that for some integer . We will construct a coupling of and such that . Notice that the following probability inequality is a straightforward corollary of that.
Since the probability that is less than , and so is the probability of , we may couple and such that decreases by at most one; namely, it never happens that increases while decreases in size. Thus, it suffices to consider the extremal case when . We have
(25) |
The next lemma upper bounds the -step transition probability .
Lemma 6.10.
Let denote the birth-death process described above with parameter starting from and let with . Then for all integer we have
(27) |
where and .
Proof.
Next, consider the case where , or equivalently . Let be the first time that and we obtain from (28) that
This completes the proof of the lemma. ∎
We now proceed with the proof of Lemma 6.6.
Proof of Lemma 6.6.
7 Quasi-polynomial Hitting Time of Large Cliques
In this section, we present our results about the failure of the Metropolis process to even find cliques of size at least , for any planted clique size
7.1 Existence of a Bad Initial Clique
We start with the “worst-case” initialization result which now works for all inverse temperatures , but establishes that from this initialization the Metropolis process fails to find either a clique of size at least or to find a clique with intersection at least with the planted clique.
Theorem 7.1.
Let be any fixed constant. Then the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for arbitrary satisfying 5.4 and arbitrary inverse temperature . For any constants and , there exists a constant and an initialization state for the Metropolis process from which it requires at least steps to reach
• either cliques of size at least ,
• or cliques of intersection with the planted clique at least ,
with probability at least .
We now present the proof of Theorem 7.1. We first need the notion of gateways, introduced by Jerrum [Jer92] in his original argument for the failure of the Metropolis process.
Definition 7.2 (Gateways).
For , we say a clique is a -gateway if there exists and a sequence of cliques such that
(1) For every , and differ by exactly one vertex;
(2) For every , ;
(3) .
Let denote the collection of all cliques that are -gateways.
Notice that by definition if a clique is a -gateway then .
Definition 7.3 (Gateway-Counts Property ).
We say the random graph with a planted clique satisfies the gateway-counts property if the following is true: For all integers , , and with parameters and , it holds
The following lemma follows immediately from arguments in [Jer92].
Lemma 7.4.
For any constant , the random graph with a planted clique satisfies the gateway-counts property with probability as .
Proof.
We follow the same approach as in [Jer92] with the slight modification that is not equal to but arbitrary. For any , there exists a set of size and a subset of size such that every vertex from is adjacent to every vertex from . To see this, consider a path as in Definition 7.2, and consider the first clique in the path such that . Such must exist since the destination clique has size while . Note that corresponds to the first time when new vertices are added. Meanwhile, since , we have and . We can thus take and any of size .
Hence, we can associate every -gateway in with a tuple satisfying all the conditions mentioned above: is a clique of size and intersection at most with , has size , has size , and are fully connected. Let denote the number of such tuples. Then, . The first moment of is given by
where in the first equality, counts the number of choices of for ranging from to , is the probability of being a clique, is the number of choices of , is for , and finally is the probability of being fully connected. The lemma then follows from Markov’s inequality
and a union bound over the choices of , , and . ∎
We now present the proof of Theorem 7.1.
Proof of Theorem 7.1.
By Lemmas 5.3 and 7.4, the random graph satisfies both the clique-counts properties and for and the gateway-counts properties simultaneously with probability as . Throughout the proof we assume that , , and are all satisfied.
Suppose is such that so that . Pick a constant such that
Let , , and . We define
to be the “bottleneck” set to which we will apply Lemma 5.5. Let denote the collection of cliques that are reachable from the empty clique through a path (i.e. a sequence of cliques where each adjacent pair differs by exactly one vertex) not including any clique from except possibly for the destination. The following claim, whose proof is postponed to the end of this subsection, follows easily from the definitions of and .
Claim 7.5.
1. Cliques from are not adjacent to cliques from (i.e., they differ by at least two vertices);
2. ;
3. .
Now observe from Claim 7.5 that to prove what we want, it suffices to show that starting from an appropriate state the Metropolis process does not hit any clique from in -time with probability . For a collection of cliques we write to represent the partition function restricted to the set . Since the boundary of is included in by Claim 7.5 it suffices to show that there exists a constant such that
(29) |
Given (29), Theorem 7.1 is an immediate consequence of Lemma 5.5.
Observe that we have the following inclusion,
To see this, note that every clique in can be reached from the empty clique by adding vertices one by one in any order, so that none of the intermediate cliques are from or from , except possibly the last one. Similarly, every clique in can be reached from the empty clique in the same way. (Note that cliques in , however, may not be reachable from without crossing ; for example, by adding vertices one by one to reach a clique of size , it may first reach a clique of size which is a -gateway with intersection .) Thus, we have
(30) |
The rest of the proof aims to upper bound the two ratios in Eq. 30 respectively.
For the first ratio, the key observation is that since , is dominated by cliques of intersection (almost completely outside the planted clique) or, more rigorously, those with sufficiently small intersection, say in where . Hence, we write
and combining we get
(31) |
By Lemma 5.1, the clique-counts property , and the gateway-counts property , we upper bound the first term in Eq. 31 by
(32) |
For the second term in Eq. 31, and imply that
(33) |
where the last inequality uses . Combining Eqs. 31, 7.1 and 33, we obtain
for some constant . This bounds the first ratio in Eq. 30.
For the second ratio in Eq. 30, we have from and that
(34) |
Using linearity of expectation we have that
Let
It follows that
(35) |
Meanwhile, let , and we have
(36) |
Combining Eqs. 34, 35 and 36, we get that
Let and . Then by definition we have that and . Furthermore, by Assumption 5.4 we have
Then, an application of Item 1 of Lemma 5.1 implies that
If then for . If then
since . Therefore, we have for . This bounds the second ratio in Eq. 30. Hence, we establish Eq. 29 and the theorem then follows. ∎
Proof of Claim 7.5.
The first item is obvious, since if a clique is adjacent to another clique , then by appending to the path from to we get a path from to without passing through cliques from except possibly at , since . This implies that which proves the first item.
For the second item, suppose for contradiction that there exists a clique of size that is in . Clearly and so . Then there exists a path of cliques which contain no cliques from . Let for be the first clique of size in this path; that is, for . Then, the (sub)path must contain a -gateway of size , call it , as one can choose the largest for which and set . If has intersection with the planted clique less than , then , a contradiction. Otherwise, has intersection at least which means at some earlier time, it will pass a clique of intersection exactly whose size is less than , which is again in , leading to a contradiction. This establishes the second item.
For the third one, if is a clique of intersection , then either meaning or meaning any path from to must pass through a clique of size exactly and by the second item must pass through cliques from , implying . ∎
7.2 Starting from the Empty Clique
Theorem 7.6.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for arbitrary -Lipschitz with and inverse temperature . Let denote the Metropolis process on with stationary distribution . Then there exist constants and such that for any clique of size at most , one has
In particular, the Metropolis process starting from the empty clique requires steps to reach cliques of size at least , with probability .
Proof.
By Lemmas 5.3 and 6.5, the random graph satisfies both and for with probability as . In the rest of the proof we assume that both and are satisfied.
Suppose that , for and is a sufficiently small constant to be determined. Let and . First observe that if the chain arrives at a clique in , it either hits a clique from or previously reached a clique in . This implies that
It suffices to upper bound each of the two terms respectively.
Similarly to Eq. 21, we deduce from the union bound and that
It suffices to show that for all integer , all pairs
and all clique , it holds
(37) |
for some constant .
Without loss of generality we may assume that . Let and . Write with . Consider first the pair where . Suppose that with . We deduce from Lemma 6.6, with , , , and , that
This shows Eq. 37 for the first case.
For the second case, we have the pair where . Suppose with . Also recall that . Again, we deduce from Lemma 6.6, with , , , and , that
where the last inequality follows from
since . If , then . If , then
where the last inequality is because . Hence,
This shows Eq. 37 for the second case. The theorem then follows from Eq. 37. ∎
7.3 Low Temperature Regime and Greedy Algorithm
Theorem 7.7.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for the identity function and inverse temperature . For any constant , there exists a constant such that for any clique of size at most and any constant , with probability at least the Metropolis process starting from will not reach
• either cliques of size at least ,
• or cliques of intersection at least with the planted clique,
within steps.
For a subset of cliques and an integer , let denote the collection of all cliques of size that are subsets of cliques from , i.e.,
For we define and similarly for , , etc.
Lemma 7.8.
Consider the random graph with a planted clique conditional on satisfying the property . For any with and any with we have
with high probability as .
Proof.
Notice that every clique of size has subsets of size . The lemma then follows immediately from . ∎
We now give the proof of Theorem 7.7.
Proof of Theorem 7.7.
By Lemmas 5.3 and 6.5, the random graph satisfies both and for with probability as . In the rest of the proof we assume that both and are satisfied.
First, a simple observation is that the process actually never removes vertices. Indeed, the probability of removing a vertex from the current clique in one step is at most . Since we run the Metropolis process for polynomially many steps, the probability that the chain ever removes a vertex is upper bounded by . Hence, in this low temperature regime the Metropolis process is equivalent to the greedy algorithm.
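A hedged rendering of this union bound, assuming (as for the identity Hamiltonian) that a removal is accepted with probability at most $e^{-\beta}$ and that $\beta \ge c\log n$:

```latex
% Union bound over T = n^{O(1)} steps:
\Pr\bigl[\text{some vertex is removed within } T \text{ steps}\bigr]
  \;\le\; T\, e^{-\beta} \;\le\; n^{O(1)}\cdot n^{-c} \;=\; o(1),
\qquad \text{for a large enough constant } c.
```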
Without loss of generality, we may assume that
Let and . Suppose the initial state is a clique of size for . Let and . If the chain arrives at a clique in , it either hits a clique from or previously reached a clique in . Similarly, if the chain hits , then it must also reach either or . Hence, to bound the probability of reaching either or , we only need to bound the probability of reaching or . Furthermore, since the process never removes a vertex with high probability in polynomially many steps, in the case this happens it must first reach a clique from , or , or . To summarize, we can deduce from the union bound that
We bound each of the three probabilities respectively.
For the first case, we have from the union bound and that,
Suppose . We deduce from Lemmas 5.1 and 7.8 and Lemma 6.6 (with and for the notations of Lemma 6.6) that for every integer and every clique (note that since ),
where the last inequality follows from
Next consider the second case, where we have
Suppose and so . Also recall that . We deduce from Lemmas 5.1 and 7.8 and Lemma 6.6 (with and for the notations of Lemma 6.6) that for every integer and every clique (note that since ),
where the last inequality follows from
Finally consider the third case. Again we have
Suppose and so . Also recall that . We deduce from Lemma 6.6 that for every integer and every clique ,
where the last inequality follows from
Therefore, we conclude that
as we wanted. ∎
8 Simulated Tempering
In this section, we discuss our lower bounds against the simulated tempering versions of the Metropolis process.
8.1 Definition of the dynamics
We start with the formal definition. Suppose for some we have a collection of inverse temperatures . For each , let denote an estimate of the partition function . The simulated tempering (ST) dynamics is a Markov chain on the state space , which seeks to optimize a Hamiltonian defined on , say given according to for an arbitrary vector . The transition matrix is given as follows.
• A level move: For and such that and differ by exactly one vertex,
• A temperature move: For such that ,
Some remarks are in order.
Remark 8.1.
The stationary distribution of the ST dynamics can be straightforwardly checked to be given by , for the generic Gibbs measure defined in Eq. 14. Notice that if for all then we have
and along a single temperature the ST dynamics is identical to the Metropolis process introduced in Section 5.3.
Remark 8.2.
The use of estimates of the partition function in the definition of the ST dynamics, as opposed to the original values , is naturally motivated by applications where one cannot efficiently compute the value of to decide the temperature move step.
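To fix ideas, the following is a minimal Python sketch of one ST step for a target assumed proportional to $\exp(\beta_j h(|C|))/\hat{Z}_j$. The even split between level and temperature moves is one common convention, and the names are ours; this is a sketch, not necessarily the exact proposal scheme defined above.

```python
import math
import random

def st_step(adj, clique, j, betas, h, z_hat):
    """One step of simulated tempering on (clique, temperature-index) pairs.
    Assumed target: proportional to exp(betas[j] * h(len(clique))) / z_hat[j].
    With probability 1/2 do a Metropolis level move at the current
    temperature, otherwise propose moving to a neighbouring temperature."""
    if random.random() < 0.5:
        # level move: single-vertex update at inverse temperature betas[j]
        v = random.choice(list(adj))
        s = len(clique)
        if v in clique:
            accept = math.exp(-betas[j] * (h(s) - h(s - 1)))
            if random.random() < min(1.0, accept):
                clique = clique - {v}
        elif all(v in adj[u] for u in clique):
            accept = math.exp(betas[j] * (h(s + 1) - h(s)))
            if random.random() < min(1.0, accept):
                clique = clique | {v}
    else:
        # temperature move: propose a neighbouring inverse temperature
        jp = j + random.choice([-1, 1])
        if 0 <= jp < len(betas):
            accept = (z_hat[j] / z_hat[jp]) * math.exp(
                (betas[jp] - betas[j]) * h(len(clique)))
            if random.random() < min(1.0, accept):
                j = jp
    return clique, j
```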
8.2 Existence of a Bad Initial Clique
We now present our lower bound results for the ST dynamics under worst-case initialization.
Our first result shows that the ST dynamics fails to reach intersection with the planted clique, similarly to the Metropolis process in Theorem 6.1. Interestingly, the lower bound holds for any choice of arbitrarily many temperatures and for any choice of estimators of the partition function.
Theorem 8.3.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general ST dynamics given in Section 8.1 for arbitrary , arbitrary , arbitrary inverse temperatures , and arbitrary estimates . Then there is an initialization pair of temperature and clique for the ST dynamics from which it requires -time to reach a pair of temperature and clique where the clique has intersection with the planted clique at least , with probability at least . In particular, under worst-case initialization the ST dynamics fails to recover the planted clique in polynomial time.
Proof.
Throughout the proof we assume that both and are satisfied for given by Eq. 15, which happens with probability as by Lemma 5.3.
Notice that we can assume without loss of generality that satisfies . We let . Now from the proof of Theorem 6.1 we have that for any such there exists a constant such that for all and the corresponding Gibbs measure per Eq. 14,
(38)
w.h.p. as . We now consider the set , the subset of the state space of the ST dynamics, and notice that , where is the boundary of . In particular, using (38) we conclude that w.h.p. as
(39)
Given Eq. 39, Theorem 8.3 is an immediate consequence of Lemma 5.5. ∎
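For intuition, the type of bound that makes the last step work is the following standard bottleneck estimate for a chain with stationary distribution $\pi$ (a simplified variant stated here for orientation; we do not claim it is the exact form of Lemma 5.5). If the chain is started from the conditional distribution $\pi(\cdot \mid A)$ of a set $A$ with boundary $\partial A$, then for every $t \ge 1$,
\[
\Pr\big[\tau_{\partial A} \le t\big] \;\le\; \sum_{s=0}^{t} \Pr\big[X_s \in \partial A\big] \;\le\; \frac{(t+1)\,\pi(\partial A)}{\pi(A)},
\]
since the law of $X_s$ is pointwise at most $\pi/\pi(A)$ for every $s$. In particular, if $\pi(\partial A)/\pi(A)$ is exponentially small, then by averaging there is a deterministic initial state in $A$ from which the chain needs exponentially many steps to leave $A$ with constant probability.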
Our second result concerns the ST dynamics under the additional restriction that . In this case, similarly to the Metropolis process in Theorem 7.1, we show that the ST dynamics fails to reach either -cliques or cliques with intersection at least with the planted clique. Interestingly, again, the lower bound holds for any choice of arbitrarily many temperatures of magnitude and for any choice of estimators of the partition function.
Theorem 8.4.
Let be any fixed constant. Then the random graph with a planted clique satisfies the following with probability as .
Consider the general ST dynamics given in Section 8.1 for arbitrary satisfying 5.4, arbitrary , arbitrary inverse temperatures with , and arbitrary estimates . For any constants and , there is an initialization pair of temperature and clique for the ST dynamics from which it requires -time to reach a pair of temperature and clique where
• either the clique is of size at least ,
• or the clique is of intersection at least with the planted clique,
with probability at least .
Proof.
Throughout the proof we assume that , , and are all satisfied for , which happens with probability as by Lemmas 5.3 and 7.4.
We start by following the proof of Theorem 7.1. For let be such that . By assumption we have . Pick a constant such that for all
Let , , and . Let also
Let also denote the collection of cliques that are reachable from the empty clique through a path (i.e. a sequence of cliques where each adjacent pair differs by exactly one vertex) not including any clique from except possibly for the destination.
From the proof of Theorem 7.1 we have that there exists a constant such that for all and the corresponding Gibbs measure per Eq. 14,
(40)
w.h.p. as . We now consider the set , the subset of the state space of the ST dynamics, and notice that . In particular, using (40) we conclude that w.h.p. as
(41)
Given Eq. 41, Theorem 8.4 is an immediate consequence of Lemma 5.5. ∎
8.3 Starting From the Empty Clique
Theorem 8.5.
Let be any fixed constant. Then the random graph with a planted clique satisfies the following with probability as .
Consider the general ST dynamics given in Section 8.1 for monotone -Lipschitz with , arbitrary inverse temperatures with and , and arbitrary estimates . For any constants , , and , the ST dynamics starting from will not reach within steps a pair of temperature and clique where
• either the clique is of size at least ,
• or the clique is of intersection at least with the planted clique,
with probability .
In what follows we condition on both and for , which by Lemmas 5.3 and 6.5 hold with probability as . Also, we may assume that
(42)
for all . Otherwise, for any clique of size one has
since , and . This means that with high probability the chain will never move to the inverse temperature within polynomially many steps, unless its clique already has size, say, . Since we are studying reaching cliques of size , we may assume that Eq. 42 holds for all , by removing the temperatures violating Eq. 42 as well as all larger ones, since with high probability the chain does not reach them within steps. An immediate corollary of Eq. 42 is that
(43)
Let and . By the union bound we have
(44)
We will show that for all integer , all integer , all integer , and all clique , it holds
(45)
and that for all integer and all clique , it holds
(46)
It will be helpful to consider the time-reversed dynamics and try to bound the probability of reaching when starting from a large clique . By reversibility, we have
(47)
which is an application of Fact 6.7.
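For the reader's convenience, the reversibility identity underlying this step is the following standard fact (we do not restate Fact 6.7 verbatim): if the chain with transition matrix $P$ is reversible with respect to $\pi$, then for all states $x,y$ and all integers $t \ge 0$,
\[
\pi(x)\,P^t(x,y) \;=\; \pi(y)\,P^t(y,x),
\qquad\text{hence}\qquad
P^t(x,y) \;=\; \frac{\pi(y)}{\pi(x)}\,P^t(y,x).
\]
Thus the probability of reaching a large clique from a small one in $t$ steps is controlled by a ratio of stationary masses times the probability of the time-reversed trajectory.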
Let be a constant. Introduce a random walk on with transition matrix given by
and for all ,
To be more precise, the above definition of applies when and we assume if , e.g., when and .
We now calculate the stationary distribution of on states when restricted to . We start by proving that the random walk is time-reversible. Note that the random walk introduced is clearly aperiodic, positive recurrent and irreducible. Hence, by Kolmogorov's criterion the random walk is time-reversible if and only if for any cycle in the state space, the probability that the random walk moves along the cycle in one direction equals the probability of moving in the opposite direction. Given that the minimal cycles in the finite box are simply squares of the form , it suffices to verify the criterion solely for these cycles, that is, to show that for any
which can be straightforwardly checked to be true.
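Explicitly, writing $p$ for the transition kernel of the auxiliary walk and using generic coordinates for a square with corners $(k,\ell)$, $(k+1,\ell)$, $(k+1,\ell+1)$, $(k,\ell+1)$ (the labels here are ours), the square-cycle condition reads
\[
p\big((k,\ell),(k{+}1,\ell)\big)\, p\big((k{+}1,\ell),(k{+}1,\ell{+}1)\big)\, p\big((k{+}1,\ell{+}1),(k,\ell{+}1)\big)\, p\big((k,\ell{+}1),(k,\ell)\big)
\;=\;
p\big((k,\ell),(k,\ell{+}1)\big)\, p\big((k,\ell{+}1),(k{+}1,\ell{+}1)\big)\, p\big((k{+}1,\ell{+}1),(k{+}1,\ell)\big)\, p\big((k{+}1,\ell),(k,\ell)\big),
\]
i.e., the product of transition probabilities around the square is the same in both orientations.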
We now calculate the stationary distribution. Using reversibility and the fact that is monotonically increasing in , we have for arbitrary
Furthermore, since we have again by reversibility,
Combining the above, we conclude
(48)
The following lemma shows that is stochastically dominated by the pair .
Lemma 8.6.
Let denote the Simulated Tempering process starting from some and . Let denote the stochastic process described above with parameter starting from and . Assume that satisfies the conclusion of Lemma 6.5 with parameter . Then there exists a coupling of the two processes such that for all integer it holds
In particular, for all integer it holds
Proof.
We couple and as follows. Suppose that for some integer . We will construct a coupling of and such that . With probability , the two chains both attempt to update the first coordinate, and with probability the second.
Consider first updating the first coordinate. Since the probability that is less than , and so is the probability of , we may couple and such that decreases by at most one; namely, it never happens that increases by while decreases in size. Thus, it suffices to consider the extremal case when . Since , we have and thus
So we can couple and such that either or , . Meanwhile, recall that is the set of vertices such that . Then we have
whenever by Lemma 6.5. Hence, we deduce that
So we can couple and such that either or and , as desired.
Next, consider updating the second coordinate. Since the probability that is less than , and so is the probability of , we can couple and such that it never happens that both and . This means that it suffices to consider the extremal case where . Since , we have and therefore
and similarly,
Therefore, we can couple and such that only one of the following can happen:
(i) ;
(ii) and ;
(iii) and .
Therefore, one always has under the constructed coupling. ∎
The next lemma upper bounds the -step transition probability .
Lemma 8.7.
Let denote the Markov process on described above with parameter starting from and . Then for all integer we have
where .
Proof.
We now present the proof of Theorem 8.5, assuming Lemmas 8.6 and 8.7.
Proof of Theorem 8.5.
Recall that . As will be clear later, we define
We assume that our graph satisfies the conclusions of Lemmas 5.1 and 6.5 with parameter .
Consider first Eq. 45. From Lemmas 8.6 and 8.7, we deduce that
where we recall that and , and we choose .
9 Conclusion
In this work we revisit the result of Jerrum [Jer92] that large cliques elude the Metropolis process. We extend [Jer92] by establishing the failure of the Metropolis process (1) for all planted clique sizes for any constant , (2) for arbitrary temperature and Hamiltonian vector (under worst-case initialization), and (3) for a large family of temperatures and Hamiltonian vectors (under the empty-clique initialization), and we obtain as well (4) lower bounds for the performance of the simulated tempering dynamics.
An important future direction would be to explore the generality of our proposed reversibility and birth-and-death process arguments, which allowed us to prove the failure of the Metropolis process when initialized at the empty clique. It would be interesting to see whether the proposed method can lead to MCMC lower bounds from a specific initial state in other inference settings beyond the planted clique model.
Moreover, it would be interesting to see whether our results can be strengthened even further. First, a current shortcoming is that our lower bounds for the Metropolis process initialized from the empty clique do not completely cover the case where for an arbitrary constant . While we strongly believe the result continues to hold in this case, some new ideas seem to be needed to prove it. Second, it seems interesting to study the regime where . Recall that there are polynomial-time algorithms that can find a clique of size whenever a (worst-case) graph has a clique of size , for some constant [Fei05]. If our lower bounds for the Metropolis process could be extended to the case , this would mean that for some the Metropolis process fails to find in polynomial time a clique of size when a -clique is planted in , while some other polynomial-time algorithm can find a clique of size on every (worst-case) graph which has a clique of size . Such a result, if true, would make the failure of the Metropolis process even more profound.
Acknowledgement
EM and IZ are supported by the Simons-NSF grant DMS-2031883 on the Theoretical Foundations of Deep Learning and the Vannevar Bush Faculty Fellowship ONR-N00014-20-1-2826. EM is also supported by the Simons Investigator award (622132).
References
- [ACV14] Ery Arias-Castro and Nicolas Verzelen. Community detection in dense random networks. The Annals of Statistics, 42(3):940–969, 2014.
- [AFdF21] Maria Chiara Angelini, Paolo Fachin, and Simone de Feo. Mismatching as a tool to enhance algorithmic performances of Monte Carlo methods for the planted clique model, 2021.
- [AFUZ19] Fabrizio Antenucci, Silvio Franz, Pierfrancesco Urbani, and Lenka Zdeborová. On the glassy nature of the hard phase in inference problems. Physical Review X, 9:011020, January 2019.
- [AKS98] Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3-4):457–466, 1998.
- [AWZ20] Gérard Ben Arous, Alexander S. Wein, and Ilias Zadik. Free energy wells and overlap gap property in sparse PCA. In Jacob D. Abernethy and Shivani Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 479–482. PMLR, 2020.
- [BAGJ20] Gérard Ben Arous, Reza Gheissari, and Aukosh Jagannath. Algorithmic thresholds for tensor PCA. Annals of Probability, 48:2052–2087, 2020.
- [BB20] Matthew Brennan and Guy Bresler. Reducibility and statistical-computational gaps from secret leakage. In Conference on Learning Theory, pages 648–847. PMLR, 2020.
- [BE76] B. Bollobas and P. Erdös. Cliques in random graphs. Mathematical Proceedings of the Cambridge Philosophical Society, 80(3):419–427, 1976.
- [BHK+19] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019.
- [BR04] Nayantara Bhatnagar and Dana Randall. Torpid mixing of simulated tempering on the Potts model. In SODA, volume 4, pages 478–487. Citeseer, 2004.
- [BR13] Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In Conference on learning theory, pages 1046–1066. PMLR, 2013.
- [DGGP14] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden cliques in linear time with high probability. Combinatorics, Probability and Computing, 23(1):29–49, 2014.
- [DM15] Yash Deshpande and Andrea Montanari. Finding hidden cliques of size in nearly linear time. Foundations of Computational Mathematics, 15(4):1069–1128, 2015.
- [DSC06] Persi Diaconis and Laurent Saloff-Coste. Separation cut-offs for birth and death chains. The Annals of Applied Probability, 16(4):2098–2122, 2006.
- [Fei05] Uriel Feige. Approximating maximum clique by removing subgraphs. SIAM Journal on Discrete Mathematics, 18(2):219–225, February 2005.
- [FGR+17] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. Journal of the ACM (JACM), 64(2):1–37, 2017.
- [GM75] G. R. Grimmett and C. J. H. McDiarmid. On colouring random graphs. Mathematical Proceedings of the Cambridge Philosophical Society, 77(2):313–324, 1975.
- [GZ19] David Gamarnik and Ilias Zadik. The landscape of the planted clique problem: Dense subgraphs and the overlap gap property, 2019.
- [Jer92] Mark Jerrum. Large cliques elude the metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992.
- [Kar79] Richard M Karp. Recent advances in the probabilistic analysis of graph-theoretic algorithms. In International Colloquium on Automata, Languages, and Programming, pages 338–339. Springer, 1979.
- [Kuč95] Luděk Kučera. Expected complexity of graph partitioning problems. Discrete Applied Mathematics, 57(2-3):193–212, 1995.
- [LP17] David A Levin and Yuval Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017.
- [MBC+20] Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Complex dynamics in simple neural networks: Understanding gradient flow in phase retrieval, 2020.
- [MKUZ19] Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Passed & spurious: Descent algorithms and local minima in spiked matrix-tensor models. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4333–4342. PMLR, 09–15 Jun 2019.
- [MP92] Enzo Marinari and Giorgio Parisi. Simulated tempering: a new Monte Carlo scheme. EPL (Europhysics Letters), 19(6):451, 1992.
- [MW15] Zongming Ma and Yihong Wu. Computational barriers in minimax submatrix detection. The Annals of Statistics, 43(3):1089–1116, 2015.
- [MWW09] Elchanan Mossel, Dror Weitz, and Nicholas Wormald. On the hardness of sampling independent sets beyond the tree threshold. Probability Theory and Related Fields, 143(3):401–439, 2009.
- [RF10] Dorit Ron and Uriel Feige. Finding hidden cliques in linear time. Discrete Mathematics & Theoretical Computer Science, 2010.
- [RM14] Emile Richard and Andrea Montanari. A statistical model for tensor PCA. Advances in Neural Information Processing Systems, 27, 2014.
Appendix A Deferred Proofs
Proof of Lemma 5.1.
For part (1), notice that by linearity of expectation and an elementary application of Stirling's formula, for with we have
For part (2), notice that is distributed as the number of -cliques in . Hence, a standard calculation (e.g. [BE76, Proof of Theorem 1]) proves that since
Hence, Chebyshev's inequality yields that with probability . Taking a union bound over the different values of completes the proof of this part.
Finally, part (3) follows directly from part (1), Markov’s inequality and a union bound over the possible values of . ∎
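The Chebyshev step in part (2) above is the usual second-moment argument: writing $X$ for the clique count in question (the exact deviation threshold used in the lemma may differ from the illustrative one below),
\[
\Pr\Big[\,|X - \mathbb{E}X| \ge \tfrac12\,\mathbb{E}X\,\Big] \;\le\; \frac{4\,\operatorname{Var}(X)}{(\mathbb{E}X)^2},
\]
which is $o(1)$ as soon as $\operatorname{Var}(X) = o\big((\mathbb{E}X)^2\big)$.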
Proof of Lemma 6.5.
It clearly suffices to establish this result for , i.e. for an the random graph . For any fixed , follows a Binomial distribution with trials and probability . In particular, it has mean . Hence, by Hoeffding's inequality, with probability it holds that . As there are only , the result follows from a union bound over . ∎
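For reference, the form of Hoeffding's inequality used here is the standard bound for a sum $X$ of $m$ independent $\{0,1\}$-valued random variables (the number of trials and the deviation level are as specified in the lemma):
\[
\Pr\big[\,|X - \mathbb{E}X| \ge t\,\big] \;\le\; 2\exp\!\Big(-\frac{2t^2}{m}\Big) \qquad \text{for every } t > 0.
\]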