Almost-Linear Planted Cliques Elude the Metropolis Process
Abstract
A seminal work of Jerrum (1992) showed that large cliques elude the Metropolis process. More specifically, Jerrum showed that the Metropolis algorithm cannot find a clique of size , which is planted in the Erdős-Rényi random graph , in polynomial time. Information theoretically it is possible to find such planted cliques as soon as .
Since the work of Jerrum, the computational problem of finding a planted clique in has been studied extensively, and many polynomial-time algorithms were shown to find the planted clique if it is of size , while no polynomial-time algorithm is known to work when . The computational problem of finding a planted clique of is now widely considered a foundational problem in the study of computational-statistical gaps. Notably, the first evidence of the problem’s algorithmic hardness is commonly attributed to the result of Jerrum from 1992.
In this paper we revisit the original Metropolis algorithm suggested by Jerrum. Interestingly, we find that the Metropolis algorithm actually fails to recover a planted clique of size for any constant , unlike many other efficient algorithms that succeed when . Moreover, we strengthen Jerrum’s results in a number of other ways including:
• Like many results in the MCMC literature, the result of Jerrum shows that there exists a starting state (which may depend on the instance) for which the Metropolis algorithm fails to find the planted clique in polynomial time. For a wide range of temperatures, we show that the algorithm fails when started at the most natural initial state, which is the empty clique. This answers an open problem stated in Jerrum (1992). It is rather rare to be able to show the failure of a Markov chain starting from a specific state.
• We show that the simulated tempering version of the Metropolis algorithm, a more sophisticated temperature-exchange variant of it, also fails in the same regime of parameters.
Our results substantially extend Jerrum’s result. Furthermore, they confirm recent predictions by Gamarnik and Zadik (2019) and Angelini, Fachin, de Feo (2021). Finally, they highlight the subtleties of using the sole failure of one, however natural, family of algorithms as a strong sign of a fundamental statistical-computational gap.
1 Introduction
The problem of finding, in polynomial time, large cliques in the -vertex Erdős-Rényi random graph , where each edge is present independently with probability , is a fundamental open problem in algorithmic random graph theory [Kar79]. In it is known that there is a clique of size with high probability (w.h.p.) as , and several simple polynomial-time algorithms can find a clique of size w.h.p., which is nearly half the size of the maximum clique (see e.g. [GM75]). (Note that here and everywhere we denote by the logarithm to base 2.) Interestingly, there is no known polynomial-time algorithm which is able to find a clique of size for some constant . The problem of finding such a clique in polynomial time was suggested by Karp [Kar79] and remains open to this day.
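For concreteness, the following is a minimal Python sketch (illustrative only, not part of the original analysis) of the simple greedy procedure that w.h.p. finds a clique of size roughly log2(n) in G(n, 1/2): scan the vertices in a random order and keep every vertex adjacent to all vertices kept so far.

```python
import random

def sample_gnp(n, p=0.5):
    """Sample an Erdos-Renyi graph G(n, p) as an adjacency-set dictionary."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_clique(adj):
    """Scan vertices in a random order, adding each vertex adjacent to every
    vertex already chosen.  On G(n, 1/2) this finds a clique of size roughly
    log2(n) with high probability."""
    clique = []
    order = list(adj)
    random.shuffle(order)
    for v in order:
        if all(v in adj[u] for u in clique):
            clique.append(v)
    return clique

if __name__ == "__main__":
    g = sample_gnp(512)
    print(len(greedy_clique(g)))  # typically close to log2(512) = 9
```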
Jerrum’s result and the planted clique model
Motivated by the challenge of finding a clique in , Jerrum in [Jer92] established that large cliques elude the Metropolis process in . Specifically, he considered the following Gibbs measure for
(1)
where induces a clique in the instance . Notice that since by definition assigns higher mass to cliques of larger size. Jerrum considered the Metropolis process with stationary measure , which is initialized in some clique, say of Then the process moves “locally” between cliques which differ in exactly one vertex. In more detail, every step of the Metropolis process is described as follows (see also Algorithm 1). Choose a vertex uniformly at random. If where is the current clique, then let (a “downward” step) with probability and let with the remaining probability; else if and is a clique, then let (an “upward” step); otherwise, let . For this process, Jerrum established the negative result that it fails to reach a clique of size in polynomial time for any constant . We note that, as customary in the theory of Markov chains, the failure is subject to the Metropolis process being “worst-case initialized”, that is, starting from some “bad” clique . This is a point we revisit later in this work.
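A minimal Python sketch of one such step is given below; it assumes Jerrum's Gibbs measure, in which a clique receives weight proportional to $\lambda^{|C|}$ with $\lambda \ge 1$ (so a downward move is accepted with probability $1/\lambda$). The function and variable names are ours and purely illustrative.

```python
import random

def metropolis_step(adj, clique, lam):
    """One step of the Metropolis process on cliques of a graph, for a Gibbs
    measure assumed proportional to lam**len(clique) with lam >= 1.
    `adj` maps each vertex to its set of neighbours; `clique` is a set."""
    v = random.choice(list(adj))          # pick a uniformly random vertex
    if v in clique:
        # "downward" step: remove v with probability 1/lam, else stay put
        if random.random() < 1.0 / lam:
            clique = clique - {v}
    elif all(v in adj[u] for u in clique):
        # "upward" step: v extends the current clique, always accepted
        clique = clique | {v}
    # otherwise the proposal is rejected and the clique is unchanged
    return clique
```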
The planted clique problem was introduced by Jerrum in [Jer92] in order to highlight the failure of the Metropolis process. For the planted clique model is defined by first sampling an instance of , then choosing out of the vertices uniformly at random and finally adding all the edges between them (if they did not already exist from the randomness of ). The set of chosen vertices is called the planted clique . It is perhaps natural to expect that the existence of in can assist the Metropolis process to reach cliques of larger size faster. Yet, Jerrum proved that as long as for some constant , the Metropolis process continues to fail to find a clique of size in polynomial time, for any . As he also noticed, when one can actually trivially recover from via a simple heuristic which chooses the top- degrees of the observed graph (see also [Kuč95]). In particular, one can trivially find a clique of size much larger than when . Importantly though, he never proved that the Metropolis process actually succeeds in finding large cliques when , leaving open a potentially important piece of the performance of the Metropolis process in the planted clique model. In his words from the conclusion of [Jer92]:
“If the failure of the Metropolis process to reach -size cliques could be shown to hold for some , it would represent a severe indictment of the Metropolis process as a heuristic search technique for large cliques in a random graph.”
In this work, we seek to investigate the performance of the Metropolis process for all .
The planted clique conjecture
Following the work of Jerrum [Jer92] and Kucera [Kuč95] the planted clique model has received a great deal of attention and became a hallmark of a research area that is now called study of statistical-computational gaps. The planted clique problem can be phrased as a statistical or inference problem in the following way: given an instance of recover the planted clique vertices. It is impossible to recover the clique when it is of size for any constant (see e.g. [ACV14]), but possible information theoretically (and in quasi-polynomial time ) whenever for any constant (see e.g. the discussion in [FGR+17]). Meanwhile, multiple polynomial-time algorithms have been proven to succeed in recovering the planted clique but only under the much larger size [AKS98, RF10, DGGP14, DM15]. The intense study of the model, as well as the failure to find better algorithms, has lead to the planted clique conjecture, stating that the planted clique recovery task is impossible in polynomial-time, albeit information-theoretically possible, whenever . The planted clique conjecture has since lead to many applications, a highlight of which is that it serves as the main starting point for building reductions between a plethora of statistical tasks and their computational-statistical trade-offs (see e.g. [BR13, MW15, BB20]).
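The simplest polynomial-time method in this larger-size regime is the degree heuristic of [Kuč95], which can be sketched as follows. This is an illustrative sketch only, under the assumption that the planted clique size is at least a large constant times $\sqrt{n\log n}$; the function name is ours.

```python
def top_degree_recovery(adj, k):
    """Return the k vertices of largest degree.  When the planted clique has
    size k >= c * sqrt(n * log n) for a large enough constant c, these are
    exactly the planted vertices with high probability [Kuč95].
    `adj` maps each vertex to its set of neighbours."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])
```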
Unfortunately, because of the average-case nature of the planted clique model, a complexity theory explanation is still lacking for the planted clique conjecture. For this reason, researchers have so far mainly focused on supporting the conjecture by establishing the failure of restricted families of polynomial-time methods, examples of which are Sum-of-Squares lower bounds [BHK+19], statistical-query lower bounds [FGR+17] and low-temperature MCMC lower bounds [GZ19]. Because of the focus on establishing such restricted lower bounds, the vast majority of works studying the planted clique conjecture cite the result of Jerrum [Jer92] on the failure of the Metropolis process as the “first evidence” for its validity. Note though that given what we discussed above, such a claim for Jerrum’s result can be problematic. Indeed, recall that Jerrum did not establish the success of the Metropolis process when in reaching cliques of size , let alone identifying the planted clique. That means that the Metropolis process could in principle simply fail to recover the planted clique for all , offering no evidence for the planted clique conjecture.
In fact, in 2019, Gamarnik and Zadik [GZ19] studied the performance of various MCMC methods (not including the Metropolis process described above) for the planted clique model and conjectured their failure to recover the planted clique when , that is, failure much beyond the threshold. For this reason, they raised again the question of whether Jerrum’s original Metropolis process actually fails for larger than . Later work by Angelini, Fachin and de Feo [AFdF21] simulated the performance of the Metropolis process and predicted its failure to recover the planted clique again much beyond the threshold. Both of these results suggest that the Metropolis process may not be able to recover the planted clique for some values of . We consider the absence of a negative or positive result on whether the Metropolis process of [Jer92] recovers the planted clique to be a major gap in the literature on the planted clique conjecture, which we investigate in this work.
Empty clique initialization
A common deficiency of many Markov chain lower bounds is their failure to establish lower bounds for starting from any particular state. Indeed, many of the lower bounds in the theory of Markov chains including spectral and conductance lower bounds are proved with high probability or in expectation over the stationary measure of the chain. Of course, since such lower bounds usually provide evidence that it is hard to sample from the stationary distribution, the deficiency of the lower bound is that it is hard to find a state where the lower bound applies! This is indeed the case for the specific example of the Metropolis process in [Jer92], where, a priori, other than the empty set, we do not know which subsets of the nodes form a clique.
Jerrum noted this deficiency in his paper [Jer92]:
“The most obviously unsatisfactory feature of Theorems … is that these theorems assert only the existence of a starting state from which the Metropolis process takes super-polynomial time to reach a large clique … It seems almost certain that the empty clique is a particular example of a bad starting state, but different proof techniques would be required to demonstrate this.”
In this work we develop a new approach based on comparison with birth and death processes allowing us to prove lower bounds starting from the empty clique. We note that previous work on birth and death processes established Markov chain bounds starting from a specific state [DSC06].
2 Main Contribution
We now present our main contributions on the performance of the Metropolis process for the planted clique model where for some constant . Our results hold for the Metropolis process associated with any Gibbs measure defined as follows. For an arbitrary Hamiltonian vector and arbitrary let
(2)
where by we denote the cliques of . In some results, we require a small degree of regularity from the vector , which for us means that it satisfies and is 1-Lipschitz in the sense that
(3)
We refer to a vector satisfying these two conditions simply as being “regular”. The regularity property is needed for technical reasons, as it allows us to appropriately bound the transition probabilities of the Metropolis process between small cliques. Notice that is trivially regular and corresponds to the Gibbs measure and Metropolis process considered by Jerrum, per (1).
Our first theorem is a very general lower bound which holds under worst-case initialization for all
Theorem 2.1 (Informal version of Theorem 6.1 and Theorem 7.1).
Let for any .
I. For arbitrary and arbitrary inverse temperature , for any constant there exists a “bad” initialization such that the Metropolis process requires time to reach intersection with the planted clique.
II. For arbitrary regular and arbitrary inverse temperature , for any constant there exists a “bad” initialization such that the Metropolis process requires time to reach a clique of size .
One way to think about the two parts of Theorem 2.1 is in terms of the statistical failure and the optimization failure of the Metropolis algorithm. The first part establishes the statistical failure as the algorithm cannot even find vertices of the planted clique. The second part shows that it fails as an optimization algorithm: the existence of a huge planted clique still does not improve the performance over the level.
Note that the second part extends the result of [Jer92] to all when . (The case is proven below in Theorem 2.3, since for that low temperature the process behaves like the greedy algorithm.) In Jerrum’s words, our result reveals “a severe indictment of the Metropolis process in finding large cliques in random graphs”; even a -size planted clique does not help the Metropolis process to reach cliques of size in polynomial time. At a technical level, the proof is an appropriate combination of the bottleneck argument used by Jerrum in [Jer92], which focuses on comparing cliques based on their size, and a separate bottleneck argument comparing cliques based on how much they intersect the planted clique.
Our next result concerns the case where the Metropolis process is initialized from the empty clique. We obtain, for all , the failure of the Metropolis process starting from any -size clique (including the empty clique). In particular, this answers in the affirmative the question from [Jer92] for all .
Theorem 2.2 (Informal version of Theorem 6.2 and Theorem 7.6).
Let for any . Then for arbitrary regular and arbitrary inverse temperature , the Metropolis process started from any clique of size (in particular the empty clique) requires time to reach a clique which for some constants either
• has at least intersection with the planted clique or,
• has size at least .
The proof of Theorem 2.2 is based on the expansion properties of all -cliques of . The expansion properties allow us to compare the Metropolis process to a one-dimensional birth and death process that keeps track of the size of the clique (or the size of the intersection with the planted clique). The analysis of this process is based on a time-reversal argument.
One can wonder whether a lower bound can also be established in the case when we start from the empty clique. We partially answer this by obtaining the failure of the Metropolis process starting from the empty clique in the case where and .
Theorem 2.3 (Informal version of Theorem 7.7).
Let for any . For and arbitrary inverse temperature , the Metropolis process started from any clique of size (in particular the empty clique) requires time to reach a clique which for some constants either
• has at least intersection with the planted clique or,
• has size at least .
The key idea here is that for if then with high probability the Metropolis process never removes vertices, so it is in fact the same as the greedy algorithm. This observation allows for a much easier analysis of the algorithm.
In a different direction, Jerrum in [Jer92] asked whether one can extend the failure of the Metropolis process to the failure of simulated annealing on finding large cliques in random graphs. We make a step also in this direction, by considering the simulated tempering (ST) version of the Metropolis process [MP92]. Simulated tempering is a celebrated Monte Carlo scheme originating in the physics literature that considers a Gibbs measure, say the one in (2), at different temperatures , in other words considers the Gibbs measures . Then it runs a Metropolis process on the product space between the set of temperatures and the Gibbs measures, which allows it to modify the temperature during its evolution and to interpolate between the different (for the exact definitions see Section 8.1). The ST dynamics have been observed extensively in practice to outperform their single temperature Metropolis process counterparts but rigorous analysis of these processes is rather rare, see [BR04].
It turns out that our “bad” initialization results extend in a straightforward manner to the ST dynamics.
Theorem 2.4 (Informal version of Theorem 8.3, and Theorem 8.4).
Let for any .
• For arbitrary , arbitrary and arbitrary sequence of inverse temperatures , there exists a “bad” initialization such that the ST dynamics require time to reach intersection with the planted clique, for some constant .
• For arbitrary regular , arbitrary and arbitrary sequence of inverse temperatures with , there exists a “bad” initialization such that the ST dynamics require time to reach a clique of size , for some constant .
The key idea behind the proof of Theorem 2.4 is that the bottleneck set considered in the proof of Theorem 2.1 is “universal”, in the sense that the same bottleneck set applies to all inverse temperatures . For this reason, the same bottleneck argument can be applied to simulated tempering that is allowed to change temperatures during its evolution.
On top of this, we establish a lower bound on the ST dynamics, when started from the empty clique.
Theorem 2.5 (Informal version of Theorem 8.5).
Let for any . For arbitrary regular and increasing , arbitrary , and an arbitrary sequence of inverse temperatures with , the ST dynamics started from the pair of the empty clique and the temperature require time to reach a clique which for some constant either
• has at least intersection with the planted clique or,
• has size at least .
2.1 Further Comparison with Related Work
Comparison with [AFdF21]
As mentioned in the Introduction, the authors of [AFdF21] predicted the failure of the Metropolis process for the Gibbs measure (1) to recover the planted clique. Specifically, based on simulations they predicted the failure for all for , though they comment that the exact predicted threshold could be an artifact of finite-size effects. In this work, using Theorem 2.1 we establish that (worst-case initialized) the Metropolis process fails for all , confirming their prediction but for . In the same work, the authors suggest studying instead the Metropolis process for another Gibbs measure of the form (2), which they call BayesMC. Their suggested Gibbs measure for a specific value of matches a (slightly perturbed) version of the posterior of the planted clique recovery problem. The authors predict based on simulations that by appropriately tuning (and “mismatching”) , the Metropolis chain now recovers the planted clique for all . In this work, we partially refute this prediction using Theorem 2.1, as (worst-case initialized) the Metropolis process for any Gibbs measure (2), including the mismatched posterior that [AFdF21] considers, fails to recover the planted clique for all .
MCMC underperformance in statistical inference
Our lower bounds show the suboptimality of certain natural MCMC methods in inferring the planted clique in a regime where simple algorithms work. Interestingly, this agrees with a line of work establishing the suboptimality of MCMC methods in inferring hidden structures in noisy environments. Such a phenomenon has been generally well understood in the context of tensor principal component analysis [RM14, BAGJ20], where Langevin dynamics and gradient descent on the empirical landscape are known to fail to infer the hidden tensor, when simple spectral methods succeed. Moreover, this suboptimality has been recently observed through statistical physics methods for other models including sparse PCA [AFUZ19], the spiked matrix-tensor model [MKUZ19] and phase retrieval [MBC+20]. Finally, it has also been observed for a different family of MCMC methods but again for the planted clique model in [GZ19].
3 Proof Techniques and Intuitions
In this section, we offer intuition regarding the proofs of our results. In the first two subsections we provide intuition behind our worst-case initialization results for the Metropolis process (Theorem 2.1) and the ST dynamics (Theorem 2.4). In the following one we discuss our results on the Metropolis process and ST dynamics with the empty clique initialization (Theorems 6.2 and 7.6).
3.1 Worst-case Initialization for Reaching Large Intersection
We start with discussing the lower bound for obtaining intersection with the planted clique. We employ a relatively simple bottleneck argument, based on Lemma 5.5.
For let us denote by the number of cliques in which have size and intersect the planted clique in exactly vertices. Our starting point towards building the bottleneck is the following simple observation. For any and for a constant we have w.h.p. as
(4)
In words, the number of -cliques of intersection with the planted clique is a quasi-polynomial factor smaller than the number of -cliques which are disjoint from the planted clique. Indeed, at the first-moment level it can be checked to hold and a standard second moment argument gives the result (both results are direct outcomes of Lemma 5.1). Notice that this property holds for all .
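For the reader's convenience, a minimal version of the first-moment computation behind this comparison is the following, in notation introduced only for this display: $K$ denotes the planted clique size and $\mathcal{C}_{k,i}$ the set of $k$-cliques meeting the planted clique in exactly $i$ vertices.

```latex
% Edges inside the planted clique come for free; the remaining
% \binom{k}{2}-\binom{i}{2} pairs are each present with probability 1/2.
\mathbb{E}\bigl[|\mathcal{C}_{k,i}|\bigr]
  \;=\; \binom{K}{i}\binom{n-K}{k-i}\,2^{-\binom{k}{2}+\binom{i}{2}},
\qquad
\frac{\mathbb{E}\bigl[|\mathcal{C}_{k,i}|\bigr]}{\mathbb{E}\bigl[|\mathcal{C}_{k,0}|\bigr]}
  \;=\; \frac{\binom{K}{i}\binom{n-K}{k-i}}{\binom{n-K}{k}}\,2^{\binom{i}{2}} .
```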
Now let us assume to start that . In that case the Gibbs measure is simply the uniform measure over cliques of . For let be the subset of the cliques with intersection with the planted clique at most . Using standard results (see Lemma 5.5), to prove our hitting time lower bound it suffices to show that for any with constant
(5) |
w.h.p. as . Indeed, given such a result, based on Lemma 5.5 there is an initialization of the Metropolis process in from which it takes quasi-polynomial time to reach the boundary of , which consists exactly of the cliques of intersection with the planted clique.
Now another first moment argument (an outcome of part (3) of Lemma 5.1) allows us to conclude that, since is not too large, the only cliques of intersection with the planted clique satisfy for small enough . In other words, unless . But now, using also (4), we have
and the result follows.
Now, the interesting thing is that the exact same calculation works for arbitrary and arbitrary Hamiltonian vector . Consider as before the same subset of cliques . It suffices to show
(6) |
But as before
and the general result follows.
Interestingly, this bottleneck argument transfers easily to a bottleneck for the ST dynamics. The key observation is that to construct the bottleneck set for the Metropolis process, we used the same bottleneck set for all inverse temperatures and Hamiltonian vectors . Now the ST dynamics are a Metropolis process on the enlarged product space of the temperatures times the cliques. In particular, the stationary distribution of the ST dynamics is simply a mixture of the Gibbs distributions , say with weights (see Section 8.1 for the exact choice of the weights , though they are not essential for the argument). Now, using (6) and the fact that the boundary operator satisfies , we immediately have
(7) |
The result then follows again using Lemma 5.5.
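For completeness, the mixture computation alluded to above can be spelled out as follows, writing the ST stationary measure as a mixture over the temperatures with weights $w_j$ (a sketch in our own notation, with $A$ the common bottleneck set of cliques).

```latex
% The ST stationary measure is the mixture
%   \pi_{\mathrm{ST}} = \sum_{j=1}^{m} w_j \,(\delta_j \otimes \mu_{\beta_j}),
% and the same clique bottleneck A is used at every temperature, so
\frac{\pi_{\mathrm{ST}}\bigl(\partial A\times[m]\bigr)}
     {\pi_{\mathrm{ST}}\bigl(A\times[m]\bigr)}
 \;=\; \frac{\sum_{j} w_j\,\mu_{\beta_j}(\partial A)}
            {\sum_{j} w_j\,\mu_{\beta_j}(A)}
 \;\le\; \max_{1\le j\le m}\frac{\mu_{\beta_j}(\partial A)}{\mu_{\beta_j}(A)},
% using that \sum_j w_j a_j \le (\max_j a_j/b_j)\sum_j w_j b_j
% for nonnegative a_j, b_j.
```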
3.2 Worst-case Initialization for Reaching Large Cliques
The worst-case initialization lower bound for reaching cliques is also based on a bottleneck argument. This time the argument is more involved and is a combination of the bottleneck argument by Jerrum in [Jer92], which worked for , and the bottleneck argument used in the previous subsection for reaching large intersection with the planted clique, which worked for all .
We first need some notation. For we denote by the set of -cliques of intersection with the planted clique and the set of all cliques of size . We also define the set of -cliques of intersection less than with the planted clique and analogously by the set of cliques of size less than and intersection with the planted clique
We first quickly remind the reader of the argument of Jerrum. Recall the notion of a -gateway clique from [Jer92], which informally is the last clique of its size in some path starting from the empty clique and ending at a clique of size (see Section 7 for the exact definition). For we denote by the set of -gateway cliques. Importantly, for any , any path from the empty clique to a -clique crosses some -gateway of size . Jerrum’s bottleneck argument in [Jer92] is then based on the observation that assuming for a small enough constant, if and then
(8) |
Unfortunately, such a bottleneck argument is hopeless if since in that case most cliques of size at most are fully included in the planted clique and the ratio trivializes.
We identify the new bottleneck by further leveraging the “intersection axis” with the planted clique. Our first observation is that a relation like (8) actually holds for all if the cliques are restricted to have low intersection with the planted clique, i.e., for all , if for a small enough constant and defined as above, then it holds
(9) |
The second observation is that for the Metropolis process to hit a clique of size , one needs to hit either , that is a clique of size less than and intersection with the planted clique, or a clique in , that is a -gateway of size and intersection less than . Set to be this target set, which we hope is “small” enough to create a bottleneck.
We now make the following construction, which has boundary included in the target set . Consider the set of all cliques reachable by a path that starts from the empty clique and uses only cliques not included in , except maybe for the destination. It is easy to see that , and because of the inclusion of -gateways in the definition of one can also see that no -clique is in . Therefore, using Lemma 5.5, it suffices to show that w.h.p.
(10) |
To show (10) we observe that ; that is, contains all cliques of size and intersection less than with the planted clique, as well as all cliques of size less than that are disjoint from the planted clique. Indeed, it is straightforward that one can reach these cliques by a path from the empty clique without using cliques from , besides maybe the destination.
A final calculation then gives
(11) |
The first term is quasi-polynomially small according to (9). For the second term, notice that from equation (5) of the previous subsection, for all it holds
Now using a first and second moment argument mostly based on Lemma 5.1 we prove that for all it holds w.h.p.
Combining the above with a small case analysis allows us to conclude
Now since and we can always choose small enough but constant so that is at most a small enough constant and in particular
This completes the proof overview for the failure of the Metropolis process.
For the ST dynamics notice that the only way the value of is used for the construction of the bottleneck is to identify a value of so that the term is small enough, which then allows us to choose the values of . But now, if we have a sequence of inverse temperatures with , we can choose a “universal” so that, for all , is small enough, leading to a “universal” bottleneck construction for all . The proof then follows exactly the proof for the ST failure described in the previous subsection.
3.3 Failure when Starting from the Empty Clique
Here we explain the failure of the Metropolis process in the high temperature regime when starting from the empty clique. We show in Theorems 6.2 and 7.6 that, when starting from the empty clique which is the most natural and common choice of initialization, the Metropolis process fails to reach either cliques of intersection with the planted clique at least for any small constant , or cliques of size for any small constant .
One important observation is that since we are only considering when the process reaches either intersection or size , we may assume that the process will stop and stay at the same state once hitting such cliques. In particular, this means we can exclude from the state space all cliques of intersection and all cliques of size , and consider only cliques such that
(12) |
Indeed, the geometry of the state space restricted to these cliques is what really matters for whether the Metropolis process starting from the empty clique can reach our desired destinations or not. In particular, for sufficiently small (say, ), most cliques satisfying Eq. 12 will have intersection and also size . Note that this partially explains why having a large planted clique of size does not help much for the tasks of interest, since for the restricted state space (those satisfying Eq. 12) most cliques do not have vertices from the planted clique and so any constant does not help.
The key property that allows us to establish our result is a notion we call the “expansion” property. Observe that, for a clique of size with intersection , the expected number of vertices which can be added to to become a larger clique is ; this is also the number of common neighbors of vertices from . Via the union bound and a concentration inequality, one can easily show that in fact, for all cliques with the number of common neighbors is concentrated around its expectation with constant multiplicative error; see Definitions 6.4 and 6.5 (where we actually only need the lower bound). This immediately implies that
which allows us to track how the size of cliques evolves for the Metropolis process, under the assumption that the intersection is always .
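To make the expectation in this discussion concrete, the following display records the exact expected number of vertices that can be appended to a clique $C$ of size $\ell$ with intersection $i$; the notation is introduced only here, with $K$ the planted clique size and $N^{*}(C)$ the set of common neighbours of $C$.

```latex
% A planted vertex outside C is adjacent to the i planted vertices of C for
% free and must hit the remaining \ell - i; a non-planted vertex outside C
% must hit all \ell vertices of C:
\mathbb{E}\bigl[|N^{*}(C)|\bigr]
  \;=\; (K-i)\,2^{-(\ell-i)} \;+\; (n-K-\ell+i)\,2^{-\ell}.
```

For $i=0$ and $K,\ell\ll n$ this is approximately $n\,2^{-\ell}$, which is the quantity driving the drift of the clique-size process.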
To actually prove our result, it will be helpful to consider the time-reversed dynamics and argue that when starting from cliques of intersection or cliques of size , it is unlikely to reach the empty clique. Suppose we have the identity Hamiltonian function for simplicity. Consider as an example the probability of hitting cliques of large intersection as in Theorem 6.2. Recall that is the collection of cliques of size for and of intersection . Then by reversibility we have that for all ,
Notice that as . Our intuition is that in fact, for any clique of size , the probability of reaching the empty clique when starting from is at most where ; that is, for all integer and all clique of size ,
(13) |
and hence we obtain
where the last inequality is because most cliques have intersection and size .
The proof of Eq. 13 utilizes the expansion property mentioned above to analyze the time-reversed dynamics for . More specifically, we introduce an auxiliary birth and death process on which is stochastically dominated by , in the sense that
Through step-by-step coupling it is easy to see that always, and thus,
This allows us to establish Eq. 13 by studying the much simpler process . Furthermore, we assume the process does not go beyond value (in fact, for any fixed constant ) so that we are in the regime where the expansion property holds; this is the reason appears.
The same approach works perfectly for bounding the probability of hitting cliques of size when starting from the empty clique, as in Theorem 7.6. For the low-temperature Metropolis process with in Theorem 7.7, we also need the additional observation that the process in fact does not remove any vertex within polynomially many steps and so it is equivalent to a greedy algorithm. In particular, we can apply the same approach to argue that the process never reaches cliques of size that are subsets of cliques with large intersection or large size, which have much smaller measure. For the ST dynamics in Theorem 8.5, the same approach also works but we need a more sophisticated auxiliary process for the pair of clique size and inverse temperature, along with more complicated coupling arguments.
4 Organization of Main Body
The rest of the paper is organized as follows. In Section 5 we introduce the needed definitions and notation for the formal statements and proofs. In Section 6 we present our lower bounds for reaching a clique of intersection with the planted clique. First we present the worst-case initialization result and then the case of empty clique initialization. Then in Section 7 we discuss our lower bounds for reaching a clique of size . As before, we first present the worst-case initialization result, and then discuss the empty clique initialization ones. Finally, in Section 8 we present our lower bounds for the simulated tempering dynamics.
5 Getting Started
For , let be the set of all non-negative integers which are at most . Throughout the paper, we use to represent logarithm to the base , i.e., for .
We say an event holds with high probability (w.h.p.) if .
5.1 Random Graphs with a Planted Clique
Let denote the random graph on vertices where every pair is an edge with probability independently. For , we denote by the random graph with a planted -clique, where a subset of out of vertices is chosen uniformly at random and the random graph is obtained by taking the union of and the -clique formed by those vertices.
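A minimal Python sketch of this sampling procedure (illustrative names only, not part of the formal development):

```python
import random

def sample_planted_clique(n, K, p=0.5):
    """Sample G(n, p) and plant a clique on K uniformly random vertices.
    Returns (adjacency dict, set of planted vertices)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    planted = set(random.sample(range(n), K))
    for u in planted:                     # add all edges inside the planted set
        for v in planted:
            if u != v:
                adj[u].add(v)
    return adj, planted
```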
Let be the collection of all cliques of an instance of graph , and for with let
We also define for convenience when or . Furthermore, let
We also define and .
For with , let the number of -cliques in with . Similarly, when or .
Lemma 5.1.
Let be a constant and consider the random graph with a planted clique. Fix any absolute constant .
(1) For any with parameter and any with parameter , it holds
w.h.p. as .
(2) For and any with , it holds
w.h.p. as .
(3) For any with and any satisfying the inequality , it holds
w.h.p. as .
The proof of the lemma is deferred to the Appendix.
Definition 5.2 (Clique-Counts Properties and ).
Let be an arbitrary constant.
(1) (Upper Bounds) We say the random graph with a planted clique satisfies the property if the following is true: For all integers with , it holds
in particular, for and where , it holds
(2) (Lower Bounds) We say the random graph with a planted clique satisfies the property if the following is true: For every integer with , it holds
Lemma 5.3.
For any constant and any constant , the random graph with a planted clique satisfies both and with probability as .
Proof.
Follows immediately from Lemma 5.1, Markov’s inequality, and the union bound. ∎
5.2 Hamiltonian and Gibbs Measure
For given , let be an arbitrary function. For ease of notation we write , and thus the function is identified with the vector . Given an -vertex graph , consider the Hamiltonian function where . For , the corresponding Gibbs measure is defined as
(14) |
Let be the partition function given by
Furthermore, let
Assumption 5.4.
We assume that the Hamiltonian satisfies
(a) ;
(b) is -Lipschitz, i.e., for .
5.3 Metropolis Process and the Hitting Time Lower Bound
In this work, we study the dynamics of the Metropolis process with respect to the Gibbs measure defined in (14). The Metropolis process is a Markov chain on , the space of all cliques of . The Metropolis process is described in Algorithm 2.
The Metropolis process is an ergodic and reversible Markov chain, with the unique stationary distribution except in the degenerate case when and is the complete graph; see [Jer92].
The following lemma is a well-known fact for lower bounding the hitting time of some target set of a Markov chain using conductance; see [MWW09, Claim 2.1], [LP17, Theorem 7.4] and [AWZ20, Proposition 2.2]. We rely crucially on this lemma for our worst-case initialization results.
Lemma 5.5.
Let be the transition matrix of an ergodic Markov chain on a finite state space with stationary distribution . Let be a set of states and let be a set of boundary states for such that for all and . Then for any there exists an initial state such that
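One standard form of such a conductance-type hitting-time bound (cf. [MWW09, Claim 2.1]), stated here only as a guide for the reader and in our own notation, with $\tau_B$ the hitting time of $B$:

```latex
% There exists a state x in A \setminus B such that, for every t >= 1,
\Pr_{x}\bigl[\tau_{B} \le t\bigr] \;\le\; \frac{t\,\pi(B)}{\pi(A)}.
% Sketch: start the chain from \pi conditioned on A; its law at any step s
% is bounded pointwise by \pi/\pi(A), so \Pr[X_s \in B] \le \pi(B)/\pi(A).
% A union bound over s \le t and averaging over the starting state in A
% yield the display.
```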
6 Quasi-polynomial Hitting Time of Large intersection
6.1 Existence of a Bad Initial Clique
Theorem 6.1.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for arbitrary and arbitrary inverse temperature . There exists a constant and an initialization state for the Metropolis process from which it requires at least steps to reach a clique of intersection with the planted clique at least , with probability at least . In particular, under the worst-case initialization it fails to recover the planted clique in polynomial-time.
Proof.
Notice that we can assume without loss of generality that the constant satisfies . We pick
(15) |
Note that since . Then, by Lemma 5.3 we know that the random graph satisfies both properties and simultaneously with probability as . In the rest of the proof we assume that both and hold.
For any , let . It suffices to show that there exists a constant such that
(16) |
Indeed, given Eq. 16, Theorem 6.1 is an immediate consequence of Lemma 5.5.
By the property we have that for all where
Hence, we get from the definition of the restricted partition function that
where the last inequality again follows from . We define
Thus, we have that
(17) |
Meanwhile, we have
We deduce from and the property that
(18) |
6.2 Starting from the Empty Clique
In this section, we strengthen Theorem 6.1 for a wide range of temperatures by showing that the Metropolis dynamics still fails to obtain a significant intersection with the planted clique when starting from the empty clique (or any clique of sufficiently small size), a natural choice of initial configuration in practice.
6.2.1 Our Result
Theorem 6.2.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for arbitrary -Lipschitz with and inverse temperature . Let denote the Metropolis process on with stationary distribution . Then there exist constants and such that for any clique of size at most , one has
In particular, the Metropolis process starting from the empty clique requires steps to reach cliques of intersection with the planted clique at least , with probability . As a consequence, it fails to recover the planted clique in polynomial time.
We proceed with the proof of the theorem.
6.2.2 Key Lemmas
We start by tracking how the size of the clique changes during the process. It is also helpful to consider the reversed process and show that it is unlikely to hit the empty clique when starting from some clique of size .
Definition 6.3.
For a graph and a subset of vertices, we say a vertex is fully adjacent to if is adjacent to all vertices in ; equivalently, where denotes the set of neighboring vertices of in the graph . Let denote the set of all vertices in that are fully adjacent to ; equivalently, is the set of all common neighbors in of all vertices from .
Definition 6.4 (Expansion Property ).
Let be an arbitrary constant. We say the random graph with a planted clique satisfies the expansion property if the following is true: For every with , it holds
The following lemma establishes the desired expansion property for the cliques of size less than in .
Lemma 6.5 (“Expansion Lemma”).
For any constant and any constant , the random graph with a planted clique satisfies the expansion property with probability as .
The proof of the lemma is deferred to the Appendix.
Lemma 6.5 is quite useful for us, as it allows us to obtain the following bounds on the size transitions for the Metropolis process:
(19)
(20)
The proof also makes use of the following delicate lemma.
Lemma 6.6.
Consider the random graph with a planted clique conditional on satisfying the property and the expansion property for some fixed constant . Let be integers with . Denote also and . For any we have
where and
We postpone the proof of Lemma 6.6 to Section 6.2.4 and first show how it can be used to prove Theorem 6.2. On a high level, we bound the probability of hitting by studying how the size of the clique evolves during the process up to size . The term represents the approximation error when the clique goes beyond size ; in particular, when the destination clique size is at most , we have and this error is zero. Meanwhile, the term corresponds to the fact that the number of cliques of size and intersection with the planted clique is much smaller than the total number of cliques of size , and hence reaching intersection is very unlikely.
6.2.3 Proof of Theorem 6.2, given Lemma 6.6
Proof of Theorem 6.2.
Let and recall . As will be clear later, we shall choose
By Lemmas 5.3 and 6.5, the random graph satisfies both and with probability as , for the choice of given above. In the rest of the proof we assume that both and are satisfied.
Suppose that . By the union bound we have
(21) |
Now let us fix some . By Lemma 6.6,
where we have used that . Write for shorthand
So we have
(22) |
Given (22) it suffices to show that for all there exists such that uniformly for all values of interest of the parameters we have . Indeed, then by (21) we have
and Theorem 6.2 follows e.g. for
We now construct the desired function . If , then and . Meanwhile, we have
So .
If , then and we have
where the last inequality follows from
and also
since and . Meanwhile, we have
So, we deduce that
Hence, in all cases for
This completes the proof of the theorem. ∎
6.2.4 Proof of Lemma 6.6
In this subsection we prove the crucial Lemma 6.6. Throughout this subsection we assume that the property and the expansion property hold for some fixed constant . First, recall that . We have from that
(23) |
For now, we fix a and focus on bounding . The key idea to bound this probability is to exploit the reversibility of the Metropolis process. The following two standard facts are going to be useful.
Fact 6.7 ([LP17]).
If is the transition matrix of a reversible Markov chain over a finite state space with stationary distribution , then for all and all integer it holds
Fact 6.8 ([LP17]).
For a birth-death process on with transition probabilities
the stationary distribution is given by
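In the standard notation for a birth-death chain on $\{0,\dots,N\}$, with up-probability $p_k$ and down-probability $q_k$ at state $k$ (holding otherwise), this distribution takes the form

```latex
% Detailed balance \pi(k)\,p_k = \pi(k+1)\,q_{k+1} gives
\pi(k) \;=\; \frac{1}{Z}\prod_{j=1}^{k}\frac{p_{j-1}}{q_{j}},
\qquad
Z \;=\; \sum_{k=0}^{N}\,\prod_{j=1}^{k}\frac{p_{j-1}}{q_{j}} .
```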
Now, notice that using the time-reversed dynamics it suffices to bound the probability of reaching a small clique when starting from a large clique . Indeed, by reversibility we have
(24) |
which is an application of Fact 6.7.
We introduce a birth-death process on denoted by with transition matrix given by the following transition probabilities:
Denote the stationary distribution of by , which is supported on . The process serves as an approximation of ; note that itself is not a Markov process.
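As an illustration of how such a one-dimensional comparison chain is analyzed, the following Python sketch computes, by dynamic programming, the probability that a generic birth-death chain hits state 0 within $t$ steps. The transition probabilities used in the example are placeholders, not the ones arising from the expansion property.

```python
def hit_zero_within(p, q, start, t):
    """Probability that a birth-death chain on {0, ..., N}, moving up with
    probability p[k] and down with probability q[k] from state k (holding
    otherwise), visits state 0 within t steps when started at `start`.
    Computed by making state 0 absorbing and iterating the one-step
    recursion t times."""
    N = len(p) - 1
    hit = [1.0] + [0.0] * N              # "within 0 steps": only state 0 counts
    for _ in range(t):
        new = [1.0]                      # state 0 stays absorbed
        for k in range(1, N + 1):
            up = p[k] * hit[k + 1] if k < N else 0.0
            stay_prob = 1.0 - q[k] - (p[k] if k < N else 0.0)
            new.append(up + q[k] * hit[k - 1] + stay_prob * hit[k])
        hit = new
    return hit[start]

# Placeholder parameters: an upward-biased chain, like the comparison process,
# is very unlikely to fall back to 0 from a moderately large state.
N = 30
p = [0.9] * (N + 1)
q = [0.0] + [0.1] * N
print(hit_zero_within(p, q, start=10, t=10_000))
```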
The following lemma shows that is stochastically dominated by . The proof of this fact is essentially based on the expansion property and the bounds on the size transitions of derived in the proof below, described in Eqs. 19 and 20.
Lemma 6.9.
Let denote the Metropolis process starting from some . Let denote the birth-death process described above with parameter starting from . Then there exists a coupling of the two processes such that for all integer it holds
In particular, for all integer it holds
Proof.
We couple and as follows. Suppose that for some integer . We will construct a coupling of and such that . Notice that the following probability inequality is a straightforward corollary of that.
Since the probability that is less than , and so is the probability of , we may couple and such that decreases by at most one; namely, it never happens that increases while decreases in size. Thus, it suffices to consider the extremal case when . We have
(25) |
The next lemma upper bounds the -step transition probability .
Lemma 6.10.
Let denote the birth-death process described above with parameter starting from and let with . Then for all integer we have
(27) |
where and .
Proof.
Next, consider the case where , or equivalently . Let be the first time that and we obtain from (28) that
This completes the proof of the lemma. ∎
We now proceed with the proof of Lemma 6.6.
Proof of Lemma 6.6.
7 Quasi-polynomial Hitting Time of Large Cliques
In this section, we present our results about the failure of the Metropolis process to even find cliques of size at least , for any planted clique size
7.1 Existence of a Bad Initial Clique
We start with the “worst-case” initialization result which now works for all inverse temperatures , but establishes that from this initialization the Metropolis process fails to find either a clique of size at least or to find a clique with intersection at least with the planted clique.
Theorem 7.1.
Let be any fixed constant. Then the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for arbitrary satisfying 5.4 and arbitrary inverse temperature . For any constants and , there exists a constant and an initialization state for the Metropolis process from which it requires at least steps to reach
• either cliques of size at least ,
• or cliques of intersection with the planted clique at least ,
with probability at least .
We now present the proof of Theorem 7.1. We first need the notion of gateways, introduced by Jerrum [Jer92] in his original argument for the failure of the Metropolis process.
Definition 7.2 (Gateways).
For , we say a clique is a -gateway if there exists and a sequence of cliques such that
(1) For every , and differ by exactly one vertex;
(2) For every , ;
(3) .
Let denote the collection of all cliques that are -gateways.
Notice that by definition if a clique is a -gateway then .
Definition 7.3 (Gateway-Counts Property ).
We say the random graph with a planted clique satisfies the gateway-counts property if the following is true: For all integers , , and with parameters and , it holds
The following lemma follows immediately from arguments in [Jer92].
Lemma 7.4.
For any constant , the random graph with a planted clique satisfies the gateway-counts property with probability as .
Proof.
We follow the same approach as in [Jer92] with the slight modification that is not equal to but arbitrary. For any , there exists a set of size and a subset of size such that every vertex from is adjacent to every vertex from . To see this, consider a path as in Definition 7.2, and consider the first clique in the path such that . Such must exist since the destination clique has size while . Note that corresponds to the first time when new vertices are added. Meanwhile, since , we have and . We can thus take and any of size .
Hence, we can associate every -gateway in with a tuple satisfying all the conditions mentioned above: is a clique of size and intersection at most with , has size , has size , and are fully connected. Let denote the number of such tuples. Then, . The first moment of is given by
where in the first equality, counts the number of choices of for ranging from to , is the probability of being a clique, is the number of choices of , is for , and finally is the probability of being fully connected. The lemma then follows from Markov’s inequality
and a union bound over the choices of , , and . ∎
We now present the proof of Theorem 7.1.
Proof of Theorem 7.1.
By Lemmas 5.3 and 7.4, the random graph satisfies both the clique-counts properties and for and the gateway-counts properties simultaneously with probability as . Throughout the proof we assume that , , and are all satisfied.
Suppose is such that so that . Pick a constant such that
Let , , and . We define
to be the “bottleneck” set to which we will apply Lemma 5.5. Let denote the collection of cliques that are reachable from the empty clique through a path (i.e. a sequence of cliques where each adjacent pair differs by exactly one vertex) not including any clique from except possibly for the destination. The following claim, whose proof is postponed to the end of this subsection, follows easily from the definitions of and .
Claim 7.5.
1. Cliques from are not adjacent to cliques from (i.e., they differ by at least two vertices);
2. ;
3. .
Now observe from Claim 7.5 that to prove what we want, it suffices to show that starting from an appropriate state the Metropolis process does not hit any clique from in -time with probability . For a collection of cliques we write to represent the partition function restricted to the set . Since the boundary of is included in by Claim 7.5 it suffices to show that there exists a constant such that
(29) |
Given (29), Theorem 7.1 is an immediate consequence of Lemma 5.5.
Observe that we have the following inclusion,
To see this, note that every clique in can be reached from the empty clique by adding vertices one by one in any order, so that none of the intermediate cliques are from or from , except possibly the last one. Similarly, every clique in can be reached from the empty clique in the same way. (Note that cliques in , however, may not be reachable from without crossing ; for example, by adding vertices one by one to reach a clique of size , it may first reach a clique of size which is a -gateway with intersection .) Thus, we have
(30) |
The rest of the proof aims to upper bound the two ratios in Eq. 30 respectively.
For the first ratio, the key observation is that since , is dominated by cliques of intersection (almost completely outside the planted clique) or, more rigorously, those with sufficiently small intersection, say in where . Hence, we write
and combining we get
(31) |
By Lemma 5.1, the clique-counts property , and the gateway-counts property , we upper bound the first term in Eq. 31 by
(32) |
For the second term in Eq. 31, and imply that
(33) |
where the last inequality uses . Combining Eqs. 31, 7.1 and 33, we obtain
for some constant . This bounds the first ratio in Eq. 30.
For the second ratio in Eq. 30, we have from and that
(34) |
Using linearity of expectation we have that
Let
It follows that
(35) |
Meanwhile, let , and we have
(36) |
Combining Eqs. 34, 35 and 36, we get that
Let and . Then by definition we have that and . Furthermore, by Assumption 5.4 we have
Then, an application of Item 1 of Lemma 5.1 implies that
If then for . If then
since . Therefore, we have for . This bounds the second ratio in Eq. 30. Hence, we establish Eq. 29 and the theorem then follows. ∎
Proof of Claim 7.5.
The first item is obvious, since if a clique is adjacent to another clique , then by appending to the path from to we get a path from to without passing through cliques from except possibly at , since . This implies that which proves the first item.
For the second item, suppose for contradiction that there exists a clique of size that is in . Clearly and so . Then there exists a path of cliques which contain no cliques from . Let for be the first clique of size in this path; that is, for . Then, the (sub)path must contain a -gateway of size , call it , as one can choose the largest for which and set . If has intersection with the planted clique less than , then , a contradiction. Otherwise, has intersection at least which means at some earlier time, it will pass a clique of intersection exactly whose size is less than , which is again in , leading to a contradiction. This establishes the second item.
For the third one, if is a clique of intersection , then either meaning or meaning any path from to must pass through a clique of size exactly and by the second item must pass through cliques from , implying . ∎
7.2 Starting from the Empty Clique
Theorem 7.6.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for arbitrary -Lipschitz with and inverse temperature . Let denote the Metropolis process on with stationary distribution . Then there exist constants and such that for any clique of size at most , one has
In particular, the Metropolis process starting from the empty clique requires steps to reach cliques of size at least , with probability .
Proof.
By Lemmas 5.3 and 6.5, the random graph satisfies both and for with probability as . In the rest of the proof we assume that both and are satisfied.
Suppose that , for and is a sufficiently small constant to be determined. Let and . First observe that if the chain arrives at a clique in , it either hits a clique from or previously reached a clique in . This implies that
It suffices to upper bound each of the two terms respectively.
Similarly to Eq. 21, we deduce from the union bound and that
It suffices to show that for all integer , all pairs
and all clique , it holds
(37) |
for some constant .
Without loss of generality we may assume that . Let and . Write with . Consider first the pair where . Suppose that with . We deduce from Lemma 6.6, with , , , and , that
This shows Eq. 37 for the first case.
For the second case, we have the pair where . Suppose with . Also recall that . Again, we deduce from Lemma 6.6, with , , , and , that
where the last inequality follows from
since . If , then . If , then
where the last inequality is because . Hence,
This shows Eq. 37 for the second case. The theorem then follows from Eq. 37. ∎
7.3 Low Temperature Regime and Greedy Algorithm
Theorem 7.7.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general Gibbs measure given by Eq. 14 for the identity function and inverse temperature . For any constant , there exists a constant such that for any clique of size at most and any constant , with probability at least the Metropolis process starting from will not reach
• either cliques of size at least ,
• or cliques of intersection at least with the planted clique,
within steps.
For a subset of cliques and an integer , let denote the collection of all cliques of size that are subsets of cliques from , i.e.,
For we define and similarly for , , etc.
Lemma 7.8.
Consider the random graph with a planted clique conditional on satisfying the property . For any with and any with we have
with high probability as .
Proof.
Notice that every clique of size has subsets of size . The lemma then follows immediately from . ∎
We now give the proof of Theorem 7.7.
Proof of Theorem 7.7.
By Lemmas 5.3 and 6.5, the random graph satisfies both and for with probability as . In the rest of the proof we assume that both and are satisfied.
First, a simple observation is that the process actually never removes vertices. Indeed, the probability of removing a vertex from the current clique in one step is at most . Since we run the Metropolis process for polynomially many steps, the probability that the chain ever removes a vertex is upper bounded by . Hence, in this low temperature regime the Metropolis process is equivalent to the greedy algorithm.
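A hedged rendering of this union bound, assuming (as for the identity Hamiltonian) that a removal is accepted with probability at most $e^{-\beta}$ and that $\beta \ge c\log n$:

```latex
% Union bound over T = n^{O(1)} steps:
\Pr\bigl[\text{some vertex is removed within } T \text{ steps}\bigr]
  \;\le\; T\, e^{-\beta} \;\le\; n^{O(1)}\cdot n^{-c} \;=\; o(1),
\qquad \text{for a large enough constant } c.
```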
Without loss of generality, we may assume that
Let and . Suppose the initial state is a clique of size for . Let and . If the chain arrives at a clique in , it either hits a clique from or previously reached a clique in . Similarly, if the chain hits , then it must also reach either or . Hence, to bound the probability of reaching either or , we only need to bound the probability of reaching or . Furthermore, since the process never removes a vertex with high probability in polynomially many steps, in the case this happens it must first reach a clique from , or , or . To summarize, we can deduce from the union bound that
We bound each of the three probabilities respectively.
For the first case, we have from the union bound and that,
Suppose . We deduce from Lemmas 5.1 and 7.8 and Lemma 6.6 (with and for the notations of Lemma 6.6) that for every integer and every clique (note that since ),
where the last inequality follows from
Next consider the second case, where we have
Suppose and so . Also recall that . We deduce from Lemmas 5.1 and 7.8 and Lemma 6.6 (with and for the notations of Lemma 6.6) that for every integer and every clique (note that since ),
where the last inequality follows from
Finally consider the third case. Again we have
Suppose and so . Also recall that . We deduce from Lemma 6.6 that for every integer and every clique ,
where the last inequality follows from
Therefore, we conclude that
as we wanted. ∎
8 Simulated Tempering
In this section, we discuss our lower bounds against the simulated tempering versions of the Metropolis process.
8.1 Definition of the dynamics
We start with the formal definition. Suppose for some we have a collection of inverse temperatures . For each , let denote an estimate of the partition function . The simulated tempering (ST) dynamics is a Markov chain on the state space , which seeks to optimize a Hamiltonian defined on , say given according to for an arbitrary vector . The transition matrix is given as follows.
• A level move: For and such that and differ by exactly one vertex,
• A temperature move: For such that ,
Some remarks are in order.
Remark 8.1.
The stationary distribution of the ST dynamics can be straightforwardly checked to be given by , for the generic Gibbs measure defined in Eq. 14. Notice that if for all then we have
and along a single temperature the ST dynamics is identical to the Metropolis process introduced in Section 5.3.
Remark 8.2.
The use of estimates of the partition function in the definition of the ST dynamics, as opposed to the original values , is naturally motivated by applications where one cannot efficiently compute the value of to decide the temperature move step.
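To fix ideas, the following is a minimal Python sketch of one ST step for a target assumed proportional to $\exp(\beta_j h(|C|))/\hat{Z}_j$. The even split between level and temperature moves is one common convention, and the names are ours; this is a sketch, not necessarily the exact proposal scheme defined above.

```python
import math
import random

def st_step(adj, clique, j, betas, h, z_hat):
    """One step of simulated tempering on (clique, temperature-index) pairs.
    Assumed target: proportional to exp(betas[j] * h(len(clique))) / z_hat[j].
    With probability 1/2 do a Metropolis level move at the current
    temperature, otherwise propose moving to a neighbouring temperature."""
    if random.random() < 0.5:
        # level move: single-vertex update at inverse temperature betas[j]
        v = random.choice(list(adj))
        s = len(clique)
        if v in clique:
            accept = math.exp(-betas[j] * (h(s) - h(s - 1)))
            if random.random() < min(1.0, accept):
                clique = clique - {v}
        elif all(v in adj[u] for u in clique):
            accept = math.exp(betas[j] * (h(s + 1) - h(s)))
            if random.random() < min(1.0, accept):
                clique = clique | {v}
    else:
        # temperature move: propose a neighbouring inverse temperature
        jp = j + random.choice([-1, 1])
        if 0 <= jp < len(betas):
            accept = (z_hat[j] / z_hat[jp]) * math.exp(
                (betas[jp] - betas[j]) * h(len(clique)))
            if random.random() < min(1.0, accept):
                j = jp
    return clique, j
```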
8.2 Existence of a Bad Initial Clique
We now present our lower bound results for the ST dynamics under worst-case initialization.
Our first result shows that the ST dynamics fails to reach intersection with the planted clique, similarly to the Metropolis process in Theorem 6.1. Interestingly, the lower bound holds for any choice of arbitrarily many temperatures and for any choice of estimators of the partition function.
Theorem 8.3.
Let be any fixed constant. For any constant , the random graph with a planted clique satisfies the following with probability as .
Consider the general ST dynamics given in Section 8.1 for arbitrary , arbitrary , arbitrary inverse temperatures , and arbitrary estimates . Then there is an initialization pair of temperature and clique for the ST dynamics from which it requires -time to reach a pair of temperature and clique where the clique has intersection with the planted clique at least , with probability at least . In particular, under worst-case initialization the ST dynamics fails to recover the planted clique in polynomial time.
Proof.
Throughout the proof we assume that both and are satisfied for given by Eq. 15, which happens with probability as by Lemma 5.3.
Notice that we can assume without loss of generality that satisfies . We let . Now from the proof of Theorem 6.1 we have that for any such there exists a constant such that for all and the corresponding Gibbs measure per Eq. 14,
(38)
w.h.p. as . We now consider the set , the subset of the state space of the ST dynamics, and notice that , where is the boundary of . In particular, using (38) we conclude that w.h.p. as
(39)
Given Eq. 39, Theorem 8.3 is an immediate consequence of Lemma 5.5. ∎
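For intuition, the type of bound that makes the last step work is the following standard bottleneck estimate for a chain with stationary distribution $\pi$ (a simplified variant stated here for orientation; we do not claim it is the exact form of Lemma 5.5). If the chain is started from the conditional distribution $\pi(\cdot \mid A)$ of a set $A$ with boundary $\partial A$, then for every $t \ge 1$,
\[
\Pr\big[\tau_{\partial A} \le t\big] \;\le\; \sum_{s=0}^{t} \Pr\big[X_s \in \partial A\big] \;\le\; \frac{(t+1)\,\pi(\partial A)}{\pi(A)},
\]
since the law of $X_s$ is pointwise at most $\pi/\pi(A)$ for every $s$. In particular, if $\pi(\partial A)/\pi(A)$ is exponentially small, then by averaging there is a deterministic initial state in $A$ from which the chain needs exponentially many steps to leave $A$ with constant probability.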
Our second result concerns the ST dynamics under the additional restriction that . In this case, similarly to the Metropolis process in Theorem 7.1, we show that the ST dynamics fails to reach either -cliques or cliques with intersection at least with the planted clique. Interestingly, again, the lower bound holds for any choice of arbitrarily many temperatures of magnitude and for any choice of estimators of the partition function.
Theorem 8.4.
Let be any fixed constant. Then the random graph with a planted clique satisfies the following with probability as .
Consider the general ST dynamics given in Section 8.1 for arbitrary satisfying 5.4, arbitrary , arbitrary inverse temperatures with , and arbitrary estimates . For any constants and , there is an initialization pair of temperature and clique for the ST dynamics from which it requires -time to reach a pair of temperature and clique where
• either the clique is of size at least ,
• or the clique is of intersection at least with the planted clique,
with probability at least .
Proof.
Throughout the proof we assume that , , and are all satisfied for , which happens with probability as by Lemmas 5.3 and 7.4.
We start by following the proof of Theorem 7.1. For let be such that . By assumption we have . Pick a constant such that for all
Let , , and . Let also
Let also denote the collection of cliques that are reachable from the empty clique through a path (i.e. a sequence of cliques where each adjacent pair differs by exactly one vertex) not including any clique from except possibly for the destination.
From the proof of Theorem 7.1 we have that there exists a constant such that for all and the corresponding Gibbs measure per Eq. 14,
(40)
w.h.p. as . We now consider the set , the subset of the state space of the ST dynamics, and notice that . In particular, using (40) we conclude that w.h.p. as
(41)
Given Eq. 41, Theorem 8.4 is an immediate consequence of Lemma 5.5. ∎
8.3 Starting From the Empty Clique
Theorem 8.5.
Let be any fixed constant. Then the random graph with a planted clique satisfies the following with probability as .
Consider the general ST dynamics given in Section 8.1 for monotone -Lipschitz with , arbitrary inverse temperatures with and , and arbitrary estimates . For any constants , , and , the ST dynamics starting from will not reach within steps a pair of temperature and clique where
• either the clique is of size at least ,
• or the clique is of intersection at least with the planted clique,
with probability .
In what follows we condition on both and for , which by Lemmas 5.3 and 6.5 hold with probability as . Also, we may assume that
(42)
for all . Otherwise, for any clique of size one has
since , and . This means that with high probability the chain will never move to the inverse temperature within polynomially many steps, unless its clique already has size, say, . Since we are studying reaching cliques of size , we may assume that Eq. 42 holds for all , by removing the temperatures violating Eq. 42 as well as all larger ones, since with high probability the chain does not reach them within steps. An immediate corollary of Eq. 42 is that
(43)
Let and . By the union bound we have
(44)
We will show that for all integer , all integer , all integer , and all clique , it holds
(45)
and that for all integer and all clique , it holds
(46)
It will be helpful to consider the time-reversed dynamics and try to bound the probability of reaching when starting from a large clique . By reversibility, we have
(47)
which is an application of Fact 6.7.
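For the reader's convenience, the reversibility identity underlying this step is the following standard fact (we do not restate Fact 6.7 verbatim): if the chain with transition matrix $P$ is reversible with respect to $\pi$, then for all states $x,y$ and all integers $t \ge 0$,
\[
\pi(x)\,P^t(x,y) \;=\; \pi(y)\,P^t(y,x),
\qquad\text{hence}\qquad
P^t(x,y) \;=\; \frac{\pi(y)}{\pi(x)}\,P^t(y,x).
\]
Thus the probability of reaching a large clique from a small one in $t$ steps is controlled by a ratio of stationary masses times the probability of the time-reversed trajectory.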
Let be a constant. Introduce a random walk on with transition matrix given by
and for all ,
To be more precise, the above definition of applies when and we assume if , e.g., when and .
We now calculate the stationary distribution of on states when restricted to . We start by proving that the random walk is time-reversible. Note that the random walk introduced is clearly aperiodic, positive recurrent and irreducible. Hence, by Kolmogorov's criterion the random walk is time-reversible if and only if for any cycle in the state space, the probability that the random walk moves along the cycle in one direction equals the probability of moving in the opposite direction. Given that the minimal cycles in the finite box are simply squares of the form , it suffices to verify the criterion solely for these cycles, that is, to show that for any
which can be straightforwardly checked to be true.
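Explicitly, writing $p$ for the transition kernel of the auxiliary walk and using generic coordinates for a square with corners $(k,\ell)$, $(k+1,\ell)$, $(k+1,\ell+1)$, $(k,\ell+1)$ (the labels here are ours), the square-cycle condition reads
\[
p\big((k,\ell),(k{+}1,\ell)\big)\, p\big((k{+}1,\ell),(k{+}1,\ell{+}1)\big)\, p\big((k{+}1,\ell{+}1),(k,\ell{+}1)\big)\, p\big((k,\ell{+}1),(k,\ell)\big)
\;=\;
p\big((k,\ell),(k,\ell{+}1)\big)\, p\big((k,\ell{+}1),(k{+}1,\ell{+}1)\big)\, p\big((k{+}1,\ell{+}1),(k{+}1,\ell)\big)\, p\big((k{+}1,\ell),(k,\ell)\big),
\]
i.e., the product of transition probabilities around the square is the same in both orientations.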
We now calculate the stationary distribution. Using reversibility and the fact that is monotonically increasing in , we have for arbitrary
Furthermore, since we have again by reversibility,
Combining the above, we conclude
(48)
The following lemma shows that is stochastically dominated by the pair .
Lemma 8.6.
Let denote the Simulated Tempering process starting from some and . Let denote the stochastic process described above with parameter starting from and . Assume that satisfies the conclusion of Lemma 6.5 with parameter . Then there exists a coupling of the two processes such that for all integer it holds
In particular, for all integer it holds
Proof.
We couple and as follows. Suppose that for some integer . We will construct a coupling of and such that . With probability , the two chains both attempt to update the first coordinate, and with probability the second.
Consider first updating the first coordinate. Since the probability that is less than , and so is the probability of , we may couple and such that decreases by at most one; namely, it never happens that increases by while decreases in size. Thus, it suffices to consider the extremal case when . Since , we have and thus
So we can couple and such that either or , . Meanwhile, recall that is the set of vertices such that . Then we have
whenever by Lemma 6.5. Hence, we deduce that
So we can couple and such that either or and , as desired.
Next, consider updating the second coordinate. Since the probability that is less than , and so is the probability of , we can couple and such that it never happens that both and . This means that it suffices to consider the extremal case where . Since , we have and therefore
and similarly,
Therefore, we can couple and such that only one of the following can happen:
(i) ;
(ii) and ;
(iii) and .
Therefore, one always has under the constructed coupling. ∎
The next lemma upper bounds the -step transition probability .
Lemma 8.7.
Let denote the Markov process on described above with parameter starting from and . Then for all integer we have
where .
Proof.
We now present the proof of Theorem 8.5, assuming Lemmas 8.6 and 8.7.
Proof of Theorem 8.5.
Recall that . As will be clear later, we define
We assume that our graph satisfies the conclusions of Lemmas 5.1 and 6.5 with parameter .
Consider first Eq. 45. From Lemmas 8.6 and 8.7, we deduce that
where we recall that and , and we choose .
9 Conclusion
In this work we revisit the result of Jerrum [Jer92] that large cliques elude the Metropolis process. We extend [Jer92] by establishing the failure of the Metropolis process (1) for all planted clique sizes for any constant , (2) for arbitrary temperature and Hamiltonian vector (under worst-case initialization), and (3) for a large family of temperatures and Hamiltonian vectors (under the empty-clique initialization), and we obtain as well (4) lower bounds for the performance of the simulated tempering dynamics.
An important future direction would be to explore the generality of our proposed reversibility and birth-and-death process arguments, which allowed us to prove the failure of the Metropolis process when initialized at the empty clique. It would be interesting to see whether the proposed method can lead to MCMC lower bounds from a specific initial state in other inference settings beyond the planted clique model.
Moreover, it would be interesting to see whether our results can be strengthened even further. First, a current shortcoming is that our lower bounds for the Metropolis process initialized from the empty clique do not completely cover the case where for an arbitrary constant . While we strongly believe the result continues to hold in this case, some new ideas seem to be needed to prove it. Second, it seems interesting to study the regime where . Recall that there are polynomial-time algorithms that can find a clique of size whenever a (worst-case) graph has a clique of size , for some constant [Fei05]. If our lower bounds for the Metropolis process could be extended to the case , this would mean that for some the Metropolis process fails to find in polynomial time a clique of size when a -clique is planted in , while some other polynomial-time algorithm can find a clique of size on every (worst-case) graph which has a clique of size . Such a result, if true, would make the failure of the Metropolis process even more profound.
Acknowledgement
EM and IZ are supported by the Simons-NSF grant DMS-2031883 on the Theoretical Foundations of Deep Learning and the Vannevar Bush Faculty Fellowship ONR-N00014-20-1-2826. EM is also supported by the Simons Investigator award (622132).
References
- [ACV14] Ery Arias-Castro and Nicolas Verzelen. Community detection in dense random networks. The Annals of Statistics, 42(3):940–969, 2014.
- [AFdF21] Maria Chiara Angelini, Paolo Fachin, and Simone de Feo. Mismatching as a tool to enhance algorithmic performances of Monte Carlo methods for the planted clique model, 2021.
- [AFUZ19] Fabrizio Antenucci, Silvio Franz, Pierfrancesco Urbani, and Lenka Zdeborová. On the glassy nature of the hard phase in inference problems. Physical Review X, 9:011020, January 2019.
- [AKS98] Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3-4):457–466, 1998.
- [AWZ20] Gérard Ben Arous, Alexander S. Wein, and Ilias Zadik. Free energy wells and overlap gap property in sparse PCA. In Jacob D. Abernethy and Shivani Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 479–482. PMLR, 2020.
- [BAGJ20] Gérard Ben Arous, Reza Gheissari, and Aukosh Jagannath. Algorithmic thresholds for tensor PCA. Annals of Probability, 48:2052–2087, 2020.
- [BB20] Matthew Brennan and Guy Bresler. Reducibility and statistical-computational gaps from secret leakage. In Conference on Learning Theory, pages 648–847. PMLR, 2020.
- [BE76] B. Bollobas and P. Erdös. Cliques in random graphs. Mathematical Proceedings of the Cambridge Philosophical Society, 80(3):419–427, 1976.
- [BHK+19] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019.
- [BR04] Nayantara Bhatnagar and Dana Randall. Torpid mixing of simulated tempering on the Potts model. In SODA, volume 4, pages 478–487. Citeseer, 2004.
- [BR13] Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In Conference on learning theory, pages 1046–1066. PMLR, 2013.
- [DGGP14] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden cliques in linear time with high probability. Combinatorics, Probability and Computing, 23(1):29–49, 2014.
- [DM15] Yash Deshpande and Andrea Montanari. Finding hidden cliques of size in nearly linear time. Foundations of Computational Mathematics, 15(4):1069–1128, 2015.
- [DSC06] Persi Diaconis and Laurent Saloff-Coste. Separation cut-offs for birth and death chains. The Annals of Applied Probability, 16(4):2098–2122, 2006.
- [Fei05] Uriel Feige. Approximating maximum clique by removing subgraphs. SIAM Journal on Discrete Mathematics, 18(2):219–225, February 2005.
- [FGR+17] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. Journal of the ACM (JACM), 64(2):1–37, 2017.
- [GM75] G. R. Grimmett and C. J. H. McDiarmid. On colouring random graphs. Mathematical Proceedings of the Cambridge Philosophical Society, 77(2):313–324, 1975.
- [GZ19] David Gamarnik and Ilias Zadik. The landscape of the planted clique problem: Dense subgraphs and the overlap gap property, 2019.
- [Jer92] Mark Jerrum. Large cliques elude the metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992.
- [Kar79] Richard M Karp. Recent advances in the probabilistic analysis of graph-theoretic algorithms. In International Colloquium on Automata, Languages, and Programming, pages 338–339. Springer, 1979.
- [Kuč95] Luděk Kučera. Expected complexity of graph partitioning problems. Discrete Applied Mathematics, 57(2-3):193–212, 1995.
- [LP17] David A Levin and Yuval Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017.
- [MBC+20] Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Complex dynamics in simple neural networks: Understanding gradient flow in phase retrieval, 2020.
- [MKUZ19] Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Passed & spurious: Descent algorithms and local minima in spiked matrix-tensor models. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4333–4342. PMLR, 09–15 Jun 2019.
- [MP92] Enzo Marinari and Giorgio Parisi. Simulated tempering: a new Monte Carlo scheme. EPL (Europhysics Letters), 19(6):451, 1992.
- [MW15] Zongming Ma and Yihong Wu. Computational barriers in minimax submatrix detection. The Annals of Statistics, 43(3):1089–1116, 2015.
- [MWW09] Elchanan Mossel, Dror Weitz, and Nicholas Wormald. On the hardness of sampling independent sets beyond the tree threshold. Probability Theory and Related Fields, 143(3):401–439, 2009.
- [RF10] Dorit Ron and Uriel Feige. Finding hidden cliques in linear time. Discrete Mathematics & Theoretical Computer Science, 2010.
- [RM14] Emile Richard and Andrea Montanari. A statistical model for tensor PCA. Advances in Neural Information Processing Systems, 27, 2014.
Appendix A Deferred Proofs
Proof of Lemma 5.1.
For part (1), notice that by linearity of expectation and an elementary application of Stirling's formula, for with we have
For part (2), notice that is distributed as the number of -cliques in . Hence, a standard calculation (e.g. [BE76, Proof of Theorem 1]) proves that since
Hence, Chebyshev's inequality yields that with probability . Taking a union bound over the different values of completes the proof of this part.
Finally, part (3) follows directly from part (1), Markov’s inequality and a union bound over the possible values of . ∎
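The Chebyshev step in part (2) above is the usual second-moment argument: writing $X$ for the clique count in question (the exact deviation threshold used in the lemma may differ from the illustrative one below),
\[
\Pr\Big[\,|X - \mathbb{E}X| \ge \tfrac12\,\mathbb{E}X\,\Big] \;\le\; \frac{4\,\operatorname{Var}(X)}{(\mathbb{E}X)^2},
\]
which is $o(1)$ as soon as $\operatorname{Var}(X) = o\big((\mathbb{E}X)^2\big)$.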
Proof of Lemma 6.5.
It clearly suffices to establish this result for , i.e. for an the random graph . For any fixed , follows a Binomial distribution with trials and probability . In particular, it has mean . Hence, by Hoeffding's inequality, with probability it holds that . As there are only , the result follows from a union bound over . ∎
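For reference, the form of Hoeffding's inequality used here is the standard bound for a sum $X$ of $m$ independent $\{0,1\}$-valued random variables (the number of trials and the deviation level are as specified in the lemma):
\[
\Pr\big[\,|X - \mathbb{E}X| \ge t\,\big] \;\le\; 2\exp\!\Big(-\frac{2t^2}{m}\Big) \qquad \text{for every } t > 0.
\]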