Limits of Sequential Local Algorithms on the Random k-XORSAT Problem
Abstract
The random k-XORSAT problem is a random constraint satisfaction problem over n Boolean variables and m = rn clauses. A random instance can be expressed as a linear system of the form Ax = b, where A is a random m × n Boolean matrix with k ones per row, and b is a random Boolean vector. It is known that there exist two distinct thresholds r_clust(k) < r_sat(k) such that, as n → ∞, for r < r_sat(k) the random instance has solutions with high probability, while for r ∈ (r_clust(k), r_sat(k)) the solution space shatters into an exponential number of clusters. Sequential local algorithms are a natural class of algorithms which assign values to variables one by one, iteratively. In each iteration, the algorithm runs some heuristic, called a local rule, to decide the value assigned, based on the local neighborhood of the selected variable under the factor graph representation of the instance.
We prove that for any r ∈ (r_clust(k), r_sat(k)) the sequential local algorithms with certain local rules fail to solve the random k-XORSAT problem with high probability. They include (1) the algorithm using Unit Clause Propagation as the local rule, for k ≥ 9, and (2) the algorithms using any local rule that can calculate the exact marginal probabilities of variables in instances with factor graphs that are trees, for k ≥ 13. The well-known Belief Propagation and Survey Propagation are included in (2). Meanwhile, the best known linear-time algorithm succeeds with high probability for r < r_clust(k). Our results support the intuition that r_clust(k) is the sharp threshold for the existence of a linear-time algorithm for random k-XORSAT.
Our approach is to apply the Overlap Gap Property (OGP) framework to the sub-instance induced by the core of the instance, instead of the whole instance. By doing so, the sequential local algorithms can be ruled out at clause densities as low as r_clust(k), since the sub-instance exhibits the OGP at a much lower clause density than the whole instance does.
1 Introduction
1.1 Background
The k-XORSAT problem is a Boolean constraint satisfaction problem closely related to the well-known k-SAT problem. An instance of the k-XORSAT problem consists of m clauses in n Boolean variables x_1, x_2, ..., x_n. Each clause is a Boolean linear equation of k variables of the form x_{i_1} ⊕ x_{i_2} ⊕ ... ⊕ x_{i_k} = b, where ⊕ is the modulo-2 addition. By convention, when we say an XORSAT instance, without the prefix "k-", we mean the same except we do not require the clauses to have exactly k variables. An assignment σ to the variables is a mapping from the set of all variables to the set {0, 1} of the two Boolean values. By abusing the notation, we can write it as the Boolean vector σ ∈ {0, 1}^n containing the assigned values. The distance between any two assignments σ and σ' is defined to be the Hamming distance d(σ, σ') = |{i : σ_i ≠ σ'_i}|. A clause is satisfied by an assignment if the equation of the clause holds when the variables are replaced by the corresponding assigned values, and an instance of the k-XORSAT problem is satisfied by an assignment if all its clauses are satisfied by the assignment. An instance is satisfiable if it has at least one satisfying assignment, and unsatisfiable otherwise. An assignment that satisfies the instance is called a solution for the instance. The set of all solutions for an instance Φ is called the solution space of the instance, denoted by S(Φ). We are interested in the complexity of finding a solution.
Since each clause is just a Boolean linear equation, an instance can be viewed as a Boolean linear system Ax = b, where A is an m × n Boolean matrix and b is a Boolean vector of length m. Note that each row in A contains exactly k ones, since each clause has exactly k variables. We can see that finding solutions for a k-XORSAT instance is equivalent to solving a Boolean linear system, and the solution space is an affine subspace of {0, 1}^n. By abusing the notation, we can simply write Φ = (A, b), and the terms "clause" and "equation" are interchangeable here.
We are particularly interested in random instances of the k-XORSAT problem. In a random instance Φ, each clause is drawn uniformly over all possibilities, independently. In particular, the left-hand side of the equation is the modulo-2 sum of k distinct variables chosen uniformly from the C(n, k) possibilities, and the right-hand side is either 0 or 1 with equal probability. Therefore, a random instance of the k-XORSAT problem is drawn uniformly from the ensemble of all possible instances of the k-XORSAT problem with n variables and m clauses, each clause containing exactly k variables, and we denote this by Φ ~ Φ_k(n, m). We focus on the regime in which the number n of variables goes to infinity and the number m of clauses is proportional to n, that is, m = rn, where r is a constant independent of n called the clause density.
Since a k-XORSAT instance can be represented by a system of linear equations over GF(2), given an instance, standard linear algebra methods such as Gaussian elimination can determine whether the instance is satisfiable, find a solution, and even count the total number of solutions, in polynomial time. However, beyond this particular algebraic structure, some variants of the k-XORSAT problem are hard to solve. Achlioptas and Molloy mentioned in their paper [AM15] that random instances of the k-XORSAT problem seem to be extremely difficult for both generic CSP solvers and SAT solvers, which do not take the algebraic structure into account. Guidetti and Young [GY11] suggested that random k-XORSAT is the most difficult among many random CSPs for random-walk-type algorithms such as WalkSAT. The difficulty of solving random k-XORSAT instances becomes more apparent when we only consider linear-time algorithms as efficient algorithms, since we do not have a linear-time algebraic method to solve a linear system in general.
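To make the algebraic route concrete, here is a minimal dense Gaussian-elimination solver over GF(2); the function name and encoding are our own illustrative choices, and the O(mn^2) implementation is a sketch rather than an optimized solver.

```python
def solve_gf2(A, b):
    """Solve A x = b over GF(2) by Gaussian elimination.

    A: list of m rows, each a list of n bits (0/1); b: list of m bits.
    Returns one solution as a list of 0/1 (free variables set to 0),
    or None if the system is inconsistent.
    """
    m, n = len(A), len(A[0])
    # Work on an augmented copy so the caller's instance is untouched.
    rows = [A[i][:] + [b[i]] for i in range(m)]
    pivot_cols = []
    r = 0
    for c in range(n):
        # Find a row with a 1 in column c at or below row r.
        pivot = next((i for i in range(r, m) if rows[i][c]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # XOR the pivot row into every other row with a 1 in column c.
        for i in range(m):
            if i != r and rows[i][c]:
                rows[i] = [x ^ y for x, y in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    # A zero row with right-hand side 1 means the system is inconsistent.
    if any(not any(row[:n]) and row[n] for row in rows):
        return None
    x = [0] * n
    for i, c in enumerate(pivot_cols):
        x[c] = rows[i][n]
    return x
```

For instance, the 3-XORSAT pair x_1 ⊕ x_2 ⊕ x_3 = 1, x_2 ⊕ x_3 = 0 is solved by the assignment (1, 0, 0), while two contradictory clauses on the same variables are reported as unsatisfiable.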
Many studies suggest that the difficulties of solving random CSPs are related to the phase transitions of the solution spaces as the clause density grows. (We will have a more detailed discussion in Section 1.3.) Pittel and Sorkin [PS16] obtained the sharp satisfiability threshold r_sat(k) of random k-XORSAT, for general k ≥ 3: the random k-XORSAT instance is satisfiable w.h.p. when r < r_sat(k), and it is unsatisfiable w.h.p. when r > r_sat(k). (We say an event E_n, depending on a number n, occurs with high probability, shortened to w.h.p., if the probability of the event occurring converges to 1 as n goes to infinity, that is, lim_{n→∞} Pr[E_n] = 1.) Furthermore, Ibrahimi, Kanoria, Kraning and Montanari [IKKM12] obtained the sharp clustering threshold r_clust(k), which is less than r_sat(k), of random k-XORSAT for k ≥ 3. When r < r_clust(k), w.h.p. the solution space of a random k-XORSAT instance is "well-connected". When r ∈ (r_clust(k), r_sat(k)), w.h.p. the solution space of a random k-XORSAT instance shatters into an exponential number of "well-separated" clusters. In [IKKM12], they also provided a linear-time algorithm that can solve a random k-XORSAT instance w.h.p. for r < r_clust(k). On the other hand, no algorithm is known to be able to find a solution for a random k-XORSAT instance with non-vanishing probability in linear time for r ∈ (r_clust(k), r_sat(k)), the regime in which solutions still exist with high probability.
In this work, we consider a natural class of algorithms, called sequential local algorithms. A sequential algorithm selects an unassigned variable uniformly at random and assigns a value to it, iteratively, until every variable has an assigned value. In each iteration of the algorithm, to decide the assigned value, the algorithm runs some heuristic called the local rule, which returns a value p ∈ [0, 1], and decides the assigned value to be either 0 or 1 randomly, according to the Bernoulli distribution with parameter p. Ideally, if in each iteration the local rule could calculate the exact marginal probability of the selected variable over a randomly chosen solution for the instance, conditioned on fixing all previously selected variables to their assigned values, the algorithm would be able to find a solution. However, we restrict the ability of the local rules by only providing local structure to them. To explain the meaning of "local", we first construct a graphical representation of a k-XORSAT instance: the factor graph. The factor graph G(Φ) of a k-XORSAT instance Φ is constructed in the following way: (1) each variable x_i is represented by a variable node v_i; (2) each equation e_j is represented by an equation node; (3) an undirected edge {v_i, e_j} is added if the variable x_i is involved in the equation e_j. Note that since there is a one-to-one correspondence between variables (equations) and variable nodes (equation nodes), in this paper the terms variables (equations) and variable nodes (equation nodes) are interchangeable. The distance between any two nodes is the number of edges in a shortest path connecting the two nodes. For any integer R > 0, the local neighborhood B_Φ(v, R) with radius R of a node v is the subgraph of G(Φ) induced by all nodes at distance at most R from the node v. The word "local" in the name of the algorithms means that the local rule only takes the local neighborhood B_Φ(v, R) of the selected variable v, of constant radius R, as its input.
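The factor graph and the radius-R neighborhood B_Φ(v, R) can be sketched directly; the encoding below (clauses as tuples of variable indices, nodes tagged "v" or "c") is our own illustrative convention, not the paper's notation.

```python
from collections import deque

def factor_graph(clauses, n):
    """Adjacency lists of the bipartite factor graph: variable i <-> clause j."""
    var_adj = [[] for _ in range(n)]             # clauses touching each variable
    cls_adj = [sorted(set(c)) for c in clauses]  # variables in each clause
    for j, c in enumerate(cls_adj):
        for i in c:
            var_adj[i].append(j)
    return var_adj, cls_adj

def neighborhood(var_adj, cls_adj, v, R):
    """BFS distances of all nodes of B(v, R): variable and clause nodes
    within graph distance R of the variable node v."""
    dist = {("v", v): 0}
    queue = deque([("v", v)])
    while queue:
        kind, idx = queue.popleft()
        d = dist[(kind, idx)]
        if d == R:
            continue  # do not expand past the radius
        nbrs = ([("c", j) for j in var_adj[idx]] if kind == "v"
                else [("v", i) for i in cls_adj[idx]])
        for node in nbrs:
            if node not in dist:
                dist[node] = d + 1
                queue.append(node)
    return dist
```

On the two-clause instance {x_0 ⊕ x_1 ⊕ x_2, x_2 ⊕ x_3 ⊕ x_4}, the radius-2 neighborhood of x_0 contains the first clause and its three variables, but not the second clause, which is at distance 3.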
The actual implementation of a sequential local algorithm depends on the choice of the local rule. To emphasize the choice of the local rule τ, the sequential local algorithm with the given local rule τ is called the τ-decimation algorithm τ-DEC. The formal definitions of the sequential local algorithms, as well as the τ-decimation algorithms, will be given in Section 1.4. Note that if the local rule takes constant time, then the τ-decimation algorithm runs in linear time.
We introduce a notion of freeness for the sequential local algorithms. Any iteration in which the local rule returns the value 1/2 is called a free step. Intuitively, a free step means the local rule cannot obtain useful information from the local structure and lets the algorithm make a random guess for the assigned value. We say a τ-decimation algorithm is δ-free if w.h.p. it has at least δn free steps. Moreover, we say a τ-decimation algorithm is strictly δ-free if it is (δ + ε)-free for some ε > 0. If the τ-decimation algorithm is δ-free with large δ, the algorithm makes many random guesses on the assigned values, and it is likely that the local rule cannot extract useful information from the local structure to guide the algorithm. This leads to our contribution, described in the next section.
1.2 Main contribution
The main contribution of this work consists of two parts. The first part is to show that, as n → ∞, if the τ-decimation algorithm is strictly μ-free then w.h.p. it fails to find a solution for the random k-XORSAT instance, when the clause density r is beyond the clustering threshold r_clust(k) but below the satisfiability threshold r_sat(k). This is stated formally as Theorem 1. The value μn given in Theorem 1 is an upper bound on the number of variables removed from the instance in order to obtain the sub-instance called the core instance, which is crucial in our proof. We will discuss its meaning in detail in Section 2.
Theorem 1.
For any k ≥ 3 and r ∈ (r_clust(k), r_sat(k)), if the τ-decimation algorithm τ-DEC is strictly μ-free on the random k-XORSAT instance Φ ~ Φ_k(n, rn), then w.h.p. the output assignment from the algorithm τ-DEC on input Φ is not in the solution space S(Φ), that is, lim_{n→∞} Pr[ τ-DEC(Φ) ∈ S(Φ) ] = 0,
where Q is the largest solution of the fixed point equation Q = 1 − exp(−k r Q^{k−1}), and μ is the real-valued function given by μ(k, r) = exp(−k r Q^{k−1}) (1 + k r Q^{k−1}).
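Numerically, the largest fixed point Q of Q = 1 − exp(−k r Q^{k−1}) can be reached by iterating the map from Q = 1, since the map is monotone and its iterates decrease to the largest fixed point; the companion quantity exp(−λ)(1 + λ) with λ = k r Q^{k−1} is the standard Poisson expression for the limiting fraction of variables outside the 2-core. The function names and the sample densities below are our own, for illustration.

```python
import math

def core_fixed_point(k, r, iters=200):
    """Largest solution Q of Q = 1 - exp(-k r Q^{k-1}), by iterating
    the monotone map starting from Q = 1."""
    Q = 1.0
    for _ in range(iters):
        Q = 1.0 - math.exp(-k * r * Q ** (k - 1))
    return Q

def noncore_fraction(k, r):
    """exp(-lam) * (1 + lam) with lam = k r Q^{k-1}: the limiting fraction
    of variables whose core degree is at most 1, i.e. outside the 2-core."""
    lam = k * r * core_fixed_point(k, r) ** (k - 1)
    return math.exp(-lam) * (1.0 + lam)
```

Below the core-emergence density the iteration collapses to Q = 0 and the non-core fraction is 1 (the whole instance peels away); above it, a nontrivial fixed point appears.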
Note. This theorem can also be applied to non-sequential local algorithms, in which the algorithm runs its local rule and decides the assigned value for each variable in parallel, without depending on the other assigned values. We will briefly discuss the reason at the end of Section 1.6, where we discuss the proof technique we use.
The best known linear-time algorithm for finding a solution of a random k-XORSAT instance succeeds w.h.p. for k ≥ 3 and r < r_clust(k) [IKKM12]. That means these sequential local algorithms do not outperform the best known linear-time algorithm. Note that r_clust(k) is both the density up to which the best known linear-time algorithm succeeds and the density at which the sequential local algorithms start failing. This supports the intuition that r_clust(k) is the sharp threshold for the existence of a linear-time algorithm for the random k-XORSAT problem.
The second part of our contribution is to verify that the "freeness" condition in Theorem 1 is satisfied by the τ-decimation algorithm with certain local rules τ. One of them is the simplest local rule, Unit Clause Propagation (UC), which tries to satisfy the unit clause on the selected variable if one exists, and makes a random guess otherwise. By using Wormald's method of differential equations to count the number of free steps taken by the UC-decimation algorithm DEC_UC, we can show that it is strictly μ-free on the random k-XORSAT instance for k ≥ 9, which leads to the following theorem.
Theorem 2.
For any k ≥ 9 and r ∈ (r_clust(k), r_sat(k)), given a random k-XORSAT instance Φ ~ Φ_k(n, rn), we denote by DEC_UC(Φ) the output assignment from the UC-decimation algorithm DEC_UC on input Φ. Then, we have lim_{n→∞} Pr[ DEC_UC(Φ) ∈ S(Φ) ] = 0.
In each iteration, the role of the local rule is to approximate the marginal probability of the selected variable in the instance conditioned on fixing all previously selected variables to their assigned values. Belief Propagation (BP) and Survey Propagation (SP) are empirically surprisingly good at approximating marginal probabilities of variables over randomly chosen solutions in many random constraint satisfaction problems [MPZ02, GS02, BMZ05, RS09, KSS09]. In particular, it is well known, and proved analytically, that they calculate the exact marginal probabilities of variables when the underlying factor graph is a tree. If Belief Propagation or Survey Propagation is used as the local rule τ, it is natural to expect that the τ-decimation algorithm can find a solution. However, we prove that even if the local rule τ gives the exact marginal probabilities of variables over a randomly chosen solution for any instance whose factor graph is a tree, the τ-decimation algorithm still cannot find a solution w.h.p. for k ≥ 13. We know that w.h.p. the local neighborhood of a variable in the factor graph of the random k-XORSAT instance is a tree. Therefore, running BP or SP on the local neighborhood actually gives the exact marginal probability of the selected variable with respect to the sub-instance induced by the local neighborhood. This implies that both the BP-decimation algorithm DEC_BP and the SP-decimation algorithm DEC_SP fail to find a solution w.h.p. for k ≥ 13.
Theorem 3.
For any k ≥ 13 and r ∈ (r_clust(k), r_sat(k)), given a random k-XORSAT instance Φ ~ Φ_k(n, rn), denote by τ-DEC(Φ) the output assignment from the τ-decimation algorithm τ-DEC on input Φ. Assume the local rule τ outputs the exact marginal probability of the selected variable for any instance whose factor graph is a tree. Then, we have lim_{n→∞} Pr[ τ-DEC(Φ) ∈ S(Φ) ] = 0.
To prove Theorem 2 and Theorem 3, we only need to count the free steps of DEC_UC and of τ-DEC under the assumption on τ described in Theorem 3. If the number of free steps is strictly greater than μn with high probability, then the algorithms are strictly μ-free, and the results follow immediately by applying Theorem 1. Similarly, to obtain the same results for other τ-decimation algorithms, all we need to do is count the free steps of those algorithms. If they are strictly μ-free, we can obtain the same results by applying Theorem 1. Note that, due to certain limitations of our calculation, our results are limited to k ≥ 9 in Theorem 2 and k ≥ 13 in Theorem 3. We believe that the results hold for general k ≥ 3, and can be proved by refining some subtle calculations in our argument.
Although we only analyze a few implementations of the sequential local algorithms, we believe, in light of Theorem 3, that the results extend to many sequential local algorithms with different local rules. In the framework of sequential local algorithms, the role of the local rule is to approximate the marginal probability of the selected variable over a random solution for the instance induced by the local neighborhood centered at the selected variable. Therefore, we believe that any local rule that approximates the marginals well should yield results similar to Theorem 3. (A more general definition of freeness may be useful here: for example, we could say a τ-decimation algorithm is (δ, ε)-free if |p − 1/2| ≤ ε, where p is the value returned by the local rule, for at least δn iterations.)
It is worth mentioning the differences between these implementations of the sequential local algorithms and their well-known variants. Firstly, the UC-decimation algorithm is slightly different from the well-known Unit Clause algorithm. Under the framework of sequential local algorithms, the variables are selected in a random order, whereas the well-studied Unit Clause algorithm selects the variables in unit clauses first [AKKT02]. The difference in variable order could be crucial to the effectiveness of the Unit Clause algorithm. Secondly, the BP-decimation algorithm and the SP-decimation algorithm are slightly different from the Belief Propagation Guided Decimation Algorithm [Coj17] and the Survey Propagation Guided Decimation Algorithm [BZ04, BMZ05, MMW07]. In the framework of sequential local algorithms, we only provide the local neighborhood of radius R to BP and SP. This is equivalent to bounding the number of message-passing rounds in each decimation step by the constant R in the BP-guided and SP-guided Decimation Algorithms. It is in contrast to many empirical studies of BP-guided and SP-guided Decimation, which allow the message-passing iteration to continue until it converges.
Moreover, this work provides a new variant of the overlap gap property method, which was originally introduced by Gamarnik and Sudan [GS17b]. Instead of considering the overlap gap property of the whole instance, we exploit that property on a sub-instance of the random k-XORSAT instance. In particular, the proof of this work is inspired by [GS17b], which uses the overlap gap property method to rule out the class of balanced sequential local algorithms from solving the random NAE-k-SAT problem when the clause density is close to the satisfiability threshold. Instead of directly applying the method to the whole instance, we focus on the sub-instance induced by the 2-core of the factor graph of the instance. This modification helps us obtain a tight bound on the algorithmic threshold, unlike [GS17b]. If we applied the original overlap gap property method and used the first moment method to establish the property, we would only be able to show that the sequential local algorithms fail to solve the random k-XORSAT problem when the clause density exceeds a certain threshold. However, that threshold is much higher than r_clust(k); it only tells us that the algorithms fail when the density is very close to the satisfiability threshold r_sat(k). With our modification, we can lower that threshold to exactly r_clust(k), namely, the algorithms fail in finding a solution when the clause density is as low as the clustering threshold. This opens a new possibility for improving other results that use the overlap gap property method on other random constraint satisfaction problems.
1.3 Phase transition of random k-XORSAT
Many random ensembles of constraint satisfaction problems (CSPs), such as random k-SAT and random NAE-k-SAT, are closely related to random k-XORSAT. For example, the well-known random k-SAT problem is analogous to random k-XORSAT, in the sense that we can obtain a k-XORSAT instance from a k-SAT instance by replacing OR operators with XOR operators. We are particularly interested in the existence of sharp thresholds on the clause density at which the behavior of a random instance changes abruptly as the clause density grows. The following three thresholds are of particular interest.
1. The satisfiability threshold r_sat separates the regime where w.h.p. the random instance is satisfiable and the regime where w.h.p. it is unsatisfiable.
2. The clustering threshold r_clust separates the regime where w.h.p. the solution space can be partitioned into well-separated subsets, each containing an exponentially small fraction of solutions, and the regime where w.h.p. the solution space cannot be partitioned in this way.
3. The algorithmic threshold r_alg separates the regime where we have an efficient algorithm that can find a solution for a satisfiable random instance with non-vanishing probability, and the regime where no such algorithm exists.
Many random constraint satisfaction problems, such as random k-SAT, random NAE-k-SAT and random graph coloring, share the following phenomena related to these thresholds [AC08].
• There is an upper bound on the (conjectured) satisfiability threshold.
• There is a lower bound on the (conjectured) satisfiability threshold, from a non-constructive proof, and the lower bound is essentially tight.
• There are some polynomial-time algorithms that can find a solution when the density is relatively low, but no polynomial-time algorithm is known to succeed when the density is close to the satisfiability threshold. This leads to a conjectured algorithmic threshold, which is asymptotically below the (conjectured) satisfiability threshold.
• The clustering phenomenon takes place when the density is greater than a (conjectured) clustering threshold, and this threshold is close to or even asymptotically equal to the algorithmic threshold.
It is worth mentioning that not every random constraint satisfaction problem shares this set of phenomena. The most notable example is the symmetric binary perceptron (SBP). Its satisfiability threshold was established by [PX21, ALS22b], who also showed that the SBP exhibits the clustering property, with almost all clusters being singletons. On the other hand, [ALS22a] gave a polynomial-time algorithm that finds solutions w.h.p. at low clause density. Therefore, there is a regime of low clause density in which the SBP exhibits the clustering property and yet is solvable in polynomial time. Its clustering phenomenon does not cause hardness.
Being analogous to the random k-SAT problem, the random k-XORSAT problem shares those phenomena with other random constraint satisfaction problems. However, the story is slightly different in the random k-XORSAT problem. Since k-XORSAT instances are equivalent to Boolean linear systems, their solution spaces are simply affine subspaces of the Hamming hypercube {0, 1}^n. Because of this algebraic structure, we know both the existence and the exact value of the satisfiability threshold of the random k-XORSAT problem. Dubois and Mandler [DM02] proved that there exists a sharp satisfiability threshold for k = 3, and determined its exact value by a second moment argument. Pittel and Sorkin [PS16] further extended this to general k ≥ 3. An independent work on cuckoo hashing [DGM+10] also included an argument for the k-XORSAT satisfiability threshold for general k ≥ 3. Those proofs consider the 2-core of the hypergraph associated with the random k-XORSAT instance. (In graph theory, the ℓ-core of a (hyper)graph is its maximal subgraph in which all vertices have degree at least ℓ.) Based on the 2-core, we can construct a sub-instance, called the 2-core instance or simply the core instance, of the original instance. One can prove that the original instance is satisfiable if and only if the core instance is satisfiable. [CDMM03, MRZ03] studied the core instance and determined its satisfiability threshold, which can be converted to the satisfiability threshold of the random k-XORSAT instance.
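The 2-core used in these proofs is computed by the standard peeling process: repeatedly delete a variable that appears in at most one remaining clause, together with that clause. A small sketch, with our own naming and a deliberately simple O(nm) scan per deletion:

```python
def two_core(clauses, n):
    """Peel a k-uniform instance down to its 2-core.

    clauses: list of tuples of variable indices in 0..n-1.
    Returns the indices of the clauses that survive the peeling.
    """
    deg = [0] * n
    for c in clauses:
        for v in c:
            deg[v] += 1
    alive = [True] * len(clauses)
    removed = [False] * n
    # Any variable of degree <= 1 can be peeled away.
    stack = [v for v in range(n) if deg[v] <= 1]
    while stack:
        v = stack.pop()
        if removed[v]:
            continue
        removed[v] = True
        for j, c in enumerate(clauses):
            if alive[j] and v in c:
                alive[j] = False  # the clause leaves with its last-leaf variable
                for u in c:
                    if u != v:
                        deg[u] -= 1
                        if deg[u] <= 1:
                            stack.append(u)
    return [j for j in range(len(clauses)) if alive[j]]
```

A chain of clauses hanging off a degree-1 variable unravels completely, while an instance in which every variable has degree at least 2 is already its own core.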
Mézard, Ricci-Tersenghi and Zecchina [MRZ03] initiated the study of the clustering phenomenon of random k-XORSAT, and linked it to the existence of a non-empty 2-core instance. From [PSW96, Mol05, Kim04], we know the non-empty 2-core of a random hypergraph suddenly emerges at a critical edge density r_core(k). After that, Ibrahimi, Kanoria, Kraning and Montanari [IKKM12], and Achlioptas and Molloy [AM15], independently proved that there exists a clustering threshold r_clust(k), equal to r_core(k) and smaller than r_sat(k), such that w.h.p. the solution space is a single well-connected component for density r < r_clust(k), and w.h.p. the solution space shatters into exponentially many Ω(n)-separated components for density r ∈ (r_clust(k), r_sat(k)), provided that we view the solution space as a graph in which two solutions are adjacent if their Hamming distance is O(log n).
As we mentioned before, a random k-XORSAT instance can be written as a random Boolean linear system, so it can be solved in polynomial time by linear algebra methods, regardless of the clause density. For example, Gaussian elimination solves it in O(n^3) time. Since we do not have a linear-time algebraic method for solving linear systems, we can still study the algorithmic phase transition if we only consider linear-time algorithms as efficient algorithms. In the proofs in [IKKM12], they provided an algorithm that finds a solution in linear time when r < r_clust(k), which implies that r_clust(k) is a lower bound on the (linear-time version of the) algorithmic threshold r_alg of the random k-XORSAT problem. We conjecture that no algorithm can solve the random k-XORSAT problem in linear time with non-vanishing probability when r ∈ (r_clust(k), r_sat(k)), which would imply that r_clust(k) is an upper bound on r_alg and thus r_alg = r_clust(k). This would establish an intimate relation between the failure of linear-time algorithms on random k-XORSAT and the clustering phenomenon of its solution space.
1.4 Sequential local algorithms
Sequential local algorithms are a class of algorithms parametrized by a local rule τ that specifies how values should be assigned to variables, based on the "neighborhoods" of the variables. Given a local rule τ, the sequential local algorithm can be written as the following τ-decimation algorithm.
Given a fixed even number R > 0, we denote by I_R the set of all instances in which exactly one variable is selected as the root, and all nodes in the factor graph are at distance at most R from the root variable node. A local rule is defined to be a function τ: I_R → [0, 1], mapping from I_R to the interval [0, 1]. Given an instance Φ, since the local neighborhood B_Φ(v, R) of a variable node v represents the sub-instance of Φ induced by all nodes at distance at most R from the root variable node v, we have B_Φ(v, R) ∈ I_R, and τ(B_Φ(v, R)) is well-defined. Then, the τ-decimation algorithm can be expressed as follows.
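A minimal sketch of the decimation loop: variables are visited in random order, the local rule returns a bias p, the variable is set by a Bernoulli(p) draw, and the instance is simplified by substitution. The `tau` callback and the instance encoding (clauses as index lists plus parity bits) are our own illustrative choices; in the paper, the rule sees only the radius-R neighborhood, which we leave to the callback here.

```python
import random

def tau_decimate(clauses, rhs, n, tau, rng=None):
    """tau-decimation sketch: assign variables one by one in random order.

    clauses: list of lists of variable indices; rhs: list of parity bits.
    tau(clauses, rhs, v): local rule returning p in [0, 1] for variable v.
    Returns the full assignment as a list of 0/1.
    """
    rng = rng or random.Random(0)
    clauses = [list(c) for c in clauses]
    rhs = list(rhs)
    sigma = [None] * n
    order = list(range(n))
    rng.shuffle(order)                      # variables in uniformly random order
    for v in order:
        p = tau(clauses, rhs, v)            # p = 1/2 is a free step
        sigma[v] = 1 if rng.random() < p else 0
        # Substitute the assigned value and simplify the instance.
        for j, c in enumerate(clauses):
            if v in c:
                c.remove(v)
                rhs[j] ^= sigma[v]
    return sigma
```

With the constant rule p = 1 every step is forced to 1, so the output is the all-ones assignment regardless of the visiting order; with the constant rule p = 1/2 every step is free.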
For any i ∈ [n], if the value p given by the local rule in the i-th iteration is 1/2, then we call that iteration a free step. In a free step, the τ-decimation algorithm simply assigns a uniformly random Boolean value to the selected variable. On the contrary, if the value p given by the local rule in the i-th iteration is either 0 or 1, then we call that iteration a forced step. In a forced step, the τ-decimation algorithm is forced to assign a particular value to the selected variable according to the value p. To simplify our discussion, we introduce the following definitions for those τ-decimation algorithms having certain numbers of free steps.
Definition 1.
For any δ > 0, we say a τ-decimation algorithm τ-DEC is δ-free on the random k-XORSAT instance Φ ~ Φ_k(n, rn) if w.h.p. the τ-decimation algorithm on input Φ has at least δn free steps.
Definition 2.
For any δ > 0, we say a τ-decimation algorithm τ-DEC is strictly δ-free on the random k-XORSAT instance Φ ~ Φ_k(n, rn) if there exists ε > 0 such that the τ-decimation algorithm τ-DEC is (δ + ε)-free on Φ.
There are many choices for the local rule τ. The simplest one is Unit Clause Propagation (UC). In each iteration, after selecting the unassigned variable v, UC checks whether there exists a unit clause (a clause with exactly one variable) on the variable v. If yes, then UC sets p to be the right-hand-side value of that unit clause, which forces the decimation algorithm to pick the value satisfying that clause. In this case, the iteration is a forced step. (If there are multiple unit clauses on the selected variable v, only the one with the lowest index is considered.) If there is no unit clause on the selected variable v, then UC sets p to 1/2, which lets the algorithm choose the assigned value uniformly at random. In this case, the iteration is a free step.
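With clauses encoded as lists of variable indices and a parallel list of parity bits (our own convention, not the paper's), the UC rule reads off directly:

```python
def uc_rule(clauses, rhs, v):
    """Unit Clause Propagation as a local rule: if the selected variable v
    sits in a unit clause, return that clause's right-hand side (a forced
    step); otherwise return 1/2 (a free step). Scanning in index order
    means ties are broken by the lowest clause index."""
    for j, c in enumerate(clauses):
        if len(c) == 1 and c[0] == v:
            return float(rhs[j])
    return 0.5
```

For example, on a unit clause x_2 = 1 the rule returns 1.0, on a variable with no unit clause it returns 0.5, and with two conflicting unit clauses on x_2 the lowest-indexed one wins.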
1.5 Message passing algorithms
A new challenger to the algorithmic threshold came from statistical mechanics. In experiments [MPZ02, GS02, BMZ05, RS09, KSS09], message-passing algorithms demonstrated high efficiency in finding solutions of the random k-SAT problem at densities close to the satisfiability threshold. Those algorithms include the Belief Propagation Guided Decimation Algorithm and the Survey Propagation Guided Decimation Algorithm, which are based on the insightful but non-rigorous cavity method from statistical mechanics [BMZ05, RS09]. Unfortunately, several analyses showed that they do not outperform the best known algorithms for some problems. Coja-Oghlan [Coj17] showed that BP-guided Decimation fails to find solutions for random k-SAT w.h.p. for density above c·2^k/k for a universal constant c > 0, and thus does not outperform the best known algorithm from [Coj10]. Hetterich [Het16] gave the same conclusion for SP-guided Decimation by showing that it fails w.h.p. for density above (1 + o_k(1))·2^k ln k / k.
For random NAE-k-SAT, Gamarnik and Sudan [GS17b] showed that balanced sequential local algorithms fail to find solutions for densities above (1 + o_k(1))·2^{k−1} ln^2 k / k for sufficiently large k. This means the algorithms do not outperform the best known algorithm, the Unit Clause algorithm, which finds solutions w.h.p. for density up to c·2^{k−1}/k for some universal constant c > 0, for sufficiently large k [AKKT02]. The framework of balanced sequential local algorithms also covers BP-guided Decimation and SP-guided Decimation with a bounded number of message-passing iterations.
In our work, we obtain an analogous result. In Theorem 1, we show that w.h.p. strictly μ-free sequential local algorithms fail to solve the random k-XORSAT problem when the clause density exceeds the clustering threshold. Then, in Theorem 3, we show that any sequential local algorithm whose local rule computes the exact marginals on trees is strictly μ-free, and thus fails to find a solution for the random k-XORSAT problem. This theorem covers the sequential local algorithms with Belief Propagation (BP) and Survey Propagation (SP) as local rules.
1.6 Technique
The works [AC08, ACR11] demonstrated the clustering phenomenon for several random CSPs, and conjectured that it could be an obstruction to solving those problems. [GS17a, GS17b] and subsequent works leveraged a related notion of clustering, named the overlap gap property (OGP) by [GL18], to link the clustering phenomenon to hardness rigorously. Gamarnik gave a detailed survey of this line of work [Gam21].
This paper focuses on the vanilla version of the OGP. Given an instance of a constraint satisfaction problem, we say it exhibits the overlap gap property with values 0 ≤ v_1 < v_2 if every two solutions σ and σ' satisfy either d(σ, σ') ≤ v_1 or d(σ, σ') ≥ v_2, where d is a metric on its solution space. (We take d to be the Hamming distance throughout this paper.) Intuitively, it means every pair of solutions are either close to each other or far from each other, and thus the solution space of the instance exhibits a topological discontinuity based on proximity.
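As a concrete reading of the definition, here is a brute-force check of the gap condition on an explicit, small solution set (purely illustrative; real instances are analyzed probabilistically, not by enumeration):

```python
def hamming(s, t):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(s, t))

def exhibits_ogp(solutions, v1, v2):
    """True iff every pair of solutions is at distance <= v1 or >= v2,
    i.e. no pair falls strictly inside the gap (v1, v2)."""
    return all(
        hamming(s, t) <= v1 or hamming(s, t) >= v2
        for i, s in enumerate(solutions)
        for t in solutions[i + 1:]
    )
```

The set {0000, 0001, 1111} has pairwise distances 1, 3, 4, so it exhibits the OGP with (v_1, v_2) = (1, 3) but not with (0, 3): the pair at distance 1 falls inside the latter gap.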
Now we illustrate how the overlap gap property method works. (Some details of the method differ slightly between variants of the OGP, but the overall idea of a topological barrier stays the same.) Assume we have an algorithm that takes a random instance Φ as input and outputs an assignment for the instance. The output can be viewed as a random variable which depends on both the random instance and some internal randomness of the algorithm, represented by a random internal vector Z. Let Φ and z be realizations of the instance and of Z, respectively, and denote the output assignment by σ_0. We then re-randomize the components of the internal vector z one by one. After re-randomizing each component of z, we run the algorithm again to generate a new output assignment. We thereby obtain a sequence of assignments σ_0, σ_1, ..., σ_T for the instance Φ, where T is the number of components of the random vector Z that we have re-randomized. Next, we show that the algorithm is insensitive to its internal randomness, in the sense that when one component of the internal vector is re-randomized, the output assignment almost remains unchanged; in particular, d(σ_i, σ_{i+1}) is smaller than v_2 − v_1. We also show that the algorithm has a certain freeness, in the sense that when all components of the internal vector are re-randomized, the output assignment is expected to change a lot; in particular, d(σ_0, σ_T) should be larger than v_2. These two properties together imply that the sequence of assignments cannot "jump" over the overlap gap, while the two ends of the sequence probably lie in different clusters. Therefore, some assignment should fall in the gap, namely, there exists i such that v_1 < d(σ_0, σ_i) < v_2. If the probability that the algorithm successfully finds a solution were greater than some small quantity slowly converging to 0, then with some small but non-negligible probability both σ_0 and σ_i would be solutions of the instance Φ with v_1 < d(σ_0, σ_i) < v_2. Even though this probability is very small, it is enough to violate the OGP of the instance.
Then, by contradiction, we can conclude that the probability that the algorithm succeeds in finding a solution is smaller than , namely, w.h.p. the algorithm fails to find a solution.
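The interpolation bookkeeping just described can be made concrete with a toy sketch. The stand-in algorithm below (each coordinate of the internal vector independently decides one output bit) is our own illustration, not a real sequential local algorithm; it only shows why unit-step insensitivity plus endpoint freeness forces some intermediate output into any forbidden distance band.

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def toy_algorithm(z):
    # Stand-in for the output map: each coordinate of the internal vector z
    # independently decides one output bit.  A real sequential local
    # algorithm is far more entangled; this only illustrates the
    # interpolation bookkeeping.
    return [1 if zi < 0.5 else 0 for zi in z]

def interpolation_path(n, rng):
    z0 = [rng.random() for _ in range(n)]   # internal vector
    z1 = [rng.random() for _ in range(n)]   # independent copy
    # k-th output: the first k coordinates have been re-randomized
    return [toy_algorithm(z1[:k] + z0[k:]) for k in range(n + 1)]

outputs = interpolation_path(200, random.Random(0))
dists = [hamming(outputs[0], out) for out in outputs]
# Insensitivity: consecutive outputs differ in at most one bit, so `dists`
# moves in unit steps and cannot jump over a forbidden band, while the two
# endpoints disagree on roughly half of the bits (freeness).
```

Since the distances change by at most one per step, any value between the first and last distance is hit by some intermediate output, which is exactly the "falls in the gap" conclusion.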
Instead of considering the overlap gap property of the entire instance , we move our focus to the overlap gap property of a sub-instance of . Indeed, the sub-instance we consider is the 2-core instance induced by the 2-core of the factor graph representation of the random instance . It was proved in [IKKM12, AM15] that the 2-core instance exhibits the overlap gap property with and for some constant for clause density . By removing all variables not in the core instance from the sequence of assignments obtained above, we turn it into a sequence of assignments for the core instance. We also prove that the algorithm is insensitive to its input with respect to the core instance in the sense that , and has certain freeness so that . By repeating the above argument of the overlap gap property method, we can conclude that w.h.p. the algorithm fails to find a solution.
Our proof can also be used for the non-sequential local algorithms. Since the local rule runs on the local neighborhood of each variable in parallel, the values assigned to variables do not depend on each other. Informally speaking, there is no long-range dependency among those assigned values. Therefore, re-randomizing one component of the internal vector , say , only affects the value assigned to the corresponding variable . So, we have . Hence, we can obtain the same result as Theorem 1 for non-sequential local algorithms with the same proof.
1.7 Related works
The vanilla version of OGP helps us rule out some large classes of algorithms on random CSPs for relatively high clause densities, but it is not sophisticated enough to close the statistical-to-computational gap in some cases such as random NAE-k-SAT discussed in [GS17b] and random k-XORSAT discussed in this paper (more details in Section 2). There have been some recent works trying to improve the notion of OGP by developing different variants of OGP. The most notable one is multi-OGP [Wei22, BH22, HS22, GKPX23, GK23], which succeeds in closing the statistical-to-computational gap in certain models. However, the relation between the clustering property and the multi-OGP is not yet clear.
2 Overlap Gap Property
We say that a k-XORSAT instance exhibits the overlap gap property (or OGP for short) with the values if for any two solutions we have either or . Informally speaking, any two solutions of the instance are either close to each other or far away from each other, and thus the solution space exhibits a topological discontinuity. Given a random k-XORSAT instance, we can prove that it exhibits the OGP when the clause density is greater than a certain value, and obtain the following lemma. (The proof is given in Appendix H.)
Lemma 1.
For any , there exists such that for and any pair of solutions of the random -XORSAT instance , w.h.p. the distance between the two solutions is either or for some . In particular, the value of is given by
where is the binary entropy function, that is, .
Instead of considering the random -XORSAT instance itself, we focus on the sub-instance, called the core instance (defined below), of the random -XORSAT instance , and show that core instance also exhibits the overlap gap property, even when the clause density is much lower. (See Table 1.)
| k | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|
|  | 0.818470 | 0.772280 | 0.701780 | 0.637081 | 0.581775 | 0.534997 | 0.495255 |
|  | 0.984516 | 0.943723 | 0.905812 | 0.874349 | 0.848314 | 0.826470 | 0.807862 |
We start by defining the peeling algorithm and the core instances. Given an XORSAT instance , suppose there exists a variable of degree 1, which means it is involved in exactly one clause . We remove the variable and the only clause involving it, to obtain a modified instance . If we have a solution for the modified instance , we can always choose a suitable value for the variable to satisfy the clause , and extend the solution to a solution for the original instance . Similarly, we can do the same if the variable is of degree 0, since it is not involved in any equation, and we are free to choose any value for it. By doing this, solving the original instance is reduced to solving the modified instance .
We can repeat this process until there is no variable of degree at most 1. This process, run on an instance given as its input, is called the peeling algorithm (Algorithm 3). We call the resulting instance the 2-core instance (or simply the core instance) of the instance , denoted by . The name is borrowed from graph theory, where the -core of a graph is the maximal subgraph with minimum degree at least . It is known that the factor graph of the core instance is exactly the maximal subgraph of the factor graph of the instance with minimum variable degree at least 2.
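For concreteness, the peeling algorithm can be sketched as follows; the encoding of an instance as a list of clause tuples is ours, and the routine returns the indices of the clauses surviving in the 2-core.

```python
from collections import defaultdict

def peel_to_2core(clauses, n_vars):
    """Repeatedly remove variables of degree at most 1 together with the
    (at most one) clause containing them; return the indices of the
    surviving clauses, i.e. the 2-core instance."""
    alive = [True] * len(clauses)
    deg = defaultdict(int)
    where = defaultdict(set)      # variable -> alive clauses containing it
    for c, vs in enumerate(clauses):
        for v in vs:
            deg[v] += 1
            where[v].add(c)
    queue = [v for v in range(n_vars) if deg[v] <= 1]
    removed = set(queue)
    while queue:
        v = queue.pop()
        for c in list(where[v]):  # at most one clause is still alive here
            alive[c] = False
            for u in clauses[c]:
                where[u].discard(c)
                deg[u] -= 1
                if deg[u] <= 1 and u not in removed:
                    removed.add(u)
                    queue.append(u)
    return [c for c in range(len(clauses)) if alive[c]]

# Four clauses pairwise sharing variables form a 2-core, while clause
# (3, 4, 5) hangs off it and is peeled away through variables 4 and 5.
core = peel_to_2core([(0, 1, 2), (1, 2, 3), (0, 2, 3), (0, 1, 3), (3, 4, 5)], 6)
```

Note that the order in which degree-≤1 variables are processed does not affect the final core, which is why the 2-core is well-defined.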
Mézard and Montanari [MM09] gave a detailed description of the structure of the core instance, and readers can find more details about it in their book. The following theorem is a short summary of the known facts about core instances that we need in this paper.
Theorem 4.
For any , there exists given by
such that the factor graph of the core instance of the random k-XORSAT instance has the following properties.
-
1.
For , w.h.p. the factor graph of the core instance is an empty graph.
-
2.
For , w.h.p. the factor graph of the core instance has variable nodes, where
and is the largest solution of the fixed point equation . In particular, the fraction of variable nodes of degree is between and with probability greater than , where and
-
3.
Conditioning on the number of variable nodes and the degree profile , the factor graph of the core instance is distributed according to the ensemble containing all possible factor graphs of -XORSAT instances of variables and variable degree distribution .
Theorem 4 shows that there exists a threshold below the satisfiability threshold of the random k-XORSAT problem. When the clause density is below the threshold , w.h.p. the instance does not have a core instance. When the clause density is above the threshold , w.h.p. the core instance emerges. In particular, the variable degree distribution is a Poisson distribution with mean , conditioned on . [MM09] also showed that the core instance exhibits the OGP.
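As a numerical sanity check, the emergence threshold of the core can be recovered by bisection. This is a sketch under the assumption that the fixed-point equation of Theorem 4 takes the standard form Q = 1 - exp(-k·α·Q^(k-1)) for the 2-core of a random k-uniform factor graph; a nontrivial largest solution appears exactly when the curve meets the diagonal away from zero.

```python
import math

def has_nontrivial_fixed_point(k, alpha, grid=20000):
    # Assumed standard form of the fixed-point equation:
    #   Q = 1 - exp(-k * alpha * Q**(k - 1)).
    # A nontrivial largest solution exists iff the curve meets or crosses
    # the diagonal at some Q bounded away from 0.
    return any(
        1.0 - math.exp(-k * alpha * (i / grid) ** (k - 1)) >= i / grid
        for i in range(1, grid + 1)
    )

def core_threshold(k, lo=0.1, hi=2.0, iters=40):
    """Bisect for the smallest clause density at which the 2-core emerges."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if has_nontrivial_fixed_point(k, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Under this assumed form, the bisection reproduces the k = 3 and k = 4 entries 0.818470 and 0.772280 of Table 1 up to numerical tolerance.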
Lemma 2.
For and , there exists such that w.h.p. the distance between any two solutions for the core instance of a random -XORSAT instance is either or greater than .
Now, we know that w.h.p. the core instance of a random -XORSAT instance has the overlap gap property with the values and . With OGP, we can partition the solution space of the core instance into multiple groups, each called a core cluster, such that the distance between any pair of core solutions in the same core cluster is at most , and the distance between any pair of core solutions in different core clusters is at least .
Suppose we have a k-XORSAT instance . We first define a binary relation on the solution space of a core instance by: for , we write if and only if . It is easy to see that this is an equivalence relation. Then, we can partition the solution space by the equivalence classes of . We denote those equivalence classes by . Thus, we have where is the disjoint union. Then, we have
Now we can partition the solution space of the original instance into clusters based on the partition of the solution space of core instance. We set
where is defined to be the projection mapping assignments for the instance to assignments for the core instance by removing all variables not in the core instance . Each is called a cluster in the solution space . We can then prove that these clusters are well-separated from each other.
Lemma 3.
Let and . Suppose is a random k-XORSAT instance. Then, w.h.p. there exists a partition for the solution space of the random instance such that the following statements hold.
-
1.
If for some , then we have , where the real-valued function is given by and is the largest solution of the fixed point equation .
-
2.
If and for some , then we have .
Proof.
Assume the instance has a non-empty core instance , which exists with high probability according to Theorem 4. We also assume the core instance exhibits the OGP with and , which occurs with high probability according to Lemma 2. Let and be two solutions of the random -XORSAT instance , and let and be the projection of and on the core solution space , respectively.
To prove the first part of the lemma, we assume that and are in the same cluster, that is, for some . By the definition of cluster, we have . Therefore, is upper bounded by the number of variables not in the core instance, plus . By Theorem 4, the number of variables outside the core instance is given by . Hence, we have .
To prove the second part of the lemma, we assume that and are in different clusters, that is, , and for some . By the definition of cluster and Lemma 2, we have . Therefore, we have . ∎
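The cluster construction above can be sketched in code; the representative-based grouping below (names are ours) yields an honest partition precisely because, under the OGP, any two core distances are either at most v1 or at least v2.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def clusters_by_core(solutions, core_vars, v1):
    """Group full solutions into clusters via their projections onto the
    core variables: two solutions share a cluster iff their core
    projections are within distance v1.  Under the OGP, comparing against
    one representative per cluster suffices."""
    reps, labels = [], []
    for s in solutions:
        p = tuple(s[i] for i in core_vars)
        for lab, r in enumerate(reps):
            if hamming(p, r) <= v1:
                labels.append(lab)
                break
        else:
            labels.append(len(reps))
            reps.append(p)
    return labels

# Toy example: variables 0..3 play the role of the core, with v1 = 1; the
# first two assignments agree on the core up to one bit, the third is far.
labels = clusters_by_core(
    [(0, 0, 0, 0, 0), (0, 0, 0, 1, 1), (1, 1, 1, 1, 0)],
    core_vars=[0, 1, 2, 3],
    v1=1,
)
```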
3 Preparation of OGP method
In this section, we introduce some notions and obtain some preliminary results needed by the overlap gap property method to prove the main results.
3.1 Sequence of output assignments
The random k-XORSAT instance is a random variable, and the τ-decimation algorithm DECτ is a randomized algorithm. Therefore, the assignment output by the τ-decimation algorithm DECτ on input is also a random variable. The outcome of the output assignment depends on the random instance , the order of variables being chosen, and the value selection based on the output from the local rule . Now we introduce two random variables to explicitly represent the order of variables and the value selection, so that we have a more concrete language to discuss how the randomness from both the instance and the algorithm affects the output assignment. We adopt the notation from [GS17b] in the following discussion.
The order of variables can be represented by a random vector whose entries are i.i.d. random variables with uniform distribution over the interval , independent of the random instance . We call the ordering vector of the algorithm. For all , the variable in the instance is associated with the random variable . In each iteration of the algorithm, the unassigned variable with the largest value among all unassigned variables is selected. In other words, we can construct the permutation such that , and for all the variable is selected in the -th iteration. The value selection based on the output from the local rule can be represented by a random vector whose entries are i.i.d. random variables with uniform distribution over the interval . We call the internal vector of the algorithm. In the -th iteration of the algorithm, the value assigned to the selected variable is set to be 1 if , and 0 otherwise. Conditioning on , and , the output assignment is uniquely determined. Therefore, we can view the τ-decimation algorithm DECτ as a deterministic algorithm on random input , and denote by the output of the algorithm.
With this notion of the deterministic algorithm, we can construct a sequence of output assignments which will be used in the argument of the overlap gap property method. The sequence of output assignments is generated by applying the τ-decimation algorithm DECτ on a random k-XORSAT instance multiple times in the following way: First, given a random k-XORSAT instance , we sample an ordering vector and an internal vector . Then, we run the τ-decimation algorithm DECτ on input with the ordering vector and the internal vector to get the first output assignment . After that, we re-randomize (i.e. sample again) the entries of the internal vector one by one from to . Right after each re-randomization, we run the algorithm again to get a new output assignment. By doing this, we obtain a sequence of output assignments for the instance in total. We denote by the output assignment generated after re-randomizing the first entries of , for . Precisely speaking, let and be two independent random internal vectors with the uniform distribution over , and set for . Note that and . Then, the sequence of output assignments can be written as , which is equivalent to the sequence of output assignments obtained by running the τ-decimation algorithm DECτ (for times in total) on input for all .
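The deterministic view of decimation on input (instance, ordering vector, internal vector) can be sketched as follows. The instance encoding and the Unit Clause rule shown as an example local rule are our own illustration choices; any local rule returning an estimated marginal fits the same interface.

```python
def decimate(clauses, rhs, n, x, z, local_rule):
    """Deterministic view of decimation: the ordering vector x picks the
    unassigned variable with the largest x-value each round, and z[t]
    thresholds the marginal returned by the local rule."""
    sigma = {}
    order = sorted(range(n), key=lambda i: -x[i])
    for t, i in enumerate(order):
        mu = local_rule(clauses, rhs, sigma, i)   # estimate of P(sigma_i = 1)
        sigma[i] = 1 if z[t] < mu else 0
    return [sigma[i] for i in range(n)]

def uc_rule(clauses, rhs, sigma, i):
    """Unit Clause rule: if some clause has i as its only unassigned
    variable, return the forced value as a 0/1 marginal; otherwise 1/2."""
    for c, vs in enumerate(clauses):
        if i in vs and all(v in sigma for v in vs if v != i):
            forced = rhs[c]
            for v in vs:
                if v != i:
                    forced ^= sigma[v]
            return float(forced)
    return 0.5                                    # free step

# One clause x0 + x1 + x2 = 0 (mod 2): the first two steps are free coin
# flips decided by z, the third is forced, so the clause is satisfied.
out = decimate([(0, 1, 2)], [0], 3, x=[3, 2, 1], z=[0.4, 0.4, 0.9], local_rule=uc_rule)
```

Fixing x and z makes the run reproducible, which is exactly what the re-randomization argument exploits.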
Recall the projection mapping assignments for the instance to assignments for the core instance , by removing all variables not in the core instance. We can further obtain a sequence of assignments for the core instance by applying the projection on the output assignments , that is, we set .
3.2 Insensitive to internal vector
In this section, we show that the τ-decimation algorithm DECτ is insensitive to its internal vector. By insensitive, we mean that when the value of an entry in the internal vector is changed, only a small portion of the assigned values in the output assignment change accordingly. If so, every two consecutive output assignments in the sequence differ from each other in only a small portion of the assigned values.
Consider the sequence of output assignments from Section 3.1. Note that the -th output assignment in the sequence is the output of the algorithm on input . For any , the only difference between the input and the input is the -th entries of the internal vectors and . We can immediately see that the insensitivity of the algorithm implies that every two consecutive output assignments in the sequence are close to each other. Gamarnik and Sudan [GS17b] proved the insensitivity of the τ-decimation algorithm in their work, using the notion of influence range. Although their work [GS17b] focused on the random NAE-k-SAT problem, the proof of the insensitivity of the τ-decimation algorithm is independent of the type of clauses in the random constraint satisfaction framework, so we can use the result directly here.
Definition 3.
Given a random instance and a random ordering vector , we say that influences if either or in the variable-to-variable graph of the instance there exists a sequence of variable nodes such that the following statements hold.
-
1.
and .
-
2.
There exists a path from to , of length at most , in the variable-to-variable graph , for .
-
3.
for . In particular, .
We define the influence range of to be the set of all variables influenced by , denoted by .
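A simplified reading of this definition can be sketched in code: influence spreads along short variable-to-variable paths toward variables with smaller ordering values, which are decided later. The function names and the explicit radius parameter are ours, and the actual definition above is more delicate; this sketch only conveys the propagation mechanism.

```python
def influence_range(adj, x, i, radius):
    """Hedged reading of the influence-range definition: influence spreads
    from variable i along variable-to-variable paths of length at most
    `radius`, but only toward variables with smaller ordering values."""
    influenced = {i}
    frontier = [i]
    while frontier:
        v = frontier.pop()
        near, layer = {v}, [v]
        for _ in range(radius):    # collect variables within `radius` hops of v
            layer = [w for u in layer for w in adj[u] if w not in near]
            near.update(layer)
        for w in near:
            if w not in influenced and x[w] < x[v]:
                influenced.add(w)
                frontier.append(w)
    return influenced

# Path 0 - 1 - 2 - 3: with ordering values decreasing along the path the
# influence of variable 0 reaches everything; reversed, it stops at once.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```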
Lemma 4.
Given an instance , a vector , and two vectors , we assume there exists such that and for all . Then, for all variables .
Lemma 5.
For any and sufficiently large ,
They first showed that changing the value of only one entry, say , in the internal vector only affects the values assigned to the variables in the influence range of the variable (Lemma 4). They further showed that w.h.p. the size of the influence range is sublinear for all variables (Lemma 5). Note that in the original statement of Lemma 5 in [GS17b], the index in the inequality above can be any real number between 0 and 1/5. Here, we pick a fixed value for simplicity. Combining these two lemmas, we can show that w.h.p. the difference between and is upper bounded by for all .
Lemma 6.
For any and sufficiently large ,
3.3 Freeness
Recall the definition of free steps. An iteration of the τ-decimation algorithm DECτ is called a free step if the local rule gives the value in that iteration. In this case, the value chosen by the τ-decimation algorithm for the selected variable is either 0 or 1 with equal probability. Intuitively, it means that the local rule cannot capture useful information from the local structure to guide the τ-decimation algorithm in choosing a value for the selected variable, and thus the τ-decimation algorithm simply makes a random guess for the assigned value. We also recall the definition of a τ-decimation algorithm being -free. A τ-decimation algorithm DECτ is -free on the random k-XORSAT instance if w.h.p. the algorithm has at least free steps on input . Informally speaking, the freer the τ-decimation algorithm, the less information is captured by the local rule.
By using Wormald's method of differential equations, we can calculate the degree profile of the remaining factor graph after steps of the τ-decimation algorithm, for all . With the degree profiles, we can calculate the probability of each step being free, and thus approximate how free the τ-decimation algorithm is. The probability of having free steps depends on the choice of the local rule. Lemma 7 shows the freeness of the UC-decimation algorithm.
Lemma 7.
For and , the UC-decimation algorithm DECUC is -free on the random -XORSAT instance , where
and is the lower incomplete gamma function given by .
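As an empirical illustration of this freeness, one can reimplement UC-decimation naively and count the steps in which no unit clause pins the selected variable. The sampling details below (clauses drawn with distinct variables, a uniformly random decimation order) are simplifications of ours; note that each clause can force at most one step, so the free fraction is at least 1 - m/n.

```python
import random

def uc_free_fraction(n, alpha, k, trials, seed=0):
    """Monte-Carlo estimate of the fraction of free steps of naive
    UC-decimation on random k-XORSAT at clause density alpha."""
    rng = random.Random(seed)
    free = total = 0
    for _ in range(trials):
        m = int(alpha * n)
        clauses = [rng.sample(range(n), k) for _ in range(m)]
        rhs = [rng.randrange(2) for _ in range(m)]
        sigma = {}
        for i in rng.sample(range(n), n):       # uniformly random order
            forced = None
            for c, vs in enumerate(clauses):
                # unit clause: i is the only unassigned variable of clause c
                if i in vs and all(v in sigma for v in vs if v != i):
                    forced = rhs[c] ^ (sum(sigma[v] for v in vs if v != i) % 2)
                    break
            if forced is None:
                free += 1                       # free step: fair coin
                sigma[i] = rng.randrange(2)
            else:
                sigma[i] = forced
            total += 1
    return free / total
```

At low density most steps are free, matching the intuition that the Unit Clause rule rarely extracts information early in the run.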
The role of the local rules is to approximate the marginal probability of the selected variable over a randomly chosen solution for the sub-instance induced by the local neighborhood of the selected variable. Interestingly, even if we have a local rule that is capable of giving the exact marginals when the factor graph is a tree, it still cannot provide enough useful information to guide the τ-decimation algorithm in making good decisions on the assigned values. With such a local rule, the τ-decimation algorithm still has a certain level of freeness.
Lemma 8.
Assume the local rule can give the exact marginal probabilities of variables on any factor graph that is a tree. For and , the -decimation algorithm DECτ is -free on the random -XORSAT instance , where
and for any and .
4 Proof of main theorems
We denote by the success probability of the -decimation algorithm DEC, namely, is the probability that the assignment output by the -decimation algorithm DEC on the random -XORSAT instance with variables and clauses is a solution for . Formally, we define by the following expression , where is the random -XORSAT instance, is the random ordering vector, and is the random internal vector, as mentioned in Section 3.1. Now, we consider the sequence of output assignments generated by the procedure in Section 3.1. We first prove that if the algorithm DEC is -free, then the expected distance between the first and the last assignments in the sequence is at least .
Lemma 9.
If the -decimation algorithm DEC is -free on the random -XORSAT instance for some , then we have .
Next, we show that, if the τ-decimation algorithm is "free enough", namely, strictly -free, then we can pick a pair of output assignments and project them to the core instance so that the distance between the two corresponding core assignments falls in the forbidden range given by the overlap gap property of the core instance .
Lemma 10.
For any and , if the -decimation algorithm DEC is strictly -free on the random -XORSAT instance , then there exist and such that w.h.p. we have , where is given in Lemma 2.
The following lemma shows that the probability of both the output assignments and being solutions for the instance is lower bounded by .
Lemma 11.
For any , we have .
Finally, we can combine all above lemmas in this section to give the proof of Theorem 1.
Proof of Theorem 1.
We denote by the event of , and we have by Lemma 10. On the other hand, we pick for the inequality in Lemma 11. We denote by the event of , and . Note that we have . Thus, we have .
Now assume both and take place. Since both and are solutions for the random instance , both and are solutions for the core instance . Moreover, the distance falls in the interval , which happens with probability at most by Lemma 2. So, we have , and thus . ∎
To prove Theorems 2 and 3, all we need to do is show that DECUC and DECτ with the exact-marginal assumption are strictly -free. The results then follow immediately by applying Theorem 1. From Lemmas 7 and 8, we know that DECUC and DECτ are -free and -free, respectively. So, we only need to show that and . This can be done with the following lemmas, which give an upper bound of in Lemma 12, a lower bound of in Lemma 13, and lower bounds of and in Lemma 14. The proofs of these three lemmas are given in Appendices B, F and G, respectively.
Lemma 12.
For any and , we have , where
Lemma 13.
For any and , , where
Lemma 14.
For any and , we have , where and
(1)
Proof of Theorem 2.
References
- [AC08] Dimitris Achlioptas and Amin Coja-Oghlan. Algorithmic Barriers from Phase Transitions. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 793–802, Philadelphia, PA, USA, October 2008. IEEE.
- [ACR11] Dimitris Achlioptas, Amin Coja-Oghlan, and Federico Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. Random Structures & Algorithms, 38(3):251–268, May 2011.
- [AKKT02] Dimitris Achlioptas, Jeong Han Kim, Michael Krivelevich, and Prasad Tetali. Two-coloring random hypergraphs. Random Structures and Algorithms, 20(2):249–259, March 2002.
- [ALS22a] Emmanuel Abbe, Shuangping Li, and Allan Sly. Binary perceptron: Efficient algorithms can find solutions in a rare well-connected cluster. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, pages 860–873, New York, NY, USA, June 2022. Association for Computing Machinery.
- [ALS22b] Emmanuel Abbe, Shuangping Li, and Allan Sly. Proof of the Contiguity Conjecture and Lognormal Limit for the Symmetric Perceptron. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 327–338, February 2022.
- [AM15] Dimitris Achlioptas and Michael Molloy. The solution space geometry of random linear equations. Random Structures & Algorithms, 46(2):197–231, March 2015.
- [BH22] Guy Bresler and Brice Huang. The Algorithmic Phase Transition of Random k-SAT for Low Degree Polynomials. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 298–309, Denver, CO, USA, February 2022. IEEE.
- [BMZ05] A. Braunstein, M. Mézard, and R. Zecchina. Survey propagation: An algorithm for satisfiability. Random Structures and Algorithms, 27(2):201–226, September 2005.
- [BZ04] Alfredo Braunstein and Riccardo Zecchina. Survey propagation as local equilibrium equations. Journal of Statistical Mechanics: Theory and Experiment, 2004(06):P06007, June 2004.
- [CDMM03] S. Cocco, O. Dubois, J. Mandler, and R. Monasson. Rigorous Decimation-Based Construction of Ground Pure States for Spin-Glass Models on Random Lattices. Physical Review Letters, 90(4):047205, January 2003.
- [Coj10] Amin Coja-Oghlan. A Better Algorithm for Random k -SAT. SIAM Journal on Computing, 39(7):2823–2864, January 2010.
- [Coj17] Amin Coja-Oghlan. Belief Propagation Guided Decimation Fails on Random Formulas. Journal of the ACM, 63(6):1–55, February 2017.
- [DGM+10] Martin Dietzfelbinger, Andreas Goerdt, Michael Mitzenmacher, Andrea Montanari, Rasmus Pagh, and Michael Rink. Tight Thresholds for Cuckoo Hashing via XORSAT. In Automata, Languages and Programming, volume 6198, pages 213–225. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
- [DM02] Olivier Dubois and Jacques Mandler. The 3-XORSAT threshold. Comptes Rendus Mathematique, 335(11):963–966, December 2002.
- [Gam21] David Gamarnik. The overlap gap property: A topological barrier to optimizing over random structures. Proceedings of the National Academy of Sciences, 118(41):e2108492118, October 2021.
- [GK23] David Gamarnik and Eren C. Kızıldağ. Algorithmic obstructions in the random number partitioning problem. The Annals of Applied Probability, 33(6B), December 2023.
- [GKPX23] David Gamarnik, Eren C. Kizildağ, Will Perkins, and Changji Xu. Geometric Barriers for Stable and Online Algorithms for Discrepancy Minimization. In Proceedings of Thirty Sixth Conference on Learning Theory, pages 3231–3263. PMLR, July 2023.
- [GL18] David Gamarnik and Quan Li. Finding a large submatrix of a Gaussian random matrix. The Annals of Statistics, 46(6A), December 2018.
- [GS02] Carla P. Gomes and Bart Selman. Satisfied with Physics. Science, 297(5582):784–785, August 2002.
- [GS17a] David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. The Annals of Probability, 45(4), July 2017.
- [GS17b] David Gamarnik and Madhu Sudan. Performance of Sequential Local Algorithms for the Random NAE-$K$-SAT Problem. SIAM Journal on Computing, 46(2):590–619, January 2017.
- [GY11] Marco Guidetti and A. P. Young. Complexity of several constraint-satisfaction problems using the heuristic classical algorithm WalkSAT. Physical Review E, 84(1):011102, July 2011.
- [Het16] Samuel Hetterich. Analysing Survey Propagation Guided Decimation on Random Formulas, February 2016.
- [HS22] Brice Huang and Mark Sellke. Tight Lipschitz Hardness for optimizing Mean Field Spin Glasses. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 312–322, Denver, CO, USA, October 2022. IEEE.
- [IKKM12] Morteza Ibrahimi, Yashodhan Kanoria, Matt Kraning, and Andrea Montanari. The Set of Solutions of Random XORSAT Formulae. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 760–779. Society for Industrial and Applied Mathematics, January 2012.
- [Kim04] Jeong Han Kim. The Poisson Cloning Model for Random Graphs, Random Directed Graphs and Random k-SAT Problems. In Computing and Combinatorics, volume 3106, pages 2–2. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.
- [KSS09] Lukas Kroc, Ashish Sabharwal, and Bart Selman. Message-passing and local heuristics as decimation strategies for satisfiability. In Proceedings of the 2009 ACM Symposium on Applied Computing, pages 1408–1414, Honolulu Hawaii, March 2009. ACM.
- [MM09] Marc Mézard and Andrea Montanari. Information, Physics, and Computation. Oxford Graduate Texts. Oxford University Press, Oxford ; New York, 2009.
- [MMW07] Elitza Maneva, Elchanan Mossel, and Martin J. Wainwright. A new look at survey propagation and its generalizations. Journal of the ACM, 54(4):17, July 2007.
- [Mol05] Michael Molloy. Cores in random hypergraphs and Boolean formulas. Random Structures and Algorithms, 27(1):124–135, August 2005.
- [MPZ02] M. Mézard, G. Parisi, and R. Zecchina. Analytic and Algorithmic Solution of Random Satisfiability Problems. Science, 297(5582):812–815, August 2002.
- [MRZ03] M. Mézard, F. Ricci-Tersenghi, and R. Zecchina. Two Solutions to Diluted p-Spin Models and XORSAT Problems. Journal of Statistical Physics, 111(3/4):505–533, 2003.
- [PS16] Boris Pittel and Gregory B. Sorkin. The Satisfiability Threshold for k -XORSAT. Combinatorics, Probability and Computing, 25(2):236–268, March 2016.
- [PSW96] Boris Pittel, Joel Spencer, and Nicholas Wormald. Sudden Emergence of a Giant k-Core in a Random Graph. Journal of Combinatorial Theory, Series B, 67(1):111–151, May 1996.
- [PX21] Will Perkins and Changji Xu. Frozen 1-RSB structure of the symmetric Ising perceptron. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 1579–1588, New York, NY, USA, June 2021. Association for Computing Machinery.
- [RS09] Federico Ricci-Tersenghi and Guilhem Semerjian. On the cavity method for decimated random constraint satisfaction problems and the analysis of belief propagation guided decimation algorithms. Journal of Statistical Mechanics: Theory and Experiment, 2009(09):P09001, September 2009.
- [Wei22] Alexander S. Wein. Optimal low-degree hardness of maximum independent set. Mathematical Statistics and Learning, 4(3):221–251, January 2022.
- [Wor95] Nicholas C. Wormald. Differential Equations for Random Processes and Random Graphs. The Annals of Applied Probability, 5(4), November 1995.
- [Wor99] Nicholas C. Wormald. The differential equation method for random graph processes and greedy algorithms. Lectures on approximation and randomized algorithms, pages 73–155, 1999.
Appendix A Proof of lemmas in Section 4
In this section, we give the proofs of the lemmas in Section 4.
Proof of Lemma 9.
Suppose is the subset of indices of iterations that are free steps, namely,
Note that and and thus
Since we can write , we have
In free steps, the local rule gives the value to the decimation algorithm. Therefore, for any , if and only if either or . Hence, we have
Note that the random variables are i.i.d. with uniform distribution on . Thus, follows the binomial distribution with parameters and .
Assume that the algorithm DEC is -free on the random -XORSAT instance for some , which implies that w.h.p. . Note that only depends on and . Hence, we have
∎
Proof of Lemma 10.
Assume the -decimation algorithm DECτ is strictly -free on the random -XORSAT instance . Then, there exists such that DECτ is -free on .
We first show that there exists such that the expected value of is close to for some . Consider the sequence in which the first item is
(2)
By Lemma 9, we know that . Moreover, by Theorem 4, we know that w.h.p. there are variables not in the core instance . Note that the projection function only removes variables not in the core instance . Therefore, we have where is the number of variables not in the core instance . Hence, we have
By rearranging the terms, we have
(3)
with . From Lemma 6, we know that with probability we have
(4)
By the triangle inequality of the metric and the linearity of the expectation, we know that for the difference of two consecutive expected values in the sequence is
(5)
Combining (2), (3) and (5), we know that there exists and such that
(6)
Next, we prove that concentrates around its mean. Given two vectors and , we write and for . A function satisfies the bounded differences property if there exist such that for any , , , , and ,
Note that . For arbitrary instance and ordering vector , we have
So, given a random instance and a random ordering vector , we can write as a function of variables given by . Conditioning on (4), we can verify that satisfies the bounded differences property with for , and thus we have
by McDiarmid’s inequality. Since the condition (4) holds with probability as , the inequality in Lemma 10 holds with high probability. ∎
Proof of Lemma 11.
Fix an arbitrary . Note that we have and , where are uniformly distributed over , independently. Conditioning on , the assignment only depends on , and the assignment only depends on , and we have
Therefore, by Jensen’s inequality, we have
∎
Appendix B Upper Bound of Cluster Diameter
In this section, we give the proof of Lemma 12, which gives an upper bound for the diameter of a cluster that can be written as a closed form expression. From Lemma 3, we know that for any and w.h.p. the diameter of a cluster is upper bounded by , where
and is the largest solution of the fixed point equation with the given values of and . This implicit expression would make calculations complicated. So, we slightly relax the upper bound to obtain a simpler expression in Lemma 12. Note that this lemma can only be applied when due to some calculation restriction, which will be mentioned later in this section.
To prove Lemma 12, we first rewrite the fixed point equation as
Since satisfies this equation, we can write as
(7)
Note that must lie in the interval as
Further note that the real-valued function is strictly decreasing on . So, it suffices to find a lower bound of , in order to find an upper bound of .
Next, we try to show that is a lower bound of . To facilitate the calculation, we define a real-valued analytic function by
which have the following derivatives:
(8)
(9)
(10)
Before proving that is a lower bound of , we give the following two lemmas, Lemma 15 and Lemma 16, which will be used later.
Lemma 15.
For any , .
Proof.
We prove it by contradiction. Assume for some . By the continuity of , there exists such that . Note that . Again, by the continuity of , there exists such that . Since , this contradicts the definition of . ∎
Lemma 16.
For any , there exists such that
Proof.
With Lemma 15 and Lemma 16, we can prove that is a lower bound for . We split the proof into two cases: the case of , and the case of . We first study the latter case, which is easier to prove.
Lemma 17.
For any and , we have .
Proof.
From Lemma 16, we have since . Note that . By the continuity of , there exists at least one such that , which can be written as . By the definition of , we then have . ∎
Next, we study the case of . To prove that , we need the following two facts from basic calculus. With these two facts, we can prove that for in Lemma 18. Note that the condition is required in the calculation in the proof of Lemma 18. This is the reason why Lemma 12 requires .
Fact 1.
Let be an analytic function. If for any , then .
Fact 2.
Let be an analytic function. If there exists such that
and , then .
Lemma 18.
For any and , where and , we have .
Proof.
Given arbitrary and , from (8), we know that is strictly increasing with , and thus we have for any .
From (10), when we view as a function of with fixed and , there are two points of inflection:
Now, we consider the following two cases separately:
1.
2.
Case 1. Assume . Then we have
Moreover, since , we have . Therefore, we have
for . By Fact 2, we have for any . Note that both and are less than 0. So, for any , and . Therefore, cannot be in by its definition, and hence .
Now, we can complete the proof for Lemma 12.
Appendix C Terminologies
C.1 Degree profile
The distribution of the degrees of nodes in a factor graph can be described by its degree profile, which plays an important role in our analysis. Given a factor graph , let be the number of variable nodes of degree for all . Similarly, let be the number of equation nodes of degree for all . Furthermore, let be the total number of variable nodes and be the total number of equation nodes. It is obvious that and for the factor graph of a -XORSAT instance, but keep in mind that the total number of variable nodes and the total number of equation nodes decrease during the process of the algorithm. The degree distribution of variable nodes is given by the sequence , where , and the degree distribution of equation nodes is given by the sequence , where . Then, the degree profile of the factor graph is given by . Sometimes, the degree distributions and can also be represented by the polynomials and , respectively. With this representation, we can write and .
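To make the definitions above concrete, here is a minimal Python sketch that computes the two degree distributions for a small factor graph given as a clause list. The function name `degree_profile` and the tuple-of-variable-indices representation are illustrative conventions, not notation from the paper.

```python
from collections import Counter

def degree_profile(n_vars, clauses):
    """Compute the degree distributions of a factor graph.

    clauses: list of tuples of variable indices, one tuple per equation node.
    Returns (Lambda, P): Lambda[i] = fraction of variable nodes of degree i,
    P[j] = fraction of equation nodes of degree j.
    """
    var_deg = Counter()
    for c in clauses:
        for v in c:
            var_deg[v] += 1
    # variables appearing in no clause have degree 0 (Counter returns 0)
    deg_counts = Counter(var_deg[v] for v in range(n_vars))
    Lambda = {i: deg_counts[i] / n_vars for i in deg_counts}
    eq_counts = Counter(len(c) for c in clauses)
    P = {j: eq_counts[j] / len(clauses) for j in eq_counts}
    return Lambda, P

# A 3-uniform instance: every equation node has degree 3.
Lambda, P = degree_profile(5, [(0, 1, 2), (1, 2, 3), (0, 3, 4)])
assert P == {3: 1.0}
assert Lambda == {2: 0.8, 1: 0.2}
```

The edge-perspective distributions mentioned below are obtained from the same counts by reweighting each degree class proportionally to its degree.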
Given the factor graph of a random -XORSAT instance , is uniformly distributed over the ensemble of -uniform factor graphs . It is clear that the degree distribution of equation nodes is given by . The degree distribution of variable nodes is also known to converge in distribution to independent Poisson random variables with mean . Precisely speaking, w.h.p. for any , we have
We can also describe the degree profile from edge perspective. The edge perspective degree distribution of variable nodes is given by , where , and the edge perspective degree distribution of equation nodes is given by , where . Then, the edge perspective degree profile of the factor graph is given by . Similar to and , the edge perspective degree distribution and can be written as the polynomials and , respectively.
The ensemble of factor graphs with prescribed degree profile is called the ensemble of degree constrained factor graphs , which is the set of all factor graphs of variable nodes with degree profile , equipped with the uniform distribution. Note that the number of function nodes is restricted to satisfy the equation .
C.2 Local tree-like structure
In a factor graph , we can define the length of a path to be the number of edges in the path. Then, the distance between two nodes is defined to be the length of the shortest path between them. By convention, we set the distance to be between two nodes if there is no path connecting them. With this notion of distance, we can define the local neighborhood of a node. Given a factor graph , the local neighborhood (or simply neighborhood) of a node of radius is defined to be the subgraph of induced by all nodes of distance at most from and all edges between those nodes. The local neighborhood also represents an XORSAT instance with the variables and clauses inside the neighborhood.
The local neighborhood of a variable in a random -uniform factor graph looks like a tree. In particular, it looks similar to a random -generation tree. For any non-negative even number , the -generation tree ensemble of a given degree profile is defined as follows. When , the ensemble contains only one element, a single isolated node, and call it the variable node of the generation 0. Assume . We first generate a tree from the -generation tree ensemble . For each variable node of generation , we draw an independent integer distributed according to (or if ), and add function nodes, which are connected to as its children. Then, for each of these function nodes , we draw an independent integer distributed according to , and add variable nodes, which are connected to as its children and called the variable nodes of the generation .
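The generation-by-generation construction above can be sketched in Python. The sketch below assumes Poisson offspring for variable nodes and a fixed equation degree (each equation node getting k-1 variable children), which is the shape relevant to the -uniform ensemble; the function names are illustrative.

```python
import math
import random

def sample_tree(r, k, mean_deg, rng):
    """Sample a height-r tree (counting variable generations): each
    variable node gets Poisson(mean_deg) child equation nodes, and each
    equation node gets k-1 child variable subtrees."""
    if r == 0:
        return {"children": []}  # a generation-0 leaf variable

    def poisson(lam):
        # Knuth's multiplication method; adequate for small means
        L, p, j = math.exp(-lam), 1.0, 0
        while True:
            p *= rng.random()
            if p <= L:
                return j
            j += 1

    return {"children": [
        {"children": [sample_tree(r - 1, k, mean_deg, rng) for _ in range(k - 1)]}
        for _ in range(poisson(mean_deg))
    ]}

rng = random.Random(0)
tree = sample_tree(4, 3, 2.4, rng)
# every equation node has exactly k - 1 = 2 variable children
for eq in tree["children"]:
    assert len(eq["children"]) == 2
```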
In particular, Mézard and Montanari [MM09] show that the local structure of a variable node of a random factor graph from and converges to this tree ensemble. More details of the following theorem can be found in [MM09].
Theorem 5.
Let be a fixed degree profile, be a random factor graph in the ensemble (and respectively), be a variable node chosen uniformly at random from , and be a non-negative even number. Then, the local neighborhood of the factor graph converges in distribution to (and respectively) as .
Appendix D Proof of Lemma 7: Finding the number of free steps in DECUC
In this section, we will show the calculation for finding the number of free steps run by the UC-decimation DECUC on the random -XORSAT instance , which implies that DECUC is -free. To be specific, we will show the proof of Lemma 7, by using Wormald’s method of differential equations [Wor95].
We start by approximating the degree profile of the factor graph of the remaining instance after iterations. The degree profiles can help us to calculate the probability of the next iteration being a free step. Note that the notations for describing the degree profile are defined in Appendix C.1. For each of those notations, we append ”” to them to specify the values right after iterations.
Lemma 19.
For any local rule , after iterations of DEC, w.h.p. the number of variables of degree is given by
(11)
and the number of equations of degree is given by
(12)
Proof.
We are going to track the changes in the numbers of variable nodes and equation nodes of different degrees throughout the process of the algorithm, by using the method of differential equations from [Wor95].
To apply the method of differential equations, we need to compute the expected changes in the numbers of variable nodes and equation nodes of different degrees. Note that exactly one variable node is removed in each iteration, so the total number of variable nodes is after iterations.
Suppose we are at time (i.e. after iterations). The selected variable node is going to be removed from the factor graph . The selected variable node is of degree with probability . So, for , the expected change in the number of variable nodes of degree is . This yields
(13)
for . By Theorem 5, the local neighborhood of the selected variable of radius in converges in distribution to , where for all and for all . Therefore, with probability for , there are equation nodes directly adjacent to the selected variable node. For those equation nodes, the degree of each equation node is with probability , and decreases by 1 due to the removal of the selected variable node. In other words, the expected changes in the numbers of equation nodes of different degrees are given by
(14)
and
(15)
In the above calculation, we use the fact that , which holds because both and are equal to the number of edges at time .
Let be the normalized time. Furthermore, let
Then, the equations (13), (14) and (15) suggest the following differential equations:
(16)
(17)
(18)
At time (i.e. before running the algorithm), the number of variable nodes of degree is for all . Since all equation nodes are of degree at time , and for all . These suggest the following initial conditions for the differential equations above:
(19)
(20)
(21)
The solution of the differential equations with the initial conditions above is given by
for any . (The steps of solving the differential equations are shown in Appendix D.1.) By Wormald’s method of differential equations, at time , w.h.p. the number of variable nodes of degree is equal to for all , and the number of equation nodes of degree is equal to . ∎
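The closed-form expressions of Lemma 19 did not survive extraction, but the standard outcome of this kind of computation (after deleting a uniform set of variables, each equation's surviving degree is binomial in the fraction of remaining variables) can be sanity-checked by simulation. The sketch below assumes uniformly random removals as a stand-in for the algorithm's selection order; function and variable names are illustrative.

```python
import random
from math import comb

def eq_degree_counts_after_removal(n, m, k, t, rng):
    """Build a random k-uniform instance on n variables and m equations,
    delete t uniformly chosen variables, and count the equation nodes
    remaining at each degree j."""
    clauses = [rng.sample(range(n), k) for _ in range(m)]
    removed = set(rng.sample(range(n), t))
    counts = [0] * (k + 1)
    for c in clauses:
        d = sum(1 for v in c if v not in removed)
        counts[d] += 1
    return counts

rng = random.Random(1)
n, m, k, t = 100_000, 80_000, 3, 40_000
counts = eq_degree_counts_after_removal(n, m, k, t, rng)
theta = 1 - t / n  # fraction of variables still present
predicted = [m * comb(k, j) * theta ** j * (1 - theta) ** (k - j)
             for j in range(k + 1)]
# each counts[j] should be within a few standard deviations of predicted[j]
assert all(abs(c - p) < 0.02 * m for c, p in zip(counts, predicted))
```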
In an iteration of the UC-decimation algorithm DECUC, the local rule UC gives when there is no unit clause containing the selected variable . Therefore, an iteration is a free step if and only if there is no equation node of degree 1 adjacent to the selected variable node . With the degree profiles from Lemma 19, we can calculate the probability of an iteration being a free step, and thus approximate the total number of free steps.
Proof of Lemma 7.
We track the number of free steps after iterations, by using Wormald’s method of differential equations. Let be the number of free steps after iterations. Note that . At time , if there exists at least one equation node of degree 1 adjacent to the selected variable node , then the -st iteration is not a free step and . Conversely, if all equation nodes adjacent to the selected variable node are of degree not equal to 1, then the -st iteration is a free step and . The probability that all equation nodes adjacent to the selected variable node are of degree not equal to 1 is given by . Therefore, we have
(22)
From (12) and the polynomial identity , we can simplify by
Then, by (11), (12) and the fact that , we have
(23)
By letting and , the equation (23) and the fact that suggest the differential equation
(24)
and the initial condition
(25)
By Wormald’s method of differential equations, w.h.p. the number of free steps after iterations is equal to .
By the second fundamental theorem of calculus, we can write it as a definite integral
Note that . Using integration by substitution, we have
where is the lower incomplete gamma function defined by
Hence, w.h.p. the total number of free steps run by the algorithm is , where
Hence, DECUC is -free. ∎
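The lower incomplete gamma function appearing in this bound can be evaluated numerically. The sketch below is a generic stdlib-only routine using the standard series expansion of the function; it is an illustration, not code from the paper.

```python
import math

def lower_incomplete_gamma(s, x, tol=1e-15):
    """gamma_lower(s, x) = integral of t^(s-1) * e^(-t) dt from 0 to x,
    via the standard series x^s e^(-x) * sum_{n>=0} x^n / (s(s+1)...(s+n)),
    valid for s > 0 and x >= 0."""
    term = 1.0 / s   # n = 0 term of the series
    total = term
    n = 1
    while True:
        term *= x / (s + n)
        total += term
        if term < tol * total:
            break
        n += 1
    return math.pow(x, s) * math.exp(-x) * total

# Sanity checks: gamma_lower(1, x) = 1 - e^(-x), and for large x the
# value approaches the complete gamma function (Gamma(3) = 2).
assert abs(lower_incomplete_gamma(1.0, 2.0) - (1 - math.exp(-2.0))) < 1e-12
assert abs(lower_incomplete_gamma(3.0, 60.0) - 2.0) < 1e-9
```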
D.1 Solving differential equations (16), (17) and (18)
In this section, we show how to solve the differential equations (16), (17) and (18) with the initial conditions (19), (20) and (21). For , the differential equations in (16) can be written as
By separation of variables and the initial condition (19), we have
(26)
Next, we are going to use induction from to to prove that
First, the differential equation (17) can be written as
By separation of variables and the initial condition (20), we have
Now we assume for some . From the differential equations in (18), we have
By applying the standard method for first-order linear differential equations and the induction assumption, we have
where is a constant and . Since , and thus . Hence, we have
(27)
which completes the induction.
Appendix E Proof of Lemma 8: Finding the number of free steps in DECτ
Now we assume the local rule can give the marginal probabilities of variables for any factor graph that is a tree. We then show the calculation for the number of free steps run by the -decimation algorithm DECτ on the random -XORSAT instance . To be specific, we show the proof of Lemma 8, by using Wormald’s method of differential equations [Wor99].
In this section, the approach is slightly different from the proof in Appendix D since we do not have the details on how the local rule calculates its output value. However, since we assume that the local rule can give the exact marginals, we can determine the value output by the local rule , by studying the structure of the local neighborhood at time .
Assume we are at time . We consider the local neighborhood of the selected variable of radius . By Theorem 5, the local neighborhood converges in distribution to the tree ensemble , where is the degree distribution of the factor graph at time . The values of and are given by (11) and (12) since Lemma 19 can be applied to -decimation with any local rule . By our assumption, the value output by the local rule is equal to the marginal probability of the selected variable on a random solution for the XORSAT instance induced by the local neighborhood . So, we can obtain the value output by the local rule by calculating the marginal probability of , using the tree ensemble .
For any , we denote by a subtree of rooted at some node whose distance from is . According to the tree ensemble , the tree is i.i.d. for all variable nodes of the same distance from . By abusing the notation, we can simply omit ”” and write . For any factor graph which is a tree rooted at a variable node , we say the tree has a free root if in the XORSAT instance induced by the marginal distribution of the root variable is an even distribution over . Now, we are going to prove that for any the tree has a free root with probability at least , where the sequence is given by
(28)
for any .
Lemma 20.
For , the tree has a free root with probability at least .
Proof.
First, we consider the tree . The root variable node of has distance from the selected variable . Since the radius of is , the variable does not have any child node. Thus, consists of only one variable node, namely , and no equation node. We can assign either 0 or 1 to without violating any equation. Hence, the tree has a free root with probability .
For , consider the subtree of local neighborhood , rooted at the variable node of distance from . The variable node has child equation nodes with probability for . Each of those equation nodes has child variable nodes with probability for . We also know that each of these child variable nodes is the root of an i.i.d. subtree . In other words, each of those equation nodes is connected to the roots of i.i.d. subtrees as its children.
For an equation node mentioned above, if at least one of its child subtrees has a free root, then there are at least two solutions for the subtree , one assigning 0 to the root of subtree , and another assigning 1 to the root of subtree . Therefore, no matter what value we assign to , we are able to choose a suitable assignment for the variables in that subtree so that the equation of and all equations in are satisfied.
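The recursion behind the sequence in (28) can be sketched numerically. The paper's exact recursion uses the degree profile at time , whose symbols were lost in extraction; the sketch below is an assumed simplification in which each variable has Poisson child equations, and a child equation is either a unit clause (no children, forcing the root) with probability `p1` or a full clause with k-1 child subtrees. All names and the specific form of the recursion are assumptions for illustration.

```python
import math

def free_root_probs(lam, k, p1, depth):
    """Iterate a lower bound rho_r on P[height-r subtree has a free root].
    A leaf variable is unconstrained, hence free (rho_0 = 1); a variable
    is free if every child equation has at least one free child subtree."""
    rho = 1.0
    seq = [rho]
    for _ in range(depth):
        # P[a child equation does not force the root]: it must not be a
        # unit clause, and at least one of its k-1 subtrees must be free.
        q = (1 - p1) * (1 - (1 - rho) ** (k - 1))
        # average over Poisson(lam) child equations:
        rho = math.exp(-lam * (1 - q))
        seq.append(rho)
    return seq

seq = free_root_probs(lam=2.4, k=3, p1=0.1, depth=8)
assert seq[0] == 1.0 and all(0.0 < r <= 1.0 for r in seq)
# the sequence decreases monotonically toward a fixed point
assert all(a >= b for a, b in zip(seq, seq[1:]))
```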
Now, we can show that the local neighborhood of the selected variable of radius has a free root with probability at least . The proof is similar to the proof of Lemma 20, except replacing with .
Lemma 21.
At time , the local neighborhood of the selected variable of radius has a free root with probability at least .
Proof.
The root variable node has child equation nodes with probability for . Each of those equation nodes has child variable nodes with probability , and each of these child variable nodes is the root of an i.i.d. subtree . In other words, each of those equation nodes is connected to the roots of i.i.d. subtrees as its children.
For an equation node mentioned above, if at least one of its child subtrees has a free root, then we are able to obtain a satisfying assignment for all variable nodes in the child subtree which assigns either 1 or 0 to the root of subtree . Therefore, no matter what value we assign to , we are able to choose a suitable assignment for the variables in that subtree so that the equation of and all equations in are satisfied.
Proof of Lemma 8.
Lemma 21 implies that at time the local rule gives the value to the decimation algorithm with probability at least . Now we can calculate the number of free steps run by the -decimation algorithm by tracking the number throughout the process of the algorithm. We know that the -st iteration is a free step with probability at least . Let be a lower bound of the number of free steps after iterations, with and
(29)
By letting and , the equation (29) suggests the differential equation
(30)
with the initial condition
(31)
since . By Wormald’s method of differential equations, w.h.p. the lower bound of the number of free steps after iterations is given by
Hence, w.h.p. the total number of free steps run by the -decimation algorithm is lower bounded by , where is given by
Hence, DECτ is -free. ∎
Appendix F Proof of Lemma 13
In this section, we prove Lemma 13, which gives a lower bound for . To do that, we first prove that is lower bounded by
for and . We then prove that is decreasing with integer , and thus has a lower bound for any .
We start by proving that is decreasing with , which implies for any and .
Lemma 22.
is decreasing with for any .
Proof.
We obtain the derivative of with respect to as follows.
where is given by
Note that and its derivative is given by
Therefore, is decreasing with and for any . Hence, and thus is decreasing with for any . ∎
By the above lemma, we know that for any . Next we will prove that is lower bounded by .
Lemma 23.
For , .
Proof.
For any and any we have
This implies that
(32)
∎
Next, we prove that is an increasing sequence, which implies that for any .
Lemma 24.
is an increasing sequence.
Proof.
Note that for we have
By replacing with , we also have
Therefore, we have
and
The above inequality is based on the fact that the lower incomplete gamma function is strictly increasing with and .
For , we have . Therefore, we have
and thus is an increasing sequence. ∎
Combining the above lemmas, we can complete the proof of Lemma 13.
Appendix G Proof of Lemma 14
In this section, we prove Lemma 14, which gives a lower bound for . Recall that
(33)
We first show that is decreasing.
Lemma 25.
For any , and , we have for any , and the sequence is decreasing.
Proof.
Without loss of generality, we write . First, we prove that for all , by induction. Note that . If for some , then it is easy to see from (33) that we also have . Therefore, we know that for all .
Then, we prove for any by induction again. For , we have . Assume that for some . Then, we have
Therefore, is true for all . The result follows. ∎
Since the sequence is decreasing and bounded from below by , by the monotone convergence theorem, it converges as . In particular, converges to as , where is the largest solution of the fixed point equation
(34) |
and for any .
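The convergence argument above is a standard monotone fixed-point iteration, and it can be sketched numerically. The symbols of the paper's fixed point equation (34) were lost in extraction, so the sketch below illustrates the recipe with an assumed equation of the familiar k-XORSAT form Q = 1 - exp(-k·alpha·Q^(k-1)); iterating from 1 converges monotonically down to the largest fixed point.

```python
import math

def largest_fixed_point(f, x0=1.0, iters=10_000, tol=1e-12):
    """Iterate x <- f(x) from x0; for an increasing f with f(x0) <= x0
    the sequence decreases monotonically to the largest fixed point."""
    x = x0
    for _ in range(iters):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x

# Illustration with the assumed form Q = 1 - exp(-k*alpha*Q^(k-1)),
# for k = 3 and alpha = 0.9:
k, alpha = 3, 0.9
Q = largest_fixed_point(lambda q: 1 - math.exp(-k * alpha * q ** (k - 1)))
# Q satisfies the fixed point equation up to tolerance
assert abs(Q - (1 - math.exp(-k * alpha * Q ** (k - 1)))) < 1e-9
```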
Lemma 26.
Given and , we have for any , where
(35)
Proof.
Define the analytic real-valued function by
for any , , and . Note that we have
since for any , and . Now we set . With the polynomial identity , we then have
where
By the quadratic formula and (35), we know that if or . In particular, if or . For , we have and , and thus . Hence, we have for any .
Now we view as a function of . For , we know that and . By the continuity of as a function of , there exists at least one such that , which implies
By definition, is the largest solution of the fixed point equation (34), so we have . ∎
Proof of Lemma 14.
From Lemma 25, with , we know the sequence is decreasing and lower bounded by 0. By the monotone convergence theorem, the sequence converges as . In particular, it converges to which is the largest solution of the fixed point equation in (34), and for all . Therefore, we have
Furthermore, since is lower bounded by , is non-negative for any . Thus, we have
Then, by Lemma 26, we have for any , which implies
By directly differentiating with respect to and , we can check that is increasing with for and decreasing with for . Therefore, we have . ∎
Appendix H Proof of Lemma 1
The natural way to reveal the overlap gap property of the random -XORSAT problem is to use the first moment method. Given a random -XORSAT instance of variables and clauses, we first calculate the expected number of pairs of solutions with distance between them, for any . Let be the number of pairs of solutions with distance between them. The expected value of is given by the following lemma.
Lemma 27.
Let and . For any , the expected value of is given by
where the real-valued function is defined by
For convenience, we simply assume when or .
Proof.
Let be the number of pairs of solutions, with distance between the two solutions, for a random -XORSAT instance. In other words, given a random instance with linear system representation ,
By linearity of expectation, the expected value of is given by
Now we consider the calculation of . Since each equation in the linear system is chosen identically and independently, the summand can be written as , where is the first equation in the linear system. In addition, by the conditional probability formula, we have
Since , there are variables having different values and variables having the same values when we compare the assignments and . The random equation holds if and only if the equation chooses an even number of variables from those variables having different values in and . So, we have
From the above formula, we can see that the value of is independent of the choices of and , and only depends on the distance between and . So we have
By applying Stirling’s approximation for , we have
where is defined by
For convenience, we simply assume when or . ∎
Fix and . If for some , then the expectation converges to 0 as . By Markov’s inequality, we have , and thus also converges to 0 as . That means the probability of having at least one pair of solutions with distance between them converges to 0. In other words, w.h.p. there is no such pair of solutions. So, if we can find an interval such that for any , we can say that w.h.p. there is no pair of solutions with distance between them, for any . In that case, w.h.p. the distance between every pair of solutions is either or , that is, w.h.p. the random instance exhibits the overlap gap property with and .
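The exponent formula itself was garbled in extraction. Under the standard first-moment computation sketched in the proof above, a random k-XOR equation is satisfied by both members of a pair at normalized distance beta with probability (1 + (1 - 2*beta)^k)/4, giving the exponent ln 2 + H(beta) + alpha * ln q(beta). The sketch below scans this reconstructed exponent numerically; it is an assumption-labeled reconstruction, not a verbatim quote of the paper's formula.

```python
import math

def pair_exponent(beta, alpha, k):
    """(1/n) * ln E[# solution pairs at distance beta*n], reconstructed as
    ln 2 + H(beta) + alpha * ln q(beta), where q(beta) is the probability
    that one random k-XOR equation is satisfied by both assignments of a
    pair at normalized distance beta."""
    H = -beta * math.log(beta) - (1 - beta) * math.log(1 - beta)
    q = (1 + (1 - 2 * beta) ** k) / 4
    return math.log(2) + H + alpha * math.log(q)

# Scan for an interval where the exponent is negative (k = 3, alpha = 0.9):
k, alpha = 3, 0.9
negative = [b / 1000 for b in range(1, 1000)
            if pair_exponent(b / 1000, alpha, k) < 0]
# For odd k the exponent goes negative only for beta bounded away from 0,
# so w.h.p. no pair of solutions has a distance in that range.
assert negative and negative[0] > 0.5
```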
From the formula of the function , we can see that it decreases with , for any fixed and , as illustrated in Figure 1. It is more likely to have the OGP if the clause density is large. We can further determine the minimal clause density for having the OGP. This leads to Lemma 1, with the following proof.
Proof of Lemma 1.
If is odd, then we have . In this case, by the continuity of , there exist such that for any .
Now we assume is even. When , it is easy to see that for any . If we fix and , then the function is strictly decreasing with since . We aim to find the minimal positive value of that guarantees there exists an interval such that for any . Furthermore, we know that for even , so it suffices to find the minimal positive value of that guarantees the existence of such an interval within .
By fixing even and , we can treat as a strictly decreasing continuous function of , with and . Therefore, for any even and , there exists a unique given by
such that
where is the binary entropy function . Then, we can define and by
with . Suppose . Since is strictly decreasing with , we have . By the continuity of , there exist and such that we have for any .
Therefore, with Lemma 27, we know that the expected number of pairs of solutions with distance between them converges to zero for any . By the first moment method, w.h.p. there is no pair of solutions with distance between them, for any . In other words, w.h.p. for any pair of solutions and of , the distance between them is either smaller than or larger than for some and with . ∎