
Limits of Sequential Local Algorithms on the Random $k$-XORSAT Problem

Kingsley Yung, The Chinese University of Hong Kong. Email: [email protected].
Abstract

The random $k$-XORSAT problem is a random constraint satisfaction problem of $n$ Boolean variables and $m=rn$ clauses; a random instance can be expressed as a $\mathrm{GF}(2)$ linear system of the form $Ax=b$, where $A$ is a random $m\times n$ matrix with $k$ ones per row, and $b$ is a random vector. It is known that there exist two distinct thresholds $r_{core}(k)<r_{sat}(k)$ such that, as $n\rightarrow\infty$, for $r<r_{sat}(k)$ the random instance has solutions with high probability, while for $r_{core}(k)<r<r_{sat}(k)$ the solution space shatters into an exponential number of clusters. Sequential local algorithms are a natural class of algorithms which assign values to variables one by one, iteratively. In each iteration, the algorithm runs a heuristic, called a local rule, to decide the value assigned, based on the local neighborhood of the selected variable under the factor graph representation of the instance.

We prove that for any $r>r_{core}(k)$ the sequential local algorithms with certain local rules fail to solve the random $k$-XORSAT problem with high probability. They include (1) the algorithm using Unit Clause Propagation as its local rule, for $k\geq 9$, and (2) the algorithms using any local rule that can calculate the exact marginal probabilities of variables in instances whose factor graphs are trees, for $k\geq 13$. The well-known Belief Propagation and Survey Propagation are included in (2). Meanwhile, the best known linear-time algorithm succeeds with high probability for $r<r_{core}(k)$. Our results support the intuition that $r_{core}(k)$ is the sharp threshold for the existence of a linear-time algorithm for random $k$-XORSAT.

Our approach is to apply the Overlap Gap Property (OGP) framework to the sub-instance induced by the core of the instance, instead of the whole instance. By doing so, the sequential local algorithms can be ruled out at clause densities as low as $r_{core}(k)$, since the sub-instance exhibits the OGP at a much lower clause density than the whole instance does.

1 Introduction

1.1 Background

The $k$-XORSAT problem is a Boolean constraint satisfaction problem closely related to the well-known $k$-SAT problem. An instance $\Phi$ of the $k$-XORSAT problem consists of $m$ clauses in $n$ Boolean variables. Each clause is a Boolean linear equation of $k$ variables of the form $x_{j_1}\oplus x_{j_2}\oplus\cdots\oplus x_{j_k}=b_j$, where $\oplus$ is the modulo-2 addition. By convention, when we say an XORSAT instance, without the prefix "$k$", we mean the same except that we do not require the clauses to have exactly $k$ variables. An assignment $\sigma$ to the $n$ variables is a mapping from the set $\{x_i:i\in[n]\}$ of all $n$ variables to the set $\{0,1\}$ of the two Boolean values. Abusing notation, we can write it as the Boolean vector $\sigma=(\sigma(x_1),\sigma(x_2),\cdots,\sigma(x_n))\in\{0,1\}^n$ containing the assigned values. The distance $d(\sigma,\sigma')$ between any two assignments $\sigma$ and $\sigma'$ is defined to be the Hamming distance $d(\sigma,\sigma')=\sum_{i=1}^n\mathbbm{1}(\sigma(x_i)\neq\sigma'(x_i))$. A clause is satisfied by an assignment if the equation of the clause holds when the variables are replaced by the corresponding assigned values, and an instance of the $k$-XORSAT problem is satisfied by an assignment if all its clauses are satisfied by the assignment. An instance is satisfiable if it has at least one satisfying assignment, and unsatisfiable if it does not have any satisfying assignment. An assignment $\sigma$ that satisfies the instance $\Phi$ is called a solution for the instance. The set of all solutions for the instance $\Phi$ is called the solution space of the instance, denoted by $\mathcal{S}(\Phi)$. We are interested in the complexity of finding a solution.

Since each clause is just a Boolean linear equation, an instance $\Phi$ can be viewed as a Boolean linear system $Ax=b$, where $A\in\{0,1\}^{m\times n}$ is an $m\times n$ Boolean matrix and $b\in\{0,1\}^m$ is a vector of length $m$. Note that each row of $A$ contains exactly $k$ ones, since each clause has exactly $k$ variables. We can see that finding solutions for a $k$-XORSAT instance is equivalent to solving a Boolean linear system, and the solution space $\mathcal{S}(\Phi)$ is an affine subspace of $\{0,1\}^n$. Abusing notation, we can simply write $\Phi=(A,b)$, and the terms "clause" and "equation" are interchangeable here.
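To make this algebraic route concrete, here is a minimal Python sketch (our own illustrative code, not from the paper; the name solve_gf2 and the dense numpy representation are our choices) that solves $Ax=b$ over $\mathrm{GF}(2)$ by Gaussian elimination and returns one solution, or None when the instance is unsatisfiable:

    import numpy as np

    def solve_gf2(A, b):
        """Gaussian elimination over GF(2) for Ax = b, A in {0,1}^{m x n}."""
        A, b = A.copy() % 2, b.copy() % 2
        m, n = A.shape
        pivots, row = [], 0
        for col in range(n):
            hits = np.nonzero(A[row:, col])[0]   # rows at/below `row` with a 1
            if hits.size == 0:
                continue                         # no pivot in this column
            r = row + hits[0]
            A[[row, r]] = A[[r, row]]            # swap pivot row into place
            b[row], b[r] = b[r], b[row]
            for i in range(m):                   # clear col from all other rows
                if i != row and A[i, col]:
                    A[i] ^= A[row]
                    b[i] ^= b[row]
            pivots.append((row, col))
            row += 1
            if row == m:
                break
        if b[row:].any():                        # a row reading 0 = 1 remains
            return None
        x = np.zeros(n, dtype=A.dtype)
        for r, c in pivots:                      # free variables default to 0
            x[c] = b[r]
        return x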

We are particularly interested in random instances of the $k$-XORSAT problem. In a random instance $\mathbf{\Phi}$, each clause is drawn uniformly over all $2\binom{n}{k}$ possibilities, independently. In particular, the left-hand side of the equation is the modulo-2 sum of $k$ variables chosen uniformly from the $\binom{n}{k}$ possibilities, and the right-hand side is either 0 or 1 with equal probability. Therefore, a random instance $\mathbf{\Phi}$ of the $k$-XORSAT problem is drawn uniformly from the ensemble $\mathbf{\Phi}_k(n,m)$ of all possible instances of the $k$-XORSAT problem with $n$ variables and $m$ clauses, each clause containing exactly $k$ variables, and we denote this by $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,m)$. We focus on the regime in which the number of variables $n$ goes to infinity and the number of clauses $m$ is proportional to $n$, that is, $m=rn$, where $r$ is a constant independent of $n$ called the clause density.
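For concreteness, a small Python sketch of sampling from this ensemble (illustrative only; representing a clause as a pair of a sorted variable list and a right-hand side is an assumption carried through the later sketches):

    import random

    def random_kxorsat(n, m, k, seed=None):
        """Sample from Phi_k(n, m): each clause picks k distinct variables
        uniformly and an independent uniform right-hand side in {0, 1}."""
        rng = random.Random(seed)
        return [(sorted(rng.sample(range(n), k)), rng.randrange(2))
                for _ in range(m)]

    # e.g. a random 3-XORSAT instance at clause density r = 0.7
    clauses = random_kxorsat(n=1000, m=700, k=3, seed=0)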

Since a $k$-XORSAT instance can be represented by a system of linear equations over $\mathrm{GF}(2)$, given an instance, standard linear algebra methods such as Gaussian elimination can determine whether the instance is satisfiable, find a solution, and even count the total number of solutions, in polynomial time. However, beyond this particular algebraic structure, some variants of the $k$-XORSAT problem are hard to solve. Achlioptas and Molloy mentioned in their paper [AM15] that random instances of the $k$-XORSAT problem seem to be extremely difficult for both generic CSP solvers and SAT solvers, which do not take the algebraic structure into account. Guidetti and Young [GY11] suggested that random $k$-XORSAT is, among many random CSPs, the most difficult for random-walk-type algorithms such as WalkSAT. The difficulty of solving random $k$-XORSAT instances becomes more apparent when we only consider linear-time algorithms as efficient algorithms, since we have no linear-time algebraic method for solving a general linear system.

Many studies suggest that the difficulty of solving random CSPs is related to phase transitions of their solution spaces as the clause density $r$ grows. (We give a more detailed discussion in Section 1.3.) Pittel and Sorkin [PS16] obtained the sharp satisfiability threshold $r_{sat}(k)$ of random $k$-XORSAT, for general $k\geq 3$: the random $k$-XORSAT instance is satisfiable w.h.p. when $r<r_{sat}(k)$, and unsatisfiable w.h.p. when $r>r_{sat}(k)$. (We say an event $\mathcal{E}_n$, depending on a number $n$, occurs with high probability, shortened to w.h.p., if the probability of the event $\mathcal{E}_n$ occurring converges to 1 as $n$ goes to infinity, that is, $\lim_{n\rightarrow\infty}\Pr[\mathcal{E}_n]=1$.) Furthermore, Ibrahimi, Kanoria, Kraning and Montanari [IKKM12] obtained the sharp clustering threshold $r_{core}(k)$, which is less than $r_{sat}(k)$, of random $k$-XORSAT for $k\geq 3$. When $r<r_{core}(k)$, w.h.p. the solution space of a random $k$-XORSAT instance is "well-connected". When $r_{core}(k)<r<r_{sat}(k)$, w.h.p. the solution space of a random $k$-XORSAT instance shatters into an exponential number of "well-separated clusters". In [IKKM12], they also provided a linear-time algorithm that can solve a random $k$-XORSAT instance w.h.p. for $r<r_{core}(k)$. On the other hand, no algorithm is known to be able to find a solution for a random $k$-XORSAT instance with non-vanishing probability in linear time for $r_{core}(k)<r<r_{sat}(k)$, a regime in which solutions exist with high probability.

In this work, we consider a natural class of algorithms, called sequential local algorithms. A sequential algorithm selects an unassigned variable randomly and assigns a value to it, iteratively, until every variable has an assigned value. In each iteration of the algorithm, to decide the assigned value, the algorithm runs a heuristic, called a local rule, which returns a value $p\in[0,1]$, and decides the assigned value to be either 0 or 1 randomly, according to the Bernoulli distribution with parameter $p$. Ideally, if in each iteration the local rule could calculate the exact marginal probability of the selected variable over a randomly chosen solution for the instance, conditioned on fixing all previously selected variables to their assigned values, the algorithm would be able to find a solution. However, we restrict the power of the local rules by providing them only with local structure. To explain the meaning of "local", we first construct a graphical representation of a $k$-XORSAT instance: the factor graph. The factor graph $G$ of a $k$-XORSAT instance $\Phi$ is constructed in the following way: (1) each variable is represented by a variable node; (2) each equation is represented by an equation node; (3) an undirected edge $(v,e)$ is added if the variable $v$ is involved in the equation $e$. Note that since there is a one-to-one correspondence between variables (equations) and variable nodes (equation nodes), in this paper the terms variables (equations) and variable nodes (equation nodes) are interchangeable. The distance between any two nodes is the number of edges in a shortest path connecting the two nodes. For any integer $R\geq 0$, the local neighborhood $B_G(v,R)$ of radius $R$ of a node $v$ is the subgraph of $G$ induced by all nodes at distance at most $R$ from the node $v$. By "local" in the name of the algorithms, we mean that the local rule only takes the local neighborhood of the selected variable, of radius $R$, as its input.
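The following Python sketch (our own helper names and node encoding) builds the factor graph as adjacency lists and extracts the node set of the local neighborhood $B_G(v,R)$ by breadth-first search, which is all a local rule is allowed to see:

    from collections import deque

    def factor_graph(n, clauses):
        """Adjacency lists; variable nodes are ('v', i), equation nodes ('e', j)."""
        adj = {('v', i): [] for i in range(n)}
        for j, (lhs, _) in enumerate(clauses):
            adj[('e', j)] = [('v', i) for i in lhs]
            for i in lhs:
                adj[('v', i)].append(('e', j))
        return adj

    def ball(adj, root, R):
        """Node set of B_G(root, R): all nodes within distance R of root."""
        dist, queue = {root: 0}, deque([root])
        while queue:
            u = queue.popleft()
            if dist[u] == R:
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return set(dist)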

The actual implementation of a sequential local algorithm depends on the choice of the local rule. To emphasize this choice, the sequential local algorithm with a given local rule $\tau$ is called the $\tau$-decimation algorithm DEC$_\tau$. The formal definitions of the sequential local algorithms, as well as the $\tau$-decimation algorithms, are given in Section 1.4. Note that if the local rule $\tau$ takes constant time, then the $\tau$-decimation algorithm takes linear time.

We introduce a notion of freeness for the sequential local algorithms. Any iteration in which the local rule returns $1/2$ is called a free step. Intuitively, a free step means the local rule cannot obtain useful information from the local structure and lets the algorithm make a random guess for the assigned value. We say a $\tau$-decimation algorithm is $\delta$-free if w.h.p. it has at least $\delta n$ free steps. Moreover, we say a $\tau$-decimation algorithm is strictly $\delta$-free if it is $\delta'$-free for some $\delta'>\delta$. If the $\tau$-decimation algorithm is $\delta$-free with large $\delta>0$, the algorithm makes many random guesses on the assigned values, and it is likely that the local rule $\tau$ cannot extract useful information from the local structure to guide the algorithm. This leads to our contribution, described in the next section.

1.2 Main contribution

The main contribution of this work consists of two parts. The first part is to show that, as $n\rightarrow\infty$, if the $\tau$-decimation algorithm is strictly $2\mu(k,r)$-free then w.h.p. it fails to find a solution for the random $k$-XORSAT instance, when the clause density $r$ is beyond the clustering threshold $r_{core}(k)$ but below the satisfiability threshold $r_{sat}(k)$. This is stated formally as Theorem 1. The quantity $\mu(k,r)n$, with $\mu(k,r)$ given in Theorem 1, is an upper bound on the number of variables removed from the instance to obtain the sub-instance called the core instance, which is crucial in our proof. We discuss its meaning in detail in Section 2.

Theorem 1.

For any $k\geq 3$ and $r\in(r_{core}(k),r_{sat}(k))$, if the $\tau$-decimation algorithm DEC$_\tau$ is strictly $2\mu(k,r)$-free on the random $k$-XORSAT instance $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,rn)$, then w.h.p. the output assignment $\textnormal{DEC}_\tau(\mathbf{\Phi})$ of the algorithm DEC$_\tau$ on input $\mathbf{\Phi}$ is not in the solution space $\mathcal{S}(\mathbf{\Phi})$, that is,

$$\lim_{n\rightarrow\infty}\Pr\left[\textnormal{DEC}_\tau(\mathbf{\Phi})\in\mathcal{S}(\mathbf{\Phi})\right]=0,$$

where $Q$ is the largest solution of the fixed point equation $Q=1-\exp(-krQ^{k-1})$, and $\mu(k,r)$ is the real-valued function given by $\mu(k,r)=\exp(-krQ^{k-1})+krQ^{k-1}\exp(-krQ^{k-1})$.

Note.  This theorem also applies to non-sequential local algorithms, in which the algorithm runs its local rule and decides the assigned value for each variable in parallel, without depending on the other assigned values. We briefly discuss the reason at the end of Section 1.6, where we discuss our proof technique.
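Numerically, $Q$ and $\mu(k,r)$ are straightforward to evaluate. Below is a small Python sketch (our own illustrative code, not part of the proof) that finds the largest fixed point by iterating the map $Q\mapsto 1-\exp(-krQ^{k-1})$ downward from $Q=1$; since the map is increasing in $Q$ and maps $[0,1]$ into itself, this iteration decreases to the largest fixed point:

    import math

    def largest_fixed_point(k, r, iters=100_000):
        """Largest solution Q of Q = 1 - exp(-k r Q^{k-1})."""
        Q = 1.0
        for _ in range(iters):
            Q = 1.0 - math.exp(-k * r * Q ** (k - 1))
        return Q

    def mu(k, r):
        """mu(k, r) = exp(-krQ^{k-1}) + krQ^{k-1} * exp(-krQ^{k-1})."""
        t = k * r * largest_fixed_point(k, r) ** (k - 1)
        return math.exp(-t) * (1.0 + t)

    # e.g. the freeness requirement 2*mu(k, r) of Theorem 1 for k = 9:
    print(2 * mu(9, 0.50))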

The best known linear-time algorithm for finding a solution for the random $k$-XORSAT instance succeeds w.h.p. for $k\geq 3$ and $r<r_{core}(k)$ [IKKM12]. That means these sequential local algorithms do not outperform the best known linear-time algorithm. Note that $r_{core}(k)$ is both where the best known linear-time algorithm succeeds up to, and where the sequential local algorithms start failing. These facts support the intuition that $r_{core}(k)$ is the sharp threshold for the existence of a linear-time algorithm for the random $k$-XORSAT problem.

The second part of our contribution is to verify that the "freeness" condition in Theorem 1 is satisfied by the $\tau$-decimation algorithm with certain local rules $\tau$. One of them is the simplest local rule, Unit Clause Propagation UC, which tries to satisfy the unit clause on the selected variable if one exists, and makes a random guess otherwise. By using Wormald's method of differential equations to count the number of free steps taken by the UC-decimation algorithm DEC$_{UC}$, we can show that it is strictly $2\mu(k,r)$-free on the random $k$-XORSAT instance $\mathbf{\Phi}$ for $k\geq 9$, which leads to the following theorem.

Theorem 2.

For any $k\geq 9$ and $r\in(r_{core}(k),r_{sat}(k))$, given a random $k$-XORSAT instance $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,rn)$, we denote by $\textnormal{DEC}_{UC}(\mathbf{\Phi})$ the output assignment of the UC-decimation algorithm DEC$_{UC}$ on input $\mathbf{\Phi}$. Then, we have $\lim_{n\rightarrow\infty}\Pr\left[\textnormal{DEC}_{UC}(\mathbf{\Phi})\in\mathcal{S}(\mathbf{\Phi})\right]=0$.

In each iteration, the role of the local rule is to approximate the marginal probability of the selected variable in the instance, conditioned on fixing all previously selected variables to their assigned values. Empirically, Belief Propagation BP and Survey Propagation SP are surprisingly good at approximating marginal probabilities of variables over randomly chosen solutions in many random constraint satisfaction problems [MPZ02, GS02, BMZ05, RS09, KSS09]. In particular, it is well known, and proved analytically, that they calculate the exact marginal probabilities of variables when the underlying factor graph is a tree. If Belief Propagation BP or Survey Propagation SP is used as the local rule $\tau$, it is natural to expect that the $\tau$-decimation algorithm can find a solution. However, we prove that even if the local rule $\tau$ gives the exact marginal probabilities of variables over a randomly chosen solution for any instance whose factor graph is a tree, the $\tau$-decimation algorithm still cannot find a solution w.h.p. for $k\geq 13$. We know that w.h.p. the local neighborhood in the factor graph of the random $k$-XORSAT instance is a tree. Therefore, running BP or SP on the local neighborhood actually gives the exact marginal probability of the selected variable, with respect to the sub-instance induced by the local neighborhood. This implies that both the BP-decimation algorithm DEC$_{BP}$ and the SP-decimation algorithm DEC$_{SP}$ fail to find a solution w.h.p. for $k\geq 13$.

Theorem 3.

For any $k\geq 13$ and $r\in(r_{core}(k),r_{sat}(k))$, given a random $k$-XORSAT instance $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,rn)$, denote by $\textnormal{DEC}_\tau(\mathbf{\Phi})$ the output assignment of the $\tau$-decimation algorithm DEC$_\tau$ on input $\mathbf{\Phi}$. Assume the local rule $\tau$ outputs the exact marginal probability of the selected variable for any instance whose factor graph is a tree. Then, we have $\lim_{n\rightarrow\infty}\Pr\left[\textnormal{DEC}_\tau(\mathbf{\Phi})\in\mathcal{S}(\mathbf{\Phi})\right]=0$.

To prove Theorem 2 and Theorem 3, we only need to count the free steps of DEC$_{UC}$ and of DEC$_\tau$ under the assumption on $\tau$ described in Theorem 3. If w.h.p. the number of free steps is at least $\delta' n$ for some constant $\delta'>2\mu(k,r)$, then the algorithms are strictly $2\mu(k,r)$-free, and the results follow immediately by applying Theorem 1. Similarly, to obtain the same results for other $\tau$-decimation algorithms, all we need to do is count the free steps of those algorithms. If they are strictly $2\mu(k,r)$-free, we obtain the same results by applying Theorem 1. Note that, due to certain limitations of our calculation, our results are limited to $k\geq 9$ in Theorem 2 and $k\geq 13$ in Theorem 3. We believe that the results hold for general $k\geq 3$, and can be proved by refining some subtle calculations in our argument.

Although we only analyze a few implementations of sequential local algorithms, we believe that, due to Theorem 3, the results are general across many sequential local algorithms with different local rules. In the framework of sequential local algorithms, the role of the local rule is to approximate the marginal probability of the selected variable over a random solution for the instance induced by the local neighborhood centered at the selected variable. Therefore, we believe that any local rule giving a good approximation of the marginals should yield results similar to Theorem 3. (A more general definition of "$\delta$-free" may be useful here; for example, we can say a $\tau$-decimation algorithm is $(\delta,\epsilon)$-free if we have $|p-1/2|<\epsilon$, where $p$ is the value returned by the local rule, in at least $\delta n$ iterations.)

It is worth mentioning the differences between these implementations of sequential local algorithms and their well-known variants. Firstly, the UC-decimation algorithm is slightly different from the well-known Unit Clause algorithm. Under the framework of sequential local algorithms, the variables are selected in a random order, while in the well-studied Unit Clause algorithm the variables in unit clauses are selected first [AKKT02]. The difference in the variable order could be crucial to the effectiveness of the Unit Clause algorithm. Secondly, the BP-decimation algorithm and the SP-decimation algorithm are slightly different from the Belief Propagation Guided Decimation Algorithm [Coj17] and the Survey Propagation Guided Decimation Algorithm [BZ04, BMZ05, MMW07]. In the framework of sequential local algorithms, we only provide the local neighborhood to BP and SP. This is equivalent to bounding the number of messages passed in each decimation step by the constant $R\geq 0$ in the BP-guided Decimation Algorithm and the SP-guided Decimation Algorithm. It is in contrast to many empirical studies of these algorithms, which allow the message passing iteration to continue until it converges.

Moreover, this work provides a new variant of the overlap gap property method, which was originally introduced by Gamarnik and Sudan [GS17b]. Instead of considering the overlap gap property of the whole instance, we utilize the property of a sub-instance of the random $k$-XORSAT instance. In particular, the proof of this work is inspired by [GS17b], which uses the overlap gap property method to rule out the class of balanced sequential local algorithms from solving the random NAE-$k$-SAT problem when the clause density is close to the satisfiability threshold. Instead of directly applying the method to the whole instance, we focus on the sub-instance induced by the 2-core of the factor graph of the instance. This modification helps us obtain a tight bound on the algorithmic threshold, unlike [GS17b]. If we apply the original overlap gap property method and use the first moment method to obtain the property, we can show that the sequential local algorithms fail to solve the random $k$-XORSAT problem only when the clause density is above a certain threshold $r_1(k)$. However, that threshold $r_1(k)$ is much higher than $r_{core}(k)$; it only tells us that the algorithms fail when the density is very close to the satisfiability threshold $r_{sat}(k)$. With our modification, we can lower that threshold to exactly $r_{core}(k)$; namely, the algorithms fail in finding a solution when the clause density is as low as the clustering threshold. This opens a new possibility for improving other results that use the overlap gap property method on other random constraint satisfaction problems.

1.3 Phase transition of random $k$-XORSAT

Many random ensembles of constraint satisfaction problems (CSPs), such as random $k$-SAT and random NAE-$k$-SAT, are closely related to random $k$-XORSAT. For example, the well-known random $k$-SAT is analogous to random $k$-XORSAT, in the sense that we can obtain a $k$-XORSAT instance from a $k$-SAT instance by replacing OR operators with XOR operators. We are particularly interested in the existence of sharp thresholds on the clause density $r$, at which the behavior of a random instance changes sharply as the clause density $r$ grows and passes through them. The following three thresholds are of particular interest.

  1. The satisfiability threshold separates the regime where w.h.p. the random instance is satisfiable from the regime where w.h.p. it is unsatisfiable.

  2. The clustering threshold separates the regime where w.h.p. the solution space can be partitioned into well-separated subsets, each containing an exponentially small fraction of solutions, from the regime where w.h.p. the solution space cannot be partitioned in this way.

  3. The algorithmic threshold separates the regime where we have an efficient algorithm that can find a solution for a satisfiable random instance with non-vanishing probability from the regime where no such algorithm exists.

Many random constraint satisfaction problems, such as random $k$-SAT, random NAE-$k$-SAT and random graph coloring, share the following phenomena related to these thresholds [AC08].

  • There is an upper bound on the (conjectured) satisfiability threshold.

  • There is a lower bound on the (conjectured) satisfiability threshold, from a non-constructive proof, and the lower bound is essentially tight.

  • There are polynomial-time algorithms that can find a solution when the density is relatively low, but no polynomial-time algorithm is known to succeed when the density is close to the satisfiability threshold. This leads to a conjectured algorithmic threshold, which is asymptotically below the (conjectured) satisfiability threshold.

  • The clustering phenomenon takes place when the density is greater than a (conjectured) clustering threshold, and this threshold is close to, or even asymptotically equal to, the algorithmic threshold.

It is worth mentioning that not every random constraint satisfaction problem shares this set of phenomena. The most notable example is the symmetric binary perceptron (SBP). Its satisfiability threshold $\alpha_{sat}(k)>0$ was established by [PX21, ALS22b]. They also showed that the SBP exhibits the clustering property, with almost all clusters being singletons, for any clause density $\alpha>0$. On the other hand, [ALS22a] gave a polynomial-time algorithm that can find solutions w.h.p. for low clause densities. Therefore, there is a regime of low clause density in which the SBP exhibits the clustering property and is simultaneously solvable in polynomial time. Its clustering phenomenon does not cause hardness.

Being analogous to the random $k$-SAT problem, the random $k$-XORSAT problem shares those phenomena with other random constraint satisfaction problems. However, the story is slightly different for random $k$-XORSAT. Since $k$-XORSAT instances are equivalent to Boolean linear systems, their solution spaces are simply affine subspaces of the Hamming hypercube $\{0,1\}^n$. Because of this algebraic structure, we are able to obtain the existence and the ($n$-independent) value of the satisfiability threshold of the random $k$-XORSAT problem. Dubois and Mandler [DM02] proved that there exists an $n$-independent satisfiability threshold $r_{sat}(k)$ for $k=3$, and determined the exact value of the sharp threshold by the second moment argument. Pittel and Sorkin [PS16] extended this to general $k\geq 3$. An independent work on cuckoo hashing [DGM+10] also included an argument for the $k$-XORSAT satisfiability threshold for general $k\geq 3$. Those proofs consider the 2-core of the hypergraph associated with the random $k$-XORSAT instance. (In graph theory, a $k$-core of a (hyper)graph is a maximal subgraph in which all vertices have degree at least $k$.) Based on the 2-core, we can construct a sub-instance, called the 2-core instance or simply the core instance, of the original instance. One can prove that the original instance is satisfiable if and only if the core instance is satisfiable. [CDMM03, MRZ03] studied the core instance and determined its satisfiability threshold, which can be converted into the satisfiability threshold of the random $k$-XORSAT instance.

Mézard, Ricci-Tersenghi and Zecchina [MRZ03] initiated the study of the clustering phenomenon of random $k$-XORSAT, and linked it to the existence of a non-empty 2-core instance. From [PSW96, Mol05, Kim04], we know the non-empty 2-core of a random hypergraph suddenly emerges at a critical edge density $r_{core}(k)$. Later, Ibrahimi, Kanoria, Kraning and Montanari [IKKM12], and Achlioptas and Molloy [AM15], independently proved that there exists a clustering threshold $r_{clt}(k)$, equal to $r_{core}(k)$ and smaller than $r_{sat}(k)$, such that w.h.p. the solution space forms a single connected component for density $r<r_{core}(k)$, and w.h.p. the solution space shatters into exponentially many $\Theta(n)$-separated components for density $r>r_{core}(k)$, provided that we view the solution space as a graph in which two solutions are adjacent if their Hamming distance is $O(\log n)$.

As mentioned before, a random $k$-XORSAT instance can be written as a random Boolean linear system, so it can be solved in polynomial time using linear algebra, regardless of the clause density. For example, Gaussian elimination solves it in $O(n^3)$ time. Since no linear-time algebraic method for solving a linear system is known, we can still study the algorithmic phase transition if we only consider linear-time algorithms as efficient algorithms. In [IKKM12], the authors provided an algorithm that finds a solution in linear time when $r<r_{core}(k)$, which implies that $r_{core}(k)$ is a lower bound on the (linear-time version of the) algorithmic threshold $r_{alg}(k)$ of the random $k$-XORSAT problem. We conjecture that no algorithm can solve the random $k$-XORSAT problem in linear time with non-vanishing probability when $r>r_{core}(k)$, which would imply that $r_{core}(k)$ is an upper bound on $r_{alg}(k)$ and thus $r_{alg}(k)=r_{core}(k)$. This would establish an intimate relation between the failure of linear-time algorithms on random $k$-XORSAT and the clustering phenomenon of its solution space.

1.4 Sequential local algorithms

Sequential local algorithms are a class of algorithms parametrized by a local rule $\tau$ that specifies how values should be assigned to variables based on the "neighborhoods" of the variables. Given a local rule $\tau$, the sequential local algorithm can be written as the following $\tau$-decimation algorithm.

Given a fixed even number $R\geq 0$, we denote by $\mathcal{I}_R$ the set of all instances in which exactly one variable is selected as the root and all nodes in the factor graph are at distance at most $R$ from the root variable node. A local rule is defined to be a function $\tau:\mathcal{I}_R\rightarrow[0,1]\subset\mathbb{R}$, mapping from $\mathcal{I}_R$ to the interval $[0,1]$. Given an instance $\Phi$, since the local neighborhood $B_\Phi(x^*,R)$ of a variable node $x^*$ represents the sub-instance of $\Phi$ induced by all nodes at distance at most $R$ from the root variable node $x^*$, we have $B_\Phi(x^*,R)\in\mathcal{I}_R$, and $\tau(B_\Phi(x^*,R))$ is well-defined. Then, the $\tau$-decimation algorithm can be expressed as follows.

Algorithm 1 $\tau$-decimation algorithm
1:Input: an instance $\Phi$ of the $k$-XORSAT problem, an even number $R\geq 0$, and a local rule $\tau:\mathcal{I}_R\rightarrow[0,1]$.
2:Set $\Phi_0=\Phi$.
3:for $t=0,\ldots,n-1$ do
4:     Select an unassigned variable $x^*$ from $\Phi_t$, uniformly at random.
5:     Set $\sigma(x^*)=1$ with probability $\tau(B_{\Phi_t}(x^*,R))$, and $\sigma(x^*)=0$ with probability $1-\tau(B_{\Phi_t}(x^*,R))$.
6:     Obtain $\Phi_{t+1}$ from $\Phi_t$ by: (i) removing $x^*$; (ii) for any clause containing $x^*$ before (i), adding $\sigma(x^*)$ to its right-hand-side value; (iii) removing all clauses that no longer contain any variable.
7:end for
8:Output: the assignment $\sigma$.
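A direct Python transcription of Algorithm 1 may be helpful. This is a sketch under the clause representation used in the earlier sketches; for simplicity, the local rule is handed the whole current sub-instance $\Phi_t$ together with $x^*$ and $R$, trusting it to inspect only $B_{\Phi_t}(x^*,R)$:

    import random

    def tau_decimation(clauses, n, tau, R, rng=None):
        """Algorithm 1: iteratively give a random unassigned variable the
        value 1 with probability tau(B_{Phi_t}(x*, R)), then decimate."""
        rng = rng or random.Random()
        live = [(set(lhs), rhs) for lhs, rhs in clauses]   # current Phi_t
        unassigned, sigma = set(range(n)), {}
        while unassigned:
            x = rng.choice(sorted(unassigned))             # uniform choice
            p = tau(live, x, R)                            # local rule
            sigma[x] = 1 if rng.random() < p else 0
            unassigned.discard(x)
            nxt = []
            for lhs, rhs in live:
                if x in lhs:                  # (i)-(ii): drop x, fold its
                    lhs, rhs = lhs - {x}, rhs ^ sigma[x]   # value into rhs
                if lhs:                       # (iii): drop empty clauses
                    nxt.append((lhs, rhs))
            live = nxt
        return sigma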

For any $t\in[n]$, if the value $\tau(B_{\Phi_t}(x^*,R))$ given by the local rule $\tau$ in the $t$-th iteration is $1/2$, then we call that iteration a free step. In a free step, the $\tau$-decimation algorithm simply assigns a uniformly random Boolean value to the selected variable. On the contrary, if the value $\tau(B_{\Phi_t}(x^*,R))$ given by the local rule $\tau$ in the $t$-th iteration is either 0 or 1, then we call that iteration a forced step. In a forced step, the $\tau$-decimation algorithm is forced to assign a particular value to the selected variable according to the value $\tau(B_{\Phi_t}(x^*,R))$. To simplify our discussion, we introduce the following definitions for $\tau$-decimation algorithms having certain numbers of free steps.

Definition 1.

For any $\delta\in[0,1]$, we say a $\tau$-decimation algorithm DEC$_\tau$ is $\delta$-free on the random $k$-XORSAT instance $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,rn)$ if w.h.p. the $\tau$-decimation algorithm DEC$_\tau$ on input $\mathbf{\Phi}$ has at least $\delta n$ free steps.

Definition 2.

For any $\delta\in[0,1]$, we say a $\tau$-decimation algorithm DEC$_\tau$ is strictly $\delta$-free on the random $k$-XORSAT instance $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,rn)$ if there exists $\delta'>\delta$ such that the $\tau$-decimation algorithm DEC$_\tau$ is $\delta'$-free on $\mathbf{\Phi}$.

There are many choices for the local rule $\tau$. The simplest one is Unit Clause Propagation UC. In each iteration, after selecting the unassigned variable $x^*$, UC checks whether there exists a unit clause (a clause with one variable) on the variable $x^*$. If yes, UC sets $\tau(B_{\Phi_t}(x^*,R))$ to be the right-hand-side value of that unit clause, which forces the decimation algorithm to pick the value satisfying that clause; in this case, the iteration is a forced step. (If there are multiple unit clauses on the selected variable $x^*$, only the one with the lowest index is considered.) If there is no unit clause on the selected variable $x^*$, UC sets $\tau(B_{\Phi_t}(x^*,R))$ to $1/2$, which lets the algorithm choose the assigned value randomly; in this case, the iteration is a free step.

Algorithm 2 Unit Clause Propagation UC
1:Input: the selected variable $x^*$, and its local neighborhood $B_{\Phi_t}(x^*,2)$.
2:if there exists a unit clause on the variable $x^*$ then
3:     Pick the unit clause $c$ on the variable $x^*$ (the one with the lowest index if there are multiple such clauses).
4:     Output: the right-hand-side value of the clause $c$.
5:else
6:     Output: $1/2$.
7:end if
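In the representation above, UC reduces to a few lines of Python. This sketch scans the clauses of the current sub-instance in index order, so the lowest-indexed unit clause on $x^*$ wins:

    def unit_clause_rule(live, x, R=2):
        """Algorithm 2: force the value of the (lowest-indexed) unit clause
        on x if one exists; otherwise take a free step."""
        for lhs, rhs in live:          # clauses are kept in index order
            if lhs == {x}:             # unit clause on the selected variable
                return float(rhs)      # forced step: p is 0 or 1
        return 0.5                     # free step: uniform random guess

    # plugged into the tau-decimation sketch above:
    # sigma = tau_decimation(clauses, n, unit_clause_rule, R=2)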

1.5 Message passing algorithms

A new challenger for breaking the algorithmic threshold came from statistical mechanics. In experiments [MPZ02, GS02, BMZ05, RS09, KSS09], message passing algorithms demonstrated high efficiency in finding solutions of the random $k$-SAT problem at densities close to the satisfiability threshold. Those algorithms include the Belief Propagation Guided Decimation Algorithm and the Survey Propagation Guided Decimation Algorithm, which are based on the insightful but non-rigorous cavity method from statistical mechanics [BMZ05, RS09]. Unfortunately, several analyses showed that they do not outperform the best known algorithms for some problems. Coja-Oghlan [Coj17] showed that BP-guided Decimation fails to find solutions for random $k$-SAT w.h.p. for densities above $\rho_0 2^k/k$ for a universal constant $\rho_0>0$, and thus does not outperform the best known algorithm from [Coj10]. Hetterich [Het16] gave the same conclusion for SP-guided Decimation by showing that it fails w.h.p. for densities above $(1+o_k(1))2^k\ln k/k$.

For random NAE-$k$-SAT, Gamarnik and Sudan [GS17b] showed that balanced sequential local algorithms fail to find solutions for densities above $(1+o_k(1))2^{k-1}\ln^2 k/k$ for sufficiently large $k$. This means the algorithms do not outperform the best known algorithm, the Unit Clause algorithm, which can find solutions w.h.p. for densities up to $\rho\,2^{k-1}/k$ for some universal constant $\rho>0$, for sufficiently large $k$ [AKKT02]. The framework of balanced sequential local algorithms also covers BP-guided Decimation and SP-guided Decimation with the number of message passing iterations bounded by $O((\ln\ln n)^{O(1)})$.

In our work, we obtain an analogous result. In Theorem 1, we show that w.h.p. strictly $2\mu(k,r)$-free sequential local algorithms fail to solve the random $k$-XORSAT problem when the clause density exceeds the clustering threshold. Then, in Theorem 3, we show that any sequential local algorithm whose local rule can compute the exact marginals is strictly $2\mu(k,r)$-free and thus fails to find a solution for the random $k$-XORSAT problem. This theorem covers the sequential local algorithms with Belief Propagation BP and Survey Propagation SP as local rules.

1.6 Technique

The works [AC08, ACR11] demonstrated the clustering phenomenon for several random CSPs, and conjectured that it could be an obstruction to solving those problems. [GS17a, GS17b] and subsequent works leveraged a related notion of clustering, named the overlap gap property (OGP) by [GL18], to link the clustering phenomenon to hardness rigorously. Gamarnik gave a detailed survey of this line of work [Gam21].

This paper focuses on the vanilla version of the OGP. Given an instance $\Phi$ of a constraint satisfaction problem, we say it exhibits the overlap gap property with values $0\leq v_1<v_2$ if every two solutions $\sigma$ and $\sigma'$ satisfy either $d(\sigma,\sigma')\leq v_1$ or $d(\sigma,\sigma')\geq v_2$, where $d$ is a metric on its solution space. (We take $d$ to be the Hamming distance throughout this paper.) Intuitively, it means every pair of solutions are either close to each other or far from each other, and thus the solution space of the instance exhibits a topological discontinuity based on proximity.

Now we illustrate how the overlap gap property method works. (Some details of the method differ slightly across different variants of the OGP, but the overall idea of a topological barrier stays the same.) Assume we have an algorithm $\mathcal{A}$ that takes a random instance $\mathbf{\Phi}$ as input and outputs an assignment $\sigma$ for the instance. The output $\sigma$ can be viewed as a random variable which depends on both the random instance $\mathbf{\Phi}$ and some internal random variables of the algorithm, represented by a random internal vector $\mathbf{I}$. Let $\Phi$ and $I$ be realizations of $\mathbf{\Phi}$ and $\mathbf{I}$ respectively, and denote the output assignment by $\sigma_0=\sigma_{\Phi,I}$. We then re-randomize the components of the internal vector $I$ one by one. After re-randomizing each component of $I$, we run the algorithm again to generate a new output assignment. We thereby obtain a sequence of assignments $\sigma_0,\sigma_1,\cdots,\sigma_T$ for the instance $\Phi$, where $T$ is the number of components of the random vector $I$ that have been re-randomized. Next, we show that the algorithm is insensitive to its input, in the sense that when one component of the random internal vector $\mathbf{I}$ is re-randomized, the output assignment remains almost unchanged; in particular, $d(\sigma_i,\sigma_{i+1})$ is smaller than $v_2-v_1$. We also show that the algorithm has certain freeness, in the sense that when all components of the random internal vector are re-randomized, the output assignment is expected to change a lot; in particular, $\mathbb{E}[d(\sigma_0,\sigma_T)]$ should be larger than $v_2$. These two properties together imply that the sequence of assignments cannot "jump" over the overlap gap, while the two ends of the sequence probably lie in different clusters. Therefore, there should be an assignment $\sigma_{T_0}$ that falls in the gap; namely, there exists $T_0>0$ such that $v_1\leq d(\sigma_0,\sigma_{T_0})\leq v_2$. If the probability that the algorithm successfully finds a solution were greater than some small value $s_n$ slowly converging to 0, then with some small but positive probability both $\sigma_0$ and $\sigma_{T_0}$ would be solutions of the instance $\Phi$ with $v_1\leq d(\sigma_0,\sigma_{T_0})\leq v_2$. Even though this probability is very small, it would still violate the OGP of the instance. By contradiction, we conclude that the probability that the algorithm succeeds in finding a solution is smaller than $s_n=o(1)$; namely, w.h.p. the algorithm fails to find a solution.
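The gap-crossing step of this argument can be isolated as a one-line counting fact (our phrasing, for the Hamming metric):

$$\left(d(\sigma_i,\sigma_{i+1})<v_2-v_1\ \text{for all } i\right)\ \text{and}\ \left(d(\sigma_0,\sigma_T)>v_2\right)\ \Longrightarrow\ \exists\,T_0:\ v_1<d(\sigma_0,\sigma_{T_0})<v_2,$$

since the sequence $t\mapsto d(\sigma_0,\sigma_t)$ starts at 0, ends above $v_2$, and by the triangle inequality moves by less than $v_2-v_1$ per step, so it cannot pass from $[0,v_1]$ to $[v_2,\infty)$ without landing in the open interval $(v_1,v_2)$.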

Instead of considering the overlap gap property of the entire instance $\mathbf{\Phi}$, we shift our focus to the overlap gap property of a sub-instance of $\mathbf{\Phi}$. The sub-instance we consider is the 2-core instance $\mathbf{\Phi}_c$ induced by the 2-core of the factor graph representation of the random instance $\mathbf{\Phi}$. In [IKKM12, AM15], it is proved that the 2-core instance $\mathbf{\Phi}_c$ exhibits the overlap gap property with $v'_1=o(n)$ and $v'_2=\epsilon_k n$ for some constant $\epsilon_k>0$, for clause densities $r_{core}(k)<r<r_{sat}(k)$. We remove all variables not in the core instance from the sequence of assignments $\sigma_0,\sigma_1,\cdots,\sigma_T$ obtained above, so that it becomes a sequence of assignments $\sigma'_0,\sigma'_1,\cdots,\sigma'_T$ for the core instance. We also prove that the algorithm is insensitive to its input with respect to the core instance, in the sense that $d(\sigma'_i,\sigma'_{i+1})<v'_2-v'_1$, and has certain freeness, so that $\mathbb{E}[d(\sigma'_0,\sigma'_T)]>v'_2$. By repeating the above overlap gap property argument, we conclude that w.h.p. the algorithm fails to find a solution.

Our proof also applies to non-sequential local algorithms. Since the local rule $\tau$ runs on the local neighborhood of each variable in parallel, the values assigned to the variables do not depend on each other. Informally speaking, there is no long-range dependency among the assigned values. Therefore, re-randomizing one component of the internal vector $I$, say $I_{i+1}$, only affects the value $\sigma(x_{i+1})$ assigned to the corresponding variable $x_{i+1}$. So, we have $d(\sigma_i,\sigma_{i+1})\leq 1<v'_2-v'_1$. Hence, we can obtain the same result as Theorem 1 for non-sequential local algorithms with the same proof.

1.7 Related works

The vanilla version of the OGP helps us rule out some large classes of algorithms on random CSPs at relatively high clause densities, but it is not sophisticated enough to close the statistical-to-computational gap in some cases, such as the random NAE-$k$-SAT problem discussed in [GS17b] and the random $k$-XORSAT problem discussed in this paper (more details in Section 2). There have been recent works improving the notion of OGP by developing different variants. The most notable one is the multi-OGP [Wei22, BH22, HS22, GKPX23, GK23], which succeeds in closing the statistical-to-computational gap in certain models. However, the relation between the clustering property and the multi-OGP remains unclear.

2 Overlap Gap Property

We say that a $k$-XORSAT instance $\Phi$ exhibits the overlap gap property (or, for short, OGP) with values $0\leq v_1<v_2$ if for any two solutions $\sigma,\sigma'\in\mathcal{S}(\Phi)$ we have either $d(\sigma,\sigma')\leq v_1$ or $d(\sigma,\sigma')\geq v_2$. Informally speaking, any two solutions of the instance are either close to each other or far away from each other, and thus the solution space exhibits a topological discontinuity. Given a random $k$-XORSAT instance, we can prove that it exhibits the OGP when the clause density is greater than a certain value, and obtain the following lemma. (The proof is given in Appendix H.)

Lemma 1.

For any $k\geq 3$, there exist $r_1(k)>0$ and constants $0\leq u_1<u_2$ such that for $r>r_1(k)$, w.h.p. every pair of solutions $\sigma,\sigma'\in\mathcal{S}(\mathbf{\Phi})$ of the random $k$-XORSAT instance $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,rn)$ has distance $d(\sigma,\sigma')$ either $\leq u_1 n$ or $\geq u_2 n$. In particular, the value of $r_1(k)$ is given by

$$r_1(k)=\min_{0\leq\alpha\leq\frac{1}{2}}\frac{1+H(\alpha)}{2-\log_2(1+(1-2\alpha)^k)},$$

where $H$ is the binary entropy function, that is, $H(x)=-x\log_2(x)-(1-x)\log_2(1-x)$.

Instead of considering the random $k$-XORSAT instance $\mathbf{\Phi}$ itself, we focus on the sub-instance, called the core instance (defined below), of the random $k$-XORSAT instance $\mathbf{\Phi}$, and show that the core instance also exhibits the overlap gap property, even when the clause density is much lower. (See Table 1.)

$k$              3         4         5         6         7         8         9
$r_{core}(k)$    0.818470  0.772280  0.701780  0.637081  0.581775  0.534997  0.495255
$r_1(k)$         0.984516  0.943723  0.905812  0.874349  0.848314  0.826470  0.807862

Table 1: Comparison of $r_{core}(k)$ and $r_1(k)$ for different $k$. The numeric values in the table are rounded to 6 decimal places.
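The entries of Table 1 can be reproduced numerically. Below is a rough Python sketch (our own code; grid sizes and iteration counts are arbitrary choices) that bisects on $r$ using the fixed-point characterization of $r_{core}(k)$ stated in Theorem 4 below, and grid-minimizes the Lemma 1 expression for $r_1(k)$:

    import math

    def has_fixed_point(k, r, grid=100_000):
        """True iff Q <= 1 - exp(-k r Q^{k-1}) for some Q in (0, 1)."""
        return any((q := i / grid) <= 1.0 - math.exp(-k * r * q ** (k - 1))
                   for i in range(1, grid))

    def r_core(k, lo=0.0, hi=1.0, rounds=40):
        """Bisection for sup{r : Q > 1 - exp(-krQ^{k-1}) on all of (0,1)}."""
        for _ in range(rounds):
            mid = (lo + hi) / 2
            lo, hi = (lo, mid) if has_fixed_point(k, mid) else (mid, hi)
        return lo

    def r_1(k, grid=100_000):
        """Grid minimization of (1 + H(a)) / (2 - log2(1 + (1-2a)^k))."""
        H = lambda x: -x * math.log2(x) - (1 - x) * math.log2(1 - x)
        return min([1.0] + [(1 + H(i / grid))
                            / (2 - math.log2(1 + (1 - 2 * i / grid) ** k))
                            for i in range(1, grid // 2)])

    # e.g. r_core(3) ~ 0.818470 and r_1(3) ~ 0.984516, matching Table 1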

We start by defining the peeling algorithm and the core instance. Given an XORSAT instance $\Phi$, suppose there exists a variable $x$ of degree 1, which means it is involved in exactly one clause $c$. We remove the variable $x$ and the only clause $c$ involving $x$ to obtain a modified instance $\Phi'$. If we have a solution $\sigma'$ for the modified instance $\Phi'$, we can always choose a suitable value for the variable $x$ to satisfy the clause $c$, and extend the solution $\sigma'$ to a solution $\sigma$ for the original instance $\Phi$. We can do the same if the variable $x$ has degree 0, since it is not involved in any equation and we are free to choose any value for it. By doing this, solving the original instance $\Phi$ is reduced to solving the modified instance $\Phi'$.

We can repeat this process until there is no variable of degree at most 1. This process is called the peeling algorithm (Algorithm 3). We call the resultant instance the 2-core instance (or simply the core instance) of the instance $\Phi$, denoted by $\Phi_c$. The name is borrowed from graph theory, where the $k$-core of a graph is the maximal subgraph with minimum degree at least $k$. It is known that the factor graph of the core instance $\Phi_c$ is exactly the maximal subgraph of the factor graph $G_\Phi$ of the instance $\Phi$ with minimum variable degree at least 2.

Algorithm 3 Peeling algorithm
1:Input: an instance $\Phi$.
2:while there exists a variable of degree $\leq 1$ do
3:     Select a variable $x$ of degree $\leq 1$. (Pick the one with the lowest index if there are multiple such variables.)
4:     Update $\Phi$ by removing the variable $x$ and its only involved clause (if it exists).
5:end while
6:Output: the resultant instance $\Phi$.
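A Python sketch of Algorithm 3, using the usual queue of low-degree variables (the implementation choices are ours; the clause representation is as before):

    from collections import defaultdict

    def peel_to_core(n, clauses):
        """Algorithm 3: repeatedly remove a variable of degree <= 1 together
        with its only clause (if any); returns the 2-core instance."""
        clauses_of = defaultdict(set)              # variable -> live clauses
        for j, (lhs, _) in enumerate(clauses):
            for i in lhs:
                clauses_of[i].add(j)
        alive_var = set(range(n))
        alive_clause = [True] * len(clauses)
        queue = [i for i in alive_var if len(clauses_of[i]) <= 1]
        while queue:
            x = queue.pop()
            if x not in alive_var or len(clauses_of[x]) > 1:
                continue                           # stale queue entry
            alive_var.discard(x)
            for j in list(clauses_of[x]):          # at most one clause
                alive_clause[j] = False
                for y in clauses[j][0]:            # neighbours lose a clause
                    clauses_of[y].discard(j)
                    if y in alive_var and len(clauses_of[y]) <= 1:
                        queue.append(y)
        core = [c for j, c in enumerate(clauses) if alive_clause[j]]
        return alive_var, core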

Mézard and Montanari [MM09] gave a detailed description of the structure of the core instance, and readers can find more details about the core instance in their book. The following theorem is a short summary of the known facts about core instances that we need in this paper.

Theorem 4.

For any $k\geq 3$, there exists $r_{core}(k)>0$, given by

$$r_{core}(k)\equiv\sup\{r\in[0,1]:Q>1-e^{-krQ^{k-1}}\;\forall Q\in(0,1)\},$$

such that the factor graph $G_c$ of the core instance $\mathbf{\Phi}_c$ of the random $k$-XORSAT instance $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,rn)$ has the following properties.

  1. For $r<r_{core}(k)$, w.h.p. the factor graph $G_c$ of the core instance $\mathbf{\Phi}_c$ is an empty graph.

  2. For $r>r_{core}(k)$, w.h.p. the factor graph $G_c$ of the core instance $\mathbf{\Phi}_c$ has $V(k,r)n+o(n)$ variable nodes, where

    $$V(k,r)=1-\exp(-krQ^{k-1})-krQ^{k-1}\exp(-krQ^{k-1})$$

    and $Q$ is the largest solution of the fixed point equation $Q=1-\exp(-krQ^{k-1})$. In particular, the fraction of variable nodes of degree $l$ is between $\widehat{\Lambda}_l-\epsilon$ and $\widehat{\Lambda}_l+\epsilon$ with probability greater than $1-e^{-\Theta(n)}$, where $\widehat{\Lambda}_0=\widehat{\Lambda}_1=0$ and

    $$\widehat{\Lambda}_l=\frac{1}{e^{krQ^{k-1}}-1-krQ^{k-1}}\;\frac{1}{l!}\;(krQ^{k-1})^l\quad\text{for } l\geq 2.$$

  3. Conditioned on the number of variable nodes $V(k,r)n+o(n)$ and the degree profile $\widehat{\Lambda}$, the factor graph $G_c$ of the core instance $\mathbf{\Phi}_c$ is distributed according to the ensemble of all possible factor graphs of $k$-XORSAT instances with $V(k,r)n+o(n)$ variables and variable degree distribution $\widehat{\Lambda}$.

Theorem 4 shows that there exists a threshold $r_{core}(k)$ below the satisfiability threshold $r_{sat}(k)$ of the random $k$-XORSAT problem. When the clause density $r$ is below the threshold $r_{core}(k)$, w.h.p. the instance $\mathbf{\Phi}$ does not have a non-empty core instance. When the clause density $r$ is above the threshold $r_{core}(k)$, w.h.p. the core instance $\mathbf{\Phi}_c$ emerges. In particular, the variable degree distribution of the core is a Poisson distribution with mean $krQ^{k-1}$, conditioned on taking values at least 2 (that is, $\widehat{\Lambda}_0=\widehat{\Lambda}_1=0$). [MM09] also showed that the core instance exhibits the OGP.

Lemma 2.

For $k\geq 3$ and $r_{core}(k)<r<r_{sat}(k)$, there exists $\epsilon(k,r)>0$ such that w.h.p. the distance between any two solutions of the core instance $\mathbf{\Phi}_c$ of a random $k$-XORSAT instance $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,rn)$ is either $o(n)$ or greater than $\epsilon(k,r)n$.

Now we know that w.h.p. the core instance of a random $k$-XORSAT instance has the overlap gap property with values $v_1=o(n)$ and $v_2=\epsilon(k,r)n$. With the OGP, we can partition the solution space of the core instance into multiple groups, each called a core cluster, such that the distance between any pair of core solutions in the same core cluster is at most $o(n)$, and the distance between any pair of core solutions in different core clusters is at least $\epsilon(k,r)n$.

Suppose we have a $k$-XORSAT instance $\Phi$. We first define a binary relation on the solution space of its core instance $\Phi_c$: for $\sigma_c,\sigma'_c\in\mathcal{S}(\Phi_c)$, we write $\sigma_c\simeq\sigma'_c$ if and only if $d(\sigma_c,\sigma'_c)=o(n)$. It is easy to see that this is an equivalence relation. Then, we can partition the solution space into the equivalence classes of $\simeq$, denoted by $\mathcal{S}_{c,1},\mathcal{S}_{c,2},\ldots,\mathcal{S}_{c,n_c}$. Thus, we have $\mathcal{S}_{c,1}\sqcup\mathcal{S}_{c,2}\sqcup\ldots\sqcup\mathcal{S}_{c,n_c}=\mathcal{S}(\Phi_c)$, where $\sqcup$ is the disjoint union. Then, we have

$$d(\sigma_c,\sigma'_c)=o(n)\quad\text{if }\sigma_c,\sigma'_c\in\mathcal{S}_{c,i},\quad\text{and}$$
$$d(\sigma_c,\sigma'_c)\geq\epsilon(k,r)n\quad\text{if }\sigma_c\in\mathcal{S}_{c,i},\ \sigma'_c\in\mathcal{S}_{c,j}\text{ and }\mathcal{S}_{c,i}\neq\mathcal{S}_{c,j}.$$

Now we can partition the solution space $\mathcal{S}(\Phi)$ of the original instance $\Phi$ into clusters based on the partition of the solution space of the core instance. We set

$$\mathcal{S}(\Phi)=\bigsqcup_{i=1}^{n_c}\mathcal{S}_i\quad\text{and}\quad\mathcal{S}_i=\{\sigma\in\mathcal{S}(\Phi):\pi(\sigma)\in\mathcal{S}_{c,i}\}\quad\text{for }i=1,2,\ldots,n_c,$$

where $\pi$ is the projection mapping assignments for the instance $\Phi$ to assignments for the core instance $\Phi_c$, obtained by removing all variables not in the core instance $\Phi_c$. Each $\mathcal{S}_i$ is called a cluster in the solution space $\mathcal{S}(\Phi)$. We can then prove that these clusters are well-separated from each other.

Lemma 3.

Let $k\geq 3$ and $r_{core}(k)<r<r_{sat}(k)$. Suppose $\mathbf{\Phi}\sim\mathbf{\Phi}_k(n,rn)$ is a random $k$-XORSAT instance. Then, w.h.p. there exists a partition $\mathcal{S}(\mathbf{\Phi})=\mathcal{S}_1\sqcup\mathcal{S}_2\sqcup\ldots\sqcup\mathcal{S}_{n_c}$ of the solution space $\mathcal{S}(\mathbf{\Phi})$ of the random instance $\mathbf{\Phi}$ such that the following statements hold.

  1. If $\sigma,\sigma'\in\mathcal{S}_i$ for some $i\in[n_c]$, then we have $d(\sigma,\sigma')\leq\mu(k,r)n+o(n)$, where the real-valued function $\mu(k,r)$ is given by $\mu(k,r)=\exp(-krQ^{k-1})+krQ^{k-1}\exp(-krQ^{k-1})$ and $Q$ is the largest solution of the fixed point equation $Q=1-\exp(-krQ^{k-1})$.

  2. If $\sigma\in\mathcal{S}_i$, $\sigma'\in\mathcal{S}_j$ and $\mathcal{S}_i\neq\mathcal{S}_j$ for some $i,j\in[n_c]$, then we have $d(\sigma,\sigma')\geq\epsilon(k,r)n$.

Proof.

Assume the instance $\mathbf{\Phi}$ has a non-empty core instance $\mathbf{\Phi}_c$, which exists with high probability according to Theorem 4. We also assume the core instance $\mathbf{\Phi}_c$ exhibits the OGP with $v_1=o(n)$ and $v_2=\epsilon(k,r)n$, which occurs with high probability according to Lemma 2. Let $\sigma$ and $\sigma'$ be two solutions of the random $k$-XORSAT instance $\mathbf{\Phi}$, and let $\sigma_c=\pi(\sigma)$ and $\sigma'_c=\pi(\sigma')$ be the projections of $\sigma$ and $\sigma'$ onto the core solution space $\mathcal{S}(\mathbf{\Phi}_c)$, respectively.

To prove the first part of the lemma, we assume that $\sigma$ and $\sigma'$ are in the same cluster, that is, $\sigma,\sigma'\in\mathcal{S}_i$ for some $i\in[n_c]$. By the definition of clusters, we have $d(\sigma_c,\sigma'_c)=o(n)$. Therefore, $d(\sigma,\sigma')$ is upper bounded by the number of variables not in the core instance, plus $o(n)$. By Theorem 4, the number of variables outside the core instance is $(1-V(k,r))n+o(n)$. Hence, we have $d(\sigma,\sigma')\leq(1-V(k,r))n+o(n)=\left(\exp(-krQ^{k-1})+krQ^{k-1}\exp(-krQ^{k-1})\right)n+o(n)$.

To prove the second part of the lemma, we assume that $\sigma$ and $\sigma'$ are in different clusters, that is, $\sigma\in\mathcal{S}_i$, $\sigma'\in\mathcal{S}_j$ and $\mathcal{S}_i\neq\mathcal{S}_j$ for some $i,j\in[n_c]$. By the definition of clusters and Lemma 2, we have $d(\sigma_c,\sigma'_c)\geq\epsilon(k,r)n$. Therefore, we have $d(\sigma,\sigma')\geq d(\pi(\sigma),\pi(\sigma'))=d(\sigma_c,\sigma'_c)\geq\epsilon(k,r)n$. ∎

3 Preparation for the OGP method

In this section, we introduce some notions and obtain some preliminary results needed by the overlap gap property method to prove the main results.

3.1 Sequence of output assignments

The random $k$-XORSAT instance $\mathbf{\Phi}$ is a random variable, and the $\tau$-decimation algorithm DEC$_\tau$ is a randomized algorithm. Therefore, the assignment output by the $\tau$-decimation algorithm DEC$_\tau$ on input $\mathbf{\Phi}$ is also a random variable. The outcome of the output assignment depends on the random instance $\mathbf{\Phi}$, the order in which variables are chosen, and the value selections based on the outputs of the local rule $\tau$. Now we introduce two random variables to explicitly represent the order of the variables and the value selections, so that we have a more concrete language for discussing how the randomness from both the instance and the algorithm affects the output assignment. We adopt the notation from [GS17b] in the following discussion.

The order of the variables can be represented by a random vector $\mathbf{Z}=(\mathbf{Z}_1,\mathbf{Z}_2,\cdots,\mathbf{Z}_n)$ whose entries are $n$ i.i.d. random variables with uniform distribution over the interval $[0,1]\subset\mathbb{R}$, independent of the random instance $\mathbf{\Phi}$. We call $\mathbf{Z}$ the ordering vector of the algorithm. For all $i\in[n]$, the variable $x_i$ in the instance $\mathbf{\Phi}$ is associated with the random variable $\mathbf{Z}_i$. In each iteration of the algorithm, the unassigned variable $x_i$ with the largest value $\mathbf{Z}_i$ among all unassigned variables is selected. In other words, we can construct the permutation $s:[n]\rightarrow[n]$ such that $\mathbf{Z}_{s(1)}>\mathbf{Z}_{s(2)}>\cdots>\mathbf{Z}_{s(n)}$, and for all $t\in[n]$ the variable $x_{s(t)}$ is selected in the $t$-th iteration. The value selections based on the outputs of the local rule $\tau$ can be represented by a random vector $\mathbf{U}=(\mathbf{U}_1,\mathbf{U}_2,\ldots,\mathbf{U}_n)$ whose entries are $n$ i.i.d. random variables with uniform distribution over the interval $[0,1]\subset\mathbb{R}$. We call $\mathbf{U}$ the internal vector of the algorithm. In the $t$-th iteration of the algorithm, the value $\sigma(x_{s(t)})$ assigned to the selected variable $x_{s(t)}$ is set to 1 if $\mathbf{U}_t<\tau(B_{\mathbf{\Phi}_t}(x_{s(t)},R))$, and 0 otherwise. Conditioned on $\mathbf{\Phi}$, $\mathbf{Z}$ and $\mathbf{U}$, the output assignment $\sigma$ is uniquely determined. Therefore, we can view the $\tau$-decimation algorithm DEC$_\tau$ as a deterministic algorithm on the random input $(\mathbf{\Phi},\mathbf{Z},\mathbf{U})$, and denote by $\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U}}$ the output of the algorithm.

With this notion of the deterministic algorithm, we can construct a sequence of output assignments which will be used in the argument of the overlap gap property method. The sequence of output assignments is generated by applying the τ\tau-decimation algorithm DECτ on a random kk-XORSAT instance 𝚽\mathbf{\Phi} multiple times in the following way: First, given a random kk-XORSAT instance 𝚽\mathbf{\Phi}, we sample an ordering vector 𝐙\mathbf{Z} and an internal vector 𝐔\mathbf{U}. Then, we run the τ\tau-decimation algorithm DECτ on input 𝚽\mathbf{\Phi} with the ordering vector 𝐙\mathbf{Z} and the internal vector 𝐔\mathbf{U} to get the first output assignment σ0\sigma_{0}. After that, we re-randomize (i.e. sample again) the entries of the internal vector 𝐔\mathbf{U} one by one from 𝐔1\mathbf{U}_{1} to 𝐔n\mathbf{U}_{n}. Right after each re-randomization, we run the algorithm again to get a new output assignment. By doing this, we obtain a sequence of n+1n+1 output assignments for the instance 𝚽\mathbf{\Phi} in total. We denote by σi\sigma_{i} the output assignment generated after re-randomizing the first ii entries of 𝐔\mathbf{U}, for i=0,1,2,,ni=0,1,2,...,n. Precisely speaking, let 𝐕=(𝐕1,𝐕2,,𝐕n)\mathbf{V}=(\mathbf{V}_{1},\mathbf{V}_{2},...,\mathbf{V}_{n}) and 𝐖=(𝐖1,𝐖2,,𝐖n)\mathbf{W}=(\mathbf{W}_{1},\mathbf{W}_{2},...,\mathbf{W}_{n}) be two independent random internal vectors with the uniform distribution over [0,1]n[0,1]^{n}, and set 𝐔𝐢=(𝐖1,,𝐖i,𝐕i+1,,𝐕n)\mathbf{U^{i}}=(\mathbf{W}_{1},...,\mathbf{W}_{i},\mathbf{V}_{i+1},...,\mathbf{V}_{n}) for i=0,1,2,,ni=0,1,2,...,n. Note that 𝐔𝟎=𝐕\mathbf{U^{0}}=\mathbf{V} and 𝐔𝐧=𝐖\mathbf{U^{n}}=\mathbf{W}. Then, the sequence of output assignments {σi}i=0n\{\sigma_{i}\}_{i=0}^{n} can be written as {σ𝚽,𝐙,𝐔𝐢}i=0n\{\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}}\}_{i=0}^{n}, which is equivalent to the sequence of output assignments obtained by running the τ\tau-decimation algorithm DECτ (n+1n+1 times in total) on input (𝚽,𝐙,𝐔𝐢)(\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}) for all i=0,1,,ni=0,1,...,n.
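As a sketch of this interpolation (reusing the illustrative run_decimation helper above, with the same caveats), the sequence {σ_i}_{i=0}^{n} can be generated as follows:

import random

def output_sequence(phi, tau, n, R=2):
    Z = [random.random() for _ in range(n)]
    V = [random.random() for _ in range(n)]   # U^0 = V
    W = [random.random() for _ in range(n)]   # U^n = W
    seq = []
    for i in range(n + 1):
        Ui = W[:i] + V[i:]                    # U^i = (W_1, ..., W_i, V_{i+1}, ..., V_n)
        seq.append(run_decimation(phi, tau, Z, Ui, R))
    return seq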

Recall the projection π\pi mapping assignments for the instance 𝚽\mathbf{\Phi} to assignments for the core instance 𝚽c\mathbf{\Phi}_{c}, by removing all variables not in the core instance. We can further obtain a sequence of assignments for the core instance 𝚽c\mathbf{\Phi}_{c} by applying the projection on the output assignments σi\sigma_{i}, that is, we set {σi=π(σ𝚽,𝐙,𝐔𝐢)}i=0n\{\sigma^{\prime}_{i}=\pi(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}})\}_{i=0}^{n}.

3.2 Insensitivity to the internal vector

In this section, we show that the τ\tau-decimation algorithm DECτ is insensitive to its internal vector. By insensitive, we mean that when the value of an entry in the internal vector 𝐔\mathbf{U} is changed, only a small portion of the assigned values in the output assignment σ𝚽,𝐙,𝐔\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U}} change accordingly. If so, every two consecutive output assignments in the sequence {σi=σΦ,𝐙,𝐔𝐢}i=0n\{\sigma_{i}=\sigma_{\Phi,\mathbf{Z},\mathbf{U^{i}}}\}_{i=0}^{n} differ from each other in only a small portion of the assigned values.

Consider the sequence of output assignments {σi=σ𝚽,𝐙,𝐔𝐢}i=0n\{\sigma_{i}=\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}}\}_{i=0}^{n} from Section 3.1. Note that the ii-th output assignment σi\sigma_{i} in the sequence is the output of the algorithm on input (𝚽,𝐙,𝐔𝐢)(\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}). For any i[n]i\in[n], the only difference between the input (𝚽,𝐙,𝐔𝐢𝟏)(\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i-1}}) and the input (𝚽,𝐙,𝐔𝐢)(\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}) is the ii-th entries of the internal vectors 𝐔𝐢𝟏\mathbf{U^{i-1}} and 𝐔𝐢\mathbf{U^{i}}. We can immediately see that the insensitivity of the algorithm implies that every two consecutive output assignments in the sequence are close to each other. Gamarnik and Sudan [GS17b] proved the insensitivity of the τ\tau-decimation algorithm in their work, using the notion of influence range. Although their work [GS17b] focused on the random NAE-kk-SAT problem, the proof of the insensitivity of the τ\tau-decimation algorithm is independent of the type of clauses in the random constraint satisfaction framework, so we can use the result here directly.

Definition 3.

Given a random instance 𝚽\mathbf{\Phi} and a random ordering vector 𝐙\mathbf{Z}, we say that xix_{i} influences xjx_{j} if either xi=xjx_{i}=x_{j} or in the variable-to-variable graph of the instance 𝚽\mathbf{\Phi} there exists a sequence of variable nodes y0,y1,,yt{x1,x2,,xn}y_{0},y_{1},...,y_{t}\in\{x_{1},x_{2},...,x_{n}\} such that the following statements hold.

  1.

    y0=xiy_{0}=x_{i} and yt=xjy_{t}=x_{j}.

  2.

    There exists a path from yly_{l} to yl+1y_{l+1}, of length at most RR, in the variable-to-variable graph of the instance 𝚽\mathbf{\Phi}, for l=0,1,,t1l=0,1,...,t-1.

  3.

    𝐙yl1>𝐙yl\mathbf{Z}_{y_{l-1}}>\mathbf{Z}_{y_{l}} for l=1,2,,tl=1,2,...,t. In particular, 𝐙xi>𝐙xj\mathbf{Z}_{x_{i}}>\mathbf{Z}_{x_{j}}.

We define the influence range of xix_{i} to be the set of all variables xjx_{j} influenced by xix_{i}, denoted by xi\mathcal{IR}_{x_{i}}.

Lemma 4.

Given an instance Φ\Phi, a vector Z[0,1]nZ\in[0,1]^{n}, and two vectors U,U[0,1]nU,U^{\prime}\in[0,1]^{n}, we assume there exists i{1,2,,n}i\in\{1,2,...,n\} such that UiUiU_{i}\neq U^{\prime}_{i} and Uj=UjU_{j}=U^{\prime}_{j} for all jij\neq i. Then, σΦ,Z,U(xj)=σΦ,Z,U(xj)\sigma_{\Phi,Z,U}(x_{j})=\sigma_{\Phi,Z,U^{\prime}}(x_{j}) for all variables xjxix_{j}\notin\mathcal{IR}_{x_{i}}.

Lemma 5.

For any ξ(0,1)\xi\in(0,1) and sufficiently large nn,

Pr[max1in|xi|n1/6]exp(lnn(lnlnn)ξ/4).\Pr\left[\,{\max_{1\leq i\leq n}|\mathcal{IR}_{x_{i}}|\geq n^{1/6}}\,\right]\leq\exp\left(-\ln n(\ln\ln n)^{\xi/4}\right).

They first showed that changing the value of only one entry, say UiU_{i}, of the internal vector UU only affects the values assigned to the variables in the influence range of the variable xix_{i} (Lemma 4). They further showed that w.h.p. the sizes of the influence ranges are sublinear for all variables (Lemma 5). Note that in the original statement of Lemma 5 in [GS17b], the exponent 1/61/6 in the inequality above can be any real number strictly between 0 and 1/5. Here, we pick the fixed value 1/61/6 for simplicity. Combining these two lemmas, we can show that w.h.p. the distance between σ𝚽,𝐙,𝐔𝐢𝟏\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i-1}}} and σ𝚽,𝐙,𝐔𝐢\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}} is upper bounded by n1/6n^{1/6} for all i[n]i\in[n].

Lemma 6.

For any ξ(0,1)\xi\in(0,1) and sufficiently large nn,

Pr[d(σ𝚽,𝐙,𝐔𝐢𝟏,σ𝚽,𝐙,𝐔𝐢)n1/6 for some i[n]]exp(lnn(lnlnn)ξ/4).\Pr\left[\,{d(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i-1}}},\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}})\geq n^{1/6}\textnormal{\;\;for some\;}i\in[n]}\,\right]\leq\exp\left(-\ln n(\ln\ln n)^{\xi/4}\right).
Proof.

Let i[n]i\in[n]. We know that 𝐔j𝐢𝟏=𝐔j𝐢\mathbf{U}_{j}^{\mathbf{i-1}}=\mathbf{U}_{j}^{\mathbf{i}} for all jij\neq i, and 𝐔i𝐢𝟏𝐔i𝐢\mathbf{U}_{i}^{\mathbf{i-1}}\neq\mathbf{U}_{i}^{\mathbf{i}}. By Lemma 4, we have σ𝚽,𝐙,𝐔𝐢𝟏(xj)=σ𝚽,𝐙,𝐔𝐢(xj)\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i-1}}}(x_{j})=\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}}(x_{j}) for all variables xjxix_{j}\notin\mathcal{IR}_{x_{i}}. Therefore, if d(σ𝚽,𝐙,𝐔𝐢𝟏,σ𝚽,𝐙,𝐔𝐢)n1/6d(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i-1}}},\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}})\geq n^{1/6} for some i[n]i\in[n], then max1in|xi|n1/6\max_{1\leq i\leq n}|\mathcal{IR}_{x_{i}}|\geq n^{1/6}, and the result follows from Lemma 5. ∎

3.3 Freeness

Recall the definition of free steps. An iteration of the τ\tau-decimation algorithm DECτ is called a free step if the local rule τ\tau gives the value 1/21/2 in that iteration. In this case, the value chosen by the τ\tau-decimation algorithm for the selected variable is either 0 or 1 with equal probability. Intuitively, it means that the local rule τ\tau cannot capture useful information from the local structure to guide the τ\tau-decimation algorithm in choosing a value for the selected variable, and thus the τ\tau-decimation algorithm simply makes a random guess for the assigned value. We also recall the definition of a τ\tau-decimation algorithm being δ\delta-free. A τ\tau-decimation algorithm DECτ is δ\delta-free on the random kk-XORSAT instance 𝚽\mathbf{\Phi} if w.h.p. the algorithm has at least δn\delta n free steps, on input 𝚽\mathbf{\Phi}. Informally speaking, the freer the τ\tau-decimation algorithm is, the less information the local rule captures.

By using Wormald’s method of differential equations, we can calculate the degree profile of the remaining factor graph after tt steps of the τ\tau-decimation algorithm, for all 0tn0\leq t\leq n. With the degree profiles, we can calculate the probability of each step being free, and thus approximate how free the τ\tau-decimation algorithm is. The probability of having free steps depends on the choice of the local rule. Lemma 7 shows the freeness of the UC-decimation algorithm.

Lemma 7.

For k3k\geq 3 and r>0r>0, the UC-decimation algorithm DECUC is w1(k,r)w_{1}(k,r)-free on the random kk-XORSAT instance 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn), where

w1(k,r)=(kr)11kk1γ(1k1,kr)\displaystyle w_{1}(k,r)=\frac{(kr)^{\frac{1}{1-k}}}{k-1}\;\gamma\left(\frac{1}{k-1},kr\right)

and γ\gamma is the lower incomplete gamma function given by γ(a,x)0xta1et𝑑t\gamma(a,x)\equiv\int_{0}^{x}t^{a-1}e^{-t}dt.
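As a numerical sanity check (our own snippet, not part of the analysis), w1(k,r) can be evaluated with SciPy; note that scipy.special.gammainc is the regularized lower incomplete gamma function, so it must be multiplied by Γ(a) to recover γ(a,x):

from scipy.special import gamma, gammainc

def w1(k, r):
    # w1(k, r) = (kr)^{1/(1-k)} / (k-1) * gamma_lower(1/(k-1), kr)
    a = 1.0 / (k - 1)
    return (k * r) ** (1.0 / (1 - k)) / (k - 1) * gammainc(a, k * r) * gamma(a)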

The role of the local rule is to approximate the marginal probability of the selected variable over a randomly chosen solution of the sub-instance induced by the local neighborhood of the selected variable. Interestingly, even if we have a local rule τ\tau that is capable of giving the exact marginals when the factor graph is a tree, it still cannot provide enough useful information to guide the τ\tau-decimation algorithm to make good decisions for the assigned values. With such a local rule, the τ\tau-decimation algorithm still has a certain level of freeness.

Lemma 8.

Assume the local rule τ\tau can give the exact marginal probabilities of variables on any factor graph that is a tree. For k3k\geq 3 and r>0r>0, the τ\tau-decimation algorithm DECτ is we(k,r)w_{e}(k,r)-free on the random kk-XORSAT instance 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn), where

we(k,r)=01SR(x)𝑑x,\displaystyle w_{e}(k,r)=\int_{0}^{1}S_{R}(x)dx,

S0(x)=1S_{0}(x)=1 and Sl(x)=exp(kr[(1x)(1Sl1(x))+x]k1)S_{l}(x)=\exp\left(-kr[(1-x)(1-S_{l-1}(x))+x]^{k-1}\right) for any l1l\geq 1 and xx\in\mathbb{R}.
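The quantity we(k,r) can likewise be approximated numerically by iterating the recursion for S_l on a grid of x values and applying a midpoint rule for the integral; iterating far past the radius R, until the iterates stabilize, is our own simplification:

import numpy as np

def we(k, r, iters=200, grid=100000):
    x = (np.arange(grid) + 0.5) / grid   # midpoints of [0, 1]
    S = np.ones_like(x)                  # S_0(x) = 1
    for _ in range(iters):               # S_l(x) = exp(-kr[(1-x)(1-S_{l-1}(x)) + x]^{k-1})
        S = np.exp(-k * r * ((1 - x) * (1 - S) + x) ** (k - 1))
    return S.mean()                      # midpoint rule for the integral of S over [0, 1]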

The proofs of Lemma 7 and Lemma 8 can be found in Appendix D and Appendix E, respectively.

4 Proof of main theorems

We denote by αn\alpha_{n} the success probability of the τ\tau-decimation algorithm DECτ\tau, namely, αn\alpha_{n} is the probability that the assignment output by the τ\tau-decimation algorithm DECτ\tau on the random kk-XORSAT instance 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn) with nn variables and rnrn clauses is a solution for 𝚽\mathbf{\Phi}. Formally, we define αn\alpha_{n} by the following expression αnPr[σ𝚽𝚽k(n,rn),𝐙,𝐔𝒮(𝚽)]\alpha_{n}\equiv\Pr\left[\,{\sigma_{\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn),\mathbf{Z},\mathbf{U}}\in\mathcal{S}(\mathbf{\Phi})}\,\right], where 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn) is the random kk-XORSAT instance, 𝐙\mathbf{Z} is the random ordering vector, and 𝐔\mathbf{U} is the random internal vector, as mentioned in Section 3.1. Now, we consider the sequence of output assignments {σi=σ𝚽,𝐙,𝐔𝐢}i=0n\{\sigma_{i}=\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}}\}_{i=0}^{n} generated by the procedure in Section 3.1. We first prove that if the algorithm DECτ\tau is δ\delta-free, then the expected distance 𝔼[d(σ0,σn)]\mathop{{}\mathbb{E}}\left[{d(\sigma_{0},\sigma_{n})}\right] between the first and the last assignments in the sequence is at least (δ/2)n+o(n)(\delta/2)n+o(n).

Lemma 9.

If the τ\tau-decimation algorithm DECτ\tau is δ\delta-free on the random kk-XORSAT instance 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn) for some δ>0\delta>0, then we have 𝔼[d(σ0,σn)](δ/2)n+o(n)\mathop{{}\mathbb{E}}\left[{d(\sigma_{0},\sigma_{n})}\right]\geq(\delta/2)n+o(n).

Next, we will show that, if the τ\tau-decimation algorithm is ”free enough”, namely, strictly 2μ(k,r)2\mu(k,r)-free, then we can pick a pair of output assignments and project them to the core instance 𝚽c\mathbf{\Phi}_{c} so that the distance between the two corresponding core assignments falls in the forbidden range from the overlap gap property of the core instance 𝚽c\mathbf{\Phi}_{c}.

Lemma 10.

For any k3k\geq 3 and r(rcore(k),rsat(k))r\in(r_{core}(k),r_{sat}(k)), if the τ\tau-decimation algorithm DECτ\tau is strictly 2μ(k,r)2\mu(k,r)-free on the random kk-XORSAT instance 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn), then there exist 0i0n0\leq i_{0}\leq n and 0<ϵ<ϵ(k,r)0<\epsilon^{\prime}<\epsilon(k,r) such that w.h.p. we have |d(π(σ0),π(σi0))12ϵn|<14ϵn\left|d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}}))-\frac{1}{2}\epsilon^{\prime}n\right|<\frac{1}{4}\epsilon^{\prime}n, where ϵ(k,r)\epsilon(k,r) is given in Lemma 2.

The following lemma shows that the probability of both the output assignments σ𝚽,𝐙,𝐔𝟎\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{0}}} and σ𝚽,𝐙,𝐔𝐢𝟎\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i_{0}}}} being solutions for the instance 𝚽\mathbf{\Phi} is lower bounded by αn2\alpha_{n}^{2}.

Lemma 11.

For any i[n]i\in[n], we have Pr[σ0𝒮(Φ) and σi𝒮(Φ)]αn2\Pr\left[\,{\sigma_{0}\in\mathcal{S}(\Phi)\textnormal{\;and\;}\sigma_{i}\in\mathcal{S}(\Phi)}\,\right]\geq\alpha_{n}^{2}.

Finally, we can combine all above lemmas in this section to give the proof of Theorem 1.

Proof of Theorem 1.

We denote by 𝒜\mathcal{A} the event of |d(π(σ0),π(σi0))12ϵn|<14ϵn\left|d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}}))-\frac{1}{2}\epsilon^{\prime}n\right|<\frac{1}{4}\epsilon^{\prime}n, and we have Pr[𝒜]\Pr\left[\,{\mathcal{A}}\,\right] =1o(1)=1-o(1) by Lemma 10. On the other hand, we pick i=i0i=i_{0} for the inequality in Lemma 11. We denote by \mathcal{B} the event of σ0𝒮(Φ) and σi0𝒮(Φ)\sigma_{0}\in\mathcal{S}(\Phi)\textnormal{\;and\;}\sigma_{i_{0}}\in\mathcal{S}(\Phi), and Pr[]αn2\Pr\left[\,{\mathcal{B}}\,\right]\geq\alpha_{n}^{2}. Note that we have Pr[𝒜]1Pr[Not 𝒜]Pr[Not ]1o(1)(1αn2)=αn2o(1)\Pr\left[\,{\mathcal{A}\cap\mathcal{B}}\,\right]\geq 1-\Pr\left[\,{\text{Not\;}\mathcal{A}}\,\right]-\Pr\left[\,{\text{Not\;}\mathcal{B}}\,\right]\geq 1-o(1)-(1-\alpha_{n}^{2})=\alpha_{n}^{2}-o(1). Thus, we have αnPr[𝒜]1/2+o(1)\alpha_{n}\leq\Pr\left[\,{\mathcal{A}\cap\mathcal{B}}\,\right]^{1/2}+o(1).

Now assume both 𝒜\mathcal{A} and \mathcal{B} take place. Since both σ0\sigma_{0} and σi0\sigma_{i_{0}} are solutions for the random instance 𝚽\mathbf{\Phi}, both π(σ0)\pi(\sigma_{0}) and π(σi0)\pi(\sigma_{i_{0}}) are solutions for the core instance 𝚽c\mathbf{\Phi}_{c}. Moreover, the distance d(π(σ0),π(σi0))d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}})) falls in the interval ((1/4)ϵn,(3/4)ϵn)(o(n),ϵn)((1/4)\epsilon^{\prime}n,(3/4)\epsilon^{\prime}n)\subsetneq(o(n),\epsilon n), which takes place with probability at most o(1)o(1) by Lemma 2. So, we have Pr[𝒜]o(1)\Pr\left[\,{\mathcal{A}\cap\mathcal{B}}\,\right]\leq o(1), and thus αno(1)\alpha_{n}\leq o(1). ∎

To prove Theorems 2 and 3, all we need to do is to show that DECUC and DECτ with the exact-marginal assumption are strictly 2μ(k,r)2\mu(k,r)-free. The results then follow immediately by applying Theorem 1. From Lemmas 7 and 8, we know that DECUC and DECτ are w1(k,r)w_{1}(k,r)-free and we(k,r)w_{e}(k,r)-free, respectively. So, we only need to show that w1(k,r)>2μ(k,r)w_{1}(k,r)>2\mu(k,r) and we(k,r)>2μ(k,r)w_{e}(k,r)>2\mu(k,r). This can be done with the following lemmas, which give an upper bound of μ(k,r)\mu(k,r) in Lemma 12, a lower bound of w1(k,r)w_{1}(k,r) in Lemma 13, and a lower bound of we(k,r)w_{e}(k,r) in Lemma 14. The proofs of these three lemmas are given in Appendices B, F and G, respectively.

Lemma 12.

For any k4k\geq 4 and r(rcore(k),rsat(k))r\in(r_{core}(k),r_{sat}(k)), we have μ(k,r)<μu(k)\mu(k,r)<\mu_{u}(k), where

μu(k)=(1e1/k)(1e1/k)ln(1e1/k).\displaystyle\mu_{u}(k)=(1-e^{-1/k})-(1-e^{-1/k})\ln(1-e^{-1/k}).
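As a quick check of the numerical constants used in the proofs of Theorems 2 and 3 below, mu_u(k) can be evaluated directly (this snippet is ours):

import math

def mu_u(k):
    q = 1 - math.exp(-1.0 / k)   # 1 - e^{-1/k}
    return q - q * math.log(q)

# mu_u(9) ≈ 0.34201, so 2*mu_u(9) ≈ 0.68402; mu_u(13) ≈ 0.26677, so 2*mu_u(13) ≈ 0.53355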
Lemma 13.

For any kk03k\geq k_{0}\geq 3 and r(rcore(k),rsat(k))r\in(r_{core}(k),r_{sat}(k)), we have w1(k,r)w1(k0)w_{1}(k,r)\geq w_{1}^{*}(k_{0}), where

w1(k)=k11kk1γ(1k1,k(kk+1)k1)\displaystyle w_{1}^{*}(k)=\frac{k^{\frac{1}{1-k}}}{k-1}\gamma\left(\frac{1}{k-1},k\left(\frac{k}{k+1}\right)^{k-1}\right)
Lemma 14.

For any kk03k\geq k_{0}\geq 3 and r(rcore(k),rsat(k))r\in(r_{core}(k),r_{sat}(k)), we have we(k,r)we(k0,rsat(k0))w_{e}(k,r)\geq w_{e}^{*}(k_{0},r_{sat}(k_{0})), where we(k,r)=x(k,r)kr2(x(k,r))kw_{e}^{*}(k,r)=x^{-}(k,r)-kr^{2}(x^{-}(k,r))^{k} and

x±(k,r)=(1±14(kr)2[(kr)1k11]2)1k2.\displaystyle x^{\pm}(k,r)=\left(\frac{1\pm\sqrt{1-4(kr)^{-2}[(kr)^{\frac{1}{k-1}}-1]}}{2}\right)^{\frac{1}{k-2}}. (1)
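The lower-bound constants can be checked numerically as well. In the following sketch (ours, with gamma and gammainc from SciPy as above), w1_star(9) evaluates to ≈ 0.7150, and evaluating we_star at k = 13 with r close to 1, as a stand-in for rsat(13), gives ≈ 0.545:

import math
from scipy.special import gamma, gammainc

def w1_star(k):
    a = 1.0 / (k - 1)
    x = k * (k / (k + 1.0)) ** (k - 1)
    return k ** (1.0 / (1 - k)) / (k - 1) * gammainc(a, x) * gamma(a)

def we_star(k, r):
    s = math.sqrt(1 - 4 * (k * r) ** (-2.0) * ((k * r) ** (1.0 / (k - 1)) - 1))
    xm = ((1 - s) / 2) ** (1.0 / (k - 2))   # x^-(k, r) from (1)
    return xm - k * r * r * xm ** k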
Proof of Theorem 2.

Let k9k\geq 9 and r(rcore(k),rsat(k))r\in(r_{core}(k),r_{sat}(k)). By Lemmas 12 and 13, we have 2μ(k,r)<2μu(9)0.6841<0.7150w1(9)w1(k,r)2\mu(k,r)<2\mu_{u}(9)\leq 0.6841<0.7150\leq w_{1}^{*}(9)\leq w_{1}(k,r). Then, by Lemma 7, DECUC is strictly 2μ(k,r)2\mu(k,r)-free. The result follows. ∎

Proof of Theorem 3.

Let k13k\geq 13 and r(rcore(k),rsat(k))r\in(r_{core}(k),r_{sat}(k)). By Lemmas 12 and 14, we have 2μ(k,r)<2μu(13)0.5336<0.5450we(13,rsat(13))we(k,r)2\mu(k,r)<2\mu_{u}(13)\leq 0.5336<0.5450\leq w_{e}^{*}(13,r_{sat}(13))\leq w_{e}(k,r). Then, by Lemma 8, DECτ is strictly 2μ(k,r)2\mu(k,r)-free. The result follows. ∎

References

  • [AC08] Dimitris Achlioptas and Amin Coja-Oghlan. Algorithmic Barriers from Phase Transitions. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 793–802, Philadelphia, PA, USA, October 2008. IEEE.
  • [ACR11] Dimitris Achlioptas, Amin Coja-Oghlan, and Federico Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. Random Structures & Algorithms, 38(3):251–268, May 2011.
  • [AKKT02] Dimitris Achlioptas, Jeong Han Kim, Michael Krivelevich, and Prasad Tetali. Two-coloring random hypergraphs. Random Structures and Algorithms, 20(2):249–259, March 2002.
  • [ALS22a] Emmanuel Abbe, Shuangping Li, and Allan Sly. Binary perceptron: Efficient algorithms can find solutions in a rare well-connected cluster. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, pages 860–873, New York, NY, USA, June 2022. Association for Computing Machinery.
  • [ALS22b] Emmanuel Abbe, Shuangping Li, and Allan Sly. Proof of the Contiguity Conjecture and Lognormal Limit for the Symmetric Perceptron. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 327–338, February 2022.
  • [AM15] Dimitris Achlioptas and Michael Molloy. The solution space geometry of random linear equations. Random Structures & Algorithms, 46(2):197–231, March 2015.
  • [BH22] Guy Bresler and Brice Huang. The Algorithmic Phase Transition of Random k-SAT for Low Degree Polynomials. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 298–309, Denver, CO, USA, February 2022. IEEE.
  • [BMZ05] A. Braunstein, M. Mézard, and R. Zecchina. Survey propagation: An algorithm for satisfiability. Random Structures and Algorithms, 27(2):201–226, September 2005.
  • [BZ04] Alfredo Braunstein and Riccardo Zecchina. Survey propagation as local equilibrium equations. Journal of Statistical Mechanics: Theory and Experiment, 2004(06):P06007, June 2004.
  • [CDMM03] S. Cocco, O. Dubois, J. Mandler, and R. Monasson. Rigorous Decimation-Based Construction of Ground Pure States for Spin-Glass Models on Random Lattices. Physical Review Letters, 90(4):047205, January 2003.
  • [Coj10] Amin Coja-Oghlan. A Better Algorithm for Random k -SAT. SIAM Journal on Computing, 39(7):2823–2864, January 2010.
  • [Coj17] Amin Coja-Oghlan. Belief Propagation Guided Decimation Fails on Random Formulas. Journal of the ACM, 63(6):1–55, February 2017.
  • [DGM+10] Martin Dietzfelbinger, Andreas Goerdt, Michael Mitzenmacher, Andrea Montanari, Rasmus Pagh, and Michael Rink. Tight Thresholds for Cuckoo Hashing via XORSAT. In Automata, Languages and Programming, volume 6198, pages 213–225. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
  • [DM02] Olivier Dubois and Jacques Mandler. The 3-XORSAT threshold. Comptes Rendus Mathematique, 335(11):963–966, December 2002.
  • [Gam21] David Gamarnik. The overlap gap property: A topological barrier to optimizing over random structures. Proceedings of the National Academy of Sciences, 118(41):e2108492118, October 2021.
  • [GK23] David Gamarnik and Eren C. Kızıldağ. Algorithmic obstructions in the random number partitioning problem. The Annals of Applied Probability, 33(6B), December 2023.
  • [GKPX23] David Gamarnik, Eren C. Kizildağ, Will Perkins, and Changji Xu. Geometric Barriers for Stable and Online Algorithms for Discrepancy Minimization. In Proceedings of Thirty Sixth Conference on Learning Theory, pages 3231–3263. PMLR, July 2023.
  • [GL18] David Gamarnik and Quan Li. Finding a large submatrix of a Gaussian random matrix. The Annals of Statistics, 46(6A), December 2018.
  • [GS02] Carla P. Gomes and Bart Selman. Satisfied with Physics. Science, 297(5582):784–785, August 2002.
  • [GS17a] David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. The Annals of Probability, 45(4), July 2017.
  • [GS17b] David Gamarnik and Madhu Sudan. Performance of Sequential Local Algorithms for the Random NAE-K-SAT Problem. SIAM Journal on Computing, 46(2):590–619, January 2017.
  • [GY11] Marco Guidetti and A. P. Young. Complexity of several constraint-satisfaction problems using the heuristic classical algorithm WalkSAT. Physical Review E, 84(1):011102, July 2011.
  • [Het16] Samuel Hetterich. Analysing Survey Propagation Guided Decimation on Random Formulas, February 2016.
  • [HS22] Brice Huang and Mark Sellke. Tight Lipschitz Hardness for optimizing Mean Field Spin Glasses. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 312–322, Denver, CO, USA, October 2022. IEEE.
  • [IKKM12] Morteza Ibrahimi, Yashodhan Kanoria, Matt Kraning, and Andrea Montanari. The Set of Solutions of Random XORSAT Formulae. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 760–779. Society for Industrial and Applied Mathematics, January 2012.
  • [Kim04] Jeong Han Kim. The Poisson Cloning Model for Random Graphs, Random Directed Graphs and Random k-SAT Problems. In Computing and Combinatorics, volume 3106, pages 2–2. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.
  • [KSS09] Lukas Kroc, Ashish Sabharwal, and Bart Selman. Message-passing and local heuristics as decimation strategies for satisfiability. In Proceedings of the 2009 ACM Symposium on Applied Computing, pages 1408–1414, Honolulu Hawaii, March 2009. ACM.
  • [MM09] Marc Mézard and Andrea Montanari. Information, Physics, and Computation. Oxford Graduate Texts. Oxford University Press, Oxford ; New York, 2009.
  • [MMW07] Elitza Maneva, Elchanan Mossel, and Martin J. Wainwright. A new look at survey propagation and its generalizations. Journal of the ACM, 54(4):17, July 2007.
  • [Mol05] Michael Molloy. Cores in random hypergraphs and Boolean formulas. Random Structures and Algorithms, 27(1):124–135, August 2005.
  • [MPZ02] M. Mézard, G. Parisi, and R. Zecchina. Analytic and Algorithmic Solution of Random Satisfiability Problems. Science, 297(5582):812–815, August 2002.
  • [MRZ03] M. Mézard, F. Ricci-Tersenghi, and R. Zecchina. Two Solutions to Diluted p-Spin Models and XORSAT Problems. Journal of Statistical Physics, 111(3/4):505–533, 2003.
  • [PS16] Boris Pittel and Gregory B. Sorkin. The Satisfiability Threshold for k -XORSAT. Combinatorics, Probability and Computing, 25(2):236–268, March 2016.
  • [PSW96] Boris Pittel, Joel Spencer, and Nicholas Wormald. Sudden Emergence of a Giant k-Core in a Random Graph. Journal of Combinatorial Theory, Series B, 67(1):111–151, May 1996.
  • [PX21] Will Perkins and Changji Xu. Frozen 1-RSB structure of the symmetric Ising perceptron. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 1579–1588, New York, NY, USA, June 2021. Association for Computing Machinery.
  • [RS09] Federico Ricci-Tersenghi and Guilhem Semerjian. On the cavity method for decimated random constraint satisfaction problems and the analysis of belief propagation guided decimation algorithms. Journal of Statistical Mechanics: Theory and Experiment, 2009(09):P09001, September 2009.
  • [Wei22] Alexander S. Wein. Optimal low-degree hardness of maximum independent set. Mathematical Statistics and Learning, 4(3):221–251, January 2022.
  • [Wor95] Nicholas C. Wormald. Differential Equations for Random Processes and Random Graphs. The Annals of Applied Probability, 5(4), November 1995.
  • [Wor99] Nicholas C. Wormald. The differential equation method for random graph processes and greedy algorithms. Lectures on approximation and randomized algorithms, pages 73–155, 1999.

Appendix A Proofs of lemmas in Section 4

In this section, we give the proofs of the lemmas in Section 4.

Proof of Lemma 9.

Suppose D[n]D\subseteq[n] is the subset of indices of iterations that are free steps, namely,

D={i[n]:The i-th iteration is a free step}.\displaystyle D=\{i\in[n]:\text{The $i$-th iteration is a free step}\}.

Note that σ0=σ𝚽,𝐙,𝐔𝟎=σ𝚽,𝐙,𝐕\sigma_{0}=\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{0}}}=\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}} and σn=σ𝚽,𝐙,𝐔𝐧=σ𝚽,𝐙,𝐖\sigma_{n}=\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{n}}}=\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}} and thus

d(σ0,σn)=d(σ𝚽,𝐙,𝐔𝟎,σ𝚽,𝐙,𝐔𝐧)=d(σ𝚽,𝐙,𝐕,σ𝚽,𝐙,𝐖)\displaystyle d(\sigma_{0},\sigma_{n})=d(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{0}}},\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{n}}})=d(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}},\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}})

Since we can write d(σ𝚽,𝐙,𝐕,σ𝚽,𝐙,𝐖)=i=1n𝟙(σ𝚽,𝐙,𝐕(xs(i))σ𝚽,𝐙,𝐖(xs(i)))d(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}},\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}})=\sum_{i=1}^{n}\mathbbm{1}\left(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}(x_{s(i)})\neq\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}}(x_{s(i)})\right), we have

d(σ𝚽,𝐙,𝐕,σ𝚽,𝐙,𝐖)\displaystyle d(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}},\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}}) =i=1n𝟙(σ𝚽,𝐙,𝐕(xs(i))σ𝚽,𝐙,𝐖(xs(i)))\displaystyle=\sum_{i=1}^{n}\mathbbm{1}\left(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}\left(x_{s(i)}\right)\neq\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}}\left(x_{s(i)}\right)\right)
iD𝟙(σ𝚽,𝐙,𝐕(xs(i))σ𝚽,𝐙,𝐖(xs(i))).\displaystyle\geq\sum_{i\in D}\mathbbm{1}\left(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}\left(x_{s(i)}\right)\neq\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}}\left(x_{s(i)}\right)\right).

In a free step, the local rule τ\tau gives the value 1/21/2 to the decimation algorithm. Therefore, for any iDi\in D, σ𝚽,𝐙,𝐕(xs(i))σ𝚽,𝐙,𝐖(xs(i))\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}\left(x_{s(i)}\right)\neq\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}}\left(x_{s(i)}\right) if and only if either 𝐕i<1/2<𝐖i\mathbf{V}_{i}<1/2<\mathbf{W}_{i} or 𝐖i<1/2<𝐕i\mathbf{W}_{i}<1/2<\mathbf{V}_{i}. Hence, we have

iD𝟙(σ𝚽,𝐙,𝐕(xs(i))σ𝚽,𝐙,𝐖(xs(i)))\displaystyle\sum_{i\in D}\mathbbm{1}\left(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}\left(x_{s(i)}\right)\neq\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}}\left(x_{s(i)}\right)\right) =iD𝟙(𝐕i<1/2<𝐖i or 𝐖i<1/2<𝐕i)=\sum_{i\in D}\mathbbm{1}\left(\mathbf{V}_{i}<1/2<\mathbf{W}_{i}\text{\;or\;}\mathbf{W}_{i}<1/2<\mathbf{V}_{i}\right)

Note that the random variables 𝐕1,𝐕2,,𝐕n,𝐖1,𝐖2,,𝐖n\mathbf{V}_{1},\mathbf{V}_{2},\cdots,\mathbf{V}_{n},\mathbf{W}_{1},\mathbf{W}_{2},\cdots,\mathbf{W}_{n} are i.i.d. with the uniform distribution on [0,1][0,1]. Thus, conditionally on DD, the sum iD𝟙(𝐕i<1/2<𝐖i or 𝐖i<1/2<𝐕i)\sum_{i\in D}\mathbbm{1}\left(\mathbf{V}_{i}<1/2<\mathbf{W}_{i}\text{\;or\;}\mathbf{W}_{i}<1/2<\mathbf{V}_{i}\right) follows the binomial distribution B(|D|,1/2)B(|D|,1/2) with parameters |D||D| and 1/21/2.

Assume that the algorithm DECτ\tau is δ\delta-free on the random kk-XORSAT instance 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn) for some δ>0\delta>0, which implies that w.h.p. |D|δn|D|\geq\delta n. Note that DD only depends on 𝚽\mathbf{\Phi} and 𝐙\mathbf{Z}. Hence, we have

𝔼[d(σ0,σn)]\displaystyle\mathop{{}\mathbb{E}}\left[{d(\sigma_{0},\sigma_{n})}\right] 𝔼𝚽,𝐙,𝐕,𝐖[iD𝟙(𝐕i<1/2<𝐖i or 𝐖i<1/2<𝐕i)]\geq\mathbb{E}_{\mathbf{\Phi},\mathbf{Z},\mathbf{V},\mathbf{W}}\left[\sum_{i\in D}\mathbbm{1}\left(\mathbf{V}_{i}<1/2<\mathbf{W}_{i}\text{\;or\;}\mathbf{W}_{i}<1/2<\mathbf{V}_{i}\right)\right]
=𝔼𝚽,𝐙𝔼𝐕,𝐖[iD𝟙(𝐕i<1/2<𝐖i or 𝐖i<1/2<𝐕i)]\displaystyle=\mathbb{E}_{\mathbf{\Phi},\mathbf{Z}}\mathbb{E}_{\mathbf{V},\mathbf{W}}\left[\sum_{i\in D}\mathbbm{1}\left(\mathbf{V}_{i}<1/2<\mathbf{W}_{i}\text{\;or\;}\mathbf{W}_{i}<1/2<\mathbf{V}_{i}\right)\right]
=(1/2)𝔼[|D|]\displaystyle=(1/2)\cdot\mathop{{}\mathbb{E}}\left[{|D|}\right]
(δ/2)n+o(n).\displaystyle\geq(\delta/2)n+o(n).
∎

Proof of Lemma 10.

Assume the τ\tau-decimation algorithm DECτ is strictly 2μ(k,r)2\mu(k,r)-free on the random kk-XORSAT instance 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn). Then, there exists δ>2μ(k,r)\delta>2\mu(k,r) such that DECτ is δ\delta-free on 𝚽\mathbf{\Phi}.

We first show that there exists 0i0n0\leq i_{0}\leq n such that the expected value of d(π(σ0),π(σi0))d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}})) is close to 12ϵn\frac{1}{2}\epsilon^{\prime}n for some ϵ<ϵ(k,r)\epsilon^{\prime}<\epsilon(k,r). Consider the sequence {𝔼[d(π(σ0),π(σi))]}i=0n\{\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{i}))}\right]\}_{i=0}^{n} in which the first item is

𝔼[d(π(σ0),π(σ0))]=0.\displaystyle\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{0}))}\right]=0. (2)

By Lemma 9, we know that 𝔼[d(σ0,σn)](δ/2)n+o(n)\mathop{{}\mathbb{E}}\left[{d(\sigma_{0},\sigma_{n})}\right]\geq(\delta/2)n+o(n). Moreover, by Theorem 4, we know that w.h.p. there are μ(k,r)n+o(n)\mu(k,r)n+o(n) variables not in the core instance 𝚽c\mathbf{\Phi}_{c}. Note that the projection function π\pi only removes variables not in the core instance 𝚽c\mathbf{\Phi}_{c}. Therefore, we have d(π(σ0),π(σn))d(σ0,σn)nd(\pi(\sigma_{0}),\pi(\sigma_{n}))\geq d(\sigma_{0},\sigma_{n})-n^{*}, where nn^{*} is the number of variables not in the core instance 𝚽c\mathbf{\Phi}_{c}. Hence, we have

𝔼[d(σ0,σn)]𝔼[d(π(σ0),π(σn))]+𝔼[n]𝔼[d(π(σ0),π(σn))]+μ(k,r)n+o(n)\displaystyle\mathop{{}\mathbb{E}}\left[{d(\sigma_{0},\sigma_{n})}\right]\leq\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{n}))}\right]+\mathop{{}\mathbb{E}}\left[{n^{*}}\right]\leq\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{n}))}\right]+\mu(k,r)n+o(n)

By rearranging the terms, we have

𝔼[d(π(σ0),π(σn))]\displaystyle\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{n}))}\right] (δ/2μ(k,r))n+o(n)\displaystyle\geq(\delta/2-\mu(k,r))n+o(n) (3)

with δ/2μ(k,r)>0\delta/2-\mu(k,r)>0. From Lemma 6, we know that with probability 1exp(lnn(lnlnn)ξ/4)1-\exp(-\ln n(\ln\ln n)^{\xi/4}) we have

d(σi1,σi)<n1/6 for all i[n].\displaystyle d(\sigma_{i-1},\sigma_{i})<n^{1/6}\text{\quad for all\;}i\in[n]. (4)

By the triangle inequality of the metric dd and the linearity of the expectation, we know that for 1in1\leq i\leq n the difference of two consecutive expected values in the sequence is

𝔼[d(π(σ0),π(σi))]𝔼[d(π(σ0),π(σi1))]\displaystyle\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{i}))}\right]-\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{i-1}))}\right]
𝔼[d(π(σ0),π(σi1))]+𝔼[d(π(σi1),π(σi))]𝔼[d(π(σ0),π(σi1))]\displaystyle\leq\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{i-1}))}\right]+\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{i-1}),\pi(\sigma_{i}))}\right]-\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{i-1}))}\right]
=𝔼[d(π(σi1),π(σi))]\displaystyle=\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{i-1}),\pi(\sigma_{i}))}\right]
𝔼[d(σi1,σi)]\displaystyle\leq\mathop{{}\mathbb{E}}\left[{d(\sigma_{i-1},\sigma_{i})}\right]
n1/6+o(1).\displaystyle\leq n^{1/6}+o(1). (5)

Combining (2), (3) and (5), we know that there exists 0i0n0\leq i_{0}\leq n and 0<ϵ<min{δ/2μ(k,r),ϵ(k,r)}0<\epsilon^{\prime}<\min\{\delta/2-\mu(k,r),\epsilon(k,r)\} such that

𝔼[d(π(σ0),π(σi0))][12ϵn,12ϵn+n1/6].\displaystyle\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}}))}\right]\in\left[\frac{1}{2}\epsilon^{\prime}n,\,\frac{1}{2}\epsilon^{\prime}n+n^{1/6}\right]. (6)

Next, we prove that d(π(σ0),π(σi0))d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}})) concentrates around its mean. Given two vectors A[0,1]nA\in[0,1]^{n} and B[0,1]nB\in[0,1]^{n}, we write AB=(A1,,An,B1,,Bn)A\cdot B=(A_{1},\cdots,A_{n},B_{1},\cdots,B_{n}) and AiB=(A1,,Ai,Bi+1,,Bn)A\oplus_{i}B=(A_{1},\cdots,A_{i},B_{i+1},\cdots,B_{n}) for i[n]i\in[n]. A function f:D1×D2××Dnf:D_{1}\times D_{2}\times\cdots\times D_{n}\rightarrow\mathbb{R} satisfies the bounded differences property if there exist c1,c2,,cnc_{1},c_{2},\cdots,c_{n}\in\mathbb{R} such that for any x1D1x_{1}\in D_{1}, x2D2x_{2}\in D_{2}, \cdots, xnDnx_{n}\in D_{n}, yiDiy_{i}\in D_{i} and i[n]i\in[n],

|f(x1,,xi1,xi,xi+1,,xn)f(x1,,xi1,yi,xi+1,,xn)|ci.\displaystyle|f(x_{1},...,x_{i-1},x_{i},x_{i+1},...,x_{n})-f(x_{1},...,x_{i-1},y_{i},x_{i+1},...,x_{n})|\leq c_{i}.

Note that 𝐔𝐢𝟎=𝐖i0𝐕\mathbf{U^{i_{0}}}=\mathbf{W}\oplus_{i_{0}}\mathbf{V}. For arbitrary instance Φ\Phi and ordering vector ZZ, we have

d(π(σΦ,Z,𝐔𝟎),π(σΦ,Z,𝐔𝐢𝟎))\displaystyle d(\pi(\sigma_{\Phi,Z,\mathbf{U^{0}}}),\pi(\sigma_{\Phi,Z,\mathbf{U^{i_{0}}}})) =d(π(σΦ,Z,𝐕),π(σΦ,Z,𝐖i0𝐕)).\displaystyle=d(\pi(\sigma_{\Phi,Z,\mathbf{V}}),\pi(\sigma_{\Phi,Z,\mathbf{W}\oplus_{i_{0}}\mathbf{V}})).

So, given a random instance 𝚽\mathbf{\Phi} and a random ordering vector 𝐙\mathbf{Z}, we can write d(π(σ0),π(σi0))d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}})) as a function f:[0,1]2nf:[0,1]^{2n}\rightarrow\mathbb{R} on variables 𝐕𝐖\mathbf{V}\cdot\mathbf{W} given by f(𝐕𝐖)=d(π(σ𝚽,𝐙,𝐕),π(σ𝚽,𝐙,𝐖i0𝐕))f(\mathbf{V}\cdot\mathbf{W})=d(\pi(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}),\pi(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{W}\oplus_{i_{0}}\mathbf{V}})). Conditioning on (4), we can verify that ff satisfies the bounded differences property with ci=2n1/6c_{i}=2n^{1/6} for i[2n]i\in[2n], and thus we have

Pr[|d(π(σ0),π(σi0))12ϵn|14ϵn]\displaystyle\Pr\left[\,{\left|d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}}))-\frac{1}{2}\epsilon^{\prime}n\right|\geq\frac{1}{4}\epsilon^{\prime}n}\,\right]
Pr[|d(π(σ0),π(σi0))𝔼[d(π(σ0),π(σi0))]|14ϵnn1/6]\displaystyle\leq\Pr\left[\,{\left|d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}}))-\mathop{{}\mathbb{E}}\left[{d(\pi(\sigma_{0}),\pi(\sigma_{i_{0}}))}\right]\right|\geq\frac{1}{4}\epsilon^{\prime}n-n^{1/6}}\,\right]
=Pr[|f(VW)𝔼[f(VW)]|14ϵnn1/6]\displaystyle=\Pr\left[\,{\left|f(V\cdot W)-\mathop{{}\mathbb{E}}\left[{f(V\cdot W)}\right]\right|\geq\frac{1}{4}\epsilon^{\prime}n-n^{1/6}}\,\right]
2exp(2(14ϵnn1/6)24(n+i0)n1/3)\displaystyle\leq 2\exp\left(-\frac{2\left(\frac{1}{4}\epsilon^{\prime}n-n^{1/6}\right)^{2}}{4(n+i_{0})n^{1/3}}\right)
2exp(ϵ264n2/3+o(n2/3))\displaystyle\leq 2\exp\left(-\frac{\epsilon^{\prime 2}}{64}n^{2/3}+o(n^{2/3})\right)

by McDiarmid’s inequality. Since the condition (4) holds with probability 1exp(lnn(lnlnn)ξ/4)11-\exp(-\ln n(\ln\ln n)^{\xi/4})\rightarrow 1 as nn\rightarrow\infty, the inequality in Lemma 10 holds with high probability. ∎

Proof of Lemma 11.

Fix an arbitrary i[n]i\in[n]. Note that we have 𝐔𝟎=(𝐕1,𝐕2,,𝐕n)\mathbf{U^{0}}=(\mathbf{V}_{1},\mathbf{V}_{2},\cdots,\mathbf{V}_{n}) and 𝐔𝐢=(𝐖1,𝐖2,,𝐖i,\mathbf{U^{i}}=(\mathbf{W}_{1},\mathbf{W}_{2},\cdots,\mathbf{W}_{i}, 𝐕i+1,,𝐕n)\mathbf{V}_{i+1},\cdots,\mathbf{V}_{n}), where 𝐕1,𝐕2,,𝐕n,𝐖1,𝐖2,,𝐖n\mathbf{V}_{1},\mathbf{V}_{2},\cdots,\mathbf{V}_{n},\mathbf{W}_{1},\mathbf{W}_{2},\cdots,\mathbf{W}_{n} are uniformly distributed over [0,1][0,1], independently. Conditioning on 𝚽,𝐙,𝐕i+1,,𝐕n\mathbf{\Phi},\mathbf{Z},\mathbf{V}_{i+1},\cdots,\mathbf{V}_{n}, the assignment σ𝚽,𝐙,𝐔𝟎\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{0}}} only depends on 𝐕1,,𝐕i\mathbf{V}_{1},\cdots,\mathbf{V}_{i}, and the assignment σ𝚽,𝐙,𝐔𝐢\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}} only depends on 𝐖1,,𝐖i\mathbf{W}_{1},\cdots,\mathbf{W}_{i}, and we have

𝔼𝐕1,,𝐕i,𝐖1,,𝐖i[ 1(σ𝚽,𝐙,𝐔𝟎𝒮(Φ))𝟙(σ𝚽,𝐙,𝐔𝐢𝒮(Φ))]\displaystyle\mathbb{E}_{\mathbf{V}_{1},\cdots,\mathbf{V}_{i},\mathbf{W}_{1},\cdots,\mathbf{W}_{i}}\left[\;\mathbbm{1}(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{0}}}\in\mathcal{S}(\Phi))\cdot\mathbbm{1}(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}}\in\mathcal{S}(\Phi))\;\right]
=(𝔼𝐕1,,𝐕i𝟙(σ𝚽,𝐙,𝐔𝟎𝒮(Φ)))(𝔼𝐖1,,𝐖i𝟙(σ𝚽,𝐙,𝐔𝐢𝒮(Φ)))\displaystyle=\left(\mathbb{E}_{\mathbf{V}_{1},\cdots,\mathbf{V}_{i}}\mathbbm{1}(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{0}}}\in\mathcal{S}(\Phi))\right)\cdot\left(\mathbb{E}_{\mathbf{W}_{1},\cdots,\mathbf{W}_{i}}\mathbbm{1}(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}}\in\mathcal{S}(\Phi))\right)
=(𝔼𝐕1,,𝐕i𝟙(σ𝚽,𝐙,𝐕𝒮(Φ)))2\displaystyle=\left(\mathbb{E}_{\mathbf{V}_{1},\cdots,\mathbf{V}_{i}}\mathbbm{1}(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}\in\mathcal{S}(\Phi))\right)^{2}

Therefore, by Jensen’s inequality, we have

Pr𝚽,𝐙,𝐕,𝐖[σ0𝒮(Φ) and σi𝒮(Φ)]\displaystyle\Pr_{\mathbf{\Phi},\mathbf{Z},\mathbf{V},\mathbf{W}}\left[\;\sigma_{0}\in\mathcal{S}(\Phi)\textnormal{\;and\;}\sigma_{i}\in\mathcal{S}(\Phi)\;\right]
=Pr𝚽,𝐙,𝐕,𝐖[σ𝚽,𝐙,𝐔𝟎𝒮(Φ) and σ𝚽,𝐙,𝐔𝐢𝒮(Φ)]\displaystyle=\Pr_{\mathbf{\Phi},\mathbf{Z},\mathbf{V},\mathbf{W}}\left[\;\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{0}}}\in\mathcal{S}(\Phi)\textnormal{\;and\;}\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}}\in\mathcal{S}(\Phi)\;\right]
=𝔼𝚽,𝐙,𝐕i+1,,𝐕n𝔼𝐕1,,𝐕i,𝐖1,,𝐖i[ 1(σ𝚽,𝐙,𝐔𝟎𝒮(Φ))𝟙(σ𝚽,𝐙,𝐔𝐢𝒮(Φ))]\displaystyle=\mathbb{E}_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}_{i+1},\cdots,\mathbf{V}_{n}}\mathbb{E}_{\mathbf{V}_{1},\cdots,\mathbf{V}_{i},\mathbf{W}_{1},\cdots,\mathbf{W}_{i}}\left[\;\mathbbm{1}(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{0}}}\in\mathcal{S}(\Phi))\cdot\mathbbm{1}(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{U^{i}}}\in\mathcal{S}(\Phi))\;\right]
=𝔼𝚽,𝐙,𝐕i+1,,𝐕n(𝔼𝐕1,,𝐕i𝟙(σ𝚽,𝐙,𝐕𝒮(Φ)))2\displaystyle=\mathbb{E}_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}_{i+1},\cdots,\mathbf{V}_{n}}\left(\mathbb{E}_{\mathbf{V}_{1},\cdots,\mathbf{V}_{i}}\mathbbm{1}(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}\in\mathcal{S}(\Phi))\right)^{2}
(𝔼𝚽,𝐙,𝐕i+1,,𝐕n𝔼𝐕1,,𝐕i𝟙(σ𝚽,𝐙,𝐕𝒮(Φ)))2\displaystyle\geq\left(\mathbb{E}_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}_{i+1},\cdots,\mathbf{V}_{n}}\mathbb{E}_{\mathbf{V}_{1},\cdots,\mathbf{V}_{i}}\mathbbm{1}(\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}\in\mathcal{S}(\Phi))\right)^{2}
=(Pr𝚽,𝐙,𝐕[σ𝚽,𝐙,𝐕𝒮(Φ)])2\displaystyle=\left(\Pr_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}\left[\;\sigma_{\mathbf{\Phi},\mathbf{Z},\mathbf{V}}\in\mathcal{S}(\Phi)\;\right]\right)^{2}
=αn2.\displaystyle=\alpha_{n}^{2}. ∎

Appendix B Upper Bound of Cluster Diameter

In this section, we give the proof of Lemma 12, which provides an upper bound, in closed form, for the diameter of a cluster. From Lemma 3, we know that for any k3k\geq 3 and rcore(k)<r<rsat(k)r_{core}(k)<r<r_{sat}(k), w.h.p. the diameter of a cluster is upper bounded by μ(k,r)n+o(n)\mu(k,r)n+o(n), where

μ(k,r)=exp(krQk,rk1)+krQk,rk1exp(krQk,rk1)\displaystyle\mu(k,r)=\exp(-krQ_{k,r}^{k-1})+krQ_{k,r}^{k-1}\exp(-krQ_{k,r}^{k-1})

and Qk,rQ_{k,r} is the largest solution of the fixed point equation Q=1exp(krQk1)Q=1-\exp(-krQ^{k-1}) with the given values of kk and rr. This implicit expression would make calculations complicated. So, we slightly relax the upper bound to obtain the simpler expression μu(k)\mu_{u}(k) in Lemma 12. Note that this lemma can only be applied when k4k\geq 4, due to a restriction in the calculation which will be mentioned later in this section.

To prove Lemma 12, we first re-write the fixed point equation Q=1exp(krQk1)Q=1-\exp(-krQ^{k-1}) as

krQk1=ln(1Q).\displaystyle krQ^{k-1}=-\ln(1-Q).

Since Qk,rQ_{k,r} satisfies this equation, we can write μ(k,r)\mu(k,r) as

μ(k,r)\displaystyle\mu(k,r) =exp(krQk,rk1)+krQk,rk1exp(krQk,rk1)\displaystyle=\exp(-krQ_{k,r}^{k-1})+krQ_{k,r}^{k-1}\exp(-krQ_{k,r}^{k-1})
=(1Qk,r)(1Qk,r)ln(1Qk,r)\displaystyle=(1-Q_{k,r})-(1-Q_{k,r})\ln(1-Q_{k,r}) (7)
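Numerically, Q_{k,r} can be obtained by iterating the fixed point map starting from Q = 1: the map Q ↦ 1 − exp(−krQ^{k−1}) is increasing in Q, so the iterates decrease monotonically to the largest fixed point, which is positive when r > rcore(k). A minimal sketch of ours for evaluating μ(k,r) via (7):

import math

def Q_fixed_point(k, r, iters=10000):
    Q = 1.0
    for _ in range(iters):   # monotone iteration converging to Q_{k,r}
        Q = 1 - math.exp(-k * r * Q ** (k - 1))
    return Q

def mu(k, r):
    Q = Q_fixed_point(k, r)
    return (1 - Q) - (1 - Q) * math.log(1 - Q)   # equation (7)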

Note that Qk,rQ_{k,r} must lie in the interval [0,1)[0,1): the value Q=0Q=0 is always a solution of the fixed point equation, so Qk,r0Q_{k,r}\geq 0, and there is no solution with Q>1Q>1 since

Q>11exp(krQk1) for any Q>1.\displaystyle Q>1\geq 1-\exp(-krQ^{k-1})\text{\quad for any\;}Q>1.

Further note that the real-valued function f(x)=(1x)(1x)ln(1x)f(x)=(1-x)-(1-x)\ln(1-x) is strictly decreasing on [0,1)[0,1). So, it suffices to find a lower bound of Qk,rQ_{k,r}, in order to find an upper bound of μ(k,r)\mu(k,r).

Next, we try to show that e1/ke^{-1/k} is a lower bound of Qk,rQ_{k,r}. To facilitate the calculation, we define a real-valued analytic function G:[3,+)×[0,1]×[0,1]G:[3,+\infty)\times[0,1]\times[0,1]\rightarrow\mathbb{R} by

G(k,r,Q)=1exp(krQk1)Q,\displaystyle G(k,r,Q)=1-\exp(-krQ^{k-1})-Q,

which has the following derivatives:

Gr\displaystyle\frac{\partial G}{\partial r} =exp(krQk1)kQk1\displaystyle=\exp(-krQ^{k-1})\cdot kQ^{k-1} (8)
GQ\displaystyle\frac{\partial G}{\partial Q} =exp(krQk1)kr(k1)Qk21\displaystyle=\exp(-krQ^{k-1})\cdot kr(k-1)Q^{k-2}-1 (9)
2GQ2\displaystyle\frac{\partial^{2}G}{\partial Q^{2}} =exp(krQk1)kr(k1)Qk3[(k2)kr(k1)Qk1]\displaystyle=\exp(-krQ^{k-1})\cdot kr(k-1)Q^{k-3}[(k-2)-kr(k-1)Q^{k-1}] (10)

Before proving that e1/ke^{-1/k} is a lower bound of Qk,rQ_{k,r}, we give the following two lemmas, Lemma 15 and Lemma 16, which will be used later.

Lemma 15.

For any k3k\geq 3, G(k,rcore(k),e1/k)0G(k,r_{core}(k),e^{-1/k})\leq 0.

Proof.

We prove it by contradiction. Assume G(k,rcore(k),e1/k)>0G(k,r_{core}(k),e^{-1/k})>0 for some k3k\geq 3. By the continuity of GG, there exists r<rcore(k)r^{\prime}<r_{core}(k) such that G(k,r,e1/k)>0G(k,r^{\prime},e^{-1/k})>0. Note that G(k,r,1)=1exp(kr)1<0G(k,r^{\prime},1)=1-\exp(-kr^{\prime})-1<0. Again, by the continuity of GG, there exists Q(e1/k,1)Q^{\prime}\in(e^{-1/k},1) such that G(k,r,Q)=0G(k,r^{\prime},Q^{\prime})=0. Since 0<r<rcore(k)0<r^{\prime}<r_{core}(k), this contradicts the definition of rcore(k)r_{core}(k). ∎

Lemma 16.

For any k3k\geq 3, there exists r1(k)(rcore(k),1)r_{1}(k)\in(r_{core}(k),1) such that

G(k,r,e1/k){<0 for rcore(k)r<r1(k)=0 for r=r1(k)>0 for r1(k)<r1\displaystyle G(k,r,e^{-1/k})\begin{cases}<0&\text{\quad for \;}r_{core}(k)\leq r<r_{1}(k)\\ =0&\text{\quad for \;}r=r_{1}(k)\\ >0&\text{\quad for \;}r_{1}(k)<r\leq 1\end{cases}
Proof.

Note that limrG(k,r,e1/k)=limr1exp(kre(k1)/k)e1/k=1e1/k>0\lim_{r\rightarrow\infty}G(k,r,e^{-1/k})=\lim_{r\rightarrow\infty}1-\exp(-kre^{-(k-1)/k})-e^{-1/k}=1-e^{-1/k}>0. That means for sufficiently large rr, G(k,r,e1/k)>0G(k,r,e^{-1/k})>0. On the other hand, from Lemma 15, we know that G(k,rcore(k),e1/k)0G(k,r_{core}(k),e^{-1/k})\leq 0 for any k3k\geq 3. Moreover, from (8), we have rG(k,r,e1/k)>0\frac{\partial}{\partial r}G(k,r,e^{-1/k})>0, which implies G(k,r,e1/k)G(k,r,e^{-1/k}) is strictly increasing in rr for r>0r>0. Therefore, there exists r1=r1(k)(rcore(k),1)r_{1}=r_{1}(k)\in(r_{core}(k),1) such that G(k,r,e1/k)<0G(k,r,e^{-1/k})<0 for rcore(k)r<r1(k)r_{core}(k)\leq r<r_{1}(k), G(k,r,e1/k)>0G(k,r,e^{-1/k})>0 for r1(k)<r1r_{1}(k)<r\leq 1, and G(k,r1(k),e1/k)=0G(k,r_{1}(k),e^{-1/k})=0. ∎

With Lemma 15 and Lemma 16, we can prove that e1/ke^{-1/k} is a lower bound for Qk,rQ_{k,r}. We split the proof into two cases: the case of r[rcore(k),r1(k)]r\in[r_{core}(k),r_{1}(k)], and the case of r(r1(k),1]r\in(r_{1}(k),1]. We first study the latter case, which is easier to prove.

Lemma 17.

For any k3k\geq 3 and r(r1(k),1]r\in(r_{1}(k),1], we have Qk,r>e1/kQ_{k,r}>e^{-1/k}.

Proof.

From Lemma 16, we have G(k,r,e1/k)>0G(k,r,e^{-1/k})>0 since r>r1(k)r>r_{1}(k). Note that G(k,r,1)=1exp(kr)1<0G(k,r,1)=1-\exp(-kr)-1<0. By the continuity of GG, there exists at least one Q(e1/k,1)Q^{\prime}\in(e^{-1/k},1) such that G(k,r,Q)=0G(k,r,Q^{\prime})=0, which can be written as Q=1exp(kr(Q)k1)Q^{\prime}=1-\exp(-kr(Q^{\prime})^{k-1}). By the definition of Qk,rQ_{k,r}, we then have Qk,rQ>e1/kQ_{k,r}\geq Q^{\prime}>e^{-1/k}. ∎

Next, we study the case of r[rcore(k),r1(k)]r\in[r_{core}(k),r_{1}(k)]. To prove that Qk,r>e1/kQ_{k,r}>e^{-1/k}, we need the following two facts from basic calculus. With these two facts, we can prove that Qk,r>e1/kQ_{k,r}>e^{-1/k} for r[rcore(k),r1(k)]r\in[r_{core}(k),r_{1}(k)] in Lemma 18. Note that the condition k4k\geq 4 is required in the calculation in the proof of Lemma 18. This is the reason why Lemma 12 requires k4k\geq 4.

Fact 1.

Let f:[a,b]f:[a,b]\rightarrow\mathbb{R} be an analytic function. If f′′(x)>0f^{\prime\prime}(x)>0 for any x(a,b)x\in(a,b), then maxx[a,b]f(x)=max{f(a),f(b)}\max_{x\in[a,b]}f(x)=\max\{f(a),f(b)\}.

Fact 2.

Let f:[a,b]f:[a,b]\rightarrow\mathbb{R} be an analytic function. If there exists c(a,b)c\in(a,b) such that

f′′(x){>0 for x(a,c)=0 for x=c<0 for x(c,b)\displaystyle f^{\prime\prime}(x)\begin{cases}>0&\text{\quad for \;}x\in(a,c)\\ =0&\text{\quad for \;}x=c\\ <0&\text{\quad for \;}x\in(c,b)\end{cases}

and f(b)0f^{\prime}(b)\geq 0, then maxx[a,b]f(x)=max{f(a),f(b)}\max_{x\in[a,b]}f(x)=\max\{f(a),f(b)\}.

Lemma 18.

For any k4k\geq 4 and r[rc,r1]r\in[r_{c},r_{1}], where rc=rcore(k)r_{c}=r_{core}(k) and r1=r1(k)r_{1}=r_{1}(k), we have Qk,r>e1/kQ_{k,r}>e^{-1/k}.

Proof.

Given arbitrary k3k\geq 3 and Q>0Q>0, from (8), we know that G(k,r,Q)G(k,r,Q) is strictly increasing in rr, and thus we have G(k,r,Q)G(k,r1,Q)G(k,r,Q)\leq G(k,r_{1},Q) for any r[rc,r1]r\in[r_{c},r_{1}].

From (10), when we view GG as a function of QQ with fixed k4k\geq 4 and r>0r>0, there are two points of inflection:

Q=0 and Q=(k2kr(k1))1/(k1)\displaystyle Q=0\text{\quad and \quad}Q=\left(\frac{k-2}{kr(k-1)}\right)^{1/(k-1)}

Now, we consider the following two cases separately:

  1.

    (k2kr(k1))1/(k1)<e1/k\left(\frac{k-2}{kr(k-1)}\right)^{1/(k-1)}<e^{-1/k}

  2.

    e1/k(k2kr(k1))1/(k1)e^{-1/k}\leq\left(\frac{k-2}{kr(k-1)}\right)^{1/(k-1)}

Case 1. Assume (k2kr(k1))1/(k1)<e1/k(\frac{k-2}{kr(k-1)})^{1/(k-1)}<e^{-1/k}. Then we have

2Q2G(k,r1,Q){>0 for 0<Q<(k2kr(k1))1/(k1)=0 for Q=(k2kr(k1))1/(k1)<0 for (k2kr(k1))1/(k1)<Q<e1/k\displaystyle\frac{\partial^{2}}{\partial Q^{2}}G(k,r_{1},Q)\begin{cases}>0&\text{\quad for \;}0<Q<\left(\frac{k-2}{kr(k-1)}\right)^{1/(k-1)}\\ =0&\text{\quad for \;}Q=\left(\frac{k-2}{kr(k-1)}\right)^{1/(k-1)}\\ <0&\text{\quad for \;}\left(\frac{k-2}{kr(k-1)}\right)^{1/(k-1)}<Q<e^{-1/k}\end{cases}

Moreover, since G(k,r1,e1/k)=0G(k,r_{1},e^{-1/k})=0, we have e1/k=1exp(kr1e(k1)/k)e^{-1/k}=1-\exp(-kr_{1}e^{-(k-1)/k}). Therefore, we have

QG(k,r1,Q)|Q=e1/k\displaystyle\left.\frac{\partial}{\partial Q}G(k,r_{1},Q)\right|_{Q=e^{-1/k}} =exp(kr1e(k1)/k)kr1(k1)e(k2)/k1\displaystyle=\exp(-kr_{1}e^{-(k-1)/k})\cdot kr_{1}(k-1)e^{-(k-2)/k}-1
=exp(kr1e(k1)/k)kr1e(k1)/k(k1)e1/k1\displaystyle=\exp(-kr_{1}e^{-(k-1)/k})\cdot kr_{1}e^{-(k-1)/k}\cdot(k-1)e^{1/k}-1
=(1e1/k)[ln(1e1/k)](k1)e1/k1\displaystyle=(1-e^{-1/k})\cdot[-\ln(1-e^{-1/k})]\cdot(k-1)e^{1/k}-1
>0\displaystyle>0

for k4k\geq 4. By Fact 2, we have G(k,r1,Q)<max{G(k,r1,0),G(k,r1,e1/k)}G(k,r_{1},Q)<\max\{G(k,r_{1},0),G(k,r_{1},e^{-1/k})\} for any Q(0,e1/k)Q\in(0,e^{-1/k}). Note that both G(k,r1,0)G(k,r_{1},0) and G(k,r1,e1/k)G(k,r_{1},e^{-1/k}) are at most 0. So, G(k,r,Q)G(k,r1,Q)<0G(k,r,Q)\leq G(k,r_{1},Q)<0 for any k4k\geq 4, r[rc,r1]r\in[r_{c},r_{1}] and Q(0,e1/k)Q\in(0,e^{-1/k}). Therefore, Qk,rQ_{k,r} cannot be in (0,e1/k)(0,e^{-1/k}) by its definition, and hence Qk,r>e1/kQ_{k,r}>e^{-1/k}.


Case 2. Assume e1/k(k2kr(k1))1/(k1)e^{-1/k}\leq(\frac{k-2}{kr(k-1)})^{1/(k-1)}. From (10), we have

2Q2G(k,r1,Q)>0\displaystyle\frac{\partial^{2}}{\partial Q^{2}}G(k,r_{1},Q)>0

for Q[0,e1/k]Q\in[0,e^{-1/k}]. By Fact 1, we know that for Q[0,e1/k]Q\in[0,e^{-1/k}]

G(k,r1,Q)max{G(k,r1,0),G(k,r1,e1/k)}.\displaystyle G(k,r_{1},Q)\leq\max\{G(k,r_{1},0),G(k,r_{1},e^{-1/k})\}.

Since both G(k,r1,0)G(k,r_{1},0) and G(k,r1,e1/k)G(k,r_{1},e^{-1/k}) are at most 0, we have G(k,r,Q)G(k,r1,Q)<0G(k,r,Q)\leq G(k,r_{1},Q)<0 for any k4k\geq 4, r[rc,r1]r\in[r_{c},r_{1}] and Q(0,e1/k)Q\in(0,e^{-1/k}). Therefore, Qk,rQ_{k,r} cannot be in (0,e1/k)(0,e^{-1/k}) by its definition, and hence Qk,r>e1/kQ_{k,r}>e^{-1/k}. ∎

Now, we can complete the proof for Lemma 12.

Proof of Lemma 12.

Let k4k\geq 4 and r(rcore(k),rsat(k))r\in(r_{core}(k),r_{sat}(k)). From the fixed point equation Q=1exp(krQk1)Q=1-\exp(-krQ^{k-1}), we have krQk,rk1=ln(1Qk,r)krQ_{k,r}^{k-1}=-\ln(1-Q_{k,r}). Note that the real-valued function (1x)(1x)ln(1x)(1-x)-(1-x)\ln(1-x) is strictly decreasing on [0,1)[0,1), and Qk,r[0,1)Q_{k,r}\in[0,1). By Lemma 17 and Lemma 18, we have Qk,r>e1/kQ_{k,r}>e^{-1/k}. Combining all these, the result follows:

μ(k,r)\displaystyle\mu(k,r) =exp(krQk,rk1)+krQk,rk1exp(krQk,rk1)\displaystyle=\exp(-krQ_{k,r}^{k-1})+krQ_{k,r}^{k-1}\exp(-krQ_{k,r}^{k-1})
=(1Qk,r)(1Qk,r)ln(1Qk,r)\displaystyle=(1-Q_{k,r})-(1-Q_{k,r})\ln(1-Q_{k,r})
<(1e1/k)(1e1/k)ln(1e1/k)\displaystyle<(1-e^{-1/k})-(1-e^{-1/k})\ln(1-e^{-1/k})
=μu(k).\displaystyle=\mu_{u}(k).

Appendix C Terminologies

In this section, we cover the definitions of some terminology used in Appendices D and E.

C.1 Degree profile

The distribution of the degrees of nodes in a factor graph can be described by its degree profile, which plays an important role in our analysis. Given a factor graph GG, let nin_{i} be the number of variable nodes of degree ii for all ii. Similarly, let mim_{i} be the number of equation nodes of degree ii for all ii. Furthermore, let n^\widehat{n} be the total number of variable nodes and m^\widehat{m} be the total number of equation nodes. It is obvious that n^=n\widehat{n}=n and m^=m\widehat{m}=m for the factor graph of a kk-XORSAT instance, but keep in mind that the total number n^\widehat{n} of variable nodes and the total number m^\widehat{m} of equation nodes decrease during the process of the algorithm. The degree distribution of variable nodes is given by the sequence Λ={Λi}i0\Lambda=\{\Lambda_{i}\}_{i\geq 0}, where Λi=ni/n^\Lambda_{i}=n_{i}/\widehat{n}, and the degree distribution of equation nodes is given by the sequence P={Pi}i0P=\{P_{i}\}_{i\geq 0}, where Pi=mi/m^P_{i}=m_{i}/\widehat{m}. Then, the degree profile of the factor graph GG is given by (Λ,P)(\Lambda,P). Sometimes, the degree distributions Λ\Lambda and PP can also be represented by the polynomials Λ(x)=i0Λixi\Lambda(x)=\sum_{i\geq 0}\Lambda_{i}x^{i} and P(x)=i0PixiP(x)=\sum_{i\geq 0}P_{i}x^{i}, respectively. With this representation, we can write i1iΛi=Λ(1)\sum_{i\geq 1}i\Lambda_{i}=\Lambda^{\prime}(1) and i1iPi=P(1)\sum_{i\geq 1}iP_{i}=P^{\prime}(1).

Given the factor graph GG of a random kk-XORSAT instance 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn), GG is uniformly distributed over the ensemble of kk-uniform factor graphs 𝔾n(k,rn)\mathbb{G}_{n}(k,rn). It is clear that the degree distribution of equation nodes is given by P(x)=xkP(x)=x^{k}. The degrees of the variable nodes are also known to converge in distribution to independent Poisson random variables with mean krkr. Precisely speaking, w.h.p. for any 0irn0\leq i\leq rn, we have

Λi=ekr(kr)ii!+o(1).\displaystyle\Lambda_{i}=e^{-kr}\frac{(kr)^{i}}{i!}+o(1).

We can also describe the degree profile from edge perspective. The edge perspective degree distribution of variable nodes is given by λ={λi}i1\lambda=\{\lambda_{i}\}_{i\geq 1}, where λi=iΛi/j1jΛj\lambda_{i}=i\Lambda_{i}/\sum_{j\geq 1}j\Lambda_{j}, and the edge perspective degree distribution of equation nodes is given by ρ={ρi}i1\rho=\{\rho_{i}\}_{i\geq 1}, where ρi=iPi/j1jPj\rho_{i}=iP_{i}/\sum_{j\geq 1}jP_{j}. Then, the edge perspective degree profile of the factor graph GG is given by (λ,ρ)(\lambda,\rho). Similar to Λ\Lambda and PP, the edge perspective degree distribution λ\lambda and ρ\rho can be written as the polynomials λ(x)=i1λixi1\lambda(x)=\sum_{i\geq 1}\lambda_{i}x^{i-1} and ρ(x)=i1ρixi1\rho(x)=\sum_{i\geq 1}\rho_{i}x^{i-1}, respectively.
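As an illustration (our own helper, not from the paper), the degree profile and its edge-perspective version can be computed directly from the two degree lists:

from collections import Counter

def degree_profiles(var_degs, eq_degs):
    # var_degs, eq_degs: the degree of every variable node / equation node
    n_hat, m_hat = len(var_degs), len(eq_degs)
    Lam = {i: c / n_hat for i, c in Counter(var_degs).items()}  # Lambda_i = n_i / n_hat
    P = {j: c / m_hat for j, c in Counter(eq_degs).items()}     # P_j = m_j / m_hat
    L1 = sum(i * v for i, v in Lam.items())                     # Lambda'(1)
    P1 = sum(j * p for j, p in P.items())                       # P'(1)
    lam = {i: i * v / L1 for i, v in Lam.items() if i >= 1}     # lambda_i = i Lambda_i / Lambda'(1)
    rho = {j: j * p / P1 for j, p in P.items() if j >= 1}       # rho_j = j P_j / P'(1)
    return Lam, P, lam, rho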

The ensemble of factor graphs with a prescribed degree profile is called the ensemble of degree constrained factor graphs 𝔻n(Λ,P)\mathbb{D}_{n}(\Lambda,P), which is the set of all factor graphs of nn variable nodes with degree profile (Λ,P)(\Lambda,P), equipped with the uniform distribution. Note that the number mm of function nodes is restricted to satisfy the equation Λ(1)n=P(1)m\Lambda^{\prime}(1)n=P^{\prime}(1)m.

C.2 Local tree-like structure

In a factor graph GG, we define the length of a path (v1,v2,,vl)(v_{1},v_{2},...,v_{l}) to be the number of edges in the path. Then, the distance between two nodes is defined to be the length of the shortest path between them. By convention, we set the distance between two nodes to be ++\infty if there is no path connecting them. With this notion of distance, we can define the local neighborhood of a node. Given a factor graph GG, the local neighborhood (or simply neighborhood) BG(x,R)B_{G}(x,R) of a node xx of radius R0R\geq 0 is defined to be the subgraph of GG induced by all nodes of distance at most RR from xx and all edges between those nodes. The local neighborhood BG(x,R)B_{G}(x,R) also represents an XORSAT instance with the variables and clauses inside the neighborhood.

The local neighborhood of a variable in a random kk-uniform factor graph looks like a tree. In particular, it looks similar to a random RR-generation tree. For any non-negative even number R0R\geq 0, the RR-generation tree ensemble 𝕋R(Λ,P)\mathbb{T}_{R}(\Lambda,P) of a given degree profile (Λ,P)(\Lambda,P) is defined as follows. When R=0R=0, the ensemble contains only one element, a single isolated node, and we call it the variable node of generation 0. Assume R>0R>0. We first generate a tree TT from the (R2)(R-2)-generation tree ensemble 𝕋R2(Λ,P)\mathbb{T}_{R-2}(\Lambda,P). For each variable node xx of generation R2R-2, we draw an independent integer i1i\geq 1 distributed according to λi\lambda_{i} and add i1i-1 function nodes connected to xx as its children (for the root, i.e. when R=2R=2, we instead draw ii according to Λi\Lambda_{i} and add ii function nodes). Then, for each of these function nodes aa, we draw an independent integer j1j\geq 1 distributed according to ρj\rho_{j}, and add j1j-1 variable nodes, which are connected to aa as its children and called the variable nodes of generation RR.
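For the Poisson case relevant here, sampling from 𝕋R(e^{kr(x−1)}, x^k) is particularly simple, because the size-biased Poisson distribution shifted down by one is again Poisson(kr): every variable node, root included, spawns Poisson(kr) equation-node children, and every equation node spawns k−1 variable-node children. A minimal sampler (ours):

import math, random

def sample_poisson(lam):
    # Knuth's method; adequate for small lam = kr
    L, p, n = math.exp(-lam), 1.0, 0
    while True:
        p *= random.random()
        if p < L:
            return n
        n += 1

def sample_tree(k, r, R):
    # returns a nested structure: a variable node with its equation-node children
    if R == 0:
        return {"eqs": []}
    return {"eqs": [{"vars": [sample_tree(k, r, R - 2) for _ in range(k - 1)]}
                    for _ in range(sample_poisson(k * r))]}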

In particular, Mézard and Montanari [MM09] show that the local structure of a variable node of a random factor graph from 𝔻n(Λ,P)\mathbb{D}_{n}(\Lambda,P) and 𝔾n(k,m)\mathbb{G}_{n}(k,m) converges to this tree ensemble. More details on the following theorem can be found in [MM09].

Theorem 5.

Let (Λ,P)(\Lambda,P) be a fixed degree profile, GG be a random factor graph in the 𝔻n(Λ,P)\mathbb{D}_{n}(\Lambda,P) ensemble (and 𝔾n(k,m)\mathbb{G}_{n}(k,m) respectively), xx be a variable node chosen uniformly at random from GG, and RR be a non-negative even number. Then, the local neighborhood BG(x,R)B_{G}(x,R) of the factor graph GG converges in distribution to 𝕋R(Λ,P)\mathbb{T}_{R}(\Lambda,P) (and 𝕋R(ekr(x1),xk)\mathbb{T}_{R}(e^{kr(x-1)},x^{k}) respectively) as nn\rightarrow\infty.

Appendix D Proof of Lemma 7: Finding the number of free steps in DECUC

In this section, we show the calculation for finding the number of free steps performed by the UC-decimation algorithm DECUC on the random kk-XORSAT instance 𝚽𝚽k(n,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn), which implies that DECUC is w1(k,r)w_{1}(k,r)-free. To be specific, we give the proof of Lemma 7, using Wormald’s method of differential equations [Wor95].

We start by approximating the degree profile of the factor graph of the remaining instance after tt iterations. The degree profile helps us to calculate the probability of the next iteration being a free step. Note that the notations for describing the degree profile are defined in Appendix C.1. For each of those notations, we append ”(t)(t)” to specify the value right after tt iterations.

Lemma 19.

For any local rule τ\tau, after tt iterations of DECτ\tau, w.h.p. the number of variables of degree ii is given by

ni(t)\displaystyle n_{i}(t) =(rni)(kn)i(1kn)rni(nt)+o(n), for i=0,1,,m,\displaystyle=\binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}(n-t)+o(n),\text{\quad for $i=0,1,...,m$}, (11)

and the number of equations of degree ii is given by

mi(t)\displaystyle m_{i}(t) =(ki)(tn)ki(1tn)irn+o(n), for i=1,2,,k.\displaystyle=\binom{k}{i}\left(\frac{t}{n}\right)^{k-i}\left(1-\frac{t}{n}\right)^{i}rn+o(n),\text{\quad for $i=1,2,...,k$}. (12)
Proof.

We are going to track the changes in the numbers of variable nodes and equation nodes of different degrees throughout the process of the algorithm, by using the method of differential equations from [Wor95].

To apply the method of differential equations, we need to compute the expected changes in the numbers of variable nodes and equation nodes of different degrees. Note that exactly one variable node is removed in each iteration, so the total number n^(t)\widehat{n}(t) of variable nodes is ntn-t after tt iterations.

Suppose we are at time t\geq 0 (i.e. after t iterations). The selected variable node x_{s(t)} is going to be removed from the factor graph G_{\mathbf{\Phi_{t}}}, and it is of degree i with probability n_{i}(t)/\widehat{n}(t). So, for i=0,1,2,...,m, the expected change in the number n_{i} of variable nodes of degree i is -1\times n_{i}(t)/\widehat{n}(t). This yields

𝔼[ni(t+1)ni(t)]\displaystyle\mathop{{}\mathbb{E}}\left[{n_{i}(t+1)-n_{i}(t)}\right] =ni(t)n^(t)=ni(t)nt\displaystyle=-\frac{n_{i}(t)}{\widehat{n}(t)}=-\frac{n_{i}(t)}{n-t} (13)

for i=0,1,...,m. By Theorem 5, the local neighborhood B_{\Phi_{t}}(x_{s(t)},R) of the selected variable x_{s(t)} of radius R in G_{\mathbf{\Phi_{t}}} converges in distribution to \mathbb{T}_{R}(\Lambda^{\prime},P^{\prime}), where \Lambda^{\prime}_{i}=n_{i}(t)/\widehat{n}(t) for all i and P^{\prime}_{j}=m_{j}(t)/\widehat{m}(t) for all j. Therefore, with probability n_{j}(t)/\widehat{n}(t) for j=0,1,2,...,m, there are j equation nodes directly adjacent to the selected variable node. Each of those equation nodes has degree l with probability \rho_{l}(t), and its degree decreases by 1 due to the removal of the selected variable node. In other words, the expected changes in the numbers of equation nodes of different degrees are given by

𝔼[mk(t+1)mk(t)]\displaystyle\mathop{{}\mathbb{E}}\left[{m_{k}(t+1)-m_{k}(t)}\right] =j=1mjnj(t)n^(t)(ρk(t)(1))\displaystyle=\sum_{j=1}^{m}j\frac{n_{j}(t)}{\widehat{n}(t)}\cdot(\rho_{k}(t)\cdot(-1))
=(j=1mjnj(t)nt)kmk(t)j=1kjmj(t)\displaystyle=\left(\sum_{j=1}^{m}j\frac{n_{j}(t)}{n-t}\right)\frac{-km_{k}(t)}{\sum_{j=1}^{k}jm_{j}(t)}
=(j=1mjnj(t)nt)kmk(t)j=1mjnj(t)\displaystyle=\left(\sum_{j=1}^{m}j\frac{n_{j}(t)}{n-t}\right)\frac{-km_{k}(t)}{\sum_{j=1}^{m}jn_{j}(t)}
=kmk(t)nt\displaystyle=\frac{-km_{k}(t)}{n-t} (14)

and

\displaystyle\mathop{{}\mathbb{E}}\left[{m_{i}(t+1)-m_{i}(t)}\right] =\sum_{j=1}^{m}j\frac{n_{j}(t)}{\widehat{n}(t)}\cdot[\rho_{i+1}(t)\cdot(+1)+\rho_{i}(t)\cdot(-1)]
=(j=1mjnj(t)nt)(i+1)mi+1(t)imi(t)j=1kjmj(t)\displaystyle=\left(\sum_{j=1}^{m}j\frac{n_{j}(t)}{n-t}\right)\frac{(i+1)m_{i+1}(t)-im_{i}(t)}{\sum_{j=1}^{k}jm_{j}(t)}
=(j=1mjnj(t)nt)(i+1)mi+1(t)imi(t)j=1mjnj(t)\displaystyle=\left(\sum_{j=1}^{m}j\frac{n_{j}(t)}{n-t}\right)\frac{(i+1)m_{i+1}(t)-im_{i}(t)}{\sum_{j=1}^{m}jn_{j}(t)}
=(i+1)mi+1(t)imi(t)nt for i=1,2,,k1.\displaystyle=\frac{(i+1)m_{i+1}(t)-im_{i}(t)}{n-t}\text{\quad for\;}i=1,2,...,k-1. (15)

In the above calculation, we use the fact that j=1kjmj(t)=j=1mjnj(t)\sum_{j=1}^{k}jm_{j}(t)=\sum_{j=1}^{m}jn_{j}(t), which holds because both j=1kjmj(t)\sum_{j=1}^{k}jm_{j}(t) and j=1mjnj(t)\sum_{j=1}^{m}jn_{j}(t) are equal to the number of edges at time tt.

Let x=t/nx=t/n be the normalized time. Furthermore, let

yi\displaystyle y_{i} =yi(x)=mi(xn)/n for i=1,2,3,,k,\displaystyle=y_{i}(x)=m_{i}(xn)/n\text{\quad for\;}i=1,2,3,...,k,
zi\displaystyle z_{i} =zi(x)=ni(xn)/n for i=0,1,2,,m, and\displaystyle=z_{i}(x)=n_{i}(xn)/n\text{\quad for\;}i=0,1,2,...,m,\text{\quad and}
z\displaystyle z =z(x)=n^(xn)/n.\displaystyle=z(x)=\widehat{n}(xn)/n.

Then, the equations (13), (14) and (15) suggest the following differential equations:

dzidx\displaystyle\frac{dz_{i}}{dx} =zi1x\displaystyle=-\frac{z_{i}}{1-x} for i=0,1,2,,m\displaystyle\text{for\;\;}i=0,1,2,...,m (16)
dykdx\displaystyle\frac{dy_{k}}{dx} =kyk1x\displaystyle=\frac{-ky_{k}}{1-x} (17)
dyidx\displaystyle\frac{dy_{i}}{dx} =(i+1)yi+1iyi1x\displaystyle=\frac{(i+1)y_{i+1}-iy_{i}}{1-x} for i=1,2,3,,k1\displaystyle\text{for\;\;}i=1,2,3,...,k-1 (18)

At time t=0 (i.e. before running the algorithm), the number n_{i}(0) of variable nodes of degree i is \binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}n+o(n) for all i. Since all rn equation nodes are of degree k at time t=0, we have m_{k}(0)=rn and m_{i}(0)=0 for all i<k. These suggest the following initial conditions for the differential equations above:

zi(0)\displaystyle z_{i}(0) =(rni)(kn)i(1kn)rni\displaystyle=\binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i} for i=0,1,2,,m\displaystyle\text{for\;\;}i=0,1,2,...,m (19)
yk(0)\displaystyle y_{k}(0) =r\displaystyle=r (20)
yi(0)\displaystyle y_{i}(0) =0\displaystyle=0 for i=1,2,3,,k1\displaystyle\text{for\;\;}i=1,2,3,...,k-1 (21)

The solution of the differential equations with the initial conditions above is given by

zi(x)\displaystyle z_{i}(x) =(rni)(kn)i(1kn)rni(1x)\displaystyle=\binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}(1-x) for i=0,1,2,,m\displaystyle\text{for\;\;}i=0,1,2,...,m
\displaystyle y_{i}(x) =r\binom{k}{i}x^{k-i}(1-x)^{i} \displaystyle\text{for\;\;}i=1,2,3,...,k

for any 0\leq x\leq 1. (The steps of solving the differential equations are shown in Appendix D.1.) By Wormald’s method of differential equations, at time t\geq 0, w.h.p. the number n_{i}(t) of variable nodes of degree i is equal to z_{i}(t/n)n+o(n) for all i, and the number m_{i}(t) of equation nodes of degree i is equal to y_{i}(t/n)n+o(n). ∎
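As a sanity check on this lemma (our own illustration, not part of the proof; the parameter values are arbitrary), one can integrate the system (17)-(18) with a forward Euler step and compare against the claimed closed form y_{i}(x)=r\binom{k}{i}x^{k-i}(1-x)^{i}.

from math import comb

k, r = 3, 0.7
h = 1e-5
y = [0.0] * (k + 1)           # y[1..k]; y[0] is unused
y[k] = r                      # initial conditions (20)-(21)
x = 0.0
while x < 0.5:                # stop well before x = 1, where 1/(1-x) blows up
    dy = [0.0] * (k + 1)
    dy[k] = -k * y[k] / (1 - x)                           # equation (17)
    for i in range(1, k):                                 # equation (18)
        dy[i] = ((i + 1) * y[i + 1] - i * y[i]) / (1 - x)
    for i in range(1, k + 1):
        y[i] += h * dy[i]
    x += h

for i in range(1, k + 1):
    closed = r * comb(k, i) * x ** (k - i) * (1 - x) ** i
    print(i, round(y[i], 6), round(closed, 6))  # columns agree to a few decimals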

In an iteration of the UC-decimation algorithm DECUC, the local rule UC gives 1/21/2 when there is no unit clause containing the selected variable xs(t)x_{s(t)}. Therefore, an iteration is a free step if and only if there is no equation node of degree 1 adjacent to the selected variable node xs(t)x_{s(t)}. With the degree profiles from Lemma 19, we can calculate the probability of an iteration being a free step, and thus approximate the total number of free steps.

Proof of Lemma 7.

We track the number of free steps after t iterations, by using Wormald’s method of differential equations. Let q(t) be the number of free steps after t iterations. Note that q(0)=0. At time t\geq 0, if there exists at least one equation node of degree 1 adjacent to the selected variable node x_{s(t)}, then the (t+1)-st iteration is not a free step and q(t+1)=q(t). Otherwise, if all equation nodes adjacent to the selected variable node x_{s(t)} are of degree not equal to 1, then the (t+1)-st iteration is a free step and q(t+1)=q(t)+1. The probability that all equation nodes adjacent to the selected variable node are of degree not equal to 1 is given by \sum_{j=0}^{m}(n_{j}(t)/\widehat{n}(t))(1-\rho_{1}(t))^{j}. Therefore, we have

𝔼[q(t+1)q(t)]\displaystyle\mathop{{}\mathbb{E}}\left[{q(t+1)-q(t)}\right] =i=0mni(t)n^(t)(1ρ1(t))i(+1)\displaystyle=\sum_{i=0}^{m}\frac{n_{i}(t)}{\widehat{n}(t)}(1-\rho_{1}(t))^{i}\cdot(+1)
=i=0rnni(t)nt(1m1(t)j=1kjmj(t))i.\displaystyle=\sum_{i=0}^{rn}\frac{n_{i}(t)}{n-t}\left(1-\frac{m_{1}(t)}{\sum_{j=1}^{k}jm_{j}(t)}\right)^{i}. (22)

From (12) and the polynomial identity i=1ni(ni)(1x)ixni=n(1x)\sum_{i=1}^{n}i\binom{n}{i}(1-x)^{i}x^{n-i}=n(1-x), we can simplify j=1kjmj(t)\sum_{j=1}^{k}jm_{j}(t) by

\displaystyle\sum_{j=1}^{k}jm_{j}(t) =\sum_{j=1}^{k}j\binom{k}{j}\left(\frac{t}{n}\right)^{k-j}\left(1-\frac{t}{n}\right)^{j}rn+o(n)
=(k(1tn))rn+o(n)\displaystyle=\left(k\cdot\left(1-\frac{t}{n}\right)\right)rn+o(n)
=kr(nt)+o(n).\displaystyle=kr(n-t)+o(n).

Then, by (11), (12) and the fact that limn(1kxk1n)rn=ekrxk1\lim_{n\rightarrow\infty}(1-\frac{kx^{k-1}}{n})^{rn}=e^{-krx^{k-1}}, we have

𝔼[q(t+1)q(t)]\displaystyle\mathop{{}\mathbb{E}}\left[{q(t+1)-q(t)}\right] =i=0mni(t)nt(1(k1)(tn)k1(1tn)rnkr(nt))i+o(1)\displaystyle=\sum_{i=0}^{m}\frac{n_{i}(t)}{n-t}\left(1-\frac{\binom{k}{1}\left(\frac{t}{n}\right)^{k-1}\left(1-\frac{t}{n}\right)rn}{kr(n-t)}\right)^{i}+o(1)
=i=0mni(t)nt(1(tn)k1)i+o(1)\displaystyle=\sum_{i=0}^{m}\frac{n_{i}(t)}{n-t}\left(1-\left(\frac{t}{n}\right)^{k-1}\right)^{i}+o(1)
\displaystyle=\sum_{i=0}^{m}\frac{n_{i}(0)}{n}\left(1-\left(\frac{t}{n}\right)^{k-1}\right)^{i}+o(1)
=i=0m(rni)(kn)i(1kn)rni(1(tn)k1)i+o(1)\displaystyle=\sum_{i=0}^{m}\binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}\left(1-\left(\frac{t}{n}\right)^{k-1}\right)^{i}+o(1)
=i=0m(rni)[knkn(tn)k1]i(1kn)rni+o(1)\displaystyle=\sum_{i=0}^{m}\binom{rn}{i}\left[\frac{k}{n}-\frac{k}{n}\left(\frac{t}{n}\right)^{k-1}\right]^{i}\left(1-\frac{k}{n}\right)^{rn-i}+o(1)
=[knkn(tn)k1+1kn]rn+o(1)\displaystyle=\left[\frac{k}{n}-\frac{k}{n}\left(\frac{t}{n}\right)^{k-1}+1-\frac{k}{n}\right]^{rn}+o(1)
=[1kn(tn)k1]rn+o(1)\displaystyle=\left[1-\frac{k}{n}\left(\frac{t}{n}\right)^{k-1}\right]^{rn}+o(1)
=ekr(t/n)k1+o(1)\displaystyle=e^{-kr(t/n)^{k-1}}+o(1) (23)

By letting x=t/nx=t/n and w=w(x)=q(xn)/nw=w(x)=q(xn)/n, the equation (23) and the fact that q(0)=0q(0)=0 suggest the differential equation

dwdx\displaystyle\frac{dw}{dx} =ekrxk1\displaystyle=e^{-krx^{k-1}} (24)

and the initial condition

w(0)=0.\displaystyle w(0)=0. (25)

By Wormald’s method of differential equations, w.h.p. the number q(t) of free steps after t iterations is equal to w(t/n)n+o(n).

By the second fundamental theorem of calculus, we can write w as a definite integral

w(x)w(0)=0xekrtk1𝑑t\displaystyle w(x)-w(0)=\int_{0}^{x}e^{-krt^{k-1}}dt

Note that ddt(krtk1)=kr(k1)tk2\frac{d}{dt}(krt^{k-1})=kr(k-1)t^{k-2}. Using integration by substitution, we have

w(x)\displaystyle w(x) =0+0xekrtk1𝑑t\displaystyle=0+\int_{0}^{x}e^{-krt^{k-1}}dt
\displaystyle=\int_{0}^{x}e^{-krt^{k-1}}\frac{1}{kr(k-1)}t^{2-k}\cdot kr(k-1)t^{k-2}dt
=0krxk1ekrtk11kr(k1)t2kd(krtk1)\displaystyle=\int_{0}^{krx^{k-1}}e^{-krt^{k-1}}\frac{1}{kr(k-1)}t^{2-k}d(krt^{k-1})
=(kr)11kk10krxk1ekrtk1(kr)2kk1t2kd(krtk1)\displaystyle=\frac{(kr)^{\frac{1}{1-k}}}{k-1}\int_{0}^{krx^{k-1}}e^{-krt^{k-1}}(kr)^{\frac{2-k}{k-1}}t^{2-k}d(krt^{k-1})
=(kr)11kk10krxk1(krtk1)1k11ekrtk1d(krtk1)\displaystyle=\frac{(kr)^{\frac{1}{1-k}}}{k-1}\int_{0}^{krx^{k-1}}(krt^{k-1})^{\frac{1}{k-1}-1}e^{-krt^{k-1}}d(krt^{k-1})
=(kr)11kk1γ(1k1,krxk1)\displaystyle=\frac{(kr)^{\frac{1}{1-k}}}{k-1}\;\gamma\left(\frac{1}{k-1},krx^{k-1}\right)

where γ\gamma is the lower incomplete gamma function defined by

γ(a,x)0xta1et𝑑t.\displaystyle\gamma(a,x)\equiv\int_{0}^{x}t^{a-1}e^{-t}dt.

Hence, w.h.p. the total number of free steps run by the algorithm is w1(k,r)n+o(n)w_{1}(k,r)n+o(n), where

w1(k,r)=(kr)11kk1γ(1k1,kr)=w(1).\displaystyle w_{1}(k,r)=\frac{(kr)^{\frac{1}{1-k}}}{k-1}\;\gamma\left(\frac{1}{k-1},kr\right)=w(1).

Hence, DECUC is w1(k,r)w_{1}(k,r)-free. ∎
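As a numerical cross-check (ours; the values k=9 and r=0.8 are arbitrary illustrations, assuming SciPy is available), w_{1}(k,r) can be evaluated both from the closed form and by direct quadrature of \int_{0}^{1}e^{-krt^{k-1}}dt. Note that scipy.special.gammainc is the regularized function P(a,x)=\gamma(a,x)/\Gamma(a).

from math import exp
from scipy.integrate import quad
from scipy.special import gamma, gammainc

def w1(k, r):
    a = 1.0 / (k - 1)
    # lower incomplete gamma: gamma(a, x) = Gamma(a) * P(a, x)
    return (k * r) ** (1.0 / (1 - k)) / (k - 1) * gamma(a) * gammainc(a, k * r)

k, r = 9, 0.8
closed = w1(k, r)
direct, _ = quad(lambda t: exp(-k * r * t ** (k - 1)), 0, 1)
print(closed, direct)  # the two values agree up to quadrature error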

D.1 Solving differential equations (16), (17) and (18)

In this section, we show how to solve the differential equations (16), (17) and (18) with the initial conditions (19), (20) and (21). For i=0,1,,mi=0,1,...,m, the differential equations in (16) can be written as

1zidzi=11xdx.\displaystyle\frac{1}{z_{i}}dz_{i}=\frac{-1}{1-x}dx.

By separation of variables and the initial condition (19), we have

zi(x)\displaystyle z_{i}(x) =(1x)zi(0)\displaystyle=(1-x)z_{i}(0)
\displaystyle=\binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}(1-x)\text{\quad for all\;}i=0,1,...,m. (26)

Next, we are going to use induction from kk to 11 to prove that

yi(x)=r(ki)xki(1x)i\displaystyle y_{i}(x)=r\binom{k}{i}x^{k-i}(1-x)^{i}

First, the differential equation (17) can be written as

1ykdyk=k11xdx\displaystyle\frac{1}{y_{k}}dy_{k}=k\cdot\frac{-1}{1-x}dx

By separation of variables and the initial condition (20), we have

yk(x)=(1x)kyk(0)=r(1x)k\displaystyle y_{k}(x)=(1-x)^{k}y_{k}(0)=r(1-x)^{k}

Now we assume yi+1(x)=r(ki+1)xki1(1x)i+1y_{i+1}(x)=r\binom{k}{i+1}x^{k-i-1}(1-x)^{i+1} for some 1i<k1\leq i<k. From the differential equations in (18), we have

dyidx+i1xyi=i+11xyi+1\displaystyle\frac{dy_{i}}{dx}+\frac{i}{1-x}y_{i}=\frac{i+1}{1-x}y_{i+1}

By applying the standard method for first-order linear differential equations and the induction assumption, we have

yi(x)\displaystyle y_{i}(x) =μ(x)i+11xyi+1𝑑x+cμ(x)\displaystyle=\frac{\int\mu(x)\frac{i+1}{1-x}y_{i+1}dx+c}{\mu(x)}
=1(1x)ii+11xr(ki+1)xki1(1x)i+1𝑑x+c1(1x)i\displaystyle=\frac{\int\frac{1}{(1-x)^{i}}\frac{i+1}{1-x}r\binom{k}{i+1}x^{k-i-1}(1-x)^{i+1}dx+c}{\frac{1}{(1-x)^{i}}}
\displaystyle=\frac{i+1}{k-i}r\binom{k}{i+1}x^{k-i}(1-x)^{i}+c(1-x)^{i}
\displaystyle=r\binom{k}{i}x^{k-i}(1-x)^{i}+c(1-x)^{i}

where cc is a constant and μ(x)=exp(i1x𝑑x)=1(1x)i\mu(x)=\exp\left(\int\frac{i}{1-x}dx\right)=\frac{1}{(1-x)^{i}}. Since 1i<k1\leq i<k, yi(0)=0y_{i}(0)=0 and thus c=0c=0. Hence, we have

yi(x)=r(ki)xki(1x)i,\displaystyle y_{i}(x)=r\binom{k}{i}x^{k-i}(1-x)^{i}, (27)

which completes the induction.
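The induction above can also be verified symbolically for a concrete k (a sketch of ours, assuming SymPy; k=4 is an arbitrary choice): the claimed solution satisfies both (17) and (18).

import sympy as sp

x, r = sp.symbols("x r")
k = 4
y = {i: r * sp.binomial(k, i) * x ** (k - i) * (1 - x) ** i for i in range(1, k + 1)}

# Equation (17): dy_k/dx = -k * y_k / (1 - x)
assert sp.simplify(sp.diff(y[k], x) + k * y[k] / (1 - x)) == 0
# Equation (18): dy_i/dx = ((i+1) y_{i+1} - i y_i) / (1 - x)
for i in range(1, k):
    assert sp.simplify(sp.diff(y[i], x) - ((i + 1) * y[i + 1] - i * y[i]) / (1 - x)) == 0
print("closed form satisfies (17) and (18) for k =", k)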

Appendix E Proof of Lemma 8: Finding the number of free steps in DECτ

Now we assume the local rule \tau can give the marginal probabilities of variables for any factor graph that is a tree. We then show the calculation for the number of free steps run by the \tau-decimation algorithm DEC\tau on the random k-XORSAT instance \mathbf{\Phi}\sim\mathbf{\Phi}_{k}(n,rn). To be specific, we prove Lemma 8 by using Wormald’s method of differential equations [Wor99].

In this section, the approach is slightly different from the proof in Appendix D since we do not have the details on how the local rule τ\tau calculates its output value. However, since we assume that the local rule τ\tau can give the exact marginals, we can determine the value output by the local rule τ\tau, by studying the structure of the local neighborhood B𝚽𝐭(xs(t),R)B_{\mathbf{\Phi_{t}}}(x_{s(t)},R) at time t0t\geq 0.

Assume we are at time t\geq 0. We consider the local neighborhood B_{\mathbf{\Phi_{t}}}(x_{s(t)},R) of the selected variable x_{s(t)} of radius R\geq 0. By Theorem 5, the local neighborhood B_{\mathbf{\Phi_{t}}}(x_{s(t)},R) converges in distribution to the tree ensemble \mathbb{T}_{R}(\Lambda^{\prime},P^{\prime}), where (\Lambda^{\prime},P^{\prime}) is the degree profile of the factor graph \mathbf{\Phi_{t}} at time t. The values of \Lambda^{\prime}_{i} and P^{\prime}_{j} are given by (11) and (12), since Lemma 19 can be applied to \tau-decimation with any local rule \tau. By our assumption, the value output by the local rule \tau is equal to the marginal probability of the selected variable x_{s(t)} under a uniformly random solution for the XORSAT instance induced by the local neighborhood B_{\mathbf{\Phi_{t}}}(x_{s(t)},R). So, we can obtain the value output by the local rule by calculating the marginal probability of x_{s(t)}, using the tree ensemble \mathbb{T}_{R}(\Lambda^{\prime},P^{\prime}).

For any 0\leq l\leq R, we denote by T_{l}(x^{\prime}) a subtree of B_{\mathbf{\Phi_{t}}}(x_{s(t)},R) rooted at some node x^{\prime} whose distance from x_{s(t)} is R-l. According to the tree ensemble \mathbb{T}_{R}(\Lambda^{\prime},P^{\prime}), the trees T_{l}(x^{\prime}) are i.i.d. for all variable nodes x^{\prime} at the same distance R-l from x_{s(t)}. Abusing notation, we simply omit ”(x^{\prime})” and write T_{l}=T_{l}(x^{\prime}). For any factor graph T which is a tree rooted at a variable node x^{\prime}, we say the tree T has a free root if, in the XORSAT instance induced by T, the marginal distribution of the root variable x^{\prime} is the uniform distribution over \{0,1\}. Now, we are going to prove that for any 0\leq l\leq R-1 the tree T_{l} has a free root with probability at least S_{l}(t/n)+o(1), where the sequence \{S_{l}(x)\}_{l\geq 0} is given by

S0(x)=1, and Sl(x)=exp(kr[(1x)(1Sl1(x))+x]k1) for any l1\displaystyle S_{0}(x)=1\text{,\quad and \quad}S_{l}(x)=\exp\left(-kr\left[\left(1-x\right)(1-S_{l-1}(x))+x\right]^{k-1}\right)\text{\quad for any\;}l\geq 1 (28)

for any xx\in\mathbb{R}.

Lemma 20.

For 0lR10\leq l\leq R-1, the tree TlT_{l} has a free root with probability at least Sl(t/n)+o(1)S_{l}(t/n)+o(1).

Proof.

First, we consider the tree T_{0}. The root variable node x^{\prime} of T_{0} has distance R from the selected variable x_{s(t)}. Since the radius of B_{\mathbf{\Phi_{t}}}(x_{s(t)},R) is R, the variable x^{\prime} does not have any child node. Thus, T_{0} consists of only one variable node, namely x^{\prime}, and no equation node. We can assign either 0 or 1 to x^{\prime} without violating any equation. Hence, the tree T_{0} has a free root with probability S_{0}(t/n)=1.

For 1lR11\leq l\leq R-1, consider the subtree TlT_{l} of local neighborhood B𝚽𝐭(xs(t),R)B_{\mathbf{\Phi_{t}}}(x_{s(t)},R), rooted at the variable node xax_{a} of distance RlR-l from xs(t)x_{s(t)}. The variable node xax_{a} has i1i-1 child equation nodes with probability λi(t)\lambda_{i}(t) for 1im1\leq i\leq m. Each of those equation nodes eae_{a} has j1j-1 child variable nodes xbx_{b} with probability ρj(t)\rho_{j}(t) for 1jk1\leq j\leq k. We also know that each of these child variable nodes xbx_{b} is the root of an i.i.d. subtree Tl1T_{l-1}. In other words, each of those equation nodes eae_{a} is connected to the roots of j1j-1 i.i.d. subtrees Tl1T_{l-1} as its children.

For an equation node e_{a} mentioned above, if at least one of its child subtrees T_{l-1} has a free root, then there are at least two satisfying assignments for that subtree T_{l-1}: one assigns 0 to its root x_{b}, and another assigns 1 to its root x_{b}. Therefore, no matter what value we assign to x_{a}, we are able to choose a suitable assignment for the variables in that subtree T_{l-1} so that the equation of e_{a} and all equations in T_{l-1} are satisfied.

Note that, from (11) and (12), we know that at time tt

λi(t)\displaystyle\lambda_{i}(t) =i(rni)(kn)i(1kn)rni1kr+o(1) for 1im, and\displaystyle=i\binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}\frac{1}{kr}+o(1)\text{\quad for $1\leq i\leq m$, and}
ρj(t)\displaystyle\rho_{j}(t) =j(kj)(tn)kj(1tn)j1k(1tn)+o(1) for 1jk.\displaystyle=j\binom{k}{j}\left(\frac{t}{n}\right)^{k-j}\left(1-\frac{t}{n}\right)^{j}\frac{1}{k\left(1-\frac{t}{n}\right)}+o(1)\text{\quad for $1\leq j\leq k$.}

So, the probability that the equation node e_{a} has at least one child subtree T_{l-1} with a free root is given by

1-\sum_{j=1}^{k}\rho_{j}(t)(1-S_{l-1})^{j-1}
\displaystyle=1-\sum_{j=1}^{k}j\binom{k}{j}\left(\frac{t}{n}\right)^{k-j}\left(1-\frac{t}{n}\right)^{j}(1-S_{l-1})^{j}\cdot\frac{1}{k\left(1-\frac{t}{n}\right)(1-S_{l-1})}+o(1)
\displaystyle=1-\sum_{j=1}^{k}j\binom{k}{j}\left(\frac{t}{n}\right)^{k-j}\left[\left(1-\frac{t}{n}\right)(1-S_{l-1})\right]^{j}\cdot\frac{1}{k\left(1-\frac{t}{n}\right)(1-S_{l-1})}+o(1)
\displaystyle=1-k\left[\left(1-\frac{t}{n}\right)(1-S_{l-1})+\frac{t}{n}\right]^{k-1}\left(1-\frac{t}{n}\right)(1-S_{l-1})\cdot\frac{1}{k\left(1-\frac{t}{n}\right)(1-S_{l-1})}+o(1)
\displaystyle=1-\left[\left(1-\frac{t}{n}\right)(1-S_{l-1})+\frac{t}{n}\right]^{k-1}+o(1)
\displaystyle\equiv S_{l}^{*},

where Sl1=Sl1(t/n)S_{l-1}=S_{l-1}(t/n). Note that limn(1+α/n)n=eα\lim_{n\rightarrow\infty}(1+\alpha/n)^{n}=e^{\alpha} for all α\alpha\in\mathbb{R}. Then, the subtree TlT_{l} has a free root with probability

i=1rnλi(t)(Sl)i1\displaystyle\sum_{i=1}^{rn}\lambda_{i}(t)(S_{l}^{*})^{i-1} =i=1rni(rni)(kn)i(1kn)rni(Sl)i1krSl+o(1)\displaystyle=\sum_{i=1}^{rn}i\binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}(S_{l}^{*})^{i}\cdot\frac{1}{krS_{l}^{*}}+o(1)
=i=1rni(rni)(knSl)i(1kn)rni1krSl+o(1)\displaystyle=\sum_{i=1}^{rn}i\binom{rn}{i}\left(\frac{k}{n}S_{l}^{*}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}\cdot\frac{1}{krS_{l}^{*}}+o(1)
=rn[knSl+(1kn)]rn1(knSl)1krSl+o(1)\displaystyle=rn\left[\frac{k}{n}S_{l}^{*}+\left(1-\frac{k}{n}\right)\right]^{rn-1}\left(\frac{k}{n}S_{l}^{*}\right)\cdot\frac{1}{krS_{l}^{*}}+o(1)
=[knSl+(1kn)]rn1+o(1)\displaystyle=\left[\frac{k}{n}S_{l}^{*}+\left(1-\frac{k}{n}\right)\right]^{rn-1}+o(1)
=(1+k(Sl1)n)rn1+o(1)\displaystyle=\left(1+\frac{k(S_{l}^{*}-1)}{n}\right)^{rn-1}+o(1)
=exp(kr(Sl1))+o(1)\displaystyle=\exp(kr(S_{l}^{*}-1))+o(1)
=exp(kr[(1tn)(1Sl1)+tn]k1)+o(1)\displaystyle=\exp\left(-kr\left[\left(1-\frac{t}{n}\right)(1-S_{l-1})+\frac{t}{n}\right]^{k-1}\right)+o(1)
\displaystyle=S_{l}(t/n)+o(1). ∎

Now we can show that the local neighborhood B_{\mathbf{\Phi_{t}}}(x_{s(t)},R) of the selected variable x_{s(t)} of radius R\geq 0 has a free root with probability at least S_{R}(t/n)+o(1). The proof is similar to that of Lemma 20, except that \lambda_{i}(t) is replaced with \Lambda_{i}(t).

Lemma 21.

At time t0t\geq 0, the local neighborhood B𝚽𝐭(xs(t),R)B_{\mathbf{\Phi_{t}}}(x_{s(t)},R) of the selected variable xs(t)x_{s(t)} of radius R0R\geq 0 has a free root with probability at least SR(t/n)+o(1)S_{R}(t/n)+o(1).

Proof.

The root variable node x_{s(t)} has i child equation nodes with probability \Lambda_{i}(t) for 0\leq i\leq m. Each of those equation nodes e_{a} has j-1 child variable nodes x_{b} with probability \rho_{j}(t), and each of these child variable nodes x_{b} is the root of an i.i.d. subtree T_{R-1}. In other words, each of those equation nodes e_{a} is connected to the roots of j-1 i.i.d. subtrees T_{R-1} as its children.

For an equation node e_{a} mentioned above, if at least one of its child subtrees T_{R-1} has a free root, then we are able to obtain a satisfying assignment for all variable nodes in that subtree which assigns either 0 or 1 to its root x_{b}. Therefore, no matter what value we assign to x_{s(t)}, we are able to choose a suitable assignment for the variables in that subtree T_{R-1} so that the equation of e_{a} and all equations in T_{R-1} are satisfied.

Similar to the proof of Lemma 20, the probability that the equation node e_{a} has at least one child subtree T_{R-1} with a free root is given by

\displaystyle 1-\sum_{j=1}^{k}\rho_{j}(t)(1-S_{R-1})^{j-1} =1-\left[\left(1-\frac{t}{n}\right)(1-S_{R-1})+\frac{t}{n}\right]^{k-1}+o(1)\equiv S_{R}^{*},

where SR1=SR1(t/n)S_{R-1}=S_{R-1}(t/n). From (11), at time tt we have Λi(t)=(rni)(kn)i(1kn)rni+o(1)\Lambda_{i}(t)=\binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}+o(1) for 0im0\leq i\leq m. Note that limn(1+α/n)n=eα\lim_{n\rightarrow\infty}(1+\alpha/n)^{n}=e^{\alpha} for all α\alpha\in\mathbb{R}. Hence, the probability that the local neighborhood B𝚽𝐭(xs(t),R)B_{\mathbf{\Phi_{t}}}(x_{s(t)},R) has a free root is given by

\displaystyle\sum_{i=0}^{rn}\Lambda_{i}(t)(S_{R}^{*})^{i} =\sum_{i=0}^{rn}\binom{rn}{i}\left(\frac{k}{n}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}(S_{R}^{*})^{i}+o(1)
=i=0rn(rni)(knSR)i(1kn)rni+o(1)\displaystyle=\sum_{i=0}^{rn}\binom{rn}{i}\left(\frac{k}{n}S_{R}^{*}\right)^{i}\left(1-\frac{k}{n}\right)^{rn-i}+o(1)
=[knSR+(1kn)]rn+o(1)\displaystyle=\left[\frac{k}{n}S_{R}^{*}+\left(1-\frac{k}{n}\right)\right]^{rn}+o(1)
=[1+k(SR1)n]rn+o(1)\displaystyle=\left[1+\frac{k(S_{R}^{*}-1)}{n}\right]^{rn}+o(1)
=exp(kr(SR1))+o(1)\displaystyle=\exp\left(kr(S_{R}^{*}-1)\right)+o(1)
=exp(kr[(1tn)(1SR1)+tn]k1)+o(1)\displaystyle=\exp\left(-kr\left[\left(1-\frac{t}{n}\right)(1-S_{R-1})+\frac{t}{n}\right]^{k-1}\right)+o(1)
\displaystyle=S_{R}(t/n)+o(1). ∎

Proof of Lemma 8.

Lemma 21 implies that at time t0t\geq 0 the local rule τ\tau gives the value 1/21/2 to the decimation algorithm with probability at least SR(t/n)S_{R}(t/n). Now we can calculate the number of free steps run by the τ\tau-decimation algorithm by tracking the number throughout the process of the algorithm. We know that the (t+1)(t+1)-st iteration is a free step with probability at least SR(t/n)+o(1)S_{R}(t/n)+o(1). Let q(t)q(t) be a lower bound of the number of free steps after tt iterations, with q(0)=0q(0)=0 and

\displaystyle\mathop{{}\mathbb{E}}\left[{q(t+1)-q(t)}\right]=S_{R}(t/n)+o(1). (29)

By letting x=t/nx=t/n and w=w(x)=q(xn)/nw=w(x)=q(xn)/n, the equation (29) suggests the differential equation

dwdx=SR(x)\displaystyle\frac{dw}{dx}=S_{R}(x) (30)

with the initial condition

\displaystyle w(0)=0 (31)

since q(0)=0q(0)=0. By the Wormald’s method of differential equations, w.h.p. the lower bound q(t)q(t) of the number of free steps after tt iterations is given by

(0t/nSR(x)𝑑x)n+o(n).\displaystyle\left(\int_{0}^{t/n}S_{R}(x)dx\right)n+o(n).

Hence, w.h.p. the total number of free steps run by the τ\tau-decimation algorithm is lower bounded by we(k,r)n+o(n)w_{e}(k,r)n+o(n), where we(k,r)w_{e}(k,r) is given by

\displaystyle w_{e}(k,r)=\int_{0}^{1}S_{R}(x)dx.

Hence, DECτ is we(k,r)w_{e}(k,r)-free. ∎
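For a concrete feel of this bound (an illustration of ours; the values k=13, r=0.9 and R=20 are arbitrary), the recursion (28) and the integral defining w_{e}(k,r) can be evaluated numerically:

from math import exp

def S(R, x, k, r):
    s = 1.0                                # S_0(x) = 1
    for _ in range(R):                     # recursion (28)
        s = exp(-k * r * ((1 - x) * (1 - s) + x) ** (k - 1))
    return s

def w_e(k, r, R, steps=10_000):
    """Trapezoidal approximation of the integral of S_R over [0, 1]."""
    h = 1.0 / steps
    vals = [S(R, i * h, k, r) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

print(w_e(k=13, r=0.9, R=20))  # a lower bound on the fraction of free steps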

Appendix F Proof of Lemma 13

In this section, we prove Lemma 13, which gives a lower bound for w1(k,r)w_{1}(k,r). To do that, we first prove that w1(k,r)w_{1}(k,r) is lower bounded by

w1(k)=k11kk1γ(1k1,k(kk+1)k1),\displaystyle w_{1}^{*}(k)=\frac{k^{\frac{1}{1-k}}}{k-1}\gamma\left(\frac{1}{k-1},k\left(\frac{k}{k+1}\right)^{k-1}\right),

for k\geq 3 and r\in[0,1]. We then prove that w_{1}^{*}(k) is increasing with integer k\geq 3, and thus w_{1}(k,r) has a lower bound w_{1}^{*}(k_{0}) for any k\geq k_{0}\geq 3.

We start by proving that w_{1}(k,r) is decreasing with r\in[0,1], which implies w_{1}(k,r)\geq w_{1}(k,1) for any r\in[0,1] and k\geq 3.

Lemma 22.

w1(k,r)w_{1}(k,r) is decreasing with r[0,1]r\in[0,1] for any k3k\geq 3.

Proof.

We obtain the derivative of w_{1}(k,r) with respect to r as follows.

\displaystyle\frac{\partial w_{1}}{\partial r} =\frac{\frac{1}{1-k}(kr)^{\frac{1}{1-k}-1}k}{k-1}\gamma\left(\frac{1}{k-1},kr\right)+\frac{(kr)^{\frac{1}{1-k}}}{k-1}(kr)^{\frac{1}{k-1}-1}e^{-kr}k
=k(kr)11k1k1[11kγ(1k1,kr)+(kr)1k1ekr]\displaystyle=\frac{k(kr)^{\frac{1}{1-k}-1}}{k-1}\left[\frac{1}{1-k}\gamma\left(\frac{1}{k-1},kr\right)+(kr)^{\frac{1}{k-1}}e^{-kr}\right]
=k(kr)11k1k1hk(r)\displaystyle=\frac{k(kr)^{\frac{1}{1-k}-1}}{k-1}h_{k}(r)

where hk(r)h_{k}(r) is given by

hk(r)=11kγ(1k1,kr)+(kr)1k1ekr.\displaystyle h_{k}(r)=\frac{1}{1-k}\gamma\left(\frac{1}{k-1},kr\right)+(kr)^{\frac{1}{k-1}}e^{-kr}.

Note that hk(0)=0h_{k}(0)=0 and its derivative is given by

dhkdr\displaystyle\frac{dh_{k}}{dr} =11k(kr)1k11ekrk+1k1(kr)1k11kekr+(kr)1k1ekr(k)\displaystyle=\frac{1}{1-k}(kr)^{\frac{1}{k-1}-1}e^{-kr}k+\frac{1}{k-1}(kr)^{\frac{1}{k-1}-1}ke^{-kr}+(kr)^{\frac{1}{k-1}}e^{-kr}(-k)
=k(kr)1k1ekr\displaystyle=-k(kr)^{\frac{1}{k-1}}e^{-kr}
0.\displaystyle\leq 0.

Therefore, hk(r)h_{k}(r) is decreasing with rr and hk(r)hk(0)=0h_{k}(r)\leq h_{k}(0)=0 for any r[0,1]r\in[0,1]. Hence, w1r0\frac{\partial w_{1}}{\partial r}\leq 0 and thus w1w_{1} is decreasing with r[0,1]r\in[0,1] for any k3k\geq 3. ∎

By the above lemma, we know that w1(k,r)w1(k,1)w_{1}(k,r)\geq w_{1}(k,1) for any r[0,1]r\in[0,1]. Next we will prove that w1(k,1)w_{1}(k,1) is lower bounded by w1(k)w_{1}^{*}(k).

Lemma 23.

For k3k\geq 3, w1(k,1)w1(k)w_{1}(k,1)\geq w_{1}^{*}(k).

Proof.

For any k\geq 3 and any t\geq 0 we have

\displaystyle k>k\left(\frac{k}{k+1}\right)^{k-1}\text{\quad and \quad}t^{\frac{1}{k-1}-1}e^{-t}>0.

Since the integrand t^{\frac{1}{k-1}-1}e^{-t} of the lower incomplete gamma function is positive, \gamma\left(\frac{1}{k-1},x\right) is strictly increasing in x. This implies that

w1(k,1)=k11kk1γ(1k1,k)>k11kk1γ(1k1,k(kk+1)k1)=w1(k).\displaystyle w_{1}(k,1)=\frac{k^{\frac{1}{1-k}}}{k-1}\gamma\left(\frac{1}{k-1},k\right)>\frac{k^{\frac{1}{1-k}}}{k-1}\gamma\left(\frac{1}{k-1},k\left(\frac{k}{k+1}\right)^{k-1}\right)=w_{1}^{*}(k). (32)

Next, we prove that {w1(k)}k3\{w_{1}^{*}(k)\}_{k\geq 3} is an increasing sequence, which implies that w1(k)w1(k0)w_{1}^{*}(k)\geq w_{1}^{*}(k_{0}) for any kk0k\geq k_{0}.

Lemma 24.

{w1(k)}k3\{w_{1}^{*}(k)\}_{k\geq 3} is an increasing sequence.

Proof.

Note that for k,x>0k,x>0 we have

\displaystyle\int e^{-kx^{k-1}}dx =-\frac{k^{\frac{1}{1-k}}}{k-1}\;\Gamma\left(\frac{1}{k-1},kx^{k-1}\right)+\text{constant.}

By replacing kk with k+1k+1, we also have

\displaystyle\int e^{-(k+1)x^{k}}dx =-\frac{(k+1)^{\frac{1}{-k}}}{k}\;\Gamma\left(\frac{1}{k},(k+1)x^{k}\right)+\text{constant.}

Therefore, we have

\displaystyle\int_{0}^{\frac{k}{k+1}}e^{-kx^{k-1}}dx =\frac{k^{\frac{1}{1-k}}}{k-1}\;\left[\Gamma\left(\frac{1}{k-1}\right)-\Gamma\left(\frac{1}{k-1},k\left(\frac{k}{k+1}\right)^{k-1}\right)\right]
=k11kk1γ(1k1,k(kk+1)k1)\displaystyle=\frac{k^{\frac{1}{1-k}}}{k-1}\;\gamma\left(\frac{1}{k-1},k\left(\frac{k}{k+1}\right)^{k-1}\right)
=w1(k)\displaystyle=w_{1}^{*}(k)

and

\displaystyle\int_{0}^{\frac{k}{k+1}}e^{-(k+1)x^{k}}dx =\frac{(k+1)^{\frac{1}{-k}}}{k}\;\left[\Gamma\left(\frac{1}{k}\right)-\Gamma\left(\frac{1}{k},(k+1)\left(\frac{k}{k+1}\right)^{k}\right)\right]
=(k+1)1kkγ(1k,(k+1)(kk+1)k)\displaystyle=\frac{(k+1)^{\frac{1}{-k}}}{k}\;\gamma\left(\frac{1}{k},(k+1)\left(\frac{k}{k+1}\right)^{k}\right)
<(k+1)1kkγ(1k,(k+1)(k+1k+2)k)\displaystyle<\frac{(k+1)^{\frac{1}{-k}}}{k}\;\gamma\left(\frac{1}{k},(k+1)\left(\frac{k+1}{k+2}\right)^{k}\right)
=w1(k+1).\displaystyle=w_{1}^{*}(k+1).

The above inequality is based on the fact that the lower incomplete gamma function γ(a,x)\gamma(a,x) is strictly increasing with x0x\geq 0 and kk+1<k+1k+2\frac{k}{k+1}<\frac{k+1}{k+2}.

For 0\leq x\leq\frac{k}{k+1}, we have kx^{k-1}\geq(k+1)x^{k} and hence e^{-kx^{k-1}}\leq e^{-(k+1)x^{k}}. Therefore, we have

\displaystyle w_{1}^{*}(k)=\int_{0}^{\frac{k}{k+1}}e^{-kx^{k-1}}dx\leq\int_{0}^{\frac{k}{k+1}}e^{-(k+1)x^{k}}dx\leq w_{1}^{*}(k+1)

and thus {w1(k)}k3\{w_{1}^{*}(k)\}_{k\geq 3} is an increasing sequence. ∎
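Lemma 24 can be illustrated numerically (our sketch, assuming SciPy; the range of k is arbitrary) by evaluating the integral form w_{1}^{*}(k)=\int_{0}^{k/(k+1)}e^{-kx^{k-1}}dx for several k:

from math import exp
from scipy.integrate import quad

def w1_star(k):
    # integral form of w1*(k) derived in the proof of Lemma 24
    val, _ = quad(lambda x: exp(-k * x ** (k - 1)), 0, k / (k + 1))
    return val

print([round(w1_star(k), 4) for k in range(3, 10)])  # monotonically increasing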

Combining the above lemmas, we can complete the proof of Lemma 13.

Proof of Lemma 13.

From Lemma 22, 23 and 24, we have w1(k,r)w1(k,1)w1(k)w1(k0)w_{1}(k,r)\geq w_{1}(k,1)\geq w_{1}^{*}(k)\geq w_{1}^{*}(k_{0}) for any kk03k\geq k_{0}\geq 3 and r[0,1]r\in[0,1]. ∎

Appendix G Proof of Lemma 14

In this section, we prove Lemma 14, which gives a lower bound for we(k,r)w_{e}(k,r). Recall that

S0(x)=1 and Sl(x)=exp(kr[(1x)(1Sl1(x))+x]k1) for any l1 and x.\displaystyle S_{0}(x)=1\text{\;\;and\;\;}S_{l}(x)=\exp\left(-kr[(1-x)(1-S_{l-1}(x))+x]^{k-1}\right)\text{\quad for any\;}l\geq 1\text{\;and\;}x\in\mathbb{R}. (33)

We first show that {Sl(x)}l0\{S_{l}(x)\}_{l\geq 0} is decreasing.

Lemma 25.

For any k3k\geq 3, r[0,1]r\in[0,1] and x[0,1]x\in[0,1], we have 0<Sl(x)10<S_{l}(x)\leq 1 for any l0l\geq 0, and the sequence {Sl(x)}l0\{S_{l}(x)\}_{l\geq 0} is decreasing.

Proof.

For brevity, we write S_{l}=S_{l}(x). First, we prove that 0<S_{l}\leq 1 for all l\geq 0, by induction. Note that S_{0}=1. If 0<S_{l}\leq 1 for some l\geq 0, then it is easy to see from (33) that we also have 0<S_{l+1}\leq 1. Therefore, we know that 0<S_{l}\leq 1 for all l\geq 0.

Then, we prove Sl+1SlS_{l+1}\leq S_{l} for any l0l\geq 0 by induction again. For l=0l=0, we have S1=exp(kr[(1x)(1S0)+x]k1)=exp(krxk1)1=S0S_{1}=\exp(-kr[(1-x)(1-S_{0})+x]^{k-1})=\exp(-krx^{k-1})\leq 1=S_{0}. Assume that Sl+1SlS_{l+1}\leq S_{l} for some l0l\geq 0. Then, we have

Sl+2\displaystyle S_{l+2} =exp(kr[(1x)(1Sl+1)+x]k1)\displaystyle=\exp(-kr[(1-x)(1-S_{l+1})+x]^{k-1})
exp(kr[(1x)(1Sl)+x]k1)\displaystyle\leq\exp(-kr[(1-x)(1-S_{l})+x]^{k-1})
=Sl+1.\displaystyle=S_{l+1}.

Therefore, Sl+1SlS_{l+1}\leq S_{l} is true for all l0l\geq 0. The result follows. ∎

Since the sequence \{S_{l}(x)\}_{l\geq 0} is decreasing and bounded from below by 0, it converges as l\rightarrow\infty by the monotone convergence theorem. In particular, \{S_{l}(x)\}_{l\geq 0} converges to \hat{S}(x) as l\rightarrow\infty, where \hat{S}(x) is the largest solution of the fixed point equation

S=exp(kr[(1x)(1S)+x]k1),\displaystyle S=\exp(-kr[(1-x)(1-S)+x]^{k-1}), (34)

and Sl(x)S^(x)S_{l}(x)\geq\hat{S}(x) for any l0l\geq 0.
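Numerically, \hat{S}(x) can be approximated by simply iterating the recursion from S_{0}=1 until it stabilizes, which converges by the monotonicity argument above (a sketch of ours; the parameter values are arbitrary):

from math import exp

def S_hat(k, r, x, tol=1e-12, max_iter=1_000_000):
    s = 1.0                                # start from S_0 = 1
    for _ in range(max_iter):
        # fixed point equation (34)
        s_next = exp(-k * r * ((1 - x) * (1 - s) + x) ** (k - 1))
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

for x in (0.05, 0.3, 0.8):
    print(x, S_hat(k=13, r=0.9, x=x))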

Lemma 26.

Given k3k\geq 3 and r[0,1]r\in[0,1], we have S^(x)1(kr)2xk1\hat{S}(x)\geq 1-(kr)^{2}x^{k-1} for any 0x<x(k,r)0\leq x<x^{-}(k,r), where

x±(k,r)=(1±14(kr)2[(kr)1k11]2)1k2.\displaystyle x^{\pm}(k,r)=\left(\frac{1\pm\sqrt{1-4(kr)^{-2}[(kr)^{\frac{1}{k-1}}-1]}}{2}\right)^{\frac{1}{k-2}}. (35)
Proof.

Define the analytic real-valued function F(k,r,x,s)F(k,r,x,s) by

F(k,r,x,s)=exp(kr[(1x)(1s)+x]k1)s\displaystyle F(k,r,x,s)=\exp(-kr[(1-x)(1-s)+x]^{k-1})-s

for any k3k\geq 3, r[0,1]r\in[0,1], x[0,1]x\in[0,1] and ss\in\mathbb{R}. Note that we have

F(k,r,x,s)\displaystyle F(k,r,x,s) =exp(kr[(1x)(1s)+x]k1)s\displaystyle=\exp(-kr[(1-x)(1-s)+x]^{k-1})-s
1kr[(1x)(1s)+x]k1s\displaystyle\geq 1-kr[(1-x)(1-s)+x]^{k-1}-s
1kr[(1xk2)(1s)+x]k1s\displaystyle\geq 1-kr[(1-x^{k-2})(1-s)+x]^{k-1}-s

since exp(y)1y\exp(-y)\geq 1-y for any yy\in\mathbb{R}, and x[0,1]x\in[0,1]. Now we set s=1(kr)2xk1s=1-(kr)^{2}x^{k-1}. With the polynomial identity Xk1Yk1=(XY)i=1k1Xk1iYi1X^{k-1}-Y^{k-1}=(X-Y)\sum_{i=1}^{k-1}X^{k-1-i}Y^{i-1}, we then have

F(k,r,x,1(kr)2xk1)\displaystyle F(k,r,x,1-(kr)^{2}x^{k-1})
(kr)2xk1kr[(1xk2)(kr)2xk1+x]k1\displaystyle\geq(kr)^{2}x^{k-1}-kr[(1-x^{k-2})(kr)^{2}x^{k-1}+x]^{k-1}
=kr([(kr)1k1x]k1[(1xk2)(kr)2xk1+x]k1)\displaystyle=kr\left([(kr)^{\frac{1}{k-1}}x]^{k-1}-[(1-x^{k-2})(kr)^{2}x^{k-1}+x]^{k-1}\right)
=kr([(kr)1k1x][(1xk2)(kr)2xk1+x])F2(k,r,x)\displaystyle=kr\left([(kr)^{\frac{1}{k-1}}x]-[(1-x^{k-2})(kr)^{2}x^{k-1}+x]\right)\cdot F_{2}(k,r,x)
=kr((kr)1k1x(kr)2xk1+(kr)2x2k3x)F2(k,r,x)\displaystyle=kr\left((kr)^{\frac{1}{k-1}}x-(kr)^{2}x^{k-1}+(kr)^{2}x^{2k-3}-x\right)\cdot F_{2}(k,r,x)
=krx((kr)2(xk2)2(kr)2(xk2)+((kr)1k11))F2(k,r,x)\displaystyle=krx\left((kr)^{2}(x^{k-2})^{2}-(kr)^{2}(x^{k-2})+((kr)^{\frac{1}{k-1}}-1)\right)\cdot F_{2}(k,r,x)
=krxF1(k,r,x)F2(k,r,x)\displaystyle=krx\cdot F_{1}(k,r,x)\cdot F_{2}(k,r,x)

where

F1(k,r,x)\displaystyle F_{1}(k,r,x) =(kr)2(xk2)2(kr)2(xk2)+((kr)1k11) and\displaystyle=(kr)^{2}(x^{k-2})^{2}-(kr)^{2}(x^{k-2})+((kr)^{\frac{1}{k-1}}-1)\text{\quad and}
F2(k,r,x)\displaystyle F_{2}(k,r,x) =i=1k1[(kr)1k1x]k1i[(1xk2)(kr)2xk1+x]i1.\displaystyle=\sum_{i=1}^{k-1}[(kr)^{\frac{1}{k-1}}x]^{k-1-i}[(1-x^{k-2})(kr)^{2}x^{k-1}+x]^{i-1}.

By the quadratic formula and (35), we know that F_{1}(k,r,x)\geq 0 if x\leq x^{-}(k,r) or x\geq x^{+}(k,r). In particular, F_{1}(k,r,x)=0 if x=x^{-}(k,r) or x=x^{+}(k,r). For x\in[0,1], we have (kr)^{\frac{1}{k-1}}x\geq 0 and (1-x^{k-2})(kr)^{2}x^{k-1}+x\geq 0, and thus F_{2}(k,r,x)\geq 0. Hence, we have F(k,r,x,1-(kr)^{2}x^{k-1})\geq 0 for any 0\leq x\leq x^{-}(k,r).

Now we view F as a function of s. For 0\leq x\leq x^{-}(k,r), we know that F(k,r,x,1-(kr)^{2}x^{k-1})\geq 0 and F(k,r,x,1)=\exp(-krx^{k-1})-1\leq 0. By the continuity of F as a function of s, there exists at least one s_{0}\in[1-(kr)^{2}x^{k-1},1] such that F(k,r,x,s_{0})=0, which implies

s0=exp(kr[(1x)(1s0)+x]k1).\displaystyle s_{0}=\exp(-kr[(1-x)(1-s_{0})+x]^{k-1}).

By definition, \hat{S}(x) is the largest solution of the fixed point equation (34), so we have \hat{S}(x)\geq s_{0}\geq 1-(kr)^{2}x^{k-1}. ∎

Proof of Lemma 14.

From Lemma 25, for r\in(r_{core}(k),r_{sat}(k)), we know the sequence \{S_{l}(x)\}_{l\geq 0} is decreasing and lower bounded by 0. By the monotone convergence theorem, the sequence \{S_{l}(x)\}_{l\geq 0} converges as l\rightarrow\infty. In particular, it converges to \hat{S}(x), which is the largest solution of the fixed point equation in (34), and S_{l}(x)\geq\hat{S}(x) for all l\geq 0. Therefore, we have

we(k,r)=01SR(x)𝑑x01S^(x)𝑑x.\displaystyle w_{e}(k,r)=\int_{0}^{1}S_{R}(x)dx\geq\int_{0}^{1}\hat{S}(x)dx.

Furthermore, since {Sl(x)}l0\{S_{l}(x)\}_{l\geq 0} is lower bounded by 0, S^(x)\hat{S}(x) is non-negative for any x[0,1]x\in[0,1]. Thus, we have

01S^(x)𝑑x0x(k,r)S^(x)𝑑x.\displaystyle\int_{0}^{1}\hat{S}(x)dx\geq\int_{0}^{x^{-}(k,r)}\hat{S}(x)dx.

Then, by Lemma 26, we have S^(x)1(kr)2xk1\hat{S}(x)\geq 1-(kr)^{2}x^{k-1} for any 0xx(k,r)0\leq x\leq x^{-}(k,r), which implies

01S^(x)𝑑x\displaystyle\int_{0}^{1}\hat{S}(x)dx 0x(k,r)(1(kr)2xk1)𝑑x\displaystyle\geq\int_{0}^{x^{-}(k,r)}(1-(kr)^{2}x^{k-1})dx
\displaystyle=x^{-}(k,r)-kr^{2}(x^{-}(k,r))^{k}
=we(k,r).\displaystyle=w_{e}^{*}(k,r).

By directly differentiating we(k,r)w_{e}^{*}(k,r) with respect to kk and rr, we can check that we(k,r)w_{e}^{*}(k,r) is increasing with kk for k3k\geq 3 and decreasing with rr for r(rcore(k),rsat(k))r\in(r_{core}(k),r_{sat}(k)). Therefore, we have we(k,r)we(k0,rsat(k0))w_{e}(k,r)\geq w_{e}^{*}(k_{0},r_{sat}(k_{0})). ∎
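The quantities in this proof are easy to evaluate directly (our sketch; the values k=13 and r=0.9 are arbitrary, and the code assumes parameters for which the discriminant in (35) is non-negative): x^{-}(k,r) comes from (35), and the bound is w_{e}^{*}(k,r)=x^{-}(k,r)-kr^{2}(x^{-}(k,r))^{k}.

def x_minus(k, r):
    # the smaller root of F_1, raised to the power 1/(k-2), as in (35)
    disc = 1 - 4 * (k * r) ** (-2) * ((k * r) ** (1 / (k - 1)) - 1)
    return ((1 - disc ** 0.5) / 2) ** (1 / (k - 2))

def w_e_star(k, r):
    xm = x_minus(k, r)
    return xm - k * r ** 2 * xm ** k

print(x_minus(13, 0.9), w_e_star(13, 0.9))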

Appendix H Proof of Lemma 1

The natural way to reveal the overlap gap property of the random kk-XORSAT problem is to use the first moment method. Given a random kk-XORSAT instance 𝚽𝚽n(k,rn)\mathbf{\Phi}\sim\mathbf{\Phi}_{n}(k,rn) of nn variables and rnrn clauses, we first calculate the expected number of pairs of solutions with distance αn\alpha n between them, for any α[0,1]\alpha\in[0,1]. Let Z(αn)Z(\alpha n) be the number of pairs of solutions with distance αn\alpha n between them. The expected value of Z(αn)Z(\alpha n) is given by the following lemma.

Lemma 27.

Let k3k\geq 3 and r>0r>0. For any α[0,1]\alpha\in[0,1], the expected value of Z(αn)Z(\alpha n) is given by

𝔼[Z(αn)]=12πn1α(1α)f(k,r,α)n+o(n),\displaystyle\mathop{{}\mathbb{E}}\left[{Z(\alpha n)}\right]=\frac{1}{\sqrt{2\pi n}}\frac{1}{\sqrt{\alpha(1-\alpha)}}f(k,r,\alpha)^{n}+o(n),

where the real-valued function ff is defined by

f(k,r,α)2αα(1α)1α(1+(12α)k4)r.\displaystyle f(k,r,\alpha)\equiv\frac{2}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}}\left(\frac{1+(1-2\alpha)^{k}}{4}\right)^{r}.

For convenience, we simply assume αα(1α)1α=1\alpha^{\alpha}(1-\alpha)^{1-\alpha}=1 when α=0\alpha=0 or α=1\alpha=1.

Proof.

Let Z=Z(\alpha n) be the number of pairs of solutions, with distance \alpha n between the two solutions, for a random k-XORSAT instance. In other words, given a random instance with linear system representation Ax=b,

Z=σ,σ{0,1}nd(σ,σ)=αn𝟙(Aσ=b and Aσ=b).Z=\sum_{\begin{subarray}{c}\sigma,\sigma^{\prime}\in\{0,1\}^{n}\\ d(\sigma,\sigma^{\prime})=\alpha n\end{subarray}}\mathbbm{1}(A\sigma=b\text{\;\;and\;\;}A\sigma^{\prime}=b).

By linearity of expectation, the expected value of ZZ is given by

𝔼[Z]\displaystyle\mathop{{}\mathbb{E}}\left[{Z}\right] =σ,σ{0,1}nd(σ,σ)=αnPr[Aσ=b and Aσ=b].\displaystyle=\sum_{\begin{subarray}{c}\sigma,\sigma^{\prime}\in\{0,1\}^{n}\\ d(\sigma,\sigma^{\prime})=\alpha n\end{subarray}}\Pr\left[\,{A\sigma=b\text{\;\;and\;\;}A\sigma^{\prime}=b}\,\right].

Now we consider the calculation of \Pr\left[\,{A\sigma=b\text{\;and\;}A\sigma^{\prime}=b}\,\right]. Since each equation in the linear system is chosen identically and independently, the summand can be written as (\Pr\left[\,{A_{1}\sigma=b_{1}\text{\;and\;}A_{1}\sigma^{\prime}=b_{1}}\,\right])^{rn}, where A_{1}x=b_{1} is the first equation in the linear system. In addition, by the conditional probability formula we have

Pr[Aσ=b and Aσ=b]\displaystyle\Pr\left[\,{A\sigma=b\text{\;and\;}A\sigma^{\prime}=b}\,\right] =(Pr[A1σ=b1 and A1σ=b1])rn\displaystyle=\left(\Pr\left[\,{A_{1}\sigma=b_{1}\text{\;\;and\;\;}A_{1}\sigma^{\prime}=b_{1}}\,\right]\right)^{rn}
=(Pr[A1σ=b1 and A1σ=A1σ])rn\displaystyle=\left(\Pr\left[\,{A_{1}\sigma=b_{1}\text{\;\;and\;\;}A_{1}\sigma=A_{1}\sigma^{\prime}}\,\right]\right)^{rn}
=(Pr[A1σ=b1|A1σ=A1σ]Pr[A1σ=A1σ])rn\displaystyle=\left(\Pr\left[\,{A_{1}\sigma=b_{1}\;|\;A_{1}\sigma=A_{1}\sigma^{\prime}}\,\right]\Pr\left[\,{A_{1}\sigma=A_{1}\sigma^{\prime}}\,\right]\right)^{rn}
=(12Pr[A1σ=A1σ])rn\displaystyle=\left(\frac{1}{2}\Pr\left[\,{A_{1}\sigma=A_{1}\sigma^{\prime}}\,\right]\right)^{rn}

Since d(\sigma,\sigma^{\prime})=\alpha n, there are \alpha n variables having different values and (1-\alpha)n variables having the same values when we compare the assignments \sigma and \sigma^{\prime}. The random equation A_{1}\sigma=A_{1}\sigma^{\prime} holds if and only if the equation chooses an even number of variables from those \alpha n variables having different values in \sigma and \sigma^{\prime}. So, we have

Pr[A1σ=A1σ]\displaystyle\Pr\left[\,{A_{1}\sigma=A_{1}\sigma^{\prime}}\,\right]
=i=0k/2Pr[ 2i variables with different values in σ and σ are chosen by A1]\displaystyle=\sum_{i=0}^{\lfloor k/2\rfloor}\Pr\left[\,{2i\text{\;variables with different values in $\sigma$ and $\sigma^{\prime}$ are chosen by $A_{1}$}}\,\right]
=i=0k/2(αn2i)((1α)nk2i)(nk)\displaystyle=\sum_{i=0}^{\lfloor k/2\rfloor}\frac{\binom{\alpha n}{2i}\cdot\binom{(1-\alpha)n}{k-2i}}{\binom{n}{k}}
=i=0k/2(k2i)α2i(1α)k2i\displaystyle=\sum_{i=0}^{\lfloor k/2\rfloor}\binom{k}{2i}\alpha^{2i}(1-\alpha)^{k-2i}
=12(1+(12α)k)\displaystyle=\frac{1}{2}(1+(1-2\alpha)^{k})

From the above formula, we can see that the value of \Pr\left[\,{A_{1}\sigma=A_{1}\sigma^{\prime}}\,\right] is independent of the choices of \sigma and \sigma^{\prime}, and only depends on the distance between \sigma and \sigma^{\prime}. So we have

𝔼[Z]\displaystyle\mathop{{}\mathbb{E}}\left[{Z}\right] =σ,σ{0,1}nd(σ,σ)=αnPr[A1σ=b1 and A1σ=b1]rn\displaystyle=\sum_{\begin{subarray}{c}\sigma,\sigma^{\prime}\in\{0,1\}^{n}\\ d(\sigma,\sigma^{\prime})=\alpha n\end{subarray}}\Pr\left[\,{A_{1}\sigma=b_{1}\text{\;\;and\;\;}A_{1}\sigma^{\prime}=b_{1}}\,\right]^{rn}
=σ,σ{0,1}nd(σ,σ)=αn(12Pr[A1σ=A1σ])rn\displaystyle=\sum_{\begin{subarray}{c}\sigma,\sigma^{\prime}\in\{0,1\}^{n}\\ d(\sigma,\sigma^{\prime})=\alpha n\end{subarray}}\left(\frac{1}{2}\Pr\left[\,{A_{1}\sigma=A_{1}\sigma^{\prime}}\,\right]\right)^{rn}
=2n(nαn)(1+(12α)k4)rn\displaystyle=2^{n}\binom{n}{\alpha n}\left(\frac{1+(1-2\alpha)^{k}}{4}\right)^{rn}

By applying Stirling’s approximation for \binom{n}{\alpha n}, we have

𝔼[Z]\displaystyle\mathop{{}\mathbb{E}}\left[{Z}\right] =2n12πn1α(1α)(1αα(1α)1α)n(1+(12α)k4)rn+o(n)\displaystyle=2^{n}\frac{1}{\sqrt{2\pi n}}\frac{1}{\sqrt{\alpha(1-\alpha)}}\left(\frac{1}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}}\right)^{n}\left(\frac{1+(1-2\alpha)^{k}}{4}\right)^{rn}+o(n)
=12πn1α(1α)(2αα(1α)1α(1+(12α)k4)r)n+o(n)\displaystyle=\frac{1}{\sqrt{2\pi n}}\frac{1}{\sqrt{\alpha(1-\alpha)}}\left(\frac{2}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}}\left(\frac{1+(1-2\alpha)^{k}}{4}\right)^{r}\right)^{n}+o(n)
=12πn1α(1α)f(k,r,α)n+o(n)\displaystyle=\frac{1}{\sqrt{2\pi n}}\frac{1}{\sqrt{\alpha(1-\alpha)}}f(k,r,\alpha)^{n}+o(n)

where f(k,r,α)f(k,r,\alpha) is defined by

f(k,r,α)2αα(1α)1α(1+(12α)k4)r.\displaystyle f(k,r,\alpha)\equiv\frac{2}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}}\left(\frac{1+(1-2\alpha)^{k}}{4}\right)^{r}.

For convenience, we simply assume αα(1α)1α=1\alpha^{\alpha}(1-\alpha)^{1-\alpha}=1 when α=0\alpha=0 or α=1\alpha=1. ∎

Fix k\geq 3 and r>0. If f(k,r,\alpha^{\prime})<1 for some \alpha^{\prime}\in[0,1], then the expectation \mathop{{}\mathbb{E}}\left[{Z(\alpha^{\prime}n)}\right] converges to 0 as n\rightarrow\infty. By Markov’s inequality, we have \Pr\left[\,{Z(\alpha^{\prime}n)>0}\,\right]\leq\mathop{{}\mathbb{E}}\left[{Z(\alpha^{\prime}n)}\right], and thus \Pr\left[\,{Z(\alpha^{\prime}n)>0}\,\right] also converges to 0 as n\rightarrow\infty. That means the probability of having at least one pair of solutions with distance \alpha^{\prime}n between them converges to 0. In other words, w.h.p. there is no such pair of solutions. So, if we can find an interval (u_{1},u_{2})\subset[0,1] such that f(k,r,\alpha)<1 for any \alpha\in(u_{1},u_{2}), we can say that w.h.p. there is no pair of solutions with distance \alpha n between them, for any \alpha\in(u_{1},u_{2}). In such a case, w.h.p. the distance between every pair of solutions is either \leq u_{1}n or \geq u_{2}n, that is, w.h.p. the random instance \mathbf{\Phi} exhibits the overlap gap property with v_{1}=u_{1}n and v_{2}=u_{2}n.
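The following small sketch (ours; the choices k=4, r=1.0 and the grid size are arbitrary, with r chosen above the OGP onset density for this k) scans f(k,r,\alpha) over a grid of \alpha\in[0,1/2] and reports where it drops below 1, i.e. an approximation of the window (u_{1},u_{2}).

def f(k, r, a):
    # the exponent f(k, r, a) from Lemma 27; the convention a^a (1-a)^(1-a) = 1
    # is used at the endpoints a = 0 and a = 1
    entropy = 1.0 if a in (0.0, 1.0) else a ** a * (1 - a) ** (1 - a)
    return 2.0 / entropy * ((1 + (1 - 2 * a) ** k) / 4) ** r

k, r, grid = 4, 1.0, 10_000
below = [i / grid for i in range(grid // 2 + 1) if f(k, r, i / grid) < 1]
print(min(below), max(below))  # approximate endpoints u1 and u2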

From the formula of the function f, we can see that it decreases with r, for any fixed k and \alpha, as illustrated in Figure 1. It is more likely to have the OGP if the clause density r is large. We can further determine the minimal clause density r_{1}(k) for having the OGP. This leads to Lemma 1, with the following proof.

Figure 1: The graph of f(k,r,\alpha) against \alpha, with k=3 and different values of r: (1) r=0.85 in blue at the top, (2) r=0.90 in brown in the middle, and (3) r=0.95 in green at the bottom.
Proof of Lemma 1.

If kk is odd, then we have f(k,r,1)=0f(k,r,1)=0. In this case, by the continuity of ff, there exist u1<12u2=1u_{1}<\frac{1}{2}\leq u_{2}=1 such that f(k,r,α)<1f(k,r,\alpha)<1 for any α(u1,u2)\alpha\in(u_{1},u_{2}).

Now we assume kk is even. When r=0r=0, it is easy to see that f(k,r,α)=2αα(1α)1α>1f(k,r,\alpha)=\frac{2}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}}>1 for any α[0,1]\alpha\in[0,1]. If we fix kk and α\alpha, then the function ff is strictly decreasing with rr since (1+(12α)k)/4<1(1+(1-2\alpha)^{k})/4<1. We aim at finding the minimal positive value of rr that guarantees there exists an interval (u1,u2)[0,1](u_{1},u_{2})\subset[0,1] such that f(k,r,α)<1f(k,r,\alpha)<1 for any α(u1,u2)\alpha\in(u_{1},u_{2}). Furthermore, we know that f(k,r,α)=f(k,r,1α)f(k,r,\alpha)=f(k,r,1-\alpha) for even k3k\geq 3, so it suffices to find the minimal positive value of rr that guarantees the existence of such interval (u1,u2)(u_{1},u_{2}) within [0,12][0,\frac{1}{2}].

By fixing even k\geq 3 and \alpha\in[0,\frac{1}{2}], we can treat f as a strictly decreasing continuous function f_{k,\alpha}(r) of r, with f_{k,\alpha}(0)=\frac{2}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}}\geq 2>1 and \lim_{r\to\infty}f_{k,\alpha}(r)=0<1. Therefore, for any even k\geq 3 and \alpha\in[0,\frac{1}{2}], there exists a unique r^{*}(k,\alpha)>0 given by

r(k,α)=1+H(α)2log2(1+(12α)k),\displaystyle r^{*}(k,\alpha)=\frac{1+H(\alpha)}{2-\log_{2}(1+(1-2\alpha)^{k})},

such that

fk,α(r)\displaystyle f_{k,\alpha}(r) >1 for r<r(k,α),\displaystyle>1\text{\quad for }r<r^{*}(k,\alpha),
fk,α(r)\displaystyle f_{k,\alpha}(r) =1 for r=r(k,α), and\displaystyle=1\text{\quad for }r=r^{*}(k,\alpha),\text{ and}
fk,α(r)\displaystyle f_{k,\alpha}(r) <1 for r>r(k,α)\displaystyle<1\text{\quad for }r>r^{*}(k,\alpha)

where HH is the binary entropy function H(x)=xlog2(x)(1x)log2(1x)H(x)=-x\log_{2}(x)-(1-x)\log_{2}(1-x). Then, we can define r1(k)r_{1}(k) and α1(k)\alpha_{1}(k) by

r1(k)=min0α12r(k,α) and α1(k)=argmin0α12r(k,α)\displaystyle r_{1}(k)=\min_{0\leq\alpha\leq\frac{1}{2}}r^{*}(k,\alpha)\text{\quad and \quad}\alpha_{1}(k)=\operatorname*{arg\,min}_{0\leq\alpha\leq\frac{1}{2}}r^{*}(k,\alpha)

with r(k,α1(k))=r1(k)r^{*}(k,\alpha_{1}(k))=r_{1}(k). Suppose r>r1(k)r>r_{1}(k). Since ff is strictly decreasing with rr, we have f(k,r,α1(k))<f(k,r1(k),α1(k))=fk,α1(k)(r1(k))=fk,α1(k)(r(k,α1(k)))=1f(k,r,\alpha_{1}(k))<f(k,r_{1}(k),\alpha_{1}(k))=f_{k,\alpha_{1}(k)}(r_{1}(k))=f_{k,\alpha_{1}(k)}(r^{*}(k,\alpha_{1}(k)))=1. By the continuity of ff, there exist 0u1<α1(k)0\leq u_{1}<\alpha_{1}(k) and α1(k)<u212\alpha_{1}(k)<u_{2}\leq\frac{1}{2} such that we have f(k,r,α)<1f(k,r,\alpha)<1 for any α(u1,u2)\alpha\in(u_{1},u_{2}).

Therefore, with Lemma 27, we know that the expected number \mathop{{}\mathbb{E}}\left[{Z(\alpha n)}\right] of pairs of solutions with distance \alpha n between them converges to zero for any \alpha\in(u_{1},u_{2}). By the first moment method, w.h.p. there is no pair of solutions with distance \alpha n between them, for any \alpha\in(u_{1},u_{2}). In other words, w.h.p. for any pair of solutions \sigma and \sigma^{\prime} of \mathbf{\Phi}, the distance d(\sigma,\sigma^{\prime}) between them is either smaller than u_{1}n or larger than u_{2}n, for some u_{1} and u_{2} with 0\leq u_{1}<u_{2}. ∎
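A direct grid search (our sketch; the even values of k and the grid resolution are arbitrary) recovers r_{1}(k) and \alpha_{1}(k) from the formula for r^{*}(k,\alpha):

from math import log2

def H(a):
    # binary entropy function, with H(0) = H(1) = 0
    return 0.0 if a in (0.0, 1.0) else -a * log2(a) - (1 - a) * log2(1 - a)

def r_star(k, a):
    return (1 + H(a)) / (2 - log2(1 + (1 - 2 * a) ** k))

def r1(k, grid=100_000):
    # minimize r*(k, a) over a grid of a in (0, 1/2]
    return min((r_star(k, i / (2 * grid)), i / (2 * grid)) for i in range(1, grid + 1))

for k in (4, 6, 8):
    r, a = r1(k)
    print(k, round(r, 4), round(a, 4))  # estimates of r1(k) and alpha1(k)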