
Moser-Tardos Algorithm: Beyond Shearer’s Bound

Kun He, Qian Li, and Xiaoming Sun. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China. E-mail: [email protected], [email protected], and [email protected].
Abstract.

In a seminal paper (Moser and Tardos, JACM’10), Moser and Tardos developed a simple and powerful algorithm to find solutions to combinatorial problems in the variable Lovász Local Lemma (LLL) setting. Kolipaka and Szegedy (STOC’11) proved that the Moser-Tardos algorithm is efficient up to the tight condition of the abstract Lovász Local Lemma, known as Shearer’s bound. A fundamental problem around the LLL is whether the efficient region of the Moser-Tardos algorithm can be further extended.

In this paper, we give a positive answer to this problem. We show that the efficient region of the Moser-Tardos algorithm goes beyond the Shearer’s bound of the underlying dependency graph whenever the graph is not chordal. If the dependency graph is chordal, Shearer’s bound is known to exactly characterize the efficient region (Kolipaka and Szegedy, STOC’11; He, Li, Liu, Wang and Xia, FOCS’17).

Moreover, we demonstrate that the efficient region can exceed Shearer’s bound by a constant, by explicitly calculating the gaps on several infinite lattices.

The core of our proof is a new criterion on the efficiency of the Moser-Tardos algorithm which takes the intersections between dependent events into consideration. Our criterion is strictly better than Shearer’s bound whenever intersections exist between dependent events. Meanwhile, if any two dependent events are mutually exclusive, our criterion coincides with Shearer’s bound, which is known to be tight in this situation for the Moser-Tardos algorithm (Kolipaka and Szegedy, STOC’11; Guo, Jerrum and Liu, JACM’19).

1. Introduction

Suppose $\mathcal{A}=\{A_{1},\cdots,A_{m}\}$ is a set of bad events. If the events are mutually independent, then we can avoid all of these events simultaneously whenever no event has probability 1. The Lovász Local Lemma (LLL) [14], one of the most important probabilistic methods, allows for limited dependency among the events, but still concludes that all the events can be avoided simultaneously if each individual event has a bounded probability. In the most general setting (a.k.a. abstract LLL), the dependency among $\mathcal{A}$ is characterized by an undirected graph $G_{D}=([m],E_{D})$, called a dependency graph of $\mathcal{A}$, which satisfies that for any vertex $i$, $A_{i}$ is independent of $\{A_{j}:j\notin\mathcal{N}_{G_{D}}(i)\cup\{i\}\}$. Here $\mathcal{N}_{G}(i)$ stands for the set of neighbors of vertex $i$ in a given graph $G$.

We use $\mathcal{A}\sim(G_{D},\bm{p})$ to denote that (i) $G_{D}$ is a dependency graph of $\mathcal{A}$ and (ii) the probability vector of $\mathcal{A}$ is $\bm{p}$. Given a graph $G_{D}$, define the abstract interior $\mathcal{I}_{a}(G_{D})$ to be the set consisting of all vectors $\bm{p}$ such that $\mathbb{P}\left(\cap_{A\in\mathcal{A}}\overline{A}\right)>0$ for any $\mathcal{A}\sim(G_{D},\bm{p})$. In this context, the most frequently used abstract LLL can be stated as follows:

Theorem 1.1 ([58]).

Given any graph $G_{D}=([m],E_{D})$ and any probability vector $\bm{p}\in(0,1]^{m}$, if there exist real numbers $x_{1},\ldots,x_{m}\in(0,1)$ such that $p_{i}\leq x_{i}\prod_{j\in\mathcal{N}_{G_{D}}(i)}(1-x_{j})$ for any $i\in[m]$, then $\bm{p}\in\mathcal{I}_{a}(G_{D})$.

Shearer [56] obtained the strongest possible condition for the abstract LLL. Let $\text{Ind}(G_{D})$ be the set of all independent sets of an undirected graph $G_{D}=([m],E_{D})$, and let $\bm{p}=(p_{1},\cdots,p_{m})\in(0,1]^{m}$. For each $I\in\text{Ind}(G_{D})$, define the quantity

$$q_{I}(G_{D},\bm{p})=\sum_{J\in\text{Ind}(G_{D}),\,I\subseteq J}(-1)^{|J|-|I|}\prod_{i\in J}p_{i}.$$

We say $\bm{p}$ is in Shearer’s bound of $G_{D}$ if $q_{I}(G_{D},\bm{p})>0$ for every $I\in\text{Ind}(G_{D})$; otherwise we say $\bm{p}$ is beyond Shearer’s bound of $G_{D}$. Shearer’s result can be stated as follows.
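To make these quantities concrete, here is a minimal Python sketch (ours, for illustration only; the graph, the helper names, and the numeric comments are assumptions, with the symmetric boundary of the 4-cycle, $p=1-\sqrt{2}/2\approx 0.2929$, computed from the formula above) that tests membership in Shearer’s bound by brute-force enumeration of independent sets:

from itertools import combinations
from math import prod

def independent_sets(n, edges):
    adj = {frozenset(e) for e in edges}
    for r in range(n + 1):
        for s in combinations(range(n), r):
            if all(frozenset(pair) not in adj for pair in combinations(s, 2)):
                yield frozenset(s)

def in_shearer_bound(n, edges, p):
    ind = list(independent_sets(n, edges))
    def q(I):  # q_I(G_D, p) as defined above
        return sum((-1) ** (len(J) - len(I)) * prod(p[i] for i in J)
                   for J in ind if I <= J)
    return all(q(I) > 0 for I in ind)

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]        # the 4-cycle
print(in_shearer_bound(4, C4, [0.29] * 4))   # True:  below 1 - sqrt(2)/2
print(in_shearer_bound(4, C4, [0.30] * 4))   # False: beyond the boundary

The check is exponential in $m$ and is meant only to make $q_{I}$ tangible on small graphs.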

Theorem 1.2 ([56]).

For any graph $G_{D}=([m],E_{D})$ and any probability vector $\bm{p}\in(0,1]^{m}$, $\bm{p}\in\mathcal{I}_{a}(G_{D})$ if and only if $\bm{p}$ is in Shearer’s bound of $G_{D}$.

Variable Lovász Local Lemma. The variable Lovász Local Lemma (VLLL) is another quite general and common setting of the LLL, which applies to variable-generated event systems. In this setting, there is a set of underlying mutually independent random variables $\{X_{1},\cdots,X_{n}\}$, and each event $A_{i}$ is fully determined by some subset $\mathrm{vbl}(A_{i})$ of the variables. The dependency between events and variables is naturally characterized by a bipartite graph $G_{B}=([m],[n],E_{B})$, known as the event-variable graph, such that edge $(i,j)\in[m]\times[n]$ exists if and only if $X_{j}\in\mathrm{vbl}(A_{i})$.

The variable setting is important mainly because most applications of the LLL have natural underlying independent variables, such as the satisfiability of CNF formulas [30, 31, 50, 16], hypergraph coloring [49, 29], and Ramsey numbers [57, 58, 32]. In particular, the groundbreaking result by Moser and Tardos [54] on constructive LLL applies in the variable setting.

There is a natural choice for the dependency graph of variable-generated systems, called the canonical dependency graph: two events are adjacent if they share some common variables. Formally, given a bipartite graph $G_{B}=(U,V,E_{B})$, its base graph is defined as the graph $G_{D}(G_{B})=(U,E_{D})$ such that for any two vertices $u_{i},u_{j}\in U$, $(u_{i},u_{j})\in E_{D}$ if and only if $u_{i}$ and $u_{j}$ share common neighbors in $G_{B}$. If $G_{B}$ is the event-variable graph of a variable-generated system $\mathcal{A}$, then $G_{D}(G_{B})$ is the canonical dependency graph of $\mathcal{A}$.

Given a graph $G_{D}$, define the variable interior $\mathcal{I}_{v}(G_{D})$ to be the set consisting of all vectors $\bm{p}$ such that $\mathbb{P}\left(\cap_{A\in\mathcal{A}}\overline{A}\right)>0$ for any variable-generated event system $\mathcal{A}\sim(G_{D},\bm{p})$. Obviously, $\mathcal{I}_{v}(G_{D})\supseteq\mathcal{I}_{a}(G_{D})$ for any $G_{D}$. In contrast with the abstract LLL, Shearer’s bound (of the canonical dependency graph) turns out not to be tight for variable-generated systems [36]: the containment is proper if and only if $G_{D}$ is not chordal (a graph is chordal if it has no induced cycle of length at least four).

Constructive (variable) Lovász Local Lemma and the Moser-Tardos algorithm. The abstract LLL and the variable LLL mentioned above are not constructive, in that they do not indicate how to efficiently find an object avoiding all the bad events. In a seminal paper [54], Moser and Tardos developed an amazingly simple and efficient algorithm for variable-generated systems, depicted in Algorithm 1 (throughout the paper, the Moser-Tardos algorithm is allowed to follow arbitrary selection rules), and showed that this algorithm terminates quickly under the condition in Theorem 1.1. Following the Moser-Tardos algorithm (or MT algorithm for short), a large amount of effort has been devoted to constructive LLL, including remarkable works which extend the MT techniques beyond the variable setting [38, 5, 3, 4, 42, 41]. The MT algorithm has been applied to many important problems, including $k$-SAT [31], hypergraph coloring [32], Hamiltonian cycle [32], and their counting and sampling [27, 50, 16, 18, 44, 40].

Assign random values to $X_{1},\cdots,X_{n}$;
while $\exists i\in[m]$ such that $A_{i}$ holds do
      Arbitrarily select one such $i$ and resample all variables $X_{j}$ in $\mathrm{vbl}(A_{i})$;
Return the current assignment;
Algorithm 1. Moser-Tardos Algorithm
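The following minimal Python sketch of Algorithm 1 may be helpful; the concrete toy instance (events saying that two cyclically adjacent Boolean variables are both False) and all function names are our illustrative assumptions, not from the paper:

import random

def moser_tardos(n, events, sample=lambda j: random.random() < 0.5):
    # events: list of (vbl, holds), where vbl is a list of variable indices
    # and holds(x) is True iff the bad event occurs under assignment x.
    x = {j: sample(j) for j in range(n)}              # assign random values
    while True:
        bad = [i for i, (vbl, holds) in enumerate(events) if holds(x)]
        if not bad:
            return x                                  # no bad event holds
        i = bad[0]                                    # arbitrary selection rule
        for j in events[i][0]:                        # resample vbl(A_i)
            x[j] = sample(j)

n = 5
events = [([j, (j + 1) % n],
           (lambda a, b: lambda x: not x[a] and not x[b])(j, (j + 1) % n))
          for j in range(n)]
print(moser_tardos(n, events))

Any rule for choosing among the currently occurring bad events is allowed; the sketch simply picks the first one.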

Mainly because such a simple algorithm is so powerful and general-purpose, how powerful the MT algorithm is has become one of the most intriguing and fundamental problems on constructive LLL. Given a graph $G_{D}$, define the Moser-Tardos interior $\mathcal{I}_{MT}(G_{D})$ to be the set consisting of all vectors $\bm{p}$ such that the MT algorithm is efficient for any variable-generated event system $\mathcal{A}\sim(G_{D},\bm{p})$. Clearly, $\mathcal{I}_{MT}(G_{D})\subseteq\mathcal{I}_{v}(G_{D})$ for any $G_{D}$. A major line of follow-up works explores $\mathcal{I}_{MT}(G_{D})$ [46, 55, 45, 10]. The best known criterion is due to Kolipaka and Szegedy [45], who extended the MT interior to Shearer’s bound; that is, they showed that $\mathcal{I}_{MT}(G_{D})\supseteq\mathcal{I}_{a}(G_{D})$. As mentioned above, if $G_{D}$ is not chordal, $\mathcal{I}_{a}(G_{D})$ is properly contained in $\mathcal{I}_{v}(G_{D})$, so it is possible to push $\mathcal{I}_{MT}(G_{D})$ further, beyond Shearer’s bound.

In this paper, we concentrate on the following open problem:

Problem 1: does $\mathcal{I}_{MT}(G_{D})$ properly contain $\mathcal{I}_{a}(G_{D})$ for some $G_{D}$? If so, for what kind of graph $G_{D}$?

Beyond potential applications, our main motivations are the following fundamental problems around the LLL itself:

  • The limitation of constructive LLL in the variable setting. Among the most fascinating problems around the LLL is a mysterious conjecture saying that there is an algorithm which is efficient for all variable-generated systems $\mathcal{A}$ with $\mathcal{A}\sim(G_{D},\bm{p})$ for some $G_{D}$ and $\bm{p}\in\mathcal{I}_{v}(G_{D})$ [59]. It would be a small miracle if the conjecture were true, since then one could always construct a solution efficiently in the variable setting whenever solutions are guaranteed to exist by the LLL condition. Towards this conjecture, a good start is to show that $\mathcal{I}_{MT}(G_{D})\supsetneqq\mathcal{I}_{a}(G_{D})$ for some $G_{D}$, as $\mathcal{I}_{v}(G_{D})\supsetneqq\mathcal{I}_{a}(G_{D})$ for any $G_{D}$ which is not chordal.

  • The limitation of the MT algorithm. The MT algorithm is one of the most intriguing topics in modern algorithm research, not only because it is strikingly simple yet powerful, but also because it is closely related to the famous Walksat algorithm for random $k$-SAT. A mysterious problem about the MT algorithm is where its true limitation lies [59, 10]. It is conjectured that $\mathcal{I}_{MT}(G_{D})=\mathcal{I}_{v}(G_{D})$ for any $G_{D}$ [59]. To prove this conjecture, the first step is to give a positive answer to Problem 1. Moreover, due to the connection between Shearer’s bound and the repulsive lattice gas model, it is conjectured that an essential connection exists between statistical mechanics and the MT algorithm [59]. Whether $\mathcal{I}_{MT}(G_{D})=\mathcal{I}_{a}(G_{D})$ for each $G_{D}$ is critical to this conjecture.

Remark 1.3.

To explore the power of the MT algorithm in specific applications, one may employ special structures of the applications, such as the way the variables interact, to obtain sharper bounds than those in terms of the canonical dependency graph only. Nevertheless, characterizing the power of the MT algorithm in terms of the canonical dependency graph is a very fundamental problem and also the focus of a major line of research [54, 55, 9, 45]. Moreover, a major difficulty in strengthening the guarantees of the MT algorithm is that the analysis should be valid for all possible variable-generated event systems; it is not quite surprising to obtain better bounds if the event system has further restrictions. To substantially improve the guarantees of the MT algorithm and provide deep insight into its dynamics, we focus on the general variable LLL setting rather than employing the special structures of particular applications.

We should emphasize that Problem 1 is still quite open! As mentioned before, it has been proved that Shearer’s bound is not tight for variable-generated systems [36]. However, this only says that there is some probability vector $\bm{p}$ beyond Shearer’s bound such that every variable-generated event system $\mathcal{A}\sim(G_{D},\bm{p})$ must have a satisfying assignment. It is unclear whether the MT algorithm can construct such an assignment efficiently.

It has also been proved that the MT algorithm can still be efficient beyond Shearer’s bound in some specific applications [32]. Despite its novel contribution, this result does not answer Problem 1: it focuses on event systems with special structures, so it only implies that there is a probability vector $\bm{p}$ beyond Shearer’s bound such that the MT algorithm is efficient for some restricted variable-generated event systems $\mathcal{A}\sim(G_{D},\bm{p})$. However, to show $\mathcal{I}_{MT}(G_{D})\supsetneqq\mathcal{I}_{a}(G_{D})$, one must prove that the MT algorithm is efficient for all possible event systems, and this is one major difficulty in resolving Problem 1.

1.1. Results and contributions

We provide a complete answer to Problem 1 (Theorem 1.5): if $G_{D}$ is not chordal, then $\mathcal{I}_{MT}(G_{D})\supsetneqq\mathcal{I}_{a}(G_{D})$, i.e., the efficient region of the MT algorithm goes beyond Shearer’s bound. Otherwise, $\mathcal{I}_{MT}(G_{D})=\mathcal{I}_{a}(G_{D})$, because $\mathcal{I}_{a}(G_{D})\subseteq\mathcal{I}_{MT}(G_{D})\subseteq\mathcal{I}_{v}(G_{D})$ and $\mathcal{I}_{v}(G_{D})=\mathcal{I}_{a}(G_{D})$ for chordal graphs $G_{D}$ [36].

The core of the proof of Theorem 1.5 is a new convergence criterion for the MT algorithm (Theorem 1.6), which may be of independent interest. This new criterion takes the intersection between dependent events into consideration, and is strictly better than Shearer’s bound when there exists a pair of dependent events which are not mutually exclusive.

1.1.1. Moser-Tardos algorithm: beyond Shearer’s bound

Given a dependency graph $G_{D}=([m],E_{D})$ and a probability vector $\bm{p}=(p_{1},p_{2},\cdots,p_{m})\in(0,1)^{m}$, we say that $\bm{p}$ is on the Shearer’s boundary of $G_{D}$ if $(1-\varepsilon)\bm{p}$ is in Shearer’s bound and $(1+\varepsilon)\bm{p}$ is not, for any $\varepsilon>0$. A chordless cycle in a graph $G_{D}$ is an induced cycle of length at least 4. A chordal graph is a graph without chordless cycles.

Given two vectors $\bm{p}$ and $\bm{q}$, we write $\bm{p}\leq\bm{q}$ if the inequality holds entry-wise. Additionally, if the inequality is strict on at least one entry, we write $\bm{p}<\bm{q}$.

Definition 1.4 (Maximum $L_{1}$-gap to the Shearer’s bound).

Given a dependency graph $G_{D}$ and a probability vector $\bm{p}$ beyond the Shearer’s bound of $G_{D}$, define the maximum $L_{1}$-gap from $\bm{p}$ to the Shearer’s bound of $G_{D}$ as

$$d(\bm{p},G_{D})\triangleq\sup\big\{\|\bm{q}\|_{1}:\bm{q}\leq\bm{p}\text{ and }\bm{p}-\bm{q}\notin\mathcal{I}_{a}(G_{D})\big\}.$$

For convenience, we let $d(\bm{p},G_{D})=-1$ if $\bm{p}$ is in the Shearer’s bound of $G_{D}$.

Intuitively, $d(\bm{p},G_{D})$ measures how far $\bm{p}$ is from the Shearer’s bound of $G_{D}$. One can verify that $d(\bm{p},G_{D})<0$ if $\bm{p}$ is in the Shearer’s bound, $d(\bm{p},G_{D})=0$ if $\bm{p}$ is on the Shearer’s boundary, and $d(\bm{p},G_{D})>0$ if $\bm{p}$ is beyond Shearer’s bound but not on the Shearer’s boundary. Now, we are ready to state our main result.

Theorem 1.5.

For any chordal graph $G_{D}$, $\mathcal{I}_{MT}(G_{D})=\mathcal{I}_{a}(G_{D})$; i.e., $\bm{p}\in\mathcal{I}_{MT}(G_{D})$ iff $d(\bm{p},G_{D})<0$.

For any graph $G_{D}$ which is not chordal, $\bm{p}\in\mathcal{I}_{MT}(G_{D})$ if

$$d(\bm{p},G_{D})<\frac{1}{545}\cdot\sum_{i\leq\ell}|C_{i}|\Big(\min_{j\in C_{i}}p_{j}\Big)^{4}\cdot\left(\max\left\{\frac{2\sum_{j\in C_{i}}\sqrt{p_{j}}}{|C_{i}|}-1,0\right\}\right)^{2}$$

for some disjoint chordless cycles $C_{1},C_{2},\cdots,C_{\ell}$ in $G_{D}$. In particular, there is a probability vector $\bm{p}$ with $d(\bm{p},G_{D})\geq 2^{-20}K^{-3}$ satisfying the above condition, where $K$ is the length of the shortest chordless cycle. This implies that $\mathcal{I}_{MT}(G_{D})$ contains a probability vector $\bm{p}$ with $d(\bm{p},G_{D})\geq 2^{-20}K^{-3}$.

The intuition behind Theorem 1.5 is as follows. The theorem characterizes the efficient region of the MT algorithm via $d(\bm{p},G_{D})$: it shows that if $d(\bm{p},G_{D})$ is upper bounded by a non-negative quantity determined by the chordless cycles in $G_{D}$, then the MT algorithm is efficient. Since $\mathcal{I}_{a}(G_{D})$ is the set of $\bm{p}$ with $d(\bm{p},G_{D})<0$, our criterion is at least as good as Shearer’s bound. Moreover, for each $G_{D}$ which is not chordal, our criterion is strictly better: there exists some $\bm{p}$ with $d(\bm{p},G_{D})\geq 2^{-20}K^{-3}$ satisfying our criterion. Intuitively, Theorem 1.5 says that chordless cycles in $G_{D}$ enhance the power of the MT algorithm.

We emphasize that Theorem 1.5 provides a complete answer to Problem 1: $\mathcal{I}_{MT}(G_{D})$ properly contains $\mathcal{I}_{a}(G_{D})$ if and only if $G_{D}$ is not chordal.

1.1.2. A new constructive LLL for non-extremal instances

Given a set $\mathcal{A}$ of events with dependency graph $G_{D}$, $\mathcal{A}$ is called extremal if all pairs of dependent events are mutually exclusive, and non-extremal otherwise. Kolipaka and Szegedy [45] showed that the MT algorithm is efficient up to Shearer’s bound; in particular, Shearer’s bound is the tight convergence criterion for extremal instances [45, 27]. Here, we provide a new convergence criterion (Theorem 1.6) which strictly improves Kolipaka and Szegedy’s result: it is strictly better than Shearer’s bound when the instance is non-extremal, and coincides with Shearer’s bound when the instance is extremal. This criterion, named the intersection LLL, is the core of our proof of Theorem 1.5.

Let $G_{D}=([m],E_{D})$ be a canonical dependency graph and $\bm{p}=(p_{1},\cdots,p_{m})\in(0,1)^{m}$ be a probability vector. Let $\mathcal{M}=\{(i_{1},i_{1}^{\prime}),(i_{2},i_{2}^{\prime}),\cdots\}\subseteq E_{D}$ be a matching of $G_{D}$, and let $\bm{\delta}=(\delta_{i_{1},i_{1}^{\prime}},\delta_{i_{2},i_{2}^{\prime}},\cdots)\in(0,1)^{|\mathcal{M}|}$ be another probability vector. We say that an event set $\mathcal{A}$ is of the setting $(G_{D},\bm{p},\mathcal{M},\bm{\delta})$, and write $\mathcal{A}\sim(G_{D},\bm{p},\mathcal{M},\bm{\delta})$, if $\mathcal{A}\sim(G_{D},\bm{p})$ and $\mathbb{P}(A_{i}\cap A_{i^{\prime}})\geq\delta_{i,i^{\prime}}$ for each pair $(i,i^{\prime})\in\mathcal{M}$. Given $(G_{D},\bm{p},\mathcal{M},\bm{\delta})$, define $\bm{p}^{-}\in(0,1)^{m}$ as follows:

$$\forall i\in[m]:\quad p_{i}^{-}=\begin{cases}p_{i}-\frac{1}{17}\cdot\delta_{i,i^{\prime}}^{2},&\text{if }(i,i^{\prime})\in\mathcal{M}\text{ for some }i^{\prime};\\ p_{i},&\text{otherwise}.\end{cases}$$
Theorem 1.6 (Intersection LLL, informal).

For any $\mathcal{A}\sim(G_{D},\bm{p},\mathcal{M},\bm{\delta})$, the MT algorithm terminates quickly if $\bm{p}^{-}$ is in the Shearer’s bound of $G_{D}$.

The intuition behind Theorem 1.6 is as follows. For any matching $\mathcal{M}$ in $G_{D}$, if the intersection of the events on each edge $(i,i^{\prime})\in\mathcal{M}$ has probability at least $\delta_{i,i^{\prime}}$, then one can subtract $\frac{1}{17}\cdot\delta^{2}_{i,i^{\prime}}$ from the probabilities of both endpoints $i$ and $i^{\prime}$, and the MT algorithm is guaranteed to be efficient whenever the reduced probability vector is in the Shearer’s bound.
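A tiny Python helper (ours, mirroring the definition of $\bm{p}^{-}$ above; all names are illustrative) makes the reduction explicit:

def reduced_vector(p, matching, delta):
    # p: list of event probabilities; matching: list of pairs (i, i');
    # delta: dict mapping each matched pair to the lower bound delta_{i,i'}.
    p_minus = list(p)
    for (i, i2) in matching:
        d = delta[(i, i2)]
        p_minus[i]  -= d * d / 17.0   # p_i^-    = p_i    - delta^2 / 17
        p_minus[i2] -= d * d / 17.0   # p_{i'}^- = p_{i'} - delta^2 / 17
    return p_minus

print(reduced_vector([0.3, 0.3], [(0, 1)], {(0, 1): 0.05}))

One would then feed the reduced vector to a Shearer-bound test (such as the brute-force check sketched in the introduction) to apply Theorem 1.6.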

Remark 1.7.

In many applications of the LLL [49, 31, 30, 50, 28], dependent bad events naturally intersect with each other. For instance, in a CNF formula, if every common variable of two clauses occurs with the same sign in both, then the bad events corresponding to these two clauses are dependent and intersect. Thus our intersection LLL may be capable of improving bounds for these applications. However, currently the improvement is weak, because only the intersections between matched events are considered in Theorem 1.6.

Nevertheless, the primary motivation of this work is to explore the power of the MT algorithm in the general variable LLL setting. This basic problem is very important in itself, besides its potential applications.

1.1.3. Application to lattices

To illustrate the application of Theorem 1.5, we explicitly estimate the efficient region of the MT algorithm on some lattices. For simplicity, we focus on symmetric probabilities, i.e., $\bm{p}=(p,p,\cdots,p)$. Our lower bounds on the gaps between the efficient region of the MT algorithm and the Shearer’s bound are summarized in Table 1. For example, when the canonical dependency graph is the square lattice, the vector $(0.1193,0.1193,\cdots)$ is on the Shearer’s boundary, and the MT algorithm is provably efficient whenever the probability of each event is at most $0.1193+1.858\times 10^{-22}$.

Table 1. Summary of lower bounds on the gaps

Lattice      | Shearer’s bound  | Lower bound on the gap
Square       | 0.1193 [20, 60]  | $1.858\times 10^{-22}$
Hexagonal    | 0.1547 [60]      | $2.597\times 10^{-25}$
Simple Cubic | 0.0744 [19]      | $7.445\times 10^{-23}$

1.2. Technique overview

As mentioned before, Shearer’s bound is the tight criterion for the MT algorithm on extremal instances. Thus, in order to show that the MT algorithm goes beyond Shearer’s bound, we need to take advantage of the intersections between dependent events. Specifically, Theorem 1.5 follows immediately from two results about non-extremal instances. One is the intersection LLL criterion (Theorem 1.6), which goes beyond Shearer’s bound whenever there are intersections between dependent events. The other is a lower bound on the amount of intersection between dependent events for general instances (Theorem 4.1).

1.2.1. Proof overview of Theorem 1.6

Let us first recall Kolipaka and Szegedy’s argument [45], which shows that the MT algorithm is efficient up to Shearer’s bound. We assume that $\{A_{i}\}_{i=1}^{m}$ is a fixed set of events with dependency graph $G_{D}=([m],E_{D})$ and probabilities $\bm{p}=(p_{1},\cdots,p_{m})$. The notion of a witness DAG (abbreviated wdag) is central to their argument. (In [45], the role of witness DAGs was played by “stable set sequences”, but the concepts are essentially the same: there is a natural bijection between stable set sequences and wdags.) A wdag is a DAG in which each node $v$ has a label $L(v)$ from $[m]$ and in which two nodes $v$ and $v^{\prime}$ are connected by an arc if and only if $L(v)=L(v^{\prime})$ or $(L(v),L(v^{\prime}))\in E_{D}$. With a resampling sequence $\bm{s}=s_{1},s_{2},\cdots,s_{T}$ (i.e., the MT algorithm picks the events $A_{s_{1}},A_{s_{2}},\cdots,A_{s_{T}}$ for resampling in this order), we associate a wdag $D_{\bm{s}}$ on node set $\{v_{1},\cdots,v_{T}\}$ as follows: (a) $L(v_{k})=s_{k}$, and (b) there is an arc from $v_{k}$ to $v_{\ell}$ with $k<\ell$ if and only if either $s_{k}=s_{\ell}$ or $(s_{k},s_{\ell})\in E_{D}$ (see the example in Figure 1). We say that a wdag $D$ occurs in the resampling sequence $\bm{s}$ if there is a subset $U$ of nodes in $D_{\bm{s}}$ such that $D$ is the subgraph of $D_{\bm{s}}$ induced by the nodes that have a directed path to $U$ (Figure 1 (d) is an example, where $U=\{v_{4}\}$). A useful observation is that $\mathbb{E}[T]=\sum_{D\in\mathcal{D}}\mathbb{P}_{\bm{s}}[D\text{ occurs in }\bm{s}]$, where $\mathcal{D}$ denotes the set of all single-sink wdags (a.k.a. proper wdags) of $G_{D}$.

We define the weight of a wdag $D$ to be $\Pi_{v\in D}\,p_{L(v)}$. The crucial lemma in Kolipaka and Szegedy’s argument (the idea is from the Moser-Tardos analysis) is that the probability of occurrence of a certain wdag $D$ is upper bounded by its weight. The idea is that we can assume (only for the analysis) that the MT algorithm has a preprocessing step where it prepares an infinite number of independent samples for each variable. These independent samples create a table $\bm{X}$, called the resampling table (see Figure 2 in Section 3.1 for an example). When the MT algorithm decides to resample variable $X_{j}$, it picks a new sample of $X_{j}$ from the resampling table. If a certain wdag $D$ occurs, then for each of its events we can determine a particular set of samples in the resampling table that must satisfy the event; in this case we say that $D$ is consistent with the resampling table $\bm{X}$ and write $D\sim\bm{X}$. Hence, $\mathbb{P}_{\bm{s}}[D\text{ occurs in }\bm{s}]\leq\mathbb{P}_{\bm{X}}[D\sim\bm{X}]=\Pi_{v\in D}\,p_{L(v)}$.

Finally, they beautifully evaluated the summation of the weights of proper wdags, i.e., $\sum_{D\in\mathcal{D}}\Pi_{v\in D}\,p_{L(v)}$, which turns out to converge if and only if $\bm{p}$ is in the Shearer’s bound of $G_{D}$.

Figure 1. (a) a dependency graph $G_{D}$; (b) a resampling sequence; (c) the wdag $D_{\bm{s}}$; (d) a wdag occurring in $\bm{s}$.

Viewing Theorem 1.6 as an improvement of Kolipaka and Szegedy’s result, we begin by providing a tighter upper bound on $\sum_{D\in\mathcal{D}}\mathbb{P}_{\bm{s}}[D\text{ occurs in }\bm{s}]$ when the instance is non-extremal (Theorem 3.7). First, note that for each wdag $D$, there exist selection rules which make $\mathbb{P}_{\bm{s}}[D\text{ occurs in }\bm{s}]=\Pi_{v\in D}\,p_{L(v)}$, so it is impossible to give a better upper bound on $\mathbb{P}_{\bm{s}}[D\text{ occurs in }\bm{s}]$ which holds for all selection rules. Our idea is to group proper wdags and consider the sum of $\mathbb{P}_{\bm{s}}[D\text{ occurs in }\bm{s}]$ over a group. For example, suppose that $A_{1}$ and $A_{2}$ are dependent and $\mathbb{P}[A_{1}\cap A_{2}]\geq\delta_{1,2}$. Let $D_{1}$ denote the proper wdag which consists of only one arc $A_{1}\rightarrow A_{2}$, and let $D_{2}$ denote the proper wdag consisting of only $A_{2}\rightarrow A_{1}$. $D_{1}$ and $D_{2}$ cannot both occur, but they may both be consistent with a given resampling table. So the total weight of $D_{1}$ and $D_{2}$ is an overestimate of the probability that $D_{1}$ or $D_{2}$ occurs. Formally,

\begin{align*}
\mathbb{P}_{\bm{s}}[D_{1}\text{ occurs in }\bm{s}]+\mathbb{P}_{\bm{s}}[D_{2}\text{ occurs in }\bm{s}] &= \mathbb{P}_{\bm{s}}[(D_{1}\text{ occurs in }\bm{s})\vee(D_{2}\text{ occurs in }\bm{s})]\\
&\leq \mathbb{P}_{\bm{X}}[(D_{1}\sim\bm{X})\vee(D_{2}\sim\bm{X})]\\
&= \mathbb{P}_{\bm{X}}[D_{1}\sim\bm{X}]+\mathbb{P}_{\bm{X}}[(D_{2}\sim\bm{X})\wedge(D_{1}\not\sim\bm{X})]\\
&\leq p_{1}p_{2}+p_{1}p_{2}-\delta_{1,2}^{2},
\end{align*}

where the last inequality follows from the Cauchy–Schwarz inequality (see Proposition 3.3). Importantly, the upper bound holds for all selection rules.

It is both crucial and difficult that our improvement over the weight of wdags be “exponential”: since the quantity $\sum_{D\in\mathcal{D}}\Pi_{v\in D}\,p_{L(v)}^{-}$ converges if and only if $\bm{p}^{-}$ is in the Shearer’s bound, constant-factor or even sub-exponential improvements over $\sum_{D\in\mathcal{D}}\Pi_{v\in D}\,p_{L(v)}$ do not suffice to show the desired convergence criterion. Our exponential improvement relies on a delicate grouping and a tricky random partition of the union of the events $D\sim\bm{X}$ across wdags.

We first state how we group proper wdags: define $\mathcal{D}(i,r)$ to be the set of proper wdags whose unique sink node is labelled with $i$ and in which there are exactly $r$ nodes labelled with $i$. Noticing that at most one wdag in $\mathcal{D}(i,r)$ can occur, we have that

$$\sum_{D\in\mathcal{D}(i,r)}\mathbb{P}_{\bm{s}}[D\text{ occurs in }\bm{s}] = \mathbb{P}_{\bm{X}}\left[\bigvee_{D\in\mathcal{D}(i,r)}(D\text{ occurs})\right]\leq\mathbb{P}_{\bm{X}}\left[\bigvee_{D\in\mathcal{D}(i,r)}(D\sim\bm{X})\right].$$

Now, we partition the event $\bigvee_{D\in\mathcal{D}(i,r)}(D\sim\bm{X})$ across the wdags in $\mathcal{D}(i,r)$. The notions of reversible arcs (see Definition 2.4) and of an auxiliary table (see Section 3.1) are two central concepts here. Specifically, an arc $u\rightarrow v$ in a wdag $D$ is said to be reversible if the directed graph obtained from $D$ by reversing the direction of $u\rightarrow v$ is also a wdag. The auxiliary table is a table $\bm{Y}$ of independent fair coins corresponding to the directions of reversible arcs. We say a wdag $D$ is consistent with $(\bm{X},\bm{Y})$, denoted by $D\sim(\bm{X},\bm{Y})$, if (i) $D\sim\bm{X}$, and (ii) for each reversible arc whose direction is not consistent with $\bm{Y}$, the wdag obtained by reversing the arc is not consistent with $\bm{X}$. The crucial lemma (Lemma 3.1) shows that for any fixed assignment $\bm{y}$ of the auxiliary table $\bm{Y}$, $\bigvee_{D\in\mathcal{D}(i,r)}(D\sim\bm{X})=\bigvee_{D\in\mathcal{D}(i,r)}(D\sim(\bm{X},\bm{y}))$. The point is that the events $(D\sim(\bm{X},\bm{y}))$ have much less overlap with each other, so that they can be viewed as an “approximate” partition of the space. By applying a union bound, we get

\begin{align*}
\mathbb{P}_{\bm{X}}\left[\bigvee_{D\in\mathcal{D}(i,r)}(D\sim\bm{X})\right] &= \mathbb{E}_{\bm{Y}}\,\mathbb{P}_{\bm{X}}\left[\bigvee_{D\in\mathcal{D}(i,r)}(D\sim\bm{X})\right]=\mathbb{E}_{\bm{Y}}\,\mathbb{P}_{\bm{X}}\left[\bigvee_{D\in\mathcal{D}(i,r)}(D\sim(\bm{X},\bm{Y}))\right]\\
&\leq \mathbb{E}_{\bm{Y}}\sum_{D\in\mathcal{D}(i,r)}\mathbb{P}_{\bm{X}}\left[D\sim(\bm{X},\bm{Y})\right] = \sum_{D\in\mathcal{D}(i,r)}\mathbb{E}_{\bm{Y}}\,\mathbb{P}_{\bm{X}}\left[D\sim(\bm{X},\bm{Y})\right].
\end{align*}

We are then able to provide an upper bound on $\mathbb{E}_{\bm{Y}}\,\mathbb{P}_{\bm{X}}\left[D\sim(\bm{X},\bm{Y})\right]$ which is “exponentially” smaller than $\Pi_{v\in D}\,p_{L(v)}$ (Lemma 3.4), which completes the proof of Theorem 3.7.

The next step is to show that the tighter upper bound converges when $\bm{p}^{-}$ is in the Shearer’s bound. For each vertex $i$ in the matching $\mathcal{M}$, we “split” vertex $i$ into two new adjacent vertices $i^{\uparrow}$ and $i^{\downarrow}$. Let $G^{\mathcal{M}}$ be the resulting dependency graph (see the example in Figure 3). Define $p^{\mathcal{M}}_{i^{\uparrow}}=p^{\prime}_{i}$ and $p^{\mathcal{M}}_{i^{\downarrow}}=p^{-}_{i}-p^{\prime}_{i}$ (see the definition of $p_{i}^{\prime}$ in Section 2.3). One can see that $(G_{D},\bm{p}^{-})$ and $(G^{\mathcal{M}},\bm{p}^{\mathcal{M}})$ are essentially the same: suppose $\mathcal{A}\sim(G_{D},\bm{p}^{-})$; then for each $i\in\mathcal{M}$, we view $A_{i}$ as the union of two mutually exclusive events $A_{i^{\uparrow}}$ and $A_{i^{\downarrow}}$ whose probabilities are $p_{i}^{\prime}$ and $p_{i}^{-}-p_{i}^{\prime}$ respectively. Such a representation of $\mathcal{A}$ is of the setting $(G^{\mathcal{M}},\bm{p}^{\mathcal{M}})$. Thus, the sum of the weights of proper wdags in the setting $(G_{D},\bm{p}^{-})$ is equal to that in the setting $(G^{\mathcal{M}},\bm{p}^{\mathcal{M}})$ (Proposition 3.9). So it suffices to show that our tighter upper bound is at most the sum of the weights of proper wdags in the setting $(G^{\mathcal{M}},\bm{p}^{\mathcal{M}})$ (Theorem 3.13). Our idea is to construct a mapping which maps each $D\in\mathcal{D}(G_{D})$ to a subset of $\mathcal{D}(G^{\mathcal{M}})$ and satisfies the following:

  • (a) distinct proper wdags of $G_{D}$ are mapped to disjoint subsets of $\mathcal{D}(G^{\mathcal{M}})$; and

  • (b) for each $D\in\mathcal{D}(G_{D})$, the bound in Lemma 3.4 is upper bounded by the sum of the weights of proper wdags over the subset that $D$ is mapped to.

We present such a mapping in Definition 3.11. Conditions (a) and (b) are verified in Theorem 3.12 and Theorem 3.13 respectively.

The idea of constructing a mapping between the wdags of two dependency graphs may be of independent interest, and may be applied elsewhere when one wishes to establish properties of Shearer’s bound.

1.2.2. Proof overview of Theorem 4.1

The proof of Theorem 4.1 mainly consists of two parts. First, we show that there is an elementary event set which approximately achieves the minimum amount of intersection between dependent events (Lemma 4.2). Here, we call an event $A_{i}\in\mathcal{A}$ elementary if for each variable $X_{j}\in\mathrm{vbl}(A_{i})$ there is a subset $S_{j}^{i}$ of the domain of $X_{j}$ such that $A_{i}$ happens if and only if $X_{j}\in S_{j}^{i}$ for all $X_{j}\in\mathrm{vbl}(A_{i})$. We call a set $\mathcal{A}$ of events elementary if every $A_{i}\in\mathcal{A}$ is elementary. Then, for elementary event sets, by applying the AM-GM inequality, we obtain a lower bound on the total amount of overlap on common variables, which further implies a lower bound on the amount of intersection between dependent events (Lemma 4.5).

1.3. Related works

Beck proposed the first constructive LLL, which provides efficient algorithms for finding a perfect object avoiding all “bad” events [7]. His methods were refined and improved by a long line of research [6, 53, 13, 39]. In a groundbreaking work, Moser and Tardos proposed a new algorithm, i.e., Algorithm 1, and proved that it finds such a perfect object under the condition in Theorem 1.1 in the variable setting [54]. Pegden [55] proved that the MT algorithm converges efficiently even under the condition of the cluster expansion local lemma [9]. Kolipaka and Szegedy [45] pushed the efficient region to Shearer’s bound. The phenomenon that the MT algorithm can still be efficient beyond Shearer’s bound was known to exist for sporadic and toy examples [32]. However, that result employs the special structures of the examples and only applies to some restricted variable-generated event systems $\mathcal{A}\sim(G_{D},\bm{p})$. By contrast, the results in this work apply to all variable-generated event systems.

Besides the line of research exploring the efficient region of the MT algorithm, there is a large amount of effort devoted to derandomizing or parallelizing the MT algorithm [54, 11, 34, 8, 23, 12, 35, 33] and to extending the Moser-Tardos techniques beyond the variable setting [38, 2, 41, 5, 3, 52, 42, 4].

There is a line of works studying the gap between non-constructive VLLL and Shearer’s bound [45, 36, 24, 37]. Kolipaka and Szegedy [45] obtained the first example of such a gap, where the canonical dependency graph is a cycle of length 4. The paper [36] showed that Shearer’s bound is not tight for VLLL; more precisely, Shearer’s bound is tight for non-constructive VLLL if and only if the canonical dependency graph is chordal. The first paper to study the gaps quantitatively and systematically is [37], which provides lower bounds on the gap when the canonical dependency graph contains many chordless cycles.

Erdős and Spencer [15] introduced the lopsided LLL, which extends the results in [14] to lopsidependency graphs. The lopsided LLL has many interesting applications in combinatorics and theoretical computer science, such as $k$-SAT [31], random permutations [47], Hamiltonian cycles [1], and matchings on the complete graph [48]. Shearer’s bound is also the tight condition for the lopsided LLL [56].

The LLL has a strong connection to sampling. Guo, Jerrum and Liu [27] proved that the MT algorithm indeed uniformly samples a perfect object if the instance is extremal. For extremal instances, they developed an algorithm called “partial rejection sampling” which resamples in a parallel fashion, since the occurring bad events form an independent set in the dependency graph. In fact, a series of sampling algorithms for specific problems can be seen as this parallel resampling algorithm running in the extremal case [27, 26, 22, 25]. In a celebrated work, Moitra [51] introduced a novel approach that utilizes the LLL to sample $k$-CNF solutions. This approach was then extended by several works [29, 21, 16, 17, 43, 44].

1.4. Organization of the paper.

In Section 2, we recall and introduce some definitions and notations. In Section 3, we prove Theorem 1.6. Section 4 is devoted to the proof of Theorem 4.1, which gives a lower bound on the amount of intersection between dependent events. In Section 5, we prove Theorem 1.5. In Section 6, we provide an explicit lower bound on the gaps between the efficient region of the MT algorithm and Shearer’s bound on periodic Euclidean graphs.

2. Preliminaries

Let $\mathbb{N}=\{0,1,2,\cdots\}$ denote the set of non-negative integers, and let $\mathbb{N}^{+}=\{1,2,\cdots\}$ denote the set of positive integers. For $m\in\mathbb{N}^{+}$, we define $[m]=\{1,\cdots,m\}$. Throughout this section, we fix a canonical dependency graph $G_{D}=([m],E_{D})$.

2.1. Witness DAG

If, for a given run, the MT algorithm picks the events $A_{s_{1}},A_{s_{2}},\ldots,A_{s_{T}}$ for resampling in this order, we say that $\bm{s}=s_{1},s_{2},\ldots,s_{T}$ is a resample sequence. If the algorithm never finishes, the resample sequence is infinite, and in this case we set $T=\infty$.

Definition 2.1 (Witness DAG).

We define a witness DAG (abbreviated wdag) of $G_{D}$ to be a DAG $D$ in which each node $v$ has a label $L(v)$ from $[m]$, and which satisfies the additional condition that for all distinct nodes $v,v^{\prime}\in D$ there is an arc between $v$ and $v^{\prime}$ (in either direction) if and only if $L(v)=L(v^{\prime})$ or $\big(L(v),L(v^{\prime})\big)\in E_{D}$.

We say $D$ is a proper wdag (abbreviated pwdag) if $D$ has only one sink node. Let $\mathcal{D}(G_{D})$ denote the set of pwdags of $G_{D}$.

Given a resampling sequence $\bm{s}=s_{1},s_{2},\ldots,s_{T}$, we associate a wdag $D_{\bm{s}}$ on the node set $\{v_{1},\ldots,v_{T}\}$ such that (i) $L(v_{k})=s_{k}$ and (ii) $v_{k}\rightarrow v_{\ell}$ with $k<\ell$ is an arc of $D_{\bm{s}}$ if and only if either $s_{k}=s_{\ell}$ or $(s_{k},s_{\ell})\in E_{D}$. See Figure 1 for an example of $D_{\bm{s}}$.

Given a wdag $D$ and a set $U$ of nodes of $D$, we define $D(U)$ to be the subgraph induced by all nodes which have a directed path to some $u\in U$. Note that $D(U)$ is also a wdag. We say that $H$ is a prefix of $D$, denoted by $H\unlhd D$, if $H=D(U)$ for some node set $U$.
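Computationally, $D(U)$ is just a reverse-reachability computation; the following Python sketch (ours, with illustrative names) returns the node set of the prefix:

def prefix(nodes, arcs, U):
    # nodes: set of node ids; arcs: set of (u, v) pairs; U: subset of nodes.
    # Returns the node set of D(U) (U itself included, via trivial paths);
    # the arcs of D(U) are those of D inside the returned set.
    rev = {v: set() for v in nodes}
    for (u, v) in arcs:
        rev[v].add(u)                    # reverse adjacency
    reach, stack = set(U), list(U)
    while stack:                         # DFS backwards from U
        v = stack.pop()
        for u in rev[v]:
            if u not in reach:
                reach.add(u)
                stack.append(u)
    return reach

arcs = {(1, 2), (2, 4), (3, 4)}
print(prefix({1, 2, 3, 4}, arcs, {2}))   # {1, 2}: nodes with a path to node 2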

Definition 2.2.

We say a wdag $D$ occurs in a resampling sequence $\bm{s}$ if $D\unlhd D_{\bm{s}}$. Let $\chi_{D}$ be the indicator variable of the event that $D$ occurs in $\bm{s}$.

Similar to Lemma 12 in [45], we have $T=\sum_{D\in\mathcal{D}(G_{D})}\chi_{D}$. For $i\in[m]$ and $r\in\mathbb{N}^{+}$, define $\mathcal{D}(i,r)$ to be the set of pwdags whose unique sink node is labelled with $i$ and in which there are exactly $r$ nodes labelled with $i$. Let $\chi_{\mathcal{D}(i,r)}$ be the indicator variable of the event that some $D\in\mathcal{D}(i,r)$ occurs in $\bm{s}$. It is easy to see that at most one pwdag in $\mathcal{D}(i,r)$ can occur in $\bm{s}$. Thus $\chi_{\mathcal{D}(i,r)}=\sum_{D\in\mathcal{D}(i,r)}\chi_{D}$, which further implies:

Fact 2.3.

$T=\sum_{i\in[m]}\sum_{r\in\mathbb{N}^{+}}\chi_{\mathcal{D}(i,r)}$.

2.2. Reversible arc

In the rest of this section, we fix a matching $\mathcal{M}\subseteq E_{D}$ of $G_{D}$. Given $i\in[m]$, with a slight abuse of notation, we sometimes say $i\in\mathcal{M}$ if there is some $i^{\prime}\in[m]$ such that $(i,i^{\prime})\in\mathcal{M}$.

Definition 2.4 (Reversibility).

We say that an arc $u\rightarrow v$ is reversible in a wdag $D$ if the directed graph obtained from $D$ by reversing the direction of the arc is still a DAG.

Furthermore, we say that $u\rightarrow v$ is $\mathcal{M}$-reversible in $D$ if $u\rightarrow v$ is reversible in $D$ and $(L(u),L(v))\in\mathcal{M}$.

By definition, we have the following two observations.

Fact 2.5.

$u\rightarrow v$ is reversible in $D$ if and only if it is the unique directed path from $u$ to $v$ in $D$.

Fact 2.6.

If $u\rightarrow v$ is reversible in a wdag $D$ of $G_{D}$, then the directed graph obtained from $D$ by reversing the direction of $u\rightarrow v$ is also a wdag of $G_{D}$.
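Fact 2.5 also gives a direct way to test reversibility: $v$ must become unreachable from $u$ once the arc itself is deleted. A minimal Python sketch (ours, with illustrative names):

def is_reversible(arcs, u, v):
    # Tests whether the arc u -> v is reversible, i.e. whether u -> v is the
    # unique directed path from u to v (Fact 2.5).
    assert (u, v) in arcs
    out = {}
    for (a, b) in set(arcs) - {(u, v)}:
        out.setdefault(a, set()).add(b)
    seen, stack = {u}, [u]
    while stack:                         # DFS from u avoiding the arc u -> v
        a = stack.pop()
        for b in out.get(a, ()):
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return v not in seen                 # no other u -> v path: reversible

arcs = {(1, 2), (1, 3), (3, 2)}
print(is_reversible(arcs, 1, 2))         # False: 1 -> 3 -> 2 is another path
print(is_reversible(arcs, 1, 3))         # True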

Given a pwdag $D=(V,E,L)$, define

$$\mathcal{V}(D)\triangleq\{v\in V:\exists u\in V\text{ such that }u\rightarrow v\text{ or }v\rightarrow u\text{ is }\mathcal{M}\text{-reversible in }D\}$$

to be the set of nodes participating in $\mathcal{M}$-reversible arcs, and let $\overline{\mathcal{V}}(D)\triangleq V\setminus\mathcal{V}(D)$. For $i\in[m]$, define $\mathcal{V}(D,i)\triangleq\mathcal{V}(D)\cap\{v:L(v)=i\}$.

2.3. Other notations

Let $\bm{p}=(p_{1},\cdots,p_{m})\in(0,1]^{m}$ and $\bm{\delta}\in(0,1)^{\mathcal{M}}$ be two probability vectors. Recall that $\bm{p}^{-}=(p_{1}^{-},\cdots,p_{m}^{-})$ is defined as

(1) $$\forall i\in[m]:\quad p_{i}^{-}=\begin{cases}p_{i}-\frac{\delta_{i,i^{\prime}}^{2}}{17}&\text{if }(i,i^{\prime})\in\mathcal{M}\text{ for some }i^{\prime},\\ p_{i}&\text{otherwise}.\end{cases}$$

For each $i\in[m]$ such that $(i,i^{\prime})\in\mathcal{M}$ for some $i^{\prime}\in[m]$, define

$$c_{i}\triangleq\frac{\delta_{i,i^{\prime}}^{2}}{8p_{i}p_{i^{\prime}}}\quad\quad\text{ and }\quad\quad p_{i}^{\prime}\triangleq p_{i}(1-c_{i})=p_{i}-\frac{\delta_{i,i^{\prime}}^{2}}{8p_{i^{\prime}}}.$$
Fact 2.7.

$p^{-}_{i}+p^{-}_{i^{\prime}}(p^{-}_{i}-p^{\prime}_{i})\geq p_{i}$ for each $(i,i^{\prime})\in\mathcal{M}$.
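Fact 2.7 is stated without proof here; the following verification sketch is ours, not from the paper, and assumes $\delta_{i,i^{\prime}}\leq\min\{p_{i},p_{i^{\prime}}\}<1$ (which holds for any realizable setting, since $\delta_{i,i^{\prime}}\leq\mathbb{P}(A_{i}\cap A_{i^{\prime}})$). Writing $\delta$ for $\delta_{i,i^{\prime}}$ and $p$ for $p_{i^{\prime}}$, and recalling $p^{-}_{i}=p_{i}-\delta^{2}/17$ and $p^{-}_{i}-p^{\prime}_{i}=\delta^{2}/(8p)-\delta^{2}/17=\delta^{2}(17-8p)/(136p)$, the claim reduces to $p^{-}_{i^{\prime}}(p^{-}_{i}-p^{\prime}_{i})\geq\delta^{2}/17$, and indeed
\begin{align*}
p^{-}_{i^{\prime}}\big(p^{-}_{i}-p^{\prime}_{i}\big)-\frac{\delta^{2}}{17} &= \Big(p-\frac{\delta^{2}}{17}\Big)\cdot\frac{\delta^{2}(17-8p)}{136\,p}-\frac{\delta^{2}}{17} = \frac{\delta^{2}}{2312\,p}\Big(17p(9-8p)-\delta^{2}(17-8p)\Big)\\
&\geq \frac{\delta^{2}}{2312\,p}\cdot p\,(136-128p)\;\geq\;0,
\end{align*}
where the first inequality uses $\delta^{2}\leq\delta\leq p$ together with $17-8p>0$.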

3. Proof of Theorem 1.6

The proof of Theorem 1.6 consists of two parts. First, we provide a tighter upper bound on the complexity of the MT algorithm (Section 3.1). Then, we show that this tighter upper bound converges if $\bm{p}^{-}$ is in the Shearer’s bound of $G_{D}$ (Section 3.2).

3.1. A tighter upper bound on the complexity of MT algorithm

In this subsection, we prove Theorem 3.7, which follows immediately from Lemma 3.1 and Lemma 3.4. We first recall and introduce some concepts and notations.

Resampling Table. One key analytical technique of Moser and Tardos [54] is to precompute the randomness in a resampling table $\bm{X}$. Specifically, we can assume (only for the analysis) that the MT algorithm has a preprocessing step where it draws an infinite number of independent samples $X_{j}^{1},X_{j}^{2},\cdots$ for each variable $X_{j}$. These independent samples create a table $\bm{X}=(X_{j}^{k})_{j\in[n],k\in\mathbb{N}^{+}}$, called the resampling table (see Figure 2). The MT algorithm takes the first column as the initial assignment of $X_{1},\cdots,X_{n}$. Then, when $X_{j}$ is to be resampled, the MT algorithm moves right in the row corresponding to $X_{j}$ and picks the next sample.
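The following Python sketch (ours; the lazy table and all names are illustrative assumptions) implements the MT algorithm driven by a resampling table, with a row pointer that moves right whenever a variable is resampled:

import random

class ResamplingTable:
    def __init__(self, sample, seed=0):
        self.rng = random.Random(seed)
        self.sample = sample      # draws one value of a variable
        self.rows = {}            # j -> samples X_j^1, X_j^2, ... (built lazily)
        self.ptr = {}             # j -> index of the current sample

    def current(self, j):
        row = self.rows.setdefault(j, [self.sample(self.rng)])
        return row[self.ptr.setdefault(j, 0)]

    def resample(self, j):
        self.ptr[j] += 1          # move right in row j
        if self.ptr[j] == len(self.rows[j]):
            self.rows[j].append(self.sample(self.rng))

def mt_with_table(n, events, table):
    x = {j: table.current(j) for j in range(n)}   # first column of the table
    while True:
        bad = [i for i, (vbl, holds) in enumerate(events) if holds(x)]
        if not bad:
            return x
        for j in events[bad[0]][0]:               # resample vbl(A_i)
            table.resample(j)
            x[j] = table.current(j)

Run on any instance in the (vbl, holds) format used earlier, e.g. mt_with_table(n, events, ResamplingTable(lambda r: r.random() < 0.5)); the returned assignment is distributed exactly as in Algorithm 1, since the table merely precomputes the randomness.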

Consistency with the resampling table. For a wdag $D$, a node $v$, and a variable $X_{j}\in\mathrm{vbl}(A_{L(v)})$, we define

$$\mathcal{L}(D,v,j)\triangleq|\{u:\text{there is a directed path from }u\text{ to }v\text{ in }D\text{ and }X_{j}\in\mathrm{vbl}(A_{L(u)})\}|+1.$$

Moreover, let $\bm{X}_{D,v}\triangleq\{X_{j}^{\mathcal{L}(D,v,j)}:X_{j}\in\mathrm{vbl}(A_{L(v)})\}$. We say that $D$ is consistent with $\bm{X}$, denoted by $D\sim\bm{X}$, if for each node $v$ in $D$, the event $A_{L(v)}$ holds on $\bm{X}_{D,v}$. Intuitively, if $D$ occurs, then $\bm{X}_{D,v}$ is the assignment of $\mathrm{vbl}(A_{L(v)})$ just before the time that the MT algorithm picks the event corresponding to $v$ to resample; hence $A_{L(v)}$ must hold on $\bm{X}_{D,v}$. We sometimes write $\mathcal{L}(v,j)$ and $\bm{X}_{v}$ instead of $\mathcal{L}(D,v,j)$ and $\bm{X}_{D,v}$, respectively, if $D$ is clear from the context. Besides, we use $\mathcal{D}(i,r)\sim\bm{X}$ to denote that there is some $D\in\mathcal{D}(i,r)$ such that $D\sim\bm{X}$.

Figure 2. The left is a resampling table with four variables $X_{1},\cdots,X_{4}$. The right is an auxiliary table, where $\mathcal{M}=\{(1,2),(3,4),(5,6),(7,8)\}$.

Auxiliary Table. We introduce another central concept in the proof of Theorem 3.7, called the auxiliary table, which is a table of independent fair coins. Specifically, for each pair $(i,i^{\prime})\in\mathcal{M}$, we draw an infinite number of independent fair coins $Y_{i,i^{\prime}}^{1},Y_{i,i^{\prime}}^{2},\cdots$, where $\mathbb{P}(Y_{i,i^{\prime}}^{k}=i)=\mathbb{P}(Y_{i,i^{\prime}}^{k}=i^{\prime})=1/2$. These independent coins form the auxiliary table $\bm{Y}=(Y_{i,i^{\prime}}^{k})_{(i,i^{\prime})\in\mathcal{M},k\in\mathbb{N}^{+}}$ (see Figure 2). The auxiliary table is used to encode the directions of $\mathcal{M}$-reversible arcs, according to which we partition the event $\bigvee_{D\in\mathcal{D}(i,r)}(D\sim\bm{X})$.

Consistency with the resampling table and the auxiliary table. We need some notations about reversible arcs. Suppose $D$ has a unique sink node $w$ and $u\rightarrow v$ is reversible in $D$. Let $D^{\prime}$ be the DAG obtained from $D$ by reversing the direction of $u\rightarrow v$. We define $\varphi(D,u,v)\triangleq D^{\prime}(\{w\})$; in other words, $\varphi(D,u,v)$ is the prefix of $D^{\prime}$ with unique sink node $w$. Given $(i,i^{\prime})\in\mathcal{M}$ and a pwdag $D$, let $\mathrm{List}(D,i,i^{\prime})$ denote the sequence listing all nodes in $D$ with label $i$ or $i^{\prime}$ in a topological order of $D$ (it is easy to see that $\mathrm{List}(D,i,i^{\prime})$ is well defined, i.e., all topological orderings of $D$ induce the same $\mathrm{List}(D,i,i^{\prime})$). Given a node $v$ in $D$ with $(L(v),i)\in\mathcal{M}$ (because $\mathcal{M}$ is a matching, there is at most one such $i$), we define

$$\lambda(v,D)\triangleq|\{u:(u\rightarrow v\text{ is in }D)\land(L(u)\in\{i,L(v)\})\}|+1$$

to be the order of $v$ in $\mathrm{List}(D,L(v),i)$. For simplicity of notation, we write $\lambda(v)$ instead of $\lambda(v,D)$ if $D$ is clear from the context.

Given a wdag $D$, we say an $\mathcal{M}$-reversible arc $u\rightarrow v$ is inconsistent with the auxiliary table $\bm{Y}$ if $Y_{L(u),L(v)}^{\lambda(u)}=L(v)$. We say $D$ is consistent with $(\bm{X},\bm{Y})$, denoted by $D\sim(\bm{X},\bm{Y})$, if (i) $D\sim\bm{X}$ and (ii) for any $\mathcal{M}$-reversible arc $u\rightarrow v$ inconsistent with $\bm{Y}$, $\varphi(D,u,v)\not\sim\bm{X}$. We say $\mathcal{D}(i,r)\sim(\bm{X},\bm{Y})$ if there is some $D\in\mathcal{D}(i,r)$ such that $D\sim(\bm{X},\bm{Y})$.

The intuition behind the notion of “consistency” is as follows. Suppose $u\rightarrow v$ is an $\mathcal{M}$-reversible arc in $D$, and both $D$ and $\varphi(D,u,v)$ are consistent with the resampling table; still, $D$ and $\varphi(D,u,v)$ cannot both occur. The auxiliary table decides to which of $D$ and $\varphi(D,u,v)$ we assign the event $(D\sim\bm{X})\wedge(\varphi(D,u,v)\sim\bm{X})$.

Lemma 3.1.

For each $i\in[m]$ and $r\in\mathbb{N}^{+}$, $\mathbb{P}_{\bm{X}}[\mathcal{D}(i,r)\sim\bm{X}]=\mathbb{P}_{\bm{X},\bm{Y}}[\mathcal{D}(i,r)\sim(\bm{X},\bm{Y})]$.

Proof.

Fix an arbitrary assignment $\bm{x}$ of $\bm{X}$ and an arbitrary assignment $\bm{y}$ of $\bm{Y}$. Suppose $\mathcal{D}(i,r)\sim\bm{x}$, i.e., there exists $D_{0}\in\mathcal{D}(i,r)$ such that $D_{0}\sim\bm{x}$. We will show that there must exist some $D\in\mathcal{D}(i,r)$ such that $D\sim(\bm{x},\bm{y})$. This will imply the conclusion immediately.

We apply the following procedure to find such a pwdag $D\in\mathcal{D}(i,r)$.

Initially, $k=0$;
while $\exists$ an $\mathcal{M}$-reversible arc $u_{k}\rightarrow v_{k}$ in $D_{k}$ inconsistent with $\bm{y}$ such that $\varphi(D_{k},u_{k},v_{k})\sim\bm{x}$ do
      let $D_{k+1}:=\varphi(D_{k},u_{k},v_{k})$ and $k:=k+1$;
Return $D_{k}$;

By induction on $k$, it is easy to check that $D_{k}\sim\bm{x}$ and $D_{k}\in\mathcal{D}(i,r)$ for each $k$. Furthermore, if the procedure terminates, then in the final wdag $D$, for every $\mathcal{M}$-reversible arc $u\rightarrow v$ inconsistent with $\bm{y}$, we have $\varphi(D,u,v)\not\sim\bm{x}$. So $D\sim(\bm{x},\bm{y})$. In the following, we show that the procedure always terminates, which finishes the proof.

Note that each $D_{k}$ has no more nodes than $D_{0}$, and that there are only finitely many wdags in $\mathcal{D}(i,r)$ with no more nodes than $D_{0}$; so it suffices to prove that each wdag appears at most once in the procedure.

By contradiction, assume $D_{j}=D_{k}$ for some $j<k$. Recall that $u_{j}\rightarrow v_{j}$ is reversible in $D_{j}$ and inconsistent with $\bm{y}$. So $y_{L(u_{j}),L(v_{j})}^{\lambda(v_{j},D_{j})-1}=y_{L(u_{j}),L(v_{j})}^{\lambda(u_{j},D_{j})}=L(v_{j})$.

Let $D_{\ell}$ be the last wdag in $D_{j+1},\cdots,D_{k}$ such that $\lambda(v_{j},D_{\ell})<\lambda(v_{j},D_{j})$. Observing that $\lambda(v_{j},D_{j+1})=\lambda(v_{j},D_{j})-1$, such a $D_{\ell}$ must exist. By $\lambda(v_{j},D_{k})=\lambda(v_{j},D_{j})$, we have $\lambda(v_{j},D_{\ell})=\lambda(v_{j},D_{j})-1$ and $\lambda(v_{j},D_{\ell+1})=\lambda(v_{j},D_{j})$. Therefore, $\lambda(v_{j},D_{\ell+1})=\lambda(v_{j},D_{\ell})+1$. Combining this with the fact that $u_{\ell}\rightarrow v_{\ell}$ is the inconsistent arc in $D_{\ell}$ which is reversed in $D_{\ell+1}$, we have $u_{\ell}=v_{j}$, $\{L(u_{j}),L(v_{j})\}=\{L(u_{\ell}),L(v_{\ell})\}\in\mathcal{M}$ and $y_{L(u_{\ell}),L(v_{\ell})}^{\lambda(u_{\ell},D_{\ell})}=L(v_{\ell})$. Thus $L(v_{\ell})=L(u_{j})$ and $y_{L(u_{\ell}),L(v_{\ell})}^{\lambda(u_{\ell},D_{\ell})}=L(u_{j})$. Note that $\lambda(v_{\ell},D_{\ell})=1+\lambda(u_{\ell},D_{\ell})=1+\lambda(v_{j},D_{\ell})$. Combining this with $\lambda(u_{j},D_{j})=\lambda(v_{j},D_{j})-1$, we have $\lambda(u_{\ell},D_{\ell})=\lambda(u_{j},D_{j})$, and hence $y_{L(u_{\ell}),L(v_{\ell})}^{\lambda(u_{j},D_{j})}=L(u_{j})$. This contradicts $y_{L(u_{j}),L(v_{j})}^{\lambda(u_{j},D_{j})}=L(v_{j})$.

The following two propositions will be used in the proof of Lemma 3.4. The first proposition is an easy observation, and the second is a direct application of the Cauchy-Schwarz inequality. For the sake of completeness, we present their proofs in the appendix.

Proposition 3.2.

Given any wdag $D$, there exists a set $\mathcal{P}$ of disjoint $\mathcal{M}$-reversible arcs (we say two arcs $u\rightarrow v$ and $u^{\prime}\rightarrow v^{\prime}$ are disjoint if their node sets are disjoint, i.e., $\{u,v\}\cap\{u^{\prime},v^{\prime}\}=\emptyset$) such that for each $i\in\mathcal{M}$,

$$|\{v:\exists u\text{ such that }u\rightarrow v\text{ or }v\rightarrow u\text{ is in }\mathcal{P}\}\cap\{v:L(v)=i\}|\geq\frac{1}{2}\cdot|\mathcal{V}(D,i)|.$$
Proposition 3.3.

Suppose $X,Y$ and $Z$ are three independent random variables, $A$ is an event determined by $\{X,Y\}$, and $A^{\prime}$ is an event determined by $\{Y,Z\}$. Let $X_{1},Y_{1},Y_{2},Z_{1}$ be four independent samples of $X,Y,Y,Z$, respectively. Then the following holds with probability at most $\mathbb{P}(A)\mathbb{P}(A^{\prime})-\mathbb{P}(A\cap A^{\prime})^{2}$:

  • $A$ is true on $(X_{1},Y_{1})$, $A^{\prime}$ is true on $(Y_{2},Z_{1})$, and

  • either $A$ is false on $(X_{1},Y_{2})$ or $A^{\prime}$ is false on $(Y_{1},Z_{1})$.
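A quick Monte-Carlo sanity check of Proposition 3.3 (ours; the toy events are illustrative assumptions): take $X,Y,Z$ to be uniform bits, $A=\{X=Y\}$ and $A^{\prime}=\{Y=Z\}$, so $\mathbb{P}(A)=\mathbb{P}(A^{\prime})=1/2$, $\mathbb{P}(A\cap A^{\prime})=1/4$, and the bound is $1/4-1/16=3/16$:

import random

def trial(rng):
    x1, y1, y2, z1 = (rng.randrange(2) for _ in range(4))
    holds_A  = lambda x, y: x == y        # A  determined by {X, Y}
    holds_A2 = lambda y, z: y == z        # A' determined by {Y, Z}
    return (holds_A(x1, y1) and holds_A2(y2, z1)    # A on (X1,Y1), A' on (Y2,Z1)
            and (not holds_A(x1, y2) or not holds_A2(y1, z1)))

rng = random.Random(0)
N = 200_000
freq = sum(trial(rng) for _ in range(N)) / N
print(f"empirical {freq:.4f} <= bound {3 / 16:.4f}")  # about 0.125 <= 0.1875

Here the exact probability of the described event is $1/8$, comfortably below the bound $3/16$.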

Now, we are ready to show Lemma 3.4.

Lemma 3.4.

For each pwdag $D$,

$$\mathbb{P}[D\sim(\bm{X},\bm{Y})]\leq\left(\prod_{v\in\overline{\mathcal{V}}(D)}p_{L(v)}\right)\left(\prod_{v\in\mathcal{V}(D)}p^{\prime}_{L(v)}\right).$$
Proof.

Let $\mathcal{P}$ be the set of disjoint $\mathcal{M}$-reversible arcs given by Proposition 3.2. Let $V(\mathcal{P})$ denote the set of nodes which appear in $\mathcal{P}$, and let $\overline{V(\mathcal{P})}$ consist of the other nodes. Proposition 3.2 says that for each $i\in\mathcal{M}$,

$$|V(\mathcal{P})\cap\{v:L(v)=i\}|\geq\frac{1}{2}\cdot|\mathcal{V}(D,i)|.$$

For each $v\in\overline{V(\mathcal{P})}$, let $B_{v}$ denote the event that $A_{L(v)}$ holds on $\bm{X}_{v}$. It is easy to see that $\mathbb{P}[B_{v}]=p_{L(v)}$. Besides,

Claim 3.5.

If $D\sim(\bm{X},\bm{Y})$, then $B_{v}$ holds for each $v\in\overline{V(\mathcal{P})}$.

Proof.

Note that $\bm{X}_{v}$ is the assignment of $\mathrm{vbl}(A_{L(v)})$ just before the time that the MT algorithm picks the event corresponding to $v$ to resample. The MT algorithm decides to pick $A_{L(v)}$ only if $A_{L(v)}$ holds. Hence $A_{L(v)}$ must hold on $\bm{X}_{v}$. ∎

Let $u\rightarrow v$ be an arc in $\mathcal{P}$, where $L(u)=i$ and $L(v)=i^{\prime}$. Then, by the definition of $\mathcal{P}$, $u\rightarrow v$ is reversible in $D$. Let $D^{\prime}$ be the wdag obtained by reversing the direction of $u\rightarrow v$ in $D$. Recalling the definition of $\bm{X}_{D^{\prime},v}$, one can verify that

$$\bm{X}_{D^{\prime},u}=\left\{X_{j}^{\mathcal{L}(v,j)}:X_{j}\in\mathrm{vbl}(A_{i})\cap\mathrm{vbl}(A_{i^{\prime}})\right\}\cup\left\{X_{j}^{\mathcal{L}(u,j)}:X_{j}\in\mathrm{vbl}(A_{i})\setminus\mathrm{vbl}(A_{i^{\prime}})\right\}$$

and

$$\bm{X}_{D^{\prime},v}=\left\{X_{j}^{\mathcal{L}(u,j)}:X_{j}\in\mathrm{vbl}(A_{i})\cap\mathrm{vbl}(A_{i^{\prime}})\right\}\cup\left\{X_{j}^{\mathcal{L}(v,j)}:X_{j}\in\mathrm{vbl}(A_{i^{\prime}})\setminus\mathrm{vbl}(A_{i})\right\}.$$

For simplicity, let $\lambda:=\lambda(u,D)$. We define $B_{u,v}$ to be the event that the following hold:

  • (a) $A_{i}$ holds on $\bm{X}_{u}$, and $A_{i^{\prime}}$ holds on $\bm{X}_{v}$;

  • (b) if $Y_{i,i^{\prime}}^{\lambda}=i^{\prime}$, then either $A_{i}$ is false on $\bm{X}_{D^{\prime},u}$ or $A_{i^{\prime}}$ is false on $\bm{X}_{D^{\prime},v}$.

Conditioned on $Y_{i,i^{\prime}}^{\lambda}=i$, $B_{u,v}$ happens with probability $p_{i}p_{i^{\prime}}$. Conditioned on $Y_{i,i^{\prime}}^{\lambda}=i^{\prime}$, by Proposition 3.3, $B_{u,v}$ happens with probability at most $p_{i}p_{i^{\prime}}-\delta^{2}_{i,i^{\prime}}$. Thus,

\begin{align*}
\mathbb{P}[B_{u,v}] &\leq \mathbb{P}[Y_{i,i^{\prime}}^{\lambda}=i]\,p_{i}p_{i^{\prime}}+\mathbb{P}[Y_{i,i^{\prime}}^{\lambda}=i^{\prime}]\left(p_{i}p_{i^{\prime}}-\delta^{2}_{i,i^{\prime}}\right)=\frac{1}{2}\cdot p_{i}p_{i^{\prime}}+\frac{1}{2}\cdot\left(p_{i}p_{i^{\prime}}-\delta^{2}_{i,i^{\prime}}\right)\\
&\leq p_{i}p_{i^{\prime}}(1-2c_{i})(1-2c_{i^{\prime}}).
\end{align*}
Claim 3.6.

If D(𝐗,𝐘)D\sim(\bm{X},\bm{Y}), then Bu,vB_{u,v} holds for each uvu\rightarrow v in 𝒫\mathcal{P}.

Proof.

Suppose D(𝑿,𝒀)D\sim(\bm{X},\bm{Y}). Similar to the argument in Claim 3.5, we can see that Item (a) holds. In the following, we show Item (b) holds.

By contradiction, assume Yi,iλ=iY_{i,i^{\prime}}^{\lambda}=i^{\prime}, AiA_{i} holds on 𝑿D,u\bm{X}_{D^{\prime},u}, and AiA_{i^{\prime}} holds on 𝑿D,v\bm{X}_{D^{\prime},v}. Then, we have uvu\rightarrow v in DD is inconsistent with 𝒀\bm{Y} and D𝑿D^{\prime}\sim\bm{X}. Thus, φ(D,u,v)𝑿\varphi(D,u,v)\sim\bm{X} since φ(D,u,v)\varphi(D,u,v) is a prefix of DD^{\prime}. By definition, we have D≁(𝑿,𝒀)D\not\sim(\bm{X},\bm{Y}), a contradiction. ∎

Since the events {Bv:vV(𝒫)¯}\{B_{v}:v\in\overline{V(\mathcal{P})}\} and {Bu,v:uv is in 𝒫}\{B_{u,v}:u\rightarrow v\text{ is in }\mathcal{P}\} depend on distinct entries of 𝑿\bm{X} and 𝒀\bm{Y}, they are mutually independent. Therefore,

[D(𝑿,𝒀)]\displaystyle\mathbb{P}\left[D\sim(\bm{X},\bm{Y})\right] [(wV(𝒫¯)Bw)(uv is in 𝒫Bu,v)]=(wV(𝒫)¯(Bw))(uv is in 𝒫(Bu,v))\displaystyle\leq\mathbb{P}\left[\left(\bigcap_{w\in\overline{V(\mathcal{P}})}B_{w}\right)\bigcap\left(\bigcap_{u\rightarrow v\text{ is in }\mathcal{P}}B_{u,v}\right)\right]=\left(\prod_{w\in\overline{V(\mathcal{P})}}\mathbb{P}(B_{w})\right)\left(\prod_{u\rightarrow v\text{ is in }\mathcal{P}}\mathbb{P}(B_{u,v})\right)
\displaystyle\leq\left(\prod_{w\in\overline{V(\mathcal{P})}}p_{L(w)}\right)\left(\prod_{u\rightarrow v\text{ is in }\mathcal{P}}p_{L(u)}p_{L(v)}\left(1-2c_{L(u)}\right)\left(1-2c_{L(v)}\right)\right)
\displaystyle=\left(\prod_{v\text{ in }D}p_{L(v)}\right)\cdot\left(\prod_{i\in[m]}\left(1-2c_{i}\right)^{|V(\mathcal{P})\cap\{v:L(v)=i\}|}\right)
(v in DpL(v))(i[m](12ci)|𝒱(i)|/2)(v in DpL(v))(i[m](1ci)|𝒱(i)|)\displaystyle\leq\left(\prod_{v\text{ in }D}p_{L(v)}\right)\cdot\left(\prod_{i\in[m]}\left(1-2c_{i}\right)^{|\mathcal{V}(i)|/2}\right)\leq\left(\prod_{v\text{ in }D}p_{L(v)}\right)\cdot\left(\prod_{i\in[m]}\left(1-c_{i}\right)^{|\mathcal{V}(i)|}\right)
=(v𝒱¯(D)pL(v))(v𝒱(D)pL(v)).\displaystyle=\left(\prod_{v\in\overline{\mathcal{V}}(D)}p_{L(v)}\right)\left(\prod_{v\in\mathcal{V}(D)}p^{\prime}_{L(v)}\right).

Now we are ready to prove the main theorem of this subsection.

Theorem 3.7.

𝔼[T]D𝒟(GD)(v𝒱¯(D)pL(v))(v𝒱(D)pL(v))\mathbb{E}[T]\leq\sum_{D\in\mathcal{D}(G_{D})}\big{(}\prod_{v\in\overline{\mathcal{V}}(D)}p_{L(v)}\big{)}\big{(}\prod_{v\in\mathcal{V}(D)}p^{\prime}_{L(v)}\big{)}.

Proof.

First, according to Lemmas 3.1 and  3.4,

[χ𝒟(i,r)]\displaystyle\mathbb{P}[\chi_{\mathcal{D}(i,r)}] [𝒟(i,r)𝑿]=[𝒟(i,r)(𝑿,𝒀)]D𝒟(i,r)[D(𝑿,𝒀)]\displaystyle\leq\mathbb{P}[\mathcal{D}(i,r)\sim\bm{X}]=\mathbb{P}[\mathcal{D}(i,r)\sim(\bm{X},\bm{Y})]\leq\sum_{D\in\mathcal{D}(i,r)}\mathbb{P}[D\sim(\bm{X},\bm{Y})]
D𝒟(i,r)(v𝒱¯(D)pL(v))(v𝒱(D)pL(v)).\displaystyle\leq\sum_{D\in\mathcal{D}(i,r)}\left(\prod_{v\in\overline{\mathcal{V}}(D)}p_{L(v)}\right)\left(\prod_{v\in\mathcal{V}(D)}p^{\prime}_{L(v)}\right).

Then, by Fact 2.3 and the above inequality, we have

𝔼[T]\displaystyle\mathbb{E}[T] =i[m]r+[χ𝒟(i,r)]i[m]r+D𝒟(i,r)(v𝒱¯(D)pL(v))(v𝒱(D)pL(v))\displaystyle=\sum_{i\in[m]}\sum_{r\in\mathbb{N}^{+}}\mathbb{P}[\chi_{\mathcal{D}(i,r)}]\leq\sum_{i\in[m]}\sum_{r\in\mathbb{N}^{+}}\sum_{D\in\mathcal{D}(i,r)}\left(\prod_{v\in\overline{\mathcal{V}}(D)}p_{L(v)}\right)\left(\prod_{v\in\mathcal{V}(D)}p^{\prime}_{L(v)}\right)
D𝒟(GD)(v𝒱¯(D)pL(v))(v𝒱(D)pL(v)).\displaystyle\leq\sum_{D\in\mathcal{D}(G_{D})}\left(\prod_{v\in\overline{\mathcal{V}}(D)}p_{L(v)}\right)\left(\prod_{v\in\mathcal{V}(D)}p^{\prime}_{L(v)}\right).

3.2. Mapping between wdags

In this section, we will prove Theorem 3.13, which provides an upper bound on 𝔼[T]\mathbb{E}[T] in terms of 𝒑\bm{p}^{-}.

Definition 3.8 (Homomorphic dependency graph).

Given a dependency graph GD=([m],ED)G_{D}=([m],E_{D}) and a matching \mathcal{M} of GDG_{D}, we define a graph G=(V,E)G^{\mathcal{M}}=(V^{\mathcal{M}},E^{\mathcal{M}}) homomorphic to GDG_{D} with respect to \mathcal{M} as follows.

  • V=[m]{i0,i1:(i0,i1)}{i0,i0,i1,i1:(i0,i1)}V^{\mathcal{M}}=[m]\setminus\{i_{0},i_{1}:(i_{0},i_{1})\in\mathcal{M}\}\cup\{i_{0}^{\uparrow},i_{0}^{\downarrow},i_{1}^{\uparrow},i_{1}^{\downarrow}:(i_{0},i_{1})\in\mathcal{M}\};

  • (i0,i1)ED\forall(i_{0},i_{1})\in E_{D}, every pair of vertices in {i0,i1,i0,i0,i1,i1}V\{i_{0},i_{1},i_{0}^{\uparrow},i_{0}^{\downarrow},i_{1}^{\uparrow},i_{1}^{\downarrow}\}\cap V^{\mathcal{M}} is connected in GG^{\mathcal{M}}.

Besides, we associate a probability vector 𝐩\bm{p}^{\mathcal{M}} with GG^{\mathcal{M}} as follows:

vV:pv={piif v=i for some i[m],pipiif v=i for some i[m],piotherwise, v=i for some i[m].\displaystyle\forall v\in V^{\mathcal{M}}:\quad p^{\mathcal{M}}_{v}=\begin{cases}p_{i}^{\prime}&\text{if }v=i^{\uparrow}\text{ for some }i\in[m],\\ p_{i}^{-}-p_{i}^{\prime}&\text{if }v=i^{\downarrow}\text{ for some }i\in[m],\\ p^{-}_{i}&\text{otherwise, }v=i\text{ for some }i\in[m].\end{cases}
Figure 3. (a) A dependency graph GDG_{D}; (b) the graph GG^{\mathcal{M}} when ={(2,3)}\mathcal{M}=\{(2,3)\}.

In fact, (GD,𝒑)(G_{D},\bm{p}^{-}) and (G,𝒑)(G^{\mathcal{M}},\bm{p}^{\mathcal{M}}) are essentially the same: suppose 𝒜(GD,𝒑)\mathcal{A}\sim(G_{D},\bm{p}^{-}); then for each vertex ii matched by \mathcal{M}, we view AiA_{i} as the union of two mutually exclusive events AiAiA_{i^{\uparrow}}\cup A_{i^{\downarrow}} whose probabilities are pip_{i}^{\prime} and p_{i}^{-}-p_{i}^{\prime} respectively. Such a representation of 𝒜\mathcal{A} falls into the setting (G,𝒑)(G^{\mathcal{M}},\bm{p}^{\mathcal{M}}).
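To make Definition 3.8 concrete, the following is a minimal sketch that builds (G^{\mathcal{M}},\bm{p}^{\mathcal{M}}); the vertex encoding (i, 'up') / (i, 'down') for i^{\uparrow} / i^{\downarrow} is our own, not the paper's notation.

```python
from itertools import combinations

def homomorphic_graph(m, E_D, M, p_minus, p_prime):
    """Sketch of Definition 3.8: build (G^M, p^M) from G_D = ([m], E_D),
    a matching M of G_D, and the vectors p^- and p'.  Each matched vertex
    i is split into (i, 'up') and (i, 'down'); an unmatched i stays (i, None)."""
    matched = {i for e in M for i in e}

    def copies(i):
        return [(i, s) for s in (('up', 'down') if i in matched else (None,))]

    V = [c for i in range(m) for c in copies(i)]
    E = set()
    for i0, i1 in E_D:
        # every pair of vertices among the copies of i0 and i1 is connected
        E.update(combinations(copies(i0) + copies(i1), 2))

    p = {}
    for i, s in V:
        p[(i, s)] = (p_prime[i] if s == 'up'
                     else p_minus[i] - p_prime[i] if s == 'down'
                     else p_minus[i])
    return V, E, p
```

The two copies of a matched vertex carry probabilities p_{i}^{\prime} and p_{i}^{-}-p_{i}^{\prime}, summing to p_{i}^{-}, which is exactly the splitting of AiA_{i} described above.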

We have the following proposition, whose proof can be found in the appendix.

Proposition 3.9.

D𝒟(G)v in DpL(v)=D𝒟(GD)v in DpL(v).\sum_{D^{\prime}\in\mathcal{D}(G^{\mathcal{M}})}\prod_{v^{\prime}\text{ in }D^{\prime}}p^{\mathcal{M}}_{L^{\prime}(v^{\prime})}=\sum_{D\in\mathcal{D}(G_{D})}\prod_{v\text{ in }D}p^{-}_{L(v)}.

Given a pwdag D=(V,E,L)D=(V,E,L), recall that 𝒱(D)\mathcal{V}(D) is the set of nodes of \mathcal{M}-reversible arcs in DD. Define (D){v:L(v)}\mathscr{M}(D)\triangleq\{v:L(v)\in\mathcal{M}\} to be the set of nodes vv in DD where L(v)L(v) is contained in an edge in \mathcal{M}. Obviously, 𝒱(D)(D)\mathcal{V}(D)\subseteq\mathscr{M}(D). For simplicity of notation, we omit DD from these notations when DD is clear from the context.

Given a pwdag D=(V,E,L)D=(V,E,L), we use 𝒮={𝒮1,𝒮2,𝒮3,𝒮4}\mathscr{S}=\{\mathscr{S}_{1},\mathscr{S}_{2},\mathscr{S}_{3},\mathscr{S}_{4}\} to represent a partition of (D)\mathscr{M}(D) where 𝒱𝒮1\mathcal{V}\subseteq\mathscr{S}_{1} (some of these four sets are possibly empty). Let ψ(D)\psi(D) denote the set consisting of all such partitions. The formal definition is as follows.

Definition 3.10 (Partition).

Given a pwdag D=(V,E,L)D=(V,E,L) of GDG_{D}, define

ψ(D){{𝒮1,𝒮2,𝒮3,𝒮4}:𝒱𝒮1 and =𝒮1𝒮2𝒮3𝒮4}.\psi(D)\triangleq\{\{\mathscr{S}_{1},\mathscr{S}_{2},\mathscr{S}_{3},\mathscr{S}_{4}\}:\mathcal{V}\subseteq\mathscr{S}_{1}\text{ and }\mathscr{M}=\mathscr{S}_{1}\sqcup\mathscr{S}_{2}\sqcup\mathscr{S}_{3}\sqcup\mathscr{S}_{4}\}.
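Unpacking Definition 3.10: a partition in ψ(D)\psi(D) is determined by assigning every node of \mathscr{M}\setminus\mathcal{V} to one of the four classes while \mathcal{V} is forced into \mathscr{S}_{1}, so |\psi(D)|=4^{|\mathscr{M}\setminus\mathcal{V}|}. A minimal enumeration sketch of ours (nodes modeled as hashable labels):

```python
from itertools import product

def psi(M_nodes, V_nodes):
    """Enumerate the partitions of Definition 3.10 (a sketch).  M_nodes is
    the node set M(D) and V_nodes ⊆ M_nodes is V(D); nodes of V_nodes are
    forced into S_1 and every other node chooses its class freely."""
    free = sorted(M_nodes - V_nodes)
    for choice in product(range(4), repeat=len(free)):
        S = [set(V_nodes), set(), set(), set()]
        for node, cls in zip(free, choice):
            S[cls].add(node)
        yield tuple(S)   # (S_1, S_2, S_3, S_4)

# e.g. with M(D) = {a, b, c} and V(D) = {a}: 4**2 = 16 partitions
assert sum(1 for _ in psi({'a', 'b', 'c'}, {'a'})) == 16
```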

Given a wdag DD, there may be two or more topological orderings of DD. We fix an arbitrary one and denote it by πD\pi_{D}. In the following, we define an injection hh from {(D,𝒮):D𝒟(GD),𝒮ψ(D)}\{(D,\mathscr{S}):D\in\mathcal{D}(G_{D}),\mathscr{S}\in\psi(D)\} to 𝒟(G)\mathcal{D}(G^{\mathcal{M}}).

Definition 3.11.

Given a pwdag DD and 𝒮ψ(D)\mathscr{S}\in\psi(D), define h(D,𝒮)h(D,\mathscr{S}) to be a directed graph D=(V,E,L)D^{\prime}=(V^{\prime},E^{\prime},L^{\prime}) constructed as follows.

Constructing VV^{\prime}. V=V1V2V^{\prime}=V_{1}^{\prime}\sqcup V_{2}^{\prime} where |V1|=|V||V_{1}^{\prime}|=|V| and |V2|=|𝒮3𝒮4||V_{2}^{\prime}|=|\mathscr{S}_{3}\cup\mathscr{S}_{4}|. For convenience of presentation, we fix two bijections f:VV1f:V\rightarrow V_{1}^{\prime} and f:𝒮3𝒮4V2f^{\ast}:\mathscr{S}_{3}\cup\mathscr{S}_{4}\rightarrow V_{2}^{\prime} to name nodes in VV^{\prime}. In order to distinguish between nodes in DD and those in DD^{\prime}, we will always use u,v,wu,v,w to represent the nodes of DD and u,v,wu^{\prime},v^{\prime},w^{\prime} to represent the nodes of DD^{\prime}. Given vVv^{\prime}\in V^{\prime}, we use g(v)g(v^{\prime}) to denote the unique node vVv\in V such that f(v)=vf(v)=v^{\prime} (if vV1v^{\prime}\in V_{1}^{\prime}) or f(v)=vf^{\ast}(v)=v^{\prime} (if vV2v^{\prime}\in V_{2}^{\prime}).

Description of LL^{\prime}. For each node vV1v^{\prime}\in V_{1}^{\prime}, where v=f(v)v^{\prime}=f(v),

(2) L(v)={(L(v)),if v𝒮1,(L(v)),if v𝒮2𝒮3𝒮4,L(v),otherwise, v.\displaystyle L^{\prime}(v^{\prime})=\begin{cases}(L(v))^{\uparrow},&\text{if }v\in\mathscr{S}_{1},\\ (L(v))^{\downarrow},&\text{if }v\in\mathscr{S}_{2}\cup\mathscr{S}_{3}\cup\mathscr{S}_{4},\\ L(v),&\text{otherwise, }v\not\in\mathscr{M}.\end{cases}

For each node vV2v^{\prime}\in V_{2}^{\prime}, assuming v𝒮3𝒮4v\in\mathscr{S}_{3}\cup\mathscr{S}_{4} is the node such that v=f(v)v^{\prime}=f^{\ast}(v) and i[m]i\in[m] is the node such that ((L(v),i)((L(v),i)\in\mathcal{M},

(3) L(v)={i,if v𝒮3,i,otherwise, v𝒮4.\displaystyle\quad L^{\prime}(v^{\prime})=\begin{cases}i^{\uparrow},&\text{if }v\in\mathscr{S}_{3},\\ i^{\downarrow},&\text{otherwise, }v\in\mathscr{S}_{4}.\end{cases}

Constructing EE^{\prime}. E=E1E2E^{\prime}=E_{1}^{\prime}\sqcup E_{2}^{\prime} where E1={f(v)f(v):v𝒮3𝒮4}E^{\prime}_{1}=\{f^{\ast}(v)\rightarrow f(v):v\in\mathscr{S}_{3}\cup\mathscr{S}_{4}\} and

E2={uv:((L(u)=L(v))((L(u),L(v))E))(g(u)g(v) in πD)}.E^{\prime}_{2}=\{u^{\prime}\rightarrow v^{\prime}:\big{(}(L^{\prime}(u^{\prime})=L^{\prime}(v^{\prime})\big{)}\lor\big{(}(L^{\prime}(u^{\prime}),L^{\prime}(v^{\prime}))\in E^{\mathcal{M}})\big{)}\land(g(u^{\prime})\prec g(v^{\prime})\text{ in }\pi_{D})\}.
Theorem 3.12.

h(,)h(\cdot,\cdot) is an injection from {(D,𝒮):D𝒟(GD),𝒮ψ(D)}\{(D,\mathscr{S}):D\in\mathcal{D}(G_{D}),\mathscr{S}\in\psi(D)\} to 𝒟(G)\mathcal{D}(G^{\mathcal{M}}).

The proof of Theorem 3.12 is in the appendix. Now we can prove the main theorem of this subsection.

Theorem 3.13.

D𝒟(GD)(v𝒱¯(D)pL(v))(v𝒱(D)pL(v))D𝒟(GD)v in DpL(v).\sum_{D\in\mathcal{D}(G_{D})}\left(\prod_{v\in\overline{\mathcal{V}}(D)}p_{L(v)}\right)\left(\prod_{v\in\mathcal{V}(D)}p^{\prime}_{L(v)}\right)\leq\sum_{D\in\mathcal{D}(G_{D})}\prod_{v\text{ in }D}p^{-}_{L(v)}.

Proof.

For each i[m]i\in[m] where (i,j)(i,j)\in\mathcal{M}, let

qi1pi,qi2pipi,qi3(pipi)pj,and qi4(pipi)(pjpj).\begin{array}[]{llll}q^{1}_{i}\triangleq p^{\prime}_{i},\quad q^{2}_{i}\triangleq p^{-}_{i}-p^{\prime}_{i},\quad q^{3}_{i}\triangleq(p^{-}_{i}-p^{\prime}_{i})p^{\prime}_{j},\quad\text{and }\quad q^{4}_{i}\triangleq(p^{-}_{i}-p^{\prime}_{i})(p^{-}_{j}-p^{\prime}_{j}).\end{array}

According to Fact 2.7, qi1+qi2+qi3+qi4=pi+pj(pipi)piq^{1}_{i}+q^{2}_{i}+q^{3}_{i}+q^{4}_{i}=p^{-}_{i}+p^{-}_{j}(p^{-}_{i}-p^{\prime}_{i})\geq p_{i}. (Indeed, q^{1}_{i}+q^{2}_{i}=p^{-}_{i} and q^{3}_{i}+q^{4}_{i}=(p^{-}_{i}-p^{\prime}_{i})\big{(}p^{\prime}_{j}+(p^{-}_{j}-p^{\prime}_{j})\big{)}=(p^{-}_{i}-p^{\prime}_{i})p^{-}_{j}; the inequality is Fact 2.7.)

Given D=(V,E,L)𝒟(GD)D=(V,E,L)\in\mathcal{D}(G_{D}) and 𝒮ψ(D)\mathscr{S}\in\psi(D), let D=h(D,𝒮)D^{\prime}=h(D,\mathscr{S}). For each vv in DD where (L(v),j)(L(v),j)\in\mathcal{M} for some j[m]j\in[m], according to the definition of 𝒑\bm{p}^{\mathcal{M}}, (2), and (3), we have that

  • if v𝒮1v\in\mathscr{S}_{1}, then pL(f(v))=pL(v)=qL(v)1p^{\mathcal{M}}_{L^{\prime}(f(v))}=p^{\prime}_{L(v)}=q^{1}_{L(v)};

  • if v𝒮2v\in\mathscr{S}_{2}, then pL(f(v))=pL(v)pL(v)=qL(v)2p^{\mathcal{M}}_{L^{\prime}(f(v))}=p^{-}_{L(v)}-p^{\prime}_{L(v)}=q^{2}_{L(v)};

  • if v𝒮3v\in\mathscr{S}_{3}, then pL(f(v))pL(f(v))=(pL(v)pL(v))pj=qL(v)3p^{\mathcal{M}}_{L^{\prime}(f(v))}\cdot p^{\mathcal{M}}_{L^{\prime}(f^{\ast}(v))}=(p^{-}_{L(v)}-p^{\prime}_{L(v)})p^{\prime}_{j}=q^{3}_{L(v)};

  • if v𝒮4v\in\mathscr{S}_{4}, then pL(f(v))pL(f(v))=(pL(v)pL(v))(pjpj)=qL(v)4p^{\mathcal{M}}_{L^{\prime}(f(v))}\cdot p^{\mathcal{M}}_{L^{\prime}(f^{\ast}(v))}=(p^{-}_{L(v)}-p^{\prime}_{L(v)})(p^{-}_{j}-p^{\prime}_{j})=q^{4}_{L(v)}.

Moreover, for each vV(D)=𝒱(D)¯(D)v\in V\setminus\mathscr{M}(D)=\overline{\mathcal{V}(D)}\setminus\mathscr{M}(D), we have pL(f(v))=pL(v)p^{\mathcal{M}}_{L^{\prime}(f(v))}=p_{L(v)}. Thus, for each 𝒮ψ(D)\mathscr{S}\in\psi(D),

v in h(D,𝒮)pL(v)\displaystyle\prod_{v^{\prime}\text{ in }h(D,\mathscr{S})}p^{\mathcal{M}}_{L^{\prime}(v^{\prime})} =v𝒱¯pL(v)v𝒮1qL(v)1v𝒮2qL(v)2v𝒮3qL(v)3v𝒮4qL(v)4\displaystyle=\prod_{v\in\overline{\mathcal{V}}\setminus\mathscr{M}}p_{L(v)}\prod_{v\in\mathscr{S}_{1}}q^{1}_{L(v)}\prod_{v\in\mathscr{S}_{2}}q^{2}_{L(v)}\prod_{v\in\mathscr{S}_{3}}q^{3}_{L(v)}\prod_{v\in\mathscr{S}_{4}}q^{4}_{L(v)}
=v𝒱¯pL(v)v𝒱pL(v)v𝒮1𝒱qL(v)1v𝒮2qL(v)2v𝒮3qL(v)3v𝒮4qL(v)4.\displaystyle=\prod_{v\in\overline{\mathcal{V}}\setminus\mathscr{M}}p_{L(v)}\prod_{v\in\mathcal{V}}p^{\prime}_{L(v)}\prod_{v\in\mathscr{S}_{1}\setminus\mathcal{V}}q^{1}_{L(v)}\prod_{v\in\mathscr{S}_{2}}q^{2}_{L(v)}\prod_{v\in\mathscr{S}_{3}}q^{3}_{L(v)}\prod_{v\in\mathscr{S}_{4}}q^{4}_{L(v)}.

So

𝒮ψ(D)v in h(D,𝒮)pL(v)\displaystyle\sum_{\mathscr{S}\in\psi(D)}\prod_{v^{\prime}\text{ in }h(D,\mathscr{S})}p^{\mathcal{M}}_{L^{\prime}(v^{\prime})} =𝒮ψ(D)v𝒱¯pL(v)v𝒱pL(v)v𝒮1𝒱qL(v)1v𝒮2qL(v)2v𝒮3qL(v)3v𝒮4qL(v)4\displaystyle=\sum_{\mathscr{S}\in\psi(D)}\prod_{v\in\overline{\mathcal{V}}\setminus\mathscr{M}}p_{L(v)}\prod_{v\in\mathcal{V}}p^{\prime}_{L(v)}\prod_{v\in\mathscr{S}_{1}\setminus\mathcal{V}}q^{1}_{L(v)}\prod_{v\in\mathscr{S}_{2}}q^{2}_{L(v)}\prod_{v\in\mathscr{S}_{3}}q^{3}_{L(v)}\prod_{v\in\mathscr{S}_{4}}q^{4}_{L(v)}
=v𝒱¯pL(v)v𝒱pL(v)𝒮ψ(D)v𝒮1𝒱qL(v)1v𝒮2qL(v)2v𝒮3qL(v)3v𝒮4qL(v)4\displaystyle=\prod_{v\in\overline{\mathcal{V}}\setminus\mathscr{M}}p_{L(v)}\prod_{v\in\mathcal{V}}p^{\prime}_{L(v)}\sum_{\mathscr{S}\in\psi(D)}\prod_{v\in\mathscr{S}_{1}\setminus\mathcal{V}}q^{1}_{L(v)}\prod_{v\in\mathscr{S}_{2}}q^{2}_{L(v)}\prod_{v\in\mathscr{S}_{3}}q^{3}_{L(v)}\prod_{v\in\mathscr{S}_{4}}q^{4}_{L(v)}
=v𝒱¯pL(v)v𝒱pL(v)v𝒱(qL(v)1+qL(v)2+qL(v)3+qL(v)4)\displaystyle=\prod_{v\in\overline{\mathcal{V}}\setminus\mathscr{M}}p_{L(v)}\prod_{v\in\mathcal{V}}p^{\prime}_{L(v)}\prod_{v\in\mathscr{M}\setminus\mathcal{V}}\left(q^{1}_{L(v)}+q^{2}_{L(v)}+q^{3}_{L(v)}+q^{4}_{L(v)}\right)
v𝒱¯pL(v)v𝒱pL(v)v𝒱pL(v)\displaystyle\geq\prod_{v\in\overline{\mathcal{V}}\setminus\mathscr{M}}p_{L(v)}\prod_{v\in\mathcal{V}}p^{\prime}_{L(v)}\prod_{v\in\mathscr{M}\setminus\mathcal{V}}p_{L(v)}
=v𝒱¯pL(v)v𝒱pL(v),\displaystyle=\prod_{v\in\overline{\mathcal{V}}}p_{L(v)}\prod_{v\in\mathcal{V}}p^{\prime}_{L(v)},

where the third equality is according to the definition of ψ(D)\psi(D). Finally,

\displaystyle\sum_{D\in\mathcal{D}(G_{D})}\left(\prod_{v\in\overline{\mathcal{V}}(D)}p_{L(v)}\right)\left(\prod_{v\in\mathcal{V}(D)}p^{\prime}_{L(v)}\right)\leq\sum_{D\in\mathcal{D}(G_{D})}\sum_{\mathscr{S}\in\psi(D)}\prod_{v^{\prime}\text{ in }h(D,\mathscr{S})}p^{\mathcal{M}}_{L^{\prime}(v^{\prime})}\leq\sum_{D^{\prime}\in\mathcal{D}(G^{\mathcal{M}})}\prod_{v^{\prime}\text{ in }D^{\prime}}p^{\mathcal{M}}_{L^{\prime}(v^{\prime})}
=D𝒟(GD)v in DpL(v),\displaystyle=\sum_{D\in\mathcal{D}(G_{D})}\prod_{v\text{ in }D}p^{-}_{L(v)},

where the second inequality is due to Theorem 3.12 and the equality is by Proposition 3.9. ∎

3.3. Putting all things together

The following lemma is implicitly proved in [45].

Lemma 3.14 ([45]).

For any undirected graph GD=([m],ED)G_{D}=([m],E_{D}) and probability vector 𝐩a(GD)/(1+ε)\bm{p}\in\mathcal{I}_{a}(G_{D})/(1+\varepsilon), i[m]q{i}(GD,𝐩)q(GD,𝐩)m/ε\sum_{i\in[m]}\frac{q_{\{i\}}(G_{D},\bm{p})}{q_{\emptyset}(G_{D},\bm{p})}\leq m/\varepsilon.

Theorem 1.6 (restated).

For any 𝒜(GD,𝐩,,𝛅)\mathcal{A}\sim(G_{D},\bm{p},\mathcal{M},\bm{\delta}), if (1+ε)𝐩a(GD)(1+\varepsilon)\cdot\bm{p}^{-}\in\mathcal{I}_{a}(G_{D}), then the expected number of resampling steps performed by the MT algorithm is at most m/εm/\varepsilon, where mm is the number of events in 𝒜\mathcal{A}.

Proof.

Fix any such 𝒜\mathcal{A}. We have that

𝔼[T]D𝒟(GD)v in DpL(v)i[m]q{i}(GD,𝒑)q(GD,𝒑)mε,\mathbb{E}[T]\leq\sum_{D\in\mathcal{D}(G_{D})}\prod_{v\text{ in }D}p^{-}_{L(v)}\leq\frac{\sum_{i\in[m]}q_{\{i\}}(G_{D},\bm{p}^{-})}{q_{\emptyset}(G_{D},\bm{p}^{-})}\leq\frac{m}{\varepsilon},

where the first inequality is by Theorems 3.7 and 3.13, the second inequality is due to Theorem 4 in [45], and the last inequality is according to Lemma 3.14. ∎

4. Lower bound on the amount of intersection

In order to explore how far beyond Shearer’s bound MT algorithm is still efficient in general, we provide a lower bound on the amount of intersection between dependent events for general instances (Theorem 4.1).

We first introduce some notation. Given a bipartite graph G_{B}=([m],[n],E_{B}), we call each vertex i\in[m] a left vertex and each vertex j\in[n] a right vertex. We call G_{B} linear if any two left vertices in [m] share at most one common neighbor in [n]. (The name is natural: G_{B} can be represented by a hypergraph in which each right vertex j becomes a node v_{j} and each left vertex i becomes a hyperedge e_{i}, with v_{j}\in e_{i} if and only if (i,j)\in E_{B}; a hypergraph is called linear if any two hyperedges share at most one node.) Let \Delta_{D}(G_{B}) denote the maximum degree of G_{D}(G_{B}), and \Delta_{B}(G_{B}) denote the maximum degree of the left vertices in G_{B}. If G_{B} is clear from the context, we may omit G_{B} from these notations. In addition, for a bipartite graph G=(L\subset[m],R,E) and a probability vector \bm{p}\in(0,1)^{m}, we define (note that \digamma(G,\bm{p}) may be negative)

ϝ(G,𝒑)(miniLpi)2(|iL𝒩G(i)|+iL|𝒩G(i)|pi1/|𝒩G(i)|)|L|ΔD(G)ΔB(G)2.\digamma(G,\bm{p})\triangleq\frac{\big{(}\min_{i\in L}p_{i}\big{)}^{2}\cdot\left(-|\cup_{i\in L}\mathcal{N}_{G}(i)|+\sum_{i\in L}|\mathcal{N}_{G}(i)|\cdot p_{i}^{1/|\mathcal{N}_{G}(i)|}\right)}{\sqrt{|L|}\cdot\Delta_{D}(G)\cdot\Delta_{B}(G)^{2}}.

and ϝ+(G,𝒑)max{ϝ(G,𝒑),0}\digamma^{+}(G,\bm{p})\triangleq\max\{\digamma(G,\bm{p}),0\}.

We use 𝒜(GB,𝒑)\mathcal{A}\sim(G_{B},\bm{p}) to denote that (i) GBG_{B} is an event-variable graph of 𝒜\mathcal{A} and (ii) the probability vector of 𝒜\mathcal{A} is 𝒑\bm{p}. Let ={(i1,i1),(i2,i2),}\mathcal{M}=\{(i_{1},i_{1}^{\prime}),(i_{2},i_{2}^{\prime}),\cdots\} be a matching of GD(GB)G_{D}(G_{B}), and 𝜹=(δi1,i1,δi2,i2,)(0,1)||\bm{\delta}=(\delta_{i_{1},i_{1}^{\prime}},\delta_{i_{2},i_{2}^{\prime}},\cdots)\in(0,1)^{|\mathcal{M}|} be another probability vector. We say that an event set 𝒜\mathcal{A} is of the setting (GB,𝒑,,𝜹)(G_{B},\bm{p},\mathcal{M},\bm{\delta}), and write 𝒜(GB,𝒑,,𝜹)\mathcal{A}\sim(G_{B},\bm{p},\mathcal{M},\bm{\delta}), if 𝒜(GB,𝒑)\mathcal{A}\sim(G_{B},\bm{p}) and (AiAi)δi,i\mathbb{P}(A_{i}\cap A_{i^{\prime}})\geq\delta_{i,i^{\prime}} for each pair (i,i)(i,i^{\prime})\in\mathcal{M}.

We call an event AA elementary if AA can be written as (Xi1Si1)(Xi2Si2)(XikSik)(X_{i_{1}}\in S_{i_{1}})\land(X_{i_{2}}\in S_{i_{2}})\land\cdots\land(X_{i_{k}}\in S_{i_{k}}) where Si1,,SikS_{i_{1}},\cdots,S_{i_{k}} are subsets of the domains of the variables. We call an event set 𝒜\mathcal{A} elementary if all events in 𝒜\mathcal{A} are elementary.

Theorem 4.1.

Let GB=([m],[n],EB)G_{B}=([m],[n],E_{B}) be a bipartite graph, 𝐩(0,1]m\bm{p}\in(0,1]^{m} be a probability vector, and L1,L2,,LtL_{1},L_{2},\cdots,L_{t} be a collection of disjoint subsets of [m][m]. For each k[t]k\in[t], let GkG_{k} denote the induced subgraph on Lk(iLk𝒩GB(i))L_{k}\cup\left(\cup_{i\in L_{k}}\mathcal{N}_{G_{B}}(i)\right) and EkE_{k} denote the edge set of GD(Gk)G_{D}(G_{k}). If all GkG_{k}’s are linear, then the following holds.

If 𝒜(GB,𝐩)\mathcal{A}\sim(G_{B},\bm{p}), then there is a matching \mathcal{M} of GD(GB)G_{D}(G_{B}) satisfying that (i,i)Ek(AiAi)2(ϝ+(Gk,𝐩))2\sum_{(i,i^{\prime})\in\mathcal{M}\cap E_{k}}\mathbb{P}(A_{i}\cap A_{i^{\prime}})^{2}\geq\left(\digamma^{+}(G_{k},\bm{p})\right)^{2} for any kk.

The proof of Theorem 4.1 mainly consists of two parts. First, we show that there is an elementary event set which approximately achieves the minimum amount of intersection between dependent events (Lemma 4.2). Then, for elementary event sets, by applying the AM–GM inequality, we obtain a lower bound on the total amount of overlap on common variables, which further implies a lower bound on the amount of intersection between dependent events (Lemma 4.5).

Lemma 4.2.

Let GB=([m],[n],EB)G_{B}=([m],[n],E_{B}) be a linear bipartite graph, EDE_{D} be the edge set of GD(GB)G_{D}(G_{B}), and 𝐩(0,1]m\bm{p}\in(0,1]^{m} be a probability vector. Let γ\gamma denote the minimum (i0,i1)ED[Ai0Ai1]\sum_{(i_{0},i_{1})\in E_{D}}\mathbb{P}[A_{i_{0}}\cap A_{i_{1}}] among all event sets 𝒜=(A1,,Am)(GB,𝐩)\mathcal{A}=(A_{1},\cdots,A_{m})\sim(G_{B},\bm{p}). Then there is an elementary event set 𝒜\mathcal{A}^{\prime} such that (i0,i1)ED[Ai0Ai1](ΔB(GB))2γ\sum_{(i_{0},i_{1})\in E_{D}}\mathbb{P}[A_{i_{0}}^{\prime}\cap A_{i_{1}}^{\prime}]\leq(\Delta_{B}(G_{B}))^{2}\cdot\gamma.

Proof.

For simplicity, we let ΔΔB(GB)\Delta\triangleq\Delta_{B}(G_{B}). Without loss of generality, we assume that each random variable XiX_{i} is uniformly distributed over [0,1][0,1]. Let 𝒜(GB,𝒑)\mathcal{A}\sim(G_{B},\bm{p}) be an event set where \sum_{(i_{0},i_{1})\in E_{D}}\mathbb{P}[A_{i_{0}}\cap A_{i_{1}}]=\gamma. We will replace AiA_{i} with an elementary AiA_{i}^{\prime} one by one for each i=1,2,,mi=1,2,\cdots,m, so that the resulting event set 𝒜\mathcal{A}^{\prime} satisfies (i0,i1)ED[Ai0Ai1]Δ2(i0,i1)ED[Ai0Ai1]=Δ2γ\sum_{(i_{0},i_{1})\in E_{D}}\mathbb{P}[A_{i_{0}}^{\prime}\cap A_{i_{1}}^{\prime}]\leq\Delta^{2}\cdot\sum_{(i_{0},i_{1})\in E_{D}}\mathbb{P}[A_{i_{0}}\cap A_{i_{1}}]=\Delta^{2}\cdot\gamma.

More precisely, fix i[m]i\in[m] and suppose A1,,Ai1A_{1},\cdots,A_{i-1} have been replaced with elementary events A1,,Ai1A_{1}^{\prime},\cdots,A_{i-1}^{\prime} respectively. For simplicity of notations, for any pair i0<i1i_{0}<i_{1}, we abbreviate [Ai0Ai1]\mathbb{P}[A_{i_{0}}\cap A_{i_{1}}], [Ai0Ai1]\mathbb{P}[A_{i_{0}}^{\prime}\cap A_{i_{1}}] and [Ai0Ai1]\mathbb{P}[A_{i_{0}}^{\prime}\cap A_{i_{1}}^{\prime}] to pi0,i1p_{i_{0},i_{1}}, pi0,i1p_{i_{0},i_{1}}^{\prime} and pi0,i1′′p_{i_{0},i_{1}}^{\prime\prime} respectively. Without loss of generality, we assume AiA_{i} depends on variables X1,X2,,XkX_{1},X_{2},\cdots,X_{k}. For every j[k]j\in[k], we define

Pj(xj):=i0<i,i0𝒩GB(j)1Δ[Ai0Xj=xj]+i0>i,i0𝒩GB(j)[Ai0Xj=xj].P_{j}(x_{j}):=\sum_{i_{0}<i,i_{0}\in\mathcal{N}_{G_{B}}(j)}\frac{1}{\Delta}\cdot\mathbb{P}[A_{i_{0}}^{\prime}\mid X_{j}=x_{j}]+\sum_{i_{0}>i,i_{0}\in\mathcal{N}_{G_{B}}(j)}\mathbb{P}[A_{i_{0}}\mid X_{j}=x_{j}].

for xj[0,1]x_{j}\in[0,1]. Without loss of generality, we assume Pj()P_{j}(\cdot) is non-decreasing. Let μ:[0,1]k{0,1}\mu:[0,1]^{k}\rightarrow\{0,1\} be the indicator of AiA_{i}, then

x1,,xkμ(x1,,xk)dx1dxk=[Ai],\displaystyle\int_{x_{1},\cdots,x_{k}}\mu(x_{1},\cdots,x_{k})\mathrm{d}x_{1}\cdots\mathrm{d}x_{k}=\mathbb{P}[A_{i}],

For each j[k]j\in[k], let

μj(xj):=[AiXj=xj]=x1,,xj1,xj+1,,xkμ(x1,,xk)dx1dxj1dxj+1dxk.\mu_{j}(x_{j}):=\mathbb{P}[A_{i}\mid X_{j}=x_{j}]=\int_{x_{1},\cdots,x_{j-1},x_{j+1},\cdots,x_{k}}\mu(x_{1},\cdots,x_{k})\mathrm{d}x_{1}\cdots\mathrm{d}x_{j-1}\mathrm{d}x_{j+1}\cdots\mathrm{d}x_{k}.

Noticing that GBG_{B} is linear (i.e., any two events share at most one common variable), we have

(4) xjPj(xj)μj(xj)dxj=i0<i,i0𝒩GB(j)pi0,iΔ+i0>i,i0𝒩GB(j)pi0,i.\displaystyle\int_{x_{j}}P_{j}(x_{j})\mu_{j}(x_{j})\mathrm{d}x_{j}=\sum_{i_{0}<i,i_{0}\in\mathcal{N}_{G_{B}}(j)}\frac{p^{\prime}_{i_{0},i}}{\Delta}+\sum_{i_{0}>i,i_{0}\in\mathcal{N}_{G_{B}}(j)}p_{i_{0},i}.

Let AiA_{i}^{\prime} be the elementary event that happens if and only if (x1,,xk)[0,q1]××[0,qk](x_{1},\cdots,x_{k})\in[0,q_{1}]\times\cdots\times[0,q_{k}]. Here q1,,qkq_{1},\cdots,q_{k} are positive real numbers satisfying:

  • (i)

    Πj=1kqj=[Ai]\Pi_{j=1}^{k}q_{j}=\mathbb{P}[A_{i}]. That is, [Ai]=[Ai]\mathbb{P}[A_{i}^{\prime}]=\mathbb{P}[A_{i}];

  • (ii)

    x1q1μ1(x1)dx1=x2q2μ2(x2)dx2=xkqkμk(xk)dxk\int_{x_{1}\geq q_{1}}\mu_{1}(x_{1})\mathrm{d}x_{1}=\int_{x_{2}\geq q_{2}}\mu_{2}(x_{2})\mathrm{d}x_{2}\cdots=\int_{x_{k}\geq q_{k}}\mu_{k}(x_{k})\mathrm{d}x_{k}.

Claim 4.3.

Such {q1,,qk}\{q_{1},\cdots,q_{k}\} exists. Thus so does AiA_{i}^{\prime}.

Proof.

We prove a generalized statement in which Πj=1kqj\Pi_{j=1}^{k}q_{j} can be required to be an arbitrary number in [0,1][0,1]. Our proof is by induction on kk. The base case when k=1k=1 is trivial. Now we assume that for any preset q(0,1]q^{\prime}\in(0,1], there exist {q1,,qk1}\{q_{1},\cdots,q_{k-1}\} satisfying that

  • (i)

    Πj=1k1qj=q\Pi_{j=1}^{k-1}q_{j}=q^{\prime} and

  • (ii)

    x1q1μ1(x1)dx1==xk1qk1μk1(xk1)dxk1\int_{x_{1}\geq q_{1}}\mu_{1}(x_{1})\mathrm{d}x_{1}=\cdots=\int_{x_{k-1}\geq q_{k-1}}\mu_{k-1}(x_{k-1})\mathrm{d}x_{k-1}.

Let f(q)f(q^{\prime}) denote the minimum x1q1μ1(x1)dx1\int_{x_{1}\geq q_{1}}\mu_{1}(x_{1})\mathrm{d}x_{1} among all such {q1,,qk1}\{q_{1},\cdots,q_{k-1}\}’s. It is easy to see that f(1)=0f(1)=0 and ff is continuous and non-increasing.

Fix an arbitrary q[0,1]q\in[0,1]. We define g(q^{\prime\prime}):=\int_{x_{k}\geq q/q^{\prime\prime}}\mu_{k}(x_{k})\mathrm{d}x_{k} for q′′[q,1]q^{\prime\prime}\in[q,1]. Obviously, g(q)=0g(q)=0 and gg is continuous and non-decreasing. So there must exist a q[q,1]q^{\ast}\in[q,1] such that g(q)=f(q)g(q^{\ast})=f(q^{\ast}). Then let {q1,,qk1}\{q_{1}^{\ast},\cdots,q_{k-1}^{\ast}\} be a set of positive real numbers where

  • (i)

    Πj=1k1qj=q\Pi_{j=1}^{k-1}q_{j}^{\ast}=q^{\ast} and

  • (ii)

    f(q)=x1q1μ1(x1)dx1==xk1qk1μk1(xk1)dxk1f(q^{\ast})=\int_{x_{1}\geq q_{1}^{\ast}}\mu_{1}(x_{1})\mathrm{d}x_{1}=\cdots=\int_{x_{k-1}\geq q_{k-1}^{\ast}}\mu_{k-1}(x_{k-1})\mathrm{d}x_{k-1}.

Let qk=q/qq_{k}^{\ast}=q/q^{\ast}. It is obvious that Πj=1kqj=q\Pi_{j=1}^{k}q_{j}^{\ast}=q and f(q)=g(q)=xkqkμk(xk)dxkf(q^{\ast})=g(q^{\ast})=\int_{x_{k}\geq q_{k}^{\ast}}\mu_{k}(x_{k})\mathrm{d}x_{k}. This completes the induction step. ∎

Claim 4.4.

For every j[k]j\in[k], we have

i0<i,i0𝒩GB(j)pi0,i′′Δ+i0>i,i0𝒩GB(j)pi0,ii0<i,i0𝒩GB(j)pi0,i+Δi0>i,i0𝒩GB(j)pi0,i.\sum_{i_{0}<i,i_{0}\in\mathcal{N}_{G_{B}}(j)}\frac{p_{i_{0},i}^{\prime\prime}}{\Delta}+\sum_{i_{0}>i,i_{0}\in\mathcal{N}_{G_{B}}(j)}p_{i_{0},i}^{\prime}\leq\sum_{i_{0}<i,i_{0}\in\mathcal{N}_{G_{B}}(j)}p^{\prime}_{i_{0},i}+\Delta\cdot\sum_{i_{0}>i,i_{0}\in\mathcal{N}_{G_{B}}(j)}p_{i_{0},i}.
Proof.

Let μi,μii\mu_{i^{\prime}},\mu_{i\cap i^{\prime}}, and μii\mu_{i^{\prime}\setminus i} denote the indicator functions of the events AiA_{i}^{\prime}, AiAiA_{i}^{\prime}\cap A_{i}, and AiAiA_{i}^{\prime}\setminus A_{i} respectively. Since [Ai]=[Ai]\mathbb{P}[A_{i}^{\prime}]=\mathbb{P}[A_{i}],

x1q1μ1(x1)dx1++xkqkμk(xk)dxk[AiAi]=[AiAi]=x1,,xkμii(x1,,xk)dx1dxk.\int_{x_{1}\geq q_{1}}\mu_{1}(x_{1})\mathrm{d}x_{1}+\cdots+\int_{x_{k}\geq q_{k}}\mu_{k}(x_{k})\mathrm{d}x_{k}\geq\mathbb{P}[A_{i}\setminus A_{i}^{\prime}]=\mathbb{P}[A_{i}^{\prime}\setminus A_{i}]=\int_{x_{1},\cdots,x_{k}}\mu_{i^{\prime}\setminus i}(x_{1},\cdots,x_{k})\mathrm{d}x_{1}\cdots\mathrm{d}x_{k}.

Fix j[k]j\in[k]. Since the kk tail integrals above are all equal by condition (ii), we have

xjqjμj(xj)dxj1kx1,x2,,xkμii(x1,,xk)dx1dxk.\int_{x_{j}\geq q_{j}}\mu_{j}(x_{j})\mathrm{d}x_{j}\geq\frac{1}{k}\cdot\int_{x_{1},x_{2},\cdots,x_{k}}\mu_{i^{\prime}\setminus i}(x_{1},\cdots,x_{k})\mathrm{d}x_{1}\cdots\mathrm{d}x_{k}.

Since Pj(xj)P_{j}(x_{j}) is non-decreasing and kΔk\leq\Delta, we have

xjqjPj(xj)μj(xj)dxj1Δx1,x2,,xkPj(xj)μii(x1,,xk)dx1dxk.\int_{x_{j}\geq q_{j}}P_{j}(x_{j})\mu_{j}(x_{j})\mathrm{d}x_{j}\geq\frac{1}{\Delta}\cdot\int_{x_{1},x_{2},\cdots,x_{k}}P_{j}(x_{j})\mu_{i^{\prime}\setminus i}(x_{1},\cdots,x_{k})\mathrm{d}x_{1}\cdots\mathrm{d}x_{k}.

According to Equation 4,

i0<i,i0𝒩GB(j)pi0,i+Δi0>i,i0𝒩GB(j)pi0,i=ΔxjPj(xj)μj(xj)dxj\displaystyle\sum_{i_{0}<i,i_{0}\in\mathcal{N}_{G_{B}}(j)}p^{\prime}_{i_{0},i}+\Delta\cdot\sum_{i_{0}>i,i_{0}\in\mathcal{N}_{G_{B}}(j)}p_{i_{0},i}=\Delta\cdot\int_{x_{j}}P_{j}(x_{j})\mu_{j}(x_{j})\mathrm{d}x_{j}
=\displaystyle= ΔxjqjPj(xj)μj(xj)dxj+Δxj<qjPj(xj)μj(xj)dxj\displaystyle\Delta\cdot\int_{x_{j}\geq q_{j}}P_{j}(x_{j})\mu_{j}(x_{j})\mathrm{d}x_{j}+\Delta\cdot\int_{x_{j}<q_{j}}P_{j}(x_{j})\mu_{j}(x_{j})\mathrm{d}x_{j}
\displaystyle\geq ΔxjqjPj(xj)μj(xj)dxj+Δx1,,xkPj(xj)μii(x1,,xk)dx1dxk\displaystyle\Delta\cdot\int_{x_{j}\geq q_{j}}P_{j}(x_{j})\mu_{j}(x_{j})\mathrm{d}x_{j}+\Delta\cdot\int_{x_{1},\cdots,x_{k}}P_{j}(x_{j})\mu_{i\cap i^{\prime}}(x_{1},\cdots,x_{k})\mathrm{d}x_{1}\cdots\mathrm{d}x_{k}
\displaystyle\geq x1,,xkPj(xj)μii(x1,,xk)dx1dxk+x1,,xkPj(xj)μii(x1,,xk)dx1dxk\displaystyle\int_{x_{1},\cdots,x_{k}}P_{j}(x_{j})\mu_{i^{\prime}\setminus i}(x_{1},\cdots,x_{k})\mathrm{d}x_{1}\cdots\mathrm{d}x_{k}+\int_{x_{1},\cdots,x_{k}}P_{j}(x_{j})\mu_{i\cap i^{\prime}}(x_{1},\cdots,x_{k})\mathrm{d}x_{1}\cdots\mathrm{d}x_{k}
=\displaystyle= x1,,xkPj(xj)μi(x1,,xk)dx1dxk\displaystyle\int_{x_{1},\cdots,x_{k}}P_{j}(x_{j})\mu_{i^{\prime}}(x_{1},\cdots,x_{k})\mathrm{d}x_{1}\cdots\mathrm{d}x_{k}
=\displaystyle= i0<i,i0𝒩GB(j)[Ai0Ai]Δ+i0>i,i0𝒩GB(j)[Ai0Ai].\displaystyle\sum_{i_{0}<i,i_{0}\in\mathcal{N}_{G_{B}}(j)}\frac{\mathbb{P}[A_{i_{0}}^{\prime}\cap A_{i}^{\prime}]}{\Delta}+\sum_{i_{0}>i,i_{0}\in\mathcal{N}_{G_{B}}(j)}\mathbb{P}[A_{i_{0}}\cap A_{i}^{\prime}].

This completes the proof. ∎

From Claim 4.4, we have

(5) i0<i,i0𝒩GD(i)pi0,i′′Δ+i0>i,i0𝒩GD(i)pi0,ii0<i,i0𝒩GD(i)pi0,i+Δi0>i,i0𝒩GD(i)pi0,i,\displaystyle\sum_{i_{0}<i,i_{0}\in\mathcal{N}_{G_{D}}(i)}\frac{p_{i_{0},i}^{\prime\prime}}{\Delta}+\sum_{i_{0}>i,i_{0}\in\mathcal{N}_{G_{D}}(i)}p_{i_{0},i}^{\prime}\leq\sum_{i_{0}<i,i_{0}\in\mathcal{N}_{G_{D}}(i)}p^{\prime}_{i_{0},i}+\Delta\cdot\sum_{i_{0}>i,i_{0}\in\mathcal{N}_{G_{D}}(i)}p_{i_{0},i},

By summation over all i[m]i\in[m], we finish the proof:

(i0,i)EDpi0,i′′ΔΔ(i0,i)EDpi0,i.\sum_{(i_{0},i)\in E_{D}}\frac{p_{i_{0},i}^{\prime\prime}}{\Delta}\leq\Delta\cdot\sum_{(i_{0},i)\in E_{D}}p_{i_{0},i}.

Lemma 4.5.

Let GB=([m],[n],EB)G_{B}=([m],[n],E_{B}) be a linear bipartite graph and 𝐩\bm{p} be a probability vector. Then for any elementary 𝒜=(A1,,Am)(GB,𝐩)\mathcal{A}=(A_{1},\cdots,A_{m})\sim(G_{B},\bm{p}),

(i0,i1)ED(Ai0Ai1)mΔD(GB)ΔB(GB)2ϝ(GB,𝒑),\sum_{(i_{0},i_{1})\in E_{D}}\mathbb{P}\left(A_{i_{0}}\cap A_{i_{1}}\right)\geq\sqrt{m}\cdot\Delta_{D}(G_{B})\cdot\Delta_{B}(G_{B})^{2}\cdot\digamma(G_{B},\bm{p}),

where EDE_{D} is the edge set of GD(GB)G_{D}(G_{B}).

Proof.

For simplicity of notation, we let 𝒩\mathcal{N} stand for 𝒩GB\mathcal{N}_{G_{B}}. Without loss of generality, we assume that each variable XiX_{i} is uniformly distributed over [0,1][0,1]. As 𝒜\mathcal{A} is elementary, each AiA_{i} can be written as j𝒩(i)[XjSij]\bigwedge_{j\in\mathcal{N}(i)}[X_{j}\in S_{i}^{j}] where Sij[0,1]S_{i}^{j}\subset[0,1]. Let μ\mu be the Lebesgue measure.

On the one hand, according to the AM–GM inequality,

(6) i[m]j𝒩(i)μ(Sij)i[m]|𝒩(i)|(Πj𝒩(i)μ(Sij))1/|𝒩(i)|=i[m]|𝒩(i)|pi1/|𝒩(i)|.\displaystyle\sum_{i\in[m]}\sum_{j\in\mathcal{N}(i)}\mu(S^{j}_{i})\geq\sum_{i\in[m]}|\mathcal{N}(i)|\cdot\big{(}\Pi_{j\in\mathcal{N}(i)}\mu(S^{j}_{i})\big{)}^{1/|\mathcal{N}(i)|}=\sum_{i\in[m]}|\mathcal{N}(i)|\cdot p_{i}^{1/|\mathcal{N}(i)|}.

On the other hand,

(7) \displaystyle\sum_{i\in[m]}\sum_{j\in\mathcal{N}(i)}\mu(S^{j}_{i})=\sum_{j\in[n]}\sum_{i\in\mathcal{N}(j)}\mu(S^{j}_{i})\leq n+\sum_{j\in[n]}\sum_{i_{0}\neq i_{1}\in\mathcal{N}(j)}\mu\left(S^{j}_{i_{0}}\cap S^{j}_{i_{1}}\right)

By Inequalities (6) and (7), and noticing that GBG_{B} is linear, we have that

(8) (i0,i1)EDj𝒩(i0)𝒩(i1)μ(Si0jSi1j)=j[n]i0i1𝒩(j)μ(Si0jSi1j)(i[m]|𝒩(i)|pi1/|𝒩(i)|)n.\displaystyle\sum_{(i_{0},i_{1})\in E_{D}}\sum_{j\in\mathcal{N}(i_{0})\cap\mathcal{N}(i_{1})}\mu\left(S^{j}_{i_{0}}\cap S^{j}_{i_{1}}\right)=\sum_{j\in[n]}\sum_{i_{0}\neq i_{1}\in\mathcal{N}(j)}\mu\left(S^{j}_{i_{0}}\cap S^{j}_{i_{1}}\right)\geq\left(\sum_{i\in[m]}|\mathcal{N}(i)|\cdot p_{i}^{1/|\mathcal{N}(i)|}\right)-n.

Moreover, given any (i0,i1)ED(i_{0},i_{1})\in E_{D}, where \{j\}=\mathcal{N}(i_{0})\cap\mathcal{N}(i_{1}), we have that

(9) (Ai0Ai1)\displaystyle\mathbb{P}(A_{i_{0}}\cap A_{i_{1}}) μ(Si0jSi1j)(k𝒩(i0){j}μ(Si0k))(k𝒩(i1){j}μ(Si1k))\displaystyle\geq\mu\left(S^{j}_{i_{0}}\cap S^{j}_{i_{1}}\right)\cdot\left(\prod_{k\in\mathcal{N}(i_{0})\setminus\{j\}}\mu(S^{k}_{i_{0}})\right)\cdot\left(\prod_{k^{\prime}\in\mathcal{N}(i_{1})\setminus\{j\}}\mu(S^{k^{\prime}}_{i_{1}})\right)
μ(Si0jSi1j)pi0pi1.\displaystyle\geq\mu\left(S^{j}_{i_{0}}\cap S^{j}_{i_{1}}\right)\cdot p_{i_{0}}\cdot p_{i_{1}}.

Finally, combining (8) with (9), we conclude that

(i0,i1)ED(Ai0Ai1)\displaystyle\sum_{(i_{0},i_{1})\in E_{D}}\mathbb{P}(A_{i_{0}}\cap A_{i_{1}}) (i0,i1)EDj𝒩(i0)𝒩(i1)μ(Si0jSi1j)pi0pi1\displaystyle\geq\sum_{(i_{0},i_{1})\in E_{D}}\sum_{j\in\mathcal{N}(i_{0})\cap\mathcal{N}(i_{1})}\mu\left(S^{j}_{i_{0}}\cap S^{j}_{i_{1}}\right)\cdot p_{i_{0}}\cdot p_{i_{1}}
(mini[m]pi)2(i|𝒩(i)|pi1/|𝒩(i)|n)\displaystyle\geq\left(\min_{i\in[m]}p_{i}\right)^{2}\left(\sum_{i}|\mathcal{N}(i)|\cdot p_{i}^{1/|\mathcal{N}(i)|}-n\right)
=mΔD(GB)ΔB(GB)2ϝ(GB,𝒑).\displaystyle=\sqrt{m}\cdot\Delta_{D}(G_{B})\cdot\Delta_{B}(G_{B})^{2}\cdot\digamma(G_{B},\bm{p}).

The following lemma is a special case of Theorem 4.1 where t=1t=1 and L1=[m]L_{1}=[m]. In fact, Theorem 4.1 is proved by applying Lemma 4.6 to each GkG_{k} separately.

Lemma 4.6.

Let GB=([m],[n],EB)G_{B}=([m],[n],E_{B}) be a linear bipartite graph and 𝐩\bm{p} be a probability vector. If 𝒜(GB,𝐩)\mathcal{A}\sim(G_{B},\bm{p}), then 𝒜(GB,𝐩,,𝛅)\mathcal{A}\sim(G_{B},\bm{p},\mathcal{M},\bm{\delta}) for some matching \mathcal{M} of GD(GB)G_{D}(G_{B}) and some 𝛅(0,1)||\bm{\delta}\in(0,1)^{|\mathcal{M}|} satisfying that (i,i)δi,i2(ϝ+(GB,𝐩))2\sum_{(i,i^{\prime})\in\mathcal{M}}\delta^{2}_{i,i^{\prime}}\geq\big{(}\digamma^{+}(G_{B},\bm{p})\big{)}^{2}.

Proof.

Given an instance 𝒜(GB,𝒑)\mathcal{A}\sim(G_{B},\bm{p}), we construct such a matching \mathcal{M} greedily as follows.

We maintain two sets EE and \mathcal{M}, which are initialized as EDE_{D} and \emptyset respectively. We do the following iteratively until EE becomes empty: select an edge (i0,i1)(i_{0},i_{1}) with maximum (Ai0Ai1)\mathbb{P}(A_{i_{0}}\cap A_{i_{1}}) from EE, add (i0,i1)(i_{0},i_{1}) to \mathcal{M}, and delete all edges connecting i0i_{0} or i1i_{1} from EE (including (i0,i1)(i_{0},i_{1}) itself). A sketch of this construction is given below.
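For concreteness, the following sketch of ours implements the greedy construction; the callback overlap(i0, i1), returning the intersection probability, is an assumption of the sketch (these probabilities need not be efficiently computable in general).

```python
def greedy_matching(E_D, overlap):
    """Greedy construction from the proof of Lemma 4.6 (a sketch).
    E_D: iterable of edges (i0, i1) of the dependency graph;
    overlap(i0, i1): the intersection probability P(A_i0 ∩ A_i1)."""
    E = set(E_D)
    M = []
    while E:
        # pick the surviving edge with maximum intersection probability
        i0, i1 = max(E, key=lambda e: overlap(*e))
        M.append((i0, i1))
        # delete every edge touching i0 or i1, including (i0, i1) itself
        E = {e for e in E if i0 not in e and i1 not in e}
    return M
```

Each iteration keeps the heaviest surviving edge while deleting at most 2ΔD2\Delta_{D} edges, which is exactly what Inequality (10) below records.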

Let ΔD\Delta_{D} and ΔB\Delta_{B} denote ΔD(GB)\Delta_{D}(G_{B}) and ΔB(GB)\Delta_{B}(G_{B}) respectively. In each iteration, at most 2ΔD2\Delta_{D} edges are deleted from EE and for each deleted edge (i,i)(i,i^{\prime}), (AiAi)2(Ai0Ai1)2\mathbb{P}(A_{i}\cap A_{i^{\prime}})^{2}\leq\mathbb{P}(A_{i_{0}}\cap A_{i_{1}})^{2}. Based on this observation, it is easy to see that

(10) (i0,i1)(Ai0Ai1)212ΔD(i,i)ED(AiAi)2.\displaystyle\sum_{(i_{0},i_{1})\in\mathcal{M}}\mathbb{P}(A_{i_{0}}\cap A_{i_{1}})^{2}\geq\frac{1}{2\Delta_{D}}\sum_{(i,i^{\prime})\in E_{D}}\mathbb{P}(A_{i}\cap A_{i^{\prime}})^{2}.

Moreover, according to Lemmas 4.2 and 4.5, together with the Cauchy–Schwarz inequality for the first step, we have

(11) (i,i)ED(AiAi)21|ED|((i,i)ED(AiAi))2mΔD2(ϝ+(GB,𝒑))2|ED|,\displaystyle\sum_{(i,i^{\prime})\in E_{D}}\mathbb{P}(A_{i}\cap A_{i^{\prime}})^{2}\geq\frac{1}{|E_{D}|}\cdot\left(\sum_{(i,i^{\prime})\in E_{D}}\mathbb{P}(A_{i}\cap A_{i^{\prime}})\right)^{2}\geq\frac{m\cdot\Delta_{D}^{2}\cdot\left(\digamma^{+}(G_{B},\bm{p})\right)^{2}}{|E_{D}|},

By combining Inequalities (10) and (11), setting δi,i=(AiAi)\delta_{i,i^{\prime}}=\mathbb{P}(A_{i}\cap A_{i^{\prime}}), and noting 2|ED|mΔD2|E_{D}|\leq m\Delta_{D}, we finish the proof. ∎

Proof of Theorem 4.1.

For each k[t]k\in[t], by applying Lemma 4.6 to GkG_{k}, we have that 𝒜(GB,𝒑,k,𝜹k)\mathcal{A}\sim(G_{B},\bm{p},\mathcal{M}_{k},\bm{\delta}_{k}) for some matching kEk\mathcal{M}_{k}\subseteq E_{k} and some 𝜹k\bm{\delta}_{k} where (i,i)kδi,i2(ϝ+(Gk,𝒑))2\sum_{(i,i^{\prime})\in\mathcal{M}_{k}}\delta^{2}_{i,i^{\prime}}\geq\big{(}\digamma^{+}(G_{k},\bm{p})\big{)}^{2}. Note that the EkE_{k}'s are pairwise disjoint, so 12t\mathcal{M}_{1}\cup\mathcal{M}_{2}\cup\cdots\cup\mathcal{M}_{t} is still a matching. By letting =12t\mathcal{M}=\mathcal{M}_{1}\cup\mathcal{M}_{2}\cup\cdots\cup\mathcal{M}_{t} and 𝜹=(𝜹1,,𝜹t)\bm{\delta}=(\bm{\delta}_{1},\cdots,\bm{\delta}_{t}), we conclude the theorem. ∎

Remark 4.7.

Given a bipartite graph GG, its simplified graph is obtained from GG by deleting every right node that has only one neighbor and merging right nodes with identical neighbor sets. Notice that if GG is linear, so is its simplified graph.

Theorem 4.1 can be slightly generalized: it is sufficient that the simplified graph of GkG_{k} instead of GkG_{k} itself is linear.

5. The Moser-Tardos algorithm is beyond Shearer’s bound

In this section, we prove Theorem 1.5. Given a dependency graph GDG_{D}, a vector 𝒑\bm{p} and a chordless cycle CC in GDG_{D}, define

r(GD,𝒑,C)|C|(minjCpj)4(2jCpj|C|1)2.r(G_{D},\bm{p},C)\triangleq|C|\cdot\big{(}\min_{j\in C}p_{j}\big{)}^{4}\cdot\left(\frac{2\sum_{j\in C}\sqrt{p_{j}}}{|C|}-1\right)^{2}.

and

r+(GD,𝒑,C)|C|(minjCpj)4(max{2jCpj|C|1,0})2.r^{+}(G_{D},\bm{p},C)\triangleq|C|\cdot\big{(}\min_{j\in C}p_{j}\big{)}^{4}\cdot\left(\max\left\{\frac{2\sum_{j\in C}\sqrt{p_{j}}}{|C|}-1,0\right\}\right)^{2}.
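As a numeric illustration (ours, not from the paper), r+r^{+} is a one-line computation; the vector below is the one used later in the proof of Lemma 5.2, with \ell=5.

```python
import math

def r_plus(p_C):
    """r^+(G_D, p, C) for a chordless cycle C, following the displayed
    formula; p_C lists the probabilities p_j for j in C (a sketch)."""
    ell = len(p_C)
    gap = max(2 * sum(math.sqrt(p) for p in p_C) / ell - 1, 0)
    return ell * min(p_C)**4 * gap**2

ell = 5
q_C = [0.25] * (ell - 1) + [0.25 + 0.25 / (ell - 1)]
print(r_plus(q_C))                     # ~1.1e-5
assert r_plus(q_C) > 2**-10 / ell**3   # consistent with (13) below
```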

Theorem 1.5 then follows immediately from Lemmas 5.1 and 5.2.

Lemma 5.1.

Given GDG_{D}, 𝐩\bm{p} and ε>0\varepsilon>0, let C1,C2,,CC_{1},C_{2},\cdots,C_{\ell} be any disjoint chordless cycles in GDG_{D}. If

d((1+ε)𝒑,GD)<1544ir+(GD,𝒑,Ci),d((1+\varepsilon)\bm{p},G_{D})<\frac{1}{544}\sum_{i\leq\ell}r^{+}(G_{D},\bm{p},C_{i}),

then for any variable-generated event system 𝒜(GD,𝐩)\mathcal{A}\sim(G_{D},\bm{p}), the expected number of resampling steps performed by the MT algorithm is at most m/εm/\varepsilon.

Proof.

Fix such an instance 𝒜\mathcal{A}. Define δi,i:=(AiAi)\delta_{i,i^{\prime}}:=\mathbb{P}(A_{i}\cap A_{i^{\prime}}). Let GBG_{B} denote the event-variable graph of 𝒜\mathcal{A}. Let GkG_{k} denote the induced subgraph of GBG_{B} on Ck(iCk𝒩GB(i))C_{k}\cup\left(\cup_{i\in C_{k}}\mathcal{N}_{G_{B}}(i)\right). According to Remark 4.7, it is lossless to assume GkG_{k} is a cycle of length 2|Ck|2|C_{k}|. Thus we have

(12) \digamma^{+}(G_{k},\bm{p})\geq\frac{\big{(}\min_{i\in C_{k}}p_{i}\big{)}^{2}\cdot\left(-|C_{k}|+\sum_{i\in C_{k}}2\sqrt{p_{i}}\right)}{8\sqrt{|C_{k}|}}.

According to Theorem 4.1, there is a matching \mathcal{M} of GDG_{D} such that (i,i)δi,i2k(ϝ+(Gk,𝒑))2\sum_{(i,i^{\prime})\in\mathcal{M}}\delta_{i,i^{\prime}}^{2}\geq\sum_{k\leq\ell}\left(\digamma^{+}(G_{k},\bm{p})\right)^{2}. Define 𝒑\bm{p}^{-} as in (1). We have (1+ε)𝒑(1+ε)𝒑(1+\varepsilon)\bm{p}^{-}\leq(1+\varepsilon)\bm{p} and

(1+ε)𝒑(1+ε)𝒑1𝒑𝒑1217(i,i)δi,i2217k(ϝ+(Gk,𝒑))2.\displaystyle||(1+\varepsilon)\bm{p}-(1+\varepsilon)\bm{p}^{-}||_{1}\geq||\bm{p}-\bm{p}^{-}||_{1}\geq\frac{2}{17}\sum_{(i,i^{\prime})\in\mathcal{M}}\delta^{2}_{i,i^{\prime}}\geq\frac{2}{17}\sum_{k\leq\ell}\left(\digamma^{+}(G_{k},\bm{p})\right)^{2}.

Combining with (12), we have

(1+ε)𝒑(1+ε)𝒑11544ir+(GD,𝒑,Ci)>d((1+ε)𝒑,GD),\displaystyle||(1+\varepsilon)\bm{p}-(1+\varepsilon)\bm{p}^{-}||_{1}\geq\frac{1}{544}\sum_{i\leq\ell}r^{+}(G_{D},\bm{p},C_{i})>d((1+\varepsilon)\bm{p},G_{D}),

where the last inequality is by the condition of the lemma. Thus by Definition 1.4, (1+ε)𝒑(1+\varepsilon)\bm{p}^{-} is in the Shearer's bound of GDG_{D}. Combining with Theorem 1.6, the expected number of resampling steps performed by the Moser-Tardos algorithm is at most m/εm/\varepsilon. ∎

Lemma 5.2.

Given GDG_{D} and any chordless cycle CC in GDG_{D}, where \ell=|C|, there is some probability vector 𝐩\bm{p} beyond the Shearer's bound of GDG_{D} and with

d(𝒑,GD)1545r(GD,𝒑,C)>2203d(\bm{p},G_{D})\geq\frac{1}{545}\cdot r(G_{D},\bm{p},C)>2^{-20}\ell^{-3}

such that for any variable-generated event system 𝒜(GD,𝐩)\mathcal{A}\sim(G_{D},\bm{p}), the expected number of resampling steps performed by the MT algorithm is at most 229m2|C|32^{29}\cdot m^{2}\cdot|C|^{3}.

The following two lemmas will be used in the proof of Lemma 5.2.

Lemma 5.3.

[56] q(GD,𝐩)=1(A𝒜A)q_{\emptyset}(G_{D},\bm{p})=1-\mathbb{P}(\bigcup_{A\in\mathcal{A}}A) holds for any extremal instance 𝒜(GD,𝐩)\mathcal{A}\sim(G_{D},\bm{p}).

Lemma 5.4.

[56] Suppose 𝐩\bm{p} is in the Shearer’s bound of GD=([m],ED)G_{D}=([m],E_{D}). Then for i[m]i\in[m],

q(GD,𝒑)pi=(j𝒩GD(i){i}Aj¯)\displaystyle\frac{\partial{q_{\emptyset}(G_{D},\bm{p})}}{\partial{p_{i}}}=-\mathbb{P}\left(\bigcap_{j\notin\mathcal{N}_{G_{D}}(i)\cup\{i\}}\overline{A_{j}}\right)

holds for any 𝒜(GD,𝐩)\mathcal{A}\sim(G_{D},\bm{p}) satisfying that AiAi′′=A_{i^{\prime}}\cap A_{i^{\prime\prime}}=\emptyset for any (i,i′′)ED(i^{\prime},i^{\prime\prime})\in E_{D} where i,i′′ii^{\prime},i^{\prime\prime}\neq i.

Proof of Lemma 5.2.

Let =|C|\ell=|C| and 𝝀=(14,,14,14)\bm{\lambda}=\left(\frac{1}{4},\cdots,\frac{1}{4},\frac{1}{4}\right). Let 𝒜(C,𝝀)\mathcal{A}\sim(C,\bm{\lambda}) be an extremal instance defined as follows: 𝒜=(A1,,A)\mathcal{A}=(A_{1},\cdots,A_{\ell}) is a variable-generated event system fully determined by a set of underlying mutually independent random variables {X1,,X}\{X_{1},\cdots,X_{\ell}\}. Moreover, Ai=[Xi<1/2][Xi+11/2]A_{i}=[X_{i}<1/2]\land[X_{i+1}\geq 1/2] for each i[1]i\in[\ell-1], and A=[X<1/2][X11/2]A_{\ell}=[X_{\ell}<1/2]\land[X_{1}\geq 1/2]. According to Lemma 5.3,

q_{\emptyset}(C,\bm{\lambda})=1-\mathbb{P}\left(\bigcup_{i\in[\ell]}A_{i}\right)=\frac{1}{2^{\ell-1}}.
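This value can be checked by brute force (a sketch of ours): with fair bits, an assignment avoids every AiA_{i} exactly when the cyclic pattern "Xi<1/2X_{i}<1/2 and Xi+11/2X_{i+1}\geq 1/2" never occurs, which leaves only the all-low and all-high assignments.

```python
from itertools import product

def no_bad_event(bits):
    """True iff none of the cyclic events A_i = [X_i < 1/2] ∧ [X_{i+1} >= 1/2]
    occurs; bit 0 encodes X_i < 1/2 and bit 1 encodes X_i >= 1/2."""
    ell = len(bits)
    return all(not (bits[i] == 0 and bits[(i + 1) % ell] == 1)
               for i in range(ell))

for ell in range(3, 12):
    count = sum(no_bad_event(b) for b in product((0, 1), repeat=ell))
    assert count / 2**ell == 2**-(ell - 1)   # only all-0 and all-1 survive
```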

Besides, according to Lemma 5.4, for any 𝝀=(14,,14,14+ε)\bm{\lambda}^{\prime}=(\frac{1}{4},\cdots,\frac{1}{4},\frac{1}{4}+\varepsilon) in the Shearer’s bound of CC,

q(C,𝝀)λ=(i[2,2]Ai¯)=223.\displaystyle\frac{\partial{q_{\emptyset}(C,\bm{\lambda}^{\prime})}}{\partial{\lambda^{\prime}_{\ell}}}=-\mathbb{P}\left(\bigcap_{i\in[2,\ell-2]}\overline{A_{i}}\right)=-\frac{\ell-2}{2^{\ell-3}}.

Thus, for any 𝝀𝝀𝝀′′:=(14,,14,14+14(1))\bm{\lambda}\leq\bm{\lambda^{\prime}}\leq\bm{\lambda}^{\prime\prime}:=\left(\frac{1}{4},\cdots,\frac{1}{4},\frac{1}{4}+\frac{1}{4(\ell-1)}\right), we have that

q(C,𝝀′′)=q(C,𝝀)+14λ′′q(C,𝝀)λ𝑑λ>12121121=11121.\displaystyle q_{\emptyset}(C,\bm{\lambda}^{\prime\prime})=q_{\emptyset}(C,\bm{\lambda})+\int_{\frac{1}{4}}^{\lambda^{\prime\prime}_{\ell}}\frac{\partial{q_{\emptyset}(C,\bm{\lambda}^{\prime})}}{\partial{\lambda_{\ell}^{\prime}}}d\lambda_{\ell}^{\prime}>\frac{1}{2^{\ell-1}}-\frac{\ell-2}{\ell-1}\cdot\frac{1}{2^{\ell-1}}=\frac{1}{\ell-1}\cdot\frac{1}{2^{\ell-1}}.

Hence 𝝀′′\bm{\lambda}^{\prime\prime} is in the Shearer’s bound of CC. Thus, there exists q>0q>0 such that 𝒒\bm{q} defined as follows is on the Shearer’s boundary of GDG_{D}:

i[m]:qi={14if i[1],14+14(1)if i=,qotherwise.\displaystyle\forall i\in[m]:\quad q_{i}=\begin{cases}\frac{1}{4}&\text{if }i\in[\ell-1],\\ \frac{1}{4}+\frac{1}{4(\ell-1)}&\text{if }i=\ell,\\ q&\text{otherwise}.\end{cases}

One can verify that

(13) r+(GD,𝒒,C)=r(GD,𝒒,C)>144(122)2>12103.\displaystyle r^{+}(G_{D},\bm{q},C)=r(G_{D},\bm{q},C)>\ell\cdot\frac{1}{4^{4}}\cdot\left(\frac{1}{2\ell^{2}}\right)^{2}>\frac{1}{2^{10}\cdot\ell^{3}}.

Define

f(δ)=545d((1+δ)𝒒,GD)r+(GD,(1+δ)𝒒,C).f(\delta)=545\cdot d((1+\delta)\bm{q},G_{D})-r^{+}(G_{D},(1+\delta)\bm{q},C).

One can verify that f(0)<0f(0)<0 because d(𝒒,GD)=0d(\bm{q},G_{D})=0 and r+(GD,𝒒,C)>0r^{+}(G_{D},\bm{q},C)>0. Moreover, let δ\delta^{\prime} be large enough such that (1+δ)𝒒v(GD)(1+\delta^{\prime})\bm{q}\not\in\mathcal{I}_{v}(G_{D}). One can verify that such δ\delta^{\prime} must exist. We have f(δ)0f(\delta^{\prime})\geq 0. This is because otherwise f(δ)<0f(\delta^{\prime})<0 and then

d((1+δ)𝒒,GD)<1545r+(GD,(1+δ)𝒒,C).d((1+\delta^{\prime})\bm{q},G_{D})<\frac{1}{545}\cdot r^{+}(G_{D},(1+\delta^{\prime})\bm{q},C).

By following the proof of Lemma 5.1, we have that the MT algorithm terminates at (1+δ)𝒒(1+\delta^{\prime})\bm{q}, which contradicts (1+δ)𝒒v(GD)(1+\delta^{\prime})\bm{q}\not\in\mathcal{I}_{v}(G_{D}).

Moreover, f(δ)f(\delta) is a continuous function of δ\delta, because d((1+δ)𝒒,GD)d((1+\delta)\bm{q},G_{D}) and r+(GD,(1+δ)𝒒,C)r^{+}(G_{D},(1+\delta)\bm{q},C) are both continuous functions of δ\delta. Combining this with f(0)<0f(0)<0 and f(\delta^{\prime})\geq 0, there must be a 0δδ0\leq\delta\leq\delta^{\prime} such that f(δ)=0f(\delta)=0. Let 𝒑=(1+δ)𝒒\bm{p}=(1+\delta)\bm{q}. By f(δ)=0f(\delta)=0, we have

(14) d(𝒑,GD)=1545r+(GD,𝒑,C).d(\bm{p},G_{D})=\frac{1}{545}\cdot r^{+}(G_{D},\bm{p},C).

Combining with r+(GD,𝒑,C)=r(GD,𝒑,C)>r(GD,𝒒,C)r^{+}(G_{D},\bm{p},C)=r(G_{D},\bm{p},C)>r(G_{D},\bm{q},C) and (13), we have d(𝒑,GD)>2203d(\bm{p},G_{D})>2^{-20}\ell^{-3}.

Fix a variable-generated event system 𝒜(GD,𝒑)\mathcal{A}\sim(G_{D},\bm{p}). Define δi,i:=(AiAi)\delta_{i,i^{\prime}}:=\mathbb{P}(A_{i}\cap A_{i^{\prime}}). Let GBG_{B} denote the event-variable graph of 𝒜\mathcal{A}. Let GG denote the induced subgraph of GBG_{B} on C(iC𝒩GB(i))C\cup\left(\cup_{i\in C}\mathcal{N}_{G_{B}}(i)\right). According to Remark 4.7, it is lossless to assume that GG is a cycle of length 2|C|2|C|. Thus we have

(15) \digamma^{+}(G,\bm{p})\geq\frac{\big{(}\min_{i\in C}p_{i}\big{)}^{2}\cdot\left(-|C|+\sum_{i\in C}2\sqrt{p_{i}}\right)}{8\sqrt{|C|}}.

According to Theorem 4.1, there is a matching \mathcal{M} of GDG_{D} such that (i,i)δi,i2(ϝ+(G,𝒑))2\sum_{(i,i^{\prime})\in\mathcal{M}}\delta_{i,i^{\prime}}^{2}\geq\left(\digamma^{+}(G,\bm{p})\right)^{2}. Define 𝒑\bm{p}^{-} as in (1). We have

\displaystyle||\bm{p}-\bm{p}^{-}||_{1}\geq\frac{2}{17}\sum_{(i,i^{\prime})\in\mathcal{M}}\delta^{2}_{i,i^{\prime}}\geq\frac{2}{17}\left(\digamma^{+}(G,\bm{p})\right)^{2}.

Combining with (15), we have

𝒑𝒑11544r+(GD,𝒑,C).\displaystyle||\bm{p}-\bm{p}^{-}||_{1}\geq\frac{1}{544}\cdot r^{+}(G_{D},\bm{p},C).

Let

ε12293m.\varepsilon\triangleq\frac{1}{2^{29}\cdot\ell^{3}\cdot m}.

By (13) we have

mε15455442103(15441545)r+(GD,𝒒,C)(15441545)r+(GD,𝒑,C).\displaystyle m\varepsilon\leq\frac{1}{545\cdot 544\cdot 2^{10}\cdot\ell^{3}}\leq\left(\frac{1}{544}-\frac{1}{545}\right)r^{+}(G_{D},\bm{q},C)\leq\left(\frac{1}{544}-\frac{1}{545}\right)r^{+}(G_{D},\bm{p},C).

Thus we have

𝒑(1+ε)𝒑1>𝒑𝒑1mεr+(GD,𝒑,C)544mεr+(GD,𝒑,C)545d(𝒑,GD),\displaystyle||\bm{p}-(1+\varepsilon)\bm{p}^{-}||_{1}>||\bm{p}-\bm{p}^{-}||_{1}-m\varepsilon\geq\frac{r^{+}(G_{D},\bm{p},C)}{544}-m\varepsilon\geq\frac{r^{+}(G_{D},\bm{p},C)}{545}\geq d(\bm{p},G_{D}),

where the last inequality is by (14). Thus by Definition 1.4, (1+ε)𝒑(1+\varepsilon)\bm{p}^{-} is in the Shearer's bound of GDG_{D}. Combining with Theorem 1.6, the expected number of resampling steps performed by the MT algorithm is at most m/εm/\varepsilon. ∎

6. Application to periodic Euclidean graphs

In this section, we explicitly calculate the gaps between our new criterion and Shearer's bound on periodic Euclidean graphs, including several lattices that have been studied extensively in physics. It turns out that the efficient region of the MT algorithm can extend significantly beyond Shearer's bound.

A periodic Euclidean graph GDG_{D} is a graph that is embedded into a Euclidean space naturally and has a translational unit GUG_{U} in the sense that GDG_{D} can be viewed as the union of periodic translations of GUG_{U}. For example, a cycle of length 4 is a translational unit of the square lattice.

Given a dependency graph GDG_{D}, it naturally defines a bipartite graph GB(GD)G_{B}(G_{D}) as follows. Regard each edge of GDG_{D} as a variable and each vertex as an event. An event AA depends on a variable XX if and only if the vertex corresponding to AA is an endpoint of the edge corresponding to XX.
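This canonical construction is a direct transcription (a sketch of ours):

```python
def bipartite_from_dependency(V_D, E_D):
    """The event-variable graph G_B(G_D): one variable per edge of G_D,
    one event per vertex; event v depends on variable e iff v is an
    endpoint of e."""
    vbl = {v: [e for e in E_D if v in e] for v in V_D}
    return list(E_D), vbl   # (variables, event -> its variables)

# e.g. a 4-cycle, the translational unit of the square lattice:
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
_, vbl = bipartite_from_dependency(V, E)
assert vbl[0] == [(0, 1), (3, 0)]   # event 0 shares one variable with each neighbour
```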

For simplicity, we only focus on symmetric probabilities, where 𝒑=(p,p,,p)\bm{p}=(p,p,\cdots,p). Given a dependency graph GDG_{D} and a vector 𝒑\bm{p}, remember that 𝒑\bm{p} is on Shearer's boundary of GDG_{D} if (1ε)𝒑(1-\varepsilon)\bm{p} is in Shearer's bound and (1+ε)𝒑(1+\varepsilon)\bm{p} is not, for any ε>0\varepsilon>0.

Given a dependency graph GD=([m],ED)G_{D}=([m],E_{D}) and two vertices i,i[m]i,i^{\prime}\in[m], we use dist(i,i)\mathrm{dist}(i,i^{\prime}) to denote the distance between ii and ii^{\prime} in GDG_{D}. The following lemma will be used.

Lemma 6.1.

Suppose 𝐩a=(pa,pa,,pa)\bm{p}_{a}=(p_{a},p_{a},\cdots,p_{a}) is on Shearer's boundary of GD=([m],ED)G_{D}=([m],E_{D}). For any probability vector 𝐩\bm{p} other than 𝐩a\bm{p}_{a}, it is in the Shearer's bound if there exist K,d+K,d\in\mathbb{N}^{+}, 𝒮2[m]\mathcal{S}\subseteq 2^{[m]} where \cup_{S\in\mathcal{S}}S=[m], and f:𝒮2[m]f:\mathcal{S}\rightarrow 2^{[m]} such that the following conditions hold:

  • (a)

    for each i[m]i\in[m], there are at most KK subsets S𝒮S\in\mathcal{S} such that f(S)if(S)\ni i;

  • (b)

    if f(S)=Tf(S)=T, then dist(i,i)d\mathrm{dist}(i,i^{\prime})\leq d for each iSi\in S and iTi^{\prime}\in T;

  • (c)

    if f(S)=Tf(S)=T, then

    (1papa)d1KpaiSmax{pipa,0}iTmax{papi,0}.\left(\frac{1-p_{a}}{p_{a}}\right)^{d-1}\cdot\frac{K}{p_{a}}\cdot\sum_{i\in S}\max\{p_{i}-p_{a},0\}\leq\sum_{i\in T}\max\{p_{a}-p_{i},0\}.

While Lemma 6.1 looks involved, the basic idea is simple: by contradiction, suppose there is such a vector 𝒑\bm{p}^{\prime} beyond Shearer's bound; then we apply Lemma D.1 repeatedly to transfer probability from one event to another while keeping the probability vector beyond Shearer's bound; finally, the vector 𝒑\bm{p}^{\prime} will be changed to a vector strictly below 𝒑\bm{p}, contradicting the assumption that 𝒑\bm{p} is on the Shearer's boundary. The involved part is the transfer scheme that changes 𝒑\bm{p}^{\prime} into a probability vector strictly below 𝒑\bm{p}. We leave the proof to the appendix.
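Though we defer the proof, the three conditions themselves are mechanical to verify; the checker below is a sketch of ours, with the distance function and the family 𝒮\mathcal{S} supplied by the caller.

```python
def check_lemma_6_1(p, p_a, S_list, f, dist, K, d):
    """Sketch: verify conditions (a)-(c) of Lemma 6.1 for a candidate
    vector p (dict vertex -> probability).  S_list is the family S (sets
    covering the vertex set), f maps each set S to the set T = f(S), and
    dist gives graph distances in G_D."""
    vertices = set().union(*S_list)
    # (a) every vertex lies in f(S) for at most K sets S
    if any(sum(1 for S in S_list if i in f(S)) > K for i in vertices):
        return False
    for S in S_list:
        T = f(S)
        # (b) S and T are within distance d of each other
        if any(dist(i, ip) > d for i in S for ip in T):
            return False
        # (c) the probability-transfer inequality
        lhs = ((1 - p_a) / p_a) ** (d - 1) * (K / p_a) * \
              sum(max(p[i] - p_a, 0) for i in S)
        if lhs > sum(max(p_a - p[ip], 0) for ip in T):
            return False
    return True
```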

The main result of this section is as follows.

Theorem 6.2.

Let GD=(VD,ED)G_{D}=(V_{D},E_{D}) be a periodic Euclidean graph with maximum degree Δ\Delta, and 𝐩a=(pa,,pa)\bm{p}_{a}=(p_{a},\cdots,p_{a}) be the probability vector on Shearer’s boundary of GDG_{D}. Suppose GU=(VU,EU)G_{U}=(V_{U},E_{U}) is a translational unit of GDG_{D} with diameter DD. Let

\displaystyle q\triangleq\frac{p_{a}^{D+2}\big{(}\digamma^{+}(G_{B}(G_{U}),\bm{p}_{a})\big{)}^{2}}{17\cdot(\Delta+1)\cdot|V_{U}|^{2}\cdot(1-p_{a})^{D+1}}.

Then for any 𝒜(GB(GD),𝐩)\mathcal{A}\sim(G_{B}(G_{D}),\bm{p}) where (1+ε)𝐩(pa+q,,pa+q)(1+\varepsilon)\bm{p}\leq(p_{a}+q,\cdots,p_{a}+q), the expected number of resampling steps performed by the MT algorithm is at most |VD|/ε|V_{D}|/\varepsilon.

Proof.

Fix any 𝒜(GB(GD),𝒑)\mathcal{A}\sim(G_{B}(G_{D}),\bm{p}) where (1+ε)𝒑(pa+q,,pa+q)(1+\varepsilon)\bm{p}\leq(p_{a}+q,\cdots,p_{a}+q). Let δv0,v1\delta_{v_{0},v_{1}} denote (Av0Av1)\mathbb{P}(A_{v_{0}}\cap A_{v_{1}}) for (v0,v1)ED(v_{0},v_{1})\in E_{D}. We construct a matching ED\mathcal{M}\subset E_{D} greedily as follows: we maintain two sets EE and \mathcal{M}, which are initialized as EDE_{D} and \emptyset respectively. We do the following iteratively until EE becomes empty: select an edge (v0,v1)(v_{0},v_{1}) with maximum δv0,v1\delta_{v_{0},v_{1}} from EE, add (v0,v1)(v_{0},v_{1}) to \mathcal{M}, and delete all edges connecting v0v_{0} or v1v_{1} from EE (including (v0,v1)(v_{0},v_{1}) itself). Let 𝜹=(δv0,v1:(v0,v1))\bm{\delta}=\big{(}\delta_{v_{0},v_{1}}:(v_{0},v_{1})\in\mathcal{M}\big{)}. Then 𝒜(GB(GD),𝒑,,𝜹)\mathcal{A}\sim(G_{B}(G_{D}),\bm{p},\mathcal{M},\bm{\delta}).

Define 𝒑\bm{p}^{-} as (1). In the remaining part of the proof, we will show that (1+ε)𝒑(1+\varepsilon)\bm{p}^{-} is in the Shearer’s bound. This implies the conclusion immediately by Theorem 1.6.

In fact, showing that (1+\varepsilon)\bm{p}^{-} is in the Shearer's bound is a direct application of Lemma 6.1. To provide more detail, we need some notation. We use v,v^{\prime},v_{1},v_{2},\cdots to represent vertices in G_{D}, and u,u^{\prime},u_{1},u_{2},\cdots to represent vertices in G_{U}. Let G_{U}^{1},G_{U}^{2},\cdots be the periodic translations of G_{U} in G_{D}. We use a surjection h:\mathbb{N}^{+}\times V_{U}\rightarrow V_{D} (possibly not an injection, as the translations may overlap) to represent how these periodic translations constitute G_{D}: h(k,u)=v if the copy of u\in V_{U} in the k-th translation (i.e., G_{U}^{k}) is v\in V_{D}. In particular, the vertex set of G_{U}^{k}, denoted by V_{U}^{k}, is \{h(k,u):u\in V_{U}\}, and the edge set of G_{U}^{k}, denoted by E_{U}^{k}, is \{(h(k,u),h(k,u^{\prime})):(u,u^{\prime})\in E_{U}\}. Besides, let \mathcal{N}^{+}(v):=\mathcal{N}_{G_{D}}(v)\cup\{v\} for v\in V_{D}, and \mathcal{N}^{+}(V):=\cup_{v\in V}\mathcal{N}^{+}(v) for V\subset V_{D}. Let T_{k}:=\{(v_{0},v_{1})\in\mathcal{M}:v_{0},v_{1}\in\mathcal{N}^{+}(V_{U}^{k})\} stand for the pairs in \mathcal{M} adjacent to G_{U}^{k}. With some abuse of notation, we sometimes write v\in T_{k} to denote that (v,v^{\prime})\in T_{k} for some v^{\prime}\in V_{D}.

The following claim says that 𝒑\bm{p}^{-} is much smaller than 𝒑\bm{p} even when projected onto a single translation. Its proof uses a similar idea to Lemma 4.6 and can be found in the appendix.

Claim 6.3.

(v0,v1)Tkδv0,v12(ϝ+(GB(GU),𝒑))2\sum_{(v_{0},v_{1})\in T_{k}}\delta_{v_{0},v_{1}}^{2}\geq\big{(}\digamma^{+}(G_{B}(G_{U}),\bm{p})\big{)}^{2} holds for any kk.

To apply Lemma 6.1, let K:=(Δ+1)|VU|K:=(\Delta+1)|V_{U}|, d:=D+2d:=D+2, 𝒮:={VU1,VU2,}\mathcal{S}:=\{V_{U}^{1},V_{U}^{2},\cdots\}, and f(VUk):=Tkf(V_{U}^{k}):=T_{k}. Based on Claim 6.3, one can check that all the three conditions in Lemma 6.1 hold (see the appendix for details). Thus, according to Lemma 6.1, (1+ε)𝒑(1+\varepsilon)\bm{p}^{-} is in Shearer’s bound. ∎

We apply Theorem 6.2 to three lattices: the square lattice, the hexagonal lattice, and the simple cubic lattice. For the square lattice, we take the 5×55\times 5 square with 25 vertices as the translational unit. For the hexagonal lattice, we take a graph consisting of 1919 hexagons as the translational unit, in which there are 3, 4, 5, 4, 3 hexagons in the five columns, respectively. For the simple cubic lattice, we take the 3×3×33\times 3\times 3 cube with 27 vertices as the translational unit. The explicit gaps are summarized in Table 1. Finally, the lower bounds for these three lattices in Table 1 hold for all bipartite graphs with the given canonical dependency graph, because all such bipartite graphs are essentially the same under the reduction rules defined in [36].

References

  • AFR [95] Michael Albert, Alan Frieze, and Bruce Reed. Multicoloured Hamilton cycles. The Electronic Journal of Combinatorics, 2(1):R10, 1995.
  • AI [16] Dimitris Achlioptas and Fotis Iliopoulos. Random walks that find perfect objects and the Lovász local lemma. Journal of ACM, 63(3):22:1–22:29, 2016.
  • AIK [19] Dimitris Achlioptas, Fotis Iliopoulos, and Vladimir Kolmogorov. A local lemma for focused stochastic algorithms. SIAM Journal on Computing, 48(5):1583–1602, 2019.
  • AIS [19] Dimitris Achlioptas, Fotis Iliopoulos, and Alistair Sinclair. Beyond the Lovász local lemma: Point to set correlations and their algorithmic applications. In Proceedings of Symposium on Foundations of Computer Science (FOCS), pages 725–744, 2019.
  • AIV [17] Dimitris Achlioptas, Fotis Iliopoulos, and Nikos Vlassis. Stochastic control via entropy compression. In Proceedings of International Colloquium on Automata, Languages, and Programming (ICALP), volume 80, pages 83:1–83:13, 2017.
  • Alo [91] Noga Alon. A parallel algorithmic version of the local lemma. Random Structures and Algorithms, 2(4):367–378, 1991.
  • Bec [91] József Beck. An algorithmic approach to the Lovász local lemma. Random Structures and Algorithms, 2(4):343–365, 1991.
  • BFH+ [16] Sebastian Brandt, Orr Fischer, Juho Hirvonen, Barbara Keller, Tuomo Lempiäinen, Joel Rybicki, Jukka Suomela, and Jara Uitto. A lower bound for the distributed Lovász local lemma. In Proceedings of Symposium on Theory of Computing (STOC), pages 479–488, 2016.
  • BFPS [11] Rodrigo Bissacot, Roberto Fernández, Aldo Procacci, and Benedetto Scoppola. An improvement of the Lovász local lemma via cluster expansion. Combinatorics, Probability and Computing, 20(05):709–719, 2011.
  • CCS+ [17] Jan Dean Catarata, Scott Corbett, Harry Stern, Mario Szegedy, Tomas Vyskocil, and Zheng Zhang. The Moser-Tardos resample algorithm: Where is the limit? (An experimental inquiry). In 2017 Proceedings of the Nineteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pages 159–171, 2017.
  • CGH [13] Karthekeyan Chandrasekaran, Navin Goyal, and Bernhard Haeupler. Deterministic algorithms for the Lovász local lemma. SIAM Journal on Computing, 42(6):2132–2155, 2013.
  • CPS [17] Kai-Min Chung, Seth Pettie, and Hsin-Hao Su. Distributed algorithms for the lovász local lemma and graph coloring. Distributed Computing, 30(4):261–280, 2017.
  • CS [00] Artur Czumaj and Christian Scheideler. Coloring nonuniform hypergraphs: A new algorithmic approach to the general Lovász local lemma. Random Structures and Algorithms, 17(3-4):213–237, 2000.
  • EL [75] Paul Erdős and László Lovász. Problems and results on 3-chromatic hypergraphs and some related questions. Infinite and finite sets, 10(2):609–627, 1975.
  • ES [91] Paul Erdős and Joel Spencer. Lopsided Lovász local lemma and Latin transversals. Discrete Applied Mathematics, 30(2-3):151–154, 1991.
  • FGYZ [20] Weiming Feng, Heng Guo, Yitong Yin, and Chihao Zhang. Fast sampling and counting kk-SAT solutions in the local lemma regime. In Proceedings of Symposium on Theory of Computing (STOC), pages 854–867, 2020.
  • FHY [20] Weiming Feng, Kun He, and Yitong Yin. Sampling constraint satisfaction solutions in the local lemma regime. arXiv preprint arXiv:2011.03915, 2020.
  • FHY [21] Weiming Feng, Kun He, and Yitong Yin. Sampling constraint satisfaction solutions in the local lemma regime. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1565–1578, 2021.
  • Gau [67] David S Gaunt. Hard-sphere lattice gases. II. Plane-triangular and three-dimensional lattices. The Journal of Chemical Physics, 46(8):3237–3259, 1967.
  • GF [65] David S Gaunt and Michael E Fisher. Hard-sphere lattice gases. I. Plane-square lattice. The Journal of Chemical Physics, 43(8):2840–2863, 1965.
  • GGGY [20] Andreas Galanis, Leslie Ann Goldberg, Heng Guo, and Kuan Yang. Counting solutions to random CNF formulas. In ICALP, volume 168 of LIPIcs, pages 53:1–53:14, 2020.
  • GH [20] Heng Guo and Kun He. Tight bounds for popping algorithms. Random Structures & Algorithms, 2020.
  • Gha [16] Mohsen Ghaffari. An improved distributed algorithm for maximal independent set. In Robert Krauthgamer, editor, Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 270–277, 2016.
  • Gil [19] András Gilyén. Quantum singular value transformation & its algorithmic applications. PhD thesis, University of Amsterdam, 2019.
  • GJ [18] Heng Guo and Mark Jerrum. Approximately counting bases of bicircular matroids. arXiv preprint arXiv:1808.09548, 2018.
  • GJ [19] Heng Guo and Mark Jerrum. A polynomial-time approximation algorithm for all-terminal network reliability. SIAM Journal on Computing, 48(3):964–978, 2019.
  • GJL [19] Heng Guo, Mark Jerrum, and Jingcheng Liu. Uniform sampling through the Lovász local lemma. Journal of the ACM (JACM), 66(3):18, 2019.
  • GKPT [17] Ioannis Giotis, Lefteris Kirousis, Kostas I Psaromiligkos, and Dimitrios M Thilikos. Acyclic edge coloring through the Lovász local lemma. Theoretical Computer Science, 665:40–50, 2017.
  • GLLZ [19] Heng Guo, Chao Liao, Pinyan Lu, and Chihao Zhang. Counting hypergraph colorings in the local lemma regime. SIAM Journal on Computing, 48(4):1397–1424, 2019.
  • GMSW [09] Heidi Gebauer, Robin A Moser, Dominik Scheder, and Emo Welzl. The Lovász local lemma and satisfiability. In Efficient Algorithms, pages 30–54. Springer, 2009.
  • GST [16] Heidi Gebauer, Tibor Szabó, and Gábor Tardos. The local lemma is asymptotically tight for SAT. Journal of the ACM, 63(5):43, 2016.
  • Har [16] David G Harris. Lopsidependency in the Moser-Tardos framework: beyond the lopsided Lovász local lemma. ACM Transactions on Algorithms (TALG), 13(1):17, 2016.
  • Har [18] David G. Harris. Deterministic parallel algorithms for fooling polylogarithmic juntas and the Lovász local lemma. ACM Transactions on Algorithms, 14(4):47:1–47:24, 2018.
  • Har [19] David G. Harris. Deterministic algorithms for the Lovász local lemma: simpler, more general, and more parallel. CoRR, abs/1909.08065, 2019.
  • HH [17] Bernhard Haeupler and David G. Harris. Parallel algorithms and concentration bounds for the Lovász local lemma via witness DAGs. ACM Transactions on Algorithms, 13(4):53:1–53:25, 2017.
  • HLL+ [17] Kun He, Liang Li, Xingwu Liu, Yuyi Wang, and Mingji Xia. Variable-version Lovász local lemma: Beyond Shearer’s bound. In Proceedings of Symposium on Foundations of Computer Science (FOCS), pages 451–462, 2017.
  • HLSZ [19] Kun He, Qian Li, Xiaoming Sun, and Jiapeng Zhang. Quantum Lovász local lemma: Shearer’s bound is tight. In Proceedings of Symposium on Theory of Computing (STOC), pages 461–472, 2019.
  • HS [14] David G. Harris and Aravind Srinivasan. A constructive algorithm for the Lovász local lemma on permutations. In Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 907–925, 2014.
  • HSS [11] Bernhard Haeupler, Barna Saha, and Aravind Srinivasan. New constructive aspects of the Lovász local lemma. Journal of the ACM, 58(6):1–28, 2011.
  • HSW [21] Kun He, Xiaoming Sun, and Kewen Wu. Perfect sampling for (atomic) Lovász local lemma. arXiv preprint arXiv:2107.03932, 2021.
  • HV [20] Nicholas J. A. Harvey and Jan Vondrák. An algorithmic proof of the Lovász local lemma via resampling oracles. SIAM Journal on Computing, 49(2):394–428, 2020.
  • IS [20] Fotis Iliopoulos and Alistair Sinclair. Efficiently list-edge coloring multigraphs asymptotically optimally. In Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2319–2336, 2020.
  • JPV [20] Vishesh Jain, Huy Tuan Pham, and Thuy Duong Vuong. Towards the sampling Lovász local lemma. CoRR, abs/2011.12196, 2020.
  • JPV [21] Vishesh Jain, Huy Tuan Pham, and Thuy Duong Vuong. On the sampling Lovász local lemma for atomic constraint satisfaction problems. CoRR, abs/2102.08342, 2021.
  • KS [11] Kashyap Babu Rao Kolipaka and Mario Szegedy. Moser and Tardos meet Lovász. In Proceedings of Symposium on Theory of Computing (STOC), pages 235–244, 2011.
  • KSX [12] Kashyap Babu Rao Kolipaka, Mario Szegedy, and Yixin Xu. A sharper local lemma with improved applications. In Proceedings of APPROX/RANDOM, volume 7408 of Lecture Notes in Computer Science, pages 603–614, 2012.
  • LS [07] Linyuan Lu and László Székely. Using the Lovász local lemma in the space of random injections. The Electronic Journal of Combinatorics, 14(1):R63, 2007.
  • LS [09] Linyuan Lu and László A. Székely. A new asymptotic enumeration technique: the Lovász local lemma. arXiv preprint arXiv:0905.3983, 2009.
  • McD [97] Colin McDiarmid. Hypergraph colouring and the Lovász local lemma. Discrete Mathematics, 167:481–486, 1997.
  • Moi [19] Ankur Moitra. Approximate counting, the Lovász local lemma, and inference in graphical models. Journal of the ACM, 66(2):10:1–10:25, 2019.
  • Mol [19] Michael Molloy. The list chromatic number of graphs with small clique number. Journal of Combinatorial Theory, Series B, 134:264–284, 2019.
  • MR [98] Michael Molloy and Bruce Reed. Further algorithmic aspects of the local lemma. In Proceedings of Symposium on Theory of Computing (STOC), pages 524–529, 1998.
  • MT [10] Robin A Moser and Gábor Tardos. A constructive proof of the general Lovász local lemma. Journal of the ACM, 57(2):11, 2010.
  • Peg [14] Wesley Pegden. An extension of the Moser–Tardos algorithmic local lemma. SIAM Journal on Discrete Mathematics, 28(2):911–917, 2014.
  • She [85] James B. Shearer. On a problem of Spencer. Combinatorica, 5(3):241–245, 1985.
  • Spe [75] Joel Spencer. Ramsey’s theorem—a new lower bound. Journal of Combinatorial Theory, Series A, 18(1):108–115, 1975.
  • Spe [77] Joel Spencer. Asymptotic lower bounds for Ramsey functions. Discrete Mathematics, 20:69–76, 1977.
  • Sze [13] Mario Szegedy. The Lovász local lemma – a survey. In International Computer Science Symposium in Russia, pages 1–11. Springer, 2013.
  • Tod [99] Synge Todo. Transfer-matrix study of negative-fugacity singularity of hard-core lattice gas. International Journal of Modern Physics C, 10(04):517–529, 1999.

Appendix A Missing Proofs in Section 3

Proof of Proposition 3.2.

The following simple greedy procedure will find such a 𝒫\mathcal{P}.

Initially, 𝒫=\mathcal{P}=\emptyset;
for each (i,i)(i,i^{\prime})\in\mathcal{M} do
      k:=1k:=1;
      while k|List(D,i,i)|1k\leq|\mathrm{List}(D,i,i^{\prime})|-1 do
            if the kk-th node and (k+1)(k+1)-th node in List(D,i,i)\mathrm{List}(D,i,i^{\prime}) form a reversible arc then
                  add this arc to 𝒫\mathcal{P}, and k:=k+2k:=k+2;
            else
                  k:=k+1k:=k+1;
Return 𝒫\mathcal{P};

Obviously, for each (i,i)(i,i^{\prime})\in\mathcal{M}, the set 𝒫\mathcal{P} returned by the procedure contains at least half of all reversible arcs uvu\rightarrow v with {L(u),L(v)}={i,i}\{L(u),L(v)\}=\{i,i^{\prime}\}, and hence covers at least half of the nodes in 𝒱(D,i)\mathcal{V}(D,i). ∎
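For concreteness, the scan above can be rendered as the following Python sketch (our own illustration; reversible is an assumed predicate standing in for the reversible-arc test between consecutive nodes):

def greedy_reversible_arcs(nodes, reversible):
    # Scan the topologically ordered node list List(D, i, i') and greedily
    # pick vertex-disjoint consecutive pairs that form reversible arcs.
    picked = []
    k = 0
    while k < len(nodes) - 1:
        if reversible(nodes[k], nodes[k + 1]):
            picked.append((nodes[k], nodes[k + 1]))
            k += 2  # both endpoints are now used, so skip past them
        else:
            k += 1
    return picked

Every consecutive reversible arc that the scan skips shares an endpoint with an arc just picked, which is exactly the at-least-half counting used in the proof.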

Appendix B Proof of Proposition 3.9

Given a pwdag D=(V,E,L)D=(V,E,L) of GDG_{D} and a Boolean string 𝑹{0,1}(D)\bm{R}\in\{0,1\}^{\mathscr{M}(D)}, define h(D,𝑹)h(D,\bm{R}) to be a directed graph D:=(V,E,L)D^{\prime}:=(V^{\prime},E^{\prime},L^{\prime}) where V=VV^{\prime}=V, E=EE^{\prime}=E, and

vV:L(v)={(L(v)),if v and Rv=0;(L(v)),if v and Rv=1;L(v),otherwise, v.\displaystyle\forall v\in V:\quad L^{\prime}(v)=\begin{cases}(L(v))^{\uparrow},&\text{if }v\in\mathscr{M}\text{ and }R_{v}=0;\\ (L(v))^{\downarrow},&\text{if }v\in\mathscr{M}\text{ and }R_{v}=1;\\ L(v),&\text{otherwise, }v\not\in\mathscr{M}.\end{cases}

It is easy to verify that h(D,𝑹)h(D,\bm{R}) is a pwdag of GG^{\mathcal{M}}. Moreover, given any D𝒟(G)D^{\prime}\in\mathcal{D}(G^{\mathcal{M}}), there is one and only one D𝒟(GD)D\in\mathcal{D}(G_{D}) and 𝑹{0,1}(D)\bm{R}\in\{0,1\}^{\mathscr{M}(D)} such that h(D,𝑹)=Dh(D,\bm{R})=D^{\prime}. In other words, hh is a bijection between {(D,𝑹):D𝒟(GD),𝑹{0,1}(D)}\{(D,\bm{R}):D\in\mathcal{D}(G_{D}),\bm{R}\in\{0,1\}^{\mathscr{M}(D)}\} and 𝒟(G)\mathcal{D}(G^{\mathcal{M}}). So

D𝒟(G)v in DpL(v)\displaystyle\sum_{D^{\prime}\in\mathcal{D}(G^{\mathcal{M}})}\prod_{v^{\prime}\text{ in }D^{\prime}}p^{\mathcal{M}}_{L^{\prime}(v^{\prime})} =D𝒟(GD)𝑹{0,1}(D)v in h(D,𝑹)pL(v)\displaystyle=\sum_{D\in\mathcal{D}(G_{D})}\sum_{\bm{R}\in\{0,1\}^{\mathscr{M}(D)}}\prod_{v^{\prime}\text{ in }h(D,\bm{R})}p^{\mathcal{M}}_{L^{\prime}(v^{\prime})}
=D𝒟(GD)𝑹{0,1}(D)v in DpL(v)\displaystyle=\sum_{D\in\mathcal{D}(G_{D})}\sum_{\bm{R}\in\{0,1\}^{\mathscr{M}(D)}}\prod_{v\text{ in }D}p^{\mathcal{M}}_{L^{\prime}(v)}
=D𝒟(GD)v(D)pL(v)(𝑹{0,1}(D)v(D)pL(v))\displaystyle=\sum_{D\in\mathcal{D}(G_{D})}\prod_{v\not\in\mathscr{M}(D)}p^{\mathcal{M}}_{L^{\prime}(v)}\left(\sum_{\bm{R}\in\{0,1\}^{\mathscr{M}(D)}}\prod_{v\in\mathscr{M}(D)}p^{\mathcal{M}}_{L^{\prime}(v)}\right)
=D𝒟(GD)v(D)pL(v)v(D)(pL(v)+pL(v))\displaystyle=\sum_{D\in\mathcal{D}(G_{D})}\prod_{v\not\in\mathscr{M}(D)}p^{\mathcal{M}}_{L(v)}\prod_{v\in\mathscr{M}(D)}\left(p^{\mathcal{M}}_{L(v)^{\uparrow}}+p^{\mathcal{M}}_{L(v)^{\downarrow}}\right)
=D𝒟(GD)v(D)pL(v)v(D)(pL(v)+pL(v)pL(v))\displaystyle=\sum_{D\in\mathcal{D}(G_{D})}\prod_{v\not\in\mathscr{M}(D)}p^{-}_{L(v)}\prod_{v\in\mathscr{M}(D)}\left(p^{\prime}_{L(v)}+p^{-}_{L(v)}-p^{\prime}_{L(v)}\right)
=D𝒟(GD)v in DpL(v),\displaystyle=\sum_{D\in\mathcal{D}(G_{D})}\prod_{v\text{ in }D}p^{-}_{L(v)},

where the second equality holds because V=VV=V^{\prime}, the fourth equality is by the definition of LL^{\prime}, and the fifth equality is by the definition of 𝒑\bm{p}^{\mathcal{M}}.
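The chain of equalities above boils down to the distributive law: summing over all 𝑹{0,1}(D)\bm{R}\in\{0,1\}^{\mathscr{M}(D)} factors the inner sum into a product of (pL(v)+pL(v))(p^{\mathcal{M}}_{L(v)^{\uparrow}}+p^{\mathcal{M}}_{L(v)^{\downarrow}}) over v(D)v\in\mathscr{M}(D). A small numeric sanity check in Python (the probabilities are made up, purely for illustration):

from itertools import product
from math import prod, isclose

# toy values standing in for the up- and down-copies of three labels
p_up = {"a": 0.10, "b": 0.25, "c": 0.05}
p_down = {"a": 0.02, "b": 0.05, "c": 0.15}
M = sorted(p_up)

# left-hand side: sum over all Boolean strings R of the product of chosen copies
lhs = sum(
    prod((p_up if r == 0 else p_down)[v] for (v, r) in zip(M, R))
    for R in product((0, 1), repeat=len(M))
)
# right-hand side: product of (p_up + p_down), playing the role of p^-
rhs = prod(p_up[v] + p_down[v] for v in M)
assert isclose(lhs, rhs)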

Appendix C Proof of Theorem 3.12

We first verify that the image of hh is a subset of 𝒟(G)\mathcal{D}(G^{\mathcal{M}}).

Lemma C.1.

For any D𝒟(GD)D\in\mathcal{D}(G_{D}) and 𝒮ψ(D)\mathscr{S}\in\psi(D), h(D,𝒮)𝒟(G)h(D,\mathscr{S})\in\mathcal{D}(G^{\mathcal{M}}).

Proof.

First, we prove that h(D,𝒮)=(V,E,L)h(D,\mathscr{S})=(V^{\prime},E^{\prime},L^{\prime}) is a DAG. Define a total order π\pi^{\prime} over the set VV^{\prime} as follows: for any two distinct nodes u,vVu^{\prime},v^{\prime}\in V^{\prime},

  • if g(u)g(v)g(u^{\prime})\neq g(v^{\prime}), then uvu^{\prime}\prec v^{\prime} in π\pi^{\prime} if and only if g(u)g(v)g(u^{\prime})\prec g(v^{\prime}) in πD\pi_{D};

  • if g(u)=g(v)g(u^{\prime})=g(v^{\prime}), then uvu^{\prime}\prec v^{\prime} in π\pi^{\prime} if and only if u=f(g(u))u^{\prime}=f^{\ast}(g(u^{\prime})) (and then v=f(g(u))v^{\prime}=f(g(u^{\prime}))).

One can verify that π\pi^{\prime} is a topological order of h(D,𝒮)h(D,\mathscr{S}), which means that h(D,𝒮)h(D,\mathscr{S}) is a DAG.

Secondly, we prove that h(D,𝒮)h(D,\mathscr{S}) is a wdag of GG^{\mathcal{M}}. As h(D,𝒮)h(D,\mathscr{S}) has been shown to be a DAG, we only need to verify the following: for any two distinct nodes u,vu^{\prime},v^{\prime} in DD^{\prime}, there is an arc between uu^{\prime} and vv^{\prime} (in either direction) if and only if either L(u)=L(v)L^{\prime}(u^{\prime})=L^{\prime}(v^{\prime}) or (L(v),L(u))E(L^{\prime}(v^{\prime}),L^{\prime}(u^{\prime}))\in E^{\mathcal{M}}.

\Longrightarrow: By symmetry, suppose (uv)E(u^{\prime}\rightarrow v^{\prime})\in E^{\prime}. If (uv)E1(u^{\prime}\rightarrow v^{\prime})\in E^{\prime}_{1}, then u=f(w)u^{\prime}=f^{\ast}(w) and v=f(w)v^{\prime}=f(w) for some vertex w𝒮3𝒮4w\in\mathscr{S}_{3}\cup\mathscr{S}_{4}. Thus, by (2) and (3) we have L(u){i,i}L^{\prime}(u^{\prime})\in\{i^{\uparrow},i^{\downarrow}\} and L(v)=L(w)L^{\prime}(v^{\prime})=L(w)^{\downarrow} where (L(w),i)(L(w),i)\in\mathcal{M}. By (L(w),i)(L(w),i)\in\mathcal{M}, any two vertices in {(L(w)),(L(w)),i,i}\{(L(w))^{\uparrow},(L(w))^{\downarrow},i^{\uparrow},i^{\downarrow}\} are connected in GG^{\mathcal{M}}. In particular, (L(v),L(u))E(L^{\prime}(v^{\prime}),L^{\prime}(u^{\prime}))\in E^{\mathcal{M}}. If (uv)E2(u^{\prime}\rightarrow v^{\prime})\in E^{\prime}_{2}, we have L(u)=L(v)L^{\prime}(u^{\prime})=L^{\prime}(v^{\prime}) or (L(u),L(v))E(L^{\prime}(u^{\prime}),L^{\prime}(v^{\prime}))\in E^{\mathcal{M}} immediately.

\Longleftarrow: Suppose u,vVu^{\prime},v^{\prime}\in V^{\prime} are two distinct nodes where L(u)=L(v)L^{\prime}(u^{\prime})=L^{\prime}(v^{\prime}) or (L(u),L(v))E(L^{\prime}(u^{\prime}),L^{\prime}(v^{\prime}))\in E^{\mathcal{M}}. If g(u)g(v)g(u^{\prime})\not=g(v^{\prime}), then either g(u)g(v)g(u^{\prime})\prec g(v^{\prime}) or g(v)g(u)g(v^{\prime})\prec g(u^{\prime}) in πD\pi_{D}, which implies that either (uv)E2(u^{\prime}\rightarrow v^{\prime})\in E^{\prime}_{2} or (vu)E2(v^{\prime}\rightarrow u^{\prime})\in E^{\prime}_{2}. Otherwise, g(u)=g(v)g(u^{\prime})=g(v^{\prime}). Let v:=g(u)=g(v)v:=g(u^{\prime})=g(v^{\prime}). By (2) and (3), we have v𝒮3𝒮4v\in\mathscr{S}_{3}\cup\mathscr{S}_{4} and {u,v}={f(v),f(v)}\{u^{\prime},v^{\prime}\}=\{f(v),f^{\ast}(v)\}. Therefore either uvu^{\prime}\rightarrow v^{\prime} or vuv^{\prime}\rightarrow u^{\prime} is in E1E^{\prime}_{1}.

Finally, one can check that f(v)f(v), where vv is the unique sink of DD, is the unique sink of DD^{\prime}. This completes the proof. ∎

In the rest of this section, we show that hh is injective. Given D𝒟(GD)D\in\mathcal{D}(G_{D}) and (i,j)(i,j)\in\mathcal{M}, recall that List(D,i,j)\mathrm{List}(D,i,j) is the sequence listing all nodes in DD labelled with ii or jj in the topological order. Similarly,

Definition C.2.

Given D=(V,E,L)𝒟(G)D^{\prime}=(V^{\prime},E^{\prime},L^{\prime})\in\mathcal{D}(G^{\mathcal{M}}) and (i,j)(i,j)\in\mathcal{M}, we use List(D,i,j)\mathrm{List^{\prime}}(D^{\prime},i,j) to denote the unique sequence listing all nodes in DD^{\prime} with label in {i,i,j,j}\{i^{\uparrow},i^{\downarrow},j^{\uparrow},j^{\downarrow}\} in the topological order.

Claims C.3 and C.5 establish two properties of List(D,i,j)\mathrm{List}^{\prime}(D^{\prime},i,j), which will be used to show the injectivity of hh.

Claim C.3.

Suppose D=h(D,𝒮)D^{\prime}=h(D,\mathscr{S}) for some D𝒟(GD)D\in\mathcal{D}(G_{D}) and 𝒮ψ(D)\mathscr{S}\in\psi(D). Let (i,j)(i,j)\in\mathcal{M}. Then for any node vv^{\prime} in DD^{\prime},

  • (a)

    vList(D,i,j)v^{\prime}\in\mathrm{List}^{\prime}(D^{\prime},i,j) if and only if g(v)List(D,i,j)g(v^{\prime})\in\mathrm{List}(D,i,j);

  • (b)

    for any other node uu^{\prime} in DD^{\prime}, if g(u)g(u^{\prime}) precedes g(v)g(v^{\prime}) in List(D,i,j)\mathrm{List}(D,i,j), then uu^{\prime} precedes vv^{\prime} in List(D,i,j)\mathrm{List}^{\prime}(D^{\prime},i,j);

  • (c)

    if v𝒮3𝒮4v\in\mathscr{S}_{3}\cup\mathscr{S}_{4}, then f(v)f(v) is next to f(v)f^{\ast}(v) in List(D,i,j)\mathrm{List}^{\prime}(D^{\prime},i,j).

Proof.

Part (a) is immediate by Definition 3.11.

Now, we show Part (b). Suppose g(u)g(u^{\prime}) precedes g(v)g(v^{\prime}) in List(D,i,j)\mathrm{List}(D,i,j). Then g(u)g(v)g(u^{\prime})\prec g(v^{\prime}) in πD\pi_{D}. Thus one can check that all the four arcs f(g(u))f(g(v))f(g(u^{\prime}))\rightarrow f(g(v^{\prime})), f(g(u))f(g(v))f^{\ast}(g(u^{\prime}))\rightarrow f(g(v^{\prime})), f(g(u))f(g(v))f(g(u^{\prime}))\rightarrow f^{\ast}(g(v^{\prime})), and f(g(u))f(g(v))f^{\ast}(g(u^{\prime}))\rightarrow f^{\ast}(g(v^{\prime})) are contained in E2E_{2}^{\prime}. In particular, (uv)E(u^{\prime}\rightarrow v^{\prime})\in E^{\prime} as u{f(g(u)),f(g(u))}u^{\prime}\in\{f(g(u^{\prime})),f^{\ast}(g(u^{\prime}))\} and v{f(g(v)),f(g(v))}v^{\prime}\in\{f(g(v^{\prime})),f^{\ast}(g(v^{\prime}))\}. This implies that uu^{\prime} precedes vv^{\prime} in List(D,i,j)\mathrm{List}^{\prime}(D^{\prime},i,j).

Finally, we prove Part (c). According to Part (b), f(v)f(v) and f(v)f^{\ast}(v) are adjacent in List(D,i,j)\mathrm{List}^{\prime}(D^{\prime},i,j). Besides, as there is an arc f(v)f(v)f^{\ast}(v)\rightarrow f(v) in E1E_{1}^{\prime}, we conclude that f(v)f(v) is next to f(v)f^{\ast}(v) in List(D,i,j)\mathrm{List}^{\prime}(D^{\prime},i,j). ∎

Definition C.4.

For a reversible arc uvu^{\prime}\rightarrow v^{\prime} in DD^{\prime}, we call it (,)(*,\downarrow)-reversible in DD^{\prime} if L(u){i,i}L^{\prime}(u^{\prime})\in\{i^{\uparrow},i^{\downarrow}\} and L(v)=jL^{\prime}(v^{\prime})=j^{\downarrow} for some (i,j)ED(i,j)\in E_{D}.

Claim C.5.

Suppose D=h(D,𝒮)D^{\prime}=h(D,\mathscr{S}) for some D𝒟(GD)D\in\mathcal{D}(G_{D}) and 𝒮ψ(D)\mathscr{S}\in\psi(D). Let (i,j)(i,j)\in\mathcal{M}. Let u,vu^{\prime},v^{\prime} be two nodes in List(D,i,j)\mathrm{List^{\prime}}(D^{\prime},i,j) where vv^{\prime} is next to uu^{\prime}. Then uV2u^{\prime}\in V_{2}^{\prime} if and only if uvu^{\prime}\rightarrow v^{\prime} is (,)(*,\downarrow)-reversible in DD^{\prime} and vV1v^{\prime}\in V_{1}^{\prime}.

Proof.

\Longrightarrow: Let u:=g(u)u:=g(u^{\prime}). Assume uV2u^{\prime}\in V_{2}^{\prime}, i.e., u=f(u)u^{\prime}=f^{\ast}(u). By Definition 3.11, u𝒮3𝒮4u\in\mathscr{S}_{3}\cup\mathscr{S}_{4}. According to Part (c) of Claim C.3, as vv^{\prime} is next to uu^{\prime}, we have v=f(u)v^{\prime}=f(u) and then vV1v^{\prime}\in V_{1}^{\prime}.

Now we show that uvu^{\prime}\rightarrow v^{\prime} is (,)(*,\downarrow)-reversible. First, by Definition 3.11, either L(u){i,i}L^{\prime}(u^{\prime})\in\{i^{\uparrow},i^{\downarrow}\} and L(v)=jL^{\prime}(v^{\prime})=j^{\downarrow}, or L(u){j,j}L^{\prime}(u^{\prime})\in\{j^{\uparrow},j^{\downarrow}\} and L(v)=iL^{\prime}(v^{\prime})=i^{\downarrow}. What remains is to show that uvu^{\prime}\rightarrow v^{\prime} is reversible; by Fact 2.5, this is equivalent to showing that f(u)f(u)f^{\ast}(u)\rightarrow f(u) is the unique path from uu^{\prime} to vv^{\prime} in DD^{\prime}. By contradiction, assume that there is a path f(u)w1wkf(u)f^{\ast}(u)\rightarrow w^{\prime}_{1}\rightarrow\cdots\rightarrow w^{\prime}_{k}\rightarrow f(u) in DD^{\prime} where w1f(u)w^{\prime}_{1}\neq f(u) and wkf(u)w^{\prime}_{k}\neq f^{\ast}(u). As w1f(u)w^{\prime}_{1}\neq f(u), the arc (f(u)w1)(f^{\ast}(u)\rightarrow w^{\prime}_{1}) is not in E1E^{\prime}_{1} and hence must be in E2E^{\prime}_{2}, which further implies that ug(w1)u\prec g(w^{\prime}_{1}) in πD\pi_{D}. Similarly, we have g(wk)ug(w^{\prime}_{k})\prec u in πD\pi_{D}. So g(wk)ug(w1)g(w^{\prime}_{k})\prec u\prec g(w^{\prime}_{1}). Meanwhile, for each <k\ell<k, if (ww+1)E1(w^{\prime}_{\ell}\rightarrow w^{\prime}_{\ell+1})\in E^{\prime}_{1}, then g(w)=g(w+1)g(w^{\prime}_{\ell})=g(w^{\prime}_{\ell+1}); if (ww+1)E2(w^{\prime}_{\ell}\rightarrow w^{\prime}_{\ell+1})\in E^{\prime}_{2}, then g(w)g(w+1)g(w^{\prime}_{\ell})\prec g(w^{\prime}_{\ell+1}) in πD\pi_{D}. So it always holds that g(w)g(w+1)g(w^{\prime}_{\ell})\preccurlyeq g(w^{\prime}_{\ell+1}) in πD\pi_{D} for each <k\ell<k. In particular, g(w1)g(wk)g(w_{1}^{\prime})\preccurlyeq g(w^{\prime}_{k}), a contradiction.

\Longleftarrow: Let u:=g(u)u:=g(u^{\prime}) and v:=g(v)v:=g(v^{\prime}). Assume uV2u^{\prime}\notin V_{2}^{\prime} and vV1v^{\prime}\in V_{1}^{\prime}, i.e., u=f(u)u^{\prime}=f(u) and v=f(v)v^{\prime}=f(v). Furthermore, assume L(v)=jL^{\prime}(v^{\prime})=j^{\downarrow}; then v𝒮1v\notin\mathscr{S}_{1} and L(v)=jL(v)=j. We will show that (f(u)f(v))(f(u)\rightarrow f(v)) is not reversible.

Note that (f(u)f(v))(f(u)\rightarrow f(v)) must be in E2E^{\prime}_{2}, and hence uvu\prec v in πD\pi_{D}. By L(u){i,i}L^{\prime}(u^{\prime})\in\{i^{\uparrow},i^{\downarrow}\}, u=f(u)u^{\prime}=f(u), and (2), we have L(u)=iL(u)=i. Thus, (L(u),L(v))=(i,j)ED(L(u),L(v))=(i,j)\in\mathcal{M}\subseteq E_{D}. As DD is a wdag and uvu\prec v in πD\pi_{D}, the arc (uv)(u\rightarrow v) exists in DD. Since v𝒮1v\notin\mathscr{S}_{1}, v𝒱v\notin\mathcal{V}, which means that uvu\rightarrow v is not reversible in DD. According to Fact 2.5, there is a path u=w1w2wkwk+1=vu=w_{1}\rightarrow w_{2}\rightarrow\cdots\rightarrow w_{k}\rightarrow w_{k+1}=v from uu to vv in DD other than the arc uvu\rightarrow v, where ww+1w_{\ell}\prec w_{\ell+1} in πD\pi_{D} and (L(w)=L(w+1))((L(w),L(w+1))ED)\left(L(w_{\ell})=L(w_{\ell+1})\right)\lor\left((L(w_{\ell}),L(w_{\ell+1}))\in E_{D}\right) for each [k]\ell\in[k].

According to the definition of GG^{\mathcal{M}} and (2), one can check that (f(w)f(w+1))E2(f(w_{\ell})\rightarrow f(w_{\ell+1}))\in E^{\prime}_{2} for each [k]\ell\in[k]. Therefore u=f(w1)f(w2)f(wk)f(wk+1)=vu^{\prime}=f(w_{1})\rightarrow f(w_{2})\rightarrow\cdots\rightarrow f(w_{k})\rightarrow f(w_{k+1})=v^{\prime} is a path from uu^{\prime} to vv^{\prime} in DD^{\prime} other than the arc uvu^{\prime}\rightarrow v^{\prime}, which implies that uvu^{\prime}\rightarrow v^{\prime} is not reversible in DD^{\prime} by Fact 2.5. ∎

Having Claims C.3 and C.5, we are ready to show that hh is injective.

Lemma C.6.

hh is injective.

Proof.

Fix D=(V,E,L)𝒟(GD)D=(V,E,L)\in\mathcal{D}(G_{D}) and 𝒮ψ(D)\mathscr{S}\in\psi(D). Let D=(V,E,L)D^{\prime}=(V^{\prime},E^{\prime},L^{\prime}) denote h(D,𝒮)h(D,\mathscr{S}). We show that (D,𝒮)(D,\mathscr{S}) can be recovered from DD^{\prime}, which implies the injectivity of hh.

First, we recover the partition (V1,V2)(V_{1}^{\prime},V_{2}^{\prime}). That is, given a node uVu^{\prime}\in V^{\prime}, we distinguish whether uV1u^{\prime}\in V_{1}^{\prime} or uV2u^{\prime}\in V_{2}^{\prime}. If L(u)[m]L^{\prime}(u^{\prime})\in[m]\setminus\mathcal{M}, then uV1u^{\prime}\in V_{1}^{\prime} according to (2). Otherwise, we have L(u){i,i}L^{\prime}(u^{\prime})\in\{i^{\uparrow},i^{\downarrow}\} for some (i,j)(i,j)\in\mathcal{M}, hence uu^{\prime} is in List(D,i,j)\mathrm{List}^{\prime}(D^{\prime},i,j). Assume the nodes in List(D,i,j)\mathrm{List}^{\prime}(D^{\prime},i,j) are v1v2v3vkv_{1}^{\prime}v_{2}^{\prime}v_{3}^{\prime}\cdots v_{k}^{\prime}. According to Claim C.5, the following procedure distinguishes whether vV1v_{\ell}^{\prime}\in V_{1}^{\prime} or vV2v_{\ell}^{\prime}\in V_{2}^{\prime} for every vList(D,i,j)v_{\ell}^{\prime}\in\mathrm{List}^{\prime}(D^{\prime},i,j), including uu^{\prime}.

Initially, mark that vkV1v_{k}^{\prime}\in V_{1}^{\prime}, and let :=k1\ell:=k-1;
while 1\ell\geq 1 do
      if the arc (vv+1)(v_{\ell}^{\prime}\rightarrow v_{\ell+1}^{\prime}) is (,)(*,\downarrow)-reversible and v+1V1v_{\ell+1}^{\prime}\in V_{1}^{\prime} then
            mark that vV2v_{\ell}^{\prime}\in V_{2}^{\prime};
      else
            mark that vV1v_{\ell}^{\prime}\in V_{1}^{\prime};
      :=1\ell:=\ell-1;
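In code, this backward scan takes the following form (a minimal Python sketch of ours; star_down_reversible is an assumed predicate implementing Definition C.4):

def recover_partition(nodes, star_down_reversible):
    # Backward scan over List'(D', i, j): in_V2[l] is True iff the l-th node
    # (0-indexed) gets marked as belonging to V2'.
    in_V2 = [False] * len(nodes)  # the last node is always marked as V1'
    for l in range(len(nodes) - 2, -1, -1):
        in_V2[l] = (star_down_reversible(nodes[l], nodes[l + 1])
                    and not in_V2[l + 1])
    return in_V2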

Secondly, we can easily recover D=(V,E,L)D=(V,E,L) from DD^{\prime} and (V1,V2)(V_{1}^{\prime},V_{2}^{\prime}). Ignoring labels, it is easy to see that DD is exactly the induced subgraph of DD^{\prime} on V1V_{1}^{\prime}. Along the way, we also recover the function f:VV1f:V\rightarrow V_{1}^{\prime}. For labels, we simply replace each label ii^{\uparrow} or ii^{\downarrow} with ii.

Finally, we recover 𝒮\mathscr{S} from DD^{\prime}, DD and (V1,V2)(V_{1}^{\prime},V_{2}^{\prime}). That is, we distinguish which one of {𝒮1,𝒮2,𝒮3,𝒮4}\{\mathscr{S}_{1},\mathscr{S}_{2},\mathscr{S}_{3},\mathscr{S}_{4}\} contains a given node v(D)v\in\mathscr{M}(D). Assume L(v)=iL(v)=i and (i,j)(i,j)\in\mathcal{M}. Let uu^{\prime} be the node immediately preceding f(v)f(v) in List(D,i,j)\mathrm{List^{\prime}}(D^{\prime},i,j). According to Part (c) of Claim C.3, uV2u^{\prime}\in V_{2}^{\prime} if and only if v𝒮3𝒮4v\in\mathscr{S}_{3}\cup\mathscr{S}_{4}. When v𝒮3𝒮4v\in\mathscr{S}_{3}\cup\mathscr{S}_{4}, v𝒮3v\in\mathscr{S}_{3} if L(u)=jL^{\prime}(u^{\prime})=j^{\uparrow}, and v𝒮4v\in\mathscr{S}_{4} if L(u)=jL^{\prime}(u^{\prime})=j^{\downarrow}. When v𝒮3𝒮4v\notin\mathscr{S}_{3}\cup\mathscr{S}_{4}, v𝒮1v\in\mathscr{S}_{1} if L(f(v))=iL^{\prime}(f(v))=i^{\uparrow}, and v𝒮2v\in\mathscr{S}_{2} if L(f(v))=iL^{\prime}(f(v))=i^{\downarrow}. ∎

Appendix D Proof of Lemma 6.1

Let 𝒆𝒊\bm{e_{i}} denote the vector whose coordinates are all 0 except the ii-th, which equals 1. The following lemma will be used in the proof.

Lemma D.1.

[37] Let GD=([m],ED)G_{D}=([m],E_{D}) be a dependency graph and 𝒑\bm{p} be a probability vector beyond Shearer’s bound. Suppose i,i1,i2,,ik1,ii,i_{1},i_{2},\cdots,i_{k-1},i^{\prime} form a shortest path from ii to ii^{\prime} in GDG_{D}. Then for any qpiq\leq p_{i^{\prime}}, 𝒑q𝒆𝒊+([k1]1pipi)1pipiq𝒆𝒊\bm{p}-q\bm{e_{i^{\prime}}}+\big{(}\prod_{\ell\in[k-1]}\frac{1-p_{i_{\ell}}}{p_{i_{\ell}}}\big{)}\cdot\frac{1-p_{i}}{p_{i^{\prime}}}\cdot q\bm{e_{i}} is also beyond Shearer’s bound.
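For intuition, the mass shift in Lemma D.1 can be written out as a short Python sketch (our own illustration, not part of the proof; vertices are assumed to be indices into a list of probabilities):

def shift_mass(p, path, q):
    # p    : list of event probabilities, indexed by vertex
    # path : a shortest path [i, i_1, ..., i_{k-1}, i_prime] in G_D
    # q    : amount removed at i_prime, with q <= p[i_prime]
    i, inner, i_prime = path[0], path[1:-1], path[-1]
    assert 0 < q <= p[i_prime]
    factor = 1.0
    for j in inner:                     # product of (1 - p_{i_l}) / p_{i_l}
        factor *= (1 - p[j]) / p[j]
    factor *= (1 - p[i]) / p[i_prime]   # remaining factor (1 - p_i) / p_{i'}
    new_p = list(p)
    new_p[i_prime] -= q                 # remove q at i'
    new_p[i] += factor * q              # add the rescaled amount at i
    return new_p

Lemma D.1 asserts that if the input vector is beyond Shearer’s bound, then so is the returned one.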

Without loss of generality, we assume that pipap_{i}-p_{a} is rational for each i[m]i\in[m]. Suppose, for contradiction, that 𝒑\bm{p} is such a vector beyond Shearer’s bound. Let S+:={i[m]:pi>pa}S_{+}:=\{i\in[m]:p_{i}>p_{a}\} and S:={i[m]:pi<pa}S_{-}:=\{i\in[m]:p_{i}<p_{a}\}. Let Δp\Delta_{p} be a positive real number such that the following hold:

  • For each iS+i\in S_{+}, pipa=γiΔpp_{i}-p_{a}=\gamma_{i}\cdot\Delta_{p} for some γi+\gamma_{i}\in\mathbb{N}^{+}. Intuitively, we cut pipap_{i}-p_{a} into γi\gamma_{i} pieces each of size Δp\Delta_{p}. Besides, we call such pieces positive pieces.

  • For each iSi\in S_{-},

    papi=τiK(1papa)d1Δppap_{a}-p_{i}=\tau_{i}\cdot K\cdot\left(\frac{1-p_{a}}{p_{a}}\right)^{d-1}\cdot\frac{\Delta_{p}}{p_{a}}

    for some τi+\tau_{i}\in\mathbb{N}^{+}. Intuitively, we cut papip_{a}-p_{i} into τiK\tau_{i}\cdot K pieces each of size (1papa)d1Δppa\left(\frac{1-p_{a}}{p_{a}}\right)^{d-1}\cdot\frac{\Delta_{p}}{p_{a}}. We call such pieces negative pieces.

We use :={(i,r):iS+,r[γi]}\mathcal{R}:=\{(i,r):i\in S_{+},r\in[\gamma_{i}]\} and 𝒯:={(i,t,k):iS,t[τi],k[K]}\mathcal{T}:=\{(i^{\prime},t,k):i^{\prime}\in S_{-},t\in[\tau_{i^{\prime}}],k\in[K]\} to denote the sets of positive pieces and negative pieces, respectively.

For convenience, let γi=0\gamma_{i}=0 if iS+i\not\in S_{+}, and τi=0\tau_{i}=0 if iSi\not\in S_{-}. Then Condition (c) can be restated as follows: for f(S)=Tf(S)=T, the number of positive pieces in SS is at most the number of negative pieces in TT, i.e.,

(16) iSγiiTτi.\displaystyle\sum_{i\in S}\gamma_{i}\leq\sum_{i^{\prime}\in T}\tau_{i^{\prime}}.

The basic idea of the proof of Lemma 6.1 is simple: for each S𝒮S\in\mathcal{S}, we move the positive pieces in SS to f(S)f(S) such that (i) all the positive pieces in SS are absorbed by the negative pieces in f(S)f(S) and (ii) the resulting probability vector is still beyond Shearer’s bound. Finally, all positive pieces are absorbed, and we obtain a vector strictly smaller than 𝒑\bm{p}. By Lemma D.1, this vector is beyond Shearer’s bound, which yields a contradiction.

For i[m]i^{\prime}\in[m], recall Condition (a), which says that there are at most KK subsets S𝒮S\in\mathcal{S} such that if(S)i^{\prime}\in f(S); we use Si1,Si2,S^{1}_{i^{\prime}},S^{2}_{i^{\prime}},\cdots to denote these subsets. Let g:𝒯g:\mathcal{R}\rightarrow\mathcal{T} be an injection mapping each (i,r)(i,r)\in\mathcal{R} to some (i,t,k)𝒯(i^{\prime},t,k)\in\mathcal{T} satisfying that (i) iSiki\in S^{k}_{i^{\prime}} and (ii)

i0Sik,i0<iγi0+r=i1f(Sik),i1<iτi1+t.\sum_{i_{0}\in S^{k}_{i^{\prime}},i_{0}<i}\gamma_{i_{0}}+r=\sum_{i_{1}\in f(S^{k}_{i^{\prime}}),i_{1}<i^{\prime}}\tau_{i_{1}}+t.

By (16), one can verify that such a mapping gg exists. In addition, according to Condition (b), if g(i,r)=(i,t,k)g(i,r)=(i^{\prime},t,k), then dist(i,i)d\mathrm{dist}(i,i^{\prime})\leq d.
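Condition (ii) pins gg down by rank: the positive pieces of SikS^{k}_{i^{\prime}}, listed by increasing vertex index, are matched to the equally ranked negative pieces of f(Sik)f(S^{k}_{i^{\prime}}). Restricted to a single pair (S,T=f(S))(S,T=f(S)), the pairing can be sketched as follows (a simplified Python illustration of ours that ignores the KK-fold bookkeeping over the index kk):

def pair_pieces(S, T, gamma, tau):
    # Pair the positive pieces of S with the negative pieces of T by rank;
    # inequality (16) guarantees that there are enough negative pieces.
    positives = [(i, r) for i in sorted(S) for r in range(1, gamma.get(i, 0) + 1)]
    negatives = [(j, t) for j in sorted(T) for t in range(1, tau.get(j, 0) + 1)]
    assert len(positives) <= len(negatives)
    return dict(zip(positives, negatives))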

In the following, we will apply Lemma D.1 repeatedly.

Let g0g_{0} be gg, S0S_{0} be SS_{-} and 0\mathcal{R}_{0} be \mathcal{R}. Given an injection gκ:𝒯g_{\kappa}:\mathcal{R}\rightarrow\mathcal{T}, SκS_{\kappa} and κ\mathcal{R}_{\kappa} where dis(i,j)d\text{dis}(i,j)\leq d if gκ(i,r)=(j,t,k)g_{\kappa}(i,r)=(j,t,k), we construct another injection gκ+1:𝒯g_{\kappa+1}:\mathcal{R}\rightarrow\mathcal{T}, Sκ+1S_{\kappa+1} and κ+1\mathcal{R}_{\kappa+1} as follows. There are two possible cases for gκg_{\kappa}, SκS_{\kappa} and κ\mathcal{R}_{\kappa}.

  • (1)

    there exists i,r,j,t,ki,r,j,t,k such that (i,r)κ(i,r)\in\mathcal{R}_{\kappa}, gκ(i,r)=(j,t,k)g_{\kappa}(i,r)=(j,t,k) and there is a shortest path between ii and jj such that no vertex in SκS_{\kappa} is on the path;

  • (2)

    For each gκ(i,r)=(j,t,k)g_{\kappa}(i,r)=(j,t,k) where (i,r)κ(i,r)\in\mathcal{R}_{\kappa} and each shortest path between ii and jj, there is a vertex in SκS_{\kappa} on the path.

For case (1), we let gκ+1=gκg_{\kappa+1}=g_{\kappa}, κ+1=κ{(i,r)}\mathcal{R}_{\kappa+1}=\mathcal{R}_{\kappa}\setminus\{(i,r)\}, and

Sκ+1={jS:there exists i,r,t,k where (i,r)κ+1 such that gκ+1(i,r)=(j,t,k)}.S_{\kappa+1}=\{j\in S_{-}:\text{there exists }i,r,t,k\text{ where }(i,r)\in\mathcal{R}_{\kappa+1}\text{ such that }g_{\kappa+1}(i,r)=(j,t,k)\}.

For case (2), there must be (i1,r1,j1,t1,k1),,(in,rn,jn,tn,kn)(i_{1},r_{1},j_{1},t_{1},k_{1}),\cdots,(i_{n},r_{n},j_{n},t_{n},k_{n}) for some n+n\in\mathbb{N}^{+} such that

  • -

    (i,r)κ(i_{\ell},r_{\ell})\in\mathcal{R}_{\kappa}, jSκj_{\ell}\in S_{\kappa}, gκ(i,r)=(j,t,k)g_{\kappa}(i_{\ell},r_{\ell})=(j_{\ell},t_{\ell},k_{\ell}) for each [n]\ell\in[n],

  • -

    j+1j_{\ell+1} is on a shortest path between ii_{\ell} and jj_{\ell} for each [n1]\ell\in[n-1],

  • -

    j1j_{1} is on a shortest path between ini_{n} and jnj_{n}.

We define the injection F(gκ)F(g_{\kappa}) as follows.

{F(gκ)(in,rn)=(j1,t1,k1),F(gκ)(i,r)=(j+1,t+1,k+1) for each [n1],F(gκ)(i,r)=gκ(i,r) for other (i,r).\displaystyle\begin{cases}F(g_{\kappa})(i_{n},r_{n})&=(j_{1},t_{1},k_{1}),\\ F(g_{\kappa})(i_{\ell},r_{\ell})&=(j_{\ell+1},t_{\ell+1},k_{\ell+1})\text{ for each }\ell\in[n-1],\\ F(g_{\kappa})(i,r)&=g_{\kappa}(i,r)\text{ for other }(i,r).\end{cases}

One can verify that dis(i,j)d\text{dis}(i,j)\leq d if F(gκ)(i,r)=(j,t,k)F(g_{\kappa})(i,r)=(j,t,k) and

N(i,r,j,t,k):gκ(i,r)=(j,t,k)dis(i,j)1+(i,r,j,t,k):F(gκ)(i,r)=(j,t,k)dis(i,j).\displaystyle N\triangleq\sum_{\begin{subarray}{c}(i,r,j,t,k):\\ g_{\kappa}(i,r)=(j,t,k)\end{subarray}}\text{dis}(i,j)\geq 1+\sum_{\begin{subarray}{c}(i,r,j,t,k):\\ F(g_{\kappa})(i,r)=(j,t,k)\end{subarray}}\text{dis}(i,j).

Since NN is bounded, there must be a constant N\ell\leq N and i,r,j,t,ki,r,j,t,k such that (i,r)κ(i,r)\in\mathcal{R}_{\kappa}, F(gκ)(i,r)=(j,t,k)F^{\ell}(g_{\kappa})(i,r)=(j,t,k) and there is a shortest path between ii and jj such that no vertex in SκS_{\kappa} is on the path. Let gκ+1=F(gκ)g_{\kappa+1}=F^{\ell}(g_{\kappa}), κ+1=κ{(i,r)}\mathcal{R}_{\kappa+1}=\mathcal{R}_{\kappa}\setminus\{(i,r)\} and

Sκ+1={jS:there exists i,r,t,k where (i,r)κ+1 such that gκ+1(i,r)=(j,t,k)}.S_{\kappa+1}=\{j\in S_{-}:\text{there exists }i,r,t,k\text{ where }(i,r)\in\mathcal{R}_{\kappa+1}\text{ such that }g_{\kappa+1}(i,r)=(j,t,k)\}.

One can verify that in both cases, gκ+1g_{\kappa+1} is an injection from \mathcal{R} to 𝒯\mathcal{T} and dis(i,j)d\text{dis}(i,j)\leq d if gκ+1(i,r)=(j,t,k)g_{\kappa+1}(i,r)=(j,t,k).

Let gg^{\prime} be g||g_{|\mathcal{R}|}. For each [||]\ell\in[|\mathcal{R}|], let (i,r)(i_{\ell},r_{\ell}) be the unique element in 1\mathcal{R}_{\ell-1}\setminus\mathcal{R}_{\ell}. Let (j,t,k)(j_{\ell},t_{\ell},k_{\ell}) denote g(i,r)g^{\prime}(i_{\ell},r_{\ell}). Thus, we have

  • -

    gg^{\prime} is an injection from \mathcal{R} to 𝒯\mathcal{T},

  • -

    dis(i,j)d\text{dis}(i_{\ell},j_{\ell})\leq d for each [||]\ell\in[|\mathcal{R}|],

  • -

    there is a shortest path between ii_{\ell} and jj_{\ell} such that none of j+1,j+2,,j||j_{\ell+1},j_{\ell+2},\cdots,j_{|\mathcal{R}|} (all of which belong to SS_{\ell}) lies on the path.

For each jSj\in S_{-}, define

ηj=|{(i,r):g(i,r)=(j,t,k) for some t[τj],k[K]}|.\displaystyle\eta_{j}=|\{(i,r):g^{\prime}(i,r)=(j,t,k)\text{ for some }t\in[\tau_{j}],k\in[K]\}|.

Because gg^{\prime} is an injection, we have ηjτjK\eta_{j}\leq\tau_{j}\cdot K. Let

𝒑′′𝒑+jS(Kτjηj)(1pp)d1Δpp𝒆j.\bm{p^{\prime\prime}}\triangleq\bm{p}^{\prime}+\sum_{j\in S_{-}}(K\cdot\tau_{j}-\eta_{j})\cdot\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{\Delta_{p}}{p}\cdot\bm{e}_{j}.

Since 𝒑\bm{p}^{\prime} is beyond Shearer’s bound and ηjKτj\eta_{j}\leq K\cdot\tau_{j} for each jSj\in S_{-}, the vector 𝒑′′\bm{p}^{\prime\prime} is also beyond Shearer’s bound. For each [0,||]\ell\in[0,|\mathcal{R}|], let

𝒑𝒑′′Δp(κ1(𝒆iκ(1pp)d11p𝒆jκ)+𝒆i(1pp)d11p+Δp𝒆j).\displaystyle\bm{p}_{\ell}\triangleq\bm{p}^{\prime\prime}-\Delta_{p}\cdot\left(\sum_{\kappa\leq\ell-1}\left(\bm{e}_{i_{\kappa}}-\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{1}{p}\cdot\bm{e}_{j_{\kappa}}\right)+\bm{e}_{i_{\ell}}-\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{1}{p+\Delta_{p}}\cdot\bm{e}_{j_{\ell}}\right).

Then we have the following claim.

Claim D.2.

For [0,||]\ell\in[0,|\mathcal{R}|], 𝐩\bm{p}_{\ell} is beyond Shearer’s bound.

Proof.

We prove this claim by induction. Obviously, 𝒑0\bm{p}_{0} is beyond Shearer’s bound. In the following, we prove that if 𝒑1\bm{p}_{\ell-1} is beyond Shearer’s bound, then 𝒑\bm{p}_{\ell} is also beyond Shearer’s bound.

Let

𝒒𝒑′′Δpκ1(𝒆iκ(1pp)d11p𝒆jκ).\bm{q}\triangleq\bm{p}^{\prime\prime}-\Delta_{p}\cdot\sum_{\kappa\leq\ell-1}\left(\bm{e}_{i_{\kappa}}-\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{1}{p}\cdot\bm{e}_{j_{\kappa}}\right).

Obviously, 𝒒𝒑1\bm{q}\geq\bm{p}_{\ell-1}. Since 𝒑1\bm{p}_{\ell-1} is beyond Shearer’s bound, 𝒒\bm{q} is also beyond Shearer’s bound. Note that there is a shortest path i,k1,k2,,kn,ji_{\ell},k_{1},k_{2},\cdots,k_{n},j_{\ell} between ii_{\ell} and jj_{\ell} such that j+1,j+2,,j||j_{\ell+1},j_{\ell+2},\cdots,j_{|\mathcal{R}|} are not on the path. Because 𝒒\bm{q} is beyond Shearer’s bound, by Lemma D.1, we have

𝒒𝒒Δp(𝒆i(t[n]1qktqkt)1qi𝒆j)\bm{q}^{\prime}\triangleq\bm{q}-\Delta_{p}\cdot\left(\bm{e}_{i_{\ell}}-\left(\prod_{t\in[n]}\frac{1-q_{k_{t}}}{q_{k_{t}}}\right)\cdot\frac{1}{q_{i_{\ell}}}\cdot\bm{e}_{j_{\ell}}\right)

is also beyond Shearer’s bound. Meanwhile, since (i,r)(i_{\ell},r_{\ell})\in\mathcal{R}, we have

qi=piΔpκ1𝟙(iκ=i)pi(γi1)Δppi+Δp.\displaystyle q_{i_{\ell}}=p^{\prime}_{i_{\ell}}-\Delta_{p}\sum_{\kappa\leq\ell-1}\mathbbm{1}(i_{\kappa}=i_{\ell})\geq p^{\prime}_{i_{\ell}}-(\gamma_{i_{\ell}}-1)\Delta_{p}\geq p_{i_{\ell}}+\Delta_{p}.

For each t[n]t\in[n], if ktSk_{t}\not\in S_{-}, we have qktpq_{k_{t}}\geq p. Otherwise, ktSk_{t}\in S_{-}, and ktjκk_{t}\neq j_{\kappa} for each κ\kappa\geq\ell. Thus, we have κ1𝟙(jκ=kt)=ηkt\sum_{\kappa\leq\ell-1}\mathbbm{1}(j_{\kappa}=k_{t})=\eta_{k_{t}}. Therefore,

qkt\displaystyle q_{k_{t}} =pkt+(Kτktηkt)(1pp)d1Δpp+κ1𝟙(jκ=kt)(1pp)d1Δpp\displaystyle=p^{\prime}_{k_{t}}+(K\cdot\tau_{k_{t}}-\eta_{k_{t}})\cdot\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{\Delta_{p}}{p}+\sum_{\kappa\leq\ell-1}\mathbbm{1}(j_{\kappa}=k_{t})\cdot\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{\Delta_{p}}{p}
=pkt+(Kτktηkt)(1pp)d1Δpp+ηkt(1pp)d1Δpp=p.\displaystyle=p^{\prime}_{k_{t}}+(K\cdot\tau_{k_{t}}-\eta_{k_{t}})\cdot\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{\Delta_{p}}{p}+\eta_{k_{t}}\cdot\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{\Delta_{p}}{p}=p.

By dis(i,j)d\text{dis}(i_{\ell},j_{\ell})\leq d (so nd1n\leq d-1), qip+Δpq_{i_{\ell}}\geq p+\Delta_{p}, qktpq_{k_{t}}\geq p for each t[n]t\in[n], and 1pp1\frac{1-p}{p}\geq 1, we have

(t[n]1qktqkt)1qi(1pp)d11p+Δp.\left(\prod_{t\in[n]}\frac{1-q_{k_{t}}}{q_{k_{t}}}\right)\cdot\frac{1}{q_{i_{\ell}}}\leq\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{1}{p+\Delta_{p}}.

Thus, since 𝒒\bm{q}^{\prime} is beyond Shearer’s bound and

𝒑=𝒒Δp(𝒆i(1pp)d11p+Δp𝒆j)\bm{p}_{\ell}=\bm{q}-\Delta_{p}\cdot\left(\bm{e}_{i_{\ell}}-\left(\frac{1-p}{p}\right)^{d-1}\cdot\frac{1}{p+\Delta_{p}}\cdot\bm{e}_{j_{\ell}}\right)

satisfies 𝒑𝒒\bm{p}_{\ell}\geq\bm{q}^{\prime}, the vector 𝒑\bm{p}_{\ell} is also beyond Shearer’s bound. ∎

Thus, 𝒑||\bm{p}_{|\mathcal{R}|} is beyond Shearer’s bound. It is easy to verify that 𝒑||<𝒑\bm{p}_{|\mathcal{R}|}<\bm{p}, which contradicts the assumption that 𝒑\bm{p} lies on Shearer’s boundary. ∎

Appendix E Missing Part in the Proof of Theorem 6.2

Proof of Claim 6.3.

Observe that for each (v,v)EUk(v,v^{\prime})\in E_{U}^{k}, if (v,v)(v,v^{\prime})\notin\mathcal{M}, then one of its neighboring edges (v0,v1)(v_{0},v_{1}) is in TkT_{k} and satisfies δv,vδv0,v1\delta_{v,v^{\prime}}\leq\delta_{v_{0},v_{1}}. Here, two edges are called neighboring if they share a common vertex. Besides, note that each edge has at most 2Δ2\Delta neighboring edges. So

(17) (v0,v1)Tkδv0,v1212Δ(v,v)EUkδv,v2.\displaystyle\sum_{(v_{0},v_{1})\in T_{k}}\delta_{v_{0},v_{1}}^{2}\geq\frac{1}{2\Delta}\sum_{(v,v^{\prime})\in E_{U}^{k}}\delta_{v,v^{\prime}}^{2}.

Moreover, according to Lemmas 4.2 and 4.5, together with the Cauchy–Schwarz inequality for the first step, we have

(18) (v,v)EUkδv,v21|EUk|((v,v)EUkδv,v)2|VUk|Δ2|EUk|(ϝ+(GB(GD),𝒑))2.\displaystyle\sum_{(v,v^{\prime})\in E_{U}^{k}}\delta_{v,v^{\prime}}^{2}\geq\frac{1}{|E_{U}^{k}|}\cdot\left(\sum_{(v,v^{\prime})\in E_{U}^{k}}\delta_{v,v^{\prime}}\right)^{2}\geq\frac{|V_{U}^{k}|\cdot\Delta^{2}}{|E_{U}^{k}|}\cdot\left(\digamma^{+}(G_{B}(G_{D}),\bm{p})\right)^{2}.

Combining Inequalities (17) and (18) with the fact that 2|EUk||VUk|Δ2|E_{U}^{k}|\leq|V_{U}^{k}|\Delta finishes the proof. ∎

Let K:=(Δ+1)|VU|K:=(\Delta+1)|V_{U}|, d:=D+2d:=D+2, 𝒮:={VU1,VU2,}\mathcal{S}:=\{V_{U}^{1},V_{U}^{2},\cdots\}, and f(VUk):=Tkf(V_{U}^{k}):=T_{k}. In the following, we check that all three conditions in Lemma 6.1 hold.

Condition (a). That is, we want to show |{k:Tkv}|(Δ+1)|VU||\{k:T_{k}\ni v\}|\leq(\Delta+1)|V_{U}| for each vVDv\in V_{D}. Observe that if vTkv\in T_{k}, then v𝒩+(VUk)v\in\mathcal{N}^{+}(V_{U}^{k}). So

|{k:Tkv}||{k:𝒩+(VUk)v}||{k:𝒩+(v)VUk}|v𝒩+(v)|{k:VUkv}|(Δ+1)|VU|.|\{k:T_{k}\ni v\}|\leq|\{k:\mathcal{N}^{+}(V_{U}^{k})\ni v\}|\leq|\{k:\mathcal{N}^{+}(v)\cap V_{U}^{k}\neq\emptyset\}|\leq\sum_{v^{\prime}\in\mathcal{N}^{+}(v)}|\{k:V_{U}^{k}\ni v^{\prime}\}|\leq(\Delta+1)\cdot|V_{U}|.

The last inequality uses the fact that h(k,u)h(k,u)h(k^{\prime},u)\neq h(k,u) if kkk\neq k^{\prime}, so each vertex vv^{\prime} belongs to VUkV_{U}^{k} for at most |VU||V_{U}| values of kk.

Condition (b). That is, we want to show dist(v,v)D+2\mathrm{dist}(v,v^{\prime})\leq D+2 for any vVUkv\in V_{U}^{k} and vTkv^{\prime}\in T_{k}. This is obvious, because if vTkv^{\prime}\in T_{k}, then v𝒩+(VUk)v^{\prime}\in\mathcal{N}^{+}(V_{U}^{k}).

Condition (c). We verify that

(19) (1papa)D+1KpaiSmax{pipa,0}iTmax{papi,0}.\displaystyle\left(\frac{1-p_{a}}{p_{a}}\right)^{D+1}\cdot\frac{K}{p_{a}}\cdot\sum_{i\in S}\max\{p_{i}-p_{a},0\}\leq\sum_{i\in T}\max\{p_{a}-p_{i},0\}.

On the one hand, noting that max{(1+ε)pvpa,0}max{(1+ε)pvpa,0}q\max\{(1+\varepsilon)p^{-}_{v}-p_{a},0\}\leq\max\{(1+\varepsilon)p_{v}-p_{a},0\}\leq q, we have

(20) L.H.S of (19)(1papa)D+1(Δ+1)|VU|2paq.\displaystyle\text{L.H.S of }(\ref{eq:conditionc})\leq\left(\frac{1-p_{a}}{p_{a}}\right)^{D+1}\cdot\frac{(\Delta+1)|V_{U}|^{2}}{p_{a}}\cdot q.

On the other hand, observe that

max{pa(1+ε)pv,0}\displaystyle\max\{p_{a}-(1+\varepsilon)p^{-}_{v},0\} pa(1+ε)pv=(pa+q(1+ε)pv)q(1+ε)(pvpv)q\displaystyle\geq p_{a}-(1+\varepsilon)p^{-}_{v}=(p_{a}+q-(1+\varepsilon)p^{-}_{v})-q\geq(1+\varepsilon)(p_{v}-p_{v}^{-})-q
(pvpv)q,\displaystyle\geq(p_{v}-p_{v}^{-})-q,

where the second inequality is due to the assumption that (1+ε)𝒑(pa+q,,pa+q)(1+\varepsilon)\bm{p}\leq(p_{a}+q,\cdots,p_{a}+q), and the last one holds since ε0\varepsilon\geq 0. Then

R.H.S of (19)\displaystyle\text{R.H.S of }(\ref{eq:conditionc})\geq (vVUk(pvpv))|𝒩+(VUk)|q217((v0,v1)Tkδv0,v12)Δ|VU|q\displaystyle\left(\sum_{v\in V_{U}^{k}}(p_{v}-p_{v}^{-})\right)-|\mathcal{N}^{+}(V_{U}^{k})|q\geq\frac{2}{17}\left(\sum_{(v_{0},v_{1})\in T_{k}}\delta_{v_{0},v_{1}}^{2}\right)-\Delta|V_{U}|q
(21) \displaystyle\geq 217(ϝ+(GB(GD),𝒑))2Δ|VU|q.\displaystyle\frac{2}{17}\left(\digamma^{+}(G_{B}(G_{D}),\bm{p})\right)^{2}-\Delta|V_{U}|q.

Putting Inequalities (20) and (21) together and noting that 1papa1\frac{1-p_{a}}{p_{a}}\geq 1, one can verify that Condition (c) holds whenever qq is sufficiently small, which completes the proof. ∎