
Bounding Distance Between Outputs in Distributed Lattice Agreement

Abdullah Rasheed, The University of Texas at Austin, Austin, TX, USA
Nidhi Dubagunta, The University of Texas at Austin, Austin, TX, USA
Abstract

This paper studies the lattice agreement problem and proposes a stronger form, $\varepsilon$-bounded lattice agreement, that enforces an additional tightness constraint on the outputs. To formalize the concept, we define a quasi-metric on the structure of the lattice, which captures a natural notion of distance between lattice elements. We consider the bounded lattice agreement problem in both synchronous and asynchronous systems, and provide algorithms that aim to minimize the distance between the output values while satisfying the requirements of the classic lattice agreement problem. We show strong impossibility results for the asynchronous case, present a heuristic algorithm that achieves improved tightness with high probability, and test an approximation of this algorithm to show that only a very small number of rounds is necessary.

Index Terms:
Lattice agreement, fault-tolerance, consensus.

I Introduction

The Lattice Agreement problem, introduced in [1], is a weakened form of the distributed decision problem. Each process begins with a value from a lattice, and the goal is for all processes to decide on values that are mutually comparable. Lattice agreement loosens the constraints of the consensus problem, which requires strict uniformity and validity in the output values and is impossible in asynchronous systems [3]. Lattice agreement has the advantage of decidability in the presence of asynchrony and faults [1], making it an important and useful problem in distributed systems.

Lattice agreement has a variety of applications in practice, and was first proposed as a method for attaining consistent atomic snapshots [1]. A generalized form of the lattice agreement problem was later proposed to implement fault-tolerant replicated state machines [2]. The generalized lattice agreement [2] is also shown to be useful in achieving linearizable data structures, particularly in conflict-free replicated data types. Another version of lattice agreement, reconfigurable lattice agreement, was proposed in [4] in order to handle agreement on the system configuration for client and server consistency.

While better bounds on lattice agreement algorithms were found in [5], they were improved upon in [6] with algorithms for synchronous, asynchronous, and generalized lattice agreement.

In this paper, we propose a new constraint, tightness, on standard lattice agreement that requires all outputs to be within a certain distance of each other, as determined by a quasi-metric over the lattice. Such a requirement can force processes to agree on values (such as the global view in an atomic snapshot) that are closer to each other. Minimizing this distance between intermediate states in applications such as atomic snapshots and replicated state machines could lead to faster eventual convergence. Reducing the divergence between replicas during the reconciliation phase may mean that fewer overall synchronization operations are needed, allowing the system to reach eventual consistency more efficiently. Additionally, processes can potentially recover more quickly after node crashes, because tighter outputs reduce the risk of losing information. Tightness can also ensure that processes are “synchronized enough” with respect to what they must agree on, in a similar sense to clock synchronization.

We show that this problem has equivalences to standard lattice agreement under particular bounds, and that there is a discrete structure to the range of tightness on the outputs, despite the continuity of the range of the quasi-metric. We briefly show that this problem is completely solved in the synchronous case, but is much trickier in the asynchronous case. While standard lattice agreement is solvable asynchronously, better tightness can never be guaranteed. This motivates our heuristic approach, which we model approximately and simulate to estimate the probability of improved tightness. We see that after only 5 rounds of our algorithm, this probability converges to a near guarantee. Finally, we conjecture that this number of rounds is constant, no matter the number of processes, faults, or the chance of a fault.

II Background

We will let $[n]$ denote the set $\{1,\ldots,n\}$. We will also let $\mathbb{R}_{\geq 0}^{\infty}=\{r\in\mathbb{R}\mid r\geq 0\}\cup\{\infty\}$, where $r<\infty$ for all $r\in\mathbb{R}_{\geq 0}^{\infty}\setminus\{\infty\}$.

A triple $(X,\leq,\sqcup)$ is called a join semi-lattice if $(X,\leq)$ is a poset, and $\sqcup$ is a join operator such that the join of every non-empty finite subset is defined. For brevity, we will refer to join semi-lattices simply as lattices. We say $X$ is grounded if there exists an element $\bot_{X}\in X$ such that $\bot_{X}\leq x$ for all $x\in X$. For any subset $S\subseteq X$, we let $y\bowtie S$ mean $\exists s\in S:s\leq y\leq\bigsqcup S$.

With any lattice $X$, we will write $a<b$ for $a,b\in X$ to mean $a\leq b\wedge a\neq b$.
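As a running illustration, the following Python sketch (ours, not part of the original text) models the join semi-lattice of finite sets of integers, where the order is subset inclusion and the join is union, together with the $\bowtie$ predicate defined above.

```python
from functools import reduce
from typing import FrozenSet, Iterable

Elem = FrozenSet[int]  # lattice elements: finite sets of integers

def leq(a: Elem, b: Elem) -> bool:
    """Partial order: a <= b iff a is a subset of b."""
    return a <= b

def join(elems: Iterable[Elem]) -> Elem:
    """Join (least upper bound) of a non-empty finite collection: the union."""
    return reduce(lambda a, b: a | b, elems)

def bowtie(y: Elem, S: Iterable[Elem]) -> bool:
    """y ⋈ S: some s in S satisfies s <= y, and y <= join(S)."""
    S = list(S)
    return any(leq(s, y) for s in S) and leq(y, join(S))
```

The sketches that follow reuse `Elem`, `leq`, and `join` from this block.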

III System Model

We consider a system model consisting of $n$ distributed processes, in which at most $f$ faults may occur. We only consider process faults, and these are crash faults.

We will consider the system to be either synchronous or asynchronous, and each process can send and receive messages from every process, including itself.

IV Distributed Lattice Agreement

The original lattice agreement problem is as follows. Let $(X,\leq,\sqcup)$ be a finite lattice, and let there be $n$ processes in the system. Each process $p_{i}$ proposes an input value $x_{i}\in X$, and each $p_{i}$ decides on an output $y_{i}\in X$. To solve the lattice agreement problem, the following constraints must be met:

  • Downward-Validity: $\forall i\in[n]:x_{i}\leq y_{i}$

  • Upward-Validity: $\forall i\in[n]:y_{i}\leq\bigsqcup_{j\in[n]}x_{j}$

  • Comparability: $\forall i,j\in[n]:y_{i}\leq y_{j}\vee y_{j}\leq y_{i}$
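For concreteness, the following sketch (ours; it reuses `Elem`, `leq`, and `join` from the Background sketch) checks the three conditions for lists of inputs and outputs indexed by process.

```python
from typing import List

def satisfies_lattice_agreement(xs: List[Elem], ys: List[Elem]) -> bool:
    top = join(xs)                                                   # join of all inputs
    downward   = all(leq(x, y) for x, y in zip(xs, ys))              # x_i <= y_i
    upward     = all(leq(y, top) for y in ys)                        # y_i <= join of inputs
    comparable = all(leq(a, b) or leq(b, a) for a in ys for b in ys)
    return downward and upward and comparable
```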

We introduce a stronger version of this problem by requiring the outputs to all be “close” to each other with respect to some distance function on the lattice.

Definition 1.

Let $(X,\leq,\sqcup)$ be a lattice. Then a function $\delta_{X}:X\times X\rightarrow\mathbb{R}_{\geq 0}^{\infty}\cup\{\bot\}$ is called a quasi-metric over $X$ if for all $x_{1},x_{2},x_{3}\in X$,

  (i) $x_{1}=x_{2}\Leftrightarrow\delta_{X}(x_{1},x_{2})=0$

  (ii) $x_{1}\leq x_{2}\Leftrightarrow\delta_{X}(x_{1},x_{2})\neq\bot$

  (iii) $x_{1}\leq x_{2}\leq x_{3}\Rightarrow\delta_{X}(x_{1},x_{3})\leq\delta_{X}(x_{1},x_{2})+\delta_{X}(x_{2},x_{3})$

Definition 2.

A lattice $(X,\leq,\sqcup)$ with a quasi-metric $\delta_{X}$ is called a lattice quasi-metric space.

In a lattice quasi-metric space, $\delta_{X}(x_{1},x_{2})$ can intuitively be described as how “close” $x_{1}$ and $x_{2}$ are in some natural sense. For example, in the lattice of sets of integers, we would naturally expect the distance between $\{1\}$ and $\{1,2\}$ to be smaller than the distance between $\{1\}$ and $\{1,\ldots,100\}$, or the distance between $\{1\}$ and $\{1,2\}$ to be smaller than the distance between $\{1\}$ and $\{1,1000\}$ (all depending on the context of the lattice). Observe that, while the length of the shortest path from $x_{1}$ to $x_{2}$ may work as a quasi-metric, this is not necessarily the quasi-metric that one may want to use, as there may be paths in the lattice that should naturally be considered longer than others (in the sense of distance), even if the paths have the same length in the lattice.
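One possible quasi-metric on the set lattice from the Background sketch (our example, not prescribed by the text) measures, for $a\leq b$, how many elements must be added to get from $a$ to $b$, and is undefined otherwise.

```python
from typing import Optional

def delta(a: Elem, b: Elem) -> Optional[float]:
    """Quasi-metric: |b \\ a| when a <= b, and ⊥ (None) otherwise."""
    if not leq(a, b):
        return None
    return float(len(b - a))

# delta(frozenset({1}), frozenset({1, 2}))        == 1.0
# delta(frozenset({1}), frozenset(range(1, 101))) == 99.0
```

This particular $\delta_{X}$ also happens to be height-normal in the sense of Definition 3 below.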

We may also find it easier to work with more natural quasi-metrics, so we may often impose a requirement of a normal quasi-metric.

Definition 3.

A quasi-metric $\delta_{X}$ over a lattice $(X,\leq,\sqcup)$ is said to be height-normal (or simply, normal) if it satisfies

  • $a\leq b\leq c\implies\delta_{X}(a,b),\delta_{X}(b,c)\leq\delta_{X}(a,c)$

for all $a,b,c\in X$.

We now introduce a new version of the lattice agreement problem. The $\varepsilon$-bounded lattice agreement problem (with $\varepsilon\geq 0$) is as follows. Let $(X,\leq,\sqcup)$ be a lattice quasi-metric space with quasi-metric $\delta_{X}$. Each process $p_{i}$ proposes an input value $x_{i}\in X$, and each $p_{i}$ decides on an output $y_{i}\in X$. To solve the $\varepsilon$-bounded lattice agreement problem, Downward-Validity, Upward-Validity, and Comparability must all be met, along with one additional constraint:

  • $\varepsilon$-Tightness: $\forall i,j\in[n]:(y_{i}\leq y_{j}\Rightarrow\delta_{X}(y_{i},y_{j})\leq\varepsilon)$

This forces the output values to be approximately the same, i.e. “close” to each other.
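Continuing the running sketch (ours), $\varepsilon$-Tightness can be checked as follows; an output set solves the $\varepsilon$-bounded problem exactly when it passes both this check and `satisfies_lattice_agreement` above.

```python
def satisfies_tightness(ys: List[Elem], eps: float) -> bool:
    # The condition only constrains comparable ordered pairs, where delta is defined.
    return all(delta(a, b) <= eps for a in ys for b in ys if leq(a, b))
```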

Later in the paper, we will be interested in studying the tightness of specific instances of the $\varepsilon$-bounded lattice agreement problem. For the following definitions, an instance of a protocol is a pair $(I,Y)$ of sets of input and output values, respectively. In context, $I=\{x_{i}\mid i\in[n]\}$ and $Y=\{y_{i}\mid i\in[n]\}$.

Definition 4.

An instance of a protocol of the lattice agreement problem is said to be $\gamma$-compliant if

  (1) The outputs satisfy $\gamma$-Tightness;

  (2) There is no $\varepsilon<\gamma$ such that the outputs satisfy $\varepsilon$-Tightness.

Lemma 1.

Any instance of any lattice agreement protocol is $\gamma$-compliant for a unique $\gamma\geq 0$ determined by $\gamma=\max_{i,j\in[n]}\delta_{X}(y_{i},y_{j})$.

Proof.

Let $\gamma=\max_{i,j\in[n]}\delta_{X}(y_{i},y_{j})$ (which must satisfy $\gamma\geq 0$, since the outputs form a chain by Comparability). Then the outputs certainly satisfy $\gamma$-Tightness by definition, and if $\varepsilon<\gamma$, then the $i,j$ that maximize $\gamma$ (i.e. the $i,j$ such that $\gamma=\delta_{X}(y_{i},y_{j})$) do not satisfy $\delta_{X}(y_{i},y_{j})\leq\varepsilon$, so $\varepsilon$-Tightness is not satisfied, and so the instance is $\gamma$-compliant. Furthermore, this means the instance is not $\varepsilon$-compliant for any $\varepsilon<\gamma$. If $\varepsilon>\gamma$, the instance is clearly not $\varepsilon$-compliant by Condition 2. Thus, $\gamma$ is unique. ∎

We also prove that, for normal $\delta_{X}$, $\gamma$ is determined by the min and max outputs.

Lemma 2.

For any instance of the lattice agreement problem and a normal $\delta_{X}$, let $Y=\{y_{i}\mid i\in[n]\}$, and $\gamma=\delta_{X}(\min{Y},\max{Y})$. Then, this instance is $\gamma$-compliant.

Proof.

First observe that $\min{Y}$ and $\max{Y}$ exist by Comparability. Let $y_{i}=\min{Y}$ and $y_{j}=\max{Y}$, and let $y_{k},y_{\ell}\in Y$ such that $y_{k}\leq y_{\ell}$. Then $y_{i}\leq y_{k}\leq y_{\ell}\leq y_{j}$, so by normality of $\delta_{X}$, $\delta_{X}(y_{i},y_{j})\geq\delta_{X}(y_{k},y_{j})\geq\delta_{X}(y_{k},y_{\ell})$. Thus, by Lemma 1, $\gamma=\delta_{X}(y_{i},y_{j})=\delta_{X}(\min{Y},\max{Y})$ (since $y_{i},y_{j}$ have the maximum distance among the outputs, as just shown). ∎
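In the running sketch (ours), Lemmas 1 and 2 translate to two equivalent ways of computing the compliance level of an output set; the second applies because the `delta` defined earlier is normal.

```python
def gamma_compliance(ys: List[Elem]) -> float:
    # Lemma 1: the maximum distance over comparable ordered pairs of outputs.
    return max(delta(a, b) for a in ys for b in ys if leq(a, b))

def gamma_compliance_normal(ys: List[Elem]) -> float:
    # Lemma 2 (normal delta): the distance from the minimum to the maximum output.
    # By Comparability the outputs form a chain, so the smallest/largest set are min/max.
    return delta(min(ys, key=len), max(ys, key=len))
```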

IV-A Equivalences

In the following lemmas we present equivalences between different instances of the $\varepsilon$-bounded lattice agreement problem and one result displaying an equivalence with the original lattice agreement problem.

An obvious equivalence is that a protocol which guarantees tighter bounds can also be used to solve a bounded lattice agreement problem for looser bounds.

Lemma 3.

Let $\varepsilon^{\prime}\geq\varepsilon$. Then any protocol that solves the $\varepsilon$-bounded lattice agreement problem can be used to solve the $\varepsilon^{\prime}$-bounded lattice agreement problem.

Proof.

Downward-Validity, Upward-Validity, and Comparability are all satisfied by the $\varepsilon$-bounded protocol. By $\varepsilon$-Tightness, $\delta_{X}(y_{i},y_{j})\leq\varepsilon$ for all $y_{i}\leq y_{j}$, and since $\varepsilon\leq\varepsilon^{\prime}$, we have $\delta_{X}(y_{i},y_{j})\leq\varepsilon^{\prime}$, so $\varepsilon^{\prime}$-Tightness is also satisfied. ∎

The following lemma shows that the $\varepsilon$-bounded lattice agreement problem, for an $\varepsilon$ less than the minimum distance between any pair of distinct comparable elements in the lattice, is the same as the 0-bounded lattice agreement problem.

Lemma 4.

Let $\varepsilon>0$ such that $x<y\implies\delta_{X}(x,y)>\varepsilon$. The $\varepsilon$-bounded lattice agreement problem is equivalent to the 0-bounded lattice agreement problem.

Proof.

Suppose there exists a protocol to solve the $\varepsilon$-bounded lattice agreement problem for such an $\varepsilon$. Then, by $\varepsilon$-Tightness, $\delta_{X}(y_{i},y_{j})\leq\varepsilon$ for all (correct) $i,j\in[n]$ with $y_{i}\leq y_{j}$.

Suppose there is some $y_{i}\neq y_{j}$. By Comparability, either $y_{i}\leq y_{j}$ or $y_{j}\leq y_{i}$. Suppose, wlog, the former is true. Then by our chosen $\varepsilon$, we have $\delta_{X}(y_{i},y_{j})>\varepsilon$, which contradicts $\varepsilon$-Tightness. Thus, there are no $y_{i},y_{j}$ such that $y_{i}\neq y_{j}$, and so this protocol solves the 0-bounded lattice agreement problem.

The other direction is trivial since $\varepsilon>0$. ∎

We also show another equivalence, in that the original lattice agreement problem is equivalent to the $\varepsilon$-bounded agreement problem when $\varepsilon\geq D$, where

$$D=\max_{s_{1},s_{2}\bowtie\{x_{i}\mid i\in[n]\}}\delta_{X}(s_{1},s_{2}).$$

Intuitively, $D$ is the longest distance between any two points in the range of values in which the outputs can lie.

Lemma 5.

The lattice agreement problem is equivalent to the $D$-bounded lattice agreement problem.

Proof.

Let $P$ be a protocol solving the lattice agreement problem. Then, by Downward-Validity, $x_{i}\leq y_{i}$ for all $i\in[n]$. By Upward-Validity, $y_{i}\leq\bigsqcup_{j\in[n]}x_{j}$, so $y_{i}\bowtie\{x_{j}\mid j\in[n]\}$ for all $i\in[n]$, so $\delta_{X}(y_{i},y_{j})\leq D$ for all $i,j\in[n]$ by maximality of $D$.

The other direction is trivial since Downward-Validity, Upward-Validity, and Comparability are all achieved in the $D$-bounded lattice agreement problem. ∎

If we have normality in $\delta_{X}$, however, then we may attain a stricter upper bound on the $\varepsilon$ guaranteed by the lattice agreement problem. We will let

$$D^{\prime}=\max_{i\in[n]}\,\delta_{X}(x_{i},\bigsqcup_{j\in[n]}x_{j}),$$

and we first show that $D^{\prime}$ is indeed a stronger upper bound.

Lemma 6.

For an input set $\{x_{i}\mid i\in[n]\}$, $D^{\prime}\leq D$.

Proof.

Since for all $i\in[n]$, $x_{i}\bowtie\{x_{i}\mid i\in[n]\}$ and $\bigsqcup_{j\in[n]}x_{j}\bowtie\{x_{i}\mid i\in[n]\}$, we immediately have $D^{\prime}\leq D$. ∎
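The two bounds can be computed directly in the running sketch (ours); enumerating every candidate output for $D$ is exponential, so this is only illustrative for small inputs. For the particular `delta` used here the two bounds coincide; Lemma 6 only guarantees $D^{\prime}\leq D$ in general.

```python
from itertools import combinations

def candidates(xs: List[Elem]) -> List[Elem]:
    """All s with s ⋈ {x_i}: some x_i <= s and s <= join of the inputs."""
    top = join(xs)
    subsets = (frozenset(c) for r in range(len(top) + 1)
               for c in combinations(sorted(top), r))
    return [s for s in subsets if any(leq(x, s) for x in xs)]

def D(xs: List[Elem]) -> float:
    cs = candidates(xs)
    return max(delta(a, b) for a in cs for b in cs if leq(a, b))

def D_prime(xs: List[Elem]) -> float:
    top = join(xs)
    return max(delta(x, top) for x in xs)

xs = [frozenset({1}), frozenset({2, 3}), frozenset({4})]
print(D(xs), D_prime(xs))  # 3.0 3.0 for this delta; in general D' <= D
```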

Lemma 7.

If $\delta_{X}$ is normal, then the lattice agreement problem is equivalent to the $D^{\prime}$-bounded lattice agreement problem.

Proof.

Let $P$ be a protocol solving the lattice agreement problem. Let $x_{i}$ be an input which maximizes $D^{\prime}$ (that is, $x_{i}\in\{x_{j}\mid j\in[n]\}$ such that $D^{\prime}=\delta_{X}(x_{i},\bigsqcup_{j\in[n]}x_{j})$). Then, by Downward-Validity, $x_{i}\leq y_{i}$. Let $y_{j}$ be any output; then there are two cases by Comparability. If $y_{i}\leq y_{j}$, then $x_{i}\leq y_{j}$, so $\delta_{X}(y_{i},y_{j})\leq D^{\prime}$ by normality. If $y_{j}\leq y_{i}$, then $\delta_{X}(y_{j},y_{i})\leq\delta_{X}(y_{j},\bigsqcup_{j\in[n]}x_{j})\leq\delta_{X}(x_{j},\bigsqcup_{j\in[n]}x_{j})$ by normality, and $\delta_{X}(x_{j},\bigsqcup_{j\in[n]}x_{j})\leq D^{\prime}$ since $D^{\prime}$ is the maximum over all inputs.

Then, for any pair of outputs $y_{i}\leq y_{j}$ (or, wlog, $y_{j}\leq y_{i}$), we have $\delta_{X}(y_{i},y_{j})\leq D^{\prime}$.

The other direction is trivial as stated in Lemma 5. ∎

These equivalences will help us reduce a seemingly continuous set of problems to an equivalent discrete set of problems that is much easier to reason about.

IV-B Reconciliation Protocols

Suppose we already have a set of output values $\{y_{i}\mid i\in[n]\}$ satisfying Downward-Validity, Upward-Validity, and Comparability with respect to some input set $\{x_{i}\mid i\in[n]\}$. Our goal is to achieve $\varepsilon$-Tightness with at most $f<n$ faults when possible. These algorithms that begin with outputs from standard lattice agreement will be called reconciliation protocols, and they are correct w.r.t. $\varepsilon$ when Downward-Validity, Upward-Validity, and Comparability are all maintained in the new outputs, and $\varepsilon$-Tightness is also achieved in the new outputs.

An instance $(I,Y,Y^{\prime})$ of a reconciliation protocol then consists of an additional set $Y^{\prime}$ representing the outputs from reconciliation on $Y$, which satisfies the lattice agreement requirements w.r.t. $I$.

We show an important result that allows us to focus merely on reconciliation protocols for the sake of analyzing solvability of the $\varepsilon$-bounded lattice agreement problem.

Theorem 1.

There exists an $\varepsilon$-reconciliation protocol with $f$ faults if and only if there exists a protocol solving $\varepsilon$-bounded lattice agreement with $f$ faults.

Proof.

If there exists an $\varepsilon$-reconciliation protocol, then a protocol solving $\varepsilon$-bounded lattice agreement is obtained by simply running a standard lattice agreement protocol (e.g. the synchronous and asynchronous algorithms presented in [1]), followed by the reconciliation protocol. The outputs from the lattice agreement must satisfy Downward-Validity, Upward-Validity, and Comparability, so they are valid inputs to the reconciliation protocol. By definition, the reconciliation protocol must maintain Downward-Validity, Upward-Validity, and Comparability, and it must also achieve $\varepsilon$-Tightness. Therefore, the $\varepsilon$-bounded lattice agreement is solved.

In the other direction, simply run the $\varepsilon$-bounded lattice agreement protocol on the values $y_{i}$. Then, the output values satisfy everything that is needed for an $\varepsilon$-reconciliation protocol, so we are done. ∎

This shows that it is sufficient to study reconciliation protocols for the sake of solvability of the ε\varepsilon-bounded lattice agreement problem, and vice versa. This is not the case, however, when attempting to minimize message complexity (as is clear from the proof).

V Synchronous Reconciliation

Solvability in synchronous systems is straightforward, via a generalization of the standard synchronous algorithm for binary consensus.

Algorithm 1 Synchronous Agreement
1:  $V\leftarrow\{y_{i}\}$ {Initially just the lattice agreement output}
2:  for $k=1$ to $f+1$ do
3:     Send $\{v\in V\mid P_{i}\text{ has not sent }v\text{ before}\}$ to all
4:     for each $j\in[n]$ do
5:        Receive $S_{j}$ from $P_{j}$
6:        $V\leftarrow V\cup S_{j}$
7:     end for
8:  end for
9:  Decide on $\max(V)$
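The following Python sketch (ours) simulates Algorithm 1 over the set lattice from the earlier sketches; for simplicity it crashes a random set of at most $f$ processes up front and has them send nothing, which is weaker than a fully adversarial crash schedule.

```python
import random

def synchronous_agreement(ys: List[Elem], f: int, seed: int = 0) -> List[Elem]:
    """ys: chain-comparable lattice agreement outputs, one per process."""
    rng = random.Random(seed)
    n = len(ys)
    known = [{y} for y in ys]            # V at each process
    sent = [set() for _ in range(n)]     # values each process has already broadcast
    crashed = set(rng.sample(range(n), rng.randint(0, f)))
    for _ in range(f + 1):
        msgs = [known[i] - sent[i] if i not in crashed else set() for i in range(n)]
        for i in range(n):
            sent[i] |= msgs[i]
        for i in range(n):
            if i not in crashed:
                for j in range(n):
                    known[i] |= msgs[j]
    # Each correct process decides max(V); the values form a chain, so the largest set wins.
    return [max(known[i], key=len) for i in range(n) if i not in crashed]
```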
Theorem 2.

All correct processes decide on the same value.

Proof.

Since there are $f+1$ rounds and at most $f$ faults, there must (by the pigeonhole principle) be a round in which no faults occur. That is, all correct processes send their set of unsent values to every other process, and all correct processes receive the respective sets from all other correct processes. Thus, at the end of that round, all correct processes have received the same values, and end with the same $V$. Then, $V$ stays the same for the remaining rounds, so all correct processes finish by deciding on the same value. ∎

Corollary 1.

The synchronous $\varepsilon$-bounded lattice agreement problem is solvable for all $\varepsilon\geq 0$.

Proof.

Immediate from Theorem 2 and Lemma 3. ∎

VI Asynchronous Reconciliation

While we can solve the $\varepsilon$-bounded lattice agreement problem for all $\varepsilon\geq 0$ synchronously, we certainly cannot solve the 0-bounded lattice agreement problem asynchronously.

Theorem 3.

The asynchronous 0-bounded lattice agreement problem is unsolvable for $f\geq 1$.

Proof.

Suppose there exists a protocol to solve the asynchronous 0-bounded lattice agreement problem. We show that such a protocol would solve the FLP asynchronous binary consensus problem presented in [3].

Then, it certainly terminates since every process eventually decides on an output. Let the lattice be $\{0,1\}$ with $0<1$. Then, if $x_{i}=0$ for all $i\in[n]$, $\bigsqcup_{j\in[n]}x_{j}=0$, and so $0\leq y_{i}\leq\bigsqcup_{j\in[n]}x_{j}=0$ by Downward and Upward-Validity (and similarly if $x_{i}=1$ for all $i\in[n]$). Thus we satisfy the Validity requirement of FLP.

Agreement is guaranteed by 0-Tightness, so this protocol guarantees the FLP conditions. Since there may be a crash fault ($f\geq 1$), this is a contradiction by the FLP result [3]. ∎

In fact, by Lemma 4, we cannot solve this problem asynchronously for $\varepsilon<M$, where

$$M=\min\{\delta_{X}(x,y)\mid x,y\in X,\,x<y\}.$$

While this gives a lower bound for $\varepsilon$, the arbitrary nature of the chosen quasi-metric gives the following fundamental result.

Theorem 4.

There does not exist an asynchronous protocol that always solves the $\varepsilon$-bounded lattice agreement problem for any $\varepsilon>0$.

Proof.

It is easy to construct a lattice and set of inputs for which all elements in the join-closed sub-lattice generated by the inputs are more than $\varepsilon$ units apart. In such an instance, $\varepsilon$-Tightness would force all outputs to be equal, which cannot be guaranteed asynchronously by Theorem 3. ∎

Observe that Theorem 4 (and the following Corollary) quantifies over all lattices; it does not say anything about the existence of protocols for specific lattices. Soon, we will prove an impossibility result on the upper bound that depends not only on the lattice, but on the particular instance in the lattice as well.

Corollary 2.

The asynchronous $\varepsilon$-bounded lattice agreement problem is unsolvable for all $\varepsilon\geq 0$ with $f\geq 1$.

Proof.

Immediate from Theorems 3 and 4. ∎

This result shows us that, even for viable values of $\varepsilon$, there is no protocol relying on $\varepsilon$ that can always guarantee a valid set of outputs per the problem. Furthermore, this shows that we cannot make an algorithm for arbitrary tightness stronger than the upper-bound given by standard lattice agreement (for arbitrary lattices and quasi-metrics). Therefore, we instead present algorithms for achieving better $\varepsilon$-Tightness, but not a particular value of $\varepsilon$. That is, we would like to attain stronger upper-bounds on compliance.

Theorem 5.

There is no asynchronous reconciliation protocol that guarantees $\varepsilon$-Tightness for any fixed $\varepsilon<D^{\prime}$ when $f\geq 1$.

Proof.

Let $Y=\{y_{i}\mid i\in[n]\}$ and suppose

$$\delta_{X}(\min{Y},\max{Y})=\max_{i\in[n]}\,\delta_{X}(x_{i},\bigsqcup_{j\in[n]}x_{j})=D^{\prime}.$$

Then, it must be the case that such a protocol changes the value of $\min{Y}$ or $\max{Y}$ in their respective processes. Suppose $\min{Y}=y_{1}=\ldots=y_{n-1}=x_{1}=\ldots=x_{n-1}$ and $\max{Y}=y_{n}=x_{n}=\bigsqcup_{j\in[n]}x_{j}$.

We cannot move $y_{n}=\max{Y}$ to any other point in the lattice since either Downward or Upward-Validity would be violated.

We cannot move any $y_{1},\ldots,y_{n-1}$ down in the lattice, since then Downward-Validity would be violated. Then, to decrease $\varepsilon$ by changing $\min{Y}$, we must move all $y_{1},\ldots,y_{n-1}$ up the lattice.

Since the system is asynchronous, we may repeatedly delay $p_{n}$’s messages, meaning that transitions in the state of values are made without depending on $y_{n}$. If some process $p_{i}$ changes its value $y_{i}$ to a different $y_{i}^{\prime}$, then this transition could also occur in an execution where $x_{n}=y_{n}=y_{1}=\ldots$, in which case it would violate Upward-Validity.

Therefore, an adversary in this situation can always enforce a transition to an illegal state by delaying $p_{n}$’s messages. ∎

While Theorem 5 seems similar to the previous unsolvability results, there is an important distinction. This theorem states that we cannot even guarantee an upper bound on $\gamma$-compliance better than $D^{\prime}$.

This motivates a heuristic approach to $\varepsilon$-bounded lattice agreement, where we would like to attain such an upper-bound (or something even stronger) on average or with high probability.

VI-A Heuristic Algorithm

If we have a normal quasi-metric, then improving the $\gamma$-compliance from $\max_{i,j\in[n]}\delta_{X}(y_{i},y_{j})$ (by Lemma 1) is simply a matter of either moving all minimum values up the lattice or moving all the maximum values down the lattice (while maintaining validity).

Moving maximum values down can certainly cause us to violate Downward-Validity, however, since the original input $x_{i}$ of a maximum value $y_{i}$ could be equal to it (that is, $x_{i}=y_{i}$). Moving minimum values up is therefore a much better option, since we only need to stay below $\bigsqcup_{j\in[n]}x_{j}$. Without knowing the inputs, however, this may be challenging. The only thing we do know in a reconciliation protocol is that all values satisfy Upward-Validity, so we can rely on these values.

Here, we present a very simple algorithm for attaining better $\gamma$-compliance with high probability (depending on the chosen value of $k$, and of course the set of initial values).

Algorithm 2 ${\rm DR}(k)$
1:  $y_{i}\leftarrow$ output from lattice agreement
2:  for $r=1$ to $k$ do
3:     Send $(y_{i},r)$ to all
4:     Receive $n-f$ $(\cdot,r)$ messages. Let $V_{r}$ be the set of received values.
5:     $y_{i}\leftarrow\max(V_{r}\cup\{y_{i}\})$
6:  end for
7:  Output $y_{i}$
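A simple way to get a feel for ${\rm DR}(k)$ is the following simulation sketch (ours, reusing the set-lattice helpers from earlier); asynchrony and up to $f$ faults are modeled crudely by letting each process hear from a uniformly random set of $n-f$ processes in each round.

```python
import random

def DR(ys: List[Elem], k: int, f: int, seed: int = 0) -> List[Elem]:
    rng = random.Random(seed)
    n = len(ys)
    cur = list(ys)                           # y_i^1: outputs from lattice agreement
    for _ in range(k):
        nxt = []
        for i in range(n):
            heard = rng.sample(range(n), n - f)
            vals = [cur[j] for j in heard] + [cur[i]]
            nxt.append(max(vals, key=len))   # values form a chain, so the largest set is the lattice max
        cur = nxt                            # y_i^{r+1}
    return cur

# Theorem 7 below: gamma_compliance(DR(ys, k, f)) <= gamma_compliance(ys), and it often improves.
```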

Let $y_{i}^{r}$ be the value of $y_{i}$ in process $p_{i}$ before executing the for loop for round $r$. Then, the initial value (output from lattice agreement) is $y_{i}^{1}$, and we let $y_{i}^{k+1}$ be the final output value. Then, let $A_{r}=\{y_{i}^{r}\mid i\in[n]\}$.

Lemma 8.

For all $1\leq m\leq k$, $A_{m+1}\subseteq A_{m}$.

Theorem 6.

Downward-Validity, Upward-Validity, and Comparability are all maintained.

Proof.

For each $y_{i}^{r}$, we have $y_{i}^{r}\leq y_{i}^{r+1}$, and since $x_{i}\leq y_{i}^{1}$, we then have $x_{i}\leq y_{i}^{k+1}$, so Downward-Validity is maintained.

By Lemma 8, we have that $A_{k+1}\subseteq A_{1}$, so for all $i\in[n]$, $\exists j\in[n]$ such that $y_{i}^{k+1}=y_{j}^{1}$. Since the initial values all satisfy Upward-Validity, we then have that $y_{i}^{k+1}$ satisfies Upward-Validity. This same argument also shows that Comparability is maintained as well. ∎

Theorem 7.

For any instance $(I,Y,Y^{\prime})$ of this algorithm, if $(I,Y)$ is $\gamma$-compliant, then $(I,Y^{\prime})$ is $\gamma^{\prime}$-compliant for some $\gamma^{\prime}\leq\gamma$.

Proof.

By Lemma 8, $A_{m+1}\subseteq A_{m}$, so the maximum distance between any pair of values cannot increase from $A_{m}$ to $A_{m+1}$. ∎

Observe that none of these proofs uses the normality of $\delta_{X}$. Therefore, the algorithm still works for non-normal quasi-metrics; however, there is a much greater likelihood of improving $\gamma$-compliance with a normal quasi-metric, since we can guarantee improvement when minimum values are moved up.

Simulating this algorithm for large $n$ is computationally difficult, so we instead provide an easier-to-compute, non-distributed abstract semantics that gives a good approximation of the improvement rates of ${\rm DR}(k)$.

VI-B Simplified Approximate Model

Since, for all $i\in[n]$, $y_{i}^{r}\leq y_{i}^{r+1}$, we only need the minimum values to change in order to attain a $\gamma$-compliance better than $D^{\prime}$. This gives rise to a simple model that allows us to analyze any given instance of ${\rm DR}(k)$ as a discrete-time absorbing Markov chain.

The idea is to represent states by 0s and 1s, where a 0 represents a minimum value (that is, some $y_{i}^{r}$ such that $y_{i}^{r}=\min{\{y_{j}^{1}\mid j\in[n]\}}$) and 1 represents a non-minimum value. The goal is to have all 1s and no 0s, so that there are no more minimum values (with respect to the initial values).

At each round of the algorithm, each 0 picks $n-f$ of the current state’s 0s and 1s, and changes to a 1 exactly when it picks at least one 1 (that is, it takes the max).

For any given process, we assume that the probability of receiving a message from another process in round $r$ is uniformly distributed. That is, there is an equal chance for $p_{i}$ to receive from $p_{j}$ for all $j\in[n]$ at each round. Thus, in the simplified model, this means that each bit has an equal chance of being chosen by each 0 independently.
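Under this independence assumption (reading the $n-f$ picks as uniform draws with replacement, which is our simplification), the chance that a given 0 flips in one round is the chance that at least one of its picks is a 1:

```python
def flip_probability(n: int, f: int, ones: int) -> float:
    # P(at least one of n-f independent uniform picks is a 1), with `ones` 1s in the state.
    return 1.0 - ((n - ones) / n) ** (n - f)

# With n=1000, f=200 and a single 1 (the worst-case input), a given 0 flips with
# probability about 1 - (999/1000)**800 ≈ 0.55 in the first round.
```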

We also assume that crashes are independent, and each process has an equal chance $p_{f}$ to crash (among the first $f$ crashes). To simplify this model, we say that line 3 of ${\rm DR}$ is atomic, so processes cannot send a message to some processes and crash before sending it to the rest of the processes.

Definition 5.

The state space $\mathbb{S}_{n,f}$ (or simply $\mathbb{S}$ when $n,f$ are clear from context) for $n$ processes and at most $f$ failures consists of all vectors $\langle s_{1},\ldots,s_{n}\rangle$ such that $s_{i}\in\{0,1,\bot\}$ for all $i\in[n]$, no more than $f$ components equal $\bot$, and at least one component equals 1.

For convenience, we say that $\bot<0<1$.

Definition 6.

A state $S^{\prime}=\langle s_{1}^{\prime},\ldots,s_{n}^{\prime}\rangle\in\mathbb{S}$ is reachable from state $S=\langle s_{1},\ldots,s_{n}\rangle\in\mathbb{S}$ (written $S\rightarrow S^{\prime}$) if

  • $\forall i\in[n]:s_{i}=\bot\implies s_{i}^{\prime}=\bot$

  • $\forall i\in[n]:s_{i}=1\implies s_{i}^{\prime}\neq 0$.

This definition corresponds to the reachable states after running one round of ${\rm DR}$.

We now define the set of “good” states, in which compliance was improved.

Definition 7.

The set of improved states $\mathbb{F}\subseteq\mathbb{S}$ is the set $\mathbb{F}=\{\langle s_{1},\ldots,s_{n}\rangle\in\mathbb{S}\mid(\forall i\in[n]:s_{i}\neq 0)\vee(\forall i\in[n]:s_{i}\neq 1)\}$.

We now provide an algorithm for the abstract semantics with $k$ rounds beginning at state $S$, and traversing the state reachability graph (as determined by the random $n-f$ choices).

Algorithm 3 Simplified Approximate Model Simulation
1:  $x\leftarrow 0$
2:  for $r\leftarrow 1$ to $k$ do
3:     $A\leftarrow$ copy of $S$
4:     for $i\leftarrow 1$ to $n$ do
5:        if $x<f$ and with probability $p_{f}$ then
6:           $S[i]\leftarrow\bot$; $A[i]\leftarrow\bot$
7:           $x\leftarrow x+1$
8:           continue
9:        end if
10:        if $S[i]\neq 0$ then
11:           continue
12:        end if
13:        $C\leftarrow$ choose $n-f$ random values from $S$
14:        $A[i]\leftarrow\max(C\cup\{S[i]\})$
15:     end for
16:     $S\leftarrow A$
17:  end for
18:  return $S\in\mathbb{F}$?
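For reference, the following is a hedged, runnable Python rendering (our interpretation, not the authors' code) of Algorithm 3, with $\bot$ modeled as `None` and the $n-f$ choices drawn uniformly with replacement; the example uses smaller parameters than the tables below so it runs quickly.

```python
import random
from typing import List, Optional

Bit = Optional[int]  # None = ⊥, 0 = minimum value, 1 = non-minimum value

def improved(S: List[Bit]) -> bool:
    """Membership in F: no 0s remain, or no 1s remain."""
    return all(s != 0 for s in S) or all(s != 1 for s in S)

def simulate(S: List[Bit], k: int, f: int, pf: float, rng: random.Random) -> bool:
    S = list(S)
    n = len(S)
    x = 0                                            # crashes so far
    for _ in range(k):
        A = list(S)                                  # next state
        for i in range(n):
            if x < f and rng.random() < pf:          # process i crashes this round
                S[i] = A[i] = None
                x += 1
                continue
            if S[i] != 0:
                continue
            C = [S[rng.randrange(n)] for _ in range(n - f)]
            A[i] = max(c for c in C + [S[i]] if c is not None)
        S = A
    return improved(S)

# Worst-case input (one 1, rest 0s) on a smaller system than in the tables.
rng = random.Random(0)
n, f, pf, k, runs = 100, 20, 0.06, 5, 200
worst = [1] + [0] * (n - 1)
print(sum(simulate(worst, k, f, pf, rng) for _ in range(runs)) / runs)
```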

This algorithm was simulated for 1000 runs with $n=1000$. The following tables show results for $p_{f}=0.06$ with $f=200$ and $f=800$. Table I gives success rates over the 1000 runs with randomized inputs. Table II gives success rates with the worst-case input, which is all 0s and one 1.

TABLE I: Success rates of random input with $n=1000$, $p_{f}=0.06$

            $k=2$    $k=3$    $k=4$
  $f=200$   17.1%    90.3%    99.9%
  $f=800$   16.8%    89.6%    99.3%

TABLE II: Success rates of worst-case input with $n=1000$, $p_{f}=0.06$

            $k=2$    $k=3$    $k=4$    $k=5$
  $f=200$   0.0%     41.3%    97.6%    100.0%
  $f=800$   0.0%     5.4%     84.0%    98.9%

We also tested $p_{f}=0.5$ to $p_{f}=0.8$ in increments of 0.1 for $f=800$ and observed that, for $k=2$, 0% of simulations succeeded in achieving a better upper bound; however, for $k=3$, 100% of simulations succeeded. This is to be expected, since using more of the fault budget allows either all 1s to be eliminated, or enough 0s to be eliminated so that a 1 is guaranteed among the $n-f$ choices.

Overall, we see that only a very small number of rounds, relative to the number of processes, is needed to improve the upper bound with very high probability. For 1000 processes, the average case (randomized input) shows that we only need to run 4 rounds to get a high probability of success (5 for a near guarantee, even with worst-case input), which is nearly negligible compared to the number of rounds needed for the lattice agreement itself.

We conjecture that the probability of improvement at $k=5$ is essentially the same for any number of processes $n$, failures $f$, and chance of failure $p_{f}$, with minimal error. That is, we conjecture that such a high probability of improvement is achieved with ${\rm DR}(5)$ in any system (under the assumptions of our original system model). This is because the same amount of “spreading” per round will occur proportional to the number of processes, causing the probability to converge at the same round, even if the first few rounds differ when $n,f,p_{f}$ differ.

Furthermore, this would mean that our heuristic algorithm only needs $O(1)$ (constant) rounds to achieve improved $\gamma$-compliance, and the constant number of rounds is very small (5, as per the conjecture).

VII Conclusion

In this paper, we proposed a new constraint on the standard lattice agreement problem by requiring outputs to be “close” to each other w.r.t. a quasi-metric on the lattice. We showed that this new problem, whose range is over $\mathbb{R}_{\geq 0}^{\infty}$, can be reduced to a discrete set of equivalence classes on this range. An upper bound on tightness was derived for standard lattice agreement, and we showed that these upper bounds are stubborn in the presence of asynchrony. For synchronous systems, we showed that the problem is easily solvable for all desired levels of tightness. For asynchronous systems, we provided a variety of impossibility results, including one stating that the upper bound achieved in standard lattice agreement cannot be improved with certainty. This led us to present a heuristic algorithm for attaining improved tightness with high probability in asynchronous systems. We then modeled this heuristic algorithm with an approximation and simulated this model to obtain an approximation of the probabilities of improving tightness with $k$ rounds. Finally, we conjectured that only a constant number ($k=5$) of rounds is necessary to achieve a high probability of improvement for any number of processes, faults ($f<n$), and any fault probability.

References

  • [1] Hagit Attiya, Maurice Herlihy, and Ophir Rachman. Atomic snapshots using lattice agreement. Distrib. Comput., 8(3):121–132, March 1995.
  • [2] Jose M. Faleiro, Sriram Rajamani, Kaushik Rajan, G. Ramalingam, and Kapil Vaswani. Generalized lattice agreement. In Proceedings of the 2012 ACM Symposium on Principles of Distributed Computing, PODC ’12, page 125–134, New York, NY, USA, 2012. Association for Computing Machinery.
  • [3] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, April 1985.
  • [4] Petr Kuznetsov, Thibault Rieutord, and Sara Tucci-Piergiovanni. Reconfigurable lattice agreement and applications. 2019.
  • [5] Marios Mavronicolas. A bound on the rounds to reach lattice agreement. 2000.
  • [6] Xiong Zheng, Changyong Hu, and Vijay K. Garg. Lattice agreement in message passing systems. 2018.