Bounding Distance Between Outputs in Distributed Lattice Agreement

Abdullah Rasheed, The University of Texas at Austin, Austin, TX, USA
Nidhi Dubagunta, The University of Texas at Austin, Austin, TX, USA

Abstract

This paper studies the lattice agreement problem and proposes a stronger form, $\varepsilon$ -bounded lattice agreement, that enforces an additional tightness constraint on the outputs. To formalize the concept, we define a quasi-metric on the structure of the lattice, which captures a natural notion of distance between lattice elements. We consider the bounded lattice agreement problem in both synchronous and asynchronous systems, and provide algorithms that aim to minimize the distance between the output values while satisfying the requirements of the classic lattice agreement problem. We show strong impossibility results for the asynchronous case and present a heuristic algorithm that achieves improved tightness with high probability. We test an approximation of this algorithm to show that only a very small number of rounds are necessary.

Index Terms:

Lattice agreement, fault-tolerance, consensus.

I Introduction

The Lattice Agreement problem, introduced in [1], is a weakened form of the distributed decision problem. Each process begins with a value from a lattice, and the goal is for all processes to decide on values that are mutually comparable. Lattice agreement loosens the constraints of the consensus problem, which requires strict uniformity and validity in the output values and is impossible in asynchronous systems [3]. Lattice agreement has the advantage of decidability in the presence of asynchrony and faults [1], making it an important and useful problem in distributed systems.

Lattice agreement has a variety of applications in practice, and was first proposed as a method for attaining consistent atomic snapshots [1]. A generalized form of the lattice agreement problem was later proposed to implement fault-tolerant replicated state machines [2]. The generalized lattice agreement [2] is also shown to be useful in achieving linearizable data structures, particularly in conflict-free replicated data types. Another version of lattice agreement, reconfigurable lattice agreement, was proposed in [4] in order to handle agreement on the system configuration for client and server consistency.

While better bounds on lattice agreement algorithms were found in [5], they were improved upon in [6] with algorithms for synchronous, asynchronous, and generalized lattice agreement.

In this paper, we propose a new constraint, tightness, on standard lattice agreement that requires all outputs to be within a certain distance of each other as determined by a quasi-metric over the lattice. Such a requirement can force processes to agree on values (such as the global view in atomic snapshot) that are closer to each other. Minimizing this distance between intermediate states in applications such as atomic snapshots and replicated state machines could lead to faster eventual convergence. Reducing the divergence between replicas during the reconciliation phase may mean that fewer overall synchronization operations are needed, allowing the system to reach eventual consistency more efficiently. Additionally, processes can potentially recover more quickly after node crashes, because tighter outputs reduce the risk of losing information. Tightness can also ensure that processes are “synchronized enough” with respect to what they must agree on, in a sense similar to clock synchronization.

We show that this problem has equivalences to standard lattice agreement under particular bounds, and that there is a discrete structure to the range of tightness on the outputs, despite the continuity of the range on the quasi-metric. We briefly show that this problem is completely solved in the synchronous case, but is much trickier in the asynchronous case. While standard lattice agreement is solvable asynchronously, better tightness can never be guaranteed. This motivates our heuristic approach, which we provide an approximate model of and simulate to get approximate probabilities of improved tightness. We see that after only 5 rounds of our algorithm, the probability converges to a near guarantee. Finally, we conjecture that this number of rounds is constant, no matter the number of processes, faults, or the chance of a fault.

II Background

We will let $[n]$ denote the set $\{1,\ldots,n\}$ . We will also let $\mathbb{R}_{\geq 0}^{\infty}=\{r\in\mathbb{R}\mid r\geq 0\}\cup\{\infty\}$ , where $r<\infty$ for all $r\in\mathbb{R}_{\geq 0}^{\infty}\setminus\{\infty\}$ .

A triple $(X,\leq,\sqcup)$ is called a join semi-lattice if $(X,\leq)$ is a poset, and $\sqcup$ is a join operator such that the join of every non-empty finite subset is defined. For brevity, we will refer to join semi-lattices simply as lattices. We say $X$ is grounded if there exists an element $\bot_{X}\in X$ such that $\bot_{X}\leq x$ for all $x\in X$ . For any subset $S\subseteq X$ , we let $y\bowtie S$ mean $\exists s\in S:s\leq y\leq\bigsqcup S$ .

With any lattice $X$ , we will feel free to say $a<b$ for $a,b\in X$ to mean $a\leq b\wedge a\neq b$ .
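The definitions above can be made concrete with a small sketch. It assumes, purely for illustration, the lattice of finite sets of integers ordered by inclusion with union as the join; the helper names `leq`, `join`, and `bowtie` are hypothetical and not part of the paper.

```python
def leq(a: frozenset, b: frozenset) -> bool:
    """Lattice order on finite sets: subset inclusion."""
    return a <= b

def join(values) -> frozenset:
    """Join of a non-empty finite collection of sets: their union."""
    values = list(values)
    if not values:
        raise ValueError("the join is only defined for non-empty subsets")
    return frozenset().union(*values)

def bowtie(y: frozenset, S) -> bool:
    """y ⋈ S: some s in S lies below y, and y lies below the join of S."""
    S = list(S)
    return any(leq(s, y) for s in S) and leq(y, join(S))

S = [frozenset({1}), frozenset({2, 3})]
print(bowtie(frozenset({1, 2}), S))  # True: {1} <= {1,2} <= {1,2,3}
print(bowtie(frozenset({2}), S))     # False: no element of S is below {2}
print(bowtie(frozenset({1, 4}), S))  # False: {1,4} is not below the join
```

The empty set plays the role of $\bot_{X}$ here, so this example lattice is grounded.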

III System Model

We consider a system model consisting of $n$ distributed processes, in which at most $f$ faults may occur. We consider only process faults, and all faults are crash faults.

We will consider the system to be either synchronous or asynchronous, and each process can send and receive messages from every other process including itself.

IV Distributed Lattice Agreement

The original lattice agreement is as follows. Let $(X,\leq,\sqcup)$ be a finite lattice, and let there be $n$ processes in the system. Each process $p_{i}$ proposes an input value $x_{i}\in X$ , and each $p_{i}$ decides on an output $y_{i}\in X$ . To solve the lattice agreement problem, the following constraints must be met:

• Downward-Validity: $\forall i\in[n]:x_{i}\leq y_{i}$
• Upward-Validity: $\forall i\in[n]:y_{i}\leq\bigsqcup_{j\in[n]}x_{j}$
• Comparability: $\forall i,j\in[n]:y_{i}\leq y_{j}\vee y_{j}\leq y_{i}$
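The three constraints can be checked mechanically for a given run. The following is a minimal sketch, assuming the set lattice (inclusion order, union as join); the function name `check_lattice_agreement` and the sample values are illustrative assumptions.

```python
def join(values) -> frozenset:
    """Join in the set lattice: union of all given sets."""
    return frozenset().union(*values)

def check_lattice_agreement(inputs, outputs) -> dict:
    """Evaluate each lattice agreement constraint for inputs x_i and outputs y_i."""
    top = join(inputs)
    return {
        "Downward-Validity": all(x <= y for x, y in zip(inputs, outputs)),
        "Upward-Validity": all(y <= top for y in outputs),
        "Comparability": all(a <= b or b <= a for a in outputs for b in outputs),
    }

inputs = [frozenset({1}), frozenset({2}), frozenset({3})]
good = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]  # a chain
bad = [frozenset({1}), frozenset({2}), frozenset({1, 2, 3})]      # {1}, {2} incomparable

print(check_lattice_agreement(inputs, good))
print(check_lattice_agreement(inputs, bad))
```

In the second call only Comparability fails, since each output still contains its own input and stays below the join of all inputs.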

We introduce a stronger version of this problem by requiring the outputs to all be “close” to each other with respect to some distance function on the lattice.

Definition 1.

Let $(X,\leq,\sqcup)$ be a lattice. Then a function $\delta_{X}:X\times X\rightarrow\mathbb{R}_{\geq 0}^{\infty}\cup\{\bot\}$ is called a quasi-metric over $X$ if for all $x_{1},x_{2},x_{3}\in X$ ,

(i) $x_{1}=x_{2}\Leftrightarrow\delta_{X}(x_{1},x_{2})=0$
(ii) $x_{1}\leq x_{2}\Leftrightarrow\delta_{X}(x_{1},x_{2})\neq\bot$
(iii) $x_{1}\leq x_{2}\leq x_{3}\Rightarrow\delta_{X}(x_{1},x_{3})\leq\delta_{X}(x_{1},x_{2})+\delta_{X}(x_{2},x_{3})$

Definition 2.

A lattice $(X,\leq,\sqcup)$ with a quasi-metric $\delta_{X}$ is called a lattice quasi-metric space.

In a lattice quasi-metric space, $\delta_{X}(x_{1},x_{2})$ can intuitively be described as how “close” $x_{1}$ and $x_{2}$ are in some natural sense. For example, in the lattice of sets of integers, we would naturally expect the distance between $\{1\}$ and $\{1,2\}$ to be smaller than the distance between $\{1\}$ and $\{1,\ldots,100\}$ , or the distance between $\{1\}$ and $\{1,2\}$ to be smaller than the distance between $\{1\}$ and $\{1,1000\}$ (all depending on the context of the lattice). Observe that, while the length of the shortest path from $x_{1}$ to $x_{2}$ may work as a quasi-metric, it is not necessarily the quasi-metric that one wants to use, as there may be paths in the lattice that should naturally be considered longer than others (in the sense of distance), even if the paths have the same length in the lattice.
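To illustrate Definition 1, the sketch below brute-force checks conditions (i) to (iii) on the powerset of a small universe. Both candidate functions are assumptions chosen for illustration: `delta_count` counts the added elements, and `delta_weight` sums them so that adding 1000 costs more than adding 2. `None` stands in for $\bot$.

```python
from itertools import combinations

BOTTOM = None  # stands in for the symbol ⊥ (returned when x1 <= x2 fails)

def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def delta_count(a, b):
    """Number of elements added when going from a up to b."""
    return len(b - a) if a <= b else BOTTOM

def delta_weight(a, b):
    """Sum of the added (positive) elements when going from a up to b."""
    return sum(b - a) if a <= b else BOTTOM

def is_quasi_metric(X, delta) -> bool:
    for a in X:
        for b in X:
            d = delta(a, b)
            if (a <= b) != (d is not BOTTOM):   # condition (ii)
                return False
            if (a == b) != (d == 0):            # condition (i)
                return False
            for c in X:                         # condition (iii)
                if a <= b <= c and delta(a, c) > delta(a, b) + delta(b, c):
                    return False
    return True

X = powerset({1, 2, 3})
print(is_quasi_metric(X, delta_count))   # True
print(is_quasi_metric(X, delta_weight))  # True
```

A function that returns a constant positive value on every ordered pair fails condition (i), because it never returns 0 on equal arguments.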

We may also find it easier to work with more natural quasi-metrics, so we may often impose a requirement of a normal quasi-metric.

Definition 3.

A quasi-metric $\delta_{X}$ over a lattice $(X,\leq,\sqcup)$ is said to be height-normal (or simply, normal) if it satisfies

• $a\leq b\leq c\implies\delta_{X}(a,b),\delta_{X}(b,c)\leq\delta_{X}(a,c)$

for all $a,b,c\in X$ .
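Normality is an extra requirement and does not follow from Definition 1. The sketch below, which uses a hypothetical three-element chain $0<1<2$, contrasts a normal quasi-metric (`gap`) with one that satisfies conditions (i) to (iii) but is not normal (`shortcut`, where the end-to-end distance is smaller than either step).

```python
BOTTOM = None  # stands in for ⊥

def is_normal(X, leq, delta) -> bool:
    """Check a <= b <= c  =>  delta(a,b) <= delta(a,c) and delta(b,c) <= delta(a,c)."""
    return all(
        delta(a, b) <= delta(a, c) and delta(b, c) <= delta(a, c)
        for a in X for b in X for c in X
        if leq(a, b) and leq(b, c)
    )

chain = [0, 1, 2]

def leq(a, b):
    return a <= b

def gap(a, b):
    """Ordinary difference along the chain; this one is normal."""
    return b - a if a <= b else BOTTOM

# Illustrative table: each single step costs 2, but going from 0 to 2 costs only 1.
TABLE = {(0, 0): 0, (1, 1): 0, (2, 2): 0, (0, 1): 2, (1, 2): 2, (0, 2): 1}

def shortcut(a, b):
    return TABLE.get((a, b), BOTTOM)

print(is_normal(chain, leq, gap))       # True
print(is_normal(chain, leq, shortcut))  # False: delta(0,1) = 2 > delta(0,2) = 1
```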

We now introduce a new version of the lattice agreement problem. The $\varepsilon$ -bounded lattice agreement problem (with $\varepsilon\geq 0$ ) is as follows. Let $(X,\leq,\sqcup)$ be a lattice quasi-metric space with $\delta_{X}$ . Each process $p_{i}$ proposes an input value $x_{i}\in X$ , and each $p_{i}$ decides on an output $y_{i}\in X$ . To solve the $\varepsilon$ -bounded lattice agreement, Downward-Validity, Upward-Validity, and Comparability must all be met, along with one additional constraint:

• $\varepsilon$ -Tightness: $\forall i,j\in[n]:(y_{i}\leq y_{j}\Rightarrow\delta_{X}(y_{i},y_{j})\leq\varepsilon)$

This forces the output values to be approximately the same, i.e. “close” to each other.

Later in the paper, we will be interested in studying the tightness of specific instances of the $\varepsilon$ -bounded lattice agreement problem. For the following definitions, an instance of a protocol is a pair $(I,Y)$ of sets of input and output values, respectively. In context, $I=\{x_{i}\mid i\in[n]\}$ and $Y=\{y_{i}\mid i\in[n]\}$ .

Definition 4.

An instance of a protocol of the lattice agreement problem is said to be $\gamma$ -compliant if

(1) The outputs satisfy $\gamma$ -Tightness;
(2) There is no $\varepsilon<\gamma$ such that the outputs satisfy $\varepsilon$ -Tightness.

Lemma 1.

Any instance of any lattice agreement protocol is $\gamma$ -compliant for a unique $\gamma\geq 0$ determined by $\gamma=\max_{i,j\in[n]}\delta_{X}(y_{i},y_{j})$ .

Proof.

Let $\gamma=\max_{i,j\in[n]}\delta_{X}(y_{i},y_{j})$ (which must satisfy $\gamma\geq 0$ , since the outputs form a chain by Comparability). Then the outputs certainly satisfy $\gamma$ -Tightness by definition, and if $\varepsilon<\gamma$ , then the $i,j$ that maximize $\gamma$ (i.e. the $i,j$ such that $\gamma=\delta_{X}(y_{i},y_{j})$ ) do not satisfy $\delta_{X}(y_{i},y_{j})\leq\varepsilon$ , so $\varepsilon$ -Tightness is not satisfied, and so the instance is $\gamma$ -compliant. Furthermore, this means the instance is not $\varepsilon$ -compliant for any $\varepsilon<\gamma$ . If $\varepsilon>\gamma$ , the instance is clearly not $\varepsilon$ -compliant by Condition 2. Thus, $\gamma$ is unique. ∎

We also prove that, for normal $\delta_{X}$ , $\gamma$ is determined by the min and max outputs.

Lemma 2.

For any instance of the lattice agreement problem and a normal $\delta_{X}$ , let $Y=\{y_{i}\mid i\in[n]\}$ , and $\gamma=\delta_{X}(\min{Y},\max{Y})$ . Then, this instance is $\gamma$ -compliant.

Proof.

First observe that $\min{Y}$ and $\max{Y}$ exist by Comparability. Let $y_{i}=\min{Y}$ and $y_{j}=\max{Y}$ , and let $y_{k},y_{\ell}\in Y$ such that $y_{k}\leq y_{\ell}$ . Then $y_{i}\leq y_{k}\leq y_{\ell}\leq y_{j}$ , so by normality of $\delta_{X}$ , $\delta_{X}(y_{i},y_{j})\geq\delta_{X}(y_{k},y_{j})\geq\delta_{X}(y_{k},y_{\ell})$ . Thus, by Lemma 1, $\gamma=\delta_{X}(y_{i},y_{j})=\delta_{X}(\min{Y},\max{Y})$ (since $y_{i},y_{j}$ have the maximum distance among the outputs as just shown). ∎
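Lemmas 1 and 2 can be compared on a concrete chain of outputs. The sketch assumes the set lattice with the cardinality quasi-metric $\delta_{X}(a,b)=|b\setminus a|$ (an illustrative choice, which is normal); the function names are hypothetical.

```python
def delta(a, b):
    """Illustrative normal quasi-metric: number of elements added from a to b."""
    return len(b - a) if a <= b else None

def gamma(outputs):
    """Lemma 1: the largest distance over ordered pairs of outputs."""
    return max(delta(a, b) for a in outputs for b in outputs if a <= b)

def gamma_from_extremes(outputs):
    """Lemma 2: for a normal quasi-metric, only min Y and max Y matter.
    On a chain of sets, the smallest and largest sets are found by size."""
    return delta(min(outputs, key=len), max(outputs, key=len))

def satisfies_tightness(outputs, eps) -> bool:
    return all(delta(a, b) <= eps for a in outputs for b in outputs if a <= b)

Y = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2}), frozenset({1, 2, 3, 4})]
print(gamma(Y), gamma_from_extremes(Y))  # 3 3
print(satisfies_tightness(Y, 3))         # True
print(satisfies_tightness(Y, 2.5))       # False, so no eps < 3 works
```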

IV-A Equivalences

In the following lemmas we present equivalences between different instances of the $\varepsilon$ -bounded lattice agreement problem and one result displaying an equivalence with the original lattice agreement problem.

An obvious equivalence is that a protocol which guarantees tighter bounds can also be used to solve a bounded lattice agreement problem for looser bounds.

Lemma 3.

Let $\varepsilon^{\prime}\geq\varepsilon$ . Then any protocol that solves the $\varepsilon$ -bounded lattice agreement problem can be used to solve the $\varepsilon^{\prime}$ -bounded lattice agreement problem.

Proof.

Downward-Validity, Upward-Validity, and Comparability are all satisfied by the $\varepsilon$ -bounded protocol. By $\varepsilon$ -Tightness, $\delta_{X}(y_{i},y_{j})\leq\varepsilon$ for all $y_{i}\leq y_{j}$ , and since $\varepsilon\leq\varepsilon^{\prime}$ , we have $\delta_{X}(y_{i},y_{j})\leq\varepsilon^{\prime}$ , so $\varepsilon^{\prime}$ -Tightness is also satisfied. ∎

The following lemma shows that the $\varepsilon$ -bounded lattice agreement problem for a $\varepsilon$ less than the minimum distance between any pair of elements in the lattice is the same as the 0-bounded lattice agreement problem.

Lemma 4.

Let $\varepsilon>0$ such that $x<y\implies\delta_{X}(x,y)>\varepsilon$ . The $\varepsilon$ -bounded lattice agreement problem is equivalent to the 0-bounded lattice agreement problem.

Proof.

Suppose there exists a protocol to solve the $\varepsilon$ -bounded lattice agreement problem for such a $\varepsilon$ . Then, by $\varepsilon$ -Tightness, $\delta_{X}(y_{i},y_{j})\leq\varepsilon$ for all (correct) $i,j\in[n]$ with $y_{i}\leq y_{j}$ .

Suppose there is some $y_{i}\neq y_{j}$ . By comparability, either $y_{i}\leq y_{j}$ or $y_{j}\leq y_{i}$ . Suppose, wlog, the former is true. Then by our chosen $\varepsilon$ , we have $\delta_{X}(y_{i},y_{j})>\varepsilon$ , which contradicts $\varepsilon$ -Tightness. Thus, there is no $y_{i},y_{j}$ such that $y_{i}\neq y_{j}$ , and so this protocol solves the 0-bounded lattice agreement problem.

The other direction is trivial since $\varepsilon>0$ . ∎
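The collapse described by Lemma 4 can be observed by brute force on a small lattice. The sketch assumes the powerset of $\{1,2,3\}$ with the cardinality quasi-metric, for which the smallest distance between distinct comparable elements is 1; the names `M` and `eps` are illustrative.

```python
from itertools import combinations

def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def delta(a, b):
    return len(b - a) if a <= b else None

X = powerset({1, 2, 3})

# Smallest distance between two distinct comparable elements (a < b is proper subset).
M = min(delta(a, b) for a in X for b in X if a < b)

# For any eps below M, the only ordered pairs within distance eps are equal pairs,
# so eps-Tightness forces every two outputs to coincide (0-Tightness).
eps = 0.5
tight_pairs = [(a, b) for a in X for b in X if a <= b and delta(a, b) <= eps]
print(M)                                   # 1
print(all(a == b for a, b in tight_pairs)) # True
```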

We also show another equivalence, in that the original lattice agreement problem is equivalent to the $\varepsilon$ -bounded agreement problem when $\varepsilon\geq D$ , where

$$D=\max_{s_{1},s_{2}\bowtie\{x_{i}\mid i\in[n]\}}\delta_{X}(s_{1},s_{2}).$$

Intuitively, $D$ is the longest distance between any two points in the range of values in which the outputs can lie.

Lemma 5.

The lattice agreement problem is equivalent to the $D$ -bounded lattice agreement problem.

Proof.

Let $P$ be a protocol solving the lattice agreement problem. Then, by Downward-Validity $x_{i}\leq y_{i}$ for all $i\in[n]$ . By Upward-Validity, $y_{i}\leq\bigsqcup_{j\in[n]}x_{j}$ , so $y_{i}\bowtie\{x_{j}\mid j\in[n]\}$ for all $i\in[n]$ , so $\delta_{X}(y_{i},y_{j})\leq D$ for all $i,j\in[n]$ by maximality of $D$ .

The other direction is trivial since Downward-Validity, Upward-Validity, and Comparability are all achieved in the $D$ -bounded lattice agreement problem. ∎

If we have normality in $\delta_{X}$ , however, then we may attain a stricter upper-bound on the $\varepsilon$ guaranteed by the lattice agreement problem. We will let

$$D^{\prime}=\max_{i\in[n]}\,\delta_{X}(x_{i},\bigsqcup_{j\in[n]}x_{j}),$$

and we first show that $D^{\prime}$ is indeed a stronger upper-bound.

Lemma 6.

For an input set $\{x_{i}\mid i\in[n]\}$ , $D^{\prime}\leq D$ .

Proof.

Since for all $i\in[n]$ , $x_{i}\bowtie\{x_{i}\mid i\in[n]\}$ and $\bigsqcup_{j\in[n]}x_{j}\bowtie\{x_{i}\mid i\in[n]\}$ , we immediately have $D^{\prime}\leq D$ . ∎
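Both bounds can be computed directly for a small input set. This is a sketch under the assumption of the powerset lattice of $\{1,2,3\}$ with the cardinality quasi-metric; `D` enumerates every element $s$ with $s\bowtie\{x_{i}\}$, and all function names are hypothetical.

```python
from itertools import combinations

def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def delta(a, b):
    return len(b - a) if a <= b else None

def join(values):
    return frozenset().union(*values)

def bowtie(y, S):
    return any(s <= y for s in S) and y <= join(S)

def D(inputs, X):
    """Largest distance between ordered pairs in the region where outputs may lie."""
    region = [s for s in X if bowtie(s, inputs)]
    return max(delta(a, b) for a in region for b in region if a <= b)

def D_prime(inputs):
    """Largest distance from any input up to the join of all inputs."""
    top = join(inputs)
    return max(delta(x, top) for x in inputs)

X = powerset({1, 2, 3})
inputs = [frozenset({1}), frozenset({2, 3})]
print(D(inputs, X), D_prime(inputs))  # 2 2
```

On this example the two bounds coincide; the only property the sketch relies on is $D^{\prime}\leq D$ from Lemma 6.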

Lemma 7.

If $\delta_{X}$ is normal, then the lattice agreement problem is equivalent to the $D^{\prime}$ -bounded lattice agreement problem.

Proof.

Let $P$ be a protocol solving the lattice agreement problem. Let $x_{i}$ be an input which maximizes $D^{\prime}$ (that is, $x_{i}\in\{x_{j}\mid j\in[n]\}$ such that $D^{\prime}=\delta_{X}(x_{i},\bigsqcup_{\ell\in[n]}x_{\ell})$ ). Then, by Downward-Validity, $x_{i}\leq y_{i}$ . Let $y_{j}$ be any output; there are two cases by Comparability. If $y_{i}\leq y_{j}$ , then $x_{i}\leq y_{i}\leq y_{j}\leq\bigsqcup_{\ell\in[n]}x_{\ell}$ , so $\delta_{X}(y_{i},y_{j})\leq D^{\prime}$ by normality. If $y_{j}\leq y_{i}$ , then $\delta_{X}(y_{j},y_{i})\leq\delta_{X}(y_{j},\bigsqcup_{\ell\in[n]}x_{\ell})\leq\delta_{X}(x_{j},\bigsqcup_{\ell\in[n]}x_{\ell})$ by normality, and $\delta_{X}(x_{j},\bigsqcup_{\ell\in[n]}x_{\ell})\leq D^{\prime}$ since $D^{\prime}$ is maximized by $x_{i}$ .

Then, for any pair of outputs $y_{i}\leq y_{j}$ (or, wlog, $y_{j}\leq y_{i}$ ), we have $\delta_{X}(y_{i},y_{j})\leq D^{\prime}$ .

The other direction is trivial as stated in Lemma 5. ∎

These equivalences will help us reduce a seemingly continuous set of problems to an equivalent discrete set of problems that is much easier to reason about.

IV-B Reconciliation Protocols

Suppose we already have a set of output values $\{y_{i}\mid i\in[n]\}$ satisfying Downward-Validity, Upward-Validity, and Comparability with respect to some input set $\{x_{i}\mid i\in[n]\}$ . Our goal is to achieve $\varepsilon$ -Tightness with at most $f<n$ faults when possible. These algorithms that begin with outputs from standard lattice agreement will be called reconciliation protocols, and they are correct w.r.t. $\varepsilon$ when Downward-Validity, Upward-Validity, and Comparability are all maintained in the new outputs, and $\varepsilon$ -Tightness is also achieved in the new outputs.

An instance $(I,Y,Y^{\prime})$ of a reconciliation protocol then consists of an additional set $Y^{\prime}$ representing the outputs from reconciliation on $Y$ , which satisfies lattice agreement requirements w.r.t. $I$ .

We show an important result that allows us to focus merely on reconciliation protocols for the sake of analyzing solvability of the $\varepsilon$ -bounded lattice agreement problem.

Theorem 1.

There exists a $\varepsilon$ -reconciliation protocol with $f$ faults if and only if there exists a protocol solving $\varepsilon$ -bounded lattice agreement with $f$ faults.

Proof.

If there exists an $\varepsilon$ -reconciliation protocol, then a protocol solving $\varepsilon$ -bounded lattice agreement can be obtained by simply running a standard lattice agreement protocol (e.g. the synchronous and asynchronous algorithms presented in [1]), followed by the reconciliation protocol. The outputs from the lattice agreement must satisfy Downward-Validity, Upward-Validity, and Comparability, so they are valid inputs to the reconciliation protocol. By definition, the reconciliation protocol must maintain Downward-Validity, Upward-Validity, and Comparability, and it must also achieve $\varepsilon$ -Tightness. Therefore, the $\varepsilon$ -bounded lattice agreement is solved.

In the other direction, simply run the $\varepsilon$ -bounded lattice agreement protocol. Then, the output values satisfy everything that is needed for a $\varepsilon$ -reconciliation protocol, so we are done. ∎

This shows that it is sufficient to study reconciliation protocols for the sake of solvability of the $\varepsilon$ -bounded lattice agreement problem, and vice versa. This is not the case, however, when attempting to minimize message complexity (as is clear from the proof).

V Synchronous Reconciliation

Solvability in synchronous systems follows directly from a generalization of the standard synchronous algorithm for binary consensus.

Algorithm 1 Synchronous Agreement

1: $V\leftarrow\{y_{i}\}$ {Initially just the lattice agreement output}
2: for $k=1$ to $f+1$ do
3:     Send $\{v\in V\mid P_{i}\text{ has not sent }v\text{ before}\}$ to all
4:     for each $j\in[n]$ do
5:         Receive $S_{j}$ from $P_{j}$
6:         $V\leftarrow V\cup S_{j}$
7:     end for
8: end for
9: Decide on $\max(V)$

Theorem 2.

All correct processes decide on the same value.

Proof.

Since there are $f+1$ rounds and at most $f$ faults, there must (by pigeonhole principle) be a round in which no faults occur. That is, all correct processes send their set of unsent values to every other process, and all correct processes receive the respective sets from all other correct processes. Thus, at the end of that round, all processes must receive the same values, and end with the same $V$ . Then, $V$ stays the same for the remaining rounds, so all correct processes finish by deciding on the same value. ∎
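The argument can be exercised with a small round-based simulation of Algorithm 1. This is a sketch under stated assumptions: values are nested sets forming a chain (so the maximum is the largest set), and the `crashes` schedule, mapping a process to the round in which it crashes and the recipients it reaches in that round, is a hypothetical device for modelling a partial send.

```python
def synchronous_agreement(initial, f, crashes=None):
    """initial: chain values y_i, one per process.
    crashes: {process: (round, recipients reached before crashing)}.
    Returns {process: decided value} for the processes that never crash."""
    crashes = crashes or {}
    assert len(crashes) <= f
    n = len(initial)
    V = [{y} for y in initial]
    sent = [set() for _ in range(n)]
    alive = set(range(n))
    for rnd in range(1, f + 2):                      # rounds 1 .. f+1
        inbox = [set() for _ in range(n)]
        for i in sorted(alive):
            fresh = V[i] - sent[i]                   # values not sent before
            recipients = range(n)
            if i in crashes and crashes[i][0] == rnd:
                recipients = crashes[i][1]           # partial send, then crash
            for j in recipients:
                inbox[j] |= fresh
            sent[i] |= fresh
        alive -= {i for i, (r, _) in crashes.items() if r == rnd}
        for j in alive:
            V[j] |= inbox[j]
    return {i: max(V[i], key=len) for i in alive}    # max of a chain of sets

initial = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({1, 2, 3, 4})]
# Process 3 (holding the maximum) reaches only process 0 and crashes in round 1;
# process 0 then reaches only process 1 and crashes in round 2.
decided = synchronous_agreement(initial, f=2, crashes={3: (1, [0]), 0: (2, [1])})
print(decided)
```

With $f=2$ the third round is fault-free, so process 1 forwards the maximum to process 2 and both survivors decide on the same value.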

Corollary 1.

The synchronous $\varepsilon$ -bounded lattice agreement problem is solvable for all $\varepsilon\geq 0$ .

Proof.

Immediate from Theorem 2 and Lemma 3. ∎

VI Asynchronous Reconciliation

While we can solve the $\varepsilon$ -bounded lattice agreement problem for all $\varepsilon\geq 0$ synchronously, we certainly cannot solve the 0-bounded lattice agreement problem asynchronously.

Theorem 3.

The asynchronous 0-bounded lattice agreement problem is unsolvable for $f\geq 1$ .

Proof.

Suppose there exists a protocol to solve the asynchronous 0-bounded lattice agreement problem. We show that such a protocol would solve the FLP asynchronous binary consensus problem presented in [3].

Termination holds since every process eventually decides on an output. Let the lattice be $\{0,1\}$ with $0<1$ . Then, if $x_{i}=0$ for all $i\in[n]$ , $\bigsqcup_{j\in[n]}x_{j}=0$ , and so $0\leq y_{i}\leq\bigsqcup_{j\in[n]}x_{j}=0$ by Downward and Upward-Validity (and similarly if $x_{i}=1$ for all $i\in[n]$ ). Thus we satisfy the Validity requirement of FLP.

Agreement is guaranteed by 0-Tightness, so this protocol guarantees FLP conditions. Since there may be a crash fault ( $f\geq 1$ ), this is a contradiction by the FLP result [3]. ∎

In fact, by Lemma 4, we cannot solve this problem asynchronously for $\varepsilon<M$ where

$$M=\min{\{\delta_{X}(x,y)\mid x,y\in X,x<y\}}.$$

While this gives a lower bound for $\varepsilon$ , the arbitrary nature of the chosen quasi-metric gives the following fundamental result.

Theorem 4.

There does not exist an asynchronous protocol that always solves the $\varepsilon$ -bounded lattice agreement problem for any $\varepsilon>0$ .

Proof.

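It is easy to construct a lattice and set of inputs for which any two distinct elements in the join-closed sub-lattice generated by the inputs are more than $\varepsilon$ units apart. Here, $\varepsilon$ -Tightness can only be achieved if all outputs are equal, which by Lemma 4 is the 0-bounded problem and is unsolvable asynchronously by Theorem 3 when $f\geq 1$ . ∎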

Observe that Theorem 4 (and the following Corollary) is over all lattices; however, it does not say anything about the existence of protocols for specific lattices. Below, we prove an impossibility result on the upper-bound that depends not only on the lattice, but on the particular instance in the lattice as well.

Corollary 2.

The asynchronous $\varepsilon$ -bounded lattice agreement problem is unsolvable for all $\varepsilon\geq 0$ with $f\geq 1$ .

Proof.

Immediate from Theorems 3 and 4. ∎

This result shows us that, even for viable values of $\varepsilon$ , there is no protocol relying on $\varepsilon$ that can always guarantee a valid set of outputs per the problem. Furthermore, this shows that we cannot make an algorithm for arbitrary tightness stronger than the upper-bound given by standard lattice agreement (for arbitrary lattices and quasi-metrics). Therefore, we instead present algorithms for achieving better $\varepsilon$ -Tightness, but not a particular value of $\varepsilon$ . That is, we would like to attain stronger upper-bounds on compliance.

Theorem 5.

There is no asynchronous reconciliation protocol that guarantees $\varepsilon$ -Tightness for any fixed $\varepsilon<D^{\prime}$ when $f\geq 1$ .

Proof.

Let $Y=\{y_{i}\mid i\in[n]\}$ and suppose

$$\delta_{X}(\min{Y},\max{Y})=\max_{i\in[n]}\,\delta_{X}(x_{i},\bigsqcup_{j\in[n]}x_{j})=D^{\prime}.$$

Then, it must be the case that such a protocol changes the value of $\min{Y}$ or $\max{Y}$ in their respective processes. Suppose $\min{Y}=y_{1}=\ldots=y_{n-1}=x_{1}=\ldots=x_{n-1}$ and $\max{Y}=y_{n}=x_{n}=\bigsqcup_{j\in[n]}x_{j}$ .

We cannot move $y_{n}=\max{Y}$ to any other point in the lattice since either Downward or Upward-Validity would be violated.

We cannot move any $y_{1},\ldots,y_{n-1}$ down in the lattice, since then Downward-Validity would be violated. Then, to decrease $\varepsilon$ by changing $\min{Y}$ , we must move all $y_{1},\ldots,y_{n-1}$ up the lattice.

Since the system is asynchronous, we may repeatedly delay $p_{n}$ ’s message, meaning that transitions in the state of values are made without depending on $y_{n}$ . Then, if some process $p_{i}$ changes their value $y_{i}$ to a different $y_{i}^{\prime}$ , then this transition could also occur if $x_{n}=y_{n}=y_{1}=\ldots$ . This would then violate Upward-Validity.

Therefore, an adversary in this situation can always enforce a transition to an illegal state by delaying $p_{n}$ ’s messages. ∎

While Theorem 5 seems similar to the previous unsolvability results, there is an important distinction. This theorem states that we cannot even attain a guaranteed upper-bound on $\gamma$ -compliance better than $D^{\prime}$ .

This motivates a heuristic approach to $\varepsilon$ -bounded lattice agreement, where we would like to attain such an upper-bound (or something even stronger) on average or with high probability.

VI-A Heuristic Algorithm

If we have a normal quasi-metric, then improving the $\gamma$ -compliance from $\max_{i,j\in[n]}\delta_{X}(y_{i},y_{j})$ (by Lemma 1) is simply a matter of either moving all minimum values up the lattice or moving all the maximum values down the lattice (while maintaining validity).

Moving maximum values down can certainly cause us to violate Downward-Validity, however, since the original input $x_{i}$ of a maximum value $y_{i}$ could be equal to it (that is, $x_{i}=y_{i}$ ). Moving minimum values up is therefore a much better option, since we only need to stay below $\bigsqcup_{j\in[n]}x_{j}$ . Without knowing the inputs, however, this may be challenging. The only thing we do know in a reconciliation protocol is that all values satisfy Upward-Validity, so we can rely on these values.

Here, we present a very simple algorithm for attaining better $\gamma$ -compliance with high probability (depending on the chosen value of $k$ , and of course the set of initial values).

Algorithm 2 ${\rm DR}(k)$

1: $y_{i}\leftarrow\text{Output from lattice agreement}$
2: for $r=1$ to $k$ do
3:     Send $(y_{i},r)$ to all
4:     Receive $n-f$ $(\cdot,r)$ messages. Let $V_{r}$ be the set of received values.
5:     $y_{i}\leftarrow\max{(V_{r}\cup\{y_{i}\})}$
6: end for
7: Output $y_{i}$

Let $y_{i}^{r}$ be the value of $y_{i}$ in process $p_{i}$ before executing the for loop for round $r$ . Then, the initial value (output from lattice agreement) is $y_{i}^{1}$ , and we let $y_{i}^{k+1}$ be the final output value. Then, let $A_{r}=\{y_{i}^{r}\mid i\in[n]\}$ .

Lemma 8.

For all $1\leq m\leq k$ , $A_{m+1}\subseteq A_{m}$ .

Theorem 6.

Downward-Validity, Upward-Validity, and Comparability are all maintained.

Proof.

For each $y_{i}^{r}$ , we have $y_{i}^{r}\leq y_{i}^{r+1}$ , and since $x_{i}\leq y_{i}^{1}$ , we then have $x_{i}\leq y_{i}^{k+1}$ , so Downward-Validity is maintained.

By Lemma 8, we have that $A_{k+1}\subseteq A_{1}$ , so for all $i\in[n]$ , $\exists j\in[n]$ such that $y_{i}^{k+1}=y_{j}^{1}$ . Since the initial values all satisfy Upward-Validity, we then have that $y_{i}^{k+1}$ satisfies Upward-Validity. This same argument also follows to show that Comparability is maintained as well. ∎

Theorem 7.

For any instance $(I,Y,Y^{\prime})$ of this algorithm, if $(I,Y)$ is $\gamma$ -compliant, then $(I,Y^{\prime})$ is $\gamma^{\prime}$ -compliant for some $\gamma^{\prime}\leq\gamma$ .

Proof.

By Lemma 8, the maximum distance between any pair of values can only decrease from $A_{m}$ to $A_{m+1}$ . ∎
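The behaviour stated in Lemma 8 and Theorems 6 and 7 can be observed in a simplified simulation of ${\rm DR}(k)$. The sketch makes several assumptions that are not in the algorithm itself: no process crashes, the scheduler delivers to each process a uniformly random set of $n-f$ round-$r$ messages, and values are nested sets with the cardinality quasi-metric. The function names are hypothetical.

```python
import random

def delta(a, b):
    return len(b - a) if a <= b else None

def gamma(values):
    """Largest distance over ordered pairs (Lemma 1)."""
    return max(delta(a, b) for a in values for b in values if a <= b)

def dr(outputs, f, k, rng):
    """Run k rounds; returns the final values and the sets A_1, ..., A_{k+1}."""
    n = len(outputs)
    y = list(outputs)
    history = [set(y)]                               # A_1
    for _ in range(k):
        sent = list(y)                               # the round's messages (y_i, r)
        for i in range(n):
            senders = rng.sample(range(n), n - f)    # assumed random delivery
            V_r = [sent[j] for j in senders]
            y[i] = max(V_r + [sent[i]], key=len)     # max of a chain of sets
        history.append(set(y))
    return y, history

outputs = [frozenset(range(i + 1)) for i in range(10)]   # a chain of 10 values
final, history = dr(outputs, f=3, k=3, rng=random.Random(0))

print(all(history[m + 1] <= history[m] for m in range(3)))  # True: A_{m+1} is a subset of A_m
print(gamma(final) <= gamma(outputs))                       # True: compliance never worsens
print(all(o <= y for o, y in zip(outputs, final)))          # True: values only move up
```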

Observe that none of these proofs uses the normality of $\delta_{X}$ . Therefore, the algorithm still works for non-normal quasi-metrics. However, there is a much greater likelihood of improving $\gamma$ -compliance with a normal quasi-metric, since we can guarantee improvement when minimum values are moved up.

Simulating this algorithm for large $n$ is computationally difficult, so we instead provide an easier to compute non-distributed abstract semantics that gives a good approximation of the improvement rates with ${\rm DR}(k)$ .

VI-B Simplified Approximate Model

Since, for all $i\in[n]$ , $y_{i}^{r}\leq y_{i}^{r+1}$ , we only need the minimum values to change in order to attain a $\gamma$ -compliance better than $D^{\prime}$ . This gives rise to a simple model that allows us to analyze any given instance of ${\rm DR}(k)$ as a discrete-time absorbing Markov chain.

The idea is to represent states by 0s and 1s, where a 0 represents a minimum value (that is, some $y_{i}^{r}$ such that $y_{i}^{r}=\min{\{y_{j}^{1}\mid j\in[n]\}}$ ) and 1 represents a non-minimum value. The goal is to have all 1s and no 0s, so that there are no more minimum values (with respect to the initial values).

At each round of the algorithm, each 0 picks $n-f$ of the current state’s 0s and 1s, and changes to a 1 only if it picked at least 1 (that is, it takes the max).

For any given process, we assume that the probability of receiving a message from another process on round $r$ is uniformly distributed. That is, there is an equal chance for $p_{i}$ to receive from $p_{j}$ for all $j\in[n]$ at each round. Thus, in the simplified model, this means that each bit has an equal chance of being chosen by each 0 independently.

We also assume that crashes are independent, and each process has an equal chance $p_{f}$ to crash (among the first $f$ crashes). To simplify this model, we say that line 3 of ${\rm DR}$ is atomic, so processes cannot send a message to some processes and crash before sending it to the rest of the processes.

Definition 5.

The state space $\mathbb{S}_{n,f}$ (or simply $\mathbb{S}$ when $n,f$ are clear from context) for $n$ processes and at most $f$ failures consists of all vectors $\langle s_{1},\ldots,s_{n}\rangle$ such that $s_{i}\in\{0,1,\bot\}$ for all $i\in[n]$ , and there are no more than $f$ components equal to $\bot$ and at least one component equals 1.

For convenience, we say that $\bot<0<1$ .

Definition 6.

A state $S^{\prime}=\langle s_{1}^{\prime},\ldots,s_{n}^{\prime}\rangle\in\mathbb{S}$ is reachable from state $S=\langle s_{1},\ldots,s_{n}\rangle\in\mathbb{S}$ (written $S\rightarrow S^{\prime}$ ) if

• $\forall i\in[n]:s_{i}=\bot\implies s_{i}^{\prime}=\bot$
• $\forall i\in[n]:s_{i}=1\implies s_{i}^{\prime}\neq 0$ .

This definition corresponds to the reachable states after running one round of ${\rm DR}$ .

We now define the set of “good” states, in which compliance was improved.

Definition 7.

The set of improved states $\mathbb{F}\subseteq\mathbb{S}$ is the set $\mathbb{F}=\{\langle s_{1},\ldots,s_{n}\rangle\in\mathbb{S}\mid(\forall i\in[n]:s_{i}\neq 0)\vee(\forall i\in[n]:s_{i}\neq 1)\}$ .
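Definitions 6 and 7 translate directly into predicates on state vectors. In this sketch a state is a tuple over `0`, `1`, and `None` (standing in for $\bot$); the function names `reachable` and `improved` are illustrative.

```python
BOT = None  # stands in for ⊥ (a crashed process)

def reachable(S, S_next) -> bool:
    """Definition 6: crashed entries stay crashed, and a 1 never becomes a 0."""
    return all(
        (s is not BOT or t is BOT) and (s != 1 or t != 0)
        for s, t in zip(S, S_next)
    )

def improved(S) -> bool:
    """Definition 7: either no 0 remains or no 1 remains."""
    return all(s != 0 for s in S) or all(s != 1 for s in S)

S = (0, 1, 0, BOT)
print(reachable(S, (1, 1, BOT, BOT)))  # True
print(reachable(S, (0, 0, 0, BOT)))    # False: the 1 turned into a 0
print(reachable(S, (0, 1, 0, 1)))      # False: a crashed entry came back
print(improved((1, 1, BOT, BOT)))      # True: no 0 remains
print(improved((0, BOT, 0)))           # True: no 1 remains
print(improved((0, 1)))                # False
```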

We now provide an algorithm for the abstract semantics with $k$ rounds beginning at state $S$ , and traversing the state reachability graph (as determined by the random $n-f$ choices).

Algorithm 3 Simplified Approximate Model Simulation

1: $x\leftarrow 0$
2: for $r\leftarrow 1$ to $k$ do
3:     $A\leftarrow\text{copy of $S$}$
4:     for $i\leftarrow 1$ to $n$ do
5:         if $x<f$ and $p_{f}$ probability then
6:             $S[i]\leftarrow\bot$
7:             $x\leftarrow x+1$
8:             continue
9:         end if
10:         if $S[i]\neq 0$ then
11:             continue
12:         end if
13:         $C\leftarrow\text{choose $n-f$ random values from $S$}$
14:         $A[i]\leftarrow\max(C\cup\{S[i]\})$
15:     end for
16:     $S\leftarrow A$
17: end for
18: return $S\in\mathbb{F}$

This algorithm was simulated for 1000 runs with $n=1000$ . The following tables show results for $f=200$ and $f=800$ with $p_{f}=0.06$ . Table I gives success rates over the 1000 runs with randomized inputs. Table II gives success rates with worst-case input, which is all 0s and one 1.

TABLE I: Success rates of random input with $n=1000,p_{f}=0.06$

	$k=2$	$k=3$	$k=4$
$f=200$	17.1%	90.3%	99.9%
$f=800$	16.8%	89.6%	99.3%

TABLE II: Success rates of worst-case input with $n=1000,p_{f}=0.06$

	$k=2$	$k=3$	$k=4$	$k=5$
$f=200$	0.0%	41.3%	97.6%	100.0%
$f=800$	0.0%	5.4%	84.0%	98.9%

We also tested $p_{f}=0.5$ to $p_{f}=0.8$ in increments of 0.1 for $f=800$ and observed that, for $k=2$ , 0% of simulations succeeded in achieving a better upper-bound, whereas for $k=3$ , 100% of simulations succeeded. This is to be expected, since using more of the fault budget allows either all 1s to be eliminated, or enough 0s to be eliminated so that a 1 is guaranteed among $n-f$ choices.

Overall, we see that only a very small number of rounds in relation to the number of processes is needed for very high probability to improve the upper-bound. For 1000 processes, the average case (randomized input) gives that we only need to run 4 rounds to get a high probability of success (5 for a near guarantee, even in worst-case input), which is nearly negligible compared to the number of rounds needed for the lattice agreement itself.
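For readers who want to experiment with the abstract model, the following is one possible Python rendering of Algorithm 3. It is not the code used for the tables above, and it makes assumptions where the pseudocode is silent: a crash is recorded in both the current and the next state so that it persists, already crashed entries are skipped, and the $n-f$ values are sampled without replacement from the entries that have not crashed. The parameter values in the demo are arbitrary and much smaller than $n=1000$.

```python
import random

BOT = None  # stands in for ⊥ (a crashed process)

def improved(S) -> bool:
    """Definition 7: either no 0 remains or no 1 remains."""
    return all(s != 0 for s in S) or all(s != 1 for s in S)

def simulate(state, f, k, p_f, rng) -> bool:
    """Run k rounds of the simplified model; report whether the final state is improved."""
    S = list(state)
    n = len(S)
    crashed = 0
    for _ in range(k):
        A = list(S)
        for i in range(n):
            if S[i] is BOT:
                continue                       # assumption: skip crashed entries
            if crashed < f and rng.random() < p_f:
                S[i] = A[i] = BOT              # assumption: the crash persists
                crashed += 1
                continue
            if S[i] != 0:
                continue
            live = [v for v in S if v is not BOT]
            C = rng.sample(live, n - f)        # assumption: sample live entries only
            A[i] = max(C + [S[i]])             # a 0 becomes 1 if it saw any 1
        S = A
    return improved(S)

def success_rate(state, f, k, p_f, runs, seed=0):
    rng = random.Random(seed)
    return sum(simulate(state, f, k, p_f, rng) for _ in range(runs)) / runs

worst_case = [1] + [0] * 39                    # all 0s and a single 1, n = 40
for k in (1, 3, 5):
    print(k, success_rate(worst_case, f=8, k=k, p_f=0.06, runs=100))
```

Because at most $f$ entries ever crash, at least $n-f$ live entries are always available to sample from.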

We conjecture that the probability of improvement at $k=5$ is the same for any number of processes $n$ , failures $f$ , and chance of failure $p_{f}$ , with minimal error. That is, we conjecture that such a high probability of improvement is achieved with ${\rm DR}(5)$ in any system (under the assumptions of our original system model). This is because the same amount of “spreading” per round will occur proportional to the number of processes, causing it to converge at the same round, even if the first few rounds differ when $n,f,p_{f}$ differ.

Furthermore, this would mean that our heuristic algorithm only needs $O(1)$ (constant) rounds to achieve improved $\gamma$ -compliance, and the constant number of rounds is very small (5, as per the conjecture).

VII Conclusion

In this paper, we proposed a new constraint on the standard lattice agreement problem by requiring outputs to be “close” to each other w.r.t. a quasi-metric on the lattice. We show that this new problem whose range is over $\mathbb{R}_{\geq 0}^{\infty}$ can be reduced to a discrete set of equivalence classes on this range. An upper-bound on tightness is derived for standard lattice agreement, and we show that these upper-bounds are stubborn in the presence of asynchrony. For synchronous systems, we show that the problem is easily solvable for all desired levels of tightness. For asynchronous systems, we provide a variety of impossibility results, and one which states that the upper-bound achieved in standard lattice agreement cannot be improved with certainty. This led us to present a heuristic algorithm for attaining improved tightness with high probability in asynchronous systems. We then modeled this heuristic algorithm with an approximation and simulated this model to obtain an approximation on the probabilities of improving tightness with $k$ rounds. Finally, we conjectured that only a constant number ( $k=5$ ) of rounds are necessary to achieve high probability of improvement in the presence of any number of processes, faults ( $f<n$ ), and any fault probability.

References

[1] Hagit Attiya, Maurice Herlihy, and Ophir Rachman. Atomic snapshots using lattice agreement. Distrib. Comput., 8(3):121–132, March 1995.
[2] Jose M. Faleiro, Sriram Rajamani, Kaushik Rajan, G. Ramalingam, and Kapil Vaswani. Generalized lattice agreement. In Proceedings of the 2012 ACM Symposium on Principles of Distributed Computing, PODC ’12, page 125–134, New York, NY, USA, 2012. Association for Computing Machinery.
[3] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, April 1985.
[4] Petr Kuznetsov, Thibault Rieutord, and Sara Tucci-Piergiovanni. Reconfigurable lattice agreement and applications. 2019.
[5] Marios Mavronicolas. A bound on the rounds to reach lattice agreement. 2000.
[6] Xiong Zheng, Changyong Hu, and Vijay K. Garg. Lattice agreement in message passing systems. 2018.