Bounding Distance Between Outputs in Distributed Lattice Agreement
Abstract
This paper studies the lattice agreement problem and proposes a stronger form, -bounded lattice agreement, that enforces an additional tightness constraint on the outputs. To formalize the concept, we define a quasi-metric on the structure of the lattice, which captures a natural notion of distance between lattice elements. We consider the bounded lattice agreement problem in both synchronous and asynchronous systems, and provide algorithms that aim to minimize the distance between the output values, while satisfying the requirements of the classic lattice agreement problem. We show strong impossibility results for the asynchronous case, and a heuristic algorithm that achieves improved tightness with high probability, and we test an approximation of this algorithm to show that only a very small number of rounds are necessary.
Index Terms:
Lattice agreement, fault-tolerance, consensus.I Introduction
The Lattice Agreement problem, introduced in [1], is a weakened form of the distributed decision problem. Each process begins with a value from a lattice, and the goal is for all processes to decide on values that are mutually comparable. Lattice agreements loosens the constraints of the consensus problem, which requires strict uniformity and validity in the output values and is impossible in asynchronous systems [3]. Lattice agreement has the advantage of decidability in the presence of asynchrony and faults [1], making it an important and useful problem in distributed systems.
Lattice agreement has a variety of applications in practice, and was first proposed as a method for attaining consistent atomic snapshots [1]. A generalized form of the lattice agreement problem was later proposed to implement fault-tolerant replicated state machines [2]. The generalized lattice agreement [2] is also shown to be useful in achieving linearizable data structures, particularly in conflict-free replicated data types. Another version of lattice agreement, reconfigurable lattice agreement, was proposed in [4] in order to handle agreement on the system configuration for client and server consistency.
While better bounds on lattice agreement algorithms were found in [5], they were improved upon in [6] with algorithms for synchronous, asynchronous, and generalized lattice agreement.
In this paper, we propose a new constraint, tightness, on standard lattice agreement that requires all outputs to be within a certain distance of each other as determined by a quasi-metric over the lattice. Such a requirement can force processes to agree on values (such as the global view in atomic snapshot) that are closer to each other. Minimizing this distance between intermediate states in applications such as atomic snapshots and replicated state machines could lead to faster eventual convergence. Reducing the divergence between replicas during the reconciliation phase may mean that fewer overall synchronization operations are needed, allowing the system to reach eventual consistency more efficiently. Additionally processes can potentially recover more quickly in the case of node crashes, because it reduces the risk of losing information from data loss. It can also ensure that processes are “synchronized enough” with respect to what they must agree on, in a similar notion as clock synchronization.
We show that this problem has equivalences to standard lattice agreement under particular bounds, and that there is a discrete structure to the range of tightness on the outputs, despite the continuity of the range on the quasi-metric. We briefly show that this problem is completely solved in the synchronous case, but is much trickier in the asynchronous case. While standard lattice agreement is solvable asynchronously, better tightness can never be guaranteed. This motivates our heuristic approach, which we provide an approximate model of and simulate to get approximate probabilities of improved tightness. We see that after only 5 rounds of our algorithm, the probability converges to a near guarantee. Finally, we conjecture that this number of rounds is constant, no matter the number of processes, faults, or the chance of a fault.
II Background
We will let denote the set . We will also let , where for all .
A triple is called a join semi-lattice if is a poset, and is a join operator such that the join of every non-empty finite subset is defined. For brevity, we will refer to join semi-lattices simply as lattices. We say is grounded if there exists an element such that for all . For any subset , we let mean .
With any lattice , we will feel free to say for to mean .
III System Model
We consider a system model consisting of distributed processes, in which at most faults may occur. We only consider faults on processes, which suffer crash faults.
We will consider the system to be either synchronous or asynchronous, and each process can send and receive messages from every other process including itself.
IV Distributed Lattice Agreement
The original lattice agreement is as follows. Let be a finite lattice, and let there be processes in the system. Each process proposes an input value , and each decides on an output . To solve the lattice agreement problem, the following constraints must be met:
-
•
Downward-Validity:
-
•
Upward-Validity:
-
•
Comparability:
We introduce a stronger version of this problem by requiring the outputs to all be “close” to each other with respect to some distance function on the lattice.
Definition 1.
Let be a lattice. Then a function is called a quasi-metric over if for all ,
-
(i)
-
(ii)
-
(iii)
Definition 2.
A lattice with a quasi-metric is called a lattice quasi-metric space.
In a lattice quasi-metric space, can intuitively be described as how “close” and are in some natural sense. For example, in the lattice of sets of integers, we would naturally have the distance between and is smaller than the distance between and , or that the distance between and is smaller than the distance between and (all depending on the context of the lattice). Observe that, while the length of the shortest path from to may work as a quasi-metric, this is not the quasi-metric that one may want to use, as there may be paths in the lattice that should naturally be considered longer than others (in the sense of distance), even if the paths are the same length in the lattice.
We may also find it easier to work with more natural quasi-metrics, so we may often impose a requirement of a normal quasi-metric.
Definition 3.
A quasi-metric over a lattice is said to be height-normal (or simply, normal) if it satisfies
-
•
for all .
We now introduce a new version of the lattice agreement problem. The -bounded lattice agreement problem (with ) is as follows. Let be a lattice quasi-metric space with . Each process proposes an input value , and each decides on an output . To solve the -bounded lattice agreement, Downward-Validity, Upward-Valildity, and Comparability must all be met, along with one additional constraint:
-
•
-Tightness:
This forces the output values to be approximately the same, i.e. “close” to each other.
Later in the paper, we will be interested in studying the tightness of specific instances of the -bounded lattice agreement problem. For the following definitions, an instance of a protocol is a pair of sets of input and output values, respectively. In context, and .
Definition 4.
An instance of a protocol of the lattice agreement problem is said to be -compliant if
-
(1)
The outputs satisfy -Tightness;
-
(2)
There is no such that the outputs satisfy -Tightness.
Lemma 1.
Any instance of any lattice agreement protocol is -compliant for a unique determined by .
Proof.
Let (which must satisfy , since the outputs form a chain by Comparability). Then the outputs certainly satisfy -Tightness by definition, and if , then the that maximize (i.e. the such that ) do not satisfy , so -Tightness is not satisfied, and so the instance is -compliant. Furthermore, this means the instance is not -compliant for any . If , the instance is clearly not -compliant by Condition 2. Thus, is unique. ∎
We also prove that, for normal , is determined by the min and max outputs.
Lemma 2.
For any instance of the lattice agreement problem and a normal , let , and . Then, this instance is -compliant.
Proof.
First observe that exists by Comparability. Let and , and let such that . Then , so by normality of , . Thus, by Lemma 1, (since have the maximum distance among the outputs as just shown). ∎
IV-A Equivalences
In the following lemmas we present equivalences between different instances of the -bounded lattice agreement problem and one result displaying an equivalence with the original lattice agreement problem.
An obvious equivalence is that a protocol which guarantees tighter bounds can also be used to solve a bounded lattice agreement problem for looser bounds.
Lemma 3.
Let . Then any protocol that solves the -bounded lattice agreement problem can be used to solve the -bounded lattice agreement problem.
Proof.
Downward-Validity, Upward-Validity, and Comparability are all satisfied by the -bounded protocol. By -Tightness, for all , and since , we have , so -Tightness is also satisfied. ∎
The following lemma shows that the -bounded lattice agreement problem for a less than the minimum distance between any pair of elements in the lattice is the same as the 0-bounded lattice agreement problem.
Lemma 4.
Let such that . The -bounded lattice agreement problem is equivalent to the 0-bounded lattice agreement problem.
Proof.
Suppose there exists a protocol to solve the -bounded lattice agreement problem for such a . Then, by -Tightness, for all (correct) with .
Suppose there is some . By comparability, either or . Suppose, wlog, the former is true. Then by our chosen , we have , which contradicts -Tightness. Thus, there is no such that , and so this protocol solves the 0-bounded lattice agreement problem.
The other direction is trivial since . ∎
We also show another equivalence, in that the original lattice agreement problem is equivalent to the -bounded agreement problem when , where
Intuitively, is the longest distance between any two points in the range of values in which the outputs can lie.
Lemma 5.
The lattice agreement problem is equivalent to the -bounded lattice agreement problem.
Proof.
Let be a protocol solving the lattice agreement problem. Then, by Downward-Validity for all . By Upward-Validity, , so for all , so for all by maximality of .
The other direction is trivial since Downward-Validity, Upward-Validity, and Comparability are all achieved in the -bounded lattice agreement problem. ∎
If we have normality in , however, then we may attain a stricter upper-bound on for guaranteed by the lattice agreement problem. We will let
and we first show that is indeed a stronger upper-bound.
Lemma 6.
For an input set , .
Proof.
Since for all , and , we immediately have . ∎
Lemma 7.
If is normal, then the lattice agreement problem is equivalent to the -bounded lattice agreement problem.
Proof.
Let be a protocol solving the lattice agreement problem. Let be an input which maximizes (that is, such that ). Then, by Downward-Validity, . Let be any output, then there are two cases by Comparability. If , then , so by normality. If , then by normality, and since is maximized by .
Then, for any pair of outputs (or, wlog, ), we have .
The other direction is trivial as stated in Lemma 5. ∎
These equivalences will help us reduce a seemingly continuous set of problems to an equivalent discrete set of problems that is much easier to reason about.
IV-B Reconciliation Protocols
Suppose we already have a set of output values satisfying Downward-Validity, Upward-Validity, and Comparability with respect to some input set . Our goal is to achieve -Tightness with at most faults when possible. These algorithms that begin with outputs from standard lattice agreement will be called reconciliation protocols, and they are correct w.r.t. when Downward-Validity, Upward-Validity, and Comparability are all maintained in the new outputs, and -Tightness is also achieved in the new outputs.
An instance of a reconciliation protocol then consists of an additional set representing the outputs from reconciliation on , which satisfies lattice agreement requirements w.r.t. .
We show an important result that allows us to focus merely on reconciliation protocols for the sake of analyzing solvability of the -bounded lattice agreement problem.
Theorem 1.
There exists a -reconciliation protocol with faults if and only if there exists a protocol solving -bounded lattice agreement with faults.
Proof.
If there exists a -reconciliation protocol, then a protocol solving -bounded lattice agreement by simply running a standard lattice agreement protocol (e.g. synchronous and asynchronous algorithms presented in [1]), followed by the reconciliation protocol. The outputs from the lattice agreement must satisfy Downward-Validity, Upward-Validity, and Comparability, so they are valid inputs to the reconciliation protocol. By definition, the reconciliation protocol must maintain Downward-Validity, Upward-Validity, and Comparability, and it must also achieve -Tightness. Therefore, the -bounded lattice agreement is solved.
In the other direction, simply run the -bounded lattice agreement protocol. Then, the output values satisfy everything that is needed for a -reconciliation protocol, so we are done. ∎
This shows that it is sufficient to study reconciliation protocols for the sake of solvability of the -bounded lattice agreement problem, and vice versa. This is not the case, however, when attempting to minimize message complexity (as is clear from the proof).
V Synchronous Reconciliation
Solvability in synchronous systems is very straightforward by a generalizable synchronous algorithm for binary consensus.
Theorem 2.
All correct processes decide on the same value.
Proof.
Since there are rounds and at most faults, there must (by pigeonhole principle) be a round in which no faults occur. That is, all correct processes send their set of unsent values to every other process, and all correct processes receive the respective sets from all other correct processes. Thus, at the end of that round, all processes must receive the same values, and end with the same . Then, stays the same for the remaining rounds, so all correct processes finish by deciding on the same value. ∎
Corollary 1.
The synchronous -bounded lattice agreement problem is solvable for all .
VI Asynchronous Reconciliation
While we can solve the -bounded lattice agreement problem for all synchronously, we certainly cannot solve the 0-bounded lattice agreement problem asynchronously.
Theorem 3.
The asynchronous 0-bounded lattice agreement problem is unsolvable for .
Proof.
Suppose there exists a protocol to solve the asynchronous 0-bounded lattice agreement problem. We show that such a protocol would solve the FLP asynchronous binary consensus problem presented in [3].
Then, it certainly terminates since every process eventually decides on an output. Let the lattice be with . Then, if for all , , and so by Downward and Upward-Validity (and similarly if for all ). Thus we satisfy the Validity requirement of FLP.
Agreement is guaranteed by 0-Tightness, so this protocol guarantees FLP conditions. Since there may be a crash fault (), this is a contradiction by the FLP result [3]. ∎
In fact, by Lemma 4, we cannot solve this problem asynchronously for where
While this gives a lower bound for , the arbitrary nature of the chosen quasi-metric gives the following fundamental result.
Theorem 4.
There does not exist an asynchronous protocol that always solves the -bounded lattice agreement problem for any .
Proof.
It is easy to construct a lattice and set of inputs for which all elements in the join-closed sub-lattice generated by the inputs are all more than units apart. Here, -Tightness is certainly impossible to achieve. ∎
Observe that Theorem 4 (and the following Corollary) is over all lattices, however it does not say anything about existence of protocols for specific lattices. Soon, we will prove an impossibility result on the upper-bound depending on not only the lattice, but the particular instance in the lattice as well.
Corollary 2.
The asynchronous -bounded lattice agreement problem is unsolvable for all with .
This result shows us that, even for viable values of , there is no protocol relying on that can always guarantee a valid set of outputs per the problem. Furthermore, this shows that we cannot make an algorithm for arbitrary tightness stronger than the upper-bound given by standard lattice agreement (for arbitrary lattices and quasi-metrics). Therefore, we instead present algorithms for achieving better -Tightness, but not a particular value of . That is, we would like to attain stronger upper-bounds on compliance.
Theorem 5.
There is no asynchronous reconciliation protocol that guarantees -Tightness for any fixed when .
Proof.
Let and suppose
Then, it must be the case that such a protocol changes the value of or in their respective processes. Suppose and .
We cannot move to any other point in the lattice since either Downward or Upward-Validity would be violated.
We cannot move any down in the lattice, since then Downward-Validity would be violated. Then, to decrease by changing , we must move all up the lattice.
Since the system is asynchronous, we may repeatedly delay ’s message, meaning that transitions in the state of values are made without depending on . Then, if some process changes their value to a different , then this transition could also occur if . This would then violate Upward-Validity.
Therefore, an adversary in this situation can always enforce a transition to an illegal state by delaying ’s messages. ∎
While Theorem 5 seems similar to the previous unsolvability results, there is an important distinction. This theorem states that we cannot even attain a guarantee-able upper-bound on -compliance better than .
This motivates a heuristic approach to -bounded lattice agreement, where we would like to attain such an upper-bound (or something even stronger) on average or with high probability.
VI-A Heuristic Algorithm
If we have a normal quasi-metric, then improving the -compliance from (by Lemma 1) is simply a matter of either moving all minimum values up the lattice or moving all the maximum values down the lattice (while maintaining validity).
Moving maximum values down can certainly cause us to violate Downward-Validity, however, since the original input of a maximum value could be equal to it (that is, ). Moving minimum values up is therefore a much better option, since we only need to stay below . Without knowing the inputs, however, this may be challenging. The only thing we do know in a reconciliation protocol is that all values satisfy Upward-Validity, so we can rely on these values.
Here, we present a very simple algorithm for attaining better -compliance with high probability (depending on the chosen value of , and of course the set of initial values).
Let be the value of in process before executing the for loop for round . Then, the initial value (output from lattice agreement) is , and we let be the final output value. Then, let .
Lemma 8.
For all , .
Theorem 6.
Downward-Validity, Upward-Validity, and Comparability are all maintained.
Proof.
For each , we have , and since , we then have , so Downward-Validity is maintained.
By Lemma 8, we have that , so for all , such that . Since the initial values all satisfy Upward-Validity, we then have that satisfies Upward-Validity. This same argument also follows to show that Comparability is maintained as well. ∎
Theorem 7.
For any instance of this algorithm, if is -compliant, then is -compliant for some .
Proof.
By Lemma 8, the maximum distance between any pair of values can only decrease from to . ∎
Observe that none of these proofs uses the normality of . Therefore, the algorithm still works for non-normal quasi-metrics, however there is a much greater likelihood of improving -compliance with a normal quasi-metric since we can guarantee improvement when minimum values are moved up.
Simulating this algorithm for large is computationally difficult, so we instead provide an easier to compute non-distributed abstract semantics that gives a good approximation of the improvement rates with .
VI-B Simplified Approximate Model
Since, for all , , we only need the minimum values to change in order to attain a -compliance better than . This gives rise to a simple model that allows us to analyze any given instance of as a discrete-time absorbing Markov chain.
The idea is to represent states by 0s and 1s, where a 0 represents a minimum value (that is, some such that ) and 1 represents a non-minimum value. The goal is to have all 1s and no 0s, so that there are no more minimum values (with respect to the initial values).
At each round of the algorithm, each 0 picks of the current state’s 0s and 1s, and changes to a 1 only if it picked at least 1 (that is, it takes the max).
For any given process, we assume that the probability of receiving a message from another process on round is uniformly distributed. That is, there is an equal chance for to receive from for all at each round. Thus, in the simplified model, this means that each bit has an equal chance of being chosen by each 0 independently.
We also assume that crashes are independent, and each process has an equal chance to crash (among the first crashes). To simplify this model, we say that line 3 of is atomic, so processes cannot send a message to some processes and crash before sending it to the rest of the processes.
Definition 5.
The state space (or simply when are clear from context) for processes and at most failures consists of all vectors such that for all , and there are no more than components equal to and at least one component equals 1.
For convenience, we say that .
Definition 6.
A state is reachable from state (written ) if
-
•
-
•
.
This definition corresponds to the reachable states after running one round of .
We now define the set of “good” states, in which compliance was improved.
Definition 7.
The set of improved states is the set .
We now provide an algorithm for the abstract semantics with rounds beginning at state , and traversing the state reachability graph (as determined by the random choices).
This algorithm was simulated for 1000 runs with . The following tables show results for when and . Table I gives success rates over the 1000 runs with randomized inputs. Table II gives success rates with worst-case input, which is all 0s and one 1.
17.1% | 90.3% | 99.9% | |
16.8% | 89.6% | 99.3% |
0.0% | 41.3% | 97.6% | 100.0% | |
0.0% | 5.4% | 84.0% | 98.9% |
We also tested to in 0.1 for and observed that, for , 0% of simulations succeeded in achieving a better upper-bound, however for , 100% of simulations succeeded. This is to be expected, since using more of the fault budget allows either all 1s to be eliminated, or enough 0s to be eliminated so that a 1 is guaranteed among choices.
Overall, we see that only a very small number of rounds in relation to the number of processes is needed for very high probability to improve the upper-bound. For 1000 processes, the average case (randomized input) gives that we only need to run 4 rounds to get a high probability of success (5 for a near guarantee, even in worst-case input), which is nearly negligible compared to the number of rounds needed for the lattice agreement itself.
We conjecture that the same probability of improvement is the same at for any number of processes , failures , and chance of failure , with minimal error. That is, we conjecture that such a high probability of improvement is achieved with in any system (under the assumptions of our original system model). This is because the same amount “spreading” per round will occur proportional to the number of processes, causing it to converge at the same round, even if the first few rounds differ when differ.
Furthermore, this would mean that our heuristic algorithm only needs (constant) rounds to achieve improved -compliance, and the constant number of rounds is very small (5, as per the conjecture).
VII Conclusion
In this paper, we proposed a new constraint on the standard lattice agreement problem by requiring outputs to be “close” to each other w.r.t. a quasi-metric on the lattice. We show that this new problem whose range is over can be reduced to a discrete set of equivalence classes on this range. An upper-bound on tightness is derived for standard lattice agreement, and we show that these upper-bounds are stubborn in the presence of asynchrony. For synchronous systems, we show that the problem is easily solvable for all desired levels of tightness. For asynchronous systems, we provide a variety of impossibility results, and one which states that the upper-bound achieved in standard lattice agreement cannot be improved with certainty. This led us to present a heuristic algorithm for attaining improved tightness with high probability in asynchronous systems. We then modeled this heuristic algorithm with an approximation and simulated this model to obtain an approximation on the probabilities of improving tightness with rounds. Finally, we conjectured that only a constant number () of rounds are necessary to achieve high probability of improvement in the presence of any number of processes, faults (), and any fault probability.
References
- [1] Hagit Attiya, Maurice Herlihy, and Ophir Rachman. Atomic snapshots using lattice agreement. Distrib. Comput., 8(3):121–132, March 1995.
- [2] Jose M. Faleiro, Sriram Rajamani, Kaushik Rajan, G. Ramalingam, and Kapil Vaswani. Generalized lattice agreement. In Proceedings of the 2012 ACM Symposium on Principles of Distributed Computing, PODC ’12, page 125–134, New York, NY, USA, 2012. Association for Computing Machinery.
- [3] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, April 1985.
- [4] Petr Kuznetsov, Thibault Rieutord, and Sara Tucci-Piergiovanni. Reconfigurable lattice agreement and applications. 2019.
- [5] Marios Mavronicolas. A bound on the rounds to reach lattice agreement. 2000.
- [6] Xiong Zheng, Changyong Hu, and Vijay K. Garg. Lattice agreement in message passing systems. 2018.