This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

University of Texas at Austin, Austin, TX 78712, [email protected] University of Texas at Austin, Austin, TX 78712, [email protected] \CopyrightXiong Zheng, and Vijay K. Garg\ccsdesc[100]Theory of Computation Distributed Algorithms \supplement

Acknowledgements.

Byzantine Lattice Agreement in Asynchronous Systems

Xiong Zheng    Vijay K. Garg
Abstract

We study the Byzantine lattice agreement (BLA) problem in asynchronous distributed message passing systems. In the BLA problem, each process proposes a value from a join semi-lattice and needs to output a value also in the lattice such that all output values of correct processes lie on a chain despite the presence of Byzantine processes. We present an algorithm for this problem with round complexity of O(logf)O(\log f) which tolerates f<n5f<\frac{n}{5} Byzantine failures in the asynchronous setting without digital signatures, where nn is the number of processes. We also show how this algorithm can be modified to work in the authenticated setting (i.e., with digital signatures) to tolerate f<n3f<\frac{n}{3} Byzantine failures.

keywords:
Lattice Agreement, Byzantine, Gradecast
category:
\relatedversion

1 Introduction

In distributed systems, reaching agreement in the presence of process failures is a fundamental task. Understanding the kind of agreement that can be reached helps us understand the limitation of distributed systems with failures. Consensus [15] is the most fundamental problem in distributed computing. In this problem, each process proposes some input value and has to decide on some output value such that all correct processes decide on the same valid output. In synchronous message systems with crash failures, consensus cannot be solved in fewer than f+1f+1 rounds [8]. In asynchronous systems, consensus is impossible in the presence of even one crash failure [10]. The kk-set agreement [4] is a generalization of consensus, in which processes can decide on at most kk values instead of just one single value. The kk-set agreement cannot be solved in asynchronous systems if the number of crash failures fkf\geq k [2, 11]. The paper [5] shows that kk-set agreement problem cannot be solved within fk\lfloor\frac{f}{k}\rfloor rounds if nf+k+1n\geq f+k+1 in crash failure model.

The lattice agreement problem was proposed by Attiya et al [1] to solve the atomic snapshot object problem in shared memory systems. In this problem, each process i[n]i\in[n] has input xix_{i} and needs to output yiy_{i} such that the following properties are satisfied.

1) Downward-Vaility: xiyix_{i}\leq y_{i} for each correct process ii.

2) Upward-Validity: yi{xi|i[n]}y_{i}\leq\sqcup\{x_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in[n]\}.

3) Comparability: for any two correct processes ii and jj, either yiyjy_{i}\leq y_{j} or yjyiy_{j}\leq y_{i}.

The lattice agreement problem is a weaker problem than the consensus problem and the kk-set agreement problem. It can be solved in O(logf)O(\log f) rounds in synchronous systems and tolerate f<nf<n crash failures [20]. In asynchronous systems, it can also be solved in O(logf)O(\log f) rounds but can only tolerate f<n2f<\frac{n}{2} crash failures [19].

Attiya et al in [1] presents a generic algorithm to transform any protocol for the lattice agreement problem to a protocol for implementing an atomic snapshot object in shared memory systems. This transformation can be easily implemented in message passing systems by replacing each read and write step with sending “readread” and “writewrite” messages to all and waiting for acknowledgements from nfn-f different processes. On the other hand, if we can implement an atomic snapshot object, lattice agreement can also be solved easily both on shared-memory and message passing systems with only crash failures. Thus, solving the lattice agreement problem in message passing systems is equivalent to implementing an atomic snapshot object in message passing systems with only crash failures.

Using lattice agreement protocols, Faleiro et al [9] gives procedures to build a special class of linearizable and serializable replicated state machines which only support query operations and update operations but not mixed query-update operations. Later, Xiong et al [19] proposes some optimizations for their procedure for implementing replicated state machines from lattice agreement in practice. They propose a method to truncate the logs maintained in the procedure in [9]. The recent paper [17] by Skrzypczak et al proposes a protocol based on generalized lattice agreement, which is a multi-shot version of lattice agreement problem, to provide linearizability for state based conflict-free data types [16] the procedure given in [9] in terms of memory consumption, at the expense of progress.

In message passing systems with crash failures, the lattice agreement problem is well studied  [1, 19, 20, 13]. The best upper bound for both synchronous systems and asynchronous systems is O(logf)O(\log f) rounds. In the Byzantine failure model, a variant of the lattice agreement problem is first studied by Nowak et al [14]. Then, Di Luna et al [7] proposes a validity condition which still permits the application of lattice agreement protocol in obtaining atomic snapshots and implementing a special class of replicated state machines. They present an O(f)O(f) rounds algorithm for the Byzantine lattice agreement problem in asynchronous message systems. For synchronous message systems, a recent preprint [18] gives three algorithms. The first algorithm takes O(f)O(\sqrt{f}) rounds and has the early stopping property. The second and third algorithm takes O(logn)O(\log n) and O(logf)O(\log f) rounds but are not early stopping. All three algorithms can tolerate f<n3f<\frac{n}{3} failures. They also show how to modify their algorithms to work for authenticated settings and tolerates f<n2f<\frac{n}{2} failures. The preprint [6] presents an algorithm which takes O(logf)O(\log f) rounds which can tolerate f<n4f<\frac{n}{4} failures and shows how to improve resilience to f<n3f<\frac{n}{3} by using digital signatures and public-key infrastructure.

In this work, we present new algorithms for the Byzantine lattice agreement (BLA) problem in asynchronous message passing systems. In this problem, each process i[n]i\in[n] has input xix_{i} from a join semi-lattice (X,,)(X,\leq,\sqcup) with XX being the set of elements in the lattice, \leq being the partial order defined on XX, and \sqcup being the join operation. Each process ii has to output some yiXy_{i}\in X such that the following properties are satisfied. Let CC denote the set of correct processes in the system and tt denote the actual number of Byzantine processes in the system.

Comparability: For all iCi\in C and jCj\in C, either yiyjy_{i}\leq y_{j} or yjyiy_{j}\leq y_{i}.

Downward-Validity: For all iCi\in C, xiyix_{i}\leq y_{i}.

Upward-Validity: {yi|iC}({xi|iC}B)\sqcup\{y_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in C\}\leq\sqcup(\{x_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in C\}\cup B), where BXB\subset X and |B|t|B|\leq t.

Our main contribution is summarized as follows.

Theorem 1.1.

There is an O(logf)O(\log f) rounds algorithm for the BLA problem in asynchronous systems which can tolerate f<n5f<\frac{n}{5} Byzantine failures, where nn is the number of processes in the system. The algorithm takes O(n2logf)O(n^{2}\log f) messages.

Theorem 1.2.

There is a O(logf)O(\log f) rounds algorithm for the BLA problem in authenticated asynchronous systems which can tolerate f<n3f<\frac{n}{3} Byzantine failures, where nn is the number of processes in the system. The algorithm takes O(n2logf)O(n^{2}\log f) messages.

2 System Model

We assume a distributed asynchronous message system with nn processes with unique ids in [1,2,,n][1,2,...,n]. The communication graph is a clique, i.e., each process can send messages to any other process in the system. We assume that the communication channel between any two processes is reliable. There is no upper bound on message delay. We assume that processes can have Byzantine failures but at most f<n/3f<n/3 processes can be Byzantine in any execution of the algorithm. We use parameter tt to denote the actual number of Byzantine processes in a system. By our assumption, we must have tft\leq f. Byzantine processes can deviate arbitrarily from the algorithm. We say a process is correct or non-faulty if it is not a Byzantine process. We consider both systems with and without digital signatures. In a system with digital signatures, Byzantine processes cannot forge the signature of correct processes.

3 O(logf)O(\log f) Rounds Algorithm for the Asynchronous BLA Problem

In this section, we present an algorithm for the BLA problem in asynchronous systems which takes O(logf)O(\log f) rounds of asynchronous communication and tolerates f<n5f<\frac{n}{5} Byzantine failures. The high level idea of the algorithm is to apply a Byzantine tolerant classifier procedure, similar to the classifier procedure in [20] for crash failure systems, to divide a group of processes into the slave subgroup and the master subgroup such that the values of the slave group is less than the values of the master group. Then, by recursively applying such a classifier procedure within each subgroup, eventually all processes have comparable values. The classifier procedure for crash failure model only needs to guarantee that the value a correct slave process is at most the value of any correct master process and the size of the union of all values of correct slave processes is at most kk. The parameter kk is a knowledge threshold.

Fig. 1 shows the classification tree which specifies the threshold parameter for each group when the classifier procedure is invoked recursively. Before recursively invoking the classifier procedure, an initial round is used to let all processes exchange their input values. After this initial round, each process obtains at least nfn-f values. Each node in the tree represents a group. We also use label to indicate the threshold parameter of a group. Initially, all processes are in the same group with label nf2n-\frac{f}{2}. Let GG be a group with label kk at level rr. The master group (the right child node) of GG has label k+f2r+1k+\frac{f}{2^{r+1}}. The slave group (the left child node) of GG has label kf2r+1k-\frac{f}{2^{r+1}}. We can observe that all labels in the classification tree up to level logf\log f are unique. The above properties of the classifier procedure guarantee that processes in a leaf group must have the same value.

nf2n-\frac{f}{2}n3f4n-\frac{3f}{4}nf4n-\frac{f}{4}nfn-fnf+1n-f+1nnn1n-1level 1:level\leavevmode\nobreak\ 1:level 2:level\leavevmode\nobreak\ 2:levellogf+1:level\leavevmode\nobreak\ \log f+1:
Figure 1: The Classification Tree

In presence of Byzantine processes, the above properties are not enough for recursively applying such classifier procedure within each subgroup. A Byzantine process in a slave group can introduce some new values which are not known by some master process. We introduce the notion of admissible values for a group (to be defined later), which is the set of values that processes in this group can ever have. We present a Byzantine tolerant classifier procedure with threshold parameter kk which provides the following three properties: 1) Each correct slave process has at most kk values and each correct master process has more than kk values. 2) The admissible values of the slave group is a subset of the value of any correct master process. 3) The union of all admissible values in the slave group has size less than the threshold parameter kk.

Suppose now we have a classifier which guarantees the above properties. The main algorithm, shown in Fig. 2, proceeds in asynchronous rounds. Each process ii has a label lil_{i}, which is used as the threshold parameter when it invokes the classifier procedure. Initially, each process has the same label k0=nf2k_{0}=n-\frac{f}{2}. The label of a process is updated at each round according to the classification tree. In some places of our algorithm, we use the reliable broadcast primitive defined by Bracha [3] to send values. In this primitive, a process uses RB_broadcast to send a message and uses RB_broadcast to reliably deliver a message. This primitive guarantees many nice properties. In our algorithm, we need the following two main properties: 1) If a message is reliably delivered by some correct process, then this message will eventually be reliably delivered by each correct process. 2) If a correct process reliably delivers a message from process pp, then each correct process reliably delivers the same message from pp.

In the initial round at line 1-2, process ii RB_broadcast its input xix_{i} to all and waits for RB_deliver from nfn-f different processes. Then, it updates its value set to be the set of values reliably delivered at this round. When reliable delivering a value, process ii adds this value into its safe value set for the initial group k0=nf2k_{0}=n-\frac{f}{2}. The reliable delivering procedure is running on background. So this safe value set for the initial group keeps growing. By the properties of reliable broadcast, this safe value set can only contain at most one value from each process. This ensures Upward-Validity.

After the initial round, we can assume that all values in the initial safe value set of each process are unique, which can be done by associating the sender’s id with the value.

From line 3 to line 8, process ii executes the classifier procedure (to be presented later) for logf\log f rounds. At each round, it invokes the classifier procedure to decide whether it is classified as a slave or master and then updates its value accordingly. At round rr, if process ii is a master, it updates its label to be li:=li+f2r+1l_{i}:=l_{i}+\frac{f}{2^{r+1}}. Otherwise, if updates its label to be li:=lif2r+1l_{i}:=l_{i}-\frac{f}{2^{r+1}}.

Applying the above three properties provided by the classifier procedure, by induction on the round number, we can readily see that at the end of round logf\log f, for any two correct process ii and jj, if they are in the same group, say with label kk, then both ii and jj must have exactly kk values and their union also has exactly kk values. Then, ii and jj must have the same set of values. If they are in different group, then by recursively applying property 2, the values of one process must be subset of the values of the other process.

We now show a Byzantine tolerant classifier which satisfies the properties we define.

Code for process ii:
xix_{i}: input value      yiy_{i}: output value
lil_{i}: label of process ii. Initially, li=k0=nf2l_{i}=k_{0}=n-\frac{f}{2}:
VirV_{i}^{r}: value set held by process ii at round rr of the algorithm
Map SiS_{i}: Si[k]S_{i}[k] denote the safe value set for group kk
/* Initial Round */ 1: RB_broadcast(xix_{i}), wait for nfn-f RB_deliver(xjx_{j}) from pjp_{j} 2: Set Vi1V_{i}^{1} as the set of values reliably delivered /* Round 1 to logf\log f */ 3: for r:=1r:=1 to logf\log f 4:      (Vir+1,classV_{i}^{r+1},class) := Classifier(Vir,li,r)Classifier(V_{i}^{r},l_{i},r) 5:      if class=masterclass=master      then li:=li+f2r+1l_{i}:=l_{i}+\frac{f}{2^{r+1}} 6:      else      li:=lif2r+1l_{i}:=l_{i}-\frac{f}{2^{r+1}} 7: end for
8:
yi:={vVilogf+1}y_{i}:=\sqcup\{v\in V_{i}^{\log f+1}\}
Upon RB_deliver(xj)(x_{j}) from pjp_{j} Si[k0]:=Si[k0]xjS_{i}[k_{0}]:=S_{i}[k_{0}]\cup x_{j}

Figure 2: O(logf)O(\log f) Rounds Algorithm for the BLA Problem

3.1 The Byzantine Tolerant Classifier

The Byzantine tolerant classifier procedure, shown in Fig. 7, is inspired by the asynchronous classifier procedure given in [19] for the crash failure model. We say a process writes a value to at least nfn-f processes if it sends a “writewrite” message containing the value to all processes and waits for nfn-f different processes to send acknowledgement back. We say a process reads from at least nfn-f processes if it sends a “readread” message to all processes and waits for at least nfn-f processes to send their current values back. We say a process performs a write-read step if it writes its value to at least nfn-f processes and reads their values.

In the asynchronous classifier procedure for the crash failure model [19], to divide a group into a slave subgroup and a master subgroup, each process in the group first writes its value to at least nfn-f processes and then reads from at least nfn-f processes. After that, each process checks whether the union of all values read has size greater than the threshold parameter kk or not. If true, it is classified as a master process, otherwise, it is classified as a slave process. Slave processes keep their value the same. To guarantee that the value of each slave process is less than the value of each master process, each master process performs a write-read step to write the values obtained at the read step to at least nfn-f processes and read the values from them. Then it updates its value to be the union of all values read. The second read step guarantees that the size of the union of values of slave processes is at most kk, since the last slave process which completes the write step must be able to read all values of slave processes.

Constructing such a classifier procedure in presence of Byzantine processes is much more difficult. In order to adapt the above procedure to work in Byzantine failure setting, we need to address the following challenges. First, in the write step or read step, when a process waits for at least nfn-f different processes to send their values back, a Byzantine process can send arbitrary values. Second, simply ensuring that the values of a slave process is a subset of values of each master process is not enough, since a Byzantine process can introduce some values unknown to a master process in the slave group. For example, even if we can guarantee that the current value of each slave process is less that the value of each master process, in a later round, a Byzantine process can send some new value to a slave process which is unknown to some master process. This is possible in an asynchronous systems since messages can be arbitrarily delayed. Third, ensuring that the union of all values in the slave group has size at most kk is quite challenging. A simple second read step does not work any more since the last process which completes the write step might be a Byzantine process.

To prevent the first problem, in the Byzantine classifier procedure, when a process wants to perform a write step or read step, it applies the reliable broadcast primitive to broadcast its value. When a process waits for values from at least nfn-f processes, it only accepts a value if the value is a subset of the values reliably delivered by this process. By property of reliable broadcast, this ensures that each accepted value must be reliably broadcast by some process, which prevents Byzantine processes from introducing arbitrary values.

To tackle the second and third problem, the key idea is to restrict the values that a Byzantine process, which claims itself to be a slave process, can successfully reliable broadcast in later rounds. To achieve that, first we require that that a slave process can only reliable broadcast the value that it has reliably broadcast in the previous round. This prevents Byzantine processes from introducing arbitrary new values into a slave group. Second, we require each process which claims itself as a slave process to prove that it is indeed classified as a slave at the previous round when it tries to reliable broadcast a value at the current round by presenting the set of values it used to do classification. To enforce the above two requirements, we add a validity condition when a process echoes a message in the reliable broadcast primitive. However, this is not enough, since the value of a Byzantine slave process might not be known to a master process if broadcast value of the Byzantine process is arbitrarily delayed. To ensure that the value a Byzantine process reliably broadcast to be read by each correct master process, we force a Byzantine process who wants to be able to reliable broadcast a value in the slave group at next round to actually write its value to at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f correct processes, i.e., at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f correct processes must have received the value of a Byzantine process before each correct master process tries to read from at least n2fn-2f correct processes. These two sets of correct processes must have at least one correct process in common since f<n5f<\frac{n}{5}.

The validity condition we add in the reliable broadcast ensures that the admissible values for the slave group must be subset of the value set of any correct master process and the union of these values has size at most the threshold kk.

Each process which is classified as master is not required to prove its group identity but the value it tries to broadcast has to be a subset of safe value sets of correct processes. Similar to the asynchronous classifier procedure in [19], to ensure that the value of a slave process is less than the value of a master process, the master process needs to write to and read from at least nfn-f processes after it is classified as a master process.

3.1.1 Bounded Reliable Broadcast

BRB_broadcast(type,pf,v,k,rtype,pf,v,k,r)
typetype denotes the type of the message to be sent, either “writewrite” or “readread
pfpf is an array which is a proof of sender’s group identity
vv is the value to be sent, kk is the label of the sender, rr is the round number
The valid function is defined in Fig. 5
Broadcast INIT(i,type,pf,v,k,r)(i,type,pf,v,k,r) to all Upon receiving INIT(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) if (first reception of INIT(j,tj,,,,rj)(j,t_{j},-,-,-,r_{j}) wait until valid(tj,pfj,vj,kj,rj)valid(t_{j},pf_{j},v_{j},k_{j},r_{j}) Broadcast ECHO(j,tj,vj,kj,rj)(j,t_{j},v_{j},k_{j},r_{j}) to all endif Upon receiving ECHO(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) if ECHO(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) is received from at least n+f2+1\lfloor\frac{n+f}{2}\rfloor+1 different processes \wedge READY(j,vj,kj,rj)(j,v_{j},k_{j},r_{j}) has not yet broadcasted Broadcast READY(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) endif Upon receiving READY(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) if READY(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) received from f+1f+1 different processes \wedge                READY(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) has not been broadcasted Broadcast READY(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) endif if READY(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) received from 2f+12f+1 different processes \wedge (j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j})           has not been delivered BRB_deliver(j,tj,pfj,vj,kj,rj)(j,t_{j},pf_{j},v_{j},k_{j},r_{j}) endif


Figure 3: Bounded Reliable Broadcast

Before explaining the Byzantine classifier procedure in detail, we modify the reliable broadcast primitive by adding a condition when a process echoes a broadcast message. This condition (to be explained later) restricts the admissible values for each group. For completeness, the modified reliable broadcast procedure is shown in Fig. 3. When a process reliable broadcast a value, it also includes the round number, the label of the group it belongs to and a proof of its group identity. The proof is an array of size nn denoting the values read by the sender at previous round. When a process ii receives a broadcast message from process jj, it waits for the validity condition to hold and then echoes the message. We say a process BRB_broadcasts a message if it executs BRB_broadcast procedure with the massage. We say a process BRB_delivers a message if it executes BRB_deliver with this message.

3.1.2 Group and Admissible Values

In our algorithm, each process ii has a label lil_{i}, which serves as the threshold when it invokes the classifier procedure. The notion of group defined as below is based on labels of processes.

Definition 3.1 (group).

A groupgroup is a set of processes which have the same label. The label of a group is the label of the processes in this group. The label of a group is also the threshold value processes in this group use to do classification.

We also use label to indicate a group. We say a process is in group kk if its message is associated with label kk. Initially all processes are within the same group with label k0=nf2k_{0}=n-\frac{f}{2}. The label of each process is updated at each round based on whether it is classified as a slave or a master.

We introduce the notion of admissible values for a group, which are the set of values that processes in the group can ever have.

Definition 3.2 (admissible values for a group).

The admissible values for a group GG with label kk is the set of values that can be reliably delivered with label kk if they are reliably broadcast by some process (possibly Byzantine) with label kk.

In our classifier, each process in group kk updates its value set to a subset of the values which are reliably delivered with label kk. Thus, the value set of each process in group kk must be a subset of the admissible values for group kk. Our algorithm ensures that this property holds continues to hold until the end of the algorithm.

3.2 The Classifier

The classifier procedure for process i[n]i\in[n], shown in Fig. 7, has three input parameters: VV is the current value set of process ii, kk is the threshold value used to do the classification, which is also the current label of process ii, and rr is the round number.

Classifier(V,k,r)(V,k,r) for pip_{i}:
VV: input value set      kk: threshold value      rr: round number
/* Each process i[n]i\in[n] keeps track of the following variables */
Array LBirLB_{i}^{r}. LBir[j]LB_{i}^{r}[j] denotes the label of process jj sent along its values at round rr
Map SiS_{i}. Si[k]S_{i}[k] denotes a safe value set for group kk
Map ACVirACV_{i}^{r}. ACVir[k]ACV_{i}^{r}[k] denotes the set of values accepted with label kk, initially ACVir[k]:=ACV_{i}^{r}[k]:=\emptyset
Map RVirRV_{i}^{r}. RVir[j]RV_{i}^{r}[j] denote the values process ii read from process jj at round rr.
Map RTirRT_{i}^{r}. RTir[j]RT_{i}^{r}[j] denote the values process jj read from process ii at round rr.
/* write step*/ 1: if isSlave(k,r)isSlave(k,r) then pf:=RVir1pf:=RV_{i}^{r-1} else pf:=pf:=\emptyset 2: BRB_broadcast("write",pf,v,k,r)("write",pf,v,k,r), wait for wack(,r)wack(-,r) from nfn-f different processes /* read step*/ 3: BRB_broadcast(``read",,,k,r)(``read",-,-,k,r), wait for nfn-f rack(Rj,r)rack(R_{j},r) s.t. RjACVir[k]R_{j}\subseteq ACV_{i}^{r}[k] from pjp_{j} 4: Set RVir[j]:=RjRV_{i}^{r}[j]:=R_{j} if RjACVir[k]R_{j}\subseteq ACV_{i}^{r}[k], otherwise RVir[j]:=RV_{i}^{r}[j]:=\emptyset /* Classification */ 5: Let T:=j=1nRVir[j]T_{:}=\bigcup\limits_{j=1}^{n}RV_{i}^{r}[j] 6: if |T|>k|T|>k  /* height is greater than its label */ /* write-read step */ 7:      Send master(T,k,r)master(T,k,r) to all, wait for nfn-f mack(Rj,r)mack(R_{j},r) from pjp_{j} s.t. RjACVir[k]R_{j}\subseteq ACV_{i}^{r}[k] 8:      Define T:={Rj|RjACVir[k],j[n]}T^{\prime}:=\cup\{R_{j}\leavevmode\nobreak\ |\leavevmode\nobreak\ R_{j}\subseteq ACV_{i}^{r}[k],j\in[n]\} 9:      return (TT^{\prime}, master) 10: else
11:
     return (VV, slave)

Upon BRB_Deliver(j,type,,v,k,r)(j,type,-,v,k,r)
     if type=``write"type=``write"
          Si[m(k,r)]:=Si[m(k,r)]vS_{i}[m(k,r)]:=S_{i}[m(k,r)]\cup v /* Construct safe value set for group m(k,r)m(k,r) */
          ACVir[k]:=ACVir[k]vACV_{i}^{r}[k]:=ACV_{i}^{r}[k]\cup v
          LBir[j]:=kLB_{i}^{r}[j]:=k /* Record the label of a process at round rr */
          Send wack(,r)wack(-,r) to pjp_{j}
     elif type=``read"type=``read"
          RTir[j]:=ACVir[k]RT_{i}^{r}[j]:=ACV_{i}^{r}[k]
          Send rack(ACVir[k],r)rack(ACV_{i}^{r}[k],r) to pjp_{j}
     endif
Upon receiving master(T,k,r)master(T,k,r) from pjp_{j}
     wait until TACVir[k]T\subseteq ACV_{i}^{r}[k]
     Send mack(ACVir[k],r)mack(ACV_{i}^{r}[k],r) to pjp_{j}

Figure 4: The Byzantine Classifier Procedure

In line 1-2, process ii writes its current value set to at least nfn-f processes by using the BRB_broadcast procedure to send a “writewrite” message. If process ii is classified as a slave at the previous round, it needs to include the array of values it read from at least nfn-f processes at previous round as a proof of its group identity. This proof is used by every other process in the valid function to decide whether to echo the “writewrite” message or not. When process ii BRB_delivers a “writewrite” message with label kk at round rr, it includes the value in it into its safe value set for group m(k,r)m(k,r). The safe value set is used to restrict the set of values that can be delivered in the master group m(k,r)m(k,r). Due to this step, we can see that the admissible values in the master subgroup must be a subset of the admissible values at the current group. Process ii also includes the value contained in the “writewrite” message into ACVir[k]ACV_{i}^{r}[k], which stores the set of values reliably delivered with label kk at round rr.

From line 3 to line 4, process ii reads values from at least nfn-f processes by using the BRB_broadcast procedure to send a “readread” message to all. In the valid function, each process jj echos a “readread” message from process ii only if it has BRB_delivered the “writewrite” message from process ii sent at line 2. This step is used to ensure that for any process, possibly Byzantine, to read from other processes, it must have written its value to at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f correct processes, otherwise it cannot have enough processes echo its “readread” message in the BRB_broadcast. When process ii BRB_delivers a “readread” message with label kk from process jj at round rr, it records the set of values it has reliably delivered with label kk in RTir[j]RT_{i}^{r}[j]. Then process ii sends back a rackrack message along with the set of reliably delivered values with label kk at round rr to process jj. At line 3, after the “readread” message is sent, process ii has to wait for valid rackrack message from nfn-f processes. A rackrack message is valid if the value set contained in it is a subset of ACVir[k]ACV_{i}^{r}[k], which is the set of values reliably delivered with label kk at round rr. Consider a rack(Rj,r)rack(R_{j},r) message from a correct process jj. Since jj is correct, each value in RjR_{j} must have been reliably delivered by process jj. By property of reliable broadcast, each value in RjR_{j} will eventually be reliably delivered by process ii, thus RjACVir[k]R_{j}\subseteq ACV_{i}^{r}[k]. Thus, eventually process ii can obtain nfn-f valid rackrack message. At line 4, process ii records the set of valid RjR_{j}s obtained at line 3 into array RVirRV_{i}^{r}. So, this array stores the values reliably delivered with label kk that process ii read from all processes. This array is used to do classification in line 5-11 and also used as the proof of group identity of process ii when it writes at next round.

Line 5-11 is the classification step. Process ii is classified as a master process if the size of the union of valid values obtained in the read step is greater than its label kk, otherwise, it is classified as a slave process. If it is classified as a slave process, it returns its input value set. If it is classified as a master process, process ii performs a write-read step by sending a mastermaster message which includes the set of values it uses to do classification to all and wait for nfn-f valid mackmack message back at line 7. Similar to line 3, a mackmack message is valid if each value contained in it has been reliably delivered with correct label. When a process receives a mastermaster message with value set TT and label kk at round rr, it first waits until all values in TT are reliably delivered. Then it sends back a mackmack message along with the set of values reliably delivered with label kk at round rr. The waiting is used to ensure that each value in TT is valid, i.e., be reliably delivered, because a Byzantine process can send arbitrary values in its mastermaster message at line 7. By a similar reasoning as line 3, process ii will eventually obtain valid mackmack message from at least nfn-f different processes. After the write-read step, at line 8, process ii updates its value set to be the union of values obtained at line 7.

function Valid(j,type,pf,v,k,r)Valid(j,type,pf,v,k,r) for process ii:
     if (type=``write"¬isSlave(j,k,r)vSi[k])(type=``write"\wedge\lnot isSlave(j,k,r)\wedge v\subseteq S_{i}[k])
          (type=``write"isSlave(j,k,r)BRB_deliver(j,``write",,v,LBir1[j],r1)\vee\leavevmode\nobreak\ (type=``write"\wedge isSlave(j,k,r)\wedge\text{BRB\_deliver}(j,``write",-,v,LB_{i}^{r-1}[j],r-1)
                    pf[i]=RTir1[j]|j=1npf[j]|LBir1[j])\leavevmode\nobreak\ \wedge\leavevmode\nobreak\ pf[i]=RT_{i}^{r-1}[j]\wedge|\bigcup\limits_{j=1}^{n}pf[j]|\leq LB_{i}^{r-1}[j])

          (type=``read"BRB_deliver(j,``write",,,k,r))\vee\leavevmode\nobreak\ (type=``read"\wedge\text{BRB\_deliver}(j,``write",-,-,k,r))
          return TrueTrue
     else
          return FalseFalse
     endif
function isSlave(j,k,r)isSlave(j,k,r) for process ii: if k=LBir1[j]f2rk=LB_{i}^{r-1}[j]-\frac{f}{2^{r}} return TrueTrue else return FalseFalse

Figure 5: The Valid Function

The valid function is defined in Fig. 5. In the this function, we first consider the “writewrite” messages. If the message has been sent by a process that claims to be a master, then it is considered valid if the value vv in this message is contained in the safe value set Si[k]S_{i}[k]. If the message has been sent by a process that claims to be a slave, then process ii checks (1) whether process ii has BRB_delivered the “writewrite” message containing the same value at the previous round, (2) whether the ithi^{th} entry in pfpf array matches the value process jj read from ii in the previous round, and (3) whether the the number of values contained in the proof pfpf is at most kk. The condition (1) ensures that a slave process sends the same value as the previous round since a correct slave process must keep its value same as in the previous round. The condition (2) ensures that the proof sent by the slave process uses values that it read at round r1r-1. The condition (3) checks that the sender classified itself correctly.

If the message is a “readread” with label kk at round rr, process ii considers it as valid if it BRB_deliverd a “writewrite” message with label kk at round rr from the sender. This is used to make sure that the sender (possibly Byzantine) must complete its write step in line 1-2 before trying to read at line 3-4.

The isSlaveisSlave function invoked in the valid function simply checks whether the label of the sender matches the label update rule by comparing it with the label at previous round.

3.3 Proof of Correctness

We first define the notion of committing a message.

Definition 3.3.

We say a process commits a message if it reliably broadcasts the message and the message is reliably delivered. We say a process commits a message at time tt if this message is reliably delivered by the first process at time tt.

By properties of reliable broadcast, we observe that each process (possibly Byzantine) can commit at most one “writewrite” message and at most one “readread” message at each round.

Table 1: Notations
Variable Definition
GG A group of processes at round rr with label kk
slave(G)slave(G) The slave subgroup of GG, i.e., the processes with label s(k,r)s(k,r) at round r+1r+1
master(G)master(G) The master subgroup of GG, i.e., the processes with label m(k,r)m(k,r) at round r+1r+1
VirV_{i}^{r} The value set of process ii at the beginning of round rr
SirS_{i}^{r} The safe value map of process ii at the beginning of round rr Sir[k]S_{i}^{r}[k] is the safe value set of process ii for group kk at the beginning of round rr
UkrU_{k}^{r} The set of admissible values for group kk at round rr

Define s(k,r)=kf2r+1s(k,r)=k-\frac{f}{2^{r+1}} and m(k,r)=k+f2r+1m(k,r)=k+\frac{f}{2^{r+1}}. The variables we use in the proof are shown in Table. 1. Consider the classification step in group kk at round rr. The following lemma shows that if a Byzantine process wants to commit a “writewrite” message mm at round r+1r+1 with a slave label, then it must commit a “writewrite” message mm^{\prime} which contains the same value as mm and a “readread” message at round rr with label kk. Also, it must commit its “readread” message before its “writewrite” message at round rr with label kk.

Lemma 3.4.

Suppose that process ii (possibly Byzantine) commits a write message
(i,``write",,Vi,s(k,r),r+1)(i,``write",-,V_{i},s(k,r),r+1). Then
1) The message (i,``read",,,k,r)(i,``read",-,-,k,r) and the message (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r) must be committed by process ii.
2) Let tt denote the time that message (i,``read",,,k,r)(i,``read",-,-,k,r) is committed. Then, the message (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r) must have been reliably delivered by at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f correct processes before time tt.

Proof 3.5.

For correct process ii, the claim is obvious.

Suppose ii is Byzantine. For it to commit a message with label s(k,r)s(k,r) at round r+1r+1, its broadcast message has to be echoed by at least n+f2+1\lfloor\frac{n+f}{2}\rfloor+1 different processes. Thus, it has to prove the values it read at round rr to at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f different correct processes, which implies that it must commit (i,``read",,,k,r)(i,``read",-,-,k,r). For its read message to be reliably delivered, by the condition to echo a read message, we know that process ii’s write message with label kk and value ViV_{i} at round rr must be reliably delivered by at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f correct processes.

The above lemma guarantees that if a Byzantine process wants to introduce some values in the admissible values of a slave group of group kk, then it must first complete its write step and then complete its read step. Enforcing this order guarantees that the last slave process (possibly Byzantine) which completes its write step at line 2 must be able to read all values committed by slave processes in its read step at line 3. This set of values are exactly the set of admissible values for the slave group, since a slave process can only commit a “writewrite” message which contains the same value as its “writewrite” message at previous round by the valid function. Then, since this last slave process is a valid slave process, which is verified in the valid function, we have that the union of all admissible values for the slave group has size at most kk.

The following lemma shows that the classifier guarantees the properties we defined.

Lemma 3.6.

Let GG be a group at round rr with label kk. Let LL and RR be two nonnegative integers such that L<kRL<k\leq R. If L<|Vir|RL<|V_{i}^{r}|\leq R for each correct process iGi\in G, and |Ukr|R|U_{k}^{r}|\leq R, then
(p1) For each correct process imaster(G)i\in master(G), k<|Vir+1|Rk<|V_{i}^{r+1}|\leq R
(p2) For each correct process islave(G)i\in slave(G), L<|Vir+1|kL<|V_{i}^{r+1}|\leq k
(p3) Us(k,r)r+1UkrU_{s(k,r)}^{r+1}\subseteq U_{k}^{r}
(p4) Um(k,r)r+1UkrU_{m(k,r)}^{r+1}\subseteq U_{k}^{r}
(p5) |Um(k,r)r+1|R|U_{m(k,r)}^{r+1}|\leq R
(p6) |Us(k,r)r+1|k|U_{s(k,r)}^{r+1}|\leq k
(p7) For each correct process jmaster(G)j\in master(G), Us(k,r)r+1Vjr+1U_{s(k,r)}^{r+1}\subseteq V_{j}^{r+1}
(p8) Each correct process islave(G)i\in slave(G) can commits its value set at round r+1r+1, i.e., Vir+1Us(k,r)r+1V_{i}^{r+1}\subseteq U_{s(k,r)}^{r+1}
(p9) Each correct process jmaster(G)j\in master(G) can commit its value set at round r+1r+1, i.e., Vjr+1Um(k,r)r+1V_{j}^{r+1}\subseteq U_{m(k,r)}^{r+1}
(p10) |{Vir+1|islave(G)C}|k|\cup\{V_{i}^{r+1}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in slave(G)\cap C\}|\leq k
(p11) |{Vir+1|imaster(G)C}|R|\cup\{V_{i}^{r+1}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in master(G)\cap C\}|\leq R

Proof 3.7.

(p1)-(p2) : Immediate from the classifier procedure.

(p3): A slave process can only commit the write message that it has reliably broadcast at previous round. Thus, Us(k,r)r+1UkrU_{s(k,r)}^{r+1}\subseteq U_{k}^{r}.

(p4): The safe value set of each correct process for group m(k,r)m(k,r) is the union of values reliably broadcast by processes in group kk at round rr. Thus, Um(k,r)r+1UkrU_{m(k,r)}^{r+1}\subseteq U_{k}^{r}.

(p5): Immediate from (p4)(p4)

(p6): Consider group s(k,r)s(k,r) at round r+1r+1. From Lemma LABEL:lem:unique_label, we know that this group must be the slave group of group kk at round rr. Let PP denote the set of processes who commit a write message at round r+1r+1 with label s(k,r)s(k,r). For each process iPi\in P, let (i,``write",pfi,Vi,s(k,r),r+1)(i,``write",pf_{i},V_{i},s(k,r),r+1) denote the message that is committed by process ii. Then, Us(k,r)r+1={Vi|iP}U_{s(k,r)}^{r+1}=\cup\{V_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in P\}. From part 2) of Lemma 3.4, we have that for each process iPi\in P the message (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r) must have been reliably delivered by at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f correct processes. Let process lPl\in P be the last process such that its write message (l,``write",,Vl,k,r)(l,``write",-,V_{l},k,r) is reliably delivered by at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f different correct processes. Let QlQ_{l} denote the set of correct processes which echoed message (l,``write",pfl,Vl,s(k,r),r+1)(l,``write",pf_{l},V_{l},s(k,r),r+1). We have |Ql|n+f2+1f|Q_{l}|\geq\lfloor\frac{n+f}{2}\rfloor+1-f. By the condition of echoing a “writewrite” message, we have pfl[q]=RTqr[l]pf_{l}[q]=RT_{q}^{r}[l] for each qQlq\in Q_{l}.

Consider an arbitrary process pPp\in P. Let QpQ_{p} denote the set of the first n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f correct processes which reliably delivered message (p,``write",,Vp,k,r)(p,``write",-,V_{p},k,r). Since 2(n+f2+1f)>nf2(\lfloor\frac{n+f}{2}\rfloor+1-f)>n-f, there exists a correct process sQlQps\in Q_{l}\cap Q_{p}. Let t0t_{0} denote the time that process ss sets its RTsr[l]RT_{s}^{r}[l] as ACVsr[k]ACV_{s}^{r}[k]. This happens after (l,``read",,,k,r)(l,``read",-,-,k,r) is reliably delivered by process tt. From part 2) Lemma of 3.4, the message (l,``write",,Vl,k,r)(l,``write",-,V_{l},k,r) must be reliably delivered by at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f correct processes before time t0t_{0}. Since ll is the last process in PP such that its write message (l,``write",,Vl,k,r)(l,``write",-,V_{l},k,r) is reliably delivered by at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f different correct processes, process pp’s write message (p,``write",,Vp,k,r)(p,``write",-,V_{p},k,r) must be reliably delivered by each process in QpQ_{p} before time t0t_{0}. Hence, when process ss sets RTsr[l]RT_{s}^{r}[l] as ACVsr[k]ACV_{s}^{r}[k], the value set VpV_{p} has been added into ACVsr[k]ACV_{s}^{r}[k] by process ss. Thus, VpRTsr[l]V_{p}\subseteq RT_{s}^{r}[l]. Since pp is an arbitrary process in PP, we have that for each process pPp\in P, there exists a process sQls\in Q_{l} such that VpRTsr[l]V_{p}\subseteq RT_{s}^{r}[l]. Hence, {Vp|pP}{RTqr[l]|qQ}j=1npfl[j]\cup\{V_{p}\leavevmode\nobreak\ |\leavevmode\nobreak\ p\in P\}\subseteq\cup\{RT_{q}^{r}[l]\leavevmode\nobreak\ |\leavevmode\nobreak\ q\in Q\}\subseteq\bigcup\limits_{j=1}^{n}pf_{l}[j]. Since |j=1npfl[j]|k|\bigcup\limits_{j=1}^{n}pf_{l}[j]|\leq k, we have |Us(k,r)r+1|=|{Vp|pP}|k|U_{s(k,r)}^{r+1}|=|\cup\{V_{p}\leavevmode\nobreak\ |\leavevmode\nobreak\ p\in P\}|\leq k.

(p7): Consider group s(k,r)s(k,r) at round r+1r+1. Let PP denote the set of processes who commit a “writewrite” message at round r+1r+1 with label s(k,r)s(k,r). For each process iPi\in P, let (i,``write",pfi,Vi,s(k,r),r+1)(i,``write",pf_{i},V_{i},s(k,r),r+1) denote the message that is commitd by process ii. Then, Us(k,r)r+1={Vi|iP}U_{s(k,r)}^{r+1}=\cup\{V_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in P\}. We show that for each iPi\in P, ViVjr+1V_{i}\subseteq V_{j}^{r+1}. From Lemma 3.4, we have the message (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r) must have been reliably delivered by at least n+f2+1f\lfloor\frac{n+f}{2}\rfloor+1-f correct processes. Let PP denote the set of correct processes which echoed for message (i,``write",pfi,Vi,s(k,r),r+1)(i,``write",pf_{i},V_{i},s(k,r),r+1). We have |P|n+f2+1f|P|\geq\lfloor\frac{n+f}{2}\rfloor+1-f. Then, process ii’s “readread” message (i,``read",,,k,r)(i,``read",-,-,k,r) and write message (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r) must be reliably delivered by each process in PP. Also, by the echoing condition, we have pfi[p]=ACVpr[i]pf_{i}[p]=ACV_{p}^{r}[i] for each pPp\in P and |j=1npfi[j]|k|\bigcup\limits_{j=1}^{n}pf_{i}[j]|\leq k. Thus ViACVpr[k]V_{i}\subseteq ACV_{p}^{r}[k] for each pPp\in P.
Let QQ denote the set of correct processes which delivered the message master(Tjr,k,r)master(T_{j}^{r},k,r) sent by correct process jmaster(G)j\in master(G) at line 7 of round rr and we have |Q|n2f|Q|\geq n-2f. Since n+f2+1f+n2f>nf\lfloor\frac{n+f}{2}\rfloor+1-f+n-2f>n-f, there exists a correct process sPQs\in P\cap Q such that it reliably delivers (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r) of process ii and message master(Tjr,k,r)master(T_{j}^{r},k,r) from process jj. We show that process ss must deliver (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r) before master(Tjr,k,r)master(T_{j}^{r},k,r). Suppose that process tt delivers master(Tjr,k,r)master(T_{j}^{r},k,r) before (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r) for contradiction. Then, we have TjrACVtr[k]T_{j}^{r}\subseteq ACV_{t}^{r}[k] when process tt delivers (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r). We also have that process tt must reliable deliver (i,``read",,,k,r)(i,``read",-,-,k,r) after (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r). Thus, TjrACVtr[k]T_{j}^{r}\subseteq ACV_{t}^{r}[k] when ss delivers (i,``read",,,k,r)(i,``read",-,-,k,r). Since ACVsr[k]=pfi[s]ACV_{s}^{r}[k]=pf_{i}[s], we have Tjrpfi[s]T_{j}^{r}\subseteq pf_{i}[s]. Since process jj is correct and jmaster(G)j\in master(G), then |Tjr|>k|T_{j}^{r}|>k. Thus, |j=1npfi[j]|>k|\bigcup\limits_{j=1}^{n}pf_{i}[j]|>k, contradiction. Therefore, process tt delivers master(Tjr,k,r)master(T_{j}^{r},k,r) before (i,``write",,Vi,k,r)(i,``write",-,V_{i},k,r). Then, when jj receives mack(Rt,r)mack(R_{t},r) from tt, we must have ViRtV_{i}\subseteq R_{t}. Thus, ViVjr+1V_{i}\subseteq V_{j}^{r+1}.

(p8): Since process ii is correct, at round rr, it must read from at least n2fn-2f correct processes. Let QQ denote this set of correct processes. Then, at round r+1r+1, each process in QQ will echo the reliable broadcast message of process ii. Thus, there will be at least n2fn-2f echo messages. Since f<n5f<\frac{n}{5}, we have n2fn+f2+1n-2f\geq\lfloor\frac{n+f}{2}\rfloor+1. Hence, eventually the message of ii will be reliable delivered.

(p9): Consider round rr, from line 8, we know that Vjr+1ACVjr[k]V_{j}^{r+1}\subseteq ACV_{j}^{r}[k]. From the property of reliable broadcast, any value vVjr+1v\in V_{j}^{r+1} is reliably broadcast by some process and will eventually be reliably delivered by each correct process. Hence, value vv will be included into the safe value set of each correct process for the group with label m(k,r)m(k,r). Thus, at round r+1r+1, Vjr+1V_{j}^{r+1} will be eventually reliable delivered by each correct process.

(p10): Implied by (p8)(p8) and (p6)(p6).

(p11): Implied by (p9)(p9) and (p5)(p5).

The following lemma shows that the value set of a correct process is non-decreasing.

Lemma 3.8.

For any correct process ii and round rr, VirVjr+1V_{i}^{r}\subseteq V_{j}^{r+1}.

Proof 3.9.

A slave process keeps its value set unchanged and a master process updates its value set to be the set of reliably delivered values which contains its own value set.

The following lemma is used later to show that processes in the same group at the end of the algorithm must have the same set of values. A similar lemma is given in [20, 19]. We defer its detailed proof in Appendix A.

Lemma 3.10.

Let GG be a group of processes at round rr with label kk. Then
(1) for each correct process iGi\in G, kf2r|Vir|k+f2rk-\frac{f}{2^{r}}\leq|V_{i}^{r}|\leq k+\frac{f}{2^{r}}
(2) |Ukr|k+f2r|U_{k}^{r}|\leq k+\frac{f}{2^{r}}

Proof 3.11.

By induction on round number rr and apply (p1)(p1), (p2)(p2), (p5)(p5) and p(6)p(6) of Lemma 3.6.

Lemma 3.12.

Let ii and jj be two correct processes that are within the same group GG with label kk at the beginning of round logf+1\log f+1. Then Vilogf+1V_{i}^{\log f+1} and Vjlogf+1V_{j}^{\log f+1} are equal.

Proof 3.13.

Let GG^{\prime} be the parent of GG with label kk^{\prime}. Assume without loss of generality that G=M(G)G=M(G^{\prime}). The proof for the case G=S(G)G=S(G^{\prime}) follows in the same manner. Since GG^{\prime} is a group at round logf\log f, by Lemma 3.10, we have:
(1) for each correct process pGp\in G^{\prime}, k1<|Vplogf|k+1k^{\prime}-1<|V_{p}^{\log f}|\leq k^{\prime}+1, and
(2) |Uklogf|k+1|U_{k^{\prime}}^{\log f}|\leq k^{\prime}+1

Since iGi\in G^{\prime} and jGj\in G^{\prime}, (1) and (2) hold for both process ii and jj. By the assumption that G=M(G)G=M(G^{\prime}), process ii and jj execute the Classifier procedure with label kk^{\prime} and are both classified as master. Let L=k1L=k^{\prime}-1 and R=k+1R=k^{\prime}+1, then by applying Lemma 3.6(p1p1) we have k<|Vilogf+1|k+1k^{\prime}<|V_{i}^{\log f+1}|\leq k^{\prime}+1 and k<|Vjlogf+1|k+1k^{\prime}<|V_{j}^{\log f+1}|\leq k^{\prime}+1, thus |Vilogf+1|=|Vjlogf+1|=k+1|V_{i}^{\log f+1}|=|V_{j}^{\log f+1}|=k^{\prime}+1. By (p11p11) of Lemma 3.6, we have |{Vilogf+1,Vjlogf+1}|k+1|\cup\{V_{i}^{\log f+1},V_{j}^{\log f+1}\}|\leq k^{\prime}+1. Thus, Vilogf+1=Vjlogf+1V_{i}^{\log f+1}=V_{j}^{\log f+1}. Therefore, VirV_{i}^{r} and VjrV_{j}^{r} are equal at the beginning of round logf+1\log f+1.

Lemma 3.14.

(Comparability) For any two correct process ii and jj, yiy_{i} and yjy_{j} are comparable.

Proof 3.15.

If process ii and jj are in the same group at the beginning of round logf+1\log f+1, then by Lemma 3.12, yi=yjy_{i}=y_{j}. Otherwise, let GG be the last group that both ii and jj belong to. Suppose GG is a group with label kk at round rr. Suppose islave(G)i\in slave(G) and jmaster(G)j\in master(G) without loss of generality. Then, Vilogf+1Us(k,r)r+1Vjr+1Vjlogf+1V_{i}^{\log f+1}\subseteq U_{s(k,r)}^{r+1}\subseteq V_{j}^{r+1}\subseteq V_{j}^{\log f+1}, by (p8p8), (p6p6) (p7p7) and (p5p5) of Lemma 3.6 and Lemma 3.8.

See 1.1

Proof 3.16.

Downward-Validity. After the initial round, xiVi1x_{i}\in V_{i}^{1}. By Lemma 3.8, we have Vi1Vilogf+1V_{i}^{1}\subseteq V_{i}^{\log f+1}. Thus xiVilogf+1x_{i}\in V_{i}^{\log f+1}. Since yi={vVilogf+1}y_{i}=\sqcup\{v\in V_{i}^{\log f+1}\}, we have xiyix_{i}\leq y_{i}.

Comparability follows from Lemma 3.14.

Upward-Validity. In the initial round, reliable broadcast is used to construct the safe value set for the initial group with label k0k_{0}. By property of reliable broadcast, each Byzantine process can introduce at most one value into the safe value set for group k0k_{0}. After the initial round, all admissible values for each group must be subset of the values reliably delivered at the initial round. Thus, the union of all value sets held by correct processes must subset of the union of {xi|iC}\{x_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in C\} and a set BXB\subset X such that |B|f|B|\leq f. Therefore, {yi|iC}({xi|iC}B)\sqcup\{y_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in C\}\leq\sqcup(\{x_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ i\in C\}\cup B), where BXB\subset X and |B|t|B|\leq t.

Remark 3.17.

The reliable broadcast primitive we use can be replaced by a more efficient one proposed by Imbs et al [12], which can only tolerate f<n5f<\frac{n}{5} Byzantine failures but takes 2 asynchronous communication rounds. This suffices for our application.

4 O(logf)O(\log f) Rounds Algorithm for Authenticated BLA Problem

In this section, we present an O(logf)O(\log f) rounds algorithm for the BLA problem in authenticated (i.e., assuming digital signatures and public-key infrastructure) setting that can tolerate f<n3f<\frac{n}{3} Byzantine failures by modifying the Byzantine tolerant classifier procedure in previous section. The Byzantine classifier procedure in authenticated setting is shown in Fig. 6. The primary difference lies in what a process does when it reliably delivers some message and the validity condition for echoing a broadcast message. The basic idea is to let a process sign the ackack message that it needs to send. Each process uses the set of signed ackack messages as proof of its completion of a write step or read step. In this section, we use xi\langle x\rangle_{i} to denote a message xx signed by process ii, i.e., xi=x,σ\langle x\rangle_{i}=\langle x,\sigma\rangle, where σ\sigma is the signature produced by process ii using its private signing key. We say a message is correctly signed by process ii if the signature within the message is a correct signature produced by process ii.

4.1 The Authenticated Byzantine Tolerant Classifier

The classifier in the authenticated setting is shown in Fig. 6. The primary difference between the classifier in previous section and the authenticated classifier is that in the authenticated classifier each process uses signed messages as proof of its group identity.

Classifier(V,k,r)(V,k,r):
VV: input value set      kk: threshold value      rr: round number
Each process i[n]i\in[n] keeps track of the same variables as the classifier in Fig. 7
Set RVirRV_{i}^{r}, which stores the set of signed rackrack message in the read step of previous round
/* write step */ 1: if isSlave(k,r)isSlave(k,r) then pf:=RVir1pf:=RV_{i}^{r-1} else pf:=pf:=\emptyset 2: BRB_broadcast(``write",pf,v,k,r)(``write",pf,v,k,r), wait for nfn-f valid wack(,r)j\langle wack(-,r)\rangle_{j} from pjp_{j} 3: Let WW denote the set of wackj\langle wack\rangle_{j} delivered at line 2 /* read step */ 4: Send read(W,k,r)read(W,k,r) to all, wait for nfn-f valid rack(Rj,r)j\langle rack(R_{j},r)\rangle_{j} s.t. RjACVir[k]R_{j}\subseteq ACV_{i}^{r}[k] from pjp_{j} 5: Set RVir:={rack(Rj,r)j|RjACVir[k]}RV_{i}^{r}:=\{\langle rack(R_{j},r)\rangle_{j}\leavevmode\nobreak\ |\leavevmode\nobreak\ R_{j}\subseteq ACV_{i}^{r}[k]\} /* Classification */ 6: Let T:={Rj|RjACVir[k]}T:=\cup\{R_{j}\leavevmode\nobreak\ |\leavevmode\nobreak\ R_{j}\subseteq ACV_{i}^{r}[k]\} /* write-read step */ 7: if |T|>k|T|>k  /* height of TT is greater than its label */ 8:      Send master(T,k,r)master(T,k,r) to all, wait for nfn-f mack(Rj,r)mack(R_{j},r) s.t. RjACVir[k]R_{j}\subseteq ACV_{i}^{r}[k] from pjp_{j} 9:      Define T:={Rj|RjACVir[k]}T^{\prime}:=\cup\{R_{j}\leavevmode\nobreak\ |\leavevmode\nobreak\ R_{j}\subseteq ACV_{i}^{r}[k]\} 10:      return (TT^{\prime}, master) 11: else
12:
     return (VV, slave)

Upon BRB_deliver(j,t,v,k,r)(j,t,v,k,r)
     if t=``write"t=``write"
          Si[m(k,r)]:=Si[m(k,r)]vS_{i}[m(k,r)]:=S_{i}[m(k,r)]\leavevmode\nobreak\ \cup\leavevmode\nobreak\ v
          ACVir[k]:=ACVir[k]vACV_{i}^{r}[k]:=ACV_{i}^{r}[k]\leavevmode\nobreak\ \cup\leavevmode\nobreak\ v
          Send wack(ACVir[k],r)i\langle wack(ACV_{i}^{r}[k],r)\rangle_{i} to pjp_{j}
     endif
Upon receiving read(W,k,r)read(W,k,r) from pjp_{j}
     if validSignature(``read",j,W,r)validSignature(``read",j,W,r)
          Send rack(ACVir[k],r)i\langle rack(ACV_{i}^{r}[k],r)\rangle_{i} to pjp_{j}
     endif
Upon receiving master(T,k,r)master(T,k,r) from pjp_{j}
     wait until TACVir[k]T\subseteq ACV_{i}^{r}[k]
     Send mack(ACVir[k],r)mack(ACV_{i}^{r}[k],r) to pjp_{j}

Figure 6: The Authenticated Byzantine Tolerant Classifier

At lines 1-2, each process writes its current value set by using the BRB_broadcast procedure to send a “writewrite” message. If the process is a slave process, it also includes the set of at least nfn-f signed rackrack messages it received at the previous round as a proof that it is indeed classified as a slave. At line 2, each process waits for correctly signed wackwack message from at least nfn-f different processes. This set of signed wackwack message is used as the proof of its completion of the write step when this process tries to read from other processes. When a process BRB_delivers a “writewrite” message, it performs similar steps as the algorithm in previous section except that it sends a signed wackwack message back.

At line 4-5, each process reads from at least nfn-f processes. Different from the classifier procedure in previous section, each process directly sends a read message along with the set of correctly signed wackwack messages obtained at line 2 to all (instead of using the BRB_broadcast procedure). When a process receives a “readread” message with label kk for round rr, if uses the validSignature function to check whether the “readread” message contains correctly signed wackwack message for round rr from at least nfn-f different processes. If so, it sends back to the sender a signed rackrack message along with the reliably delivered values with label kk at round rr. This ensures that if a process (possibly Byzantine) tries to read from correct processes, it must complete its write step first.

The classification step from line 6-12 is the same as the classification step of the algorithm in previous section. A mater process performs a write-read step by sending a mastermaster message along the set of value obtained at line 6. Then it waits for nfn-f valid mackmack messages and updates its value set to be the set of values contained in these messages. When a process receives a mastermaster message, it performs the same steps as in the classifier in previous section.

The valid function is different from the one given in previous section. First, only “writewrite” messages are reliably broadcast. Second, the proof is a set of signed rackrack messages instead of an array in previous section. To verify the proof, the valid function invokes the validSignature function to check whether the proof contains correctly signed rackrack message for previous round from at least nfn-f different processes.

function Valid(j,type,pf,v,k,r)Valid(j,type,pf,v,k,r): if (type=``write"¬isSlave(j,k,r)vSi[k])(type=``write"\wedge\lnot isSlave(j,k,r)\wedge v\subseteq S_{i}[k]) (type=``write"isSlave(j,k,r)BRB_deliver(j,``write",,v,LBir1[j],r1)\vee\leavevmode\nobreak\ (type=``write"\wedge isSlave(j,k,r)\wedge\text{BRB\_deliver}(j,``write",-,v,LB_{i}^{r-1}[j],r-1) validSignature(``write",pf,r)pfcontains at most k distinct values\leavevmode\nobreak\ \wedge validSignature(``write",pf,r)\wedge pf\leavevmode\nobreak\ \text{contains at most $k$ distinct values} return TrueTrue else
          return FalseFalse
endif function validSignature(type,pf,r)validSignature(type,pf,r): if (type=``write"pftype=``write"\wedge pf contains correctly signed rack(,r1)rack(-,r-1) from nfn-f processes) \vee\leavevmode\nobreak\ (type=``read"pftype=``read"\wedge pf contains correctly signed wack(,r)wack(-,r) from nfn-f processes) return TrueTrue else return FalseFalse endif

Figure 7: The Valid Function

For the proof of correctness, we just need to prove the classifier procedure satisfies the properties given Lemma 3.6 under the assumption that f<n3f<\frac{n}{3}. The proof of (p6)(p6) and (p7)(p7) is similar to the proof in previous section. Thus, we do not give the formal mathematical proof.

Lemma 4.1.

Properties (p1)(p11)(p1)-(p11) of Lemma 3.6 hold for the classifier shown in Fig. 6.

Proof 4.2.

(p1)-(p5) : similar to the proofs given in Lemma 3.6.

(p6): Consider group s(k,r)s(k,r) at round r+1r+1. Let QQ denote the set of processes who commit their “writewrite” message with label s(k,r)s(k,r) at round r+1r+1. For process ii (possibly Byzantine) to commit this message, this message has to be echoed by at least n+f2+1\lfloor\frac{n+f}{2}\rfloor+1 different processes. By the condition of echo, process ii must have received correctly signed rack(,r)rack(-,r) message from at least nfn-f processes at line 4 of round rr, among which there are at least n2fn-2f correct processes. When a process tries to read at round rr, it has to prove that it has received at least nfn-f correctly signed wack(,r)wack(-,r) messages. Thus, process ii must have committed a write message and this message has been reliably delivered by at least n2fn-2f correct processes. Let process lQl\in Q be the last process in QQ such that its write message is reliably delivered by at least n2fn-2f different correct processes. When process ll read from at least n2fn-2f different correct processes at round rr, each process in QQ must have written its value in at least n2fn-2f different correct processes. Then, since 2(n2f)>nf2(n-2f)>n-f by the assumption of f<n3f<\frac{n}{3}, process ll must have read all values reliably broadcast by processes in QQ. These values are exactly the set Us(k,r)r+1U_{s(k,r)}^{r+1}. Since ll is a slave process and the size of RVlrRV_{l}^{r} must be at most kk, we have |Us(k,r)r+1|k|U_{s(k,r)}^{r+1}|\leq k.

(p7): Consider group s(k,r)s(k,r) at round r+1r+1. For a process islave(G)i\in slave(G) to commit its value set at round r+1r+1, process ii must reliably broadcast the same value set as round rr and must be able to verify the value set RVirRV_{i}^{r} it reads at round rr. By a similar argument as the proof for (p6)(p6), process ii’s value set must be reliably delivered by at least n2fn-2f correct processes at round rr before its reading step. Let PP denote the set of correct processes which delivered process ii’s value before its reading step at round rr and |P|n2f|P|\geq n-2f. Let QQ denote the set of correct processes which delivered the mastermaster message sent by process jmaster(G)j\in master(G) at line 11 of round rr and |Q|n2f|Q|\geq n-2f. Then, since 2(n2f)>nf2(n-2f)>n-f by the assumption of f<n3f<\frac{n}{3}, there exists a correct process kPQk\in P\cap Q such that it delivered both VirV_{i}^{r} and TjrT_{j}^{r} at round rr. Since process ii is a slave process, process kk must send rackrack to process ii before sending mackmack to process jj, otherwise, process ii would be classified as master. Then, process jj must receive VjrV_{j}^{r} from process kk and include into Vjr+1V_{j}^{r+1}. Thus, Us(k,r)r+1Vjr+1U_{s(k,r)}^{r+1}\subseteq V_{j}^{r+1}.

(p8): Since process ii is correct, at round rr, it must have received at least nfn-f correctly signed rackrack message. Then, at round r+1r+1, it can prove to at least n+f2+1\lfloor\frac{n+f}{2}\rfloor+1 correct processes. Hence, the write message of ii at round r+1r+1 will be reliably delivered by each correct process.

(p9): Consider round rr, from line 8, we know that Vjr+1ACVjr[k]V_{j}^{r+1}\subseteq ACV_{j}^{r}[k]. From the property of reliable broadcast, any value vVjr+1v\in V_{j}^{r+1} is reliably broadcast by some process with label kk at round rr and will eventually be reliably delivered by each correct process. Hence, value vv will be included into the safe value set of each correct process for the group m(k,r)m(k,r). Then, eventually Vjr+1Si[m(k,r)]V_{j}^{r+1}\subseteq S_{i}[m(k,r)] for each correct process ii. Thus, when process jmaster(G)j\in master(G) tries to reliable broadcast Vjr+1V_{j}^{r+1} at round r+1r+1, the echo condition eventually holds for each correct process ii.

See 1.2

5 Conclusion

In this paper, we present an O(logf)O(\log f) rounds algorithm for the Byzantine lattice agreement problem in asynchronous systems which can tolerates f<n5f<\frac{n}{5} Byzantine failures. We also give an O(logf)O(\log f) rounds algorithm for the authenticated setting that can tolerate f<n3f<\frac{n}{3} Byzantine failures. One open problem left is to design an algorithm which has resilience of f<n3f<\frac{n}{3} and takes O(logf)O(\log f) rounds.

References

  • [1] Hagit Attiya, Maurice Herlihy, and Ophir Rachman. Atomic snapshots using lattice agreement. Distributed Computing, 8(3):121–132, 1995.
  • [2] Martin Biely, Zarko Milosevic, Nuno Santos, and Andre Schiper. S-paxos: Offloading the leader for high throughput state machine replication. In 2012 IEEE 31st Symposium on Reliable Distributed Systems, pages 111–120. IEEE, 2012.
  • [3] Gabriel Bracha. Asynchronous Byzantine agreement protocols. Information and Computation, 75(2):130–143, 1987.
  • [4] Soma Chaudhuri. More choices allow more faults: Set consensus problems in totally asynchronous systems. Inf. Comput., 105(1):132–158, 1993.
  • [5] Soma Chaudhuri, Maurice Erlihy, Nancy A Lynch, and Mark R Tuttle. Tight bounds for k-set agreement. Journal of the ACM (JACM), 47(5):912–943, 2000.
  • [6] Giuseppe Antonio Di Luna, Emmanuelle Anceaume, Silvia Bonomi, and Leonardo Querzoni. Synchronous byzantine lattice agreement in 𝒪(logf)\mathcal{O}(\log f) rounds. arXiv preprint arXiv:2001.02670, 2020.
  • [7] Giuseppe Antonio Di Luna, Emmanuelle Anceaume, and Leonardo Querzoni. Byzantine generalized lattice agreement. arXiv preprint arXiv:1910.05768, 2019.
  • [8] Danny Dolev and H Raymond Strong. Authenticated algorithms for Byzantine agreement. SIAM Journal on Computing, 12(4):656–666, 1983.
  • [9] Jose M Faleiro, Sriram Rajamani, Kaushik Rajan, G Ramalingam, and Kapil Vaswani. Generalized lattice agreement. In Proceedings of the 2012 ACM symposium on Principles of distributed computing, pages 125–134. ACM, 2012.
  • [10] Michael J Fischer, Nancy A Lynch, and Michael S Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32(2):374–382, 1985.
  • [11] Maurice Herlihy, Sergio Rajsbaum, and Mark R Tuttle. Unifying synchronous and asynchronous message-passing models. In Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing, pages 133–142, 1998.
  • [12] Damien Imbs and Michel Raynal. Trading off t-resilience for efficiency in asynchronous byzantine reliable broadcast. Parallel Processing Letters, 26(04):1650017, 2016.
  • [13] Marios Mavronicolasa. A bound on the rounds to reach lattice agreement. http://www.cs.ucy.ac.cy/ mavronic/pdf/lattice.pdf, 2018.
  • [14] Thomas Nowak and Joel Rybicki. Byzantine approximate agreement on graphs. In 33rd International Symposium on Distributed Computing (DISC 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
  • [15] Marshall Pease, Robert Shostak, and Leslie Lamport. Reaching agreement in the presence of faults. Journal of the ACM (JACM), 27(2):228–234, 1980.
  • [16] Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. Conflict-free replicated data types. In Symposium on Self-Stabilizing Systems, pages 386–400. Springer, 2011.
  • [17] Jan Skrzypczak, Florian Schintke, and Thorsten Schütt. Linearizable state machine replication of state-based crdts without logs. arXiv preprint arXiv:1905.08733, 2019.
  • [18] Xiong Zheng and Vijay Garg. Byzantine lattice agreement in synchronous systems. arXiv preprint arXiv:1910.14141, 2019.
  • [19] Xiong Zheng, Vijay K Garg, and John Kaippallimalil. Linearizable replicated state machines with lattice agreement. arXiv preprint arXiv:1810.05871, 2018.
  • [20] Xiong Zheng, Changyong Hu, and Vijay K Garg. Lattice agreement in message passing systems. In 32nd International Symposium on Distributed Computing (DISC 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

Appendix A Proof of Lemma 3.10

See 3.10

Proof A.1.

By induction on rr. Consider the base case with r=1r=1, k=k0=nf2k=k_{0}=n-\frac{f}{2}. After the initial round, each process must receive at least nfn-f different values and at most nn values by the property of reliable broadcast. Thus, k0f2=nf|Vi1|nk_{0}-\frac{f}{2}=n-f\leq|V_{i}^{1}|\leq n. Any value in Ukr=Uk01U_{k}^{r}=U_{k_{0}}^{1} must be reliably delivered by at least one correct process at the initial round. Thus, |Uk01|k0+f2=n|U_{k_{0}}^{1}|\leq k_{0}+\frac{f}{2}=n.

For the induction step, assume the above lemma holds for all groups at round r1r-1. Consider an arbitrary group GG at round r>1r>1 with label kk. Let GG^{\prime} be the parent group of GG at round r1r-1 with label kk^{\prime}. Consider the Classifier procedure executed by all processes in GG^{\prime} with label kk^{\prime}. By induction hypothesis, we have:

(1) for each correct process iGi\in G^{\prime}, kf2r1<|Vir1|k+f2r1k^{\prime}-\frac{f}{2^{r-1}}<|V_{i}^{r-1}|\leq k^{\prime}+\frac{f}{2^{r-1}}

(2) |Ukr1|k+f2r1|U_{k^{\prime}}^{r-1}|\leq k^{\prime}+\frac{f}{2^{r-1}}.

Let L=kf2r1L=k^{\prime}-\frac{f}{2^{r-1}} and R=k+f2r1R=k^{\prime}+\frac{f}{2^{r-1}}, then (1) and (2) are exactly the conditions of Lemma 3.6. Consider the following two cases:

Case 1: G=M(G)G=M(G^{\prime}). Then k=k+f2rk=k^{\prime}+\frac{f}{2^{r}}. From (p1p1) and (p5p5) of Lemma 3.6, we have:

(1) for each correct process iGi\in G, kf2r<|Vir|k+f2rk-\frac{f}{2^{r}}<|V_{i}^{r}|\leq k+\frac{f}{2^{r}}

(2) |Ukr|k+f2r|U_{k}^{r}|\leq k+\frac{f}{2^{r}}.

Case 2: G=S(G)G=S(G^{\prime}). Then k=kf2rk=k^{\prime}-\frac{f}{2^{r}}. Similarly, from (p2p2) and (p6p6) of Lemma 3.6, we have the same equations.