University of California, Riverside, USA [email protected]://orcid.org/0000-0002-1825-0097 University of California, Riverside, [email protected][orcid] University of California, Riverside, [email protected][orcid] \CopyrightXiao Li, Eric Chan, Mohsen Lesani {CCSXML} <ccs2012> <concept> <concept_id>10010147.10010919.10010172</concept_id> <concept_desc>Computing methodologies Distributed algorithms</concept_desc> <concept_significance>500</concept_significance> </concept> <concept> <concept_id>10010520.10010575.10010578</concept_id> <concept_desc>Computer systems organization Availability</concept_desc> <concept_significance>300</concept_significance> </concept> <concept> <concept_id>10010520.10010575.10010577</concept_id> <concept_desc>Computer systems organization Reliability</concept_desc> <concept_significance>100</concept_significance> </concept> </ccs2012> \ccsdesc[500]Computing methodologies Distributed algorithms \ccsdesc[300]Computer systems organization Availability \ccsdesc[100]Computer systems organization Reliability \EventEditorsRotem Oshman \EventNoEds1 \EventLongTitle37th International Symposium on Distributed Computing (DISC 2023) \EventShortTitleDISC 2023 \EventAcronymDISC \EventYear2023 \EventDateOctober 10-12, 2023 \EventLocationL’Aquila, Italy \EventLogo \SeriesVolume281 \ArticleNo16
Quorum Subsumption for Heterogeneous Quorum Systems
Abstract
Byzantine quorum systems provide higher throughput than proof-of-work and incur modest energy consumption. Further, their modern incarnations incorporate personalized and heterogeneous trust. Thus, they are emerging as an appealing candidate for global financial infrastructure. However, since their quorums are not uniform across processes anymore, the properties that they should maintain to support abstractions such as reliable broadcast and consensus are not well-understood. It has been shown that the two properties quorum intersection and availability are necessary. In this paper, we prove that they are not sufficient. We then define the notion of quorum subsumption, and show that the three conditions together are sufficient: we present reliable broadcast and consensus protocols, and prove their correctness for quorum systems that provide the three properties.
keywords:
Distributed Systems, Impossibility Results, Byzantine fault tolerance1 Introduction
Bitcoin [42] had the promise to democratize the global finance. Globally scattered servers validate and process transactions, and maintain a consistent replication of a ledger. However, the nature of the proof-of-work consensus exhibited disadvantages such as high energy consumption, and low throughput. In contrast, Byzantine replication have always had modest energy consumption. Further, since its advent as PBFT [18], many recent extensions [47, 39, 48, 17, 6, 12, 13] have improved its throughput. However, its basic model of quorums is closed and homogeneous: the set of processes are fixed, and the quorums are assumed to be uniform across processes. Thus, projects such as Ripple [44] and Stellar [38, 33] emerged to bring heterogeneity and openness to Byzantine quorum systems. They let every process declare its own set of quorums, or the processes it trusts called slices, from which quorums are calculated.
In this paper, we first consider a basic model of heterogeneous quorum systems where each process has an individual set of quorums. Then, we consider fundamental questions about their properties. Quorum systems are the foundation of common distributed computing abstractions such as reliable broadcast and consensus. We specify the expected safety and liveness properties for these abstractions. What are the necessary and sufficient properties of heterogeneous quorum systems to support these abstractions? Previous work [34] noted that quorum intersection and weak availability properties are necessary for the quorum system to implement the consensus abstraction. Quorum intersection requires that every pair of quorums overlap at a well-behaved process. The safety of consensus relies on the quorum intersection property of the underlying quorum system: intuitively, if an operation communicates with a quorum, and a later operation communicates with another quorum, a single well-behaved process in their intersection can make the second quorum aware of the first. A quorum system is weakly available for a process if it has a quorum for that process whose members are all well-behaved. Intuitively, the quorum system is available to that process through that quorum. Since a process needs to communicate with at least one quorum to terminate, the liveness properties are dependent on the availability of the quorum system.
The quorum intersection and availability properties are necessary. Are they sufficient as well? In this paper, we prove that they are not sufficient conditions to implement reliable broadcast and consensus. For each abstraction, we present execution scenarios, and apply indistinguishability arguments to show that any protocol violates at least one of the safety or liveness properties. What property should be added to make the properties sufficient? A less known property is quorum sharing [34]. Roughly speaking, every quorum should include a quorum for all its members. This is a property that trivially holds for homogeneous quorum systems where every quorum is uniformly a quorum of all its members. However, in general, it does not hold for heterogeneous quorum systems. Previous work showed that it also holds for Stellar quorums if Byzantine processes do not lie about their slices.
Since Byzantine processes’ quorums is arbitrary, in practice, quorum sharing is too strong. In order to require inclusion only for the quorums of a well-behaved subset of processes, we consider a weaker notion, called quorum subsumption. As we will see, this property lets processes in the included quorum make local decisions while preserving the properties of the including quorum. We precisely capture this property, and show that together with the other two properties, it is sufficient to implement reliable broadcast and consensus abstractions. We present protocols for both reliable broadcast and consensus, and prove that if the underlying quorum system has quorum intersection, availability, and subsumption for certain quorums, then the protocols satisfy the required safety and liveness properties.
In summary, this paper makes the following contributions.
- •
- •
- •
2 Heterogeneous Quorum Systems
A quorum is a subset of processes that are collectively trusted to perform an operation. However, this trust may not be uniform: while a process may trust a part of a system, another process may not trust that same part. In this section, we adopt a general model of quorum systems [32, 34] and its properties. These basic definitions adapt common properties of quorum systems to the heterogeneous setting, and serve as the foundation for theorems and protocols in the later sections. Since we want the theorems to be as strong as possible, we introduce the weak notion of quorum subsumption in this paper.
2.1 Processes and Quorums
Processes and Failures. A quorum system is hosted on a set of processes . For every execution, we can partition the set into Byzantine and well-behaved processes. Well-behaved processes follow the given protocol, while Byzantine processes can deviate from the protocol arbitrarily.
We assume that the network is partially synchronized, i.e., after an unknown global stabilization time (GST), if both the sender and receiver are well-behaved, the message will eventually be delivered with a known bounded delay [20].
Heterogeneous Quorum Systems (HQS). To represent subjective trust, we let each process specify its own quorums. A quorum of process is a non-empty subset of that trusts to get information from if it obtains the same information from each member of . (In practice, a quorum of can contain itself, although the model does not require it.) Each process stores its own set of quorums that we call individual quorums of . Any superset of a quorum of is also a quorum of ; thus, there are minimal quorums: a quorum of is a minimal quorum of if none of its strict subsets is a quorum of . Thus, to avoid redundancy, can ignore its quorums that are proper supersets of its minimal quorums. Thus, each process stores only its individual minimal quorums.
Definition 2.1 (Quorum System).
A heterogeneous quorum system is a mapping from processes to their non-empty set of individual minimal quorums.
Since the trust assumptions of Byzantine processes can be arbitrary, their quorums can be left unspecified. Fig. 1 presents an example quorum system. When obvious from the context, we say quorums of to refer to the individual minimal quorums of , and use to refer to the set of all individual minimal quorums of the system, i.e. the co-domain of . Additionally, we say quorum systems to refer to heterogeneous quorum systems. A process is a follower of a process iff there is a quorum that includes .
In dissemination quorum system (DQS) [37] (and the cardinality-based quorum systems as a special case), quorums are uniform for all processes. Processes have the same set of individual minimal quorums. For example, a quorum system that tolerates Byzantine failures out of processes considers any set of processes as a quorum for all processes.
2.2 Properties
A quorum system is expected to maintain certain properties in order to provide distributed abstractions such as Byzantine reliable broadcast and consensus. Quorum intersection and quorum availability are well-established requirements for quorum systems. In the following section, we will see their adaption to HQS. Further, we identify a new property we call quorum subsumption that helps achieve the aforementioned abstractions on HQS. Finally, we briefly present a few related quorum systems, and their properties.
Quorum Intersection. Processes store and retrieve information from the quorum system by communicating with its quorums. To ensure that information is properly passed from a quorum to another, the quorum system is expected to maintain a well-behaved process at the intersection of every pair of quorums. For example, in the running example in Fig. 1, all the quorums of well-behaved processes intersect at at least one of well-behaved processes in .
Definition 2.2 (Quorum Intersection).
A quorum system has quorum intersection iff every pair of quorums of well-behaved processes in intersect at a well-behaved process, i.e.,
Quorum Availability. In order to support progress for a process, the quorum system is expected to have at least one quorum for that process whose members are all well-behaved. We say that the quorum system is weakly available for that process. (In the literature, this notion of availability is often unqualified, but we explicitly contrast the weak notion to the strong notion that we will define.) In classical quorum systems, any quorum is a quorum for all processes. This guarantees that if the quorum system is available for a process, it is available for all processes. However, this is obviously not true in a heterogeneous quorum system where quorums are not uniform. In this setting, we weaken the availability property so that it requires only a subset and not necessarily all well-behaved processes to have a well-behaved quorum. In Fig. 1, is available for the set : the quorum of process , and the quorum of processes and make them weakly available. Each process in that subset can always communicate with a quorum independently of Byzantine processes.
Definition 2.3 (Weak Availability).
A quorum system is weakly available for a set of processes iff every process in has at least one quorum that is a subset of well-behaved processes . A quorum system is available iff it is available for a non-empty set of processes.
If a quorum system is weakly available, there is at least one well-behaved process that can communicate with a quorum independently of Byzantine processes.
With quorum availability introduced, we can consider when a quorum system is unavailable. A quorum system is unavailable for a process when that process has no quorum in , i.e., the Byzantine processes can block every one of its quorums. We generalize this idea in the notion of blocking.
Definition 2.4 (Blocking Set).
A set of processes is a blocking set for a process (or is -blocking) if intersects every quorum of .
For example, consider cardinality-based quorum systems where the system contains processes. Any set of size is a blocking set for all well-behaved processes, since a set with processes intersects with any quorum, a set with processes. In Fig. 1, well-behaved process is blocked by , since its only quorum intersect with
Notice also that the definition does not stipulate that the blocking set is Byzantine, but rather it is more general. The concept of blocking will be useful for designing our protocols in (§ 6). For now, we prove a lemma for blocking sets. In order to state the lemma, we generalize the notion of availability. Given a set of processes , we generalize availability for at the complete set of well-behaved processes (Definition 2.3) to availability for at a subset of well-behaved processes. We say that a quorum system is weakly available for a set of processes at a subset of well-behaved processes iff every process in has at least one quorum that is a subset of .
Lemma 2.5.
In every quorum system that is weakly available for a set of processes at , every blocking set of every process in intersects .
Proof 2.6.
Consider a quorum system that is weakly available for at , a process in , and a set of processes that blocks . By the definition of available, there is at least one quorum of that is a subset of . By the definition of blocking set (Definition 2.4), intersects with . Hence, intersects as well.
Quorum subsumption. We now introduce the notion of quorum subsumption.
Definition 2.7 (Quorum Subsumption).
A quorum system is quorum subsuming for a quorum iff every process in has a quorum that is included in , i.e., . We say that is quorum subsuming for a set of quorums if it is quorum subsuming for each quorum in the set.
In Fig. 1, is quorum subsuming for : both members in this quorum have the quorum that is trivially a subset of itself. However, is not quorum subsuming for process ’s quorum : process ’s only quorum is not a subset of .
sender | 1 | 2 | 3 | 4 |
---|---|---|---|---|
blocked forever |
Quorum subsumption is inspired by and weakens the notion of quorum sharing [34]. Quorum sharing requires the above subsumption property for all quorums. Thus, many quorum systems including Ripple and Stellar do not satisfy it (unless Byzantine processes do not lie about their slices [34].) They can maintain the subsumption property only for quorums of a well-behaved subset of processes. In particular, no requirement can be made for quorums of Byzantine processes. Therefore, we define the weaker notion of quorum subsumption for a subset of quorums, and later show that it is sufficient to implement broadcast and consensus.
In order to make progress, protocols (such as Bracha’s Byzantine reliable broadcast [9]) require the members of a quorum to be able to communicate with at least one of their own quorums, or communicate with a subset of processes that contains at least one well-behaved process. Let us see intuitively how quorum subsumption can support liveness properties. Consider a quorum system for processes where the Byzantine processes are , and , , and . The quorum system has quorum intersection, and is weakly available for the set since there is a well-behaved quorum for the process . In the classic Bracha protocol, the sender broadcasts , a well-behaved broadcasts when it receives it from the sender, it broadcasts after receiving or messages, and finally, delivers if it receives messages. In Stellar [33] and follow-up works [34, 24, 15], the check for receiving messages from processes is replaced with receiving messages from a blocking set of the current process. Let’s consider the example execution presented in Table 1; it gives an intuition of why the quorum system needs stronger conditions than weak availability. Consider a Byzantine sender who sends to process . Well-behaved process sends out to each other. We let process deliver messages from process , , and first; it then sends out messages. We note that the two processes , and cannot broadcast since they have not received from a quorum of their own. Then the Byzantine process sends messages to process . Since the set is blocking for the quorums of both processes and , both send out messages. These broadcast protocols prevent a process that is ready for a value from getting ready for another value. Therefore, although and are both blocking sets for the process , it cannot become ready for . Process never receives enough Thus, messages for either or to deliver a message, and is blocked forever. If the quorum for had the quorum subsumption property, then and could send out messages, and eventually would make progress.
Complete Quorum. We will later see that quorum availability and quorum subsumption are important together for liveness. We succinctly combine the two properties into the notion of complete quorums.
Definition 2.8 (Complete Quorum).
A quorum in a quorum system is a complete quorum if all its members are well-behaved, and is quorum subsuming for .
In our previous running example Fig. 1, quorum is a complete quorum: both of its members are well-behaved and is quorum subsuming for .
Definition 2.9 (Strong Availability).
A quorum system has strong availability for a subset of processes iff every process in has at least one complete quorum. We call a strongly available set for , and call a member of a strongly available process. We say that is strongly available if it is strongly available for a non-empty set.
Intuitively, operations stay available at a strongly available process since its complete quorum can perform operations on his behalf in the face of Byzantine attacks. In Fig. 1, is strongly available for . In contrast, is only weakly available for process , since its quorum includes that is not well-behaved, and its other quorum is well-behaved but not a complete quorum.
By Lemma 2.5, every blocking set of every strongly available process contains at least one well-behaved process.
3 Protocol Implementation
In the subsequent sections, we will see that it is impossible to construct a protocol for Byzantine reliable broadcast and consensus in an HQS given only quorum intersection and quorum availability. After that, we give a protocol for Byzantine reliable broadcast and consensus for an HQS that has quorum intersection and strong availability. We first need a model of quorum-based protocols, and then the exact specifications of the distributed abstractions we aim to design protocols for. In this section, we consider the former.
We consider a modular design for protocols. A protocol is captured as a component that accepts request events and issues response events. A component uses other components as sub-components: it issues requests to them and accepts responses from them. A component stores a state and defines handlers for incoming requests from the parent component, and incoming responses from children components. Each handler gets the pre-state and the incoming event as input, and outputs the post-state and outgoing events, either as responses to the parent or requests to the children components. The outputs of a handler can be deterministically a function of its inputs, or randomized.
Definition 3.1 (Determinism).
A protocol is deterministic iff the outputs of its handlers are a function of the inputs.
Quorum-based Protocols. A large class of protocols are implemented based on quorum systems. In order to state impossibility results for these protocols, we capture the properties of quorum-based protocols [34, 29] as a few axioms. Our impossibility results concern protocols that adhere to the necessity, sufficiency, and locality axioms.
A process in a quorum-based protocol should process a request only if it can communicate with at least one of its quorums.
Axiom 1 (Necessity of Quorums [34]).
If a well-behaved process issues a response for a request then there must be a quorum of such that receives at least one message from each member of .
In a quorum-based protocol, a process only needs the participation of itself and members of one of its quorums to deliver a message.
Axiom 2 (Sufficiency of Quorums).
For every execution where a well-behaved process issues a response, there exists an execution where only and a quorum of take steps, and eventually issues the same response.
We add a remark for Byzantine reliable broadcast (BRB) which has a designated sender process. We will use a slight variant of the sufficiency axiom for BRB that states that there exists an execution where only the sender, and a quorum of take steps.
A process’s local state is only affected by the information that it receives from the members of it’s quorums.
Axiom 3 (Locality).
The state of a well-behaved process changes upon receiving a message only if the sender is a member of one of its quorums.
For BRB, we will use a slight variant of the locality axiom that allows processes change state upon receiving messages from the sender in addition to members of quorums.
4 Protocol Specification
We now define the specification of reliable broadcast and consensus for HQS. The liveness properties are weaker than classical notions since in an HQS, availability might be maintained only for a subset of well-behaved processes.
Reliable Broadcast. We now define the specification of the reliable broadcast abstraction. The abstraction accepts a single broadcast request from a designated sender (either in the system or a process that is separate from the other processes in system), and issues delivery responses.
Definition 4.1 (Specification of Reliable Broadcast).
-
•
(Validity for a set of well-behaved processes ). If a well-behaved process broadcasts a message , then every process in eventually delivers .
-
•
(Integrity). If a well-behaved process delivers a message from a well-behaved sender , then was previously broadcast by .
-
•
(Totality for a set of well-behaved processes ). If a message is delivered by a well-behaved process, then every process in eventually delivers a message.
-
•
(Consistency). No two well-behaved processes deliver different messages.
-
•
(No duplication). Every well-behaved process delivers at most one message.
We also consider a variant of reliable broadcast called federated voting. Similar to reliable broadcast, the abstraction accepts a broadcast request from processes, and issues delivery responses. In contrast to reliable broadcast where there is a dedicated sender, in federated voting, every process can broadcast a message. The specification of federated voting is similar to that of reliable broadcast except for validity. The messages that well-behaved processes broadcast may not be the same. Therefore, the validity property provides guarantees only when the messages are the same or there is only one sender. The validity property for a set of well-behaved processes guarantees that if all well-behaved processes broadcast a message , or only one well-behaved process broadcasts a message , then every process in eventually delivers .
Consensus. We now consider the specification of the consensus abstraction. It accepts propose requests from processes in the system, and issues decision responses.
Definition 4.2 (Specification of Consensus).
-
•
(Validity). If all processes are well-behaved, and some process decides a value, then that value was proposed by some process.
-
•
(Agreement). No two well-behaved processes decide differently.
-
•
(Termination for a set of well-behaved processes ). Every process in eventually decides.
5 Impossibility
We now present the impossibility results for consensus and Byzantine Reliable Broadcast (BRB). It is known that quorum intersection and quorum availability are necessary conditions [34] to implement consensus and BRB protocols. In this section, we show that while these two conditions are necessary, they are not sufficient.
We consider the information-theoretic settings (Fault axiom [21]), where byzantine processes have unlimited computational power, and can show arbitrary behavior. However, processes communicate only over secure channels so that the recipient knows the identity of the sender. A Byzantine process is unable to impersonate a well-behaved process. This is similar to the classic unauthenticated Byzantine general problem [30], and is necessary for open decentralized blockchains and HQS, where the trusted authorities including public key infrastructures may not be available.
The two proofs will take a similar approach. First, we assume there does exist a protocol for our distributed abstraction that satisfies all the desired specifications. We then present a quorum system and consider its executions that have quorum intersection and availability in the face of Byzantine attacks. We then show through a series of indistinguishable executions that the protocol cannot satisfy all the desired specifications, leading to a contradiction. The high-level idea is that in the information-theoretic setting, a well-behaved process is not able to distinguish between an execution where the sender is Byzantine and sends misleading messages, and an execution where the relaying process is Byzantine and forwards misleading messages. For example, let and be three processes in the system. When receives conflicting messages from through , it does not know whether or is Byzantine. This eventually leads to violation of the agreement or validity property of the abstraction.
We consider binary proposals for consensus, and binary values (from the sender) for reliable broadcast. For the consensus abstraction, we succinctly present the values that processes propose as as a vector of values that we call a configuration. If the initial value of a process is in the configuration, that process is considered Byzantine. Otherwise, the process is well-behaved. For example, a configuration denotes the first and second process proposing zero and the third process being Byzantine.
5.1 Consensus
We first consider consensus protocols in HQS.
Theorem 5.1.
Quorum intersection and weak availability are not sufficient for deterministic quorum-based consensus protocols to provide validity, agreement and termination for weakly available processes.
Proof 5.2.
We suppose there is a quorum-based consensus protocol that guarantees validity, agreement, and termination for every quorum system with quorum intersection and weak availability, towards contradiction. Consider a quorum system for processes with the following quorums: , , .
We make the following observations: (1) if all processes are well-behaved, then has quorum intersection and weak availability for , (2) if only process is Byzantine, then preserves quorum intersection, and weak availability for , (3) if only process is Byzantine, then preserves quorum intersection, and weak availability for . Going forward, we implicitly assume termination for weakly available processes.
Now consider the following four configurations as shown in Fig. 2: , , , and . The goal is now to show a series of executions over the configurations so that at least one property of the protocol is violated.
-
•
We begin with execution (shown in red) with the initial configuration . All the messages between and are delivered. By termination for weakly available processes and validity, process decides 0. Additionally, by quorum sufficiency, can reach this decision with only processes taking steps.
-
•
Next, we have execution (shown in blue) with initial configuration . All the messages between and are delivered. Again, by termination for weakly available processes and validity, process decides 1. By quorum sufficiency, can reach this decision with only processes taking steps.
-
•
Next, we have execution as a sequence of and , with initial configuration . Suppose messages between well-behaved processes and are delayed. Byzantine process first replays with process , then replays with process . This cause process to decide . Now let Byzantine process stay silent, and messages between processes and be delivered. By termination for , agreement and quorum sufficiency, process makes decide as well (shown in green).
-
•
Lastly, we have execution with initial configuration . Suppose messages between to are delivered in the beginning. We let processes replay ; thus, decides 1. Then, Byzantine process sends messages to as if it were at the end of . In turn, decides 0. Thus, agreement is violated as two well-behaved processes decided differently.
Indistinguishably
We provide some intuition for the proof construction. Ultimately, the problem lies in process not being able to distinguish whether process or process is the Byzantine process. More specifically, both and begin with execution . Since process cannot distinguish between the two executions, it does not know which value to decide. If process believes is the actual execution, then should decide 0 to agree with the decision of well-behaved process . However, if is the actual execution, then agreement is violated as process decided 1. Conversely, if process believes is the actual execution, then should decide 1 to agree with the decision of well-behaved process . Then, if is the actual execution, agreement is violated as the well-behaved process decided 0.
We note that this proof could not be constructed if there was quorum subsumption. For example, if the process adds the quorum , then will have quorum subsumption for the quorum of . However, then by quorum subsumption, there will be no Byzantine process, and the executions and cannot be constructed. If the process adds the quorum , then it will have quorum subsumption. However, then the process cannot Byzantine process anymore, and the executions cannot be constructed. Similarly, if the process adds the quorum , the executions cannot be constructed.
5.2 Byzantine Reliable Broadcast
Now, we prove the insufficiency of quorum intersection and quorum availability for Byzantine reliable broadcast.
For the reliable broadcast abstraction, we represent the initial configuration as an array of values received by the processes from the sender. The sender is a fixed and external process in the executions, and is only used to assign input values for processes in the system, which are captured as the initial configurations. The sender does not take steps in the executions, and processes are not able to distinguish executions based on the sender.
Theorem 5.3.
Quorum intersection and weak availability are not sufficient for deterministic quorum-based reliable broadcast protocols to provide validity and totality for weakly available processes, and consistency.
Proof 5.4.
The proof is similar to the proof for consensus. In fact, we will reuse the construction. There are differences between reliable broadcast and consensus specifications in (1) their validity properties, and (2) their totality and termination properties respectively. The proof can be adjusted for these differences. For reliable broadcast, we need a sender process who broadcasts a message. In executions that we want a well-behaved process to deliver the message , we either (1) keep the sender well-behaved and have it send , and then apply validity, or (2) have a process deliver , then apply totality and consistency. The initial configuration represents values received by each process from the sender.
Executions follow those in the previous proof. Message delivery and delays mirror the previous executions. In execution for configuration , the well-behaved sender broadcasts 0, and messages between processes and are delivered. By validity for weakly available processes, process delivers 0, and by quorum sufficiency, only processes need to take steps. In execution for configuration , the well-behaved sender broadcasts 1, and messages between processes and are delivered. By validity for weakly available processes, and quorum sufficiency, process delivers 1, only with taking steps. In configurations and , the sender is Byzantine. The messages between processes and are delayed in the beginning. In execution for configuration , the Byzantine sender and Byzantine process replay with process , then replay with process . Then Byzantine process stays silent, and messages between processes and are delivered. By totality for weakly available processes, since process delivers 0, then process will also deliver a value. By consistency, process delivers 0 as well. In the last execution for configuration , we let the Byzantine process stay silent in the beginning, and processes and replay . Thus, process delivers 1. Afterwards, messages between process and are delayed, and the Byzantine process replays . Again, process cannot distinguish between the two executions and . Since process sends the exact same messages to process as the end of , process will deliver 0. Thus, consistency between and is violated.
6 Protocols
We just showed that quorum intersection and availability are not sufficient to implement our desired distributed abstractions. Now, we show that quorum intersection and strong availability, our newly introduced property are sufficient to implement both Byzantine reliable broadcast and consensus.
6.1 Reliable Broadcast Protocol
In Alg. 1, we adapt the Bracha protocol [9] to show that quorum intersection and strong availability together are sufficient for Byzantine reliable broadcast. The parts that are different from the classical protocol are highlighted in blue.
Each process stores the set of its individual minimal quorums , and its set of followers . It also stores the boolean flags , , and which record actions the process has taken to avoid duplicate actions. It further uses point-to-point links to each of its followers. Upon receiving a request to broadcast a value (at Alg. 1), the sender broadcasts the value to all processes (at Alg. 1). Upon receiving the message from the sender (at Alg. 1), a well-behaved process echoes the message among its followers (at Alg. 1) only if it has not already . When a well-behaved process receives a quorum of consistent echo messages (at Alg. 1), it sends ready messages to all its followers (at Alg. 1). A well-behaved process can also send a ready message when it receives consistent ready messages from a blocking set (at Alg. 1). When a well-behaved process receives a quorum of consistent ready messages for (at Alg. 1), it delivers (at Alg. 1). The implementation of the federated voting abstraction is similar. The only difference is that there can be multiple senders (at Alg. 1).
We prove that this protocol implements Byzantine reliable broadcast when the quorum system satisfies quorum intersection, and strong availability. We remember that strong availability requires both weak availability and quorum subsumption. More precisely, it requires a well-behaved quorum for a process , and quorum subsumption for .
Theorem 6.1.
Quorum intersection and strong availability are sufficient to implement Byzantine reliable broadcast.
This theorem follows from five lemmas in the appendix [31] that prove the protocol satisfies the specification of Byzantine reliable broadcast that we defined in Definition 4.1. Consider a quorum system with quorum intersection, and strong availability for . Here, we state and prove only the validity property.
Lemma 6.2.
The BRB protocol guarantees validity for .
Proof 6.3.
Consider a well-behaved sender that broadcasts a message . We show that every process in eventually delivers . By availability, every process has a complete quorum . Consider a process . By quorum subsumption, has a quorum . By availability, all members of (including ) are well-behaved. Thus, when they receive from the sender, they all echo it to their followers. The processes in have as a follower. Thus, receives consistent echo messages for from one of its quorums . Thus, sends out ready messages for to its followers. Thus, all processes in send out ready messages for to their followers. The processes in have as a follower. Therefore, receives a quorum of consistent ready messages for from one of its quorums , and delivers .
6.2 Byzantine Consensus Protocol
In this section, we show that quorum intersection and strong availability are sufficient to implement Byzantine consensus. We first present the consensus protocol for heterogeneous quorum systems, and then prove its correctness.
At a high level, the protocol proceeds in rounds with assigned leaders for each. Ballots that carry proposal values are totally ordered. A leader tries to commit its own candidate ballot only after aborting any lower ballot in the system. Leaders use the federated voting abstraction (that we saw in § 4) to abort or commit ballots. There may be multiple leaders or Byzantine leaders before GST, and they may broadcast contradicting abort and commit messages for the same ballot. However, by the consistency property of federated voting, processes agree on aborting or committing ballots.
A ballot is a pair of a round number and a proposed value . Ballots are totally ordered by first their round numbers, and then their values: a ballot is below another , written as , if or . Two ballots and are compatible, , if they have the same value, i.e., ; otherwise, they are incompatible, . We say that a ballot is below and incompatible with another, , if and . For message passing communication, we assume batched network semantics (BNS), where messages issued in an event are sent as a batch, and the receiving process delivers and processes the batch of messages together. (In particular, as we will see later in the correctness proofs, if prepare messages that are sent together are not processed together the validity property can be violated.)
The protocol is similar to SCP [38, 25] in structure; the important difference is that this protocol uses leaders [34] and guarantees termination. Our protocol guarantees termination regardless of Byzantine processes. On the other hand, the SCP protocol guarantees a liveness property called non-blocking which requires Byzantine processes to stop. (More precisely, if a process in the intact set [38, 24] has not yet decided in some execution, then for every continuation of that execution in which all the Byzantine processes stop, the process eventually decides.)
Each process stores four local variables: is the current round number, is the ballot that the process tries to commit, is the ballot that the process is safe to discard any ballots lower and incompatible with, and is the current leader. Each process uses an instance of federated voting for each ballot, and an eventual leader election module. The latter issues events, and eventually elects a well-behaved process as the leader. (Previous work [34] presented a probabilistic leader election module.)
Upon receiving a proposal request (at Alg. 2), a well-behaved process initializes its candidate ballot to the pair of the first round and its own proposal (at Alg. 2). If the current process is the leader, it tries to prepare its by broadcasting abort messages for all ballots with (at Alg. 2). When a well-behaved process delivers messages from the leader for all ballots below and incompatible with some ballot , and its current ballot is below (at Alg. 2), it sets to (at Alg. 2). If the current process is the leader, and the ballot is equal to the ballot, then it broadcasts a commit message for its ballot (at Alg. 2). When a well-behaved process delivers a message for a ballot from the leader, and it has already prepared the same ballot (at Alg. 2), it decides the value of that ballot (at Alg. 2).
To ensure liveness, a well-behaved process triggers a timeout if no value is decided after a predefined time elapses in each round. The process then complains to the leader election module (at Alg. 2). When the leader election module issues a new leader (at Alg. 2), a well-behaved process updates its variable, and increments the number (at Alg. 2). The leader itself then waits for a time (at Alg. 2) which we will further explain below. The process also resets the timer with a doubled timeout for the next round (at Alg. 2). It then updates the ballot: if no value is prepared before, the ballot is updated to the new round number and the value of the current (at Alg. 2); otherwise, it is updated to the new round number and the value of the ballot (at Alg. 2). Then, the leader tries to prepare the by aborting below and incompatible ballots similar to the steps above (at Alg. 2).
Let us now explain why delay is needed for termination. Without this delay, a Byzantine leader can perform a last minute attack that we illustrate in Fig. 3. Consider that we have four processes, one of them is Byzantine, and any set of three processes is a quorum. Let the Byzantine process be the leader , and let the ballot be prepared. The leader sends a commit for ballot to one well-behaved process . Then, echos commit for . Then, the timeout for happens, and the next well-behaved leader comes up. Without the delay, may have not prepared yet (although other well-behaved processes and prepared it). Therefore, the ballot that updates its candidate to (at Alg. 2) is not , and may not be compatible with . In order to prepare , the leader tries to abort (at Alg. 2) but cannot be aborted: in order to abort , a quorum of processes should echo it. However, the well-behaved process has already echoed commit, and if the Byzantine process remains silent, the remaining two well-behaved processes and are not a quorum, and cannot abort . Therefore, cannot succeed, and the timeout is triggered. Further, if the next leader is the Byzantine process again, it can repeat the above scenario: it can abort to prepare a higher ballot , and make a well-behaved process echo commit for , before passing the leadership. The attack can continue infinitely, and delay termination. If the delay is larger than the bounded communication delay after GST, it makes the leader observe the highest prepared ballot , and adopt its value as the value of its candidate (at Alg. 2). When it tries to commit , since it is compatible with , it does not need abort it. Therefore, it can prepare and commit , and decide. We also note that instead of the delay , the above attack can be avoided if the leader election can provide two successive well-behaved leaders.
Theorem 6.4.
Quorum intersection and strong availability are sufficient to implement consensus.
This theorem follows from three lemmas in the appendix [31] that prove that the protocol satisfies the specification of Byzantine consensus that we defined in Definition 4.2. An example execution of the protocol is described in the appendix [31].
7 Related Works
Quorum Systems with Heterogeneous Trust. Ripple [44] and Cobalt [35] pioneered decentralized trust. They let each node specify a list, called the unique node list (UNL), of processes that it trusts. However, they do not consider quorum availability or subsumption.
Stellar [38, 33] presents federated Byzantine quorum systems (FBQS) [24, 25] where quorums are iteratively calculated from quorums slices. Stellar also presents a federated voting and consensus protocol. In comparison, the assumptions of the protocols presented in this paper are weaker, and their guarantees are stronger. The stellar consensus protocol (SCP) guarantees termination when Byzantine processes stop. In contrast, the consensus protocol in this paper guarantees termination regardless of Byzantine processes. Further, abstract SCP [24] provides agreement only for intact processes. The intact set for an FBQS is a subset of processes that have strong availability. On the other hand, the consensus protocol in this paper provides agreement for all well-behaved processes. In FBQS, the intersections of quorums should have a process in the intact set; however, in HQS, they only need to have a well-behaved process. The validity and totality properties for the reliable broadcast for FBQS are restricted to the intact set. On the other hand, the reliable broadcast protocol in this paper provides totality for all processes that have weak availability, and validity for all processes that have strong availability.
Personal Byzantine quorum systems (PBQS) [34] capture the quorum systems that FBQSs derive form slices, and propose a responsiveness consensus protocol [48, 1, 43, 3]. They define a notion called quorum sharing which requires quorum subsumption for every quorum. Stellar quorums have quorum sharing if and only if processes do not lie about their slices. (The appendix [31] presents examples.) In this paper, we relax quorum sharing to quorum subsumption, and capture quorums that FBQSs derive even when Byzantine quorums lie about their slices, and show that even if a quorum system does not satisfy quorum sharing, safety can be maintained for all processes, and liveness can be maintained for the set of strongly available processes.
Asymmetric Byzantine quorum systems (ABQS) [15, 16, 4] allow each process to define a subjective dissemination quorum system (DQS), in a globally known system. The followup model [14] lets each process specify a subjective DQS for processes that it knows, transitively relying on the assumptions of other processes. In contrast, HQS lets each process specify its own set of quorums without knowing the quorums of other processes. Further, it does not require the specification of a set of possible Byzantine sets. Further, there are systems where a strongly available set (from HQS) exists but no guild set (from ABQS) exists. (The appendix [31] presents examples.) Therefore, HQS can provide safety and liveness for those executions but ABQS cannot. ABQS presents shared memory and broadcast protocols, and further, rules to compose two ABQSs. On the other hand, this paper proves impossibility results, and presents protocols for reliable broadcast and consensus abstractions. HQS provides strictly stronger guarantees with weaker assumptions. In ABQS, the properties of reliable broadcast are stated for wise processes and the guild. However, this paper states these four properties for well-behaved processes and the strongly available set. Well-behaved processes are a superset of wise processes, and as noted above, in certain executions, the strongly available set is a superset of the guild.
Flexible BFT [36] allows different failure thresholds between learners. Heterogeneous Paxos [45, 46] further generalizes the separation between learners and acceptors with different trust assumptions; it specifies quorums as sets rather than number of processes. These two projects introduce a consensus protocol that guarantees safety or liveness for learners with correct trust assumptions. However, they require the knowledge of all processes in the system. In contrast, HQS only requires partial knowledge of the system, and captures the properties of quorum systems where reliable broadcast and consensus protocols are impossible or possible. Multi-threshold reliable broadcast and consensus [27] and MT-BFT [40] elaborate Bracha [9] to have different fault thresholds for different properties, and different synchrony assumptions. However, they have cardinality-based or uniform quorums across processes. In contrast, HQS supports heterogeneous quorums.
K-consistent reliable broadcast (K-CRB) [7] introduces a relaxed reliable broadcast abstraction where the correct processes can define their own quorum systems. Given a quorum system, it focuses on delivering the smallest number of different values. In contrast, we propose the weakest condition to solve classical reliable broadcast and consensus. Moreover, K-CRB’s relaxed liveness guarantee (accountability) requires public key infrastructure. In contrast, all the results in this paper are for information-theoretic setting.
Our consensus protocol uses eventual leader election. Several other works present view synchronization and eventual leader election for Byzantine replicated systems [11, 10], and dynamic networks [41, 28]. It is interesting to see if their leader election modules can be generalized to the heterogeneous setting, and support responsiveness [48, 5] for our consensus protocol.
Impossibility Results. There are two categories of assumptions about the computational power of Byzantine processes. In the information-theoretic setting, Byzantine process have unlimited computational resources. While in the computational setting, Byzantine processes can not break a polynomial-time bound [23]. In this work, our impossibility results for reliable broadcast and consensus fall in the information-theoretic category. Whether the same results hold in the computational setting is an interesting open question.
FLP [22] proved that consensus is not solvable in asynchronous networks even with one crash failure. Many following works [26, 19, 2, 21, 30, 8] considered solvability, and necessary and sufficient conditions for consensus and reliable broadcast to tolerate Byzantine failures in partially synchronous networks. The number of processes should be more than and the connectivity of the communication graph should be more than . However, these results apply for cardinality-based quorums, which is a special instance of HQS. We generalize the reliable broadcast and consensus abstractions to HQS which supports non-uniform quorums, and prove impossibility results for them.
8 Conclusion
This paper presented a general model of heterogeneous quorum systems where each process defines its own set of quorums, and captured their properties. Through indistinguishably arguments, it proved that no deterministic quorum-based protocol can implement the consensus and Byzantine reliable broadcast abstractions on a heterogeneous quorum system that provides only quorum intersection and availability. It introduced the quorum subsumption property, and showed that the three conditions together are sufficient to implement the two abstractions. It presented Byzantine broadcast and consensus protocols for heterogeneous quorum systems, and proved their correctness when the underlying quorum system maintain the three properties.
Acknowledgments
We would like to thank DISC ’23 reviewers for detailed and constructive reviewers. Further, we would like to specially thank Giuliano Losa for his insightful comments.
References
- [1] Ittai Abraham and Gilad Stern. Information theoretic hotstuff. arXiv preprint arXiv:2009.12828, 2020.
- [2] Marcos Kawazoe Aguilera, Carole Delporte-Gallet, Hugues Fauconnier, and Sam Toueg. Consensus with byzantine failures and little system synchrony. In International Conference on Dependable Systems and Networks (DSN’06), pages 147–155. IEEE, 2006.
- [3] Dan Alistarh, James Aspnes, Faith Ellen, Rati Gelashvili, and Leqi Zhu. Why extension-based proofs fail. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 986–996, 2019.
- [4] Orestis Alpos, Christian Cachin, and Luca Zanolini. How to trust strangers: Composition of byzantine quorum systems. In 2021 40th International Symposium on Reliable Distributed Systems (SRDS), pages 120–131. IEEE, 2021.
- [5] Hagit Attiya, Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Bounds on the time to reach agreement in the presence of timing uncertainty. Journal of the ACM (JACM), 41(1):122–152, 1994.
- [6] Mathieu Baudet, Avery Ching, Andrey Chursin, George Danezis, Francois Garillot, Zekun Li, Dahlia Malkhi, Oded Naor, Dmitri Perelman, and Alberto Sonnino. State machine replication in the libra blockchain. The Libra Assn., Tech. Rep, 7, 2019.
- [7] João Paulo Bezerra, Petr Kuznetsov, and Alice Koroleva. Relaxed reliable broadcast for decentralized trust. In Networked Systems: 10th International Conference, NETYS 2022, Virtual Event, May 17–19, 2022, Proceedings, pages 104–118. Springer, 2022.
- [8] Malte Borcherding. Levels of authentication in distributed agreement. In International Workshop on Distributed Algorithms, pages 40–55. Springer, 1996.
- [9] Gabriel Bracha and Sam Toueg. Asynchronous consensus and broadcast protocols. Journal of the ACM (JACM), 32(4):824–840, 1985.
- [10] Manuel Bravo, Gregory Chockler, and Alexey Gotsman. Liveness and latency of byzantine state-machine replication. In 36th International Symposium on Distributed Computing (DISC 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
- [11] Manuel Bravo, Gregory Chockler, and Alexey Gotsman. Making byzantine consensus live. Distributed Computing, 35(6):503–532, 2022.
- [12] Ethan Buchman. Tendermint: Byzantine fault tolerance in the age of blockchains. PhD thesis, University of Guelph, 2016.
- [13] Ethan Buchman, Rachid Guerraoui, Jovan Komatovic, Zarko Milosevic, Dragos-Adrian Seredinschi, and Josef Widder. Revisiting tendermint: Design tradeoffs, accountability, and practical use. In 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks-Supplemental Volume (DSN-S), pages 11–14. IEEE, 2022.
- [14] Christian Cachin, Giuliano Losa, and Luca Zanolini. Quorum systems in permissionless network. arXiv preprint arXiv:2211.05630, 2022.
- [15] Christian Cachin and Björn Tackmann. Asymmetric distributed trust. In 23rd International Conference on Principles of Distributed Systems (OPODIS 2019). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
- [16] Christian Cachin and Luca Zanolini. From symmetric to asymmetric asynchronous byzantine consensus. arXiv preprint arXiv:2005.08795, 2020.
- [17] Harold Carr, Christa Jenkins, Mark Moir, Victor Cacciari Miraldo, and Lisandra Silva. Towards formal verification of hotstuff-based byzantine fault tolerant consensus in agda. In NASA Formal Methods: 14th International Symposium, NFM 2022, Pasadena, CA, USA, May 24–27, 2022, Proceedings, pages 616–635. Springer, 2022.
- [18] Miguel Castro, Barbara Liskov, et al. Practical byzantine fault tolerance. In OSDI, volume 99, pages 173–186, 1999.
- [19] Carole Delporte-Gallet, Hugues Fauconnier, Rachid Guerraoui, Vassos Hadzilacos, Petr Kouznetsov, and Sam Toueg. The weakest failure detectors to solve certain fundamental problems in distributed computing. In Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, pages 338–346, 2004.
- [20] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM (JACM), 35(2):288–323, 1988.
- [21] Michael J Fischer, Nancy A Lynch, and Michael Merritt. Easy impossibility proofs for distributed consensus problems. Distributed Computing, 1(1):26–39, 1986.
- [22] Michael J Fischer, Nancy A Lynch, and Michael S Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32(2):374–382, 1985.
- [23] Juan Garay and Aggelos Kiayias. Sok: A consensus taxonomy in the blockchain era. In Cryptographers’ track at the RSA conference, pages 284–318. Springer, 2020.
- [24] Álvaro García-Pérez and Alexey Gotsman. Federated byzantine quorum systems. In 22nd International Conference on Principles of Distributed Systems (OPODIS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
- [25] Álvaro García-Pérez and Maria A Schett. Deconstructing stellar consensus (extended version). arXiv preprint arXiv:1911.05145, 2019.
- [26] Guy Goren, Yoram Moses, and Alexander Spiegelman. Probabilistic indistinguishability and the quality of validity in byzantine agreement. arXiv preprint arXiv:2011.04719, 2020.
- [27] Martin Hirt, Ard Kastrati, and Chen-Da Liu-Zhang. Multi-threshold asynchronous reliable broadcast and consensus. Cryptology ePrint Archive, 2020.
- [28] Rebecca Ingram, Patrick Shields, Jennifer E Walter, and Jennifer L Welch. An asynchronous leader election algorithm for dynamic networks. In 2009 IEEE International Symposium on Parallel & Distributed Processing, pages 1–12. IEEE, 2009.
- [29] Leslie Lamport. Lower bounds for asynchronous consensus. Distributed Computing, 19:104–125, 2006.
- [30] Leslie Lamport, Robert Shostak, and Marshall Pease. The byzantine generals problem. ACM Transactions on Programming Languages and Systems, pages 382–401, 1982.
- [31] Xiao Li, Eric Chan, and Mohsen Lesani. Quorum subsumption for heterogeneous quorum systems. technical report. In International Symposium on Distributed Computing (DISC 2023), 2023.
- [32] Xiao Li and Mohsen Lesani. Open heterogeneous quorum systems, 2023. arXiv:2304.02156.
- [33] Marta Lokhava, Giuliano Losa, David Mazières, Graydon Hoare, Nicolas Barry, Eli Gafni, Jonathan Jove, Rafał Malinowsky, and Jed McCaleb. Fast and secure global payments with stellar. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, pages 80–96, 2019.
- [34] Giuliano Losa, Eli Gafni, and David Mazières. Stellar consensus by instantiation. In 33rd International Symposium on Distributed Computing (DISC 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
- [35] Ethan MacBrough. Cobalt: Bft governance in open networks. arXiv preprint arXiv:1802.07240, 2018.
- [36] Dahlia Malkhi, Kartik Nayak, and Ling Ren. Flexible byzantine fault tolerance. In Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, pages 1041–1053, 2019.
- [37] Dahlia Malkhi and Michael Reiter. Byzantine quorum systems. Distributed computing, 11(4):203–213, 1998.
- [38] David Mazieres. The stellar consensus protocol: A federated model for internet-level consensus. Stellar Development Foundation, 32:1–45, 2015.
- [39] Andrew Miller, Yu Xia, Kyle Croman, Elaine Shi, and Dawn Song. The honey badger of bft protocols. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pages 31–42, 2016.
- [40] Atsuki Momose and Ling Ren. Multi-threshold byzantine fault tolerance. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pages 1686–1699, 2021.
- [41] Achour Mostefaoui, Michel Raynal, Corentin Travers, Stacy Patterson, Divyakant Agrawal, and Amr EL Abbadi. From static distributed systems to dynamic systems. In 24th IEEE Symposium on Reliable Distributed Systems (SRDS’05), pages 109–118. IEEE, 2005.
- [42] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. White paper, 2008.
- [43] Rafael Pass and Elaine Shi. Thunderella: Blockchains with optimistic instant confirmation. In Advances in Cryptology–EUROCRYPT 2018: 37th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tel Aviv, Israel, April 29-May 3, 2018 Proceedings, Part II 37, pages 3–33. Springer, 2018.
- [44] David Schwartz, Noah Youngs, and Arthur Britto. The ripple protocol consensus algorithm. Ripple Labs Inc White Paper, 5(8):151, 2014.
- [45] Isaac Sheff, Xinwen Wang, Robbert van Renesse, and Andrew C Myers. Heterogeneous paxos. In OPODIS: International Conference on Principles of Distributed Systems, number 2020 in OPODIS, 2021.
- [46] Isaac C Sheff, Robbert van Renesse, and Andrew C Myers. Distributed protocols and heterogeneous trust: Technical report. arXiv preprint arXiv:1412.3136, 2014.
- [47] Giuliana Santos Veronese, Miguel Correia, Alysson Neves Bessani, Lau Cheuk Lung, and Paulo Verissimo. Efficient byzantine fault-tolerance. IEEE Transactions on Computers, 62(1):16–30, 2011.
- [48] Maofan Yin, Dahlia Malkhi, Michael K Reiter, Guy Golan Gueta, and Ittai Abraham. Hotstuff: BFT consensus with linearity and responsiveness. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, pages 347–356, 2019.