Asynchronous Byzantine Reliable Broadcast
With a Message Adversary
Abstract
This paper considers the problem of reliable broadcast in asynchronous authenticated systems, in which processes communicate using signed messages and up to processes may behave arbitrarily (Byzantine processes). In addition, for each message broadcast by a correct (i.e., non-Byzantine) process, a message adversary may prevent up to correct processes from receiving . (This message adversary captures network failures such as transient disconnections, silent churn, or message losses.) Considering such a “double” adversarial context and assuming , a reliable broadcast algorithm is presented. Interestingly, when there is no message adversary (i.e., ), the algorithm terminates in two communication steps (so, in this case, this algorithm is optimal in terms of both Byzantine tolerance and time efficiency). It is then shown that the condition is necessary for implementing reliable broadcast in the presence of both Byzantine processes and a message adversary (whether the underlying system is enriched with signatures or not).
Keywords: Asynchronous system, Byzantine processes, Churn, Message adversary, Message losses, Message-passing, Message signatures, Reliable broadcast, Transient disconnection.
1 Introduction
Reliable broadcast
Introduced in the mid-eighties, Reliable Broadcast is a fundamental communication abstraction that lies at the center of fault-tolerant asynchronous distributed systems. Formally defined in [7, 8], it allows each process to broadcast messages in the presence of process failures, with well-defined delivery properties111The term delivery refers here to the application layer where a process receives and processes the content of an application message (see Section 2.1).. In turn, these properties make it possible to design provably correct distributed software for upper-layer applications based on such a broadcast abstraction.
Intuitively, reliable broadcast guarantees that the non-faulty processes deliver the same set of messages, which includes at least all the messages they broadcast. This set may also contain messages broadcast by faulty processes. The fundamental property of reliable broadcast lies in the fact that no two non-faulty processes deliver different sets of messages [9, 24].
When some processes may suffer from Byzantine failures [20], designing a reliable broadcast communication abstraction that tolerate such failures is far from trivial. Such an algorithm is called Byzantine-tolerant reliable broadcast (BRB) and we say that a process brb-broadcasts and brb-delivers messages. The most famous BRB algorithm is due to Bracha [7] (1987). For an application message, this algorithm gives rise to three sequential communication steps and up to implementation messages sent by correct processes. This algorithm requires , which is optimal in terms of fault tolerance.
Recent works related to reliable broadcast
Due to its fundamental nature, BRB has been addressed by many authors. Here are a few recent results. Similarly to Bracha’s algorithm, all these algorithms assume an underlying fully connected reliable network.
- •
-
•
Addressing efficiency issues, the BRB algorithm presented in [19] implements the reliable broadcast of an application message with only two communication steps and up to implementation messages sent by correct processes. The price to pay for this gain in efficiency is a weaker -resilience, namely . Hence, this algorithm and Bracha’s algorithm differ in their trade-off between -resilience and message/time efficiency.
-
•
Scalable BRB is addressed in [17]. The goal of this work is to avoid paying the message complexity price. To this end, the authors use a non-trivial message-gossiping approach which allows them to design a sophisticated BRB algorithm satisfying probability-dependent properties.
-
•
BRB in dynamic systems is addressed in [16] (dynamic means that a process can enter and leave the system at any time). In their article, the authors present an efficient BRB algorithm for such a context. This algorithm assumes that, at any time, there are at least two times more correct processes than Byzantine ones in the system.
-
•
An efficient algorithm for BRB with long inputs of bits using lower costs than single-bit instances is presented in [21]. This algorithm, which assumes , achieves the best possible communication complexity of input sizes. This article also presents an authenticated extension of this solution.
The work presented in this paper222A very preliminary version of this work appeared in [3]. goes beyond the previous proposals by considering the conjunction of two types of adversary: as in the above works, processes may be Byzantine, but in addition a message adversary may also remove implementation messages between correct processes. More precisely, this work addresses the problem of fault-tolerant reliable broadcast in asynchronous -process message-passing systems enriched with message signatures, in which up to processes are Byzantine, and a message adversary that may prevent up to non-Byzantine processes from delivering an implementation message broadcast by a non-Byzantine process. This dual fault model originated from our research on the reconciliation of local process states in distributed Byzantine-tolerant money transfer systems (a.k.a. cryptocurrencies), in which processes become temporarily disconnected. Several researchers have indeed pointed out the fundamental role that broadcast abstractions play in Byzantine money transfer systems (see, for instance, [5, 11, 12, 13, 15, 16]). This crucial role naturally leads to considering how Byzantine broadcast can be expanded to more volatile and dynamic settings, thus motivating our proposal to combine traditional Byzantine faults with a message adversary.
Acronyms | Meaning |
---|---|
BRB | Byzantine-tolerant reliable broadcast |
MA | Message adversary |
MBRB | Message adversary- and Byzantine-tolerant reliable broadcast |
Notations | Meaning |
number of processes in the network | |
upper bound on the number of Byzantine processes | |
power of the message adversary | |
effective number of correct processes in a run () | |
minimal nb of correct processes that mbrb-deliver a message | |
time complexity of MBRB | |
message complexity of MBRB |
The paper is made up of 5 sections.
-
•
Section 2 defines the computing model and the Message Adversary-Tolerant Byzantine Reliable Broadcast communication abstraction (or MBRB for short).
-
•
Section 3 presents a signature-based algorithm implementing the MBRB abstraction, proves it is correct, and evaluates its cost. When there is no message adversary, this algorithm is optimal from both Byzantine resilience and the number of communication steps333The signature-free BRB algorithm described in [7] is optimal with respect to Byzantine resilience (), but requires three communication steps, while the signature-free BRB algorithm described in [19] is optimal with respect to the number of communication steps (2) but is not with respect to Byzantine resilience (it requires )..
-
•
Section 4 shows that the condition is necessary and sufficient for implementing the MBRB communication abstraction (be the underlying system enriched with signatures or not).
-
•
Finally, Section 5 concludes the article.
2 Computing Model and MBRB Abstraction
2.1 Computing Model
Process model
The system is composed of asynchronous sequential processes denoted , …, . Each process has an identity, and all the identities are different and known by all processes. To simplify, we assume that is the identity of .
Regarding failures, up to processes can be Byzantine, where a Byzantine process is a process whose behavior does not follow the code specified by its algorithm [20, 22]. Let us notice that Byzantine processes can collude to fool the non-Byzantine processes (also called correct processes). Let us also notice that, in this model, the premature stop (crash) of a process is a Byzantine failure.
Moreover, given an execution, denotes the number of processes that effectively behave correctly in that execution. We always have . While this number remains unknown to correct processes, it is used in the following to analyze and characterize (more precisely than using its worse value ) the guarantees provided by the proposed algorithms.
Communication model
The processes communicate through a fully connected asynchronous point-to-point communication network. Although this network is assumed to be reliable—in the sense that it neither corrupts, duplicates, nor creates messages—it may nevertheless lose messages due to the actions of a message adversary (defined below).
Let msg be a message type and the associated value. A process can invoke the unreliable operation , which is a shorthand for “for all do to end for”. It is assumed that all the correct processes invoke to send messages. As we can see, the operation is not reliable. As an example, if the invoking process crashes during its invocation, an arbitrary subset of processes receive the message implementation message . Moreover, due to its very nature, a Byzantine process can send messages without using the macro-operation .
From a terminology point of view, at the system/network level, we say that messages are broadcast and received. Moreover, a message generated by the algorithm is said to be an implementation message (imp-message in short), while a message generated by the application layer is said to be an application message (app-message in short).
Message adversary
The notion of a message adversary (MA) was implicitly introduced in [27] (under the name transient faults and ubiquitous faults) and then used (sometimes implicitly) in many works (e.g., [2, 10, 26, 28, 29]). A short tutorial on message adversaries is presented in [23].
Let be an integer constant such that . The communication network is under the control of an adversary which eliminates imp-messages sent by processes, so that these imp-messages appear as being lost. More precisely, when a correct process invokes , the message adversary is allowed to arbitrarily suppress up to copies of the imp-message that were intended to correct processes. This means that, despite the fact the sender is correct, up to correct processes may miss the imp-message 444 A close but different notion was introduced by Dolev in [14] (and explored in subsequent works, such as [6]) which considers static -connected networks. If the adversary selects statically for each correct sender correct processes that do not receive this sender’s imp-messages, the proposed model includes Dolev’s model with ..
As an example, consider a set of correct processes, where , such that during some period of time, the adversary suppresses all the imp-messages sent to them. It follows that, during this period of time, this set of processes appears as a set of correct processes that are (unknowingly) input-disconnected from the other correct processes. Depending on the message adversary, the set may vary with time. Let us notice that corresponds to the weakest possible message adversary: it corresponds to a classical static system where some processes are Byzantine but no imp-message is lost (the network is fully reliable).
Let us remark that this type of message adversary is stronger, and therefore covers, the more specific case of silent churn, in which processes (nodes) may decide to disconnect from the network. While disconnected, such a process silently pauses its algorithm (a legal behavior in our asynchronous model), and is implicitly moved (by the adversary) to the adversary-defined set. Upon coming back, the node resumes its execution, and is removed from by the adversary.555So the notion of a message adversary implicitly includes the notion of imp-message omission failures.
Informally, in a silent churn environment, a correct process may miss imp-messages sent by other processes while it is disconnected from the network. The adjective “silent” in silent churn expresses the fact that no notification is sent on the network by processes whenever they leave or join the system: there is no explicit “attendance list” of connected processes, and processes are given no information on the status (connected/disconnected) of their peers. In this regard, the silent churn model diverges from the classical approach when designing dynamic distributed systems, in which processes send imp-messages on the network notifying their connection or disconnection [16]. The silent churn model is a good representation of real-life large-scale peer-to-peer systems, where peers leaving the network typically do so in a completely silent manner (i.e., without warning other peers).
Let us also observe that silent churn allows us to model input-disconnections due to process mobility. When a process moves from one location to another, the sender’s broadcasting range may not be large enough to ensure that the moving process remains input-connected. An even more prosaic example would be one where a user simply turns off their device, or disable its Internet connection, preventing it from receiving or sending any further imp-messages. In this context, we consider that the message adversary removes all the incoming imp-messages from the corresponding process until the device reconnects.
Let us mention that the loss of imp-messages caused by a message adversary may be addressed using a reliable unicast protocol. These protocols were originally introduced to provide reliable channels on top of an unreliable network subject to imp-message losses. The principle is simple: the sender keeps sending idempotent imp-messages at regular intervals through an unreliable channel until it receives an acknowledgement from the receiver. This principle notoriously lies at the core of the Transmission Control Protocol (TCP), although with important practical adaptations, as TCP uses timeouts to close a malfunctioning or otherwise idle connection, typically after a few minutes.
But because there is no way to detect that a process has crashed or disconnected in an asynchronous environment, an ideal reliable unicast protocol (i.e. one that keeps on re-transmitting until success) needs to treat disconnected processes the same way as slow processes or as if there were packet losses in the network: the sender will thus potentially send infinitely many imp-messages to a disconnected receiver. To overcome this issue, some works leverage causal dependencies to avoid resending old imp-messages: if an acknowledgement is received by the sender for a given imp-message, then it can stop resending the imp-messages that causally precede this imp-message and that have not been acknowledged yet (e.g. [13]). However, this approach still assumes that eventually, every communication channel lets some imp-messages pass, which is not always the case in our model, where the message adversary can permanently sever up to channels.
Digital signatures
We assume the availability of an asymmetric cryptosystem to sign data (in practice, imp-messages) and verify its authenticity. We assume signatures are secure, and therefore that the computing power of the adversary is bounded. Every process in the network has a public/private key pair. We suppose that the public keys are known to everyone, and that the private keys are kept secret by their owner. Everyone also knows the mapping between any process’ identity and its public key. Additionally, we suppose that each process can produce at most one signature per imp-message.
The signatures are used to cope with the net effect of the Byzantine processes and the fact that imp-messages broadcast (sent) by correct processes can be eliminated by the message adversary. A noteworthy advantage of signatures is that, despite the unauthenticated nature of the point-to-point communication channels, signatures allow correct processes to verify the authenticity of imp-messages that have not been directly received from their initial sender, but rather relayed through intermediary processes. Signatures provide us with a network-wide non-repudiation mechanism: if a Byzantine process issues two conflicting imp-messages to two different subsets of correct processes, then the correct processes can detect the malicious behavior by disclosing to each other the Byzantine signed imp-messages.666 The fact that the algorithm uses signed imp-messages does not mean that MBRB-broadcast requires signatures to be implemented, see [4].
2.2 Message Adversary-Tolerant Byzantine Reliable Broadcast (MBRB)
This paper introduces a new broadcast abstraction we have called Message Adversary-Tolerant Byzantine Reliable Broadcast (MBRB for short.) The MBRB communication abstraction is composed of two matching operations, denoted and . It considers that an identity (sender identity, sequence number) is associated with each app-message, and assumes that any two app-messages mbrb-broadcast by the same correct process have different sequence numbers. Sequence numbers are one of the most natural ways to design “multi-shot” reliable broadcast algorithms, that is, algorithms where the broadcast operation can be invoked multiple times with different app-messages.
When, at the application level, a process invokes , where is the app-message and the associated sequence number, we say “mbrb-broadcasts ”. Similarly, when invokes , where is the sender process, we say “mbrb-delivers ”. We say that the app-messages are mbrb-broadcast and mbrb-delivered (while, as said previously, the imp-messages algorithm are broadcast and received).
Correctness specification
Because of the message adversary, we cannot always guarantee that an app-message mbrb-delivered by a correct process is eventually mbrb-delivered by all correct processes. Hence, in the MBRB specification, we introduce a variable which indicates the strength of the global delivery guarantee of the primitive: if one correct process mbrb-delivers an app-message then correct processes eventually mbrb-deliver this app-message.777If there is no message adversary (i.e., ), we should have . The MBRB-broadcast abstraction is defined by the following properties:
-
•
Safety:
-
–
MBRB-Validity (no spurious message). If a correct process mbrb-delivers an app-message from a correct process with sequence number , then mbrb-broadcast with sequence number .
-
–
MBRB-No-duplication. A correct process mbrb-delivers at most one app-message from a process with sequence number .
-
–
MBRB-No-duplicity. No two different correct processes mbrb-deliver different app-messages from a process with the same sequence number .
-
–
-
•
Liveness:
-
–
MBRB-Local-delivery. If a correct process mbrb-broadcasts an app-message with sequence number , then at least one correct process eventually mbrb-delivers from with sequence number .
-
–
MBRB-Global-delivery. If a correct process mbrb-delivers an app-message from a process with sequence number , then at least correct processes mbrb-deliver from with sequence number .
-
–
It is implicitly assumed that a correct process does not use the same sequence number twice. Let us observe that, since at the implementation level the message adversary can always suppress all the imp-messages sent to a fixed set of processes, the best-guaranteed value for is . Furthermore, let us notice that the constraint prevents the message adversary from partitioning the system.
Performance metrics
In addition to the correctness specification, we define two metrics that capture the performance of an algorithm implementing the MBRB specification: and , which respectively denote the communication step complexity and the imp-message complexity of the algorithm. They are defined as follows:
-
•
MBRB-Time-cost. If a correct process mbrb-broadcasts an app-message with sequence number , then correct processes mbrb-deliver from with sequence number in at most communication steps.
-
•
MBRB-Message-cost. The mbrb-broadcast of an app-message by a correct process entails the sending of at most imp-messages by correct processes.
Byzantine Reliable Broadcast (BRB)
If (obtained when ), the previous specification boils down to Bracha’s seminal specification [7], which defines the Byzantine reliable broadcast (BRB) communication abstraction. Hence, the BRB abstraction is a sub-case of MBRB.
3 A Signature-based Algorithm Implementing the MBRB Abstraction
This section presents Algorithm 1, which implements the MBRB communication abstraction in an asynchronous setting under the constraint . When considering , this algorithm provides both -tolerance optimality (as in [7]) and step optimality (as in [19]): it only assumes , and guaranteed mbrb-delivery of an app-message in only two communication steps888Signature-based BRB in only two communication steps is a known result [1], however, to the best of our knowledge, no existing BRB algorithm tolerates message adversaries as well as ours.. It follows that signatures can help save one communication step compared to classical signature-free BRB algorithms that assume . Algorithm 1 fulfills the MBRB-Global-delivery property with under the following assumption:
-
•
mbrb-Assumption: .
3.1 Preliminaries
Implementation message types
The algorithm uses only one imp-message type, bundle, that carries the signatures backing a given app-message , along with ’s content, sequence number, and emitter. bundle imp-messages propagate through the network using controlled flooding.
Local data structures
Each (correct) process saves locally the valid signatures (i.e., the signed fixed-size digests of a certain data) that it has received from other processes using bundle imp-messages. Each signature “endorses” a certain app-message . When certain conditions are met (described below), a process further broadcasts in a bundle imp-message all signatures it knows for a given triplet . A correct process saves at most one signature for a given triplet per signing process .
Time measurement
For the proofs related to MBRB-Time-cost (Lemmas 7-10), we assume that the duration of local computations is negligible compared to that of imp-message transfer delays, and consider them to take zero time units. As the system is asynchronous, the time is measured under the traditional assumption that all the imp-messages have the same transfer delay.
3.2 Algorithm
At a high level, Algorithm 1 works by producing, forwarding, and accumulating witnesses of an initial mbrb-broadcast operation, until a large-enough quorum is observed by at least one correct process, at which point this quorum is propagated in one final unreliable operation.
Witnesses take the form of signatures for a given triplet , where is the app-message, its associated sequence number and the identity of the sender (which also produces a signature for ). Signatures serve to ascertain the provenance and authenticity of these propagated bundle imp-messages, thus providing a key ingredient to tolerate the limited reliability of the underlying network. They also authenticate the invoker of the operation, and finally, in the last phase of the algorithm, they allow the propagation of a cryptographic proof that a quorum has been reached, thereby ensuring that enough correct processes eventually mbrb-deliver the app-message that was mbrb-broadcast.
operation is (1) save signature for by ; (2) . when is do (3) if ( not already mbrb-delivered contains the valid signature for by ) then (4) save all unsaved valid signatures for of ; (5) if ( not already signed by ) then (6) save signature for by ; (7) (8) end if; (9) if (strictly more than signatures for are saved) then (10) ; (11) (12) end if (13) end if.
In more detail, when a (correct) process invokes , it builds and signs the triplet to guarantee its non-repudiation, and saves locally the resulting signature (line 1). Next, broadcasts the bundle imp-message containing the signature that it just produced (line 1).
When a correct process receives a imp-message, it first checks if no app-message has already been mbrb-delivered for the given sequence number and sender , and if signed the app-message (line 1). If this condition is satisfied, saves all the new valid signatures inside the set (line 1). Next, creates and saves its own signature for and then broadcasts it in a bundle imp-message, if it has not already done so previously (line 1-1). Finally, if has saved a quorum of strictly more than signatures for the same triplet , it broadcasts a bundle imp-message containing all these signatures and mbrb-delivers the triplet (lines 1-1).999The pseudo-code presented in Algorithm 1 favors readability, and is therefore not fully optimized. For instance, in some cases, a process might unreliably broadcast exactly the same content at lines 1 and 1. This could be avoided by either using an appropriate flag, or by tracking and preventing the repeated broadcast of identical bundle imp-messages.
Remark
The reader can notice that the system parameters and appear in the algorithm, whereas the system parameter does not. Naturally, they all explicitly appear in the proof.
3.3 Algorithm proof
This section proves the correctness and performance properties of MBRB.
Theorem 1.
If the mbrb-Assumption is satisfied, Algorithm 1 implements the mbrb-broadcast of an app-message by a correct process with the following guarantees:
-
•
correct processes,
-
•
communication steps,
-
•
imp-messages.
The proof follows from the next lemmas.
Lemma 1 (MBRB-Validity).
If a correct process mbrb-delivers from a correct process with sequence number , then has previously mbrb-broadcast with sequence number .
Proof.
If a correct process mbrb-delivers (where is correct) at line 1, then it has passed the condition at line 1, which means that it must have witnessed a valid signature for by . Since signatures are secure, the only way to create this signature is for to execute the instruction at line 1, during the invocation. ∎
Lemma 2 (MBRB-No-duplication).
A correct process mbrb-delivers at most one app-message from a process with a given sequence number .
Proof.
This property derives trivially from the condition at line 1. ∎
Lemma 3 (MBRB-No-duplicity).
No two different correct processes mbrb-deliver different app-messages from a process with the same sequence number .
Proof.
Let us consider two correct processes and which respectively mbrb-deliver and . Due to the condition at line 1, and must have saved (and thus received) two sets and containing strictly more than signatures for and , respectively. We thus have and .
As we have , and have at least one correct process in common, which must have signed both and . But before signing at line 1 or 1, checks that it did not sign a different app-message from the same sender and with the same sequence number, whether it be implicitly during a invocation or at line 1. Thereby, is necessarily equal to . ∎
Lemma 4 (MBRB-Local-delivery).
If a correct process mbrb-broadcasts an app-message with sequence number , then at least one correct process mbrb-delivers from with sequence number .
Proof.
If a correct process mbrb-broadcasts , then it broadcasts its own signature for in a message at line 1. As is correct, it does not sign another triplet where , therefore it is impossible for a correct process to mbrb-deliver at line 1, because it cannot pass the condition at line 1.
Let us denote by the set of correct processes that receive a message at least once. The first one of such bundle messages that a process of receives can be the one initially broadcast at line 1, but it can also be a bundle message broadcast by a correct process at lines 1 or 1, or it can even be a bundle message sent by a Byzantine process. In any case, the first time the processes of receive such a bundle message, they pass the condition at line 1, and they also pass the condition at line 1, except for if it belongs to . Consequently, each process of necessarily broadcasts its own signature for in a message.
By construction of the algorithm, the set of correct processes that receive a message is equal to the set of correct processes that broadcast a . By the definition of the message adversary, a message broadcast by a correct process is eventually received by at least correct processes. Hence, the minimum number of signatures for made by processes of that is also received by processes of globally is . It follows that a given process of individually receives on average the distinct signatures of at least processes of .
From mbrb-Assumption, we have (as ). As a result, at least one process of (ergo one correct process) receives a set (in possibly multiple bundle messages) of strictly more than valid distinct signatures for . When receives the last signature of , there are two cases:
-
•
Case if does not pass the condition at line 1.
As processes of are correct, then when they broadcast a message, they necessarily include in , which implies that is necessarily in . Therefore, if does not pass the condition at line 1, it is because already mbrb-delivered some . But let us remind that, as is correct, it is impossible for to mbrb-deliver anything different from . Therefore, has already mbrb-delivered .
-
•
Case if passes the condition at line 1.
Lemma 5 (MBRB-Global-delivery).
If a correct process mbrb-delivers an app-message from with sequence number , then at least correct processes mbrb-deliver from with sequence number .
Proof.
If a correct process mbrb-delivers at line 1, it must have saved a set of strictly more than valid distinct signatures because of the condition at line 1. Let us remark that necessarily contains the signature for by because of the condition at line 1. Additionally, must also have broadcast at line 1, that, by definition of the message adversary, is received by a set of at least correct processes. For each process of :
- •
- •
Therefore, all processes of (which, as a reminder, are at least ) necessarily mbrb-deliver at line 1. ∎
Lemma 6.
.
Proof.
We have the following:
(by definition of ) | ||||
(by mbrb-Assumption) | ||||
Lemma 7.
If a correct process mbrb-broadcasts , then at least correct processes mbrb-deliver at most two communication steps later.
Proof.
If a correct process mbrb-broadcasts , then it broadcasts its own signature for in a imp-message at line 1. Let us denote by the set of correct processes that receive this imp-message from during the same communication step, and let be the number of processes in , such that (by definition of the message adversary). By construction of the algorithm, every process of passes the condition at line 1, and therefore broadcasts a imp-message, whether it be at line 1 for , or at line 1 for any other process of .
Let and define two partitions of the set of all correct processes ( is the set of all correct processes, and ). denotes the set of correct processes that receive strictly more than signatures for from processes of two communication steps after mbrb-broadcast , while denotes the set of remaining correct processes of that receive at most signatures for from processes of two communication steps after mbrb-broadcast . Let be the size of : . By construction, . Let and respectively denote the number of signatures for from processes of received by processes of and at most two communication steps after mbrb-broadcast . Figure 1 represents the distribution of such signatures among processes of , sorted by decreasing number of signatures received. Each processes of can receive at most signatures (that is, all signatures) from processes of , while each process of can receive at most signatures from processes of two communication steps after mbrb-broadcasts . For the sake of simplicity, we use in the place of in some parts of this proof.
From these observations, we infer the following inequalities:
By the definition of the message adversary, a imp-message broadcast by a correct process is eventually received by at least correct processes. As a consequence, in total, the minimum number of signatures for collectively received by correct processes as a result of broadcasts by processes in in the first two asynchronous communication steps is . We thus have:
By combining the previous inequalities, we obtain:
(1) |
By Lemma 6, we know that , so we can rewrite (1) into:
(2) |
Let us define a function such that . As we seek the lowest guaranteed value for , we want to find the minimum of on . To this end, let us first study the derivative of . The image is of the form , so we have:
As and are by definition positive, we know that is positive, or null when . Therefore, is monotonically increasing on , and the minimum value for can be found when is also minimum, that is, when . Thus, when we replace by in (2), we obtain:
(3) |
Let us denote by the minimum number of correct processes that receive a quorum of strictly more than valid distinct signatures for two communication steps after mbrb-broadcast , such that . As the right hand side of (3) is not always an integer, we have:
(as ) | ||||
(by definition of ) |
Hence, at least processes of receive strictly more than valid distinct signatures for two communication steps after mbrb-broadcasts . For every process of :
-
•
If does not pass the condition at line 1 after receiving the last signature of the quorum in a bundle imp-message, it is necessarily because already mbrb-delivered some , because processes of are correct and all their bundle imp-messages include the signature for by . But let us remind that, as the sender is correct, it is impossible for to mbrb-deliver anything different from . Therefore, has already mbrb-delivered at line 1.
- •
Therefore, all processes of , which are at least , mbrb-deliver at line 1 at most two communication steps after mbrb-broadcast . ∎
Lemma 8.
If a correct process mbrb-broadcasts and , then at least correct processes mbrb-deliver at most three communication steps later.
Proof.
Let us assume that a correct process mbrb-broadcasts and that . Process must unreliably broadcast a first imp-message (where is the signature of by ) at line 1. This initial imp-message is received by at least other correct processes, due to our assumption on the message adversary. This counts for a first communication step.
In the second communication step, each process of these correct processes unreliably broadcasts its own imp-message (where is the signature of by ) at line 1. At the end of the second communication step, in total, at least distinct signatures for have been created and unreliably broadcast by correct processes (counting that of ), resulting in at least receptions of said signatures by correct processes. As there are correct processes, this means that, on average, each correct process has received at least signatures by the end of the second communication step, and that at least one correct process, , receives (and saves at line 1) at least this number of signatures.
From the Lemma hypothesis and using simple algebraic transformations, we can derive . Therefore, reaches a quorum of signatures, that is, it passes the condition at line 1 and unreliably broadcast this quorum of signatures at line 1, two communication steps after the mbrb-broadcast of by . By definition of the message adversary, this quorum of signatures is received by correct processes, which save it at line 1 and thus pass the condition at line 1 and finally mbrb-deliver at line 1, three communication steps after the mbrb-broadcast of by . ∎
Lemma 9 (MBRB-Time-cost).
If a correct process mbrb-broadcasts an app-message with sequence number , then correct processes mbrb-deliver from with sequence number at most
communication steps later.
Proof.
Let us consider a correct process that mbrb-broadcasts . By exhaustion:
-
•
Case where .
-
•
Case where .
Lemma 8 applies and at least correct processes mbrb-deliver at most three communication steps after has mbrb-broadcast . ∎
Lemma 10 (MBRB-Message-cost).
The mbrb-broadcast of an app-message by a correct process entails the sending of at most imp-messages by correct processes.
Proof.
The broadcast of an imp-message by a correct process at line 1 entails its forwarding by at most other correct processes at line 1. As each broadcast by correct process corresponds to the sending of imp-messages, then at most imp-messages are sent in a first step.
In a second step, at least one correct process reaches a quorum of signatures and passes the condition at line 1, and then broadcasts this quorum of signatures at line 1. Upon receiving this quorum, every correct process also passes the condition at line 1 (if it has not done it already) and broadcasts the imp-message containing the quorum at line 1. Hence, at most imp-messages are also sent in this second step, which amounts to a maximum of imp-messages sent in total. ∎
An additional property
The reader can check from the previous proofs that the algorithm satisfies the following MBRB-delivery property. If there is a set of correct processes, , such that there is a finite time after which the message adversary never eliminates the imp-messages sent to them, then, after , each process of mbrb-delivers all the app-messages mbrb-broadcast by correct processes.
4 A Tightness Bound
Definition
An algorithm implementing a broadcast communication abstraction is event-driven if, as far as the correct processes are concerned, only (i) the invocation of the broadcast operation that is provided to the application by the broadcast communication abstraction, or (ii) the reception of an imp-message—sent by a correct or a Byzantine process—can generate the sending of imp-messages (using the underlying unreliable network-level operation).
Theorem 2 (MBRB-Necessary-condition).
When , there is no event-driven (signature-free or signature-based) algorithm implementing the MBRB communication abstraction on top of an -process asynchronous system in which up to processes may be Byzantine and where a message adversary may suppress up to copies of each imp-message broadcast by a correct process.101010Let us recall that the underlying communication operation offered by the system is an unreliable broadcast defined in Section 2.1.
Proof.
Without loss of generality the proof considers the case . Let us partition the processes into five sets , , and , such that and .111111For the case , the partition is such that and . So, when considering the sets , , and , there are executions in which all the processes of either or or can be Byzantine, while the processes of the two other sets are not.
The proof is by contradiction. So, assuming that there is an event-driven algorithm that builds the MBRB-broadcast abstraction for , let us consider an execution of in which the processes of , , , and are not Byzantine while all the processes of are Byzantine.
Let us observe that the message adversary can isolate up to processes by preventing them from receiving any imp-messages. Without loss of generality, let us assume that the adversary isolates a set of correct processes not containing the sender of the app-message. As is event-driven, these isolated processes do not send imp-messages during the execution of . As a result, no correct process can expect imp-messages from more than different processes without risking being blocked forever. Thanks to the mbrb-Assumption , this translates as “no correct process can expect imp-messages from more than different processes without risking being blocked forever”.
In the execution , the (Byzantine) processes of simulate the mbrb-broadcast of an app-message such that this app-message appears as being mbrb-broadcast by one of them and is mbrb-delivered as the app-message to the processes of (hence the processes of appear, to the processes of , as if they were correct) and as the app-message to the processes of (hence, similarly to the previous case, the processes of appear to the processes of as if they were correct). Let us call -messages (resp., -messages) the imp-messages generated by the event-driven algorithm that entails the mbrb-delivery of (resp., ). Moreover, the execution is such that:
-
•
concerning the -messages: the message adversary suppresses all the -messages sent to the processes of , and asynchrony delays the reception of all the -messages sent to until some time defined below.121212 Equivalently, we could also say that asynchrony delays the reception of all the -messages sent to until time . The important point is here that, due to the assumed existence of Algorithm , the processes of and and mbrb-deliver with -messages from at most different processes. So, as , Algorithm A will cause the processes of and to mbrb-deliver .131313 Let us notice that this is independent from the fact that the processes in are Byzantine or not.
-
•
concerning the -messages: the message adversary suppresses all the -messages sent to the processes of , and the asynchrony delays the reception of all the -messages sent to until time . As previously, as , Algorithm will cause the processes of and to mbrb-deliver .
-
•
Finally, the time occurs after the mbrb-delivery of by the processes of and , and after the mbrb-delivery of by the processes of and .
It follows that different non-Byzantine processes mbrb-deliver different app-messages for the same mbrb-broadcast (or a fraudulent simulation of it) issued by a Byzantine process (with possibly the help of other Byzantine processes). This contradicts the MBRB-No-Duplicity property, which concludes the proof of the theorem. ∎
Theorem 3 (Algorithm optimality).
Considering an asynchronous -process system in which up to processes can be Byzantine and where a -message adversary can suppress imp-messages, Algorithm 1 is optimal with respect to the pair of values .
5 Conclusion
This article has presented a new communication abstraction (denoted MBRB) that extends Byzantine reliable broadcast (as defined by Bracha and Toueg [7, 8]) to systems where, at the underlying implementation level, an adversary may suppress some subset of implementation messages used by the processes to co-operate. From a practical point of view, this kind of message loss captures phenomena such as silent churn, input-disconnection, etc. A signature-based algorithm implementing the corresponding Byzantine-tolerant reliable broadcast in the presence of a message adversary has been presented and proven correct. This algorithm assumes (where is the number of processes, is the maximum number of Byzantine processes, and is an upper bound on the power of the message adversary), which has been shown to be a necessary and sufficient condition. message adversary),
When there is no message adversary, this algorithm is optimal from both Byzantine resilience and the number of communication steps. These properties are also satisfied in other circumstances including a message adversary whose power is restricted to some well-defined threshold.
Acknowledgments
This work was partially supported by the French ANR projects ByBloS (ANR-20-CE25-0002-01) and PriCLeSS (ANR-10-LABX-07-81) devoted to the modular design of building blocks for large-scale Byzantine-tolerant multi-users applications. The authors want to thank Colette Johnen, Elad Schiller, and Stefan Schmid for their kind invitation to participate in the SSS 2021 conference.
References
- [1] Abraham I., Nayak K., Ren L., and Xiang Z., Good-case latency of Byzantine broadcast: a complete categorization. Proc. 40th ACM Symposium on Principles of Distributed Computing (PODC’21), ACM Press, pp. 331-341 (2021) (arXiv:2102.07240v2)
- [2] Afek Y. and Gafni E., Asynchrony from synchrony. Proc. Int’l Conference on Distributed Computing and Networking (ICDCN’13), Springer LNCS 7730, pp. 225-239, (2013)
- [3] Albouy T., Frey D., Raynal M., and Taïani F., Byzantine-tolerant reliable broadcast in the presence of silent churn (Invited Talk). Proc. 23th Int’l Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS’21) Springer LNCS 13046, pp. 21-33 (2021)
- [4] Albouy T., Frey D., Raynal M., and Taïani F., -cast: on the foundations of Byzantine reliable broadcast in the presence of message adversaries. (May 2022) (arXiv:2204.13388)
- [5] Auvolat A., Frey D., Raynal M., and Taïani F., Money Transfer Made Simple: a Specification, a Generic Algorithm, and its Proof. Bulletin of the EATCS, 132 (2020)
- [6] Bonomi S., Decouchant J., Farina G., Rahli V., and Tixeuil S., Practical Byzantine Reliable Broadcast on Partially Connected Networks. 41th IEEE International Conference on Distributed Computing Systems, ICDCS 2021, IEEE, pp. 506-516 (2021)
- [7] Bracha G., Asynchronous Byzantine agreement protocols. Information & Computation, 75(2):130-143 (1987)
- [8] Bracha G. and Toueg S., Asynchronous consensus and broadcast protocols. Journal of the ACM, 32(4):824-840 (1985)
- [9] Cachin Ch., Guerraoui R., and Rodrigues L., Reliable and secure distributed programming, Springer, 367 pages, ISBN 978-3-642-15259-7 (2011)
- [10] Charron-Bost B., and Schiper A., The heard-of model: computing in distributed systems with benign faults. Distributed Computing, 22(1):49-71 (2009)
- [11] Cohen S., and Keidar I., Tame the Wild with Byzantine Linearizability: Reliable Broadcast, Snapshots, and Asset Transfer. Proc. 35rd Int’l Symposium on Distributed Computing (DISC’21), pp. 18:1-18:18 (2021)
- [12] Collins D., Guerraoui R., Komatovic J., Kuznetsov P., Monti M., Pavlovic M., Pignolet Y.-A., Seredinschi D.-A., Tonkikh A., and Xygkis A., Online Payments by Merely Broadcasting Messages. Proc. 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2020), pp. 26-38 (2020)
- [13] Danezis G., Kokoris-Kogias L., Sonnino A., and Spiegelman A., Narwhal and Tusk: a DAG-based mempool and efficient BFT consensus. Proc. 17th European Conference on Computer Systems (EUROSYS’22), ACM Press, pp. 34-50 (2022)
- [14] Dolev D., The Byzantine generals strike again. Journal of Algorithms, 3:14-20 (1982)
- [15] Guerraoui R., Kuznetsov P., Monti M., Pavlovic M., and Seredinschi D.-A., The Consensus Number of a Cryptocurrency. Proc. 38th ACM Symposium on Principles of Distributed Computing (PODC’19), ACM Press, pp. 307-316 (2019)
- [16] Guerraoui R., Komatovic J., Kuznetsov P., Pignolet P.A., Seredinschi D.-A., and Tonkikh A., Dynamic Byzantine reliable broadcast. Proc. 24th Int’l Conference on Principles of Distributed Systems (OPODIS’20), LIPIcs Vol. 184, Article 23, 18 pages (2020)
- [17] Guerraoui G., Kuznetsov P., Monti M., Pavlovic M., and Seredinschi D.-A., Scalable Byzantine reliable broadcast. Proc. 33rd Int’l Symposium on Distributed Computing (DISC’19), LIPIcs Vol. 146, Article 22, 16 pages (2019)
- [18] Hirt M., Kastrato A., and Liu-Zhang C.-D., Multi-threshold asynchronous reliable broadcast and consensus. Proc. 24th Int’l Conference on Principles of Distributed Systems (OPODIS’20), LIPICs Vol. 184, Article 6, 16 pages (2020)
- [19] Imbs D. and Raynal M., Trading -resilience for efficiency in asynchronous Byzantine reliable broadcast. Parallel Processing Letters, Vol. 26(4), 8 pages (2016)
- [20] Lamport L., Shostack R., and Pease M., The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3)-382-401, (1982)
- [21] Nayak K., Ren L., Shi E., Vaidya N.H., and Xiang Z., Improved extension protocols for Byzantine broadcast and agreement. Proc. 34rd Int’l Symposium on Distributed Computing (DISC’20), LIPIcs Vol. 179, Article 28, 16 pages (2020)
- [22] Pease M., Shostak R., and Lamport L., Reaching agreement in the presence of faults. Journal of the ACM, 27:228-234 (1980)
- [23] Raynal M., Message adversaries. Encyclopedia of Algorithms, Springer (2015)
- [24] Raynal M., Fault-tolerant message-passing distributed systems: an algorithmic approach. Springer, 480 pages, ISBN 978-3-319-94140-0 (2018)
- [25] Raynal M., On the versatility of Bracha’s Byzantine reliable broadcast algorithm. Parallel Processing Letters, 31(3), 2150006 (9 pages) (2021)
- [26] Raynal M. and Stainer J., Synchrony weakened by message adversaries vs asynchrony restricted by failure detectors. Proc. 32nd ACM Symposium on Principles of Distributed Computing (PODC’13), ACM Press, pp. 166-175 (2013)
- [27] Santoro N. and Widmayer P., Time is not a healer. Proc. 6th Annual Symposium on Theoretical Aspects of Computer Science (STACS’89), Springer LNCS 349, pp. 304-316 (1989)
- [28] Santoro N. and Widmayer P., Agreement in synchronous networks with ubiquitous faults. Theoretical Computer Science, 384(2-3): 232-249 (2007)
- [29] Tseng L., Zhang Q., Kumar S., and Zhang Y., Exact Consensus under Global Asymmetric Byzantine Links. Proc. 40th IEEE International Conference on Distributed Computing Systems (ICDCS 2020), pp. 721-731 (2020)