This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Asynchronous Byzantine Approximate Consensus in Directed Networks

Dimitris Sakavalas [email protected] Lewis Tseng [email protected] Boston CollegeUSA  and  Nitin H. Vaidya Georgetown UniversityUSA [email protected]
(2018)
Abstract.

In this work, we study the approximate consensus problem in asynchronous message-passing networks where some nodes may become Byzantine faulty. We answer an open problem raised by Tseng and Vaidya, 2012, proposing the first algorithm of optimal resilience for directed networks. Interestingly, our results show that the tight condition on the underlying communication networks for asynchronous Byzantine approximate consensus coincides with the tight condition for synchronous Byzantine exact consensus. Our results can be viewed as a non-trivial generalization of the algorithm by Abraham et al., 2004, which applies to the special case of complete networks. The tight condition and techniques identified in the paper shed light on the fundamental properties for solving approximate consensus in asynchronous directed networks.

approximate consensus, asynchronous networks, network topology, Byzantine adversary
copyright: acmcopyrightjournalyear: 2018doi: 10.1145/1122445.1122456conference: PODC ’20: ACM Symposium on Principles of Distributed Computing; August 03–07, 2020; Salerno, Italybooktitle: PODC ’18: ACM Symposium on Principles of Distributed Computing, August 03–07, 2020, Salerno, Italyprice: 15.00isbn: 978-1-4503-XXXX-X/18/06ccs: Theory of computation Distributed algorithms

1. Introduction

The extensively studied fault-tolerant consensus problem (Pease et al., 1980) is a fundamental building block of many important distributed computing applications (Lynch, 1996). The FLP result (Fischer et al., 1985b) states that it is impossible to achieve exact consensus in asynchronous networks where nodes may crash (exact consensus requires nonfaulty nodes to reach an agreement on an identical value). The FLP impossibility result led to the study of weaker variations, including approximate consensus (Dolev et al., 1986). With approximate consensus, nonfaulty nodes only need to output values that are within ϵ\epsilon of each other for a given ϵ>0\epsilon>0. Practical applications of approximate consensus range from sensor fusion (Benediktsson and Swain, 1992) and load balancing (Cybenko, 1989), to natural systems like flocking (Vicsek et al., 1995) and opinion dynamics (Hegselmann and Krause, 2002). The feasibility of achieving consensus depends on the type of faults considered in the system. The literature has mainly focused on crash and Byzantine faults, the latter being the worst case since the misbehavior of faults may be arbitrary. In this work, we focus on the asynchronous Byzantine approximate consensus problem under the existence of at most ff faults.

Another important parameter affecting the feasibility is the topology of the underlying communication network G=(V,E)G=(V,E) in which nodes represent participants that reliably exchange messages through edges. The relation between network topology and feasibility in undirected networks was studied shortly after the introduction of the respective problems (e.g., (Lynch, 1996; Dolev, 1982)). For |V|=n|V|=n, connectivity κ(G)\kappa(G) of the network and upper bound ff on the number of faults, Table 1 summarizes the well-known necessary and sufficient topological conditions for achieving exact consensus and approximate consensus in various settings where GG is undirected. In undirected networks, satisfying the necessary graph conditions in Table 1 also implies feasibility of reliable message transmission (RMT) (cf. (Dolev et al., 1993)), which can be exploited to simulate algorithms designed for complete networks.

Crash fault Byzantine fault
Synchronous system
(exact consensus)
n>fn>f and κ(G)>f\kappa(G)>f
(Lynch, 1996)
n>3fn>3f and κ(G)>2f\kappa(G)>2f
(Dolev, 1982)
Asynchronous system
(approximate consensus)
n>2fn>2f and κ(G)>f\kappa(G)>f
(Lynch, 1996)
n>3fn>3f and κ(G)>2f\kappa(G)>2f
(Fischer et al., 1985a; Abraham et al., 2004; Dolev et al., 1993)
Table 1. Necessary and Sufficient Conditions for Undirected Graphs

The study of consensus in directed graphs is largely motivated by wireless networks wherein different nodes may have different transmission range, resulting in directed communication links. While the necessary and sufficient conditions for undirected graphs have been known for many years, their generalizations for directed graphs appeared only after 2012, e.g., (Tseng and Vaidya, 2012, 2015; LeBlanc et al., 2013; Vaidya et al., 2012). This is mainly due to the fact that no direct relation appears between reliable message transmission and consensus in directed graphs.

As Table 2 summarizes, for directed graphs, Tseng and Vaidya (Tseng and Vaidya, 2015, 2012) obtained necessary and sufficient conditions for solving consensus in the presence of crash faults in synchronous and asynchronous systems both. However, they were able to obtain such conditions for Byzantine faults only for synchronous systems. The determination of a tight condition for the asynchronous Byzantine model remains open since 2012. This paper closes this gap in the results. We identify a family of new conditions which we prove equivalent to the ones obtained in (Tseng and Vaidya, 2012, 2015), offering an important intuition, which essentially leads to the answer of this open question. Our condition family consists of 1-reach, 2-reach and 3-reach conditions, which are later defined in Section 2.111The general k-reach condition family, presented in the appendix, encompasses conditions 1-reach, 2-reach, 3-reach and may be of further interest. Results from (Tseng and Vaidya, 2012, 2015) imply that the 3-reach condition is tight for exact Byzantine consensus in synchronous systems. A key contribution of this paper is to show that the 3-reach condition is also necessary and sufficient for asynchronous Byzantine consensus in directed graphs.

Crash fault Byzantine fault
Synchronous system
(Exact consensus)
1-reach condition (see Section 2)
Tseng and Vaidya 2015 (Tseng and Vaidya, 2015)
3-reach condition (see Section 2)
Tseng and Vaidya 2015 (Tseng and Vaidya, 2015)
Asynchronous system
(Approximate consensus)
2-reach condition (see Section 2)
Tseng and Vaidya 2012, 2015 (Tseng and Vaidya, 2012, 2015)
3-reach condition (this paper)
open problem since 2012
Table 2. Necessary and Sufficient Conditions for Directed Graphs

Essentially, obtaining the tight graph conditions for directed graphs is much more difficult than the undirected case, since consensus may be possible even if reliable message transmission (RMT) is not possible between every pair of nodes. This is unlike the case of undirected graphs, as observed previously. For instance, Figure 1(a) presents an undirected network, where synchronous exact Byzantine consensus is possible for f=1f=1. In this graph, all-pair RMT is possible, since κ(G)>2f\kappa(G)>2f allows any pair of nodes to communicate through at least 2f+1=32f+1=3 disjoint paths. Note that removing any edge will reduce κ(G)\kappa(G), which will make both RMT and consensus impossible. Such an all-pair RMT is not necessary in directed graphs. In particular, Figure 1(b) shows a network that satisfies the 3-reach condition (stated later in Section 2) – this network includes two cliques, each containing 7 nodes, and eight additional directed edges as shown (edges within each clique are not shown in the figure). Observe that there are pairs of nodes (e.g., v1v_{1} and w1w_{1}) that are connected via only 2f=42f=4 disjoint paths. Clearly, all-pair RMT is not feasible in this case but consensus can still be achieved, as shown by (Tseng and Vaidya, 2012) and our results. The difficulty posed by directed graphs is further compounded by asynchrony. In this work, we show that the 3-reach condition is necessary and sufficient for asynchronous Byzantine approximate consensus in directed graphs – note that this condition is identical to that proved by Tseng and Vaidya (Tseng and Vaidya, 2015) for synchronous Byzantine exact consensus.

Refer to caption
(a) Byzantine exact consensus feasible for f=1f=1
Refer to caption
(b) Byzantine exact consensus feasible for f=2f=2
Figure 1. Example graphs allowing synchronous exact Byzantine consensus.
Related work

Additional related work includes studies of the special class of iterative algorithms, which only utilize local knowledge of the network topology and employ local communication between nodes. A tight condition for iterative approximate Byzantine consensus has been presented in (Vaidya et al., 2012; LeBlanc et al., 2013). A family of tight conditions for approximate Byzantine consensus under the more general class of kk-hop iterative algorithms has been presented recently in (Su and Vaidya, 2017) but is restricted to synchronous systems. The feasibility of asynchronous crash-tolerant consensus with respect to the kk-hop iterative algorithms has been considered in (Sakavalas et al., 2018). A series of works (Litsas et al., 2013; Pagourtzis et al., 2017a, b) studies the effects of topology knowledge on the feasibility of RMT, and consequently exact consensus in undirected networks with Byzantine faults.

2. Preliminaries and Main Result

For the approximate consensus problem (Dolev et al., 1986), each node is given a real-valued input, and the algorithm needs to satisfy the three properties below.

Definition 0.

Approximate consensus is achieved if the following conditions are satisfied for a given ϵ>0\epsilon>0.

  1. (1)

    Convergence: the output values of any pair of nonfaulty nodes are within ϵ\epsilon of each other.

  2. (2)

    Validity: the output of any nonfaulty node is within the range of the inputs of the nonfaulty nodes.

  3. (3)

    Termination: all nonfaulty nodes eventually output a value.

System Model

We consider an asynchronous message-passing network. The underlying communication network is modeled as a simple directed graph G(V,E)G(V,E), where V={1,,n}V=\{1,\dots,n\} is the set of nn nodes, and EE is the set of directed edges between the nodes in VV. Node ii can reliably transmit messages to node jj if and only if the directed edge (i,j)E(i,j)\in E. Each node can send messages to itself as well; however, for convenience, we exclude self-loops from set EE. A link is assumed to be reliable, but the message delay is not known a priori.

In the system, at most ff nodes may become Byzantine faulty during an execution of the algorithm. A faulty node may misbehave arbitrarily. The faulty nodes may potentially collaborate with each other.

New Graph Conditions

Hereafter, we will use the notation X¯\overline{X} to denote the complement VXV\setminus X of set XVX\subseteq V. The subgraph of GG induced by node set YVY\subseteq V will be denoted by GYG_{Y}. For a given node set FVF\subseteq V, we now define the reach set of node vv under FF, originally introduced in (Tseng and Vaidya, 2015).

Definition 0 (Reach set of vv under FF).

For node vVv\in V and node set FV{v}F\subseteq V\setminus\{v\}, define

reachv(F)={uF¯: u has a directed path to v in graph GF¯}reach_{v}(F)=\{u\in\overline{F}:\text{ $u$ has a directed path to $v$ in graph }G_{\overline{F}}\}

Observe that a node uu belongs to reachv(F)reach_{v}(F) if vv is reachable from uu in the subgraph of GG induced by node set VFV\setminus F. Trivially, vv is in reachv(F)reach_{v}(F). With the definition of a reach set, we introduce the 1-reach, 2-reach and 3-reach conditions referred in Section 1. Intuitively speaking, in the definitions below, the sets FF, FvF_{v}, FuF_{u} represent potential sets of faulty nodes; thus, these sets are chosen to be of size f\leq f. In the following, recall that C¯\overline{C} denotes the set VCV\setminus C.

Definition 0 (Reach Conditions).

We define three conditions:

  • 1-reach: For any FVF\subset V such that |F|f|F|\leq f and any nodes u,vF¯u,v\in\overline{F}, we have

    reachu(F)reachv(F).reach_{u}(F)\cap reach_{v}(F)\neq\emptyset.
  • 2-reach: For any nodes u,vVu,v\in V and any node subsets FuF_{u}, FvF_{v} such that |Fu|,|Fv|f|F_{u}|,|F_{v}|\leq f, uFu¯u\in\overline{F_{u}}, and vFv¯v\in\overline{F_{v}}, we have

    reachv(Fv)reachu(Fu).reach_{v}(F_{v})\cap reach_{u}(F_{u})\neq\emptyset.
  • 3-reach: For any nodes u,vVu,v\in V and any node subsets FF, FuF_{u}, FvF_{v} such that |F|,|Fu|,|Fv|f|F|,|F_{u}|,|F_{v}|\leq f, uFFu¯u\in\overline{F\cup F_{u}}, and vFFv¯v\in\overline{F\cup F_{v}}, we have

    reachv(FFv)reachu(FFu).reach_{v}(F\cup F_{v})\cap reach_{u}(F\cup F_{u})\neq\emptyset.

It is easy to verify that in a clique, 1-reach, 2-reach, and 3-reach are equivalent with n>f,n>2fn>f,n>2f, and n>3fn>3f respectively. Details can be found in Appendix A.

Main Results

As noted previously, Tseng and Vaidya (Tseng and Vaidya, 2015) obtained necessary and sufficient conditions enumerated in Table 2. We have shown, in Appendix A, that each of their conditions to be equivalent to a respective reach condition in Definition 3 above. In particular, based on the results in (Tseng and Vaidya, 2015), we can prove Theorems 4, 5 and 6 below. These results are not used to prove our main results; hence, we defer the presentation of the original conditions in (Tseng and Vaidya, 2015) and the equivalence proofs to Appendix A.

Theorem 4.

Synchronous exact consensus is possible in network G(V,E)G(V,E) in the presence of up to ff crash faults if and only if GG satisfies 1-reach condition.

Theorem 5.

Asynchronous approximate consensus is possible in network G(V,E)G(V,E) in the presence of up to ff crash faults if and only if GG satisfies 2-reach condition.

Theorem 6.

Synchronous exact consensus is possible in network G(V,E)G(V,E) in the presence of up to ff Byzantine faults if and only if GG satisfies 3-reach condition.

 

Main Result

Theorem 7.

Asynchronous approximate consensus is possible in network G(V,E)G(V,E) in the presence of up to ff Byzantine faults if and only if 3-reach is satisfied.

This result solves the open problem in Table 2 in Section 1.

 

Proving the main result: The sufficiency of the 3-reach condition for asynchronous Byzantine approximate consensus is demonstrated constructively in Section 4, using Algorithm 1 for achieving this goal. The necessity of the 3-reach condition for asynchronous Byzantine approximate consensus follows by standard indistinguishability arguments; the proof is deferred to Appendix B.

Technique Outline: Our result generalizes the result of (Abraham et al., 2004), which shows the sufficiency of condition n>3fn>3f for asynchronous Byzantine approximate consensus in a clique. Note that the condition coincides with the tight condition for the synchronous case (cf. Table 1). For directed graphs, we show that 3-reach is the tight condition for both the synchronous and asynchronous cases. Condition 3-reach states that there exists a node that has (i) a directed path to node uu in the subgraph induced by the node subset FFu¯\overline{F\cup F_{u}}, and also (ii) a directed path to node vv in the subgraph induced by the node subset FFv¯\overline{F\cup F_{v}}. This “source of common influence” for any pair of nodes is crucial for achieving consensus. We outline two techniques used towards our generalization, since they may provide useful intuition for other fault-tolerant settings.

Maximal Consistency: We simplify the Reliable-Broadcast subroutine of (Abraham et al., 2004) by essentially replacing several rounds of communication between nodes with flooding. Even in a clique, rr communication rounds can be simulated by flooding through propagation paths 222Observe that the definition of a path also applies in a clique network. of length at most rr. The receiver of all these propagated messages can then detect the existence of faults in certain propagation paths if the propagated values are inconsistent (i.e., values from different paths do not match). The technique appears in the use of the Maximal-Consistency condition in Algorithm 1; this simple condition provides similar properties as Reliable-Broadcast of (Abraham et al., 2004).

Witness node: The witness technique used in (Abraham et al., 2004) relies on the fact that for any pair of nodes, there is a nonfaulty witness node which provides them with enough common information. The existence of an analogous nonfaulty witness for directed networks is implied by the 3-reach condition. Intuitively, even if two nodes v,uv,u “suspect” different sets Fv,FuF_{v},F_{u} to be faulty, the existence of a common nonfaulty witness guarantees the flow of common information to both. Guaranteeing that all pairs of nonfaulty nodes gather enough common values while ensuring that nonfaulty nodes with wrongly suspected faulty set are always able to proceed appeared to be the most challenging part of the proposed algorithm. This technique appears in how each node verifies the messages that it has received at line 1 in Algorithm 1. Generally speaking, a node tries to collect as many “verified messages” as possible while it cannot wait for messages that might never arrive (i.e., message tampered by faulty nodes).

3. Useful Terminology

Recall that directed graph G=(V,E)G=(V,E) represents the network connecting the nn nodes in the system. Thus, n=|V|n=|V|. We will sometimes use the notation V(G)V(G) to represent the set of nodes in graph GG. In the following, we will use the terms edge and link interchangeably. We now introduce some graph terminology to facilitate the discussion.

  • A path is represented by an ordered list of vertices. In particular, p=v1,,vkp=\langle v_{1},\ldots,v_{k}\rangle is a directed path pp comprising of nodes v1,,vkVv_{1},\ldots,v_{k}\in V and directed edges (vi,vi+1)E(v_{i},v_{i+1})\in E, where 1ik11\leq i\leq k-1.

  • 𝐢𝐧𝐢𝐭(𝐩)\mathbf{init(p)} and 𝐭𝐞𝐫(𝐩)\mathbf{ter(p)}, will be used to denote the initial node v1v_{1} and terminal node vkv_{k} of a path p=v1,,vkp=\langle v_{1},\ldots,v_{k}\rangle.

  • A (𝐯𝟏,𝐯𝐤)\mathbf{(v_{1},v_{k})}-path is a path with init(p)=v1init(p)=v_{1} and ter(p)=vkter(p)=v_{k}.

  • Operation p||u=v1,,vk,up||u=\langle v_{1},\ldots,v_{k},u\rangle denotes the concatenation of path p=v1,,vkp=\langle v_{1},\ldots,v_{k}\rangle with node uu assuming that (vk,u)E(v_{k},u)\in E. Analogously, if ter(p)=init(p)ter(p)=init(p^{\prime}), then p||pp||p^{\prime} denotes the concatenation of paths pp and pp^{\prime}.

  • Redundant path: a path pp is a redundant path if p=p1||p2p=p_{1}||p_{2} for some simple paths p1p_{1} and p2p_{2} (p1p_{1} and p2p_{2} have no cycles) and one of p1,p2p_{1},p_{2} may be empty. Note that a redundant path may contain cycles and its length is upper bounded by 2n2n.

  • The set of all redundant paths in graph GYG_{Y} (defined above) will be denoted as 𝒫Yr\mathcal{P}^{r}_{Y}.

  • Fully nonfaulty path: a path consisting entirely of nonfaulty nodes.

  • (𝐀,𝐯)\mathbf{(A,v)}-paths: given a set AVA\subseteq V and a node vVv\in V, an (A,v)(A,v)-path pp is a path with init(p)Ainit(p)\in A and ter(p)=vter(p)=v.

    When convenient, we will interpret a path pp as the set of nodes in the path. The next few definitions use this interpretation for a node set CC and paths p=v1,,vkp=\langle v_{1},\ldots,v_{k}\rangle, p=<v1,,vk>p^{\prime}=<v^{\prime}_{1},\ldots,v^{\prime}_{k}>.

  • CpC\cap p will denote the intersection C{v1,,vk}C\cap\{v_{1},\ldots,v_{k}\}.

  • We will say that pCp\subseteq C if {v1,,vk}C\{v_{1},\ldots,v_{k}\}\subseteq C.

  • By ppp\cap p^{\prime}, we will denote the node intersection {v1,,vk}{v1,,vk}\{v_{1},\ldots,v_{k}\}\cap\{v^{\prime}_{1},\ldots,v^{\prime}_{k}\} of paths pp and pp^{\prime}.

 

Definition 0 (ff-cover of a path set).

For a set of paths PP, a node set CC is a ff-cover of PP, if |C|f|C|\leq f, and
pP, Cp\forall p\in P,~{}~{}\text{ }~{}~{}C\cap p\neq\emptyset.

Definition 0 (Reduced Graph).

For graph G=(V,E)G=(V,E), and sets F1,F2VF_{1},\,F_{2}\subseteq V, such that |F1|,|F2|f|F_{1}|,|F_{2}|\leq f, reduced graph GF1,F2=(V,EF1,F2)G_{F_{1},F_{2}}=(V,E_{F_{1},F_{2}}) has set of vertices VV, and the set of edges EF1,F2E_{F_{1},F_{2}} is obtained by removing from EE all the outgoing links at each node in F1F2F_{1}\cup F_{2}. That is,

EF1,F2=E{(u,v)|uF1F2,vV,vu}.E_{F_{1},F_{2}}~{}=~{}E~{}\setminus~{}\left\{(u,v)~{}|~{}u\in F_{1}\cup F_{2},~{}v\in V,~{}v\neq u\right\}.
Definition 0 (Source Component).

For graph G=(V,E)G=(V,E), and sets F1,F2VF_{1},\,F_{2}\subseteq V, such that |F1|,|F2|f|F_{1}|,|F_{2}|\leq f, source component SF1,F2S_{F1,F2} is defined as the set of those nodes in the reduced graph GF1,F2=(V,EF1,F2)G_{F1,F2}=(V,E_{F_{1},F_{2}}) that have directed paths to all the nodes in VV.

By definition, the nodes in SF1,F2S_{F1,F2} form a strongly connected component in GF1,F2G_{F1,F2}. The source component SF1,F2S_{F1,F2} has other desirable properties that will be introduced when we prove the correctness of our algorithm later.

4. Asynchronous approximate consensus in directed networks

We next present an algorithm for approximate Byzantine consensus in asynchronous directed networks. The algorithm is optimal in terms of resilience, meaning that it matches the impossibility condition of the problem, i.e., the algorithm works in any graph that satisfies 3-reach. Our solution is inspired by the asynchronous approximate consensus algorithm of (Abraham et al., 2004) as explained in Section 2. However, the tools used in the algorithm of (Abraham et al., 2004) prove highly non-trivial to generalize in the case of a partially-connected directed network. This is due to the constraint of directed edges. In complete graphs considered in (Abraham et al., 2004), information can flow both directions, and each node can use the same rule to collect information. In the case of directed networks, information may only be able to flow in one direction. We need to devise new tools for nodes to exchange and filter values so that enough common information is shared between any pair of nonfaulty nodes.

Outline of the algorithm

In our algorithm, each node vv maintains a state value xv[r], for rx_{v}[r]\in\mathbb{R},\text{ for }r\in\mathbb{N}, which is updated regularly, with xv[0]x_{v}[0] denoting the real-valued input of node vv. Value xv[r]x_{v}[r] represents the rr-th update of the state value of node vv; we will also refer to it as the state value of vv in (asynchronous) round rr. Observe that in asynchronous systems, vv updates the value every time it receives enough messages of a certain type (i.e., an event-driven algorithm), thus creating the sequence (xv[r])r\left(x_{v}[r]\right)_{r\in\mathbb{N}}. The rr-th value update of a node vv may happen at a different real time than the respective update of another node uu.

The proposed algorithm is structured in two parts presented in Algorithm 1: Byzantine Witness (BW) and Algorithm 3: Filter-and-Average (FA). Algorithm BW intuitively guarantees that all nonfaulty nodes will gather enough common information in any given (asynchronous) round. The value update in each round is described in Algorithm FA, where we propose an appropriate way for a node to filter values received in Algorithm BW and obtain the state value for next round as an average of the filtered values. Each node needs to filter potentially faulty values to guarantee validity.

4.1. Algorithm Preliminaries

The proposed algorithm utilizes the propagation of values through certain redundant paths (defined in Section 3). We then describe tools for handling information received by a node through different paths

Messages and Message Sets.

In our algorithm, the messages propagated are of the form (x,p)(x,p) where xx is the propagated value and pp corresponds to the (redundant) path through which the value xx is propagated, i.e., its propagation path. For a message m=(x,p)m=(x,p), we will use the notation value(m)=xvalue(m)=x and path(m)=ppath(m)=p to denote the propagated value and propagation path, respectively. For simplicity, we will also use the terminology vv receives value xx from uu whenever node vv receives xx through some path pp initiating at node uu. A message set \mathcal{M} is a set of messages of the form m=(x,p)m=(x,p) where xx is the value reported though propagation path pp. Given \mathcal{M}, we will use 𝒫()\mathcal{P}(\mathcal{M}) to denote the set of all propagation paths in \mathcal{M}, i.e.,

𝒫()={p:(x,p)}\mathcal{P}(\mathcal{M})=\{p~{}:~{}(x,p)\in\mathcal{M}\}

As defined below, given a node set AA and a message set \mathcal{M}, the exclusion of \mathcal{M} on AA consists of the messages of \mathcal{M} that are propagated on paths that do not include any node in AA.

Definition 0 (Exclusion of message set).

Given a message set \mathcal{M} and AVA\subseteq V, the exclusion of \mathcal{M} on AA is the set

|A={(x,p):pA=}\mathcal{M}|_{A}=\{(x,p)\in\mathcal{M}~{}:~{}p\cap A=\emptyset\}

The notions of a consistent message set and full message set, presented below, are used to facilitate fault detection. A message set \mathcal{M} is consistent if all value-path tuples initiating at the same initial node report the same value.

Definition 0 (Consistent message set).

A message set \mathcal{M} is consistent if

for any two value-path pairs(x,p),(x,p),init(p)=init(p)x=x\text{for any two value-path pairs}~{}(x,p),(x^{\prime},p^{\prime})\in\mathcal{M},~{}~{}~{}~{}init(p)=init(p^{\prime})~{}\Rightarrow~{}x=x^{\prime}

Given a consistent message set \mathcal{M}, if (x,p)(x,p)\in\mathcal{M} and init(p)=vinit(p)=v, then we define valuev()=xvalue_{v}(\mathcal{M})=x. That is, for a node vv that appears as an initial node of a path in 𝒫()\mathcal{P}(\mathcal{M}), valuev()value_{v}(\mathcal{M}) denotes the unique value corresponding to vv. Note that the value is guaranteed to be unique owing to the the definition of a consistent message set.

We say that the received message set \mathcal{M} is a full message set for (A,v)(A,v), whenever a node vv receives messages from all possible incoming redundant paths excluding node set AA. The formal definition follows.

Definition 0 (Full message set).

Given AVA\subseteq V and vVAv\in V\setminus A, a message set \mathcal{M} is full for (A,v)(A,v) if

{p𝒫A¯r:ter(p)=v}𝒫()\{p\in\mathcal{P}^{r}_{\overline{A}}:ter(p)=v\}\subseteq\mathcal{P}(\mathcal{M})

4.2. Algorithm Byzantine Witness (BW)

The Byzantine Witness (BW) algorithm, presented as Algorithm 1, intuitively guarantees that all nonfaulty nodes will receive enough common state values in a specific asynchronous round of the algorithm; eventually, this common information guarantees convergence of local state values at all nonfaulty nodes. The algorithm is event-driven. That is, whenever a new message is received, each node checks whether a certain set of conditions are satisfied, and whether it should take an action. (In particular, Line 6, Line 8, Line 10, and Line 12 of Algorithm 1 will be triggered upon receipt of a new message.)

Parallel executions.

For the sake of succinctness, we present a parallel version of Algorithm BW; the algorithm makes use of parallel executions (threads) for any potential fault set FF. Note that there are exponential number of threads. In the parallel thread for set FF, a node “guesses” that the actual fault set of this execution is FF, and checks for inconsistencies to reject this guess. Observe that in lines 1-1 of Algorithm BW, the usage of a shared boolean variable nextroundnextround guarantees that a node will proceed to the next round during at most one parallel execution; we will later prove that there always exists such a parallel execution that proceeds to the next round. For each round rr, during this unique parallel execution, the node will execute Algorithm Filter-and-Average (FA), presented as Algorithm 3, through which the value is updated.

Suppose that a node’s parallel thread for set FF^{\prime} proceeds to the next round. It is possible that FF^{\prime} is not the actual faulty set. Our algorithm is designed in a way that even if the guess is incorrect, the node is still able to collect enough common values and make progress. Moreover, the parallel thread for set FF, where FF is the actual fault set, is guaranteed to make progress at every nonfaulty node.

Atomicity

Algorithm BW uses the shared variables v\mathcal{M}_{v} and nextroundnextround; v\mathcal{M}_{v} includes all values received by node vv and is updated whenever vv receives a new flooded value while nextroundnextround indicates if a parallel thread has proceeded to the next (asynchronous) round. For certain parts of the algorithm we need access to shared variables to be atomic, i.e., reads and writes to shared variables can be performed only by a parallel thread at a time. For clarity of the latter, we make use of the functions lock() and unclock() which indicate the parts of the code performed in an atomic fashion.

We next describe a flooding subroutine used to propagate state values throughout redundant paths in the network.

RedundantFlood (Redundant Flood) algorithm

In the beginning of each asynchronous round, in algorithm 1, all nodes will flood their values throughout the network. The difference between RedundantFlood algorithm and the standard flooding is that each flooded message is propagated through any redundant path in the network, not just through simple paths. The details of the algorithm are deferred to the Appendix E.

FIFO flooding of messages

During the execution of the algorithm, we will employ the FIFO Flood and FIFO Receive procedures which ensure that the order of messages sent from a sender is preserved during reception of these messages by any receiver, when the propagating path is fully comprised of nonfaulty nodes. For the correctness of our algorithm, we only need to FIFO-flood messages through simple paths in the network.333It is possible to use RedundantFlood everywhere. For efficiency, we only use RedundantFlood to propagate state values at the very beginning of each round. Thus, a node will propagate a message during FIFO-Flood, only if the resulting propagation path does not contain any cycle. We present a high-level description of the FIFO Flood and FIFO Receive procedures in the Appendix F.

Algorithm BW: Pseudo-code

Algorithm BW is presented in Algorithm  1. We stress that Algorithm BW is executed for each asynchronous round rr. Thus, all messages sent in round rr will be tagged with corresponding round identifier rr. For simplicity, we omit round numbers in the presentation and the analysis of the algorithm. The properties of Algorithm BW are proved hold for any specific asynchronous round rr. For brevity, the pseudo-code does not include the termination condition. We defer the discussion on termination to Section 4.6.

1
2
Input: State value xvx_{v} of node vv for round rr
3 \triangleright Round id rr, included in all sent messages, is omitted for simplicity
Code for vVv\in V:
Initialization
4v\mathcal{M}_{v}\leftarrow\emptyset \triangleright shared variable accessed by all parallel threads
5nextroundfalsenextround\leftarrow false \triangleright shared variable accessed by all parallel threads
6FIFORec(F)=falseFIFORec(F)=false, for each FVF\subseteq V with |F|f|F|\leq f
7
8RedundantFlood value xvx_{v}
9for each FvV{v}F_{v}\subseteq V\setminus\{v\}, with |Fv|f|F_{v}|\leq f do in parallel \triangleright only one parallel thread for some FvF_{v}
10 \triangleright executes Filter-and-Average due to lines 1-1
11      
12      upon receipt of message m=(x,p)m=(x,p) do
13             lock()
14            vvm\mathcal{M}_{v}\leftarrow\mathcal{M}_{v}\cup m \triangleright Atomic updates of v\mathcal{M}_{v}
15            unlock()
16      
17       \triangleright Maximal-Consistency Condition
18      upon  (v|Fv\mathcal{M}_{v}|_{F_{v}} is consistent  and  full for (Fv,v)(F_{v},v) for the first time) do
19             FIFO-Flood (v|Fv,COMPLETE(Fv))(\mathcal{M}_{v}|_{F_{v}},COMPLETE(F_{v}))
20      
21       \triangleright FIFO-Receive-All Condition for FvF_{v}
22      upon ( For all creachv(Fv)c\in reach_{v}(F_{v}),  vv FIFO-Receives the same message (c,COMPLETE(Fv))(\mathcal{M}^{c},COMPLETE(F_{v})) from all simple (c,v)(c,v)-paths preachv(Fv)p\subseteq reach_{v}(F_{v}))  do
23            
24            FIFORec(Fv)trueFIFORec(F_{v})\leftarrow true
25      
26      upon  Verify(v,FIFORec(Fv))Verify(\mathcal{M}_{v},FIFORec(F_{v})) do \triangleright For verification of a received value, vv will wait
27       \triangleright to receive the same value from enough paths as implied by Algorithm 2.
28             lock()
29            if nextroundnextround=false  then
30                  Execute Algorithm Filter-and-Average(v)(\mathcal{M}_{v}) \triangleright Execution of Algorithm 3
31                  nextroundtruenextround\leftarrow true
32            
33            unlock()
34      
35
36
37Function Verify(v,FIFORec(Fv))Verify(\mathcal{M}_{v},FIFORec(F_{v})):
38      
39      validityfalsevalidity\leftarrow false
40      if FIFORec(Fv)=trueFIFORec(F_{v})=true then
41            
42            validitytruevalidity\leftarrow true
43            for each (c,COMPLETE(Fu))(\mathcal{M}^{c},COMPLETE(F_{u})) FIFO-received through a simple (c,v)(c,v)-path preachv(Fv)p\subseteq reach_{v}(F_{v}), with consistent c\mathcal{M}^{c} do
44                   validityvalidityCompleteness(v,c,Fu)validity\leftarrow validity\wedge Completeness(\mathcal{M}_{v},\mathcal{M}^{c},Fu) \triangleright Completeness: Algorithm 2
45            
46      
47      return validityvalidity
48
Algorithm 1 BW (for node vv and round rr)
1
Input: Message sets v,c\mathcal{M}_{v},\mathcal{M}^{c}, FuVF_{u}\subseteq V
2
Initialization
3outputtrueoutput\leftarrow true
4for each FwVF_{w}\subseteq V with FwFu,F_{w}\neq F_{u}, and |Fw|f|F_{w}|\leq f do
5      
6      for each  qSFu,Fwq\in S_{F_{u},F_{w}} do
7             {(valueq(c),p)v:init(p)=q}\mathcal{M}^{\prime}\leftarrow\{(value_{q}(\mathcal{M}^{c}),p)\in\mathcal{M}_{v}:init(p)=q\} \triangleright All received messages from qq
8             \triangleright which are consistent with c\mathcal{M}^{c}
9            outputoutput( an f-cover HVSFu,Fw of 𝒫())output\leftarrow output\wedge(\nexists\text{ an $f$-cover }H\subseteq V\setminus S_{F_{u},F_{w}}\text{ of }\mathcal{P}(\mathcal{M}^{\prime}))
10      
11
12return outputoutput
Algorithm 2 Function Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u})
Function Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u})

We first remind the reader that SF1,F1S_{F_{1},F_{1}} denotes the source component of reduced graph GF1,F2G_{F_{1},F_{2}} as defined in Definitions 23. Observe that due to function Verify(v)Verify(\mathcal{M}_{v}) called in line 1, a node essentially waits to receive additional messages to the ones it received upon considering possible faulty set FvF_{v} (during the parallel execution for FvF_{v}) before it proceeds to update its value through Algorithm Filter-and-Average. Intuitively, for some received message (Mc,COMPLETE(Fu))(M^{c},COMPLETE(F_{u})), vv waits for the confirmation of the values in c\mathcal{M}^{c} through enough redundant paths from a source component. We will later prove that if message (Mc,COMPLETE(Fu))(M^{c},COMPLETE(F_{u})) is not faulty (i.e., tampered) then vv will eventually be able to “confirm” the values in McM^{c}. For the sake of simplicity, whenever the function Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) at node vv is true for some given c,Fu\mathcal{M}^{c},F_{u}, we will simply state that condition Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) is satisfied.

4.3. Properties of Algorithm BW

In the following, we introduce some notions necessary for the analysis of Algorithm BW. We first borrow the notion of propagation from (Tseng and Vaidya, 2012, 2015).

Definition 0 (Propagation between sets).

Given sets A,B,CVA,B,C\subseteq V with AB=A\cap B=\emptyset, BCB\subseteq C, set AA is said to propagate in CC to set BB if either (i) B=B=\emptyset, or (ii) for each node bBb\in B, there exist at least f+1f+1 node-disjoint (A,b)(A,b)-paths in the node subgraph of GG induced by node set CC, i.e., GCG_{C}. We will denote the fact that set AA propagates in CC to set BB by ACBA\stackrel{{\scriptstyle C}}{{\rightsquigarrow}}{B}.

Note that the f+1f+1 disjoint paths implied in Definition 4, are entirely contained in CC. Next, observe that by Definition 3, for any sets F1,F2VF_{1},F_{2}\subseteq V it holds that SF1,F2=SF2,F1S_{F_{1},F_{2}}=S_{F_{2},F_{1}}. The following theorem is repeatedly used in our analysis; its proof is based on Corollary 2 in (Tseng and Vaidya, 2012) and the equivalence of 3-reach condition with the condition in (Tseng and Vaidya, 2012) (proof in Appendix A). Intuitively, Theorem 5 below states that if 3-reach is satisfied then there are at least f+1f+1 disjoint paths, excluding nodes in F1F_{1}, that connect a source component SF1,F2S_{F_{1},F_{2}} with any node outside the source component.

Theorem 5.

Suppose that graph G=(V,E)G=(V,E) satisfies condition 3-reach. Then, for any F1VF_{1}\subseteq V and F2F1¯F_{2}\subseteq\overline{F_{1}}, such that |F1|,|F2|f|F_{1}|,|F_{2}|\leq f, SF1,F2F1¯F1SF1,F2¯S_{F_{1},F_{2}}\stackrel{{\scriptstyle\overline{F_{1}}}}{{\rightsquigarrow}}{\overline{F_{1}\cup S_{F_{1},F_{2}}}} and SF1,F2F2¯F2SF1,F2¯S_{F_{1},F_{2}}\stackrel{{\scriptstyle\overline{F_{2}}}}{{\rightsquigarrow}}{\overline{F_{2}\cup S_{F_{1},F_{2}}}} hold.

Using the notions above, we will next show some important properties of Algorithm BW. As defined in line 1, we will say that node vv satisfies the Maximal-Consistency Condition for node set FF^{\prime} if it receives the message set v\mathcal{M}_{v} and v|F\mathcal{M}_{v}|_{F^{\prime}} is consistent and full for (F,v)(F^{\prime},v).

Lemma 6.

For any nonfaulty node vv, the Maximal-Consistency condition will eventually be satisfied during a parallel execution for some set FF^{\prime}.

Proof.

Consider vv’s parallel execution for set F=FF^{\prime}=F, where FF is the actual faulty set of the execution. If the Maximal-Consistency condition has not been satisfied already, it will eventually be satisfied during the parallel execution for F=FF^{\prime}=F. This will happen since every node in GF¯G_{\overline{F}} behaves correctly and thus vv will eventually receive consistent values from all incoming paths in GF¯G_{\overline{F}}, i.e., v|Fv\mathcal{M}_{v}|_{F_{v}} will be consistent and full for (Fv,v)(F_{v},v). ∎

Lemma 7.

Consider two nonfaulty nodes v,uv,u that satisfy the Maximal-Consistency condition on the same set FF^{\prime}. Let the message sets v|F\mathcal{M}_{v}|_{F^{\prime}} and u|F\mathcal{M}_{u}|_{F^{\prime}} be the sets that are used to pass Maximal-Consistency condition at vv and uu, respectively. Then, the two sets contain the same unique value xw,wF′′FF′′V,|F′′|fSF,F′′\displaystyle x_{w},\forall w\in\bigcup_{\mathclap{\begin{subarray}{c}F^{\prime\prime}\neq F^{\prime}\\ F^{\prime\prime}\subseteq V,|F^{\prime\prime}|\leq f\end{subarray}}}S_{F^{\prime},F^{\prime\prime}}.

Proof.

We first prove that for each wSF,F′′w\in S_{F^{\prime},F^{\prime\prime}}, both nodes vv and uu will receive a unique value corresponding to ww, contained in the respective sets v|F\mathcal{M}_{v}|_{F^{\prime}} and u|F\mathcal{M}_{u}|_{F^{\prime}}. Observe that for any F′′FF^{\prime\prime}\neq F^{\prime} and wSF,F′′w\in S_{F^{\prime},F^{\prime\prime}}, by Theorem 5 and the fact that any source component SF,F′′S_{F^{\prime},F^{\prime\prime}} is strongly connected, there exists a simple (w,v)(w,v)-path in GF¯G_{\overline{F^{\prime}}}. Since v|F\mathcal{M}_{v}|_{F^{\prime}} is full for (F,v)(F^{\prime},v), v|F\mathcal{M}_{v}|_{F^{\prime}} will contain some value xwx_{w} corresponding to ww. Note that this value might not be the value sent by ww, since the above simple path might contain some faulty node. Next, recall that we also require v|F\mathcal{M}_{v}|_{F^{\prime}} to be consistent. Therefore, the previously mentioned xwx_{w} value contained in v|F\mathcal{M}_{v}|_{F^{\prime}} must be unique . The same argument applies to u|F\mathcal{M}_{u}|_{F^{\prime}}, too.

The 3-reach condition implies the existence of a node qreachv(FF)reachu(FF)q\in reach_{v}(F\cup F^{\prime})\cap reach_{u}(F\cup F^{\prime}) for the actual faulty set FF. By definition, qq is nonfaulty and is connected to both v,uv,u through fully nonfaulty simple paths pq,vp_{q,v} and pq,up_{q,u} respectively. By Theorem 5, either qSF,F′′q\in S_{F^{\prime},F^{\prime\prime}} or there exist f+1f+1 simple disjoint (SF,F′′,q)(S_{F^{\prime},F^{\prime\prime}},q)-paths in GF¯G_{\overline{F^{\prime}}}. In both cases, there exists a simple (w,q)(w,q)-path pw,qp_{w,q} in GF¯G_{\overline{F^{\prime}}}. Note that there might be some faulty nodes in pw,qp_{w,q}, since FF^{\prime} might not be the actual faulty node set.

This observation implies that in GF¯G_{\overline{F^{\prime}}}, there exist a redundant (w,v)(w,v)-path pw,v=pw,q||pq,vp_{w,v}=p_{w,q}||p_{q,v} and a redundant (w,u)(w,u)-path pw,u=pw,q||pq,up_{w,u}=p_{w,q}||p_{q,u} such that the first part pw,qp_{w,q} is identical in both paths. Note that the 33-reach condition only implies that pq,vp_{q,v} and pq,up_{q,u} are fully nonfaulty. Hence, it is possible that the value sent by node ww is xwx^{\prime}_{w}, but the message(s) propagated through pw,vp_{w,v} and pw,up_{w,u} are different. Since v|F\mathcal{M}_{v}|_{F^{\prime}} and u|F\mathcal{M}_{u}|_{F^{\prime}} are full; nodes vv and uu will receive some value from paths pw,vp_{w,v} and pw,up_{w,u}, respectively. The value received by vv and uu must be identical. This is because (i) the two redundant paths have a common first part pw,qp_{w,q}; and (ii) pq,vp_{q,v} and pq,up_{q,u} are fully nonfaulty by assumption. Let this value be xwx_{w} (which may or may not equal to xwx^{\prime}_{w}, the original value sent by ww). Finally, since v|F\mathcal{M}_{v}|_{F^{\prime}} is consistent, all the other messages propagated through paths pp with init(p)=winit(p)=w and ter(p)=vter(p)=v must also be xwx_{w}, the value forwarded by qq. The same argument applies to v|F\mathcal{M}_{v}|_{F^{\prime}}. Thus, For each ww, there exists a common value xwx_{w} in both v|F\mathcal{M}_{v}|_{F^{\prime}} and u|F\mathcal{M}_{u}|_{F^{\prime}}.

Next we prove the main Lemma for the correctness of Algorithm BW. With FIFO-Receive-all condition we refer to the condition stated in the event handler of line 1.

Lemma 8.

Consider a nonfaulty node vv such that the FIFO-Receive-All condition is satisfied at node vv for some parallel execution. If by the time the FIFO-Receive-All condition is satisfied, vv receives (c,COMPLETE(Fu))(\mathcal{M}^{c},COMPLETE(F_{u})) from a fully nonfaulty path pp with init(p)=cinit(p)=c, then vv will eventually receive a message set MvM_{v} such that the Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) condition will be satisfied at node vv.

Proof.

First observe that since path pp is fully nonfaulty, cc is nonfaulty; also lines 1,1 of BW imply that cFuc\notin F_{u} since it propagates (c,COMPLETE(Fu))(\mathcal{M}^{c},COMPLETE(F_{u})). Consider any FwFuF_{w}\neq F_{u} with |Fw|f|F_{w}|\leq f and any qSFu,Fwq\in S_{F_{u},F_{w}}. Let FF be the actual faulty set of the execution. Note that since nonfaulty vv receives (c,COMPLETE(Fu))(\mathcal{M}^{c},COMPLETE(F_{u})) from a fully nonfaulty path with init(p)=cinit(p)=c, node cc must have FIFO-Flooded this message during the execution. By the 3-reach condition of Definition 2, we have the following.

(1) FVSFu,Fw{v}, such that |F|f,zreachv(FF)reachc(FFu)\forall F^{\prime}\subseteq V\setminus S_{F_{u},F_{w}}\setminus\{v\},\text{ such that }|F^{\prime}|\leq f,\exists z\in reach_{v}(F\cup F^{\prime})\cap reach_{c}(F\cup F_{u})

This, for any FVSFu,FwF^{\prime}\subseteq V\setminus S_{F_{u},F_{w}}, implies the existence of a fully nonfaulty simple (z,v)(z,v)-path pz,vp_{z,v} and a fully nonfaulty simple (z,c)(z,c)-path pz,cp_{z,c} in graphs GFF¯G_{\overline{F\cup F^{\prime}}} and GFFu¯G_{\overline{F\cup F_{u}}}, respectively.444Set FF^{\prime} is not to be confused with the set FvF_{v} during the parallel execution of which vv satisfies the FIFO-Receive-All condition. Set FvF_{v} is arbitrary in the proof. Note that vv might even receive values from paths intersecting with FvF_{v} in order to satisfy the Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) condition during its parallel execution for FvF_{v}. This might occur if FvF_{v} is a wrong “guess” of the actual fault set. We consider the following two cases for zz,

  • Case I: zSFu,Fwz\in S_{F_{u},F_{w}}.

    We first prove the following key claim.

    Claim 1:

    Both vv and cc receive an identical value xqx_{q} from node qq.

    Proof of Claim: Since SFu,FwS_{F_{u},F_{w}} is strongly connected, there exists a simple (q,z)(q,z)-path pq,zp_{q,z} in graph GSFu,FwG_{S_{F_{u},F_{w}}}. This implies the existence of the redundant (q,c)(q,c)-path pq,c=pq,z||pz,cp_{q,c}=p_{q,z}||p_{z,c} in GFu¯G_{\overline{F_{u}}}. Recall that by assumption, cc is nonfaulty and FIFO-Floods COMPLETE(Fu)COMPLETE(F_{u}). This means that cc has received a unique value xqx_{q} through all redundant (q,c)(q,c)-paths in GFu¯G_{\overline{F_{u}}}, particularly through path pq,cp_{q,c}. Note that since pq,zp_{q,z} might contain some faulty nodes, xqx_{q} might be different from the value originally sent by node qq. Observe that there also exists the redundant (q,v)(q,v)-path pq,v=pq,z||pz,vp_{q,v}=p_{q,z}||p_{z,v} in GF¯G_{\overline{F^{\prime}}} which will eventually propagate the same value xqx_{q} to vv. This is because pz,vp_{z,v} is fully nonfaulty and the initial part pq,zp_{q,z} is identical in both pq,cp_{q,c} and pq,vp_{q,v}.

    Claim 2:

    Node vv will eventually receive xqx_{q} from a set of paths PP with no ff-cover HVSFu,FwH\subseteq V\setminus S_{F_{u},F_{w}}.

    Proof of Claim: Recall that Claim 1 holds for any FVSFu,Fw{v}F^{\prime}\subseteq V\setminus S_{F_{u},F_{w}}\setminus\{v\}, with |F|f|F^{\prime}|\leq f. Then vv will eventually receive xqx_{q} from all redundant paths pq,z||pz,vp_{q,z}||p^{\prime}_{z,v} for any pz,vp^{\prime}_{z,v} being a (z,v)(z,v)-path with pz,vF=p^{\prime}_{z,v}\cap F=\emptyset. This is because all these pz,vp^{\prime}_{z,v} paths are fully nonfaulty and the initial part pq,zp_{q,z} propagates xqx_{q} as implied by Claim 1. The set PP of all these pz,vp^{\prime}_{z,v} paths does not have an ff-cover FVSFu,FwF^{\prime}\subseteq V\setminus S_{F_{u},F_{w}}. If there was such an ff-cover FF^{\prime}, this would contradict Equation (1) because it would mean that no fully nonfaulty (z,v)(z,v)-path would exist in GF¯G_{\overline{F^{\prime}}}.555Observe that if vSFu,Fwv\in S_{F_{u},F_{w}} and receives xqx_{q} from a single path that entirely consists of nodes in SFu,FwS_{F_{u},F_{w}}, then no ff-cover HVSFu,FwH\subseteq V\setminus S_{F_{u},F_{w}} exists for this path. This is because HSFu,Fw=H\cap S_{F_{u},F_{w}}=\emptyset by definition.

  • Case II: zSFu,Fwz\notin S_{F_{u},F_{w}}.

    Theorem 5 implies that there exist f+1f+1 simple disjoint (SFu,Fw,z)(S_{F_{u},F_{w}},z)-paths in GFu¯G_{\overline{F_{u}}}. This together with the observation that SFu,FwS_{F_{u},F_{w}} is strongly connected imply the existence of f+1f+1 simple (q,z)(q,z)-paths p1,,pf+1p_{1},\ldots,p_{f+1} which trivially do not have an ff-cover HVSFu,FwH\subseteq V\setminus S_{F_{u},F_{w}}. Similarly with the previous case, since cc FIFO-Floods COMPLETE(Fu)COMPLETE(F_{u}), it must have received the same value from all redundant paths p1||pz,c,,pf+1||pz,cp_{1}||p_{z,c},\ldots,p_{f+1}||p_{z,c}. Since |F|f|F^{\prime}|\leq f and pz,vp_{z,v} is fully nonfaulty, one of the paths p1||pz,v,,pf+1||pz,vp_{1}||p_{z,v},\ldots,p_{f+1}||p_{z,v} will also eventually propagate value xqx_{q} to vv.

    Using the same argument for Claim 2 in Case I, vv will eventually receive xqx_{q} from a set of paths PP with no ff-cover HVSFu,FwH\subseteq V\setminus S_{F_{u},F_{w}}.

In both cases, any such value xqx_{q} received by vv will be consistent with values c\mathcal{M}^{c} propagated by cc, and thus vv will eventually satisfy the Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) condition.

In the following, we will consider the case where a node vv executes Algorithm Filter-and-Average through line 1, during its parallel execution for set FvF_{v}. In this case, vv has already satisfied the Maximal-Consistency condition corresponding to FvF_{v} as well as the Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) conditions for all COMPLETE(Fu)COMPLETE(F_{u}) messages it has received by the time it satisfied the FIFO-Receive-All condition. Intuitively, this means that vv has received redundant messages corresponding to “suspicious sets” Fv,FuF_{v},F_{u} by the time it executes Filter-and-Average. Note that there exists only one such parallel execution during which Filter-and-Average is executed; this follows easily from the atomic updates of shared variable v\mathcal{M}_{v} and lines 1-1 of Algorithm BW. We will use the following notion in our proofs.

Definition 0 (Informed node).

A node vv that executes Filter-and-Average during its parallel execution for a set FvF_{v} is informed about set FtF_{t} if Fv=FtF_{v}=F_{t}, or vv has satisfied the Completeness(v,c,Ft)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{t}) condition after receiving message (c,COMPLETE(Ft))(\mathcal{M}^{c},COMPLETE(F_{t})) from a fully nonfaulty path pp with init(p)=cinit(p)=c.

Theorem 10.

Any nonfaulty node vv will eventually execute Filter-and-Average during a parallel execution for a set FF^{\prime}.

The proof of Theorem 10 relies on the observation that algorithm Filter-and-Average will be executed during parallel execution for actual fault set FF if not during any other parallel execution. The full proof is presented in Appendix D.

Theorem 11.

Let any pair of nonfaulty nodes v,uv,u which execute Filter-and-Average during their parallel executions for sets FvF_{v} and FuF_{u}, respectively. Then, both nodes vv and uu will be informed about a node set Ft{Fv,Fu}F_{t}\in\{F_{v},F_{u}\}, where t{v,u}t\in\{v,u\}, and will both receive a common value xqx_{q} for each qFwFt|Fw|fSFt,Fw\displaystyle q\in\bigcup_{\mathclap{\begin{subarray}{c}F_{w}\neq F_{t}\\ |F_{w}|\leq f\end{subarray}}}S_{F_{t},F_{w}}. More specifically, each value xqx_{q} will be the unique value corresponding to node qq that node tt received by the time it satisfied its Maximal-Consistency condition.

Proof.

Theorem 10 implies that sets Fv,FuF_{v},F_{u} are well defined. Observe that, if Fv=FuF_{v}=F_{u} the theorem holds trivially due to Definition 9 and Lemma 7; this is because both nodes will trivially be informed about the same set, according to the first part of the definition of an informed node. Thus, we focus on the case where FvFuF_{v}\neq F_{u}. The FIFO-Receive-All condition for nodes vv and uu is satisfied in the parallel execution for Fv,FuF_{v},F_{u} respectively by assumption. Let FF be the actual fault set. Due to the 3-reach condition, there exists a node creachv(FFv)reachu(FFu)c\in reach_{v}(F\cup F_{v})\cap reach_{u}(F\cup F_{u}) which is nonfaulty by definition of the reach set and is connected to v,uv,u through fully nonfaulty simple paths pc,vp_{c,v} and pc,up_{c,u} respectively. Note that both nodes v,uv,u will only satisfy their FIFO-Receive-All conditions only if they receive messages of the form (vc,COMPLETE(Fv))(\mathcal{M}^{c}_{v},COMPLETE(F_{v})) and (uc,COMPLETE(Fu))(\mathcal{M}^{c}_{u},COMPLETE(F_{u})), respectively, from cc through the existing nonfaulty paths pc,v,pc,up_{c,v},p_{c,u} respectively; this holds since due to Line 1, node vv (analogously node uu) will wait until it receives (vc,COMPLETE(Fv))(\mathcal{M}^{c}_{v},COMPLETE(F_{v})) from all paths entirely comprising of nodes in reachv(Fv)reach_{v}(F_{v}) which include all nodes on pc,vp_{c,v}. Thus, cc must have sent both COMPLETE(Fv)COMPLETE(F_{v}), COMPLETE(Fu)COMPLETE(F_{u}) messages. Since these messages are FIFO-flooded from cc and there are fully nonfaulty paths connecting cc with both u,vu,v, one of the two nodes will receive both COMPLETE(Fv)COMPLETE(F_{v}), COMPLETE(Fu)COMPLETE(F_{u}) messages before satisfying the FIFO-Receive-All condition. Assume without loss of generality that this node is vv. We will then show that the theorem holds for Ft=FuF_{t}=F_{u}.

Similar arguments 666This follows from the first paragraph of the proof of Lemma 7 for cc being any of the nodes v,uv,u. to the ones used in the proof of Lemma 7 imply that cc must have received a unique value xqx_{q} for each qFwFu|Fw|fSFu,Fw\displaystyle q\in\bigcup_{\mathclap{\begin{subarray}{c}F_{w}\neq F_{u}\\ |F_{w}|\leq f\end{subarray}}}S_{F_{u},F_{w}} from redundant paths in GFu¯G_{\overline{F_{u}}} in order to satisfy its Maximal-Consistency condition during the parallel execution for set FuF_{u}. Similarly with previous arguments, by Theorem 5 and the strong connectivity of SFu,FwS_{F_{u},F_{w}}, there exists a simple (q,c)(q,c)-path pq,cp_{q,c} in GFu¯G_{\overline{F_{u}}}, which propagates this value xqx_{q} to cc. Since, by assumption, uu executes Filter-and-Average during its parallel execution for FuF_{u}, by the Maximal-Consistency condition, uu will also receive the same unique value xqx_{q}, for each such node qq, propagated by redundant path pq,c||pc,up_{q,c}||p_{c,u} because pc,up_{c,u} is fully nonfaulty and entirely contained in GFu¯G_{\overline{F_{u}}}. As argued previously, vv will receive (vc,COMPLETE(Fv))(\mathcal{M}^{c}_{v},COMPLETE(F_{v})) by a fully nonfaulty path pc,vp_{c,v}. Consequently, by Lemma 8, vv will satisfy the Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) condition and thus, due to Definition 9, vv will be informed about FuF_{u}. For the Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) condition to be satisfied at node vv, it must receive the respective xqx_{q} values which are consistent with the ones in c\mathcal{M}^{c}, received through the fully nonfaulty path pc,vp_{c,v}. Thus, by definition of the Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) condition, vv will also receive the same values xqx_{q} for each qFwFu|Fw|fSFu,Fw\displaystyle q\in\bigcup_{\mathclap{\begin{subarray}{c}F_{w}\neq F_{u}\\ |F_{w}|\leq f\end{subarray}}}S_{F_{u},F_{w}}.

Finally, we introduce notions that will be useful for our analysis later.

Definition 0.

Assume nonfaulty nodes v,uv,u which execute Filter-and-Average during their parallel executions for sets FvF_{v} and FuF_{u}, respectively. Let Ft{Fv,Fu}F_{t}\in\{F_{v},F_{u}\} be the set about which both v,uv,u are informed and both receive a common value xqx_{q} for each qFwFt|Fw|fSFt,Fw\displaystyle q\in\bigcup_{\mathclap{\begin{subarray}{c}F_{w}\neq F_{t}\\ |F_{w}|\leq f\end{subarray}}}S_{F_{t},F_{w}}, as implied by Theorem 11. We will refer to set FtF_{t} as the common fault set of v,uv,u, and to tt as the leading node of the pair. Considering the common values xqx_{q} received by both v,uv,u, for any set FwFtF_{w}\neq F_{t} with FwVF_{w}\subseteq V, |Fw|f|F_{w}|\leq f, we define the common value set RFwR_{F_{w}} as:

(2) RFw=qSFt,FwxqR_{F_{w}}=\bigcup_{q\in S_{F_{t},F_{w}}}x_{q}

4.4. Value Update

We next present Algorithm Filter-and-Average (FA), proposing a way for a node to filter its received messages in order to update its state value by an averaging procedure. Following the intuition of (Su and Vaidya, 2017), a node first sorts all the values in received message set v\mathcal{M}_{v}, which results to a sorted vector Ov[r]O_{v}[r] in round rr. In the next step, the node computes the maximal set of lowest values that might have been tampered by a faulty set (i.e., such that their propagation paths have an ff-cover) and trims (removes) them from sorted vector Ov[r]O_{v}[r]. Analogously, the node also trims from Ov[r]O_{v}[r] the maximal set of the highest values that may have been tampered by a faulty set. Finally, Finally, the remaining sorted values, denoted as Ov[r]O_{v}^{\prime}[r], are averaged as is usual in the majority of the approximate consensus literature (e.g., (Dolev et al., 1986; Kieckhafer and Azadmanesh, 1994; Bonomi et al., 2016)).

1
Input: Incoming message history v\mathcal{M}_{v} at the point where Filter-and-Average is called in BW in round rr
2
Code for vVv\in V:
3Sort all messages mvm\in\mathcal{M}_{v} in increasing order with respect to value(m)value(m) which results in sorted vector Ov[r]O_{v}[r].
4OvloO_{v}^{lo}\leftarrow the longest message prefix of Ov[r]O_{v}[r] for which \exists an ff-cover FloF_{lo} of 𝒫(Ovlo)\mathcal{P}(O_{v}^{lo}).
5OvhiO_{v}^{hi}\leftarrow the longest message suffix of Ov[r]O_{v}[r] for which \exists an ff-cover FhiF_{hi} of 𝒫(Ovhi)\mathcal{P}(O_{v}^{hi}).
6 \triangleright Trim extreme values and average
7Remove message Ovlo,OvhiO_{v}^{lo},O_{v}^{hi} from Ov[r]O_{v}[r] which results in the trimmed vector Ov[r]O_{v}^{\prime}[r].
8xv[r+1]=max(Ov[r])min(Ov[r])2x_{v}[r+1]=\frac{\max(O_{v}^{\prime}[r])-\min(O_{v}^{\prime}[r])}{2}
Algorithm 3 Filter-and-Average(v)(\mathcal{M}_{v}) (for node vv in round rr)

Towards proving the convergence property, we will first show that for any nonfaulty nodes v,uv,u running Algorithm Filter-and-Average in round rr, there will be a common value in the trimmed vectors Ov[r]O^{\prime}_{v}[r] and Ou[r]O^{\prime}_{u}[r] of the two nodes. For simplicity of presentation, we will omit the round variable rr in the following discussion. The theorems presented next guarantee the existence of this common element.

4.5. Properties of Algorithm Filter-and Average

According to the notation of Definition 12, Theorem 11 implies that for any pair of nonfaulty nodes v,uv,u executing Algorithm Filter-and-Average, there exists the common fault set FtF_{t} such that both of nodes vv and uu will be informed about FtF_{t} and will obtain the same common value sets RFwR_{F_{w}} for all FwFtF_{w}\neq F_{t}. The following theorem guarantees that for FtF_{t}, there will be some source components whose values will appear in the trimmed vector of v,uv,u regardless of the sets Flo,FhiF_{lo},F_{hi} used to trim their vectors. Recall that the notion of common fault set and leading node are introduced in Definition 12.

Theorem 13.

Let v,uv,u be any pair of nonfaulty nodes, FtF_{t} their common fault set, and t{v,u}t\in\{v,u\} the leading node. Then for z{v,u}z\in\{v,u\} and ff-covers Floz,FhizF_{lo}^{z},F_{hi}^{z} as identified in Algorithm 3, common value set RFlozR_{F_{lo}^{z}} will be included in the vector OzO_{z} after removing OzloO_{z}^{lo}, and common value set RFhizR_{F_{hi}^{z}} will be included in the vector OzO_{z} after removing OzhiO_{z}^{hi}.

Proof.

Without loss of generality, assume that t=ut=u. We first consider the validity of the theorem for node uu. The fact that t=ut=u implies that Ft=FuF_{t}=F_{u} is the set corresponding to the parallel execution during which uu executes Filter-and-Average. By Theorem 5 and the strong connectivity of SFu,FlouS_{F_{u},F_{lo}^{u}}, if uSFu,Flouu\notin S_{F_{u},F_{lo}^{u}}, it must have received each value of RFlouR_{F_{lo}^{u}} (as defined in Definition 12) from f+1f+1 disjoint (SFu,Flou,u)(S_{F_{u},F_{lo}^{u}},u)-paths in GFu¯G_{\overline{F_{u}}} to satisfy its Maximal-Consistency condition. Since |Flou|f|F_{lo}^{u}|\leq f there will be at least one path propagating each value of RFlouR_{F_{lo}^{u}} in GFuFlou¯G_{\overline{F_{u}\cup F_{lo}^{u}}}. If uSFu,Flouu\in S_{F_{u},F_{lo}^{u}}, by the strong connectivity of SFu,FlouS_{F_{u},F_{lo}^{u}}, uu will receive RFlouR_{F_{lo}^{u}} from paths entirely within SFu,FlouS_{F_{u},F_{lo}^{u}}, i.e., paths in GFuFlou¯G_{\overline{F_{u}\cup F_{lo}^{u}}}. Thus, in both cases, common value set RFlouR_{F_{lo}^{u}} will be included in vector OuO_{u} after removing OuloO_{u}^{lo}. Similar arguments hold for the case of OuhiO_{u}^{hi}.

Next, we consider node vv and assume that it executes Filter-and-Average during its parallel execution for FvF_{v}. Since vt=uv\neq t=u, node vv has satisfied the Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}) condition after receiving message (c,COMPLETE(Fu))(\mathcal{M}^{c},COMPLETE(F_{u})) from a nonfaulty path initiating at creachv(FFv)reachu(FFu)c\in reach_{v}(F\cup F_{v})\cap reach_{u}(F\cup F_{u}); this holds by an argument identical to that of the proof of Theorem 11. Similarly with the proof of Theorem 11, cc will propagate all values of RFlovR_{F_{lo}^{v}} to vv through its FIFO-flooded message (c,COMPLETE(Fu))(\mathcal{M}^{c},COMPLETE(F_{u})). Since vv satisfies Completeness(v,c,Fu)Completeness(\mathcal{M}_{v},\mathcal{M}^{c},F_{u}), it receives each value of RFlovR_{F_{lo}^{v}} through a path set PP with no ff-cover HVSFu,FlovH\subseteq V\setminus S_{F_{u},F_{lo}^{v}}. Consequently, since |Flov|f|F_{lo}^{v}|\leq f and FlovSFu,Flov=F_{lo}^{v}\cap S_{F_{u},F_{lo}^{v}}=\emptyset, one of the paths of PP will not contain any node in FlovF_{lo}^{v}. This means that for any value xqRFlovx_{q}\in R_{F_{lo}^{v}}, there exists a path in GFlov¯G_{\overline{F_{lo}^{v}}} from which vv will receive xqx_{q}, and thus, all values of RFlovR_{F_{lo}^{v}} will be included in OvloO_{v}^{lo}. Similar arguments hold for the case of OvhiO_{v}^{hi}. ∎

The next theorem facilitates the analysis later; it states that there is always an overlap between certain pairs of source components of reduced graphs. Its proof is deferred to the Appendix C.

Theorem 14.

Suppose that graph G=(V,E)G=(V,E) satisfies condition 3-reach. For any three sets Fv,Fu,FwF_{v},F_{u},F_{w}, with FvV,Fu,FwVFvF_{v}\subset V,F_{u},F_{w}\subseteq V\setminus F_{v} and |Fv|,|Fu|,|Fw|f|F_{v}|,|F_{u}|,|F_{w}|\leq f, SFv,FuSFv,FwS_{F_{v},F_{u}}\cap S_{F_{v},F_{w}}\neq\emptyset.

We next define some notions, helpful to determine the existence of a common value in the intersection of Ov,OuO^{\prime}_{v},O^{\prime}_{u} for any pair of nonfaulty nodes v,uv,u. As before, we assume that FtF_{t} is the common fault set of v,uv,u.

Definition 0.

Let v,uv,u be two nonfaulty nodes and FtF_{t} their common fault set. For FwVF_{w}\subseteq V and |F|f|F|\leq f, let xminFw=minxRFwx\displaystyle x^{F_{w}}_{\min}=\min_{x\in R_{F_{w}}}x, i.e., the minimum common value for the source component SFt,FwS_{F_{t},F_{w}}. Define the maximum of all these minimum values over all possible FwF_{w} as

xmaxmin=maxFwV|Fw|fxminFwx_{\max\min}=\max_{\mathclap{\begin{subarray}{c}F_{w}\subseteq V\\ |F_{w}|\leq f\end{subarray}}}\ x_{\min}^{F_{w}}

and let SFt,FlS_{F_{t},F_{l}}, be a source component that includes the common value xmaxminx_{\max\min}, i.e., xmaxminRFlx_{\max\min}\in R_{F_{l}}. Analogously, let xmaxFwx^{F_{w}}_{\max} be the maximum common value for the source component SFt,FwS_{F_{t},F_{w}} and define minimum of these maximum values as

xminmax=minFwV|Fw|fxmaxFwx_{\min\max}=\min_{\mathclap{\begin{subarray}{c}F_{w}\subseteq V\\ |F_{w}|\leq f\end{subarray}}}\ x_{\max}^{F_{w}}

Similar to before, we assume that SFt,FhS_{F_{t},F_{h}} is a source component that includes the common value xminmaxx_{\min\max}.

Lemma 16.

For any two nonfaulty nodes v,uv,u, xmaxminxminmaxx_{\max\min}\leq x_{\min\max}

Proof.

By way of contradiction, assume that

(3) xmaxmin>xminmaxx_{\max\min}>x_{\min\max}

Then by Definition 15, we have the following two inequalities:

(4) For all valuesx contained inRFl,xxmaxmin\text{For all values}~{}x~{}\text{ contained in}~{}R_{F_{l}},~{}~{}~{}x\geq x_{\max\min}
(5) For all valuesy contained inRFh,yxminmax\text{For all values}~{}y~{}\text{ contained in}~{}R_{F_{h}},~{}~{}~{}y\geq x_{\min\max}

Now we make two observations:

  • Observation 1: Equations 3, 4, and 5 imply that RFlRFh=R_{F_{l}}\cap R_{F_{h}}=\emptyset.

  • Observation 2: By Definition 12, for each wSFt,FlSFt,Fhw\in S_{F_{t},F_{l}}\cap S_{F_{t},F_{h}} there will be a common value xwx_{w} contained in both RFlR_{F_{l}} and RFhR_{F_{h}}.

These two observations imply that SFt,FlSFt,Fh=S_{F_{t},F_{l}}\cap S_{F_{t},F_{h}}=\emptyset, a contradiction to Theorem 14. ∎

Theorem 17.

For nonfaulty nodes v,uv,u, after the termination of Algorithm Filter-and-Average, we have OvOuO^{\prime}_{v}\cap O^{\prime}_{u}\neq\emptyset.

Proof.

We will argue that a common value xx contained in RFlRFhR_{F_{l}}\cap R_{F_{h}} will be included in both the trimmed vectors OzO_{z}^{\prime}, for z{v,u}z\in\{v,u\}. For any FlozF_{lo}^{z} with |Floz|f|F_{lo}^{z}|\leq f chosen by any z{v,u}z\in\{v,u\}, define xminz,lox^{z,lo}_{\min} as the minimum value contained in RFlozR_{F_{lo}^{z}}. Then by the definition of xmaxminx_{\max\min}, we have xminz,loxmaxminx^{z,lo}_{\min}\leq x_{\max\min}.

Due to Theorem 13, value xminz,lox^{z,lo}_{\min} will be contained in OzO_{z} after removal of set OzloO_{z}^{lo}. Thus, due to the definition of OzloO_{z}^{lo}, any value xx contained in OzloO_{z}^{lo} will satisfy xxminz,lox\leq x^{z,lo}_{\min}, i.e., only values less or equal to xminz,lox^{z,lo}_{\min} will be removed from OzO_{z} due to removal of OzloO_{z}^{lo} and the value xminz,lox^{z,lo}_{\min} will remain in the trimmed OzO_{z}. Note that, in the event that there are multiple values identical to xminz,lox^{z,lo}_{\min}, then at least one instance of xminz,lox^{z,lo}_{\min} remains in OzO_{z}.

Next, observe that for z{v,u}z\in\{v,u\} and any choice of FhizF_{hi}^{z}, xmaxz,hixminmaxx^{z,hi}_{\max}\geq x_{\min\max} holds, where xmaxz,hix^{z,hi}_{\max} is the maximum value contained in RFlozR_{F_{lo}^{z}}, due to the definition of xminmaxx_{\min\max}. Due to Theorem 13, xmaxz,hix^{z,hi}_{\max} will be contained in OzO_{z} after removal of set OzhiO_{z}^{hi}. Similarly with the previous argument, any value xx contained in OzhiO_{z}^{hi} will satisfy xxmaxz,hix\geq x^{z,hi}_{\max} and the value xmaxz,hix^{z,hi}_{\max} will remain in the trimmed OzO_{z}. Note that, in the event that there are multiple values identical to xmaxz,hix^{z,hi}_{\max}, then at least one instance of xmaxz,hix^{z,hi}_{\max} remains in OzO_{z}. Now, by Lemma 16 we have that,

(6) xminz,loxmaxminxminmaxxmaxz,hix^{z,lo}_{\min}\leq x_{\max\min}\leq x_{\min\max}\leq x^{z,hi}_{\max}

Consequently, the only values removed from OzO_{z} will be less or equal to xminz,lox^{z,lo}_{\min}, greater or equal to xmaxz,hix^{z,hi}_{\max}, and values xminz,lo,xmaxz,hix^{z,lo}_{\min},x^{z,hi}_{\max} will not be trimmed. Thus, due to Equation (6), xmaxminx_{\max\min} and xminmaxx_{\min\max} will be included in the final trimmed vector OzO^{\prime}_{z} for z{v,u}z\in\{v,u\}.

4.6. Correctness of Algorithm BW

For a given execution round r0r\geq 0, recall that xv[r]x_{v}[r] is the state variable maintained at node vv at the end of round r1r-1. Value xv[0]x_{v}[0] is assumed to be the input given to node vv. We denote by U[r],μ[r]U[r],\mu[r], the maximum and the minimum state value at nonfaulty nodes by the end of round rr. Since the initial state of each node is equal to its input, U[0]U[0] and μ[0]\mu[0] is equal to the maximum and minimum value of the initial input of the nodes, respectively. Consequently, the convergence and validity requirements of approximate consensus can be stated as follows.

  • Convergence : ϵ>0, there exists an iteration rϵ such that for all rrϵ,U[r]μ[r]<ϵ\forall\epsilon>0,\text{ there exists an iteration }r_{\epsilon}\text{ such that for all }r\geq r_{\epsilon},U[r]-\mu[r]<\epsilon

  • Validity: r>0,U[r]U[0] and μ[r]μ[0]\forall r>0,U[r]\leq U[0]\text{ and }\mu[r]\geq\mu[0].

We next prove the convergence of the proposed algorithm which is based on the following lemma.

Lemma 18.

For every round rr, it holds that

U[r]μ[r]2U[r+1]μ[r+1]\displaystyle\frac{U[r]-\mu[r]}{2}\geq U[r+1]-\mu[r+1]
Proof.

For any r>0r>0, consider any pair of nonfaulty nodes v,uv,u. Without loss of generality, assume xv[r+1]xu[r+1]x_{v}[r+1]\geq x_{u}[r+1]. We prove the lemma by showing that

(7) U[r]μ[r]2xv[r+1]xu[r+1].\frac{U[r]-\mu[r]}{2}\geq x_{v}[r+1]-x_{u}[r+1].

Observe that by Theorem 17, the existence of element zz with zOv[r]Ou[r]z\in O^{\prime}_{v}[r]\cap O^{\prime}_{u}[r] is proved. This implies that max(Ou[r])z\max(O^{\prime}_{u}[r])\geq z.777Since Ou[r]O^{\prime}_{u}[r] is a sorted message set with respect to values, we define max(Ou[r])\max(O^{\prime}_{u}[r]) and min(Ou[r])\min(O^{\prime}_{u}[r]) to be the maximum and minimum value respectively, included in this set as the first component of its messages-pairs. Moreover, min(Ou[r])μ[r]\min(O^{\prime}_{u}[r])\geq\mu[r] holds, since if for node uu, Flo=FF_{lo}=F, then all actual faulty values are removed from from Ou[r]O_{u}[r] resulting to trimmed vector Ou[r]O^{\prime}_{u}[r]. Thus, for any remaining value xx in Ou[r]O^{\prime}_{u}[r], it holds that xμ[r]x\geq\mu[r]. If FloFF_{lo}\neq F, then there exists a nonfaulty node ww such that xw[r]x_{w}[r], state value at node ww in round rr, is trimmed, and hence, is smaller than any xOu[r]x\in O^{\prime}_{u}[r]. Therefore for any xOu[r]x\in O^{\prime}_{u}[r], xμ[r]x\geq\mu[r] holds. Consequently, since max(Ou[r])z\max(O^{\prime}_{u}[r])\geq z and min(Ou[r])μ[r]\min(O^{\prime}_{u}[r])\geq\mu[r], due to line 3 of Algorithm Filter-and-Average, we have that

xu[r+1]=max(Ov[r])min(Ov[r])2z+μ[r]2x_{u}[r+1]=\frac{\max(O_{v}^{\prime}[r])-\min(O_{v}^{\prime}[r])}{2}\geq\frac{z+\mu[r]}{2}

Similarly, min(Ov[r])z\min(O^{\prime}_{v}[r])\leq z and max(Ov[r])U[r]\max(O^{\prime}_{v}[r])\leq U[r], which implies

xv[r+1]z+U[r]2x_{v}[r+1]\leq\frac{z+U[r]}{2}

Equation (7) follows from these two inequalities. ∎

Lemma 18 and simple arithmetic operations imply the Convergence property (details appear in the termination study of Algorithm BW presented below). Validity is based on the observation that all the extreme values will be eliminated by each nonfaulty node owing to the way the trimmed vector Ov[p]O_{v}^{\prime}[p] is derived. The arguments are similar to that of the proof of Lemma 18. The proof is presented Appendix G.

Theorem 19 (Validity).

r0,U[r]U[0]\forall r\geq 0,U[r]\leq U[0] and μ[r]μ[0]\mu[r]\geq\mu[0]

Termination of Algorithm BW

Recall that the termination requirement for approximate consensus (Definition 1) requires that all nonfaulty nodes should output a value. We follow the approach in (Tseng and Vaidya, 2015, 2012). Suppose that the input is within the range [0,K][0,K], where KK\in\mathbb{R} is known a priori. If K<ϵK<\epsilon , then the problem is trivial, so it is assumed that KϵK\geq\epsilon. Repeated application of Lemma 18 implies that U[r+1]μ[r+1]U[0]μ[0]2rK2rU[r+1]-\mu[r+1]\leq\frac{U[0]-\mu[0]}{2^{r}}\leq\frac{K}{2^{r}}. This implies that for given K,ϵK,\epsilon, the state values of the nonfaulty nodes will be within ϵ\epsilon of each other after round r>log2Kϵr>\log_{2}\frac{K}{\epsilon}. Since K,ϵK,\epsilon are a priori known , each node can locally compute Kϵ\frac{K}{\epsilon} and output its value in the first round rr such that r>log2Kϵr>\log_{2}\frac{K}{\epsilon}.

Acknowledgements.
Nitin Vaidya’s work is supported in part by the Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196, and by the National Science Foundation award 1842198. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory, National Science Foundation or the U.S. Government.

References

  • (1)
  • Abraham et al. (2004) Ittai Abraham, Yonatan Amit, and Danny Dolev. 2004. Optimal Resilience Asynchronous Approximate Agreement. In OPODIS. 229–239.
  • Benediktsson and Swain (1992) J. A. Benediktsson and P. H. Swain. 1992. Consensus theoretic classification methods. IEEE Transactions on Systems, Man, and Cybernetics 22, 4 (July 1992), 688–704. https://doi.org/10.1109/21.156582
  • Bonomi et al. (2016) S. Bonomi, A. D. Pozzo, M. Potop-Butucaru, and S. Tixeuil. 2016. Approximate Agreement under Mobile Byzantine Faults. In 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS). 727–728. https://doi.org/10.1109/ICDCS.2016.68
  • Cybenko (1989) George Cybenko. 1989. Dynamic Load Balancing for Distributed Memory Multiprocessors. J. Parallel Distrib. Comput. 7, 2 (1989), 279–301. https://doi.org/10.1016/0743-7315(89)90021-X
  • Dolev (1982) Danny Dolev. March 1982. The Byzantine Generals Strike Again. Journal of Algorithms 3(1) (March 1982).
  • Dolev et al. (1993) Danny Dolev, Cynthia Dwork, Orli Waarts, and Moti Yung. 1993. Perfectly Secure Message Transmission. J. ACM 40, 1 (Jan. 1993), 17–47. https://doi.org/10.1145/138027.138036
  • Dolev et al. (1986) Danny Dolev, Nancy A. Lynch, Shlomit S. Pinter, Eugene W. Stark, and William E. Weihl. 1986. Reaching approximate agreement in the presence of faults. J. ACM 33 (May 1986), 499–516. Issue 3. https://doi.org/10.1145/5925.5931
  • Fischer et al. (1985a) Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. 1985a. Easy impossibility proofs for distributed consensus problems. In Proceedings of the fourth annual ACM symposium on Principles of distributed computing (PODC ’85). ACM, New York, NY, USA, 59–70. https://doi.org/10.1145/323596.323602
  • Fischer et al. (1985b) Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985b. Impossibility of Distributed Consensus with One Faulty Process. J. ACM 32, 2 (April 1985), 374–382. https://doi.org/10.1145/3149.214121
  • Georgiou et al. (2013) Chryssis Georgiou, Seth Gilbert, Rachid Guerraoui, and Dariusz R. Kowalski. 2013. Asynchronous Gossip. J. ACM 60, 2, Article 11 (May 2013), 42 pages. https://doi.org/10.1145/2450142.2450147
  • Hegselmann and Krause (2002) Rainer Hegselmann and Ulrich Krause. 2002. Opinion dynamics and bounded confidence: Models, analysis and simulation. Journal of Artificial Societies and Social Simulation 5 (2002), 1–24.
  • Kieckhafer and Azadmanesh (1994) Roger M. Kieckhafer and Mohammad H. Azadmanesh. 1994. Reaching Approximate Agreement with Mixed-Mode Faults. IEEE Trans. Parallel Distrib. Syst. 5, 1 (1994), 53–63. https://doi.org/10.1109/71.262588
  • LeBlanc et al. (2013) H. LeBlanc, H. Zhang, X. Koutsoukos, and S. Sundaram. April 2013. Resilient Asymptotic Consensus in Robust Networks. IEEE Journal on Selected Areas in Communications: Special Issue on In-Network Computation 31 (April 2013), 766–781.
  • Litsas et al. (2013) Chris Litsas, Aris Pagourtzis, and Dimitris Sakavalas. 2013. A Graph Parameter That Matches the Resilience of the Certified Propagation Algorithm. In Ad-hoc, Mobile, and Wireless Network - 12th International Conference, ADHOC-NOW 2013, Wrocław, Poland, July 8-10, 2013. Proceedings (Lecture Notes in Computer Science), Jacek Cichon, Maciej Gebala, and Marek Klonowski (Eds.), Vol. 7960. Springer, 269–280. https://doi.org/10.1007/978-3-642-39247-4_23
  • Lynch (1996) Nancy A. Lynch. 1996. Distributed Algorithms. Morgan Kaufmann. 177–182 pages.
  • Pagourtzis et al. (2017a) Aris Pagourtzis, Giorgos Panagiotakos, and Dimitris Sakavalas. 2017a. Reliable broadcast with respect to topology knowledge. Distributed Computing 30, 2 (2017), 87–102. https://doi.org/10.1007/s00446-016-0279-6
  • Pagourtzis et al. (2017b) Aris Pagourtzis, Giorgos Panagiotakos, and Dimitris Sakavalas. 2017b. Reliable Communication via Semilattice Properties of Partial Knowledge. In Fundamentals of Computation Theory - 21st International Symposium, FCT 2017, Bordeaux, France, September 11-13, 2017, Proceedings (Lecture Notes in Computer Science), Ralf Klasing and Marc Zeitoun (Eds.), Vol. 10472. Springer, 367–380. https://doi.org/10.1007/978-3-662-55751-8_29
  • Pease et al. (1980) M. Pease, R. Shostak, and L. Lamport. 1980. Reaching Agreement in the Presence of Faults. J. ACM 27, 2 (April 1980), 228–234. https://doi.org/10.1145/322186.322188
  • Sakavalas and Tseng (2018) Dimitris Sakavalas and Lewis Tseng. 2018. Delivery Delay and Mobile Faults. In 17th IEEE International Symposium on Network Computing and Applications, NCA 2018, Cambridge, MA, USA, November 1-3, 2018. IEEE, 1–8. https://doi.org/10.1109/NCA.2018.8548345
  • Sakavalas and Tseng (2019) Dimitris Sakavalas and Lewis Tseng. 2019. Network Topology and Fault-Tolerant Consensus. Morgan & Claypool Publishers. https://doi.org/10.2200/S00918ED1V01Y201904DCT016
  • Sakavalas et al. (2018) Dimitris Sakavalas, Lewis Tseng, and Nitin H. Vaidya. 2018. Effects of Topology Knowledge and Relay Depth on Asynchronous Appoximate Consensus. In 22nd International Conference on Principles of Distributed Systems, OPODIS 2018, December 17-19, 2018, Hong Kong, China (LIPIcs), Jiannong Cao, Faith Ellen, Luis Rodrigues, and Bernardo Ferreira (Eds.), Vol. 125. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 14:1–14:16. https://doi.org/10.4230/LIPIcs.OPODIS.2018.14
  • Su and Vaidya (2017) Lili Su and Nitin H. Vaidya. 2017. Reaching approximate Byzantine consensus with multi-hop communication. Inf. Comput. 255 (2017), 352–368. https://doi.org/10.1016/j.ic.2016.12.003
  • Tseng and Vaidya (2012) Lewis Tseng and Nitin H. Vaidya. 2012. Exact Byzantine Consensus in Directed Graphs. CoRR abs/1208.5075 (2012). arXiv:1208.5075 http://arxiv.org/abs/1208.5075
  • Tseng and Vaidya (2015) Lewis Tseng and Nitin H. Vaidya. 2015. Fault-Tolerant Consensus in Directed Graphs. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing (PODC ’15). ACM, New York, NY, USA, 451–460. https://doi.org/10.1145/2767386.2767399
  • Vaidya et al. (2012) Nitin H. Vaidya, Lewis Tseng, and Guanfeng Liang. 2012. Iterative Approximate Byzantine Consensus in Arbitrary Directed Graphs. In Proceedings of the thirty-first annual ACM symposium on Principles of distributed computing (PODC ’12). ACM.
  • Vicsek et al. (1995) Tamás Vicsek, András Czirók, Eshel Ben-Jacob, Inon Cohen, and Ofer Shochet. 1995. Novel Type of Phase Transition in a System of Self-Driven Particles. Phys. Rev. Lett. 75 (Aug 1995), 1226–1229. Issue 6. https://doi.org/10.1103/PhysRevLett.75.1226

Appendix A The kk-reach condition family

In this section, we summarize the tight topological conditions related with consensus in directed networks that have appeared in the literature along with their equivalents from the family of kk-reach conditions, for k=1,2,3k=1,2,3. The topological conditions CCS (abbreviating Crash-Consensus-Synchronous), CCA (Crash-Consensus-Asynchronous) and BCS (Byzantine-Consensus-Synchronous) were introduced in (Tseng and Vaidya, 2015) and proven tight for the cases of synchronous crash consensus, asynchronous approximate crash consensus and synchronous Byzantine consensus respectively. The determination of the necessary and sufficient topological condition for solving approximate Byzantine consensus in asynchronous systems, has been an open problem since 2012.

We next present some additional definitions that facilitate our presentation.

For set BVB\subseteq V, process vv is said to be an incoming (resp. outgoing) neighbor of set BB if vBv\not\in B, and there exists uBu\in B such that (v,u)E(v,u)\in E (resp. (u,v)E(u,v)\in E). The incoming and outgoing neighborhood of a node vv are the sets of its incoming and outgoing neighbors respectively and will be denoted with 𝒩v,𝒩v+\mathcal{N}^{-}_{v},\mathcal{N}^{+}_{v} respectively. We extend the notion to the incoming (resp. outgoing) neighborhood of a set BB, denoted with 𝒩B\mathcal{N}^{-}_{B} (resp. 𝒩B+\mathcal{N}^{+}_{B}) and defined as the set of all incoming (resp. outgoing) neighbors of BB. Given subsets of nodes AA and BB, set BB is said to have kk incoming neighbors in set AA if AA contains kk distinct incoming neighbors of BB. Next, we define a notion which concerns the connectivity of any two node sets of the graph as presented in (Tseng and Vaidya, 2015).

Definition 0.

Given disjoint non-empty subsets of nodes AA and BB, we will say that AxBA\stackrel{{\scriptstyle x}}{{\longrightarrow}}{B} holds, if BB has at least xx incoming neighbors in AA. The negation of AxBA\stackrel{{\scriptstyle x}}{{\longrightarrow}}{B} will be denoted by A⟶̸xBA\stackrel{{\scriptstyle x}}{{\not\longrightarrow}}{B}.

We also introduce the following useful generalization fo the reach set notion. The notion denotes all the multi-hop incoming neighbors of node vv in graph GF¯G_{\bar{F}}.

Definition 0 (Reach set of vv under FF).

For a subgraph G=(V,E)G^{\prime}=(V^{\prime},E^{\prime}) of GG, node vVv\in V^{\prime} and node set FV{v}F\subseteq V\setminus\{v\}, we will use the following notation,

reachvG(F)={uVF: v is reachable from u in graph GVF}reach_{v}^{G^{\prime}}(F)=\{u\in V^{\prime}\setminus F:\text{ $v$ is reachable from $u$ in graph }G^{\prime}_{V^{\prime}\setminus F}\}

Whenever G=GG^{\prime}=G we will omit the superscript GG^{\prime} and simply use the notation reachv(F)reach_{v}(F).

The definitions of conditions CCS, CCA, BCS defined in (Tseng and Vaidya, 2015) follow.

Definition 0 (Condition CCS).

For any partition F,L,C,RF,L,C,R of VV, where LL and RR are both non-empty, and |F|f|F|\leq f, at least one of the following holds:

  • LC1RL\cup C\stackrel{{\scriptstyle 1}}{{\longrightarrow}}{R}

  • RC1LR\cup C\stackrel{{\scriptstyle 1}}{{\longrightarrow}}{L}

Definition 0 (Condition CCA).

For any partition L,C,RL,C,R of VV, where LL and RR are both non-empty, at least one of the following holds:

  • LCf+1RL\cup C\stackrel{{\scriptstyle f+1}}{{\longrightarrow}}{R}

  • RCf+1LR\cup C\stackrel{{\scriptstyle f+1}}{{\longrightarrow}}{L}

Definition 0 (Condition BCS).

For any partition F,L,C,RF,L,C,R of VV, where LL and RR are both non-empty, and |F|f|F|\leq f, at least one of the following holds:

  • LCf+1RL\cup C\stackrel{{\scriptstyle f+1}}{{\longrightarrow}}{R}

  • RCf+1LR\cup C\stackrel{{\scriptstyle f+1}}{{\longrightarrow}}{L}

Observe that while BCS requires a 4-set partition F,L,R,CF,L,R,C of VV, condition CCA only requires a 3-set partition L,R,CL,R,C of VV.

We next present an equivalent condition to CCS, CCA, BCS, based on reach sets of any two nodes of the graph.

Definitions 1.

In the following, sets F,Fv,FuVF,F_{v},F_{u}\subseteq V intuitively represent possible faulty sets and thus are of cardinality at most ff, i.e., |F|,|Fv|,|Fu|f|F|,|F_{v}|,|F_{u}|\leq f. We define the three following conditions,

  • 1-reach: For any FVF\subset V such that |F|f|F|\leq f and any nodes u,vF¯u,v\in\overline{F}, we have

    reachu(F)reachv(F).reach_{u}(F)\cap reach_{v}(F)\neq\emptyset.
  • 2-reach: For any nodes u,vVu,v\in V and any node subsets FuF_{u}, FvF_{v} such that |Fu|,|Fv|f|F_{u}|,|F_{v}|\leq f, uFu¯u\in\overline{F_{u}}, and vFv¯v\in\overline{F_{v}}, we have

    reachv(Fv)reachu(Fu).reach_{v}(F_{v})\cap reach_{u}(F_{u})\neq\emptyset.
  • 3-reach: For any nodes u,vVu,v\in V and any node subsets FF, FuF_{u}, FvF_{v} such that |F|,|Fu|,|Fv|f|F|,|F_{u}|,|F_{v}|\leq f, uFFu¯u\in\overline{F\cup F_{u}}, and vFFv¯v\in\overline{F\cup F_{v}}, we have

    reachv(FFv)reachu(FFu).reach_{v}(F\cup F_{v})\cap reach_{u}(F\cup F_{u})\neq\emptyset.

Observe that in a clique, it holds that reachv(Fv)reachu(Fu)=reachv(FvFu)reachu(FvFu)reach_{v}(F_{v})\cap reach_{u}(F_{u})=reach_{v}(F_{v}\cup~{}F_{u})\cap reach_{u}(F_{v}\cup F_{u}). Thus, for example condition 3-reach in a clique is equivalent with,

reachv(FFvFu)reachu(FFvFu)reach_{v}(F\cup F_{v}\cup F_{u})\cap reach_{u}(F\cup F_{v}\cup F_{u})\neq\emptyset

which is equivalent with the well known clique condition n>3fn>3f, tight for byzantine consensus. Analogously, one can show that in a clique, 1-reach and 2-reach are equivalent with n>fn>f and n>2fn>2f respectively.

We next present the generalization of the above conditions kk-reach which determines the family of conditions encompassing the above.

Definition 1.

For any sets F,Fv1,Fu1,,Fvk,FukF,F_{v}^{1},F_{u}^{1},\ldots,F_{v}^{k},F_{u}^{k}, each of cardinality at most ff

k-reach: {reachv(Fv1Fvk)reachu(Fu1Fuk) if k=evenreachv(FFv1Fvk1)reachu(FFu1Fuk1) if k=odd\textbf{{k-reach:} }\begin{cases}reach_{v}(F_{v}^{1}\cup\ldots\cup F_{v}^{k})\cap reach_{u}(F_{u}^{1}\cup\ldots\cup F_{u}^{k})\neq\emptyset&\text{ if }k=even\\ reach_{v}(F\cup F_{v}^{1}\cup\ldots\cup F_{v}^{k-1})\cap reach_{u}(F\cup F_{u}^{1}\cup\ldots\cup F_{u}^{k-1})&\text{ if }k=odd\end{cases}

In the following theorem, we show that conditions 1-reach, 2-reach, and 3-reach prove are equivalent to CCS, CCA, and BCS respectively.

Theorem 7.
  1. (a)

    CCS\Leftrightarrow 1-reach

  2. (b)

    CCA\Leftrightarrow 2-reach

  3. (c)

    BCS\Leftrightarrow 3-reach

Proof.

(a)(a) Condition 1-reach is trivially equivalent with the existence of a directed rooted tree in GF¯G_{\bar{F}} as presented in (Tseng and Vaidya, 2015). In turn, the equivalence of the latter condition with CCS has been proven in (Tseng and Vaidya, 2015; Sakavalas and Tseng, 2019).

(b)(b) Direction“\Rightarrow” is implicitly proven in (Tseng and Vaidya, 2015)888The claim is implied by the proof of Lemma 7 in (Tseng and Vaidya, 2015) proves the ``"``\Rightarrow" direction. We next prove direction “\Leftarrow“.
If CCA does not hold in GG, then there exists a partition L,R,CL,R,C of VV with L,RL,R\neq\emptyset such that LC⟶̸f+1RL\cup C\stackrel{{\scriptstyle f+1}}{{\not\longrightarrow}}{R} and RC⟶̸f+1LR\cup C\stackrel{{\scriptstyle f+1}}{{\not\longrightarrow}}{L}. Observe that |NL|,|𝒩R|f|N^{-}_{L}|,|\mathcal{N}^{-}_{R}|\leq f. This is because L,R,CL,R,C is a partition of VV and thus, NLRCN^{-}_{L}\subseteq R\cup C and NRLCN^{-}_{R}\subseteq L\cup C; since CCA is not satisfied, the claim holds. Subsequently, let vL,uRv\in L,u\in R; these nodes exist since L,RL,R\neq\emptyset as per CCA definition. Note that there exist two sets Fv=𝒩L,Fu=𝒩RF_{v}=\mathcal{N}^{-}_{L},F_{u}=\mathcal{N}^{-}_{R} of cardinality at most ff such that the following holds,

reachv(𝒩L)reachu(𝒩R)LR=reach_{v}(\mathcal{N}^{-}_{L})\cap reach_{u}(\mathcal{N}^{-}_{R})\subseteq L\cap R=\emptyset

and thus condition 2-reach does not hold.

(c)(c) For any sets F,FvVF,F_{v}\subseteq V with |F|,|Fv|f|F|,|F_{v}|\leq f and node wVFFvw\in V\setminus F\cup F_{v} we will use reachwGF¯(Fv)reach_{w}^{G_{\bar{F}}}(F_{v}) as defined in Definition 3. Note that by definitions 4 and 5, Condition BCS is equivalent to the following condition: for all sets FVF\subseteq V with |F|f|F|\leq f, CCA holds in GF¯G_{\bar{F}}.

Due to the equivalence of CCA with 2-reach in the previous step (b)(b), it holds that BCS is equivalent with the following condition: For all sets F,Fv,FuVF,F_{v},F_{u}\subseteq V with |F|,|Fv|,|Fu|f|F|,|F_{v}|,|F_{u}|\leq f,

reachvGF¯(Fv)reachvGF¯(Fv)reach_{v}^{G_{\bar{F}}}(F_{v})\cap reach_{v}^{G_{\bar{F}}}(F_{v})\neq\emptyset

Thus BCS is equivalent to,

reachv(FFv)reachv(FFv)reach_{v}(F\cup F_{v})\cap reach_{v}(F\cup F_{v})\neq\emptyset

which coincides with condition 3-reach.

Appendix B Necessity of condition 3-reach

We next show that condition 3-reach is necessary for asynchronous byzantine approximate consensus to be achieved in a network. With evee\stackrel{{\scriptstyle v}}{{\sim}}e^{\prime} we will denote the fact that that execution ee is indistinguishable from execution ee^{\prime} with respect to node vv (cf. (Lynch, 1996)). Note that, considering an approximate consensus algorithm, evee\stackrel{{\scriptstyle v}}{{\sim}}e^{\prime} implies that node vv will output the same value in both executions e,ee,e^{\prime}. To facilitate the proof, for A,BVA,B\subseteq V we will use the notation E(A,B)={(v,u):vA,uB}E(A,B)=\{(v,u):v\in A,u\in B\} to denote all edges from set AA to set BB.

Theorem 1 (Impossibility of Approximate Consensus).

Byzantine asynchronous approximate consensus is impossible in networks where condition 3-reach is not satisfied.

Proof.

Consider a network G=(V,E)G=(V,E) where condition 3-reach is not satisfied and assume the existence of algorithm 𝒜\mathcal{A} that achieves asynchronous approximate consensus in GG. This means that there exist sets F,Fv,FuVF,F_{v},F_{u}\subseteq V with |F|,|Fv|,|Fu|f|F|,|F_{v}|,|F_{u}|\leq f, and nodes vFFv¯v\in\overline{F\cup F_{v}}, uFFu¯u\in\overline{F\cup F_{u}} such that:

(8) reachv(FFv)reachu(FFu)=reach_{v}(F\cup F_{v})\cap reach_{u}(F\cup F_{u})=\emptyset

We define the following three executions of 𝒜\mathcal{A} determined by the set of faulty nodes and their behavior, the inputs of all nodes and the communication delay. Observe that a Byzantine fault may simply deviate from the protocol by crashing and not sending any message since the Byzantine fault model is strictly stronger than the crash fault. Also, note that we can assume an external notion of time (global clock), not directly observable by the nodes, to facilitate the explicit description of delays. The latter technique has been considered in (Georgiou et al., 2013; Sakavalas and Tseng, 2018).

  1. (e1e_{1})

    The input of every node zVz\in V is xz=0x_{z}=0, all nodes in FvF_{v} have crashed from the beginning of the execution and all other nodes are nonfaulty; the latter is possible since |Fv|f|F_{v}|\leq f.

  2. (e2e_{2})

    The input of every node zVz\in V is xz=ϵx_{z}=\epsilon, all nodes in FuF_{u} have crashed from the beginning of the execution and all other nodes are nonfaulty; the latter is possible since |Fu|f|F_{u}|\leq f.

  3. (e3e_{3})

    Inputs: The input of every node zreachv(FFv)z\in reach_{v}(F\cup F_{v}) is xz=0x_{z}=0, the input of every node wreachu(FFu)w\in reach_{u}(F\cup F_{u}) is xw=ϵx_{w}=\epsilon; these inputs are well defined because of Eq. 8. All remaining nodes in V(reachv(FFv)reachu(FFu))V\setminus(reach_{v}(F\cup F_{v})\cup reach_{u}(F\cup F_{u})) have arbitrary inputs.

    Delivery delays: Message deliveries delays are the same as e1e_{1} and e2e_{2} except the delays of all messages transmitted through edges E(Fv,reachv(FFv))E(F_{v},reach_{v}(F\cup F_{v})), and all messages transmitted through edges E(Fu,reachu(FFu))E(F_{u},reach_{u}(F\cup F_{u})). We assume that the delivery delay of the latter messages is lower bounded by an arbitrary number TT\in\mathbb{N} of time-steps. The exact value of TT will be defined in the following. Message deliveries though all other edges are instant.

    Faulty set behavior: Node set FF is faulty and behaves towards reachv(FFv)reach_{v}(F\cup F_{v}) as in e1e_{1} and towards reachu(FFu)reach_{u}(F\cup F_{u}) as in e2e_{2}. More concretely, all messages transmitted through edges in E(F,reachv(FFv))E(F,reach_{v}(F\cup F_{v})) are identical to the messages transmitted through E(F,reachv(FFv))E(F,reach_{v}(F\cup F_{v})) in e1e_{1} and all messages transmitted through edges in E(F,reachu(FFu))E(F,reach_{u}(F\cup F_{u})) are identical to the messages transmitted in e2e_{2}. Observe that Eq 8 implies that E(F,reachv(FFv))E(F,reachu(FFu))=E(F,reach_{v}(F\cup F_{v}))\cap E(F,reach_{u}(F\cup F_{u}))=\emptyset and thus the latter behavior is well defined. This holds because if there exists an edge (w,z)E(F,reachv(FFv))E(F,reachu(FFu))(w,z)\in E(F,reach_{v}(F\cup F_{v}))\cap E(F,reach_{u}(F\cup F_{u})) , then wFw\in F and zreachv(FFv)reachu(FFu)z\in reach_{v}(F\cup F_{v})\cap reach_{u}(F\cup F_{u}); the latter contradicts Eq, 8. Also, this behavior is possible under the Byzantine faults model since |F|f|F|\leq f and Byzantine faults can have any arbitrary behavior 999This is a standard argument used towards indistinguishability in distributed systems. Details proving that this faulty behavior is well defined can be found in (Pagourtzis et al., 2017a)..

Note that all three executions are well defined due to the fact that 3-reach condition is not satisfied (Eq. 8). Since e1e_{1} is a well defined execution of 𝒜\mathcal{A}, there will be a specific time point tLt_{L} by which, node vv will terminate in e1e_{1}. Moreover, in order to satisfy the validity condition vv will output the value 0 upon termination of e1e_{1}. Similarly, since e2e_{2} is a well defined execution of 𝒜\mathcal{A}, there will be a specific time point tRt_{R} by which, node uu will terminate in e2e_{2} by outputting value ϵ\epsilon. We now assume that the lower bound TT for all delivery delays described in execution e3e_{3} is any TT with

T>max{tL,tR}T>\max\{t_{L},t_{R}\}

Now consider execution e3e_{3}; all messages received by vv are the same in executions e1,e3e_{1},e_{3}. Therefore e3ve1e_{3}\stackrel{{\scriptstyle v}}{{\sim}}e_{1} holds and by the previous argument vv will output 0 in execution e3e_{3}. Similarly e3ue2e_{3}\stackrel{{\scriptstyle u}}{{\sim}}e_{2} holds and uu will output ϵ\epsilon in execution e3e_{3}. Thus, the convergence property is violated.

Appendix C Proof of Theorem 14

The next theorem states that there is always an overlap between certain pairs of source components of reduced graphs. For ease of presentation we prove the theorem for condition BCS, which was proved equivalent to 3-reach in Theorem 7.

Theorem 1.

Suppose that graph G=(V,E)G=(V,E) satisfies condition 3-reach. For any three sets Fv,Fu,FwF_{v},F_{u},F_{w}, with FvV,Fu,FwVFvF_{v}\subset V,F_{u},F_{w}\subseteq V\setminus F_{v} and |Fv|,|Fu|,|Fw|f|F_{v}|,|F_{u}|,|F_{w}|\leq f, SFv,FuSFv,FwS_{F_{v},F_{u}}\cap S_{F_{v},F_{w}}\neq\emptyset

Proof.

Observe that by definition of a source component of a reduced graph it holds that

(9) 𝒩SFv,FuFvFu\displaystyle\mathcal{N}^{-}_{S_{F_{v},F_{u}}}\subseteq F_{v}\cup F_{u}
(10) 𝒩SFv,FwFvFw\displaystyle\mathcal{N}^{-}_{S_{F_{v},F_{w}}}\subseteq F_{v}\cup F_{w}

If SFv,FuSFv,Fw=S_{F_{v},F_{u}}\cap S_{F_{v},F_{w}}=\emptyset then we can consider the following partition L,R,F,CL,R,F,C of VV.

L\displaystyle L =SFv,Fu\displaystyle=S_{F_{v},F_{u}}
R\displaystyle R =SFv,Fw\displaystyle=S_{F_{v},F_{w}}
F\displaystyle F =Fv\displaystyle=F_{v}
C\displaystyle C =V(LRF)\displaystyle=V\setminus(L\cup R\cup F)

We will next show that RC⟶̸f+1LR\cup C\stackrel{{\scriptstyle f+1}}{{\not\longrightarrow}}{L} and LC⟶̸f+1RL\cup C\stackrel{{\scriptstyle f+1}}{{\not\longrightarrow}}{R}. First, observe that L,RL,R\neq\emptyset by the definition of source component and the fact that BCS holds. Moreover, we have that,

(RC)F=(SFv,Fw(VSFv,FuSFv,FwFv))Fv=SFv,FwFv=(R\cup C)\cap F=(S_{F_{v},F_{w}}\cup(V\setminus S_{F_{v},F_{u}}\setminus S_{F_{v},F_{w}}\setminus F_{v}))\cap F_{v}=S_{F_{v},F_{w}}\cap F_{v}=\emptyset

where the latter equation holds by definition of SFv,FwS_{F_{v},F_{w}}. Thus, by equation 9 we have that NL(RC)FuN^{-}_{L}\cap(R\cup C)\subseteq F_{u}, which is equivalent to |NL(RC)|f|N^{-}_{L}\cap(R\cup C)|\leq f, which in turn implies that RC⟶̸f+1LR\cup C\stackrel{{\scriptstyle f+1}}{{\not\longrightarrow}}{L}. Similarly, LC⟶̸f+1RL\cup C\stackrel{{\scriptstyle f+1}}{{\not\longrightarrow}}{R} can be shown. These two facts imply that partition L,R,F,CL,R,F,C violates condition BCS; a contradiction since BCS is equivalent with 3-reach by Theorem 7. ∎

Appendix D Proof of Theorem 10

Theorem 1.

Any nonfaulty node vv will eventually execute Filter-and-Average during a parallel execution for a set FF^{\prime}.

Proof.

Assume that vv does not execute Filter-and-Average for any parallel execution. Due to lines 1-1 of BW, this means that the shared variable nextroundnextround will be false. Consider the parallel execution for F=FF^{\prime}=F, where FF is the actual faulty set. Next observe that due to Lemma 6, vv will satisfy Maximal-Consistency condition. The same holds for all other nonfaulty nodes. Consequently, since all nodes in reachv(F)reach_{v}(F) are nonfaulty, they will all eventually FIFO-Flood COMPLETE(F)COMPLETE(F) and vv will eventually FIFO-receive all of these messages through nonfaulty paths in GF¯G_{\overline{F}}, satisfying the FIFO-Receive-All condition. Finally, by Lemma 8, vv will eventually satisfy the Completeness(c,Fu)Completeness(\mathcal{M}^{c},F_{u}) condition corresponding to any received (c,COMPLETE(Fu))(\mathcal{M}^{c},COMPLETE(F_{u})) message, since all these messages will be received through fully nonfaulty paths and thus function Verify(v)Verify(\mathcal{M}_{v}) will be assigned to true. Finally, since the nextroundnextround variable is assigned to false, node vv will execute Filter-and-Average. ∎

Appendix E The Redundant Flood algorithm (RedundantFlood)

We next present the natural algorithm for flooding a message throughout all redundant paths in the network. To avoid a trivial adversary attack where the adversary lies about the propagation path, each time a node vv receives a message (x,p)(x,p) from node uu we assume that vv checks if ter(p)=uter(p)=u and rejects the message if this is not the case. This is a feasible check since edges represent reliable communication channels where the recipient knows the identity of the sender.

Input: sender node identifier ss, sender’s value xx
1
2Code for ss: send message (x,s)(x,\langle s\rangle) to all outgoing neighbors. \triangleright local broadcast
3Code for vsv\neq s:
4if m=(x,p)m=(x,p) is the first message with path(m)=ppath(m)=p received from a node uNvu\in N^{-}_{v} then
5       if w𝒩v+ and pvww\in\mathcal{N}_{v}^{+}\text{ and }p||v||w is a redundant path  then
6             send message (x,p||v)(x,p||v) to ww
7      
8
Algorithm 4 RedundantFlood

Appendix F The FIFO-Flood and FIFO-Receive procedures

In the following we present a high-level description of the FIFO Flood and FIFO Receive procedures used in algorithm 1.

FIFO-Flood. During this procedure, each node ii maintains a FIFO-counter which is incremented any time the node sends a message. The counter is appended to every message sent by node ii. We stress that this counter is a shared variable between all parallel threads at node ii.

FIFO-Receive. We will say that node vv FIFO-receives a message mm from node uu propagated through a FIFO-Flood procedure if vv has also received all previous messages (with respect to FIFO-counter) sent by uu. More concretely, if vv FIFO-receives mm with FIFO-counter kk initiated by node uu, then vv must have received all messages initiated by uu with FIFO-counters 1,,k11,\ldots,k-1.

The latter procedures clearly implement FIFO channels between nonfaulty nodes which are connected via fully nonfaulty paths. Trivially, the ordering of messages propagated through fully nonfaulty paths will be maintained since all FIFO-counters will be propagated correctly. In the case of a path containing a faulty node, it is obvious that message order is impossible to maintain since the adversary can change the order arbitrarily under any protocol; this holds since the adversary can filter all information propagated through this path.

Appendix G Proof of Theorem 19

Theorem 1 (Validity).

r0,U[r]U[0]\forall r\geq 0,U[r]\leq U[0] and μ[r]μ[0]\mu[r]\geq\mu[0]

Proof.

We prove the claim by induction on the round index rr. For r=0r=0 both inequalities trivially hold. Assume that the claim holds for round rr. For any nonfaulty node uu we can show that min(Ou[r])μ[r]\min(O^{\prime}_{u}[r])\geq\mu[r] and max(Ou[r])U[r]\max(O^{\prime}_{u}[r])\leq U[r] with identical arguments used in the proof of Lemma 18. Consequently, we have that since xu[r+1]=max(Ov[r])min(Ov[r])2x_{u}[r+1]=\frac{\max(O_{v}^{\prime}[r])-\min(O_{v}^{\prime}[r])}{2}, μ[r]xu[r+1]U[r]\mu[r]\leq x_{u}[r+1]\leq U[r] holds. ∎