
Fault-Tolerant Consensus in Quantum Networks

MohammadTaghi Hajiaghayi¹    Dariusz R. Kowalski²    Jan Olkowski¹

¹ Department of Computer Science, University of Maryland, College Park, Maryland, USA. Partially supported by DARPA QuICC, NSF grant 2218678, and NSF grant 2114269.
² School of Computer and Cyber Sciences, Augusta University, Augusta, Georgia, USA.
Abstract

Fault-tolerant consensus is about reaching agreement on some of the input values in a limited time by non-faulty autonomous processes, despite failures of processes or of the communication medium. This problem is particularly challenging and costly against an adaptive adversary with full information. Bar-Joseph and Ben-Or [8] (PODC’98) were the first to prove an absolute lower bound $\Omega(\sqrt{n/\log n})$ on the expected time complexity of consensus in any classic (i.e., randomized or deterministic) message-passing network with $n$ processes succeeding with probability 1 against such a strong adaptive adversary crashing processes.

The seminal work of Ben-Or and Hassidim [10] (STOC’05) broke the $\Omega(\sqrt{n/\log n})$ barrier for consensus in classic (deterministic and randomized) networks by employing quantum computing. They showed an (expected) constant-time quantum algorithm tolerating a linear number of crashes $t<n/3$.

In this paper, we improve upon that seminal work by reducing the number of quantum and communication bits to an arbitrarily small polynomial, and even further, to a polylogarithmic number – though the latter comes at the cost of a slightly larger polylogarithmic time (still exponentially smaller than the $\Omega(\sqrt{n/\log n})$ time lower bound for classic computation).

Keywords: distributed algorithms, quantum algorithms, adaptive adversary, crash failures, Consensus, quantum common coin, approximate counting

1 Introduction

Consensus is about making a common decision among the processes’ initial input values in a limited time by every non-faulty process, despite the faulty behaviour of some of the players. Since its introduction by Pease, Shostak and Lamport [31] (JACM’80), who ruled out trivial solutions (such as always deciding on the same bit), fault-tolerant consensus has constantly been among the foundational problems in distributed computing. It has been studied in synchronous/asynchronous and deterministic/randomized computation models, and under various fault-tolerant or adversarial models: fail-stop (crashes) and Byzantine, static and adaptive, computationally bounded and unbounded adversaries – to name a few (see Section 2.1 for related work).

While the landscape of the consensus problem under the classic model of computation is well developed, much less is known once we allow quantum computation. The seminal work of Ben-Or and Hassidim [10] (albeit a short 5-pager in STOC’05) broke the $\Omega(\sqrt{n/\log n})$-round time barrier for classic computing by employing quantum computing to further use the power of randomization in distributed computing. They showed an (expected) constant-time quantum algorithm tolerating a linear number of crashes $t<n/3$; however, the algorithm is inefficient in terms of communication bits and, even more importantly, in terms of the number of quantum bits (qubits), as it uses $\Omega(n)$ of them per process. Since then, no algorithm has managed to reduce the quantum resources needed to solve consensus. Because generating, maintaining and sending quantum bits is extremely costly (today’s quantum devices use fewer than 500 qubits), the main question of our paper emerges naturally:

Could the number of qubits be substantially reduced without harming the time complexity?

Distributed setting.

We consider a quantum synchronous message-passing model (c.f., [7]), consisting of $n$ synchronous processes (also called players), each with a common clock (a clock tick is called a round or a step) and a unique id from the known set $\mathcal{P}=[n]=\{1,\ldots,n\}$.

Between any pair of processes we assume the existence of a quantum channel able to transmit reliable messages (messages are not lost nor corrupted while in transit) carrying quantum bits, qubits. For the sake of completeness, we also augment the model with classic point-to-point channels between any pair of processes. In each round, a process can send (personalized) quantum and classic messages to any selected subset of other processes. After multicasting messages, in the same round a process receives the messages that were just sent to it by other processes and performs local computation, which involves both quantum and classic bits. Local computation also decides what messages to send in the next round and to whom.

Processes are prone to crash failures, also called fail-stops. A crashed process permanently stops any activity, including sending and receiving messages.

We model crashes as incurred by a full-information adversary (the same as in [8, 10]) that knows the algorithm, the exact pure quantum state (see Section 3) and the classic state of the system at any point of an execution, and has unbounded computational power. The adversary decides which processes to fail and when. The adversary is also adaptive – it can make decisions on-line based on its current full knowledge of the system. However, the adversary does not know the future computation, which means that it does not know future random bits drawn by processes.

As for the quantum part, the adversary cannot apply any quantum operation to the system, but it is aware of all quantum and classic changes of state that the network undergoes. If a process is crashed by the adversary, we assume that its quantum bits are not destroyed (in particular, entangled qubits in other processes do not collapse but maintain their entanglement); however, they cannot be used in further computation.

Failures are not clean – when a process crashes while attempting to multicast a message, some of the recipients may receive the message and some may not; this aspect is controlled by the adversary. A $t$-adversary is additionally restricted in that the number of crashed processes must be smaller than $t$; if $t=n$ then the $n$-adversary is also called an unbounded adversary (note that even in this case, at least one process must survive for consensus to make sense). Throughout the paper, we call the adversary described above “adaptive”, for short.

Consensus problem:

each process $p$ has its own initial value $input_p$ and has to output a (common) decision value, so that the following conditions hold: validity – the decision must be on an input value of some process; agreement – no two processes decide on different values; and termination – each process eventually decides on some value, unless it is faulty. All three requirements must hold with probability 1. We focus on binary consensus, in which initial values are in $\{0,1\}$.

Correctness and complexity – in terms of time (the number of rounds needed for all processes to terminate) and the number of quantum bits (qubits) and communication bits – are analyzed and maximized (worst-case) against an adaptive adversary.

We say that a random event occurs with high probability (whp, for short) if its probability can be made $1-O(n^{-c})$ for any sufficiently large positive constant $c$ by linear scaling of parameters.

2 Our Results

In this work, we focus on improving the qubit and communication complexities (without harming time complexity) of quantum algorithms solving the consensus problem with probability 1 against an adaptive full-information adversary capable of crashing processes. We observe that the maximum, per process, number of communication bits in the consensus problem is $\Omega(n)$; therefore, one can only hope to improve the amortized communication complexity (per process), see Lemma 6 in Appendix D.

Our first main result is a quantum algorithm that solves consensus (correctly with probability 1) in an expected constant number of rounds, with the amortized number of qubits and communication bits per process being an arbitrarily low polynomial. This directly improves, by a polynomial factor, the result of Ben-Or and Hassidim [10], which required a super-linear number of qubits and communication bits, amortized per process. More precisely, in Section 4 we give the algorithm and in Section A we prove the following result (see also its technical counterpart – Theorem 6):

Theorem 1.

For any $\epsilon>0$, there is an algorithm solving consensus against an adaptive $n/3$-adversary in expected $O(1)$ rounds while using $O(n^{\epsilon})$ qubits and communication bits (amortized) per process, whp.

Although our algorithm, called CheapQuantumConsensus, uses the idea of a so-called weak global coin analogously to, e.g., [19, 10], it is based on two entirely different qubit-and-communication-efficient algorithmic techniques. The first technique is a new quantum implementation of a weak global coin, c.f., Definition 1, called CheapQuantumCoin. It works in constant, $O(\frac{1}{\epsilon})$, time, uses an arbitrarily small polynomial number of bits $O(n^{\epsilon})$, amortized per process (for any chosen constant $\epsilon>0$), and results in all non-faulty processes returning the same value with a non-zero constant probability, c.f., Theorem 3. The second is a deterministic algorithm for counting, which returns, in constant time, a count of the number of input 1’s (or 0’s, resp.) among the processes. The output may differ across processes, due to failures and the short time allotted for counting, but each output is a number not smaller than the number of processes with input 1 (0, resp.) at the end of the procedure and not larger than the number of processes with input 1 (0, resp.) at the beginning of the procedure. It uses only an arbitrarily small polynomial number of communication bits, amortized per process. We give an overview of these two techniques in Sections 5 and 6, and their analysis in Sections B and C, respectively.
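The sandwich guarantee of the counting technique can be illustrated with a minimal classic simulation (the crash model below is illustrative and much cruder than the paper's FastCounting): a crashing process's bit may or may not be aggregated, which is the only source of slack between the returned count and the true counts at the start and end of the procedure.

```python
import random

# Toy illustration of the counting guarantee: the returned count of ones
# lies between the number of ones held by processes alive at the END of
# the procedure and the number of ones at the START.  (A sketch; the real
# FastCounting aggregates over sparse graphs in constant time.)
rng = random.Random(3)
n = 50
bits = [rng.randrange(2) for _ in range(n)]
start_ones = sum(bits)

crashed = set(rng.sample(range(n), 10))
# The adversary decides, per crashed process, whether its bit was already
# aggregated before the crash.
counted = [i for i in range(n) if i not in crashed or rng.randrange(2)]
returned = sum(bits[i] for i in counted)

end_ones = sum(bits[i] for i in range(n) if i not in crashed)
assert end_ones <= returned <= start_ones
print(start_ones, returned, end_ones)
```

Since every surviving process is counted and no bit is counted that was not held at the start, the bracketing holds for any crash pattern.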

Although constant-time algorithms cannot achieve low communication, as we argue in Lemma 7 in Appendix D, interestingly, we show that our main algorithm can be re-instantiated in such a way that it uses only a polylogarithmic number of qubits and communication bits to solve consensus in a slightly (polylogarithmically) larger number of rounds (see Section 4 and Appendix A, and the technical counterpart Theorem 7 for more details):

Theorem 2.

There is an algorithm solving consensus against an unbounded adaptive adversary in polylogarithmic number of rounds, in expectation, while using a polylogarithmic number of qubits and communication bits (amortized) per process, whp.

We believe that the newly developed techniques could also be applied to other types of failures, after failure-specific modifications. For example, although message-omission failures require linear amortized communication per process (c.f., [24]), one could still use a small polynomial or even a polylogarithmic number of qubits (together with a linear number of classic communication bits) per process, if qubits are handled according to our techniques while some additional classic communication bits are introduced to handle message omissions. We leave the details to follow-up work.

2.1 Previous and Related Work

Consensus in the classic (non-quantum) model.

Bar-Joseph and Ben-Or [8] (see also their extended version [9]) proved a lower bound $\Omega(\sqrt{n/\log n})$ on the expected time complexity of consensus against an adaptive adversary. They complemented it with a time-optimal randomized algorithm. Their algorithm uses an expected $O(\frac{n^{3/2}}{\log n})$ number of communication bits, amortized per process, which has recently been improved by Hajiaghayi, Kowalski and Olkowski [25] to $O(\sqrt{n})$ (while maintaining the almost-optimal round complexity $O(\sqrt{n}\log n)$).

Fischer and Lynch [20] proved a lower bound of $f+1$ rounds on deterministic consensus with $f$ crashes (that actually occurred, i.e., $f<t$), thus separating deterministic solutions from randomized ones. Regarding communication complexity, Amdur, Weber and Hadzilacos [3] showed that the amortized number of messages per process is at least constant, even in some failure-free execution. Dwork, Halpern and Waarts [18] found a solution with $O(\log n)$ messages per process, but requiring exponential time, and later Galil, Mayer and Yung [22] developed a message-optimal algorithm working in super-linear $O(n^{1+\varepsilon})$ time, for any $0<\varepsilon<1$. They also improved the communication further to a constant number of communication bits per process, but the resulting algorithm was exponential in the number of rounds. Chlebus and Kowalski [12] showed that consensus can be solved in $O(f+1)$ time and with $O(\log^{2}f)$ messages if the number $n-f$ of non-faulty processors satisfies $n-f=\Omega(n)$. This was later improved in [13] to $O(f+1)$ time and $O(\text{polylog } n)$ communication bits. All the abovementioned communication complexities are amortized per process.

Quantum consensus.

To the best of our knowledge, almost all previous papers on quantum consensus concentrated on assuring feasibility of the problem against strong Byzantine adversaries, c.f., [15, 26, 28], or on optimizing time complexity, including the work of Ben-Or and Hassidim [10] achieving constant time against an adaptive adversary. Sparse quantum communication was considered by Chlebus, Kowalski and Strojnowski in [14], but their protocols work correctly only with some probability smaller than 1 and for a specific number of failures corresponding to the probability of success. Another difference is that they used quantum operations to encode the classic inputs in quantum registers and used them to directly solve consensus. In this paper, we show another, more efficient approach, in which we first create a quantum weak global coin and later employ this tool within the state-of-the-art framework of solving consensus based on a common coin. Other distributed computing problems, not necessarily fault-prone, have also been analyzed in quantum models, c.f., [11, 32, 4, 33].

More efficient classic randomized solutions against weak adversaries.

Whenever weaker, oblivious adversaries are considered, randomness itself proves sufficient to reduce time complexity to a constant. Chor, Merritt and Shmoys [16] developed constant-time algorithms for consensus against an oblivious adversary – that is, an adversary who knows the algorithm but has to decide which processes fail and when before the execution starts. Their solution, however, requires a linear number of communication bits per process. Gilbert and Kowalski [23] presented a randomized consensus algorithm that achieves optimal communication complexity of $O(1)$ amortized communication bits per process while terminating in $O(\log n)$ time whp, as long as the number of crashes $f<n/2$.

Classic consensus in more demanding models.

Dolev and Reischuk [17] and Hadzilacos and Halpern [24] proved an $\Omega(f)$ lower bound on the amortized message complexity per process of deterministic consensus for omission or (authenticated) Byzantine failures. However, under some limitations on the adversary, and requiring termination only whp, a sublinear expected communication complexity of $O(\sqrt{n}\;\text{polylog }n)$ per process can be achieved even in the case of Byzantine failures, as proved by King and Saia [27]. Such limitations are apparently necessary to achieve subquadratic time complexity for Byzantine failures, c.f., Abraham et al. [1]. Asynchrony also implies large communication – Aspnes [5] proved a lower bound of $\Omega(n/\log^{2}n)$ on the communication complexity per process. The complexity bounds in this setting have later been improved, c.f., [6, 2].

3 Technical preliminaries

Quantum model of computation.

We focus on the quantum properties relevant to our algorithms. A quantum system operates on qubits. A single qubit can be either in a pure or a mixed state. A pure state is a vector in a 2-dimensional Hilbert space $\mathcal{H}$, while a mixed state is modelled as a probability distribution over pure states. Similarly, a register consisting of $d$ qubits can also be in a pure or a mixed state. A pure quantum state, denoted $\ket{x}$, is a vector of the $2^{d}$-dimensional Hilbert space $\mathcal{H}^{\otimes d}=\mathcal{H}\otimes\ldots\otimes\mathcal{H}$. Again, a mixed state of larger dimension is viewed as a probability distribution over pure states. In this paper, we operate only on pure states, and we use only the standard computational basis of the Hilbert space, which consists of the vectors $\{\ket{b_{1}\ldots b_{d}} : b_{1},\ldots,b_{d}\in\{0,1\}\}$, to describe the system. Therefore, any state $\ket{x}$ can be expressed as $\ket{x}=\sum_{i=0}^{2^{d}-1}\alpha_{i}\ket{i}$, with the condition that $\sum_{i}|\alpha_{i}|^{2}=1$, since quantum states can only be normalized vectors.
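The normalization condition and the Born rule can be checked numerically; the following is a minimal sketch (the amplitudes chosen are arbitrary, not from the paper).

```python
import math
import random

# A d-qubit pure state |x> = sum_i alpha_i |i> is a normalized vector of
# length 2^d; measuring in the computational basis yields |i> with
# probability |alpha_i|^2.
d = 2
amps = [1 + 0j, 1j, 0j, 1 + 0j]                 # unnormalized amplitudes
norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
state = [a / norm for a in amps]                # now sum_i |alpha_i|^2 = 1

probs = [abs(a) ** 2 for a in state]            # Born rule
assert abs(sum(probs) - 1.0) < 1e-12

# A complete measurement collapses the state to one basis vector |i> and
# outputs i as a classic d-bit register.
rng = random.Random(0)
outcome = rng.choices(range(2 ** d), weights=probs)[0]
bits = format(outcome, "0%db" % d)
print(bits, [round(p, 3) for p in probs])
```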

Transitions – or, equivalently, changes of state of a quantum system – are given by unitary transformations on the Hilbert space of $d$ qubits. These unitary transformations are called quantum gates. These operations are exhaustive in the sense that any quantum computation can be expressed as a unitary operator on some Hilbert space. There are small sets of quantum gates working on two-dimensional spaces that are universal – any unitary transformation on a $2^{d}$-dimensional quantum space can be approximated by a finite collection of these universal gates. In our applications, any quantum computation run by a process requires a polynomial (in $n$) number of universal gates.

Finally, an important part of quantum computation is quantum measurement. Measurements are performed with respect to a basis of the Hilbert space – in our case, always the computational basis. A complete measurement in the computational basis executed on a state $\ket{x}=\sum_{i=0}^{2^{d}-1}\alpha_{i}\ket{i}$ leaves the state in one of the basis vectors $\ket{i}$, for $i\in\{0,1\}^{d}$, with probability $|\alpha_{i}|^{2}$. The outcome of the measurement is a classic register of $d$ bits, indicating to which vector the state has been transformed. It is also possible to measure only some qubits of the system, which is called a partial measurement. If $A$ denotes the subset of qubits that we want to measure and $B$ is the remaining part of the system, then the partial measurement is defined by the set of projectors $\{\Pi_{i}=\ket{i}_{A}\bra{i}_{A}\otimes I_{B}\ |\ i\in\{0,1\}^{d}\}$. (Whenever it is irrelevant, or clear from the context, we follow the standard notation in quantum computing and skip writing normalizing factors.) Here, a subscript refers to the part of the system on which the object acts, $I$ denotes the identity operator, and $\bra{i}$ is a functional of the dual space to the original Hilbert space (its matrix representation is the conjugate transpose of the matrix representation of $\ket{i}$). If before the measurement the system was in a state $\ket{x}_{AB}$ then, after the measurement, it is in one of the states $\{\Pi_{i}\ket{x}_{AB}\ |\ i\in\{0,1\}^{d}\}$, where the state $\Pi_{i}\ket{x}_{AB}$ is obtained with probability $\bra{x}_{AB}\Pi_{i}\ket{x}_{AB}$; here $\Pi_{i}\ket{x}_{AB}$ and $\bra{x}_{AB}\Pi_{i}\ket{x}_{AB}$ are simply linear operations on matrices and vectors. The reader can find a comprehensive introduction to quantum computing in [29].
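For basis projectors $\Pi_{i}=\ket{i}_{A}\bra{i}_{A}\otimes I_{B}$, the probability $\bra{x}_{AB}\Pi_{i}\ket{x}_{AB}$ is just the total weight of amplitudes whose $A$-part equals $i$; a minimal sketch on the entangled pair $(\ket{00}+\ket{11})/\sqrt{2}$ (the example state is ours, chosen for illustration):

```python
import math

# Partial measurement of subsystem A of |x>_{AB} with projectors
# Pi_i = |i><i|_A (x) I_B: outcome i occurs with probability <x|Pi_i|x>.
dA, dB = 1, 1                                   # one qubit on each side
x = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]  # amplitudes of |00>,|01>,|10>,|11>

def partial_prob(i):
    """<x|Pi_i|x> = sum of |alpha_j|^2 over basis states j whose A-part is i."""
    return sum(abs(x[(i << dB) + b]) ** 2 for b in range(2 ** dB))

p0, p1 = partial_prob(0), partial_prob(1)
assert abs(p0 - 0.5) < 1e-12 and abs(p1 - 0.5) < 1e-12
print(round(p0, 3), round(p1, 3))
```

Measuring qubit $A$ of this maximally entangled state yields 0 or 1 with equal probability, and the post-measurement state $\Pi_i\ket{x}_{AB}$ fixes qubit $B$ to the same value.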

Graph notations.

Let $G=(V,E)$ denote an undirected graph. Let $W\subseteq V$ be a set of nodes of $G$. We say that an edge $(v,w)$ of $G$ is internal for $W$ if $v$ and $w$ are both in $W$. We say that an edge $(v,w)$ of $G$ connects the sets $W_{1}$ and $W_{2}$, or is between $W_{1}$ and $W_{2}$, for any disjoint subsets $W_{1}$ and $W_{2}$ of $V$, if one of its ends is in $W_{1}$ and the other in $W_{2}$. The subgraph of $G$ induced by $W$, denoted $G|_{W}$, is the subgraph of $G$ containing the nodes in $W$ and all the edges internal for $W$ in $G$. A node adjacent to a node $v$ is a neighbor of $v$, and the set of all neighbors of a node $v$ is the neighborhood of $v$. $N^{i}_{G}(W)$ denotes the set of all nodes in $V$ that are at distance at most $i$ from some node in $W$ in the graph $G$. In particular, the (direct) neighborhood of $v$ is denoted $N_{G}(v)=N^{1}_{G}(v)$.

The following combinatorial properties are of utmost importance in the analysis of our algorithms. A graph $G$ is said to be $\ell$-expanding, or to be an $\ell$-expander, if any two subsets of $\ell$ nodes each are connected by an edge. A graph $G$ is said to be $(\ell,\alpha,\beta)$-edge-dense if, for any set $X\subseteq V$ of at least $\ell$ nodes, there are at least $\alpha|X|$ edges internal for $X$, and for any set $Y\subseteq V$ of at most $\ell$ nodes, there are at most $\beta|Y|$ edges internal for $Y$. A graph $G$ is said to be $(\ell,\varepsilon,\delta)$-compact if, for any set $B\subseteq V$ of at least $\ell$ nodes, there is a subset $C\subseteq B$ of at least $\varepsilon\ell$ nodes such that each node’s degree in $G|_{C}$ is at least $\delta$. We call any such set $C$ a survival set for $B$.
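The $\ell$-expanding property can be checked by brute force on small graphs; a minimal sketch (following the definition of "connecting" edges, the checked subsets are disjoint; the function name is ours):

```python
from itertools import combinations

def is_expanding(n, edges, l):
    """Check the l-expanding property by brute force: every pair of
    disjoint l-subsets of the nodes 0..n-1 must be joined by an edge."""
    E = {frozenset(e) for e in edges}
    nodes = range(n)
    for A in combinations(nodes, l):
        rest = [v for v in nodes if v not in A]
        for B in combinations(rest, l):
            if not any(frozenset((a, b)) in E for a in A for b in B):
                return False
    return True

# The complete graph K4 is 1-expanding; a 4-cycle is not, since two
# opposite nodes share no edge.
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_expanding(4, k4, 1), is_expanding(4, c4, 1))
```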

4 Consensus Algorithm

On a very high level, our consensus algorithm CheapQuantumConsensus repeatedly uses the counting procedure FastCounting, specified in Section 6, to compute the number of 0’s and 1’s preferred by processes. (Recall that the outcomes of FastCounting could be slightly different across processes, but by no more than the number of crashes.) Depending on the outcome, each process may change its preferred value to the dominating one (among the received preferred values), decide on it if the domination is substantial, or run the quantum common coin procedure in some almost-unbiased cases – see the pseudocode of CheapQuantumConsensus in Figure 1; all lines involving communication are underlined. The latter is a well-established general framework for solving consensus, proposed first by Bar-Joseph and Ben-Or [9] in the context of classic randomized computation against an adaptive adversary. Our novelty is in the two new techniques employed in the pseudocode in Figure 1: fast and qubit/communication-efficient counting, and a weak global coin. Both techniques use parameters $x,d,\alpha$, which, roughly speaking, correspond to the density of the random communication graphs used in these algorithms. The detailed performance formulas of these algorithms, with respect to those parameters, are stated in Theorems 3 and 5.

At the heart of these two techniques lies a crucial observation: consensus (as well as the common coin and counting) can be achieved quickly even if many processes do not directly exchange messages but instead use some carefully selected sparse set of communication links. This way, instead of creating qubits for each pair of processes, we can do so only for the pairs corresponding to communication links to be used. Obviously, this set of links, modeled as an evolving communication graph, needs to be maintained adaptively and locally by processes throughout the execution – otherwise, an adaptive adversary would learn it and design crashes to separate processes and prevent consensus.

input: $\mathcal{P}$, $p$, $input_p$
1   $b_p \leftarrow input_p$ ;  $r \leftarrow 1$ ;  $\texttt{decided} \leftarrow FALSE$ ;
2   while $TRUE$ do
3       participate in FastCounting$(\mathcal{P},p,b_p)$ (run with parameters $x=n^{\epsilon}$, $d=\log n$, $\alpha=n^{\epsilon}$) that counts the processes who have $b_p=1$ and the processes who have $b_p=0$; let $O_p^r$, $Z_p^r$ be the numbers of ones and zeros (resp.) returned by FastCounting;
4       $N_p^r \leftarrow Z_p^r + O_p^r$;
5       if $N_p^r < \sqrt{n/\log n}$ then
6           $1)$ send $b_p$ to all processes; $2)$ receive all messages sent to $p$ in round $r+1$;
7           $3)$ implement a deterministic protocol for $\sqrt{n/\log n}$ rounds;
8       end if
9       if $\texttt{decided}=TRUE$ then
10          diff $\leftarrow N_p^{r-3} - N_p^r$;
11          if diff $\leq N_p^{r-2}/10$ then STOP;
12          else decided $\leftarrow FALSE$;
13      end if
14      if $O_p^r > (7N_p^r-1)/10$ then $b_p \leftarrow 1$, decided $\leftarrow TRUE$;
15      else if $O_p^r > (6N_p^r-1)/10$ then $b_p \leftarrow 1$;
16      else if $O_p^r < (4N_p^r-1)/10$ then $b_p \leftarrow 0$, decided $\leftarrow TRUE$;
17      else if $O_p^r < (5N_p^r-1)/10$ then $b_p \leftarrow 0$;
18      else set $b_p$ to the output of CheapQuantumCoin$(\mathcal{P},p)$ executed with parameters $d=\log n$, $\alpha=n^{\epsilon}$;
19      $r \leftarrow r+1$;
20  end while
21  return $b_p$ ;   /* $p$ outputs final decision */
Algorithm 1 CheapQuantumConsensus for process $p$

Algorithm’s description. Each process $p$ stores its current choice in $b_p$ (initialized to $p$’s input). The value of $b_p$ at the end of the algorithm indicates $p$’s decision. Processes use $O(1)$ (in expectation) phases to update their values $b_p$ so that eventually every process keeps the same decision. To do so, in a round $r$ every process $p$ calculates the number of processes whose current choice is 1 and the number of processes whose current choice is 0, denoted $O_p^r$ and $Z_p^r$ respectively. Based on these numbers, process $p$ either sets $b_p$ to 1, if the number $O_p^r$ is large enough; or sets $b_p$ to 0, if the number $Z_p^r$ is large; or replaces $b_p$ with a random bit, if the numbers of zeros and ones are close to each other. If, for generating the random bit, processes use a quantum implementation of a weak global coin (implemented with the CheapQuantumCoin algorithm, specified in Section 5), they will all have the same value $b_p$ with constant probability, unless more than a third of the alive processes crash. In the presence of the adaptive adversary, this could not be achieved quickly using classic communication only. Once it happens with the help of the quantum weak global coin, the conditional statements of the pseudocode, run in the next iteration of the “while” loop, guarantee that once the majority of processes have the same value $b_p$, the system converges to this value in at most 2 phases. Since the probability of this event is constant (guaranteed by the quantum weak global coin combined with the analysis of the first classic framework in [9]), the expected number of phases before the consensus algorithm terminates is constant. This reasoning holds assuming that at most a $1/3$ fraction of processes crashed (we will generalize it to any $t\leq n$ at the end of this section).
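The convergence behaviour of the update rule can be sketched in a crash-free toy simulation, where exact global counts replace FastCounting and one shared random bit stands in for CheapQuantumCoin (both simplifying assumptions; the STOP bookkeeping of the pseudocode is also omitted):

```python
import random

# Toy, crash-free simulation of the value-update thresholds of
# CheapQuantumConsensus: with exact counts, once a large majority holds
# the same bit, everyone decides within two further phases.
def phase(vals, rng):
    N, O = len(vals), sum(vals)
    decided = False
    if O > (7 * N - 1) / 10:
        vals, decided = [1] * N, True
    elif O > (6 * N - 1) / 10:
        vals = [1] * N
    elif O < (4 * N - 1) / 10:
        vals, decided = [0] * N, True
    elif O < (5 * N - 1) / 10:
        vals = [0] * N
    else:
        vals = [rng.randrange(2)] * N       # one shared coin flip for all
    return vals, decided

rng = random.Random(1)
vals = [rng.randrange(2) for _ in range(100)]
for r in range(1, 20):
    vals, decided = phase(vals, rng)
    if decided:
        break
print(r, set(vals))
```

Without crashes the counts are identical at every process, so a single phase already makes all values equal, and the next phase crosses the $7/10$ (or falls below the $4/10$) threshold and decides.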

As mentioned earlier, the major improvement in the above protocol comes from using novel techniques for counting and the weak global coin. For the former, we use the FastCounting algorithm, which, with the choice of parameters given in the pseudocode, works in $O\big(\big(\frac{1}{\epsilon}\big)^{4}\big)$ rounds and uses $O(n^{1+3\epsilon}\log^{2}n)$ (classic) communication bits in total. Similarly, the CheapQuantumCoin algorithm terminates in $O\big(\big(\frac{1}{\epsilon}\big)^{3}\big)$ rounds and uses $O(n^{1+2\epsilon}\log^{2}n)$ both quantum and classic bits; we divide the communication formulas by $n$ to obtain the complexity amortized per process. By applying these two techniques in the fashion described above, we get Theorem 1; a detailed proof is deferred to Appendix A.

Handling arbitrary number of crashes.

Consider $O(\log n)$ repetitions of the main loop (phases) of the CheapQuantumConsensus algorithm. If during these phases the processes with value $b_p=1$ become a large majority (at least a $\frac{6}{10}$ fraction of the alive processes), then, as discussed before, every process will decide within the next two rounds. The same holds if the processes with value $b_p=0$ start to constitute at least a $\frac{6}{10}$ fraction of all non-faulty processes. On the other hand, if the cardinalities of the two groups with different values $b_p$ are close to each other, then the processes execute the CheapQuantumCoin algorithm. It outputs a random bit (the same in every participating process), under the assumption that at least a $\frac{2}{3}$ fraction of the processes that started this phase as non-faulty have not crashed during this phase. However, among these $O(\log n)$ phases there must be at least one phase in which this property – a $\frac{2}{3}$ fraction of processes surviving – holds. In what follows, we argue that if the adversary can crash arbitrarily many processes, but fewer than $n$, then the expected number of phases is still $O(\log n)$. Now, to obtain the algorithm stated in Theorem 2, we make two more adjustments to the original CheapQuantumConsensus algorithm. Processes execute the algorithms FastCounting and CheapQuantumCoin with the parameters $x,d,\alpha$ set as follows: $x=2$, $d=\log n$, $\alpha=\log n$. This corresponds to using a sparse communication graph (of degree roughly $O(\log n)$). In consequence, the time complexity of the FastCounting algorithm increases to $O(\log^{4}n)$, but the communication complexity decreases to $O(\log^{8}n)$ amortized per process. The details are presented in Section 6, and the performance follows from plugging the abovementioned parameters into the formulas in Theorem 5. Similarly, the time complexity of the CheapQuantumCoin algorithm increases to $O(\log^{3}n)$, but the communication complexity (both quantum and classic) decreases to $O(\log^{7}n)$ amortized per process. The details are presented in Section 5, and the performance follows from plugging the abovementioned parameters into the formulas in Theorem 3.

5 Qubit-and-Communication Efficient Quantum Common Coin

In this section, we design a new $t$-resilient weak global coin, for $t<n$, with the help of quantum communication and computation. The coin must satisfy the following definition:

Definition 1 ([10]).

Let $\mathcal{C}$ be a protocol for $n$ players (with no input), where each player $i$ outputs a (classical) bit $v_{i}\in\{0,1\}$. We say that the protocol $\mathcal{C}$ is a $t$-resilient weak global coin protocol (or computes a weak global coin, for short) with fairness $\rho>0$, if for any adaptive $t$-adversary and any value $b\in\{0,1\}$, with probability at least $\rho$, $v_{i}=b$ for all good players $i$.
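A classic, crash-free analogue of the leader-based coin underlying our construction can be sketched as follows (the function name and setup are illustrative; the point of the paper's quantum machinery is precisely to hide the leader's identity from the adaptive adversary, which this classic sketch does not do):

```python
import random

# Classic analogue of the weak-global-coin idea: every process draws a
# random name from [1, n^3] and a random bit; all adopt the bit of the
# process with the largest name.  With no crashes this is a perfect
# shared coin, so each value b is output by all players with
# probability 1/2 >= rho.
def weak_coin(n, rng):
    names = [rng.randrange(1, n ** 3 + 1) for _ in range(n)]
    bits = [rng.randrange(2) for _ in range(n)]
    leader = max(range(n), key=lambda i: names[i])
    return [bits[leader]] * n               # every good player outputs v_i = b

rng = random.Random(7)
outs = weak_coin(16, rng)
assert len(set(outs)) == 1                  # all players agree
print(outs[0])
```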

On a high level, our protocol CheapQuantumCoin chooses a leader process uniformly at random, and all other processes agree on the random bit proposed by the leader. Quantum phenomena are used to hide the random choices of the leader and its output from the adaptive adversary when processes communicate with each other. The idea was first proposed in [10], yet there are key differences between that work and our algorithm. Instead of all-to-all communication, which requires a large number of qubits, we use a sequence of random graphs of node degrees $d, d\alpha^{1}, \ldots, d\alpha^{k}$, respectively, where $d,\alpha\in\Omega(\log n)$ and $k=\lceil\log n/\log\alpha\rceil$ are integer parameters. The vertices of these graphs correspond to processes and the edges to communication links – each process communicates with its neighbors in one of the graphs at a time. If the graph is chosen properly (i.e., so that there is no situation in which too many processes use denser graphs), this reduces the communication complexity, but it simultaneously imposes a new challenge. Namely, the communication procedure now has to assure delivery of quantum bits between every two non-faulty processes regardless of the pattern of crashes. For instance, if only one random graph of degree $d$ were used, the adversary could easily isolate any vertex using only $O(d)$ crashes (i.e., by crashing all its neighbors). Hence, strictly speaking, assuring such delivery is not possible while using a sparse communication graph as relays, but we show that a certain majority can still come up with a common coin value based only on their exchanges with neighbors in the communication graphs; they can later propagate their common value to other processes by adaptively controlling their (increasing) set of neighbors, taken from subsequent communication graphs of increasing density.
A thorough analysis shows that in this way it is possible to achieve the same quantum properties that are guaranteed by Ben-Or and Hassidim's global coin [10], while at the same time reducing the quantum communication by a polynomial factor. Formally, we prove the following result.
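The sequence of random neighborhoods just described can be sketched in a few lines (an illustrative simulation; the function name and the sample parameter values are ours, not the paper's):

```python
import math
import random

def sample_neighborhoods(n, d, alpha, rng):
    # Each process p draws k+1 neighbor sets N_p(d*alpha^i); a process q
    # joins N_p(d*alpha^i) independently with probability d*alpha^i / n.
    k = math.ceil(math.log(n / d) / math.log(alpha))
    return {
        p: [
            {q for q in range(n)
             if q != p and rng.random() < min(1.0, d * alpha**i / n)}
            for i in range(k + 1)
        ]
        for p in range(n)
    }

rng = random.Random(0)
n, d, alpha = 512, 16, 4
neigh = sample_neighborhoods(n, d, alpha, rng)
```

The densest set N_p(d*alpha^k) contains every other process (the inclusion probability reaches 1), which is exactly the fallback that guarantees delivery in the worst crash pattern.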

Algorithm’s description.

We now describe the CheapQuantumCoin algorithm. Its pseudocode is presented in Figure 2. It takes as input a process name p and two integers, d and \alpha. The two latter parameters determine the communication pattern between non-faulty processes, and their choice determines the complexity of the algorithm.

As mentioned before, processes use a quantum equivalent of a procedure in which processes draw random names from the range [1,\ldots,n^{3}] and decide on the random bit proposed by the process with the largest name. Note that the latter procedure itself cannot be used against an adaptive adversary, as the adversary could crash such a leader. We view the quantum part of the algorithm as a quantum circuit on the joint space of all qubits ever used by different processes. Due to the distributed nature of the system, not all quantum operations are allowed by the quantum circuit. That is, (i) any process can perform arbitrary unitary gates on its part of the system; (ii) unitary gates on qubits of different processes may be performed, but must be preceded by quantum communication that sends the qubits involved in the operation to a single process. The communication can be either direct or via relays. We next explain which unitary gates are used to simulate the classic leader-election algorithm in the quantum distributed setting. For the purpose of this explanation, we assume that the qubits needed to execute each of the following gates have been delivered to the proper processes. The pattern of communication assuring this delivery is described in the next paragraph.
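For intuition, the classical procedure being emulated can be sketched as follows (a crash-free sketch with full information exchange; with an adaptive adversary this baseline fails, since the adversary sees the draws and can crash the leader):

```python
import random

def classical_leader_coin(n, rng):
    # Each process draws a name uniformly from [1, n^3] and a random bit;
    # with full information exchange, everyone adopts the bit attached to
    # the largest name.  Ties occur with probability at most
    # C(n,2)/n^3 < 1/(2n), i.e., polynomially small.
    draws = [(rng.randrange(1, n**3 + 1), rng.randrange(2)) for _ in range(n)]
    leader_bit = max(draws)[1]
    return [leader_bit] * n

rng = random.Random(7)
outputs = classical_leader_coin(64, rng)
```

The quantum algorithm reproduces exactly this behavior on every computational-basis vector, while keeping the draws in superposition and hence hidden from the adversary until the final measurement.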

  1. 1.

    Hadamard gate. This unitary operation on a single qubit is given by the matrix H=\frac{1}{\sqrt{2}}\begin{bmatrix}1&1\\ 1&-1\end{bmatrix}. When applied to every qubit of an m-qubit state \ket{0\ldots 0}, it produces a uniform superposition of the vectors of the computational basis, i.e.,

    H^{\otimes m}\ket{0\ldots 0}=\frac{1}{2^{m/2}}\sum_{\ell_{1},\ldots,\ell_{m-1},b\in\{0,1\}}\ket{\ell_{1}\ldots\ell_{m-1}b}=\frac{1}{2^{m/2}}\sum_{i=0}^{2^{m}-1}\ket{i}\ .

    In the beginning, every process applies this gate to its main register \ket{Leader}\ket{Coin} consisting of 3\log{n}+1 qubits (line 2). This way the entire system is in the following state:

    \left(\frac{1}{\sqrt{2}\cdot n^{3/2}}\sum_{i=0}^{2n^{3}-1}\ket{i}\right)^{\otimes n}=\left(\frac{1}{\sqrt{2}\cdot n^{3/2}}\sum_{i=0}^{2n^{3}-1}\ket{i}\right)\otimes\ldots\otimes\left(\frac{1}{\sqrt{2}\cdot n^{3/2}}\sum_{i=0}^{2n^{3}-1}\ket{i}\right).

    Observe that if all processes measured their main registers in the computational basis, then the outcome would have the same probability distribution as a random experiment in which every process draws a number from the uniform distribution over [1,\ldots,n^{3}] (the first 3\log{n} qubits of each register) and a uniform random bit (the last qubit of each register).

  2. 2.

    CNOT and Pairwise_CNOT. The CNOTCNOT is a 2-qubit gate given by the following unitary transform

    CNOT\left(\alpha\ket{00}+\beta\ket{01}+\gamma\ket{10}+\delta\ket{11}\right)=\alpha\ket{00}+\beta\ket{01}+\gamma\ket{11}+\delta\ket{10}.

    The Pairwise_CNOT gate works on a register of 2\cdot m qubits. Let A be the first half of the register and B the second. The gate applies the CNOT gate qubit-wise to corresponding pairs of qubits of A and B. We only use this gate when the part B of the entire register is in the state \ket{0\ldots 0}. If the part A of the entire register is in a state \sum_{i=0}^{2^{m}-1}\alpha_{i}\ket{i}_{A}, then the unitary gate results in

    \texttt{Pairwise\_CNOT}\left(\sum_{i=0}^{2^{m}-1}\alpha_{i}\ket{i}_{A},\ket{0\ldots 0}_{B}\right)=\sum_{i=0}^{2^{m}-1}\alpha_{i}\ket{i}_{A}\ket{i}_{B}\ .

    Whenever a process decides to send its qubits to another process, it performs the Pairwise_CNOT gate on its qubits and a new block of 3\log{n}+1 qubits initialized to \ket{0\ldots 0}. Since, as a result, the information to be propagated is now also encoded on another 3\log{n}+1 qubits, the process can send the new block to another party while keeping the original qubits for itself. Observe, however, that this is not a universal copy gate (which is impossible in the quantum model of computation, by the no-cloning theorem), since it does not copy the part A to the part B, but rather creates a specific form of entanglement that encodes the same information when described in the computational basis.

  3. 3.

    F_CNOT and Controlled_Swap. Let f:\{0,1\}^{m}\rightarrow\{0,1\} be a Boolean function. Let A be a register of m qubits and C an additional single-qubit register. The gate \texttt{F\_CNOT}_{f} performs the NOT operation on the qubit C iff f(A)=1. If C=\ket{0}, then we get

    \texttt{F\_CNOT}_{f}\left(\sum_{i=0}^{2^{m}-1}\alpha_{i}\ket{i}_{A},\ket{0}_{C}\right)=\sum_{i=0}^{2^{m}-1}\alpha_{i}\ket{i}_{A}\ket{f(i)}_{C}\ .

    The gate can be implemented as a composition of at most 2^{m} unitary transforms: for each vector x of the computational basis such that f(x)=1, we apply a NOT on the qubit C controlled on the register A being in state \ket{x}. We note that we never use the F_CNOT gate on registers of more than 3\log{n}+1 qubits, therefore F_CNOT has polynomial size.

    The Controlled_Swap gate operates on a single-qubit control register C and two registers of the same size, A and B. It swaps the contents of A and B if the control register is \ket{1}. Formally, the gate works as follows:

    \texttt{Controlled\_Swap}\left(\ket{i}_{A},\ket{j}_{B},\alpha\ket{0}+\beta\ket{1}\right)=\alpha\cdot\ket{i}_{A}\ket{j}_{B}\ket{0}+\beta\cdot\ket{j}_{A}\ket{i}_{B}\ket{1}\ .

    We use these two gates whenever a process receives qubits of another process. First, a control qubit |S\ket{S} is prepared by comparing the content of the received register and the main register of the process. The condition function used in the gate is the following:

    f:\{0,1\}^{3\log{n}+1}\times\{0,1\}^{3\log{n}+1}\rightarrow\{0,1\}\ ,\quad f(i,j)=1\iff i>j\ .

    Next, the qubit \ket{S} is passed to the Controlled_Swap gate, operating again on the main register and the newly received qubits. For the \ket{1} part of the control qubit, which corresponds to the received register having the larger value in the computational basis, the gate swaps the contents of the two registers.
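Restricted to computational-basis states, the three gates above act exactly like their classical counterparts: Pairwise_CNOT writes A's basis value into a zeroed block, and F_CNOT followed by Controlled_Swap implements compare-and-keep-the-maximum. A minimal classical sketch of this basis-level behavior (register contents modeled as (name, coin) pairs; all names and values here are illustrative):

```python
def pairwise_cnot_basis(a, b_zero):
    # On basis states with B = |0...0>, Pairwise_CNOT leaves A intact and
    # writes A's value into B (a CNOT on each corresponding pair of qubits).
    assert b_zero == 0
    return a, a

def receive_block(main, received):
    # F_CNOT sets the control qubit S iff received > main (compared as
    # integers in the computational basis); Controlled_Swap then moves
    # the larger value into the main register.
    if received > main:
        return received, main
    return main, received

main = (5, 1)                                # (leader name, coin) of the process
_, block = pairwise_cnot_basis((9, 0), 0)    # block sent by a neighbor
new_main, _ = receive_block(main, block)
```

On superpositions the circuit applies this compare-and-swap simultaneously to every basis vector, which is what lets the analysis reason basis vector by basis vector.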

Let us now explain how all the gates listed above work together. As mentioned in the description of the Hadamard gate, after line 2 ends, the main registers of the system are in a state that is a uniform superposition of all vectors of the computational basis. From this point on, the composition of all gates applied to different registers along the execution can be viewed as a single unitary gate on the entire system, consisting of all qubits that any process ever created. Note that this unitary transformation may differ depending on failures in communication, i.e., a failure in delivery of some block of qubits between two processes may result in abandoning the gates involving these qubits, but for a moment let us assume that the links are reliable. Since the unitary transformation is linear, it is enough to consider how it affects the vectors of the computational basis. Crucially, all the gates described above behave in the computational basis as their classic equivalents. More precisely, let \ket{x} be a vector from the computational basis spanning the whole circuit, and let p be the process whose main register has the largest value in the state \ket{x} (ties occur with polynomially small probability). From the point of view of this register, the following happens in the algorithm. In each round, p creates an entangled state on 6\log{n}+2 qubits (see point (2)) that has the same content on its new block of 3\log{n}+1 qubits as on the main register. Then, it propagates the new block to its neighbors (line 2). The neighbors compare the content of the received qubits with their main registers and exchange them if their own content is smaller (gates F_CNOT and Controlled_Swap in the pseudocode). This operation is then repeated (k+2)^{2}(\gamma_{\alpha}+1) times on the sets of links defined by random evolving graphs; see the later paragraph about the adaptive communication pattern.
In the end, the processes who, either directly or via relays, received the content of the largest main register, have the same value in their main register. Therefore, the result of the measurement in line 2 must be the same in all these processes.

Assume now that we are able to achieve bidirectional quantum communication between any pair of processes of an \alpha-fraction of the entire system, regardless of the (dynamic) actions of the adversary. From the above, the algorithm transforms any vector whose largest main register belongs to the \alpha-fraction into a vector in which all processes from the \alpha-fraction have the same value in their main registers. Combining this with the fact that the algorithm is a 1-to-1 transformation on the linear space of states, we get that the probability of measuring the same values in the processes of the \alpha-fraction is at least the probability of measuring the largest value in one of the processes belonging to the \alpha-fraction, had the measurement been done before the quantum communication. Since the state is a uniform superposition at that time, the probability is at least \alpha-o(1), and we can claim the following lemma. (The o(1) term accounts for the unlikely event of ties between the largest values.)

Lemma 1.

Let A be a set of correct processes such that any pair of them was connected by quantum communication, either directly or by relays. Then the probability that all processes from A output the same bit from the algorithm CheapQuantumCoin is at least \frac{|A|}{n}-o(1).

input: pp, two parameters: d,αd,\alpha
1
2For 0ilog(n/d)logα0\leq i\leq\left\lceil\frac{\log(n/d)}{\log{\alpha}}\right\rceil: 𝒩p(dαi)\mathcal{N}_{p}(d\alpha^{i})\leftarrow a set of processes such that each process is chosen independently with probability dαin\frac{d\alpha^{i}}{n} ;
3 \texttt{degree}_{p}\leftarrow d,    \gamma_{\alpha}\leftarrow\frac{\log{n}}{\log{\alpha}},    \delta_{\alpha}\leftarrow\frac{2}{3}\alpha ;
4
5 \ket{Leader}_{p}\ket{Coin}_{p}\leftarrow H^{\otimes(3\log{n}+1)}\ket{00\ldots 0} (a gate on 3\log{n}+1 qubits) ;
6
for i=1i=1 to (k+2)2(k+2)^{2} ;
  /* iter. of epochs */
7 do
8       adaptive_degreedegreep\texttt{adaptive\_degree}\leftarrow\texttt{degree}_{p} ;
       for j=1j=1 to γα+1\gamma_{\alpha}+1 ;
        /* iter. of testing rounds */
9       do
10             send to each process in 𝒩p(degreep)\mathcal{N}_{p}(\texttt{degree}_{p}): an inquire bit 11 ;
11             \mathcal{I}\leftarrow the set of processes who sent an inquire bit to pp ;
12             q:|BqPairwise_CNOT(|LeaderCoinp,|00)\forall_{q\in\mathcal{I}}:\ket{B_{q}}\leftarrow\texttt{Pairwise\_CNOT}\left(\ket{LeaderCoin}_{p},\ket{0\ldots 0}\right) ;
13             send to each process q\in\mathcal{I}: a quantum message containing the new block of 3\log{n}+1 qubits of \ket{B_{q}}, and a classical message containing \texttt{adaptive\_degree}_{p} ;
14             \mathcal{R}\leftarrow the set of processes who responded to pp’s inquires ;
15             foreach qq\in\mathcal{R} do
16                   \ket{CLeader}_{q}\ket{CCoin}_{q}\leftarrow the quantum bits received from q,   \ket{S}\leftarrow\ket{0};
17                   \texttt{F\_CNOT}_{CLeader_{q}>Leader_{p}}\left(\ket{CLeader}_{q},\ket{Leader}_{p},\ket{S}\right);
18                   Controlled_Swap(|LeaderCoinp,|CLeaderCCoinq,|S)\texttt{Controlled\_Swap}\left(\ket{LeaderCoin}_{p},\ket{CLeaderCCoin}_{q},\ket{S}\right);
19                  
20             end foreach
21            
            while |{q:adaptive_degreeqadaptive_degreep}|<δα|\{q\in\mathcal{R}:\texttt{adaptive\_degree}_{q}\geq\texttt{adaptive\_degree}_{p}\}|<\delta_{\alpha} and adaptive_degreepd\texttt{adaptive\_degree}_{p}\geq d ;
              /* adapting #neighbors during testing */
22              do
23                   adaptive_degreep1αadaptive_degreep\texttt{adaptive\_degree}_{p}\leftarrow\frac{1}{\alpha}\texttt{adaptive\_degree}_{p} ;
24                  
25             end while
26            
27       end for
28      if adaptive_degreep<degreep\texttt{adaptive\_degree}_{p}<\texttt{degree}_{p} then
             degreepmin{degreepα,dαk}\texttt{degree}_{p}\leftarrow\min\{\texttt{degree}_{p}\cdot\alpha,d\alpha^{k}\} ;
              /* neighborhood for next epoch grows */
29            
30      
31 end for
32 b_{p}\leftarrow the last bit of the measurement of \ket{Leader}_{p}\ket{Coin}_{p} in the computational basis;
return bpb_{p} ;
  /* pp outputs random bit */
Algorithm 2 CheapQuantumCoin for process pp

Adaptive communication pattern.

As explained, not only does the communication need to be efficient in terms of the number of qubits and classic bits, but also any two correct processes of a large fraction of the entire system must be connected by a short path of correct processes, so that quantum registers can be relayed. Let d,\alpha be two integer parameters. We define k=\left\lceil\frac{\log(n/d)}{\log{\alpha}}\right\rceil, \gamma_{\alpha}=\frac{\log{n}}{\log{\alpha}}, and \delta_{\alpha}=\frac{2}{3}\alpha. Initially, each process p draws independently k+1 sets \mathcal{N}_{p}(d),\mathcal{N}_{p}(d\alpha^{1}),\ldots,\mathcal{N}_{p}(d\alpha^{k}), where a set \mathcal{N}_{p}(d\alpha^{i}), for 0\leq i\leq k, includes each process from \mathcal{P} independently with probability \frac{d\alpha^{i}}{n}.

Communication is structured into (k+2)^{2} epochs, see line 2. Each epoch consists of 2(\gamma_{\alpha}+1) communication rounds, also called testing rounds. They are scheduled in \gamma_{\alpha}+1 iterations of the loop “for” in line 2, each iteration containing two communication rounds (underlined in the pseudocode): sending/receiving inquiries in line 2 and sending/receiving responses in line 2. In the testing rounds of the first epoch, a process p sends inquiries to the processes in the set \mathcal{N}_{p}(d). The inquired processes respond by sending in the next round (line 2) their current classic state and a specially prepared (in line 2) quantum register. However, if as a result of crashes p starts receiving fewer than \delta_{\alpha} responses per round, it switches its communication neighborhood from \mathcal{N}_{p}(d) to the next, larger set, \mathcal{N}_{p}(d\cdot\alpha). A similar adaptation to the crash pattern is continued in the remaining epochs.

Process p stores the cardinality of the set being inquired in an epoch in the variable \texttt{degree}_{p} (initialized to d in line 2). For the purpose of testing rounds, p copies the value \texttt{degree}_{p} to a variable \texttt{adaptive\_degree}_{p} (line 2). In every testing round, p adapts its variable \texttt{adaptive\_degree}_{p} to the largest value x\leq\texttt{adaptive\_degree}_{p} such that it received at least \delta_{\alpha} responses from processes that have their variable \texttt{adaptive\_degree} at least x (loop “while” in line 2). If p had to decrease the value \texttt{adaptive\_degree}_{p} in the testing rounds of an epoch, it then increases the main variable \texttt{degree}_{p} by the factor \alpha before the next epoch, see line 2. The latter operation formally encodes the intuition that the process p expected to have \delta_{\alpha} non-faulty neighbors with their values of \texttt{degree} at least as large as its own, but due to crashes this did not happen; therefore, p increases the number of inquired processes, by adopting the next, larger, randomly selected neighborhood set \mathcal{N}_{p}(\cdot), in order to increase the chance of communicating with a majority of non-faulty processes in the next epoch. On the other hand, the adaptive procedure of reducing \texttt{adaptive\_degree} in the testing rounds of a single epoch helps the neighbors of p to correctly estimate the size of the neighborhood that process p is using in the current testing round, which might be much smaller than the value \texttt{degree}_{p} from the beginning of the epoch. This, in turn, calibrates the value of \texttt{adaptive\_degree} of the neighbors of p, and the calibration can propagate to other processes at distance up to \gamma_{\alpha} from p in the subsequent iterations of testing rounds.
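The degree-adaptation rules above can be condensed into two small helpers (a classical sketch mirroring the pseudocode; crashes and the actual messaging are not modeled, and the helper names are ours):

```python
def adapt_degree(adaptive_degree, responder_degrees, delta, alpha, d):
    # Mirror of the "while" loop of a testing round: shrink by a factor
    # alpha until at least delta responders report an adaptive_degree at
    # least as large as ours, or we drop below the base degree d.
    while (sum(1 for r in responder_degrees if r >= adaptive_degree) < delta
           and adaptive_degree >= d):
        adaptive_degree //= alpha
    return adaptive_degree

def next_epoch_degree(degree, adaptive_degree, alpha, d, k):
    # If the degree shrank during the epoch's testing rounds, inquire an
    # alpha-times denser neighborhood next epoch, capped at d*alpha^k.
    if adaptive_degree < degree:
        return min(degree * alpha, d * alpha**k)
    return degree
```

For example, a process inquiring 64 neighbors that only hears from responders with degree 16 settles on adaptive_degree 16 within the epoch, and then grows its inquiry set for the next epoch.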

Analysis.

Let us define graphs 𝒢(dαi)\mathcal{G}(d\alpha^{i}), for 0ik0\leq i\leq k, as the union of random sets p𝒫𝒩p(dαi)\cup_{p\in\mathcal{P}}\mathcal{N}_{p}(d\alpha^{i}). The probability distribution of the graph 𝒢(dαi)\mathcal{G}(d\alpha^{i}) is the same as the random graph G(n,y)G(n,y) for y=dαiny=\frac{d\alpha^{i}}{n}. Chlebus, Kowalski and Strojnowski [13] showed in their Theorem 2, applied for k=64ndαi1k=\frac{64n}{d\alpha^{i-1}}, that the graph 𝒢(dαi)\mathcal{G}(d\alpha^{i}) has the following properties, whp:
(i) it is (ndαi1)(\frac{n}{d\alpha^{i-1}})-expanding, which follows from (ndαi1,23ndαi2,43ndαi2)(\frac{n}{d\alpha^{i-1}},\frac{2}{3}\frac{n}{d\alpha^{i-2}},\frac{4}{3}\frac{n}{d\alpha^{i-2}})-edge-expanding property,
(ii) it is (ndαi1,13α,23α)(\frac{n}{d\alpha^{i-1}},\frac{1}{3}\alpha,\frac{2}{3}\alpha)-edge-dense,                 (iii) it is (16ndαi1,3/4,23α)(16\frac{n}{d\alpha^{i-1}},3/4,\frac{2}{3}\alpha)-compact,
(iv) the degree of each node is at most 2120dαi\frac{21}{20}d\alpha^{i}.

Since the variable \texttt{degree}_{p} takes values in the set \{d,d\alpha^{1},\ldots,d\alpha^{k}\}, the pigeonhole principle assures that eventually p will participate in an epoch in which \texttt{degree}_{p} has not been increased (in the most severe scenario, p will use the set \mathcal{N}_{p}(d\alpha^{k}), which consists of all other processes – because it contains each process, as a neighbor of p, with probability 1). The (\gamma_{\alpha},\delta_{\alpha})-dense-neighborhood property of the random graphs composed from the neighborhoods \mathcal{N}(\texttt{degree}_{p}) implies that p will then contact a majority of other non-faulty processes via at most \gamma_{\alpha}+1 intermediate processes (during testing rounds). Formally, the following holds:

Lemma 2.

If a process pp does not change its variable degreep\texttt{degree}_{p} at the end of an epoch ii, then at the beginning of epoch ii there exists a (γα,δα)(\gamma_{\alpha},\delta_{\alpha})-dense-neighborhood of pp in the graph 𝒢(degreep)\mathcal{G}(\texttt{degree}_{p}) consisting of non-faulty processes with variable degree being at least degreep\texttt{degree}_{p} in the epoch ii, whp.

On the other hand, the (16n/d\alpha^{i-1},3/4,2\alpha/3)-compactness of the (random) graph composed of processes that have the variable \texttt{degree} at least d\alpha^{i} guarantees that the total number of processes that use the sets \mathcal{N}(d\alpha^{i}) during the epoch i is at most \frac{n}{\alpha^{i-2}}, which amortizes the communication complexity.

Lemma 3.

For any integer i, such that 0\leq i\leq k, at the beginning of each epoch there are at most \frac{16n}{d\alpha^{i-2}} processes with the variable \texttt{degree} greater than d\alpha^{i}, whp.

The above discussion yields the following result.

Theorem 3.

For two integer parameters d,\alpha\in\Omega(\log{n}), the algorithm CheapQuantumCoin generates a weak global coin, provided that at most a \frac{1}{3}-fraction of initially non-faulty processes have crashed. It terminates in O\Big{(}\big{(}\frac{\log{n}}{\log{\alpha}}\big{)}^{3}\Big{)} rounds and with high probability uses O\Big{(}\big{(}\frac{\log{n}}{\log{\alpha}}\big{)}^{4}d\alpha^{2}\log{n}\Big{)} both quantum and classic communication bits (amortized) per process.

6 Constant-Time Communication-Efficient Fuzzy Counting

Although generating a weak global coin in a constant number of rounds against an adaptive adversary requires quantum communication (due to the lower bound by Bar-Joseph and Ben-Or [8]), the CheapQuantumCoin algorithm, even without quantum communication, achieves a few other goals. As discussed in the previous section, its random communication pattern guarantees, whp, that any additional classic message, also called a rumor, of a non-faulty process can be conveyed to any other non-faulty process if added to every classic message sent/received in line 2. Even more, assume that there is a set of x messages/rumors M=\{m_{1},\ldots,m_{x}\}, distributed as inputs among some subset of processes (one message from M per process). If processes always convey all the known rumors from the set M when using classic communication (avoiding repetition), then they solve the Gossip problem, whp, i.e., every rumor m_{i} given to a non-faulty process, for 1\leq i\leq x, is known at the end of the protocol to every other non-faulty process. Observe that processes forgo the quantum content of communication for this purpose, and instead of \log{n} bits (or qubits) per message, they use |M| bits, where |M| denotes the number of bits needed to encode all rumors from M. Finally, if processes work in a model where the names of other processes are commonly known, they can dispense with random communication. Instead, they can use a deterministic family of graphs \mathcal{G}(d\alpha^{i}), for the same choice of parameters d and \alpha. The existence of such graphs was proved, via the probabilistic argument, in [13] (Theorem 2). In that case, the set \mathcal{N}_{p}(d\alpha^{i}) is the neighborhood of process p in the deterministic graph \mathcal{G}(d\alpha^{i}). (Processes compute the same copies of the graphs \mathcal{G} locally in the beginning of the algorithm.)
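The classical gossip variant just described can be sketched as follows: every process keeps the set of rumors it knows and forwards the whole set (without repetition) in each synchronous round; on a connected communication graph, every rumor reaches everyone within a number of rounds equal to the graph's diameter. The ring topology below is purely illustrative — the algorithm uses the far better connected graphs \mathcal{G}(d\alpha^{i}):

```python
def gossip(adj, initial_rumors, rounds):
    # adj[p] lists the neighbors of p; initial_rumors[p] is p's input rumor(s).
    known = [set(r) for r in initial_rumors]
    for _ in range(rounds):
        snapshot = [set(s) for s in known]    # synchronous round: everyone
        for p, nbrs in enumerate(adj):        # forwards what it knew before
            for q in nbrs:
                known[p] |= snapshot[q]
    return known

n = 8
ring = [[(p - 1) % n, (p + 1) % n] for p in range(n)]
known = gossip(ring, [[p] for p in range(n)], rounds=n // 2)
```

Crashes are not modeled here; handling them is exactly what the adaptive communication pattern of the previous section provides.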
The above augmentation of the algorithm, together with the proof of Theorem 3, from which we take the upper bound on the number of messages sent and the upper bound on the number of rounds, leads to the following result.

Theorem 4.

For integer parameters d,αΩ(logn)d,\alpha\in\Omega(\log{n}), there is a deterministic algorithm that solves the gossip problem in O((lognlogα)3)O\left(\left(\frac{\log{n}}{\log{\alpha}}\right)^{3}\right) rounds using O((lognlogα)4dα2(|M|+logn))O\left(\left(\frac{\log{n}}{\log{\alpha}}\right)^{4}d\alpha^{2}\cdot(|M|+\log{n})\right) communication bits per process (amortized), where |M||M| is the number of bits needed to encode all rumors in MM.

Generalized Fuzzy Counting.

Definition 2 (Fuzzy Counting [25]).

An algorithm solves Fuzzy Counting if each process returns a number between the initial and the final number of active processes. Here, the notion of “being active” depends on the goal of the counting, e.g., all non-faulty processes, processes with initial value 11, etc.

Note that the returned numbers can be different across processes. Here, we refine the state-of-the-art method of solving the fuzzy counting problem, even deterministically, and propose a new recursive algorithm with an arbitrary branching factor x, called FastCounting. Formally, we prove the following:

Theorem 5.

For any 2\leq x\leq n and d,\alpha\in\Omega(\log{n}), the FastCounting deterministic algorithm solves the Fuzzy Counting problem in O\Big{(}\frac{\log{n}}{\log{x}}\big{(}\frac{\log{n}}{\log{\alpha}}\big{)}^{3}\Big{)} rounds, using O\Big{(}\frac{\log{n}}{\log{x}}\big{(}\frac{\log{n}}{\log{\alpha}}\big{)}^{4}d\alpha^{2}\cdot x\log{n}\Big{)} communication bits (amortized) per process.

Obviously, constant time is achieved in Theorem 5 when x,\alpha=n^{\epsilon}, for a constant \epsilon\in(0,1). In this case, the number of rounds is O\Big{(}\big{(}\frac{1}{\epsilon}\big{)}^{4}\Big{)}, while the communication complexity is O(n^{3\epsilon}\log^{2}{n}) (amortized) per process. In our approach, we generalize the method of Hajiaghayi et al. [25] to denser communication graphs with certain properties, which allows us to achieve constant running time. The constant running time is the key feature of the algorithm, since it is used in the implementation of the (expected) constant-round quantum consensus protocol CheapQuantumConsensus. The main difference between our approach and the state of the art is a different Gossip protocol used in the divide-and-conquer method.
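The substitution behind these numbers can be spelled out as follows (taking d=\Theta(\log n), which the hypothesis of the theorem permits):

```latex
x=\alpha=n^{\epsilon},\quad d=\Theta(\log n)
\;\Longrightarrow\;
\frac{\log n}{\log x}=\frac{\log n}{\log\alpha}=\frac{1}{\epsilon},
\qquad
\text{rounds}=O\!\left(\frac{1}{\epsilon}\cdot\Big(\frac{1}{\epsilon}\Big)^{3}\right)
             =O\!\left(\Big(\frac{1}{\epsilon}\Big)^{4}\right),
\qquad
\text{bits}=O\!\left(\Big(\frac{1}{\epsilon}\Big)^{5}\cdot\log n\cdot n^{2\epsilon}\cdot n^{\epsilon}\cdot\log n\right)
           =O\!\left(n^{3\epsilon}\log^{2}n\right),
```

where the last equality absorbs the constant factor \big(\frac{1}{\epsilon}\big)^{5} for any fixed \epsilon.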

The FastCounting algorithm is recursive. It takes as input the following values: \mathcal{P} is the set of processes on which the algorithm is executed; p is the name of the process executing the algorithm; a_{p}\in\{0,1\} denotes whether p is active (a_{p}=1) or not; and parameters x,d,\alpha, where x is the branching factor and d,\alpha steer the density of the communication graph in the execution. Knowing the set \mathcal{P} of n processes, FastCounting splits the set into x disjoint groups of processes, each of size between \left\lfloor\frac{n}{x}\right\rfloor and \left\lceil\frac{n}{x}\right\rceil. Name these groups \mathcal{P}_{1},\ldots,\mathcal{P}_{x}. The groups are known to each participating process. The algorithm then makes x parallel recursive calls, one on each of these groups. As a result, a process p from a group \mathcal{P}_{i}, for 1\leq i\leq x, learns the number of active processes in its group \mathcal{P}_{i}. In the merging step, all processes execute the Gossip algorithm of Theorem 4 with parameters d,\alpha, where the input rumors are the numbers calculated in the recursive calls. To keep the communication small, when processes learn new rumors they always store at most one rumor corresponding to each of the x groups. This way, the number of bits needed to encode all rumors is O(x\log{n}). Let r_{1},\ldots,r_{\ell} be the rumors learned by process p from the execution of the Gossip algorithm. The final output of p is the sum \sum_{i=1}^{\ell}r_{i}. We provide the pseudocode of the algorithm in Figure 3. Note that all processes run the algorithm to help with counting and Gossip, not only those with a_{p}=1, but only the latter are counted (see lines 2, 5, 7). For a detailed analysis of the round and communication-bit complexity we refer to Section C.

input: 𝒫\mathcal{P}, pp, apa_{p}; x,d,αx,d,\alpha
1 if  |𝒫|=1|\mathcal{P}|=1 then
2       return apa_{p}
\mathcal{P}_{1},\ldots,\mathcal{P}_{x}\leftarrow partition of \mathcal{P} into x disjoint groups of size between \left\lfloor\frac{|\mathcal{P}|}{x}\right\rfloor and \left\lceil\frac{|\mathcal{P}|}{x}\right\rceil ;
  /* this partition is common for all processes, since we assumed the common knowledge of set 𝒫\mathcal{P} */
3 let gg be the index of the pp’s group, i.e., p𝒫gp\in\mathcal{P}_{g};
4 activepFastCounting(𝒫g,p,ap,x,d,α)\texttt{active}_{p}\leftarrow\textsc{FastCounting}(\mathcal{P}_{g},p,a_{p},x,d,\alpha);
5 \mathcal{R}\leftarrow the outcome of the Gossip algorithm, referred to in Theorem 4, executed on the set of processes 𝒫\mathcal{P} with initial message activep\texttt{active}_{p} and parameters d,αd,\alpha;
6
return qactiveq\sum_{q\in\mathcal{R}}\texttt{active}_{q}
Algorithm 3 FastCounting for process pp
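The recursion of Algorithm 3 can be sketched as follows, with the Gossip merging step idealized as a perfect exchange (a crash-free sketch; in real executions the Gossip of Theorem 4 is used and the returned counts are only fuzzy):

```python
def fast_counting(P, active, x):
    # P: list of process names; active[p] is truthy iff p is active;
    # x: branching factor of the recursion.
    if len(P) == 1:
        return 1 if active[P[0]] else 0
    # Split P into at most x disjoint groups of nearly equal size.
    groups = [P[i::x] for i in range(x) if P[i::x]]
    # Parallel recursive calls, one per group.
    group_counts = [fast_counting(g, active, x) for g in groups]
    # Merging step: gossip spreads the <= x group counts (one rumor per
    # group) to every process, which outputs their sum.
    return sum(group_counts)

active = {p: p % 3 == 0 for p in range(20)}
count = fast_counting(list(range(20)), active, x=4)
```

With no crashes every process obtains the exact count; under crashes, each process's sum lies between the initial and final numbers of active processes, which is precisely the fuzzy-counting guarantee.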

References

  • [1] I. Abraham, T.-H. Hubert Chan, D. Dolev, K. Nayak, R. Pass, L. Ren, and E. Shi, Communication Complexity of Byzantine Agreement, Revisited, in Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), 2019, pp. 317 - 326.
  • [2] D. Alistarh, J. Aspnes, V. King, and J. Saia, Communication-efficient randomized consensus, Distributed Computing, 31 (2018), 489 - 501.
  • [3] S. Amdur, S. Weber, and V. Hadzilacos, On the message complexity of binary agreement under crash failures, Distributed Computing, 5 (1992) 175 - 186.
  • [4] J. van Apeldoorn and T. de Vos, A Framework for Distributed Quantum Queries in the CONGEST Model. Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), 2022.
  • [5] J. Aspnes, Lower Bounds for Distributed Coin-Flipping and Randomized Consensus, J. ACM 45(3) (1998): 415 - 450.
  • [6] H. Attiya, and K. Censor, Tight bounds for asynchronous randomized consensus, Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007
  • [7] H. Attiya, and J. Welch, Distributed Computing: Fundamentals, Simulations and Advanced Topics, 22nd edition, Wiley, 2004.
  • [8] Z. Bar-Joseph, and M. Ben-Or, A Tight Lower Bound for Randomized Synchronous Consensus, Proceedings of the Seventeenth Annual ACM Symposium on Principles of Distributed Computing (PODC), 1998, 193 - 199.
  • [9] Z. Bar-Joseph, and M. Ben-Or, (2002) A Tight Lower Bound for Randomized Synchronous Consensus. DOI: 10.1145/277697.277733
  • [10] M. Ben-Or and A. Hassidim, Fast quantum byzantine agreement, Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), 2005.
  • [11] A. Broadbent and A. Tapp, Can quantum mechanics help distributed computing? SIGACT News, 39(3):67-76, 2008.
  • [12] B.S. Chlebus, and D.R. Kowalski, Robust gossiping with an application to consensus, Journal of Computer and System Sciences, 72 (2006) 1262 - 1281.
  • [13] B.S. Chlebus, D.R. Kowalski, and M. Strojnowski, Fast scalable deterministic consensus for crash failures, in Proceedings of the 28th ACM Symposium on Principles of Distributed Computing (PODC), 2009, pp. 111 - 120.
  • [14] B.S. Chlebus, D.R. Kowalski, M. Strojnowski, Scalable Quantum Consensus for Crash Failures, in Proceedings of the 24th International Symposium on Distributed Computing (DISC), 2010.
  • [15] V. Cholvi, Quantum Byzantine Agreement for Any Number of Dishonest Parties, Quantum Inf. Process., 21 (2022) 1 - 11.
  • [16] B. Chor, M. Merritt, and D.B. Shmoys, Simple constant-time consensus protocols in realistic failure models, J. ACM, 36(3):591–614, 1989.
  • [17] D. Dolev, and R. Reischuk, Bounds on information exchange for Byzantine agreement, Journal of the ACM, 32 (1985) 191 - 204.
  • [18] C. Dwork, J. Halpern, and O. Waarts, Performing work efficiently in the presence of faults, SIAM Journal on Computing, 27 (1998) 1457 - 1491.
  • [19] P. Feldman, and S. Micali, An optimal probabilistic protocol for synchronous Byzantine agreement, SIAM Journal on Computing, 26 (1997) 873 - 933.
  • [20] M. Fisher, and N. Lynch, A lower bound for the time to assure interactive consistency, Information Processing Letters, 14 (1982) 183 - 186.
  • [21] M. Fisher, N. Lynch, and M. Paterson, Impossibility of distributed consensus with one faulty process, Journal of the ACM, 32 (1985) 374 - 382.
  • [22] Z. Galil, A. Mayer, and M. Yung, Resolving message complexity of Byzantine agreement and beyond, in Proceedings of the 36th IEEE Symposium on Foundations of Computer Science (FOCS), 1995, pp. 724 - 733.
  • [23] S. Gilbert, and D.R. Kowalski, Distributed agreement with optimal communication complexity, in Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010, pp. 965 - 977.
  • [24] V. Hadzilacos, and J.Y. Halpern, Message-optimal protocols for Byzantine agreement, Mathematical Systems Theory, 26 (1993) 41 - 102.
  • [25] M. Hajiaghayi, D.R. Kowalski, and J. Olkowski, Improved communication complexity of fault-tolerant consensus, in Proceedings of the 54th Annual ACM Symposium on Theory of Computing (STOC), 2022, to appear.
  • [26] L.K. Helm, Quantum distributed consensus, in Proceedings of the 27th ACM symposium on Principles of distributed computing (PODC), 2008.
  • [27] V. King, and J. Saia, Breaking the O(n^2) bit barrier: scalable Byzantine agreement with an adaptive adversary, Journal of the ACM, 58 (2011) 18:1 - 18:24.
  • [28] Q. Luo, K. Feng, and M. Zheng, Quantum multi-valued byzantine agreement based on d-dimensional entangled states, International Journal of Theoretical Physics, 58 (2019) 4025 - 4032.
  • [29] M.A. Nielsen, and I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2000.
  • [30] D. Peleg, Distributed Computing: A Locality-Sensitive Approach, SIAM, 2000.
  • [31] M. Pease, R. Shostak, and L. Lamport, Reaching agreement in the presence of faults, Journal of the ACM, 27 (1980) 228 - 234.
  • [32] S. Tani, H. Kobayashi, and K. Matsumoto, Exact Quantum Algorithms for the Leader Election Problem, ACM Transactions on Computation Theory, 4 (2012) 1 - 24.
  • [33] X. Wu, and P. Yao, Quantum Complexity of Weighted Diameter and Radius in CONGEST Networks, in Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), 2022.

APPENDIX

Appendix A The Analysis of CheapQuantumConsensus Algorithm

To analyze the CheapQuantumConsensus algorithm, we first recall a combination of lemmas from [9].

Lemma 4 (Lemmas 4.1, 4.2, 4.3 in [9]).

If all processes have the same value at the beginning of an iteration of the main while loop, then the algorithm returns the decision after at most two iterations.

Theorem 6.

For any ϵ>0\epsilon>0, the CheapQuantumConsensus algorithm solves Consensus against n/3n/3-adversary in O((1ϵ)4)O\Big{(}\big{(}\frac{1}{\epsilon}\big{)}^{4}\Big{)} rounds in expectation while using O(n3ϵ)O(n^{3\epsilon}) qubits and communication bits per process (amortized), whp.

Proof.

First, we argue correctness. Compared to the protocol of Bar-Joseph and Ben-Or [9], which works for an arbitrary number of failures t ≤ n, we changed the method of deriving the random coin, cf. line 1. Observe that even if CheapQuantumCoin fails to meet the conditions of a t-resilient coin flip, it always outputs some bit in every non-faulty process. Thus, regardless of the number of crashes, its output could have been produced by local coin flips with non-zero probability. Since Bar-Joseph's and Ben-Or's algorithm works with probability 1 (see Theorem 3 in [9]), CheapQuantumConsensus also achieves correctness with probability 1.

Next, we estimate the expected number of phases (i.e., the number of iterations of the main while loop). We consider only good phases, i.e., phases in which the adversary crashed at most a 1/10 fraction of the processes that were correct at the beginning of the iteration. Note that there can be at most (1/3)/(1/10) < 4 "bad" phases. Let x be the number of non-faulty processes at the beginning of some good phase. We consider the following cases:
Case a) There exists a process that in this iteration executes line 1 or line 1. In this case, all other processes have to execute line 1 or line 1, since the numbers of ones counted by different processes may differ by at most x/10. All processes that in this iteration execute CheapQuantumCoin will set b_p to 1 with probability at least 1/4. It follows that in the next iteration all processes will start with b_p set to 1, and by Lemma 4 the algorithm will decide within the next two phases.
Case b) There exists a process that in this phase executes line 1 or line 1. Similarly to the previous case, we observe that all other processes have to execute line 1 or line 1, since, again, the numbers of ones counted by different processes may differ by at most x/10. Therefore the same argument applies, but now the final decision will be 0.
Case c) No process executes any of the lines considered in the previous two cases. Thus, all processes participated in CheapQuantumCoin in line 1. By Theorem 3, with probability at least 1/4, all processes will set b_p to the same value. Thus, again applying Lemma 4, we get that the algorithm will decide within the next two phases.

We showed that if a good phase happens, then the algorithm terminates within the next two iterations with probability at least 1/4. Since there can be at most 4 bad iterations, we can bound the expected number of iterations as follows:

\mathbb{E}(ITE)\leq\sum_{i=4}^{\infty}i\bigg{(}\frac{3}{4}\bigg{)}^{i-4}=O(1)\ .
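To see why the expectation is constant: after the at most 4 bad phases, every two good phases terminate the algorithm with probability at least 1/4, so the probability of still running decays geometrically. A small numeric sketch (in Python, purely illustrative and not part of the algorithm) of tail sums of the form ∑_{i≥m} i·q^{i−m}, which evaluate to Θ(m) and hence to O(1) for constant m:

```python
from fractions import Fraction

def tail_sum(m, q, terms=500):
    """Truncated numeric value of sum_{i >= m} i * q**(i - m)."""
    q = Fraction(q)
    return float(sum(i * q ** (i - m) for i in range(m, m + terms)))

def closed_form(m, q):
    """Exact value: sum_{i >= m} i * q^(i-m) = m/(1-q) + q/(1-q)^2, for 0 < q < 1."""
    return m / (1 - q) + q / (1 - q) ** 2

# With failure probability q = 3/4 and the tail starting at m = 4 phases,
# the sum is the constant 28; in general it is m/(1-q) + q/(1-q)^2 = Theta(m).
print(tail_sum(4, Fraction(3, 4)), closed_form(4, 0.75))
```

The closed form makes explicit that the same computation with m = log n (used for Theorem 7 below) gives a Θ(log n) bound.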

Executing a single phase takes O((1/ϵ)^4) rounds, which is the round complexity of the FastCounting algorithm and an upper bound on the time complexity of the CheapQuantumCoin algorithm; therefore the algorithm terminates in O((1/ϵ)^4) rounds in expectation. Similarly, by taking the upper bounds on the communication complexity of the algorithms FastCounting and CheapQuantumCoin, we get that the expected number of amortized communication bits used by the algorithm is O(n^{3ϵ}). ∎

Finally, let us analyze the modified version of the algorithm CheapQuantumConsensus, in which processes use the parameters x := 2, d := log n, α := log n in lines 1 and 1.

Theorem 7.

The modified version of the CheapQuantumConsensus algorithm solves Consensus against any adversary in O(log5n)O(\log^{5}n) rounds in expectation while using O(log8n)O(\log^{8}{n}) qubits and communication bits per process (amortized), whp.

Proof.

For correctness we argue exactly as in the proof of the previous theorem. We again define good and bad phases, with the only difference that now the number of bad phases is at most O(log n), since the adversary has the full power of crashing an arbitrary number of processes. By the very same reasoning, the expected number of phases is

\mathbb{E}(ITE)\leq\sum_{i=\log{n}}^{\infty}i\bigg{(}\frac{3}{4}\bigg{)}^{i-\log{n}}=\Theta(\log{n})\ .

By examining the time and bit complexities of the algorithms FastCounting and CheapQuantumCoin with parameters x = 2, d = log n, α = log n, we get that a single phase lasts O(log^4 n) rounds and contributes O(n log^7 n) bits to the total communication complexity. The latter, after dividing by n, gives the sought complexity amortized per process. Thus, the theorem follows. ∎

Appendix B Omitted proofs from Section 5

Proof of Lemma 2.

Consider the last two iterations of epoch i. Since the variable degree_p has not changed in the epoch, the variable adaptive_degree_p has also remained unchanged in the epoch. In particular, examining line 2 ensures that in the last two testing rounds of this epoch, process p received at least δ_α responses from processes whose variable adaptive_degree was at least degree_p.

Next, we proceed by induction. We claim that in the last 2j and 2j−1 testing rounds, for 1 ≤ j ≤ γ_α, there existed a (j, δ_α)-dense-neighborhood of p such that every process belonging to this neighborhood is non-faulty and has its variable adaptive_degree at least adaptive_degree_p. The induction step goes as follows. Assume that there exists a (j, δ_α)-dense-neighborhood of p with the mentioned properties. In particular, in the last 2(j+1) and 2(j+1)−1 testing rounds, every process from this set had to receive at least δ_α responses from processes whose variable adaptive_degree was at least as large as its own; this follows from inspecting line 2. The set of processes that responded with such large values of adaptive_degree constitutes the set N^{j+1}(p).

Eventually, we obtain the existence of the set N^{γ_α}(p), which satisfies the property of a (γ_α, δ_α)-dense-neighborhood. Each process from the neighborhood has its variable degree at least degree_p, since for every process q it holds that degree_q ≥ adaptive_degree_q. ∎

Proof of Lemma 3.

Assume to the contrary that there exists an epoch such that more than 16n/(3α^{i−2}) processes start this epoch with the variable degree set to a value greater than dα^i. Assume also, w.l.o.g., that i is the first such epoch, and let A be the set of processes that have the variable degree set to at least dα^i at the beginning of this epoch. Let C be any survival set for A with respect to the graph 𝒢(dα^{i−1}). Note that since A is a set of processes that are correct in epoch i, the processes from C have behaved correctly in epochs 1, …, i−1 inclusively. Since the graph 𝒢(dα^{i−1}) is (16n/(dα^{i−2}), 3/4, (2/3)α)-compact and |A| ≥ 16n/(dα^{i−2}), the set C exists. At the beginning of epoch i, the variable degree of all processes from C is greater than dα^i, thus there must be an epoch j < i in which the last process from C, say q, increased its variable degree to dα^i. But this gives a contradiction. In epoch j, all processes from C operate in the graph 𝒢(dα^{i−1}), thus they have at least δ_α neighbors in C. All these neighbors have the variable degree greater than dα^{i−1}. In particular, in this epoch, the process q will not execute line 2, and therefore it will not increase its variable degree_p, which contradicts the maximality of j and, in consequence, the minimality of i. ∎

Note that in the above proof of Lemma 3 we use the fact that a suitable set C of processes that have been correct throughout the whole epoch exists. We may choose this set and argue about it after the epoch, as the communication pattern in the algorithm is deterministic. Hence, in any round of the epoch, processes in C have at least as many non-faulty neighbors in the communication graph as they have neighbors from the set C. We use this number of neighbors in C as a lower bound to argue about the behavior of the variables degree; therefore, our arguments do not depend on the behavior of processes outside of C, nor on whether or when some of them fail during the epoch.

Lemma 5.

Any two non-faulty processes pp and qq were connected by a quantum path of communication during the run of the algorithm.

Proof.

First, observe that the variable degree_p can take at most k+1 different values. Since there are (k+2)^2 epochs, by the pigeonhole principle there must be a sequence of k+2 consecutive epochs in which the variable degree_p remains unchanged. Similarly, among these k+2 epochs there must be two consecutive epochs in which the variable degree_q remains unchanged. Let us denote these two epochs i and i+1. Since the variable degree_p has not changed in epoch i+1, by applying Lemma 2 we get that there exists a (γ_α, δ_α)-dense-neighborhood of p in the graph 𝒢(degree_p), say A, consisting of processes that a) were non-faulty in the epoch, and b) had their variable degree at least degree_p. An immediate consequence is that all processes from A were non-faulty in epoch i. Moreover, since A is a (γ_α, δ_α)-dense-neighborhood of p in the graph 𝒢(degree_p), its size is at least nα/degree_p (Lemma 1 [13]). Let B be the analogous (γ_α, δ_α)-dense-neighborhood of q, derived from the properties of the graph 𝒢(degree_q).

Now, consider quantum communication between the sets A and B in the first testing round of epoch i+1. Processes from A use the graph 𝒢(degree_p) (or a denser graph) to communicate, while processes from B use the graph 𝒢(degree_q) (or a denser graph). The graph 𝒢(degree_p) is (nα/degree_p)-expanding and the graph 𝒢(degree_q) is at least (nα/degree_q)-expanding. Therefore, at least one process from the set A inquires a process from the set B when it executes line 2 in this testing round, if degree_p ≥ degree_q; or vice versa if degree_p < degree_q. Since the set B has diameter γ_α (Lemma 1 [13]) and there remain γ_α testing rounds in epoch i+1, by the end of this epoch any quantum message sent from p could reach q. ∎
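The pigeonhole step at the start of the proof can be checked mechanically: a non-decreasing variable taking at most k+1 distinct values changes at most k times, so over (k+2)^2 epochs some block of at least k+2 consecutive epochs leaves it unchanged. A minimal sketch (illustrative Python, with randomly generated degree histories standing in for the algorithm's actual runs):

```python
import random

def longest_constant_run(seq):
    """Length of the longest block of equal consecutive values in seq."""
    best = cur = 1
    for prev, nxt in zip(seq, seq[1:]):
        cur = cur + 1 if prev == nxt else 1
        best = max(best, cur)
    return best

k = 3
epochs = (k + 2) ** 2  # 25 epochs; degree takes at most k+1 = 4 distinct values
for _ in range(1000):
    # a non-decreasing "degree" history over at most k+1 values
    history = sorted(random.randrange(k + 1) for _ in range(epochs))
    assert longest_constant_run(history) >= k + 2
```

The guarantee is tight up to constants: (k+2)^2 epochs divided among at most k+1 values always leave one value a run of length at least (k+2)^2/(k+1) > k+1.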

Proof of Theorem 3.

Let ℋ ⊆ 𝒫 be the set of initially non-faulty processes. Assume that at least (2/3)|ℋ| of them remain non-faulty during the execution of the algorithm. By Lemma 5, any pair of processes from ℋ is connected by quantum communication; therefore, applying Lemma 1 to this set gives that, with probability at least 2/3 − o(1) (which is greater than 1/2 for sufficiently large n), all non-faulty processes return the same output bit. Since 0 and 1 are equally likely, the probabilistic guarantee on the weak global coin follows.

The number of rounds is deterministic and upper bounded by O(k^2 · γ_α) = O((log(n/d)/log α)^2 · (log n/log α)) = O((log n/log α)^3). To bound the communication complexity, assume that each graph 𝒢(dα^i), for 0 ≤ i ≤ k, satisfies the desired expanding properties listed in the description of the algorithm; by a union bound argument, this holds whp. By Lemma 3, at the beginning of each epoch there are at most 16n/(dα^{i−2}) processes that inquire more than dα^i other processes in the testing rounds of this epoch, for each 0 ≤ i ≤ k. Since each message uses at most O(log n) bits and qubits, a single testing round in an epoch adds at most

\sum_{i=0}^{k}\frac{16n}{d\alpha^{i-2}}\cdot d\alpha^{i}\leq 16kd\alpha^{2}\cdot n\log{n}

qubits and bits to the communication complexity. Since there are exactly O((log n/log α)^3) testing rounds, the O((log n/log α)^4 · dα^2 · n log n) upper bound on the total communication complexity of the algorithm follows. Dividing the latter by n, we obtain the amortized formula per process. ∎
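In the displayed sum, each summand telescopes: the factor dα^i cancels against dα^{i−2} up to α^2, so every term equals 16nα^2 and the per-round total is exactly 16(k+1)nα^2, consistent with the stated bound once the O(log n) bits per message are folded in. A quick exact-arithmetic check (the parameter values are hypothetical):

```python
from fractions import Fraction

def per_round_cost(n, d, alpha, k):
    """Evaluates sum_{i=0}^{k} (16n / (d * alpha^(i-2))) * d * alpha^i exactly."""
    a = Fraction(alpha)
    return sum(Fraction(16 * n) / (d * a ** (i - 2)) * d * a ** i
               for i in range(k + 1))

# every term equals 16 * n * alpha**2, so the sum is 16 * (k+1) * n * alpha**2
n, d, alpha, k = 1024, 8, 4, 5
assert per_round_cost(n, d, alpha, k) == 16 * (k + 1) * n * alpha ** 2
```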

Appendix C The Analysis of FastCounting algorithm

Proof of Theorem 5.

Let us first argue correctness. If |𝒫| = 1, then the conditional statement in line 3 proves the correctness. Next, we proceed by induction. Assume that correctness is proved for any subset 𝒫′ ⊆ 𝒫. Therefore, whenever a process p executes a recursive call in line 3, its variable active_p is assigned a number estimating the number of non-faulty active processes in the set 𝒫_g. Consider now the set {active_p : p ∈ 𝒫}. This is the set of input messages to the execution of the Gossip algorithm, see line 3. Thus, by Theorem 4, we get that unless all processes from a group 𝒫_p crash, all other non-faulty processes will have the number active_p in their sets ℛ (line 3). Therefore, the sum returned in line 3 is a proper estimate of the number of all non-faulty processes in the set 𝒫, since it can differ from the number of all processes starting the algorithm only by the number of crashed processes.

The bounds on time and communication complexity follow immediately from the bounds on the complexity of the Gossip algorithm given in Theorem 4 and the fact that the depth of the recursion is at most O(log n / log x). ∎
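The recursion-depth bound follows because each recursive call shrinks the current group by a factor of x. A short illustrative sketch (the grouping details of FastCounting are abstracted away; only the division by x per level is modeled):

```python
import math

def recursion_depth(n, x):
    """How many times a group of n processes is split by factor x until singletons."""
    depth = 0
    while n > 1:
        n = math.ceil(n / x)  # each recursive call handles a group of about n/x processes
        depth += 1
    return depth

print(recursion_depth(10 ** 6, 2))   # 20 levels, i.e., ceil(log2(10^6))
print(recursion_depth(10 ** 6, 10))  # 6 levels, i.e., log10(10^6)
```

Increasing x trades a shallower recursion (fewer levels) for wider groups at each level, which is exactly the time/communication trade-off governed by the parameter x in the theorem.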

Appendix D Further Results

Lemma 6.

For any Consensus algorithm there is a strategy of an adaptive adversary under which some process sends Ω(n)\Omega(n) messages with non-zero constant probability.

Proof.

Suppose otherwise. Consider any process p and two executions: in one of them p starts with value 0, and in the other it starts with 1. In both executions, the adversary fails all processes that want to send a message to p and all to whom p sends a message. For each pair (p, q), we obtain two probabilities, α(p,q) and β(p,q), of p sending a message to q in the first and the second execution, respectively. By the contradictory assumption, the sum of probabilities ∑_q α(p,q) + β(p,q) is o(n) with constant probability c > 0 (recall that this sum is for the fixed process p).

Repeat the above for every process p. This way we obtain a complete weighted graph (with weights α(·,·) and β(·,·)). Again, by the contradictory assumption, the total sum of all weights is o(n^2) with the same constant probability c > 0; therefore, because the number of pairs is n(n−1), there is a pair p, q such that

\alpha(p,q)+\alpha(q,p)+\beta(p,q)+\beta(q,p)=o(1)\ .

Consider an execution ℰ in which the adversary crashes all but the processes p, q at the very beginning. It assigns value 0 to p and value 1 to q. It follows that in this execution p and q try to communicate directly with probability α(p,q) + β(q,p) = o(1). With the complementary probability, 1 − o(1), they do not communicate. In this case, p cannot distinguish this execution ℰ from another execution ℰ_p in which all processes start with value 0 but the adversary crashes all of them except p in the beginning; hence, by the validity condition of consensus, p must decide 0 in both of these executions. Similarly, q cannot distinguish the constructed execution ℰ from an execution ℰ_q in which all processes start with value 1 but the adversary crashes all of them except q in the beginning; hence, by the validity condition of consensus, q must decide 1 in both of these executions. Both the above facts must hold with probability 1, as validity is required with probability 1. Consequently, in the constructed execution ℰ, p decides 0 while q decides 1 with probability 1 − o(1), which violates the agreement condition of consensus. This contradiction concludes the proof of the lemma. ∎
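The averaging step in the middle of the proof, extracting a pair whose four connecting weights sum to o(1), can be sketched as follows (illustrative Python with hypothetical random weights; the only fact used is that the lightest pair weighs at most the average):

```python
import itertools
import random

def lightest_pair(alpha, beta):
    """Pair (p, q) minimizing alpha[p][q] + alpha[q][p] + beta[p][q] + beta[q][p]."""
    n = len(alpha)
    weight = lambda p, q: alpha[p][q] + alpha[q][p] + beta[p][q] + beta[q][p]
    return min(itertools.combinations(range(n), 2), key=lambda pq: weight(*pq))

n = 50
alpha = [[random.random() for _ in range(n)] for _ in range(n)]
beta = [[random.random() for _ in range(n)] for _ in range(n)]
p, q = lightest_pair(alpha, beta)

# The lightest pair weighs at most the average over the n(n-1)/2 pairs,
# so a total weight of o(n^2) forces some pair's weight down to o(1).
total = sum(alpha[i][j] + beta[i][j] for i in range(n) for j in range(n) if i != j)
w = alpha[p][q] + alpha[q][p] + beta[p][q] + beta[q][p]
assert w <= total / (n * (n - 1) / 2)
```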

Next, we argue that no constant-time Consensus algorithm uses an amortized polylogarithmic number of communication bits per process.

Lemma 7.

Any quantum algorithm solving Consensus in expected time O(1)O(1) requires a polynomial number of messages per process (amortized), whp.

Proof.

Consider a Consensus algorithm working in an expected τ number of rounds, for some τ = O(1). By Markov's inequality, the algorithm terminates by time 2τ with probability at least 1/2, for every strategy of the (adaptive) adversary. We build on the idea from the proof of Lemma 6, but we apply it for every round r ≤ 2τ and for the number of sent messages in a round being r·n^{r/(2τ)}. Using a recursive argument, we argue that there is a set A_r of at least Ω(n/(r·n^{r/(2τ)})) processes (instead of just a pair of elements, as in the proof of Lemma 6) that communicate with each other with probability o(1), having already sent Ω(n^{(r−1)/(2τ)}) messages, amortized per process in this set. In one of the rounds, however, a constant (1/(2τ)) fraction of the alive processes must increase their number of communication peers by a factor of at least n^{1/τ}, because in τ expected rounds (so also with constant probability) they need to contact Ω(n) other peers in some executions; again, by the same argument as in the proof of Lemma 6 (i.e., otherwise, we could find a pair of them, assign different values, and enforce a decision on them with constant probability, violating the agreement condition). In such a round, the adversary allows them to send their messages without crashes, thus enforcing a total number of Ω((n/(r·n^{r/(2τ)})) · n^{(r−1)/(2τ)} · n^{1/τ}) = Ω(n^{1+1/(2τ)}) messages, i.e., at least Ω(n^{1/(2τ)}) messages amortized per process in the system. As this happens with constant probability, the proof follows. ∎
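The exponent bookkeeping in the last step can be verified symbolically: multiplying the three factors n^{1−r/(2τ)} (ignoring the 1/r factor, which is at most a constant here), n^{(r−1)/(2τ)} and n^{1/τ} always yields exponent 1 + 1/(2τ), independently of the round r. A sketch with exact rational arithmetic:

```python
from fractions import Fraction

def message_exponent(r, tau):
    """Exponent of n in (n / n^{r/(2 tau)}) * n^{(r-1)/(2 tau)} * n^{1/tau}."""
    half = Fraction(1, 2 * tau)
    return 1 - r * half + (r - 1) * half + 2 * half  # note 1/tau = 2 * half

for tau in range(1, 8):
    for r in range(1, 2 * tau + 1):
        assert message_exponent(r, tau) == 1 + Fraction(1, 2 * tau)
```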