
Byzantine Spectral Ranking

Arnhav Datar
Indian Institute of Technology, Madras
[email protected]
Arun Rajkumar
Indian Institute of Technology, Madras
[email protected]
John Augustine
Indian Institute of Technology, Madras
[email protected]
Supported in part by the Robert Bosch Centre for Data Science and Artificial Intelligence, IIT Madras. Supported in part by the Cryptography, Cybersecurity, and Distributed Trust pCoE and the Accenture CoE at IIT Madras.
Abstract

We study the problem of rank aggregation where the goal is to obtain a global ranking by aggregating pair-wise comparisons of voters over a set of objects. We consider an adversarial setting where the voters are partitioned into two sets. The first set votes in a stochastic manner according to the popular score-based Bradley-Terry-Luce (BTL) model for pairwise comparisons. The second set comprises malicious Byzantine voters trying to degrade the ranking. We consider a strongly-adversarial scenario where the Byzantine voters know the BTL scores, the votes of the good voters, the algorithm, and can collude with each other. We first show that the popular spectral ranking based Rank-Centrality algorithm, though optimal for the BTL model, does not perform well even when a small constant fraction of the voters are Byzantine.

We introduce the Byzantine Spectral Ranking Algorithm (and a faster variant of it), which produces a reliable ranking when the number of good voters exceeds the number of Byzantine voters. We show that no algorithm can produce a satisfactory ranking with probability >1/2 for all BTL weights when there are more Byzantine voters than good voters, showing that our algorithm works for all possible population fractions. We support our theoretical results with experiments on synthetic and real datasets that demonstrate the failure of the Rank-Centrality algorithm under several adversarial scenarios and show how the proposed Byzantine Spectral Ranking algorithm is robust in obtaining good rankings.

1 Introduction

Rank aggregation is a fundamental task in a wide spectrum of learning and social contexts such as social choice [Soufiani et al., 2014, Caplin and Nalebuff, 1991], web search and information retrieval [Brin and Page, 1998, Dwork et al., 2001], recommendation systems [Baltrunas et al., 2010] and crowd sourcing [Chen et al., 2013]. Given pairwise comparison information over a set of n objects, the goal is to identify a ranking that best respects the revealed preferences. Frequently, we are also interested in the scores associated with the objects to deduce the intensity of the resulting preference. There have been several solutions to this problem, including spectral ranking methods such as Rank-Centrality [Negahban et al., 2017] and MLE methods [Ford, 1957].

The Bradley-Terry-Luce (BTL) model [Bradley and Terry, 1952, Luce, 1959] underlies a vast variety of ranking algorithms. The BTL model assumes that there are n objects to be compared, each with an associated positive weight w_{i}. Whenever a voter is queried on a pair, the voter claims i is better than j with probability w_{i}/(w_{i}+w_{j}), independently at random. However, the BTL model assumes that all voters are identical and use the same probabilistic strategy for voting. In crowd-sourced settings, voters generally differ significantly from each other. Some voters may be spammers or even direct competitors of the entity ranking the objects, leading to data poisoning [Sun et al., 2021]. To model this situation, we assume the population is split: there are K-F good voters and F Byzantine voters. The good voters vote according to the BTL model, while the Byzantine voters can potentially be controlled by an adversary that centrally coordinates their behavior.
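For concreteness, a single good voter's answer under the BTL model can be simulated in a few lines of Python (a sketch for illustration; the function name is ours, not part of the model's formal definition):

```python
import random

def btl_vote(w_i, w_j, rng=random):
    """One good voter's answer to the query (i, j) under the BTL model:
    True means 'i is better than j', reported with probability
    w_i / (w_i + w_j), independently of all other queries."""
    return rng.random() < w_i / (w_i + w_j)

random.seed(0)
# With equal weights, each object should win about half of the queries.
wins = sum(btl_vote(1.0, 1.0) for _ in range(10_000))
```

The Byzantine voters, by contrast, are under no obligation to follow any such rule.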

1.1 Overview of our Results

We first consider the tolerance of the Rank-Centrality algorithm to a constant fraction of Byzantine voters. Negahban et al. [2017] showed that for an Erdős-Rényi pairwise-comparison graph G(n,p) we have

Theorem 1 (Informal from Negahban et al. [2017]).

If p\in\Omega(\log n/n) and k\in\Omega(1), then the Rank-Centrality algorithm outputs a weight distribution \pi such that, with high probability (whenever we say with high probability, we mean with probability \geq 1-\frac{1}{n^{c}} for some c>0),

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\in\mathcal{O}\left(\sqrt{\frac{\log n}{knp}}\right)

Here \tilde{\pi} is the vector of true object weights and k is the number of voter queries asked per edge of G. When k\in\omega(1), the RHS goes to 0 as n goes to \infty. We show that a strategy as simple as all Byzantine voters voting the reverse of the weights causes the Rank-Centrality algorithm to output weights far from the true weights with high probability.

Theorem 2 (Informal).

There exists a strategy that the Byzantine voters can use such that the Rank-Centrality algorithm outputs a distribution \pi satisfying, with high probability,

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\in\Omega(1)

provided a constant fraction of the voters are Byzantine.

Even trivial algorithms that output the same weights regardless of the voters' responses achieve \Theta(1) error. To tackle this situation, we propose Byzantine Spectral Ranking, a polynomial-time algorithm that removes some of the voters so that the Rank-Centrality algorithm can be applied. As long as the proportion of Byzantine voters is bounded strictly below 1/2, the algorithm removes voters such that the remaining Byzantine voters are unlikely to cause the Rank-Centrality algorithm to deviate significantly.

Theorem 3 (Informal).

For a range of values of k, there exists an \mathcal{O}(n^{2})-time algorithm that removes voters such that with high probability we have:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\in\mathcal{O}\left(\frac{d_{max}}{k}\right)

Here d_{max} is the maximum degree of the comparison graph. Finally, we show that when F\geq K/2 no algorithm can solve this problem for all weights with probability >1/2: we construct a strategy that the Byzantine voters can use under two different weight distributions and show that any algorithm must fail on at least one of them.

Theorem 4 (Informal).

If F\geq K/2, then no algorithm can, for all weights \tilde{\pi}, output with probability >1/2 a weight distribution \pi^{\ast} such that

\frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\in o(1)

1.2 Related Works

In recent times, there has been a lot of work on building robust machine learning algorithms [Blanchard et al., 2017, El Mhamdi et al., 2018, Li et al., 2020, El-Mhamdi et al., 2021, Chen et al., 2018, Wu et al., 2021] that are resilient to Byzantine faults.

Ranking from pairwise comparisons has been a very active field of research, and multiple algorithms have been proposed for the BTL model [Negahban et al., 2017, Agarwal et al., 2018, Rajkumar and Agarwal, 2014, Chen and Suh, 2015]. One of the most popular algorithms proposed in recent times is Rank-Centrality. Negahban et al. [2017] showed an upper bound on the relative L_{2}-error (between the true and estimated weights) that goes to 0 as n grows. They did this by finding the stationary distribution of the matrix P defined as

P_{ij}=\begin{cases}\frac{A_{ij}}{d_{max}}&\text{if }i\neq j\\ 1-\frac{1}{d_{max}}\sum_{k\neq i}A_{ik}&\text{if }i=j\end{cases} (1)

Here A_{ij} is the number of times i beats j in the k queries, divided by k, if (i,j)\in E(G); otherwise A_{ij}=0. The pseudocode can be found in Algorithm 1. They also showed that their algorithm outperforms other algorithms on various datasets. As our work builds on Negahban et al. [2017], we include a primer on Rank-Centrality in Appendix A.1.

Algorithm 1 Rank-Centrality
1: n objects to compare, E the set of edges between these n objects, A the edge weights computed from the voter inputs.
2: A weighting of the n objects such that the output weights are close to the hidden input weights.
3: Compute the transition matrix P according to Equation 1.
4: Compute the stationary distribution \pi (as the limit of p^{T}_{t+1}=p^{T}_{t}P).
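Algorithm 1 can be sketched in a few lines of Python (an illustrative sketch, not the authors' code). We assume the convention of Negahban et al. [2017] that A[i, j] is the fraction of comparisons on edge (i, j) in which j was preferred over i, so the random walk drifts toward preferred objects:

```python
import numpy as np

def rank_centrality(A, tol=1e-12, max_iters=100_000):
    """Sketch of Algorithm 1. A[i, j]: fraction of the k comparisons on
    edge (i, j) won by j over i (A[i, j] + A[j, i] == 1 on edges of G,
    both 0 otherwise). Returns the stationary distribution of the chain
    P from Equation 1."""
    n = A.shape[0]
    d_max = int(max((row > 0).sum() for row in A))   # max degree of G
    P = A / d_max                                    # off-diagonal entries of Equation 1
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))         # self-loops keep P row-stochastic
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iters):                       # power iteration: pi_{t+1}^T = pi_t^T P
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            break
        pi = nxt
    return nxt
```

For the ideal chain built from the true BTL probabilities A[i, j] = w_j/(w_i + w_j), the chain is reversible with respect to the normalized weights, so the stationary distribution recovers them exactly.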

Besides research on adversarial attacks in ranking, there has also been research on the robustness of ranking algorithms when exposed to noise. Cucuringu [2016] proposed Sync-Rank, viewing the ranking problem as an instance of the group synchronization problem over the group SO(2) of planar rotations. This led to a robust algorithm for ranking from pairwise comparisons, as verified on synthetic and real-world data. Mohajer and Suh [2016] worked on a stochastic noise model for finding top-K rankings. They showed an upper bound on the minimum number of samples required for ranking and provided a linear-time algorithm that achieves the lower bound. However, both of these works [Cucuringu, 2016, Mohajer and Suh, 2016] primarily tackle ranking under noisy and incomplete data, as opposed to an adversary actively trying to disrupt the algorithm.

Suh et al. [2017] worked on the adversarial ranking problem, considering two cases: one where the fraction of adversarial voters is known and one where it is unknown. They obtained results asymptotically equivalent to Chen and Suh [2015]'s Spectral-MLE algorithm despite the presence of adversarial voters. However, they worked under the significantly weaker assumption that the adversarial voters vote exactly opposite to the faithful voters (i.e., if the good voters vote that i is better than j with probability \frac{w_{i}}{w_{i}+w_{j}}, then the adversarial voters do so with probability \frac{w_{j}}{w_{i}+w_{j}}). Our setting is much more general in that we make no assumptions about the Byzantine voters.

Agarwal et al. [2020] worked on rank aggregation when edges in the comparison graph are compromised. They proved fundamental results on the identifiability of weights for both general graphs and Erdős-Rényi graphs, and gave polynomial-time algorithms for identifying these weights. However, their model differs significantly from ours: they assume that some of the comparisons in the graph may be completely compromised while the others are free from any kind of corruption. As this is unlikely in practice, we assume a stronger model where every edge can have both good and Byzantine voters. A detailed comparison with Agarwal et al. [2020] is made in Section 3.5.

2 Problem Setting

Given a set of n objects, our goal is to obtain a ranking from a set of pairwise comparisons over the objects. We assume that the pairs to be compared are determined by an Erdős-Rényi comparison graph G(n,p)=(V,E). Each edge in the graph corresponds to a pair that can be compared, and we allow at most k comparisons per pair.

The pairs can be compared by voters who fall into one of two categories: good or Byzantine. Out of K total voters, F are assumed to be Byzantine. Every good voter, when queried to compare a pair of objects, votes stochastically according to a Bradley-Terry-Luce (BTL) preference model parametrized by a weight/score vector w\in\mathbb{R}^{n}_{+}, where the probability of preferring object i over object j is given by \frac{w_{i}}{w_{i}+w_{j}}. The Byzantine voters, on the other hand, can vote however they wish (stochastically or deterministically). To make the problem general, we consider a powerful central adversary that can potentially control all of the Byzantine voters. We assume that the Byzantine adversary knows (1) the queries asked to the good voters and their corresponding votes, (2) the comparison graph G(n,p), (3) the learning algorithm that is used to produce the ranking, and (4) the underlying true BTL weights of the objects that the good voters use to produce their comparisons.

We assume that both the good voters and the Byzantine voters produce the same output when asked to compare the same pair multiple times. (This ensures that simple algorithms which query all voters on the same pair multiple times and take a median preference will not work.)

The goal of a learning algorithm is to use the fewest voter comparisons to output a score vector that is as close as possible to the true scores. The learning algorithm is allowed to choose any voter to compare any pair of objects. However, it does not know the type of a voter, good or Byzantine, and naturally a good algorithm must be robust to the presence of the powerful Byzantine voters. Finally, we work in the passive setting, where the learning algorithm must fix the pair-to-voter mapping a priori, before the votes are collected.

3 Results for the Erdős-Rényi Graph

We focus our efforts on the Erdős-Rényi graph primarily because it has been heavily studied in the context of ranking. We pay careful attention to the regime p\in\Theta(\log n/n). Our algorithms turn out to be exponential in the degree, which makes the Erdős-Rényi graph an ideal candidate: its degrees are logarithmic with high probability, giving us polynomial-time algorithms.

3.1 Rank-Centrality Failure

We first motivate the need for a robust version of the Rank-Centrality algorithm by showing that even a simple strategy by a Byzantine adversary leads to an unsuitable ranking. In this scenario, the Byzantine adversary knows neither the comparison graph nor the votes of the good voters, and can still beat the Rank-Centrality algorithm. Formally, we show that:

Theorem 5.

Given n objects, let the comparison graph be G(n,p). Each pair in G is compared k times by voters chosen uniformly at random among the K voters. If all Byzantine voters vote according to the opposite-weight strategy in an Erdős-Rényi graph with p=C\log n/n, then there exists a \tilde{\pi} for which the Rank-Centrality algorithm will output a \pi such that

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{C\log n}{2\sqrt{b}+b+2bC\log n}\cdot\frac{b-1}{8\sqrt{2}b(b+1)}\cdot\frac{F}{K}

with probability \geq 1-\frac{1}{n^{knCF^{2}/64K^{2}}}-\frac{1}{n^{C^{2}\log n/8}}-\frac{1}{n^{C/3-1}}. Here b is the skew in object weights, i.e., b=\max_{i,j}w_{i}/w_{j}.

By the opposite-weight strategy we mean a strategy where the Byzantine voter votes for i if w_{i}\leq w_{j} and otherwise votes for j. Examining the RHS carefully, the first term asymptotically converges to \frac{1}{2b}, and the third term is clearly a constant if a constant fraction of the voters are Byzantine. This shows the existence of a strategy that, with high probability, significantly deviates the weights output by the Rank-Centrality algorithm. The detailed analysis can be found in Appendix A.2.

Proof Sketch.

We initially show that

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert\cdot(2\sqrt{b}+b\lVert\Delta\rVert_{2})} (2)

Here \pi is the stationary distribution of the transition matrix P defined in Equation 1 and \Delta denotes the fluctuation of the transition matrix around its mean \tilde{P}, i.e., \Delta\equiv P-\tilde{P}. We first prove that \lVert\Delta\rVert_{2} is at most 1+d_{max}, since there are at most 1+d_{max} terms in any row or column and each term can deviate by at most 1. Next, we show that with probability \geq 1-\frac{1}{n^{C/3-1}} we have d_{max}\leq 2C\log n. Finally, we show that \lVert\tilde{\pi}^{T}\Delta\rVert\in\Omega(pn) with probability \geq 1-\frac{1}{n^{knCF^{2}/64K^{2}}}-\frac{1}{n^{C^{2}\log n/8}} by constructing a weight distribution where half the weights are low and half are high. An arbitrary weight distribution cannot be used because if \tilde{\pi}=\left[\frac{1}{n},\dots,\frac{1}{n}\right] we would get \tilde{\pi}^{T}\Delta=0 for all \Delta, since the row sums of P and \tilde{P} are always 1. ∎

3.2 Byzantine Spectral Ranking

In this section, we propose a novel algorithm called Byzantine Spectral Ranking (BSR). The BSR algorithm asks multiple voters multiple queries and, based on the collective responses, decides whether a voter is acting suspiciously or whether it is plausible that the voter is a good voter. For each object i to be compared, we take a set of d_{i} objects to compare it against and have k voters compare i with all of the d_{i} objects (leading to kd_{i} comparisons per voter). Based on the voters' responses, we try to eliminate voters who are likely to be Byzantine. The BSR algorithm ensures that a constant fraction of good voters remains and that any remaining Byzantine voters vote similarly to good voters. Intuitively, the BSR algorithm goes over every subset of [d_{i}] and tries to detect whether the Byzantine voters have collectively voted strongly in favor of or against this subset. The pseudocode can be found in Algorithm 2.

Algorithm 2 Byzantine Spectral Ranking
1: n objects, K voters, comparison graph G, k comparisons allowed per edge, parameter Q.
2: Weights \pi for each of the objects in [n]
3: procedure \mathsf{Bound\_Sum\_Deviations}(i, S, d, k, \alpha)
4:     Pick k voters randomly from the voter set [1,\dots,K]
5:     The k voters are queried to compare all objects in S with i \triangleright k\times d queries in total
6:     T represents the binary representation of the voters' inputs \triangleright T is a k\times d matrix
7:     \delta\leftarrow\sqrt{\frac{Q}{2}d\log k} \triangleright We show later that Q needs to be \geq 1
8:     M\leftarrow[0]_{2^{d}\times k}
9:     for each \xi\in\{1,-1\}^{d} do
10:         U\leftarrow T\xi
11:         \hat{m}\leftarrow\mathsf{Median}(U)
12:         M[\xi][v]\leftarrow 1 if voter v is 5\delta away from \hat{m} and 0 otherwise
13:     end for
14:     \mathsf{max\_out}\leftarrow 8k^{1-Q}+8k^{1-\alpha}
15:     Remove voters with M[\xi][v]=1 if the sum of M[\xi] is \geq\mathsf{max\_out} for some \xi
16:     Return leftover \mathsf{votes}
17: end procedure
18: \alpha\leftarrow 1-\log d_{max}/\log k \triangleright \alpha is a parameter whose value is set during analysis
19: for each object i to be compared do
20:     \mathsf{votes}\leftarrow\mathsf{Bound\_Sum\_Deviations}(i, N_{G}(i), d_{i}, k, \alpha)
21:     Using \mathsf{votes} update P as described in Equation 1
22: end for
23: Compute the stationary distribution \pi as the limit of p^{T}_{t+1}=p^{T}_{t}P.
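The filtering step of Algorithm 2 can be sketched as follows (a simplified illustration, not the authors' implementation; the `threshold` and `max_out` arguments are ours, added so that the smaller empirical values used in Section 4 can be substituted for the asymptotic ones):

```python
import numpy as np
from itertools import product

def bound_sum_deviations(T, threshold=None, max_out=None, Q=1.0):
    """Sketch of Bound_Sum_Deviations (Algorithm 2). T is a k x d 0/1
    matrix with T[v, j] = 1 if voter v preferred object i over the j-th
    object of S. For each sign vector xi in {-1, +1}^d we project the
    votes onto xi and flag voters lying more than `threshold` from the
    median projection; if at least `max_out` voters are flagged for some
    xi, all voters flagged for that xi are removed. Returns a boolean
    mask over the k voters that are kept."""
    k, d = T.shape
    if threshold is None:
        threshold = 5 * np.sqrt(0.5 * Q * d * np.log(k))  # 5*delta from the paper
    if max_out is None:
        alpha = 1 - np.log(d) / np.log(k)                 # paper's alpha with d_max = d
        max_out = 8 * k ** (1 - Q) + 8 * k ** (1 - alpha)
    keep = np.ones(k, dtype=bool)
    for xi in product([-1.0, 1.0], repeat=d):
        U = T @ np.array(xi)
        flagged = np.abs(U - np.median(U)) > threshold
        if flagged.sum() >= max_out:
            keep &= ~flagged
    return keep
```

On a toy instance where 30 good voters vote identically and 10 colluding Byzantine voters always prefer i, the Byzantine block projects far from the median for the sign vectors that are all +1 on the positions where the good voters voted 0, and is removed.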

3.2.1 Analysis

The detailed analysis can be found in Appendix A.3. In the proof of Rank-Centrality's convergence, the only result that does not hold in the presence of Byzantine voters is Lemma 3, which bounds the 2-norm of the \Delta matrix. We show that:

Lemma 6.

For an Erdős-Rényi graph with p\geq 10C^{2}\log n/n and k\in\Omega(np) comparisons per edge, Algorithm 2 removes some voters such that we have:

\lVert\Delta\rVert_{2}\leq 80\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

with probability \geq 1-16n^{1-1.5C^{2}}.

Proof Sketch.

Proving that \lVert\Delta\rVert_{2} is bounded with high probability turns out to be equivalent to proving that \mathbb{P}(\sum_{j\in\partial i}|C_{ij}|>kd_{i}s) is bounded, where C_{ij} is defined as kA_{ij}-kp_{ij} with p_{ij}=\frac{w_{i}}{w_{i}+w_{j}}. Since this is a sum of d_{i} absolute values, it can be upper-bounded by taking a union bound over all 2^{d_{i}} sign combinations of the sum.

We first show that, with high probability, for all 2^{d_{i}} values of \xi the median \hat{m} that we compute in the BSR algorithm is only \mathcal{O}(\delta) away from the expected mean. This gives us a good check for whether a voter is acting out of the ordinary. The algorithm then filters out voters that are collectively acting out of the ordinary for a particular \xi. We show that at the end of these removals, the number of voters is only a constant factor away from k, i.e., most of the good voters are retained. Finally, we show an upper bound on the row sum of the absolute values of the deviations of the remaining Byzantine voters to complete the proof. ∎

Intuitively, we get the terms d_{max}/k and \sqrt{\log k/d_{min}} in Lemma 6 by considering the maximum average deviation of the row sum that the Byzantine voters can cause for any \xi. This can be written as \mathsf{max\_out} voters having a deviation of d and the remaining k voters having a deviation of \delta, so the overall deviation is \frac{1}{kd}\left(\mathsf{max\_out}\cdot d+\delta\cdot k\right). Substituting the values of the parameters and choosing an appropriate Q, we get \mathcal{O}\left(\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right). Finally, using Lemma 6 and a few lemmas from Negahban et al. [2017], we conclude that:
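Concretely, substituting \mathsf{max\_out}=8k^{1-Q}+8k^{1-\alpha}, \delta=\sqrt{\frac{Q}{2}d\log k}, and \alpha=1-\log d_{max}/\log k (so that k^{1-\alpha}=d_{max}) unfolds the bookkeeping as follows (a sketch; the constants are indicative):

```latex
\frac{1}{kd}\left(\mathsf{max\_out}\cdot d+\delta\cdot k\right)
  = \frac{\mathsf{max\_out}}{k}+\frac{\delta}{d}
  = 8k^{-Q}+\frac{8d_{max}}{k}+\sqrt{\frac{Q\log k}{2d}}
  \in \mathcal{O}\left(\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right)
  \qquad (Q\geq 1,\ d\geq d_{min}),
```

where the first term is absorbed into the second once Q\geq 1, since then 8k^{-Q}\leq 8/k\leq 8d_{max}/k.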

Theorem 7.

Given n objects, let the comparison graph be G(n,p). Each pair in G is compared k times, with the outcomes of comparisons produced as per the Byzantine-BTL model with weights [w_{1},\dots,w_{n}]. Then there exists an algorithm such that when F\leq K(1-\epsilon)/2, p\geq 10C^{2}\log n/n, \epsilon>0 and k\geq 18d_{max}/\epsilon^{2}, the following bound on the error

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq 480b^{5/2}\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

holds with probability \geq 1-6n^{-C/8}-16n^{1-1.5C^{2}}.

Remark. The BSR algorithm gives a suitable ranking in polynomial time, but is fairly slow: its complexity is in \Omega(n^{5C^{2}+1}), which is not suitable for larger values of n and C.

Remark. Our process of filtering out Byzantine voters is tightly tied to the analysis of the Rank-Centrality algorithm and the procedure to remove voters may not be suitable for other algorithms. The advantage of tying the removal procedure to the ranking algorithm is that we would not remove voters who may vote in an adversarial fashion, but do not significantly affect the row sum of the absolute difference between the empirical and true Markov chain transition matrices. An algorithm that tries to explicitly identify Byzantine voters might end up removing such voters too (and hence their votes for other useful pairs) which might be counter-productive.

3.3 Fast Byzantine Spectral Ranking

While Algorithm 2 performs asymptotically similarly to the Rank-Centrality algorithm, it is fairly slow, requiring at least \Omega(2^{d_{min}}\cdot n) time. We now present the Fast Byzantine Spectral Ranking (FBSR) algorithm for more practical settings. In the previous algorithm we ensure that \mathbb{P}(\sum_{j\in\partial i}|C_{ij}|>kd_{i}s) is upper-bounded, which is expensive because we iterate over all \xi\in\{-1,1\}^{d_{i}}. This can be improved: if d_{i}=\beta\log_{2}n, we can split the entries into \beta buckets [B_{1},\dots,B_{\beta}] of size \log_{2}n each. An equivalent formulation is then to upper-bound \sum_{l=1}^{\beta}\mathbb{P}(\sum_{j\in\partial i\text{ \& }j\in B_{l}}|C_{ij}|>\frac{kd_{i}s}{\beta}). Instead of summing d_{i} absolute values once, we sum only \log_{2}n values, \beta times.

Algorithm 3 Fast Byzantine Spectral Ranking
1: \alpha\leftarrow 1-\log((2+C/8)\log n)/\log k
2: for each object i to be compared do
3:     \mathsf{max\_size}\leftarrow\log_{2}n \triangleright Can be made larger than \log_{2}n
4:     \beta\leftarrow\left\lceil d_{i}/\mathsf{max\_size}\right\rceil
5:     Split N_{G}(i) into \beta equally sized buckets [B_{1},\dots,B_{\beta}]
6:     for \mathsf{bucket}\leftarrow 1 to \beta do
7:         \mathsf{votes}\leftarrow\mathsf{Bound\_Sum\_Deviations}(i, B_{\mathsf{bucket}}, \mathsf{size}(B_{\mathsf{bucket}}), k, \alpha)
8:         Using \mathsf{votes} update P as described in Equation 1
9:     end for
10: end for
11: Compute the stationary distribution \pi as the limit of p^{T}_{t+1}=p^{T}_{t}P.
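The bucketing step (lines 3-5 of Algorithm 3) is straightforward to implement; a sketch, with `np.array_split` standing in for a splitting rule the paper leaves unspecified:

```python
import numpy as np

def make_buckets(neighbors, max_size):
    """Split N_G(i) into beta = ceil(d_i / max_size) nearly equal buckets,
    as in Algorithm 3. Filtering each bucket separately replaces one pass
    over 2^{d_i} sign vectors with beta passes over at most 2^{max_size}
    sign vectors each."""
    d_i = len(neighbors)
    beta = -(-d_i // max_size)   # ceil(d_i / max_size)
    return [list(b) for b in np.array_split(np.asarray(neighbors), beta)]
```

With max_size set to \log_{2}n, each call to the filter now enumerates only 2^{\log_{2}n} = n sign vectors, which is what brings the overall running time down to \mathcal{O}(n^{2}).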
Theorem 8.

Given n objects, let the comparison graph be G(n,p). Each pair in G is compared k times, with the outcomes of comparisons produced as per the Byzantine-BTL model with weights [w_{1},\dots,w_{n}]. Then there exists an algorithm such that when F\leq K(1-\epsilon)/2, p=10C^{2}\log n/n, \epsilon>0, and k\geq 18(2+C/8)\log n/\epsilon^{2}, the following bound on the error

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq 480b^{5/2}\max\left(\frac{\log n}{k},\sqrt{\frac{\log\log n}{\log n}}\right)

holds with probability \geq 1-(6+240C^{2})n^{-C/8}, and the algorithm runs in \mathcal{O}(n^{2}) time.

The detailed analysis can be found in Appendix A.4.

3.4 An Impossibility Result and Optimality of FBSR

The main idea in the BSR algorithm was the estimation of an accurate mean for all values of \xi. However, finding the mean becomes very challenging once the Byzantine voters outnumber the good voters: the Byzantine voters can always create a shifted binomial distribution and trick us into believing that an entirely different value is the mean.

We answer a natural question: is it even possible to find such algorithms when F crosses K/2? We prove a fundamental result showing that for any ranking algorithm to give a satisfactory result, a majority of good voters is always required. Formally, we show that:

Theorem 9.

If F\geq K/2, then no algorithm can, for all weights \tilde{\pi}, output weights \pi^{\ast} such that

\frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq f(n)

with probability >1/2, where f(n) is a function that converges to 0 as n goes to \infty.

Proof Sketch.

We prove this result by contradiction. Suppose there is an algorithm \mathcal{A} that gives a satisfactory ranking for all weights. We consider the F=K/2 case, as the case F>K/2 clearly reduces to it.

We then consider two instances: (1) the good voters vote according to \tilde{\pi} and the Byzantine voters vote according to \tilde{\pi}^{\prime}; (2) the Byzantine voters vote according to \tilde{\pi} and the good voters vote according to \tilde{\pi}^{\prime}. We claim that when F=K/2 these instances are indistinguishable to any algorithm. Since \mathcal{A} succeeds with probability >1/2 on any instance, by the union bound there must be a \pi^{\ast} that is close to both \tilde{\pi} and \tilde{\pi}^{\prime}. We can then choose \tilde{\pi} and \tilde{\pi}^{\prime} to be far from each other, so that no \pi^{\ast} can be close to both, giving us a contradiction. ∎

The detailed proof can be found in Appendix A.5.

Remark. This impossibility result consequently shows that the FBSR algorithm (cf. Theorem 8) is optimal in terms of tolerance to the Byzantine fraction F/K.

3.5 Comparison with Agarwal et al. [2020]

While the setting of Agarwal et al. [2020] differs from ours as mentioned in Section 1.2, if we consider both works in the Byzantine voter setting, a comparison can be made by treating any edge that has a Byzantine voter as corrupted in their model. Their algorithm, to have a valid convergence proof, needs k\in\Omega(\log n). This would imply that F/K\in O(\log\log n/\log^{2}n) for sparse graphs and F/K\in O(1/\log n) for denser graphs. Our algorithms can potentially tolerate a significantly higher corruption rate of 1/2.

Another way to compare our results with Agarwal et al. [2020] is to set k=\Omega(\log n), p=\Theta(\log n/n) for our model and k^{\prime}=1, p^{\prime}=pk/k^{\prime} for theirs. This allocation ensures a single voter per edge in their model and an equal number of comparisons in expectation. We see that their algorithm does not converge (L_{1}\in\mathcal{O}(\sqrt{\log n})), while ours converges \left(L_{2}\in\mathcal{O}\left(\max\left(\frac{\log n}{k},\sqrt{\frac{\log k}{\log n}}\right)\right)\right). Furthermore, our algorithm can handle a corruption rate of 1/2, while theirs can handle a corruption rate of \mathcal{O}(\log\log n/\log n). (They also give an algorithm for denser graphs that can handle a corruption rate of 1/4; however, it has worse time complexity and requires p\in n^{\epsilon-1} for \epsilon>0.)

4 Experimental Results

In this section, we confirm our theoretical bounds with experiments conducted on synthetic and real data. While truly Byzantine strategies may be hard to find and can even be computationally infeasible, we consider several practically feasible strategies. We show the performance of Rank-Centrality against these strategies, supporting our results in Section 3.1, and then show that these strategies are not as effective against the BSR/FBSR algorithm, which gives better rankings.

Experimental Settings: For the FBSR algorithm to work well, there must exist at least a few entries at distance \geq 5\delta from \hat{m}. However, since we sum up \log n entries, we need 5\delta\leq\log n; otherwise the BSR algorithm runs exactly like the Rank-Centrality algorithm. Using the value of \delta from Algorithm 3, setting Q=1, and using k\geq\log n, we get:

\frac{25}{2}\log\log n\leq\log n

While this is asymptotically true, the RHS only exceeds the LHS when n\geq 1.18\times 10^{21}. However, since our proof was designed to work against any Byzantine strategy, and the convergence derivation of Negahban et al. [2017] is fairly loose (for example, the union bound over 2^{d} probabilities to ensure the sum of absolute terms is bounded), the FBSR algorithm empirically works against a variety of strategies even with significantly smaller thresholds. We applied the FBSR algorithm with the following modifications for the smaller k and n values: (1) setting 5\delta=1+\sqrt{\mathsf{size}(B_{\mathsf{bucket}})} and (2) setting \mathsf{max\_out} to k/20. All experimental results in Sections 3 and 4 are means over 10 random trials.
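The crossover point can be checked numerically (a quick sketch using natural logarithms, consistent with the 1.18\times 10^{21} figure quoted above):

```python
import math

def threshold_gap(n):
    """LHS minus RHS of (25/2) * log log n <= log n; the FBSR threshold
    condition 5*delta <= log n holds once this becomes negative."""
    return 12.5 * math.log(math.log(n)) - math.log(n)
```

Evaluating the gap on either side of 1.18e21 confirms that the inequality fails for smaller n and holds past the crossover.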

Synthetic Data: We use the following parameters in our testing: (1) n = 200, (2) k = 100, (3) p = 20 log n/n, and (4) w_i ∼ Uniform(1, 100), normalized such that ∑π̃_i = 1. We compare our algorithm with the Rank-Centrality algorithm on synthetic data. We consider four potential strategies used by the Byzantine voters:

  1. Fixed Order Vote: The Byzantine voters vote according to a random pre-determined order. This is a fairly simple strategy that does not even require the Byzantine voters to know the π̃ values.

  2. Opposite Vote: The Byzantine voters vote exactly the reverse of the weights. We showed in Section 3.1 that this strategy causes failure for some weights.

  3. Opposite Vote Probabilistic: The Byzantine voters behave like good voters, except their final answer is flipped: they vote for object j with probability w_i/(w_j + w_i) and for object i with probability w_j/(w_j + w_i).

  4. Random Subset: For each pair, the Byzantine voters either behave like good voters with probability one half or cast the opposite vote.
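The four strategies above can be sketched as follows (a minimal illustration, not the paper's code; the function names and the 0/1 vote encoding, with 1 meaning "i beats j", are ours):

```python
import random

def good_vote(w, i, j):
    # Honest BTL vote: i beats j with probability w_i / (w_i + w_j).
    return 1 if random.random() < w[i] / (w[i] + w[j]) else 0

def fixed_order_vote(order, i, j):
    # (1) Vote by a pre-determined random order (no knowledge of w needed).
    return 1 if order.index(i) < order.index(j) else 0

def opposite_vote(w, i, j):
    # (2) Deterministically vote against the true weights.
    return 1 if w[i] < w[j] else 0

def opposite_vote_probabilistic(w, i, j):
    # (3) A BTL vote with the two winning probabilities swapped.
    return 1 if random.random() < w[j] / (w[i] + w[j]) else 0

def random_subset_vote(w, i, j):
    # (4) Per pair, behave honestly or oppositely with probability 1/2 each.
    if random.random() < 0.5:
        return good_vote(w, i, j)
    return opposite_vote(w, i, j)
```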

Figures 1 and 2 demonstrate the performance of the Rank-Centrality algorithm and the FBSR algorithm against the aforementioned strategies. In Figure 1, we plot the relative L₂-error against the Byzantine fraction (F/K). Consistent with our results in Section 3.1, we see that for most strategies the relative L₂-error of the Rank-Centrality algorithm grows linearly with F/K. The relative L₂-errors for both the Fixed Order Vote and Opposite Vote strategies are fairly high for the Rank-Centrality algorithm, and Fixed Order Vote dominates the relative L₂-error for the FBSR algorithm. However, even though Fixed Order Vote is the best strategy against the FBSR algorithm, the FBSR algorithm still performs around two times better than the Rank-Centrality algorithm across all values of F/K.

We also confirm that our rankings are close to the true rankings. To do this, we compute Kendall's Tau correlation [Kendall, 1938] between our ranking and the true ranking. From Figure 2, we see that against the Rank-Centrality algorithm the Opposite Vote strategy yields very low values of τ. When the Byzantine fraction is around 0.3, we get a Kendall's Tau correlation close to 0 (essentially a random ranking). We also see that the FBSR algorithm gives better results against all strategies.
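Kendall's tau can be computed directly from two rankings (a self-contained tau-a sketch; it ignores ties, which suffices when both rankings are permutations):

```python
from itertools import combinations

def kendall_tau(pos_a, pos_b):
    """Kendall's tau-a: pos_a[x] and pos_b[x] give object x's position
    in the two rankings; +1 means identical, -1 means reversed."""
    n = len(pos_a)
    concordant = discordant = 0
    for x, y in combinations(range(n), 2):
        s = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value near 0, as observed for Rank-Centrality at a Byzantine fraction of about 0.3, is what a uniformly random ranking would give in expectation.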

Figure 1: L₂ vs F/K
Figure 2: τ vs F/K
Figure 3: L₂ vs n

To further show the convergence of the algorithms in the Byzantine-BTL model for larger n, we plot the relative L₂ error and Kendall's Tau correlation against the number of objects in Figures 3 and 4 for the Fixed Order Vote strategy. We consider values of F/K in [0.1, 0.2] and set k = n. For a large range of the number of objects, both metrics remain constant for the Rank-Centrality algorithm, supporting our hypothesis. On the other hand, for the FBSR algorithm, as we increase the number of objects the L₂-error gradually falls and the Kendall's Tau correlation gradually rises. The rather slow descent of the relative L₂-error is backed by Theorem 8, where we see that for larger k the relative L₂ error is in 𝒪(√(log log n/log n)).

Real Data: Experimentation with real data is not straightforward, since most datasets have neither voter labels nor voter queries of the form used by the BSR algorithm (one voter is always asked multiple queries of the form (i, j) where i is fixed and j varies). To get around this issue, we use complete rankings. We consider the Sushi dataset, comprising 5000 voters each ordering 10 sushis from most preferred to least preferred. Given a preference order, we extract 45 pairwise votes from each voter. Since we are dealing with permutations here, we assume the Byzantine voters can use the following strategies:

  1. Fixed Order Vote / Opposite Vote: Same as previously defined, but this time an entire permutation is given.

  2. Opposite Random Flips: Each Byzantine voter starts with the opposite permutation and then swaps some of the objects at random.
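The rank-breaking step and the last strategy can be sketched as follows (an illustrative transcription; `opposite_random_flips` and its swap count are our naming and parameters, which the paper does not fix):

```python
from itertools import combinations
import random

def pairwise_votes(pref):
    # Break a full preference order (most to least preferred) into
    # C(n, 2) pairwise votes, recorded as (winner, loser) pairs.
    return [(pref[a], pref[b]) for a, b in combinations(range(len(pref)), 2)]

def opposite_random_flips(pref, n_swaps, rng=random):
    # Start from the reversed permutation, then swap random positions.
    order = list(reversed(pref))
    for _ in range(n_swaps):
        a, b = rng.randrange(len(order)), rng.randrange(len(order))
        order[a], order[b] = order[b], order[a]
    return order

votes = pairwise_votes(list(range(10)))  # 10 sushis -> 45 pairwise votes
```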

Since the Sushi dataset has many voters, we do not use the second modification mentioned above. We compute the true weights by applying the Rank-Centrality algorithm directly (giving the Rank-Centrality algorithm a big advantage, especially for smaller weights). Figures 5 and 6 show the performance of the Rank-Centrality and BSR algorithms. We see that the BSR algorithm achieves better relative L₂-errors and Kendall's Tau correlations for most Byzantine fractions and strategies. The comparatively small improvement in Kendall's Tau correlation is attributable to the rather small differences in the weights of the various sushis (5 sushis have weights in [0.085, 0.13]).

Figure 4: τ vs n
Figure 5: Sushi: L₂
Figure 6: Sushi: τ

5 Conclusion and Further Work

We studied the disruption that Byzantine voters can cause to the widely-used Rank-Centrality algorithm and proposed fast algorithms to counter it. Our algorithms and analysis techniques are quite non-trivial and indicate the need to develop this theory further. We require d ∈ Θ(log n) and k ∈ Ω(log n) for the success of the BSR and FBSR algorithms, whereas Negahban et al. [2017] only require d ∈ Ω(log n) and k ∈ Ω(1). An interesting direction would be to explore whether sub-logarithmic values of k are possible. Alternatively, a lower bound on the number of votes per edge required for any algorithm to succeed with constant probability would also be an interesting direction.

References

  • Agarwal et al. [2018] A. Agarwal, P. Patil, and S. Agarwal. Accelerated spectral ranking. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 70–79. PMLR, 10–15 Jul 2018. URL https://proceedings.mlr.press/v80/agarwal18b.html.
  • Agarwal et al. [2020] A. Agarwal, S. Agarwal, S. Khanna, and P. Patil. Rank aggregation from pairwise comparisons in the presence of adversarial corruptions. In H. D. III and A. Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 85–95. PMLR, 7 2020. URL https://proceedings.mlr.press/v119/agarwal20a.html.
  • Baltrunas et al. [2010] L. Baltrunas, T. Makcinskas, and F. Ricci. Group recommendations with rank aggregation and collaborative filtering. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, page 119–126, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781605589060. doi: 10.1145/1864708.1864733. URL https://doi.org/10.1145/1864708.1864733.
  • Blanchard et al. [2017] P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper/2017/file/f4b9ec30ad9f68f89b29639786cb62ef-Paper.pdf.
  • Bradley and Terry [1952] R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
  • Brin and Page [1998] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International Conference on World Wide Web 7, WWW7, page 107–117, NLD, 1998. Elsevier Science Publishers B. V.
  • Caplin and Nalebuff [1991] A. Caplin and B. Nalebuff. Aggregation and social choice: A mean voter theorem. Econometrica, 59(1):1–23, 1991. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/2938238.
  • Chen et al. [2013] X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM ’13, page 193–202, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450318693. doi: 10.1145/2433396.2433420. URL https://doi.org/10.1145/2433396.2433420.
  • Chen and Suh [2015] Y. Chen and C. Suh. Spectral MLE: Top-k rank aggregation from pairwise comparisons. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML’15, pages 371–380. JMLR.org, 2015.
  • Chen et al. [2018] Y. Chen, L. Su, and J. Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. SIGMETRICS Perform. Eval. Rev., 46(1):96, jun 2018. ISSN 0163-5999. doi: 10.1145/3292040.3219655. URL https://doi.org/10.1145/3292040.3219655.
  • Cucuringu [2016] M. Cucuringu. Sync-rank: Robust ranking, constrained ranking and rank aggregation via eigenvector and sdp synchronization. IEEE Transactions on Network Science and Engineering, 3(1):58–79, 2016. doi: 10.1109/TNSE.2016.2523761.
  • Dwork et al. [2001] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th International Conference on World Wide Web, WWW ’01, page 613–622, New York, NY, USA, 2001. Association for Computing Machinery. ISBN 1581133480. doi: 10.1145/371920.372165. URL https://doi.org/10.1145/371920.372165.
  • El Mhamdi et al. [2018] E. M. El Mhamdi, R. Guerraoui, and S. Rouault. The hidden vulnerability of distributed learning in Byzantium. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3521–3530. PMLR, 10–15 Jul 2018. URL https://proceedings.mlr.press/v80/mhamdi18a.html.
  • El-Mhamdi et al. [2021] E. M. El-Mhamdi, S. Farhadkhani, R. Guerraoui, A. Guirguis, L.-N. Hoang, and S. Rouault. Collaborative learning in the jungle (decentralized, byzantine, heterogeneous, asynchronous and nonconvex learning). In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 25044–25057. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper/2021/file/d2cd33e9c0236a8c2d8bd3fa91ad3acf-Paper.pdf.
  • Ford [1957] L. R. Ford. Solution of a ranking problem from binary comparisons. The American Mathematical Monthly, 64(8):28–33, 1957. ISSN 00029890, 19300972. URL http://www.jstor.org/stable/2308513.
  • Kendall [1938] M. G. Kendall. A NEW MEASURE OF RANK CORRELATION. Biometrika, 30(1-2):81–93, 06 1938. ISSN 0006-3444. doi: 10.1093/biomet/30.1-2.81. URL https://doi.org/10.1093/biomet/30.1-2.81.
  • Li et al. [2020] J. Li, W. Abbas, and X. Koutsoukos. Byzantine resilient distributed multi-task learning. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 18215–18225. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/file/d37eb50d868361ea729bb4147eb3c1d8-Paper.pdf.
  • Luce [1959] R. D. Luce. Individual Choice Behavior. John Wiley, 1959.
  • Mohajer and Suh [2016] S. Mohajer and C. Suh. Active top-k ranking from noisy comparisons. 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 875–882, 2016.
  • Negahban et al. [2017] S. Negahban, S. Oh, and D. Shah. Rank centrality: Ranking from pairwise comparisons. Operations Research, 65(1):266–287, 2017.
  • Rajkumar and Agarwal [2014] A. Rajkumar and S. Agarwal. A statistical convergence perspective of algorithms for rank aggregation from pairwise data. In E. P. Xing and T. Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 118–126, Beijing, China, 22–24 Jun 2014. PMLR. URL https://proceedings.mlr.press/v32/rajkumar14.html.
  • Soufiani et al. [2014] H. A. Soufiani, D. Parkes, and L. Xia. Computing parametric ranking models via rank-breaking. In E. P. Xing and T. Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 360–368, Beijing, China, 6 2014. PMLR. URL https://proceedings.mlr.press/v32/soufiani14.html.
  • Suh et al. [2017] C. Suh, V. Y. F. Tan, and R. Zhao. Adversarial top-k ranking. IEEE Trans. Inf. Theor., 63(4):2201–2225, 4 2017. ISSN 0018-9448. doi: 10.1109/TIT.2017.2659660. URL https://doi.org/10.1109/TIT.2017.2659660.
  • Sun et al. [2021] G. Sun, Y. Cong, J. Dong, Q. Wang, L. Lyu, and J. Liu. Data poisoning attacks on federated machine learning. IEEE Internet of Things Journal, 2021.
  • Wu et al. [2021] Y. Wu, H. Chen, X. Wang, C. Liu, P. Nguyen, and Y. Yesha. Tolerating adversarial attacks and byzantine faults in distributed machine learning. In 2021 IEEE International Conference on Big Data (Big Data), pages 3380–3389, 2021. doi: 10.1109/BigData52589.2021.9671583.

Updated Checklist

  1. For all authors…

     (a) Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? [Yes]
     (b) Did you describe the limitations of your work? [Yes]
     (c) Did you discuss any potential negative societal impacts of your work? [Yes]
     (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]

  2. If you are including theoretical results…

     (a) Did you state the full set of assumptions of all theoretical results? [Yes]
     (b) Did you include complete proofs of all theoretical results? [Yes]

  3. If you ran experiments…

     (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
     (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]
     (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [Yes]
     (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]

  4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets…

     (a) If your work uses existing assets, did you cite the creators? [N/A]
     (b) Did you mention the license of the assets? [N/A]
     (c) Did you include any new assets either in the supplemental material or as a URL? [N/A]
     (d) Did you discuss whether and how consent was obtained from people whose data you’re using/curating? [N/A]
     (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]

  5. If you used crowdsourcing or conducted research with human subjects…

     (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
     (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
     (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]

Appendix A Appendix

A.1 A primer on Rank-Centrality

A.1.1 Rank-Centrality Convergence

Negahban et al. [2017] prove the following theorem for the Rank-Centrality algorithm:

Theorem 10 (From Negahban et al. [2017]).

Given n objects and a connected comparison graph G = ([n], E), let each pair (i, j) ∈ E be compared k times with outcomes produced as per a BTL model with parameters w_1, …, w_n. Then, for some positive constant C ≥ 8 and when k ≥ 4C²(b⁵κ²/d_max ξ²) log n, the following bound on the normalized error holds with probability at least 1 − 4n^{−C/8}:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq\frac{Cb^{5/2}\kappa}{\xi}\sqrt{\frac{\log n}{kd_{max}}}

where π̃(i) = w_i/∑_l w_l, b ≡ max_{i,j} w_i/w_j, and κ = d_max/d_min.

which when extended to Erdős-Rényi random graphs takes the form of

Theorem 11 (From Negahban et al. [2017]).

Given n objects, let the comparison graph G = ([n], E) be generated by selecting each pair (i, j) to be in E with probability d/n, independently of everything else. Each such chosen pair of objects is compared k times, with the outcomes of comparisons produced as per a BTL model with parameters w_1, …, w_n. Then if d ≥ 10C² log n and kd ≥ 128C² b⁵ log n, the following bound on the error rate holds with probability at least 1 − 10n^{−C/8}:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq 8Cb^{5/2}\sqrt{\frac{\log n}{kd_{max}}}

where π̃(i) = w_i/∑_l w_l and b ≡ max_{i,j} w_i/w_j.

A.1.2 Rank-Centrality Convergence Proof Sketch

The proof from Theorem 10 to Theorem 11 is straightforward and does not depend on the presence of Byzantine voters. The convergence proof of the Rank-Centrality algorithm for Theorem 10 requires proving three lemmas (Lemmas 12, 13, and 14 below).

Lemma 12 (From Negahban et al. [2017]).

For any Markov chain P = P̃ + Δ with P̃ a reversible Markov chain, let p_t be the distribution of the Markov chain P when started from the initial distribution p_0. Then,

\frac{\lVert p_{t}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq\rho^{t}\,\frac{\lVert p_{0}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\sqrt{\frac{\tilde{\pi}_{max}}{\tilde{\pi}_{min}}}+\frac{1}{1-\rho}\lVert\Delta\rVert_{2}\sqrt{\frac{\tilde{\pi}_{max}}{\tilde{\pi}_{min}}}

where π̃ is the stationary distribution of P̃, π̃_min = min_i π̃(i), π̃_max = max_i π̃(i), and ρ = λ_max(P̃) + ‖Δ‖₂ √(π̃_max/π̃_min).

The next lemma bounds the norm of the Δ matrix:

Lemma 13 (From Negahban et al. [2017]).

For some constant C ≥ 8, the error matrix Δ = P − P̃ satisfies

\lVert\Delta\rVert_{2}\leq C\sqrt{\frac{\log n}{kd_{max}}}

with probability at least 1 − 4n^{−C/8}.

The next lemma bounds the value of 1 − ρ:

Lemma 14 (From Negahban et al. [2017]).

If ‖Δ‖₂ ≤ C√(log n/(k d_max)) and k ≥ 4C² b⁵ d_max log n/(d_min² ξ²), then

1-\rho\geq\frac{\xi d_{min}}{b^{2}d_{max}}

Here we find that Lemmas 12 and 14 hold irrespective of whether there are Byzantine voters or not. Therefore we need to find a substitute for Lemma 13. The proof of Lemma 13 is split into two parts; we mainly focus on the first part, where d is sufficiently small and the graph is therefore sparse. Proving Lemma 13 for Erdős-Rényi graphs with lower degrees essentially boils down to proving that R_i = (1/(k d_max)) ∑_{j≠i} |A_ij − k p_ij| ≥ s occurs with very low probability, where s = (C/2) √((log n + d_max log 2)/(k d_max)). In simpler terms, this means that, with high probability, the sum of the absolute deviations of the voters' entries from their expected entries for any particular object is not large.
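In code, the per-object deviation statistic at the heart of this step reads as follows (a direct transcription of the formula; here `A_row[j]` plays the role of the observed count A_ij and `p_row[j]` the BTL probability p_ij):

```python
def deviation_statistic(A_row, p_row, k, d_max, i):
    # R_i = (1 / (k * d_max)) * sum over j != i of |A_ij - k * p_ij|
    total = sum(abs(A_row[j] - k * p_row[j])
                for j in range(len(A_row)) if j != i)
    return total / (k * d_max)
```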

A.2 Rank-Centrality Failure

Proof.

Negahban et al. [2017] define an inner product space L²(π̃) as the space of n-dimensional vectors in ℝⁿ endowed with

\langle a,b\rangle_{\tilde{\pi}}=\sum_{i=1}^{n}a_{i}\tilde{\pi}_{i}b_{i}.

Similarly, they define ‖a‖_π̃ = √⟨a, a⟩_π̃ as the 2-norm in L²(π̃) and ‖A‖_{π̃,2} = max_a ‖Aa‖_π̃/‖a‖_π̃ as the operator norm for a self-adjoint operator in L²(π̃). They also relate these norms to their Euclidean counterparts as

\sqrt{\tilde{\pi}_{min}}\lVert a\rVert\leq\lVert a\rVert_{\tilde{\pi}}\leq\sqrt{\tilde{\pi}_{max}}\lVert a\rVert,\qquad(3)

\sqrt{\frac{\tilde{\pi}_{min}}{\tilde{\pi}_{max}}}\lVert A\rVert_{2}\leq\lVert A\rVert_{\tilde{\pi},2}\leq\sqrt{\frac{\tilde{\pi}_{max}}{\tilde{\pi}_{min}}}\lVert A\rVert_{2}.\qquad(4)

Finally, Negahban et al. [2017] define the following matrices:

  1. Π̃: the diagonal matrix with Π̃_ii = π̃_i.

  2. S := Π̃^{1/2} P̃ Π̃^{−1/2}.

  3. S₁: the rank-1 projection of S.

  4. P̃₁ := Π̃^{−1/2} S₁ Π̃^{1/2}.

Using the above results Negahban et al. [2017] show that:

(p_{t}-\tilde{\pi})^{T}=(p_{t-1}-\tilde{\pi})^{T}(\tilde{P}-\tilde{P}_{1}+\Delta)+\tilde{\pi}^{T}\Delta\qquad(5)

where p_t is the distribution of the Markov chain P after t iterations; for large t we have p_t → π. By the definition of P̃₁, it follows that ‖P̃ − P̃₁‖_{π̃,2} = ‖S − S₁‖ = λ_max. Here λ_max corresponds to the second-largest eigenvalue of the P̃ matrix, which by the Perron-Frobenius theorem is λ_max = max(λ₂, |λ_n|). We can rearrange Equation 5 to get:

\lVert\tilde{\pi}^{T}\Delta\rVert_{\tilde{\pi}}\leq\lVert p_{t}-\tilde{\pi}\rVert_{\tilde{\pi}}+(\lVert\tilde{P}-\tilde{P}_{1}\rVert_{\tilde{\pi},2}+\lVert\Delta\rVert_{\tilde{\pi},2})\cdot\lVert p_{t-1}-\tilde{\pi}\rVert_{\tilde{\pi}}\leq\lVert p_{t}-\tilde{\pi}\rVert_{\tilde{\pi}}+\rho\cdot\lVert p_{t-1}-\tilde{\pi}\rVert_{\tilde{\pi}}

where ρ = λ_max + ‖Δ‖_{π̃,2}. Choosing a large enough t, by which point the Markov chain has converged, we get:

\lVert\tilde{\pi}^{T}\Delta\rVert_{\tilde{\pi}}\leq(1+\rho)\cdot\lVert\pi-\tilde{\pi}\rVert_{\tilde{\pi}}

We then use the bounds in Equations 3 and 4 to get:

\frac{\lVert\pi-\tilde{\pi}\rVert_{\tilde{\pi}}}{\lVert\tilde{\pi}\rVert}\geq\frac{\lVert\tilde{\pi}^{T}\Delta\rVert_{\tilde{\pi}}}{\lVert\tilde{\pi}\rVert\cdot(1+\rho)}

\sqrt{\tilde{\pi}_{max}}\cdot\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\sqrt{\tilde{\pi}_{min}}\cdot\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert\cdot(1+\rho)}

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\sqrt{\frac{\tilde{\pi}_{min}}{\tilde{\pi}_{max}}}\cdot\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert\cdot\left(1+\lambda_{max}+\sqrt{\frac{\tilde{\pi}_{max}}{\tilde{\pi}_{min}}}\cdot\lVert\Delta\rVert_{2}\right)}

Since λ_max ≤ 1, we get:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert\cdot(2\sqrt{b}+b\lVert\Delta\rVert_{2})}\qquad(6)

Using Lemmas 15 and 16 in Equation 6, we conclude that:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{C\log n}{2\sqrt{b}+b+2bC\log n}\cdot\frac{b-1}{8\sqrt{2}b(b+1)}\cdot\frac{F}{K}\qquad(7)

with probability ≥ 1 − 1/n^{C/3−1} − 1/n^{knCF²/64K²} − 1/n^{C² log n/8}. We can see that for large enough n the RHS converges to a constant with high probability. ∎

Lemma 15.

For the opposite-voting Byzantine strategy, we have

\lVert\Delta\rVert_{2}\leq 1+2C\log n

with probability ≥ 1 − 1/n^{C/3−1}.

Proof.

We first upper-bound the denominator of Equation 6. We know that

\lVert\Delta\rVert_{2}\leq\sqrt{\lVert\Delta\rVert_{1}\cdot\lVert\Delta\rVert_{\infty}}

where the two norms on the right are the maximum absolute column and row sums. Since every entry of Δ is the difference between corresponding entries of P and P̃, each entry lies between −1 and 1. Therefore, both the maximum column sum and the maximum row sum are at most 1 + d_max. By a Chernoff bound, the probability of a vertex having degree greater than 2C log n satisfies ℙ(d_i ≥ 2C log n) ≤ e^{−C log n/3}, and by a union bound, ℙ(d_max ≥ 2C log n) ≤ n e^{−C log n/3}. Setting C ≥ 3, we get that d_max ≤ 2C log n with high probability. Therefore, we conclude that ‖Δ‖₂ ≤ 1 + 2C log n. ∎
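The matrix-norm step ‖Δ‖₂ ≤ √(‖Δ‖₁·‖Δ‖∞) used at the start of this proof can be sanity-checked numerically (a pure-Python sketch; the power-iteration helper and the example matrix are ours):

```python
import math

def spectral_norm(M, iters=500):
    # Largest singular value of M, via power iteration on M^T M.
    m, n = len(M), len(M[0])
    v = [1.0 + 0.1 * c for c in range(n)]  # deterministic non-zero start
    for _ in range(iters):
        Mv = [sum(M[r][c] * v[c] for c in range(n)) for r in range(m)]
        w = [sum(M[r][c] * Mv[r] for r in range(m)) for c in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[r][c] * v[c] for c in range(n)) for r in range(m)]
    return math.sqrt(sum(x * x for x in Mv))

def one_norm(M):  # ||M||_1: maximum absolute column sum
    return max(sum(abs(M[r][c]) for r in range(len(M))) for c in range(len(M[0])))

def inf_norm(M):  # ||M||_inf: maximum absolute row sum
    return max(sum(abs(x) for x in row) for row in M)

Delta = [[0.2, -0.5], [0.3, 0.1]]
bound = math.sqrt(one_norm(Delta) * inf_norm(Delta))  # Holder-type bound
```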

Lemma 16.

For the opposite-voting Byzantine strategy, there exists a π̃ such that

\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{(b-1)C\log n}{8\sqrt{2}b(b+1)}\cdot\frac{F}{K}

with probability ≥ 1 − 1/n^{knCF²/64K²} − 1/n^{C² log n/8}.

Proof.

We now lower-bound the RHS of Equation 6. We consider the weights:

\tilde{\pi}_{i}=\begin{cases}\frac{1}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }1\leq i\leq\lfloor\frac{n}{2}\rfloor\\ \frac{b}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }\lfloor\frac{n}{2}\rfloor<i\leq n\end{cases}
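This two-level construction can be written out directly (a small sketch; by construction the weights sum to one):

```python
import math

def adversarial_weights(n, b):
    # First floor(n/2) objects get weight 1, the remaining ceil(n/2)
    # get weight b; the normalizer is n + (b - 1) * ceil(n / 2).
    half = n // 2
    Z = n + (b - 1) * math.ceil(n / 2)
    return [1 / Z] * half + [b / Z] * (n - half)
```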

Let v denote the resulting vector π̃ᵀΔ. For i = 1 to ⌊n/2⌋ we have:

v_{i}=\sum_{(i,j)\in E}\tilde{\pi}_{j}\Delta_{ji}+\tilde{\pi}_{i}\Delta_{ii}
=\sum_{(i,j)\in E,\ j\leq\lfloor n/2\rfloor}\frac{\Delta_{ji}}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}+\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\frac{b\Delta_{ji}}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}+\tilde{\pi}_{i}\Delta_{ii}
=\frac{(b-1)}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\Delta_{ji}\qquad\left(\text{since }\sum_{(i,j)\in E}\Delta_{ji}+\Delta_{ii}=0\right)

Given the above relation for v_i, we can say that

\sum_{i=1}^{\lfloor n/2\rfloor}v_{i}^{2}=\left(\frac{(b-1)}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}\right)^{2}\cdot\left(\sum_{i=1}^{\lfloor n/2\rfloor}\left(\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\Delta_{ji}\right)^{2}\right)
\geq\left(\frac{(b-1)}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}\right)^{2}\cdot\frac{1}{\lfloor\frac{n}{2}\rfloor}\cdot\left(\sum_{i=1}^{\lfloor n/2\rfloor}\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\Delta_{ji}\right)^{2}\qquad\text{(Cauchy-Schwarz)}

Since the first two factors are known, we are primarily interested in a high-probability bound on the third. We are only concerned with the ⌊n/2⌋·⌈n/2⌉ ∼ n²/4 entries of the adjacency matrix crossing the two halves, each of which is an edge with probability p. Since p = C log n/n, we can conclude there will be at least pn²/8 such edges with probability

\geq 1-e^{-\frac{p^{2}n^{2}}{8}}\geq 1-\frac{1}{n^{C^{2}\log n/8}}

Furthermore, it is easy to see that a single Byzantine voter amongst the k voters on an edge causes a deviation of b/((b+1)k), whereas a good voter causes no deviation in expectation. Therefore, in expectation, the deviation caused by a randomly chosen voter is Fb/((b+1)kK). We are summing k·pn²/8 binomial random variables; hence, by a Chernoff bound, the total deviation over the kpn²/8 edges will be at least (bF/(2Kk(b+1)))·(kpn²/8) with probability

\geq 1-e^{-\frac{F^{2}}{64K^{2}}\cdot Ckn\log n}=1-\frac{1}{n^{knCF^{2}/64K^{2}}}

Therefore we can claim that:

\sum_{i=1}^{\lfloor n/2\rfloor}\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\Delta_{ji}\geq\frac{Fkpn^{2}}{16K}\cdot\frac{b}{(b+1)k}\geq\frac{FCnb\log n}{16K(b+1)}

Substituting into the above equation and considering large n, we get:

\sum_{i=1}^{\lfloor n/2\rfloor}v_{i}^{2}\geq\frac{2}{n}\cdot\left(\frac{b-1}{nb}\cdot\frac{FCnb\log n}{32K(b+1)}\right)^{2}

Coming back to our objective of bounding ‖π̃ᵀΔ‖, we can say that:

\lVert\tilde{\pi}^{T}\Delta\rVert\geq\frac{(b-1)FC\log n}{16(b+1)K}\cdot\sqrt{\frac{1}{2n}}

We can also see that the maximum term of π̃ is at most b/(b+n−1). Therefore, ‖π̃‖ is at most √n·b/(b+n−1), which for constant values of b essentially comes out to ≤ b/(2√n).

Therefore we claim that:

\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{(b-1)C\log n}{8\sqrt{2}b(b+1)}\cdot\frac{F}{K}

with probability ≥ 1 − 1/n^{knCF²/64K²} − 1/n^{C² log n/8}. ∎

A.3 BSR Algorithm Analysis

We start off by proving Lemma 6.

Proof.

We first prove a few lemmas to find the probability that, for some ξ, a voter will be at a distance δ away from the mean.

Lemma 17.

For each good voter v and each ξ ∈ {1, −1}^d, let D_{i,v} = ∑_{j≠i} ξ_j · y_{i,j,v}, where y_{i,j,v} is the vote cast by voter v when queried about (i, j). Then D_{i,v} is at distance ≥ δ from its expected mean with probability at most 2 exp(−2δ²/d_i).

Proof.

We can directly apply Hoeffding's inequality since, even after being multiplied by ξ, the range of y_{i,j,v} remains between −1 and +1. Therefore, we can conclude that with probability ≥ 1 − 2 exp(−2δ²/d_i), D_{i,v} does not deviate from its mean by more than δ. ∎

Corollary 18.

When δ ≥ √((Q/2)·d_i log k), the probability of being at most δ away from the mean is ≥ 1 − 2e^{−Q log k} = 1 − 2k^{−Q}.

Proof.

Trivially true using Lemma 17. ∎

Lemma 19.

If δ = √((Q/2)·d_i log k) and F ≤ K(1−ϵ)/2, then with probability

\geq 1-\exp\left(-\frac{k\epsilon}{18}+d_{i}\log 2\right)-\exp\left(-\frac{k\epsilon^{2}}{2}\right)

the predicted mean will be 𝒪(δ) away from the observed mean for all ξ.

Proof.

We define a voter to be in good range if its statistic is within \delta of the mean; otherwise, we say it is in bad range. We now make two claims:

  1. (1)

At least a (1-\epsilon_{1}) fraction of the good voters will be in good range, with probability \geq 1-\exp(-k\epsilon_{1}/6+d_{i}\log 2), for all values of \xi.

  2. (2)

With probability \geq 1-\exp(-\frac{k\epsilon^{2}}{2}) we will have f<\frac{k}{2}\cdot(1-\frac{\epsilon}{2}), where f is the number of Byzantine voters amongst the k chosen voters.

Proof of Claim (1): From Corollary 18 we know that a voter will be in good range with probability \geq 1-2k^{-Q}. Therefore, by the Chernoff bound (setting \mu=k^{1-Q}/2 and \delta=2\epsilon_{1}k^{Q}-1), for any fixed \xi the probability that at least a (1-\epsilon_{1}) fraction of the voters are in good range is \geq 1-\exp\left(-k\epsilon_{1}/6\right). For this to apply over all 2^{d_{i}} values of \xi, we multiply the failure probability by 2^{d_{i}}, which yields the statement of Claim (1). It should be noted that for Claim (1) to hold with high probability we need

k6ϵ1(clogn+dmaxlog2)k\geq\frac{6}{\epsilon_{1}}(c^{\ast}\log n+d_{max}\log 2)

Proof of Claim (2): Since F\leq\frac{K(1-\epsilon)}{2}, we can apply Hoeffding's inequality to conclude the statement of Claim (2).

By Claim (1), at least (1-\epsilon_{1})\cdot(k-f) good voters lie in good range with probability \geq 1-\exp(-k\epsilon_{1}/6+d_{i}\log 2). By Claim (2), (k-f) is at least \frac{k}{2}\cdot(1+\frac{\epsilon}{2}) with probability \geq 1-\exp(-\frac{k\epsilon^{2}}{2}). Therefore, by a simple application of the union bound, there are at least \frac{k}{2}\cdot(1-\epsilon_{1})\cdot(1+\frac{\epsilon}{2}) good voters in good range with probability \geq 1-\exp(-\frac{k\epsilon^{2}}{2})-\exp(-k\epsilon_{1}/6+d_{i}\log 2). By choosing the value of \epsilon_{1} such that

ϵ1ϵ2+ϵ\epsilon_{1}\leq\frac{\epsilon}{2+\epsilon} (8)

we can see that at least k/2 good voters will be in good range whp if k\in\Omega(\log n). We can set \epsilon_{1}=\epsilon/3 to satisfy Equation 8.

Since we know that at least k/2 of the voters lie in the good range, the median of all the voters' (good and Byzantine) outputs also lies in the good range. Therefore the sample median of the good voters is at most 2\delta away from the predicted median.

While we have this result, what we ideally want is a good predictor of the expected mean. We have already shown that the predicted median is at a distance of at most 2\delta from the sample median of the good voters. We now show that the sample median of the good voters is also very close to the expected mean. For this sample median to be more than 2\delta away from the expected mean, fewer than 50\% of the entries would have to be within \delta of the expected mean. However, since we are assuming Claim (1) is in effect, we know that more than 50\% of the entries are in good range with high probability. We can therefore conclude that the predicted median is at most 4\delta away from the expected mean. Therefore, if we consider entries within 5\delta of the predicted median, this includes the entries that are within \delta of the mean, encapsulating a large fraction of the voters while ensuring that any chosen voter is at most 9\delta away from the expected mean. ∎
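The median step above can be illustrated with a small sketch: as long as strictly fewer than half of the k voters are Byzantine, the overall median is sandwiched by good-voter values, no matter how extreme the colluding outputs are. All the numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
k, f = 101, 45                           # f < k/2 Byzantine voters among k
mu = 50.0                                # expected mean of the good statistics

good = rng.normal(mu, 1.0, size=k - f)   # concentrated good-voter statistics
byz = np.full(f, 1e9)                    # colluding outliers, arbitrarily far away

median = np.median(np.concatenate([good, byz]))
# Since the good voters form a majority, the median is itself a good-voter value:
assert good.min() <= median <= good.max()
```

Shifting all Byzantine values to -1e9 instead moves the median to a different good-voter value, but never outside the good voters' range.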

We will now finally prove Lemma 6. As in the Rank-Centrality algorithm's analysis, we prove this in two parts. We know that \Delta=D+\bar{\Delta}, where D is a diagonal matrix consisting only of the diagonal entries of \Delta. By the triangle inequality, \lVert\Delta\rVert_{2}\leq\lVert D\rVert_{2}+\lVert\bar{\Delta}\rVert_{2}. We now separately show that both \lVert D\rVert_{2} and \lVert\bar{\Delta}\rVert_{2} are o\left(1\right).
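The two matrix-norm facts used in this decomposition, the triangle inequality for the spectral norm and the bound \lVert M\rVert_{2}\leq\sqrt{\lVert M\rVert_{1}\lVert M\rVert_{\infty}} (which for a symmetric matrix reduces to the maximum absolute row sum), can be checked numerically on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
Delta = (A + A.T) / 2                    # symmetric stand-in for Delta
D = np.diag(np.diag(Delta))              # diagonal part
Dbar = Delta - D                         # off-diagonal part

spec = lambda M: np.linalg.norm(M, 2)    # spectral norm
# Triangle inequality for the spectral norm:
assert spec(Delta) <= spec(D) + spec(Dbar) + 1e-9

# For symmetric M, ||M||_1 = ||M||_inf = max absolute row sum:
row_sum = np.abs(Dbar).sum(axis=1).max()
assert spec(Dbar) <= row_sum + 1e-9
```

The row-sum bound is exactly why the proof of Lemma 20 below only needs to control the maximum row sum R_i of \bar{\Delta}.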

Lemma 20.

Algorithm 2 removes some voters such that we have:

Δ¯240max(dmaxk,logkdmin)\lVert\bar{\Delta}\rVert_{2}\leq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

with probability 18n1.5C21\geq 1-\frac{8}{n^{1.5C^{2}-1}}.

Proof.

We will initially show that the non-diagonal matrix is bounded. We know that:

Δ¯2Δ¯1Δ¯\lVert\bar{\Delta}\rVert_{2}\leq\sqrt{\lVert\bar{\Delta}\rVert_{1}\lVert\bar{\Delta}\rVert_{\infty}}

Since \bar{\Delta} is a symmetric matrix, it suffices to bound the maximum row sum of the absolute values of \bar{\Delta}. To that end we define

Ri=1kdmaxji|Cij|R_{i}=\frac{1}{kd_{max}}\sum_{j\neq i}|C_{ij}|

where Cij=kAijkpijC_{ij}=kA_{ij}-kp_{ij}. Continuing from Equation (22) from Negahban et al. [2017] we get:

(Ris)\displaystyle\mathbb{P}(R_{i}\geq s) =(ji|Cij|>kdis)\displaystyle=\mathbb{P}(\sum_{j\in\partial i}|C_{ij}|>kd_{i}s) (9)
ξ{1,1}di(jiξjCij>kdis)\displaystyle\leq\sum_{\xi\in\{-1,1\}^{d_{i}}}\mathbb{P}\left(\sum_{j\in\partial i}\xi_{j}C_{ij}>kd_{i}s\right) (10)
ξ{1,1}di(ji and goodξjCij>kdis2)+(ji and ByzξjCij>kdis2)\displaystyle\leq\sum_{\xi\in\{-1,1\}^{d_{i}}}\mathbb{P}\left(\sum_{j\in\partial i\text{ and good}}\xi_{j}C_{ij}>\frac{kd_{i}s}{2}\right)+\mathbb{P}\left(\sum_{j\in\partial i\text{ and Byz}}\xi_{j}C_{ij}>\frac{kd_{i}s}{2}\right) (11)

We need to show two things:

  1. (1)

The number of good voters that get eliminated is at most a constant fraction.

  2. (2)

The Byzantine voters that are not eliminated in any round cannot cause the second term to become large.

The first term in Equation 11 works out the same way as in the original analysis up to constant factors, because we will show that only a constant fraction of the good voters can be removed with high probability.

Proof of (1):

We will now show that Line 13 in Algorithm 2 does not eliminate more than a constant fraction of the good voters. This is necessary to bound the good voters' term in Equation 11.

We will now calculate a bound on the maximum number of good out-of-bound (>5\delta away from \hat{m}) entries that can exist across all \xi. We claim that this number is 2k^{1-Q}+2k^{1-\alpha}. By an argument similar to Claim (1), the Chernoff-Hoeffding bound gives the probability of there being this many out-of-bound entries as:

\displaystyle\mathbb{P} (1kYi2kQ+2kα)\displaystyle\left(\frac{1}{k}\sum Y_{i}\geq\frac{2}{k^{Q}}+\frac{2}{k^{\alpha}}\right)
\displaystyle\leq\exp\left(-D\left(\frac{2}{k^{Q}}+\frac{2}{k^{\alpha}}\;\middle\|\;\frac{2}{k^{Q}}\right)k\right)
\displaystyle=\exp\left(-D\left(p+p^{\prime}\;\middle\|\;p\right)k\right) (p^{\prime}=\frac{2}{k^{\alpha}} and p=\frac{2}{k^{Q}})
\displaystyle=\exp\left(\left(-(p+p^{\prime})\cdot\log\left(1+\frac{p^{\prime}}{p}\right)-(1-p-p^{\prime})\log\left(\frac{1-p-p^{\prime}}{1-p}\right)\right)k\right)
\displaystyle\leq\exp\left(\left(-(p+p^{\prime})\cdot\log\left(1+\frac{p^{\prime}}{p}\right)-\log\left(1-p-p^{\prime}\right)\right)k\right)
\displaystyle\leq\exp\left(-(p+p^{\prime})\cdot\left(\log\left(1+\frac{p^{\prime}}{p}\right)-\frac{3}{2}\right)k\right) (-\log(1-t)\leq 3t/2 for t<0.5)
\displaystyle\leq\exp\left(-p^{\prime}k/2\right)=\exp(-k^{1-\alpha})

For the last inequality to hold we need p^{\prime}/p\geq e^{2}-1; in other words, we need k^{Q}\geq k^{\alpha}\cdot(e^{2}-1), which is achieved for all values of \alpha\in[0,1] and large values of k whenever Q>1. However, the probability computed above is for a single \xi. For it to apply over all \xi we need 2^{d_{i}}\cdot\exp(-k^{1-\alpha}) to be very small, which is satisfied when k^{1-\alpha}\geq d_{i}. Since we have defined \alpha=1-\log d_{max}/\log k, this is always true.
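The Chernoff-KL step above can be sanity-checked numerically: under the condition p^{\prime}/p\geq e^{2}-1, the exponent D(p+p^{\prime}\,\|\,p)\,k indeed dominates p^{\prime}k/2. The values of k, Q, and \alpha below are arbitrary illustrative choices.

```python
import math

def kl(a, p):
    """KL divergence D(a || p) between Bernoulli(a) and Bernoulli(p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

k, Q, alpha = 10_000, 2.0, 0.5
p, pp = 2 / k**Q, 2 / k**alpha           # p = 2 k^{-Q},  p' = 2 k^{-alpha}

assert pp / p >= math.e**2 - 1           # the condition k^Q >= k^alpha (e^2 - 1)
# Comparing exponents avoids floating-point underflow of the raw tail bounds:
# exp(-D(p + p' || p) k) <= exp(-p' k / 2)  iff  D(p + p' || p) k >= p' k / 2.
assert kl(p + pp, p) * k >= pp * k / 2
```
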

We see that for a voter to be removed there need to be 8k^{1-Q}+8k^{1-\alpha} out-of-bound entries, so for every good voter that is removed, at least 3 Byzantine voters are removed as well. This ensures that at most k/6 good voters can be removed, and hence after the removal the number of good voters is \geq\frac{k}{3}. Therefore, using the same equation from Negahban et al. [2017], with probability \geq 1-2^{d_{i}}\cdot\exp(-k^{1-\alpha}) we can conclude that:

jiξj{1,1}(jgood|Cij|>kdis2)exp(k3dmaxs2+dilog2)\displaystyle\sum_{j\in\partial i}\sum_{\xi_{j}\in\{-1,1\}}\mathbb{P}\left(\sum_{j\in good}|C_{ij}|>\frac{kd_{i}s}{2}\right)\leq\exp\left(-\frac{k}{3}d_{max}s^{2}+d_{i}\log 2\right) (12)

The above result comes directly from Hoeffding's inequality and is similar to the result in Negahban et al. [2017], as we are only handling good agents here. For this result to hold with high probability we need \frac{ks^{2}}{3}\geq\log 2, since d_{i}\in\Omega(\log n). Therefore we arrive at:

s3log2ks\geq\sqrt{\frac{3\log 2}{k}} (13)

Proof of (2): We will now show that the event in the second term also occurs with low probability. We do this in two parts, considering separately the voters that lie close to the mean and the voters that do not.

We initially only consider the voters that lie within the 5\delta range of \hat{m}. For each \xi, the Byzantine entries of D_{i,j,v} can be at most 9\delta away from the mean, or else they could not fall in the 5\delta range. Therefore \sum_{j}\xi_{j}C_{i,j} can be at most 9\delta f away from the mean. Hence, all we need to show is that \frac{9\delta k}{2}\leq\frac{1}{4}kd_{i}s, since f\leq k/2. This is equivalent to

s92Qlogkdi.s\geq 9\sqrt{\frac{2Q\log k}{d_{i}}}. (14)

It remains to consider the voters that do not lie within the 5\delta range. We eliminate a voter if, for some particular \xi, it has many out-of-bound entries. Therefore, for any \xi there are only 8k^{1-Q}+8k^{1-\alpha} entries that are not in the 5\delta range, and each can contribute at most d_{i}, so the maximum deviation is at most 8k\cdot\left(\frac{1}{k^{Q}}+\frac{1}{k^{\alpha}}\right)d_{i}. Since we know that k^{Q}\geq k^{\alpha}\cdot(e^{2}-1), the maximum deviation is at most \frac{8e^{2}k^{1-\alpha}d_{i}}{e^{2}-1}\leq\frac{1}{4}kd_{i}s. Using an approximate value for e we get:

s38kα.s\geq\frac{38}{k^{\alpha}}. (15)

Therefore the event in the second term occurs with low probability, and we can conclude that:

jiξj{1,1}(jByz|Cij|>kdis2)\displaystyle\sum_{j\in\partial i}\sum_{\xi_{j}\in\{-1,1\}}\mathbb{P}\left(\sum_{j\in Byz}|C_{ij}|>\frac{kd_{i}s}{2}\right) exp(kϵ18+dilog2)+exp(kϵ22)\displaystyle\leq\exp\left(-\frac{k\epsilon}{18}+d_{i}\log 2\right)+\exp\left(-\frac{k\epsilon^{2}}{2}\right) (16)

The last equation comes by using Equation 8. Combining Equations 13, 14, 15 we get

s=max(3log2k,38kα,92Qlogkdmin)s=\max\left(\sqrt{\frac{3\log 2}{k}},\frac{38}{k^{\alpha}},9\sqrt{\frac{2Q\log k}{d_{min}}}\right)

However, since d_{min}<k, we can drop the first term to get:

s=max(38dmaxk,92Qlogkdmin)s=\max\left(\frac{38d_{max}}{k},9\sqrt{\frac{2Q\log k}{d_{min}}}\right) (17)

Combining equations 12 and 16 with equation 11 we get:

\displaystyle\mathbb{P}(R_{i}\geq s)\leq e^{\left(-\frac{k\epsilon}{18}+d_{i}\log 2\right)}+e^{-\frac{k\epsilon^{2}}{2}}+\exp\left(-\frac{k}{3}d_{i}s^{2}+d_{i}\log 2\right)+\exp(-k^{1-\alpha}+d_{i}\log 2)
\displaystyle\leq e^{\left(-\frac{k\epsilon}{18}+d_{max}\log 2\right)}+e^{-\frac{k\epsilon^{2}}{2}}+\exp\left(-\frac{k}{3}d_{max}s^{2}+d_{max}\log 2\right)+\exp(-k^{1-\alpha}+d_{max}\log 2)

This is a sum of four terms, which we analyse individually. The first two terms are clearly small because k\in\Omega(\log n). For the third term, Equation 17 gives ks^{2}/3\geq 1+\log 2, making it fairly small as well. Finally, the definition of \alpha ensures that the last term is small. Since we must choose a value for k, we can set k to be:

k=max(18ϵ(dmaxlog2+clogn),2clognϵ2,3(1+log2)s2)k=\max\left(\frac{18}{\epsilon}(d_{max}\log 2+c^{\ast}\log n),\frac{2c^{\ast}\log n}{\epsilon^{2}},\frac{3(1+\log 2)}{s^{2}}\right) (18)

Since the third term is asymptotically smaller than the first two, we can remove it from consideration. By setting

k18dmaxϵ2k\geq 18\frac{d_{max}}{\epsilon^{2}}

we get

\mathbb{P}(R_{i}\geq s)\leq\frac{4}{e^{d_{max}(1-\log 2)}}

The Rank-Centrality analysis later takes into account the probability that all degrees are in between np/2np/2 and 3np/23np/2. Therefore we get

\mathbb{P}(R_{i}\geq s)\leq\frac{4}{n^{5C^{2}(1-\log 2)}}

Applying the Union bound we know that

(Δ¯2s)\displaystyle\mathbb{P}\left(\lVert\bar{\Delta}\rVert_{2}\geq s\right) 2n(Ris)\displaystyle\leq 2n\mathbb{P}(R_{i}\geq s) (19)

Therefore we can say that:

Δ¯2max(38dmaxk,92Qlogkdmin)\lVert\bar{\Delta}\rVert_{2}\leq\max\left(\frac{38d_{max}}{k},9\sqrt{\frac{2Q\log k}{d_{min}}}\right)

with probability 18nn5C2(1log2)18n1.5C21\geq 1-\frac{8n}{n^{5C^{2}(1-\log 2)}}\geq 1-\frac{8}{n^{1.5C^{2}-1}}. Since max(38,92Q)40\max\left(38,9\sqrt{2Q}\right)\leq 40 for values of 1Q91\leq Q\leq 9, we can say

Δ¯240max(dmaxk,logkdmin)\lVert\bar{\Delta}\rVert_{2}\leq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

with probability 18n1.5C21\geq 1-\frac{8}{n^{1.5C^{2}-1}}. Hence Proved. ∎

We will now prove that the diagonal entries of \Delta are bounded.

Lemma 21.

Algorithm 2 removes some voters such that we have:

D240max(dmaxk,logkdmin)\lVert D\rVert_{2}\leq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

with probability 18n1.5C21\geq 1-\frac{8}{n^{1.5C^{2}-1}}.

Proof.

From Equations (15) and (16) we know that

Δii=1kdmaxjiCij\Delta_{ii}=-\frac{1}{kd_{max}}\sum_{j\neq i}C_{ij} (20)

Therefore we can write

(D240max(dmaxk,logkdmin))\displaystyle\mathbb{P}\left(\lVert D\rVert_{2}\geq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right) i=1n(|jiCij|40max(dmaxk,logkdmin))\displaystyle\leq\sum_{i=1}^{n}\mathbb{P}\left(|\sum_{j\neq i}C_{ij}|\geq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right)

Since in the previous part we proved the much stronger bound \sum_{i=1}^{n}\mathbb{P}\left(\sum_{j\neq i}|C_{ij}|\geq 40\sqrt{\frac{\log d_{i}}{d_{i}}}\right)\leq\frac{8}{n^{1.5C^{2}-1}}, we can conclude by the triangle inequality that:

(D240max(dmaxk,logkdmin))\displaystyle\mathbb{P}\left(\lVert D\rVert_{2}\geq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right) 8n1.5C21\displaystyle\leq\frac{8}{n^{1.5C^{2}-1}}

Hence Proved. ∎

Now combining Lemmas 20 and 21 we have proved Lemma 6. ∎

We now finally prove Theorem 7. Similar to the Rank-Centrality convergence analysis, using the above results and using Lemmas 12 and 14 from Negahban et al. [2017] we can conclude that

ππ~π80b5/2κξ(G)max(dmaxk,logkdmin)\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\pi\rVert}\leq\frac{80b^{5/2}\kappa}{\xi(G)}\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

where κ=dmax/dmin\kappa=d_{max}/d_{min}. Similar to Negahban et al. [2017] we can see that for an Erdős-Rényi graph with probability 16nC/8\geq 1-6n^{-C/8} we have κ3\kappa\leq 3 and ξ(G)1/2\xi(G)\geq 1/2. Substituting we get:

ππ~π480b5/2max(dmaxk,logkdmin)\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\pi\rVert}\leq 480b^{5/2}\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)
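For reference, the estimate \tilde{\pi} appearing in this bound is the stationary distribution of the pairwise-comparison Markov chain, as in Rank-Centrality. The sketch below is a minimal, plain (non-Byzantine-robust) version of that computation on a complete comparison graph with exact BTL probabilities; the function name, scores, and sizes are our illustrative assumptions.

```python
import numpy as np

def rank_centrality(A, d_max, iters=1000):
    """Stationary distribution of the comparison chain.
    A[i, j] is the (empirical) probability that j beats i; d_max is the
    maximum degree of the comparison graph (n - 1 for a complete graph)."""
    n = A.shape[0]
    P = A / d_max
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # make each row sum to one
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):                     # power iteration
        pi = pi @ P
    return pi / pi.sum()

w = np.array([1.0, 2.0, 4.0])                  # assumed BTL scores
A = w[None, :] / (w[:, None] + w[None, :])     # A[i, j] = P(j beats i)
pi = rank_centrality(A, d_max=len(w) - 1)

assert np.all(np.diff(pi) > 0)                 # recovered order matches w
assert abs(pi[2] / pi[0] - 4.0) < 1e-3         # pi is proportional to w
```

With exact BTL probabilities the chain is reversible with stationary distribution proportional to w, which is why the ordering and the score ratios are recovered; the BSR and FBSR algorithms additionally filter voters before the matrix A is formed.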

A.4 FBSR Algorithm Analysis

Firstly, since we are dealing with p=10C^{2}\log n/n, and since the proof of the Rank-Centrality algorithm already takes into account the probability that all degrees lie between pn/2 and 3pn/2, we can say that the minimum degree is at least 5C^{2}\log n. Therefore any bucket will have at least \frac{5C^{2}\log 2}{5C^{2}\log 2+1}\log_{2}n objects and at most \log_{2}n objects. We know that:

(ji|Cij|>kdis)\displaystyle\mathbb{P}(\sum_{j\in\partial i}|C_{ij}|>kd_{i}s) l=1β(ji & jBl|Cij|>ks|Bl|)\displaystyle\leq\sum_{l=1}^{\beta}\mathbb{P}\left(\sum_{j\in\partial i\text{ \& }j\in B_{l}}|C_{ij}|>ks|B_{l}|\right) (21)

Using the same analysis as above we can say that:

\displaystyle\mathbb{P} (ji & jBl|Cij|>ks|Bl|)\displaystyle\left(\sum_{j\in\partial i\text{ \& }j\in B_{l}}|C_{ij}|>ks|B_{l}|\right)
e(kϵ18+|Bl|log2)+ekϵ22+exp(k|Bl|3s2+|Bl|log2)+exp(k1α+|Bl|log2)\displaystyle\leq e^{\left(-\frac{k\epsilon}{18}+|B_{l}|\log 2\right)}+e^{-\frac{k\epsilon^{2}}{2}}+\exp\left(-\frac{k|B_{l}|}{3}s^{2}+|B_{l}|\log 2\right)+\exp(-k^{1-\alpha}+|B_{l}|\log 2)

Substituting k18(2+C/8)logn/ϵ2k\geq 18(2+C/8)\log n/\epsilon^{2} we get

(ji & jBl|Cij|>ks|Bl|)4n1+C/8\displaystyle\mathbb{P}\left(\sum_{j\in\partial i\text{ \& }j\in B_{l}}|C_{ij}|>ks|B_{l}|\right)\leq\frac{4}{n^{1+C/8}}

We can also see that β\beta will be at most 15C215C^{2}. Substituting in Equation 21 we get:

(Ris)60C2n1+C/8\displaystyle\mathbb{P}(R_{i}\geq s)\leq\frac{60C^{2}}{n^{1+C/8}}

where ss is defined as

s=max(38lognk,92Q(5C2log2+1)loglogn5C2lognlog2)s=\max\left(\frac{38\log n}{k},9\sqrt{2Q\frac{(5C^{2}\log 2+1)\log\log n}{5C^{2}\log n\log 2}}\right)

Using C\geq 1 and 1\leq Q\leq 4 we get:

s=40max(lognk,loglognlogn)s=40\max\left(\frac{\log n}{k},\sqrt{\frac{\log\log n}{\log n}}\right)

Finally using Equation 19 we get

(Δ¯2s)120C2nC/8\displaystyle\mathbb{P}\left(\lVert\bar{\Delta}\rVert_{2}\geq s\right)\leq\frac{120C^{2}}{n^{C/8}}

The other steps are identical to the analysis of the BSR algorithm.

A.5 Impossibility Result Proof

Proof.

We prove this by contradiction. Suppose there is an algorithm \mathcal{A} that for all input weights returns output weights such that \frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq f(n) with probability >1/2, where f(n) converges to 0. We consider the case where F=K/2. The cases where F>K/2 can be handled by making 2F-K Byzantine voters output 1 irrespective of the queries; we can assume that the algorithm is aware of these 2F-K voters and always ignores them. Therefore the problem reduces to the F=K/2 case.

Consider an instance (Instance 1) where π~\tilde{\pi} is defined as follows:

π~i={1n+(b1)n2for 1in2bn+(b1)n2for n2<in\tilde{\pi}_{i}=\begin{cases}\frac{1}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }1\leq i\leq\lfloor\frac{n}{2}\rfloor\\ \frac{b}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }\lfloor\frac{n}{2}\rfloor<i\leq n\end{cases} (22)

Let us consider the case where the good voters are numbered [1,\frac{K}{2}], while the Byzantine voters are numbered [\frac{K}{2}+1,K]. We can use \mathcal{A} to obtain a \pi^{\ast} such that \frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq f(n) with probability >1/2. We now outline the strategy that the Byzantine voters can use: they give their results according to some \tilde{\pi}^{\prime}, where:

π~i={bn+(b1)n2for 1in21n+(b1)n2for n2<in\tilde{\pi}^{\prime}_{i}=\begin{cases}\frac{b}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }1\leq i\leq\lceil\frac{n}{2}\rceil\\ \frac{1}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }\lceil\frac{n}{2}\rceil<i\leq n\end{cases} (23)

In another instance (Instance 2), the good and Byzantine voters flip their numberings, i.e., the Byzantine voters are numbered [1,\frac{K}{2}] and the good voters are numbered [\frac{K}{2}+1,K]. Furthermore, let the input weights be \tilde{\pi}^{\prime} and let the Byzantine voters vote according to \tilde{\pi}. Based on the description of our instance and the existence of algorithm \mathcal{A}, with probability >1/2 we will get a weight output from \mathcal{A} such that

ππ~π~f(n)\frac{\lVert\pi^{\ast}-\tilde{\pi}^{\prime}\rVert}{\lVert\tilde{\pi}^{\prime}\rVert}\leq f(n) (24)

But to any algorithm, the above two instances are identical and therefore with probability >1/2>1/2:

ππ~π~f(n)\frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq f(n) (25)

By a simple application of the union bound, we can conclude that with non-zero probability both Equations 24 and 25 hold. We see that:

π~π~\displaystyle\lVert\tilde{\pi}-\tilde{\pi}^{\prime}\rVert π~π+ππ~\displaystyle\leq\lVert\tilde{\pi}-\pi^{\ast}\rVert+\lVert{\pi}^{\ast}-\tilde{\pi}^{\prime}\rVert (Triangle Inequality)
π~π~\displaystyle\lVert\tilde{\pi}-\tilde{\pi}^{\prime}\rVert f(n)(π~+π~)\displaystyle\leq f(n)\cdot\left(\lVert\tilde{\pi}\rVert+\lVert\tilde{\pi}^{\prime}\rVert\right) (Equations 24 and 25)
12(b1)n1b2n/2+n/2\displaystyle\frac{1}{2}\cdot\frac{(b-1)\cdot\sqrt{n-1}}{\sqrt{b^{2}\cdot\lceil{n/2}\rceil+\lfloor n/2\rfloor}} f(n)\displaystyle\leq f(n) (Equations 22 and 23)

For large n, the LHS converges to a limit \geq\frac{b-1}{2b}, while the RHS converges to 0, giving a contradiction. ∎
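The final inequality can be checked numerically: the normalized gap between the two instances stays bounded below by (b-1)/2b as n grows, so no f(n)\to 0 can satisfy both equations. The helper below simply evaluates Equations 22 and 23 (the function name is ours).

```python
import numpy as np

def normalized_gap(n, b):
    """||pi - pi'|| / (||pi|| + ||pi'||) for the instances in Eqs. 22 and 23."""
    D = n + (b - 1) * ((n + 1) // 2)               # common normalizer
    pi = np.array([1.0] * (n // 2) + [b] * (n - n // 2)) / D
    pip = np.array([b] * ((n + 1) // 2) + [1.0] * (n // 2)) / D
    return np.linalg.norm(pi - pip) / (np.linalg.norm(pi) + np.linalg.norm(pip))

b = 3.0
gaps = [normalized_gap(n, b) for n in (10, 100, 10_000)]
# The gap stays bounded away from zero: each value is at least (b - 1) / (2 b)
assert all(g >= (b - 1) / (2 * b) - 1e-9 for g in gaps)
```
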

A.6 Broader Impact

Due to the more theoretical nature of our research, we do not see any potential negative societal impacts associated with the work.

A.7 Experimental Results and Code

We provide the detailed tabular results for the subfigures of Figures 3 and 6 in Tables 1, 2, 3, 4, 5 and 6, with error bounds for all the obtained results. The code for producing these can be found here.

Table 1: L_{2}-Errors for the FBSR and RC algorithms on Synthetic Data when the Byzantine Fraction (BF) is varied. FOV = Fixed Order Vote, OV = Opposite Order Vote, OVP = Opposite Vote Probabilistic, RS = Random Subset

Strategy | Algo | 0 | 0.05 | 0.1 | 0.15 | 0.2 | 0.25 | 0.3
FOV | FBSR | 0.02 ± 0.001 | 0.03 ± 0.002 | 0.04 ± 0.002 | 0.07 ± 0.004 | 0.11 ± 0.008 | 0.15 ± 0.011 | 0.23 ± 0.015
FOV | RC | 0.02 ± 0.002 | 0.07 ± 0.002 | 0.14 ± 0.002 | 0.2 ± 0.005 | 0.27 ± 0.006 | 0.33 ± 0.008 | 0.4 ± 0.01
OV | FBSR | 0.02 ± 0.002 | 0.02 ± 0.001 | 0.02 ± 0.001 | 0.02 ± 0.002 | 0.03 ± 0.002 | 0.04 ± 0.005 | 0.06 ± 0.007
OV | RC | 0.02 ± 0.001 | 0.09 ± 0.002 | 0.17 ± 0.005 | 0.24 ± 0.004 | 0.31 ± 0.005 | 0.38 ± 0.007 | 0.46 ± 0.006
OVP | FBSR | 0.02 ± 0.001 | 0.02 ± 0.001 | 0.03 ± 0.002 | 0.05 ± 0.002 | 0.07 ± 0.004 | 0.09 ± 0.004 | 0.13 ± 0.004
OVP | RC | 0.02 ± 0.001 | 0.06 ± 0.003 | 0.1 ± 0.008 | 0.15 ± 0.01 | 0.2 ± 0.006 | 0.24 ± 0.01 | 0.29 ± 0.015
RS | FBSR | 0.02 ± 0.001 | 0.02 ± 0.001 | 0.03 ± 0.002 | 0.04 ± 0.002 | 0.06 ± 0.004 | 0.09 ± 0.003 | 0.12 ± 0.004
RS | RC | 0.02 ± 0.001 | 0.05 ± 0.002 | 0.09 ± 0.004 | 0.13 ± 0.005 | 0.17 ± 0.002 | 0.21 ± 0.005 | 0.25 ± 0.006

Table 2: Kendall’s Tau Correlation for the FBSR and RC algorithms on Synthetic Data when BF is varied. FOV = Fixed Order Vote, OV = Opposite Order Vote, OVP = Opposite Vote Probabilistic, RS = Random Subset

Strategy | Algo | 0 | 0.05 | 0.1 | 0.15 | 0.2 | 0.25 | 0.3
FOV | FBSR | 0.98 ± 0.003 | 0.97 ± 0.003 | 0.96 ± 0.003 | 0.92 ± 0.007 | 0.89 ± 0.012 | 0.85 ± 0.015 | 0.8 ± 0.018
FOV | RC | 0.97 ± 0.003 | 0.92 ± 0.006 | 0.85 ± 0.01 | 0.77 ± 0.016 | 0.7 ± 0.014 | 0.62 ± 0.027 | 0.55 ± 0.026
OV | FBSR | 0.98 ± 0.002 | 0.98 ± 0.002 | 0.98 ± 0.003 | 0.97 ± 0.002 | 0.97 ± 0.004 | 0.97 ± 0.004 | 0.96 ± 0.005
OV | RC | 0.97 ± 0.002 | 0.97 ± 0.002 | 0.94 ± 0.021 | 0.89 ± 0.018 | 0.72 ± 0.052 | 0.49 ± 0.079 | 0.15 ± 0.107
OVP | FBSR | 0.98 ± 0.003 | 0.98 ± 0.002 | 0.98 ± 0.003 | 0.97 ± 0.003 | 0.97 ± 0.002 | 0.97 ± 0.004 | 0.96 ± 0.004
OVP | RC | 0.98 ± 0.002 | 0.97 ± 0.002 | 0.97 ± 0.004 | 0.96 ± 0.004 | 0.96 ± 0.003 | 0.95 ± 0.005 | 0.93 ± 0.005
RS | FBSR | 0.98 ± 0.002 | 0.98 ± 0.002 | 0.97 ± 0.003 | 0.97 ± 0.004 | 0.97 ± 0.004 | 0.96 ± 0.003 | 0.95 ± 0.009
RS | RC | 0.97 ± 0.002 | 0.97 ± 0.003 | 0.96 ± 0.006 | 0.94 ± 0.004 | 0.92 ± 0.008 | 0.88 ± 0.016 | 0.83 ± 0.027

Table 3: L_{2}-Errors for the FBSR and RC algorithms on Synthetic Data when n(=k) is varied.

Algo | BF | 50 | 90 | 130 | 170 | 210 | 250
FBSR | 0.1 | 0.063 ± 0.009 | 0.046 ± 0.005 | 0.042 ± 0.003 | 0.039 ± 0.005 | 0.037 ± 0.004 | 0.034 ± 0.003
FBSR | 0.2 | 0.133 ± 0.019 | 0.123 ± 0.011 | 0.106 ± 0.011 | 0.099 ± 0.008 | 0.1 ± 0.009 | 0.094 ± 0.006
RC | 0.1 | 0.14 ± 0.011 | 0.14 ± 0.005 | 0.14 ± 0.005 | 0.14 ± 0.005 | 0.14 ± 0.007 | 0.14 ± 0.002
RC | 0.2 | 0.27 ± 0.013 | 0.27 ± 0.005 | 0.26 ± 0.01 | 0.27 ± 0.006 | 0.27 ± 0.007 | 0.27 ± 0.005

Table 4: Kendall’s Tau Correlation for the FBSR and RC algorithms on Synthetic Data when n(=k)n(=k) is varied.

Algo | BF | 50 | 90 | 130 | 170 | 210 | 250
FBSR | 0.1 | 0.936 ± 0.008 | 0.949 ± 0.008 | 0.952 ± 0.008 | 0.958 ± 0.007 | 0.958 ± 0.004 | 0.963 ± 0.004
FBSR | 0.2 | 0.867 ± 0.021 | 0.892 ± 0.017 | 0.899 ± 0.007 | 0.898 ± 0.014 | 0.902 ± 0.01 | 0.908 ± 0.008
RC | 0.1 | 0.83 ± 0.017 | 0.85 ± 0.02 | 0.85 ± 0.012 | 0.85 ± 0.014 | 0.84 ± 0.015 | 0.84 ± 0.005
RC | 0.2 | 0.67 ± 0.047 | 0.67 ± 0.027 | 0.69 ± 0.019 | 0.69 ± 0.021 | 0.69 ± 0.023 | 0.69 ± 0.01

Table 5: L_{2}-Error for the BSR and RC algorithms on the Sushi Dataset when the Byzantine Fraction is varied. FOV = Fixed Order Vote, OV = Opposite Order Vote, ORF = Opposite Random Flips

Strategy | Algo | 0 | 0.05 | 0.1 | 0.15 | 0.2
FOV | BSR | 0.0 ± 0.0 | 0.063 ± 0.013 | 0.102 ± 0.041 | 0.207 ± 0.062 | 0.267 ± 0.049
FOV | RC | 0.0 ± 0.0 | 0.07 ± 0.019 | 0.15 ± 0.043 | 0.2 ± 0.047 | 0.24 ± 0.042
OV | BSR | 0.0 ± 0.0 | 0.08 ± 0.0 | 0.076 ± 0.0 | 0.126 ± 0.0 | 0.169 ± 0.0
OV | RC | 0.0 ± 0.0 | 0.11 ± 0.0 | 0.21 ± 0.0 | 0.29 ± 0.0 | 0.37 ± 0.0
ORF | BSR | 0.0 ± 0.0 | 0.135 ± 0.002 | 0.067 ± 0.001 | 0.085 ± 0.002 | 0.117 ± 0.006
ORF | RC | 0.0 ± 0.0 | 0.09 ± 0.003 | 0.16 ± 0.002 | 0.23 ± 0.003 | 0.29 ± 0.003

Table 6: Kendall’s Tau Correlation for the BSR and RC algorithms on Sushi Dataset when Byzantine Fraction is varied. FOV = Fixed Order Vote, OV = Opposite Order Vote, ORF = Opposite Random Flips

Strategy | Algo | 0 | 0.05 | 0.1 | 0.15 | 0.2
FOV | BSR | 1.0 ± 0.0 | 0.942 ± 0.049 | 0.844 ± 0.116 | 0.742 ± 0.067 | 0.653 ± 0.084
FOV | RC | 1.0 ± 0.0 | 0.92 ± 0.062 | 0.88 ± 0.056 | 0.8 ± 0.069 | 0.69 ± 0.153
OV | BSR | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.956 ± 0.0 | 0.822 ± 0.0 | 0.511 ± 0.0
OV | RC | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.96 ± 0.0 | 0.82 ± 0.0 | 0.38 ± 0.0
ORF | BSR | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.991 ± 0.013 | 0.969 ± 0.018 | 0.893 ± 0.04
ORF | RC | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.96 ± 0.0 | 0.96 ± 0.013 | 0.85 ± 0.022