
Byzantine Spectral Ranking

Arnhav Datar
Indian Institute of Technology, Madras
[email protected]
Arun Rajkumar
Indian Institute of Technology, Madras
[email protected]
John Augustine
Indian Institute of Technology, Madras
[email protected]
Supported in part by the Robert Bosch Centre for Data Science and Artificial Intelligence, IIT Madras. Supported in part by the Cryptography, Cybersecurity, and Distributed Trust pCoE and the Accenture CoE at IIT Madras.
Abstract

We study the problem of rank aggregation where the goal is to obtain a global ranking by aggregating pair-wise comparisons of voters over a set of objects. We consider an adversarial setting where the voters are partitioned into two sets. The first set votes in a stochastic manner according to the popular score-based Bradley-Terry-Luce (BTL) model for pairwise comparisons. The second set comprises malicious Byzantine voters trying to degrade the ranking. We consider a strongly-adversarial scenario where the Byzantine voters know the BTL scores, the votes of the good voters, the algorithm, and can collude with each other. We first show that the popular spectral ranking based Rank-Centrality algorithm, though optimal for the BTL model, does not perform well even when a small constant fraction of the voters are Byzantine.

We introduce the Byzantine Spectral Ranking Algorithm (and a faster variant of it), which produces a reliable ranking when the number of good voters exceeds the number of Byzantine voters. We show that no algorithm can produce a satisfactory ranking with probability >1/2 for all BTL weights when there are more Byzantine voters than good voters, showing that our algorithm works for all possible population fractions. We support our theoretical results with experiments on synthetic and real datasets that demonstrate the failure of the Rank-Centrality algorithm under several adversarial scenarios and show how the proposed Byzantine Spectral Ranking algorithm is robust in obtaining good rankings.

1 Introduction

Rank aggregation is a fundamental task in a wide spectrum of learning and social contexts such as social choice [Soufiani et al., 2014, Caplin and Nalebuff, 1991], web search and information retrieval [Brin and Page, 1998, Dwork et al., 2001], recommendation systems [Baltrunas et al., 2010] and crowd sourcing [Chen et al., 2013]. Given pairwise comparison information over a set of n objects, the goal is to identify a ranking that best respects the revealed preferences. Frequently, we are also interested in the scores associated with the objects to deduce the intensity of the resulting preference. There have been several solutions to this problem, including spectral ranking methods such as Rank-Centrality [Negahban et al., 2017] and MLE methods [Ford, 1957].

The Bradley-Terry-Luce (BTL) model [Bradley and Terry, 1952, Luce, 1959] underlies a vast variety of ranking algorithms. The BTL model assumes that there are n objects to be compared, each with an associated positive weight w_{i}. Whenever a voter is queried on a pair, the voter claims i is better than j with probability w_{i}/(w_{i}+w_{j}), independently at random. However, the BTL model assumes that all voters are identical and use the same probabilistic strategy for voting. In crowd-sourced settings, voters generally differ significantly from each other. Some voters may be spammers or even direct competitors of the entity ranking the objects, leading to data poisoning [Sun et al., 2021]. To model this situation, we assume the population is split: there are K-F good voters and F Byzantine voters. The good voters vote according to the BTL model, while the Byzantine voters can potentially be controlled by an adversary that centrally coordinates their behavior.
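For concreteness, a single good voter's answer under the BTL model can be simulated in a few lines of Python (a sketch for illustration; the function name is ours, not part of the model's formal definition):

```python
import random

def btl_vote(w_i, w_j, rng=random):
    """One good voter's answer to the query (i, j) under the BTL model:
    True means 'i is better than j', reported with probability
    w_i / (w_i + w_j), independently of all other queries."""
    return rng.random() < w_i / (w_i + w_j)

random.seed(0)
# With equal weights, each object should win about half of the queries.
wins = sum(btl_vote(1.0, 1.0) for _ in range(10_000))
```

The Byzantine voters, by contrast, are under no obligation to follow any such rule.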

1.1 Overview of our Results

We first consider the tolerance of the Rank-Centrality algorithm to a constant fraction of Byzantine voters. Negahban et al. [2017] showed that for an Erdős-Rényi pairwise-comparison graph G(n,p) we have

Theorem 1 (Informal from Negahban et al. [2017]).

If p\in\Omega(\log n/n) and k\in\Omega(1), then the Rank-Centrality algorithm outputs a weight distribution \pi such that, with high probability (whenever we say with high probability, we mean with probability \geq 1-\frac{1}{n^{c}} for some c>0),

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\in\mathcal{O}\left(\sqrt{\frac{\log n}{knp}}\right)

Here \tilde{\pi} is the vector of true object weights and k is the number of voter queries asked per edge of G. When k\in\omega(1), the RHS goes to 0 as n goes to \infty. We show that a strategy as simple as all Byzantine voters voting the reverse of the weights causes the Rank-Centrality algorithm to output weights far from the true weights with high probability.

Theorem 2 (Informal).

There exists a strategy that the Byzantine voters can use such that the Rank-Centrality algorithm outputs a distribution \pi satisfying, with high probability,

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\in\Omega(1)

provided a constant fraction of the voters are Byzantine.

Even trivial algorithms that output the same weights regardless of the voters' responses achieve \Theta(1) error. To tackle this situation, we propose Byzantine Spectral Ranking, a polynomial-time algorithm that removes some of the voters so that the Rank-Centrality algorithm can be applied. As long as the proportion of Byzantine voters is bounded strictly below 1/2, the algorithm removes voters such that the remaining Byzantine voters are unlikely to cause the Rank-Centrality algorithm to deviate significantly.

Theorem 3 (Informal).

For a range of values of k, there exists an \mathcal{O}(n^{2})-time algorithm that removes voters such that with high probability we have:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\in\mathcal{O}\left(\frac{d_{max}}{k}\right)

Here d_{max} is the maximum degree of the comparison graph. Finally, we show that when F\geq K/2 no algorithm can solve this problem for all weights with probability >1/2: we construct a strategy that the Byzantine voters can use under two different weight distributions and show that any algorithm must fail on at least one of them.

Theorem 4 (Informal).

If F\geq K/2, then no algorithm can, for all weights \tilde{\pi}, output with probability >1/2 a weight distribution \pi^{\ast} such that

\frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\in o(1)

1.2 Related Works

In recent times, there has been a lot of work on building robust machine learning algorithms [Blanchard et al., 2017, El Mhamdi et al., 2018, Li et al., 2020, El-Mhamdi et al., 2021, Chen et al., 2018, Wu et al., 2021] that are resilient to Byzantine faults.

Ranking from pairwise comparisons has been a very active field of research, and multiple algorithms have been proposed for the BTL model [Negahban et al., 2017, Agarwal et al., 2018, Rajkumar and Agarwal, 2014, Chen and Suh, 2015]. One of the most popular algorithms proposed in recent times is Rank-Centrality. Negahban et al. [2017] showed an upper bound on the relative L_{2}-error (between the true and estimated weights) that goes to 0 as n grows. They did this by finding the stationary distribution of the matrix P defined as

P_{ij}=\begin{cases}\frac{A_{ij}}{d_{max}}&\text{if }i\neq j\\ 1-\frac{1}{d_{max}}\sum_{k\neq i}A_{ik}&\text{if }i=j\end{cases} (1)

Here A_{ij} is the number of times i beats j in the k queries, divided by k, if (i,j)\in E(G); otherwise A_{ij}=0. The pseudocode can be found in Algorithm 1. They also showed that their algorithm outperforms other algorithms on various datasets. As our work builds on Negahban et al. [2017], we include a primer on Rank-Centrality in Appendix A.1.

Algorithm 1 Rank-Centrality
1: n objects to compare, E the set of edges between these n objects, A the edge weights computed from the voter inputs.
2: A weighting of the n objects such that the output weights are close to the hidden input weights.
3: Compute the transition matrix P according to Equation 1.
4: Compute the stationary distribution \pi (as the limit of p^{T}_{t+1}=p^{T}_{t}P).
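Algorithm 1 can be sketched in a few lines of Python (an illustrative sketch, not the authors' code). We assume the convention of Negahban et al. [2017] that A[i, j] is the fraction of comparisons on edge (i, j) in which j was preferred over i, so the random walk drifts toward preferred objects:

```python
import numpy as np

def rank_centrality(A, tol=1e-12, max_iters=100_000):
    """Sketch of Algorithm 1. A[i, j]: fraction of the k comparisons on
    edge (i, j) won by j over i (A[i, j] + A[j, i] == 1 on edges of G,
    both 0 otherwise). Returns the stationary distribution of the chain
    P from Equation 1."""
    n = A.shape[0]
    d_max = int(max((row > 0).sum() for row in A))   # max degree of G
    P = A / d_max                                    # off-diagonal entries of Equation 1
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))         # self-loops keep P row-stochastic
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iters):                       # power iteration: pi_{t+1}^T = pi_t^T P
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            break
        pi = nxt
    return nxt
```

For the ideal chain built from the true BTL probabilities A[i, j] = w_j/(w_i + w_j), the chain is reversible with respect to the normalized weights, so the stationary distribution recovers them exactly.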

Besides research on adversarial attacks in ranking, there has also been research on the robustness of ranking algorithms when exposed to noise. Cucuringu [2016] proposed Sync-Rank, viewing the ranking problem as an instance of the group synchronization problem over the group SO(2) of planar rotations. This led to a robust algorithm for ranking from pairwise comparisons, as verified on synthetic and real-world data. Mohajer and Suh [2016] worked on a stochastic noise model for finding top-K rankings. They showed an upper bound on the minimum number of samples required for ranking and provided a linear-time algorithm that achieves the lower bound. However, both of these works [Cucuringu, 2016, Mohajer and Suh, 2016] primarily tackle ranking under noisy and incomplete data, as opposed to an adversary actively trying to disrupt the algorithm.

Suh et al. [2017] worked on the adversarial ranking problem, considering two cases: one where the fraction of adversarial voters is known and one where it is unknown. They obtained results asymptotically equivalent to Chen and Suh [2015]'s Spectral-MLE algorithm despite the presence of adversarial voters. However, they worked under the significantly weaker assumption that the adversarial voters vote exactly opposite to the faithful voters (i.e., if the good voters vote that i is better than j with probability \frac{w_{i}}{w_{i}+w_{j}}, then the adversarial voters do so with probability \frac{w_{j}}{w_{i}+w_{j}}). Our setting is much more general in that we make no assumptions about the Byzantine voters.

Agarwal et al. [2020] worked on rank aggregation when edges in the comparison graph are compromised. They proved fundamental results on the identifiability of weights for both general graphs and Erdős-Rényi graphs, and gave polynomial-time algorithms for identifying these weights. However, their model differs significantly from ours: they assume that some of the comparisons in the graph may be completely compromised while the others are free from any kind of corruption. As this is unlikely in practice, we assume a stronger model where every edge can have both good and Byzantine voters. A detailed comparison with Agarwal et al. [2020] is made in Section 3.5.

2 Problem Setting

Given a set of n objects, our goal is to obtain a ranking from a set of pairwise comparisons over the objects. We assume that the pairs to be compared are determined by an Erdős-Rényi comparison graph G(n,p)=(V,E). Each edge in the graph corresponds to a pair that can be compared, and we allow at most k comparisons per pair.

The pairs can be compared by voters who fall into one of two categories: good or Byzantine. Out of K total voters, F are assumed to be Byzantine. Every good voter, when queried to compare a pair of objects, votes stochastically according to a Bradley-Terry-Luce (BTL) preference model parametrized by a weight/score vector w\in\mathbb{R}^{n}_{+}, where the probability of preferring object i over object j is given by \frac{w_{i}}{w_{i}+w_{j}}. The Byzantine voters, on the other hand, can vote however they wish (stochastically or deterministically). To make the problem general, we consider a powerful central adversary that can potentially control all of the Byzantine voters. We assume that the Byzantine adversary knows (1) the queries asked to the good voters and their corresponding votes, (2) the comparison graph G(n,p), (3) the learning algorithm that is used to produce the ranking, and (4) the underlying true BTL weights of the objects that the good voters use to produce their comparisons.

We assume that both the good voters and the Byzantine voters produce the same output when asked to compare the same pair multiple times. (This ensures that simple algorithms which query all voters on the same pair multiple times and take a median preference will not work.)

The goal of a learning algorithm is to use the fewest voter comparisons to output a score vector that is as close as possible to the true scores. The learning algorithm is allowed to choose any voter to compare any pair of objects. However, it does not know the type of a voter, good or Byzantine, and naturally a good algorithm must be robust to the presence of the powerful Byzantine voters. Finally, we work in the passive setting, where the learning algorithm must fix the pair-to-voter mapping a priori, before the votes are collected.

3 Results for the Erdős-Rényi Graph

We focus our efforts on the Erdős-Rényi graph primarily because it has been heavily studied in the context of ranking. We pay careful attention to the regime p\in\Theta(\log n/n). Our algorithms turn out to be exponential in the degree, which makes the Erdős-Rényi graph an ideal candidate: its degrees are logarithmic with high probability, giving us polynomial-time algorithms.

3.1 Rank-Centrality Failure

We first motivate the need for a robust version of the Rank-Centrality algorithm by showing that even a simple strategy by a Byzantine adversary leads to an unsuitable ranking. In this scenario, the Byzantine adversary knows neither the comparison graph nor the votes of the good voters, and can still beat the Rank-Centrality algorithm. Formally, we show that:

Theorem 5.

Given n objects, let the comparison graph be G(n,p). Each pair in G is compared k times by voters chosen uniformly at random among the K voters. If all Byzantine voters vote according to the opposite-weight strategy in an Erdős-Rényi graph with p=C\log n/n, then there exists a \tilde{\pi} for which the Rank-Centrality algorithm will output a \pi such that

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{C\log n}{2\sqrt{b}+b+2bC\log n}\cdot\frac{b-1}{8\sqrt{2}b(b+1)}\cdot\frac{F}{K}

with probability \geq 1-\frac{1}{n^{knCF^{2}/64K^{2}}}-\frac{1}{n^{C^{2}\log n/8}}-\frac{1}{n^{C/3-1}}. Here b is the skew in object weights, i.e., b=\max_{i,j}w_{i}/w_{j}.

By the opposite-weight strategy we mean a strategy where the Byzantine voter votes for i if w_{i}\leq w_{j} and otherwise votes for j. Examining the RHS carefully, the first term asymptotically converges to \frac{1}{2b}, and the third term is clearly a constant if a constant fraction of the voters are Byzantine. This shows the existence of a strategy that, with high probability, significantly deviates the weights output by the Rank-Centrality algorithm. The detailed analysis can be found in Appendix A.2.

Proof Sketch.

We initially show that

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert\cdot(2\sqrt{b}+b\lVert\Delta\rVert_{2})} (2)

Here \pi is the stationary distribution of the transition matrix P defined in Equation 1 and \Delta denotes the fluctuation of the transition matrix around its mean \tilde{P}, i.e., \Delta\equiv P-\tilde{P}. We first prove that \lVert\Delta\rVert_{2} is at most 1+d_{max}, since there are at most 1+d_{max} terms in any row or column and each term can deviate by at most 1. Next, we show that with probability \geq 1-\frac{1}{n^{C/3-1}} we have d_{max}\leq 2C\log n. Finally, we show that \lVert\tilde{\pi}^{T}\Delta\rVert\in\Omega(pn) with probability \geq 1-\frac{1}{n^{knCF^{2}/64K^{2}}}-\frac{1}{n^{C^{2}\log n/8}} by constructing a weight distribution where half the weights are low and half are high. An arbitrary weight distribution cannot be used because if \tilde{\pi}=\left[\frac{1}{n},\dots,\frac{1}{n}\right] we would get \tilde{\pi}^{T}\Delta=0 for all \Delta, since the row sums of P and \tilde{P} are always 1. ∎

3.2 Byzantine Spectral Ranking

In this section, we propose a novel algorithm called Byzantine Spectral Ranking (BSR). The BSR algorithm asks multiple voters multiple queries and, based on the collective responses, decides whether a voter is acting suspiciously or whether it is plausible that the voter is a good voter. For each object i to be compared, we take a set of d_{i} objects to compare it against and have k voters compare i with all of the d_{i} objects (leading to kd_{i} comparisons per voter). Based on the voters' responses, we try to eliminate voters who are likely to be Byzantine. The BSR algorithm ensures that a constant fraction of good voters remains and that any remaining Byzantine voters vote similarly to good voters. Intuitively, the BSR algorithm goes over every subset of [d_{i}] and tries to detect whether the Byzantine voters have collectively voted strongly in favor of or against this subset. The pseudocode can be found in Algorithm 2.

Algorithm 2 Byzantine Spectral Ranking
1: n objects, K voters, comparison graph G, k comparisons allowed per edge, parameter Q.
2: Weights \pi for each of the objects in [n]
3: procedure \mathsf{Bound\_Sum\_Deviations}(i, S, d, k, \alpha)
4:     Pick k voters randomly from the voter set [1,\dots,K]
5:     The k voters are queried to compare all objects in S with i \triangleright k\times d queries in total
6:     T represents the binary representation of the voters' inputs \triangleright T is a k\times d matrix
7:     \delta\leftarrow\sqrt{\frac{Q}{2}d\log k} \triangleright We show later that Q needs to be \geq 1
8:     M\leftarrow[0]_{2^{d}\times k}
9:     for each \xi\in\{1,-1\}^{d} do
10:         U\leftarrow T\xi
11:         \hat{m}\leftarrow\mathsf{Median}(U)
12:         M[\xi][v]\leftarrow 1 if voter v is 5\delta away from \hat{m} and 0 otherwise
13:     end for
14:     \mathsf{max\_out}\leftarrow 8k^{1-Q}+8k^{1-\alpha}
15:     Remove voters with M[\xi][v]=1 if the sum of M[\xi] is \geq\mathsf{max\_out} for some \xi
16:     Return leftover \mathsf{votes}
17: end procedure
18: \alpha\leftarrow 1-\log d_{max}/\log k \triangleright \alpha is a parameter whose value is set during analysis
19: for each object i to be compared do
20:     \mathsf{votes}\leftarrow\mathsf{Bound\_Sum\_Deviations}(i, N_{G}(i), d_{i}, k, \alpha)
21:     Using \mathsf{votes} update P as described in Equation 1
22: end for
23: Compute the stationary distribution \pi as the limit of p^{T}_{t+1}=p^{T}_{t}P.
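The filtering step of Algorithm 2 can be sketched as follows (a simplified illustration, not the authors' implementation; the `threshold` and `max_out` arguments are ours, added so that the smaller empirical values used in Section 4 can be substituted for the asymptotic ones):

```python
import numpy as np
from itertools import product

def bound_sum_deviations(T, threshold=None, max_out=None, Q=1.0):
    """Sketch of Bound_Sum_Deviations (Algorithm 2). T is a k x d 0/1
    matrix with T[v, j] = 1 if voter v preferred object i over the j-th
    object of S. For each sign vector xi in {-1, +1}^d we project the
    votes onto xi and flag voters lying more than `threshold` from the
    median projection; if at least `max_out` voters are flagged for some
    xi, all voters flagged for that xi are removed. Returns a boolean
    mask over the k voters that are kept."""
    k, d = T.shape
    if threshold is None:
        threshold = 5 * np.sqrt(0.5 * Q * d * np.log(k))  # 5*delta from the paper
    if max_out is None:
        alpha = 1 - np.log(d) / np.log(k)                 # paper's alpha with d_max = d
        max_out = 8 * k ** (1 - Q) + 8 * k ** (1 - alpha)
    keep = np.ones(k, dtype=bool)
    for xi in product([-1.0, 1.0], repeat=d):
        U = T @ np.array(xi)
        flagged = np.abs(U - np.median(U)) > threshold
        if flagged.sum() >= max_out:
            keep &= ~flagged
    return keep
```

On a toy instance where 30 good voters vote identically and 10 colluding Byzantine voters always prefer i, the Byzantine block projects far from the median for the sign vectors that are all +1 on the positions where the good voters voted 0, and is removed.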

3.2.1 Analysis

The detailed analysis can be found in Appendix A.3. In the proof of Rank-Centrality's convergence, the only result that does not hold in the presence of Byzantine voters is Lemma 3, which bounds the 2-norm of the \Delta matrix. We show that:

Lemma 6.

For an Erdős-Rényi graph with p\geq 10C^{2}\log n/n and k\in\Omega(np) comparisons per edge, Algorithm 2 removes some voters such that we have:

\lVert\Delta\rVert_{2}\leq 80\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

with probability \geq 1-16n^{1-1.5C^{2}}.

Proof Sketch.

Proving that \lVert\Delta\rVert_{2} is bounded with high probability turns out to be equivalent to proving that \mathbb{P}(\sum_{j\in\partial i}|C_{ij}|>kd_{i}s) is bounded, where C_{ij} is defined as kA_{ij}-kp_{ij} with p_{ij}=\frac{w_{i}}{w_{i}+w_{j}}. Since this is a sum of d_{i} absolute values, it can be upper-bounded by taking a union bound over all 2^{d_{i}} sign combinations of the sum.

We first show that, with high probability, for all 2^{d_{i}} values of \xi the median \hat{m} that we compute in the BSR algorithm is only \mathcal{O}(\delta) away from the expected mean. This gives us a good check for whether a voter is acting out of the ordinary. The algorithm then filters out voters that are collectively acting out of the ordinary for a particular \xi. We show that at the end of these removals, the number of voters is only a constant factor away from k, i.e., most of the good voters are retained. Finally, we show an upper bound on the row sum of the absolute values of the deviations of the remaining Byzantine voters to complete the proof. ∎

Intuitively, we get the terms d_{max}/k and \sqrt{\log k/d_{min}} in Lemma 6 by considering the maximum average deviation of the row sum that the Byzantine voters can cause for any \xi. This can be written as \mathsf{max\_out} voters having a deviation of d and the remaining k voters having a deviation of \delta, so the overall deviation is \frac{1}{kd}\left(\mathsf{max\_out}\cdot d+\delta\cdot k\right). Substituting the values of the parameters and choosing an appropriate Q, we get \mathcal{O}\left(\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right). Finally, using Lemma 6 and a few lemmas from Negahban et al. [2017], we conclude that:
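Concretely, substituting \mathsf{max\_out}=8k^{1-Q}+8k^{1-\alpha}, \delta=\sqrt{\frac{Q}{2}d\log k}, and \alpha=1-\log d_{max}/\log k (so that k^{1-\alpha}=d_{max}) unfolds the bookkeeping as follows (a sketch; the constants are indicative):

```latex
\frac{1}{kd}\left(\mathsf{max\_out}\cdot d+\delta\cdot k\right)
  = \frac{\mathsf{max\_out}}{k}+\frac{\delta}{d}
  = 8k^{-Q}+\frac{8d_{max}}{k}+\sqrt{\frac{Q\log k}{2d}}
  \in \mathcal{O}\left(\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right)
  \qquad (Q\geq 1,\ d\geq d_{min}),
```

where the first term is absorbed into the second once Q\geq 1, since then 8k^{-Q}\leq 8/k\leq 8d_{max}/k.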

Theorem 7.

Given n objects, let the comparison graph be G(n,p). Each pair in G is compared k times, with the outcomes of comparisons produced as per the Byzantine-BTL model with weights [w_{1},\dots,w_{n}]. Then there exists an algorithm such that when F\leq K(1-\epsilon)/2, p\geq 10C^{2}\log n/n, \epsilon>0 and k\geq 18d_{max}/\epsilon^{2}, the following bound on the error

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq 480b^{5/2}\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

holds with probability \geq 1-6n^{-C/8}-16n^{1-1.5C^{2}}.

Remark. The BSR algorithm gives a suitable ranking in polynomial time, but is fairly slow: its complexity is in \Omega(n^{5C^{2}+1}), which is not suitable for larger values of n and C.

Remark. Our process of filtering out Byzantine voters is tightly tied to the analysis of the Rank-Centrality algorithm and the procedure to remove voters may not be suitable for other algorithms. The advantage of tying the removal procedure to the ranking algorithm is that we would not remove voters who may vote in an adversarial fashion, but do not significantly affect the row sum of the absolute difference between the empirical and true Markov chain transition matrices. An algorithm that tries to explicitly identify Byzantine voters might end up removing such voters too (and hence their votes for other useful pairs) which might be counter-productive.

3.3 Fast Byzantine Spectral Ranking

While Algorithm 2 performs asymptotically similarly to the Rank-Centrality algorithm, it is fairly slow, requiring at least \Omega(2^{d_{min}}\cdot n) time. We now present the Fast Byzantine Spectral Ranking (FBSR) algorithm for more practical settings. In the previous algorithm we ensure that \mathbb{P}(\sum_{j\in\partial i}|C_{ij}|>kd_{i}s) is upper-bounded, which is expensive because we iterate over all \xi\in\{-1,1\}^{d_{i}}. This can be improved: if d_{i}=\beta\log_{2}n, we can split the entries into \beta buckets [B_{1},\dots,B_{\beta}] of size \log_{2}n each. An equivalent formulation is then to upper-bound \sum_{l=1}^{\beta}\mathbb{P}(\sum_{j\in\partial i\text{ \& }j\in B_{l}}|C_{ij}|>\frac{kd_{i}s}{\beta}). Instead of summing d_{i} absolute values once, we sum only \log_{2}n values, \beta times.

Algorithm 3 Fast Byzantine Spectral Ranking
1: \alpha\leftarrow 1-\log((2+C/8)\log n)/\log k
2: for each object i to be compared do
3:     \mathsf{max\_size}\leftarrow\log_{2}n \triangleright Can be made larger than \log_{2}n
4:     \beta\leftarrow\left\lceil d_{i}/\mathsf{max\_size}\right\rceil
5:     Split N_{G}(i) into \beta equally sized buckets [B_{1},\dots,B_{\beta}]
6:     for \mathsf{bucket}\leftarrow 1 to \beta do
7:         \mathsf{votes}\leftarrow\mathsf{Bound\_Sum\_Deviations}(i, B_{\mathsf{bucket}}, \mathsf{size}(B_{\mathsf{bucket}}), k, \alpha)
8:         Using \mathsf{votes} update P as described in Equation 1
9:     end for
10: end for
11: Compute the stationary distribution \pi as the limit of p^{T}_{t+1}=p^{T}_{t}P.
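The bucketing step (lines 3-5 of Algorithm 3) is straightforward to implement; a sketch, with `np.array_split` standing in for a splitting rule the paper leaves unspecified:

```python
import numpy as np

def make_buckets(neighbors, max_size):
    """Split N_G(i) into beta = ceil(d_i / max_size) nearly equal buckets,
    as in Algorithm 3. Filtering each bucket separately replaces one pass
    over 2^{d_i} sign vectors with beta passes over at most 2^{max_size}
    sign vectors each."""
    d_i = len(neighbors)
    beta = -(-d_i // max_size)   # ceil(d_i / max_size)
    return [list(b) for b in np.array_split(np.asarray(neighbors), beta)]
```

With max_size set to \log_{2}n, each call to the filter now enumerates only 2^{\log_{2}n} = n sign vectors, which is what brings the overall running time down to \mathcal{O}(n^{2}).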
Theorem 8.

Given n objects, let the comparison graph be G(n,p). Each pair in G is compared k times, with the outcomes of comparisons produced as per the Byzantine-BTL model with weights [w_{1},\dots,w_{n}]. Then there exists an algorithm such that when F\leq K(1-\epsilon)/2, p=10C^{2}\log n/n, \epsilon>0, and k\geq 18(2+C/8)\log n/\epsilon^{2}, the following bound on the error

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq 480b^{5/2}\max\left(\frac{\log n}{k},\sqrt{\frac{\log\log n}{\log n}}\right)

holds with probability \geq 1-(6+240C^{2})n^{-C/8}, and the algorithm runs in \mathcal{O}(n^{2}) time.

The detailed analysis can be found in Appendix A.4.

3.4 An Impossibility Result and Optimality of FBSR

The main idea in the BSR algorithm was the estimation of an accurate mean for all values of \xi. However, finding the mean becomes very challenging once the Byzantine voters outnumber the good voters: the Byzantine voters can always create a shifted binomial distribution and trick us into believing that an entirely different value is the mean.

We answer a natural question: is it even possible to find such algorithms when F crosses K/2? We prove a fundamental result showing that for any ranking algorithm to give a satisfactory result, a majority of good voters is always required. Formally, we show that:

Theorem 9.

If F\geq K/2, then no algorithm can, for all weights \tilde{\pi}, output weights \pi^{\ast} such that

\frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq f(n)

with probability >1/2, where f(n) is a function that converges to 0 as n goes to \infty.

Proof Sketch.

We prove this result by contradiction. Suppose there is an algorithm \mathcal{A} that gives a satisfactory ranking for all weights. We consider the F=K/2 case, as the case F>K/2 clearly reduces to it.

We then consider two instances: (1) the good voters vote according to \tilde{\pi} and the Byzantine voters vote according to \tilde{\pi}^{\prime}; (2) the Byzantine voters vote according to \tilde{\pi} and the good voters vote according to \tilde{\pi}^{\prime}. We claim that when F=K/2 these instances are indistinguishable to any algorithm. Since \mathcal{A} succeeds with probability >1/2 on any instance, by the union bound there must be a \pi^{\ast} that is close to both \tilde{\pi} and \tilde{\pi}^{\prime}. We can then choose \tilde{\pi} and \tilde{\pi}^{\prime} to be far from each other, so that no \pi^{\ast} can be close to both, giving us a contradiction. ∎

The detailed proof can be found in Appendix A.5.

Remark. This impossibility result consequently shows that the FBSR algorithm (cf. Theorem 8) is optimal in terms of tolerance to the Byzantine fraction F/K.

3.5 Comparison with Agarwal et al. [2020]

While the setting of Agarwal et al. [2020] differs from ours as mentioned in Section 1.2, if we consider both works in the Byzantine voter setting, a comparison can be made by treating any edge that has a Byzantine voter as corrupted in their model. Their algorithm, to have a valid convergence proof, needs k\in\Omega(\log n). This would imply that F/K\in O(\log\log n/\log^{2}n) for sparse graphs and F/K\in O(1/\log n) for denser graphs. Our algorithms can potentially tolerate a significantly higher corruption rate of 1/2.

Another way to compare our results with Agarwal et al. [2020] is to set k=\Omega(\log n), p=\Theta(\log n/n) for our model and k^{\prime}=1, p^{\prime}=pk/k^{\prime} for theirs. This allocation ensures a single voter per edge in their model and an equal number of comparisons in expectation. We see that their algorithm does not converge (L_{1}\in\mathcal{O}(\sqrt{\log n})), while ours converges \left(L_{2}\in\mathcal{O}\left(\max\left(\frac{\log n}{k},\sqrt{\frac{\log k}{\log n}}\right)\right)\right). Furthermore, our algorithm can handle a corruption rate of 1/2, while theirs can handle a corruption rate of \mathcal{O}(\log\log n/\log n). (They also give an algorithm for denser graphs that can handle a corruption rate of 1/4; however, it has worse time complexity and requires p\in n^{\epsilon-1} for \epsilon>0.)

4 Experimental Results

In this section, we confirm our theoretical bounds with experiments conducted on synthetic and real data. While truly Byzantine strategies may be hard to find and can even be computationally infeasible, we consider several practically feasible strategies. We show the performance of Rank-Centrality against these strategies, supporting our results in Section 3.1, and then show that these strategies are not as effective against the BSR/FBSR algorithm, which gives better rankings.

Experimental Settings: For the FBSR algorithm to work well, there must exist at least a few entries at distance \geq 5\delta from \hat{m}. However, since we sum up \log n entries, we need 5\delta\leq\log n; otherwise the BSR algorithm runs exactly like the Rank-Centrality algorithm. Using the value of \delta from Algorithm 3, setting Q=1, and using k\geq\log n, we get:

\frac{25}{2}\log\log n\leq\log n

While this is asymptotically true, the RHS only exceeds the LHS when n\geq 1.18\times 10^{21}. However, since our proof was designed to work against any Byzantine strategy, and the convergence derivation of Negahban et al. [2017] is fairly loose (for example, the union bound over 2^{d} probabilities to ensure the sum of absolute terms is bounded), the FBSR algorithm empirically works against a variety of strategies even with significantly smaller thresholds. We applied the FBSR algorithm with the following modifications for the smaller k and n values: (1) setting 5\delta=1+\sqrt{\mathsf{size}(B_{\mathsf{bucket}})} and (2) setting \mathsf{max\_out} to k/20. All experimental results in Sections 3 and 4 are means over 10 random trials.
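The crossover point can be checked numerically (a quick sketch using natural logarithms, consistent with the 1.18\times 10^{21} figure quoted above):

```python
import math

def threshold_gap(n):
    """LHS minus RHS of (25/2) * log log n <= log n; the FBSR threshold
    condition 5*delta <= log n holds once this becomes negative."""
    return 12.5 * math.log(math.log(n)) - math.log(n)
```

Evaluating the gap on either side of 1.18e21 confirms that the inequality fails for smaller n and holds past the crossover.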

Synthetic Data: We use the following parameters in our testing: (1) n = 200, (2) k = 100, (3) p = 20 log n/n, and (4) w_i ∼ Uniform(1, 100), normalized such that ∑π̃_i = 1. We compare our algorithm with the Rank-Centrality algorithm on synthetic data. We consider four potential strategies used by the Byzantine voters:

  1. Fixed Order Vote: The Byzantine voters vote according to a random pre-determined order. This is a fairly simple strategy that does not even require the Byzantine voters to know the π̃ values.

  2. Opposite Vote: The Byzantine voters vote exactly the reverse of the weights. We showed in Section 3.1 that this strategy causes failure for some weights.

  3. Opposite Vote Probabilistic: The Byzantine voters behave like good voters, except their final answer is flipped: they vote for object j with probability w_i/(w_j + w_i) and for object i with probability w_j/(w_j + w_i).

  4. Random Subset: For each pair, the Byzantine voters either behave like good voters with probability one half or cast the opposite vote.
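The four strategies above can be sketched as follows (a minimal illustration, not the paper's code; the function names and the 0/1 vote encoding, with 1 meaning "i beats j", are ours):

```python
import random

def good_vote(w, i, j):
    # Honest BTL vote: i beats j with probability w_i / (w_i + w_j).
    return 1 if random.random() < w[i] / (w[i] + w[j]) else 0

def fixed_order_vote(order, i, j):
    # (1) Vote by a pre-determined random order (no knowledge of w needed).
    return 1 if order.index(i) < order.index(j) else 0

def opposite_vote(w, i, j):
    # (2) Deterministically vote against the true weights.
    return 1 if w[i] < w[j] else 0

def opposite_vote_probabilistic(w, i, j):
    # (3) A BTL vote with the two winning probabilities swapped.
    return 1 if random.random() < w[j] / (w[i] + w[j]) else 0

def random_subset_vote(w, i, j):
    # (4) Per pair, behave honestly or oppositely with probability 1/2 each.
    if random.random() < 0.5:
        return good_vote(w, i, j)
    return opposite_vote(w, i, j)
```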

Figures 1 and 2 demonstrate the performance of the Rank-Centrality algorithm and the FBSR algorithm against the aforementioned strategies. In Figure 1, we plot the relative L₂-error against the Byzantine fraction (F/K). Consistent with our results in Section 3.1, we see that for most strategies the relative L₂-error of the Rank-Centrality algorithm grows linearly with F/K. The relative L₂-errors for both the Fixed Order Vote and Opposite Vote strategies are fairly high for the Rank-Centrality algorithm, and Fixed Order Vote dominates the relative L₂-error for the FBSR algorithm. However, even though Fixed Order Vote is the best strategy against the FBSR algorithm, the FBSR algorithm still performs around two times better than the Rank-Centrality algorithm across all values of F/K.

We also confirm that our rankings are close to the true rankings. To do this, we compute Kendall's Tau correlation [Kendall, 1938] between our ranking and the true ranking. From Figure 2, we see that against the Rank-Centrality algorithm the Opposite Vote strategy yields very low values of τ. When the Byzantine fraction is around 0.3, we get a Kendall's Tau correlation close to 0 (essentially a random ranking). We also see that the FBSR algorithm gives better results against all strategies.
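Kendall's tau can be computed directly from two rankings (a self-contained tau-a sketch; it ignores ties, which suffices when both rankings are permutations):

```python
from itertools import combinations

def kendall_tau(pos_a, pos_b):
    """Kendall's tau-a: pos_a[x] and pos_b[x] give object x's position
    in the two rankings; +1 means identical, -1 means reversed."""
    n = len(pos_a)
    concordant = discordant = 0
    for x, y in combinations(range(n), 2):
        s = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value near 0, as observed for Rank-Centrality at a Byzantine fraction of about 0.3, is what a uniformly random ranking would give in expectation.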

Figure 1: L₂ vs F/K
Figure 2: τ vs F/K
Figure 3: L₂ vs n

To further show the convergence of the algorithms in the Byzantine-BTL model for larger n, we plot the relative L₂ error and Kendall's Tau correlation against the number of objects in Figures 3 and 4 for the Fixed Order Vote strategy. We consider values of F/K in [0.1, 0.2] and set k = n. For a large range of the number of objects, both metrics remain constant for the Rank-Centrality algorithm, supporting our hypothesis. On the other hand, for the FBSR algorithm, as we increase the number of objects the L₂-error gradually falls and the Kendall's Tau correlation gradually rises. The rather slow descent of the relative L₂-error is backed by Theorem 8, where we see that for larger k the relative L₂ error is in 𝒪(√(log log n/log n)).

Real Data: Experimentation with real data is not straightforward, since most datasets have neither voter labels nor voter queries of the form used by the BSR algorithm (one voter is always asked multiple queries of the form (i, j) where i is fixed and j varies). To get around this issue, we use complete rankings. We consider the Sushi dataset, comprising 5000 voters each ordering 10 sushis from most preferred to least preferred. Given a preference order, we extract 45 pairwise votes from each voter. Since we are dealing with permutations here, we assume the Byzantine voters can use the following strategies:

  1. Fixed Order Vote / Opposite Vote: Same as previously defined, but this time an entire permutation is given.

  2. Opposite Random Flips: Each Byzantine voter starts with the opposite permutation and then swaps some of the objects at random.
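The rank-breaking step and the last strategy can be sketched as follows (an illustrative transcription; `opposite_random_flips` and its swap count are our naming and parameters, which the paper does not fix):

```python
from itertools import combinations
import random

def pairwise_votes(pref):
    # Break a full preference order (most to least preferred) into
    # C(n, 2) pairwise votes, recorded as (winner, loser) pairs.
    return [(pref[a], pref[b]) for a, b in combinations(range(len(pref)), 2)]

def opposite_random_flips(pref, n_swaps, rng=random):
    # Start from the reversed permutation, then swap random positions.
    order = list(reversed(pref))
    for _ in range(n_swaps):
        a, b = rng.randrange(len(order)), rng.randrange(len(order))
        order[a], order[b] = order[b], order[a]
    return order

votes = pairwise_votes(list(range(10)))  # 10 sushis -> 45 pairwise votes
```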

Since the Sushi dataset has many voters, we do not use the second modification mentioned above. We compute the true weights by applying the Rank-Centrality algorithm directly (giving the Rank-Centrality algorithm a big advantage, especially for smaller weights). Figures 5 and 6 show the performance of the Rank-Centrality and BSR algorithms. We see that the BSR algorithm achieves better relative L₂-errors and Kendall's Tau correlations for most Byzantine fractions and strategies. The comparatively small improvement in Kendall's Tau correlation is attributable to the rather small differences in the weights of the various sushis (5 sushis have weights in [0.085, 0.13]).

Figure 4: τ vs n
Figure 5: Sushi: L₂
Figure 6: Sushi: τ

5 Conclusion and Further Work

We studied the disruption that Byzantine voters can cause to the widely-used Rank-Centrality algorithm and proposed fast algorithms to counter it. Our algorithms and analysis techniques are quite non-trivial and indicate the need to develop this theory further. We require d ∈ Θ(log n) and k ∈ Ω(log n) for the success of the BSR and FBSR algorithms, whereas Negahban et al. [2017] only require d ∈ Ω(log n) and k ∈ Ω(1). An interesting direction would be to explore whether sub-logarithmic values of k are possible. Alternatively, a lower bound on the number of votes per edge required for any algorithm to succeed with constant probability would also be an interesting direction.

References

  • Agarwal et al. [2018] A. Agarwal, P. Patil, and S. Agarwal. Accelerated spectral ranking. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 70–79. PMLR, 10–15 Jul 2018. URL https://proceedings.mlr.press/v80/agarwal18b.html.
  • Agarwal et al. [2020] A. Agarwal, S. Agarwal, S. Khanna, and P. Patil. Rank aggregation from pairwise comparisons in the presence of adversarial corruptions. In H. D. III and A. Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 85–95. PMLR, 7 2020. URL https://proceedings.mlr.press/v119/agarwal20a.html.
  • Baltrunas et al. [2010] L. Baltrunas, T. Makcinskas, and F. Ricci. Group recommendations with rank aggregation and collaborative filtering. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, page 119–126, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781605589060. doi: 10.1145/1864708.1864733. URL https://doi.org/10.1145/1864708.1864733.
  • Blanchard et al. [2017] P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper/2017/file/f4b9ec30ad9f68f89b29639786cb62ef-Paper.pdf.
  • Bradley and Terry [1952] R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
  • Brin and Page [1998] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International Conference on World Wide Web 7, WWW7, page 107–117, NLD, 1998. Elsevier Science Publishers B. V.
  • Caplin and Nalebuff [1991] A. Caplin and B. Nalebuff. Aggregation and social choice: A mean voter theorem. Econometrica, 59(1):1–23, 1991. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/2938238.
  • Chen et al. [2013] X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM ’13, page 193–202, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450318693. doi: 10.1145/2433396.2433420. URL https://doi.org/10.1145/2433396.2433420.
  • Chen and Suh [2015] Y. Chen and C. Suh. Spectral MLE: Top-k rank aggregation from pairwise comparisons. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML’15, pages 371–380. JMLR.org, 2015.
  • Chen et al. [2018] Y. Chen, L. Su, and J. Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. SIGMETRICS Perform. Eval. Rev., 46(1):96, jun 2018. ISSN 0163-5999. doi: 10.1145/3292040.3219655. URL https://doi.org/10.1145/3292040.3219655.
  • Cucuringu [2016] M. Cucuringu. Sync-rank: Robust ranking, constrained ranking and rank aggregation via eigenvector and sdp synchronization. IEEE Transactions on Network Science and Engineering, 3(1):58–79, 2016. doi: 10.1109/TNSE.2016.2523761.
  • Dwork et al. [2001] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th International Conference on World Wide Web, WWW ’01, page 613–622, New York, NY, USA, 2001. Association for Computing Machinery. ISBN 1581133480. doi: 10.1145/371920.372165. URL https://doi.org/10.1145/371920.372165.
  • El Mhamdi et al. [2018] E. M. El Mhamdi, R. Guerraoui, and S. Rouault. The hidden vulnerability of distributed learning in Byzantium. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3521–3530. PMLR, 10–15 Jul 2018. URL https://proceedings.mlr.press/v80/mhamdi18a.html.
  • El-Mhamdi et al. [2021] E. M. El-Mhamdi, S. Farhadkhani, R. Guerraoui, A. Guirguis, L.-N. Hoang, and S. Rouault. Collaborative learning in the jungle (decentralized, byzantine, heterogeneous, asynchronous and nonconvex learning). In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 25044–25057. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper/2021/file/d2cd33e9c0236a8c2d8bd3fa91ad3acf-Paper.pdf.
  • Ford [1957] L. R. Ford. Solution of a ranking problem from binary comparisons. The American Mathematical Monthly, 64(8):28–33, 1957. ISSN 00029890, 19300972. URL http://www.jstor.org/stable/2308513.
  • Kendall [1938] M. G. Kendall. A NEW MEASURE OF RANK CORRELATION. Biometrika, 30(1-2):81–93, 06 1938. ISSN 0006-3444. doi: 10.1093/biomet/30.1-2.81. URL https://doi.org/10.1093/biomet/30.1-2.81.
  • Li et al. [2020] J. Li, W. Abbas, and X. Koutsoukos. Byzantine resilient distributed multi-task learning. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 18215–18225. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/file/d37eb50d868361ea729bb4147eb3c1d8-Paper.pdf.
  • Luce [1959] R. D. Luce. Individual Choice Behavior. John Wiley, 1959.
  • Mohajer and Suh [2016] S. Mohajer and C. Suh. Active top-k ranking from noisy comparisons. 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 875–882, 2016.
  • Negahban et al. [2017] S. Negahban, S. Oh, and D. Shah. Rank centrality: Ranking from pairwise comparisons. Operations Research, 65(1):266–287, 2017.
  • Rajkumar and Agarwal [2014] A. Rajkumar and S. Agarwal. A statistical convergence perspective of algorithms for rank aggregation from pairwise data. In E. P. Xing and T. Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 118–126, Beijing, China, 22–24 Jun 2014. PMLR. URL https://proceedings.mlr.press/v32/rajkumar14.html.
  • Soufiani et al. [2014] H. A. Soufiani, D. Parkes, and L. Xia. Computing parametric ranking models via rank-breaking. In E. P. Xing and T. Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 360–368, Beijing, China, 6 2014. PMLR. URL https://proceedings.mlr.press/v32/soufiani14.html.
  • Suh et al. [2017] C. Suh, V. Y. F. Tan, and R. Zhao. Adversarial top-k ranking. IEEE Trans. Inf. Theor., 63(4):2201–2225, 4 2017. ISSN 0018-9448. doi: 10.1109/TIT.2017.2659660. URL https://doi.org/10.1109/TIT.2017.2659660.
  • Sun et al. [2021] G. Sun, Y. Cong, J. Dong, Q. Wang, L. Lyu, and J. Liu. Data poisoning attacks on federated machine learning. IEEE Internet of Things Journal, 2021.
  • Wu et al. [2021] Y. Wu, H. Chen, X. Wang, C. Liu, P. Nguyen, and Y. Yesha. Tolerating adversarial attacks and byzantine faults in distributed machine learning. In 2021 IEEE International Conference on Big Data (Big Data), pages 3380–3389, 2021. doi: 10.1109/BigData52589.2021.9671583.

Updated Checklist

  1. For all authors…

     (a) Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? [Yes]
     (b) Did you describe the limitations of your work? [Yes]
     (c) Did you discuss any potential negative societal impacts of your work? [Yes]
     (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]

  2. If you are including theoretical results…

     (a) Did you state the full set of assumptions of all theoretical results? [Yes]
     (b) Did you include complete proofs of all theoretical results? [Yes]

  3. If you ran experiments…

     (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
     (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]
     (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [Yes]
     (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]

  4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets…

     (a) If your work uses existing assets, did you cite the creators? [N/A]
     (b) Did you mention the license of the assets? [N/A]
     (c) Did you include any new assets either in the supplemental material or as a URL? [N/A]
     (d) Did you discuss whether and how consent was obtained from people whose data you’re using/curating? [N/A]
     (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]

  5. If you used crowdsourcing or conducted research with human subjects…

     (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
     (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
     (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]

Appendix A Appendix

A.1 A primer on Rank-Centrality

A.1.1 Rank-Centrality Convergence

Negahban et al. [2017] prove the following theorem for the Rank-Centrality algorithm:

Theorem 10 (From Negahban et al. [2017]).

Given n objects and a connected comparison graph G = ([n], E), let each pair (i, j) ∈ E be compared k times with outcomes produced as per a BTL model with parameters w_1, …, w_n. Then, for some positive constant C ≥ 8 and when k ≥ 4C²(b⁵κ²/d_max ξ²) log n, the following bound on the normalized error holds with probability at least 1 − 4n^{−C/8}:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq\frac{Cb^{5/2}\kappa}{\xi}\sqrt{\frac{\log n}{kd_{max}}}

where π̃(i) = w_i/∑_l w_l, b ≡ max_{i,j} w_i/w_j, and κ = d_max/d_min.

which when extended to Erdős-Rényi random graphs takes the form of

Theorem 11 (From Negahban et al. [2017]).

Given n objects, let the comparison graph G = ([n], E) be generated by selecting each pair (i, j) to be in E with probability d/n, independently of everything else. Each such chosen pair of objects is compared k times, with the outcomes of comparisons produced as per a BTL model with parameters w_1, …, w_n. Then if d ≥ 10C² log n and kd ≥ 128C² b⁵ log n, the following bound on the error rate holds with probability at least 1 − 10n^{−C/8}:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq 8Cb^{5/2}\sqrt{\frac{\log n}{kd_{max}}}

where π̃(i) = w_i/∑_l w_l and b ≡ max_{i,j} w_i/w_j.

A.1.2 Rank-Centrality Convergence Proof Sketch

The proof from Theorem 10 to Theorem 11 is straightforward and does not depend on the presence of Byzantine voters. The convergence proof of the Rank-Centrality algorithm for Theorem 10 requires proving three lemmas (Lemmas 12, 13, and 14 below).

Lemma 12 (From Negahban et al. [2017]).

For any Markov chain P = P̃ + Δ with P̃ a reversible Markov chain, let p_t be the distribution of the Markov chain P when started from the initial distribution p_0. Then,

\frac{\lVert p_{t}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq\rho^{t}\,\frac{\lVert p_{0}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\sqrt{\frac{\tilde{\pi}_{max}}{\tilde{\pi}_{min}}}+\frac{1}{1-\rho}\lVert\Delta\rVert_{2}\sqrt{\frac{\tilde{\pi}_{max}}{\tilde{\pi}_{min}}}

where π̃ is the stationary distribution of P̃, π̃_min = min_i π̃(i), π̃_max = max_i π̃(i), and ρ = λ_max(P̃) + ‖Δ‖₂ √(π̃_max/π̃_min).

The next lemma bounds the norm of the Δ matrix:

Lemma 13 (From Negahban et al. [2017]).

For some constant C ≥ 8, the error matrix Δ = P − P̃ satisfies

\lVert\Delta\rVert_{2}\leq C\sqrt{\frac{\log n}{kd_{max}}}

with probability at least 1 − 4n^{−C/8}.

The next lemma bounds the value of 1 − ρ:

Lemma 14 (From Negahban et al. [2017]).

If ‖Δ‖₂ ≤ C√(log n/(k d_max)) and k ≥ 4C² b⁵ d_max log n/(d_min² ξ²), then

1-\rho\geq\frac{\xi d_{min}}{b^{2}d_{max}}

Here we find that Lemmas 12 and 14 hold irrespective of whether there are Byzantine voters or not. Therefore we need to find a substitute for Lemma 13. The proof of Lemma 13 is split into two parts; we mainly focus on the first part, where d is sufficiently small and the graph is therefore sparse. Proving Lemma 13 for Erdős-Rényi graphs with lower degrees essentially boils down to proving that R_i = (1/(k d_max)) ∑_{j≠i} |A_ij − k p_ij| ≥ s occurs with very low probability, where s = (C/2) √((log n + d_max log 2)/(k d_max)). In simpler terms, this means that, with high probability, the sum of the absolute deviations of the voters' entries from their expected entries for any particular object is not large.
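In code, the per-object deviation statistic at the heart of this step reads as follows (a direct transcription of the formula; here `A_row[j]` plays the role of the observed count A_ij and `p_row[j]` the BTL probability p_ij):

```python
def deviation_statistic(A_row, p_row, k, d_max, i):
    # R_i = (1 / (k * d_max)) * sum over j != i of |A_ij - k * p_ij|
    total = sum(abs(A_row[j] - k * p_row[j])
                for j in range(len(A_row)) if j != i)
    return total / (k * d_max)
```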

A.2 Rank-Centrality Failure

Proof.

Negahban et al. [2017] define an inner product space L²(π̃) as the space of n-dimensional vectors in ℝⁿ endowed with

\langle a,b\rangle_{\tilde{\pi}}=\sum_{i=1}^{n}a_{i}\tilde{\pi}_{i}b_{i}.

Similarly, they define ‖a‖_π̃ = √⟨a, a⟩_π̃ as the 2-norm in L²(π̃) and ‖A‖_{π̃,2} = max_a ‖Aa‖_π̃/‖a‖_π̃ as the operator norm for a self-adjoint operator in L²(π̃). They also relate these norms to their Euclidean counterparts as

\sqrt{\tilde{\pi}_{min}}\lVert a\rVert\leq\lVert a\rVert_{\tilde{\pi}}\leq\sqrt{\tilde{\pi}_{max}}\lVert a\rVert,\qquad(3)

\sqrt{\frac{\tilde{\pi}_{min}}{\tilde{\pi}_{max}}}\lVert A\rVert_{2}\leq\lVert A\rVert_{\tilde{\pi},2}\leq\sqrt{\frac{\tilde{\pi}_{max}}{\tilde{\pi}_{min}}}\lVert A\rVert_{2}.\qquad(4)

Finally, Negahban et al. [2017] define the following matrices:

  1. Π̃: the diagonal matrix with Π̃_ii = π̃_i.

  2. S := Π̃^{1/2} P̃ Π̃^{−1/2}.

  3. S₁: the rank-1 projection of S.

  4. P̃₁ := Π̃^{−1/2} S₁ Π̃^{1/2}.

Using the above results Negahban et al. [2017] show that:

(p_{t}-\tilde{\pi})^{T}=(p_{t-1}-\tilde{\pi})^{T}(\tilde{P}-\tilde{P}_{1}+\Delta)+\tilde{\pi}^{T}\Delta\qquad(5)

where p_t is the distribution of the Markov chain P after t iterations; for large t we have p_t → π. By the definition of P̃₁, it follows that ‖P̃ − P̃₁‖_{π̃,2} = ‖S − S₁‖ = λ_max. Here λ_max corresponds to the second-largest eigenvalue of the P̃ matrix, which by the Perron-Frobenius theorem is λ_max = max(λ₂, |λ_n|). We can rearrange Equation 5 to get:

\lVert\tilde{\pi}^{T}\Delta\rVert_{\tilde{\pi}}\leq\lVert p_{t}-\tilde{\pi}\rVert_{\tilde{\pi}}+(\lVert\tilde{P}-\tilde{P}_{1}\rVert_{\tilde{\pi},2}+\lVert\Delta\rVert_{\tilde{\pi},2})\cdot\lVert p_{t-1}-\tilde{\pi}\rVert_{\tilde{\pi}}\leq\lVert p_{t}-\tilde{\pi}\rVert_{\tilde{\pi}}+\rho\cdot\lVert p_{t-1}-\tilde{\pi}\rVert_{\tilde{\pi}}

where ρ = λ_max + ‖Δ‖_{π̃,2}. Choosing a large enough t, by which point the Markov chain has converged, we get:

\lVert\tilde{\pi}^{T}\Delta\rVert_{\tilde{\pi}}\leq(1+\rho)\cdot\lVert\pi-\tilde{\pi}\rVert_{\tilde{\pi}}

We then use the bounds in Equations 3 and 4 to get:

\frac{\lVert\pi-\tilde{\pi}\rVert_{\tilde{\pi}}}{\lVert\tilde{\pi}\rVert}\geq\frac{\lVert\tilde{\pi}^{T}\Delta\rVert_{\tilde{\pi}}}{\lVert\tilde{\pi}\rVert\cdot(1+\rho)}

\sqrt{\tilde{\pi}_{max}}\cdot\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\sqrt{\tilde{\pi}_{min}}\cdot\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert\cdot(1+\rho)}

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\sqrt{\frac{\tilde{\pi}_{min}}{\tilde{\pi}_{max}}}\cdot\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert\cdot\left(1+\lambda_{max}+\sqrt{\frac{\tilde{\pi}_{max}}{\tilde{\pi}_{min}}}\cdot\lVert\Delta\rVert_{2}\right)}

Since λ_max ≤ 1, we get:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert\cdot(2\sqrt{b}+b\lVert\Delta\rVert_{2})}\qquad(6)

Using Lemmas 15 and 16 in Equation 6, we conclude that:

\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{C\log n}{2\sqrt{b}+b+2bC\log n}\cdot\frac{b-1}{8\sqrt{2}b(b+1)}\cdot\frac{F}{K}\qquad(7)

with probability ≥ 1 − 1/n^{C/3−1} − 1/n^{knCF²/64K²} − 1/n^{C² log n/8}. We can see that for large enough n the RHS converges to a constant with high probability. ∎

Lemma 15.

For the opposite-voting Byzantine strategy, we have

\lVert\Delta\rVert_{2}\leq 1+2C\log n

with probability ≥ 1 − 1/n^{C/3−1}.

Proof.

We first upper-bound the denominator of Equation 6. We know that

\lVert\Delta\rVert_{2}\leq\sqrt{\lVert\Delta\rVert_{1}\cdot\lVert\Delta\rVert_{\infty}}

where the two norms on the right are the maximum absolute column and row sums. Since every entry of Δ is the difference between corresponding entries of P and P̃, each entry lies between −1 and 1. Therefore, both the maximum column sum and the maximum row sum are at most 1 + d_max. By a Chernoff bound, the probability of a vertex having degree greater than 2C log n satisfies ℙ(d_i ≥ 2C log n) ≤ e^{−C log n/3}, and by a union bound, ℙ(d_max ≥ 2C log n) ≤ n e^{−C log n/3}. Setting C ≥ 3, we get that d_max ≤ 2C log n with high probability. Therefore, we conclude that ‖Δ‖₂ ≤ 1 + 2C log n. ∎
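The matrix-norm step ‖Δ‖₂ ≤ √(‖Δ‖₁·‖Δ‖∞) used at the start of this proof can be sanity-checked numerically (a pure-Python sketch; the power-iteration helper and the example matrix are ours):

```python
import math

def spectral_norm(M, iters=500):
    # Largest singular value of M, via power iteration on M^T M.
    m, n = len(M), len(M[0])
    v = [1.0 + 0.1 * c for c in range(n)]  # deterministic non-zero start
    for _ in range(iters):
        Mv = [sum(M[r][c] * v[c] for c in range(n)) for r in range(m)]
        w = [sum(M[r][c] * Mv[r] for r in range(m)) for c in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[r][c] * v[c] for c in range(n)) for r in range(m)]
    return math.sqrt(sum(x * x for x in Mv))

def one_norm(M):  # ||M||_1: maximum absolute column sum
    return max(sum(abs(M[r][c]) for r in range(len(M))) for c in range(len(M[0])))

def inf_norm(M):  # ||M||_inf: maximum absolute row sum
    return max(sum(abs(x) for x in row) for row in M)

Delta = [[0.2, -0.5], [0.3, 0.1]]
bound = math.sqrt(one_norm(Delta) * inf_norm(Delta))  # Holder-type bound
```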

Lemma 16.

For the opposite-voting Byzantine strategy, there exists a π̃ such that

\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{(b-1)C\log n}{8\sqrt{2}b(b+1)}\cdot\frac{F}{K}

with probability ≥ 1 − 1/n^{knCF²/64K²} − 1/n^{C² log n/8}.

Proof.

We now lower-bound the RHS of Equation 6. We consider the weights:

\tilde{\pi}_{i}=\begin{cases}\frac{1}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }1\leq i\leq\lfloor\frac{n}{2}\rfloor\\ \frac{b}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }\lfloor\frac{n}{2}\rfloor<i\leq n\end{cases}
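This two-level construction can be written out directly (a small sketch; by construction the weights sum to one):

```python
import math

def adversarial_weights(n, b):
    # First floor(n/2) objects get weight 1, the remaining ceil(n/2)
    # get weight b; the normalizer is n + (b - 1) * ceil(n / 2).
    half = n // 2
    Z = n + (b - 1) * math.ceil(n / 2)
    return [1 / Z] * half + [b / Z] * (n - half)
```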

Let v denote the resulting vector π̃ᵀΔ. For i = 1 to ⌊n/2⌋ we have:

v_{i}=\sum_{(i,j)\in E}\tilde{\pi}_{j}\Delta_{ji}+\tilde{\pi}_{i}\Delta_{ii}
=\sum_{(i,j)\in E,\ j\leq\lfloor n/2\rfloor}\frac{\Delta_{ji}}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}+\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\frac{b\Delta_{ji}}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}+\tilde{\pi}_{i}\Delta_{ii}
=\frac{(b-1)}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\Delta_{ji}\qquad\left(\text{since }\sum_{(i,j)\in E}\Delta_{ji}+\Delta_{ii}=0\right)

Given the above relation for v_i, we can say that

\sum_{i=1}^{\lfloor n/2\rfloor}v_{i}^{2}=\left(\frac{(b-1)}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}\right)^{2}\cdot\left(\sum_{i=1}^{\lfloor n/2\rfloor}\left(\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\Delta_{ji}\right)^{2}\right)
\geq\left(\frac{(b-1)}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}\right)^{2}\cdot\frac{1}{\lfloor\frac{n}{2}\rfloor}\cdot\left(\sum_{i=1}^{\lfloor n/2\rfloor}\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\Delta_{ji}\right)^{2}\qquad\text{(Cauchy-Schwarz)}

Since the first two factors are known, we are primarily interested in a high-probability bound on the third. We are only concerned with the ⌊n/2⌋·⌈n/2⌉ ∼ n²/4 entries of the adjacency matrix crossing the two halves, each of which is an edge with probability p. Since p = C log n/n, we can conclude there will be at least pn²/8 such edges with probability

\geq 1-e^{-\frac{p^{2}n^{2}}{8}}\geq 1-\frac{1}{n^{C^{2}\log n/8}}

Furthermore, it is easy to see that a single Byzantine voter amongst the k voters on an edge causes a deviation of b/((b+1)k), whereas a good voter causes no deviation in expectation. Therefore, in expectation, the deviation caused by a randomly chosen voter is Fb/((b+1)kK). We are summing k·pn²/8 binomial random variables; hence, by a Chernoff bound, the total deviation over the kpn²/8 edges will be at least (bF/(2Kk(b+1)))·(kpn²/8) with probability

\geq 1-e^{-\frac{F^{2}}{64K^{2}}\cdot Ckn\log n}=1-\frac{1}{n^{knCF^{2}/64K^{2}}}

Therefore we can claim that:

\sum_{i=1}^{\lfloor n/2\rfloor}\sum_{(i,j)\in E,\ j>\lfloor n/2\rfloor}\Delta_{ji}\geq\frac{Fkpn^{2}}{16K}\cdot\frac{b}{(b+1)k}\geq\frac{FCnb\log n}{16K(b+1)}

Substituting into the above equation and considering large n, we get:

\sum_{i=1}^{\lfloor n/2\rfloor}v_{i}^{2}\geq\frac{2}{n}\cdot\left(\frac{b-1}{nb}\cdot\frac{FCnb\log n}{32K(b+1)}\right)^{2}

Coming back to our objective of bounding ‖π̃ᵀΔ‖, we can say that:

\lVert\tilde{\pi}^{T}\Delta\rVert\geq\frac{(b-1)FC\log n}{16(b+1)K}\cdot\sqrt{\frac{1}{2n}}

We can also see that the maximum term of π̃ is at most b/(b+n−1). Therefore, ‖π̃‖ is at most √n·b/(b+n−1), which for constant values of b essentially comes out to ≤ b/(2√n).

Therefore we claim that:

\frac{\lVert\tilde{\pi}^{T}\Delta\rVert}{\lVert\tilde{\pi}\rVert}\geq\frac{(b-1)C\log n}{8\sqrt{2}b(b+1)}\cdot\frac{F}{K}

with probability ≥ 1 − 1/n^{knCF²/64K²} − 1/n^{C² log n/8}. ∎

A.3 BSR Algorithm Analysis

We start off by proving Lemma 6.

Proof.

We first prove a few lemmas to find the probability that, for some ξ, a voter will be at a distance δ away from the mean.

Lemma 17.

For each good voter v and each ξ ∈ {1, −1}^d, let D_{i,v} = ∑_{j≠i} ξ_j · y_{i,j,v}, where y_{i,j,v} is the vote cast by voter v when queried about (i, j). Then D_{i,v} is at distance ≥ δ from its expected mean with probability at most 2 exp(−2δ²/d_i).

Proof.

We can directly apply Hoeffding's inequality since, even after being multiplied by ξ, the range of y_{i,j,v} remains between −1 and +1. Therefore, we can conclude that with probability ≥ 1 − 2 exp(−2δ²/d_i), D_{i,v} does not deviate from its mean by more than δ. ∎

Corollary 18.

When δ ≥ √((Q/2)·d_i log k), the probability of being at most δ away from the mean is ≥ 1 − 2e^{−Q log k} = 1 − 2k^{−Q}.

Proof.

Trivially true using Lemma 17. ∎

Lemma 19.

If δ = √((Q/2)·d_i log k) and F ≤ K(1−ϵ)/2, then with probability

\geq 1-\exp\left(-\frac{k\epsilon}{18}+d_{i}\log 2\right)-\exp\left(-\frac{k\epsilon^{2}}{2}\right)

the predicted mean will be 𝒪(δ) away from the observed mean for all ξ.

Proof.

We define a voter to be in good range if its statistic is within \delta of the mean; otherwise, we say it is in bad range. We now make two claims:

  1. (1)

At least a (1-\epsilon_{1}) fraction of the good voters will be in good range, with probability \geq 1-\exp(-k\epsilon_{1}/6+d_{i}\log 2), for all values of \xi.

  2. (2)

With probability \geq 1-\exp(-\frac{k\epsilon^{2}}{2}) we will have f<\frac{k}{2}\cdot(1-\frac{\epsilon}{2}), where f is the number of Byzantine voters amongst the k chosen voters.

Proof of Claim (1): From Corollary 18 we know that a voter will be in good range with probability \geq 1-2k^{-Q}. Therefore, by the Chernoff bound (setting \mu=k^{1-Q}/2 and \delta=2\epsilon_{1}k^{Q}-1), for any fixed \xi the probability that at least a (1-\epsilon_{1}) fraction of the voters are in good range is \geq 1-\exp\left(-k\epsilon_{1}/6\right). For this to apply over all 2^{d_{i}} values of \xi, we multiply the failure probability by 2^{d_{i}}, which yields the statement of Claim (1). It should be noted that for Claim (1) to hold with high probability we need

k6ϵ1(clogn+dmaxlog2)k\geq\frac{6}{\epsilon_{1}}(c^{\ast}\log n+d_{max}\log 2)

Proof of Claim (2): Since F\leq\frac{K(1-\epsilon)}{2}, we can apply Hoeffding's inequality to conclude the statement of Claim (2).

By Claim (1), at least (1-\epsilon_{1})\cdot(k-f) good voters lie in good range with probability \geq 1-\exp(-k\epsilon_{1}/6+d_{i}\log 2). By Claim (2), (k-f) is at least \frac{k}{2}\cdot(1+\frac{\epsilon}{2}) with probability \geq 1-\exp(-\frac{k\epsilon^{2}}{2}). Therefore, by a simple application of the union bound, there are at least \frac{k}{2}\cdot(1-\epsilon_{1})\cdot(1+\frac{\epsilon}{2}) good voters in good range with probability \geq 1-\exp(-\frac{k\epsilon^{2}}{2})-\exp(-k\epsilon_{1}/6+d_{i}\log 2). By choosing the value of \epsilon_{1} such that

ϵ1ϵ2+ϵ\epsilon_{1}\leq\frac{\epsilon}{2+\epsilon} (8)

we can see that at least k/2 good voters will be in good range whp if k\in\Omega(\log n). We can set \epsilon_{1}=\epsilon/3 to satisfy Equation 8.

Since we know that at least k/2 of the voters lie in the good range, the median of all the voters' (good and Byzantine) outputs also lies in the good range. Therefore the sample median of the good voters is at most 2\delta away from the predicted median.

While we have this result, what we ideally want is a good predictor of the expected mean. We have already shown that the predicted median is at a distance of at most 2\delta from the sample median of the good voters. We now show that the sample median of the good voters is also very close to the expected mean. For this sample median to be more than 2\delta away from the expected mean, fewer than 50\% of the entries would have to be within \delta of the expected mean. However, since we are assuming Claim (1) is in effect, we know that more than 50\% of the entries are in good range with high probability. We can therefore conclude that the predicted median is at most 4\delta away from the expected mean. Therefore, if we consider entries within 5\delta of the predicted median, this includes the entries that are within \delta of the mean, encapsulating a large fraction of the voters while ensuring that any chosen voter is at most 9\delta away from the expected mean. ∎
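The median step above can be illustrated with a small sketch: as long as strictly fewer than half of the k voters are Byzantine, the overall median is sandwiched by good-voter values, no matter how extreme the colluding outputs are. All the numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
k, f = 101, 45                           # f < k/2 Byzantine voters among k
mu = 50.0                                # expected mean of the good statistics

good = rng.normal(mu, 1.0, size=k - f)   # concentrated good-voter statistics
byz = np.full(f, 1e9)                    # colluding outliers, arbitrarily far away

median = np.median(np.concatenate([good, byz]))
# Since the good voters form a majority, the median is itself a good-voter value:
assert good.min() <= median <= good.max()
```

Shifting all Byzantine values to -1e9 instead moves the median to a different good-voter value, but never outside the good voters' range.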

We will now finally prove Lemma 6. As in the Rank-Centrality algorithm's analysis, we prove this in two parts. We know that \Delta=D+\bar{\Delta}, where D is a diagonal matrix consisting only of the diagonal entries of \Delta. By the triangle inequality, \lVert\Delta\rVert_{2}\leq\lVert D\rVert_{2}+\lVert\bar{\Delta}\rVert_{2}. We now separately show that both \lVert D\rVert_{2} and \lVert\bar{\Delta}\rVert_{2} are o\left(1\right).
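The two matrix-norm facts used in this decomposition, the triangle inequality for the spectral norm and the bound \lVert M\rVert_{2}\leq\sqrt{\lVert M\rVert_{1}\lVert M\rVert_{\infty}} (which for a symmetric matrix reduces to the maximum absolute row sum), can be checked numerically on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
Delta = (A + A.T) / 2                    # symmetric stand-in for Delta
D = np.diag(np.diag(Delta))              # diagonal part
Dbar = Delta - D                         # off-diagonal part

spec = lambda M: np.linalg.norm(M, 2)    # spectral norm
# Triangle inequality for the spectral norm:
assert spec(Delta) <= spec(D) + spec(Dbar) + 1e-9

# For symmetric M, ||M||_1 = ||M||_inf = max absolute row sum:
row_sum = np.abs(Dbar).sum(axis=1).max()
assert spec(Dbar) <= row_sum + 1e-9
```

The row-sum bound is exactly why the proof of Lemma 20 below only needs to control the maximum row sum R_i of \bar{\Delta}.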

Lemma 20.

Algorithm 2 removes some voters such that we have:

Δ¯240max(dmaxk,logkdmin)\lVert\bar{\Delta}\rVert_{2}\leq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

with probability 18n1.5C21\geq 1-\frac{8}{n^{1.5C^{2}-1}}.

Proof.

We will initially show that the non-diagonal matrix is bounded. We know that:

Δ¯2Δ¯1Δ¯\lVert\bar{\Delta}\rVert_{2}\leq\sqrt{\lVert\bar{\Delta}\rVert_{1}\lVert\bar{\Delta}\rVert_{\infty}}

Since \bar{\Delta} is a symmetric matrix, it suffices to bound the maximum row sum of the absolute values of \bar{\Delta}. To that end we define

Ri=1kdmaxji|Cij|R_{i}=\frac{1}{kd_{max}}\sum_{j\neq i}|C_{ij}|

where Cij=kAijkpijC_{ij}=kA_{ij}-kp_{ij}. Continuing from Equation (22) from Negahban et al. [2017] we get:

(Ris)\displaystyle\mathbb{P}(R_{i}\geq s) =(ji|Cij|>kdis)\displaystyle=\mathbb{P}(\sum_{j\in\partial i}|C_{ij}|>kd_{i}s) (9)
ξ{1,1}di(jiξjCij>kdis)\displaystyle\leq\sum_{\xi\in\{-1,1\}^{d_{i}}}\mathbb{P}\left(\sum_{j\in\partial i}\xi_{j}C_{ij}>kd_{i}s\right) (10)
ξ{1,1}di(ji and goodξjCij>kdis2)+(ji and ByzξjCij>kdis2)\displaystyle\leq\sum_{\xi\in\{-1,1\}^{d_{i}}}\mathbb{P}\left(\sum_{j\in\partial i\text{ and good}}\xi_{j}C_{ij}>\frac{kd_{i}s}{2}\right)+\mathbb{P}\left(\sum_{j\in\partial i\text{ and Byz}}\xi_{j}C_{ij}>\frac{kd_{i}s}{2}\right) (11)

We need to show two things:

  1. (1)

The number of good voters that get eliminated is at most a constant fraction.

  2. (2)

The Byzantine voters that are not eliminated in any round cannot cause the second term to become large.

The first term in Equation 11 works out the same way as in the original analysis up to constant factors, because we will show that only a constant fraction of the good voters can be removed with high probability.

Proof of (1):

We will now show that Line 13 in Algorithm 2 does not eliminate more than a constant fraction of the good voters. This is necessary to bound the good voters' term in Equation 11.

We will now calculate a bound on the maximum number of good out-of-bound (>5\delta away from \hat{m}) entries that can exist across all \xi. We claim that this number is 2k^{1-Q}+2k^{1-\alpha}. By an argument similar to Claim (1), the Chernoff-Hoeffding bound gives the probability of there being this many out-of-bound entries as:

\displaystyle\mathbb{P} (1kYi2kQ+2kα)\displaystyle\left(\frac{1}{k}\sum Y_{i}\geq\frac{2}{k^{Q}}+\frac{2}{k^{\alpha}}\right)
\displaystyle\leq\exp\left(-D\left(\frac{2}{k^{Q}}+\frac{2}{k^{\alpha}}\;\middle\|\;\frac{2}{k^{Q}}\right)k\right)
\displaystyle=\exp\left(-D\left(p+p^{\prime}\;\middle\|\;p\right)k\right) (p^{\prime}=\frac{2}{k^{\alpha}} and p=\frac{2}{k^{Q}})
\displaystyle=\exp\left(\left(-(p+p^{\prime})\cdot\log\left(1+\frac{p^{\prime}}{p}\right)-(1-p-p^{\prime})\log\left(\frac{1-p-p^{\prime}}{1-p}\right)\right)k\right)
\displaystyle\leq\exp\left(\left(-(p+p^{\prime})\cdot\log\left(1+\frac{p^{\prime}}{p}\right)-\log\left(1-p-p^{\prime}\right)\right)k\right)
\displaystyle\leq\exp\left(-(p+p^{\prime})\cdot\left(\log\left(1+\frac{p^{\prime}}{p}\right)-\frac{3}{2}\right)k\right) (-\log(1-t)\leq 3t/2 for t<0.5)
\displaystyle\leq\exp\left(-p^{\prime}k/2\right)=\exp(-k^{1-\alpha})

For the last inequality to hold we need p^{\prime}/p\geq e^{2}-1; in other words, we need k^{Q}\geq k^{\alpha}\cdot(e^{2}-1), which is achieved for all values of \alpha\in[0,1] and large values of k whenever Q>1. However, the probability computed above is for a single \xi. For it to apply over all \xi we need 2^{d_{i}}\cdot\exp(-k^{1-\alpha}) to be very small, which is satisfied when k^{1-\alpha}\geq d_{i}. Since we have defined \alpha=1-\log d_{max}/\log k, this is always true.
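The Chernoff-KL step above can be sanity-checked numerically: under the condition p^{\prime}/p\geq e^{2}-1, the exponent D(p+p^{\prime}\,\|\,p)\,k indeed dominates p^{\prime}k/2. The values of k, Q, and \alpha below are arbitrary illustrative choices.

```python
import math

def kl(a, p):
    """KL divergence D(a || p) between Bernoulli(a) and Bernoulli(p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

k, Q, alpha = 10_000, 2.0, 0.5
p, pp = 2 / k**Q, 2 / k**alpha           # p = 2 k^{-Q},  p' = 2 k^{-alpha}

assert pp / p >= math.e**2 - 1           # the condition k^Q >= k^alpha (e^2 - 1)
# Comparing exponents avoids floating-point underflow of the raw tail bounds:
# exp(-D(p + p' || p) k) <= exp(-p' k / 2)  iff  D(p + p' || p) k >= p' k / 2.
assert kl(p + pp, p) * k >= pp * k / 2
```
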

We see that for a voter to be removed there need to be 8k^{1-Q}+8k^{1-\alpha} out-of-bound entries, so for every good voter that is removed, at least 3 Byzantine voters are removed as well. This ensures that at most k/6 good voters can be removed, and hence after the removal the number of good voters is \geq\frac{k}{3}. Therefore, using the same equation from Negahban et al. [2017], with probability \geq 1-2^{d_{i}}\cdot\exp(-k^{1-\alpha}) we can conclude that:

jiξj{1,1}(jgood|Cij|>kdis2)exp(k3dmaxs2+dilog2)\displaystyle\sum_{j\in\partial i}\sum_{\xi_{j}\in\{-1,1\}}\mathbb{P}\left(\sum_{j\in good}|C_{ij}|>\frac{kd_{i}s}{2}\right)\leq\exp\left(-\frac{k}{3}d_{max}s^{2}+d_{i}\log 2\right) (12)

The above result comes directly from Hoeffding's inequality and is similar to the result in Negahban et al. [2017], as we are only handling good agents here. For this result to hold with high probability we need \frac{ks^{2}}{3}\geq\log 2, since d_{i}\in\Omega(\log n). Therefore we arrive at:

s3log2ks\geq\sqrt{\frac{3\log 2}{k}} (13)

Proof of (2): We will now show that the event in the second term also occurs with low probability. We do this in two parts, considering separately the voters that lie close to the mean and the voters that do not.

We initially only consider the voters that lie within the 5\delta range of \hat{m}. For each \xi, the Byzantine entries of D_{i,j,v} can be at most 9\delta away from the mean, or else they could not fall in the 5\delta range. Therefore \sum_{j}\xi_{j}C_{i,j} can be at most 9\delta f away from the mean. Hence, all we need to show is that \frac{9\delta k}{2}\leq\frac{1}{4}kd_{i}s, since f\leq k/2. This is equivalent to

s92Qlogkdi.s\geq 9\sqrt{\frac{2Q\log k}{d_{i}}}. (14)

It remains to consider the voters that do not lie within the 5\delta range. We eliminate a voter if, for some particular \xi, it has many out-of-bound entries. Therefore, for any \xi there are only 8k^{1-Q}+8k^{1-\alpha} entries that are not in the 5\delta range, and each can contribute at most d_{i}, so the maximum deviation is at most 8k\cdot\left(\frac{1}{k^{Q}}+\frac{1}{k^{\alpha}}\right)d_{i}. Since we know that k^{Q}\geq k^{\alpha}\cdot(e^{2}-1), the maximum deviation is at most \frac{8e^{2}k^{1-\alpha}d_{i}}{e^{2}-1}\leq\frac{1}{4}kd_{i}s. Using an approximate value for e we get:

s38kα.s\geq\frac{38}{k^{\alpha}}. (15)

Therefore the event in the second term occurs with low probability, and we can conclude that:

jiξj{1,1}(jByz|Cij|>kdis2)\displaystyle\sum_{j\in\partial i}\sum_{\xi_{j}\in\{-1,1\}}\mathbb{P}\left(\sum_{j\in Byz}|C_{ij}|>\frac{kd_{i}s}{2}\right) exp(kϵ18+dilog2)+exp(kϵ22)\displaystyle\leq\exp\left(-\frac{k\epsilon}{18}+d_{i}\log 2\right)+\exp\left(-\frac{k\epsilon^{2}}{2}\right) (16)

The last equation comes by using Equation 8. Combining Equations 13, 14, 15 we get

s=max(3log2k,38kα,92Qlogkdmin)s=\max\left(\sqrt{\frac{3\log 2}{k}},\frac{38}{k^{\alpha}},9\sqrt{\frac{2Q\log k}{d_{min}}}\right)

However, since d_{min}<k, we can drop the first term to get:

s=max(38dmaxk,92Qlogkdmin)s=\max\left(\frac{38d_{max}}{k},9\sqrt{\frac{2Q\log k}{d_{min}}}\right) (17)

Combining equations 12 and 16 with equation 11 we get:

\displaystyle\mathbb{P}(R_{i}\geq s)\leq e^{\left(-\frac{k\epsilon}{18}+d_{i}\log 2\right)}+e^{-\frac{k\epsilon^{2}}{2}}+\exp\left(-\frac{k}{3}d_{i}s^{2}+d_{i}\log 2\right)+\exp(-k^{1-\alpha}+d_{i}\log 2)
\displaystyle\leq e^{\left(-\frac{k\epsilon}{18}+d_{max}\log 2\right)}+e^{-\frac{k\epsilon^{2}}{2}}+\exp\left(-\frac{k}{3}d_{max}s^{2}+d_{max}\log 2\right)+\exp(-k^{1-\alpha}+d_{max}\log 2)

This is a sum of four terms, which we analyse individually. The first two terms are clearly small because k\in\Omega(\log n). For the third term, Equation 17 gives ks^{2}/3\geq 1+\log 2, making it fairly small as well. Finally, the definition of \alpha ensures that the last term is small. Since we must choose a value for k, we can set k to be:

k=max(18ϵ(dmaxlog2+clogn),2clognϵ2,3(1+log2)s2)k=\max\left(\frac{18}{\epsilon}(d_{max}\log 2+c^{\ast}\log n),\frac{2c^{\ast}\log n}{\epsilon^{2}},\frac{3(1+\log 2)}{s^{2}}\right) (18)

Since the third term is asymptotically smaller than the first two, we can remove it from consideration. By setting

k18dmaxϵ2k\geq 18\frac{d_{max}}{\epsilon^{2}}

we get

\mathbb{P}(R_{i}\geq s)\leq\frac{4}{e^{d_{max}(1-\log 2)}}

The Rank-Centrality analysis later takes into account the probability that all degrees are in between np/2np/2 and 3np/23np/2. Therefore we get

\mathbb{P}(R_{i}\geq s)\leq\frac{4}{n^{5C^{2}(1-\log 2)}}

Applying the Union bound we know that

(Δ¯2s)\displaystyle\mathbb{P}\left(\lVert\bar{\Delta}\rVert_{2}\geq s\right) 2n(Ris)\displaystyle\leq 2n\mathbb{P}(R_{i}\geq s) (19)

Therefore we can say that:

Δ¯2max(38dmaxk,92Qlogkdmin)\lVert\bar{\Delta}\rVert_{2}\leq\max\left(\frac{38d_{max}}{k},9\sqrt{\frac{2Q\log k}{d_{min}}}\right)

with probability 18nn5C2(1log2)18n1.5C21\geq 1-\frac{8n}{n^{5C^{2}(1-\log 2)}}\geq 1-\frac{8}{n^{1.5C^{2}-1}}. Since max(38,92Q)40\max\left(38,9\sqrt{2Q}\right)\leq 40 for values of 1Q91\leq Q\leq 9, we can say

Δ¯240max(dmaxk,logkdmin)\lVert\bar{\Delta}\rVert_{2}\leq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

with probability 18n1.5C21\geq 1-\frac{8}{n^{1.5C^{2}-1}}. Hence Proved. ∎

We will now prove that the diagonal entries of \Delta are bounded.

Lemma 21.

Algorithm 2 removes some voters such that we have:

D240max(dmaxk,logkdmin)\lVert D\rVert_{2}\leq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

with probability 18n1.5C21\geq 1-\frac{8}{n^{1.5C^{2}-1}}.

Proof.

From Equations (15) and (16) we know that

Δii=1kdmaxjiCij\Delta_{ii}=-\frac{1}{kd_{max}}\sum_{j\neq i}C_{ij} (20)

Therefore we can write

(D240max(dmaxk,logkdmin))\displaystyle\mathbb{P}\left(\lVert D\rVert_{2}\geq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right) i=1n(|jiCij|40max(dmaxk,logkdmin))\displaystyle\leq\sum_{i=1}^{n}\mathbb{P}\left(|\sum_{j\neq i}C_{ij}|\geq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right)

Since in the previous part we proved the much stronger bound \sum_{i=1}^{n}\mathbb{P}\left(\sum_{j\neq i}|C_{ij}|\geq 40\sqrt{\frac{\log d_{i}}{d_{i}}}\right)\leq\frac{8}{n^{1.5C^{2}-1}}, we can conclude by the triangle inequality that:

(D240max(dmaxk,logkdmin))\displaystyle\mathbb{P}\left(\lVert D\rVert_{2}\geq 40\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)\right) 8n1.5C21\displaystyle\leq\frac{8}{n^{1.5C^{2}-1}}

Hence Proved. ∎

Now combining Lemmas 20 and 21 we have proved Lemma 6. ∎

We now finally prove Theorem 7. Similar to the Rank-Centrality convergence analysis, using the above results and using Lemmas 12 and 14 from Negahban et al. [2017] we can conclude that

ππ~π80b5/2κξ(G)max(dmaxk,logkdmin)\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\pi\rVert}\leq\frac{80b^{5/2}\kappa}{\xi(G)}\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)

where κ=dmax/dmin\kappa=d_{max}/d_{min}. Similar to Negahban et al. [2017] we can see that for an Erdős-Rényi graph with probability 16nC/8\geq 1-6n^{-C/8} we have κ3\kappa\leq 3 and ξ(G)1/2\xi(G)\geq 1/2. Substituting we get:

ππ~π480b5/2max(dmaxk,logkdmin)\frac{\lVert\pi-\tilde{\pi}\rVert}{\lVert\pi\rVert}\leq 480b^{5/2}\max\left(\frac{d_{max}}{k},\sqrt{\frac{\log k}{d_{min}}}\right)
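For reference, the estimate \tilde{\pi} appearing in this bound is the stationary distribution of the pairwise-comparison Markov chain, as in Rank-Centrality. The sketch below is a minimal, plain (non-Byzantine-robust) version of that computation on a complete comparison graph with exact BTL probabilities; the function name, scores, and sizes are our illustrative assumptions.

```python
import numpy as np

def rank_centrality(A, d_max, iters=1000):
    """Stationary distribution of the comparison chain.
    A[i, j] is the (empirical) probability that j beats i; d_max is the
    maximum degree of the comparison graph (n - 1 for a complete graph)."""
    n = A.shape[0]
    P = A / d_max
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # make each row sum to one
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):                     # power iteration
        pi = pi @ P
    return pi / pi.sum()

w = np.array([1.0, 2.0, 4.0])                  # assumed BTL scores
A = w[None, :] / (w[:, None] + w[None, :])     # A[i, j] = P(j beats i)
pi = rank_centrality(A, d_max=len(w) - 1)

assert np.all(np.diff(pi) > 0)                 # recovered order matches w
assert abs(pi[2] / pi[0] - 4.0) < 1e-3         # pi is proportional to w
```

With exact BTL probabilities the chain is reversible with stationary distribution proportional to w, which is why the ordering and the score ratios are recovered; the BSR and FBSR algorithms additionally filter voters before the matrix A is formed.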

A.4 FBSR Algorithm Analysis

Firstly, since we are dealing with p=10C^{2}\log n/n, and since the proof of the Rank-Centrality algorithm already takes into account the probability that all degrees lie between pn/2 and 3pn/2, we can say that the minimum degree is at least 5C^{2}\log n. Therefore any bucket will have at least \frac{5C^{2}\log 2}{5C^{2}\log 2+1}\log_{2}n objects and at most \log_{2}n objects. We know that:

(ji|Cij|>kdis)\displaystyle\mathbb{P}(\sum_{j\in\partial i}|C_{ij}|>kd_{i}s) l=1β(ji & jBl|Cij|>ks|Bl|)\displaystyle\leq\sum_{l=1}^{\beta}\mathbb{P}\left(\sum_{j\in\partial i\text{ \& }j\in B_{l}}|C_{ij}|>ks|B_{l}|\right) (21)

Using the same analysis as above we can say that:

\displaystyle\mathbb{P} (ji & jBl|Cij|>ks|Bl|)\displaystyle\left(\sum_{j\in\partial i\text{ \& }j\in B_{l}}|C_{ij}|>ks|B_{l}|\right)
e(kϵ18+|Bl|log2)+ekϵ22+exp(k|Bl|3s2+|Bl|log2)+exp(k1α+|Bl|log2)\displaystyle\leq e^{\left(-\frac{k\epsilon}{18}+|B_{l}|\log 2\right)}+e^{-\frac{k\epsilon^{2}}{2}}+\exp\left(-\frac{k|B_{l}|}{3}s^{2}+|B_{l}|\log 2\right)+\exp(-k^{1-\alpha}+|B_{l}|\log 2)

Substituting k18(2+C/8)logn/ϵ2k\geq 18(2+C/8)\log n/\epsilon^{2} we get

(ji & jBl|Cij|>ks|Bl|)4n1+C/8\displaystyle\mathbb{P}\left(\sum_{j\in\partial i\text{ \& }j\in B_{l}}|C_{ij}|>ks|B_{l}|\right)\leq\frac{4}{n^{1+C/8}}

We can also see that β\beta will be at most 15C215C^{2}. Substituting in Equation 21 we get:

(Ris)60C2n1+C/8\displaystyle\mathbb{P}(R_{i}\geq s)\leq\frac{60C^{2}}{n^{1+C/8}}

where ss is defined as

s=max(38lognk,92Q(5C2log2+1)loglogn5C2lognlog2)s=\max\left(\frac{38\log n}{k},9\sqrt{2Q\frac{(5C^{2}\log 2+1)\log\log n}{5C^{2}\log n\log 2}}\right)

Using C\geq 1 and 1\leq Q\leq 4 we get:

s=40max(lognk,loglognlogn)s=40\max\left(\frac{\log n}{k},\sqrt{\frac{\log\log n}{\log n}}\right)

Finally using Equation 19 we get

(Δ¯2s)120C2nC/8\displaystyle\mathbb{P}\left(\lVert\bar{\Delta}\rVert_{2}\geq s\right)\leq\frac{120C^{2}}{n^{C/8}}

The other steps are identical to the analysis of the BSR algorithm.

A.5 Impossibility Result Proof

Proof.

We prove this by contradiction. Suppose there is an algorithm \mathcal{A} that for all input weights returns output weights such that \frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq f(n) with probability >1/2, where f(n) converges to 0. We consider the case where F=K/2. The cases where F>K/2 can be handled by making 2F-K Byzantine voters output 1 irrespective of the queries; we can assume that the algorithm is aware of these 2F-K voters and always ignores them. Therefore the problem reduces to the F=K/2 case.

Consider an instance (Instance 1) where π~\tilde{\pi} is defined as follows:

π~i={1n+(b1)n2for 1in2bn+(b1)n2for n2<in\tilde{\pi}_{i}=\begin{cases}\frac{1}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }1\leq i\leq\lfloor\frac{n}{2}\rfloor\\ \frac{b}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }\lfloor\frac{n}{2}\rfloor<i\leq n\end{cases} (22)

Let us consider the case where the good voters are numbered [1,\frac{K}{2}], while the Byzantine voters are numbered [\frac{K}{2}+1,K]. We can use \mathcal{A} to obtain a \pi^{\ast} such that \frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq f(n) with probability >1/2. We now outline the strategy that the Byzantine voters can use: they give their results according to some \tilde{\pi}^{\prime}, where:

π~i={bn+(b1)n2for 1in21n+(b1)n2for n2<in\tilde{\pi}^{\prime}_{i}=\begin{cases}\frac{b}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }1\leq i\leq\lceil\frac{n}{2}\rceil\\ \frac{1}{n+(b-1)\cdot\lceil\frac{n}{2}\rceil}&\text{for }\lceil\frac{n}{2}\rceil<i\leq n\end{cases} (23)

In another instance (Instance 2), the good and Byzantine voters flip their numberings, i.e., the Byzantine voters are numbered [1,\frac{K}{2}] and the good voters are numbered [\frac{K}{2}+1,K]. Furthermore, let the input weights be \tilde{\pi}^{\prime} and let the Byzantine voters vote according to \tilde{\pi}. Based on the description of our instance and the existence of algorithm \mathcal{A}, with probability >1/2 we will get a weight output from \mathcal{A} such that

ππ~π~f(n)\frac{\lVert\pi^{\ast}-\tilde{\pi}^{\prime}\rVert}{\lVert\tilde{\pi}^{\prime}\rVert}\leq f(n) (24)

But to any algorithm, the above two instances are identical and therefore with probability >1/2>1/2:

ππ~π~f(n)\frac{\lVert\pi^{\ast}-\tilde{\pi}\rVert}{\lVert\tilde{\pi}\rVert}\leq f(n) (25)

By a simple application of the union bound, we can conclude that with non-zero probability both Equations 24 and 25 hold. We see that:

π~π~\displaystyle\lVert\tilde{\pi}-\tilde{\pi}^{\prime}\rVert π~π+ππ~\displaystyle\leq\lVert\tilde{\pi}-\pi^{\ast}\rVert+\lVert{\pi}^{\ast}-\tilde{\pi}^{\prime}\rVert (Triangle Inequality)
π~π~\displaystyle\lVert\tilde{\pi}-\tilde{\pi}^{\prime}\rVert f(n)(π~+π~)\displaystyle\leq f(n)\cdot\left(\lVert\tilde{\pi}\rVert+\lVert\tilde{\pi}^{\prime}\rVert\right) (Equations 24 and 25)
12(b1)n1b2n/2+n/2\displaystyle\frac{1}{2}\cdot\frac{(b-1)\cdot\sqrt{n-1}}{\sqrt{b^{2}\cdot\lceil{n/2}\rceil+\lfloor n/2\rfloor}} f(n)\displaystyle\leq f(n) (Equations 22 and 23)

For large n, the LHS converges to a limit \geq\frac{b-1}{2b}, while the RHS converges to 0, giving a contradiction. ∎
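The final inequality can be checked numerically: the normalized gap between the two instances stays bounded below by (b-1)/2b as n grows, so no f(n)\to 0 can satisfy both equations. The helper below simply evaluates Equations 22 and 23 (the function name is ours).

```python
import numpy as np

def normalized_gap(n, b):
    """||pi - pi'|| / (||pi|| + ||pi'||) for the instances in Eqs. 22 and 23."""
    D = n + (b - 1) * ((n + 1) // 2)               # common normalizer
    pi = np.array([1.0] * (n // 2) + [b] * (n - n // 2)) / D
    pip = np.array([b] * ((n + 1) // 2) + [1.0] * (n // 2)) / D
    return np.linalg.norm(pi - pip) / (np.linalg.norm(pi) + np.linalg.norm(pip))

b = 3.0
gaps = [normalized_gap(n, b) for n in (10, 100, 10_000)]
# The gap stays bounded away from zero: each value is at least (b - 1) / (2 b)
assert all(g >= (b - 1) / (2 * b) - 1e-9 for g in gaps)
```
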

A.6 Broader Impact

Due to the more theoretical nature of our research, we do not see any potential negative societal impacts associated with the work.

A.7 Experimental Results and Code

We provide the detailed tabular results for the subfigures of Figures 3 and 6 in Tables 1, 2, 3, 4, 5 and 6, with error bounds for all the obtained results. The code for producing these can be found here.

Table 1: L_{2}-Errors for the FBSR and RC algorithms on Synthetic Data when the Byzantine Fraction (BF) is varied. FOV = Fixed Order Vote, OV = Opposite Order Vote, OVP = Opposite Vote Probabilistic, RS = Random Subset

Strategy | Algo | 0 | 0.05 | 0.1 | 0.15 | 0.2 | 0.25 | 0.3
FOV | FBSR | 0.02 ± 0.001 | 0.03 ± 0.002 | 0.04 ± 0.002 | 0.07 ± 0.004 | 0.11 ± 0.008 | 0.15 ± 0.011 | 0.23 ± 0.015
FOV | RC | 0.02 ± 0.002 | 0.07 ± 0.002 | 0.14 ± 0.002 | 0.2 ± 0.005 | 0.27 ± 0.006 | 0.33 ± 0.008 | 0.4 ± 0.01
OV | FBSR | 0.02 ± 0.002 | 0.02 ± 0.001 | 0.02 ± 0.001 | 0.02 ± 0.002 | 0.03 ± 0.002 | 0.04 ± 0.005 | 0.06 ± 0.007
OV | RC | 0.02 ± 0.001 | 0.09 ± 0.002 | 0.17 ± 0.005 | 0.24 ± 0.004 | 0.31 ± 0.005 | 0.38 ± 0.007 | 0.46 ± 0.006
OVP | FBSR | 0.02 ± 0.001 | 0.02 ± 0.001 | 0.03 ± 0.002 | 0.05 ± 0.002 | 0.07 ± 0.004 | 0.09 ± 0.004 | 0.13 ± 0.004
OVP | RC | 0.02 ± 0.001 | 0.06 ± 0.003 | 0.1 ± 0.008 | 0.15 ± 0.01 | 0.2 ± 0.006 | 0.24 ± 0.01 | 0.29 ± 0.015
RS | FBSR | 0.02 ± 0.001 | 0.02 ± 0.001 | 0.03 ± 0.002 | 0.04 ± 0.002 | 0.06 ± 0.004 | 0.09 ± 0.003 | 0.12 ± 0.004
RS | RC | 0.02 ± 0.001 | 0.05 ± 0.002 | 0.09 ± 0.004 | 0.13 ± 0.005 | 0.17 ± 0.002 | 0.21 ± 0.005 | 0.25 ± 0.006

Table 2: Kendall’s Tau Correlation for the FBSR and RC algorithms on Synthetic Data when BF is varied. FOV = Fixed Order Vote, OV = Opposite Order Vote, OVP = Opposite Vote Probabilistic, RS = Random Subset

Strategy | Algo | 0 | 0.05 | 0.1 | 0.15 | 0.2 | 0.25 | 0.3
FOV | FBSR | 0.98 ± 0.003 | 0.97 ± 0.003 | 0.96 ± 0.003 | 0.92 ± 0.007 | 0.89 ± 0.012 | 0.85 ± 0.015 | 0.8 ± 0.018
FOV | RC | 0.97 ± 0.003 | 0.92 ± 0.006 | 0.85 ± 0.01 | 0.77 ± 0.016 | 0.7 ± 0.014 | 0.62 ± 0.027 | 0.55 ± 0.026
OV | FBSR | 0.98 ± 0.002 | 0.98 ± 0.002 | 0.98 ± 0.003 | 0.97 ± 0.002 | 0.97 ± 0.004 | 0.97 ± 0.004 | 0.96 ± 0.005
OV | RC | 0.97 ± 0.002 | 0.97 ± 0.002 | 0.94 ± 0.021 | 0.89 ± 0.018 | 0.72 ± 0.052 | 0.49 ± 0.079 | 0.15 ± 0.107
OVP | FBSR | 0.98 ± 0.003 | 0.98 ± 0.002 | 0.98 ± 0.003 | 0.97 ± 0.003 | 0.97 ± 0.002 | 0.97 ± 0.004 | 0.96 ± 0.004
OVP | RC | 0.98 ± 0.002 | 0.97 ± 0.002 | 0.97 ± 0.004 | 0.96 ± 0.004 | 0.96 ± 0.003 | 0.95 ± 0.005 | 0.93 ± 0.005
RS | FBSR | 0.98 ± 0.002 | 0.98 ± 0.002 | 0.97 ± 0.003 | 0.97 ± 0.004 | 0.97 ± 0.004 | 0.96 ± 0.003 | 0.95 ± 0.009
RS | RC | 0.97 ± 0.002 | 0.97 ± 0.003 | 0.96 ± 0.006 | 0.94 ± 0.004 | 0.92 ± 0.008 | 0.88 ± 0.016 | 0.83 ± 0.027

Table 3: L_{2}-Errors for the FBSR and RC algorithms on Synthetic Data when n(=k) is varied.

Algo | BF | 50 | 90 | 130 | 170 | 210 | 250
FBSR | 0.1 | 0.063 ± 0.009 | 0.046 ± 0.005 | 0.042 ± 0.003 | 0.039 ± 0.005 | 0.037 ± 0.004 | 0.034 ± 0.003
FBSR | 0.2 | 0.133 ± 0.019 | 0.123 ± 0.011 | 0.106 ± 0.011 | 0.099 ± 0.008 | 0.1 ± 0.009 | 0.094 ± 0.006
RC | 0.1 | 0.14 ± 0.011 | 0.14 ± 0.005 | 0.14 ± 0.005 | 0.14 ± 0.005 | 0.14 ± 0.007 | 0.14 ± 0.002
RC | 0.2 | 0.27 ± 0.013 | 0.27 ± 0.005 | 0.26 ± 0.01 | 0.27 ± 0.006 | 0.27 ± 0.007 | 0.27 ± 0.005

Table 4: Kendall’s Tau Correlation for the FBSR and RC algorithms on Synthetic Data when n(=k)n(=k) is varied.

Algo | BF | 50 | 90 | 130 | 170 | 210 | 250
FBSR | 0.1 | 0.936 ± 0.008 | 0.949 ± 0.008 | 0.952 ± 0.008 | 0.958 ± 0.007 | 0.958 ± 0.004 | 0.963 ± 0.004
FBSR | 0.2 | 0.867 ± 0.021 | 0.892 ± 0.017 | 0.899 ± 0.007 | 0.898 ± 0.014 | 0.902 ± 0.01 | 0.908 ± 0.008
RC | 0.1 | 0.83 ± 0.017 | 0.85 ± 0.02 | 0.85 ± 0.012 | 0.85 ± 0.014 | 0.84 ± 0.015 | 0.84 ± 0.005
RC | 0.2 | 0.67 ± 0.047 | 0.67 ± 0.027 | 0.69 ± 0.019 | 0.69 ± 0.021 | 0.69 ± 0.023 | 0.69 ± 0.01

Table 5: L_{2}-Error for the BSR and RC algorithms on the Sushi Dataset when the Byzantine Fraction is varied. FOV = Fixed Order Vote, OV = Opposite Order Vote, ORF = Opposite Random Flips

Strategy | Algo | 0 | 0.05 | 0.1 | 0.15 | 0.2
FOV | BSR | 0.0 ± 0.0 | 0.063 ± 0.013 | 0.102 ± 0.041 | 0.207 ± 0.062 | 0.267 ± 0.049
FOV | RC | 0.0 ± 0.0 | 0.07 ± 0.019 | 0.15 ± 0.043 | 0.2 ± 0.047 | 0.24 ± 0.042
OV | BSR | 0.0 ± 0.0 | 0.08 ± 0.0 | 0.076 ± 0.0 | 0.126 ± 0.0 | 0.169 ± 0.0
OV | RC | 0.0 ± 0.0 | 0.11 ± 0.0 | 0.21 ± 0.0 | 0.29 ± 0.0 | 0.37 ± 0.0
ORF | BSR | 0.0 ± 0.0 | 0.135 ± 0.002 | 0.067 ± 0.001 | 0.085 ± 0.002 | 0.117 ± 0.006
ORF | RC | 0.0 ± 0.0 | 0.09 ± 0.003 | 0.16 ± 0.002 | 0.23 ± 0.003 | 0.29 ± 0.003

Table 6: Kendall’s Tau Correlation for the BSR and RC algorithms on Sushi Dataset when Byzantine Fraction is varied. FOV = Fixed Order Vote, OV = Opposite Order Vote, ORF = Opposite Random Flips

Strategy | Algo | 0 | 0.05 | 0.1 | 0.15 | 0.2
FOV | BSR | 1.0 ± 0.0 | 0.942 ± 0.049 | 0.844 ± 0.116 | 0.742 ± 0.067 | 0.653 ± 0.084
FOV | RC | 1.0 ± 0.0 | 0.92 ± 0.062 | 0.88 ± 0.056 | 0.8 ± 0.069 | 0.69 ± 0.153
OV | BSR | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.956 ± 0.0 | 0.822 ± 0.0 | 0.511 ± 0.0
OV | RC | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.96 ± 0.0 | 0.82 ± 0.0 | 0.38 ± 0.0
ORF | BSR | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.991 ± 0.013 | 0.969 ± 0.018 | 0.893 ± 0.04
ORF | RC | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.96 ± 0.0 | 0.96 ± 0.013 | 0.85 ± 0.022