
Robust Multi-Agent Bandits Over Undirected Graphs

Daniel Vial ([email protected]), University of Texas at Austin, Austin, Texas, USA; Sanjay Shakkottai ([email protected]), University of Texas at Austin, Austin, Texas, USA; and R. Srikant ([email protected]), University of Illinois Urbana-Champaign, Urbana-Champaign, Illinois, USA
(August 2022; October 2022; November 2022)
Abstract.

We consider a multi-agent multi-armed bandit setting in which $n$ honest agents collaborate over a network to minimize regret but $m$ malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur $O((m+K/n)\log(T)/\Delta)$ regret in this setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m\ll K$, this improves over the single-agent baseline regret of $O(K\log(T)/\Delta)$.

In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in $K$ and $n$. In light of this negative result, we propose a new algorithm for which the $i$-th agent has regret $O((d_{\text{mal}}(i)+K/n)\log(T)/\Delta)$ on any connected and undirected graph, where $d_{\text{mal}}(i)$ is the number of $i$'s neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i)=m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).

multi-armed bandits; malicious agents
copyright: acmcopyright; journal: POMACS; journal year: 2022; journal volume: 6; journal number: 3; article: 53; publication month: 12; price: 15.00; doi: 10.1145/3570614; ccs: Theory of computation (Online learning theory; Sequential decision making; Regret bounds; Multi-agent learning; Distributed algorithms; Adversarial learning)

1. Introduction

Motivated by applications including distributed computing, social recommendation systems, and federated learning, a number of recent papers have studied multi-agent variants of the classical multi-armed bandit problem. Typically, these variants involve a large number of agents playing a bandit while communicating over a network. The goal is to devise communication protocols that allow the agents to efficiently amalgamate information, thereby learning the bandit’s parameters more quickly than they could by running single-agent algorithms in isolation.

Among the many multi-agent variants considered in the literature (see Section 1.5), we focus on a particular setting studied in the recent line of work (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021; Vial et al., 2021). In these papers, $n$ agents play separate instances of the same $K$-armed bandit and are restricted to $o(T)$ pairwise and bit-limited communications per $T$ arm pulls. We recount two motivating applications from this prior work.

Example 1.

For an e-commerce site (e.g., Amazon), the agents model $n$ servers choosing one of $K$ products to show visitors to the site. The product selection problem can be viewed as a bandit – products are arms, while purchases yield rewards – and communication among the agents/servers is restricted by bandwidth.

Example 2.

For a social recommendation site (e.g., Yelp), the agents represent $n$ users choosing among $K$ items, such as restaurants. This is analogously modeled as a bandit, and communication is limited because agents/users are exposed to a small portion of all reviews.

To contextualize our contributions, we next discuss this line of work in more detail.

1.1. Fully cooperative multi-agent bandits

The goal of (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021) is to devise fully cooperative algorithms for which the cumulative regret $R_{T}^{(i)}$ of each agent $i$ is small (see (5) for the formal definition of regret). All of these papers follow a similar approach, which roughly proceeds as follows (see Section 3 for details):

  • The arms are partitioned into $n$ subsets of size $O(K/n)$, and each agent is assigned a distinct subset called a sticky set, which they are responsible for exploring.

  • Occasionally ($o(T)$ times per $T$ arm pulls), each agent $i$ asks a random neighbor $i^{\prime}$ for an arm recommendation; $i^{\prime}$ responds with the arm they believe is best, which $i$ begins playing.
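The two ingredients above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the papers' exact construction: the function names, the random partition, and the gossip interface are all illustrative.

```python
import random

def partition_sticky_sets(K, n, seed=0):
    """Partition arms {0, ..., K-1} into n disjoint sticky sets of size O(K/n),
    so that every arm is explored by exactly one agent."""
    arms = list(range(K))
    random.Random(seed).shuffle(arms)
    return [sorted(arms[i::n]) for i in range(n)]

def gossip_round(best_estimates, neighbors, rng):
    """Each agent asks one uniformly random neighbor for its current best arm
    estimate and begins playing the recommended arm (alongside its sticky set)."""
    return [best_estimates[rng.choice(neighbors[i])] for i in range(len(neighbors))]

sticky = partition_sticky_sets(K=6, n=3)
assert sorted(a for s in sticky for a in s) == list(range(6))  # arms partitioned
```

Because the sticky sets partition the arms, the true best arm always remains active for some agent, which is what lets the gossip rounds eventually spread it to everyone.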

For these algorithms, the regret analysis essentially contains two steps:

  • First, the authors show that the agent (say, $i^{\star}$) with the true best arm in its sticky set eventually identifies it as such. Thereafter, a gossip process unfolds. Namely, $i^{\star}$ recommends the best arm to its neighbors, who recommend it to their neighbors, etc., spreading the best arm to all agents. The spreading time (and thus the regret before this time) is shown to be polynomial in $K$, $n$, and $1/\Delta$, where $\Delta$ is the gap in mean reward between the two best arms.

  • Once the best arm spreads, agents play only it and their sticky sets, so long-term, they effectively face $O(K/n)$-armed bandits instead of the full $K$-armed bandit. By classical bandit results (see, e.g., (Auer et al., 2002)), this implies $O((K/n)\log(T)/\Delta)$ regret over horizon $T$.

Hence, summing up the two terms, (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021) provide regret bounds of the form¹

¹More precisely, (Chawla et al., 2020; Newton et al., 2021) prove (1), while the $K/n$ term balloons to $(K/n)+\log n$ in (Sankararaman et al., 2019).

(1) $R_{T}^{(i)}=O\left(\frac{K}{n}\frac{\log T}{\Delta}+\text{poly}\left(K,n,\frac{1}{\Delta}\right)\right),$

as compared to $O(K\log(T)/\Delta)$ regret for running a single-agent algorithm in isolation.

1.2. Robust multi-agent bandits on the complete graph

Despite these improved bounds, (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021) require all agents to execute the prescribed algorithm, and in particular, to recommend best arm estimates to their neighbors. As pointed out in (Vial et al., 2021), this may be unrealistic: in Example 2, review spam can be modeled as bad arm recommendations, while in Example 1, servers may fail entirely. Hence, (Vial et al., 2021) considers a more realistic setting where $n$ honest agents recommend best arm estimates but $m$ malicious agents recommend arbitrarily. For this setting, the authors propose a robust version of the algorithm from (Chawla et al., 2020) where honest agents block suspected malicious agents. More specifically, (Vial et al., 2021) considers the following blocking rule:

  • If agent $i^{\prime}$ recommends arm $k$ to honest agent $i$, but arm $k$ subsequently performs poorly for $i$ – in the sense that the upper confidence bound (UCB) algorithm does not select it sufficiently often – then $i$ temporarily suspends communication with $i^{\prime}$.
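A minimal sketch of this criterion, with the caveat that the threshold `eta` is a hypothetical stand-in for the paper's exact condition:

```python
def suspect_recommender(pulls_of_k_in_window, window_length, eta=0.1):
    """Sketch of the blocking test in (Vial et al., 2021): after i' recommends
    arm k to i, agent i checks whether UCB subsequently selected k sufficiently
    often; if not, i temporarily blocks i'. The fraction `eta` is an
    illustrative threshold, not the paper's exact constant."""
    return pulls_of_k_in_window < eta * window_length

assert suspect_recommender(2, 100)       # arm barely played: suspect i'
assert not suspect_recommender(50, 100)  # arm played often: trust i'
```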

As shown in (Vial et al., 2021), this blocking scheme prevents each malicious agent from recommending more than $O(1)$ bad arms long-term, which (effectively) results in an $O(m+K/n)$-armed bandit ($O(m)$ malicious recommendations, plus the $O(K/n)$-sized sticky set). Under the assumption that honest and malicious agents are connected by the complete graph, this allows (Vial et al., 2021) to prove

(2) $R_{T}^{(i)}=O\left(\left(\frac{K}{n}+m\right)\frac{\log T}{\Delta}+\text{poly}\left(K,n,m,\frac{1}{\Delta}\right)\right).$

In (Vial et al., 2021), it is also shown that blocking is necessary: for any $n\in\mathbb{N}$, if even $m=1$ malicious agent is present, the algorithm from (Chawla et al., 2020) (which does not use blocking) incurs $\Omega(K\log(T)/\Delta)$ regret. Thus, one malicious agent negates the improvement over the single-agent baseline.

1.3. Objective and challenges

Our main goal is to generalize the results of (Vial et al., 2021) from the complete graph to the case where the honest agent subgraph is only connected and undirected. This is nontrivial because (Vial et al., 2021) relies heavily on the complete graph assumption. In particular, the analysis in (Vial et al., 2021) requires that $i^{\star}$ (the agent with the best arm in its sticky set) itself recommends the best arm to each of the other honest agents. In other words, each honest agent $i\neq i^{\star}$ relies on $i^{\star}$ to inform them of the best arm, which means $i^{\star}$ must be a neighbor of $i$. Thus, to extend (2) beyond complete graphs, we need to show a gossip process unfolds (like in the fully cooperative case): $i^{\star}$ recommends the best arm to its neighbors, who recommend it to their neighbors, etc., spreading it through the network.

The challenge is that, while blocking is necessary to prevent $\Omega(K\log(T)/\Delta)$ regret, it also causes honest agents to accidentally block each other. Indeed, due to the aforementioned blocking rule and the noisy rewards, they will block each other until they collect enough samples to reliably identify good arms. From a network perspective, accidental blocking means that edges in the subgraph of honest agents temporarily fail. Consequently, it is not clear if the best arm spreads to all honest agents, or if (for example) this subgraph eventually becomes disconnected, preventing the spread and causing the agents who do not receive the best arm to suffer $\Theta(T)$ regret.

Analytically, accidental blocking means we must deal with a gossip process over a dynamic graph. This process is extremely complicated, because the graph dynamics are driven by the bandit algorithms, which in turn affect the future evolution of the graph. Put differently, blocking causes the randomness of the communication protocol and that of the bandit algorithms to become interdependent. We note this does not occur for the original non-blocking algorithm, where the two sources of randomness can be cleverly decoupled and separately analyzed – see (Chawla et al., 2020, Proposition 4). Thus, in contrast to existing work, we need to analyze the interdependent processes directly.

1.4. Our contributions

Failure of the existing blocking rule: In Section 4, we show that the algorithm from (Vial et al., 2021) fails to achieve a regret bound of the form (2) for connected and undirected graphs in general. Toward this end, we define a natural “bad instance” in which $n=K$, the honest agent subgraph is an undirected line (thus connected), and all honest agents share a malicious neighbor. For this instance, we propose a malicious strategy that causes honest agents to repeatedly block one another, which results in the best arm spreading extremely slowly. More specifically, we show that if honest agents run the algorithm from (Vial et al., 2021), then the best arm does not reach honest agent $n$ (the one at the end of the line) until time is doubly exponential in $n=K$. Note (Vial et al., 2021) shows the best arm spreads polynomially fast for the complete graph, so we demonstrate a doubly exponential slowdown for complete versus line graphs. This is rather surprising, because for classical rumor processes that do not involve bandits or blocking (see, e.g., (Pittel, 1987)), the slowdown is only exponential (i.e., $\Theta(\log n)$ rumor spreading time on the complete graph versus $\Theta(n)$ on the line graph). As a consequence of the slow spread, we show the algorithm from (Vial et al., 2021) suffers regret

(3) $R_{T}^{(n)}=\Omega\left(\min\left\{\log(T)+\exp\left(\exp\left(\frac{n}{3}\right)\right),\frac{T}{\log^{7}T}\right\}\right),$

i.e., it incurs (nearly) linear regret until time $\Omega(\exp(\exp(n/3)))$ and thereafter incurs logarithmic regret but with a huge additive term (see Theorem 1).

Refined blocking rule: In light of this negative result, we propose a refined blocking rule in Section 5. Roughly, our rule is as follows: agent $i$ blocks $i^{\prime}$ for recommending arm $k$ if

  • arm $k$ performs poorly, i.e., it is not chosen sufficiently often by UCB,

  • and agent $i$ has not changed its own best arm estimate recently.

The second criterion is the key distinction from (Vial et al., 2021). Intuitively, it says that agents should not block for seemingly-poor recommendations until they become confident that their own best arm estimates have settled on truly good arms. This idea is the main new algorithmic insight of the paper. It is directly motivated by the negative result of Section 4; see Remark 5.
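The two criteria above can be sketched as follows. The exact threshold `eta` and the stability window are illustrative assumptions, not the constants used in our formal rule:

```python
def should_block_refined(pulls_of_k_in_window, window_length, estimate_history,
                         stability_window=3, eta=0.1):
    """Sketch of the refined rule: block only if (a) the recommended arm was
    rarely selected by UCB, AND (b) our own best arm estimate has not changed
    over the last `stability_window` phases. Both constants are hypothetical."""
    poor = pulls_of_k_in_window < eta * window_length
    settled = (len(estimate_history) >= stability_window
               and len(set(estimate_history[-stability_window:])) == 1)
    return poor and settled

assert should_block_refined(1, 100, [4, 4, 4])        # poor arm, settled estimate
assert not should_block_refined(1, 100, [2, 3, 4])    # estimate still changing: forgive
assert not should_block_refined(50, 100, [4, 4, 4])   # arm performed fine: no block
```

The second check is what prevents the cascades of mutual blocking described in Section 4: early on, best arm estimates are still in flux, so honest agents withhold judgment.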

Gossip despite blocking: Analytically, our main contribution is to show that, with our refined blocking rule, the best arm quickly spreads to all honest agents. The proof is quite involved; we provide an outline in Section 7. At a very high level, the idea is to show that honest agents using our blocking rule eventually stop blocking each other. Thereafter, we can couple the arm spreading process with a much more tractable noisy rumor process that involves neither bandits nor blocking (see Definition 1), and that is guaranteed to spread the best arm in polynomial time.

Regret upper bound: Combining our novel gossip analysis with some existing regret minimization techniques, we show in Section 5 that our refined algorithm enjoys the regret bound

(4) $R_{T}^{(i)}=O\left(\left(\frac{K}{n}+d_{\text{mal}}(i)\right)\frac{\log T}{\Delta}+\text{poly}\left(K,n,m,\frac{1}{\Delta}\right)\right),$

where $d_{\text{mal}}(i)$ is the number of malicious neighbors of $i$ (see Theorem 2). Thus, our result generalizes (2) from the complete graph (where $d_{\text{mal}}(i)=m$) to connected and undirected graphs. Moreover, note the leading $\log T$ term in (4) is entirely local – only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret. For example, in the sparse regime $d_{\text{mal}}(i)=O(1)$, our $\log T$ term matches the one in (1) up to constants, which (we recall) (Chawla et al., 2020; Newton et al., 2021) proved in the case where there are no malicious agents anywhere in the network. In fact, for honest agents $i$ with $d_{\text{mal}}(i)=0$, we can prove that the $\log T$ term in our regret bound matches the corresponding term from (Chawla et al., 2020), including constants (see Corollary 2). In other words, we show that for large horizons $T$, the effects of malicious agents do not propagate beyond one-step neighbors. Furthermore, we note that the additive term in (4) is polynomial in all parameters, whereas for the existing algorithm it can be doubly exponential in $K$ and $n$, as shown in (3) and discussed above.

Numerical results: In Section 6, we replicate the experiments from (Vial et al., 2021) and extend them from the complete graph to $G(n+m,p)$ random graphs. Among other findings, we show that for $p=1/2$ and $p=1/4$, respectively, the algorithm from (Vial et al., 2021) can perform worse than the non-blocking algorithm from (Chawla et al., 2020) and the single-agent baseline, respectively. In other words, the existing blocking rule becomes a liability as $p$ decreases from the extreme case $p=1$ considered in (Vial et al., 2021). In contrast, we show that our refined rule has lower regret than (Chawla et al., 2020) across the range of $p$ tested. Additionally, it outperforms (Vial et al., 2021) on average for all but the largest $p$ and has much lower variance for smaller $p$.

Summary: Ultimately, the high-level messages of this paper are twofold:

  • In multi-agent bandits with malicious agents, we can devise algorithms that simultaneously (1) learn useful information and spread it through the network via gossip, and (2) learn who is malicious and block them to mitigate the harm they cause. Moreover, this harm is local in the sense that it only affects one-hop neighbors.

  • However, blocking must be done carefully; algorithms designed for the complete graph may spread information extremely slowly on general graphs. In particular, the slowdown can be doubly exponential, much worse than the exponential slowdown of simple rumor processes.

1.5. Other related work

In addition to the paper (Vial et al., 2021) discussed above, several others have considered multi-agent bandits where some of the agents are uncooperative. In (Awerbuch and Kleinberg, 2008), the honest agents face a non-stochastic (i.e., adversarial) bandit (Auer et al., 1995) and communicate at every time step, in contrast to the stochastic bandit and limited communication of our work. The authors of (Mitra et al., 2021) consider the objective of best arm identification (Audibert and Bubeck, 2010) instead of cumulative regret. Most of their paper involves a different communication model where the agents/clients collaborate via a central server; Section 6 studies a “peer-to-peer” model which is closer to ours but requires additional assumptions on the number of malicious neighbors. A different line of work considers the case where an adversary can corrupt the observed rewards (see, e.g., (Bogunovic et al., 2020, 2021; Garcelon et al., 2020; Gupta et al., 2019; Jun et al., 2018; Kapoor et al., 2019; Liu and Shroff, 2019; Liu et al., 2021; Lykouris et al., 2018), and the references therein), which is distinct from the role that malicious agents play in our setting.

For the fully cooperative case, there are several papers with communication models that differ from the aforementioned (Chawla et al., 2020; Newton et al., 2021; Sankararaman et al., 2019). For example, agents in (Buccapatnam et al., 2015; Chakraborty et al., 2017) broadcast information instead of exchanging pairwise arm recommendations, communication in (Kolla et al., 2018; Lalitha and Goldsmith, 2021; Martínez-Rubio et al., 2019) is more frequent, the number of transmissions in (Madhushani and Leonard, 2021) depends on $\Delta^{-1}$ so could be large, and agents in (Landgren et al., 2016) exchange arm mean estimates instead of (bit-limited) arm indices.

More broadly, other papers have studied fully cooperative variants of different bandit problems. These include minimizing simple instead of cumulative regret (e.g., (Hillel et al., 2013; Szörényi et al., 2013)), minimizing the total regret across agents rather than ensuring all have low regret (e.g., (Dubey et al., 2020a; Wang et al., 2020)), contextual instead of multi-armed bandits (e.g., (Chawla et al., 2022; Dubey et al., 2020b; Dubey and Pentland, 2020; Korda et al., 2016; Tekin and Van Der Schaar, 2015)), adversarial rather than stochastic bandits (e.g., (Bar-On and Mansour, 2019; Cesa-Bianchi et al., 2016; Kanade et al., 2012)), and bandits that vary across agents (e.g., (Bistritz and Leshem, 2018; Shahrampour et al., 2017; Zhu et al., 2021)). Another long line of work features collision models where rewards are lower if multiple agents simultaneously pull the same arm (e.g., (Anandkumar et al., 2011; Avner and Mannor, 2014; Boursier and Perchet, 2019; Dakdouk et al., 2021; Kalathil et al., 2014; Liu and Zhao, 2010; Liu et al., 2020; Mansour et al., 2018; Rosenski et al., 2016)), unlike our model. Along these lines, other reward structures have been studied, such as reward being a function of the agents’ joint action (e.g., (Bargiacchi et al., 2018; Bistritz and Bambos, 2020; Kao et al., 2022)).

1.6. Organization

The rest of the paper is structured as follows. We begin in Section 2 with definitions. In Section 3, we introduce the algorithm from (Vial et al., 2021). Sections 4 and 5 discuss the existing and proposed blocking rules. Section 6 contains experiments. We discuss our analysis in Section 7 and close in Section 8.

2. Preliminaries

Communication network: Let $G=([n+m],E)$ be an undirected graph with vertices $[n+m]=\{1,\ldots,n+m\}$. We call $[n]$ the honest agents and assume they execute the forthcoming algorithm. The remaining agents are termed malicious; their behavior will be specified shortly. For instance, honest and malicious agents represent functioning and failed servers in Example 1. The edge set $E$ encodes which agents are allowed to communicate, e.g., if $(i,i^{\prime})\in E$, the $i$-th and $i^{\prime}$-th servers can communicate in the forthcoming algorithm.

Denote by $E_{\text{hon}}=\{(i,i^{\prime})\in E:i,i^{\prime}\in[n]\}$ the edges between honest agents and $G_{\text{hon}}=([n],E_{\text{hon}})$ the honest agent subgraph. For each $i\in[n]$, we let $N(i)=\{i^{\prime}\in[n+m]:(i,i^{\prime})\in E\}$ denote its neighbors, $N_{\text{hon}}(i)=N(i)\cap[n]$ its honest neighbors, and $N_{\text{mal}}(i)=N(i)\setminus[n]$ its malicious neighbors. We write $d(i)=|N(i)|$, $d_{\text{hon}}(i)=|N_{\text{hon}}(i)|$, and $d_{\text{mal}}(i)=|N_{\text{mal}}(i)|$ for the associated degrees, and $\bar{d}=\max_{i\in[n]}d(i)$, $\bar{d}_{\text{hon}}=\max_{i\in[n]}d_{\text{hon}}(i)$, and $\bar{d}_{\text{mal}}=\max_{i\in[n]}d_{\text{mal}}(i)$ for the maximal degrees. We make the following assumption, which generalizes the complete graph case of (Vial et al., 2021).
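The notation above is easy to make concrete. A small sketch (the function name and the edge-list representation are illustrative) computing $d_{\text{hon}}(i)$ and $d_{\text{mal}}(i)$ from an edge list, with agents $1,\ldots,n$ honest and $n+1,\ldots,n+m$ malicious as in the paper:

```python
def degree_profile(edges, n, m):
    """Compute d_hon(i) and d_mal(i) for each honest agent i = 1..n, given an
    undirected edge list over agents 1..n+m (agents > n are malicious)."""
    neighbors = {i: set() for i in range(1, n + m + 1)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    d_hon = {i: sum(1 for x in neighbors[i] if x <= n) for i in range(1, n + 1)}
    d_mal = {i: sum(1 for x in neighbors[i] if x > n) for i in range(1, n + 1)}
    return d_hon, d_mal

# A line of 3 honest agents (1-2-3) sharing one malicious neighbor (agent 4):
d_hon, d_mal = degree_profile([(1, 2), (2, 3), (1, 4), (2, 4), (3, 4)], n=3, m=1)
assert d_mal == {1: 1, 2: 1, 3: 1} and d_hon == {1: 1, 2: 2, 3: 1}
```

The example mirrors the “bad instance” of Section 4 in miniature: every honest agent has exactly one malicious neighbor, so $d_{\text{mal}}(i)=1$ for all $i$.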

Assumption 1.

The honest agent subgraph $G_{\text{hon}}$ is connected, i.e., for any $i,i^{\prime}\in[n]$, there exist $l\in\mathbb{N}$ and $i_{0},i_{1},\ldots,i_{l}\in[n]$ such that $i_{0}=i$, $(i_{j-1},i_{j})\in E_{\text{hon}}$ for all $j\in[l]$, and $i_{l}=i^{\prime}$.
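Assumption 1 can be verified with a breadth-first search over the honest subgraph; the following sketch (illustrative interface, same agent-numbering convention as above) ignores all edges incident to malicious agents:

```python
from collections import deque

def honest_subgraph_connected(edges, n):
    """Check Assumption 1: the subgraph G_hon induced by honest agents 1..n is
    connected. Edges touching malicious agents (index > n) are ignored."""
    adj = {i: [] for i in range(1, n + 1)}
    for u, v in edges:
        if u <= n and v <= n:
            adj[u].append(v)
            adj[v].append(u)
    seen, queue = {1}, deque([1])  # BFS from honest agent 1
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n

assert honest_subgraph_connected([(1, 2), (2, 3)], n=3)       # line graph: connected
assert not honest_subgraph_connected([(1, 2), (3, 4)], n=3)   # agent 3 isolated in G_hon
```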

Multi-armed bandit: We consider the standard stochastic multi-armed bandit (Lattimore and Szepesvári, 2020, Chapter 4). Denote by $K\in\mathbb{N}$ the number of arms and $[K]$ the set of arms. For each $k\in[K]$, we let $\nu_{k}$ be a probability distribution over $\mathbb{R}$ and $\{X_{i,t}(k)\}_{i\in[n],t\in\mathbb{N}}$ an i.i.d. sequence of rewards sampled from $\nu_{k}$. The interpretation is that, if the $i$-th honest agent chooses the $k$-th arm at time $t$, it earns reward $X_{i,t}(k)$. The objective (to be formalized shortly) is reward maximization. In Example 2, for instance, $[K]$ represents the set of restaurants in a city, and the reward $X_{i,t}(k)$ quantifies how much person $i$ enjoys restaurant $k$ if they dine there on day $t$.

For each arm $k\in[K]$, we let $\mu_{k}=\mathbb{E}[X_{i,t}(k)]$ denote the corresponding expected reward. Without loss of generality, we assume the arms are labeled such that $\mu_{1}\geq\cdots\geq\mu_{K}$. We additionally assume the following, which generalizes the $\nu_{k}=\text{Bernoulli}(\mu_{k})$ and $\mu_{1}>\mu_{2}$ setting of (Vial et al., 2021). Notice that under this assumption, the arm gap $\Delta_{k}\triangleq\mu_{1}-\mu_{k}$ is strictly positive for each $k\geq 2$.

Assumption 2.

Rewards are $[0,1]$-valued, i.e., for each $k\in[K]$, $\nu_{k}$ is a distribution over $[0,1]$. Furthermore, the best arm is unique, i.e., $\mu_{1}>\mu_{2}$.

Objective: For each $i\in[n]$ and $t\in\mathbb{N}$, let $I_{t}^{(i)}\in[K]$ denote the arm chosen by honest agent $i$ at time $t$. Our goal is to minimize the regret $R_{T}^{(i)}$, which is the expected additive loss in cumulative reward for agent $i$'s sequence of arm pulls $\{I_{t}^{(i)}\}_{t=1}^{T}$ compared to the optimal policy that always chooses the best arm $1$. More precisely, we define regret as follows:

(5) $R_{T}^{(i)}\triangleq\sum_{t=1}^{T}\mathbb{E}\left[X_{i,t}(1)-X_{i,t}(I_{t}^{(i)})\right]=\sum_{t=1}^{T}\mathbb{E}\left[\mu_{1}-\mu_{I_{t}^{(i)}}\right]=\sum_{t=1}^{T}\mathbb{E}\left[\Delta_{I_{t}^{(i)}}\right].$
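Definition (5) is straightforward to estimate numerically. A minimal Monte Carlo sketch for a single agent under Bernoulli rewards (the `policy` interface and parameter names are illustrative, not the paper's):

```python
import random

def empirical_regret(mu, policy, T, trials=200, seed=0):
    """Monte Carlo estimate of the regret in (5) for one agent with
    Bernoulli(mu[k]) rewards; policy(t, history) returns the arm pulled at t,
    where history is a list of (arm, reward) pairs."""
    rng = random.Random(seed)
    best = max(mu)
    gap_sum = 0.0
    for _ in range(trials):
        history = []
        for t in range(1, T + 1):
            k = policy(t, history)
            reward = 1.0 if rng.random() < mu[k] else 0.0
            history.append((k, reward))
            gap_sum += best - mu[k]  # per-step gap, as in (5)
    return gap_sum / trials

# A policy that always plays the best arm incurs zero regret:
assert empirical_regret([0.9, 0.5], lambda t, h: 0, T=50) == 0.0
```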

3. Algorithm

We next discuss the algorithm from (Vial et al., 2021) (Algorithm 1 below), which modifies the one from (Chawla et al., 2020) to include blocking. For ease of exposition, we begin by discussing the key algorithmic design principles from (Chawla et al., 2020) in Section 3.1. We then define Algorithm 1 formally in Section 3.2. Finally, we introduce and discuss one additional assumption in Section 3.3.

3.1. Key ideas of the non-blocking algorithm

Figure 1. Illustration of the phases in Algorithm 1; see beginning of Section 3.1 for details.

Figure 2. Illustration of the active sets in Algorithm 1; see Example 3 for details. Panels: (a) Initial active sets, (b) Recommendations, (c) Updated active sets, (d) Later active sets, (e) Recommendations, (f) Updated active sets.

We assume $m=0$ in this subsection and describe the non-blocking algorithm from (Chawla et al., 2020).

  • Phases: In (Chawla et al., 2020), the time steps $1,\ldots,T$ are grouped into phases, whose role is twofold. First, within the $j$-th phase, the $i$-th honest agent only pulls arms belonging to a particular subset $S_{j}^{(i)}\subset[K]$. We call these active sets and detail their construction next. Second, at the end of the $j$-th phase, the agents construct new active sets by exchanging arm recommendations with neighbors, in a manner to be described shortly. See Figure 1 for a pictorial description. Notice that the phase durations are increasing, which will be discussed in Section 3.2.

  • Active sets: The active set $S_{j}^{(i)}$ will always contain a subset of arms $\hat{S}^{(i)}\subset[K]$ that does not vary with the phase $j$. Following (Chawla et al., 2020; Vial et al., 2021), we call $\hat{S}^{(i)}$ the sticky set and its elements sticky arms. The sticky sets ensure that each arm is explored by some agent, as will be seen in the forthcoming example. In addition, $S_{j}^{(i)}$ will contain two non-sticky arms that are dynamically updated across phases $j$ based on arm recommendations from neighbors.

  • Arm recommendations: After the $j$-th phase, each agent $i$ contacts a random neighbor, who responds with whichever of their active arms performed “best” in the current phase. Upon receiving this recommendation, $i$ adds it to its active set and discards whichever currently-active non-sticky arm (i.e., whichever element of $S_{j}^{(i)}\setminus\hat{S}^{(i)}$) performed “worse”. (We quantify “best” and “worse” in the formal discussion of Section 3.2.)

Example 3.

Each subfigure of Figure 2 depicts $n=3$ honest agents as circles and their active sets as rectangles. The blue rectangles are sticky sets, the orange rectangles are non-sticky arms, and the arms are sorted by performance. For example, the left agent in Figure 2(a) has sticky set $\{1,2\}$ and active set $\{1,2,3,6\}$ and believes arm $3$ to be the best of these. Note the blue sticky sets partition $[K]=[6]$, so at each phase, each arm is active for some agent. This ensures the best arm is never permanently discarded during the arm recommendations discussed above. Figure 2(b) shows agents recommending the active arms they believe are best, and Figure 2(c) depicts the updated active sets. For instance, the left agent replaces its worse non-sticky arm $6$ with the recommendation $5$. Figure 2(d) shows a later phase where the best arm $1$ has spread to all agents, who have all identified it as such. Thereafter, all agents recommend $1$, so the active sets remain fixed (Figures 2(e) and 2(f)). Hence, all agents eventually exploit the best arm while only exploring a subset of the suboptimal arms (three instead of five here).
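The phase-end update in Example 3 can be sketched directly. This is an illustrative helper (not the paper's code) that keeps the more-played non-sticky arm and swaps out the other:

```python
def update_active_set(sticky, u, l, recommendation, plays):
    """Phase-end update mirroring Example 3: if the recommended arm is new,
    keep the more-played of the two non-sticky arms (u, l) and replace the
    other with the recommendation; otherwise leave the active set unchanged."""
    if recommendation in sticky | {u, l}:
        return u, l  # recommendation already active: no change
    keep = u if plays.get(u, 0) >= plays.get(l, 0) else l
    return keep, recommendation

# Left agent of Figure 2(a): sticky set {1,2}, non-sticky arms 3 and 6, with
# arm 3 played far more; recommendation 5 replaces the worse arm 6.
u, l = update_active_set({1, 2}, 3, 6, 5, {3: 40, 6: 5})
assert {u, l} == {3, 5}
```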

3.2. Formal definition of the blocking algorithm

The algorithm in (Vial et al., 2021) supplements the one from Section 3.1 with a blocking procedure. Specifically, honest agents run the algorithm from (Chawla et al., 2020) while maintaining blocklists of neighbors they are unwilling to communicate with. This approach is defined in Algorithm 1 and detailed next.

1: Input: UCB parameter $\alpha>0$, phase parameter $\beta>1$, sticky arms $\hat{S}^{(i)}$ with $|\hat{S}^{(i)}|=S\leq K-2$
2: Initialize $A_{j}=\lceil j^{\beta}\rceil$, $P_{j}^{(i)}=\emptyset$ for all $j\in\mathbb{N}$ (communication times and blocklists)
3: Set $j=1$ (current phase), let $\{U_{j}^{(i)},L_{j}^{(i)}\}\subset[K]\setminus\hat{S}^{(i)}$ be two distinct non-sticky arms and $S_{j}^{(i)}=\hat{S}^{(i)}\cup\{U_{j}^{(i)},L_{j}^{(i)}\}$ the initial active set
4: for $t\in\mathbb{N}$ do
5:   Pull $I_{t}^{(i)}=\operatorname{arg\,max}_{k\in S_{j}^{(i)}}\left(\hat{\mu}_{k}^{(i)}(t-1)+\sqrt{\alpha\log(t)/T_{k}^{(i)}(t-1)}\right)$ (UCB over active set)
6:   if $t=A_{j}$ (if communication occurs at this time) then
7:     $B_{j}^{(i)}=\operatorname{arg\,max}_{k\in S_{j}^{(i)}}\left(T_{k}^{(i)}(A_{j})-T_{k}^{(i)}(A_{j-1})\right)$ (most played active arm in this phase)
8:     $\{P_{j^{\prime}}^{(i)}\}_{j^{\prime}=j}^{\infty}\leftarrow\texttt{Update-Blocklist}(\{P_{j^{\prime}}^{(i)}\}_{j^{\prime}=j}^{\infty})$ ((Vial et al., 2021) uses Alg. 3; we propose Alg. 4)
9:     $(H_{j}^{(i)},R_{j}^{(i)})=\texttt{Get-Recommendation}(i,j,P_{j}^{(i)})$ (see Alg. 2)
10:    if $R_{j}^{(i)}\notin S_{j}^{(i)}$ (if recommendation not already active) then
11:      $U_{j+1}^{(i)}=\operatorname{arg\,max}_{k\in\{U_{j}^{(i)},L_{j}^{(i)}\}}\left(T_{k}^{(i)}(A_{j})-T_{k}^{(i)}(A_{j-1})\right)$ (best non-sticky active arm)
12:      $L_{j+1}^{(i)}=R_{j}^{(i)}$ (replace worst non-sticky active arm with recommendation)
13:      $S_{j+1}^{(i)}=\hat{S}^{(i)}\cup\{U_{j+1}^{(i)},L_{j+1}^{(i)}\}$ (new active set is sticky set and two non-sticky arms)
14:    else
15:      $S_{j+1}^{(i)}=S_{j}^{(i)}$ (keep the same active set, since recommendation is already active)
16:    $j\leftarrow j+1$ (increment phase)
Algorithm 1 Multi-agent bandits with blocking (executed by $i\in[n]$)
1: Input: Agent $i\in\{1,\ldots,n\}$, phase $j\in\mathbb{N}$, blocklist $P_{j}^{(i)}$
2: Sample $H_{j}^{(i)}$ from $N(i)\setminus P_{j}^{(i)}$ (non-blocked neighbors) uniformly at random
3: if $H_{j}^{(i)}\leq n$ (if the sampled agent is honest) then
4:   Set $R_{j}^{(i)}=B_{j}^{(H_{j}^{(i)})}$ (honest agents recommend most played arm from this phase)
5: else
6:   Choose $R_{j}^{(i)}\in[K]$ arbitrarily (malicious agents recommend arbitrary arms)
7: Output: $(H_{j}^{(i)},R_{j}^{(i)})$
Algorithm 2 $(H_{j}^{(i)},R_{j}^{(i)})=\texttt{Get-Recommendation}(i,j,P_{j}^{(i)})$ (black box to $i\in[n]$)

Inputs (Line 1): The first input is a standard UCB exploration parameter $\alpha>0$, which will be discussed shortly. The input $\beta>1$ controls the lengths of the phases; namely, the $j$-th phase encompasses times $1+A_{j-1},\ldots,A_{j}$, where $A_{j}\triangleq\lceil j^{\beta}\rceil$. Note the phase duration $A_{j}-A_{j-1}=O(j^{\beta-1})$ grows with $j$, as shown in Figure 1. The final input is an $S$-sized sticky set $\hat{S}^{(i)}$ ($S=2$ in Example 3), which, as in (Vial et al., 2021), we assume is provided to the agents (see Section 3.3 for more details).
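The phase schedule $A_{j}=\lceil j^{\beta}\rceil$ is easy to compute; a small sketch (helper name illustrative) confirming that the phase durations grow like $j^{\beta-1}$:

```python
import math

def phase_boundaries(beta, num_phases):
    """A_j = ceil(j**beta); phase j spans times 1+A_{j-1}, ..., A_j, so its
    duration A_j - A_{j-1} = O(j**(beta-1)) grows with j."""
    return [math.ceil(j ** beta) for j in range(num_phases + 1)]  # A_0 = 0

A = phase_boundaries(beta=2.0, num_phases=5)
assert [A[j] - A[j - 1] for j in range(1, 6)] == [1, 3, 5, 7, 9]  # increasing durations
```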

Initialization (Lines 1-1): To start, $i$ initializes the times $A_{j}$ at which the $j$-th phase ends, along with the blocklist $P_{j}^{(i)}$. Additionally, $i$ chooses two distinct (but otherwise arbitrary) non-sticky arms $U_{1}^{(i)}$ and $L_{1}^{(i)}$ and constructs the active set $S_{1}^{(i)}=\hat{S}^{(i)}\cup\{U_{1}^{(i)},L_{1}^{(i)}\}$. Notice that the active set contains the sticky set and two arms that depend on the phase, as described in Section 3.1.

UCB over the active set (Line 1): As was also mentioned in Section 3.1, $i$ only pulls arms from its current active set $S_{j}^{(i)}$. More specifically, at each time $t$ during phase $j$, $i$ chooses the active arm $I_{t}^{(i)}\in S_{j}^{(i)}$ that maximizes the UCB in Line 1 (see (Lattimore and Szepesvári, 2020, Chapters 7-10) for background). Here $T_{k}^{(i)}(t-1)$ and $\hat{\mu}_{k}^{(i)}(t-1)$ are the number of pulls of $k$ and the empirical mean of those pulls, i.e.,

$$T_{k}^{(i)}(t-1)=\sum_{s\in[t-1]}\mathbbm{1}(I_{s}^{(i)}=k),\quad\hat{\mu}_{k}^{(i)}(t-1)=\frac{1}{T_{k}^{(i)}(t-1)}\sum_{s\in[t-1]:I_{s}^{(i)}=k}X_{i,s}(k),$$

where $X_{i,s}(k)\sim\nu_{k}$ as in Section 2 and $\mathbbm{1}$ is the indicator function.

Best arm estimate (Line 1): At the end of phase $j$ (i.e., when $t=A_{j}$), $i$ defines its best arm estimate $B_{j}^{(i)}$ as the active arm it played the most in phase $j$. The intuition is that, for large horizons, the arm chosen most by UCB is a good estimate of the true best arm (Bubeck et al., 2011). Thus, because phase lengths are increasing (see Figure 1), $B_{j}^{(i)}$ will be a good estimate of the best active arm for large $j$.
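To make the phased structure concrete, the following is a minimal sketch (our own illustration, not the paper's full Algorithm 1: a single agent, no communication, blocking, or active-set updates) of UCB run over a fixed active set in phases $A_j=\lceil j^{\beta}\rceil$, returning each phase's most-played arm as the best-arm estimate $B_j$. The function name, the Gaussian noise model, and the parameter defaults are our assumptions.

```python
import math
import random

def phased_ucb(mu, active, num_phases, alpha=4.0, beta=2.0, sigma=0.1, seed=0):
    """Run UCB over a fixed active set in phases A_j = ceil(j**beta),
    returning the per-phase best-arm estimates B_j (most played in phase j)."""
    rng = random.Random(seed)
    pulls = {k: 0 for k in active}   # T_k: number of pulls of arm k
    means = {k: 0.0 for k in active}  # mu_hat_k: empirical mean of arm k
    A = [0] + [math.ceil(j ** beta) for j in range(1, num_phases + 1)]
    estimates = []
    for j in range(1, num_phases + 1):
        start = dict(pulls)  # T_k(A_{j-1}), to count plays within phase j
        for t in range(A[j - 1] + 1, A[j] + 1):
            # UCB index from the paper: mu_hat_k + sqrt(alpha * log(t) / T_k)
            k = max(active, key=lambda a: float("inf") if pulls[a] == 0
                    else means[a] + math.sqrt(alpha * math.log(t) / pulls[a]))
            x = mu[k] + (rng.gauss(0.0, sigma) if sigma > 0 else 0.0)  # reward draw
            means[k] = (means[k] * pulls[k] + x) / (pulls[k] + 1)
            pulls[k] += 1
        # best-arm estimate B_j: the active arm played most during phase j
        estimates.append(max(active, key=lambda a: pulls[a] - start[a]))
    return estimates
```

With deterministic rewards (`sigma = 0`), as in the Section 4 instance, the estimate quickly locks onto the best active arm and stays there, matching the intuition from (Bubeck et al., 2011).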

Blocklist update (Line 1): Next, $i$ calls the Update-Blocklist subroutine to update its blocklist $P_{j}^{(i)}$. The implementation of this subroutine is the key distinction between (Vial et al., 2021) and our work; we discuss the two implementations in Sections 4 and 5, respectively.

Arm recommendations (Line 1): Having updated $P_{j}^{(i)}$, $i$ requests an arm recommendation $R_{j}^{(i)}$ via Algorithm 2. Algorithm 2 is a black box (i.e., $i$ provides the input and observes the output), which samples a random non-blocked neighbor $H_{j}^{(i)}\in N(i)\setminus P_{j}^{(i)}$. If $H_{j}^{(i)}$ is honest, it recommends its best arm estimate, while if malicious, it recommends an arbitrary arm. (Technically, malicious recommendations need to be measurable; see (Vial et al., 2021, Section 3) for details.)

Updating the active set (Lines 1-1): Finally, $i$ updates its active set as in Section 3.1. In particular, if the recommendation $R_{j}^{(i)}$ is not currently active (if it is, the active set remains unchanged; see Algorithm 1), $i$ defines $U_{j+1}^{(i)}$ to be the non-sticky arm that performed better in phase $j$, in the sense that UCB chose it more often (following the above intuition from (Bubeck et al., 2011)). The other non-sticky arm $L_{j+1}^{(i)}$ becomes the recommendation $R_{j}^{(i)}$, and the new active set becomes $S_{j+1}^{(i)}=\hat{S}^{(i)}\cup\{U_{j+1}^{(i)},L_{j+1}^{(i)}\}$ (the sticky set and two other arms, as above).
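The active-set update just described is simple enough to state directly in code. This is a hypothetical sketch of the update step alone; the names `sticky`, `phase_pulls`, etc. are ours, not the paper's:

```python
def update_active_set(sticky, U, L, R, phase_pulls):
    """One step of the Algorithm 1 active-set update.
    sticky: the fixed sticky arms; U, L: current non-sticky arms;
    R: this phase's recommendation; phase_pulls[k]: pulls of arm k during
    the phase that just ended.  Returns (U_next, L_next, new_active_set)."""
    if R in set(sticky) | {U, L}:
        return U, L, set(sticky) | {U, L}      # recommendation already active: no change
    U_next = U if phase_pulls[U] >= phase_pulls[L] else L  # keep better non-sticky arm
    L_next = R                                 # replace the other with the recommendation
    return U_next, L_next, set(sticky) | {U_next, L_next}
```

For instance, with sticky set `{1}`, non-sticky arms `2` and `3`, and a new recommendation `4`, the arm pulled more this phase survives and the other is replaced by `4`.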

3.3. Additional assumption

Observe that Algorithm 1 does not preclude the case where the best arm is not in any honest agent's sticky set, i.e., $1\notin\cup_{i=1}^{n}\hat{S}^{(i)}$. In this case, the best arm may be permanently discarded, which causes linear regret even in the absence of malicious agents. For example, this would occur if arm $1$ were not a sticky arm for the left agent in Figure 2 (since the right agent discards $1$ in Figure 2(c)). To prevent this situation, we will follow (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021; Vial et al., 2021) in assuming the following.

Assumption 3.

There exists $i^{\star}\in[n]$ with the best arm in its sticky set, i.e., $1\in\hat{S}^{(i^{\star})}$.

Remark 1.

As discussed in (Chawla et al., 2020, Appendix N), Assumption 3 holds with high probability if $S$ (the size of the sticky set input to Algorithm 1) is set to $\tilde{\Theta}(K/n)$ and each sticky set $\hat{S}^{(i)}$ is sampled uniformly at random from the $S$-sized subsets of $[K]$.
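The high-probability claim in Remark 1 is easy to check empirically. The sketch below (our own illustration, taking arm 1 as the best arm) samples uniformly random $S$-sized sticky sets for each agent and estimates the probability that Assumption 3 holds; analytically this probability is $1-(1-S/K)^{n}$.

```python
import random

def coverage_probability(n, K, S, trials=2000, seed=0):
    """Estimate P(some agent's uniformly random S-subset of {1,...,K}
    contains the best arm 1), i.e., P(Assumption 3 holds)."""
    rng = random.Random(seed)
    hits = sum(
        any(1 in rng.sample(range(1, K + 1), S) for _ in range(n))
        for _ in range(trials)
    )
    return hits / trials
```

For example, with $n=20$ agents, $K=40$ arms, and $S=6\approx(K/n)\log n$, the estimate is close to $1-0.85^{20}\approx 0.96$, consistent with the $\tilde{\Theta}(K/n)$ prescription.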

Remark 2.

The choice $S=\tilde{\Theta}(K/n)$ from Remark 1 requires the honest agents to know an order-accurate estimate of $n$, i.e., they need to know some $n^{\prime}=\tilde{\Theta}(n)$ in order to set $S=\tilde{\Theta}(K/n^{\prime})$ and ensure that $S=\tilde{\Theta}(K/n)$. As discussed in (Vial et al., 2021, Remark 7), this amounts to knowing order-accurate estimates of $n+m$ and $n/(n+m)$. The former quantity is the total number of agents, knowledge of which is rather benign and is also assumed in the fully-cooperative setting (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021). The latter requires the agents to know that, e.g., half of the others are honest, which is similar in spirit to the assumptions in related problems regarding social learning in the presence of adversarial agents (e.g., (LeBlanc et al., 2013)).

Remark 3.

Alternatively, we can avoid Assumption 3 entirely by defining the set of arms to be those initially known by the honest agents (i.e., their sticky sets), rather than sampling the sticky sets from a larger “base set” as in Remark 1. In this alternative model, the honest agents aim to identify and spread through the network whichever of the initially-known arms is best, similar to what happens on platforms like Yelp (see Example 2). In contrast, the Section 2 model allows for the pathological case where the base set contains a better arm than any initially known to honest agents (e.g., where no honest Yelp user has ever dined at the best restaurant). Coping with these pathological cases requires either Assumption 3 or another mode of exploration (i.e., exploration of base arms) that obfuscates the key point of our work (collaborative bandit exploration amidst adversaries). For these reasons, we prefer the alternative model, but to enable a cleaner comparison with prior work (Vial et al., 2021), we restrict attention to the Section 2 model (which generalizes that of (Vial et al., 2021)).

4. Existing blocking rule

We can now define the blocking approach from (Vial et al., 2021), which is provided in Algorithm 3. In words, the rule is as follows: if the recommendation $R_{j-1}^{(i)}$ from phase $j-1$ is not $i$'s most played arm in the subsequent phase $j$, then the agent $H_{j-1}^{(i)}$ who recommended it is added to the blocklists $P_{j}^{(i)},\ldots,P_{j^{\eta}}^{(i)}$, where $\eta>1$ is a tuning parameter. By Algorithm 2, this means $i$ blocks (i.e., does not communicate with) $H_{j-1}^{(i)}$ until phase $j^{\eta}+1$ (at the earliest). Thus, agents block others whose recommendations perform poorly – in the sense that UCB does not play them often – and the blocking becomes more severe as the phase counter $j$ grows. See (Vial et al., 2021, Remark 4) for further intuition.

1
2if $j>1$ and $B_{j}^{(i)}\neq R_{j-1}^{(i)}$ (if previous recommendation not most played) then
3      
4      $P_{j^{\prime}}^{(i)}\leftarrow P_{j^{\prime}}^{(i)}\cup\{H_{j-1}^{(i)}\}\ \forall\ j^{\prime}\in\{j,\ldots,\lceil j^{\eta}\rceil\}$ (block the recommender until phase $j^{\eta}$)
Algorithm 3 $\{P_{j^{\prime}}^{(i)}\}_{j^{\prime}=j}^{\infty}=\texttt{Update-Blocklist}$ (executed by $i\in[n]$, existing rule from (Vial et al., 2021))
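As a concrete (and hypothetical) rendering of this rule, the following sketch maintains the blocklists as a dict from phase to the set of blocked neighbors; the function and argument names are ours:

```python
import math

def update_blocklist_existing(P, j, B_j, R_prev, H_prev, eta=2.0):
    """Existing rule (Algorithm 3): if last phase's recommendation R_prev is
    not the most played arm B_j of phase j, block its recommender H_prev
    for every phase up to ceil(j**eta).  P maps phase -> set of blocked agents."""
    if j > 1 and B_j != R_prev:
        for jp in range(j, math.ceil(j ** eta) + 1):
            P.setdefault(jp, set()).add(H_prev)
    return P
```

With $\eta=2$, a neighbor whose phase-2 recommendation is not most played in phase 3 is blocked through phase 9, a neighbor failing at phase 10 is blocked through phase 100, and so on, illustrating how blocking grows more severe with $j$.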

In the remainder of this section, we define a bad instance (Section 4.1) on which this blocking rule provably fails (Section 4.2). Our goal here is to demonstrate a single such instance in order to show this blocking rule must be refined. Therefore, we have opted for a concrete example, which includes some numerical constants (e.g., $13/15$ in (6), the $7$ in the $\log^{7}T$ term in Theorem 1, etc.) that have no particular meaning. Nevertheless, the instance can be generalized; see Remark 4.

4.1. Bad instance

The network and bandit for the bad instance are as follows:

  • There are an even number of honest agents (at least four) arranged in a line, increasing in index from left to right, and there is a malicious agent connected to each of the honest ones. Mathematically, we have $n\in\{4,6,8,\ldots\}$, $m=1$, and $E=\{(i,i+1)\}_{i=1}^{n-1}\cup\{(i,n+1)\}_{i=1}^{n}$.

  • There are $K=n$ arms that generate deterministic rewards (i.e., $\nu_{k}=\delta_{\mu_{k}}$) with

    (6) $\mu_{1}=1,\quad\mu_{k}=\frac{13}{15}+\sum_{h=1}^{(n/2)-k}2^{-2^{h+1}}\ \forall\ k\in\{2,\ldots,n/2\},\quad\mu_{k}=0\ \forall\ k>n/2.$

    Intuitively, there are three sets of arms: the best arm, $(n/2)-1$ mediocre arms, and $n/2$ bad arms. We provide further intuition in the forthcoming proof sketch. For now, we highlight three key properties. First, the gap from mediocre to bad arms is constant, i.e., $\mu_{k_{1}}-\mu_{k_{2}}\geq 13/15$ when $k_{1}\leq n/2<k_{2}$. Second, the gaps between mediocre arms are doubly exponentially small, i.e., $\mu_{k}-\mu_{k+1}=2^{-2^{(n/2)-k+1}}$ for $k\in\{2,\ldots,(n/2)-1\}$. Third, the gap $\Delta_{2}$ from the best to the mediocre arms is at least $1/15$, as shown in Appendix C.
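These three properties can be verified numerically. Below is a small sketch computing the means in (6) (our own code; arms are 1-indexed):

```python
def bad_instance_means(n):
    """Arm means from (6): arm 1 is best, arms 2..n/2 are mediocre with
    doubly exponentially small gaps, arms > n/2 are bad (zero reward)."""
    mu = {1: 1.0}
    for k in range(2, n // 2 + 1):
        mu[k] = 13 / 15 + sum(2.0 ** -(2 ** (h + 1)) for h in range(1, n // 2 - k + 1))
    for k in range(n // 2 + 1, n + 1):
        mu[k] = 0.0
    return mu
```

For $n=8$, for instance, $\mu_{2}-\mu_{3}=2^{-2^{3}}$ and the best-to-mediocre gap $1-\mu_{2}$ is just above $1/15$.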

Observation 1.

Since rewards are deterministic, the most played arm $B_{j+1}^{(i)}$ in phase $j+1$ is a deterministic function of the number of plays of the active arms at the beginning of the phase, i.e., of the set $\{T_{k}^{(i)}(A_{j})\}_{k\in S_{j+1}^{(i)}}$. Hence, when the $j$-th recommendation is already active (i.e., when $R_{j}^{(i)}\in S_{j}^{(i)}$, which implies $S_{j+1}^{(i)}=S_{j}^{(i)}$ in Algorithm 1), $B_{j+1}^{(i)}$ is a function of $\{T_{k}^{(i)}(A_{j})\}_{k\in S_{j}^{(i)}}$, which is information available to the malicious agent at the $j$-th communication time $A_{j}$. Consequently, the malicious agent can always recommend some $R_{j}^{(i)}\in S_{j}^{(i)}$ such that $B_{j+1}^{(i)}=R_{j}^{(i)}$, to avoid being blocked by $i$.

We make the following assumptions on Algorithms 1 and 2:

  • The parameters in Algorithm 1 are $\alpha=4$ and $\beta=2$, while $\eta=2$ in Algorithm 3.

  • Sticky sets have size $S=1$ and for any $i\in\{1+n/2,\ldots,n\}$, $i$'s initial active set satisfies $\min S_{1}^{(i)}>n/2$. Thus, active sets contain three arms, and the right half of the honest agents are initially only aware of the bad arms, i.e., of those that provide no reward.

Remark 4.

Note that Assumptions 1-3 all hold for this instance, and the choices $\alpha=4$ and $\beta=\eta=2$ are used for the complete graph experiments in (Vial et al., 2021). Additionally, the instance can be significantly generalized – the key properties are that $K$ and $n$ have the same scaling, the gaps from mediocre arms to others are constant, the gaps among mediocre arms are doubly exponentially small, and a constant fraction of agents on the right initially only have bad arms active.

Finally, we define a particular malicious agent strategy. Let $J_{1}=2^{8}$ and inductively define $J_{l+1}=(J_{l}+2)^{2}$ for each $l\in\mathbb{N}$. Then the malicious recommendations are as follows:

  • If $j=J_{l}$ and $i\in\{l+1+n/2,l+2+n/2\}$ for some $l\in[(n/2)-1]$, set $R_{j}^{(i)}=1-l+n/2$.

  • Otherwise, let $R_{j}^{(i)}\in S_{j}^{(i)}$ be such that $B_{j+1}^{(i)}=R_{j}^{(i)}$ (see Observation 1).

As with the arm means, we defer a detailed explanation of this strategy to the proof sketch. For now, we only mention that the phases $J_{l}$ grow doubly exponentially, i.e.,

(7) $J_{l+1}=(J_{l}+2)^{2}>J_{l}^{2}>\cdots>J_{1}^{2^{l}}\ \forall\ l\in\mathbb{N}.$
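A quick numerical check of (7), using a hypothetical helper name of our choosing:

```python
def phase_schedule(L):
    """Blocking phases from Section 4: J_1 = 2**8, J_{l+1} = (J_l + 2)**2."""
    J = [2 ** 8]
    for _ in range(L - 1):
        J.append((J[-1] + 2) ** 2)
    return J
```

Even for `L = 4`, the last phase already exceeds $J_{1}^{2^{3}}=2^{64}$, matching the doubly exponential lower bound in (7).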

4.2. Negative result

We can now state the main result of this section. It shows that if the existing blocking rule from (Vial et al., 2021) is used on the above instance, then the honest agent $n$ at the end of the line suffers nearly linear regret $\tilde{\Omega}(T)$ until time $T$ exceeds a doubly exponential function of $n=K$.

Theorem 1.

If we run Algorithm 1 and use Algorithm 3 as the Update-Blocklist subroutine with the parameters and problem instance described in Section 4, then

$$R_{T}^{(n)}=\Omega\left(\min\left\{\log(T)+\exp\left(\exp\left(n/3\right)\right),T/\log^{7}T\right\}\right).$$
Proof sketch.

We provide a complete proof in Appendix C but discuss the intuition here.

  • First, suppose honest agent $1+n/2$ contacts the malicious agent $n+1$ at all phases $j\in[J_{1}-1]$ (this occurs with constant probability since $J_{1}$ is constant). Then the right half of honest agents (i.e., agents $1+n/2,\ldots,n$) only have bad arms (i.e., arms $1+n/2,\ldots,n$) in their active sets at phase $J_{1}$. This is because their initial active sets only contain such arms (by assumption), $n+1$ only recommends currently-active arms before $J_{1}$, and no arm recommendations flow from the left half of the graph to the right half (they would need to first be sent from $n/2$ to $1+n/2$, but we are assuming the latter only contacts $n+1$ before $J_{1}$).

  • Now consider phase $J_{1}$. With constant probability, $1+n/2$ and $2+n/2$ both contact $n+1$, who recommends a currently active (thus bad) arm and the mediocre arm $n/2$, respectively. Then, again with constant probability, $2+n/2$ contacts $1+n/2$ at the next phase $J_{1}+1$; $1+n/2$ only has bad arms active and thus recommends a bad arm. Therefore, during phase $J_{1}+2$, agent $2+n/2$ has the mediocre arm $n/2$ and some bad recommendation from $1+n/2$ in its active set. The inverse gap squared between these arms is constant, thus less than the length of phase $J_{1}+2$ (for appropriate $J_{1}$), so by standard bandit arguments (basically, noiseless versions of best arm identification results from (Bubeck et al., 2011)), $n/2$ will be most played. Consequently, by the blocking rule in Algorithm 3, $2+n/2$ blocks $1+n/2$ until phase $(J_{1}+2)^{2}=J_{2}$.

  • We then use induction. For each $l\in[(n/2)-1]$ ($l=1$ in the previous bullet), suppose $l+1+n/2$ blocks $l+n/2$ between phases $J_{l}+2$ and $J_{l+1}$. Then during these phases, no arm recommendations flow past $l+n/2$, so agents $\geq l+1+n/2$ only play arms $\geq 1-l+n/2$. At phase $J_{l+1}$, the malicious agent recommends $k\geq 1-l+n/2$ and $-l+n/2$ to agents $l+1+n/2$ and $l+2+n/2$, respectively, and at the subsequent phase $J_{l+1}+1$, $l+1+n/2$ recommends $k^{\prime}\geq l+1+n/2$ to $l+2+n/2$. Similar to the previous bullet, we then show $l+2+n/2$ plays arm $-l+n/2$ more than $k^{\prime}$ during phase $J_{l+1}+2$ and thus blocks $l+1+n/2$ until $(J_{l+1}+2)^{2}=J_{l+2}$, completing the inductive step. The proof that $-l+n/2$ is played more than $k^{\prime}$ during phase $J_{l+1}+2$ again follows from noiseless best arm identification, although unlike the previous bullet, the relevant arm gap is no longer constant (both could be mediocre arms). However, we chose the mediocre arm means such that their inverse gap squared is at most doubly exponential in $l$, so by (7), the length of phase $J_{l+1}$ dominates it.

In summary, we show that due to blocking amongst honest agents, $l+1+n/2$ does not receive arm $1-l+n/2$ until phase $J_{l}$, given that some constant probability events occur at each of the times $J_{1},\ldots,J_{l}$. This allows us to show that, with probability at least $\exp(-\Omega(n))$, agent $n$ does not receive the best arm until phase $J_{n/2}=\exp(\exp(\Omega(n)))$, and thus does not play the best arm until time $\exp(\exp(\Omega(n)))$ in expectation. Since $\Delta_{2}$ is constant, we can lower bound regret similarly. ∎

5. Proposed blocking rule

To summarize the previous section, we showed that the existing blocking rule (Algorithm 3) may cause honest agents to block too aggressively, which makes the best arm spread very slowly. In light of this, we propose a relaxed blocking criterion (see Algorithm 4): at phase $j$, agent $i$ will block the agent $H_{j-1}^{(i)}$ who recommended arm $R_{j-1}^{(i)}$ at the previous phase $j-1$ if

(8) $T_{R_{j-1}^{(i)}}^{(i)}(A_{j})\leq\kappa_{j}\quad\text{and}\quad B_{j}^{(i)}=B_{j-1}^{(i)}=\cdots=B_{\lfloor\theta_{j}\rfloor}^{(i)},$

where $\kappa_{j}\leq A_{j}$ and $\theta_{j}\leq j$ are tuning parameters. Thus, $i$ blocks if both of the following occur:

  • The recommended arm $R_{j-1}^{(i)}$ performs poorly, in the sense that UCB has not chosen it sufficiently often (i.e., at least $\kappa_{j}$ times) by the end of phase $j$.

  • Agent $i$ has not changed its own best arm estimate since phase $\theta_{j}$. Intuitively, this can be viewed as a confidence criterion: if instead $i$ has recently changed its estimate, then $i$ is currently unsure which arm is best, so it should not block for recommendations that appear suboptimal at first glance (i.e., those for which the first criterion in (8) may hold).

1
2if $j>1$ and (8) holds then $P_{j^{\prime}}^{(i)}\leftarrow P_{j^{\prime}}^{(i)}\cup\{H_{j-1}^{(i)}\}\ \forall\ j^{\prime}\in\{j,\ldots,\lceil j^{\eta}\rceil\}$ (block recommender);
3
Algorithm 4 $\{P_{j^{\prime}}^{(i)}\}_{j^{\prime}=j}^{\infty}=\texttt{Update-Blocklist}$ (executed by $i\in[n]$, proposed rule)
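Here is a hypothetical rendering of Algorithm 4 in code (names are ours): `B_hist[j']` stores the best-arm estimate of phase $j^{\prime}$, and `pulls_R` is $T_{R_{j-1}^{(i)}}^{(i)}(A_{j})$.

```python
import math

def update_blocklist_proposed(P, j, pulls_R, kappa_j, B_hist, theta_j, H_prev, eta=2.0):
    """Proposed rule (Algorithm 4): block the recommender H_prev until phase
    ceil(j**eta) only if (i) the recommended arm was pulled at most kappa_j
    times by the end of phase j, and (ii) the best-arm estimate has been
    unchanged since phase floor(theta_j)."""
    underplayed = pulls_R <= kappa_j
    stable = all(B_hist[jp] == B_hist[j] for jp in range(math.floor(theta_j), j + 1))
    if j > 1 and underplayed and stable:
        for jp in range(j, math.ceil(j ** eta) + 1):
            P.setdefault(jp, set()).add(H_prev)
    return P
```

The second condition is what distinguishes this rule from Algorithm 3: an agent whose own best-arm estimate changed recently never blocks, no matter how poorly the recommendation performed.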
Remark 5.

The first criterion in (8) is a natural relaxation of demanding the recommended arm be most played. The second is directly motivated by the negative result from Section 4. In particular, recall from the Theorem 1 proof sketch that $l+1+n/2$ blocked $l+n/2$ shortly after receiving a new mediocre arm from the malicious agent. Thus, blocking amongst honest agents was always precipitated by the blocking agent changing its best arm estimate. The second criterion in (8) aims to avoid this.

Remark 6.

Our proposed rule has two additional parameters compared to the existing one: $\kappa_{j}$ and $\theta_{j}$. For our theoretical results, these will be specified in Theorem 2; for experiments, they are discussed in Section 6. For now, we only mention that they should satisfy two properties. First, $\kappa_{j}$ should be $o(A_{j})$, so that the first criterion in (8) dictates a sublinear number of plays. Second, $j-\theta_{j}$ should grow with $j$, since (as discussed above) the second criterion represents the confidence in the best arm estimate, which grows as the number of reward observations increases.

In the remainder of this section, we introduce a further definition (Section 5.1), provide a general regret bound under our blocking rule (Section 5.2), and discuss some special cases (Section 5.3).

5.1. Noisy rumor process

As discussed in Section 1.4, we will show that under our proposed rule (1) honest agents eventually stop blocking each other, and (2) honest agents with the best arm active will eventually recommend it to others. Thereafter, we essentially reduce the arm spreading process to a much simpler rumor process in which each honest agent $i$ contacts a uniformly random neighbor $i^{\prime}$ and, if $i^{\prime}$ is an honest agent who knows the rumor (i.e., if the best arm is active for $i^{\prime}$), then $i^{\prime}$ informs $i$ of the rumor (i.e., $i^{\prime}$ recommends the best arm to $i$). The only caveat is that we make no assumption on the malicious agent arm recommendations, so we have no control over whether or not they are blocked. In other words, the rumor process unfolds over a dynamic graph, where edges between honest and malicious agents may or may not be present, and we have no control over these dynamics.

In light of this, we take a worst-case view and lower bound the arm spreading process with a noisy rumor process that unfolds on the (static) honest agent subgraph. More specifically, we consider the process $\{\bar{\mathcal{I}}_{j}\}_{j=0}^{\infty}$ that tracks the honest agents informed of the rumor. Initially, only $i^{\star}$ (the agent from Assumption 3) is informed (i.e., $\bar{\mathcal{I}}_{0}=\{i^{\star}\}$). Then at each phase $j\in\mathbb{N}$, each honest agent $i$ contacts a random honest neighbor $i^{\prime}$. If $i^{\prime}$ is informed (i.e., if $i^{\prime}\in\bar{\mathcal{I}}_{j-1}$), then $i$ becomes informed as well (i.e., $i\in\bar{\mathcal{I}}_{j}$), subject to some $\text{Bernoulli}(\Upsilon)$ noise, where $\Upsilon\leq d_{\text{hon}}(i)/d(i)$. Hence, $i$ becomes informed with probability $|\bar{\mathcal{I}}_{j-1}\cap N_{\text{hon}}(i)|\Upsilon/d_{\text{hon}}(i)\leq|\bar{\mathcal{I}}_{j-1}\cap N_{\text{hon}}(i)|/d(i)$. Note the right side of this inequality is in turn upper bounded by the probability with which $i$ receives the best arm in the process of the previous paragraph.

More formally, we define the noisy rumor process as follows. The key quantity in Definition 1 is $\bar{\tau}_{\text{spr}}$, the first phase at which all agents are informed. Analogous to (Chawla et al., 2020), our most general result will be in terms of the expected time at which this phase occurs, i.e., $\mathbb{E}[A_{\bar{\tau}_{\text{spr}}}]$. Under Assumption 1, the latter quantity is $\tilde{O}((n\bar{d}_{\text{hon}}/\Upsilon)^{\beta})$, which cannot be improved in general (see Appendix D.4). However, Section 5.3 provides sharper bounds for $\mathbb{E}[A_{\bar{\tau}_{\text{spr}}}]$ in some special cases.

Definition 1.

Let $\Upsilon=\min_{i\in[n]}d_{\text{hon}}(i)/d(i)$. For each honest agent $i\in[n]$, let $\{\bar{Y}_{j}^{(i)}\}_{j=1}^{\infty}$ be i.i.d. $\text{Bernoulli}(\Upsilon)$ random variables and $\{\bar{H}_{j}^{(i)}\}_{j=1}^{\infty}$ i.i.d. random variables chosen uniformly at random from $N_{\text{hon}}(i)$. Inductively define $\{\bar{\mathcal{I}}_{j}\}_{j=0}^{\infty}$ as follows: $\bar{\mathcal{I}}_{0}=\{i^{\star}\}$ (the agent from Assumption 3) and

(9) $\bar{\mathcal{I}}_{j}=\bar{\mathcal{I}}_{j-1}\cup\{i\in[n]\setminus\bar{\mathcal{I}}_{j-1}:\bar{Y}_{j}^{(i)}=1,\bar{H}_{j}^{(i)}\in\bar{\mathcal{I}}_{j-1}\}\ \forall\ j\in\mathbb{N}.$

Finally, let $\bar{\tau}_{\text{spr}}=\inf\{j\in\mathbb{N}:\bar{\mathcal{I}}_{j}=[n]\}$.
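Definition 1 is straightforward to simulate. The following sketch (our own code, with hypothetical names) returns $\bar{\tau}_{\text{spr}}$ for a given honest subgraph; for brevity it only draws randomness for uninformed agents, which leaves the law of the process unchanged.

```python
import random

def simulate_rumor(hon_neighbors, Upsilon, i_star, seed=0, max_phases=10**6):
    """Simulate the noisy rumor process of Definition 1.
    hon_neighbors: dict mapping each honest agent to its list of honest
    neighbors.  Returns the first phase at which all agents are informed."""
    rng = random.Random(seed)
    informed = {i_star}
    for j in range(1, max_phases + 1):
        newly = set()
        for i in hon_neighbors:
            if i in informed:
                continue
            Y = rng.random() < Upsilon           # Bernoulli(Upsilon) noise
            H = rng.choice(hon_neighbors[i])     # uniform honest neighbor
            if Y and H in informed:
                newly.add(i)
        informed |= newly
        if len(informed) == len(hon_neighbors):
            return j
    return None
```

On a line of four honest agents with $i^{\star}$ at one end, the rumor needs at least three phases to reach the far end, and typically more when $\Upsilon<1$.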

5.2. Positive result

We can now present the main result of this section: a regret upper bound for the proposed blocking rule. We state it first and then unpack the statement in some ensuing remarks. The proof of this result (and all others in this section) is deferred to Appendix D.

Theorem 2.

Let Assumptions 1-3 hold. Suppose we run Algorithm 1 and use Algorithm 4 as the Update-Blocklist subroutine with $\theta_{j}=(j/3)^{\rho_{1}}$ and $\kappa_{j}=j^{\rho_{2}}/(K^{2}S)$ in (8). Also assume

(10) $\beta>1,\quad\eta>1,\quad 0<\rho_{1}\leq\frac{1}{\eta},\quad\alpha>\frac{3}{2}+\frac{1}{2\beta}+\frac{1}{2\rho_{1}^{2}},\quad\frac{1}{2\alpha-3}<\rho_{2}<\rho_{1}(\beta-1).$

Then for any honest agent $i\in[n]$ and horizon $T\in\mathbb{N}$, we have

(11) $R_{T}^{(i)}\leq 4\alpha\log(T)\min\left\{\frac{2\eta-1}{\eta-1}\sum_{k=2}^{d_{\text{mal}}(i)+3}\frac{1}{\Delta_{k}}+\sum_{k=d_{\text{mal}}(i)+4}^{S+d_{\text{mal}}(i)+4}\frac{1}{\Delta_{k}},\sum_{k=2}^{K}\frac{1}{\Delta_{k}}\right\}+2\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]+C_{\star},$

where $\Delta_{k}=1$ by convention if $k>K$. Here $C_{\star}$ is a term independent of $T$ satisfying

(12) $C_{\star}=\tilde{O}\left(\max\left\{d_{\text{mal}}(i)/\Delta_{2},(K/\Delta_{2})^{2},S^{\beta/(\rho_{1}^{2}(\beta-1))},(S/\Delta_{2}^{2})^{\beta/(\beta-1)},\bar{d}^{\beta/\rho_{1}},nK^{2}S\right\}\right),$

where $\tilde{O}(\cdot)$ hides dependencies on $\alpha$, $\beta$, $\eta$, $\rho_{1}$, and $\rho_{2}$ and log dependencies on $K$, $n$, $m$, and $\Delta_{2}^{-1}$.

Remark 7.

The theorem shows that our algorithm's regret scales as $(d_{\text{mal}}(i)+S)\log(T)/\Delta$, plus an additive term $2\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]+C_{\star}$ that is independent of $T$ and polynomial in all other parameters. When $S=O(K/n)$ (see Remark 1), the first term is $O((d_{\text{mal}}(i)+K/n)\log(T)/\Delta)$, as stated in Section 1.4. Also, when $d_{\text{mal}}(i)$ is large, we recover the $O(K\log(T)/\Delta)$ single-agent bound (including the constant $4\alpha$), i.e., if there are many malicious agents, honest ones fare no worse than the single-agent case.

Remark 8.

In addition to Assumptions 1-3, the theorem requires the algorithmic parameters to satisfy (10). For example, we can choose $\beta=\eta=2$, $\rho_{1}=1/2$, $\alpha=4$, and $\rho_{2}=1/3$. More generally, we view these five parameters as small numerical constants and hide them in the $\tilde{O}(\cdot)$ notation.

Remark 9.

The bound in Theorem 2 can be simplified under additional assumptions. For instance, in Example 2, it is reasonable to assume $K=\Theta(n)$ (i.e., the number of restaurants is proportional to the population) and $\bar{d}=O(1)$ (i.e., the degrees are constant, as in sparse social networks). Under these assumptions, with the choice $S=O(K/n)=O(1)$ from Remark 1 and the parameters from Remark 6, the theorem's regret bound can be further upper bounded by

$$R_{T}^{(i)}\leq\sum_{k=2}^{O(1)}\frac{48\log T}{\Delta_{k}}+2\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]+\tilde{O}(\max\{(K/\Delta_{2})^{2},\Delta_{2}^{-4},nK^{2}\}).$$
Remark 10.

Note the parameters from Remark 8 were also used for the bad instance of Section 4. There, we had $\Delta_{k}>1/15$, $S=d_{\text{mal}}(i)=1$, and $\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]=\tilde{O}(n^{\beta})$, so our regret is $O(\log T)$ plus a polynomial additive term that is much smaller than the doubly exponential term in Section 4.

Proof sketch.

Let $\tau_{\text{spr}}=\inf\{j\in\mathbb{N}:B_{j^{\prime}}^{(i^{\prime})}=1\ \forall\ i^{\prime}\in[n],j^{\prime}\geq j\}$ denote the first phase where the best arm is most played for all honest agents at all phases thereafter. Before this phase (i.e., before time $A_{\tau_{\text{spr}}}$) we simply upper bound regret by $\mathbb{E}[A_{\tau_{\text{spr}}}]$. The main novelty of our analysis is bounding $\mathbb{E}[A_{\tau_{\text{spr}}}]$ in terms of $C_{\star}$ and $\mathbb{E}[A_{\bar{\tau}_{\text{spr}}}]$. We devote Section 7 to discussing this proof.

After phase $\tau_{\text{spr}}$, the best arm is active by definition, so $i$ incurs logarithmic in $T$ regret. We let

(13) $\tau_{\text{blk}}^{(i)}=\inf\{j\in\mathbb{N}:H_{j^{\prime}-1}^{(i)}\in P_{j^{\prime}}^{(i)}\setminus P_{j^{\prime}-1}^{(i)}\ \forall\ j^{\prime}\geq j\ \text{s.t.}\ R_{j^{\prime}-1}^{(i)}\neq 1\}$

be the earliest phase such that $i$ blocks for all suboptimal recommendations thereafter. We then split the phases after $\tau_{\text{spr}}$ into two groups: those before $\tilde{\tau}^{(i)}\triangleq\tau_{\text{spr}}\vee\tau_{\text{blk}}^{(i)}\vee T^{1/(\beta K)}$ and those after.

For the phases after $\tau_{\text{spr}}$ but before $\tilde{\tau}^{(i)}$, we consider three cases:

  • $\tilde{\tau}^{(i)}=\tau_{\text{spr}}$: In this case, there are no such phases, so there is nothing to prove.

  • $\tilde{\tau}^{(i)}=T^{1/(\beta K)}$: Here we have an effective horizon $A_{\tilde{\tau}^{(i)}}=(\tilde{\tau}^{(i)})^{\beta}=T^{1/K}$, so similar to (Vial et al., 2021), we exploit the fact that the best arm is active and modify existing UCB analysis to bound regret by $O(K\log(T^{1/K})/\Delta)=O(\log(T)/\Delta)$, which is dominated by (11) (in an order sense).

  • $\tilde{\tau}^{(i)}=\tau_{\text{blk}}^{(i)}$: Here we are considering phases $j$ where the best arm is most played by $i$ (since $j\geq\tau_{\text{spr}}$) but $i$ does not block suboptimal recommendations (since $j\leq\tau_{\text{blk}}^{(i)}$). Note that no such phases arise for the existing blocking rule, so here the proof diverges from (Vial et al., 2021), and most of Appendix D.1 is dedicated to this case. Roughly speaking, the argument analogous to the previous case yields the regret bound $O((K/\Delta)\mathbb{E}[\log\tilde{\tau}^{(i)}])$, and we prove this term is also $O(\log(T)/\Delta)$ by deriving a tail bound for $\tilde{\tau}^{(i)}$. The tail bound amounts to showing that, once the best arm is active, $i$ can identify suboptimal arms as such within the phase. This in turn follows from best arm identification results and the growing phase lengths.

After phase $\tilde{\tau}^{(i)}$, the best arm is most played for all honest agents (since $\tilde{\tau}^{(i)}\geq\tau_{\text{spr}}$), so they only recommend this arm. Thus, $i$ only plays the best arm, its $S$ sticky arms, and any malicious recommendations. Consequently, to bound regret by $O((S+d_{\text{mal}}(i))\log(T)/\Delta)$ as in (11), we need to show each malicious neighbor $i^{\prime}$ only recommends $O(1)$ suboptimal arms. It is easy to see that $i^{\prime}$ can only recommend $O(\log K)$ such arms: if $i^{\prime}$ recommends a bad arm at phase $\tilde{\tau}^{(i)}$, they will be blocked until phase $T^{\eta/(\beta K)}$ (since $\tilde{\tau}^{(i)}\geq\tau_{\text{blk}}^{(i)}\vee T^{1/(\beta K)}$), then until phase $(T^{\eta/(\beta K)})^{\eta}=T^{\eta^{2}/(\beta K)}$, etc. Thus, the $(\log_{\eta}K)$-th bad recommendation occurs at phase $T^{\eta^{\log_{\eta}K}/(\beta K)}=T^{1/\beta}$, which is time $T$ by the definition $A_{j}=\lceil j^{\beta}\rceil$. Finally, an argument from (Vial et al., 2021) sharpens this $O(\log K)$ term to $O(1)$. ∎

5.3. Special cases

We next discuss some special cases of our regret bound. First, as in (Chawla et al., 2020), we can prove an explicit bound assuming the honest agent subgraph $G_{\text{hon}}$ is $d$-regular, i.e., $d_{\text{hon}}(i)=d\ \forall\ i\in[n]$.

Corollary 1.

Let the assumptions of Theorem 2 hold and further assume $G_{\text{hon}}$ is $d$-regular with $d\geq 2$. Let $\phi$ denote the conductance of $G_{\text{hon}}$. Then for any honest agent $i\in[n]$ and horizon $T\in\mathbb{N}$,

(14) $R_{T}^{(i)}\leq 4\alpha\log(T)\min\left\{\frac{2\eta-1}{\eta-1}\sum_{k=2}^{d_{\text{mal}}(i)+3}\frac{1}{\Delta_{k}}+\sum_{k=d_{\text{mal}}(i)+4}^{S+d_{\text{mal}}(i)+4}\frac{1}{\Delta_{k}},\sum_{k=2}^{K}\frac{1}{\Delta_{k}}\right\}$

(15) $\quad+\tilde{O}\left(\max\left\{d_{\text{mal}}(i)/\Delta_{2},(K/\Delta_{2})^{2},S^{\beta/(\rho_{1}^{2}(\beta-1))},(S/\Delta_{2}^{2})^{\beta/(\beta-1)},\bar{d}^{\beta/\rho_{1}},nK^{2}S,(\phi\Upsilon)^{-\beta}\right\}\right).$
Remark 11.

This corollary includes the complete graph case studied in (Vial et al., 2021), where $d_{\text{mal}}(i)=m$, $\phi=\Theta(1)$, and $\Upsilon=\Theta(n/(n+m))$. In this case, the term (14) matches the corresponding term from (Vial et al., 2021) exactly, i.e., for large $T$, Corollary 1 is a strict generalization. Our additive term scales as

$$\max\left\{(m/\Delta_{2}),(K/\Delta_{2})^{2},S^{\beta/(\rho_{1}^{2}(\beta-1))},(S/\Delta_{2}^{2})^{\beta/(\beta-1)},(n+m)^{\beta/\rho_{1}},nK^{2}S,((n+m)/n)^{\beta}\right\}$$

whereas the additive term from (Vial et al., 2021) scales as $\max\{(m/\Delta_{2}),(K/\Delta_{2}),(S/\Delta_{2}^{2})^{2\beta\eta/(\beta-1)},(n+m)^{\beta},nK^{2}S\}$. Notice our dependence on the arm gap is $\Delta_{2}^{-2\beta/(\beta-1)}$, which matches the fully cooperative case (Chawla et al., 2020), whereas the dependence is $\Delta_{2}^{-2\beta\eta/(\beta-1)}$ in (Vial et al., 2021), which is potentially much larger.

Remark 12.

In the setting of Remark 9, the corollary’s regret bound becomes

RT(i)k=2O(1)48logTΔk+O~(max{(K/Δ2)2,Δ24,nK2,(ϕΥ)β}).R_{T}^{(i)}\leq\sum_{k=2}^{O(1)}\frac{48\log T}{\Delta_{k}}+\tilde{O}(\max\{(K/\Delta_{2})^{2},\Delta_{2}^{-4},nK^{2},(\phi\Upsilon)^{-\beta}\}).

The key difference is the dependence on conductance O~(ϕβ)\tilde{O}(\phi^{-\beta}), which matches the result from (Chawla et al., 2022).

Proof sketch.

In light of Theorem 2, we only need to show 𝔼[Aτ¯spr]=O~((ϕΥ)β)\mathbb{E}[A_{\bar{\tau}_{\text{spr}}}]=\tilde{O}((\phi\Upsilon)^{-\beta}). To do so, we let ¯j\underline{\mathcal{I}}_{j} denote the noiseless version of ¯j\bar{\mathcal{I}}_{j} (defined in the same way but with Υ=1\Upsilon=1) and τ¯spr=inf{j:¯j=[n]}\underline{\tau}_{\text{spr}}=\inf\{j:\underline{\mathcal{I}}_{j}=[n]\}. We then construct a coupling between ¯j\bar{\mathcal{I}}_{j} and ¯j\underline{\mathcal{I}}_{j}, which ensures that with high probability, τ¯sprjlog(j)/Υ\bar{\tau}_{\text{spr}}\leq j\log(j)/\Upsilon whenever τ¯sprj\underline{\tau}_{\text{spr}}\leq j. Finally, using this coupling and a tail bound for τ¯spr\underline{\tau}_{\text{spr}} from (Chawla et al., 2020) (which draws upon the analysis of (Chierichetti et al., 2010)), we derive a tail bound for τ¯spr\bar{\tau}_{\text{spr}}. This allows us to show 𝔼[Aτ¯spr]=O(((logn)2log(log(n)/ϕ)/(ϕΥ))β)=O~((ϕΥ)β)\mathbb{E}[A_{\bar{\tau}_{\text{spr}}}]=O(((\log n)^{2}\log(\log(n)/\phi)/(\phi\Upsilon))^{\beta})=\tilde{O}((\phi\Upsilon)^{-\beta}), as desired. (When Υ=1\Upsilon=1, (Chawla et al., 2020) shows 𝔼[Aτ¯spr]=𝔼[Aτ¯spr]=O((log(n)/ϕ)β)\mathbb{E}[A_{\underline{\tau}_{\text{spr}}}]=\mathbb{E}[A_{\bar{\tau}_{\text{spr}}}]=O((\log(n)/\phi)^{\beta}), so our bound generalizes theirs up to log\log terms.)

Finally, we can sharpen the above results for honest agents without malicious neighbors.

Corollary 2.

For i[n]i\in[n] with dmal(i)=0d_{\text{mal}}(i)=0, the terms (11) and (14) from Theorem 2 and Corollary 1, respectively, can (under their respective assumptions) be improved to 4αlog(T)k=2S+2Δk14\alpha\log(T)\sum_{k=2}^{S+2}\Delta_{k}^{-1}.

Remark 13.

The improved term in Corollary 2 matches the logT\log T term from (Chawla et al., 2020), including constants. Thus, the corollary shows that for large TT, agents who are not directly connected to malicious agents are unaffected by their presence elsewhere in the graph.

Proof sketch.

Recall from the Theorem 2 proof sketch that the logT\log T term arises from regret after phase τspr\tau_{\text{spr}}. At any such phase, the best arm is most played for all honest agents (by definition), so when dmal(i)=0d_{\text{mal}}(i)=0, ii’s neighbors only recommend this arm. Therefore, ii’s active sets after τspr\tau_{\text{spr}} are fixed; they contain the best arm and S+1S+1 suboptimal ones. Thus, ii only plays S+1S+1 suboptimal arms long-term, so in the worst case incurs the standard UCB regret 4αlog(T)k=2S+2Δk14\alpha\log(T)\sum_{k=2}^{S+2}\Delta_{k}^{-1}. ∎

6. Numerical results

Thus far, we have shown the proposed blocking rule adapts to general graphs more gracefully than the existing one, at least in theory. We now illustrate this finding empirically.

Experimental setup: We follow (Vial et al., 2021, Section 6) except we extend those experiments to G(n+m,p)G(n+m,p) graphs, i.e., graphs in which each edge is present independently with probability pp. For each p{1,1/2,1/4}p\in\{1,1/2,1/4\} and each of two malicious strategies (to be defined shortly), we conduct 100100 trials of the following:

  • Set n=25n=25 and m=10m=10 and generate GG as a G(n+m,p)G(n+m,p) random graph, resampling if necessary until the honest agent subgraph GhonG_{\text{hon}} is connected (see Assumption 1).

  • Set K=100K=100, μ1=0.95\mu_{1}=0.95, and μ2=0.85\mu_{2}=0.85, then sample the remaining arm means {μk}k=3K\{\mu_{k}\}_{k=3}^{K} uniformly from [0,0.85][0,0.85] (so Δ2=0.1\Delta_{2}=0.1). For each k[K]k\in[K], set νk=Bernoulli(μk)\nu_{k}=\text{Bernoulli}(\mu_{k}).

  • Set S=K/nS=K/n and sample the sticky sets {S^(i)}i=1n\{\hat{S}^{(i)}\}_{i=1}^{n} uniformly from the SS-sized subsets of [K][K], resampling if necessary until 1i=1nS^(i)1\in\cup_{i=1}^{n}\hat{S}^{(i)} (see Assumption 3).

  • Run Algorithm 1 with the existing (Algorithm 3) and proposed (Algorithm 4) blocking rules, along with two baselines: a no communication scheme, where agents ignore the network and run UCB in isolation, and the algorithm from (Chawla et al., 2020), where they do not block.
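The setup above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `sample_instance` and its defaults are ours:

```python
import random
from collections import deque

def honest_subgraph_connected(adj, n):
    """BFS over honest agents 0..n-1, using only honest-honest edges."""
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v < n and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n

def sample_instance(n=25, m=10, p=0.5, K=100, seed=0):
    rng = random.Random(seed)
    N = n + m
    # Resample G(n+m, p) until the honest subgraph G_hon is connected (Assumption 1).
    while True:
        adj = [set() for _ in range(N)]
        for u in range(N):
            for v in range(u + 1, N):
                if rng.random() < p:
                    adj[u].add(v)
                    adj[v].add(u)
        if honest_subgraph_connected(adj, n):
            break
    # Arm means: mu_1 = 0.95, mu_2 = 0.85, rest uniform on [0, 0.85] (so Delta_2 = 0.1).
    mu = [0.95, 0.85] + [rng.uniform(0.0, 0.85) for _ in range(K - 2)]
    # Sticky sets: uniform S-subsets of the K arms (0-indexed), resampled until
    # some honest agent holds the best arm, here arm 0 (Assumption 3).
    S = K // n
    while True:
        sticky = [set(rng.sample(range(K), S)) for _ in range(n)]
        if any(0 in s for s in sticky):
            break
    return adj, mu, sticky
```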

Algorithmic parameters: We set α=4\alpha=4 and β=η=2\beta=\eta=2 as in Remarks 4 and 8. For the parameters in the proposed blocking rule, we choose κj=j1.5\kappa_{j}=j^{1.5} and θj=jlogj\theta_{j}=j-\log j. While these differ from the parameters specified in our theoretical results (which we found to be too conservative in practice), they do satisfy the key properties discussed in Remark 6.

Malicious strategies: Like (Vial et al., 2021), we use two strategies, which we call the naive and smart strategies (they are called uniform and omniscient in (Vial et al., 2021)). The naive strategy simply recommends a uniformly random suboptimal arm. The smart strategy recommends Rj(i)=argmink{2,,K}Sj(i)Tk(i)(Aj)R_{j}^{(i)}=\operatorname*{arg\,min}_{k\in\{2,\ldots,K\}\setminus S_{j}^{(i)}}T_{k}^{(i)}(A_{j}), i.e., the least played, inactive, suboptimal arm. Intuitively, this is a more devious strategy which forces ii to play Rj(i)R_{j}^{(i)} often in the next phase (to drive down its upper confidence bound). Consequently, ii may play it most and discard a better arm in favor of it (see Algorithm 1).
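The two strategies admit a short sketch (our own rendering, with arms 0-indexed so that arm 0 is the unique best arm and all others are suboptimal):

```python
import random

def naive_recommendation(K, rng=random):
    """Naive (uniform) strategy: recommend a uniformly random suboptimal arm.
    Arms are 0-indexed; arm 0 is the best, so we draw from 1..K-1."""
    return rng.randrange(1, K)

def smart_recommendation(active_set, play_counts):
    """Smart (omniscient) strategy: recommend the least played, inactive,
    suboptimal arm, i.e., the argmin of play_counts[k] over suboptimal k
    outside the target agent's active set S_j^(i)."""
    candidates = [k for k in range(1, len(play_counts)) if k not in active_set]
    return min(candidates, key=lambda k: play_counts[k])
```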

Results: In Figure 3, we plot the average and standard deviation (across trials) of the per-agent regret i=1nRT(i)/n\sum_{i=1}^{n}R_{T}^{(i)}/n. For the naive strategy, the existing blocking rule eventually becomes worse than the no blocking baseline as pp decreases. More strikingly, it even becomes worse than the no communication baseline for the smart strategy. In other words, honest agents would have been better off ignoring the network and simply running UCB on their own. As in Section 4, this is because accidental blocking causes the best arm to spread very slowly. Additionally, the standard deviation becomes much higher than all other algorithms, suggesting that regret is significantly worse in some trials. In contrast, the proposed blocking rule improves as pp decreases, because it is mild enough to spread the best arm at all values of pp, and for smaller pp, honest agents have fewer malicious neighbors (on average). We also observe that the proposed rule outperforms both baselines uniformly across pp. Additionally, it improves over the existing rule more dramatically for the smart strategy, i.e., when the honest agents face a more sophisticated adversary. Finally, it is worth acknowledging the existing rule is better when p=1p=1 – although not in a statistically significant sense for the smart strategy – because it does spread the best arm quickly on the complete graph (as shown in (Vial et al., 2021)), and thereafter more aggressively blocks malicious agents.

Figure 3. Empirical results for synthetic data. Rows of subfigures correspond to the malicious strategy, while columns correspond to the edge probability pp for the G(n+m,p)G(n+m,p) random graph.

Other results: As in (Vial et al., 2021), we reran the simulations using arm means derived from the MovieLens dataset (Harper and Konstan, 2015). We also experimented with new variants of the smart and naive strategies, where the malicious agents follow these strategies if the best arm is active (in hopes of forcing honest agents to discard it) and recommend the second best arm otherwise. Intuitively, these variants differ in that malicious agents recommend good arms (i.e., the second best) more frequently, while still never revealing the best arm (the only one that leads to logarithmic regret). For all experiments, the key message – that the proposed blocking rule adapts to varying graph structures more gracefully than the existing one – is consistent. See Appendix B for details.

7. Gossip despite blocking

As discussed above, the main analytical contribution of this work is proving that the best arm spreads in a gossip fashion, despite accidental blocking. In this (technical) section, we provide a detailed sketch of this proof. We begin with a high-level outline. The key is to show that honest agents eventually stop blocking each other. This argument (roughly) proceeds as follows:

  • Step 1: First, we show that honest agents learn the arm statistics in a certain sense. More specifically, we provide a tail bound for a random phase τarm\tau_{\text{arm}} such that for all phases jτarmj\geq\tau_{\text{arm}} (1) each honest agent’s most played arm in phase jj is close to its true best active arm and (2) any active arm close to the true best one is played at least κj\kappa_{j} times by the end of phase jj.

  • Step 2: Next, we show that honest agents communicate with their neighbors frequently. In particular, we establish a tail bound for another random phase τcom\tau_{\text{com}} such that for any jτcomj\geq\tau_{\text{com}}, each honest agent contacts all of its honest neighbors at least once between θj\theta_{j} and jj.

  • Step 3: Finally, we use the above to show that eventually, no blocking occurs amongst honest agents. The basic idea is as follows. Consider a phase jj, an honest agent ii, and a neighbor ii^{\prime} of ii. Then if ii has had the same best arm estimate kk since phase θj\theta_{j} – i.e., if the second blocking criterion in (8) holds – ii^{\prime} would have contacted ii at some phase j{θj,,j}j^{\prime}\in\{\theta_{j},\ldots,j\} (by step 2) and received arm kk. Between phases jj^{\prime} and jj, the most played arm for ii^{\prime} cannot get significantly worse (by step 1). Thus, if ii asks ii^{\prime} for a recommendation at jj, ii^{\prime} will respond with an arm whose mean is close to μk\mu_{k}, which ii will play at least κj\kappa_{j} times (by step 1). Hence, the first criterion in (8) fails, i.e., the two cannot simultaneously hold.

In the next three sub-sections, we discuss these three steps. Then in Section 7.4, we describe how, once accidental blocking stops, the arm spreading process can be coupled to the noisy rumor process from Definition 1. Finally, in Section 7.5, we discuss how to combine all of these steps to bound the term 𝔼[Aτspr]\mathbb{E}[A_{\tau_{\text{spr}}}] from the Theorem 2 proof sketch.

7.1. Learning the arm statistics

Recall we assume μ1μK\mu_{1}\geq\cdots\geq\mu_{K}, so for any W[K]W\subset[K], minW\min W is the best arm in WW, i.e., μminW=maxwWμw\mu_{\min W}=\max_{w\in W}\mu_{w}. Therefore, for any δ(0,1)\delta\in(0,1), Gδ(W){wW:μwμminWδ}G_{\delta}(W)\triangleq\{w\in W:\mu_{w}\geq\mu_{\min W}-\delta\} is the subset of arms at least δ\delta-close to the best one. For each honest agent i[n]i\in[n] and phase jj\in\mathbb{N}, define

(16) Ξj,1(i)={Bj(i)Gδj,1(Sj(i))},Ξj,2(i)={minwGδj,2(Sj(i))Tw(i)(Aj)κj},Ξj(i)=Ξj,1(i)Ξj,2(i).\displaystyle\Xi_{j,1}^{(i)}=\left\{B_{j}^{(i)}\notin G_{\delta_{j,1}}(S_{j}^{(i)})\right\},\quad\Xi_{j,2}^{(i)}=\left\{\min_{w\in G_{\delta_{j,2}}(S_{j}^{(i)})}T_{w}^{(i)}(A_{j})\leq\kappa_{j}\right\},\quad\Xi_{j}^{(i)}=\Xi_{j,1}^{(i)}\cup\Xi_{j,2}^{(i)}.

where δj,1,δj,2(0,1)\delta_{j,1},\delta_{j,2}\in(0,1) will be chosen shortly. Finally, define the random phase

(17) τarm=inf{j:𝟙(Ξj(i))=0i[n],j{j,j+1,}}.\tau_{\text{arm}}=\inf\{j\in\mathbb{N}:\mathbbm{1}(\Xi_{j^{\prime}}^{(i)})=0\ \forall\ i\in[n],j^{\prime}\in\{j,j+1,\ldots\}\}.

In words, τarm\tau_{\text{arm}} is the earliest phase such that, at all phases jj thereafter, (1) the most played arms are δj,1\delta_{j,1}-close to best active arms and (2) all arms δj,2\delta_{j,2}-close to the best are played at least κj\kappa_{j} times.

As discussed above, Step 1 involves a tail bound for τarm\tau_{\text{arm}}. The analysis is based on (Bubeck et al., 2011, Theorem 2), which includes a tail bound showing that the most played arm is δ\delta-close to the best, provided that 1/δ21/\delta^{2} samples have been collected from each of the δ\delta-far arms. In our case, phase jj lasts AjAj1=Θ(jβ1)A_{j}-A_{j-1}=\Theta(j^{\beta-1}) time steps, so each of S+2S+2 active arms is played Θ(jβ1/S)\Theta(j^{\beta-1}/S) times on average. Hence, we can show the most played arm within the phase is δj,1\delta_{j,1}-close to the best if we choose δj,1=Θ(S/jβ1)\delta_{j,1}=\Theta(\sqrt{S/j^{\beta-1}}), which allows us to bound (Ξj,1(i))\mathbb{P}(\Xi_{j,1}^{(i)}). Analogously, we choose δj,2=Θ(1/κj)\delta_{j,2}=\Theta(1/\sqrt{\kappa_{j}}) and show that δj,2\delta_{j,2}-close arms must be played 1/δj,22=Θ~(κj)1/\delta_{j,2}^{2}=\tilde{\Theta}(\kappa_{j}) times before they are distinguished as such, which allows us to bound (Ξj,2(i))\mathbb{P}(\Xi_{j,2}^{(i)}). Taken together, we can prove a tail bound for τarm\tau_{\text{arm}} (Lemma 9 in Appendix E.1) with these choices δj,1=Θ(S/jβ1)\delta_{j,1}=\Theta(\sqrt{S/j^{\beta-1}}) and δj,2=Θ~(1/κj)\delta_{j,2}=\tilde{\Theta}(1/\sqrt{\kappa_{j}}).
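As a concrete illustration, the set Gδ(W)G_{\delta}(W) and the events in (16) can be computed as follows (a sketch with our own function names; arms are 0-indexed with means sorted in decreasing order, mirroring the assumption μ1μK\mu_{1}\geq\cdots\geq\mu_{K}):

```python
def G_delta(W, mu, delta):
    """G_delta(W): arms in W within delta of the best arm in W.
    Arms are 0-indexed with mu[0] >= mu[1] >= ..., so min(W) is best in W."""
    best_mean = mu[min(W)]
    return {w for w in W if mu[w] >= best_mean - delta}

def xi_events(most_played, active_set, play_counts, mu, d1, d2, kappa):
    """Indicators of the bad events Xi_{j,1} and Xi_{j,2} in (16) for one agent
    and phase (d1, d2, kappa play the roles of delta_{j,1}, delta_{j,2}, kappa_j)."""
    xi1 = most_played not in G_delta(active_set, mu, d1)
    xi2 = min(play_counts[w] for w in G_delta(active_set, mu, d2)) <= kappa
    return xi1, xi2
```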

7.2. Communicating frequently

Next, for any i,i[n]i,i^{\prime}\in[n] such that (i,i)Ehon(i,i^{\prime})\in E_{\text{hon}}, let Ξj(ii)=j=θjj2{Hj(i)i}\Xi_{j}^{(i\rightarrow i^{\prime})}=\cap_{j^{\prime}=\lfloor\theta_{j}\rfloor}^{j-2}\{H_{j^{\prime}}^{(i^{\prime})}\neq i\} denote the event that ii did not send a recommendation to ii^{\prime} between phases θj\lfloor\theta_{j}\rfloor and j2j-2. Also define

(18) τcom=inf{j:𝟙(iiEhonΞj(ii))=0j{j,j+1,}}.\tau_{\text{com}}=\inf\{j\in\mathbb{N}:\mathbbm{1}(\cup_{i\rightarrow i^{\prime}\in E_{\text{hon}}}\Xi_{j^{\prime}}^{(i\rightarrow i^{\prime})})=0\ \forall\ j^{\prime}\in\{j,j+1,\ldots\}\}.

Here we abuse notation slightly; the union is over all (undirected) edges in EhonE_{\text{hon}} but viewed as pairs of directed edges. Hence, at all phases jτcomj\geq\tau_{\text{com}}, each honest agent ii^{\prime} receives a recommendation from each of its honest neighbors ii at some phase jj^{\prime} between θj\theta_{j} and j2j-2.

Step 2 involves the tail bound for τcom\tau_{\text{com}} that was mentioned above (see Lemma 10 in Appendix E.2). The proof amounts to bounding the probability of Ξj(ii)\Xi_{j}^{(i\rightarrow i^{\prime})}. Recall this event says ii^{\prime} did not contact ii for a recommendation at any phase j{θj,,j2}j^{\prime}\in\{{\theta_{j}},\ldots,j-2\}. Clearly, this means ii^{\prime} did not block ii at any such phase. Hence, in the worst case, ii^{\prime} blocked ii just before θj\theta_{j}, in which case ii was un-blocked at θjη=(j/3)ρ1ηj/3\theta_{j}^{\eta}=(j/3)^{\rho_{1}\eta}\leq j/3, where the inequality holds by assumption in Theorem 2. Hence, Ξj(ii)\Xi_{j}^{(i\rightarrow i^{\prime})} implies ii^{\prime} was not blocking ii between phases j/3j/3 and j2j-2, so each of the Θ(j)\Theta(j) neighbors that ii^{\prime} contacted in these phases was sampled uniformly from a set containing ii, yet ii was never sampled. The probability of this decays exponentially in jj, which yields an exponential tail for τcom\tau_{\text{com}}.
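The exponential decay can be seen from a back-of-the-envelope bound: if ii^{\prime} draws a uniformly random neighbor from a set of size at most dd containing ii in each of Θ(j)\Theta(j) phases, the probability ii is never drawn is at most (11/d)Θ(j)(1-1/d)^{\Theta(j)}. The helper below is our own illustration of this bound, not a quantity from the paper:

```python
def never_contact_prob(num_phases, d):
    """Upper bound on the probability that i' never contacts i: in each of
    num_phases phases, i' draws a uniform neighbor from a set of size at
    most d that contains i, and never draws i."""
    return (1 - 1 / d) ** num_phases
```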

7.3. Avoiding accidental blocking

Next, we show honest agents eventually stop blocking each other. Toward this end, we first note

(19) i[n],jτarm,μminSj(i)μBj(i)+δj,1μminSj+1(i)+δj,1\forall\ i\in[n],\quad\forall\ j\geq\tau_{\text{arm}},\quad\mu_{\min S_{j}^{(i)}}\leq\mu_{B_{j}^{(i)}}+\delta_{j,1}\leq\mu_{\min S_{j+1}^{(i)}}+\delta_{j,1}

where the first inequality uses the definition of τarm\tau_{\text{arm}}, and the second holds because minSj+1(i)argmaxkSj+1(i)μk\min S_{j+1}^{(i)}\in\operatorname*{arg\,max}_{k\in S_{j+1}^{(i)}}\mu_{k} and Bj(i)Sj+1(i)B_{j}^{(i)}\in S_{j+1}^{(i)} in Algorithm 1 (see Claim 13 in Appendix E.3 for details). In words, (19) says the mean of the best active arm can decay by at most δj,1\delta_{j,1} at phase jj. Applying this iteratively, and since there are KK arms in total, we then show (see Claim 14 in Appendix E.3)

(20) i[n],jjτarm,μminSj(i)μminSj(i)(K1)supj′′{j,,j}δj′′,1.\forall\ i\in[n],\quad\forall\ j^{\prime}\geq j\geq\tau_{\text{arm}},\quad\mu_{\min S_{j^{\prime}}^{(i)}}\geq\mu_{\min S_{j}^{(i)}}-(K-1)\sup_{j^{\prime\prime}\in\{j,\ldots,j^{\prime}\}}\delta_{j^{\prime\prime},1}.

Combining the previous two inequalities, we conclude (see Corollary 3 in Appendix E.3)

(21) i[n],jjτarm,μBj(i)μminSj(i)Ksupj′′{j,,j}δj′′,1.\forall\ i\in[n],\quad\forall\ j^{\prime}\geq j\geq\tau_{\text{arm}},\quad\mu_{B_{j^{\prime}}^{(i)}}\geq\mu_{\min S_{j}^{(i)}}-K\sup_{j^{\prime\prime}\in\{j,\ldots,j^{\prime}\}}\delta_{j^{\prime\prime},1}.

Now the key part of Step 3 is to use (21) to show (see Claim 15 in Appendix E.3)

(22) js.t.θjτarm,i,i[n]s.t.i{Hj(i)}j=θjj2,iPj(i)Pj1(i).\forall\ j\in\mathbb{N}\ s.t.\ \theta_{j}\geq\tau_{\text{arm}},\quad\forall\ i,i^{\prime}\in[n]\ s.t.\ i\in\{H_{j^{\prime}}^{(i^{\prime})}\}_{j^{\prime}=\theta_{j}}^{j-2},\quad i^{\prime}\notin P_{j}^{(i)}\setminus P_{j-1}^{(i)}.

In words, this result says that if jj is sufficiently large, and if ii has sent a recommendation to ii^{\prime} since phase θj\theta_{j}, then ii will not block ii^{\prime} at phase jj. The proof is by contradiction: if instead ii blocks ii^{\prime} at jj, then by Algorithm 4, ii has not changed its best arm estimate kk since phase θj\theta_{j}, so it would have recommended kk to ii^{\prime} at some phase jθjj^{\prime}\geq\theta_{j}. Therefore, μminSj(i)μk\mu_{\min S_{j^{\prime}}^{(i^{\prime})}}\geq\mu_{k}. Additionally, since jθj=Ω(jρ1)j^{\prime}\geq\theta_{j}=\Omega(j^{\rho_{1}}), we know that for any j′′jj^{\prime\prime}\geq j^{\prime}, the choice of δj′′,1\delta_{j^{\prime\prime},1} in Section 7.1 guarantees that

Kδj′′,1O(Kδj,1)=O~(K2S/(j)β1)O~(K2S/jρ1(β1)).K\delta_{j^{\prime\prime},1}\leq O(K\delta_{j^{\prime},1})=\tilde{O}\Big{(}\sqrt{K^{2}S/(j^{\prime})^{\beta-1}}\Big{)}\leq\tilde{O}\Big{(}\sqrt{K^{2}S/j^{\rho_{1}(\beta-1)}}\Big{)}.

Combining these observations and using (21) (with jj^{\prime} and jj replaced by j1j-1 and jj^{\prime}), we then show

(23) μBj1(i)μminSj(i)Ksupj′′jδj′′,1μkO~(K2S/jρ1(β1)).\mu_{B_{j-1}^{(i^{\prime})}}\geq\mu_{\min S_{j^{\prime}}^{(i^{\prime})}}-K\sup_{j^{\prime\prime}\geq j^{\prime}}\delta_{j^{\prime\prime},1}\geq\mu_{k}-\tilde{O}\Big{(}\sqrt{K^{2}S/j^{\rho_{1}(\beta-1)}}\Big{)}.

On the other hand, ii blocking ii^{\prime} at phase jj means ii plays the recommended arm Bj1(i)B_{j-1}^{(i^{\prime})} fewer than κj\kappa_{j} times by the end of phase jj. Since jθjτarmj\geq\theta_{j}\geq\tau_{\text{arm}}, this implies (by definition of τarm\tau_{\text{arm}}) that μk>μBj1(i)+δj,2\mu_{k}>\mu_{B_{j-1}^{(i^{\prime})}}+\delta_{j,2}, where δj,2=Θ~(1/κj)\delta_{j,2}=\tilde{\Theta}(1/\sqrt{\kappa_{j}}) as in Section 7.1. Combined with (23) and the choice κj=jρ2/(K2S)\kappa_{j}=j^{\rho_{2}}/(K^{2}S) from Theorem 2, we conclude jρ1(β1)O~(jρ2)j^{\rho_{1}(\beta-1)}\leq\tilde{O}(j^{\rho_{2}}). This contradicts the assumption ρ2<ρ1(β1)\rho_{2}<\rho_{1}(\beta-1) in Theorem 2, which completes the proof of (22).

Finally, we use (22) to show honest agents eventually stop blocking each other entirely, i.e.,

(24) js.t.θjτcom,θθjτarm,Pj(i)[n]=i[n]\forall\ j\in\mathbb{N}\ s.t.\ \theta_{j}\geq\tau_{\text{com}},\theta_{\theta_{j}}\geq\tau_{\text{arm}},\quad P_{j}^{(i)}\cap[n]=\emptyset\ \forall\ i\in[n]

(see Lemma 11 in Appendix E.3). Intuitively, (24) holds because after new blocking stops (22), old blocking will eventually “wear off”. The proof is again by contradiction: if ii is blocking some honest ii^{\prime} at phase jj, the blocking must have started at some jj1/ηj^{\prime}\geq j^{1/\eta} (else, it ends by (j)η<j(j^{\prime})^{\eta}<j). Thus, by assumption j1/ηjρ1θjτcomj^{1/\eta}\geq j^{\rho_{1}}\geq{\theta_{j}}\geq\tau_{\text{com}}, ii blocked ii^{\prime} at phase jτcomj^{\prime}\geq\tau_{\text{com}}. But by definition of τcom\tau_{\text{com}}, ii^{\prime} would have contacted ii at some phase j′′{θj,,j}j^{\prime\prime}\in\{\theta_{j^{\prime}},\ldots,j^{\prime}\}. Applying (22) (at phase jj^{\prime}; note that by the above inequalities, θjθj1/ηθθjτarm\theta_{j^{\prime}}\geq\theta_{j^{1/\eta}}\geq\theta_{\theta_{j}}\geq\tau_{\text{arm}}, as required by (22)), we obtain a contradiction.

7.4. Coupling with noisy rumor process

To begin, we define an equivalent way to sample Hj(i)H_{j}^{(i)} in Algorithm 2. (Claim 16 in Appendix E.4 verifies this equivalence; the proof is a straightforward application of the law of total probability.) This equivalent method will allow us to couple the arm spreading and noisy rumor processes through a set of primitive random variables. In particular, for each honest agent i[n]i\in[n], let {υj(i)}j=1\{\upsilon_{j}^{(i)}\}_{j=1}^{\infty} and {H¯j(i)}j=1\{\bar{H}_{j}^{(i)}\}_{j=1}^{\infty} be i.i.d. sequences drawn uniformly from [0,1][0,1] and Nhon(i)N_{\text{hon}}(i), respectively. Then choose Hj(i)H_{j}^{(i)} according to two cases:

  • If Pj(i)[n]=P_{j}^{(i)}\cap[n]=\emptyset, let Yj(i)=𝟙(υj(i)dhon(i)/|N(i)Pj(i)|)Y_{j}^{(i)}=\mathbbm{1}(\upsilon_{j}^{(i)}\leq d_{\text{hon}}(i)/|N(i)\setminus P_{j}^{(i)}|) and consider two sub-cases. First, if Yj(i)=1Y_{j}^{(i)}=1, set Hj(i)=H¯j(i)H_{j}^{(i)}=\bar{H}_{j}^{(i)}. Second, if Yj(i)=0Y_{j}^{(i)}=0, sample Hj(i)H_{j}^{(i)} from Nmal(i)Pj(i)N_{\text{mal}}(i)\setminus P_{j}^{(i)} uniformly.

  • If Pj(i)[n]P_{j}^{(i)}\cap[n]\neq\emptyset, sample Hj(i)H_{j}^{(i)} from N(i)Pj(i)N(i)\setminus P_{j}^{(i)} uniformly.
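The two cases above can be sketched as follows (our own rendering with hypothetical names; honest agents are indexed 0..n-1 and malicious agents n and above, and the blocklist is assumed to be a proper subset of the neighborhood):

```python
import random

def sample_contact(N_i, P_j, n, rng=random):
    """Sketch of the equivalent two-case sampling of H_j^(i).
    N_i: set of neighbors of agent i; P_j: i's current blocklist (subset of N_i);
    honest agents are 0..n-1, malicious agents are n, n+1, ..."""
    unblocked = sorted(N_i - P_j)
    honest_nbrs = sorted(v for v in N_i if v < n)
    if all(v >= n for v in P_j):  # Case 1: no honest agent is blocked
        # Y_j^(i) = 1 with probability d_hon(i) / |N(i) \ P_j^(i)|
        if rng.random() <= len(honest_nbrs) / len(unblocked):
            return rng.choice(honest_nbrs)  # H_j^(i) = H-bar_j^(i), uniform honest
        return rng.choice([v for v in unblocked if v >= n])  # uniform unblocked malicious
    return rng.choice(unblocked)  # Case 2: uniform over all unblocked neighbors
```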

Next, we observe that since δj,10\delta_{j,1}\rightarrow 0 as jj\rightarrow\infty by the choice of δj,1\delta_{j,1} in Section 7.1 and Δ2>0\Delta_{2}>0 by Assumption 2, we have δj,1<Δ2\delta_{j,1}<\Delta_{2} for large enough jj. Paired with the definition of τarm\tau_{\text{arm}}, this allows us to show that for all large jj and i[n]i\in[n] with 1Sj(i)1\in S_{j}^{(i)} (i.e., with the best arm active), Bj(i)=1B_{j}^{(i)}=1 (i.e., the best arm is played most). See Claim 17 in Appendix E.4 for the formal statement.

Finally, we observe that by (24), only the first case of the above sampling strategy occurs for large jj. Moreover, in this case, Yj(i)Y_{j}^{(i)} is Bernoulli with parameter

dhon(i)/|N(i)Pj(i)|dhon(i)/|N(i)|dhon(i)/d(i)Υ,d_{\text{hon}}(i)/|N(i)\setminus P_{j}^{(i)}|\geq d_{\text{hon}}(i)/|N(i)|\triangleq d_{\text{hon}}(i)/d(i)\geq\Upsilon,

where the second inequality holds by Definition 1. Hence, the probability that Yj(i)=1Y_{j}^{(i)}=1, and thus the probability that ii contacts the random honest neighbor H¯j(i)\bar{H}_{j}^{(i)} in the above sampling strategy, dominates the probability that ii contacts H¯j(i)\bar{H}_{j}^{(i)} in the noisy rumor process of Definition 1. Additionally, by the previous paragraph, agents with the best arm active will recommend it (for large enough jj). Taken together, we can show that the probability of receiving the best arm in the arm spreading process dominates the probability of being informed of the rumor in the noisy rumor process. This allows us to prove a tail bound for τspr\tau_{\text{spr}} in terms of a tail bound for the random phase τ¯spr\bar{\tau}_{\text{spr}} from Definition 1, on the event that the tails of τarm\tau_{\text{arm}} and τcom\tau_{\text{com}} are sufficiently small (in the sense of (24); see Lemma 12 in Appendix E.4 for details).

7.5. Spreading the best arm

In summary, we prove tail bounds for τarm\tau_{\text{arm}} and τcom\tau_{\text{com}} (Sections 7.1 and 7.2) and show the tails of τspr\tau_{\text{spr}} are controlled by those of τ¯spr\bar{\tau}_{\text{spr}}, provided the tails of τarm\tau_{\text{arm}} and τcom\tau_{\text{com}} are not too heavy (Sections 7.3 and 7.4). Combining and summing tails allows us to bound 𝔼[Aτspr]\mathbb{E}[A_{{\tau}_{\text{spr}}}] in terms of CC_{\star} (which accounts for the tails of τarm\tau_{\text{arm}} and τcom\tau_{\text{com}}) and 𝔼[Aτ¯spr]\mathbb{E}[A_{\bar{\tau}_{\text{spr}}}] (which accounts for the tail of τ¯spr\bar{\tau}_{\text{spr}}), as mentioned in the Theorem 2 proof sketch. See Theorem 3 and Corollary 4 in Appendix E.5 for details.

8. Conclusion

In this work, we showed that existing algorithms for multi-agent bandits with malicious agents fail to generalize beyond the complete graph. In light of this, we proposed a new blocking algorithm and showed it has low regret on any connected and undirected graph. This regret bound relied on the analysis of a novel process involving gossip and blocking. Our work leaves open several questions, such as whether our insights can be applied to multi-agent reinforcement learning.

Acknowledgements.
This work was supported by NSF Grants CCF 22-07547, CCF 19-34986, CNS 21-06801, 2019844, 2112471, and 2107037; ONR Grant N00014-19-1-2566; the Machine Learning Lab (MLL) at UT Austin; and the Wireless Networking and Communications Group (WNCG) Industrial Affiliates Program.

References

  • Anandkumar et al. (2011) Animashree Anandkumar, Nithin Michael, Ao Kevin Tang, and Ananthram Swami. 2011. Distributed algorithms for learning and cognitive medium access with logarithmic regret. IEEE Journal on Selected Areas in Communications 29, 4 (2011), 731–745.
  • Audibert and Bubeck (2010) Jean-Yves Audibert and Sébastien Bubeck. 2010. Best Arm Identification in Multi-Armed Bandits. In COLT-23th Conference on Learning Theory-2010. 13–p.
  • Auer et al. (2002) Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine learning 47, 2-3 (2002), 235–256.
  • Auer et al. (1995) Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. 1995. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of IEEE 36th Annual Foundations of Computer Science. IEEE, 322–331.
  • Avner and Mannor (2014) Orly Avner and Shie Mannor. 2014. Concurrent bandits and cognitive radio networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 66–81.
  • Awerbuch and Kleinberg (2008) Baruch Awerbuch and Robert Kleinberg. 2008. Competitive collaborative learning. J. Comput. System Sci. 74, 8 (2008), 1271–1288.
  • Bar-On and Mansour (2019) Yogev Bar-On and Yishay Mansour. 2019. Individual regret in cooperative nonstochastic multi-armed bandits. Advances in Neural Information Processing Systems 32 (2019), 3116–3126.
  • Bargiacchi et al. (2018) Eugenio Bargiacchi, Timothy Verstraeten, Diederik Roijers, Ann Nowé, and Hado Hasselt. 2018. Learning to coordinate with coordination graphs in repeated single-stage multi-agent decision problems. In International conference on machine learning. PMLR, 482–490.
  • Bistritz and Bambos (2020) Ilai Bistritz and Nicholas Bambos. 2020. Cooperative multi-player bandit optimization. Advances in Neural Information Processing Systems 33 (2020).
  • Bistritz and Leshem (2018) Ilai Bistritz and Amir Leshem. 2018. Distributed multi-player bandits-a game of thrones approach. In Advances in Neural Information Processing Systems. 7222–7232.
  • Bogunovic et al. (2020) Ilija Bogunovic, Andreas Krause, and Jonathan Scarlett. 2020. Corruption-tolerant Gaussian process bandit optimization. In International Conference on Artificial Intelligence and Statistics. PMLR, 1071–1081.
  • Bogunovic et al. (2021) Ilija Bogunovic, Arpan Losalka, Andreas Krause, and Jonathan Scarlett. 2021. Stochastic linear bandits robust to adversarial attacks. In International Conference on Artificial Intelligence and Statistics. PMLR, 991–999.
  • Boursier and Perchet (2019) Etienne Boursier and Vianney Perchet. 2019. SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits. Advances in Neural Information Processing Systems 32 (2019), 12071–12080.
  • Bubeck et al. (2011) Sébastien Bubeck, Rémi Munos, and Gilles Stoltz. 2011. Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science 412, 19 (2011), 1832–1852.
  • Buccapatnam et al. (2015) Swapna Buccapatnam, Jian Tan, and Li Zhang. 2015. Information sharing in distributed stochastic bandits. In 2015 IEEE Conference on Computer Communications (INFOCOM). IEEE, 2605–2613.
  • Cesa-Bianchi et al. (2016) Nicolo Cesa-Bianchi, Claudio Gentile, Yishay Mansour, and Alberto Minora. 2016. Delay and cooperation in nonstochastic bandits. In Conference on Learning Theory, Vol. 49. 605–622.
  • Chakraborty et al. (2017) Mithun Chakraborty, Kai Yee Phoebe Chua, Sanmay Das, and Brendan Juba. 2017. Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits.. In IJCAI. 164–170.
  • Chawla et al. (2020) Ronshee Chawla, Abishek Sankararaman, Ayalvadi Ganesh, and Sanjay Shakkottai. 2020. The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. 3471–3481.
  • Chawla et al. (2022) Ronshee Chawla, Abishek Sankararaman, and Sanjay Shakkottai. 2022. Multi-agent low-dimensional linear bandits. IEEE Trans. Automat. Control (2022).
  • Chierichetti et al. (2010) Flavio Chierichetti, Silvio Lattanzi, and Alessandro Panconesi. 2010. Almost tight bounds for rumour spreading with conductance. In Proceedings of the forty-second ACM symposium on Theory of computing. 399–408.
  • Dakdouk et al. (2021) Hiba Dakdouk, Raphaël Féraud, Romain Laroche, Nadège Varsier, and Patrick Maillé. 2021. Collaborative Exploration and Exploitation in Massively Multi-Player Bandits.
  • Dubey et al. (2020a) Abhimanyu Dubey et al. 2020a. Cooperative multi-agent bandits with heavy tails. In International Conference on Machine Learning. PMLR, 2730–2739.
  • Dubey et al. (2020b) Abhimanyu Dubey et al. 2020b. Kernel methods for cooperative multi-agent contextual bandits. In International Conference on Machine Learning. PMLR, 2740–2750.
  • Dubey and Pentland (2020) Abhimanyu Dubey and Alex 'Sandy' Pentland. 2020. Differentially-Private Federated Linear Bandits. Advances in Neural Information Processing Systems 33 (2020).
  • Garcelon et al. (2020) Evrard Garcelon, Baptiste Roziere, Laurent Meunier, Jean Tarbouriech, Olivier Teytaud, Alessandro Lazaric, and Matteo Pirotta. 2020. Adversarial Attacks on Linear Contextual Bandits. Advances in Neural Information Processing Systems 33 (2020).
  • Gupta et al. (2019) Anupam Gupta, Tomer Koren, and Kunal Talwar. 2019. Better Algorithms for Stochastic Bandits with Adversarial Corruptions. In Conference on Learning Theory. 1562–1578.
  • Harper and Konstan (2015) F Maxwell Harper and Joseph A Konstan. 2015. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4 (2015), 1–19.
  • Hillel et al. (2013) Eshcar Hillel, Zohar S Karnin, Tomer Koren, Ronny Lempel, and Oren Somekh. 2013. Distributed exploration in multi-armed bandits. In Advances in Neural Information Processing Systems. 854–862.
  • Jun et al. (2018) Kwang-Sung Jun, Lihong Li, Yuzhe Ma, and Jerry Zhu. 2018. Adversarial attacks on stochastic bandits. Advances in Neural Information Processing Systems 31 (2018), 3640–3649.
  • Kalathil et al. (2014) Dileep Kalathil, Naumaan Nayyar, and Rahul Jain. 2014. Decentralized learning for multiplayer multiarmed bandits. IEEE Transactions on Information Theory 60, 4 (2014), 2331–2345.
  • Kanade et al. (2012) Varun Kanade, Zhenming Liu, and Bozidar Radunovic. 2012. Distributed non-stochastic experts. In Advances in Neural Information Processing Systems. 260–268.
  • Kao et al. (2022) Hsu Kao, Chen-Yu Wei, and Vijay Subramanian. 2022. Decentralized cooperative reinforcement learning with hierarchical information structure. In International Conference on Algorithmic Learning Theory. PMLR, 573–605.
  • Kapoor et al. (2019) Sayash Kapoor, Kumar Kshitij Patel, and Purushottam Kar. 2019. Corruption-tolerant bandit learning. Machine Learning 108, 4 (2019), 687–715.
  • Kolla et al. (2018) Ravi Kumar Kolla, Krishna Jagannathan, and Aditya Gopalan. 2018. Collaborative learning of stochastic bandits over a social network. IEEE/ACM Transactions on Networking 26, 4 (2018), 1782–1795.
  • Korda et al. (2016) Nathan Korda, Balázs Szörényi, and Shuai Li. 2016. Distributed clustering of linear bandits in peer to peer networks. In Journal of Machine Learning Research Workshop and Conference Proceedings, Vol. 48. International Machine Learning Society, 1301–1309.
  • Lalitha and Goldsmith (2021) Anusha Lalitha and Andrea Goldsmith. 2021. Bayesian Algorithms for Decentralized Stochastic Bandits. IEEE Journal on Selected Areas in Information Theory 2, 2 (2021), 564–583.
  • Landgren et al. (2016) Peter Landgren, Vaibhav Srivastava, and Naomi Ehrich Leonard. 2016. Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms. In 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 167–172.
  • Lattimore and Szepesvári (2020) Tor Lattimore and Csaba Szepesvári. 2020. Bandit algorithms. Cambridge University Press.
  • LeBlanc et al. (2013) Heath J LeBlanc, Haotian Zhang, Xenofon Koutsoukos, and Shreyas Sundaram. 2013. Resilient asymptotic consensus in robust networks. IEEE Journal on Selected Areas in Communications 31, 4 (2013), 766–781.
  • Liu and Shroff (2019) Fang Liu and Ness Shroff. 2019. Data Poisoning Attacks on Stochastic Bandits. In International Conference on Machine Learning. 4042–4050.
  • Liu et al. (2021) Junyan Liu, Shuai Li, and Dapeng Li. 2021. Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions. arXiv preprint arXiv:2106.04207 (2021).
  • Liu and Zhao (2010) Keqin Liu and Qing Zhao. 2010. Distributed learning in multi-armed bandit with multiple players. IEEE Transactions on Signal Processing 58, 11 (2010), 5667–5681.
  • Liu et al. (2020) Lydia T Liu, Horia Mania, and Michael Jordan. 2020. Competing bandits in matching markets. In International Conference on Artificial Intelligence and Statistics. PMLR, 1618–1628.
  • Lykouris et al. (2018) Thodoris Lykouris, Vahab Mirrokni, and Renato Paes Leme. 2018. Stochastic bandits robust to adversarial corruptions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. 114–122.
  • Madhushani and Leonard (2021) Udari Madhushani and Naomi Leonard. 2021. When to call your neighbor? strategic communication in cooperative stochastic bandits. arXiv preprint arXiv:2110.04396 (2021).
  • Mansour et al. (2018) Yishay Mansour, Aleksandrs Slivkins, and Steven Wu. 2018. Competing bandits: Learning under competition. In 9th Innovations in Theoretical Computer Science, ITCS 2018. Schloss Dagstuhl-Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 48.
  • Martínez-Rubio et al. (2019) David Martínez-Rubio, Varun Kanade, and Patrick Rebeschini. 2019. Decentralized cooperative stochastic multi-armed bandits. Advances in Neural Information Processing Systems (2019).
  • Mitra et al. (2021) Aritra Mitra, Hamed Hassani, and George Pappas. 2021. Exploiting Heterogeneity in Robust Federated Best-Arm Identification. arXiv preprint arXiv:2109.05700 (2021).
  • Newton et al. (2021) Conor Newton, AJ Ganesh, and Henry Reeve. 2021. Asymptotic Optimality for Decentralised Bandits. In Reinforcement Learning in Networks and Queues, Sigmetrics 2021.
  • Pittel (1987) Boris Pittel. 1987. On spreading a rumor. SIAM J. Appl. Math. 47, 1 (1987), 213–223.
  • Rosenski et al. (2016) Jonathan Rosenski, Ohad Shamir, and Liran Szlak. 2016. Multi-player bandits–a musical chairs approach. In International Conference on Machine Learning. 155–163.
  • Sankararaman et al. (2019) Abishek Sankararaman, Ayalvadi Ganesh, and Sanjay Shakkottai. 2019. Social learning in multi agent multi armed bandits. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3, 3 (2019), 1–35.
  • Shahrampour et al. (2017) Shahin Shahrampour, Alexander Rakhlin, and Ali Jadbabaie. 2017. Multi-armed bandits in multi-agent networks. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2786–2790.
  • Szörényi et al. (2013) Balázs Szörényi, Róbert Busa-Fekete, István Hegedűs, Róbert Ormándi, Márk Jelasity, and Balázs Kégl. 2013. Gossip-based distributed stochastic bandit algorithms. In Journal of Machine Learning Research Workshop and Conference Proceedings, Vol. 2. International Machine Learning Society, 1056–1064.
  • Tekin and Van Der Schaar (2015) Cem Tekin and Mihaela Van Der Schaar. 2015. Distributed online learning via cooperative contextual bandits. IEEE Transactions on Signal Processing 63, 14 (2015), 3700–3714.
  • Vial et al. (2021) Daniel Vial, Sanjay Shakkottai, and R Srikant. 2021. Robust multi-agent multi-armed bandits. In Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing. 161–170.
  • Wang et al. (2020) Po-An Wang, Alexandre Proutiere, Kaito Ariu, Yassir Jedra, and Alessio Russo. 2020. Optimal algorithms for multiplayer multi-armed bandits. In International Conference on Artificial Intelligence and Statistics. PMLR, 4120–4129.
  • Zhu et al. (2021) Zhaowei Zhu, Jingxuan Zhu, Ji Liu, and Yang Liu. 2021. Federated bandit: A gossiping approach. In Abstract Proceedings of the 2021 ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems. 3–4.

Appendix A Notes on appendices

The appendices are organized as follows. First, Appendix B contains the additional numerical results that were mentioned in Section 6. Next, we prove Theorem 1 in Appendix C and all results from Section 5 in Appendix D. We then provide a rigorous version of the proof sketch from Section 7 in Appendix E. Finally, Appendix F contains some auxiliary results – namely, Appendix F.1 records some simple inequalities, Appendix F.2 provides some bandit results that are essentially known but stated in forms convenient to us, and Appendices F.3-F.4 contain some tedious calculations.

For the analysis, we use CiC_{i}, CiC_{i}^{\prime}, etc. to denote positive constants depending only on the algorithmic parameters α\alpha, β\beta, η\eta, ρ1\rho_{1}, and ρ2\rho_{2}. Each is associated with a corresponding claim, e.g., C1C_{\ref{clmLogRegNaive}} with Claim 1. Within the proofs, we use CC, CC^{\prime}, etc. to denote constants whose values may change across proofs. Finally, 𝟙\mathbbm{1} denotes the indicator function, 𝔼j\mathbb{E}_{j} and j\mathbb{P}_{j} are expectation and probability conditioned on all randomness before the jj-th communication period, and A1(t)=min{j:tAj}A^{-1}(t)=\min\{j\in\mathbb{N}:t\leq A_{j}\} denotes the current phase at time tt\in\mathbb{N}.
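To make the phase notation concrete, the following is a minimal Python sketch (the helper names are ours, not the paper's); with β=2 as in Theorem 1, A_j = j² and A^{-1}(t) = ⌈√t⌉:

```python
import math

def phase_end(j, beta=2):
    # A_j = j**beta: the time at which communication phase j ends
    # (beta = 2 in the Theorem 1 instance).
    return j ** beta

def current_phase(t, beta=2):
    # A^{-1}(t) = min{j in N : t <= A_j}, i.e., the phase containing time t.
    return math.ceil(t ** (1.0 / beta))
```

For example, with β = 2, times 1, 2, 3, 4 fall in phases 1, 2, 2, 2, and phase j lasts A_j − A_{j−1} = 2j − 1 rounds.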

Appendix B Additional experiments

Figure 4. Empirical results for real data. Rows of subfigures correspond to the malicious strategy, while columns correspond to the edge probability pp for the G(n+m,p)G(n+m,p) random graph.

Figure 5. Empirical results for synthetic data with p=1/2p=1/2 and mixed malicious strategies.

As mentioned in Section 6, we also considered arm means derived from real data. The setup was the same as for the synthetic means, except for two changes (as in (Vial et al., 2021)): we choose m=15m=15 instead of m=10m=10, and we sample {μk}k=1K\{\mu_{k}\}_{k=1}^{K} uniformly from a set of arm means derived from MovieLens (Harper and Konstan, 2015) user film ratings via matrix completion; see (Vial et al., 2021, Section 6.2) for details. The results (Figure 4) are qualitatively similar to the synthetic case.

Finally, we repeated the synthetic data experiments from Section 6 with the intermediate G(n+m,p)G(n+m,p) graph parameter p=1/2p=1/2 and two new malicious strategies called mixed naive and mixed smart. As discussed in Section 6, these approaches use a “mixed report” in which the malicious agents more frequently recommend good arms – namely, the second-best arm when the best is inactive, and the naive or smart recommendation otherwise. Results are shown in Figure 5. They again reinforce the key message that the proposed rule adapts more gracefully to networks beyond the complete graph – in this case, our blocking rule incurs less than half the regret of the existing one at the horizon TT. Additionally, we observe that the no-blocking algorithm from (Chawla et al., 2020) has much lower regret in Figure 5 than in Figure 3, though still higher than our proposed blocking algorithm. This suggests that our algorithm remains superior even for “nicer” malicious strategies under which blocking is less necessary (in the sense that (Chawla et al., 2020) has lower regret in Figure 5 than in Figure 3).

Appendix C Proof of Theorem 1

We first observe that since h2h1hh\leq 2^{h-1}\ \forall\ h\in\mathbb{N}, we can lower bound the arm gap as follows:

Δ2=μ1μ2=11315h=1(n/2)2(116)2h1>11315h=1(116)h=115.\Delta_{2}=\mu_{1}-\mu_{2}=1-\frac{13}{15}-\sum_{h=1}^{(n/2)-2}\left(\frac{1}{16}\right)^{2^{h-1}}>1-\frac{13}{15}-\sum_{h=1}^{\infty}\left(\frac{1}{16}\right)^{h}=\frac{1}{15}.
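This bound is easy to sanity-check numerically. The sketch below (the helper name arm_gap is ours) evaluates Δ₂ from the arm means of the Theorem 1 instance for several even n:

```python
def arm_gap(n):
    # Delta_2 = mu_1 - mu_2 = 1 - 13/15 - sum_{h=1}^{(n/2)-2} (1/16)**(2**(h-1)),
    # per the Theorem 1 instance; the text shows this exceeds 1/15 for even n >= 4.
    return 1 - 13 / 15 - sum((1 / 16) ** (2 ** (h - 1)) for h in range(1, n // 2 - 1))

# The gap stays strictly above 1/15 even as n grows
# (the infinite sum is bounded by the geometric series 1/15).
for n in (4, 6, 8, 16, 40):
    assert arm_gap(n) > 1 / 15
```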

Next, we show that for phases jJlj\geq J_{l}, agents aware of arms at least as good as 1l+n/21-l+n/2 will (1) play such arms most often in phase jj and (2) have such arms active thereafter. The proof is basically a noiseless version of a known bandit argument, specialized to the setting of Section 4.

Lemma 1.

Under the assumptions of Theorem 1, for any l[n/2]l\in[n/2], jJlj\geq J_{l}, and i[n]i\in[n] such that minSj(i)1l+n/2\min S_{j}^{(i)}\leq 1-l+n/2, we have Bj(i)1l+n/2B_{j}^{(i)}\leq 1-l+n/2 and minSj(i)1l+n/2jj\min S_{j^{\prime}}^{(i)}\leq 1-l+n/2\ \forall\ j^{\prime}\geq j.

Proof.

First, we prove by contradiction that Bj(i)1l+n/2B_{j}^{(i)}\leq 1-l+n/2: suppose instead that Bj(i)2l+n/2B_{j}^{(i)}\geq 2-l+n/2. Let k1=minSj(i)k_{1}=\min S_{j}^{(i)} and k2=Bj(i)k_{2}=B_{j}^{(i)}. Then Tk2(i)(Aj)Tk2(i)(Aj1)(AjAj1)/3T_{k_{2}}^{(i)}(A_{j})-T_{k_{2}}^{(i)}(A_{j-1})\geq(A_{j}-A_{j-1})/3; otherwise, since |Sj(i)|=S+2=3|S_{j}^{(i)}|=S+2=3 by assumption and k2k_{2} is the most played arm in phase jj, we obtain

kSj(i)(Tk(i)(Aj)Tk(i)(Aj1))3(Tk2(i)(Aj)Tk2(i)(Aj1))<AjAj1,\sum_{k\in S_{j}^{(i)}}(T_{k}^{(i)}(A_{j})-T_{k}^{(i)}(A_{j-1}))\leq 3(T_{k_{2}}^{(i)}(A_{j})-T_{k_{2}}^{(i)}(A_{j-1}))<A_{j}-A_{j-1},

which is a contradiction. Furthermore, there clearly exists t{1+Aj1,,Aj}t\in\{1+A_{j-1},\ldots,A_{j}\} such that

Tk2(i)(t1)Tk2(i)(Aj1)=Tk2(i)(Aj)Tk2(i)(Aj1)1,It(i)=k2.T_{k_{2}}^{(i)}(t-1)-T_{k_{2}}^{(i)}(A_{j-1})=T_{k_{2}}^{(i)}(A_{j})-T_{k_{2}}^{(i)}(A_{j-1})-1,\quad I_{t}^{(i)}=k_{2}.

Combining these observations, and since Tk2(i)(Aj1)0T_{k_{2}}^{(i)}(A_{j-1})\geq 0, we obtain that

Tk2(i)(t1)Tk2(i)(Aj)Tk2(i)(Aj1)1AjAj131,It(i)=k2.T_{k_{2}}^{(i)}(t-1)\geq T_{k_{2}}^{(i)}(A_{j})-T_{k_{2}}^{(i)}(A_{j-1})-1\geq\frac{A_{j}-A_{j-1}}{3}-1,\quad I_{t}^{(i)}=k_{2}.

By the UCB policy and since α=4\alpha=4 by assumption, the previous expression implies

(25) μk1<μk1+4logtTk1(i)(t1)μk2+4logtTk2(i)(t1)μk2+4logtAjAj131.\mu_{k_{1}}<\mu_{k_{1}}+\sqrt{\frac{4\log t}{T_{k_{1}}^{(i)}(t-1)}}\leq\mu_{k_{2}}+\sqrt{\frac{4\log t}{T_{k_{2}}^{(i)}(t-1)}}\leq\mu_{k_{2}}+\sqrt{\frac{4\log t}{\frac{A_{j}-A_{j-1}}{3}-1}}.

Since Aj=j2A_{j}=j^{2} by the assumption β=2\beta=2 and tAjt\leq A_{j}, we also know

(26) 4logtAjAj1318logj2j131=12logjj2=h(j),\frac{4\log t}{\frac{A_{j}-A_{j-1}}{3}-1}\leq\frac{8\log j}{\frac{2j-1}{3}-1}=\frac{12\log j}{j-2}=h(j),

where we define h(j)=12log(j)/(j2)j>2h(j^{\prime})=12\log(j^{\prime})/(j^{\prime}-2)\ \forall\ j^{\prime}>2. Note this function decreases on [3,)[3,\infty), since

h(j)=12(j2jlogj)j(j2)212(j2jlog3)j(j2)2<24j(j2)2<0j3,h^{\prime}(j^{\prime})=\frac{12(j^{\prime}-2-j^{\prime}\log j^{\prime})}{j^{\prime}(j^{\prime}-2)^{2}}\leq\frac{12(j^{\prime}-2-j^{\prime}\log 3)}{j^{\prime}(j^{\prime}-2)^{2}}<\frac{-24}{j^{\prime}(j^{\prime}-2)^{2}}<0\ \forall\ j^{\prime}\geq 3,

where the second inequality uses log3>1\log 3>1 (since e<3e<3). Thus, since jJlJ13j\geq J_{l}\geq J_{1}\geq 3, we know h(j)h(Jl)h(j)\leq h(J_{l}). Combined with (25) and (26), we obtain μk1<μk2+h(Jl)\mu_{k_{1}}<\mu_{k_{2}}+\sqrt{h(J_{l})}. Finally, recall k22l+n/2k_{2}\geq 2-l+n/2 and k11l+n/2k_{1}\leq 1-l+n/2, so μk1μk2μ1l+n/2μ2l+n/2>0\mu_{k_{1}}-\mu_{k_{2}}\geq\mu_{1-l+n/2}-\mu_{2-l+n/2}>0. Combined with μk1<μk2+h(Jl)\mu_{k_{1}}<\mu_{k_{2}}+\sqrt{h(J_{l})}, we conclude

(27) (μ1l+n/2μ2l+n/2)2<h(Jl)=12log(Jl)/(Jl2).(\mu_{1-l+n/2}-\mu_{2-l+n/2})^{2}<h(J_{l})=12\log(J_{l})/(J_{l}-2).

We now show that in each of three cases, (27) yields a contradiction.

  • l=1l=1: By definition, the left side of (27) is (μn/2μ1+n/2)2=(13/15)2(\mu_{n/2}-\mu_{1+n/2})^{2}=(13/15)^{2} and the right side is 12log(28)/(282)12\log(2^{8})/(2^{8}-2), and one can verify that 12log(28)/(282)(13/15)212\log(2^{8})/(2^{8}-2)\leq(13/15)^{2}.

  • 1<l<n/21<l<n/2: Here 1l+n/2>11-l+n/2>1 and 2l+n/2<1+n/22-l+n/2<1+n/2, i.e., both arms are mediocre. By definition, we thus have (μ1l+n/2μ2l+n/2)2=22l+1(\mu_{1-l+n/2}-\mu_{2-l+n/2})^{2}=2^{-2^{l+1}}, so to obtain a contradiction to (27), it suffices to show h(Jl)22l+1h(J_{l})\leq 2^{-2^{l+1}}. We show by induction that this holds for all l{2,3,}l\in\{2,3,\ldots\}.

    For l=2l=2, note J2=(J1+2)2>J12+2=216+2J_{2}=(J_{1}+2)^{2}>J_{1}^{2}+2=2^{16}+2 (so J22>216J_{2}-2>2^{16}) and J2=J12+4J1+4=216+210+22<217J_{2}=J_{1}^{2}+4J_{1}+4=2^{16}+2^{10}+2^{2}<2^{17} (so logJ2<17log2<17\log J_{2}<17\log 2<17). Thus, h(J2)<1217/216=204/216<28h(J_{2})<12\cdot 17/2^{16}=204/2^{16}<2^{-8}.

    Now assume h(Jl)<22l+1h(J_{l})<2^{-2^{l+1}} for some l2l\geq 2; we aim to show h(Jl+1)<22l+2h(J_{l+1})<2^{-2^{l+2}}. Since Jl2J_{l}\geq 2, we have Jl+1(2Jl)2Jl4J_{l+1}\leq(2J_{l})^{2}\leq J_{l}^{4}; we also know Jl+12=Jl2+4Jl+2>Jl(Jl2)J_{l+1}-2=J_{l}^{2}+4J_{l}+2>J_{l}(J_{l}-2). Thus, we obtain

    (28) h(Jl+1)<12logJl4Jl(Jl2)=4Jlh(Jl)<4Jl22l+1<4J12l122l+1=222l+22l+1,h(J_{l+1})<\frac{12\log J_{l}^{4}}{J_{l}(J_{l}-2)}=\frac{4}{J_{l}}\cdot h(J_{l})<\frac{4}{J_{l}}\cdot 2^{-2^{l+1}}<\frac{4}{J_{1}^{2^{l-1}}}\cdot 2^{-2^{l+1}}=2^{2-2^{l+2}-2^{l+1}},

    where the inequalities follow from the previous paragraph, the inductive hypothesis, and (7) from Section 4.1, respectively. Since 2<2l+12<2^{l+1}, this completes the proof.

  • l=n/2l=n/2: Recall that in the previous case, we showed h(Jl)<22l+1h(J_{l})<2^{-2^{l+1}} for any l{2,3,}l\in\{2,3,\ldots\}. Therefore, h(Jn/2)22(n/2)+128h(J_{n/2})\leq 2^{-2^{(n/2)+1}}\leq 2^{-8} by assumption n{4,6,}n\in\{4,6,\ldots\}. Since (μ1l+n/2μ2l+n/2)2=Δ22=(1/15)2>(1/16)2=28(\mu_{1-l+n/2}-\mu_{2-l+n/2})^{2}=\Delta_{2}^{2}=(1/15)^{2}>(1/16)^{2}=2^{-8} in this case, we obtain a contradiction to (27).

Thus, we have established the first part of the lemma (Bj(i)1l+n/2B_{j}^{(i)}\leq 1-l+n/2). To show minSj(i)1l+n/2\min S_{j^{\prime}}^{(i)}\leq 1-l+n/2, we suppose instead that minSj(i)>1l+n/2\min S_{j^{\prime}}^{(i)}>1-l+n/2 for some jjj^{\prime}\geq j. Then j=min{jj:minSj(i)>1l+n/2}j^{\dagger}=\min\{j^{\prime}\geq j:\min S_{j^{\prime}}^{(i)}>1-l+n/2\} is well-defined. If j=jj^{\dagger}=j, then minSj(i)>1l+n/2\min S_{j}^{(i)}>1-l+n/2, which violates the assumption of the lemma, so we assume j>jj^{\dagger}>j. In this case, we know minSj1(i)1l+n/2\min S_{j^{\dagger}-1}^{(i)}\leq 1-l+n/2 (since jj^{\dagger} is minimal) and Bj1(i)>1l+n/2B_{j^{\dagger}-1}^{(i)}>1-l+n/2 (else, because Bj1(i)Sj(i)B_{j^{\dagger}-1}^{(i)}\in S_{j^{\dagger}}^{(i)}, we would have minSj(i)Bj1(i)1l+n/2\min S_{j^{\dagger}}^{(i)}\leq B_{j^{\dagger}-1}^{(i)}\leq 1-l+n/2). But since j1Jlj^{\dagger}-1\geq J_{l} (by assumption j>jj^{\dagger}>j and jJlj\geq J_{l}), this contradicts the first part of the lemma. ∎
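The numeric inequalities invoked in the three cases above can be verified directly. The sketch below is a sanity check, not part of the proof (h and J are our own names); it uses the recursion J_{l+1} = (J_l + 2)² with J_1 = 2⁸ from Section 4.1:

```python
import math

def h(j):
    # h(j) = 12*log(j)/(j - 2), the right side of (27).
    return 12 * math.log(j) / (j - 2)

J = {1: 2 ** 8}                       # J_1 = 2^8, as in Section 4.1
for l in range(1, 7):
    J[l + 1] = (J[l] + 2) ** 2        # J_{l+1} = (J_l + 2)^2

# Case l = 1: h(J_1) = 12*log(2^8)/(2^8 - 2) <= (13/15)^2.
assert h(J[1]) <= (13 / 15) ** 2

# Cases 1 < l < n/2: h(J_l) <= 2^{-2^{l+1}}, checked here for l = 2, ..., 6.
for l in range(2, 7):
    assert h(J[l]) <= 2.0 ** -(2 ** (l + 1))

# Case l = n/2: the arm gap satisfies (1/15)^2 > 2^{-8} >= h(J_{n/2}).
assert (1 / 15) ** 2 > 2.0 ** -8
```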

Next, for each l[n/2]l\in[n/2], we define the event

l={l+n/2PJl(l+1+n/2)}j=1Jli=l+n/2n{minSj(i)>1l+n/2,n+1Pj(i)},\mathcal{E}_{l}=\{l+n/2\notin P_{J_{l}}^{(l+1+n/2)}\}\cap\cap_{j=1}^{J_{l}}\cap_{i=l+n/2}^{n}\{\min S_{j}^{(i)}>1-l+n/2,n+1\notin P_{j}^{(i)}\},

where {nPJn/2(n+1)}=Ω\{n\notin P_{J_{n/2}}^{(n+1)}\}=\Omega by convention (so n/2\mathcal{E}_{n/2} is well-defined). Thus, in words l\mathcal{E}_{l} says (1) l+1+n/2l+1+n/2 is not blocking l+n/2l+n/2 at phase JlJ_{l}, (2) no honest agent il+n/2i\geq l+n/2 has ever been aware of arms as good as 1l+n/21-l+n/2 up to phase JlJ_{l}, and (3) no such ii has ever blocked the malicious agent n+1n+1. Point (2) will allow us to show that agent nn does not become aware of the best arm until phase Jn/2J_{n/2} (when n/2\mathcal{E}_{n/2} holds). The other events in the definition of {l}l=1n/2\{\mathcal{E}_{l}\}_{l=1}^{n/2} will allow us to inductively lower bound their probabilities. The next lemma establishes the base of this inductive argument.

Lemma 2.

Under the assumptions of Theorem 1, (1)32(J11)\mathbb{P}(\mathcal{E}_{1})\geq 3^{-2(J_{1}-1)}.

Proof.

We first observe that at all phases j[J11]j\in[J_{1}-1], only the second case of the malicious strategy – where the malicious agent recommends to avoid blocking – arises, which implies n+1Pj(i)j[J1],i[n]n+1\notin P_{j}^{(i)}\ \forall\ j\in[J_{1}],i\in[n]. Therefore, it suffices to show (1)32(J11)\mathbb{P}(\mathcal{E}_{1}^{\prime})\geq 3^{-2(J_{1}-1)}, where we define

1={1+n/2PJ1(2+n/2)}j=1J1i=1+n/2n{minSj(i)>n/2}.\mathcal{E}_{1}^{\prime}=\{1+n/2\notin P_{J_{1}}^{(2+n/2)}\}\cap\cap_{j=1}^{J_{1}}\cap_{i=1+n/2}^{n}\{\min S_{j}^{(i)}>n/2\}.

To do so, we will show 1j=1J11i=1+n/22+n/2{Hj(i)=n+1}\mathcal{E}_{1}^{\prime}\supset\mathcal{F}\triangleq\cap_{j=1}^{J_{1}-1}\cap_{i=1+n/2}^{2+n/2}\{H_{j}^{(i)}=n+1\} and ()32(J11)\mathbb{P}(\mathcal{F})\geq 3^{-2(J_{1}-1)}.

To show ()32(J11)\mathbb{P}(\mathcal{F})\geq 3^{-2(J_{1}-1)}, first note that by the law of total expectation, we have

()=𝔼[𝟙(j=1J12i=1+n/22+n/2{Hj(i)=n+1})J11(i=1+n/22+n/2{HJ11(i)=n+1})].\mathbb{P}(\mathcal{F})=\mathbb{E}[\mathbbm{1}(\cap_{j=1}^{J_{1}-2}\cap_{i=1+n/2}^{2+n/2}\{H_{j}^{(i)}=n+1\})\mathbb{P}_{J_{1}-1}(\cap_{i=1+n/2}^{2+n/2}\{H_{J_{1}-1}^{(i)}=n+1\})].

Now when j=1J12i=1+n/22+n/2{Hj(i)=n+1}\cap_{j=1}^{J_{1}-2}\cap_{i=1+n/2}^{2+n/2}\{H_{j}^{(i)}=n+1\} occurs, the malicious strategy implies PJ11(i)=P_{J_{1}-1}^{(i)}=\emptyset, so HJ11(i)H_{J_{1}-1}^{(i)} is sampled from a set of three agents which includes n+1n+1, for each i{1+n/2,2+n/2}i\in\{1+n/2,2+n/2\}. Since this sampling is independent, we conclude that when j=1J12i=1+n/22+n/2{Hj(i)=n+1}\cap_{j=1}^{J_{1}-2}\cap_{i=1+n/2}^{2+n/2}\{H_{j}^{(i)}=n+1\} occurs,

J11(i=1+n/22+n/2{HJ11(i)=n+1})=32.\displaystyle\mathbb{P}_{J_{1}-1}(\cap_{i=1+n/2}^{2+n/2}\{H_{J_{1}-1}^{(i)}=n+1\})=3^{-2}.

Thus, combining the previous two expressions with the definition of \mathcal{F} and iterating, we obtain

(j=1J11i=1+n/22+n/2{Hj(i)=n+1})=()=(j=1J12i=1+n/22+n/2{Hj(i)=n+1})/32=32(J11).\mathbb{P}(\cap_{j=1}^{J_{1}-1}\cap_{i=1+n/2}^{2+n/2}\{H_{j}^{(i)}=n+1\})=\mathbb{P}(\mathcal{F})=\mathbb{P}(\cap_{j=1}^{J_{1}-2}\cap_{i=1+n/2}^{2+n/2}\{H_{j}^{(i)}=n+1\})/3^{2}=3^{-2(J_{1}-1)}.

To show 1\mathcal{E}_{1}^{\prime}\supset\mathcal{F}, first observe that when \mathcal{F} occurs, 2+n/22+n/2 does not contact 1+n/21+n/2 at any phase j[J11]j\in[J_{1}-1], so 1+n/2PJ1(2+n/2)1+n/2\notin P_{J_{1}}^{(2+n/2)}. Thus, it only remains to show that \mathcal{F} implies minSj(i)>n/2\min S_{j}^{(i)}>n/2 for all jJ1j\leq J_{1} and i>n/2i>n/2. Suppose instead that \mathcal{F} holds and minSj(i)n/2\min S_{j}^{(i)}\leq n/2 for some such jj and ii. Let j=min{jJ1:minSj(i)n/2 for some i>n/2}j^{\dagger}=\min\{j\leq J_{1}:\min S_{j}^{(i)}\leq n/2\text{ for some }i>n/2\} be the earliest jj it occurs; note j>1j^{\dagger}>1 by assumption that minS1(i)>n/2\min S_{1}^{(i)}>n/2 for i>n/2i>n/2. Let ii^{\dagger} be some agent it occurs for, i.e., i>n/2i^{\dagger}>n/2 is such that kSj(i)k^{\dagger}\in S_{j^{\dagger}}^{(i^{\dagger})} for some kn/2k^{\dagger}\leq n/2. Since jj^{\dagger} is the earliest such phase and j>1j^{\dagger}>1, we know kk^{\dagger} was not active for ii^{\dagger} at the previous phase j1j^{\dagger}-1, so it was recommended to ii^{\dagger} at this phase. By the malicious strategy and j1J11j^{\dagger}-1\leq J_{1}-1, this implies that the agent ii^{\ddagger} who recommended kk^{\dagger} to ii^{\dagger} is honest, so kk^{\dagger} was active for ii^{\ddagger} at the previous phase, which implies in/2i^{\ddagger}\leq n/2 (else, we contradict the minimality of jj^{\dagger}). From the assumed line graph structure, we must have i=1+n/2i^{\dagger}=1+n/2 and i=n/2i^{\ddagger}=n/2, i.e., 1+n/21+n/2 contacted n/2n/2 at phase j1J11j^{\dagger}-1\leq J_{1}-1. But this contradicts the definition of \mathcal{F}, which stipulates that 1+n/21+n/2 only contacts n+1n+1 before J1J_{1}. ∎
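The counting at the heart of this proof (two honest agents independently drawing one of three neighbors each phase, with F occurring iff both draw the malicious agent every time) can be illustrated by a toy Monte Carlo. Here J is reduced from J_1 = 2⁸ so the event is observable; the identity P(F) = 3^{-2(J-1)} is what the proof uses:

```python
import random

random.seed(0)
J, trials = 3, 200_000                # reduced J so the event F is observable
hits = 0
for _ in range(trials):
    # Each phase, both agents independently pick one of 3 neighbors uniformly;
    # index 0 plays the role of the malicious agent n+1.
    if all(random.randrange(3) == 0 and random.randrange(3) == 0
           for _ in range(J - 1)):
        hits += 1

estimate = hits / trials
exact = 3.0 ** (-2 * (J - 1))         # = 1/81 for J = 3
assert abs(estimate - exact) < 2e-3
```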

To continue the inductive argument, we lower bound (l+1)\mathbb{P}(\mathcal{E}_{l+1}) in terms of (l)\mathbb{P}(\mathcal{E}_{l}).

Lemma 3.

Under the assumptions of Theorem 1, for any l[(n/2)1]l\in[(n/2)-1], we have (l+1)34(l)\mathbb{P}(\mathcal{E}_{l+1})\geq 3^{-4}\mathbb{P}(\mathcal{E}_{l}).

Proof.

The proof is somewhat lengthy and proceeds in four steps.

Step 1: Probabilistic arguments. First, we define the events

𝒢1=i=l+n/2l+2+n/2{HJl(i)=n+1},𝒢2={HJl+1(l+1+n/2)=l+n/2},𝒢=𝒢1𝒢2.\mathcal{G}_{1}=\cap_{i=l+n/2}^{l+2+n/2}\{H_{J_{l}}^{(i)}=n+1\},\quad\mathcal{G}_{2}=\{H_{J_{l}+1}^{(l+1+n/2)}=l+n/2\},\quad\mathcal{G}=\mathcal{G}_{1}\cap\mathcal{G}_{2}.

Then by the law of total expectation, we know that

(29) (l𝒢)=𝔼[𝔼Jl+1[𝟙(l𝒢1)𝟙(𝒢2)]]=𝔼[𝟙(l𝒢1)Jl+1(𝒢2)].\mathbb{P}(\mathcal{E}_{l}\cap\mathcal{G})=\mathbb{E}[\mathbb{E}_{J_{l}+1}[\mathbbm{1}(\mathcal{E}_{l}\cap\mathcal{G}_{1})\mathbbm{1}(\mathcal{G}_{2})]]=\mathbb{E}[\mathbbm{1}(\mathcal{E}_{l}\cap\mathcal{G}_{1})\mathbb{P}_{J_{l}+1}(\mathcal{G}_{2})].

Now if l𝒢1\mathcal{E}_{l}\cap\mathcal{G}_{1} occurs, then l+n/2PJl(l+1+n/2)l+n/2\notin P_{J_{l}}^{(l+1+n/2)} (by l\mathcal{E}_{l}) and HJl(l+1+n/2)=n+1H_{J_{l}}^{(l+1+n/2)}=n+1 (by 𝒢1\mathcal{G}_{1}); the latter implies l+n/2PJl+1(l+1+n/2)PJl(l+1+n/2)l+n/2\notin P_{J_{l}+1}^{(l+1+n/2)}\setminus P_{J_{l}}^{(l+1+n/2)}, so combined with the former, l+n/2PJl+1(l+1+n/2)l+n/2\notin P_{J_{l}+1}^{(l+1+n/2)}. Thus, l𝒢1\mathcal{E}_{l}\cap\mathcal{G}_{1} implies that HJl+1(l+1+n/2)H_{J_{l}+1}^{(l+1+n/2)} is sampled from a set of at most three agents containing l+n/2l+n/2, so Jl+1(𝒢2)1/3\mathbb{P}_{J_{l}+1}(\mathcal{G}_{2})\geq 1/3. Substituting into (29), and again using total expectation, we thus obtain

(l𝒢)(l𝒢1)/3=𝔼[𝔼Jl[𝟙(l)𝟙(𝒢1)]]/3=𝔼[𝟙(l)Jl(𝒢1)]/3.\mathbb{P}(\mathcal{E}_{l}\cap\mathcal{G})\geq\mathbb{P}(\mathcal{E}_{l}\cap\mathcal{G}_{1})/3=\mathbb{E}[\mathbb{E}_{J_{l}}[\mathbbm{1}(\mathcal{E}_{l})\mathbbm{1}(\mathcal{G}_{1})]]/3=\mathbb{E}[\mathbbm{1}(\mathcal{E}_{l})\mathbb{P}_{J_{l}}(\mathcal{G}_{1})]/3.

Analogously, when l\mathcal{E}_{l} holds, n+1PJl(i)i{l+n/2,,l+2+n/2}n+1\notin P_{J_{l}}^{(i)}\ \forall\ i\in\{l+n/2,\ldots,l+2+n/2\}, which by similar logic gives Jl(𝒢1)33\mathbb{P}_{J_{l}}(\mathcal{G}_{1})\geq 3^{-3}. Therefore, combining the previous two inequalities, we have shown (l𝒢)34(l)\mathbb{P}(\mathcal{E}_{l}\cap\mathcal{G})\geq 3^{-4}\mathbb{P}(\mathcal{E}_{l}). Consequently, it suffices to show that l𝒢l+1\mathcal{E}_{l}\cap\mathcal{G}\subset\mathcal{E}_{l+1}.

Step 2: Event decomposition. For l{l,l+1}l^{\prime}\in\{l,l+1\}, we decompose l=h=14l,h\mathcal{E}_{l^{\prime}}=\cap_{h=1}^{4}\mathcal{H}_{l^{\prime},h}, where

(30) l,1={l+n/2PJl(l+1+n/2)},l,2=j=1Jl1i=l+n/2n{minSj(i)>1l+n/2,n+1Pj(i)},\displaystyle\mathcal{H}_{l^{\prime},1}=\{l^{\prime}+n/2\notin P_{J_{l^{\prime}}}^{(l^{\prime}+1+n/2)}\},\ \mathcal{H}_{l^{\prime},2}=\cap_{j=1}^{J_{l^{\prime}-1}}\cap_{i=l^{\prime}+n/2}^{n}\{\min S_{j}^{(i)}>1-l^{\prime}+n/2,n+1\notin P_{j}^{(i)}\},
(31) l,3=j=Jl1+1Jli=l+n/2n{minSj(i)>1l+n/2},l,4=j=Jl1+1Jli=l+n/2n{n+1Pj(i)}.\displaystyle\mathcal{H}_{l^{\prime},3}=\cap_{j=J_{l^{\prime}-1}+1}^{J_{l^{\prime}}}\cap_{i=l^{\prime}+n/2}^{n}\{\min S_{j}^{(i)}>1-l^{\prime}+n/2\},\ \mathcal{H}_{l^{\prime},4}=\cap_{j=J_{l^{\prime}-1}+1}^{J_{l^{\prime}}}\cap_{i=l^{\prime}+n/2}^{n}\{n+1\notin P_{j}^{(i)}\}.

As a simple consequence of these definitions, we note that

(32) l,2l,3l,4\displaystyle\mathcal{H}_{l,2}\cap\mathcal{H}_{l,3}\cap\mathcal{H}_{l,4} =j=1Jli=l+n/2n{minSj(i)>1l+n/2,n+1Pj(i)}\displaystyle=\cap_{j=1}^{J_{l}}\cap_{i=l+n/2}^{n}\{\min S_{j}^{(i)}>1-l+n/2,n+1\notin P_{j}^{(i)}\}
(33) j=1Jli=l+1+n/2n{minSj(i)>1(l+1)+n/2,n+1Pj(i)}=l+1,2.\displaystyle\subset\cap_{j=1}^{J_{l}}\cap_{i=l+1+n/2}^{n}\{\min S_{j}^{(i)}>1-(l+1)+n/2,n+1\notin P_{j}^{(i)}\}=\mathcal{H}_{l+1,2}.

Hence, to prove l𝒢l+1\mathcal{E}_{l}\cap\mathcal{G}\subset\mathcal{E}_{l+1}, it suffices to show l𝒢l+1,1l+1,3l+1,4\mathcal{E}_{l}\cap\mathcal{G}\subset\mathcal{H}_{l+1,1}\cap\mathcal{H}_{l+1,3}\cap\mathcal{H}_{l+1,4}. For the remainder of the proof, we thus assume l𝒢\mathcal{E}_{l}\cap\mathcal{G} holds and argue l+1,1l+1,3l+1,4\mathcal{H}_{l+1,1}\cap\mathcal{H}_{l+1,3}\cap\mathcal{H}_{l+1,4} holds.

Step 3: Some consequences. We begin by deriving several consequences of l𝒢\mathcal{E}_{l}\cap\mathcal{G}. First, note each i{l+1+n/2,l+2+n/2}i\in\{l+1+n/2,l+2+n/2\} contacts n+1n+1 at phase JlJ_{l} (by 𝒢1\mathcal{G}_{1}), who recommends 1l+n/21-l+n/2 (by the malicious strategy). Since minSJl(i)>1l+n/2\min S_{J_{l}}^{(i)}>1-l+n/2 (by l,3\mathcal{H}_{l,3}), this implies 1l+n/2=minSJl+1(i)1-l+n/2=\min S_{J_{l}+1}^{(i)}, so 1l+n/21-l+n/2 is most played in phase Jl+1J_{l}+1 (by Lemma 1). In summary, we have shown

(34) HJl(i)=n+1,RJl(i)=BJl+1(i)=1l+n/2i{l+1+n/2,l+2+n/2}.H_{J_{l}}^{(i)}=n+1,R_{J_{l}}^{(i)}=B_{J_{l}+1}^{(i)}=1-l+n/2\ \forall\ i\in\{l+1+n/2,l+2+n/2\}.

Second, as a consequence of the above and Lemma 1, we can also write

(35) 1l+n/2=minSJl+1(i)minSJl+2(i)i{l+1+n/2,l+2+n/2}.1-l+n/2=\min S_{J_{l}+1}^{(i)}\geq\min S_{J_{l}+2}^{(i)}\geq\cdots\ \forall\ i\in\{l+1+n/2,l+2+n/2\}.

Third, we know l+n/2l+n/2 contacts n+1n+1 at phase JlJ_{l} (by 𝒢1\mathcal{G}_{1}), who responds with a currently active arm (by the malicious strategy), so since minSJl(l+n/2)>1l+n/2\min S_{J_{l}}^{(l+n/2)}>1-l+n/2 (by l,3\mathcal{H}_{l,3}), we have

(36) minSJl+1(l+n/2)>1l+n/2.\min S_{J_{l}+1}^{(l+n/2)}>1-l+n/2.

As a consequence of (36), we see that when l+1+n/2l+1+n/2 contacts l+n/2l+n/2 at phase Jl+1J_{l}+1 (which occurs by 𝒢2\mathcal{G}_{2}), l+n/2l+n/2 recommends some arm strictly worse than 1l+n/21-l+n/2. On the other hand, by (35) and Lemma 1, we know the most played arm for l+1+n/2l+1+n/2 in phase Jl+2J_{l}+2 has index at most 1l+n/21-l+n/2. Taken together, the recommended arm is not the most played one, so

(37) l+n/2Pj(l+1+n/2)j{Jl+2,,(Jl+2)2=Jl+1}.l+n/2\in P_{j}^{(l+1+n/2)}\ \forall\ j\in\{J_{l}+2,\ldots,(J_{l}+2)^{2}=J_{l+1}\}.

Step 4: Completing the proof. Using the above, we prove in turn that l+1,4\mathcal{H}_{l+1,4}, l+1,3\mathcal{H}_{l+1,3}, and l+1,1\mathcal{H}_{l+1,1} hold. For l+1,4\mathcal{H}_{l+1,4}, we use proof by contradiction: if l+1,4\mathcal{H}_{l+1,4} fails, we can find il+1+n/2i\geq l+1+n/2 and j{Jl+1,,Jl+1}j\in\{J_{l}+1,\ldots,J_{l+1}\} such that n+1Pj(i)n+1\in P_{j}^{(i)}. Let j=min{j{Jl+1,,Jl+1}:n+1Pj(i)}j^{\dagger}=\min\{j\in\{J_{l}+1,\ldots,J_{l+1}\}:n+1\in P_{j}^{(i)}\} be the minimal such jj (for this ii). Since n+1PJl(i)n+1\notin P_{J_{l}}^{(i)} (by l,4\mathcal{H}_{l,4}) and jj^{\dagger} is minimal, we must have n+1Pj(i)Pj1(i)n+1\in P_{j^{\dagger}}^{(i)}\setminus P_{j^{\dagger}-1}^{(i)}, i.e., n+1n+1 was blocked for the recommendation it provided at j1j^{\dagger}-1. If il+3+n/2i\geq l+3+n/2, this contradicts the malicious strategy, since j1{Jl,,Jl+11}j^{\dagger}-1\in\{J_{l},\ldots,J_{l+1}-1\} and the strategy avoids blocking for such ii and jj^{\dagger}. A similar contradiction arises if i{l+1+n/2,l+2+n/2}i\in\{l+1+n/2,l+2+n/2\} and jJl+2j^{\dagger}\geq J_{l}+2 (since j1{Jl+1,,Jl+11}j^{\dagger}-1\in\{J_{l}+1,\ldots,J_{l+1}-1\} in this case), so we must have i{l+1+n/2,l+2+n/2}i\in\{l+1+n/2,l+2+n/2\} and j=Jl+1j^{\dagger}=J_{l}+1. But in this case, n+1Pj(i)Pj1(i)=PJl+1(i)PJl(i)n+1\in P_{j^{\dagger}}^{(i)}\setminus P_{j^{\dagger}-1}^{(i)}=P_{J_{l}+1}^{(i)}\setminus P_{J_{l}}^{(i)} contradicts (34).

Next, we show l+1,3\mathcal{H}_{l+1,3} holds. The logic is similar to the end of the Lemma 2 proof. If instead l+1,3\mathcal{H}_{l+1,3} fails, we can find j{Jl+1,,Jl+1}j\in\{J_{l}+1,\ldots,J_{l+1}\} and i{l+1+n/2,,n}i\in\{l+1+n/2,\ldots,n\} such that minSj(i)(n/2)l\min S_{j}^{(i)}\leq(n/2)-l. Let jj^{\dagger} be the minimal such jj and il+1+n/2i^{\dagger}\geq l+1+n/2 an agent with minSj(i)=k\min S_{j^{\dagger}}^{(i^{\dagger})}=k^{\dagger} for some k(n/2)lk^{\dagger}\leq(n/2)-l. Since minSJl(i)>1l+n/2\min S_{J_{l}}^{(i^{\dagger})}>1-l+n/2 (by l,3\mathcal{H}_{l,3}), jJl+1j^{\dagger}\geq J_{l}+1, and jj^{\dagger} is minimal, we know that kk^{\dagger} was recommended to ii^{\dagger} at phase j1{Jl,,Jl+11}j^{\dagger}-1\in\{J_{l},\ldots,J_{l+1}-1\}. By the malicious strategy, this implies that the recommending agent (say, ii^{\ddagger}) was honest. Therefore, kk^{\dagger} was active for ii^{\ddagger} at phase j1j^{\dagger}-1, so since jj^{\dagger} is minimal, il+n/2i^{\ddagger}\leq l+n/2. Hence, by the assumed graph structure, i=l+1+n/2i^{\dagger}=l+1+n/2 contacted i=l+n/2i^{\ddagger}=l+n/2 at phase j1j^{\dagger}-1, who recommended kk^{\dagger}. If j1{Jl,Jl+2,,Jl+11}j^{\dagger}-1\in\{J_{l},J_{l}+2,\ldots,J_{l+1}-1\}, this contact cannot occur, since l+1+n/2l+1+n/2 instead contacts n+1n+1 at JlJ_{l} (by 𝒢1\mathcal{G}_{1}) and does not contact l+n/2l+n/2 at Jl+2,,Jl+1J_{l}+2,\ldots,J_{l+1} (by (37)). Hence, we must have j1=Jl+1j^{\dagger}-1=J_{l}+1, so minSJl+1(l+n/2)k(n/2)l\min S_{J_{l}+1}^{(l+n/2)}\leq k^{\dagger}\leq(n/2)-l, contradicting (36).

Finally, we prove l+1,1\mathcal{H}_{l+1,1}. Suppose instead that l+1+n/2PJl+1(l+2+n/2)l+1+n/2\in P_{J_{l+1}}^{(l+2+n/2)}, i.e., l+1+n/2l+1+n/2 is blocked at Jl+1J_{l+1}. Then since P0(l+2+n/2)=P_{0}^{(l+2+n/2)}=\emptyset, we must have l+1+n/2Pj(l+2+n/2)Pj1(l+2+n/2)l+1+n/2\in P_{j}^{(l+2+n/2)}\setminus P_{j-1}^{(l+2+n/2)} for some j[Jl+1]j\in[J_{l+1}]. Let jj^{\dagger} be the maximal such jj. Then jJl+1=Jl+2j^{\dagger}\geq\sqrt{J_{l+1}}=J_{l}+2; otherwise, if j<Jl+1j^{\dagger}<\sqrt{J_{l+1}}, l+1+n/2l+1+n/2 would have been un-blocked by phase Jl+1J_{l+1}. Therefore, the blocking rule implies

(38) Bj(l+2+n/2)Rj1(l+2+n/2)=Bj1(l+1+n/2).B_{j^{\dagger}}^{(l+2+n/2)}\neq R_{j^{\dagger}-1}^{(l+2+n/2)}=B_{j^{\dagger}-1}^{(l+1+n/2)}.

By j{Jl+2,,Jl+1}j^{\dagger}\in\{J_{l}+2,\ldots,J_{l+1}\}, l+1,3\mathcal{H}_{l+1,3}, and (35), we also know

l+n/2<minSj(l+2+n/2),minSj1(l+1+n/2)1l+n/2,-l+n/2<\min S_{j^{\dagger}}^{(l+2+n/2)},\min S_{j^{\dagger}-1}^{(l+1+n/2)}\leq 1-l+n/2,

so minSj(l+2+n/2)=minSj1(l+1+n/2)=1l+n/2\min S_{j^{\dagger}}^{(l+2+n/2)}=\min S_{j^{\dagger}-1}^{(l+1+n/2)}=1-l+n/2. Combined with (38), we must have Bj+h2(l+h+n/2)>minSj+h2(l+h+n/2)=1l+n/2B_{j^{\dagger}+h-2}^{(l+h+n/2)}>\min S_{j^{\dagger}+h-2}^{(l+h+n/2)}=1-l+n/2 for some h{1,2}h\in\{1,2\}, which contradicts Lemma 1 (since jJl+2j^{\dagger}\geq J_{l}+2). ∎

Finally, we can prove the theorem. Define σ=min{j:1Sj(n)}\sigma=\min\{j\in\mathbb{N}:1\in S_{j}^{(n)}\}. Then by definition, It(n)1I_{t}^{(n)}\neq 1 for any tAσ1t\leq A_{\sigma-1}. Hence, because Δ2=1/15\Delta_{2}=1/15 in the problem instance of the theorem, we obtain

\frac{A_{\sigma-1}\wedge T}{15}=\sum_{t=1}^{A_{\sigma-1}\wedge T}\frac{\mathbbm{1}(I_{t}^{(n)}\neq 1)}{15}=\sum_{t=1}^{A_{\sigma-1}\wedge T}\sum_{k=2}^{K}\frac{\mathbbm{1}(I_{t}^{(n)}=k)}{15}\leq\sum_{t=1}^{T}\sum_{k=2}^{K}\Delta_{k}\mathbbm{1}(I_{t}^{(n)}=k).

Thus, by Claim 23 from Appendix F.1 and since Aσ1=(σ1)2A_{\sigma-1}=(\sigma-1)^{2} by the choice β=2\beta=2, we can write

(39) R_{T}^{(n)}=\mathbb{E}\left[\sum_{t=1}^{T}\sum_{k=2}^{K}\Delta_{k}\mathbbm{1}(I_{t}^{(n)}=k)\right]\geq\frac{\mathbb{E}[A_{\sigma-1}\wedge T]}{15}=\frac{\mathbb{E}[(\sigma-1)^{2}\wedge T]}{15}.

Let l[n/2]l\in[n/2] be chosen later. Then σ>Jl\sigma>J_{l} implies σ1Jl\sigma-1\geq J_{l} (since σ,Jl\sigma,J_{l}\in\mathbb{N}). Thus, we can write

𝔼[(σ1)2T]𝔼[((σ1)2T)𝟙(σ>Jl)](Jl2T)(σ>Jl).\mathbb{E}[(\sigma-1)^{2}\wedge T]\geq\mathbb{E}[((\sigma-1)^{2}\wedge T)\mathbbm{1}(\sigma>J_{l})]\geq(J_{l}^{2}\wedge T)\mathbb{P}(\sigma>J_{l}).

By definition of σ\sigma and l\mathcal{E}_{l}, along with Lemmas 2 and 3, we also know

(σ>Jl)(l)34(l1)(1)34(l1)32(J11)=932lJ1.\mathbb{P}(\sigma>J_{l})\geq\mathbb{P}(\mathcal{E}_{l})\geq 3^{-4(l-1)}\mathbb{P}(\mathcal{E}_{1})\geq 3^{-4(l-1)}\cdot 3^{-2(J_{1}-1)}=9^{3-2l-J_{1}}.

By (7) from Section 4.1, we know Jl2J12l=(28)2l=22l+3J_{l}^{2}\geq J_{1}^{2^{l}}=(2^{8})^{2^{l}}=2^{2^{l+3}}. Combined with the previous three bounds, and letting CC denote the constant C=93J1/15C=9^{3-J_{1}}/15, we thus obtain

(40) RT(n)(22l+3T)932lJ1/15=C81l(22l+3T)l[n/2].R_{T}^{(n)}\geq(2^{2^{l+3}}\wedge T)\cdot 9^{3-2l-J_{1}}/15=C\cdot 81^{-l}\cdot(2^{2^{l+3}}\wedge T)\ \forall\ l\in[n/2].
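As a numeric aside (not part of the proof), the doubly exponential growth of the phase indices is easy to check directly: taking J_{1}=2^{8} and the recurrence J_{l+1}=(J_{l}+2)^{2} implicit in the relation \sqrt{J_{l+1}}=J_{l}+2 used in the proof of \mathcal{H}_{l+1,1}, the bound J_{l}^{2}\geq 2^{2^{l+3}} from (7) holds for every l.

```python
# Sanity check (illustrative): J_1 = 2^8 with J_{l+1} = (J_l + 2)^2, as implied
# by the relation sqrt(J_{l+1}) = J_l + 2 used earlier in this proof.
# Verify the doubly exponential lower bound J_l^2 >= 2^(2^(l+3)) from (7).
J = 2 ** 8
for l in range(1, 8):
    assert J ** 2 >= 2 ** (2 ** (l + 3)), f"fails at l={l}"
    J = (J + 2) ** 2  # next phase index in the recurrence
```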

We now consider three different cases, each with a different choice of ll.

  • If T>22(n/2)+3T>2^{2^{(n/2)+3}}, choose l=n/2l=n/2. Then (40) becomes RT(n)C81n/222(n/2)+3R_{T}^{(n)}\geq C\cdot 81^{-n/2}\cdot 2^{2^{(n/2)+3}}. Observe that

    81n/222(n/2)+3\displaystyle 81^{-n/2}\cdot 2^{2^{(n/2)+3}} 16n22(n/2)+3=(24)22n/2n(24)2n/2>exp(2n/2)\displaystyle\geq 16^{-n}\cdot 2^{2^{(n/2)+3}}=(2^{4})^{2\cdot 2^{n/2}-n}\geq(2^{4})^{2^{n/2}}>\exp(2^{n/2})
    =exp(exp(nlog(2)/2))>exp(exp(n/3)),\displaystyle=\exp(\exp(n\log(2)/2))>\exp(\exp(n/3)),

    where the second inequality is n2n/2n\leq 2^{n/2} for n{2,4,8,}n\in\{2,4,8,\ldots\}. On the other hand, Claim 1 below shows RT(n)log(T)/C1R_{T}^{(n)}\geq\log(T)/C_{\ref{clmLogRegNaive}} for some absolute constant C1>0C_{\ref{clmLogRegNaive}}>0. Thus, we have shown

    RT(n)=(RT(n)/2)+(RT(n)/2)(C/2)exp(exp(n/3))+log(T)/(2C1).R_{T}^{(n)}=(R_{T}^{(n)}/2)+(R_{T}^{(n)}/2)\geq(C/2)\exp(\exp(n/3))+\log(T)/(2C_{\ref{clmLogRegNaive}}).
  • If T(28,22(n/2)+3]T\in(2^{8},2^{2^{(n/2)+3}}], let l=log2(log2(T))3l=\lceil\log_{2}(\log_{2}(T))-3\rceil. Then 22l+3T2^{2^{l+3}}\geq T, so 22l+3T=T2^{2^{l+3}}\wedge T=T. Furthermore, we know llog2(log2(T))2l\leq\log_{2}(\log_{2}(T))-2, which implies

    81l81281log2(log2(T))81227log2(log2(T))=812/log27(T)=812log7(2)/log7(T).\quad 81^{-l}\geq 81^{2}\cdot 81^{-\log_{2}(\log_{2}(T))}\geq 81^{2}\cdot 2^{-7\log_{2}(\log_{2}(T))}=81^{2}/\log_{2}^{7}(T)=81^{2}\log^{7}(2)/\log^{7}(T).

    Next, observe that 0=log2(log2(28))3<log2(log2(T))3n/20=\log_{2}(\log_{2}(2^{8}))-3<\log_{2}(\log_{2}(T))-3\leq n/2 for this case of TT, so l[n/2]l\in[n/2]. Thus, we can choose this ll in (40) and combine with the above bounds to lower bound regret as RT(n)812log7(2)CT/log7(T)R_{T}^{(n)}\geq 81^{2}\log^{7}(2)CT/\log^{7}(T).

  • If T28T\leq 2^{8}, choose l=1l=1. Then 22l+3=216T2^{2^{l+3}}=2^{16}\geq T, so (40) implies RT(n)CT/81R_{T}^{(n)}\geq CT/81.

Hence, in all three cases, we have shown RT(n)Cmin{log(T)+exp(exp(n/3)),T/log7(T)}R_{T}^{(n)}\geq C^{\prime}\min\{\log(T)+\exp(\exp(n/3)),T/\log^{7}(T)\} for some absolute constant C>0C^{\prime}>0. This establishes the theorem.
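The elementary inequalities behind the three cases can also be spot-checked numerically. The snippet below (illustrative only; the sample values of n and T are arbitrary) verifies the case-1 chain by comparing logarithms, and the case-2 choice of l.

```python
import math

# Case T > 2^(2^((n/2)+3)), l = n/2: check 81^(-n/2) * 2^(2^((n/2)+3)) > exp(exp(n/3))
# by comparing logarithms, for a range of even n.
for n in range(2, 62, 2):
    lhs_log = (2 ** ((n // 2) + 3)) * math.log(2) - (n / 2) * math.log(81)
    assert lhs_log > math.exp(n / 3), n

# Case 2^8 < T <= 2^(2^((n/2)+3)), l = ceil(log2(log2 T) - 3): check that
# 2^(2^(l+3)) >= T and 81^(-l) >= 81^2 * log(2)^7 / log(T)^7.
for T in (300, 10 ** 3, 10 ** 5, 10 ** 8):
    l = math.ceil(math.log2(math.log2(T)) - 3)
    assert 2 ** (2 ** (l + 3)) >= T
    assert 81.0 ** (-l) >= 81 ** 2 * math.log(2) ** 7 / math.log(T) ** 7
```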

We now state and prove the aforementioned Claim 1. We note the analysis is rather coarse; our only goal here is to establish the \log T scaling (not to optimize constants).

Claim 1.

Under the assumptions of Theorem 1, we have RT(n)log(T)/C1R_{T}^{(n)}\geq\log(T)/C_{\ref{clmLogRegNaive}}, where C1=15log99C_{\ref{clmLogRegNaive}}=15\log 99.

Proof.

If T=1T=1, the bound holds by nonnegativity. If T{2,,99}T\in\{2,\ldots,99\}, then since minS1(n)>n/2\min S_{1}^{(n)}>n/2 and Δ21/15\Delta_{2}\geq 1/15 by assumption in Theorem 1, we know ΔI1(n)1/15\Delta_{I_{1}^{(n)}}\geq 1/15, which implies RT(n)1/15log(T)/C1R_{T}^{(n)}\geq 1/15\geq\log(T)/C_{\ref{clmLogRegNaive}}. Thus, only the case T100T\geq 100 remains. By Claim 23 from Appendix F.1 and Δ21/15\Delta_{2}\geq 1/15,

RT(n)𝔼[t=1T𝟙(It(n)1)]15=log(99)𝔼[t=1T𝟙(It(n)1)]C1>2𝔼[t=1T𝟙(It(n)1)]C1.R_{T}^{(n)}\geq\frac{\mathbb{E}[\sum_{t=1}^{T}\mathbbm{1}(I_{t}^{(n)}\neq 1)]}{15}=\frac{\log(99)\mathbb{E}[\sum_{t=1}^{T}\mathbbm{1}(I_{t}^{(n)}\neq 1)]}{C_{\ref{clmLogRegNaive}}}>\frac{2\mathbb{E}[\sum_{t=1}^{T}\mathbbm{1}(I_{t}^{(n)}\neq 1)]}{C_{\ref{clmLogRegNaive}}}.

Thus, it suffices to show that \sum_{t=1}^{T}\mathbbm{1}(I_{t}^{(n)}\neq 1)\geq\log(T)/2. Suppose instead that this inequality fails. Then since the left side is an integer, we have \sum_{t=1}^{T}\mathbbm{1}(I_{t}^{(n)}\neq 1)\leq\lfloor\log(T)/2\rfloor by assumption. Therefore, we can find t\in\{T-\lfloor\log(T)/2\rfloor+1,\ldots,T\} such that I_{t}^{(n)}=1 (otherwise, since \min S_{1}^{(n)}>n/2 implies I_{1}^{(n)}\neq 1, and 1<T-\lfloor\log(T)/2\rfloor+1 for T\geq 100, the left side would be at least \lfloor\log(T)/2\rfloor+1, violating the assumed inequality). By this choice of t and the assumed inequality, we can then write

T_{1}^{(n)}(t-1)=t-1-\sum_{s=1}^{t-1}\mathbbm{1}(I_{s}^{(n)}\neq 1)\geq\left(T-\lfloor\log(T)/2\rfloor\right)-\left(\lfloor\log(T)/2\rfloor\right)\geq T-\log T.

We can lower bound the right side by 4logT4\log T (else, applying Claim 20 from Appendix F.1 with x=Tx=T, y=1y=1, and z=5z=5 yields T<100T<100, a contradiction), which is further bounded by 4logt4\log t. Combined with the fact that rewards are deterministic, μ1=1\mu_{1}=1, and α=4\alpha=4 in Theorem 1, we obtain

(41) μ^1(n)(t1)+αlog(t)/T1(n)(t1)=1+4log(t)/T1(n)(t1)2.\hat{\mu}_{1}^{(n)}(t-1)+\sqrt{\alpha\log(t)/T_{1}^{(n)}(t-1)}=1+\sqrt{4\log(t)/T_{1}^{(n)}(t-1)}\leq 2.

Next, let kSA1(t)(n)k\in S_{A^{-1}(t)}^{(n)} be any other arm which is active for nn at time tt. Then clearly

(42) \displaystyle T_{k}^{(n)}(t-1)\leq\sum_{s=1}^{T}\mathbbm{1}(I_{s}^{(n)}=k)\leq\sum_{s=1}^{T}\mathbbm{1}(I_{s}^{(n)}\neq 1)\leq\lfloor\log(T)/2\rfloor\leq\log(T)/2
(43) μ^k(n)(t1)+αlog(t)/Tk(n)(t1)8log(t)/log(T)=2log(t)/log(T).\displaystyle\Rightarrow\hat{\mu}_{k}^{(n)}(t-1)+\sqrt{\alpha\log(t)/T_{k}^{(n)}(t-1)}\geq\sqrt{8\log(t)/\log(T)}=2\sqrt{\log(t)/\log(\sqrt{T})}.

By (41), (43), the fact that It(n)=1I_{t}^{(n)}=1, and the UCB policy, we conclude tTt\leq\sqrt{T}. Since T4T\geq 4, this further implies tT/2t\leq T/2. But we also know that tTlog(T)/2+1>Tlog(T)/2t\geq T-\lfloor\log(T)/2\rfloor+1>T-\log(T)/2. Combining these inequalities gives T<logTT<\log T, a contradiction. ∎
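The numeric facts this proof leans on are easy to verify directly; the following is an illustrative aside (the ranges of T are arbitrary sample values).

```python
import math

# Constants used in the Claim 1 proof:
assert math.log(99) > 2                        # so log(99)/C_1 >= 2/C_1
for T in range(100, 10001):
    assert T - math.log(T) >= 4 * math.log(T)  # i.e. T >= 5 log T once T >= 100
for T in range(4, 10001):
    assert math.sqrt(T) <= T / 2               # so t <= sqrt(T) implies t <= T/2
```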

Appendix D Proofs from Section 5

D.1. Proof of Theorem 2

Fix an honest agent i[n]i\in[n]. Let τ(i)=τsprτblk(i)\tau^{(i)}=\tau_{\text{spr}}\vee\tau_{\text{blk}}^{(i)}, where we recall from the proof sketch that

(44) τspr=inf{j:Bj(i)=1i[n],jj},\displaystyle\tau_{\text{spr}}=\inf\{j\in\mathbb{N}:B_{j^{\prime}}^{(i^{\prime})}=1\ \forall\ i^{\prime}\in[n],j^{\prime}\geq j\},
(45) τblk(i)=inf{j:Hj1(i)Pj(i)Pj1(i)jjs.t.Rj1(i)1}.\displaystyle\tau_{\text{blk}}^{(i)}=\inf\{j\in\mathbb{N}:H_{j^{\prime}-1}^{(i)}\in P_{j^{\prime}}^{(i)}\setminus P_{j^{\prime}-1}^{(i)}\ \forall\ j^{\prime}\geq j\ s.t.\ R_{j^{\prime}-1}^{(i)}\neq 1\}.

Let γi(0,1)\gamma_{i}\in(0,1) be chosen later. Denote by S¯(i)={2,,K}S^(i)\overline{S}^{(i)}=\{2,\ldots,K\}\cap\hat{S}^{(i)} and S¯(i)={2,,K}S^(i)\underline{S}^{(i)}=\{2,\ldots,K\}\setminus\hat{S}^{(i)} the suboptimal sticky and non-sticky arms, respectively, for agent ii. Then by Claim 23 from Appendix F.1, we can decompose regret as RT(i)=h=14RT,h(i)R_{T}^{(i)}=\sum_{h=1}^{4}R_{T,h}^{(i)}, where

(46) RT,1(i)=𝔼[t=1AτsprTΔIt(i)],RT,2(i)=kS¯(i)Δk𝔼[t=1+AτsprT𝟙(It(i)=k)],\displaystyle R_{T,1}^{(i)}=\mathbb{E}\left[\sum_{t=1}^{A_{\tau_{\text{spr}}}\wedge T}\Delta_{I_{t}^{(i)}}\right],\quad R_{T,2}^{(i)}=\sum_{k\in\overline{S}^{(i)}}\Delta_{k}\mathbb{E}\left[\sum_{t=1+A_{\tau_{\text{spr}}}}^{T}\mathbbm{1}(I_{t}^{(i)}=k)\right],
(47) RT,3(i)=kS¯(i)Δk𝔼[t=1+AτsprAτ(i)Tγi/βT𝟙(It(i)=k)],RT,4(i)=kS¯(i)Δk𝔼[t=1+Aτ(i)Tγi/βT𝟙(It(i)=k)],\displaystyle R_{T,3}^{(i)}=\sum_{k\in\underline{S}^{(i)}}\Delta_{k}\mathbb{E}\left[\sum_{t=1+A_{\tau_{\text{spr}}}}^{A_{\tau^{(i)}\vee\lceil T^{\gamma_{i}/\beta}\rceil}\wedge T}\mathbbm{1}(I_{t}^{(i)}=k)\right],\quad R_{T,4}^{(i)}=\sum_{k\in\underline{S}^{(i)}}\Delta_{k}\mathbb{E}\left[\sum_{t=1+A_{\tau^{(i)}\vee\lceil T^{\gamma_{i}/\beta}\rceil}}^{T}\mathbbm{1}(I_{t}^{(i)}=k)\right],

and where t=s1s2𝟙(It(i)=k)=0\sum_{t=s_{1}}^{s_{2}}\mathbbm{1}(I_{t}^{(i)}=k)=0 whenever s1>s2s_{1}>s_{2} by convention. Thus, we have rewritten regret as the sum of four terms: RT,1(i)R_{T,1}^{(i)}, which accounts for regret before the best arm spreads; RT,2(i)R_{T,2}^{(i)}, the regret due to sticky arms after the best arm spreads; RT,3(i)R_{T,3}^{(i)}, regret from non-sticky arms after the best arm spreads but before phase τ(i)Tγi/β\tau^{(i)}\vee\lceil T^{\gamma_{i}/\beta}\rceil; and RT,4(i)R_{T,4}^{(i)}, regret from non-sticky arms after this phase. The subsequent lemmas bound these terms in turn.
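Before bounding the four terms, the bookkeeping in this decomposition can be sanity-checked numerically: since \tau^{(i)}\geq\tau_{\text{spr}}, we have A_{\tau^{(i)}\vee\lceil T^{\gamma_{i}/\beta}\rceil}\geq A_{\tau_{\text{spr}}}, and for a given suboptimal arm (sticky or not) each round t\leq T is counted by exactly one of the four windows. A small exhaustive check (illustrative; the sample ranges are arbitrary):

```python
# Check that the four time windows partition the rounds for each arm type,
# for arbitrary sample values A_spr <= A_tau of A_{tau_spr} and
# A_{tau^(i) v ceil(T^(gamma_i/beta))}.
T = 30
for A_spr in range(0, 40):
    for A_tau in range(A_spr, 80):
        for t in range(1, T + 1):
            for sticky in (True, False):
                r1 = t <= min(A_spr, T)                                # R_{T,1}
                r2 = sticky and A_spr + 1 <= t <= T                    # R_{T,2}
                r3 = (not sticky) and A_spr + 1 <= t <= min(A_tau, T)  # R_{T,3}
                r4 = (not sticky) and A_tau + 1 <= t <= T              # R_{T,4}
                assert r1 + r2 + r3 + r4 == 1
```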

Lemma 4.

Under the assumptions of Theorem 2, for any i[n]i\in[n] and TT\in\mathbb{N}, we have

RT,1(i)𝔼[Aτspr]=O(Sβ/(ρ12(β1))(Slog(S/Δ2)/Δ22)β/(β1)(d¯log(n+m))β/ρ1nK2S)+𝔼[A2τ¯spr].R_{T,1}^{(i)}\leq\mathbb{E}[A_{\tau_{\text{spr}}}]=O(S^{\beta/(\rho_{1}^{2}(\beta-1))}\vee(S\log(S/\Delta_{2})/\Delta_{2}^{2})^{\beta/(\beta-1)}\vee(\bar{d}\log(n+m))^{\beta/\rho_{1}}\vee nK^{2}S)+\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}].
Proof.

Assumption 2 ensures Δk1\Delta_{k}\leq 1, so RT,1(i)𝔼[Aτspr]R_{T,1}^{(i)}\leq\mathbb{E}[A_{\tau_{\text{spr}}}]. The result follows from the bound on 𝔼[Aτspr]\mathbb{E}[A_{\tau_{\text{spr}}}] discussed in Section 7.5 and formally stated as Corollary 4 in Appendix E.5. ∎

Lemma 5.

Under the assumptions of Theorem 2, for any i[n]i\in[n] and TT\in\mathbb{N}, we have

(48) RT,2(i)kS¯(i)4αlogTΔk+4(α1)|S¯(i)|2α3.R_{T,2}^{(i)}\leq\sum_{k\in\overline{S}^{(i)}}\frac{4\alpha\log T}{\Delta_{k}}+\frac{4(\alpha-1)|\overline{S}^{(i)}|}{2\alpha-3}.
Proof.

For any kS¯(i)k\in\overline{S}^{(i)}, Claim 22 and Corollary 6 from Appendix F.2 imply

(49) 𝔼[t=1+AτsprT𝟙(It(i)=k)]\displaystyle\mathbb{E}\left[\sum_{t=1+A_{\tau_{\text{spr}}}}^{T}\mathbbm{1}\left(I_{t}^{(i)}=k\right)\right] =𝔼[t=1+AτsprT𝟙(It(i)=k,Tk(i)(t1)<4αlogtΔk2)]\displaystyle=\mathbb{E}\left[\sum_{t=1+A_{\tau_{\text{spr}}}}^{T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)<\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\right]
(50) +𝔼[t=1+AτsprT𝟙(It(i)=k,Tk(i)(t1)4αlogtΔk2)]\displaystyle\quad+\mathbb{E}\left[\sum_{t=1+A_{\tau_{\text{spr}}}}^{T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)\geq\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\right]
(51) 4αlogTΔk2+4(α1)2α3,\displaystyle\leq\frac{4\alpha\log T}{\Delta_{k}^{2}}+\frac{4(\alpha-1)}{2\alpha-3},

so multiplying by Δk\Delta_{k}, using Δk1\Delta_{k}\leq 1, and summing over kS¯(i)k\in\overline{S}^{(i)} completes the proof. ∎

Lemma 6.

Under the assumptions of Theorem 2, for any i[n]i\in[n], γi(0,1)\gamma_{i}\in(0,1), and TT\in\mathbb{N}, we have

RT,3(i)kS¯(i)4αlogATγi/βΔk+4(α1)|S¯(i)|2α3+4C4αKΔ2γilog(C4KΔ2γi)+1+𝔼[Aτspr].R_{T,3}^{(i)}\leq\sum_{k\in\underline{S}^{(i)}}\frac{4\alpha\log A_{\lceil T^{\gamma_{i}/\beta}\rceil}}{\Delta_{k}}+\frac{4(\alpha-1)|\underline{S}^{(i)}|}{2\alpha-3}+\frac{4C_{\ref{clmIntNonStickSmallT}}\alpha K}{\Delta_{2}\gamma_{i}}\log\left(\frac{C_{\ref{clmIntNonStickSmallT}}K}{\Delta_{2}\gamma_{i}}\right)+1+\mathbb{E}[A_{\tau_{\text{spr}}}].
Proof.

The proof is nontrivial; we defer it to the end of this sub-appendix. ∎

Lemma 7.

Under the assumptions of Theorem 2, for any i[n]i\in[n], γi(0,1)\gamma_{i}\in(0,1), and TT\in\mathbb{N}, we have

RT,4(i)2η1η1maxS~S¯(i):|S~|dmal(i)+2kS~4αlogTΔk+8αβlogη(1/γi)(dmal(i)+2)Δ2+4(α1)|S¯(i)|2α3.R_{T,4}^{(i)}\leq\frac{2\eta-1}{\eta-1}\max_{\tilde{S}\subset\underline{S}^{(i)}:|\tilde{S}|\leq d_{\text{mal}}(i)+2}\sum_{k\in\tilde{S}}\frac{4\alpha\log T}{\Delta_{k}}+\frac{8\alpha\beta\log_{\eta}(1/\gamma_{i})(d_{\text{mal}}(i)+2)}{\Delta_{2}}+\frac{4(\alpha-1)|\underline{S}^{(i)}|}{2\alpha-3}.
Proof sketch.

The proof follows the same logic as that of (Vial et al., 2021, Lemma 4), so we omit it. The only differences are (1) we replace mm (the number of malicious agents connected to ii for the complete graph) with dmal(i)d_{\text{mal}}(i), and (2) we use Claim 19 from Appendix F.1 to bound the summation in (Vial et al., 2021, Lemma 4). We refer the reader to the Theorem 2 proof sketch for intuition. ∎

Additionally, we note the sum RT,3(i)+RT,4(i)R_{T,3}^{(i)}+R_{T,4}^{(i)} can be naively bounded as follows.

Lemma 8.

Under the assumptions of Theorem 2, for any i[n]i\in[n] and TT\in\mathbb{N}, we have

RT,3(i)+RT,4(i)kS¯(i)4αlogTΔk+4(α1)|S¯(i)|2α3.R_{T,3}^{(i)}+R_{T,4}^{(i)}\leq\sum_{k\in\underline{S}^{(i)}}\frac{4\alpha\log T}{\Delta_{k}}+\frac{4(\alpha-1)|\underline{S}^{(i)}|}{2\alpha-3}.
Proof.

The proof follows the exact same logic as that of Lemma 5 so is omitted. ∎

We can now prove the theorem. First, we use the regret decomposition RT(i)=h=14RT,h(i)R_{T}^{(i)}=\sum_{h=1}^{4}R_{T,h}^{(i)}, Lemmas 4-7, and the fact that |S¯(i)|+|S¯(i)|K|\overline{S}^{(i)}|+|\underline{S}^{(i)}|\leq K to write

(52) RT(i)\displaystyle R_{T}^{(i)} 2η1η1maxS~S¯(i):|S~|dmal(i)+2kS~4αlogTΔk+kS¯(i)4αlogTΔk+kS¯(i)4αlogATγi/βΔk\displaystyle\leq\frac{2\eta-1}{\eta-1}\max_{\tilde{S}\subset\underline{S}^{(i)}:|\tilde{S}|\leq d_{\text{mal}}(i)+2}\sum_{k\in\tilde{S}}\frac{4\alpha\log T}{\Delta_{k}}+\sum_{k\in\overline{S}^{(i)}}\frac{4\alpha\log T}{\Delta_{k}}+\sum_{k\in\underline{S}^{(i)}}\frac{4\alpha\log A_{\lceil T^{\gamma_{i}/\beta}\rceil}}{\Delta_{k}}
(53) +8αβlogη(1/γi)(dmal(i)+2)Δ2+8(α1)K2α3+4C4αKΔ2γilog(C4KΔ2γi)+1+2𝔼[Aτspr].\displaystyle\quad+\frac{8\alpha\beta\log_{\eta}(1/\gamma_{i})(d_{\text{mal}}(i)+2)}{\Delta_{2}}+\frac{8(\alpha-1)K}{2\alpha-3}+\frac{4C_{\ref{clmIntNonStickSmallT}}\alpha K}{\Delta_{2}\gamma_{i}}\log\left(\frac{C_{\ref{clmIntNonStickSmallT}}K}{\Delta_{2}\gamma_{i}}\right)+1+2\mathbb{E}[A_{\tau_{\text{spr}}}].

Now choose γi=Δ2/(KΔS+dmal(i)+4)(0,1)\gamma_{i}=\Delta_{2}/(K\Delta_{S+d_{\text{mal}}(i)+4})\in(0,1). Then

(54) (53)\displaystyle\eqref{eqProposedProofNoT} =O~((dmal(i)/Δ2)(K/Δ2)2)+2𝔼[Aτspr]\displaystyle=\tilde{O}\left((d_{\text{mal}}(i)/\Delta_{2})\vee(K/\Delta_{2})^{2}\right)+2\mathbb{E}[A_{\tau_{\text{spr}}}]
(55) =O~((dmal(i)/Δ2)(K/Δ2)2Sβ/(ρ12(β1))(S/Δ22)β/(β1)d¯β/ρ1nK2S)+4𝔼[A2τ¯spr],\displaystyle=\tilde{O}\left((d_{\text{mal}}(i)/\Delta_{2})\vee(K/\Delta_{2})^{2}\vee S^{\beta/(\rho_{1}^{2}(\beta-1))}\vee(S/\Delta_{2}^{2})^{\beta/(\beta-1)}\vee\bar{d}^{\beta/\rho_{1}}\vee nK^{2}S\right)+4\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}],

where the second inequality is due to Lemma 4. Furthermore, by Claim 18 from Appendix F.1, we know that \log(A_{\lceil T^{\gamma_{i}/\beta}\rceil})\leq\log(e^{2\beta}(T^{\gamma_{i}/\beta})^{\beta})=2\beta+\gamma_{i}\log T. Combined with |\underline{S}^{(i)}|\leq K, \Delta_{k}\geq\Delta_{2}\ \forall\ k\in\underline{S}^{(i)}, and the choice of \gamma_{i}, we can thus write

\sum_{k\in\underline{S}^{(i)}}\frac{4\alpha\log A_{\lceil T^{\gamma_{i}/\beta}\rceil}}{\Delta_{k}}\leq\sum_{k\in\underline{S}^{(i)}}\frac{4\alpha(2\beta+\gamma_{i}\log T)}{\Delta_{k}}\leq\frac{8\alpha\beta K+4\alpha\gamma_{i}K\log T}{\Delta_{2}}=\frac{8\alpha\beta K}{\Delta_{2}}+\frac{4\alpha\log T}{\Delta_{S+d_{\text{mal}}(i)+4}}.
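As a quick numeric aside, the bound imported from Claim 18, \log A_{\lceil x\rceil}\leq 2\beta+\beta\log x (applied above with x=T^{\gamma_{i}/\beta}), can be spot-checked against the phase schedule A_{j}=\lceil j^{\beta}\rceil, taking \beta=2 as a sample value:

```python
import math

# Spot-check (illustrative): log(ceil(ceil(x)^beta)) <= 2*beta + beta*log(x)
# for x >= 1, with A_j = ceil(j^beta) and beta = 2 as a sample value.
beta = 2
for i in range(1, 1000):
    x = 1 + i / 10.0
    A = math.ceil(math.ceil(x) ** beta)
    assert math.log(A) <= 2 * beta + beta * math.log(x), x
```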

Therefore, we can bound (52) as follows:

(56) \eqref{eqProposedProofT}\leq\frac{2\eta-1}{\eta-1}\max_{\tilde{S}\subset\underline{S}^{(i)}:|\tilde{S}|\leq d_{\text{mal}}(i)+2}\sum_{k\in\tilde{S}}\frac{4\alpha\log T}{\Delta_{k}}+\sum_{k\in\overline{S}^{(i)}}\frac{4\alpha\log T}{\Delta_{k}}+\frac{4\alpha\log T}{\Delta_{S+d_{\text{mal}}(i)+4}}+\frac{8\alpha\beta K}{\Delta_{2}}
(57) 2η1η1k=2dmal(i)+34αlogTΔk+k=dmal(i)+4S+dmal(i)+44αlogTΔk+8αβKΔ2,\displaystyle\leq\frac{2\eta-1}{\eta-1}\sum_{k=2}^{d_{\text{mal}}(i)+3}\frac{4\alpha\log T}{\Delta_{k}}+\sum_{k=d_{\text{mal}}(i)+4}^{S+d_{\text{mal}}(i)+4}\frac{4\alpha\log T}{\Delta_{k}}+\frac{8\alpha\beta K}{\Delta_{2}},

where the second inequality holds by Δ2ΔK\Delta_{2}\leq\cdots\leq\Delta_{K} and |S¯(i)|S|\overline{S}^{(i)}|\leq S. Combining the above yields

(58) RT(i)\displaystyle R_{T}^{(i)} 4αlog(T)(2η1η1k=2dmal(i)+31Δk+k=dmal(i)+4S+dmal(i)+41Δk)+2𝔼[A2τ¯spr]\displaystyle\leq 4\alpha\log(T)\left(\frac{2\eta-1}{\eta-1}\sum_{k=2}^{d_{\text{mal}}(i)+3}\frac{1}{\Delta_{k}}+\sum_{k=d_{\text{mal}}(i)+4}^{S+d_{\text{mal}}(i)+4}\frac{1}{\Delta_{k}}\right)+2\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]
(59) +O~((dmal(i)/Δ2)(K/Δ2)2Sβ/(ρ12(β1))(S/Δ22)β/(β1)d¯β/ρ1nK2S).\displaystyle\quad+\tilde{O}\left((d_{\text{mal}}(i)/\Delta_{2})\vee(K/\Delta_{2})^{2}\vee S^{\beta/(\rho_{1}^{2}(\beta-1))}\vee(S/\Delta_{2}^{2})^{\beta/(\beta-1)}\vee\bar{d}^{\beta/\rho_{1}}\vee nK^{2}S\right).

Alternatively, we can simply use Lemmas 4, 5, and 8 to write

(60) RT(i)4αlog(T)k=2K1Δk+4(α1)K2α3+𝔼[Aτspr].R_{T}^{(i)}\leq 4\alpha\log(T)\sum_{k=2}^{K}\frac{1}{\Delta_{k}}+\frac{4(\alpha-1)K}{2\alpha-3}+\mathbb{E}[A_{\tau_{\text{spr}}}].

Therefore, combining the previous two expressions and again invoking Lemma 4 to bound the additive terms in (60) by those in (58), we obtain the desired bound.

Thus, it only remains to prove Lemma 6. We begin by using some standard bandit arguments recounted in Appendix F.2 to bound RT,3(i)R_{T,3}^{(i)} in terms of a particular tail of τ(i)\tau^{(i)}.

Claim 2.

Under the assumptions of Theorem 2, for any i[n]i\in[n], γi(0,1)\gamma_{i}\in(0,1), and TT\in\mathbb{N}, we have

(61) RT,3(i)\displaystyle R_{T,3}^{(i)} kS¯(i)4αlogATγi/βΔk+4(α1)|S¯(i)|2α3+𝔼[Aτspr]\displaystyle\leq\sum_{k\in\underline{S}^{(i)}}\frac{4\alpha\log A_{\lceil T^{\gamma_{i}/\beta}\rceil}}{\Delta_{k}}+\frac{4(\alpha-1)|\underline{S}^{(i)}|}{2\alpha-3}+\mathbb{E}[A_{\tau_{\text{spr}}}]
(62) +4αKlogTΔ2(τ(i)>Tγi/β,Aτspr<4αKlogTΔ2).\displaystyle\quad+\frac{4\alpha K\log T}{\Delta_{2}}\mathbb{P}\left(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil,A_{\tau_{\text{spr}}}<\frac{4\alpha K\log T}{\Delta_{2}}\right).
Proof.

If T=1T=1, we can naively bound RT,3(i)1R_{T,3}^{(i)}\leq 1, which completes the proof. Thus, we assume T>1T>1 (which will allow us to divide by logT\log T later). For any kS¯(i)k\in\underline{S}^{(i)}, we first write

(63) 𝔼[t=1+AτsprAτ(i)Tγi/βT𝟙(It(i)=k)]𝔼[t=1+AτsprT𝟙(It(i)=k,Tk(i)(t1)4αlogtΔk2)]\displaystyle\mathbb{E}\left[\sum_{t=1+A_{\tau_{\text{spr}}}}^{A_{\tau^{(i)}\vee\lceil T^{\gamma_{i}/\beta}\rceil}\wedge T}\mathbbm{1}\left(I_{t}^{(i)}=k\right)\right]\leq\mathbb{E}\left[\sum_{t=1+A_{\tau_{\text{spr}}}}^{T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)\geq\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\right]
(64) +𝔼[𝟙(τ(i)Tγi/β)t=1+AτsprATγi/βT𝟙(It(i)=k,Tk(i)(t1)<4αlogtΔk2)]\displaystyle\quad+\mathbb{E}\left[\mathbbm{1}(\tau^{(i)}\leq\lceil T^{\gamma_{i}/\beta}\rceil)\sum_{t=1+A_{\tau_{\text{spr}}}}^{A_{\lceil T^{\gamma_{i}/\beta}\rceil}\wedge T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)<\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\right]
(65) +𝔼[𝟙(τ(i)>Tγi/β)t=1+AτsprAτ(i)T𝟙(It(i)=k,Tk(i)(t1)<4αlogtΔk2)].\displaystyle\quad+\mathbb{E}\left[\mathbbm{1}(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil)\sum_{t=1+A_{\tau_{\text{spr}}}}^{A_{\tau^{(i)}}\wedge T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)<\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\right].

By Corollary 6 from Appendix F.2, the first term on the right side of (63) is bounded by 4(\alpha-1)/(2\alpha-3). By Claim 22 from Appendix F.2 and \mathbbm{1}(\cdot)\leq 1, (64) is bounded by 4\alpha\log(A_{\lceil T^{\gamma_{i}/\beta}\rceil})/\Delta_{k}^{2}. For (65), Claim 22 similarly gives

(66) t=1+AτsprAτ(i)T𝟙(It(i)=k,Tk(i)(t1)<4αlogtΔk2)4αlogTΔk2,\sum_{t=1+A_{\tau_{\text{spr}}}}^{A_{\tau^{(i)}}\wedge T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)<\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\leq\frac{4\alpha\log T}{\Delta_{k}^{2}},

which clearly implies the following bound for (65):

(67) 𝔼[𝟙(τ(i)>Tγi/β)t=1+AτsprAτ(i)T𝟙(It(i)=k,Tk(i)(t1)<4αlogtΔk2)]4αlogTΔk2(τ(i)>Tγi/β).\mathbb{E}\left[\mathbbm{1}(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil)\sum_{t=1+A_{\tau_{\text{spr}}}}^{A_{\tau^{(i)}}\wedge T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)<\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\right]\leq\frac{4\alpha\log T}{\Delta_{k}^{2}}\mathbb{P}(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil).

For the remaining probability term, by Markov’s inequality, we have

(68) (τ(i)>Tγi/β)(τ(i)>Tγi/β,Aτspr<4αKlogTΔ2)+Δ2𝔼[Aτspr]4αKlogT.\mathbb{P}(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil)\leq\mathbb{P}\left(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil,A_{\tau_{\text{spr}}}<\frac{4\alpha K\log T}{\Delta_{2}}\right)+\frac{\Delta_{2}\mathbb{E}[A_{\tau_{\text{spr}}}]}{4\alpha K\log T}.

Hence, combining the previous two inequalities, and since Δ2Δk\Delta_{2}\leq\Delta_{k}, we can bound (65) by

(69) 4αlogTΔ2Δk(τ(i)>Tγi/β,Aτspr<4αKlogTΔ2)+𝔼[Aτspr]KΔk.\frac{4\alpha\log T}{\Delta_{2}\Delta_{k}}\mathbb{P}\left(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil,A_{\tau_{\text{spr}}}<\frac{4\alpha K\log T}{\Delta_{2}}\right)+\frac{\mathbb{E}[A_{\tau_{\text{spr}}}]}{K\Delta_{k}}.

Finally, combining these bounds, then multiplying by Δk\Delta_{k}, summing over kS¯(i)k\in\underline{S}^{(i)}, and using Δk1\Delta_{k}\leq 1 and |S¯(i)|K|\underline{S}^{(i)}|\leq K, we obtain the desired result. ∎

To bound (62), we consider two cases defined in terms of the following inequalities:

(70) 4αKlog(T)/Δ2<θTγi/ββ,4αKlog(T)/Δ2<κTγi/β,\displaystyle 4\alpha K\log(T)/\Delta_{2}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor^{\beta},\quad 4\alpha K\log(T)/\Delta_{2}<\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil},
(71) 4αlogAjΔ22<κj1jTγi/β,log(T)j=Tγi/β(κj1)32α<Δ2(2α3)8αK2.\displaystyle\frac{4\alpha\log A_{j}}{\Delta_{2}^{2}}<\lceil\kappa_{j}\rceil-1\ \forall\ j\geq\lceil T^{\gamma_{i}/\beta}\rceil,\quad\log(T)\sum_{j=\lceil T^{\gamma_{i}/\beta}\rceil}^{\infty}(\lceil\kappa_{j}\rceil-1)^{3-2\alpha}<\frac{\Delta_{2}(2\alpha-3)}{8\alpha K^{2}}.

Roughly speaking, when all of these inequalities hold, then TT is large enough to ensure that the event {τ(i)=Ω(poly(T)),Aτspr=O(logT)}\{\tau^{(i)}=\Omega(\text{poly}(T)),A_{\tau_{\text{spr}}}=O(\log T)\} in (62) is unlikely. The next claim makes this precise.

Claim 3.

Under the assumptions of Theorem 2, for any i[n]i\in[n], γi(0,1)\gamma_{i}\in(0,1), and TT\in\mathbb{N} such that all of the inequalities in (70)-(71) hold, we have

(72) 4αKlogTΔ2(τ(i)>Tγi/β,Aτspr<4αKlogTΔ2)1.\frac{4\alpha K\log T}{\Delta_{2}}\mathbb{P}\left(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil,A_{\tau_{\text{spr}}}<\frac{4\alpha K\log T}{\Delta_{2}}\right)\leq 1.
Proof.

If T=1T=1, the left side is zero and the bound is immediate, so we assume T>1T>1. First note that if Aτspr<4αKlog(T)/Δ2A_{\tau_{\text{spr}}}<4\alpha K\log(T)/\Delta_{2}, then since Aτspr=τsprβτsprβA_{\tau_{\text{spr}}}=\lceil\tau_{\text{spr}}^{\beta}\rceil\geq\tau_{\text{spr}}^{\beta} by definition and 4αKlog(T)/Δ2<θTγi/ββκTγi/β4\alpha K\log(T)/\Delta_{2}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor^{\beta}\wedge\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil} by (70), τspr<θTγi/βκTγi/β1/β\tau_{\text{spr}}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor\wedge\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}^{1/\beta}. By definition θTγi/β=(Tγi/β/3)ρ1\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}=(\lceil T^{\gamma_{i}/\beta}\rceil/3)^{\rho_{1}} with ρ1(0,1)\rho_{1}\in(0,1), this further implies τspr<Tγi/β\tau_{\text{spr}}<\lceil T^{\gamma_{i}/\beta}\rceil. Thus, when τ(i)>Tγi/β\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil and Aτspr<4αKlog(T)/Δ2A_{\tau_{\text{spr}}}<4\alpha K\log(T)/\Delta_{2}, we have τspr<τ(i)\tau_{\text{spr}}<\tau^{(i)}, which by definition of τ(i)\tau^{(i)} implies τ(i)=τblk(i)\tau^{(i)}=\tau_{\text{blk}}^{(i)}. Therefore, we can write

(73) (τ(i)>Tγi/β,Aτspr<4αKlogTΔ2)(τblk(i)>Tγi/β,τspr<θTγi/βκTγi/β1/β).\mathbb{P}\left(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil,A_{\tau_{\text{spr}}}<\frac{4\alpha K\log T}{\Delta_{2}}\right)\leq\mathbb{P}(\tau_{\text{blk}}^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil,\tau_{\text{spr}}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor\wedge\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}^{1/\beta}).

Now by definition, τspr<θTγi/β\tau_{\text{spr}}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor implies that Bj(i)=1jθTγi/βB_{j}^{(i)}=1\ \forall\ j\geq\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor. Also by definition, τblk(i)>Tγi/β\tau_{\text{blk}}^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil implies that for some jTγi/βj\geq\lceil T^{\gamma_{i}/\beta}\rceil and k>1k>1, Rj1(i)=kR_{j-1}^{(i)}=k but Hj1(i)Pj(i)Pj1(i)H_{j-1}^{(i)}\notin P_{j}^{(i)}\setminus P_{j-1}^{(i)}. Thus,

(74) (τblk(i)>Tγi/β,τspr<θTγi/βκTγi/β1/β)\displaystyle\mathbb{P}(\tau_{\text{blk}}^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil,\tau_{\text{spr}}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor\wedge\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}^{1/\beta})
(75) k=2Kj=Tγi/β(Rj1(i)=k,Hj1(i)Pj(i)Pj1(i),τspr<θTγi/βκTγi/β1/β).\displaystyle\quad\leq\sum_{k=2}^{K}\sum_{j=\lceil T^{\gamma_{i}/\beta}\rceil}^{\infty}\mathbb{P}(R_{j-1}^{(i)}=k,H_{j-1}^{(i)}\notin P_{j}^{(i)}\setminus P_{j-1}^{(i)},\tau_{\text{spr}}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor\wedge\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}^{1/\beta}).

Now fix k and j as in the double summation. Again using \tau_{\text{spr}}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor\Rightarrow B_{j}^{(i)}=1\ \forall\ j\geq\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor, the blocking rule implies that if \tau_{\text{spr}}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor, R_{j-1}^{(i)}=k, and H_{j-1}^{(i)}\notin P_{j}^{(i)}\setminus P_{j-1}^{(i)}, then T_{k}^{(i)}(A_{j})>\kappa_{j}. Since T_{k}^{(i)}(A_{j})\in\mathbb{N}, this means T_{k}^{(i)}(A_{j})\geq\lceil\kappa_{j}\rceil. Hence, there must exist some t\in\{\lceil\kappa_{j}\rceil,\ldots,A_{j}\} such that T_{k}^{(i)}(t-1)\geq\lceil\kappa_{j}\rceil-1 and I_{t}^{(i)}=k. Thus, taking another union bound,

(76) (Rj1(i)=k,Hj1(i)Pj(i)Pj1(i),τspr<θTγi/βκTγi/β1/β)\displaystyle\mathbb{P}(R_{j-1}^{(i)}=k,H_{j-1}^{(i)}\notin P_{j}^{(i)}\setminus P_{j-1}^{(i)},\tau_{\text{spr}}<\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor\wedge\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}^{1/\beta})
(77) t=κjAj(Tk(i)(t1)κj1,It(i)=k,τspr<κTγi/β1/β).\displaystyle\quad\leq\sum_{t=\lceil\kappa_{j}\rceil}^{A_{j}}\mathbb{P}(T_{k}^{(i)}(t-1)\geq\lceil\kappa_{j}\rceil-1,I_{t}^{(i)}=k,\tau_{\text{spr}}<\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}^{1/\beta}).

Next, note τspr<κTγi/β1/β\tau_{\text{spr}}<\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}^{1/\beta} implies that for any jTγi/βj\geq\lceil T^{\gamma_{i}/\beta}\rceil and tκjt\geq\lceil\kappa_{j}\rceil, we have tκTγi/βτsprβ=Aτsprt\geq\lceil\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}\rceil\geq\lceil\tau_{\text{spr}}^{\beta}\rceil=A_{\tau_{\text{spr}}}, so 1SA1(t)(i)1\in S_{A^{-1}(t)}^{(i)} by definition of τspr\tau_{\text{spr}}. Therefore, for any t{κj,,Aj}t\in\{\lceil\kappa_{j}\rceil,\ldots,A_{j}\},

(78) (Tk(i)(t1)κj1,It(i)=k,τspr<κTγi/β1/β)(Tk(i)(t1)κj1,1SA1(t)(i),It(i)=k).\mathbb{P}(T_{k}^{(i)}(t-1)\geq\lceil\kappa_{j}\rceil-1,I_{t}^{(i)}=k,\tau_{\text{spr}}<\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}^{1/\beta})\leq\mathbb{P}(T_{k}^{(i)}(t-1)\geq\lceil\kappa_{j}\rceil-1,1\in S_{A^{-1}(t)}^{(i)},I_{t}^{(i)}=k).

Now let k1=kk_{1}=k, k2=1k_{2}=1, and =κj1\ell=\lceil\kappa_{j}\rceil-1. Then μk2μk1=Δk4αlog(t)/(κj1)\mu_{k_{2}}-\mu_{k_{1}}=\Delta_{k}\geq\sqrt{4\alpha\log(t)/(\lceil\kappa_{j}\rceil-1)} by definition and (71), respectively. Therefore, we can use Corollary 5 from Appendix F.2 to obtain

(79) (Tk(i)(t1)κj1,1SA1(t)(i),It(i)=k)2t2(1α).\displaystyle\mathbb{P}(T_{k}^{(i)}(t-1)\geq\lceil\kappa_{j}\rceil-1,1\in S_{A^{-1}(t)}^{(i)},I_{t}^{(i)}=k)\leq 2t^{2(1-\alpha)}.

Combining the above five inequalities, then using Claim 19 from Appendix F.1 and (71), we obtain

(τ(i)>Tγi/β,Aτspr<4αKlogTΔ2)\displaystyle\mathbb{P}\left(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil,A_{\tau_{\text{spr}}}<\frac{4\alpha K\log T}{\Delta_{2}}\right) 2Kj=Tγi/βt=κjt2(1α)\displaystyle\leq 2K\sum_{j=\lceil T^{\gamma_{i}/\beta}\rceil}^{\infty}\sum_{t=\lceil\kappa_{j}\rceil}^{\infty}t^{2(1-\alpha)}
2K2α3j=Tγi/β(κj1)32αΔ24αKlogT,\displaystyle\leq\frac{2K}{2\alpha-3}\sum_{j=\lceil T^{\gamma_{i}/\beta}\rceil}^{\infty}(\lceil\kappa_{j}\rceil-1)^{3-2\alpha}\leq\frac{\Delta_{2}}{4\alpha K\log T},

so multiplying both sides by 4\alpha K\log(T)/\Delta_{2} completes the proof. ∎
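As an aside, the tail bound imported from Claim 19, \sum_{t\geq m}t^{2(1-\alpha)}\leq(m-1)^{3-2\alpha}/(2\alpha-3), can be spot-checked numerically (here with \alpha=4 as a sample value):

```python
# Spot-check (illustrative) of the tail bound used via Claim 19, with alpha = 4:
# sum_{t >= m} t^(2(1-alpha)) <= (m-1)^(3-2*alpha) / (2*alpha - 3).
alpha = 4
for m in (2, 5, 10, 50):
    tail = sum(t ** (2 * (1 - alpha)) for t in range(m, 100000))
    assert tail <= (m - 1) ** (3 - 2 * alpha) / (2 * alpha - 3), m
```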

On the other hand, when any of the inequalities in (70)-(71) fails, we can show that T is bounded, and thus we bound the \log T term in (62) by a constant and the probability term by 1, as shown in the following claim.

Claim 4.

Under the assumptions of Theorem 2, there exists a constant C4>0C_{\ref{clmIntNonStickSmallT}}>0 such that, for any i[n]i\in[n], γi(0,1)\gamma_{i}\in(0,1), and TT\in\mathbb{N} for which any of the inequalities in (70)-(71) fails, we have

(80) 4αKlogTΔ2(τ(i)>Tγi/β,Aτspr<4αKlogTΔ2)4C4αKΔ2γilogC4KΔ2γi.\frac{4\alpha K\log T}{\Delta_{2}}\mathbb{P}\left(\tau^{(i)}>\lceil T^{\gamma_{i}/\beta}\rceil,A_{\tau_{\text{spr}}}<\frac{4\alpha K\log T}{\Delta_{2}}\right)\leq\frac{4C_{\ref{clmIntNonStickSmallT}}\alpha K}{\Delta_{2}\gamma_{i}}\log\frac{C_{\ref{clmIntNonStickSmallT}}K}{\Delta_{2}\gamma_{i}}.
Proof.

By Claims 33-36 in Appendix F.4, we can set C4=maxi{33,,36}CiC_{\ref{clmIntNonStickSmallT}}=\max_{i\in\{\ref{clmSmallTtheta},\ldots,\ref{clmSmallTSum}\}}C_{i} to ensure that, if any of the inequalities in (70)-(71) fail, then logT(C4/γi)log(C4K/(Δ2γi))\log T\leq(C_{\ref{clmIntNonStickSmallT}}/\gamma_{i})\log(C_{\ref{clmIntNonStickSmallT}}K/(\Delta_{2}\gamma_{i})). The claim follows after upper bounding the probability by 11. ∎

Finally, Lemma 6 follows by combining the previous three claims.

D.2. Proof of Corollary 1

As discussed in the proof sketch, we will couple with the noiseless process. We define this process as follows: let \{\underline{H}_{j}^{(i)}\}_{j=1}^{\infty} be i.i.d. \text{Uniform}(N_{\text{hon}}(i)) random variables for each i\in[n], and

(81) ¯0={i},¯j=¯j1{i[n]¯j1:H¯j(i)¯j1}j.\underline{\mathcal{I}}_{0}=\{i^{\star}\},\quad\underline{\mathcal{I}}_{j}=\underline{\mathcal{I}}_{j-1}\cup\{i\in[n]\setminus\underline{\mathcal{I}}_{j-1}:\underline{H}_{j}^{(i)}\in\underline{\mathcal{I}}_{j-1}\}\ \forall\ j\in\mathbb{N}.

For the coupling, we first define

(82) σ0=0,σl=inf{j>σl1:mini[n]j=1+σl1jY¯j(i)1}l.\sigma_{0}=0,\quad\sigma_{l}=\inf\left\{j>\sigma_{l-1}:\min_{i\in[n]}\sum_{j^{\prime}=1+\sigma_{l-1}}^{j}\bar{Y}_{j^{\prime}}^{(i)}\geq 1\right\}\ \forall\ l\in\mathbb{N}.

Next, for each i[n]i\in[n] and ll\in\mathbb{N}, let Zl(i)=min{j{1+σl1,,σl}:Y¯j(i)=1}Z_{l}^{(i)}=\min\{j\in\{1+\sigma_{l-1},\ldots,\sigma_{l}\}:\bar{Y}_{j}^{(i)}=1\}. Note this set is nonempty, and since Zl(i)Z_{l}^{(i)} is a deterministic function of {Y¯j}j=1\{\bar{Y}_{j}\}_{j=1}^{\infty}, which is independent of {H¯j(i)}j=1\{\bar{H}_{j}^{(i)}\}_{j=1}^{\infty}, H¯Zl(i)(i)\bar{H}_{Z_{l}^{(i)}}^{(i)} is Uniform(Nhon(i))\text{Uniform}(N_{\text{hon}}(i)) for each ll\in\mathbb{N}. Hence, we can set

H¯j(i)={H¯Zl(i)(i),if j=Zl(i) for some lUniform(Nhon(i)),if j{Zl(i)}l=1\underline{H}_{j}^{(i)}=\begin{cases}\bar{H}_{Z_{l}^{(i)}}^{(i)},&\text{if }j=Z_{l}^{(i)}\text{ for some }l\in\mathbb{N}\\ \text{Uniform}(N_{\text{hon}}(i)),&\text{if }j\notin\{Z_{l}^{(i)}\}_{l=1}^{\infty}\end{cases}

without changing the distribution of {¯j}j=0\{\underline{\mathcal{I}}_{j}\}_{j=0}^{\infty}. This results in a coupling under which the time-changed noisy process dominates the noiseless one, in the following sense.

Claim 5.

For the coupling described above, ¯j¯σj\underline{\mathcal{I}}_{j}\subset\bar{\mathcal{I}}_{\sigma_{j}} for any j0j\geq 0.

Proof.

We use induction on jj. For j=0j=0, we simply have ¯j={i}=¯j=¯σj\underline{\mathcal{I}}_{j}=\{i^{\star}\}=\bar{\mathcal{I}}_{j}=\bar{\mathcal{I}}_{\sigma_{j}}. Now assume ¯j1¯σj1\underline{\mathcal{I}}_{j-1}\subset\bar{\mathcal{I}}_{\sigma_{j-1}}; we aim to show that if i¯ji\in\underline{\mathcal{I}}_{j}, then i¯σji\in\bar{\mathcal{I}}_{\sigma_{j}}. We consider two cases, the first of which is straightforward: if i¯j1i\in\underline{\mathcal{I}}_{j-1}, then i¯σj1i\in\bar{\mathcal{I}}_{\sigma_{j-1}} by the inductive hypothesis, so since σj1<σj\sigma_{j-1}<\sigma_{j} by definition and {¯j}j=0\{\bar{\mathcal{I}}_{j^{\prime}}\}_{j^{\prime}=0}^{\infty} increases monotonically, we obtain i¯σji\in\bar{\mathcal{I}}_{\sigma_{j}}, as desired.

For the second case, we assume i[n]¯j1i\in[n]\setminus\underline{\mathcal{I}}_{j-1} and H¯j(i)¯j1\underline{H}_{j}^{(i)}\in\underline{\mathcal{I}}_{j-1}. Set j=Zj(i)j^{\prime}=Z_{j}^{(i)} and recall j{1+σj1,,σj}j^{\prime}\in\{1+\sigma_{j-1},\ldots,\sigma_{j}\} by definition. From the coupling above, we know Y¯j(i)=1\bar{Y}_{j^{\prime}}^{(i)}=1 and H¯j(i)=H¯j(i)\bar{H}_{j^{\prime}}^{(i)}=\underline{H}_{j}^{(i)}. Since H¯j(i)¯j1\underline{H}_{j}^{(i)}\in\underline{\mathcal{I}}_{j-1} in the present case, we have H¯j(i)¯j1\bar{H}_{j^{\prime}}^{(i)}\in\underline{\mathcal{I}}_{j-1} as well. Hence, because ¯j1¯σj1\underline{\mathcal{I}}_{j-1}\subset\bar{\mathcal{I}}_{\sigma_{j-1}} by the inductive hypothesis, j1σj1j^{\prime}-1\geq\sigma_{j-1} by definition, and {¯j′′}j′′=0\{\bar{\mathcal{I}}_{j^{\prime\prime}}\}_{j^{\prime\prime}=0}^{\infty} is increasing, we obtain H¯j(i)¯j1\bar{H}_{j^{\prime}}^{(i)}\in\bar{\mathcal{I}}_{j^{\prime}-1}. We have thus shown Y¯j(i)=1\bar{Y}_{j^{\prime}}^{(i)}=1 and H¯j(i)¯j1\bar{H}_{j^{\prime}}^{(i)}\in\bar{\mathcal{I}}_{j^{\prime}-1}, so i¯ji\in\bar{\mathcal{I}}_{j^{\prime}} by Definition 1. Finally, using jσjj^{\prime}\leq\sigma_{j} and again appealing to monotonicity, we conclude that i¯σji\in\bar{\mathcal{I}}_{\sigma_{j}}. ∎
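The induction above can also be checked mechanically. The sketch below (with hypothetical parameters nn, Υ\Upsilon, and seed) draws the noisy primitives Y¯\bar{Y}, H¯\bar{H} on a line graph, builds the blocks σl\sigma_{l} from (82) and the first wake-up times Zl(i)Z_{l}^{(i)}, runs the coupled noiseless process with H̲l(i)=H¯Zl(i)(i)\underline{H}_{l}^{(i)}=\bar{H}_{Z_{l}^{(i)}}^{(i)}, and verifies the containment of Claim 5.

```python
import random

rng = random.Random(1)
n, ups, phases = 6, 0.6, 400
nbrs = {i: [k for k in (i - 1, i + 1) if 0 <= k < n] for i in range(n)}

# Noisy primitives for phases 1..phases: wake-up bits and uniform neighbor draws.
Y = [[rng.random() < ups for i in range(n)] for _ in range(phases)]
H = [[rng.choice(nbrs[i]) for i in range(n)] for _ in range(phases)]

# Noisy process: agent i joins at phase j if it wakes up and draws an informed agent.
noisy = [{0}]
for j in range(1, phases + 1):
    cur = noisy[-1]
    noisy.append(cur | {i for i in range(n) if Y[j - 1][i] and H[j - 1][i] in cur})

# Blocks (82): sigma_l is the first phase by which every agent woke at least once
# since sigma_{l-1}; Z[l][i] is agent i's first wake-up phase in block l+1.
sigma, Z, j = [0], [], 0
while j < phases:
    first = [None] * n
    while j < phases and None in first:
        j += 1
        for i in range(n):
            if first[i] is None and Y[j - 1][i]:
                first[i] = j
    if None in first:
        break  # ran out of simulated phases mid-block
    sigma.append(j)
    Z.append(first)

# Coupled noiseless process: agent i's phase-l draw is its draw at Z[l-1][i].
underline = [{0}]
for l in range(1, len(sigma)):
    cur = underline[-1]
    underline.append(cur | {i for i in range(n) if H[Z[l - 1][i] - 1][i] in cur})

# Claim 5: the noiseless set after l steps sits inside the noisy set at sigma_l.
dominated = all(underline[l] <= noisy[sigma[l]] for l in range(len(sigma)))
```

Here `dominated` must come out true for every seed, since the containment is exactly the statement of Claim 5.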

We can now relate the rumor spreading times of the two processes. In particular, let τ¯spr=inf{j:¯j=[n]}\bar{\tau}_{\text{spr}}=\inf\{j\in\mathbb{N}:\bar{\mathcal{I}}_{j}=[n]\} (as in Definition 1) and τ¯spr=inf{j:¯j=[n]}\underline{\tau}_{\text{spr}}=\inf\{j\in\mathbb{N}:\underline{\mathcal{I}}_{j}=[n]\}. We then have the following.

Claim 6.

For any j{3,4,}j\in\{3,4,\ldots\} and ι>1\iota>1, we have (τ¯spr>ιjlog(j)/Υ)(τ¯spr>j)+27nj1ι\mathbb{P}(\bar{\tau}_{\text{spr}}>\iota j\log(j)/\Upsilon)\leq\mathbb{P}(\underline{\tau}_{\text{spr}}>j)+27nj^{1-\iota}.

Proof.

Let h(0)=0h(0)=0 and h(j)=ιjlog(j)/Υh(j^{\prime})=\iota j^{\prime}\log(j)/\Upsilon for each jj^{\prime}\in\mathbb{N}. Then clearly

(83) (τ¯spr>ιjlog(j)/Υ)=(τ¯spr>h(j),σjh(j))+(τ¯spr>h(j),σj>h(j)).\mathbb{P}(\bar{\tau}_{\text{spr}}>\iota j\log(j)/\Upsilon)=\mathbb{P}(\bar{\tau}_{\text{spr}}>h(j),\sigma_{j}\leq h(j))+\mathbb{P}(\bar{\tau}_{\text{spr}}>h(j),\sigma_{j}>h(j)).

For the first term in (83), by definition of τ¯spr\bar{\tau}_{\text{spr}} and Claim 5, we have

(84) {τ¯spr>h(j),σjh(j)}{τ¯spr>σj}={¯σj[n]}{¯j[n]}={τ¯spr>j}.\{\bar{\tau}_{\text{spr}}>h(j),\sigma_{j}\leq h(j)\}\subset\{\bar{\tau}_{\text{spr}}>\sigma_{j}\}=\{\bar{\mathcal{I}}_{\sigma_{j}}\neq[n]\}\subset\{\underline{\mathcal{I}}_{j}\neq[n]\}=\{\underline{\tau}_{\text{spr}}>j\}.

For the second term in (83), we first observe that for any jj^{\prime}\in\mathbb{N},

h(j)h(j1)1h(j)h(j1)3=(ιlog(j)/Υ)3>0,\lfloor h(j^{\prime})\rfloor-\lceil h(j^{\prime}-1)\rceil-1\geq h(j^{\prime})-h(j^{\prime}-1)-3=(\iota\log(j)/\Upsilon)-3>0,

where the last inequality holds by assumption on jj and ι\iota. Thus, by the union bound, we can write

(85) (σj>h(j),σj1h(j1))\displaystyle\mathbb{P}(\sigma_{j^{\prime}}>h(j^{\prime}),\sigma_{j^{\prime}-1}\leq h(j^{\prime}-1)) i=1n(j′′=1+h(j1)h(j){Y¯j′′(i)=0})n(1Υ)h(j)h(j1)1\leq\sum_{i=1}^{n}\mathbb{P}(\cap_{j^{\prime\prime}=1+\lceil h(j^{\prime}-1)\rceil}^{\lfloor h(j^{\prime})\rfloor}\{\bar{Y}_{j^{\prime\prime}}^{(i)}=0\})\leq n(1-\Upsilon)^{\lfloor h(j^{\prime})\rfloor-\lceil h(j^{\prime}-1)\rceil-1}
(86) n(1Υ)(ιlog(j)/Υ)3nexp(ιlog(j)+3Υ)<27njι.\displaystyle\leq n(1-\Upsilon)^{(\iota\log(j)/\Upsilon)-3}\leq n\exp(-\iota\log(j)+3\Upsilon)<27nj^{-\iota}.

Hence, because σ0=0=h(0)\sigma_{0}=0=h(0), we can iterate this argument to obtain that for any jj^{\prime}\in\mathbb{N},

(87) (σj>h(j))\displaystyle\mathbb{P}(\sigma_{j^{\prime}}>h(j^{\prime})) (σj>h(j),σj1h(j1))+(σj1>h(j1))\displaystyle\leq\mathbb{P}(\sigma_{j^{\prime}}>h(j^{\prime}),\sigma_{j^{\prime}-1}\leq h(j^{\prime}-1))+\mathbb{P}(\sigma_{j^{\prime}-1}>h(j^{\prime}-1))
(88) 27njι+(σj1>h(j1))27njjι.\displaystyle\leq 27nj^{-\iota}+\mathbb{P}(\sigma_{j^{\prime}-1}>h(j^{\prime}-1))\leq\cdots\leq 27nj^{\prime}j^{-\iota}.

In particular, (σj>h(j))27nj1ι\mathbb{P}(\sigma_{j}>h(j))\leq 27nj^{1-\iota}. Combining with (83) and (84) completes the proof. ∎

To bound the tail of τ¯spr\underline{\tau}_{\text{spr}}, we will use the following result.

Claim 7 (Lemma 19 from (Chawla et al., 2020)).

Under the assumptions of Corollary 1, there exists an absolute constant C7>0C_{\ref{clmNoiselessTail}}>0 such that, for any hh\in\mathbb{N}, we have (τ¯sprC7hlog(n)/ϕ)n4h\mathbb{P}(\underline{\tau}_{\text{spr}}\geq C_{\ref{clmNoiselessTail}}h\log(n)/\phi)\leq n^{-4h}.

Using the previous two claims, we can prove a tail bound for τ¯spr\bar{\tau}_{\text{spr}}.

Claim 8.

Under the assumptions of Corollary 1, there exists an absolute constant C8>0C_{\ref{clmNoisyTail}}>0 such that, for any h{3,4,}h\in\{3,4,\ldots\}, we have (τ¯sprξ(h))562h\mathbb{P}(\bar{\tau}_{\text{spr}}\geq\xi(h))\leq 56\cdot 2^{-h}, where we define

ξ(h)=C8(logn)2h3log(C8log(n)/ϕ)/(ϕΥ).\xi(h)=C_{\ref{clmNoisyTail}}(\log n)^{2}h^{3}\log(C_{\ref{clmNoisyTail}}\log(n)/\phi)/(\phi\Upsilon).
Proof.

Since GhonG_{\text{hon}} is dd-regular with d2d\geq 2 by assumption, we have n2n\geq 2 as well. Therefore, setting C8=(2C7)((e+1)/log(2))C_{\ref{clmNoisyTail}}=(2C_{\ref{clmNoiselessTail}})\vee((e+1)/\log(2)), we know log(C8log(n)/ϕ)1\log(C_{\ref{clmNoisyTail}}\log(n)/\phi)\geq 1, which implies

(h1)log(C8log(n)/ϕ)h1loghhlog(C8log(n)/ϕ)log(C8hlog(n)/ϕ).(h-1)\log(C_{\ref{clmNoisyTail}}\log(n)/\phi)\geq h-1\geq\log h\quad\Rightarrow\quad h\log(C_{\ref{clmNoisyTail}}\log(n)/\phi)\geq\log(C_{\ref{clmNoisyTail}}h\log(n)/\phi).

Consequently, if we define ι=hlogn\iota=h\log n and j=C8hlog(n)/ϕj=\lfloor C_{\ref{clmNoisyTail}}h\log(n)/\phi\rfloor, we can write

(89) ξ(h)\displaystyle\xi(h) =(hlogn)(C8hlog(n)/ϕ)(hlog(C8log(n)/ϕ))/Υ\displaystyle=\left(h\log n\right)\left(C_{\ref{clmNoisyTail}}h\log(n)/\phi\right)\left(h\log(C_{\ref{clmNoisyTail}}\log(n)/\phi)\right)/\Upsilon
(90) (hlogn)(C8hlog(n)/ϕ)log(C8hlog(n)/ϕ)/Υιjlog(j)/Υ.\displaystyle\geq\left(h\log n\right)\left(C_{\ref{clmNoisyTail}}h\log(n)/\phi\right)\log(C_{\ref{clmNoisyTail}}h\log(n)/\phi)/\Upsilon\geq\iota j\log(j)/\Upsilon.

Because C8(e+1)/log(2)C_{\ref{clmNoisyTail}}\geq(e+1)/\log(2), h3h\geq 3, and n2n\geq 2, we also know

(91) j(C8hlog(n)/ϕ)1C8hlog(n)1((e+1)/log(2))3log(2)1=3(e+1)1>3e>e2.j\geq(C_{\ref{clmNoisyTail}}h\log(n)/\phi)-1\geq C_{\ref{clmNoisyTail}}\cdot h\cdot\log(n)-1\geq((e+1)/\log(2))\cdot 3\cdot\log(2)-1=3(e+1)-1>3e>e^{2}.

Hence, j{3,4,}j\in\{3,4,\ldots\}; combined with ι3log2>1\iota\geq 3\log 2>1, we can apply Claim 6 to obtain

(92) (τ¯spr>ξ(h))(τ¯spr>ιjlog(j)/Υ)(τ¯spr>j)+27nj1ι.\mathbb{P}\left(\bar{\tau}_{\text{spr}}>\xi(h)\right)\leq\mathbb{P}\left(\bar{\tau}_{\text{spr}}>\iota j\log(j)/\Upsilon\right)\leq\mathbb{P}\left(\underline{\tau}_{\text{spr}}>j\right)+27nj^{1-\iota}.

On the other hand, (91) implies C8hlog(n)/ϕ2C_{\ref{clmNoisyTail}}h\log(n)/\phi\geq 2, so by definition of C8C_{\ref{clmNoisyTail}},

j(C8hlog(n)/ϕ)1(C8hlog(n)/ϕ)/2=(C8/2)hlog(n)/ϕC7hlog(n)/ϕ.j\geq(C_{\ref{clmNoisyTail}}h\log(n)/\phi)-1\geq(C_{\ref{clmNoisyTail}}h\log(n)/\phi)/2=(C_{\ref{clmNoisyTail}}/2)h\log(n)/\phi\geq C_{\ref{clmNoiselessTail}}h\log(n)/\phi.

Therefore, by Claim 7, we know that

(τ¯spr>j)(τ¯spr>C7hlog(n)/ϕ)n4h<n1h.\mathbb{P}\left(\underline{\tau}_{\text{spr}}>j\right)\leq\mathbb{P}\left(\underline{\tau}_{\text{spr}}>C_{\ref{clmNoiselessTail}}h\log(n)/\phi\right)\leq n^{-4h}<n^{1-h}.

Finally, notice that ι=hlogn3log2>2\iota=h\log n\geq 3\log 2>2, so 1ι<ι/21-\iota<-\iota/2, thus by (91),

27nj1ι<27njι/2=27njι<27nexp(ι)=27n1h.27nj^{1-\iota}<27nj^{-\iota/2}=27n\sqrt{j}^{-\iota}<27n\exp(-\iota)=27n^{1-h}.

Hence, substituting the previous two inequalities into (92) and using n2n\geq 2 completes the proof. ∎

We now bound 𝔼[A2τ¯spr]\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]. First, we define

χ=C8(logn)2log(C8log(n)/ϕ)/(ϕΥ),C=8β/log2,h=ClogC3.\chi=\lceil C_{\ref{clmNoisyTail}}(\log n)^{2}\log(C_{\ref{clmNoisyTail}}\log(n)/\phi)/(\phi\Upsilon)\rceil,\quad C=8\beta/\log 2,\quad h_{\star}=\lceil C\log C\rceil\vee 3.

Notice that for any hhClogCh\geq h_{\star}\geq C\log C, we have 2hh4β2^{-h}\leq h^{-4\beta} (else, we can invoke Claim 20 from Appendix F.1 with x=hx=h, y=1y=1, and z=C/2z=C/2 to obtain h<ClogCh<C\log C, a contradiction). We write

(93) j=1(AjAj1)(τ¯sprj/2)A2χh3+h=hj=2χh3+12χ(h+1)3(AjAj1)(τ¯sprj/2).\sum_{j=1}^{\infty}(A_{j}-A_{j-1})\mathbb{P}(\bar{\tau}_{\text{spr}}\geq j/2)\leq A_{2\chi h_{\star}^{3}}+\sum_{h=h_{\star}}^{\infty}\sum_{j=2\chi h^{3}+1}^{2\chi(h+1)^{3}}(A_{j}-A_{j-1})\mathbb{P}(\bar{\tau}_{\text{spr}}\geq j/2).

Now for any hhh\geq h_{\star} and j2χh3+1j\geq 2\chi h^{3}+1, we use 2hh4β2^{-h}\leq h^{-4\beta}, the definition of χ\chi, and Claim 8 to write

(τ¯sprj/2)(τ¯sprχh3)(τ¯sprξ(h))562h56h4β.\mathbb{P}(\bar{\tau}_{\text{spr}}\geq j/2)\leq\mathbb{P}(\bar{\tau}_{\text{spr}}\geq\chi h^{3})\leq\mathbb{P}(\bar{\tau}_{\text{spr}}\geq\xi(h))\leq 56\cdot 2^{-h}\leq 56h^{-4\beta}.

Therefore, for any such hh, we obtain

j=2χh3+12χ(h+1)3(AjAj1)(τ¯sprj/2)56h4βj=2χh3+12χ(h+1)3(AjAj1)56h4βA2χ(h+1)3.\sum_{j=2\chi h^{3}+1}^{2\chi(h+1)^{3}}(A_{j}-A_{j-1})\mathbb{P}(\bar{\tau}_{\text{spr}}\geq j/2)\leq 56h^{-4\beta}\sum_{j=2\chi h^{3}+1}^{2\chi(h+1)^{3}}(A_{j}-A_{j-1})\leq 56h^{-4\beta}A_{2\chi(h+1)^{3}}.

Furthermore, by Claim 18 in Appendix F.1 and h1h\geq 1, we know that

A2χ(h+1)3e2β(2χ(h+1)3)βe2β(2χ(2h)3)β=(4e)2βχβh3βh1.A_{2\chi(h+1)^{3}}\leq e^{2\beta}(2\chi(h+1)^{3})^{\beta}\leq e^{2\beta}(2\chi(2h)^{3})^{\beta}=(4e)^{2\beta}\chi^{\beta}h^{3\beta}\ \forall\ h\geq 1.

Therefore, by the previous two inequalities and (93), and since h3h_{\star}\geq 3, we have shown

j=1(AjAj1)(τ¯sprj/2)(4e)2βχβ(h3β+56h=hhβ)(4e)2βχβ(h3β+56β1),\sum_{j=1}^{\infty}(A_{j}-A_{j-1})\mathbb{P}(\bar{\tau}_{\text{spr}}\geq j/2)\leq(4e)^{2\beta}\chi^{\beta}\left(h_{\star}^{3\beta}+56\sum_{h=h_{\star}}^{\infty}h^{-\beta}\right)\leq(4e)^{2\beta}\chi^{\beta}\left(h_{\star}^{3\beta}+\frac{56}{\beta-1}\right),

where the second inequality follows from Claim 19 from Appendix F.1, h3h_{\star}\geq 3, and β>1\beta>1. Because hh_{\star} is a constant, the right side is O(χβ)O(\chi^{\beta}). Therefore, we have shown

𝔼[A2τ¯spr]=𝔼[j=1(AjAj1)𝟙(2τ¯sprj)]=j=1(AjAj1)(τ¯sprj/2)=O(χβ).\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]=\mathbb{E}\left[\sum_{j=1}^{\infty}(A_{j}-A_{j-1})\mathbbm{1}(2\bar{\tau}_{\text{spr}}\geq j)\right]=\sum_{j=1}^{\infty}(A_{j}-A_{j-1})\mathbb{P}(\bar{\tau}_{\text{spr}}\geq j/2)=O(\chi^{\beta}).

Hence, by definition of χ\chi, we obtain 𝔼[A2τ¯spr]=O(((logn)2log(log(n)/ϕ)/(ϕΥ))β)\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]=O(((\log n)^{2}\log(\log(n)/\phi)/(\phi\Upsilon))^{\beta}). Combining this bound with Theorem 2 completes the proof of Corollary 1.
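The interchange used in the last display of this argument is just the layer-cake identity 𝔼[A2τ]=j1(AjAj1)(2τj)\mathbb{E}[A_{2\tau}]=\sum_{j\geq 1}(A_{j}-A_{j-1})\mathbb{P}(2\tau\geq j), valid for any nondecreasing sequence with A0=0A_{0}=0. A finite sanity check, with the stand-ins Aj=jβA_{j}=j^{\beta} and a two-point distribution for the spreading time (both hypothetical):

```python
from fractions import Fraction

beta = 2
A = [j ** beta for j in range(101)]  # A_0 = 0, nondecreasing

# Hypothetical two-point law for the spreading time tau.
law = {3: Fraction(1, 4), 10: Fraction(3, 4)}

lhs = sum(p * A[2 * t] for t, p in law.items())  # E[A_{2 tau}]
rhs = sum((A[j] - A[j - 1]) * sum(p for t, p in law.items() if 2 * t >= j)
          for j in range(1, 101))                # layer-cake form
```

With these numbers both sides evaluate to 309, and the exact `Fraction` arithmetic makes the equality check deterministic.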

D.3. Proof of Corollary 2

Similar to the analysis in Appendix D.1, we can use the decomposition RT(i)=h=14RT,h(i)R_{T}^{(i)}=\sum_{h=1}^{4}R_{T,h}^{(i)}, along with Lemmas 4 and 5, to bound regret as follows:

(94) RT(i)kS¯(i)4αlogTΔk+4(α1)|S¯(i)|2α3+RT,3(i)+RT,4(i)+𝔼[Aτspr].\displaystyle R_{T}^{(i)}\leq\sum_{k\in\overline{S}^{(i)}}\frac{4\alpha\log T}{\Delta_{k}}+\frac{4(\alpha-1)|\overline{S}^{(i)}|}{2\alpha-3}+R_{T,3}^{(i)}+R_{T,4}^{(i)}+\mathbb{E}[A_{\tau_{\text{spr}}}].

Next, for each kS¯(i)k\in\underline{S}^{(i)}, let Yk=𝟙(j=τspr{kSj(i)})Y_{k}=\mathbbm{1}(\cup_{j=\tau_{\text{spr}}}^{\infty}\{k\in S_{j}^{(i)}\}) be the indicator that kk was active at some phase jτsprj\geq\tau_{\text{spr}}. Then as in the proof of Lemma 5, we can use Claim 22 and Corollary 6 from Appendix F.2 to write

(95) 𝔼[t=1+AτsprT𝟙(It(i)=k)]𝔼[Yk]4αlogTΔk2+4(α1)2α3.\mathbb{E}\left[\sum_{t=1+A_{\tau_{\text{spr}}}}^{T}\mathbbm{1}\left(I_{t}^{(i)}=k\right)\right]\leq\mathbb{E}[Y_{k}]\frac{4\alpha\log T}{\Delta_{k}^{2}}+\frac{4(\alpha-1)}{2\alpha-3}.

(The only difference from the proof of Lemma 5 is that, when applying Claim 22, we write

t=1+AτsprT𝟙(It(i)=k,Tk(i)(t1)<4αlogtΔk2)Yk4αlogTΔk2,\displaystyle\sum_{t=1+A_{\tau_{\text{spr}}}}^{T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)<\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\leq Y_{k}\frac{4\alpha\log T}{\Delta_{k}^{2}},

where we can multiply by YkY_{k} because the left side is also zero when Yk=0Y_{k}=0.) Combining (95) with the definitions of RT,3(i)R_{T,3}^{(i)} and RT,4(i)R_{T,4}^{(i)} and using Δk1\Delta_{k}\leq 1, we thus obtain

RT,3(i)+RT,4(i)𝔼[kS¯(i)Yk4αlogTΔk]+4(α1)|S¯(i)|2α3.R_{T,3}^{(i)}+R_{T,4}^{(i)}\leq\mathbb{E}\left[\sum_{k\in\underline{S}^{(i)}}Y_{k}\frac{4\alpha\log T}{\Delta_{k}}\right]+\frac{4(\alpha-1)|\underline{S}^{(i)}|}{2\alpha-3}.

We claim, and will return to prove, that when dmal(i)=0d_{\text{mal}}(i)=0,

(96) kS¯(i)1Δk+kS¯(i)YkΔkk=2S+21Δk.\sum_{k\in\overline{S}^{(i)}}\frac{1}{\Delta_{k}}+\sum_{k\in\underline{S}^{(i)}}\frac{Y_{k}}{\Delta_{k}}\leq\sum_{k=2}^{S+2}\frac{1}{\Delta_{k}}.

Assuming (96) holds, we can combine the previous two inequalities and substitute into (94) to obtain RT(i)4αlog(T)k=2S+2Δk1+O(K)+𝔼[Aτspr]R_{T}^{(i)}\leq 4\alpha\log(T)\sum_{k=2}^{S+2}\Delta_{k}^{-1}+O(K)+\mathbb{E}[A_{\tau_{\text{spr}}}]. Bounding 𝔼[Aτspr]\mathbb{E}[A_{\tau_{\text{spr}}}] as in Lemma 4 yields the sharper version of Theorem 2, and further bounding 𝔼[A2τ¯spr]\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}] as in Appendix D.2 sharpens Corollary 2.

To prove (96), we first show kS¯(i)Yk1+𝟙(1S^(i))\sum_{k\in\underline{S}^{(i)}}Y_{k}\leq 1+\mathbbm{1}(1\in\hat{S}^{(i)}). Suppose instead that kS¯(i)Yk2+𝟙(1S^(i))H\sum_{k\in\underline{S}^{(i)}}Y_{k}\geq 2+\mathbbm{1}(1\in\hat{S}^{(i)})\triangleq H. Then we can find HH distinct arms k1,,kHS¯(i)k_{1},\ldots,k_{H}\in\underline{S}^{(i)}, and HH corresponding phases jhτsprj_{h}\geq\tau_{\text{spr}}, such that khk_{h} was active at phase jhj_{h} for each h[H]h\in[H]. Without loss of generality, we can assume each jhj_{h} is minimal, i.e., jh=min{jτspr:khSj(i)}j_{h}=\min\{j\geq\tau_{\text{spr}}:k_{h}\in S_{j}^{(i)}\}. We consider two cases (which are exhaustive since jhτsprj_{h}\geq\tau_{\text{spr}}) and show that both yield contradictions.

  • jh=τsprh[H]j_{h}=\tau_{\text{spr}}\ \forall\ h\in[H]: We consider two further sub-cases.

    • 1S^(i)1\in\hat{S}^{(i)}, i.e., the best arm is sticky. Then H=3H=3, so k1,,k3k_{1},\ldots,k_{3} are all active at phase τspr\tau_{\text{spr}}. But all of these arms are non-sticky and only two such arms are active per phase.

    • 1S^(i)1\notin\hat{S}^{(i)}. Here k1,k2k_{1},k_{2} are both active at phase τspr\tau_{\text{spr}}, as is 11 (by definition of τspr\tau_{\text{spr}}). But since k1k_{1} and k2k_{2} are suboptimal, we again have three non-sticky active arms.

  • maxh[H]jh>τspr\max_{h\in[H]}j_{h}>\tau_{\text{spr}}: We can assume (after possibly relabeling) that j1>τsprj_{1}>\tau_{\text{spr}}. Thus, by minimality of j1j_{1}, k1k_{1} was not active at phase j11j_{1}-1 but became active at j1j_{1}, so it was recommended by some neighbor ii^{\prime} at j11j_{1}-1. But since dmal(i)=0d_{\text{mal}}(i)=0, ii^{\prime} is honest, and since j11τsprj_{1}-1\geq\tau_{\text{spr}}, the best arm was ii^{\prime}'s most played arm in phase j11j_{1}-1, so ii^{\prime} would have recommended the best arm instead of k1k_{1}.

Thus, kS¯(i)Yk1+𝟙(1S^(i))\sum_{k\in\underline{S}^{(i)}}Y_{k}\leq 1+\mathbbm{1}(1\in\hat{S}^{(i)}) holds. Combined with the fact that |S¯(i)|=S𝟙(1S^(i))|\overline{S}^{(i)}|=S-\mathbbm{1}(1\in\hat{S}^{(i)}) by definition, at most S+1S+1 terms are nonzero in the summations on the left side of (96). Since Δ2ΔK\Delta_{2}\leq\cdots\leq\Delta_{K} by the assumed ordering of the arm means, this completes the proof.

D.4. Coarse analysis of the noisy rumor process

For completeness, we provide a coarser though more general bound for 𝔼[A2τ¯spr]\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}] than the one derived in Appendix D.2. To begin, let j\mathbb{P}_{j}^{\prime} and 𝔼j\mathbb{E}_{j}^{\prime} denote probability and expectation conditioned on {Y¯j(i),H¯j(i)}j=1j1\{\bar{Y}_{j^{\prime}}^{(i)},\bar{H}_{j^{\prime}}^{(i)}\}_{j^{\prime}=1}^{j-1}. For each h[n]h\in[n], define the random phase τ¯(h)=inf{j:|¯j|=h}\bar{\tau}(h)=\inf\{j\in\mathbb{N}:|\bar{\mathcal{I}}_{j}|=h\}. Note that τ¯(1)=1\bar{\tau}(1)=1 and τ¯spr=τ¯(n)\bar{\tau}_{\text{spr}}=\bar{\tau}(n). We then have the following tail bound.

Claim 9.

For any l,jl,j\in\mathbb{N}, we have (τ¯(l)>lj)l(1Υ/d¯hon)j\mathbb{P}(\bar{\tau}(l)>lj)\leq l(1-\Upsilon/\bar{d}_{\text{hon}})^{j}.

Proof.

We use induction on ll. For l=1l=1, τ¯(1)=1\bar{\tau}(1)=1 ensures (τ¯(l)>lj)=0\mathbb{P}(\bar{\tau}(l)>lj)=0 for any jj\in\mathbb{N}, so the bound is immediate. Next, assume the bound holds for ll\in\mathbb{N}. We first write

(τ¯(l+1)>(l+1)j)(τ¯(l+1)>(l+1)j,τ¯(l)lj)+(τ¯(l)>lj).\mathbb{P}(\bar{\tau}(l+1)>(l+1)j)\leq\mathbb{P}(\bar{\tau}(l+1)>(l+1)j,\bar{\tau}(l)\leq lj)+\mathbb{P}(\bar{\tau}(l)>lj).

Thus, by the inductive hypothesis, it suffices to bound the first term by (1Υ/d¯hon)j(1-\Upsilon/\bar{d}_{\text{hon}})^{j}. We first write

(τ¯(l+1)>(l+1)j,τ¯(l)lj)=𝔼[𝟙(τ¯(l)lj)lj+1(τ¯(l+1)>(l+1)j)].\mathbb{P}(\bar{\tau}(l+1)>(l+1)j,\bar{\tau}(l)\leq lj)=\mathbb{E}[\mathbbm{1}(\bar{\tau}(l)\leq lj)\mathbb{P}_{lj+1}^{\prime}(\bar{\tau}(l+1)>(l+1)j)].

Now suppose τ¯(l)lj\bar{\tau}(l)\leq lj. By Assumption 1, we can find i¯lji\in\bar{\mathcal{I}}_{lj} and i¯lji^{\prime}\notin\bar{\mathcal{I}}_{lj} such that iNhon(i)i\in N_{\text{hon}}(i^{\prime}). Then for τ¯(l+1)>(l+1)j\bar{\tau}(l+1)>(l+1)j to occur, it must be the case that, for each j{lj+1,,(l+1)j}j^{\prime}\in\{lj+1,\ldots,(l+1)j\}, the event {H¯j(i)=i,Y¯j(i)=1}\{\bar{H}_{j^{\prime}}^{(i^{\prime})}=i,\bar{Y}_{j^{\prime}}^{(i^{\prime})}=1\} did not occur. Therefore, we have

(97) lj+1(τ¯(l+1)>(l+1)j)lj+1(j=lj+1(l+1)j{H¯j(i)=i,Y¯j(i)=1}C).\mathbb{P}_{lj+1}^{\prime}(\bar{\tau}(l+1)>(l+1)j)\leq\mathbb{P}_{lj+1}^{\prime}(\cap_{j^{\prime}=lj+1}^{(l+1)j}\{\bar{H}_{j^{\prime}}^{(i^{\prime})}=i,\bar{Y}_{j^{\prime}}^{(i^{\prime})}=1\}^{C}).

By the law of total expectation, we can write

(98) lj+1(j=lj+1(l+1)j{H¯j(i)=i,Y¯j(i)=1}C)\displaystyle\mathbb{P}_{lj+1}^{\prime}(\cap_{j^{\prime}=lj+1}^{(l+1)j}\{\bar{H}_{j^{\prime}}^{(i^{\prime})}=i,\bar{Y}_{j^{\prime}}^{(i^{\prime})}=1\}^{C})
(99) =𝔼lj+1[𝟙(j=lj+1(l+1)j1{H¯j(i)=i,Y¯j(i)=1}C)(1(l+1)j(H¯(l+1)j(i)=i,Y¯(l+1)j(i)=1))].\displaystyle\quad=\mathbb{E}_{lj+1}^{\prime}[\mathbbm{1}(\cap_{j^{\prime}=lj+1}^{(l+1)j-1}\{\bar{H}_{j^{\prime}}^{(i^{\prime})}=i,\bar{Y}_{j^{\prime}}^{(i^{\prime})}=1\}^{C})(1-\mathbb{P}_{(l+1)j}^{\prime}(\bar{H}_{(l+1)j}^{(i^{\prime})}=i,\bar{Y}_{(l+1)j}^{(i^{\prime})}=1))].

Since H¯(l+1)j(i)\bar{H}_{(l+1)j}^{(i^{\prime})} is Uniform(Nhon(i))\text{Uniform}(N_{\text{hon}}(i^{\prime})) and Y¯(l+1)j(i)\bar{Y}_{(l+1)j}^{(i^{\prime})} is Bernoulli(Υ)\text{Bernoulli}(\Upsilon), we have

(l+1)j(H¯(l+1)j(i)=i,Y¯(l+1)j(i)=1)=Υ/dhon(i)Υ/d¯hon.\mathbb{P}_{(l+1)j}^{\prime}(\bar{H}_{(l+1)j}^{(i^{\prime})}=i,\bar{Y}_{(l+1)j}^{(i^{\prime})}=1)=\Upsilon/d_{\text{hon}}(i^{\prime})\geq\Upsilon/\bar{d}_{\text{hon}}.

Therefore, combining the previous two expressions and iterating, we obtain

lj+1(j=lj+1(l+1)j{H¯j(i)=i,Y¯j(i)=1}C)\displaystyle\mathbb{P}_{lj+1}^{\prime}(\cap_{j^{\prime}=lj+1}^{(l+1)j}\{\bar{H}_{j^{\prime}}^{(i^{\prime})}=i,\bar{Y}_{j^{\prime}}^{(i^{\prime})}=1\}^{C}) lj+1(j=lj+1(l+1)j1{H¯j(i)=i,Y¯j(i)=1}C)(1Υ/d¯hon)\displaystyle\leq\mathbb{P}_{lj+1}^{\prime}(\cap_{j^{\prime}=lj+1}^{(l+1)j-1}\{\bar{H}_{j^{\prime}}^{(i^{\prime})}=i,\bar{Y}_{j^{\prime}}^{(i^{\prime})}=1\}^{C})(1-\Upsilon/\bar{d}_{\text{hon}})
(1Υ/d¯hon)j.\displaystyle\leq\cdots\leq(1-\Upsilon/\bar{d}_{\text{hon}})^{j}.\qed
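Claim 9 can be spot-checked by Monte Carlo simulation. The sketch below uses a hypothetical instance (a small line graph, so the maximum honest degree is 2, with made-up Υ\Upsilon, jj, and seed) and compares the empirical tail of the spreading time against the claimed bound, which is quite loose for these parameters.

```python
import random

def noisy_spread_phases(nbrs, ups, rng, cap=5000):
    """Phases until the noisy process informs all agents (rumor starts at 0)."""
    n = len(nbrs)
    informed, j = {0}, 0
    while len(informed) < n and j < cap:
        j += 1
        informed |= {i for i in range(n)
                     if rng.random() < ups and rng.choice(nbrs[i]) in informed}
    return j

# Hypothetical instance: n-agent line graph, max honest degree d_bar = 2.
n, ups, jj, trials = 4, 0.5, 20, 2000
line = {i: [k for k in (i - 1, i + 1) if 0 <= k < n] for i in range(n)}
d_bar = max(len(v) for v in line.values())
rng = random.Random(2)

# Empirical P(tau(n) > n*j) versus the Claim 9 bound n*(1 - ups/d_bar)^j.
hits = sum(noisy_spread_phases(line, ups, rng) > n * jj for _ in range(trials))
bound = n * (1 - ups / d_bar) ** jj
```

For these parameters the true tail probability is far below the bound, so the empirical frequency should land under it comfortably.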

Next, we have a simple technical claim.

Claim 10.

Let h=(8βd¯hon/Υ)log(8βd¯honn/Υ)h_{\dagger}=(8\beta\bar{d}_{\text{hon}}/\Upsilon)\log(8\beta\bar{d}_{\text{hon}}n/\Upsilon). Then for any hhh\geq h_{\dagger}, exp(hΥ/d¯hon)h2β/n\exp(-h\Upsilon/\bar{d}_{\text{hon}})\leq h^{-2\beta}/n.

Proof.

If the claimed bound fails, we have h<(2βd¯hon/Υ)log(h)+(d¯hon/Υ)log(n)h<(2\beta\bar{d}_{\text{hon}}/\Upsilon)\log(h)+(\bar{d}_{\text{hon}}/\Upsilon)\log(n). Then since (d¯hon/Υ)log(n)<h/2h/2(\bar{d}_{\text{hon}}/\Upsilon)\log(n)<h_{\dagger}/2\leq h/2, we obtain h<(4βd¯hon/Υ)loghh<(4\beta\bar{d}_{\text{hon}}/\Upsilon)\log h. Applying Claim 20 from Appendix F.1 with x=hx=h, y=1y=1, and z=4βd¯hon/Υz=4\beta\bar{d}_{\text{hon}}/\Upsilon, we obtain h<2zlog(2z)hh<2z\log(2z)\leq h_{\dagger}, a contradiction. ∎
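Claim 10 is also easy to spot-check numerically; the parameter values below are hypothetical, chosen only to satisfy the standing assumptions (β>1\beta>1, d¯hon2\bar{d}_{\text{hon}}\geq 2, Υ(0,1)\Upsilon\in(0,1), n2n\geq 2).

```python
import math

# Hypothetical parameter values.
beta, d_bar, ups, n = 2.0, 4.0, 0.3, 50
h_dagger = (8 * beta * d_bar / ups) * math.log(8 * beta * d_bar * n / ups)

# Claim 10: exp(-h * ups / d_bar) <= h**(-2*beta) / n for all h >= h_dagger.
ok = all(math.exp(-h * ups / d_bar) <= h ** (-2 * beta) / n
         for h in (h_dagger, 2 * h_dagger, 10 * h_dagger, 100 * h_dagger))
```

At h=hh=h_{\dagger} the left side is already many orders of magnitude below the right, reflecting the slack in the claim.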

We can now bound 𝔼[A2τ¯spr]\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]. The analysis is similar to Appendix D.2. We first write

𝔼[A2τ¯spr]A2nh+h=h(τ¯spr>nh)j=2nh+12n(h+1)(AjAj1).\displaystyle\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]\leq A_{2n\lceil h_{\dagger}\rceil}+\sum_{h=\lceil h_{\dagger}\rceil}^{\infty}\mathbb{P}(\bar{\tau}_{\text{spr}}>nh)\sum_{j=2nh+1}^{2n(h+1)}(A_{j}-A_{j-1}).

By the previous two claims, we know

(τ¯spr>nh)=(τ¯(n)>nh)n(1Υ/d¯hon)hnexp(hΥ/d¯hon)h2β.\mathbb{P}(\bar{\tau}_{\text{spr}}>nh)=\mathbb{P}(\bar{\tau}(n)>nh)\leq n(1-\Upsilon/\bar{d}_{\text{hon}})^{h}\leq n\exp(-h\Upsilon/\bar{d}_{\text{hon}})\leq h^{-2\beta}.

By Claim 18 from Appendix F.1, we also have

A2nh(2e)2β(nh)β,A2n(h+1)e2β(2n(h+1))βe2β(2n(2h))β=(2e)2β(nh)βh.A_{2n\lceil h_{\dagger}\rceil}\leq(2e)^{2\beta}(nh_{\dagger})^{\beta},\quad A_{2n(h+1)}\leq e^{2\beta}(2n(h+1))^{\beta}\leq e^{2\beta}(2n(2h))^{\beta}=(2e)^{2\beta}(nh)^{\beta}\ \forall\ h\in\mathbb{N}.

Therefore, combining the previous three expressions, we obtain

𝔼A2τ¯spr(2e)2βnβ(hβ+h=hhβ)(2e)2βnβ(hβ+1β1),\mathbb{E}A_{2\bar{\tau}_{\text{spr}}}\leq(2e)^{2\beta}n^{\beta}\left(h_{\dagger}^{\beta}+\sum_{h=\lceil h_{\dagger}\rceil}^{\infty}h^{-\beta}\right)\leq(2e)^{2\beta}n^{\beta}\left(h_{\dagger}^{\beta}+\frac{1}{\beta-1}\right),

where the second inequality uses β>1\beta>1, h2h_{\dagger}\geq 2, and Claim 19 from Appendix F.1. Hence, we have shown 𝔼[A2τ¯spr]=O((nh)β)=O~((nd¯hon/Υ)β)\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]=O((nh_{\dagger})^{\beta})=\tilde{O}((n\bar{d}_{\text{hon}}/\Upsilon)^{\beta}). Note this bound cannot be improved in general: for example, if GhonG_{\text{hon}} is a line graph, it becomes O~((n/Υ)β)\tilde{O}((n/\Upsilon)^{\beta}); since 𝔼[τ¯spr]β=O(𝔼[Aτ¯spr])\mathbb{E}[\bar{\tau}_{\text{spr}}]^{\beta}=O(\mathbb{E}[A_{\bar{\tau}_{\text{spr}}}]), this implies 𝔼[τ¯spr]=O~(n/Υ)\mathbb{E}[\bar{\tau}_{\text{spr}}]=\tilde{O}(n/\Upsilon), which is the correct scaling (up to log terms) in Definition 1.

Appendix E Details from Section 7

In this appendix, we formalize the analysis that was discussed in Section 7. In particular, the subsequent five sub-appendices provide details on the respective five subsections of Section 7.

E.1. Details from Section 7.1

Let ψ=(ρ2+β(2α1))/(2αβ)\psi^{\prime}=(\rho_{2}+\beta(2\alpha-1))/(2\alpha\beta). Note that since α>2\alpha>2 and ρ2(0,β)\rho_{2}\in(0,\beta) by assumption,

0<11/(2α)=β(2α1)/(2αβ)<ψ<(β+β(2α1))/(2αβ)=1,0<1-1/(2\alpha)=\beta(2\alpha-1)/(2\alpha\beta)<\psi^{\prime}<(\beta+\beta(2\alpha-1))/(2\alpha\beta)=1,

so ψ=ψ\psi=\sqrt{\psi^{\prime}} is well-defined and ψ(0,1)\psi\in(0,1). Next, for any jj\in\mathbb{N}, define

(100) δj,1=4αlogAj(AjAj1S+21)1,δj,2=αlog(Aj11)(1ψκj2(AjAj1S+21)1)0.\delta_{j,1}=\sqrt{\frac{4\alpha\log A_{j}}{(\frac{A_{j}-A_{j-1}}{S+2}-1)\vee 1}},\quad\delta_{j,2}=\sqrt{\alpha\log(A_{j-1}\vee 1)}\left(\frac{1-\psi}{\sqrt{\kappa_{j}}}-\frac{2}{\sqrt{(\frac{A_{j}-A_{j-1}}{S+2}-1)\vee 1}}\right)\vee 0.

Since AjAj1=Θ(jβ1)A_{j}-A_{j-1}=\Theta(j^{\beta-1}), ψ<1\psi<1, and ρ2<β1\rho_{2}<\beta-1, we are guaranteed that AjAj12(S+2)A_{j}-A_{j-1}\geq 2(S+2) and δj,2>0\delta_{j,2}>0 for large jj, so the following is well-defined:

(101) J1=min{j:AjAj12(S+2),δj,2>0jj}.J_{1}^{\star}=\min\left\{j\in\mathbb{N}:A_{j^{\prime}}-A_{j^{\prime}-1}\geq 2(S+2),\delta_{j^{\prime},2}>0\ \forall\ j^{\prime}\geq j\right\}.

Also note J12J_{1}^{\star}\geq 2 (since A1A0=1A_{1}-A_{0}=1). Next, recall from Section 7.1 that

(102) Ξj,1(i)={Bj(i)Gδj,1(Sj(i))},Ξj,2(i)={minwGδj,2(Sj(i))Tw(i)(Aj)κj},Ξj(i)=Ξj,1(i)Ξj,2(i).\displaystyle\Xi_{j,1}^{(i)}=\left\{B_{j}^{(i)}\notin G_{\delta_{j,1}}(S_{j}^{(i)})\right\},\quad\Xi_{j,2}^{(i)}=\left\{\min_{w\in G_{\delta_{j,2}}(S_{j}^{(i)})}T_{w}^{(i)}(A_{j})\leq\kappa_{j}\right\},\quad\Xi_{j}^{(i)}=\Xi_{j,1}^{(i)}\cup\Xi_{j,2}^{(i)}.

Hence, if we let 𝒮(i)={W[K]:|W|=S+2,S^(i)W}\mathcal{S}^{(i)}=\{W\subset[K]:|W|=S+2,\hat{S}^{(i)}\subset W\} denote the possible active sets for agent ii (i.e., Sj(i)𝒮(i)S_{j}^{(i)}\in\mathcal{S}^{(i)} for any phase jj), we can write

(103) Ξj(i)=W𝒮(i)((Ξj,1(i){Sj(i)=W})((Ξj,1(i))CΞj,2(i){Sj(i)=W})).\Xi_{j}^{(i)}=\cup_{W\in\mathcal{S}^{(i)}}((\Xi_{j,1}^{(i)}\cap\{S_{j}^{(i)}=W\})\cup((\Xi_{j,1}^{(i)})^{C}\cap\Xi_{j,2}^{(i)}\cap\{S_{j}^{(i)}=W\})).

Consequently, by the union bound, we obtain

(104) (Ξj(i))W𝒮(i)((Ξj,1(i){Sj(i)=W})+((Ξj,1(i))CΞj,2(i){Sj(i)=W})).\mathbb{P}(\Xi_{j}^{(i)})\leq\sum_{W\in\mathcal{S}^{(i)}}\left(\mathbb{P}(\Xi_{j,1}^{(i)}\cap\{S_{j}^{(i)}=W\})+\mathbb{P}((\Xi_{j,1}^{(i)})^{C}\cap\Xi_{j,2}^{(i)}\cap\{S_{j}^{(i)}=W\})\right).

The next two claims bound the two summands on the right side.

Claim 11.

Under the assumptions of Theorem 2, for any i[n]i\in[n], jJ1j\geq J_{1}^{\star}, and W𝒮(i)W\in\mathcal{S}^{(i)}, we have

(105) (Ξj,1(i){Sj(i)=W})4S(j1)β(32α)/(2α3).\mathbb{P}(\Xi_{j,1}^{(i)}\cap\{S_{j}^{(i)}=W\})\leq 4S(j-1)^{\beta(3-2\alpha)}/(2\alpha-3).
Proof.

If WGδj,1(W)=W\setminus G_{\delta_{j,1}}(W)=\emptyset, the claim is immediate. Otherwise, Ξj,1(i){Sj(i)=W}\Xi_{j,1}^{(i)}\cap\{S_{j}^{(i)}=W\} implies Bj(i)=wB_{j}^{(i)}=w for some wWGδj,1(W)w\in W\setminus G_{\delta_{j,1}}(W). Thus, by the union bound,

(106) (Ξj,1(i){Sj(i)=W})wWGδj,1(W)(Bj(i)=w,minWSj(i)).\mathbb{P}(\Xi_{j,1}^{(i)}\cap\{S_{j}^{(i)}=W\})\leq\sum_{w\in W\setminus G_{\delta_{j,1}}(W)}\mathbb{P}(B_{j}^{(i)}=w,\min W\in S_{j}^{(i)}).

Fix wWGδj,1(W)w\in W\setminus G_{\delta_{j,1}}(W). Then Bj(i)=wB_{j}^{(i)}=w implies Tw(i)(Aj)Tw(i)(Aj1)(AjAj1)/(S+2)T_{w}^{(i)}(A_{j})-T_{w}^{(i)}(A_{j-1})\geq(A_{j}-A_{j-1})/(S+2) (else, by definition of Bj(i)B_{j}^{(i)}, kSj(i)(Tk(i)(Aj)Tk(i)(Aj1))<AjAj1\sum_{k\in S_{j}^{(i)}}(T_{k}^{(i)}(A_{j})-T_{k}^{(i)}(A_{j-1}))<A_{j}-A_{j-1}). Since AjAj1S+2A_{j}-A_{j-1}\geq S+2 (by jJ1j\geq J_{1}^{\star}), we conclude Tw(i)(Aj)Tw(i)(Aj1)1T_{w}^{(i)}(A_{j})-T_{w}^{(i)}(A_{j-1})\geq 1, so there exists t{1+Aj1,,Aj}t\in\{1+A_{j-1},\ldots,A_{j}\} such that

(107) Tw(i)(t1)Tw(i)(Aj1)=Tw(i)(Aj)Tw(i)(Aj1)1,It(i)=w.T_{w}^{(i)}(t-1)-T_{w}^{(i)}(A_{j-1})=T_{w}^{(i)}(A_{j})-T_{w}^{(i)}(A_{j-1})-1,\quad I_{t}^{(i)}=w.

Combining these observations with the union bound and the fact that Tw(i)(Aj1)0T_{w}^{(i)}(A_{j-1})\geq 0 by definition, we obtain

(108) (Bj(i)=w,minWSj(i))t=1+Aj1Aj(Tw(i)(t1)AjAj1S+21,minWSj(i),It(i)=w).\displaystyle\mathbb{P}(B_{j}^{(i)}=w,\min W\in S_{j}^{(i)})\leq\sum_{t=1+A_{j-1}}^{A_{j}}\mathbb{P}\left(T_{w}^{(i)}(t-1)\geq\frac{A_{j}-A_{j-1}}{S+2}-1,\min W\in S_{j}^{(i)},I_{t}^{(i)}=w\right).

Now fix tt as in the summation. Observe that since wWGδj,1(W)w\in W\setminus G_{\delta_{j,1}}(W) and jJ1j\geq J_{1}^{\star}, we have

μminWμw>δj,1=4αlogAjAjAj1S+214αlogtAjAj1S+21.\mu_{\min W}-\mu_{w}>\delta_{j,1}=\sqrt{\frac{4\alpha\log A_{j}}{\frac{A_{j}-A_{j-1}}{S+2}-1}}\geq\sqrt{\frac{4\alpha\log t}{\frac{A_{j}-A_{j-1}}{S+2}-1}}.

Therefore, for any such tt, we can apply a basic bandit tail (namely, Corollary 5 from Appendix F.2 with the parameters k1=wk_{1}=w, k2=minWk_{2}=\min W, and =(AjAj1)/(S+2)1\ell=(A_{j}-A_{j-1})/(S+2)-1) to obtain

(Tw(i)(t1)AjAj1S+21,minWSj(i),It(i)=w)2t2(1α).\mathbb{P}\left(T_{w}^{(i)}(t-1)\geq\frac{A_{j}-A_{j-1}}{S+2}-1,\min W\in S_{j}^{(i)},I_{t}^{(i)}=w\right)\leq 2t^{2(1-\alpha)}.

Substituting into (108) and using Claim 19 from Appendix F.1 (which applies since α>2\alpha>2), we obtain

(Bj(i)=w,minWSj(i))2t=1+Aj1t2(1α)2Aj132α2α32(j1)β(32α)2α3.\mathbb{P}(B_{j}^{(i)}=w,\min W\in S_{j}^{(i)})\leq 2\sum_{t=1+A_{j-1}}^{\infty}t^{2(1-\alpha)}\leq\frac{2A_{j-1}^{3-2\alpha}}{2\alpha-3}\leq\frac{2(j-1)^{\beta(3-2\alpha)}}{2\alpha-3}.

Substituting into (106) and using |WGδj,1(W)||W|1=S+12S|W\setminus G_{\delta_{j,1}}(W)|\leq|W|-1=S+1\leq 2S completes the proof. ∎
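The series bound from Claim 19 invoked above, t>At2(1α)A32α/(2α3)\sum_{t>A}t^{2(1-\alpha)}\leq A^{3-2\alpha}/(2\alpha-3) for α>2\alpha>2, can be spot-checked with a truncated sum; the values of α\alpha and AA below are hypothetical, and truncation only shrinks the left side.

```python
# Truncated check of sum_{t > A} t^(2(1-alpha)) <= A^(3-2alpha) / (2*alpha - 3).
alpha, A, cutoff = 2.5, 10, 200_000
tail = sum(t ** (2 * (1 - alpha)) for t in range(A + 1, cutoff + 1))
bound = A ** (3 - 2 * alpha) / (2 * alpha - 3)
```

With α=2.5\alpha=2.5 the summand is t3t^{-3}, the truncated tail is about 0.0045, and the bound 102/210^{-2}/2 holds with room to spare.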

Claim 12.

Under the assumptions of Theorem 2, for any i[n]i\in[n], jJ1j\geq J_{1}^{\star}, and W𝒮(i)W\in\mathcal{S}^{(i)},

(109) ((Ξj,1(i))CΞj,2(i){Sj(i)=W})62βS(j1)β(32α)/(2α3).\mathbb{P}((\Xi_{j,1}^{(i)})^{C}\cap\Xi_{j,2}^{(i)}\cap\{S_{j}^{(i)}=W\})\leq 6\cdot 2^{\beta}S(j-1)^{\beta(3-2\alpha)}/(2\alpha-3).
Proof.

By definition, we have

(110) (Ξj,1(i))CΞj,2(i){Sj(i)=W}={Bj(i)Gδj,1(W),minwGδj,2(W)Tw(i)(Aj)κj,Sj(i)=W}.\displaystyle(\Xi_{j,1}^{(i)})^{C}\cap\Xi_{j,2}^{(i)}\cap\{S_{j}^{(i)}=W\}=\left\{B_{j}^{(i)}\in G_{\delta_{j,1}}(W),\min_{w\in G_{\delta_{j,2}}(W)}T_{w}^{(i)}(A_{j})\leq\kappa_{j},S_{j}^{(i)}=W\right\}.

As in the proof of Claim 11, we know that

(111) TBj(i)(i)(Aj)TBj(i)(i)(Aj)TBj(i)(i)(Aj1)AjAj1S+2>(AjAj1S+21)1>κj,T_{B_{j}^{(i)}}^{(i)}(A_{j})\geq T_{B_{j}^{(i)}}^{(i)}(A_{j})-T_{B_{j}^{(i)}}^{(i)}(A_{j-1})\geq\frac{A_{j}-A_{j-1}}{S+2}>\left(\frac{A_{j}-A_{j-1}}{S+2}-1\right)\vee 1>\kappa_{j},

where the final inequality holds since δj,2>0\delta_{j,2}>0 by assumption jJ1j\geq J_{1}^{\star}, which implies

(112) (AjAj1S+21)1κj>(21ψ)2>1.\frac{(\frac{A_{j}-A_{j-1}}{S+2}-1)\vee 1}{\kappa_{j}}>\left(\frac{2}{1-\psi}\right)^{2}>1.

Thus, (Ξj,1(i))CΞj,2(i){Sj(i)=W}(\Xi_{j,1}^{(i)})^{C}\cap\Xi_{j,2}^{(i)}\cap\{S_{j}^{(i)}=W\} implies Bj(i)argminwGδj,2(W)Tw(i)(Aj)B_{j}^{(i)}\notin\operatorname*{arg\,min}_{w\in G_{\delta_{j,2}}(W)}T_{w}^{(i)}(A_{j}), so by the union bound,

(113) ((Ξj,1(i))CΞj,2(i){Sj(i)=W})\displaystyle\mathbb{P}((\Xi_{j,1}^{(i)})^{C}\cap\Xi_{j,2}^{(i)}\cap\{S_{j}^{(i)}=W\})
(114) w1Gδj,1(W),w2Gδj,2(W){w1}(Tw1(i)(Aj)Tw1(i)(Aj1)AjAj1S+2,Tw2(i)(Aj)κj,Sj(i)=W).\displaystyle\quad\leq\sum_{w_{1}\in G_{\delta_{j,1}}(W),w_{2}\in G_{\delta_{j,2}}(W)\setminus\{w_{1}\}}\mathbb{P}\left(T_{w_{1}}^{(i)}(A_{j})-T_{w_{1}}^{(i)}(A_{j-1})\geq\frac{A_{j}-A_{j-1}}{S+2},T_{w_{2}}^{(i)}(A_{j})\leq\kappa_{j},S_{j}^{(i)}=W\right).

Now fix w1,w2w_{1},w_{2} as in the double summation. Then similar to the proof of Claim 11,

(115) (Tw1(i)(Aj)Tw1(i)(Aj1)AjAj1S+2,Tw2(i)(Aj)κj,Sj(i)=W)\displaystyle\mathbb{P}\left(T_{w_{1}}^{(i)}(A_{j})-T_{w_{1}}^{(i)}(A_{j-1})\geq\frac{A_{j}-A_{j-1}}{S+2},T_{w_{2}}^{(i)}(A_{j})\leq\kappa_{j},S_{j}^{(i)}=W\right)
(116) t=1+Aj1Aj(Tw1(i)(t1)AjAj1S+21,Tw2(i)(Aj)κj,w2Sj(i),It(i)=w1)\displaystyle\quad\leq\sum_{t=1+A_{j-1}}^{A_{j}}\mathbb{P}\left(T_{w_{1}}^{(i)}(t-1)\geq\frac{A_{j}-A_{j-1}}{S+2}-1,T_{w_{2}}^{(i)}(A_{j})\leq\kappa_{j},w_{2}\in S_{j}^{(i)},I_{t}^{(i)}=w_{1}\right)
(117) t=1+Aj1Aj2κjt12αψ2=2t=1+Aj1Aj(κjtρ2/β)t22α,\displaystyle\quad\leq\sum_{t=1+A_{j-1}}^{A_{j}}2\kappa_{j}t^{1-2\alpha\psi^{2}}=2\sum_{t=1+A_{j-1}}^{A_{j}}(\kappa_{j}t^{-\rho_{2}/\beta})t^{2-2\alpha},

where the second bound follows from applying Claim 21 from Appendix F.2 with k1=w1k_{1}=w_{1}, k2=w2k_{2}=w_{2}, =(AjAj1)/(S+2)1\ell=(A_{j}-A_{j-1})/(S+2)-1, u=κju=\kappa_{j}, and ι=ψ\iota=\psi; note this claim applies since by assumption jJ1j\geq J_{1}^{\star},

μw2μw1μw2μminWδj,2αlogt(2AjAj1S+211ψκj).\mu_{w_{2}}-\mu_{w_{1}}\geq\mu_{w_{2}}-\mu_{\min W}\geq-\delta_{j,2}\geq\sqrt{\alpha\log t}\left(\frac{2}{\sqrt{\frac{A_{j}-A_{j-1}}{S+2}-1}}-\frac{1-\psi}{\sqrt{\kappa_{j}}}\right).

Next, observe that for any tAj1(j1)βt\geq A_{j-1}\geq(j-1)^{\beta}, by definition of κj\kappa_{j}, jJ12j\geq J_{1}^{\star}\geq 2, and ρ2<β1\rho_{2}<\beta-1,

κjtρ2/βκj(j1)ρ2=(11/j)ρ2/(K2S)2β1/S.\kappa_{j}t^{-\rho_{2}/\beta}\leq\kappa_{j}(j-1)^{-\rho_{2}}=(1-1/j)^{-\rho_{2}}/(K^{2}S)\leq 2^{\beta-1}/S.

Similar to the proof of Claim 11, we can then use Claim 19 to obtain

2t=1+Aj1Aj(κjtρ2/β)t22α2βSt=1+Aj1Ajt22α2β(j1)β(32α)S(2α3).2\sum_{t=1+A_{j-1}}^{A_{j}}(\kappa_{j}t^{-\rho_{2}/\beta})t^{2-2\alpha}\leq\frac{2^{\beta}}{S}\sum_{t=1+A_{j-1}}^{A_{j}}t^{2-2\alpha}\leq\frac{2^{\beta}(j-1)^{\beta(3-2\alpha)}}{S(2\alpha-3)}.

Combining with (113) and (115) completes the proof, since

|Gδj,1(W)||Gδj,2(W){w1}|<(S+2)(S+1)(3S)(2S)=6S2.\displaystyle|G_{\delta_{j,1}}(W)||G_{\delta_{j,2}}(W)\setminus\{w_{1}\}|<(S+2)(S+1)\leq(3S)(2S)=6S^{2}.\qed

Finally, we provide the tail for τarm=inf{j:𝟙(Ξj(i))=0i[n],jj}\tau_{\text{arm}}=\inf\{j\in\mathbb{N}:\mathbbm{1}(\Xi_{j^{\prime}}^{(i)})=0\ \forall\ i\in[n],j^{\prime}\geq j\}.

Lemma 9.

Under the assumptions of Theorem 2, for any jJ13j\geq J_{1}^{\star}\vee 3,

(118) (τarm>j)(6β+2)nK2S(j2)β(32α)+1(2α3)(β(2α3)1).\mathbb{P}(\tau_{\text{arm}}>j)\leq\frac{(6^{\beta}+2)nK^{2}S(j-2)^{\beta(3-2\alpha)+1}}{(2\alpha-3)(\beta(2\alpha-3)-1)}.
Proof.

By (104), Claims 11 and 12, |𝒮(i)|=(K2)<K22|\mathcal{S}^{(i)}|=\binom{K}{2}<\frac{K^{2}}{2}, and β1\beta\geq 1, we can write

(119) (Ξj(i))(62β+4)K2S(j1)β(32α)2(2α3)(6β+2)K2S(j1)β(32α)2α3.\mathbb{P}(\Xi_{j^{\prime}}^{(i)})\leq\frac{(6\cdot 2^{\beta}+4)K^{2}S(j^{\prime}-1)^{\beta(3-2\alpha)}}{2(2\alpha-3)}\leq\frac{(6^{\beta}+2)K^{2}S(j^{\prime}-1)^{\beta(3-2\alpha)}}{2\alpha-3}.

Thus, because τarm>j\tau_{\text{arm}}>j implies that Ξj(i)\Xi_{j^{\prime}}^{(i)} occurs for some i[n]i\in[n] and jjj^{\prime}\geq j, the union bound gives

(120) (τarm>j)j=ji=1n(Ξj(i))(6β+2)nK2S2α3j=j(j1)β(32α).\mathbb{P}(\tau_{\text{arm}}>j)\leq\sum_{j^{\prime}=j}^{\infty}\sum_{i=1}^{n}\mathbb{P}(\Xi_{j^{\prime}}^{(i)})\leq\frac{(6^{\beta}+2)nK^{2}S}{2\alpha-3}\sum_{j^{\prime}=j}^{\infty}(j^{\prime}-1)^{\beta(3-2\alpha)}.

Finally, use Claim 19 (which applies since β(2α3)>1\beta(2\alpha-3)>1) to bound the sum. ∎
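The tail sum at the end of Lemma 9 is controlled by an integral comparison (via Claim 19, which is not restated here). The following sketch numerically verifies that comparison, namely that for c>1c>1 and j3j\geq 3, j=j(j1)c(j2)1c/(c1)\sum_{j^{\prime}=j}^{\infty}(j^{\prime}-1)^{-c}\leq(j-2)^{1-c}/(c-1), which is the source of the (j2)β(32α)+1/(β(2α3)1)(j-2)^{\beta(3-2\alpha)+1}/(\beta(2\alpha-3)-1) factor in (118). The parameter values are illustrative only.

```python
# Sanity check for the tail-sum bound used at the end of Lemma 9. Claim 19
# is not restated here; we only verify the integral-comparison inequality
# it encodes: for c > 1 and j >= 3,
#     sum_{j'=j}^{infty} (j'-1)^{-c}  <=  (j-2)^{1-c} / (c - 1).
# The values of c and j below are illustrative, not from the paper.

def tail_sum(j, c, terms=200000):
    """Truncated approximation of sum_{j'=j}^infty (j'-1)^(-c)."""
    return sum((jp - 1) ** (-c) for jp in range(j, j + terms))

def integral_bound(j, c):
    """Closed-form bound from comparing the sum with an integral."""
    return (j - 2) ** (1 - c) / (c - 1)

for c in (1.5, 2.0, 4.0):      # c plays the role of beta*(2*alpha - 3) > 1
    for j in (3, 5, 20):
        assert tail_sum(j, c) <= integral_bound(j, c)
```

The comparison holds because k^{-c} is decreasing, so each term kck^{-c} is at most k1kxcdx\int_{k-1}^{k}x^{-c}\,dx.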

E.2. Details from Section 7.2

Recall θj=(j/3)ρ1\theta_{j}=(j/3)^{\rho_{1}}, where ρ1(0,1/η]\rho_{1}\in(0,1/\eta] and η>1\eta>1. Hence, for all large jj, we have

(121) 1θjj2,θjηθjη+1(j/3)+1(j2)j/3.1\leq\lfloor\theta_{j}\rfloor\leq j-2,\quad\lceil\lfloor\theta_{j}\rfloor^{\eta}\rceil\leq\theta_{j}^{\eta}+1\leq(j/3)+1\leq(j-2)-j/3.

Thus, the following is well-defined:

(122) J2=min{j:1θjj2,j/3(j2)θjηjj}.J_{2}^{\star}=\min\left\{j\in\mathbb{N}:1\leq\lfloor\theta_{j^{\prime}}\rfloor\leq j^{\prime}-2,j^{\prime}/3\leq(j^{\prime}-2)-\lceil\lfloor\theta_{j^{\prime}}\rfloor^{\eta}\rceil\ \forall\ j^{\prime}\geq j\right\}.

Now recall from Section 7.2 that Ξj(ii)=j=θjj2{Hj(i)i}\Xi_{j}^{(i\rightarrow i^{\prime})}=\cap_{j^{\prime}=\lfloor\theta_{j}\rfloor}^{j-2}\{H_{j^{\prime}}^{(i^{\prime})}\neq i\}, and

τcom=inf{j:𝟙((i,i)EhonΞj(ii))=0j{j,j+1,}}.\tau_{\text{com}}=\inf\{j\in\mathbb{N}:\mathbbm{1}(\cup_{(i,i^{\prime})\in E_{\text{hon}}}\Xi_{j^{\prime}}^{(i\rightarrow i^{\prime})})=0\ \forall\ j^{\prime}\in\{j,j+1,\ldots\}\}.

The next lemma provides a tail bound for this random phase.

Lemma 10.

Under the assumptions of Theorem 2, for any jJ2j\geq J_{2}^{\star},

(123) (τcom>j)3(n+m)3exp(j/(3d¯)).\mathbb{P}(\tau_{\text{com}}>j)\leq 3(n+m)^{3}\exp(-j/(3\bar{d})).
Proof.

We first use the union bound to write

(124) (τcom>j)j=jiiEhon(Ξj(ii)).\displaystyle\mathbb{P}(\tau_{\text{com}}>j)\leq\sum_{j^{\prime}=j}^{\infty}\sum_{i\rightarrow i^{\prime}\in E_{\text{hon}}}\mathbb{P}(\Xi_{j^{\prime}}^{(i\rightarrow i^{\prime})}).

Fix iiEhoni\rightarrow i^{\prime}\in E_{\text{hon}} and jjj^{\prime}\geq j. Suppose Ξj(ii)\Xi_{j^{\prime}}^{(i\rightarrow i^{\prime})} holds. Then iPj′′(i)Pj′′1(i)j′′{θj+1,,j1}i\notin P_{j^{\prime\prime}}^{(i^{\prime})}\setminus P_{j^{\prime\prime}-1}^{(i^{\prime})}\ \forall\ j^{\prime\prime}\in\{\lfloor\theta_{j^{\prime}}\rfloor+1,\ldots,j^{\prime}-1\}; else, we can find j′′{θj+1,,j1}j^{\prime\prime}\in\{\lfloor\theta_{j^{\prime}}\rfloor+1,\ldots,j^{\prime}-1\} such that Hj′′1(i)=iH_{j^{\prime\prime}-1}^{(i^{\prime})}=i (i.e., Hj′′(i)=iH_{j^{\prime\prime}}^{(i^{\prime})}=i for some j′′{θj,,j2}j^{\prime\prime}\in\{\lfloor\theta_{j^{\prime}}\rfloor,\ldots,j^{\prime}-2\}), contradicting Ξj(ii)\Xi_{j^{\prime}}^{(i\rightarrow i^{\prime})}. Hence, we have two cases: iPj′′(i)Pj′′1(i)j′′[j1]i\notin P_{j^{\prime\prime}}^{(i^{\prime})}\setminus P_{j^{\prime\prime}-1}^{(i^{\prime})}\ \forall\ j^{\prime\prime}\in[j^{\prime}-1], or iPj′′(i)Pj′′1(i)i\in P_{j^{\prime\prime}}^{(i^{\prime})}\setminus P_{j^{\prime\prime}-1}^{(i^{\prime})} for some j′′[j1]j^{\prime\prime}\in[j^{\prime}-1] and max{j′′[j1]:iPj′′(i)Pj′′1(i)}θj\max\{j^{\prime\prime}\in[j^{\prime}-1]:i\in P_{j^{\prime\prime}}^{(i^{\prime})}\setminus P_{j^{\prime\prime}-1}^{(i^{\prime})}\}\leq\lfloor\theta_{j^{\prime}}\rfloor. In the former case, iPj′′(i)j′′[j1]i\notin P_{j^{\prime\prime}}^{(i^{\prime})}\ \forall\ j^{\prime\prime}\in[j^{\prime}-1]; in the latter, iPj′′(i)j′′{θjη+1,,j1}i\notin P_{j^{\prime\prime}}^{(i^{\prime})}\ \forall\ j^{\prime\prime}\in\{\lceil\lfloor\theta_{j^{\prime}}\rfloor^{\eta}\rceil+1,\ldots,j^{\prime}-1\}. Thus,

(125) (Ξj(ii))\displaystyle\mathbb{P}(\Xi_{j^{\prime}}^{(i\rightarrow i^{\prime})}) (j′′=θjη+1j2{iPj′′(i),Hj′′(i)i})\displaystyle\leq\mathbb{P}(\cap_{j^{\prime\prime}=\lceil\lfloor\theta_{j^{\prime}}\rfloor^{\eta}\rceil+1}^{j^{\prime}-2}\{i\notin P_{j^{\prime\prime}}^{(i^{\prime})},H_{j^{\prime\prime}}^{(i^{\prime})}\neq i\})
(126) =𝔼[𝟙(j′′=θjη+1j3{iPj′′(i),Hj′′(i)i})𝟙(iPj2(i))j2(Hj2(i)i)].\displaystyle=\mathbb{E}[\mathbbm{1}(\cap_{j^{\prime\prime}=\lceil\lfloor\theta_{j^{\prime}}\rfloor^{\eta}\rceil+1}^{j^{\prime}-3}\{i\notin P_{j^{\prime\prime}}^{(i^{\prime})},H_{j^{\prime\prime}}^{(i^{\prime})}\neq i\})\mathbbm{1}(i\notin P_{j^{\prime}-2}^{(i^{\prime})})\mathbb{P}_{j^{\prime}-2}(H_{j^{\prime}-2}^{(i^{\prime})}\neq i)].

Now given that iPj2(i)i\notin P_{j^{\prime}-2}^{(i^{\prime})}, Hj2(i)H_{j^{\prime}-2}^{(i^{\prime})} is sampled uniformly from a set of at most d¯\bar{d} elements that includes ii, so j2(Hj2(i)i)(11/d¯)\mathbb{P}_{j^{\prime}-2}(H_{j^{\prime}-2}^{(i^{\prime})}\neq i)\leq(1-1/\bar{d}). Substituting above and iterating yields

(127) (Ξj(ii))(11/d¯)j2θjη(11/d¯)j/3,\mathbb{P}(\Xi_{j^{\prime}}^{(i\rightarrow i^{\prime})})\leq(1-1/\bar{d})^{j^{\prime}-2-\lceil\lfloor\theta_{j^{\prime}}\rfloor^{\eta}\rceil}\leq(1-1/\bar{d})^{j^{\prime}/3},

where the final inequality uses jjJ2j^{\prime}\geq j\geq J_{2}^{\star}. Combining (124) and (127) and computing a geometric series, we obtain

(128) (τcom>j)|Ehon|(11/d¯)j/31(11/d¯)1/3.\mathbb{P}(\tau_{\text{com}}>j)\leq\frac{|E_{\text{hon}}|(1-1/\bar{d})^{j/3}}{1-(1-1/\bar{d})^{1/3}}.

Finally, using |Ehon|n2<(n+m)2|E_{\text{hon}}|\leq n^{2}<(n+m)^{2}, 1xexx1-x\leq e^{-x}\ \forall\ x\in\mathbb{R}, (1+x)r1+rx(1+x)^{r}\leq 1+rx for any r(0,1)r\in(0,1) and x1x\geq-1, and d¯m+n\bar{d}\leq m+n, we obtain the desired bound. ∎
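The last step of Lemma 10 rests on two elementary inequalities: (1+x)r1+rx(1+x)^{r}\leq 1+rx gives 1(11/d¯)1/31/(3d¯)1-(1-1/\bar{d})^{1/3}\geq 1/(3\bar{d}), and 1xex1-x\leq e^{-x} gives (11/d¯)j/3ej/(3d¯)(1-1/\bar{d})^{j/3}\leq e^{-j/(3\bar{d})}. The following sketch checks both numerically; the values of d¯\bar{d} and jj are illustrative only.

```python
# Numerical sanity check for the last step of Lemma 10. The two elementary
# facts used there are (1+x)^r <= 1 + r*x (for r in (0,1), x >= -1) and
# 1 - x <= exp(-x); together they turn the geometric-series bound into
# 3*(n+m)^3 * exp(-j/(3*dbar)). Parameter values are illustrative only.
import math

for dbar in (2, 5, 50):
    # (1 - 1/dbar)^(1/3) <= 1 - 1/(3*dbar), so the denominator in (128)
    # is at least 1/(3*dbar).
    assert (1 - 1 / dbar) ** (1 / 3) <= 1 - 1 / (3 * dbar)
    for j in (1, 10, 100):
        # (1 - 1/dbar)^(j/3) <= exp(-j/(3*dbar)), via 1 - x <= e^{-x}.
        assert (1 - 1 / dbar) ** (j / 3) <= math.exp(-j / (3 * dbar)) + 1e-12
```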

E.3. Details from Section 7.3

We begin with some intermediate claims.

Claim 13.

If the assumptions of Theorem 2 hold, then for any i[n]i\in[n] and jτarmj\geq\tau_{\text{arm}}, we have

μminSj(i)μBj(i)+δj,1μminSj+1(i)+δj,1.\mu_{\min S_{j}^{(i)}}\leq\mu_{B_{j}^{(i)}}+\delta_{j,1}\leq\mu_{\min S_{j+1}^{(i)}}+\delta_{j,1}.
Proof.

The first inequality holds by definition of τarm\tau_{\text{arm}} and assumption jτarmj\geq\tau_{\text{arm}}. The second holds since minSj+1(i)\min S_{j+1}^{(i)} is the best arm in Sj+1(i)S_{j+1}^{(i)} and Bj(i)Sj+1(i)B_{j}^{(i)}\in S_{j+1}^{(i)} in the algorithm. ∎

Claim 14.

If the assumptions of Theorem 2 hold, then for any i[n]i\in[n] and jjτarmj^{\prime}\geq j\geq\tau_{\text{arm}},

(129) μminSj(i)μminSj(i)(K1)supj′′{j,,j}δj′′,1.\mu_{\min S_{j^{\prime}}^{(i)}}\geq\mu_{\min S_{j}^{(i)}}-(K-1)\sup_{j^{\prime\prime}\in\{j,\ldots,j^{\prime}\}}\delta_{j^{\prime\prime},1}.
Proof.

If j=jj=j^{\prime} or μminSj(i)μminSj(i)\mu_{\min S_{j^{\prime}}^{(i)}}\geq\mu_{\min S_{j}^{(i)}}, the bound is immediate, so we assume j>jj^{\prime}>j and μminSj(i)<μminSj(i)\mu_{\min S_{j^{\prime}}^{(i)}}<\mu_{\min S_{j}^{(i)}} for the remainder of the proof. Under this assumption, there must exist a phase j′′{j+1,,j}j^{\prime\prime}\in\{j+1,\ldots,j^{\prime}\} at which the mean of the best active arm reaches a new strict minimum since phase jj, i.e., μminSj′′(i)<μminSj(i)\mu_{\min S_{j^{\prime\prime}}^{(i)}}<\mu_{\min S_{j}^{(i)}}. Let mm denote the number of phases at which this occurs and j(1),,j(m)j^{(1)},\ldots,j^{(m)} denote these (ordered) phases; formally,

(130) j(0)=j,j(l)=min{j′′{j(l1)+1,,j}:μminSj′′(i)<μminSj(l1)(i)}l[m].\displaystyle j^{(0)}=j,\quad j^{(l)}=\min\left\{j^{\prime\prime}\in\{j^{(l-1)}+1,\ldots,j^{\prime}\}:\mu_{\min S_{j^{\prime\prime}}^{(i)}}<\mu_{\min S_{j^{(l-1)}}^{(i)}}\right\}\ \forall\ l\in[m].

The remainder of the proof relies on the following three inequalities:

(131) mK1,μminSj(m)(i)μminSj(i),μminSj(l1)(i)μminSj(l)1(i)l[m].m\leq K-1,\quad\mu_{\min S_{j^{(m)}}^{(i)}}\leq\mu_{\min S_{j^{\prime}}^{(i)}},\quad\mu_{\min S_{j^{(l-1)}}^{(i)}}\leq\mu_{\min S_{j^{(l)}-1}^{(i)}}\ \forall\ l\in[m].

The first inequality holds since μminSj(0)(i)>>μminSj(m)(i)\mu_{\min S_{j^{(0)}}^{(i)}}>\cdots>\mu_{\min S_{j^{(m)}}^{(i)}} by definition, so minSj(0)(i),,minSj(m)(i)\min S_{j^{(0)}}^{(i)},\ldots,\min S_{j^{(m)}}^{(i)} are distinct arms; since there are m+1m+1 of these arms and KK in total, m+1Km+1\leq K. For the second, we have μminSj(m)(i)=μminSj(i)\mu_{\min S_{j^{(m)}}^{(i)}}=\mu_{\min S_{j^{\prime}}^{(i)}} when j(m)=jj^{(m)}=j^{\prime} and μminSj(m)(i)μminSj(i)\mu_{\min S_{j^{(m)}}^{(i)}}\leq\mu_{\min S_{j^{\prime}}^{(i)}} when j(m)<jj^{(m)}<j^{\prime} (if the latter fails, we contradict the definition of mm). For the third, note j(l)j(l1)+1j^{(l)}\geq j^{(l-1)}+1 by construction, so if j(l)=j(l1)+1j^{(l)}=j^{(l-1)}+1, the bound holds with equality; else, j(l)1j(l1)+1j^{(l)}-1\geq j^{(l-1)}+1, so if the bound fails,

(132) j(l)1{j′′{j(l1)+1,,j}:μminSj′′(i)<μminSj(l1)(i)},j^{(l)}-1\in\left\{j^{\prime\prime}\in\{j^{(l-1)}+1,\ldots,j^{\prime}\}:\mu_{\min S_{j^{\prime\prime}}^{(i)}}<\mu_{\min S_{j^{(l-1)}}^{(i)}}\right\},

which is a contradiction, since j(l)j^{(l)} is the minimal element of the set at right. Hence, (131) holds. Combined with Claim 13 (note j(l)1j(0)=jτarml[m]j^{(l)}-1\geq j^{(0)}=j\geq\tau_{\text{arm}}\ \forall\ l\in[m], as required), we obtain

(133) μminSj(i)μminSj(i)\displaystyle\mu_{\min S_{j}^{(i)}}-\mu_{\min S_{j^{\prime}}^{(i)}} =l=1m(μminSj(l1)(i)μminSj(l)(i))+μminSj(m)(i)μminSj(i)\displaystyle=\sum_{l=1}^{m}\left(\mu_{\min S_{j^{(l-1)}}^{(i)}}-\mu_{\min S_{j^{(l)}}^{(i)}}\right)+\mu_{\min S_{j^{(m)}}^{(i)}}-\mu_{\min S_{j^{\prime}}^{(i)}}
(134) l=1m(μminSj(l)1(i)μminSj(l)(i))l=1mδj(l)1,1(K1)supj′′{j,,j}δj′′,1,\displaystyle\leq\sum_{l=1}^{m}\left(\mu_{\min S_{j^{(l)}-1}^{(i)}}-\mu_{\min S_{j^{(l)}}^{(i)}}\right)\leq\sum_{l=1}^{m}\delta_{j^{(l)}-1,1}\leq(K-1)\sup_{j^{\prime\prime}\in\{j,\ldots,j^{\prime}\}}\delta_{j^{\prime\prime},1},

where the last inequality uses j=j(0)j(l)1<jl[m]j=j^{(0)}\leq j^{(l)}-1<j^{\prime}\ \forall\ l\in[m]. ∎

As a simple corollary of the previous two claims, we have the following.

Corollary 3.

If the assumptions of Theorem 2 hold, then for any i[n]i\in[n] and jjτarmj^{\prime}\geq j\geq\tau_{\text{arm}},

(135) μBj(i)μminSj(i)Ksupj′′{j,,j}δj′′,1.\mu_{B_{j^{\prime}}^{(i)}}\geq\mu_{\min S_{j}^{(i)}}-K\sup_{j^{\prime\prime}\in\{j,\ldots,j^{\prime}\}}\delta_{j^{\prime\prime},1}.
Proof.

Since jjτarmj^{\prime}\geq j\geq\tau_{\text{arm}}, we can use Claims 13 and 14, respectively, to obtain

(136) μBj(i)\displaystyle\mu_{B_{j^{\prime}}^{(i)}} μminSj(i)δj,1μminSj(i)supj′′{j,,j}δj′′,1μminSj(i)Ksupj′′{j,,j}δj′′,1.\displaystyle\geq\mu_{\min S_{j^{\prime}}^{(i)}}-\delta_{j^{\prime},1}\geq\mu_{\min S_{j^{\prime}}^{(i)}}-\sup_{j^{\prime\prime}\in\{j,\ldots,j^{\prime}\}}\delta_{j^{\prime\prime},1}\geq\mu_{\min S_{j}^{(i)}}-K\sup_{j^{\prime\prime}\in\{j,\ldots,j^{\prime}\}}\delta_{j^{\prime\prime},1}.\qed

Next, inspecting the analysis in Section 7.3, we see that δj,2(K+1)supj{θj,,j}δj,1\delta_{j,2}\geq(K+1)\sup_{j^{\prime}\in\{\lfloor\theta_{j}\rfloor,\ldots,j\}}\delta_{j^{\prime},1} for large jj. Thus, the following is well-defined:

(137) J3=min{j:δj,2(K+1)supj′′{θj,,j}δj′′,1jj}.J_{3}^{\star}=\min\left\{j\in\mathbb{N}:\delta_{j^{\prime},2}\geq(K+1)\sup_{j^{\prime\prime}\in\{\lfloor\theta_{j^{\prime}}\rfloor,\ldots,j^{\prime}\}}\delta_{j^{\prime\prime},1}\ \forall\ j^{\prime}\geq j\right\}.

As discussed in Section 7.3, we can now show that no new accidental blocking occurs at late phases, at least among pairs of honest agents that have recently communicated.

Claim 15.

Under the assumptions of Theorem 2, if jJ3j\geq J_{3}^{\star}, θjτarm\lfloor\theta_{j}\rfloor\geq\tau_{\text{arm}}, and Hj(i)=iH_{j^{\prime}}^{(i^{\prime})}=i for some i,i[n]i,i^{\prime}\in[n] and j{θj,,j2}j^{\prime}\in\{\lfloor\theta_{j}\rfloor,\ldots,j-2\}, then iPj(i)Pj1(i)i^{\prime}\notin P_{j}^{(i)}\setminus P_{j-1}^{(i)}.

Proof.

Suppose instead that iPj(i)Pj1(i)i^{\prime}\in P_{j}^{(i)}\setminus P_{j-1}^{(i)}. Then by the algorithm,

(138) Bj(i)==Bθj(i),TRj1(i)(i)(Aj)κj,Rj1(i)=Bj1(i)Sj(i).B_{j}^{(i)}=\cdots=B_{\lfloor\theta_{j}\rfloor}^{(i)},\quad T_{R_{j-1}^{(i)}}^{(i)}(A_{j})\leq\kappa_{j},\quad R_{j-1}^{(i)}=B_{j-1}^{(i^{\prime})}\in S_{j}^{(i)}.

Since jθjτarmj\geq\lfloor\theta_{j}\rfloor\geq\tau_{\text{arm}}, this implies Bj1(i)Gδj,2(Sj(i))B_{j-1}^{(i^{\prime})}\notin G_{\delta_{j,2}}(S_{j}^{(i)}). We then observe the following:

  • Since Bj1(i)Sj(i)Gδj,2(Sj(i))B_{j-1}^{(i^{\prime})}\in S_{j}^{(i)}\setminus G_{\delta_{j,2}}(S_{j}^{(i)}), the definition of Gδj,2(Sj(i))G_{\delta_{j,2}}(S_{j}^{(i)}) implies μBj1(i)<μminSj(i)δj,2\mu_{B_{j-1}^{(i^{\prime})}}<\mu_{\min S_{j}^{(i)}}-\delta_{j,2}.

  • Again using jτarmj\geq\tau_{\text{arm}}, Claim 13 implies μminSj(i)μBj(i)+δj,1\mu_{\min S_{j}^{(i)}}\leq\mu_{B_{j}^{(i)}}+\delta_{j,1}.

  • Since θjjj\lfloor\theta_{j}\rfloor\leq j^{\prime}\leq j, (138) implies Bj(i)=Bj(i)B_{j}^{(i)}=B_{j^{\prime}}^{(i)}, so μBj(i)=μBj(i)\mu_{B_{j}^{(i)}}=\mu_{B_{j^{\prime}}^{(i)}}.

  • Since Hj(i)=iH_{j^{\prime}}^{(i^{\prime})}=i, the algorithm implies Bj(i)Sj+1(i)B_{j^{\prime}}^{(i)}\in S_{j^{\prime}+1}^{(i^{\prime})}, so μBj(i)μminSj+1(i)\mu_{B_{j^{\prime}}^{(i)}}\leq\mu_{\min S_{j^{\prime}+1}^{(i^{\prime})}}.

  • Since τarmθj<j+1<j\tau_{\text{arm}}\leq\lfloor\theta_{j}\rfloor<j^{\prime}+1<j, Corollary 3 implies μminSj+1(i)μBj1(i)+Ksupj′′{θj,,j}δj′′,1\mu_{\min S_{j^{\prime}+1}^{(i^{\prime})}}\leq\mu_{B_{j-1}^{(i^{\prime})}}+K\sup_{j^{\prime\prime}\in\{\lfloor\theta_{j}\rfloor,\ldots,j\}}\delta_{j^{\prime\prime},1}.

Stringing together these inequalities, we obtain

(139) μBj1(i)<μBj1(i)+Ksupj′′{θj,,j}δj′′,1δj,2+δj,1μBj1(i)+(K+1)supj′′{θj,,j}δj′′,1δj,2,\mu_{B_{j-1}^{(i^{\prime})}}<\mu_{B_{j-1}^{(i^{\prime})}}+K\sup_{j^{\prime\prime}\in\{\lfloor\theta_{j}\rfloor,\ldots,j\}}\delta_{j^{\prime\prime},1}-\delta_{j,2}+\delta_{j,1}\leq\mu_{B_{j-1}^{(i^{\prime})}}+(K+1)\sup_{j^{\prime\prime}\in\{\lfloor\theta_{j}\rfloor,\ldots,j\}}\delta_{j^{\prime\prime},1}-\delta_{j,2},

i.e., δj,2<(K+1)supj′′{θj,,j}δj′′,1\delta_{j,2}<(K+1)\sup_{j^{\prime\prime}\in\{\lfloor\theta_{j}\rfloor,\ldots,j\}}\delta_{j^{\prime\prime},1}, which contradicts jJ3j\geq J_{3}^{\star}. ∎

Finally, we prove that honest agents eventually stop blocking each other.

Lemma 11.

Under the assumptions of Theorem 2, if jJ2j\geq J_{2}^{\star}, θjτcomJ3\lfloor\theta_{j}\rfloor\geq\tau_{\text{com}}\vee J_{3}^{\star}, and θθjτarm\lfloor\theta_{\lfloor\theta_{j}\rfloor}\rfloor\geq\tau_{\text{arm}}, then for any i[n]i\in[n] and jjj^{\prime}\geq j, Pj(i)[n]=P_{j^{\prime}}^{(i)}\cap[n]=\emptyset.

Proof.

Fix iiEhoni\rightarrow i^{\prime}\in E_{\text{hon}} and jjj^{\prime}\geq j; we aim to show iPj(i)i^{\prime}\notin P_{j^{\prime}}^{(i)}. This clearly holds if iPj′′(i)Pj′′1(i)j′′ji^{\prime}\notin P_{j^{\prime\prime}}^{(i)}\setminus P_{j^{\prime\prime}-1}^{(i)}\ \forall\ j^{\prime\prime}\leq j^{\prime}. Otherwise, jB,1=max{j′′j:iPj′′(i)Pj′′1(i)}j_{B,1}=\max\{j^{\prime\prime}\leq j^{\prime}:i^{\prime}\in P_{j^{\prime\prime}}^{(i)}\setminus P_{j^{\prime\prime}-1}^{(i)}\} (the latest phase up to and including jj^{\prime} at which ii blocked ii^{\prime}) is well-defined. We consider two cases of jB,1j_{B,1}.

The first case is jB,1θjj_{B,1}\leq\lfloor\theta_{j}\rfloor. Let jB,2=min{j′′>j:iPj′′(i)Pj′′1(i)}j_{B,2}=\min\{j^{\prime\prime}>j^{\prime}:i^{\prime}\in P_{j^{\prime\prime}}^{(i)}\setminus P_{j^{\prime\prime}-1}^{(i)}\} denote the first phase after jj^{\prime} at which ii blocked ii^{\prime} (with the convention min=\min\emptyset=\infty). Combined with the definition of jB,1j_{B,1}, the algorithm implies

(140) iPj′′(i)j′′{jB,1η+1,,jB,21}.i^{\prime}\notin P_{j^{\prime\prime}}^{(i)}\ \forall\ j^{\prime\prime}\in\{\lceil j_{B,1}^{\eta}\rceil+1,\ldots,j_{B,2}-1\}.

Since jJ2j\geq J_{2}^{\star}, the definition of J2J_{2}^{\star} implies

(141) θjη+1(j2)(j/3)+1<(j2)+1=j1,\lceil\lfloor\theta_{j}\rfloor^{\eta}\rceil+1\leq(j-2)-(j/3)+1<(j-2)+1=j-1,

so jB,1η+1<j1<j\lceil j_{B,1}^{\eta}\rceil+1<j-1<j^{\prime} as well. Combined with jjB,21j^{\prime}\leq j_{B,2}-1 by definition, iPj(i)i^{\prime}\notin P_{j^{\prime}}^{(i)} holds by (140).

The second case is jB,1>θjj_{B,1}>\lfloor\theta_{j}\rfloor. By assumption, jB,1>θjτcomj_{B,1}>\lfloor\theta_{j}\rfloor\geq\tau_{\text{com}}. Hence, by definition of τcom\tau_{\text{com}}, HjC(i)=iH_{j_{C}}^{(i^{\prime})}=i for some jC{θjB,1,,jB,12}j_{C}\in\{\lfloor\theta_{j_{B,1}}\rfloor,\ldots,j_{B,1}-2\}. Note that jB,1θjJ3j_{B,1}\geq\lfloor\theta_{j}\rfloor\geq J_{3}^{\star} and θjB,1θθjτarm\lfloor\theta_{j_{B,1}}\rfloor\geq\lfloor\theta_{\lfloor\theta_{j}\rfloor}\rfloor\geq\tau_{\text{arm}} by assumption (and by monotonicity of {θj′′}j′′\{\lfloor\theta_{j^{\prime\prime}}\rfloor\}_{j^{\prime\prime}\in\mathbb{N}} in the latter case). Hence, we can apply Claim 15 (with j=jB,1j=j_{B,1} and j=jCj^{\prime}=j_{C} in the claim) to obtain iPjB,1(i)PjB,11(i)i^{\prime}\notin P_{j_{B,1}}^{(i)}\setminus P_{j_{B,1}-1}^{(i)}. This is a contradiction. ∎

E.4. Details from Section 7.4

We first verify that the sampling strategy in Section 7.4 is identical to the one in Algorithm 2.

Claim 16.

Suppose we replace the sampling of Hj(i)H_{j}^{(i)} in Algorithm 2 with the sampling of Section 7.4, and recall j\mathbb{P}_{j} denotes probability conditioned on all randomness before this sampling occurs. Then

(142) j(Hj(i)=i)=𝟙(iN(i)Pj(i))/|N(i)Pj(i)|i[n],i[n+m],j.\mathbb{P}_{j}(H_{j}^{(i)}=i^{\prime})=\mathbbm{1}(i^{\prime}\in N(i)\setminus P_{j}^{(i)})/|N(i)\setminus P_{j}^{(i)}|\ \forall\ i\in[n],i^{\prime}\in[n+m],j\in\mathbb{N}.
Proof.

Since j\mathbb{P}_{j} conditions on Pj(i)P_{j}^{(i)}, we can prove the identity separately in the cases Pj(i)[n]P_{j}^{(i)}\cap[n]\neq\emptyset and Pj(i)[n]=P_{j}^{(i)}\cap[n]=\emptyset. The identity is immediate in the former case. For the latter, we have

(143) j(Hj(i)=i)\displaystyle\mathbb{P}_{j}(H_{j}^{(i)}=i^{\prime}) =j(Hj(i)=i|Yj(i)=1)j(Yj(i)=1)+j(Hj(i)=i|Yj(i)=0)j(Yj(i)=0)\displaystyle=\mathbb{P}_{j}(H_{j}^{(i)}=i^{\prime}|Y_{j}^{(i)}=1)\mathbb{P}_{j}(Y_{j}^{(i)}=1)+\mathbb{P}_{j}(H_{j}^{(i)}=i^{\prime}|Y_{j}^{(i)}=0)\mathbb{P}_{j}(Y_{j}^{(i)}=0)
(144) =𝟙(iNhon(i))dhon(i)dhon(i)|N(i)Pj(i)|+𝟙(iNmal(i)Pj(i))|Nmal(i)Pj(i)|(1dhon(i)|N(i)Pj(i)|)\displaystyle=\frac{\mathbbm{1}(i^{\prime}\in N_{\text{hon}}(i))}{d_{\text{hon}}(i)}\frac{d_{\text{hon}}(i)}{|N(i)\setminus P_{j}^{(i)}|}+\frac{\mathbbm{1}(i^{\prime}\in N_{\text{mal}}(i)\setminus P_{j}^{(i)})}{|N_{\text{mal}}(i)\setminus P_{j}^{(i)}|}\left(1-\frac{d_{\text{hon}}(i)}{|N(i)\setminus P_{j}^{(i)}|}\right)
(145) =𝟙(iNhon(i))|N(i)Pj(i)|+𝟙(iNmal(i)Pj(i))|Nmal(i)Pj(i)||N(i)Pj(i)|dhon(i)|N(i)Pj(i)|\displaystyle=\frac{\mathbbm{1}(i^{\prime}\in N_{\text{hon}}(i))}{|N(i)\setminus P_{j}^{(i)}|}+\frac{\mathbbm{1}(i^{\prime}\in N_{\text{mal}}(i)\setminus P_{j}^{(i)})}{|N_{\text{mal}}(i)\setminus P_{j}^{(i)}|}\frac{|N(i)\setminus P_{j}^{(i)}|-d_{\text{hon}}(i)}{|N(i)\setminus P_{j}^{(i)}|}
(146) =𝟙(iNhon(i))+𝟙(iNmal(i)Pj(i))|N(i)Pj(i)|=𝟙(iN(i)Pj(i))|N(i)Pj(i)|.\displaystyle=\frac{\mathbbm{1}(i^{\prime}\in N_{\text{hon}}(i))+\mathbbm{1}(i^{\prime}\in N_{\text{mal}}(i)\setminus P_{j}^{(i)})}{|N(i)\setminus P_{j}^{(i)}|}=\frac{\mathbbm{1}(i^{\prime}\in N(i)\setminus P_{j}^{(i)})}{|N(i)\setminus P_{j}^{(i)}|}.\qed
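The identity (142) can also be checked exactly on a small example. The sketch below reproduces the two-stage scheme (flip Yj(i)Y_{j}^{(i)}; if 1, pick a uniform honest neighbor, else a uniform unblocked malicious neighbor) with exact rational arithmetic and confirms it is uniform over N(i)Pj(i)N(i)\setminus P_{j}^{(i)} when the blocklist contains no honest agents. The neighborhoods and blocklist are hypothetical, chosen only to exercise both branches.

```python
# Exact check of the identity (142) proved in Claim 16: when the blocklist
# contains no honest agents, the two-stage sampling (flip Y, then pick a
# uniform honest neighbor if Y = 1, else a uniform unblocked malicious
# neighbor) is uniform over N(i) \ P. The sets below are hypothetical.
from fractions import Fraction

N_hon = {1, 2, 3}          # honest neighbors of agent i (assumed)
N_mal = {4, 5, 6, 7}       # malicious neighbors of agent i (assumed)
P = {6}                    # current blocklist: malicious agents only

unblocked = (N_hon | N_mal) - P
p_Y1 = Fraction(len(N_hon), len(unblocked))   # P(Y = 1) = d_hon / |N \ P|

def two_stage_prob(v):
    """Probability that the two-stage scheme outputs neighbor v."""
    p = Fraction(0)
    if v in N_hon:
        p += p_Y1 * Fraction(1, len(N_hon))
    if v in N_mal - P:
        p += (1 - p_Y1) * Fraction(1, len(N_mal - P))
    return p

for v in unblocked:
    assert two_stage_prob(v) == Fraction(1, len(unblocked))
assert sum(two_stage_prob(v) for v in unblocked) == 1
```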

Next, note that since δj,10\delta_{j^{\prime},1}\rightarrow 0 as jj^{\prime}\rightarrow\infty, the following is well-defined:

(147) J4=min{j{2,3,}:δj,1<Δ2jj/2}.J_{4}^{\star}=\min\{j\in\{2,3,\ldots\}:\delta_{j^{\prime},1}<\Delta_{2}\ \forall\ j^{\prime}\geq\lfloor j/2\rfloor\}.

As in Section 7.4, we let j={i[n]:1Sj(i)}\mathcal{I}_{j}=\{i\in[n]:1\in S_{j}^{(i)}\} be the agents with the best arm active at phase jj.

Claim 17.

Under the assumptions of Theorem 2, if jJ4j\geq J_{4}^{\star} and j/2τarm\lfloor j/2\rfloor\geq\tau_{\text{arm}}, then Bj(i)=1jj/2,ijB_{j^{\prime}}^{(i)}=1\ \forall\ j^{\prime}\geq\lfloor j/2\rfloor,i\in\mathcal{I}_{j^{\prime}}.

Proof.

Suppose instead that Bj(i)1B_{j^{\prime}}^{(i)}\neq 1 for some jj/2j^{\prime}\geq\lfloor j/2\rfloor and iji\in\mathcal{I}_{j^{\prime}}. Since jj/2J4/2j^{\prime}\geq\lfloor j/2\rfloor\geq\lfloor J_{4}^{\star}/2\rfloor, we know δj,1<Δ2\delta_{j^{\prime},1}<\Delta_{2}. Hence, because 1Sj(i)1\in S_{j^{\prime}}^{(i)} by definition of j\mathcal{I}_{j^{\prime}}, we have Gδj,1(Sj(i))={1}G_{\delta_{j^{\prime},1}}(S_{j^{\prime}}^{(i)})=\{1\}. Combined with Bj(i)1B_{j^{\prime}}^{(i)}\neq 1, we get Bj(i)Sj(i)Gδj,1(Sj(i))B_{j^{\prime}}^{(i)}\in S_{j^{\prime}}^{(i)}\setminus G_{\delta_{j^{\prime},1}}(S_{j^{\prime}}^{(i)}), which contradicts jj/2τarmj^{\prime}\geq\lfloor j/2\rfloor\geq\tau_{\text{arm}}. ∎

Finally, recall τspr=inf{j:Bj(i)=1i[n],jj}\tau_{\text{spr}}=\inf\{j\in\mathbb{N}:B_{j^{\prime}}^{(i)}=1\ \forall\ i\in[n],j^{\prime}\geq j\} and τ¯spr=inf{j:¯j=[n]}\bar{\tau}_{\text{spr}}=\inf\{j\in\mathbb{N}:\bar{\mathcal{I}}_{j}=[n]\}, where {¯j}j=1\{\bar{\mathcal{I}}_{j}\}_{j=1}^{\infty} is the noisy rumor process from Definition 1.
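The noisy rumor process dominating the spreading time can be simulated directly. The Monte Carlo sketch below follows the dynamics in (149): an uninformed agent joins at a phase when its Bernoulli(Υ)\text{Bernoulli}(\Upsilon) coin comes up 11 and its uniformly chosen honest neighbor is already informed. The line graph, the value of Υ\Upsilon, and i=0i^{\star}=0 are illustrative assumptions, not fixed by Definition 1.

```python
# Monte Carlo sketch of the noisy rumor process bar{I}_j that dominates the
# spreading time. Following (149), an uninformed agent i joins at a phase
# when its Bernoulli(Upsilon) coin Y comes up 1 and its uniformly chosen
# honest neighbor H is already informed. The line graph, Upsilon, and
# i_star below are assumptions made only for illustration.
import random

def rumor_spread_time(n, upsilon, i_star=0, rng=random):
    """Phases until all n honest agents (on a line graph) are informed."""
    nbrs = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
    informed = {i_star}
    phases = 0
    while len(informed) < n:
        phases += 1
        newly = {i for i in range(n) if i not in informed
                 and rng.random() <= upsilon                 # Y = 1
                 and rng.choice(nbrs[i]) in informed}        # H informed
        informed |= newly
    return phases

rng = random.Random(0)
times = [rumor_spread_time(n=10, upsilon=0.8, rng=rng) for _ in range(500)]
# Empirical estimate of the tail P(bar{tau}_spr > j):
tail = lambda j: sum(t > j for t in times) / len(times)
assert tail(0) == 1.0 and tail(max(times)) == 0.0
```

On the line graph the rumor can extend by at most one agent per phase in each direction, so every sample satisfies a deterministic lower bound of n1n-1 phases from an endpoint.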

Lemma 12.

Under the assumptions of Theorem 2, if jJ4j\geq J_{4}^{\star}, j/2J2\lfloor j/2\rfloor\geq J_{2}^{\star}, and θj/2J3\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{3}^{\star}, then

(148) (τcomθj/2,τarmθθj/2,τspr>j)(τ¯spr>j/2).\mathbb{P}(\tau_{\text{com}}\leq\lfloor\theta_{\lfloor j/2\rfloor}\rfloor,\tau_{\text{arm}}\leq\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor,\tau_{\text{spr}}>j)\leq\mathbb{P}(\bar{\tau}_{\text{spr}}>j/2).
Proof.

Let j={τcomθj/2,τarmθθj/2}\mathcal{E}_{j}=\{\tau_{\text{com}}\leq\lfloor\theta_{\lfloor j/2\rfloor}\rfloor,\tau_{\text{arm}}\leq\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\} and

(149) ~j/2={i},~j=~j1{i[n]~j1:Y¯j(i)=1,H¯j(i)~j1}j{j/2+1,j/2+2,}.\tilde{\mathcal{I}}_{\lfloor j/2\rfloor}=\{i^{\star}\},\ \tilde{\mathcal{I}}_{j^{\prime}}=\tilde{\mathcal{I}}_{j^{\prime}-1}\cup\{i\in[n]\setminus\tilde{\mathcal{I}}_{j^{\prime}-1}:\bar{Y}_{j^{\prime}}^{(i)}=1,\bar{H}_{j^{\prime}}^{(i)}\in\tilde{\mathcal{I}}_{j^{\prime}-1}\}\ \forall\ j^{\prime}\in\{\lfloor j/2\rfloor+1,\lfloor j/2\rfloor+2,\ldots\}.

Then it suffices to prove the following:

(150) j/2(j{τspr>j})j/2(j{j[n]})j/2(~j[n])(τ¯spr>j/2).\mathbb{P}_{\lfloor j/2\rfloor}(\mathcal{E}_{j}\cap\{\tau_{\text{spr}}>j\})\leq\mathbb{P}_{\lfloor j/2\rfloor}(\mathcal{E}_{j}\cap\{\mathcal{I}_{j}\neq[n]\})\leq\mathbb{P}_{\lfloor j/2\rfloor}(\tilde{\mathcal{I}}_{j}\neq[n])\leq\mathbb{P}(\bar{\tau}_{\text{spr}}>j/2).

For the first inequality in (150), we begin by proving

(151) j{τspr>j}{j=[n]}=.\mathcal{E}_{j}\cap\{\tau_{\text{spr}}>j\}\cap\{\mathcal{I}_{j}=[n]\}=\emptyset.

To do so, we show j,j=[n]τsprj\mathcal{E}_{j},\mathcal{I}_{j}=[n]\Rightarrow\tau_{\text{spr}}\leq j. Assume j\mathcal{E}_{j} and j=[n]\mathcal{I}_{j}=[n] hold; by definition of τspr\tau_{\text{spr}}, we aim to show Bj(i)=1i[n],jjB_{j^{\prime}}^{(i)}=1\ \forall\ i\in[n],j^{\prime}\geq j. We use induction. The base of induction (j=jj^{\prime}=j) holds by j=[n]\mathcal{I}_{j}=[n], j{τarmθθj/2j/2}\mathcal{E}_{j}\subset\{\tau_{\text{arm}}\leq\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\leq\lfloor j/2\rfloor\}, and Claim 17. Given the inductive hypothesis Bj(i)=1i[n]B_{j^{\prime}}^{(i)}=1\ \forall\ i\in[n], we have 1Sj+1(i)1\in S_{j^{\prime}+1}^{(i)} by the algorithm and j+1=[n]\mathcal{I}_{j^{\prime}+1}=[n] by definition, so we again use the assumption that j\mathcal{E}_{j} holds and Claim 17 to obtain Bj+1(i)=1i[n]B_{j^{\prime}+1}^{(i)}=1\ \forall\ i\in[n]. Hence, (151) holds, so

(152) j/2(j{τspr>j})=j/2(j{τspr>j}{j[n]})j/2(j{j[n]}).\mathbb{P}_{\lfloor j/2\rfloor}(\mathcal{E}_{j}\cap\{\tau_{\text{spr}}>j\})=\mathbb{P}_{\lfloor j/2\rfloor}(\mathcal{E}_{j}\cap\{\tau_{\text{spr}}>j\}\cap\{\mathcal{I}_{j}\neq[n]\})\leq\mathbb{P}_{\lfloor j/2\rfloor}(\mathcal{E}_{j}\cap\{\mathcal{I}_{j}\neq[n]\}).

For the second inequality in (150), we claim, and will return to prove, the following:

(153) jj=j/2j{~jj}.\mathcal{E}_{j}\subset\cap_{j^{\prime}=\lfloor j/2\rfloor}^{j}\{\tilde{\mathcal{I}}_{j^{\prime}}\subset\mathcal{I}_{j^{\prime}}\}.

Assuming (153) holds, we obtain

(154) j/2(j{j[n]})j/2(~jj,j[n])j/2(~j[n]).\displaystyle\mathbb{P}_{\lfloor j/2\rfloor}(\mathcal{E}_{j}\cap\{\mathcal{I}_{j}\neq[n]\})\leq\mathbb{P}_{\lfloor j/2\rfloor}(\tilde{\mathcal{I}}_{j}\subset\mathcal{I}_{j},\mathcal{I}_{j}\neq[n])\leq\mathbb{P}_{\lfloor j/2\rfloor}(\tilde{\mathcal{I}}_{j}\neq[n]).

Hence, it only remains to prove (153). We show by induction on jj^{\prime} that, when j\mathcal{E}_{j} holds, ~jj\tilde{\mathcal{I}}_{j^{\prime}}\subset\mathcal{I}_{j^{\prime}} for each j{j/2,,j}j^{\prime}\in\{\lfloor j/2\rfloor,\ldots,j\}. For j=j/2j^{\prime}=\lfloor j/2\rfloor, recall 1Sj/2(i)1\in S_{\lfloor j/2\rfloor}^{(i^{\star})} by Assumption 3, so j/2{i}=~j/2\mathcal{I}_{\lfloor j/2\rfloor}\supset\{i^{\star}\}=\tilde{\mathcal{I}}_{\lfloor j/2\rfloor}. Now assume ~j1j1\tilde{\mathcal{I}}_{j^{\prime}-1}\subset\mathcal{I}_{j^{\prime}-1} for some j{j/2+1,,j}j^{\prime}\in\{\lfloor j/2\rfloor+1,\ldots,j\}. Let i~ji\in\tilde{\mathcal{I}}_{j^{\prime}}; we aim to show that iji\in\mathcal{I}_{j^{\prime}}, i.e., that 1Sj(i)1\in S_{j^{\prime}}^{(i)}. By (149), we have two cases to consider:

  • i~j1i\in\tilde{\mathcal{I}}_{j^{\prime}-1}: By the inductive hypothesis, ij1i\in{\mathcal{I}}_{j^{\prime}-1} as well, so 1Sj1(i)1\in S_{j^{\prime}-1}^{(i)} by definition. Combined with j1j/2j^{\prime}-1\geq\lfloor j/2\rfloor and Claim 17 (recall jJ4j\geq J_{4}^{\star} and j/2θθj/2τarm\lfloor j/2\rfloor\geq\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq\tau_{\text{arm}} when j\mathcal{E}_{j} holds, so the claim applies), this implies Bj1(i)=1B_{j^{\prime}-1}^{(i)}=1, so 1Sj(i)1\in S_{j^{\prime}}^{(i)} by the algorithm.

  • i[n]~j1,Y¯j(i)=1,H¯j(i)~j1i\in[n]\setminus\tilde{\mathcal{I}}_{j^{\prime}-1},\bar{Y}_{j^{\prime}}^{(i)}=1,\bar{H}_{j^{\prime}}^{(i)}\in\tilde{\mathcal{I}}_{j^{\prime}-1}: First observe that since θj/2J3\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{3}^{\star} and jj/2J2j^{\prime}\geq\lfloor j/2\rfloor\geq J_{2}^{\star}, we can apply Lemma 11 on the event j\mathcal{E}_{j} (with jj replaced by j/2\lfloor j/2\rfloor in that lemma) to obtain Pj(i)[n]=P_{j^{\prime}}^{(i)}\cap[n]=\emptyset. On the other hand, recall that Y¯j(i)\bar{Y}_{j^{\prime}}^{(i)} is Bernoulli(Υ)\text{Bernoulli}(\Upsilon) in Definition 1 and υj(i)\upsilon_{j^{\prime}}^{(i)} is Uniform[0,1]\text{Uniform}[0,1] for the Section 7.4 sampling, so we can realize the former as Y¯j(i)=𝟙(υj(i)Υ)\bar{Y}_{j^{\prime}}^{(i)}=\mathbbm{1}(\upsilon_{j^{\prime}}^{(i)}\leq\Upsilon). Hence, by assumption Y¯j(i)=1\bar{Y}_{j^{\prime}}^{(i)}=1 and definition Υ=mini[n]dhon(i)/d(i)\Upsilon=\min_{i\in[n]}d_{\text{hon}}(i)/d(i), we obtain

    (155) 1=Y¯j(i)=𝟙(υj(i)Υ)𝟙(υj(i)dhon(i)d(i))𝟙(υj(i)dhon(i)|N(i)Pj(i)|)=Yj(i).1=\bar{Y}_{j^{\prime}}^{(i)}=\mathbbm{1}(\upsilon_{j^{\prime}}^{(i)}\leq\Upsilon)\leq\mathbbm{1}\left(\upsilon_{j^{\prime}}^{(i)}\leq\frac{d_{\text{hon}}(i)}{d(i)}\right)\leq\mathbbm{1}\left(\upsilon_{j^{\prime}}^{(i)}\leq\frac{d_{\text{hon}}(i)}{|N(i)\setminus P_{j^{\prime}}^{(i)}|}\right)=Y_{j^{\prime}}^{(i)}.

    In summary, we have shown that Pj(i)[n]=P_{j^{\prime}}^{(i)}\cap[n]=\emptyset and Yj(i)=1Y_{j^{\prime}}^{(i)}=1. Hence, by the Section 7.4 sampling, we conclude Hj(i)=H¯j(i)H_{j^{\prime}}^{(i)}=\bar{H}_{j^{\prime}}^{(i)}. Let i=Hj(i)i^{\prime}=H_{j^{\prime}}^{(i)} denote this honest agent. Then by the inductive hypothesis, we know that i~j1j1i^{\prime}\in\tilde{\mathcal{I}}_{j^{\prime}-1}\subset{\mathcal{I}}_{j^{\prime}-1}, i.e., that 1Sj1(i)1\in S_{j^{\prime}-1}^{(i^{\prime})}. By Claim 17, this implies Bj1(i)=1B_{j^{\prime}-1}^{(i^{\prime})}=1, so by the algorithm, 1=Bj1(i)=Rj1(i)Sj(i)1=B_{j^{\prime}-1}^{(i^{\prime})}=R_{j^{\prime}-1}^{(i)}\in S_{j^{\prime}}^{(i)}.

Finally, for the last step in (150), note that ~j\tilde{\mathcal{I}}_{j^{\prime}} is independent of the randomness before phase j/2\lfloor j/2\rfloor. Moreover, since ~j/2=¯0={i}\tilde{\mathcal{I}}_{\lfloor j/2\rfloor}=\bar{\mathcal{I}}_{0}=\{i^{\star}\}, Y¯j(i)\bar{Y}_{j^{\prime}}^{(i)} and Y¯jj/2(i)\bar{Y}_{j^{\prime}-\lfloor j/2\rfloor}^{(i)} are both Bernoulli(Υ)\text{Bernoulli}(\Upsilon) random variables, and H¯j(i)\bar{H}_{j^{\prime}}^{(i)} and H¯jj/2(i)\bar{H}_{j^{\prime}-\lfloor j/2\rfloor}^{(i)} are both sampled uniformly from Nhon(i)N_{\text{hon}}(i), ~j\tilde{\mathcal{I}}_{j^{\prime}} has the same distribution as ¯jj/2\bar{\mathcal{I}}_{j^{\prime}-\lfloor j/2\rfloor}. Lastly, by definition of τ¯spr\bar{\tau}_{\text{spr}}, ¯jj/2[n]\bar{\mathcal{I}}_{j-\lfloor j/2\rfloor}\neq[n] implies that τ¯spr>jj/2j/2\bar{\tau}_{\text{spr}}>j-\lfloor j/2\rfloor\geq j/2. These observations successively imply

(156) j/2(~j[n])=(~j[n])=(¯jj/2[n])(τ¯spr>j/2).\displaystyle\mathbb{P}_{\lfloor j/2\rfloor}(\tilde{\mathcal{I}}_{j}\neq[n])=\mathbb{P}(\tilde{\mathcal{I}}_{j}\neq[n])=\mathbb{P}(\bar{\mathcal{I}}_{j-\lfloor j/2\rfloor}\neq[n])\leq\mathbb{P}(\bar{\tau}_{\text{spr}}>j/2).\qed

E.5. Details from Section 7.5

Combining the lemmas of the above sub-appendices, we can bound the tail of the spreading time.

Theorem 3.

Under the assumptions of Theorem 2, for any jJj\geq J_{\star}, where JJ_{\star} is defined in Claim 32 from Appendix F.3, we have

(157) (τspr>j)(84ρ12(β(2α3)1)(6β+2)nK2S(2α3)(β(2α3)1)+3)jρ12(β(32α)+1)+(τ¯spr>j2).\mathbb{P}(\tau_{\text{spr}}>j)\leq\left(\frac{84^{\rho_{1}^{2}(\beta(2\alpha-3)-1)}(6^{\beta}+2)nK^{2}S}{(2\alpha-3)(\beta(2\alpha-3)-1)}+3\right)j^{\rho_{1}^{2}(\beta(3-2\alpha)+1)}+\mathbb{P}\left(\bar{\tau}_{\text{spr}}>\frac{j}{2}\right).
Proof.

For any jJj\geq J_{\star}, we can use Claim 32 to obtain

(158) jJ4,θj/2J2J3,θθj/2J1(2+(j/84)ρ12)3,\displaystyle j\geq J_{4}^{\star},\quad\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{2}^{\star}\vee J_{3}^{\star},\quad\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq J_{1}^{\star}\vee(2+(j/84)^{\rho_{1}^{2}})\geq 3,
(159) (n+m)3exp(θj/2/(3d¯))jρ12(β(32α)+1).\displaystyle(n+m)^{3}\exp(-\lfloor\theta_{\lfloor j/2\rfloor}\rfloor/(3\bar{d}))\leq j^{\rho_{1}^{2}(\beta(3-2\alpha)+1)}.

In particular, θθj/2J13\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq J_{1}^{\star}\vee 3 implies that we can use Lemma 9 with jj replaced by θθj/2\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor. Combined with (158) (namely, θθj/22(j/84)ρ12\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor-2\geq(j/84)^{\rho_{1}^{2}}), this yields

(160) (τarm>θθj/2)84ρ12(β(2α3)1)(6β+2)nK2S(2α3)(β(2α3)1)jρ12(β(32α)+1).\mathbb{P}(\tau_{\text{arm}}>\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor)\leq\frac{84^{\rho_{1}^{2}(\beta(2\alpha-3)-1)}(6^{\beta}+2)nK^{2}S}{(2\alpha-3)(\beta(2\alpha-3)-1)}j^{\rho_{1}^{2}(\beta(3-2\alpha)+1)}.

Since θj/2J2\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{2}^{\star} by (158), we can use Lemma 10 (with jj replaced by θj/2\lfloor\theta_{\lfloor j/2\rfloor}\rfloor) and (159) to obtain

(161) (τcom>θj/2)3(n+m)2exp(θj/23d¯)3jρ12(β(32α)+1).\mathbb{P}(\tau_{\text{com}}>\lfloor\theta_{\lfloor j/2\rfloor}\rfloor)\leq 3(n+m)^{2}\exp\left(-\frac{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}{3\bar{d}}\right)\leq 3j^{\rho_{1}^{2}(\beta(3-2\alpha)+1)}.

Furthermore, using the bounds jJ4j\geq J_{4}^{\star}, θj/2J3\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{3}^{\star}, and θj/2J2\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{2}^{\star} from (158) (the last of which implies j/2J2\lfloor j/2\rfloor\geq J_{2}^{\star}, since θj/2=(j/2/3)ρ1j/2\theta_{\lfloor j/2\rfloor}=(\lfloor j/2\rfloor/3)^{\rho_{1}}\leq\lfloor j/2\rfloor by ρ1(0,1)\rho_{1}\in(0,1)), we can use Lemma 12 to get

(162) (τarmθθj/2,τcomθj/2,τspr>j)(τ¯spr>j/2).\mathbb{P}(\tau_{\text{arm}}\leq\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor,\tau_{\text{com}}\leq\lfloor\theta_{\lfloor j/2\rfloor}\rfloor,\tau_{\text{spr}}>j)\leq\mathbb{P}(\bar{\tau}_{\text{spr}}>j/2).

Finally, by the union bound, we have

(163) (τspr>j)\displaystyle\mathbb{P}(\tau_{\text{spr}}>j) (τarm>θθj/2)+(τcom>θj/2)\displaystyle\leq\mathbb{P}(\tau_{\text{arm}}>\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor)+\mathbb{P}(\tau_{\text{com}}>\lfloor\theta_{\lfloor j/2\rfloor}\rfloor)
(164) +(τarmθθj/2,τcomθj/2,τspr>j),\displaystyle\quad+\mathbb{P}(\tau_{\text{arm}}\leq\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor,\tau_{\text{com}}\leq\lfloor\theta_{\lfloor j/2\rfloor}\rfloor,\tau_{\text{spr}}>j),

so combining the previous four inequalities yields the desired result. ∎

Finally, as a corollary, we can bound 𝔼[Aτspr]\mathbb{E}[A_{\tau_{\text{spr}}}].

Corollary 4.

Under the assumptions of Theorem 2, we have

(165) 𝔼[Aτspr]\displaystyle\mathbb{E}[A_{\tau_{\text{spr}}}] (J)β+(84ρ12(β(2α3)1)(6β+2)(2α3)(β(2α3)1)+3)2βnK2Sρ12(β(2α3)1)β+𝔼[A2τ¯spr]\displaystyle\leq(J_{\star})^{\beta}+\left(\frac{84^{\rho_{1}^{2}(\beta(2\alpha-3)-1)}(6^{\beta}+2)}{(2\alpha-3)(\beta(2\alpha-3)-1)}+3\right)\frac{2\beta nK^{2}S}{\rho_{1}^{2}(\beta(2\alpha-3)-1)-\beta}+\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}]
(166) =O(Sβ/(ρ12(β1))(Slog(S/Δ2)/Δ22)β/(β1)(d¯log(n+m))β/ρ1nK2S)+𝔼[A2τ¯spr],\displaystyle=O\left(S^{\beta/(\rho_{1}^{2}(\beta-1))}\vee(S\log(S/\Delta_{2})/\Delta_{2}^{2})^{\beta/(\beta-1)}\vee(\bar{d}\log(n+m))^{\beta/\rho_{1}}\vee nK^{2}S\right)+\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}],

where JJ_{\star} is defined as in Claim 32 from Appendix F.3.

Proof.

We first observe

(167) 𝔼[Aτspr]=j=1(AjAj1)(τsprj)AJ1+j=J(AjAj1)(τsprj).\mathbb{E}[A_{\tau_{\text{spr}}}]=\sum_{j=1}^{\infty}(A_{j}-A_{j-1})\mathbb{P}(\tau_{\text{spr}}\geq j)\leq A_{J_{\star}-1}+\sum_{j=J_{\star}}^{\infty}(A_{j}-A_{j-1})\mathbb{P}(\tau_{\text{spr}}\geq j).

For the first term in (167), using Claim 18 from Appendix F.1, we compute

(168) AJ1=AJ(AJAJ1)=(J)β(AJAJ1)(J)β+11=(J)β.A_{J_{\star}-1}=A_{J_{\star}}-(A_{J_{\star}}-A_{J_{\star}-1})=\lceil(J_{\star})^{\beta}\rceil-(A_{J_{\star}}-A_{J_{\star}-1})\leq(J_{\star})^{\beta}+1-1=(J_{\star})^{\beta}.

For the second term in (167), define the constant

(169) C=84ρ12(β(2α3)1)(6β+2)(2α3)(β(2α3)1)+3.C=\frac{84^{\rho_{1}^{2}(\beta(2\alpha-3)-1)}(6^{\beta}+2)\ }{(2\alpha-3)(\beta(2\alpha-3)-1)}+3.

Then by Theorem 3, we have

(170) j=J(AjAj1)(τsprj)\displaystyle\sum_{j=J_{\star}}^{\infty}(A_{j}-A_{j-1})\mathbb{P}(\tau_{\text{spr}}\geq j) j=J(AjAj1)(τ¯spr>j/2)\displaystyle\leq\sum_{j=J_{\star}}^{\infty}(A_{j}-A_{j-1})\mathbb{P}(\bar{\tau}_{\text{spr}}>j/2)
(171) +CnK2Sj=J(AjAj1)jρ12(β(32α)+1).\displaystyle\quad+CnK^{2}S\sum_{j=J_{\star}}^{\infty}(A_{j}-A_{j-1})j^{\rho_{1}^{2}(\beta(3-2\alpha)+1)}.

For (170), we simply use nonnegativity to write

(172) j=J(AjAj1)(τ¯spr>j/2)j=1(AjAj1)(2τ¯sprj)=𝔼[A2τ¯spr].\sum_{j=J_{\star}}^{\infty}(A_{j}-A_{j-1})\mathbb{P}(\bar{\tau}_{\text{spr}}>j/2)\leq\sum_{j=1}^{\infty}(A_{j}-A_{j-1})\mathbb{P}(2\bar{\tau}_{\text{spr}}\geq j)=\mathbb{E}[A_{2\bar{\tau}_{\text{spr}}}].

For the second term in (171), we use Claim 18, along with the facts that β>1\beta>1 and ρ12(β(2α3)1)>β\rho_{1}^{2}(\beta(2\alpha-3)-1)>\beta under the assumptions of Theorem 2, to write

(173) CnK2Sj=J(AjAj1)jρ12(β(32α)+1)<2βCnK2Sj=Jj1(ρ12(β(2α3)1)β)\displaystyle CnK^{2}S\sum_{j=J_{\star}}^{\infty}(A_{j}-A_{j-1})j^{\rho_{1}^{2}(\beta(3-2\alpha)+1)}<2\beta CnK^{2}S\sum_{j=J_{\star}}^{\infty}j^{-1-(\rho_{1}^{2}(\beta(2\alpha-3)-1)-\beta)}
(174) 2βCnK2Sj=1j1(ρ12(β(2α3)1)β)𝑑j=2βCnK2Sρ12(β(2α3)1)β.\displaystyle\quad\leq 2\beta CnK^{2}S\int_{j=1}^{\infty}j^{-1-(\rho_{1}^{2}(\beta(2\alpha-3)-1)-\beta)}dj=\frac{2\beta CnK^{2}S}{\rho_{1}^{2}(\beta(2\alpha-3)-1)-\beta}.

Finally, combining the above bounds completes the proof. ∎

Appendix F Other proofs

F.1. Basic inequalities

In this sub-appendix, we prove some simple inequalities used frequently in the analysis.

Claim 18.

For any jj\in\mathbb{N}, we have

jβAjj2β,β(j1)β11<AjAj1<βjβ1+1,AjAj11,j^{\beta}\leq A_{j}\leq j^{2\beta},\quad\beta(j-1)^{\beta-1}-1<A_{j}-A_{j-1}<\beta j^{\beta-1}+1,\quad A_{j}-A_{j-1}\geq 1,

and for any z1z\geq 1 and ll\in\mathbb{N}, we have Alze2β(lz)βA_{l\lceil z\rceil}\leq e^{2\beta}(lz)^{\beta}.

Proof.

For the first pair of inequalities, observe Aj=jβ=j2β=1A_{j}=j^{\beta}=j^{2\beta}=1 when j=1j=1, and for j2j\geq 2,

(175) jβAjjβ+12jβj1+βj2β.j^{\beta}\leq A_{j}\leq j^{\beta}+1\leq 2j^{\beta}\leq j^{1+\beta}\leq j^{2\beta}.

For the second pair of inequalities, we first observe

(176) AjAj1>jβ(j1)β1=βxβ11β(j1)β11,A_{j}-A_{j-1}>j^{\beta}-(j-1)^{\beta}-1=\beta x^{\beta-1}-1\geq\beta(j-1)^{\beta-1}-1,

where the equality holds for some x[j1,j]x\in[j-1,j] by the mean value theorem and the second inequality uses xj1x\geq j-1 and β>1\beta>1. By analogous reasoning, one can also show AjAj1<βjβ1+1A_{j}-A_{j-1}<\beta j^{\beta-1}+1, so the second pair of inequalities holds. The third inequality holds with equality when j=1j=1, and for j2j\geq 2, the lower bound in (176) and β>1\beta>1 imply AjAj1>0A_{j}-A_{j-1}>0, so since AjA_{j} and Aj1A_{j-1} are integers, AjAj11A_{j}-A_{j-1}\geq 1. Finally, using z1z\geq 1, β>1\beta>1, and 2<e2<e, we can write

Alz<(l(z+1))β+1<(2lz)β+(2lz)β=2β+1(lz)β<e2β(zl)β.A_{l\lceil z\rceil}<(l(z+1))^{\beta}+1<(2lz)^{\beta}+(2lz)^{\beta}=2^{\beta+1}(lz)^{\beta}<e^{2\beta}(zl)^{\beta}.\qed
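Since these bounds on the activation schedule are invoked throughout the appendix, a quick numerical sanity check may be useful. The sketch below assumes Aj=jβA_{j}=\lceil j^{\beta}\rceil with A0=0A_{0}=0, which is consistent with the identity AJ=(J)βA_{J_{\star}}=\lceil(J_{\star})^{\beta}\rceil used in (168); the schedule's exact definition appears earlier in the paper.

```python
import math

def check_claim_18(beta, j_max=200):
    """Numerically verify the inequalities of Claim 18 for A_j = ceil(j^beta), A_0 = 0."""
    A = lambda j: math.ceil(j ** beta) if j >= 1 else 0
    for j in range(1, j_max + 1):
        Aj, Ajm1 = A(j), A(j - 1)
        # j^beta <= A_j <= j^{2 beta}
        assert j ** beta <= Aj <= j ** (2 * beta)
        # beta (j-1)^{beta-1} - 1 < A_j - A_{j-1} < beta j^{beta-1} + 1
        assert beta * (j - 1) ** (beta - 1) - 1 < Aj - Ajm1 < beta * j ** (beta - 1) + 1
        # increments are positive integers
        assert Aj - Ajm1 >= 1
    # A_{l * ceil(z)} <= e^{2 beta} (l z)^beta for z >= 1 and l in N
    for l in range(1, 20):
        for z in (1.0, 1.5, 2.7, 10.0):
            assert A(l * math.ceil(z)) <= math.exp(2 * beta) * (l * z) ** beta
    return True
```

This is a spot check over a finite range of jj and a few values of β>1\beta>1, not a proof.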
Claim 19.

For any jj\in\mathbb{N} and c>1c>1, i=j+1icj1c/(c1)\sum_{i=j+1}^{\infty}i^{-c}\leq j^{1-c}/(c-1).

Proof.

Since j1j\geq 1 and c>1c>1, we can write

i=j+1ic=i=j+1x=i1iic𝑑xi=j+1x=i1ixc𝑑x=x=jxc𝑑x=j1cc1.\displaystyle\sum_{i=j+1}^{\infty}i^{-c}=\sum_{i=j+1}^{\infty}\int_{x=i-1}^{i}i^{-c}dx\leq\sum_{i=j+1}^{\infty}\int_{x=i-1}^{i}x^{-c}dx=\int_{x=j}^{\infty}x^{-c}dx=\frac{j^{1-c}}{c-1}.\qed
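As an illustration (not part of the proof), one can compare a truncated tail sum against the bound of Claim 19; truncation only lowers the left side, so this is a consistency check:

```python
def tail_sum(j, c, terms=100_000):
    """Truncated tail sum_{i=j+1}^{j+terms} i^{-c}; a lower bound on the infinite tail."""
    return sum(i ** -c for i in range(j + 1, j + 1 + terms))

def claim_19_bound(j, c):
    """The bound j^{1-c}/(c-1) from Claim 19."""
    return j ** (1 - c) / (c - 1)

def check_claim_19():
    # The truncated sum sits below the infinite tail, which the claim bounds.
    return all(tail_sum(j, c) <= claim_19_bound(j, c)
               for j in (1, 2, 5, 10) for c in (1.5, 2.0, 3.0))
```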
Claim 20.

For any x,y,z>0x,y,z>0 such that xyzlogxx^{y}\leq z\log x, x<((2z/y)log(2z/y))1/y<(2z/y)2/yx<((2z/y)\log(2z/y))^{1/y}<(2z/y)^{2/y}.

Proof.

Multiplying and dividing the right side of the assumed inequality by y/2y/2, we obtain xy(2z/y)logxy/2x^{y}\leq(2z/y)\log x^{y/2}. We can then loosen this bound to get xy<(2z/y)xy/2x^{y}<(2z/y)x^{y/2}, or x<(2z/y)2/yx<(2z/y)^{2/y}. Plugging into the log\log term of the assumed inequality yields xy<(2z/y)log(2z/y)x^{y}<(2z/y)\log(2z/y). Raising both sides to the power 1/y1/y establishes the first bound. The second bound follows by using log(2z/y)<2z/y\log(2z/y)<2z/y. ∎

Remark 14.

We typically apply Claim 20 with yy constant but zz not. It allows us to invert inequalities of the form xyzlogxx^{y}\leq z\log x to obtain x=O~(z1/y)x=\tilde{O}(z^{1/y}).
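The following sketch illustrates this inversion numerically: it scans a log-spaced grid of xx and checks that every grid point satisfying the hypothesis of Claim 20 lies below both bounds (an empirical spot check under sampled (y,z)(y,z), not a proof):

```python
import math

def check_claim_20(y, z, grid_pts=5000):
    """Check that every grid point x with x^y <= z*log(x) obeys both bounds of Claim 20."""
    b1 = ((2 * z / y) * math.log(2 * z / y)) ** (1 / y)
    b2 = (2 * z / y) ** (2 / y)
    assert b1 < b2  # second inequality of the claim (needs 2z/y > 1)
    for k in range(1, grid_pts):
        x = math.exp(0.01 * k)           # log-spaced grid over (1, e^50)
        if x ** y <= z * math.log(x):    # hypothesis of the claim
            assert x < b1                # first inequality of the claim
    return True
```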

F.2. Bandit inequalities

Next, we state and prove some basic bandit inequalities. The proof techniques are mostly modified from existing work (e.g., (Auer et al., 2002)), but we provide the bounds in forms useful for our setting.

Claim 21.

Suppose that k1,k2[K]k_{1},k_{2}\in[K], tt\in\mathbb{N}, ,u>0\ell,u>0, and ι(0,1]\iota\in(0,1] satisfy

μk2μk1αlogt(21ιu).\mu_{k_{2}}-\mu_{k_{1}}\geq\sqrt{\alpha\log t}\left(\frac{2}{\sqrt{\ell}}-\frac{1-\iota}{\sqrt{u}}\right).

Let jj\in\mathbb{N} be such that t{1+Aj1,,Aj}t\in\{1+A_{j-1},\ldots,A_{j}\}, i.e., jA1(t)j\in A^{-1}(t). Then for any i[n]i\in[n], we have

(Tk1(i)(t1),Tk2(i)(Aj)u,k2Sj(i),It(i)=k1)2(ut)t12αι2.\mathbb{P}(T_{k_{1}}^{(i)}(t-1)\geq\ell,T_{k_{2}}^{(i)}(A_{j})\leq u,k_{2}\in S_{j}^{(i)},I_{t}^{(i)}=k_{1})\leq 2(\lfloor u\rfloor\wedge t)t^{1-2\alpha\iota^{2}}.
Proof.

For k[K]k\in[K], let {Xk(s)}s=1\{X_{k}(s)\}_{s=1}^{\infty} be an i.i.d. sequence distributed as νk\nu_{k}, and for ss\in\mathbb{N}, let

(177) μ^k(i)(s)=1ss=1sXk(s),Uk(i)(t,s)=μ^k(i)(s)+αlogts\hat{\mu}_{k}^{(i)}(s)=\frac{1}{s}\sum_{s^{\prime}=1}^{s}X_{k}(s^{\prime}),\quad U_{k}^{(i)}(t,s)=\hat{\mu}_{k}^{(i)}(s)+\sqrt{\frac{\alpha\log t}{s}}

denote the empirical mean and UCB index when agent ii has pulled the kk-th arm ss times before tt. Then Algorithm 1 implies that if k2Sj(i)k_{2}\in S_{j}^{(i)} and It(i)=k1I_{t}^{(i)}=k_{1}, we must have

(178) Uk1(i)(t,Tk1(i)(t1))Uk2(i)(t,Tk2(i)(t1)).U_{k_{1}}^{(i)}(t,T_{k_{1}}^{(i)}(t-1))\geq U_{k_{2}}^{(i)}(t,T_{k_{2}}^{(i)}(t-1)).

Next, note that if Tk2(i)(Aj)uT_{k_{2}}^{(i)}(A_{j})\leq u, then by monotonicity, Tk2(i)(t1)uT_{k_{2}}^{(i)}(t-1)\leq u as well. Combined with the fact that Tk2(i)(t1)[t]T_{k_{2}}^{(i)}(t-1)\in[t] by definition, we conclude that Tk2(i)(Aj)uT_{k_{2}}^{(i)}(A_{j})\leq u implies Tk2(i)(t1)utT_{k_{2}}^{(i)}(t-1)\leq\lfloor u\rfloor\wedge t. Similarly, Tk1(i)(t1)T_{k_{1}}^{(i)}(t-1)\geq\ell implies Tk1(i)(t1)T_{k_{1}}^{(i)}(t-1)\geq\lceil\ell\rceil (since Tk1(i)(t1)T_{k_{1}}^{(i)}(t-1)\in\mathbb{N}). Combined with (178), we obtain that if the event in the statement of the claim occurs, it must be the case that

maxs1{,,t}Uk1(i)(t,s1)mins2[ut]Uk2(i)(t,s2).\max_{s_{1}\in\{\lceil\ell\rceil,\ldots,t\}}U_{k_{1}}^{(i)}(t,s_{1})\geq\min_{s_{2}\in[\lfloor u\rfloor\wedge t]}U_{k_{2}}^{(i)}(t,s_{2}).

Therefore, by the union bound, we obtain

(179) (Tk1(i)(t1),Tk2(i)(Aj)u,k2Sj(i),It(i)=k1)s1=ts2=1ut(Uk1(i)(t,s1)Uk2(i)(t,s2)).\mathbb{P}(T_{k_{1}}^{(i)}(t-1)\geq\ell,T_{k_{2}}^{(i)}(A_{j})\leq u,k_{2}\in S_{j}^{(i)},I_{t}^{(i)}=k_{1})\leq\sum_{s_{1}=\lceil\ell\rceil}^{t}\sum_{s_{2}=1}^{\lfloor u\rfloor\wedge t}\mathbb{P}(U_{k_{1}}^{(i)}(t,s_{1})\geq U_{k_{2}}^{(i)}(t,s_{2})).

Now fix s1s_{1} and s2s_{2} as in the double summation. We claim Uk1(i)(t,s1)Uk2(i)(t,s2)U_{k_{1}}^{(i)}(t,s_{1})\geq U_{k_{2}}^{(i)}(t,s_{2}) implies

μ^k1(i)(s1)μk1+αlog(t)/s1orμ^k2(i)(s2)μk2ιαlog(t)/s2.\hat{\mu}_{k_{1}}^{(i)}(s_{1})\geq\mu_{k_{1}}+\sqrt{\alpha\log(t)/s_{1}}\quad\text{or}\quad\hat{\mu}_{k_{2}}^{(i)}(s_{2})\leq\mu_{k_{2}}-\iota\sqrt{\alpha\log(t)/s_{2}}.

Indeed, if instead both inequalities fail, then by choice of s1,s2s_{1},s_{2} and the assumption of the claim,

(180) Uk1(i)(t,s1)\displaystyle U_{k_{1}}^{(i)}(t,s_{1}) <μk1+2αlog(t)/s1μk1+2αlog(t)/\displaystyle<\mu_{k_{1}}+2\sqrt{\alpha\log(t)/s_{1}}\leq\mu_{k_{1}}+2\sqrt{\alpha\log(t)/\ell}
(181) μk2+(1ι)αlog(t)/uμk2+(1ι)αlog(t)/s2<Uk2(i)(t,s2),\displaystyle\leq\mu_{k_{2}}+(1-\iota)\sqrt{\alpha\log(t)/u}\leq\mu_{k_{2}}+(1-\iota)\sqrt{\alpha\log(t)/s_{2}}<U_{k_{2}}^{(i)}(t,s_{2}),

which is a contradiction. Thus, by the union bound, Hoeffding’s inequality, and ι(0,1]\iota\in(0,1], we obtain

(Uk1(i)(t,s1)Uk2(i)(t,s2))\displaystyle\mathbb{P}(U_{k_{1}}^{(i)}(t,s_{1})\geq U_{k_{2}}^{(i)}(t,s_{2})) (μ^k1(i)(s1)μk1+αlog(t)/s1)+(μ^k2(i)(s2)μk2ιαlog(t)/s2)\displaystyle\leq\mathbb{P}(\hat{\mu}_{k_{1}}^{(i)}(s_{1})\geq\mu_{k_{1}}+\sqrt{\alpha\log(t)/s_{1}})+\mathbb{P}(\hat{\mu}_{k_{2}}^{(i)}(s_{2})\leq\mu_{k_{2}}-\iota\sqrt{\alpha\log(t)/s_{2}})
e2αlogt+e2αι2logt=t2α+t2αι22t2αι2,\displaystyle\leq e^{-2\alpha\log t}+e^{-2\alpha\iota^{2}\log t}=t^{-2\alpha}+t^{-2\alpha\iota^{2}}\leq 2t^{-2\alpha\iota^{2}},

so plugging into (179) completes the proof. ∎

Corollary 5.

Suppose that k1,k2[K]k_{1},k_{2}\in[K], tt\in\mathbb{N}, and >0\ell>0 satisfy μk2μk14αlog(t)/\mu_{k_{2}}-\mu_{k_{1}}\geq\sqrt{4\alpha\log(t)/\ell}. Let jj\in\mathbb{N} be such that t{1+Aj1,,Aj}t\in\{1+A_{j-1},\ldots,A_{j}\}, i.e., j=A1(t)j=A^{-1}(t). Then for any i[n]i\in[n], we have

(Tk1(i)(t1),k2Sj(i),It(i)=k1)2t2(1α).\mathbb{P}(T_{k_{1}}^{(i)}(t-1)\geq\ell,k_{2}\in S_{j}^{(i)},I_{t}^{(i)}=k_{1})\leq 2t^{2(1-\alpha)}.
Proof.

Using Tk2(i)(Aj)AjT_{k_{2}}^{(i)}(A_{j})\leq A_{j} by definition and applying Claim 21 with u=Aju=A_{j} and ι=1\iota=1,

(Tk1(i)(t1),k2Sj(i),It(i)=k1)\displaystyle\mathbb{P}(T_{k_{1}}^{(i)}(t-1)\geq\ell,k_{2}\in S_{j}^{(i)},I_{t}^{(i)}=k_{1}) =(Tk1(i)(t1),Tk2(i)(Aj)Aj,k2Sj(i),It(i)=k1)\displaystyle=\mathbb{P}(T_{k_{1}}^{(i)}(t-1)\geq\ell,T_{k_{2}}^{(i)}(A_{j})\leq A_{j},k_{2}\in S_{j}^{(i)},I_{t}^{(i)}=k_{1})
2(Ajt)t12α=2t2(1α).\displaystyle\leq 2(\lfloor A_{j}\rfloor\wedge t)t^{1-2\alpha}=2t^{2(1-\alpha)}.\qed
Corollary 6.

For any i[n]i\in[n], k[K]k\in[K], and TT\in\mathbb{N}, we have

𝔼[t=Aτspr+1T𝟙(It(i)=k,Tk(i)(t1)4αlogtΔk2)]4(α1)2α3.\mathbb{E}\left[\sum_{t=A_{\tau_{\text{spr}}}+1}^{T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)\geq\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\right]\leq\frac{4(\alpha-1)}{2\alpha-3}.
Proof.

First observe that since 1SA1(t)(i)1\in S_{A^{-1}(t)}^{(i)} whenever tAτspr+1t\geq A_{\tau_{\text{spr}}}+1 by definition, we have

(182) t=Aτspr+1T𝟙(It(i)=k,Tk(i)(t1)4αlogtΔk2)t=1𝟙(Tk(i)(t1)4αlogtΔk2,1SA1(t)(i),It(i)=k).\displaystyle\sum_{t=A_{\tau_{\text{spr}}}+1}^{T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)\geq\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\leq\sum_{t=1}^{\infty}\mathbbm{1}\left(T_{k}^{(i)}(t-1)\geq\frac{4\alpha\log t}{\Delta_{k}^{2}},1\in S_{A^{-1}(t)}^{(i)},I_{t}^{(i)}=k\right).

Next, let k1=kk_{1}=k, k2=1k_{2}=1, and =4αlog(t)/Δk2\ell=4\alpha\log(t)/\Delta_{k}^{2}. Then by definition, we have μk2μk1=Δk=4αlog(t)/\mu_{k_{2}}-\mu_{k_{1}}=\Delta_{k}=\sqrt{4\alpha\log(t)/\ell}. Therefore, we can use Corollary 5 to obtain

(183) (Tk(i)(t1)4αlog(t)/Δk2,1SA1(t)(i),It(i)=k)2t2(1α).\mathbb{P}(T_{k}^{(i)}(t-1)\geq 4\alpha\log(t)/\Delta_{k}^{2},1\in S_{A^{-1}(t)}^{(i)},I_{t}^{(i)}=k)\leq 2t^{2(1-\alpha)}.

Hence, taking expectation in (182), then plugging in (183) to the right side and using Claim 19, yields

𝔼[t=Aτspr+1T𝟙(It(i)=k,Tk(i)(t1)4αlogtΔk2)]2(1+t=2t2(1α))4(α1)2α3.\mathbb{E}\left[\sum_{t=A_{\tau_{\text{spr}}}+1}^{T}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)\geq\frac{4\alpha\log t}{\Delta_{k}^{2}}\right)\right]\leq 2\left(1+\sum_{t=2}^{\infty}t^{2(1-\alpha)}\right)\leq\frac{4(\alpha-1)}{2\alpha-3}.\qed
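As a numerical consistency check of the last step (a sketch; the truncated series is a lower bound on the infinite sum, whose tail is negligible for the tested values of α\alpha):

```python
def corollary_6_lhs(alpha, terms=200_000):
    """Truncated version of 2*(1 + sum_{t>=2} t^{2(1-alpha)})."""
    return 2 * (1 + sum(t ** (2 * (1 - alpha)) for t in range(2, terms)))

def corollary_6_rhs(alpha):
    """The final constant 4(alpha-1)/(2*alpha-3) from Corollary 6."""
    return 4 * (alpha - 1) / (2 * alpha - 3)

def check_corollary_6():
    # requires alpha > 3/2 so that the series converges
    return all(corollary_6_lhs(a) <= corollary_6_rhs(a) for a in (2.0, 3.0, 4.0))
```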
Claim 22.

For any i[n]i\in[n], t1,t2t_{1},t_{2}\in\mathbb{N} such that t1<t2t_{1}<t_{2}, and {t}t=t1t2(0,)\{\ell_{t}\}_{t=t_{1}}^{t_{2}}\subset(0,\infty),

t=t1t2𝟙(It(i)=k,Tk(i)(t1)<t)⌈maxt{t1,,t2}t⌉.\sum_{t=t_{1}}^{t_{2}}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)<\ell_{t}\right)\leq\left\lceil\max_{t\in\{t_{1},\ldots,t_{2}\}}\ell_{t}\right\rceil.
Proof.

Set =maxt{t1,,t2}t\ell=\max_{t\in\{t_{1},\ldots,t_{2}\}}\ell_{t}. Then clearly

t=t1t2𝟙(It(i)=k,Tk(i)(t1)<t)t=t1t2𝟙(It(i)=k,Tk(i)(t1)<).\sum_{t=t_{1}}^{t_{2}}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)<\ell_{t}\right)\leq\sum_{t=t_{1}}^{t_{2}}\mathbbm{1}\left(I_{t}^{(i)}=k,T_{k}^{(i)}(t-1)<\ell\right).

Now suppose the right side strictly exceeds \lceil\ell\rceil. Then since the right side is an integer, we can find +1\lceil\ell\rceil+1 times t{t1,,t2}t\in\{t_{1},\ldots,t_{2}\} such that It(i)=kI_{t}^{(i)}=k and Tk(i)(t1)<T_{k}^{(i)}(t-1)<\ell. Let t¯\bar{t} denote the largest such tt. Because It(i)=kI_{t}^{(i)}=k occurred at least \lceil\ell\rceil times before t¯\bar{t}, we know Tk(i)(t¯1)T_{k}^{(i)}(\bar{t}-1)\geq\lceil\ell\rceil\geq\ell. But this contradicts Tk(i)(t¯1)<T_{k}^{(i)}(\bar{t}-1)<\ell. ∎
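As an empirical sanity check of the counting argument (a sketch: we simulate an arbitrary pull sequence for a single hypothetical arm kk; note the left side is an integer, so the ceiling of maxtt\max_{t}\ell_{t} is the operative bound when the thresholds are non-integer):

```python
import math
import random

def qualifying_pulls(pull_set, t1, t2, ell):
    """Count times t in {t1,...,t2} with I_t = k and T_k(t-1) < ell_t,
    where pull_set holds all times at which arm k is pulled."""
    count, pulls_so_far = 0, 0
    for t in range(1, t2 + 1):
        if t in pull_set and t1 <= t and pulls_so_far < ell[t]:
            count += 1
        pulls_so_far += (t in pull_set)  # T_k(t) after round t
    return count

def check_claim_22(trials=500, horizon=60, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        pull_set = {t for t in range(1, horizon + 1) if rng.random() < 0.5}
        t1 = rng.randint(1, horizon - 1)
        t2 = rng.randint(t1, horizon)
        ell = {t: rng.uniform(0.5, 8.0) for t in range(1, horizon + 1)}
        bound = math.ceil(max(ell[t] for t in range(t1, t2 + 1)))
        assert qualifying_pulls(pull_set, t1, t2, ell) <= bound
    return True
```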

Finally, we recall a well-known regret decomposition.

Claim 23.

The regret RT(i)R_{T}^{(i)} defined in (5) satisfies RT(i)=k=2KΔk𝔼[t=1T𝟙(It(i)=k)]R_{T}^{(i)}=\sum_{k=2}^{K}\Delta_{k}\mathbb{E}[\sum_{t=1}^{T}\mathbbm{1}(I_{t}^{(i)}=k)].

Proof.

See, e.g., the proof of (Lattimore and Szepesvári, 2020, Lemma 4.5). ∎

F.3. Calculations for the early regret

In this sub-appendix, we assume α\alpha, β\beta, η\eta, θj\theta_{j}, κj\kappa_{j}, ρ1\rho_{1}, and ρ2\rho_{2} are chosen as in Theorem 2. Recall CiC_{i}, CiC_{i}^{\prime}, etc. denote constants associated with Claim ii that only depend on α\alpha, β\beta, η\eta, ρ1\rho_{1}, and ρ2\rho_{2}.

Claim 24.

There exist C24,C24>0C_{\ref{clmThetaLower}},C_{\ref{clmThetaLower}}^{\prime}>0 such that θj/2C24jρ1\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}} and θθj/2C24jρ12jC24\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}^{2}}\ \forall\ j\geq C_{\ref{clmThetaLower}}^{\prime}.

Proof.

This follows from the choice θj=(j/3)ρ1j\theta_{j^{\prime}}=(j^{\prime}/3)^{\rho_{1}}\ \forall\ j^{\prime}\in\mathbb{N} in Theorem 2. ∎

Claim 25.

There exists C25>0C_{\ref{clmJ2starFin}}>0 such that θj/2J2jC25\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{2}^{\star}\ \forall\ j\geq C_{\ref{clmJ2starFin}}.

Proof.

This follows from Claim 24 and the fact that J2J_{2}^{\star} is a constant by definition (122). ∎

Claim 26.

There exist C26,C26>0C_{\ref{clmAjDiffOverS}},C_{\ref{clmAjDiffOverS}}^{\prime}>0 such that for any jC26S1/(β1)j\geq C_{\ref{clmAjDiffOverS}}S^{1/(\beta-1)},

(184) (AjAj1S+21)1=AjAj1S+21jβ1C26S.\left(\frac{A_{j}-A_{j-1}}{S+2}-1\right)\vee 1=\frac{A_{j}-A_{j-1}}{S+2}-1\geq\frac{j^{\beta-1}}{C_{\ref{clmAjDiffOverS}}^{\prime}S}.
Proof.

By Claim 18, we can find C,C>0C,C^{\prime}>0 depending only on β\beta such that AjAj1Cjβ1A_{j}-A_{j-1}\geq Cj^{\beta-1} whenever jCj\geq C^{\prime}. Hence, for any j(6S/C)1/(β1)Cj\geq(6S/C)^{1/(\beta-1)}\vee C^{\prime}, we know AjAj1Cjβ16SA_{j}-A_{j-1}\geq Cj^{\beta-1}\geq 6S, so

(185) AjAj1S+21Cjβ13S3SCjβ1Cjβ1/23S=Cjβ16S1,\displaystyle\frac{A_{j}-A_{j-1}}{S+2}-1\geq\frac{Cj^{\beta-1}-3S}{3S}\geq\frac{Cj^{\beta-1}-Cj^{\beta-1}/2}{3S}=\frac{Cj^{\beta-1}}{6S}\geq 1,

where we also used S1S\geq 1. The claim follows if we set C26=(6/C)1/(β1)CC_{\ref{clmAjDiffOverS}}=(6/C)^{1/(\beta-1)}\vee C^{\prime} and C26=6/CC_{\ref{clmAjDiffOverS}}^{\prime}=6/C, since then jβ1/(C26S)=Cjβ1/(6S)j^{\beta-1}/(C_{\ref{clmAjDiffOverS}}^{\prime}S)=Cj^{\beta-1}/(6S). ∎
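For concreteness, the following sketch instantiates the proof with β=2\beta=2 and Aj=jβA_{j}=\lceil j^{\beta}\rceil (an assumption consistent with (168); then AjAj1=2j1jA_{j}-A_{j-1}=2j-1\geq j, so one may take C=1C=1 and C=1C^{\prime}=1) and numerically verifies the chain of inequalities ending in (185):

```python
import math

def check_claim_26(beta=2.0, C=1.0, S_values=(1, 3, 10), span=300):
    """Verify (A_j - A_{j-1})/(S+2) - 1 >= C*j^{beta-1}/(6S) >= 1
    for j >= (6S/C)^{1/(beta-1)}, assuming A_j = ceil(j^beta)."""
    A = lambda j: math.ceil(j ** beta)
    for S in S_values:
        j0 = math.ceil((6 * S / C) ** (1 / (beta - 1)))
        for j in range(j0, j0 + span):
            lhs = (A(j) - A(j - 1)) / (S + 2) - 1
            rhs = C * j ** (beta - 1) / (6 * S)  # final bound in (185)
            assert lhs >= rhs >= 1
    return True
```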

Claim 27.

There exist C27,C27>0C_{\ref{clmUglyDiff}},C_{\ref{clmUglyDiff}}^{\prime}>0 such that for any jC27S1/(β1)j\geq C_{\ref{clmUglyDiff}}S^{1/(\beta-1)},

(186) log(Aj11)βlog(j)/2>0,1ψκj2(AjAj1S+21)1>C27K2Sjρ2.\log(A_{j-1}\vee 1)\geq\beta\log(j)/2>0,\quad\frac{1-\psi}{\sqrt{\kappa_{j}}}-\frac{2}{\sqrt{(\frac{A_{j}-A_{j-1}}{S+2}-1)\vee 1}}>\sqrt{\frac{C_{\ref{clmUglyDiff}}^{\prime}K^{2}S}{j^{\rho_{2}}}}.
Proof.

By Claim 26, we can find constants C26,C26>0C_{\ref{clmAjDiffOverS}},C_{\ref{clmAjDiffOverS}}^{\prime}>0 such that for any jC26S1/(β1)j\geq C_{\ref{clmAjDiffOverS}}S^{1/(\beta-1)},

2/([(AjAj1)/(S+2)]1)14C26S/jβ14C26K2S/jβ1,2/\sqrt{([(A_{j}-A_{j-1})/(S+2)]-1)\vee 1}\leq\sqrt{4C_{\ref{clmAjDiffOverS}}^{\prime}S/j^{\beta-1}}\leq\sqrt{4C_{\ref{clmAjDiffOverS}}^{\prime}K^{2}S/j^{\beta-1}},

where we also used K1K\geq 1. Furthermore, since ρ2<β1\rho_{2}<\beta-1 by assumption, we can find C>0C>0 depending only on C26C_{\ref{clmAjDiffOverS}}^{\prime}, ψ\psi, β\beta, and ρ2\rho_{2}, such that for any jCj\geq C, we have 4C26/jβ1<(1ψ)2/(4jρ2)4C_{\ref{clmAjDiffOverS}}^{\prime}/j^{\beta-1}<(1-\psi)^{2}/(4j^{\rho_{2}}). Combined with the previous inequality and the choice κj=jρ2/(K2S)\kappa_{j}=j^{\rho_{2}}/(K^{2}S) in Theorem 2, we obtain

1ψκj2(AjAj1S+21)1>(1ψ)2K2S4jρ2j(C26C)S1/(β1).\frac{1-\psi}{\sqrt{\kappa_{j}}}-\frac{2}{\sqrt{(\frac{A_{j}-A_{j-1}}{S+2}-1)\vee 1}}>\sqrt{\frac{(1-\psi)^{2}K^{2}S}{4j^{\rho_{2}}}}\ \forall\ j\geq(C_{\ref{clmAjDiffOverS}}\vee C)S^{1/(\beta-1)}.

Hence, if we set C27=C26C4C_{\ref{clmUglyDiff}}=C_{\ref{clmAjDiffOverS}}\vee C\vee 4 and C27=(1ψ)2/4C_{\ref{clmUglyDiff}}^{\prime}=(1-\psi)^{2}/4, the second inequality in (186) holds for jC27S1/(β1)j\geq C_{\ref{clmUglyDiff}}S^{1/(\beta-1)}. Finally, define h(j)=j1jjh(j)=j-1-\sqrt{j}\ \forall\ j\in\mathbb{N}. Then h(4)=1h(4)=1 and h(j)=11/(2j)>0j4h^{\prime}(j)=1-1/(2\sqrt{j})>0\ \forall\ j\geq 4, so h(j)0j4h(j)\geq 0\ \forall\ j\geq 4. Thus, for any jC27S1/(β1)4j\geq C_{\ref{clmUglyDiff}}S^{1/(\beta-1)}\geq 4, we know j1jj-1\geq\sqrt{j}, so by Claim 18, log(Aj1)log((j1)β)log(jβ)=βlog(j)/2\log(A_{j-1})\geq\log((j-1)^{\beta})\geq\log(\sqrt{j}^{\beta})=\beta\log(j)/2, i.e., the first inequality in (186) holds. ∎

Claim 28.

There exists C28>0C_{\ref{clmJ1starFin}}>0 such that for any jC28S1/(ρ12(β1))j\geq C_{\ref{clmJ1starFin}}S^{1/(\rho_{1}^{2}(\beta-1))}, θθj/2J1\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq J_{1}^{\star}.

Proof.

By Claims 26 and 27, we can set C=C26C27C=C_{\ref{clmAjDiffOverS}}\vee C_{\ref{clmUglyDiff}} to ensure that for jCS1/(β1)j\geq CS^{1/(\beta-1)}, AjAj12(S+2)A_{j}-A_{j-1}\geq 2(S+2) and δj,2>0\delta_{j,2}>0. Hence, J1CS1/(β1)J_{1}^{\star}\leq CS^{1/(\beta-1)} by definition (101). On the other hand, by Claim 24, we know θθj/2C24jρ12\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}^{2}} for jC24j\geq C_{\ref{clmThetaLower}}^{\prime}. Thus, if we set C28=(C/C24)1/ρ12C24C_{\ref{clmJ1starFin}}=(C/C_{\ref{clmThetaLower}})^{1/\rho_{1}^{2}}\vee C_{\ref{clmThetaLower}}^{\prime}, then for any jC28S1/(ρ12(β1))j\geq C_{\ref{clmJ1starFin}}S^{1/(\rho_{1}^{2}(\beta-1))}, we obtain θθj/2C24jρ12\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}^{2}} (since jC24j\geq C_{\ref{clmThetaLower}}^{\prime}) and C24jρ12CS1/(β1)C_{\ref{clmThetaLower}}j^{\rho_{1}^{2}}\geq CS^{1/(\beta-1)} (since j(C/C24)1/ρ12S1/(ρ12(β1))=((C/C24)S1/(β1))1/ρ12j\geq(C/C_{\ref{clmThetaLower}})^{1/\rho_{1}^{2}}S^{1/(\rho_{1}^{2}(\beta-1))}=((C/C_{\ref{clmThetaLower}})S^{1/(\beta-1)})^{1/\rho_{1}^{2}}), which implies θθj/2CS1/(β1)J1\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq CS^{1/(\beta-1)}\geq J_{1}^{\star}. ∎

Claim 29.

There exists C29>0C_{\ref{clmJ3starFin}}>0 such that for any jC29S1/(ρ12(β1))j\geq C_{\ref{clmJ3starFin}}S^{1/(\rho_{1}^{2}(\beta-1))}, θj/2J3\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{3}^{\star}.

Proof.

We first upper bound J3J_{3}^{\star}. By Claim 24, we can find C24,C24>0C_{\ref{clmThetaLower}},C_{\ref{clmThetaLower}}^{\prime}>0 such that θjC24jρ1\lfloor\theta_{j}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}} when jC24j\geq C_{\ref{clmThetaLower}}^{\prime}. Let jC24((C26/C24)S1/(β1))1/ρ1j\geq C_{\ref{clmThetaLower}}^{\prime}\vee((C_{\ref{clmAjDiffOverS}}/C_{\ref{clmThetaLower}})S^{1/(\beta-1)})^{1/\rho_{1}}, where C26C_{\ref{clmAjDiffOverS}} is from Claim 26, and j{θj,,j}j^{\prime}\in\{\lfloor\theta_{j}\rfloor,\ldots,j\}. Then jθjC24jρ1C26S1/(β1)j^{\prime}\geq\lfloor\theta_{j}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}}\geq C_{\ref{clmAjDiffOverS}}S^{1/(\beta-1)}, so we can find C26>0C_{\ref{clmAjDiffOverS}}^{\prime}>0 such that

(K+1)δj,1(K+1)4αlog(Aj)C26S(j)β1(2K)24αlog(j2β)C26S(C24jρ1)β1=CK2Slogjjρ1(β1),(K+1)\delta_{j^{\prime},1}\leq(K+1)\sqrt{\frac{4\alpha\log(A_{j^{\prime}})C_{\ref{clmAjDiffOverS}}^{\prime}S}{(j^{\prime})^{\beta-1}}}\leq\sqrt{\frac{(2K)^{2}4\alpha\log(j^{2\beta})C_{\ref{clmAjDiffOverS}}^{\prime}S}{(C_{\ref{clmThetaLower}}j^{\rho_{1}})^{\beta-1}}}=\sqrt{\frac{CK^{2}S\log j}{j^{\rho_{1}(\beta-1)}}},

where the first inequality uses Claim 26; the second inequality uses K1K\geq 1, jθjC24jρ1j^{\prime}\geq\lfloor\theta_{j}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}}, and Claim 18; and the equality defines C=32αβC26/C24β1C=32\alpha\beta C_{\ref{clmAjDiffOverS}}^{\prime}/C_{\ref{clmThetaLower}}^{\beta-1}. On the other hand, if jC27S1/(β1)j\geq C_{\ref{clmUglyDiff}}S^{1/(\beta-1)}, where C27C_{\ref{clmUglyDiff}} is from Claim 27, we can find C27>0C_{\ref{clmUglyDiff}}^{\prime}>0 such that

δj,2>αβlog(j)C27K2S/(2jρ2)=CK2Slog(j)/jρ2,\delta_{j,2}>\sqrt{\alpha\beta\log(j)C_{\ref{clmUglyDiff}}^{\prime}K^{2}S/(2j^{\rho_{2}})}=\sqrt{C^{\prime}K^{2}S\log(j)/j^{\rho_{2}}},

where the inequality uses Claim 27 and the equality defines C=αβC27/2C^{\prime}=\alpha\beta C_{\ref{clmUglyDiff}}^{\prime}/2. Finally, by assumption ρ2<ρ1(β1)\rho_{2}<\rho_{1}(\beta-1), we can find C′′>0C^{\prime\prime}>0 such that, for any jC′′j\geq C^{\prime\prime}, we have C/jρ2C/jρ1(β1)C^{\prime}/j^{\rho_{2}}\geq C/j^{\rho_{1}(\beta-1)}. Combined with the previous two inequalities, we obtain that for jC24((C26/C24)S1/(β1))1/ρ1(C27S1/(β1))C′′j\geq C_{\ref{clmThetaLower}}^{\prime}\vee((C_{\ref{clmAjDiffOverS}}/C_{\ref{clmThetaLower}})S^{1/(\beta-1)})^{1/\rho_{1}}\vee(C_{\ref{clmUglyDiff}}S^{1/(\beta-1)})\vee C^{\prime\prime} and any j{θj,,j}j^{\prime}\in\{\lfloor\theta_{j}\rfloor,\ldots,j\}, δj,2>(K+1)δj,1\delta_{j,2}>(K+1)\delta_{j^{\prime},1}. Therefore, by definition of J3J_{3}^{\star} (137), we conclude that J3C~S1/(ρ1(β1))J_{3}^{\star}\leq\tilde{C}S^{1/(\rho_{1}(\beta-1))}, where C~=C24(C26/C24)C27C′′\tilde{C}=C_{\ref{clmThetaLower}}^{\prime}\vee(C_{\ref{clmAjDiffOverS}}/C_{\ref{clmThetaLower}})\vee C_{\ref{clmUglyDiff}}\vee C^{\prime\prime}. Hence, if we set C29=C24(C~/C24)1/ρ1C_{\ref{clmJ3starFin}}=C_{\ref{clmThetaLower}}^{\prime}\vee(\tilde{C}/C_{\ref{clmThetaLower}})^{1/\rho_{1}}, we obtain that for any jC29S1/(ρ12(β1))j\geq C_{\ref{clmJ3starFin}}S^{1/(\rho_{1}^{2}(\beta-1))}, θj/2C24jρ1\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}} (since jC24j\geq C_{\ref{clmThetaLower}}^{\prime}) and C24jρ1C~S1/(ρ1(β1))C_{\ref{clmThetaLower}}j^{\rho_{1}}\geq\tilde{C}S^{1/(\rho_{1}(\beta-1))} (since j(C~/C24)1/ρ1S1/(ρ12(β1))=(C~S1/(ρ1(β1))/C24)1/ρ1j\geq(\tilde{C}/C_{\ref{clmThetaLower}})^{1/\rho_{1}}S^{1/(\rho_{1}^{2}(\beta-1))}=(\tilde{C}S^{1/(\rho_{1}(\beta-1))}/C_{\ref{clmThetaLower}})^{1/\rho_{1}}), so stringing together the inequalities, we conclude θj/2C24jρ1C~S1/(ρ1(β1))J3\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}}\geq\tilde{C}S^{1/(\rho_{1}(\beta-1))}\geq J_{3}^{\star}. ∎

Claim 30.

There exists C30>0C_{\ref{clmJ4star}}>0 such that J4C30(Slog(C30S/Δ22)/Δ22)1/(β1)J_{4}^{\star}\leq C_{\ref{clmJ4star}}(S\log(C_{\ref{clmJ4star}}S/\Delta_{2}^{2})/\Delta_{2}^{2})^{1/(\beta-1)}.

Proof.

Let C~30=16C26αβ/(β1)\tilde{C}_{\ref{clmJ4star}}=16C_{\ref{clmAjDiffOverS}}^{\prime}\alpha\beta/(\beta-1) and C30=C~30(3C~301/(β1))(3C26)16C_{\ref{clmJ4star}}=\tilde{C}_{\ref{clmJ4star}}\vee(3\tilde{C}_{\ref{clmJ4star}}^{1/(\beta-1)})\vee(3C_{\ref{clmAjDiffOverS}})\vee 16, where C26C_{\ref{clmAjDiffOverS}} and C26C_{\ref{clmAjDiffOverS}}^{\prime} are the constants from Claim 26. Also define J4=C30(Slog(C30S/Δ22)/Δ22)1/(β1)J_{4}^{\dagger}=C_{\ref{clmJ4star}}(S\log(C_{\ref{clmJ4star}}S/\Delta_{2}^{2})/\Delta_{2}^{2})^{1/(\beta-1)}. Then by definition of J4J_{4}^{\star} (147), it suffices to show δj,1<Δ2jJ4/2\delta_{j,1}<\Delta_{2}\ \forall\ j\geq\lfloor\lfloor J_{4}^{\dagger}\rfloor/2\rfloor. Fix such a jj and suppose instead that δj,1Δ2\delta_{j,1}\geq\Delta_{2}. Since C3016C_{\ref{clmJ4star}}\geq 16, we know J4C30(logC30)1/(β1)16J_{4}^{\dagger}\geq C_{\ref{clmJ4star}}(\log C_{\ref{clmJ4star}})^{1/(\beta-1)}\geq 16, so

(187) j(J4/2)1((J41)/2)1=(J4/2)(3/2)>J4/3.j\geq(\lfloor J_{4}^{\dagger}\rfloor/2)-1\geq((J_{4}^{\dagger}-1)/2)-1=(J_{4}^{\dagger}/2)-(3/2)>J_{4}^{\dagger}/3.

Hence, because C303C26C_{\ref{clmJ4star}}\geq 3C_{\ref{clmAjDiffOverS}}, we have jJ4/3C26S1/(β1)j\geq J_{4}^{\dagger}/3\geq C_{\ref{clmAjDiffOverS}}S^{1/(\beta-1)}, so by Claims 26 and 18, respectively,

(188) δj,124αlog(Aj)C26S/jβ14αlog(j2β)C26S/jβ1=8C26αβSlog(j)/jβ1.\delta_{j,1}^{2}\leq 4\alpha\log(A_{j})C_{\ref{clmAjDiffOverS}}^{\prime}S/j^{\beta-1}\leq 4\alpha\log(j^{2\beta})C_{\ref{clmAjDiffOverS}}^{\prime}S/j^{\beta-1}=8C_{\ref{clmAjDiffOverS}}^{\prime}\alpha\beta S\log(j)/j^{\beta-1}.

Rearranging and using the assumption δj,1Δ2\delta_{j,1}\geq\Delta_{2}, this implies jβ18C26αβSlog(j)/Δ22j^{\beta-1}\leq 8C_{\ref{clmAjDiffOverS}}^{\prime}\alpha\beta S\log(j)/\Delta_{2}^{2}. Hence, applying Claim 20 with x=jx=j, y=β1y=\beta-1, and z=8C26αβS/Δ22z=8C_{\ref{clmAjDiffOverS}}^{\prime}\alpha\beta S/\Delta_{2}^{2}, we obtain

j(16C26αβSlog(16C26αβS/((β1)Δ22))/((β1)Δ22))1/(β1)=(C~30Slog(C~30S/Δ22)/Δ22)1/(β1).j\leq(16C_{\ref{clmAjDiffOverS}}^{\prime}\alpha\beta S\log(16C_{\ref{clmAjDiffOverS}}^{\prime}\alpha\beta S/((\beta-1)\Delta_{2}^{2}))/((\beta-1)\Delta_{2}^{2}))^{1/(\beta-1)}=(\tilde{C}_{\ref{clmJ4star}}S\log(\tilde{C}_{\ref{clmJ4star}}S/\Delta_{2}^{2})/\Delta_{2}^{2})^{1/(\beta-1)}.

But since C30C~30(3C~301/(β1))C_{\ref{clmJ4star}}\geq\tilde{C}_{\ref{clmJ4star}}\vee(3\tilde{C}_{\ref{clmJ4star}}^{1/(\beta-1)}), we have shown jJ4/3j\leq J_{4}^{\dagger}/3, which contradicts (187). ∎

Claim 31.

There exists C31>0C_{\ref{clmExpToPoly}}>0 such that, for any j(C31d¯log(C31d¯(n+m)))1/ρ1j\geq(C_{\ref{clmExpToPoly}}\bar{d}\log(C_{\ref{clmExpToPoly}}\bar{d}(n+m)))^{1/\rho_{1}},

(189) (n+m)3exp(θj/2/(3d¯))jρ12(β(32α)+1).(n+m)^{3}\exp(-\lfloor\theta_{\lfloor j/2\rfloor}\rfloor/(3\bar{d}))\leq j^{\rho_{1}^{2}(\beta(3-2\alpha)+1)}.
Proof.

Let C~31=12ρ1(β(2α3)1)/C24\tilde{C}_{\ref{clmExpToPoly}}=12\rho_{1}(\beta(2\alpha-3)-1)/C_{\ref{clmThetaLower}} and C31=C243(18/C24)C~31C_{\ref{clmExpToPoly}}=C_{\ref{clmThetaLower}}^{\prime}\vee 3\vee(18/C_{\ref{clmThetaLower}})\vee\tilde{C}_{\ref{clmExpToPoly}}, and suppose instead that (189) fails for some j(C31d¯log(C31d¯(n+m)))1/ρ1j\geq(C_{\ref{clmExpToPoly}}\bar{d}\log(C_{\ref{clmExpToPoly}}\bar{d}(n+m)))^{1/\rho_{1}}. Then we can write

(190) θj/2<9d¯log(n+m)+3d¯ρ12(β(2α3)1)logj.\lfloor\theta_{\lfloor j/2\rfloor}\rfloor<9\bar{d}\log(n+m)+3\bar{d}\rho_{1}^{2}(\beta(2\alpha-3)-1)\log j.

Since C31C243C_{\ref{clmExpToPoly}}\geq C_{\ref{clmThetaLower}}^{\prime}\vee 3 and ρ1(0,1)\rho_{1}\in(0,1), we know that j(C24log(3))1/ρ1>C24j\geq(C_{\ref{clmThetaLower}}^{\prime}\log(3))^{1/\rho_{1}}>C_{\ref{clmThetaLower}}^{\prime}, so θj/2C24jρ1\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq C_{\ref{clmThetaLower}}j^{\rho_{1}} by Claim 24. Since C31(18/C24)1C_{\ref{clmExpToPoly}}\geq(18/C_{\ref{clmThetaLower}})\vee 1 and d¯1\bar{d}\geq 1, we also have j((18/C24)log(n+m))1/ρ1j\geq((18/C_{\ref{clmThetaLower}})\log(n+m))^{1/\rho_{1}}, or C24jρ1/29d¯log(n+m)C_{\ref{clmThetaLower}}j^{\rho_{1}}/2\geq 9\bar{d}\log(n+m). Combining these two bounds with (190), we conclude

(191) jρ1<(6ρ12(β(2α3)1)/C24)d¯logj=C~31(ρ1/2)d¯logj.j^{\rho_{1}}<(6\rho_{1}^{2}(\beta(2\alpha-3)-1)/C_{\ref{clmThetaLower}})\bar{d}\log j=\tilde{C}_{\ref{clmExpToPoly}}(\rho_{1}/2)\bar{d}\log j.

Applying Claim 20 with x=jx=j, y=ρ1y=\rho_{1}, and z=C~31(ρ1/2)d¯z=\tilde{C}_{\ref{clmExpToPoly}}(\rho_{1}/2)\bar{d}, we obtain j<(C~31d¯log(C~31d¯))1/ρ1j<(\tilde{C}_{\ref{clmExpToPoly}}\bar{d}\log(\tilde{C}_{\ref{clmExpToPoly}}\bar{d}))^{1/\rho_{1}}. But since C31C~31C_{\ref{clmExpToPoly}}\geq\tilde{C}_{\ref{clmExpToPoly}}, this contradicts the assumed lower bound on jj. ∎

Claim 32.

Define C32=C25C28C29C_{\ref{clmJstar}}=C_{\ref{clmJ2starFin}}\vee C_{\ref{clmJ1starFin}}\vee C_{\ref{clmJ3starFin}} and

J\displaystyle J_{\star} =(C32S1/(ρ12(β1)))(C30(Slog(C30S/Δ22)/Δ22)1/(β1))(C31d¯log(C31d¯(n+m)))1/ρ1\displaystyle=(C_{\ref{clmJstar}}S^{1/(\rho_{1}^{2}(\beta-1))})\vee(C_{\ref{clmJ4star}}(S\log(C_{\ref{clmJ4star}}S/\Delta_{2}^{2})/\Delta_{2}^{2})^{1/(\beta-1)})\vee(C_{\ref{clmExpToPoly}}\bar{d}\log(C_{\ref{clmExpToPoly}}\bar{d}(n+m)))^{1/\rho_{1}}
=Θ(S1/(ρ12(β1))(Slog(S/Δ2)/Δ22)1/(β1)(d¯log(n+m))1/ρ1).\displaystyle=\Theta\left(S^{1/(\rho_{1}^{2}(\beta-1))}\vee(S\log(S/\Delta_{2})/\Delta_{2}^{2})^{1/(\beta-1)}\vee(\bar{d}\log(n+m))^{1/\rho_{1}}\right).

Then for any jj\in\mathbb{N} such that jJj\geq J_{\star}, we have

(192) jJ4,θj/2J2J3,θθj/2J1(2+jρ12/84)3\displaystyle j\geq J_{4}^{\star},\quad\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{2}^{\star}\vee J_{3}^{\star},\quad\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq J_{1}^{\star}\vee(2+j^{\rho_{1}^{2}}/84)\geq 3
(193) (n+m)3exp(θj/2/(3d¯))jρ12(β(32α)+1).\displaystyle(n+m)^{3}\exp(-\lfloor\theta_{\lfloor j/2\rfloor}\rfloor/(3\bar{d}))\leq j^{\rho_{1}^{2}(\beta(3-2\alpha)+1)}.
Proof.

The first bound holds by jC30((S/Δ22)log(C30S/Δ22))1/(β1)j\geq C_{\ref{clmJ4star}}((S/\Delta_{2}^{2})\log(C_{\ref{clmJ4star}}^{\prime}S/\Delta_{2}^{2}))^{1/(\beta-1)} and Claim 30. The second holds since θj/2J2\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{2}^{\star} by jC25j\geq C_{\ref{clmJ2starFin}} and Claim 25, and since θj/2J3\lfloor\theta_{\lfloor j/2\rfloor}\rfloor\geq J_{3}^{\star} by jC29S1/(ρ12(β1))j\geq C_{\ref{clmJ3starFin}}S^{1/(\rho_{1}^{2}(\beta-1))} and Claim 29. The third holds since θθj/2J1\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq J_{1}^{\star} by jC28S1/(ρ12(β1))j\geq C_{\ref{clmJ1starFin}}S^{1/(\rho_{1}^{2}(\beta-1))} and Claim 28, and since θθj/2(2+jρ12/84)\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\geq(2+j^{\rho_{1}^{2}}/84) for large enough C24C_{\ref{clmThetaLower}}^{\prime}. The fourth holds since θθj/2>2\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor>2 (by the third) and θθj/2\lfloor\theta_{\lfloor\theta_{\lfloor j/2\rfloor}\rfloor}\rfloor\in\mathbb{N}. The fifth holds by j(C31d¯log(C31(n+m)))1/ρ1j\geq(C_{\ref{clmExpToPoly}}\bar{d}\log(C_{\ref{clmExpToPoly}}^{\prime}(n+m)))^{1/\rho_{1}} and Claim 31. ∎

F.4. Calculations for the later regret

In this sub-appendix, we assume α\alpha, β\beta, η\eta, θj\theta_{j}, κj\kappa_{j}, ρ1\rho_{1}, and ρ2\rho_{2} are chosen as in Theorem 2. Recall CiC_{i}, CiC_{i}^{\prime}, etc. denote constants associated with Claim ii that only depend on α\alpha, β\beta, η\eta, ρ1\rho_{1}, and ρ2\rho_{2}.

Claim 33.

There exists C33>0C_{\ref{clmSmallTtheta}}>0 such that, for any γi(0,1)\gamma_{i}\in(0,1),

(194) θTγi/ββ4αKlog(T)/Δ2logT(C33/γi)log(C33K/(Δ2γi)).\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor^{\beta}\leq 4\alpha K\log(T)/\Delta_{2}\quad\Rightarrow\quad\log T\leq(C_{\ref{clmSmallTtheta}}/\gamma_{i})\log\left(C_{\ref{clmSmallTtheta}}K/(\Delta_{2}\gamma_{i})\right).
Proof.

Similar to Claim 24 from Appendix F.3, we can find constants C,C>0C,C^{\prime}>0 such that for any jCj\geq C^{\prime}, θjCjρ1\lfloor\theta_{j}\rfloor\geq Cj^{\rho_{1}}. If Tγi/β<C\lceil T^{\gamma_{i}/\beta}\rceil<C^{\prime}, then TγiTγi/ββ<(C)βT^{\gamma_{i}}\leq\lceil T^{\gamma_{i}/\beta}\rceil^{\beta}<(C^{\prime})^{\beta}, so logT(β/γi)log(C)\log T\leq(\beta/\gamma_{i})\log(C^{\prime}), and the right side of (194) will hold for C33βCC_{\ref{clmSmallTtheta}}\geq\beta\vee C^{\prime}. If instead Tγi/βC\lceil T^{\gamma_{i}/\beta}\rceil\geq C^{\prime}, then θTγi/βCTγi/βρ1CTγiρ1/β\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor\geq C\lceil T^{\gamma_{i}/\beta}\rceil^{\rho_{1}}\geq CT^{\gamma_{i}\rho_{1}/\beta}, so if the left side of (194) holds, we can write

CβTγiρ1=(CTγiρ1/β)βθTγi/ββ4αKlog(T)/Δ2.C^{\beta}T^{\gamma_{i}\rho_{1}}=(CT^{\gamma_{i}\rho_{1}/\beta})^{\beta}\leq\lfloor\theta_{\lceil T^{\gamma_{i}/\beta}\rceil}\rfloor^{\beta}\leq 4\alpha K\log(T)/\Delta_{2}.

Hence, applying Claim 20 with x=Tx=T, y=γiρ1y=\gamma_{i}\rho_{1}, and z=4αK/(CβΔ2)z=4\alpha K/(C^{\beta}\Delta_{2}), we obtain

logTlog(8αK/(Δ2γiCβρ1))2/(γiρ1)(C33/γi)log(C33K/(Δ2γi)),\log T\leq\log(8\alpha K/(\Delta_{2}\gamma_{i}C^{\beta}\rho_{1}))^{2/(\gamma_{i}\rho_{1})}\leq(C_{\ref{clmSmallTtheta}}/\gamma_{i})\log(C_{\ref{clmSmallTtheta}}K/(\Delta_{2}\gamma_{i})),

where the last inequality holds for any C33(8α/(Cβρ1))(2/ρ1)C_{\ref{clmSmallTtheta}}\geq(8\alpha/(C^{\beta}\rho_{1}))\vee(2/\rho_{1}). ∎
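Claims 33–36 all rest on the same inversion step via Claim 20, which is stated earlier in the appendix and not reproduced here. Assuming it takes the standard form — if x^y ≤ z log(x) with y ∈ (0,1] and z ≥ 1, then log(x) ≤ (2/y) log(2z/y) — the inversion can be sanity-checked numerically. The snippet below is an illustrative check only, not part of the paper's formal argument, and all function names are ours:

```python
import math

def claim20_bound(y, z):
    """Assumed Claim 20 conclusion: log(x) <= (2/y) * log(2z/y)."""
    return (2.0 / y) * math.log(2.0 * z / y)

def largest_log_x_satisfying_premise(y, z, x_cap=1e9):
    """Geometrically scan x and return the largest log(x) for which the
    premise x**y <= z * log(x) holds (0.0 if it never holds below x_cap)."""
    best = 0.0
    x = 1.001
    while x <= x_cap:
        if x ** y <= z * math.log(x):
            best = max(best, math.log(x))
        x *= 1.01
    return best

# Every x satisfying the premise should lie below the claimed threshold.
for y in (0.3, 0.5, 1.0):
    for z in (2.0, 10.0, 100.0):
        assert largest_log_x_satisfying_premise(y, z) <= claim20_bound(y, z)
```

For each (y, z) pair the scan confirms that the premise region lies entirely below the bound, mirroring how the proofs of Claims 33 and 34 convert a polynomial-versus-logarithm inequality into an upper bound on log T.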

Claim 34.

There exists C34>0C_{\ref{clmSmallTKappa1}}>0 such that, for any γi(0,1)\gamma_{i}\in(0,1),

κTγi/β4αKlog(T)/Δ2logT(C34/γi)log(C34K/(Δ2γi)).\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}\leq 4\alpha K\log(T)/\Delta_{2}\quad\Rightarrow\quad\log T\leq(C_{\ref{clmSmallTKappa1}}/\gamma_{i})\log\left(C_{\ref{clmSmallTKappa1}}K/(\Delta_{2}\gamma_{i})\right).
Proof.

Recall κj=jρ2/(K2S)\kappa_{j}=j^{\rho_{2}}/(K^{2}S) in Theorem 2. Hence, because SKS\leq K, we know that κTγi/βTγiρ2/β/K3\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}\geq T^{\gamma_{i}\rho_{2}/\beta}/K^{3}. Rearranging and using the assumed bound, we obtain

Tγiρ2/βK3κTγi/βK34αKlog(T)/Δ2=(4αK4/Δ2)logT(4αK/Δ2)4logT,T^{\gamma_{i}\rho_{2}/\beta}\leq K^{3}\cdot\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}\leq K^{3}\cdot 4\alpha K\log(T)/\Delta_{2}=(4\alpha K^{4}/\Delta_{2})\log T\leq(4\alpha K/\Delta_{2})^{4}\log T,

where the last inequality uses α1\alpha\geq 1 and Δ2(0,1)\Delta_{2}\in(0,1). Applying Claim 20 with x=Tx=T, y=γiρ2/βy=\gamma_{i}\rho_{2}/\beta, and z=(4αK/Δ2)4z=(4\alpha K/\Delta_{2})^{4}, and noting that 2/y(2/y)42/y\leq(2/y)^{4} (since γi(0,1)\gamma_{i}\in(0,1) and ρ2(0,β1)\rho_{2}\in(0,\beta-1)), we obtain

logTlog(8αβK/(Δ2γiρ2))8β/(γiρ2)(C34/γi)log(C34K/(Δ2γi)),\log T\leq\log(8\alpha\beta K/(\Delta_{2}\gamma_{i}\rho_{2}))^{8\beta/(\gamma_{i}\rho_{2})}\leq(C_{\ref{clmSmallTKappa1}}/\gamma_{i})\log(C_{\ref{clmSmallTKappa1}}K/(\Delta_{2}\gamma_{i})),

where the second inequality holds for C348αβ/ρ2C_{\ref{clmSmallTKappa1}}\geq 8\alpha\beta/\rho_{2}. ∎

Claim 35.

There exists C35>0C_{\ref{clmSmallTKappa2}}>0 such that, for any γi(0,1)\gamma_{i}\in(0,1),

jTγi/βs.t.κj1+4αlog(Aj)/Δ22logT(C35/γi)log(C35K/(Δ2γi)).\exists\ j\geq\lceil T^{\gamma_{i}/\beta}\rceil\ s.t.\ \lceil\kappa_{j}\rceil\leq 1+4\alpha\log(A_{j})/\Delta_{2}^{2}\quad\Rightarrow\quad\log T\leq(C_{\ref{clmSmallTKappa2}}/\gamma_{i})\log\left(C_{\ref{clmSmallTKappa2}}K/(\Delta_{2}\gamma_{i})\right).
Proof.

Fix jTγi/βj\geq\lceil T^{\gamma_{i}/\beta}\rceil such that κj1+4αlog(Aj)/Δ22\lceil\kappa_{j}\rceil\leq 1+4\alpha\log(A_{j})/\Delta_{2}^{2}. Note that if j=1j=1, then 1Tγi/βTγi/β1\geq\lceil T^{\gamma_{i}/\beta}\rceil\geq T^{\gamma_{i}/\beta}, so since TT\in\mathbb{N}, we must have T=1T=1. This implies logT=0\log T=0, so the claimed bound is immediate. Hence, we assume j2j\geq 2 moving forward. We first observe

jρ2K3κjK3(1+4αlogAjΔ22)K3(1+8αβlogjΔ22)K3(16αβlogjΔ22)=16αβK3logjΔ22,j^{\rho_{2}}\leq K^{3}\lceil\kappa_{j}\rceil\leq K^{3}\left(1+\frac{4\alpha\log A_{j}}{\Delta_{2}^{2}}\right)\leq K^{3}\left(1+\frac{8\alpha\beta\log j}{\Delta_{2}^{2}}\right)\leq K^{3}\left(\frac{16\alpha\beta\log j}{\Delta_{2}^{2}}\right)=\frac{16\alpha\beta K^{3}\log j}{\Delta_{2}^{2}},

where the inequalities use κj=jρ2/(K2S)jρ2/K3\lceil\kappa_{j}\rceil=\lceil j^{\rho_{2}}/(K^{2}S)\rceil\geq j^{\rho_{2}}/K^{3}, the assumed upper bound on κj\lceil\kappa_{j}\rceil, Claim 18, and 8αβlog(j)/Δ2218\alpha\beta\log(j)/\Delta_{2}^{2}\geq 1 (since α,β1\alpha,\beta\geq 1, j2j\geq 2, and Δ2(0,1)\Delta_{2}\in(0,1)), respectively. Applying Claim 20 with x=jx=j, y=ρ2y=\rho_{2}, and z=16αβK3/Δ22z=16\alpha\beta K^{3}/\Delta_{2}^{2}, and noting that 2z/y(32αβK/(Δ2ρ2))32z/y\leq(32\alpha\beta K/(\Delta_{2}\rho_{2}))^{3} (since α1\alpha\geq 1, ρ2(0,β1)\rho_{2}\in(0,\beta-1), and Δ2(0,1)\Delta_{2}\in(0,1)), we obtain j(32αβK/(Δ2ρ2))6/ρ2(C35K/(Δ2γi))6/ρ2j\leq(32\alpha\beta K/(\Delta_{2}\rho_{2}))^{6/\rho_{2}}\leq(C_{\ref{clmSmallTKappa2}}K/(\Delta_{2}\gamma_{i}))^{6/\rho_{2}} for any C3532αβ/ρ2C_{\ref{clmSmallTKappa2}}\geq 32\alpha\beta/\rho_{2}. Therefore, by assumption jTγi/βj\geq\lceil T^{\gamma_{i}/\beta}\rceil, we obtain that for any C3532αβ/ρ2C_{\ref{clmSmallTKappa2}}\geq 32\alpha\beta/\rho_{2},

TTγi/ββ/γijβ/γi(C35K/(Δ2γi))6β/(ρ2γi)(C35K/(Δ2γi))C35/γi.T\leq\lceil T^{\gamma_{i}/\beta}\rceil^{\beta/\gamma_{i}}\leq j^{\beta/\gamma_{i}}\leq\left(C_{\ref{clmSmallTKappa2}}K/(\Delta_{2}\gamma_{i})\right)^{6\beta/(\rho_{2}\gamma_{i})}\leq\left(C_{\ref{clmSmallTKappa2}}K/(\Delta_{2}\gamma_{i})\right)^{C_{\ref{clmSmallTKappa2}}/\gamma_{i}}.\qed
Claim 36.

There exists C36>0C_{\ref{clmSmallTSum}}>0 such that, for any γi(0,1)\gamma_{i}\in(0,1),

(195) Δ2(2α3)8αK2log(T)j=Tγi/β(κj1)32αlogT(C36/γi)log(C36K/(Δ2γi)).\frac{\Delta_{2}(2\alpha-3)}{8\alpha K^{2}}\leq\log(T)\sum_{j=\lceil T^{\gamma_{i}/\beta}\rceil}^{\infty}(\lceil\kappa_{j}\rceil-1)^{3-2\alpha}\quad\Rightarrow\quad\log T\leq(C_{\ref{clmSmallTSum}}/\gamma_{i})\log(C_{\ref{clmSmallTSum}}K/(\Delta_{2}\gamma_{i})).
Proof.

We first eliminate the corner case where min{Tγi/β,κTγi/β}<2\min\{T^{\gamma_{i}/\beta},\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}\}<2. In this case, at least one of Tγi/β<2T^{\gamma_{i}/\beta}<2 or κTγi/β<2\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}<2 must hold. If the former holds, then logT<(β/γi)log2\log T<(\beta/\gamma_{i})\log 2, and if the latter holds, then 2>κTγi/β=Tγi/βρ2/(K2S)Tγiρ2/β/K32>\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}=\lceil T^{\gamma_{i}/\beta}\rceil^{\rho_{2}}/(K^{2}S)\geq T^{\gamma_{i}\rho_{2}/\beta}/K^{3}, so logT(β/(γiρ2))log(2K3)\log T\leq(\beta/(\gamma_{i}\rho_{2}))\log(2K^{3}). In both cases, we can clearly find C36>0C_{\ref{clmSmallTSum}}>0 satisfying the right side of (195).

Next, we assume κTγi/β2\kappa_{\lceil T^{\gamma_{i}/\beta}\rceil}\geq 2 and Tγi/β2T^{\gamma_{i}/\beta}\geq 2. By monotonicity, the former implies κj2\kappa_{j}\geq 2 for any jTγi/βj\geq\lceil T^{\gamma_{i}/\beta}\rceil. For any such jj, by definition and SKS\leq K, we can then write

κj1κj1κj/2=jρ2/(2K2S)jρ2/(2K3).\lceil\kappa_{j}\rceil-1\geq\kappa_{j}-1\geq\kappa_{j}/2=j^{\rho_{2}}/(2K^{2}S)\geq j^{\rho_{2}}/(2K^{3}).

Therefore, since 32α<03-2\alpha<0 by assumption in Theorem 2, we obtain

j=Tγi/β(κj1)32α22α3K3(2α3)j=Tγi/βjρ2(32α).\sum_{j=\lceil T^{\gamma_{i}/\beta}\rceil}^{\infty}(\lceil\kappa_{j}\rceil-1)^{3-2\alpha}\leq 2^{2\alpha-3}K^{3(2\alpha-3)}\sum_{j=\lceil T^{\gamma_{i}/\beta}\rceil}^{\infty}j^{\rho_{2}(3-2\alpha)}.

For the summation at right, we use Tγi/β2T^{\gamma_{i}/\beta}\geq 2 (which implies Tγi/β1Tγi/β1Tγi/β/2\lceil T^{\gamma_{i}/\beta}\rceil-1\geq T^{\gamma_{i}/\beta}-1\geq T^{\gamma_{i}/\beta}/2) and ρ2(2α3)>1\rho_{2}(2\alpha-3)>1 by assumption in Theorem 2, along with Claim 19, to write

j=Tγi/βjρ2(32α)(Tγi/β/2)1+ρ2(32α)ρ2(2α3)1=2ρ2(2α3)Tγi(1+ρ2(32α))/β2(ρ2(2α3)1).\sum_{j=\lceil T^{\gamma_{i}/\beta}\rceil}^{\infty}j^{\rho_{2}(3-2\alpha)}\leq\frac{(T^{\gamma_{i}/\beta}/2)^{1+\rho_{2}(3-2\alpha)}}{\rho_{2}(2\alpha-3)-1}=\frac{2^{\rho_{2}(2\alpha-3)}T^{\gamma_{i}(1+\rho_{2}(3-2\alpha))/\beta}}{2(\rho_{2}(2\alpha-3)-1)}.

Using ρ2(2α3)>1\rho_{2}(2\alpha-3)>1 (by assumption), we also know

Tγi(1+ρ2(32α))/βlog(T)=2βTγi(1+ρ2(32α))/βlog(Tγi(ρ2(2α3)1)/(2β))γi(ρ2(2α3)1)2βTγi(1+ρ2(32α))/(2β)γi(ρ2(2α3)1).T^{\gamma_{i}(1+\rho_{2}(3-2\alpha))/\beta}\log(T)=\frac{2\beta T^{\gamma_{i}(1+\rho_{2}(3-2\alpha))/\beta}\log(T^{\gamma_{i}(\rho_{2}(2\alpha-3)-1)/(2\beta)})}{\gamma_{i}(\rho_{2}(2\alpha-3)-1)}\leq\frac{2\beta T^{\gamma_{i}(1+\rho_{2}(3-2\alpha))/(2\beta)}}{\gamma_{i}(\rho_{2}(2\alpha-3)-1)}.

Combining the previous three inequalities, we then obtain

log(T)j=Tγi/β(κj1)32α2(ρ2+1)(2α3)β(ρ2(2α3)1)2K3(2α3)γiTγi(1+ρ2(32α))/(2β).\log(T)\sum_{j=\lceil T^{\gamma_{i}/\beta}\rceil}^{\infty}(\lceil\kappa_{j}\rceil-1)^{3-2\alpha}\leq\frac{2^{(\rho_{2}+1)(2\alpha-3)}\beta}{(\rho_{2}(2\alpha-3)-1)^{2}}\frac{K^{3(2\alpha-3)}}{\gamma_{i}}T^{\gamma_{i}(1+\rho_{2}(3-2\alpha))/(2\beta)}.

Therefore, if the left side of (195) holds, we are guaranteed that

Δ2(2α3)8αK22(ρ2+1)(2α3)β(ρ2(2α3)1)2K3(2α3)γiTγi(1+ρ2(32α))/(2β),\frac{\Delta_{2}(2\alpha-3)}{8\alpha K^{2}}\leq\frac{2^{(\rho_{2}+1)(2\alpha-3)}\beta}{(\rho_{2}(2\alpha-3)-1)^{2}}\frac{K^{3(2\alpha-3)}}{\gamma_{i}}T^{\gamma_{i}(1+\rho_{2}(3-2\alpha))/(2\beta)},

or, after rearranging, then using α>1\alpha>1 and Δ2,γi(0,1)\Delta_{2},\gamma_{i}\in(0,1),

Tγi(ρ2(2α3)1)/(2β)82(ρ2+1)(2α3)αβ(2α3)(ρ2(2α3)1)2K6α7Δ2γi(C36)6αK6α7Δ2γi(C36KΔ2γi)6α,T^{\gamma_{i}(\rho_{2}(2\alpha-3)-1)/(2\beta)}\leq\frac{8\cdot 2^{(\rho_{2}+1)(2\alpha-3)}\alpha\beta}{(2\alpha-3)(\rho_{2}(2\alpha-3)-1)^{2}}\frac{K^{6\alpha-7}}{\Delta_{2}\gamma_{i}}\leq(C_{\ref{clmSmallTSum}})^{6\alpha}\frac{K^{6\alpha-7}}{\Delta_{2}\gamma_{i}}\leq\left(\frac{C_{\ref{clmSmallTSum}}K}{\Delta_{2}\gamma_{i}}\right)^{6\alpha},

where the second inequality holds for large C36C_{\ref{clmSmallTSum}} and the third uses α1\alpha\geq 1 and Δ2,γi(0,1)\Delta_{2},\gamma_{i}\in(0,1). Taking logarithms and choosing C36C_{\ref{clmSmallTSum}} appropriately in terms of ρ2\rho_{2}, α\alpha, and β\beta yields the right side of (195). ∎
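The key analytic step in the proof above is the tail-sum estimate attributed to Claim 19, which is stated earlier in the appendix; presumably it is the integral-comparison bound ∑_{j≥J} j^{-p} ≤ (J−1)^{1−p}/(p−1) for p > 1 and J ≥ 2 (here p = ρ2(2α−3) > 1). The snippet below is an illustrative numeric check of that presumed form, with function names of our choosing:

```python
def tail_sum_upper_estimate(J, p, N=10**5):
    """Rigorous upper estimate of sum_{j=J}^infty j**(-p): the partial sum
    up to N plus an integral bound N**(1-p)/(p-1) on the remaining tail
    (valid for p > 1)."""
    partial = sum(j ** (-p) for j in range(J, N + 1))
    return partial + N ** (1 - p) / (p - 1)

def claim19_bound(J, p):
    """Presumed Claim 19 bound: (J-1)**(1-p) / (p-1)."""
    return (J - 1) ** (1 - p) / (p - 1)

# The full tail sum should be controlled as claimed, for exponents p > 1
# (as guaranteed by rho_2*(2*alpha - 3) > 1 in Theorem 2).
for p in (1.2, 1.5, 2.0, 3.0):
    for J in (2, 3, 10, 100):
        assert tail_sum_upper_estimate(J, p) <= claim19_bound(J, p)
```

The bound follows by comparing each term j^{-p} with the integral of x^{-p} over [j−1, j], which is why the proof only needs ⌈T^{γi/β}⌉ − 1 ≥ T^{γi/β}/2 before summing.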